@ulpi/browse
Fast headless browser and native app automation CLI for AI coding agents. Persistent Chromium daemon via Playwright, ~100ms per command. Automate Android, iOS, and macOS apps through the same interface.
Installation
Global Installation (recommended)
npm install -g @ulpi/browse
Requires Node.js 18+. Chromium is installed automatically via Playwright on first npm install. If Bun is installed, browse automatically uses it for ~2x faster command execution.
Project Installation (local dependency)
npm install @ulpi/browse
Then use via package.json scripts or by invoking browse directly.
From Source
git clone https://github.com/ulpi-io/browse
cd browse
npm install
npx tsx src/cli.ts goto https://example.com # Dev mode
npm run build # Build bundle
Quick Start
browse goto https://example.com
browse snapshot -i # Get interactive elements with refs
browse click @e2 # Click by ref from snapshot
browse fill @e3 "test@example.com" # Fill input by ref
browse text # Get visible page text
browse screenshot page.png
browse stop
The Ref Workflow
Every snapshot assigns refs (@e1, @e2, ...) to elements. Use refs as selectors in any command; no CSS selector construction is needed:
$ browse snapshot -i
@e1 [button] "Submit"
@e2 [link] "Home"
@e3 [textbox] "Email"
$ browse click @e1 # Click the Submit button
Clicked @e1
$ browse fill @e3 "user@example.com" # Fill the Email field
Filled @e3
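When an agent or script consumes the snapshot programmatically, it can help to turn the flat list into structured data. The following is a minimal sketch, assuming every line follows the `@ref [role] "name"` layout shown in the example output above; `parseSnapshot` and `findRef` are hypothetical helpers, not part of browse.

```typescript
// Parsed form of one line of `browse snapshot -i` output,
// assuming the `@ref [role] "name"` layout from the example above.
interface SnapshotEntry {
  ref: string; // e.g. "@e1"
  role: string; // e.g. "button"
  name: string; // e.g. "Submit"
}

// Hypothetical helper (not part of browse): parse the terse flat list.
// Lines that do not match the assumed layout are skipped.
function parseSnapshot(output: string): SnapshotEntry[] {
  const linePattern = /^(@e\d+)\s+\[([^\]]+)\]\s+"(.*)"$/;
  const entries: SnapshotEntry[] = [];
  for (const rawLine of output.split("\n")) {
    const match = linePattern.exec(rawLine.trim());
    if (match) {
      const [, ref = "", role = "", name = ""] = match;
      entries.push({ ref, role, name });
    }
  }
  return entries;
}

// Hypothetical helper: look up the ref for a given role and name.
function findRef(
  entries: SnapshotEntry[],
  role: string,
  name: string
): string | undefined {
  const hit = entries.find((e) => e.role === role && e.name === name);
  return hit ? hit.ref : undefined;
}
```

The returned ref can then be passed to commands such as `browse click` or `browse fill`. Refs come from a specific snapshot, so re-parse after the page changes.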
Traditional Selectors (also supported)
browse click "#submit"
browse fill ".email-input" "test@example.com"
browse click "text=Submit"
Commands
Navigation
browse goto <url> # Navigate to URL
browse back # Go back
browse forward # Go forward
browse reload # Reload page
browse url # Get current URL
Content Extraction
browse text # Visible text (clean, no DOM mutation)
browse html [sel] # Full HTML or element innerHTML
browse links # All links as "text -> href"
browse forms # Form structure as JSON
browse accessibility # Raw ARIA snapshot tree
browse schema # JSON-LD, Microdata, RDFa structured data
browse meta # Page meta tags (title, description, OG, canonical, hreflang)
browse headings # H1-H6 heading hierarchy with counts
Interaction
browse click <sel> # Click element
browse rightclick <sel> # Right-click element (context menu)
browse dblclick <sel> # Double-click element
browse fill <sel> <val> # Clear and fill input
browse select <sel> <val> # Select dropdown option
browse hover <sel> # Hover element
browse focus <sel> # Focus element
browse tap <sel> # Tap element (requires touch context via emulate)
browse check <sel> # Check checkbox
browse uncheck <sel> # Uncheck checkbox
browse type <text> # Type text via keyboard (current focus)
browse press <key> # Press key (Enter, Tab, etc.)
browse keydown <key> # Key down event
browse keyup <key> # Key up event
browse keyboard inserttext <text> # Insert text without key events
browse scroll [sel|up|down] # Scroll element into view or direction
browse scrollinto <sel> # Scroll element into view (explicit)
browse swipe <dir> [px] # Swipe up/down/left/right (touch events)
browse drag <src> <tgt> # Drag and drop
browse highlight <sel> # Highlight element with visual overlay
browse download <sel> [path] # Download file triggered by click
browse upload <sel> <files...> # Upload files to input
Mouse Control
browse mouse move <x> <y> # Move mouse to coordinates
browse mouse down [button] # Press mouse button (left/right/middle)
browse mouse up [button] # Release mouse button
browse mouse wheel <dy> [dx] # Scroll wheel
Settings
browse set geo <lat> <lng> # Set geolocation
browse set media <scheme> # Set color scheme (dark/light/no-preference)
Wait
browse wait <selector> # Wait for element
browse wait <selector> --state hidden # Wait for element to disappear
browse wait <ms> # Wait for milliseconds
browse wait --url <pattern> # Wait for URL
browse wait --text "Welcome" # Wait for text to appear in page
browse wait --fn "js expr" # Wait for JavaScript condition
browse wait --load <state> # Wait for load state (load/domcontentloaded/networkidle)
browse wait --network-idle # Wait for network idle
browse wait --download [path] # Wait for download to complete
Snapshot
browse snapshot # Full accessibility tree
browse snapshot -i # Interactive elements only (terse flat list)
browse snapshot -i -f # Interactive elements, full indented tree
browse snapshot -i -C # Include cursor-interactive elements (onclick, cursor:pointer)
browse snapshot -V # Viewport only: elements visible on screen
browse snapshot -c # Compact: remove empty structural elements
browse snapshot -d 3 # Limit depth to 3 levels
browse snapshot -s "#main" # Scope to CSS selector
browse snapshot -i -c -d 5 # Combine options
| Flag | Description |
|---|---|
| -i | Interactive elements only (buttons, links, inputs), as a terse flat list |
| -f | Full: indented tree with props and children (use with -i) |
| -V | Viewport: only elements visible in current viewport |
| -c | Compact: remove empty structural elements |
| -C | Cursor-interactive: detect divs with cursor:pointer, onclick, tabindex |
| -d N | Limit tree depth |
| -s <sel> | Scope to CSS selector |
The -C flag catches modern SPA patterns that ARIA trees miss: <div onclick>, cursor: pointer, tabindex, and data-action elements.
Find Elements
browse find role <role> [name] # By ARIA role
browse find text <text> # By text content
browse find label <label> # By label
browse find placeholder <placeholder> # By placeholder
browse find testid <id> # By data-testid
browse find alt <text> # By alt text
browse find title <text> # By title attribute
browse find first <sel> # First matching element
browse find last <sel> # Last matching element
browse find nth <n> <sel> # Nth matching element (0-indexed)
Inspection
browse js <expr> # Evaluate JavaScript expression
browse eval <file> # Evaluate JavaScript file
browse css <sel> <prop> # Get computed CSS property
browse attrs <sel> # Get element attributes as JSON
browse element-state <sel> # Element state (visible, enabled, checked, etc.)
browse value <sel> # Get input/select value
browse count <sel> # Count elements matching selector
browse box <sel> # Get bounding box as JSON {x, y, width, height}
browse clipboard [write <text>] # Read or write clipboard
browse console [--clear] # Console log buffer
browse errors [--clear] # Page errors only (filtered from console)
browse network [--clear] # Network request buffer
browse cookies # Browser cookies as JSON
browse storage [set <k> <v>] # localStorage/sessionStorage
browse perf # Navigation timing (dns, ttfb, load)
browse devices [filter] # List available device names
browse images [sel] [--inline] # List page images with src, alt, dimensions
Visual
browse screenshot [path] # Take screenshot (viewport)
browse screenshot --full [path] # Full-page screenshot
browse screenshot <sel|@ref> [path] # Screenshot specific element
browse screenshot --clip x,y,w,h [path] # Screenshot clipped region
browse screenshot --annotate [path] # Annotated screenshot with numbered labels
browse pdf [path] # Save page as PDF
browse responsive [prefix] # Mobile/tablet/desktop screenshots
Compare
browse diff <url1> <url2> # Text diff between two pages
browse snapshot-diff # Diff current vs last snapshot
browse screenshot-diff <baseline> [current] # Pixel-level visual diff
Tabs
browse tabs # List all tabs
browse tab <id> # Switch to tab
browse newtab [url] # Open new tab
browse closetab [id] # Close tab
Frames
browse frame <sel> # Switch to iframe
browse frame main # Back to main frame
Device Emulation
browse emulate "iPhone 14" # Emulate device
browse emulate reset # Reset to desktop (1920x1080)
browse devices # List all available devices
browse devices iphone # Filter device list
browse viewport 1280x720 # Set viewport size
100+ devices: iPhone 12-17, Pixel 5-7, iPad, Galaxy, and all Playwright built-ins.
Cookies
browse cookie <name>=<value> # Set cookie (simple)
browse cookie set <n> <v> [--domain --secure ...] # Set cookie with options
browse cookie clear # Clear all cookies
browse cookie export <file> # Export cookies to JSON
browse cookie import <file> # Import cookies from JSON
browse cookies # Read all cookies
Network
browse route <pattern> block # Block matching requests
browse route <pattern> fulfill <status> [body] # Mock response
browse route clear # Remove all routes
browse offline [on|off] # Toggle offline mode
browse header <name>:<value> # Set extra HTTP header
browse useragent <string> # Set user agent
Dialogs
browse dialog # Last dialog info
browse dialog-accept [text] # Accept next dialog (optional prompt text)
browse dialog-dismiss # Dismiss next dialog
Recording
browse har start # Start HAR recording
browse har stop [path] # Stop and save HAR file
browse video start [dir] # Start video recording (WebM)
browse video stop # Stop recording
browse video status # Check recording status
browse record start # Record browsing commands as you go
browse record stop # Stop recording
browse record status # Check recording status
browse record export browse [path] # Export as chain-compatible JSON (replay with browse chain)
browse record export flow [path] # Export as YAML flow (replay with browse flow)
browse record export replay [path] # Export as Chrome DevTools Recorder (browser only)
browse record export playwright [path] # Export as Playwright Test (browser only)
browse record export replay --selectors css,aria [path] # Filter selector types in export
React DevTools
browse react-devtools enable # Enable (downloads hook on first use)
browse react-devtools tree # Component tree
browse react-devtools props <sel> # Props/state of component
browse react-devtools suspense # Suspense boundary status
browse react-devtools errors # Error boundaries
browse react-devtools profiler # Render timing
browse react-devtools hydration # Hydration timing
browse react-devtools renders # What re-rendered
browse react-devtools owners <sel> # Parent component chain
browse react-devtools context <sel> # Context values
browse react-devtools disable # Disable
Performance Audit
browse perf-audit [url] # Full performance audit with actionable report
browse perf-audit [url] --no-coverage # Skip JS/CSS coverage (faster)
browse perf-audit [url] --no-detect # Skip stack detection
browse perf-audit [url] --json # Structured JSON output
browse perf-audit save [name] # Save audit report for later comparison
browse perf-audit compare <base> [curr] # Compare saved baseline vs current or saved audit
browse perf-audit list # List saved audit reports
browse perf-audit delete <name> # Delete a saved audit
browse detect # Tech stack fingerprint (frameworks, SaaS, CDN, infra)
browse coverage start # Start JS/CSS code coverage collection
browse coverage stop # Stop and report per-file used/unused bytes
browse initscript set <code> # Inject JS before every page load
browse initscript show # Show current init script
browse initscript clear # Remove init script
perf-audit runs a complete performance analysis in one command:
- Core Web Vitals: LCP, CLS, TBT, FCP, TTFB, INP with Google's good/needs-improvement/poor thresholds
- LCP Analysis: identifies the LCP element, its network entry, render-blocking chain, and critical path
- Layout Shift Attribution: each shift traced to font swap, missing image dimensions, or dynamic content
- Long Task Attribution: maps blocking JS to source scripts and domains with per-domain TBT
- Resource Breakdown: JS/CSS/images/fonts/API categorized with sizes and largest files
- Render-Blocking Detection: sync scripts and blocking stylesheets in <head>
- Image Audit: format (JPEG vs WebP), missing dimensions, missing lazy-load, missing fetchpriority, oversized images, srcset usage
- Font Audit: per-font font-display value, preload status, FOIT/FOUT risk
- DOM Complexity: node count, max depth, largest subtree (flags >1,500 and >3,000 thresholds)
- Stack Detection: 108 frameworks (React, Vue, Angular, Next.js, Nuxt, Laravel, WordPress, Magento, etc.), 55 SaaS platforms (Shopify, Wix, Squarespace, etc.), CDN, protocol, compression, caching
- Third-Party Impact: per-domain inventory with size, request count, and category (analytics/ads/social/chat/monitoring)
- Coverage: per-file JS/CSS used vs unused bytes
- Correlation Engine: connects LCP to blocking CSS, Long Tasks to scripts, CLS to font swaps, fonts to FCP blocking
- Recommendations: prioritized, data-driven action items (platform-specific when SaaS detected)
$ browse perf-audit https://example.com --no-coverage
Core Web Vitals:
TTFB 580ms good
FCP 696ms good
LCP 696ms good
CLS 0.015 good
TBT 599ms needs improvement
LCP Analysis:
Element: <img src='hero.webp'>
Critical path: TTFB(580ms) -> CSS(styles.css) -> JS(vendor.js) -> Image(hero.webp) -> LCP(696ms)
DOM Complexity:
Total nodes: 4,476
WARNING: exceeds 3,000 threshold (poor)
Top Recommendations:
1. Add fetchpriority="high" to LCP image
2. Add font-display:swap to fallback fonts (FOIT risk)
3. Lazy-load YouTube embeds (click-to-play facade)
Audit completed in 13.2s (reload: 10.0s, settle: 3.0s, collect: 41ms, detection: 75ms)
detect gives a quick stack fingerprint without the full audit:
$ browse detect
Stack:
meta-framework Next.js (production), router: app, rsc: true
css-framework Tailwind CSS
build-tool Turbopack
Infrastructure:
CDN: Amazon CloudFront
Protocol: h2 (64%)
Cache rate: 74% (134/180)
DNS origins: 24 unique (15 missing preconnect)
DOM: 4,476 nodes, depth 23
Third-Party (4.4MB total):
www.youtube.com 3.0MB 45 reqs video
www.googletagmanager.com 331KB 3 reqs analytics
connect.facebook.net 214KB 2 reqs ads
Handoff (Human Takeover)
browse handoff [reason] # Swap to Chrome for CAPTCHA/MFA/OAuth (falls back to Chromium)
browse handoff --chromium # Force Playwright Chromium instead of Chrome
browse resume # Swap back to headless, returns fresh snapshot
Handoff defaults to your system Chrome (bypasses Turnstile and bot detection) and falls back to Playwright Chromium if Chrome is not installed. The agent asks permission first via AskUserQuestion, then hands off. The server auto-suggests handoff after 3 consecutive failures.
Cloud Providers
browse provider save browserbase <api-key> # Save API key (encrypted)
browse provider save browserless <token> # Save token (encrypted)
browse --provider browserbase goto https://... # Use cloud browser
browse provider list # List saved providers
browse provider delete <name> # Remove saved key
API keys are encrypted at rest in .browse/providers/ and are never visible to agents.
State & Auth
browse state save [name] # Save cookies + localStorage
browse state load [name] # Restore saved state
browse state list # List saved states
browse state show [name] # Show state details
browse auth save <name> <url> <user> <pass> # Save encrypted credential
browse auth save <name> <url> <user> --password-stdin # Password from stdin
browse auth login <name> # Auto-login with saved credential
browse auth list # List saved credentials
browse auth delete <name> # Delete credential
browse cookie-import --list # List browsers with cookies
browse cookie-import chrome [--domain .example.com] # Import cookies from Chrome
browse cookie-import chrome --profile "Profile 1" # Specific browser profile
Multi-Step (Chaining)
Execute a sequence of commands in one call:
echo '[["goto","https://example.com"],["snapshot","-i"],["text"]]' | browse chain
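When the steps are generated by a program rather than typed by hand, serializing them from typed data avoids quoting mistakes. A minimal sketch, assuming the chain input is exactly the JSON array-of-arrays shown above; `ChainStep` and `buildChain` are hypothetical names, not part of browse.

```typescript
// One chain step: the command name followed by its arguments,
// mirroring the [["goto", "https://example.com"], ...] shape above.
type ChainStep = [string, ...string[]];

// Hypothetical helper (not part of browse): serialize steps for `browse chain`.
function buildChain(steps: ChainStep[]): string {
  for (const step of steps) {
    if (step[0].trim() === "") {
      throw new Error("each chain step needs a command name");
    }
  }
  return JSON.stringify(steps);
}

// Reproduces the payload from the echo example above.
const payload = buildChain([
  ["goto", "https://example.com"],
  ["snapshot", "-i"],
  ["text"],
]);
```

The resulting string can be written to the stdin of `browse chain`, for example through the `input` option of Node's `child_process.spawnSync`.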
Server Control
browse status # Server health report
browse instances # List all running browse servers
browse version # Print CLI version
browse doctor # System check (Node, Playwright, Chromium)
browse upgrade # Self-update via npm
browse stop # Stop server
browse restart # Restart server
browse inspect # Open DevTools (requires BROWSE_DEBUG_PORT)
Setup
browse install-skill [path] # Install Claude Code skill
Sessions
Run multiple AI agents in parallel, each with isolated browser state, sharing one Chromium process:
# Agent A
browse --session agent-a goto https://site-a.com
browse --session agent-a snapshot -i
browse --session agent-a click @e3
# Agent B (simultaneously)
browse --session agent-b goto https://site-b.com
browse --session agent-b snapshot -i
browse --session agent-b fill @e2 "query"
# Or set once via env var
export BROWSE_SESSION=agent-a
browse text
Each session has its own:
- Browser context (cookies, storage, cache)
- Tabs and navigation history
- Refs from snapshots
- Console and network buffers
browse sessions # List active sessions
browse session-close agent-a # Close a session
browse status # Shows total session count
Sessions auto-close after the idle timeout (default 30 min). Without --session, everything runs in a "default" session.
For full process isolation (separate Chromium instances), use BROWSE_PORT to run independent servers.
Profiles vs Sessions
| | --session | --profile |
|---|---|---|
| Chromium | Shared (one process) | Own (one per profile) |
| Memory | ~5MB per session | ~200MB per profile |
| State | Ephemeral (auto-persisted cookies) | Full persistence (cookies, cache, IndexedDB) |
| Multiplexing | Yes (parallel agents) | No (one agent per profile) |
| Use case | Parallel browsing, lightweight | Real login state, heavy |
Native App Automation
Automate Android, iOS, and macOS apps through the same CLI and ref workflow:
Enable Platforms
browse enable android # Installs adb, JDK, Android SDK, emulator, system image, driver APK
browse enable ios # Builds iOS runner (requires Xcode)
browse enable macos # Builds macOS AX bridge (requires Xcode CLI tools)
browse enable all # Enable all platforms
Each enable command installs all dependencies and builds the native driver for that platform. Run it once; everything is cached for future use.
Simulator/Emulator Lifecycle
browse sim start --platform ios --app com.apple.Preferences --visible
browse sim start --platform android --app com.android.settings --visible
browse sim stop --platform ios
browse sim stop --platform android
browse sim status --platform ios
browse sim status --platform android
- sim start boots the simulator/emulator, launches the target app, and starts the automation driver
- --visible opens the simulator/emulator window (default: headless)
- Switching --app on a running simulator reconfigures the target without rebooting
- Auto-install: Android sim start automatically installs adb, Java, Android SDK, system image, and emulator via Homebrew if missing
Android
browse sim start --platform android --app com.android.settings --visible
browse --platform android --app com.android.settings snapshot -i
browse --platform android --app com.android.settings tap @e3
browse --platform android --app com.android.settings swipe up
browse --platform android --app com.android.settings press back
browse --platform android --app com.android.settings text
browse --platform android --app com.android.settings screenshot app.png
browse sim stop --platform android
Auto-installs the full toolchain on first use (adb, JDK 21, Android SDK, emulator, system image, AVD). No manual setup required.
browse doctor --platform android # Check setup
iOS
browse sim start --platform ios --app com.apple.Preferences --visible
browse --platform ios --app com.apple.Preferences snapshot -i
browse --platform ios --app com.apple.Preferences tap @e2
browse --platform ios --app com.apple.Preferences swipe up
browse --platform ios --app com.apple.Preferences press home
browse --platform ios --app com.apple.mobilesafari snapshot -i # Switch app
browse sim stop --platform ios
Requires: Xcode. Simulator boots automatically.
macOS
browse --app "System Settings" snapshot -i
browse --app "System Settings" tap @e5
browse --app "System Settings" swipe up
browse --app TextEdit type "Hello"
browse --app TextEdit press "cmd+n" # Modifier combos supported
Requires: macOS, Accessibility permission granted to the terminal.
Platform Flags
| Flag | Description |
|---|---|
| --platform android\|ios\|macos | Target platform (default: browser) |
| --app <name> | App package name (Android), bundle ID (iOS), or process name (macOS) |
| --device <serial> | Device serial (Android), simulator UDID (iOS) |
| --visible | Show simulator/emulator window (default: headless) |
Unified Command Surface
All platforms support the same commands: snapshot, text, tap, fill, type, press, swipe, screenshot. The @ref workflow is identical: snapshot -i assigns refs, then tap @e1, fill @e2 "text", etc. Commands requiring browser capabilities (navigation, tabs, JavaScript) are blocked with clear errors on app targets.
Workflow Commands
Flows
browse flow run.yaml # Execute YAML automation script
browse flow save login-flow # Save current recording as named flow
browse flow run login-flow # Execute saved flow
browse flow list # List saved flows
browse retry # Retry command with backoff (browser only)
browse watch # Watch DOM element for changes (browser only)
Flows work on all platforms (browser, Android, iOS, macOS). Each flow step goes through the executeCommand() pipeline and is capability-gated per target. Recording captures individual flow steps, not the flow wrapper.
Browser-only workflow commands: retry, watch, har start/stop, video start/stop
Browser-only export formats: record export replay, record export playwright
Assertions
browse expect "text 'Welcome'" # Assert text exists on page
browse expect "count .item > 3" # Assert element count
browse expect "url contains /dashboard" # Assert URL
browse expect "title 'My App'" # Assert page title
SDK Mode
import { createBrowser } from '@ulpi/browse/sdk';
const browser = await createBrowser();
await browser.goto('https://example.com');
const text = await browser.text();
await browser.close();
Custom Extensibility
Extend browse with project-local JSON/YAML configuration in .browse/:
Custom Audit Rules
.browse/rules/my-rules.json:
[
{ "kind": "metric-threshold", "metric": "lcp", "max": 2000, "severity": "critical" },
{ "kind": "selector-count", "selector": "img:not([alt])", "max": 0, "severity": "warning" }
]
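To make the rule semantics concrete, here is a rough sketch of how the two rule kinds above could be evaluated. It assumes `max` is an inclusive upper bound and that measured values arrive as plain numbers (the unit of `lcp` is assumed to be milliseconds). The types and `evaluateRules` are illustrative only and are not browse's actual rule engine.

```typescript
// Rule shapes copied from the JSON example above.
interface MetricThresholdRule {
  kind: "metric-threshold";
  metric: string;
  max: number;
  severity: string;
}

interface SelectorCountRule {
  kind: "selector-count";
  selector: string;
  max: number;
  severity: string;
}

type AuditRule = MetricThresholdRule | SelectorCountRule;

interface Violation {
  rule: AuditRule;
  actual: number;
}

// Hypothetical evaluator: `metrics` maps metric names to measured values,
// `counts` maps selectors to the number of matching elements.
// A rule is violated when the observed value exceeds `max` (assumption).
function evaluateRules(
  rules: AuditRule[],
  metrics: Record<string, number>,
  counts: Record<string, number>
): Violation[] {
  const violations: Violation[] = [];
  for (const rule of rules) {
    const actual =
      rule.kind === "metric-threshold"
        ? metrics[rule.metric]
        : counts[rule.selector];
    // Rules with no observed value are skipped rather than failed.
    if (actual !== undefined && actual > rule.max) {
      violations.push({ rule, actual });
    }
  }
  return violations;
}

const exampleRules: AuditRule[] = [
  { kind: "metric-threshold", metric: "lcp", max: 2000, severity: "critical" },
  { kind: "selector-count", selector: "img:not([alt])", max: 0, severity: "warning" },
];
```

With these example rules, a page with an LCP of 2450 and no images missing alt text would produce one violation, the critical LCP rule.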
Custom Detection Signatures
.browse/detections/my-framework.json:
[
{ "name": "MyFramework", "detect": "!!window.__MY_FRAMEWORK__", "versionExpr": "window.__MY_FRAMEWORK__.version", "category": "custom" }
]
Project Config
browse.json:
{
"detectionPaths": [".browse/detections"],
"rulePaths": [".browse/rules"],
"flowPaths": [".browse/flows"],
"startupFlows": ["setup.yaml"]
}
Security
All security features are opt-in: existing workflows are unaffected until you explicitly enable a feature.
Domain Allowlist
Restrict navigation and sub-resource requests to trusted domains:
browse --allowed-domains "example.com,*.example.com" goto https://example.com
# Or via env var
BROWSE_ALLOWED_DOMAINS="example.com,*.api.io" browse goto https://example.com
Blocks HTTP requests, WebSocket, EventSource, and sendBeacon to non-allowed domains. Wildcards like *.example.com match the bare domain and all subdomains.
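The wildcard rule can be sketched as follows. This illustrates the matching behaviour described above, not browse's actual matcher; it additionally assumes that a plain entry such as example.com matches only that exact host, and that comparison is case-insensitive. Both function names are hypothetical.

```typescript
// Sketch of the wildcard rule described above; browse's real matcher may differ.
function hostMatchesPattern(host: string, pattern: string): boolean {
  const h = host.toLowerCase();
  const p = pattern.trim().toLowerCase();
  if (p.startsWith("*.")) {
    const base = p.slice(2);
    // "*.example.com" covers "example.com" itself and any subdomain of it.
    return h === base || h.endsWith("." + base);
  }
  // Assumption: a plain entry matches the exact host only.
  return h === p;
}

// Check a hostname against a comma-separated allowlist,
// in the same format as the --allowed-domains value.
function isHostAllowed(host: string, allowedDomains: string): boolean {
  return allowedDomains
    .split(",")
    .filter((entry) => entry.trim() !== "")
    .some((entry) => hostMatchesPattern(host, entry));
}
```

Matching on a leading dot (`"." + base`) rather than a bare suffix keeps a host like evilapi.io from slipping through a `*.api.io` entry.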
Action Policy
Gate commands with a browse-policy.json file:
{ "default": "allow", "deny": ["js", "eval"], "confirm": ["goto"] }
Precedence: deny > confirm > allow > default. The policy file hot-reloads on change, so no server restart is needed.
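The precedence order can be read as a simple lookup chain. A minimal sketch, assuming the policy file has the shape shown above with an optional `allow` list, and that `default` may be any of the three decisions; the `decide` function is hypothetical and only illustrates the ordering.

```typescript
type Decision = "allow" | "confirm" | "deny";

// Shape of browse-policy.json as shown above; the lists are optional in this sketch.
interface ActionPolicy {
  default: Decision;
  deny?: string[];
  confirm?: string[];
  allow?: string[];
}

// Sketch of the stated precedence: deny > confirm > allow > default.
function decide(policy: ActionPolicy, command: string): Decision {
  const listed = (list: string[] | undefined): boolean =>
    list !== undefined && list.indexOf(command) !== -1;
  if (listed(policy.deny)) return "deny";
  if (listed(policy.confirm)) return "confirm";
  if (listed(policy.allow)) return "allow";
  return policy.default;
}

// The example policy from above.
const examplePolicy: ActionPolicy = {
  default: "allow",
  deny: ["js", "eval"],
  confirm: ["goto"],
};
```

Under the example policy, `js` and `eval` are denied, `goto` requires confirmation, and every other command falls through to the default. A command listed in both `deny` and `allow` is denied.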
Credential Vault
Encrypted credential storage (AES-256-GCM). The LLM never sees passwords:
echo "mypassword" | browse auth save github https://github.com/login myuser --password-stdin
browse auth login github # Auto-navigates, detects form, fills + submits
browse auth list # List saved credentials (no passwords shown)
Key is auto-generated at .browse/.encryption-key or set via BROWSE_ENCRYPTION_KEY.
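A 64-character hex string encodes 32 bytes, which is the key length AES-256 requires. If you set BROWSE_ENCRYPTION_KEY yourself, a quick format check like the sketch below may catch truncated or mistyped keys early. Both helpers are hypothetical, and accepting uppercase hex digits is an assumption.

```typescript
// Sketch: check that a candidate key is exactly 64 hex characters (32 bytes).
// Accepting uppercase digits is an assumption, not documented behaviour.
function isValidEncryptionKey(key: string): boolean {
  return /^[0-9a-fA-F]{64}$/.test(key);
}

// Decode a hex string into raw bytes (call only on a validated key).
function hexToBytes(hex: string): Uint8Array {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}
```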
Content Boundaries
Wrap page output in CSPRNG nonce-delimited markers so LLMs can distinguish tool output from untrusted page content:
browse --content-boundaries text
JSON Output
Machine-readable output for agent frameworks:
browse --json snapshot -i
# Returns: {"success": true, "data": "...", "command": "snapshot"}
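An agent framework would typically validate this envelope before trusting it. A minimal sketch, assuming only the three fields shown in the example above; the type of `data` is left as `unknown` because the example does not pin it down. `BrowseResult` and `parseBrowseResult` are hypothetical names.

```typescript
// Envelope shape taken from the example above: {success, data, command}.
interface BrowseResult {
  success: boolean;
  data: unknown; // the example shows a string; other shapes are not ruled out
  command: string;
}

// Hypothetical helper: parse and validate the stdout of `browse --json ...`.
function parseBrowseResult(stdout: string): BrowseResult {
  const parsed: unknown = JSON.parse(stdout);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const record = parsed as Record<string, unknown>;
  const success = record["success"];
  const command = record["command"];
  if (typeof success !== "boolean" || typeof command !== "string") {
    throw new Error("missing or mistyped 'success' or 'command' field");
  }
  return { success, data: record["data"], command };
}
```

Checking `success` before reading `data` lets the caller branch on failures without parsing free-form error text.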
Configuration
Create a browse.json file at your project root to set persistent defaults:
{
"session": "my-agent",
"json": true,
"contentBoundaries": true,
"allowedDomains": ["example.com", "*.api.io"],
"idleTimeout": 3600000,
"viewport": "1280x720",
"device": "iPhone 14",
"runtime": "playwright",
"detectionPaths": [".browse/detections"],
"rulePaths": [".browse/rules"],
"flowPaths": [".browse/flows"],
"startupFlows": ["setup.yaml"]
}
CLI flags and environment variables override config file values.
Usage with AI Agents
Claude Code (recommended)
Install as a Claude Code skill via skills.sh:
npx skills add https://github.com/ulpi-io/skills --skill browse
Or install directly:
browse install-skill
Both copy the skill definition to .claude/skills/browse/SKILL.md and add all browse commands to permissions, so there are no more approval prompts.
CLAUDE.md / AGENTS.md
Add to your project instructions:
## Browser Automation
Use `browse` for web automation. Run `browse --help` for all commands.
Core workflow:
1. `browse goto <url>` - Navigate to page
2. `browse snapshot -i` - Get interactive elements with refs (@e1, @e2)
3. `browse click @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changes
Just ask the agent
Use browse to test the login flow. Run browse --help to see available commands.
MCP Server Mode
Run browse as an MCP server for editors that support the Model Context Protocol. All CLI commands are available as MCP tools: browser automation, app automation, perf-audit, detect, coverage, flows, and more.
browse --mcp
Use --json alongside --mcp for structured responses ({success, data, command}).
Note: Requires npm install @modelcontextprotocol/sdk alongside browse.
Cursor
.cursor/mcp.json:
{
"mcpServers": {
"browse": {
"command": "browse",
"args": ["--mcp"]
}
}
}
Claude Desktop
claude_desktop_config.json:
{
"mcpServers": {
"browse": {
"command": "browse",
"args": ["--mcp"]
}
}
}
Windsurf
{
"mcpServers": {
"browse": {
"command": "browse",
"args": ["--mcp"]
}
}
}
Options
| Flag | Description |
|---|---|
| --session <id> | Named session (isolates tabs, refs, cookies) |
| --profile <name> | Persistent browser profile (own Chromium, full state) |
| --context [state\|delta\|full] | Action context: state = page changes, delta = ARIA diff with refs, full = complete snapshot with refs |
| --json | Wrap output as {success, data, command} |
| --content-boundaries | Wrap page content in nonce-delimited markers |
| --allowed-domains <d,d> | Block navigation/resources outside allowlist |
| --max-output <n> | Truncate output to N characters |
| --headed | Show browser window (not headless) |
| --chrome | Shortcut for --runtime chrome --headed |
| --cdp <port> | Connect to Chrome on a specific debugging port |
| --connect | Auto-discover and connect to a running Chrome instance |
| --provider <name> | Cloud browser provider (browserless, browserbase) |
| --runtime <name> | Browser runtime: playwright (default), rebrowser (stealth), lightpanda, camoufox (anti-detection Firefox), chrome |
| --camoufox-profile <name> | Named camoufox profile from .browse/camoufox-profiles/<name>.json (server-spawn-only) |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| BROWSE_PORT | auto (9400-10400) | Fixed server port |
| BROWSE_PORT_START | 9400 | Start of port scan range |
| BROWSE_SESSION | (none) | Default session ID for all commands |
| BROWSE_INSTANCE | auto (PPID) | Instance ID for multi-agent isolation |
| BROWSE_IDLE_TIMEOUT | 1800000 (30m) | Idle auto-shutdown in ms |
| BROWSE_TIMEOUT | (none) | Override all command timeouts (ms) |
| BROWSE_LOCAL_DIR | .browse/ or /tmp | State/log/screenshot directory |
| BROWSE_JSON | (none) | Set to 1 for JSON output mode |
| BROWSE_CONTEXT | (none) | Set to 1/state/delta/full for action context levels |
| BROWSE_CONTENT_BOUNDARIES | (none) | Set to 1 for nonce-delimited output |
| BROWSE_ALLOWED_DOMAINS | (none) | Comma-separated domain allowlist |
| BROWSE_MAX_OUTPUT | (none) | Truncate output to N characters |
| BROWSE_HEADED | (none) | Set to 1 for headed browser mode |
| BROWSE_CDP_URL | (none) | Connect to remote Chrome via CDP |
| BROWSE_PROXY | (none) | Proxy server URL |
| BROWSE_PROXY_BYPASS | (none) | Proxy bypass list |
| BROWSE_SERVER_SCRIPT | auto-detected | Override path to server.ts |
| BROWSE_DEBUG_PORT | (none) | Port for DevTools debugging |
| BROWSE_POLICY | browse-policy.json | Path to action policy file |
| BROWSE_CONFIRM_ACTIONS | (none) | Commands requiring confirmation |
| BROWSE_ENCRYPTION_KEY | auto-generated | 64-char hex AES key for credential vault |
| BROWSE_AUTH_PASSWORD | (none) | Password for auth save (alt to --password-stdin) |
| BROWSE_RUNTIME | playwright | Browser runtime (playwright, rebrowser, lightpanda, camoufox, chrome) |
| BROWSE_CAMOUFOX_PROFILE | (none) | Named camoufox profile from .browse/camoufox-profiles/ |
| BROWSE_CHROME | (none) | Set to 1 to use system Chrome |
| BROWSE_CHROME_PATH | auto-detected | Override Chrome executable path |
Architecture
browse [--session <id>] [--platform <p>] [--app <name>] <command>
|
CLI (thin HTTP client)
|
Persistent server (localhost, auto-started)
|
SessionManager + CommandRegistry + executeCommand()
├── Browser sessions:
│   ├── "default" → BrowserManager → Chromium (Playwright)
│   ├── "agent-a" → BrowserManager → Chromium (shared)
│   └── "agent-b" → BrowserManager → Chromium (shared)
├── App sessions:
│   ├── "app:com.example" → AndroidAppManager → adb → device driver
│   ├── "app:com.example.ios" → IOSAppManager → simctl → Simulator
│   └── "app:Safari" → AppManager → browse-ax → macOS AX
└── All targets implement AutomationTarget interface
- First command: ~2s (server + Chromium startup, once)
- Every command after: ~100-200ms (HTTP to localhost)
- Server auto-starts on first command, auto-shuts down after 30 min idle
- Crash recovery: CLI detects dead server and restarts transparently
- State file: .browse/browse-server.json (pid, port, token)
Benchmarks
vs Agent Browser & Browser-Use (Token Cost)
Tested on 3 sites across multi-step browsing flows: navigate, snapshot, scroll, search, extract text.
| Tool | Total Tokens | Total Time | Context Used (200K) |
|---|---|---|---|
| browse | 14,134 | 28.5s | 7.1% |
| agent-browser | 39,414 | 36.2s | 19.7% |
| browser-use | 34,281 | 72.7s | 17.1% |
browse uses 2.4x fewer tokens than browser-use, 2.8x fewer than agent-browser, and completes 2.5x faster than browser-use.
vs @playwright/mcp (Architecture)
@playwright/mcp dumps the full accessibility snapshot on every action. browse returns ~15 tokens per action, and the agent requests a snapshot only when needed:
| | @playwright/mcp | browse |
|---|---|---|
| Tokens on navigate | ~14,578 (auto-dumped) | ~11 |
| Tokens on click | ~14,578 (auto-dumped) | ~15 |
| 10-action session | ~145,780 | ~11,388 |
| Context consumed (200K) | 73% | 6% |
Rerun: npm run benchmark
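The context figures in the table follow from simple arithmetic on the token counts. The sketch below reproduces them, taking the 200K window and the token counts from the table as given; `contextPercent` is an illustrative helper and not part of the benchmark suite.

```typescript
// Percentage of a context window consumed by a given token count.
function contextPercent(tokens: number, contextWindow: number): number {
  return (tokens / contextWindow) * 100;
}

const CONTEXT_WINDOW = 200000; // the 200K window used in the table above

// @playwright/mcp: ~14,578 tokens auto-dumped on each of 10 actions.
const playwrightMcpSessionTokens = 14578 * 10; // 145,780

// browse: session total as reported in the table above.
const browseSessionTokens = 11388;

const playwrightMcpShare = Math.round(
  contextPercent(playwrightMcpSessionTokens, CONTEXT_WINDOW)
); // 73
const browseShare = Math.round(
  contextPercent(browseSessionTokens, CONTEXT_WINDOW)
); // 6
```

The browse total is higher than ten times the per-action cost because the session also includes the snapshots the agent explicitly requested.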
Changelog
See CHANGELOG.md for full release history.
Acknowledgments
Inspired by and originally derived from the /browse skill in gstack by Garry Tan.
License
Apache-2.0
