Scout
Find licensed real estate agents, search MLS, route leads. Remote MCP for SC + GA brokerages.
Ask AI about Scout
Powered by Claude Β· Grounded in docs
I know everything about Scout. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
Scout
Browser automation, one binary. The simpler alternative to Playwright β no Node, no Python, no runtime. Drive a real browser from Go, any shell, any AI agent (built-in MCP server), or a chat UI.
A single statically-linked scout binary gives you a CLI, a 69-tool MCP server (so any MCP-aware agent β Claude Desktop, Cursor, Cline, custom β has a browser), a conversational chat UI, and a Go library with Gin-like middleware composition. Same engine, four access points.
brew install felixgeelhaar/tap/scout
vs. Playwright
| Scout | Playwright | |
|---|---|---|
| Install | one ~15 MB binary | npm + ~600 MB browser cache |
| Runtime dep | none (static) | Node.js always; Python/Java/.NET as wrappers |
| Drive from | Go, any shell, MCP, chat UI | TS/JS first-class; others lag |
| AI-agent native | built-in scout mcp serve | separate playwright-mcp project |
| Token-aware extraction | DOM diff, distillation, observation budgets (50β80% fewer tokens) | not provided |
| Action playbooks | record & replay deterministic JSON | codegen produces a script you maintain |
| Container deploy | drop into scratch or distroless | carry Node + browser binaries |
| CDP access | direct WebSocket, zero abstraction | internal protocol over CDP |
Quick Start
# CLI β visible browser, one-shot commands
scout observe https://example.com # structured page snapshot
scout markdown https://news.ycombinator.com # page as compact markdown
scout screenshot https://github.com # save screenshot
scout extract https://example.com h1 # extract element text
scout frameworks https://react.dev # detect React, Vue, etc.
# MCP Server β give AI agents browser superpowers
claude mcp add scout -- scout mcp serve
# Browser UI β conversational browser automation
scout ui serve --provider=ollama --model=mistral
cd ui && npm install && npm run dev # open http://localhost:3000
Install
# Homebrew
brew install felixgeelhaar/tap/scout
# Direct binary
curl -fsSL https://raw.githubusercontent.com/felixgeelhaar/scout/main/install.sh | bash
# Go
go install github.com/felixgeelhaar/scout/cmd/scout@latest
# As a library
go get github.com/felixgeelhaar/scout
MCP Server β 66 Tools
Run scout mcp serve and any MCP-aware agent has a browser. No second project to install, no Node runtime, no Python interpreter β the binary is the server. Configure in any MCP client:
claude mcp add scout -- scout mcp serve # Claude Code
{"mcpServers": {"scout": {"command": "scout", "args": ["mcp", "serve"]}}}
Tool Categories
| Category | Tools |
|---|---|
| Navigation | navigate, observe, observe_diff, observe_with_budget |
| Interaction | click, click_label, type, hover, double_click, right_click, select_option, scroll_to, scroll_by, focus, drag_drop, dispatch_event |
| Forms | fill_form, fill_form_semantic, discover_form |
| Extraction | extract, extract_all, extract_table, auto_extract, scroll_and_collect, markdown, readable_text, accessibility_tree |
| Capture | screenshot, annotated_screenshot, pdf |
| Network | enable_network_capture, network_requests |
| Tabs | open_tab, switch_tab, close_tab, list_tabs |
| Frameworks | wait_spa, detect_frameworks, component_state, app_state |
| Playback | start_recording, stop_recording, save_playbook, replay_playbook |
| Video | start_screen_recording, stop_screen_recording |
| Smart Helpers | dismiss_cookies, check_readiness, suggest_selectors, session_history |
| Vision | hybrid_observe, find_by_coordinates |
| Batch | execute_batch |
| Iframe | switch_to_frame, switch_to_main_frame |
| Trace | start_trace, stop_trace |
| Diagnostics | detect_dialog, detect_auth_wall, console_errors, compare_tabs, upload_file |
| Utility | has_element, wait_for, configure, web_vitals, select_by_prompt |
All tools have MCP annotations (ReadOnly, OpenWorld, ClosedWorld, Idempotent) for smart auto-approval. Read-only tools like observe, extract, and screenshot run without permission prompts.
Runtime Configuration
Switch between headless and visible browser without restarting, and opt into local-dev workflows (loopback, private IPs):
Agent: configure(headless: false) β browser window appears
Agent: navigate("https://...") β watch it work
Agent: configure(headless: true) β back to headless
Agent: configure(allow_private_ips: true) β unlock localhost / 192.168.* / 10.*
Agent: navigate("http://127.0.0.1:4173/") β drive your local dev server
The MCP server also reads SCOUT_ALLOW_PRIVATE_IPS=1 at startup as a one-shot toggle for trusted environments.
Screen Recording (video)
Record the active page as a video. Pure CDP β works in headless, no Playwright needed. Recording survives navigate, open_tab, and switch_tab calls in between, so a multi-page demo lands as one continuous clip:
Agent: start_screen_recording({ width: 1280, height: 800, fps: 15, format: "webm" })
Agent: navigate("https://example.com")
Agent: click("#cta")
Agent: navigate("https://example.com/dashboard") # recording continues across pages
Agent: stop_screen_recording()
β { path: "/tmp/scout-rec-XXX.webm", format: "webm", encoder: "ffmpeg",
frame_count: N, duration_ms: N }
If ffmpeg is on PATH, the result is encoded to WebM (libvpx-vp9) or MP4 (libx264). If not, scout returns the raw JPEG frames directory plus an ffmpeg concat list so you can encode offline. The result is always a file path, never base64 β never enters your LLM token budget.
Realistic FPS: ~10β15 on typical pages, capped at 30. Implementation polls Page.captureScreenshot (CDP Page.startScreencast events are silently dropped under --headless=new Chrome).
Browser UI
A conversational browser automation interface. Type natural language, watch the browser respond in real-time.
# Start the AG-UI server (Go backend)
scout ui serve --provider=ollama --model=mistral # local, no API key
scout ui serve --provider=claude # needs ANTHROPIC_API_KEY
scout ui serve --provider=openai --model=gpt-4o # needs OPENAI_API_KEY
scout ui serve --provider=groq --base-url=https://api.groq.com/openai --model=llama-3.3-70b-versatile
# Start the Vue frontend
cd ui && npm install && npm run dev # http://localhost:3000
The UI streams AG-UI protocol events over SSE:
- Chat panel with markdown rendering and quick-action pills
- Live browser viewport with screenshot streaming and URL bar
- Activity timeline showing tool calls in real-time
- Stop button to cancel mid-stream
The Go server handles the agentic loop: LLM decides which scout tools to call, executes them, streams browser state deltas back to the frontend. Supports any OpenAI-compatible endpoint via --base-url.
Agent Package (Go)
High-level Go API for callers that want to embed scout in a program. Structured output, auto-wait, goroutine-safe. Most users reach scout through the CLI or MCP server above β this section is for the Go-library path.
session, _ := agent.NewSession(agent.SessionConfig{Headless: true})
defer session.Close()
// Navigate and observe
session.Navigate("https://example.com")
obs, _ := session.Observe() // links, inputs, buttons, text + action costs
// DOM diff β only what changed (saves 50-80% tokens)
session.Click("#submit")
_, diff, _ := session.ObserveDiff()
// diff.Classification: "modal_appeared"
// diff.Summary: "Modal/dialog appeared: Login required"
// Semantic form filling β no CSS selectors
session.FillFormSemantic(map[string]string{
"Email": "user-example", "Password": "secret",
})
// Visual grounding β click by number, not selector
result, _ := session.AnnotatedScreenshot() // numbered labels on elements
session.ClickLabel(7) // click element [7]
// Multi-tab coordination
session.OpenTab("pricing", "https://example.com/pricing")
session.SwitchTab("default")
// Framework detection (19 frameworks)
frameworks, _ := session.DetectedFrameworks() // ["react", "nextjs"]
state, _ := session.ComponentState("#app") // read React/Vue state
// Network capture β read API responses directly
session.EnableNetworkCapture("/api/")
captured := session.CapturedRequests("/api/users")
// Action replay β record once, replay without LLM
session.StartRecordingPlaybook("login-flow")
// ... do stuff ...
pb, _ := session.StopRecordingPlaybook()
agent.SavePlaybook(pb, "login.json")
// Later: session.ReplayPlaybook(pb) // 100x cheaper
// Persistent profiles
session.SaveProfile("session.json") // cookies + localStorage
session.LoadProfile("session.json")
// Content distillation (5 levels)
session.Markdown() // ~2-8KB compact markdown
session.ReadableText() // ~1-4KB main content only
session.AccessibilityTree() // ~1-4KB semantic tree
session.ObserveWithBudget(500) // fit in ~500 tokens
Core Library (Go)
Gin-like Engine/Context/Group/HandlerFunc with middleware composition. The lowest-level Go API β use it when you want full control of task lifecycle, named groups, and middleware chains:
engine := browse.Default(browse.WithHeadless(true))
engine.MustLaunch()
defer engine.Close()
engine.Use(middleware.Stealth())
engine.Use(middleware.Retry(middleware.RetryConfig{MaxAttempts: 3}))
engine.Use(middleware.Timeout(30 * time.Second))
admin := engine.Group("admin", middleware.BasicAuth("admin", "secret"))
admin.Task("export", func(c *browse.Context) {
c.MustNavigate("https://app.example.com/admin")
table, _ := c.ExtractTable("#users")
c.Set("data", table)
})
engine.RunGroup("admin")
Middleware
| Category | Middleware |
|---|---|
| Resilience | Retry, Timeout, CircuitBreaker, RateLimit, Bulkhead |
| Auth | BearerAuth, BasicAuth, CookieAuth, HeaderAuth |
| Anti-detection | Stealth (10 patches: webdriver, plugins, WebGL, etc.) |
| Network | BlockResources, WaitNetworkIdle |
| Utilities | ScreenshotOnError, SlowMotion, Viewport |
CLI
CLI defaults to visible browser (--headless to hide):
scout navigate <url> # page info as JSON
scout observe <url> # structured observation
scout markdown <url> # compact markdown
scout screenshot <url> [--output f] # save screenshot
scout pdf <url> [--output f] # save PDF
scout extract <url> <selector> # extract text
scout eval <url> <expression> # run JavaScript
scout form discover <url> # discover form fields
scout frameworks <url> # detect frameworks
scout watch <url> [--interval=5s] # live-watch page changes
scout pipe <command> [selector] # batch process URLs from stdin
scout record <url> [--output f] # interactive recording β playbook
scout mcp serve # start MCP server
scout version # print version
Architecture
scout/
βββ browse.go, engine.go, context.go # Gin-like API
βββ page.go, selection.go # CDP page & element interaction
βββ recorder.go # Action playbook recording (Navigate/Click/Type β JSON)
βββ middleware/ # stealth, resilience, auth, network
βββ agent/ # AI agent API (50+ methods)
β βββ session.go # Session lifecycle, Navigate, Click, Type
β βββ observe.go, diff.go # Observe, ObserveDiff, cost estimation
β βββ content.go # Markdown, ReadableText, AccessibilityTree
β βββ form.go # DiscoverForm, FillFormSemantic, MatchFormField
β βββ annotate.go # AnnotatedScreenshot, ClickLabel
β βββ network.go # EnableNetworkCapture, CapturedRequests
β βββ spa.go # DetectedFrameworks, ComponentState, GetAppState
β βββ tabs.go # OpenTab, SwitchTab, CloseTab, ListTabs
β βββ playbook.go # StartRecording, ReplayPlaybook, SavePlaybook
β βββ interact.go # Hover, DragDrop, SelectOption, ScrollTo
β βββ profile.go # CaptureProfile, ApplyProfile, SaveProfile
β βββ selector.go # Playwright :text() selector translation
β βββ budget.go # ObserveWithBudget, EstimateTokens
β βββ nlselect.go # SelectByPrompt, fuzzy NL element matching
β βββ batch.go # ExecuteBatch, sequential multi-action
β βββ vision.go # HybridObserve, FindByCoordinates
β βββ trace.go # StartTrace, StopTrace, action tracing
β βββ screencast.go # StartScreenRecording / StopScreenRecording β video via captureScreenshot polling + ffmpeg encode
β βββ iframe.go # SwitchToFrame, SwitchToMainFrame
β βββ vitals.go # WebVitals (LCP/CLS/INP)
βββ internal/cdp/ # WebSocket CDP client (context-aware)
βββ internal/launcher/ # Chrome process management
βββ cmd/scout/ # CLI + MCP server (69 tools)
βββ docs/ # Landing page (GitHub Pages)
License
MIT
