Browsegrab
Token-efficient browser agent for local LLMs. Playwright + accessibility tree + MarkGrab.
browsegrab
Token-efficient browser agent for local LLMs: Playwright + accessibility tree + MarkGrab, MCP native.
browsegrab is a lightweight browser automation library designed for local LLMs (8B-35B parameters). It combines Playwright's accessibility tree with MarkGrab's HTML-to-markdown conversion to achieve 5-8x fewer tokens per step compared to alternatives like browser-use.
Features
- Token-efficient: ~500-1,500 tokens/step (vs 4,000-10,000 for browser-use)
- Local LLM first: Optimized for vLLM, Ollama, and OpenAI-compatible endpoints
- MCP native: Built-in MCP server with 8 browser automation tools
- MarkGrab integration: HTML → clean markdown for content extraction
- Accessibility tree + ref system: Stable element references (e1, e2, ...) without vision models
- Success pattern caching: Zero LLM calls on repeated workflows
- 5-stage JSON parser: Robust action parsing for local LLM outputs
- Minimal dependencies: Only playwright + httpx in core
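The "5-stage JSON parser" bullet refers to recovering an action from imperfect local-LLM output. The stages browsegrab actually uses are not described here, so the following is only a minimal sketch of the general fallback idea. The function name `parse_action` and the choice of stages are assumptions made for illustration.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # markdown code fence marker
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def _try_load(text: str) -> Optional[dict[str, Any]]:
    """Return a dict if text is a JSON object, else None."""
    try:
        value = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    return value if isinstance(value, dict) else None


def parse_action(raw: str) -> Optional[dict[str, Any]]:
    """Hypothetical staged parser; the stages are illustrative assumptions."""
    # Stage 1: the whole reply is already a valid JSON object.
    result = _try_load(raw.strip())
    if result is not None:
        return result
    # Stage 2: JSON wrapped in a markdown code fence.
    match = FENCE_RE.search(raw)
    if match:
        result = _try_load(match.group(1).strip())
        if result is not None:
            return result
    # Stage 3: take the span from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidate = raw[start : end + 1]
        result = _try_load(candidate)
        if result is not None:
            return result
        # Stage 4: strip trailing commas before a closing brace or bracket.
        repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
        result = _try_load(repaired)
        if result is not None:
            return result
    # Stage 5: give up so the caller can re-prompt the model.
    return None
```

Returning `None` instead of raising keeps the retry decision with the caller, which is one reasonable design for an agent loop that may want to re-prompt.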
Installation
pip install browsegrab
playwright install chromium
With optional features:
pip install browsegrab[mcp] # MCP server support
pip install browsegrab[content] # MarkGrab content extraction
pip install browsegrab[cli] # CLI with rich output
pip install browsegrab[all] # Everything
Quick Start
Python API
import asyncio

from browsegrab import BrowseSession


async def main() -> None:
    async with BrowseSession() as session:
        # Navigate and get accessibility tree snapshot
        await session.navigate("https://example.com")
        snap = await session.snapshot()
        print(snap.tree_text)
        # - heading "Example Domain" [level=1]
        # - link "Learn more": [ref=e1]

        # Click using ref ID
        result = await session.click("e1")
        print(result.url)  # https://www.iana.org/help/example-domains

        # Type into search box (take the ref ID from the latest snapshot)
        await session.navigate("https://en.wikipedia.org")
        snap = await session.snapshot()
        await session.type("e4", "Python programming", submit=True)

        # Extract compressed content (AX tree + markdown)
        content = await session.extract_content()


asyncio.run(main())
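The sample snapshot output above tags interactive elements with `[ref=eN]`. Assuming that line format, which is inferred only from the two sample lines and may not cover real snapshots, a small helper can list the available refs before choosing one to click. `RefEntry` and `list_refs` are hypothetical names, not part of browsegrab.

```python
import re
from typing import NamedTuple


class RefEntry(NamedTuple):
    ref: str
    role: str
    name: str


# Matches lines such as:  - link "Learn more": [ref=e1]
LINE_RE = re.compile(
    r'^\s*-\s*(?P<role>\w+)\s+"(?P<name>[^"]*)".*\[ref=(?P<ref>e\d+)\]'
)


def list_refs(tree_text: str) -> list[RefEntry]:
    """Collect (ref, role, name) for every line that carries a ref tag."""
    entries = []
    for line in tree_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append(RefEntry(match["ref"], match["role"], match["name"]))
    return entries


# The two sample lines from the snapshot comments above.
sample = '- heading "Example Domain" [level=1]\n- link "Learn more": [ref=e1]'
print(list_refs(sample))
```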
CLI
# Accessibility tree snapshot
browsegrab snapshot https://example.com
# JSON output
browsegrab snapshot https://example.com -f json
# Extract content (AX tree + markdown)
browsegrab extract https://en.wikipedia.org/wiki/Python
# Agentic browse (requires LLM endpoint)
browsegrab browse https://example.com "Find the about page"
MCP Server
browsegrab-mcp # Start MCP server (stdio)
Claude Desktop / Cursor / VS Code config:
{
  "mcpServers": {
    "browsegrab": {
      "command": "browsegrab-mcp"
    }
  }
}
8 MCP tools: browser_navigate, browser_click, browser_type, browser_snapshot, browser_scroll, browser_extract_content, browser_go_back, browser_wait
How It Works
Agent Browse Loop
flowchart LR
    A["URL + Goal"] --> B["Navigate"]
    B --> C["AX Tree Snapshot\n~200–500 tokens"]
    C --> D{"LLM\nDecision"}
    D -->|"click / type / scroll"| E["Execute Action"]
    E --> C
    D -->|"goal reached"| F["Extract Content\n(MarkGrab)"]
    F --> G["✅ Result"]
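The control flow in the diagram can be sketched in plain Python. This is not browsegrab code: `FakeBrowser`, `decide`, and `run_loop` are hypothetical stand-ins (an in-memory "browser" and a scripted "LLM"), and only the shape of the loop (snapshot, decision, action, back to snapshot, then stop) mirrors the flowchart.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """In-memory stand-in for a browser: pages linked by ref IDs."""
    pages: dict[str, dict[str, str]]
    url: str = ""
    visited: list[str] = field(default_factory=list)

    def navigate(self, url: str) -> None:
        self.url = url
        self.visited.append(url)

    def snapshot(self) -> dict[str, str]:
        # Maps ref ID -> link target for the current page.
        return self.pages.get(self.url, {})

    def click(self, ref: str) -> None:
        self.navigate(self.snapshot()[ref])


def decide(url: str, refs: dict[str, str], goal: str) -> dict[str, str]:
    """Scripted stand-in for the LLM decision node."""
    if goal in url:
        return {"action": "done"}
    for ref, target in refs.items():
        if goal in target:
            return {"action": "click", "ref": ref}
    return {"action": "done"}  # nothing useful left to try


def run_loop(browser: FakeBrowser, start: str, goal: str, max_steps: int = 10) -> str:
    browser.navigate(start)
    for _ in range(max_steps):
        refs = browser.snapshot()
        action = decide(browser.url, refs, goal)
        if action["action"] == "done":
            break
        browser.click(action["ref"])
    return browser.url  # a real agent would extract page content here


site = {"/": {"e1": "/about", "e2": "/blog"}, "/about": {}}
browser = FakeBrowser(pages=site)
final_url = run_loop(browser, "/", "about")
```

The `max_steps` bound plays the same role as a step limit in any agent loop: it guarantees termination even when the decision function never reports that the goal was reached.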
Token Efficiency
browsegrab separates structure (accessibility tree) from content (MarkGrab markdown), sending only what the LLM needs:
flowchart TD
    A["Raw HTML"] --> B["Accessibility Tree"]
    A --> C["MarkGrab Markdown"]
    B --> D["Structure: ~200–500 tokens\nInteractive elements with ref IDs"]
    C --> E["Content: ~300–800 tokens\nClean markdown · on-demand"]
    D --> F["Combined: ~500–1,300 tokens/step\n⚡ 5–8× fewer than browser-use"]
    E --> F
Token efficiency (measured)
| Page | Interactive elements | Tokens | browser-use equivalent |
|---|---|---|---|
| example.com | 1 | ~60 | ~500+ |
| Wikipedia article | 452 | ~1,254 | ~10,000+ |
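As a quick check of the figures above, the snippet below computes the ratios implied by the table and the combined per-step range from the diagram. It uses only numbers stated in this document. Because the browser-use column gives lower bounds ("+"), the computed ratios are lower bounds as well.

```python
# (page, browsegrab tokens, browser-use lower bound) from the table above
measured = [
    ("example.com", 60, 500),
    ("Wikipedia article", 1254, 10_000),
]

ratios = {page: other / ours for page, ours, other in measured}
for page, ratio in ratios.items():
    print(f"{page}: at least {ratio:.1f}x fewer tokens")

# Per-step budget from the diagram: structure range plus content range
structure = (200, 500)
content = (300, 800)
combined = (structure[0] + content[0], structure[1] + content[1])
print(combined)
```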
Architecture
browsegrab/
├── config.py          # Dataclass configs (env var loading)
├── result.py          # Result types (ActionResult, BrowseResult, ...)
├── session.py         # BrowseSession orchestrator
├── browser/
│   ├── manager.py     # Playwright lifecycle (async context manager)
│   ├── snapshot.py    # Accessibility tree + ref system
│   ├── selectors.py   # 4-strategy selector resolver
│   └── actions.py     # navigate, click, type, scroll, go_back, wait
├── dom/
│   ├── ref_map.py     # ref ID ↔ element bidirectional mapping
│   └── compress.py    # AX tree + MarkGrab → compressed context
├── llm/
│   ├── base.py        # LLMProvider ABC
│   ├── provider.py    # vLLM, Ollama, OpenAI-compatible
│   ├── prompt.py      # System prompts (~400 tokens)
│   └── parse.py       # 5-stage JSON fallback parser
├── agent/
│   ├── history.py     # Sliding window history compression
│   ├── cache.py       # Domain-based success pattern cache
│   └── loop_guard.py  # Duplicate action detection
├── __main__.py        # CLI (click)
└── mcp_server.py      # FastMCP server (8 tools)
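The tree lists `loop_guard.py` for duplicate action detection, but how browsegrab implements it is not shown here. One common approach, sketched below with a hypothetical `LoopGuard` class and assumed thresholds, is to count identical (URL, action) pairs inside a short window of recent steps and flag an action once it repeats too often.

```python
import json
from collections import Counter, deque


class LoopGuard:
    """Flags an action once it repeats too often on the same URL."""

    def __init__(self, window: int = 6, max_repeats: int = 2) -> None:
        # Only the most recent `window` actions are remembered.
        self._recent: deque[str] = deque(maxlen=window)
        self._max_repeats = max_repeats

    @staticmethod
    def _key(url: str, action: dict) -> str:
        # sort_keys makes the key independent of dict insertion order.
        return url + "|" + json.dumps(action, sort_keys=True)

    def is_loop(self, url: str, action: dict) -> bool:
        """Record the action; return True if it now exceeds max_repeats."""
        key = self._key(url, action)
        self._recent.append(key)
        return Counter(self._recent)[key] > self._max_repeats


guard = LoopGuard()
click = {"action": "click", "ref": "e1"}
results = [guard.is_loop("https://example.com", click) for _ in range(3)]
```

With the assumed defaults, the third identical click on the same page is flagged, at which point an agent could re-snapshot, change strategy, or stop.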
Configuration
All settings via environment variables (BROWSEGRAB_* prefix):
# Browser
BROWSEGRAB_BROWSER_HEADLESS=true
BROWSEGRAB_BROWSER_TIMEOUT_MS=30000
# LLM (for agentic browse)
BROWSEGRAB_LLM_PROVIDER=vllm # vllm | ollama | openai
BROWSEGRAB_LLM_BASE_URL=http://localhost:8000/v1
BROWSEGRAB_LLM_MODEL=Qwen/Qwen3.5-32B-AWQ
# Agent
BROWSEGRAB_AGENT_MAX_STEPS=10
BROWSEGRAB_AGENT_ENABLE_CACHE=true
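The architecture notes describe `config.py` as dataclass configs loaded from environment variables. As a rough sketch of that pattern only: the class `AgentSettings`, its field names, and the parsing rules below are assumptions. The variable names and sample values come from the list above, and the library's real defaults may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping

PREFIX = "BROWSEGRAB_"
TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AgentSettings:
    """Hypothetical settings object; defaults mirror the sample values above."""
    headless: bool = True
    timeout_ms: int = 30000
    llm_provider: str = "vllm"
    llm_base_url: str = "http://localhost:8000/v1"
    max_steps: int = 10
    enable_cache: bool = True

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AgentSettings":
        def get(name: str, default: str) -> str:
            return env.get(PREFIX + name, default)

        def get_bool(name: str, default: bool) -> bool:
            return get(name, str(default)).strip().lower() in TRUE_VALUES

        return cls(
            headless=get_bool("BROWSER_HEADLESS", cls.headless),
            timeout_ms=int(get("BROWSER_TIMEOUT_MS", str(cls.timeout_ms))),
            llm_provider=get("LLM_PROVIDER", cls.llm_provider),
            llm_base_url=get("LLM_BASE_URL", cls.llm_base_url),
            max_steps=int(get("AGENT_MAX_STEPS", str(cls.max_steps))),
            enable_cache=get_bool("AGENT_ENABLE_CACHE", cls.enable_cache),
        )
```

Passing a plain dict as `env` (for example `AgentSettings.from_env({"BROWSEGRAB_AGENT_MAX_STEPS": "5"})`) makes the loader easy to exercise in tests without touching the real environment.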
Part of the QuartzUnit Ecosystem
| Library | Role |
|---|---|
| markgrab | Passive extraction (URL → markdown) |
| snapgrab | Passive capture (URL → screenshot) |
| docpick | Document OCR → structured JSON |
| browsegrab | Active automation (goal → browser actions → results) |
Development
git clone https://github.com/QuartzUnit/browsegrab.git
cd browsegrab
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
playwright install chromium
# Unit tests (no browser needed)
pytest tests/ -m "not e2e"
# Full suite including E2E
pytest tests/ -v
License
Part of the QuartzUnit ecosystem: composable Python libraries for data collection, extraction, search, and AI agent safety.
