Brevix Shrink
MCP middleware proxy that compresses tools/list, prompts/list, and resources/list responses to save tokens. Sister project to Brevix.
Brevix
Compress LLM output safely. Save tokens without breaking your code.
Install · Usage · Cost Routing · Accuracy Guard · Benchmarks · Roadmap · Contributing
Cuts response tokens 40–75% with a deterministic rule engine. Routes each task to the cheapest capable Claude tier — saving up to ~80% of API spend. Verifies meaning is preserved before emit. Works across 20+ AI coding tools.
Why Brevix
Brevix is two cost-saving layers in one tool:
- Output compression — cuts response tokens 40–75% with a rule engine that's verified by a local Accuracy Guard. No silent meaning loss on dense technical prose.
- Cost management & smart model routing — picks the cheapest Claude tier (Haiku / Sonnet / Opus) per task, escalates only on low-confidence answers, and enforces token + USD budget caps. Typical savings: ~80% of API spend vs always running Opus.
Works with Claude Code · Cursor · Windsurf · OpenAI Codex CLI · Google Antigravity · Gemini CLI · GitHub Copilot Chat · Aider · Continue.dev · Cline · Roo Code · Zed AI · Augment · Kilo · OpenHands · Tabnine · Warp · Replit · Sourcegraph Amp — plus any tool reading AGENTS.md.
💰 Cost Management & Model Routing
What it does: Stops you from paying Opus prices for tasks Haiku can solve in a single shot.
The problem
Most LLM coding tools call the most expensive model for every prompt. A task like "classify this support ticket" hits Opus at $0.05 per call, when Haiku could answer it for $0.0002 — 250× cheaper, identical answer.
The fix
Brevix Route is a thin layer in front of the Anthropic SDK that:
- Classifies the task (regex/keyword-based, ~6μs, no API call).
- Picks the cheapest capable model from a task → model rule table you control.
- Optionally scores the response for confidence (hedge phrases, validity, length, optional semantic check).
- Escalates to the next tier (Haiku → Sonnet → Opus) only when confidence is below threshold.
- Records every call to a local JSONL log so you can audit exactly what was saved.
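The classify-then-pick flow above can be sketched in a few lines. This is a minimal illustration, not Brevix's actual classifier: the task names, patterns, and rule table here are invented for the example (the real ones live in `~/.brevix/route.json`), but it shows why regex classification costs microseconds rather than an API call:

```python
import re

# Hypothetical rule table mapping task categories to the cheapest capable tier.
RULES = {
    "classify": "claude-haiku-4-5",
    "review": "claude-sonnet-4-6",
    "architecture": "claude-opus-4-7",
}

# Keyword patterns checked in order; first match wins.
PATTERNS = [
    (re.compile(r"\b(classify|label|categori[sz]e)\b", re.I), "classify"),
    (re.compile(r"\b(review|refactor|debug|explain)\b", re.I), "review"),
    (re.compile(r"\b(architect|design|multi-region|distributed)\b", re.I), "architecture"),
]

def route(prompt: str) -> tuple[str, str]:
    """Return (task, model) for a prompt; default to the mid tier."""
    for pattern, task in PATTERNS:
        if pattern.search(prompt):
            return task, RULES[task]
    return "general", "claude-sonnet-4-6"

print(route("Classify this ticket: app crashes on login"))
# -> ('classify', 'claude-haiku-4-5')
```

Because the table is plain data, adding or re-tiering a task type is a one-line config change rather than a model retrain.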
How to use it
Option A — From your code (Python)
from brevix import RoutedClient
client = RoutedClient(log_enabled=True)
# Cheap task -> auto-routed to Haiku.
r = client.call("Classify this ticket: app crashes on login")
print(r.text, r.model, r.cost_usd)
# > "bug" claude-haiku-4-5 0.000028
# Hard task -> auto-routed to Opus.
r = client.call("Architect a multi-region payment system with sub-100ms p99 latency")
print(r.model)
# > claude-opus-4-7
# Confidence-guarded mode: retries on next tier only if the first answer hedges.
r = client.call("Review this auth middleware for token expiry bugs", confidence_check=True)
print(r.model, r.escalations, r.confidence)
# > claude-opus-4-7 1 0.94
Option B — From the CLI
brevix route --init # one-time: write default config
brevix route "classify ticket: app crashes" --explain # dry-run, show decision + cost
brevix route "..." --call # actually call the model
brevix route "..." --call --confidence # call + escalate on low conf
brevix route --budget-show # current spend
brevix stats --routing --since 7d # weekly cost-saved report
brevix route --learn-suggest # recommend rule changes
brevix route --learn-apply # apply them
Option C — From Claude Code (slash commands)
After /plugin install brevix@brevix:
/brevix-route classify this support ticket: "app crashes on login"
↓
[brevix] task=classify tier=haiku escalated=0
bug
/brevix-route architect a multi-region payment system with sub-100ms p99 latency
↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>
/brevix-route-stats --since 7d # cost saved vs Opus-only baseline
/brevix-learn # suggested rule changes
/brevix-learn apply # apply them
Hard budget cap
Edit ~/.brevix/route.json:
{
"budget": { "tokens": 0, "cost_usd": 50.00 }
}
When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.
Or per-run from the CLI:
brevix route --budget-cost 5.00 --call "your prompt"
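The cap semantics can be pictured as a running total checked before each call. This is a standalone sketch of the behavior described above, not Brevix's implementation; the class and field names here are illustrative (only BudgetExceededError is the documented exception name):

```python
class BudgetExceededError(RuntimeError):
    """Raised instead of making a call once the USD cap is reached."""

class BudgetedCaller:
    def __init__(self, cost_cap_usd: float):
        self.cost_cap_usd = cost_cap_usd
        self.spent_usd = 0.0

    def call(self, est_cost_usd: float) -> None:
        # Refuse *before* spending, so the cap is never silently overshot.
        if self.spent_usd + est_cost_usd > self.cost_cap_usd:
            raise BudgetExceededError(
                f"cap ${self.cost_cap_usd:.2f} reached (spent ${self.spent_usd:.4f})"
            )
        self.spent_usd += est_cost_usd  # ...the real API call would happen here

caller = BudgetedCaller(cost_cap_usd=0.01)
for _ in range(400):
    try:
        caller.call(est_cost_usd=0.000028)  # a Haiku-sized call
    except BudgetExceededError as e:
        print("stopped:", e)
        break
```

The key design point is that the check happens pre-call: the spend total can approach the cap but never exceed it.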
Real-world savings
| Workload | Naive (Opus only) | Brevix routed | Saved |
|---|---|---|---|
| Solo dev, 50 calls/day, mostly easy tasks | $0.40/day | $0.06/day | 85% |
| Team of 5, 500 calls/day, mixed | $4.00/day | $0.80/day | 80% |
| Customer-support bot, 10k tickets/day, 90% classify | $80/day | $1.50/day | 98% |
Same answer quality — confidence guard ensures hard cases still hit Opus.
See the full Routing tutorial in ⚡ Usage for advanced flags, force-tier overrides, and the auto-tuning learn loop.
Features
- Compression Engine
- Safety & Verification
- Integrations
- Insights
- 💰 Cost Management & Model Routing — new
🚀 Install
One-liner — macOS / Linux / WSL
curl -fsSL https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.sh | bash -s -- --all
Windows — PowerShell
irm https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.ps1 | iex
skills CLI — one command, 9 tools at once
Auto-installs Brevix skills into Antigravity, Claude Code, Cline, Codex, Cursor, Gemini CLI, GitHub Copilot, Kiro CLI, and Qoder simultaneously.
npx skills add https://github.com/Yash-Koladiya30/brevix
Pick a specific skill:
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-commit
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-stats
Listing → skills.sh/Yash-Koladiya30/brevix
Manual install
pip install brevix # core
pip install 'brevix[guard]' # + semantic Accuracy Guard
pip install 'brevix[tokens]' # + accurate tiktoken counts
pip install 'brevix[all]' # everything
Plug into your LLM coding tool
brevix install --list # show all 20 targets
brevix install claude-code # Claude Code plugin layout
brevix install cursor # .cursor/rules/brevix.mdc
brevix install codex # AGENTS.md + .codex/hooks.json
brevix install gemini # gemini-extension.json + GEMINI.md
brevix install all # write rule files for every tool
Idempotent — re-running updates the Brevix block, leaves your other content alone.
Claude Code marketplace
/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix
MCP middleware
Compress upstream MCP server descriptions:
npm install -g brevix-shrink
Wrap any MCP server in your Claude config:
{
"mcpServers": {
"fs-shrunk": {
"command": "npx",
"args": ["brevix-shrink", "npx", "-y",
"@modelcontextprotocol/server-filesystem", "/tmp"]
}
}
}
⚡ Usage
Slash commands — Claude Code, Cursor, etc.
# Output compression
/brevix # toggle on (full mode)
/brevix lite # gentle compression
/brevix ultra # max compression
/brevix auto # pick best mode per response
/brevix off # disable
/brevix-commit # terse Conventional Commit message
/brevix-check # run Accuracy Guard on a snippet
/brevix-stats # show compression savings
# Smart model routing (saves API cost)
/brevix-route <task> # route to cheapest capable tier (Haiku/Sonnet/Opus)
/brevix-route-stats # cost saved vs Opus-only baseline
/brevix-learn # suggested rule changes from observed escalations
/brevix-learn apply # apply suggestions to ~/.brevix/route.json
For Codex CLI (no slash commands), use $brevix lite|full|ultra|auto|off.
CLI
# Output compression
brevix compress "Your verbose text here" --mode full
brevix compress - # stdin
brevix compress . --mode auto -v # adaptive picks best
brevix compress . --guard --strict --threshold 0.85
# File compression (CLAUDE.md, AGENTS.md, project notes)
brevix compress-file CLAUDE.md # writes .original.md backup
brevix compress-file CLAUDE.md --dry-run
# Stats
brevix stats # estimated, in-process
brevix stats --real --since 7d # parsed from Claude Code session logs
brevix stats --share # tweet-ready one-liner
brevix stats --reset
# Verification
brevix check "original" "compressed"
brevix count "how many tokens?"
# Install rules into a project
brevix install cursor
brevix install --list
# Smart model routing
brevix route --init # write default config to ~/.brevix/route.json
brevix route "classify ticket: app crashes" # print suggested model
brevix route "..." --explain # task, model, est cost, reason
brevix route "..." --call # actually call the chosen model
brevix route "..." --call --confidence # also escalate on low-confidence answers
brevix route --budget-show # current spend vs cap
brevix route --budget-tokens 1000000 --budget-cost 50.00 "..." --call
brevix stats --routing # cost saved, escalation rate, by model/task
brevix stats --routing --since 7d
brevix route --learn-suggest # recommend rule changes
brevix route --learn-apply # apply them
Subagents — Claude Code
agents/ ships six focused subagents that emit ~60% smaller tool results than vanilla agents:
| Agent | Purpose | Output format |
|---|---|---|
| brevix-investigator | Read-only code locator | path:line — symbol — note |
| brevix-builder | Surgical 1–2 file edits with verification | Diff + verify status |
| brevix-reviewer | Bug-focused diff review | path:line: 🔴 bug: …. fix. |
| brevix-haiku | Cheap tier — classify, parse, format, rename | Brief, exact answer |
| brevix-sonnet | Mid tier — review, refactor, debug, explain | Direct, file:line cited |
| brevix-opus | Heavy tier — architecture, multi-agent, hard bugs | Decision + reasoning chain |
Smart Model Routing — /brevix-route
Save ~80% on Claude API cost without manually picking a model per task.
Quick start (5 steps, ~1 minute)
1. Install Brevix and the Anthropic SDK
pip install brevix anthropic
export ANTHROPIC_API_KEY=sk-ant-...
2. Install the Claude Code plugin
/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix
This adds the routing-tier subagents (brevix-haiku / brevix-sonnet / brevix-opus) and the /brevix-route, /brevix-route-stats, /brevix-learn slash commands.
3. Initialize the routing config
brevix route --init
Writes ~/.brevix/route.json with sensible defaults: classify→Haiku, code review→Sonnet, architecture→Opus, escalation chain Haiku→Sonnet→Opus, no budget cap.
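Those defaults might look roughly like the following. This is an illustrative shape, not the exact schema Brevix writes (only the budget block matches the format shown elsewhere in this README); inspect your own ~/.brevix/route.json for the real field names:

```json
{
  "rules": {
    "classify": "claude-haiku-4-5",
    "review": "claude-sonnet-4-6",
    "architecture": "claude-opus-4-7"
  },
  "escalation_chain": ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7"],
  "budget": { "tokens": 0, "cost_usd": 0 }
}
```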
4. Use it from Claude Code
/brevix-route classify this support ticket: "app crashes on login"
↓
[brevix] task=classify tier=haiku escalated=0
bug
/brevix-route architect a multi-region payment system with sub-100ms p99 latency
↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>
5. Check what you saved
/brevix-route-stats --since 7d
How it works under the hood
- /brevix-route <task> runs brevix route ... --explain under the hood.
- Reads the suggested model from the CLI output.
- Spawns the matching subagent (brevix-haiku / brevix-sonnet / brevix-opus) via the Task tool.
- If the subagent returns escalate: <reason>, retries on the next tier (capped at 1 retry per tier).
- Records the call to ~/.brevix/routing_log.jsonl for stats.
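The escalate-on-low-confidence loop can be sketched as follows. This is a toy illustration: the hedge-phrase list, scoring weights, and threshold are invented here, not Brevix's actual scorer, and the model function is faked so the example is self-contained:

```python
TIERS = ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7"]
HEDGES = ("i'm not sure", "it depends", "might be", "possibly", "i cannot")

def confidence(answer: str) -> float:
    """Naive scorer: penalize hedge phrases and suspiciously short answers."""
    score = 1.0
    lowered = answer.lower()
    score -= 0.3 * sum(h in lowered for h in HEDGES)
    if len(answer.split()) < 3:
        score -= 0.2
    return max(score, 0.0)

def call_with_escalation(prompt, call_model, threshold=0.7):
    """Try the cheapest tier first; climb one tier at a time on low confidence."""
    answer = ""
    for model in TIERS:
        answer = call_model(model, prompt)
        if confidence(answer) >= threshold:
            return model, answer
    return TIERS[-1], answer  # out of tiers: keep the top-tier answer

def fake(model, prompt):
    # Pretend the cheap tier hedges while the mid tier answers cleanly.
    return "It depends, might be a race?" if "haiku" in model else "Token expiry is off by one hour."

print(call_with_escalation("review auth middleware", fake))
# -> ('claude-sonnet-4-6', 'Token expiry is off by one hour.')
```

Because confident answers return immediately, easy tasks never pay for the retry; only hedged answers climb the chain.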
Force a tier when you already know the right one:
/brevix-route --force-tier=sonnet refactor this 200-line module
Track your savings — /brevix-route-stats
/brevix-route-stats # all-time
/brevix-route-stats --since 7d # last week
Sample output:
Brevix Routing Stats
--------------------
Window: 7d
Calls: 412
Total cost: $1.84
Opus-only baseline: $9.67
Saved: $7.83 (80.9%)
Escalations: 18 (4.4% of calls)
By model:
claude-haiku-4-5 289 (70.1%) $0.08
claude-sonnet-4-6 108 (26.2%) $0.65
claude-opus-4-7 15 ( 3.7%) $1.11
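A "saved vs Opus-only" figure like the one above can be reproduced from the JSONL log with a few lines of Python. The record shape here is an assumption for illustration (one JSON object per call with per-call actual and baseline costs); the real field names in routing_log.jsonl may differ:

```python
import io
import json

# Example log lines in the assumed shape: one JSON object per call.
log = io.StringIO(
    '{"model": "claude-haiku-4-5", "cost_usd": 0.000028, "opus_cost_usd": 0.05}\n'
    '{"model": "claude-sonnet-4-6", "cost_usd": 0.006, "opus_cost_usd": 0.05}\n'
    '{"model": "claude-opus-4-7", "cost_usd": 0.05, "opus_cost_usd": 0.05}\n'
)

spent = baseline = 0.0
for line in log:
    rec = json.loads(line)
    spent += rec["cost_usd"]          # what routing actually cost
    baseline += rec["opus_cost_usd"]  # what Opus-only would have cost

saved = baseline - spent
print(f"spent=${spent:.4f} baseline=${baseline:.4f} saved={saved / baseline:.1%}")
```

Swap the io.StringIO for open(os.path.expanduser("~/.brevix/routing_log.jsonl")) to audit your own log.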
Auto-tune from your usage — /brevix-learn
After a few days, Brevix recommends rule changes from observed escalation patterns:
/brevix-learn
refactor: claude-sonnet-4-6 -> claude-opus-4-7
samples=54 escalation_rate=68.5%
reason: 37/54 escalated (69%); 32 of those landed on claude-opus-4-7
Apply with /brevix-learn apply to update ~/.brevix/route.json. User customizations and budget caps are preserved.
Hard budget cap
Edit ~/.brevix/route.json:
{
"budget": { "tokens": 0, "cost_usd": 50.00 }
}
Or per-run from the CLI:
brevix route --budget-cost 5.00 --call "your prompt"
When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.
🛡 How Accuracy Guard works
- Compress output via the rule engine.
- Score the original vs compressed text with a local sentence-transformer (no API cost).
- If similarity ≥ threshold (default 0.85) → emit compressed. Otherwise warn, or in --strict mode fall back to the original.
- Without sentence-transformers installed → falls back to content-word containment (drops stopwords without penalty, fair to compression).
Result: compression you can trust on production code, specs, and contracts.
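The content-word containment fallback can be approximated like this. It is a sketch with a toy stopword list, not the Guard's real implementation; the point is the asymmetry: stopwords dropped by compression cost nothing, while lost content words lower the score:

```python
import re

# Toy stopword list for illustration; the real Guard's list will differ.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "in", "of", "so", "on",
             "your", "you", "this", "that", "it", "and", "every", "even", "if"}

def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def containment(original: str, compressed: str) -> float:
    """Fraction of the original's content words that survive compression."""
    orig = content_words(original)
    return len(orig & content_words(compressed)) / len(orig) if orig else 1.0

score = containment(
    "Wrap the object in useMemo so the reference stays stable across renders.",
    "Wrap in useMemo: reference stays stable.",
)
print(score)
```

Unlike a semantic model, this needs no downloaded weights, which is why it makes a reasonable zero-dependency fallback.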
💡 Compression example
Before
The reason your React component is re-rendering on every parent update is that you are passing an inline object as a prop. In JavaScript, every render creates a new object reference, even if the contents are identical. To fix this, wrap the object in useMemo so the reference stays stable across renders.
After (full mode)
Inline object prop = new ref each render = re-render. Wrap in useMemo.
Tokens saved: ~75% · Meaning preserved: ✅ similarity 0.91
📊 Benchmarks
Reproducible three-arm A/B harness in evals/. Compares no-system-prompt vs "be terse" control vs Brevix on 10 developer prompts.
| arm | n | median | mean | total | vs baseline | vs control |
|---|---|---|---|---|---|---|
| baseline | 10 | 221 | 247.3 | 2473 | — | — |
| control | 10 | 178 | 191.6 | 1916 | 22.5% | — |
| brevix | 10 | 119 | 128.4 | 1284 | 48.1% | 33.0% |
Run yourself:
pip install 'brevix[all]' anthropic
export ANTHROPIC_API_KEY=...
python evals/llm_run.py --model claude-sonnet-4-6
python evals/measure.py
The vs control column is the honest savings — what Brevix adds beyond "just be brief."
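The percentage columns follow directly from the totals in the table; a quick check that recomputes the table's own numbers, nothing new:

```python
baseline, control, brevix = 2473, 1916, 1284  # total tokens per arm

vs_baseline_control = (baseline - control) / baseline  # control row, vs baseline
vs_baseline_brevix = (baseline - brevix) / baseline    # brevix row, vs baseline
vs_control_brevix = (control - brevix) / control       # brevix row, vs "be terse"

print(f"{vs_baseline_control:.1%} {vs_baseline_brevix:.1%} {vs_control_brevix:.1%}")
# -> 22.5% 48.1% 33.0%
```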
🗺 Roadmap
- Core compression engine (lite / full / ultra)
- Adaptive (auto) mode
- Accuracy Guard (semantic + content-word fallback)
- Local stats counter
- Multi-platform installer (20 targets)
- File-level compression (brevix compress-file)
- MCP middleware (brevix-shrink)
- Statusline badge + Claude Code hooks
- Subagents (investigator / builder / reviewer)
- Three-arm eval harness
- PowerShell installer + uninstaller
- Model routing engine (Haiku / Sonnet / Opus tiering)
- Confidence-driven escalation (hedge / validity / length scorers)
- Token + cost budget enforcement (BudgetExceededError)
- Routing stats dashboard (brevix stats --routing)
- Learn loop (brevix route --learn-suggest | --learn-apply)
- Claude Code routing tier subagents + slash commands
- VSCode extension UI
- Browser extension (claude.ai, chatgpt.com web)
- Two-way compression (compress prompts before send)
- Custom user-defined rule packs
- Web dashboard (team tier)
📜 License & Contributing
MIT — free for personal and commercial use. Issues and PRs welcome — see docs/CONTRIBUTING.md.
