Brevix Shrink
MCP middleware proxy that compresses tools/list, prompts/list, and resources/list responses to save tokens. Sister project to Brevix.
Brevix
Compress LLM output safely. Save tokens without breaking your code.
Install · Usage · Cost Routing · Accuracy Guard · Benchmarks · Roadmap · Contributing
Cuts response tokens 40–75% with a deterministic rule engine. Routes each task to the cheapest capable Claude tier — saving up to ~80% of API spend. Verifies meaning is preserved before emit. Works across 20+ AI coding tools.
Why Brevix
Brevix is two cost-saving layers in one tool:
- Output compression — cuts response tokens 40–75% with a rule engine that's verified by a local Accuracy Guard. No silent meaning loss on dense technical prose.
- Cost management & smart model routing — picks the cheapest Claude tier (Haiku / Sonnet / Opus) per task, escalates only on low-confidence answers, and enforces token + USD budget caps. Typical savings: ~80% of API spend vs always running Opus.
Works with Claude Code · Cursor · Windsurf · OpenAI Codex CLI · Google Antigravity · Gemini CLI · GitHub Copilot Chat · Aider · Continue.dev · Cline · Roo Code · Zed AI · Augment · Kilo · OpenHands · Tabnine · Warp · Replit · Sourcegraph Amp — plus any tool reading AGENTS.md.
💰 Cost Management & Model Routing
What it does: Stops you from paying Opus prices for tasks Haiku can solve in a single shot.
The problem
Most LLM coding tools call the most expensive model for every prompt. A task like "classify this support ticket" hits Opus at $0.05 per call, when Haiku could answer it for $0.0002 — 250× cheaper, identical answer.
The fix
Brevix Route is a thin layer in front of the Anthropic SDK that:
- Classifies the task (regex/keyword-based, ~6μs, no API call).
- Picks the cheapest capable model from a task → model rule table you control.
- Optionally scores the response for confidence (hedge phrases, validity, length, optional semantic check).
- Escalates to the next tier (Haiku → Sonnet → Opus) only when confidence is below threshold.
- Records every call to a local JSONL log so you can audit exactly what was saved.
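The classify-then-pick flow above can be sketched in a few lines. This is a minimal illustration, not Brevix's actual classifier: the task names, patterns, and rule table here are invented for the example (the real ones live in `~/.brevix/route.json`), but it shows why regex classification costs microseconds rather than an API call:

```python
import re

# Hypothetical rule table mapping task categories to the cheapest capable tier.
RULES = {
    "classify": "claude-haiku-4-5",
    "review": "claude-sonnet-4-6",
    "architecture": "claude-opus-4-7",
}

# Keyword patterns checked in order; first match wins.
PATTERNS = [
    (re.compile(r"\b(classify|label|categori[sz]e)\b", re.I), "classify"),
    (re.compile(r"\b(review|refactor|debug|explain)\b", re.I), "review"),
    (re.compile(r"\b(architect|design|multi-region|distributed)\b", re.I), "architecture"),
]

def route(prompt: str) -> tuple[str, str]:
    """Return (task, model) for a prompt; default to the mid tier."""
    for pattern, task in PATTERNS:
        if pattern.search(prompt):
            return task, RULES[task]
    return "general", "claude-sonnet-4-6"

print(route("Classify this ticket: app crashes on login"))
# -> ('classify', 'claude-haiku-4-5')
```

Because the table is plain data, adding or re-tiering a task type is a one-line config change rather than a model retrain.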
How to use it
Option A — From your code (Python)
from brevix import RoutedClient
client = RoutedClient(log_enabled=True)
# Cheap task -> auto-routed to Haiku.
r = client.call("Classify this ticket: app crashes on login")
print(r.text, r.model, r.cost_usd)
# > "bug" claude-haiku-4-5 0.000028
# Hard task -> auto-routed to Opus.
r = client.call("Architect a multi-region payment system with sub-100ms p99 latency")
print(r.model)
# > claude-opus-4-7
# Confidence-guarded mode: retries on next tier only if the first answer hedges.
r = client.call("Review this auth middleware for token expiry bugs", confidence_check=True)
print(r.model, r.escalations, r.confidence)
# > claude-opus-4-7 1 0.94
Option B — From the CLI
brevix route --init # one-time: write default config
brevix route "classify ticket: app crashes" --explain # dry-run, show decision + cost
brevix route "..." --call # actually call the model
brevix route "..." --call --confidence # call + escalate on low conf
brevix route --budget-show # current spend
brevix stats --routing --since 7d # weekly cost-saved report
brevix route --learn-suggest # recommend rule changes
brevix route --learn-apply # apply them
Option C — From Claude Code (slash commands)
After /plugin install brevix@brevix:
/brevix-route classify this support ticket: "app crashes on login"
↓
[brevix] task=classify tier=haiku escalated=0
bug
/brevix-route architect a multi-region payment system with sub-100ms p99 latency
↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>
/brevix-route-stats --since 7d # cost saved vs Opus-only baseline
/brevix-learn # suggested rule changes
/brevix-learn apply # apply them
Hard budget cap
Edit ~/.brevix/route.json:
{
"budget": { "tokens": 0, "cost_usd": 50.00 }
}
When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.
Or per-run from the CLI:
brevix route --budget-cost 5.00 --call "your prompt"
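The cap semantics can be pictured as a running total checked before each call. This is a standalone sketch of the behavior described above, not Brevix's implementation; the class and field names here are illustrative (only BudgetExceededError is the documented exception name):

```python
class BudgetExceededError(RuntimeError):
    """Raised instead of making a call once the USD cap is reached."""

class BudgetedCaller:
    def __init__(self, cost_cap_usd: float):
        self.cost_cap_usd = cost_cap_usd
        self.spent_usd = 0.0

    def call(self, est_cost_usd: float) -> None:
        # Refuse *before* spending, so the cap is never silently overshot.
        if self.spent_usd + est_cost_usd > self.cost_cap_usd:
            raise BudgetExceededError(
                f"cap ${self.cost_cap_usd:.2f} reached (spent ${self.spent_usd:.4f})"
            )
        self.spent_usd += est_cost_usd  # ...the real API call would happen here

caller = BudgetedCaller(cost_cap_usd=0.01)
for _ in range(400):
    try:
        caller.call(est_cost_usd=0.000028)  # a Haiku-sized call
    except BudgetExceededError as e:
        print("stopped:", e)
        break
```

The key design point is that the check happens pre-call: the spend total can approach the cap but never exceed it.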
Real-world savings
| Workload | Naive (Opus only) | Brevix routed | Saved |
|---|---|---|---|
| Solo dev, 50 calls/day, mostly easy tasks | $0.40/day | $0.06/day | 85% |
| Team of 5, 500 calls/day, mixed | $4.00/day | $0.80/day | 80% |
| Customer-support bot, 10k tickets/day, 90% classify | $80/day | $1.50/day | 98% |
Same answer quality — confidence guard ensures hard cases still hit Opus.
See the full Routing tutorial in ⚡ Usage for advanced flags, force-tier overrides, and the auto-tuning learn loop.
Features
- Compression Engine
- Safety & Verification
- Integrations
- Insights
- 💰 Cost Management & Model Routing — new
🚀 Install
One-liner — macOS / Linux / WSL
curl -fsSL https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.sh | bash -s -- --all
Windows — PowerShell
irm https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.ps1 | iex
skills CLI — one command, 9 tools at once
Auto-installs Brevix skills into Antigravity, Claude Code, Cline, Codex, Cursor, Gemini CLI, GitHub Copilot, Kiro CLI, and Qoder simultaneously.
npx skills add https://github.com/Yash-Koladiya30/brevix
Pick a specific skill:
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-commit
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-stats
Listing → skills.sh/Yash-Koladiya30/brevix
Manual install
pip install brevix # core
pip install 'brevix[guard]' # + semantic Accuracy Guard
pip install 'brevix[tokens]' # + accurate tiktoken counts
pip install 'brevix[all]' # everything
Plug into your LLM coding tool
brevix install --list # show all 20 targets
brevix install claude-code # Claude Code plugin layout
brevix install cursor # .cursor/rules/brevix.mdc
brevix install codex # AGENTS.md + .codex/hooks.json
brevix install gemini # gemini-extension.json + GEMINI.md
brevix install all # write rule files for every tool
Idempotent — re-running updates the Brevix block, leaves your other content alone.
Claude Code marketplace
/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix
MCP middleware
Compress upstream MCP server descriptions:
npm install -g brevix-shrink
Wrap any MCP server in your Claude config:
{
"mcpServers": {
"fs-shrunk": {
"command": "npx",
"args": ["brevix-shrink", "npx", "-y",
"@modelcontextprotocol/server-filesystem", "/tmp"]
}
}
}
⚡ Usage
Slash commands — Claude Code, Cursor, etc.
# Output compression
/brevix # toggle on (full mode)
/brevix lite # gentle compression
/brevix ultra # max compression
/brevix auto # pick best mode per response
/brevix off # disable
/brevix-commit # terse Conventional Commit message
/brevix-check # run Accuracy Guard on a snippet
/brevix-stats # show compression savings
# Smart model routing (saves API cost)
/brevix-route <task> # route to cheapest capable tier (Haiku/Sonnet/Opus)
/brevix-route-stats # cost saved vs Opus-only baseline
/brevix-learn # suggested rule changes from observed escalations
/brevix-learn apply # apply suggestions to ~/.brevix/route.json
For Codex CLI (no slash commands), use $brevix lite|full|ultra|auto|off.
CLI
# Output compression
brevix compress "Your verbose text here" --mode full
brevix compress - # stdin
brevix compress . --mode auto -v # adaptive picks best
brevix compress . --guard --strict --threshold 0.85
# File compression (CLAUDE.md, AGENTS.md, project notes)
brevix compress-file CLAUDE.md # writes .original.md backup
brevix compress-file CLAUDE.md --dry-run
# Stats
brevix stats # estimated, in-process
brevix stats --real --since 7d # parsed from Claude Code session logs
brevix stats --share # tweet-ready one-liner
brevix stats --reset
# Verification
brevix check "original" "compressed"
brevix count "how many tokens?"
# Install rules into a project
brevix install cursor
brevix install --list
# Smart model routing
brevix route --init # write default config to ~/.brevix/route.json
brevix route "classify ticket: app crashes" # print suggested model
brevix route "..." --explain # task, model, est cost, reason
brevix route "..." --call # actually call the chosen model
brevix route "..." --call --confidence # also escalate on low-confidence answers
brevix route --budget-show # current spend vs cap
brevix route --budget-tokens 1000000 --budget-cost 50.00 "..." --call
brevix stats --routing # cost saved, escalation rate, by model/task
brevix stats --routing --since 7d
brevix route --learn-suggest # recommend rule changes
brevix route --learn-apply # apply them
Subagents — Claude Code
agents/ ships six focused subagents that emit ~60% smaller tool results than vanilla agents:
| Agent | Purpose | Output format |
|---|---|---|
| brevix-investigator | Read-only code locator | path:line — symbol — note |
| brevix-builder | Surgical 1–2 file edits with verification | Diff + verify status |
| brevix-reviewer | Bug-focused diff review | path:line: 🔴 bug: …. fix. |
| brevix-haiku | Cheap tier — classify, parse, format, rename | Brief, exact answer |
| brevix-sonnet | Mid tier — review, refactor, debug, explain | Direct, file:line cited |
| brevix-opus | Heavy tier — architecture, multi-agent, hard bugs | Decision + reasoning chain |
Smart Model Routing — /brevix-route
Save ~80% on Claude API cost without manually picking a model per task.
Quick start (5 steps, ~1 minute)
1. Install Brevix and the Anthropic SDK
pip install brevix anthropic
export ANTHROPIC_API_KEY=sk-ant-...
2. Install the Claude Code plugin
/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix
This adds the routing-tier subagents (brevix-haiku / brevix-sonnet / brevix-opus) and the /brevix-route, /brevix-route-stats, /brevix-learn slash commands.
3. Initialize the routing config
brevix route --init
Writes ~/.brevix/route.json with sensible defaults: classify→Haiku, code review→Sonnet, architecture→Opus, escalation chain Haiku→Sonnet→Opus, no budget cap.
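Those defaults might look roughly like the following. This is an illustrative shape, not the exact schema Brevix writes (only the budget block matches the format shown elsewhere in this README); inspect your own ~/.brevix/route.json for the real field names:

```json
{
  "rules": {
    "classify": "claude-haiku-4-5",
    "review": "claude-sonnet-4-6",
    "architecture": "claude-opus-4-7"
  },
  "escalation_chain": ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7"],
  "budget": { "tokens": 0, "cost_usd": 0 }
}
```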
4. Use it from Claude Code
/brevix-route classify this support ticket: "app crashes on login"
↓
[brevix] task=classify tier=haiku escalated=0
bug
/brevix-route architect a multi-region payment system with sub-100ms p99 latency
↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>
5. Check what you saved
/brevix-route-stats --since 7d
How it works under the hood
- /brevix-route <task> runs brevix route ... --explain under the hood.
- Reads the suggested model from the CLI output.
- Spawns the matching subagent (brevix-haiku / brevix-sonnet / brevix-opus) via the Task tool.
- If the subagent returns escalate: <reason>, retries on the next tier (capped at 1 retry per tier).
- Records the call to ~/.brevix/routing_log.jsonl for stats.
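The escalate-on-low-confidence loop can be sketched as follows. This is a toy illustration: the hedge-phrase list, scoring weights, and threshold are invented here, not Brevix's actual scorer, and the model function is faked so the example is self-contained:

```python
TIERS = ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7"]
HEDGES = ("i'm not sure", "it depends", "might be", "possibly", "i cannot")

def confidence(answer: str) -> float:
    """Naive scorer: penalize hedge phrases and suspiciously short answers."""
    score = 1.0
    lowered = answer.lower()
    score -= 0.3 * sum(h in lowered for h in HEDGES)
    if len(answer.split()) < 3:
        score -= 0.2
    return max(score, 0.0)

def call_with_escalation(prompt, call_model, threshold=0.7):
    """Try the cheapest tier first; climb one tier at a time on low confidence."""
    answer = ""
    for model in TIERS:
        answer = call_model(model, prompt)
        if confidence(answer) >= threshold:
            return model, answer
    return TIERS[-1], answer  # out of tiers: keep the top-tier answer

def fake(model, prompt):
    # Pretend the cheap tier hedges while the mid tier answers cleanly.
    return "It depends, might be a race?" if "haiku" in model else "Token expiry is off by one hour."

print(call_with_escalation("review auth middleware", fake))
# -> ('claude-sonnet-4-6', 'Token expiry is off by one hour.')
```

Because confident answers return immediately, easy tasks never pay for the retry; only hedged answers climb the chain.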
Force a tier when you already know the right one:
/brevix-route --force-tier=sonnet refactor this 200-line module
Track your savings — /brevix-route-stats
/brevix-route-stats # all-time
/brevix-route-stats --since 7d # last week
Sample output:
Brevix Routing Stats
--------------------
Window: 7d
Calls: 412
Total cost: $1.84
Opus-only baseline: $9.67
Saved: $7.83 (80.9%)
Escalations: 18 (4.4% of calls)
By model:
claude-haiku-4-5 289 (70.1%) $0.08
claude-sonnet-4-6 108 (26.2%) $0.65
claude-opus-4-7 15 ( 3.7%) $1.11
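A "saved vs Opus-only" figure like the one above can be reproduced from the JSONL log with a few lines of Python. The record shape here is an assumption for illustration (one JSON object per call with per-call actual and baseline costs); the real field names in routing_log.jsonl may differ:

```python
import io
import json

# Example log lines in the assumed shape: one JSON object per call.
log = io.StringIO(
    '{"model": "claude-haiku-4-5", "cost_usd": 0.000028, "opus_cost_usd": 0.05}\n'
    '{"model": "claude-sonnet-4-6", "cost_usd": 0.006, "opus_cost_usd": 0.05}\n'
    '{"model": "claude-opus-4-7", "cost_usd": 0.05, "opus_cost_usd": 0.05}\n'
)

spent = baseline = 0.0
for line in log:
    rec = json.loads(line)
    spent += rec["cost_usd"]          # what routing actually cost
    baseline += rec["opus_cost_usd"]  # what Opus-only would have cost

saved = baseline - spent
print(f"spent=${spent:.4f} baseline=${baseline:.4f} saved={saved / baseline:.1%}")
```

Swap the io.StringIO for open(os.path.expanduser("~/.brevix/routing_log.jsonl")) to audit your own log.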
Auto-tune from your usage — /brevix-learn
After a few days, Brevix recommends rule changes from observed escalation patterns:
/brevix-learn
refactor: claude-sonnet-4-6 -> claude-opus-4-7
samples=54 escalation_rate=68.5%
reason: 37/54 escalated (69%); 32 of those landed on claude-opus-4-7
Apply with /brevix-learn apply to update ~/.brevix/route.json. User customizations and budget caps are preserved.
Hard budget cap
Edit ~/.brevix/route.json:
{
"budget": { "tokens": 0, "cost_usd": 50.00 }
}
Or per-run from the CLI:
brevix route --budget-cost 5.00 --call "your prompt"
When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.
🛡 How Accuracy Guard works
- Compress output via the rule engine.
- Score the original vs compressed text with a local sentence-transformer (no API cost).
- If similarity ≥ threshold (default 0.85) → emit compressed. Otherwise warn, or in --strict mode fall back to the original.
- Without sentence-transformers installed → falls back to content-word containment (drops stopwords without penalty, fair to compression).
Result: compression you can trust on production code, specs, and contracts.
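The content-word containment fallback can be approximated like this. It is a sketch with a toy stopword list, not the Guard's real implementation; the point is the asymmetry: stopwords dropped by compression cost nothing, while lost content words lower the score:

```python
import re

# Toy stopword list for illustration; the real Guard's list will differ.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "in", "of", "so", "on",
             "your", "you", "this", "that", "it", "and", "every", "even", "if"}

def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def containment(original: str, compressed: str) -> float:
    """Fraction of the original's content words that survive compression."""
    orig = content_words(original)
    return len(orig & content_words(compressed)) / len(orig) if orig else 1.0

score = containment(
    "Wrap the object in useMemo so the reference stays stable across renders.",
    "Wrap in useMemo: reference stays stable.",
)
print(score)
```

Unlike a semantic model, this needs no downloaded weights, which is why it makes a reasonable zero-dependency fallback.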
💡 Compression example
Before
The reason your React component is re-rendering on every parent update is that you are passing an inline object as a prop. In JavaScript, every render creates a new object reference, even if the contents are identical. To fix this, wrap the object in useMemo so the reference stays stable across renders.
After (full mode)
Inline object prop = new ref each render = re-render. Wrap in useMemo.
Tokens saved: ~75% · Meaning preserved: ✅ similarity 0.91
📊 Benchmarks
Reproducible three-arm A/B harness in evals/. Compares no-system-prompt vs "be terse" control vs Brevix on 10 developer prompts.
| arm | n | median | mean | total | vs baseline | vs control |
|---|---|---|---|---|---|---|
| baseline | 10 | 221 | 247.3 | 2473 | — | — |
| control | 10 | 178 | 191.6 | 1916 | 22.5% | — |
| brevix | 10 | 119 | 128.4 | 1284 | 48.1% | 33.0% |
Run yourself:
pip install 'brevix[all]' anthropic
export ANTHROPIC_API_KEY=...
python evals/llm_run.py --model claude-sonnet-4-6
python evals/measure.py
The vs control column is the honest savings — what Brevix adds beyond "just be brief."
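The percentage columns follow directly from the totals in the table; a quick check that recomputes the table's own numbers, nothing new:

```python
baseline, control, brevix = 2473, 1916, 1284  # total tokens per arm

vs_baseline_control = (baseline - control) / baseline  # control row, vs baseline
vs_baseline_brevix = (baseline - brevix) / baseline    # brevix row, vs baseline
vs_control_brevix = (control - brevix) / control       # brevix row, vs "be terse"

print(f"{vs_baseline_control:.1%} {vs_baseline_brevix:.1%} {vs_control_brevix:.1%}")
# -> 22.5% 48.1% 33.0%
```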
🗺 Roadmap
- Core compression engine (lite / full / ultra)
- Adaptive (auto) mode
- Accuracy Guard (semantic + content-word fallback)
- Local stats counter
- Multi-platform installer (20 targets)
- File-level compression (brevix compress-file)
- MCP middleware (brevix-shrink)
- Statusline badge + Claude Code hooks
- Subagents (investigator / builder / reviewer)
- Three-arm eval harness
- PowerShell installer + uninstaller
- Model routing engine (Haiku / Sonnet / Opus tiering)
- Confidence-driven escalation (hedge / validity / length scorers)
- Token + cost budget enforcement (BudgetExceededError)
- Routing stats dashboard (brevix stats --routing)
- Learn loop (brevix route --learn-suggest | --learn-apply)
- Claude Code routing tier subagents + slash commands
- VSCode extension UI
- Browser extension (claude.ai, chatgpt.com web)
- Two-way compression (compress prompts before send)
- Custom user-defined rule packs
- Web dashboard (team tier)
📜 License & Contributing
MIT — free for personal and commercial use. Issues and PRs welcome — see docs/CONTRIBUTING.md.
