Headroom
The Context Optimization Layer for LLM Applications
Installation
npx headroom
Every tool call, log line, DB read, RAG chunk, and file your agent injects into a prompt is mostly boilerplate. Headroom strips the noise and keeps the signal: losslessly, locally, and without touching accuracy.
100 logs. One FATAL error buried at position 67. Both runs found it. Baseline 10,144 tokens → Headroom 1,260 tokens: 87% fewer, identical answer.
python examples/needle_in_haystack_test.py
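The shape of that test can be sketched without Headroom at all. The snippet below builds 100 log lines with a single FATAL entry at index 67, applies a deliberately naive filter, and compares whitespace-token counts. The log format, the filter, and the token counting are all illustrative assumptions; this is not Headroom's algorithm, nor the contents of `examples/needle_in_haystack_test.py`.

```python
# Toy needle-in-a-haystack setup. The filter below is a stand-in,
# NOT Headroom's compression algorithm.

def build_logs(n=100, needle_at=67):
    """Return n log lines with exactly one FATAL line at index needle_at."""
    logs = [
        f"2024-01-01T00:00:{i % 60:02d}Z INFO worker-{i % 4} heartbeat ok seq={i}"
        for i in range(n)
    ]
    logs[needle_at] = (
        "2024-01-01T00:01:07Z FATAL db-pool connection refused: max retries exceeded"
    )
    return logs

def naive_compress(logs):
    """Keep non-INFO lines verbatim; replace the dropped INFO lines with one summary line."""
    kept = [line for line in logs if " INFO " not in line]
    dropped = len(logs) - len(kept)
    return kept + [f"[{dropped} routine INFO lines omitted]"]

def count_tokens(lines):
    """Crude whitespace token count; real tokenizers will give different numbers."""
    return sum(len(line.split()) for line in lines)

logs = build_logs()
compressed = naive_compress(logs)
before, after = count_tokens(logs), count_tokens(compressed)
print(f"{before} -> {after} tokens ({1 - after / before:.0%} fewer)")
print("needle kept:", any("FATAL" in line for line in compressed))
```

The point of the exercise is the final check: whatever the compressor does, the one line that matters has to survive.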
Quick start
Works with Anthropic, OpenAI, Google, Bedrock, Vertex, Azure, OpenRouter, and 100+ models via LiteLLM.
Wrap your coding agent with one command:
pip install "headroom-ai[all]"
headroom wrap claude # Claude Code
headroom wrap codex # Codex
headroom wrap cursor # Cursor
headroom wrap aider # Aider
headroom wrap copilot # GitHub Copilot CLI
Using pipx? Current release wheels are built for Python 3.10 through 3.13, so
choose a supported interpreter explicitly:
pipx install --python python3.13 "headroom-ai[all]"
Drop it into your own code, in Python or TypeScript:
import anthropic
from headroom import compress

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
result = compress(messages, model="claude-sonnet-4-5")
response = client.messages.create(model="claude-sonnet-4-5", max_tokens=1024, messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
import { compress } from 'headroom-ai';
const result = await compress(messages, { model: 'gpt-4o' });
Or run it as a proxy, with zero code changes, from any language:
headroom proxy --port 8787
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
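If the application is launched from Python, the same two environment variables can be set programmatically. This is a minimal sketch, assuming the proxy from the command above is already listening on port 8787. `proxied_env` and `run_through_proxy` are hypothetical helper names, not part of Headroom.

```python
import os
import subprocess

def proxied_env(port=8787, host="localhost", base=None):
    """Copy an environment and point both SDK base URLs at the local proxy."""
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = f"http://{host}:{port}"
    env["OPENAI_BASE_URL"] = f"http://{host}:{port}/v1"
    return env

def run_through_proxy(command, port=8787):
    """Run a command (a list of arguments) with its LLM traffic routed via the proxy."""
    return subprocess.run(command, env=proxied_env(port), check=False)

env = proxied_env()
print(env["ANTHROPIC_BASE_URL"], env["OPENAI_BASE_URL"])
```

Calling `run_through_proxy(["your-app"])` would then be equivalent to the shell form shown above, with `your-app` standing in for your real command.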
Why Headroom
- Accuracy-preserving. GSM8K 0.870 → 0.870 (±0.000). TruthfulQA +0.030. SQuAD v2 and BFCL both 97% accuracy after compression. Validated on public OSS benchmarks you can rerun yourself.
- Runs on your machine. No cloud API, no data egress. Compression latency is on the order of milliseconds, which is faster end-to-end for Sonnet / Opus / GPT-4 class models than a round-trip to a hosted service.
- Kompress-base on HuggingFace. Our open-source text compressor, fine-tuned on real agentic traces: tool outputs, logs, RAG chunks, code. Install with `pip install "headroom-ai[ml]"`.
- Cross-agent memory and learning. Claude Code saves a fact, Codex reads it back. `headroom learn` mines failed sessions and writes corrections straight to `CLAUDE.md` / `AGENTS.md` / `GEMINI.md`, so reliability compounds over time.
- Reversible (CCR). Compression is not deletion. The model can always call `headroom_retrieve` to pull the original bytes. Nothing is thrown away.
Bundles the RTK binary for shell-output rewriting; full attribution below.
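The reversible idea in the last bullet can be illustrated with a toy store that swaps long text for a short stub and hands the original back on request. This is a sketch of the general pattern only: `ReversibleStore`, its stub format, and the hash-based key are assumptions made for illustration, not how CCR or `headroom_retrieve` is implemented.

```python
import hashlib

class ReversibleStore:
    """Toy store: swaps long text for a short stub and can restore it exactly."""

    def __init__(self):
        self._originals = {}

    def compress(self, text, keep=40):
        """Return a stub with a short preview and a key for later retrieval."""
        if len(text) <= keep:
            return text  # nothing to gain; pass short text through unchanged
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = text
        return f"{text[:keep]}... [truncated, retrieve with key={key}]"

    def retrieve(self, key):
        """Return the exact original text stored under key."""
        return self._originals[key]

store = ReversibleStore()
original = "ERROR stack trace line\n" * 50
stub = store.compress(original)
key = stub.rsplit("key=", 1)[1].rstrip("]")
restored = store.retrieve(key)
print(len(original), "->", len(stub), "chars; round trip exact:", restored == original)
```

The design choice worth noting is that the stub carries its own retrieval key, so a model that needs the full text has everything required to ask for it.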
How it fits
Your agent / app
(Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │  prompts · tool outputs · logs · RAG results · files
        ▼
┌────────────────────────────────────────────────────┐
│  Headroom  (runs locally, your data stays here)    │
│  ────────────────────────────────────────────────  │
│  CacheAligner → ContentRouter → CCR                │
│                   ├─ SmartCrusher   (JSON)         │
│                   ├─ CodeCompressor (AST)          │
│                   └─ Kompress-base  (text, HF)     │
│                                                    │
│  Cross-agent memory · headroom learn · MCP         │
└────────────────────────────────────────────────────┘
        │  compressed prompt + retrieval tool
        ▼
LLM provider  (Anthropic · OpenAI · Bedrock · …)
→ Architecture · CCR reversible compression · Kompress-base model card
Canonical pipeline lifecycle
Headroom now exposes one stable request lifecycle across compress(), the SDK, and the proxy:
Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received
- Transforms still do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.
- Pipeline extensions observe or customize those lifecycle stages via `on_pipeline_event(...)`.
- Compression hooks still work and now sit alongside the canonical lifecycle instead of being the only extension seam.
- Proxy extensions remain the server/app integration seam for ASGI middleware, routes, and startup policy.
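To make the lifecycle concrete, here is a self-contained toy that walks the eleven stages in order and notifies registered observers at each one. Only the stage names and the name `on_pipeline_event` come from the text above. The `ToyPipeline` class and the `callback(stage, payload)` signature are assumptions for illustration and may not match Headroom's real extension API.

```python
# Stage names are taken from the lifecycle listed above.
STAGES = [
    "Setup", "Pre-Start", "Post-Start",
    "Input Received", "Input Cached", "Input Routed",
    "Input Compressed", "Input Remembered",
    "Pre-Send", "Post-Send", "Response Received",
]

class ToyPipeline:
    """Minimal observer pattern over a fixed, ordered list of stages."""

    def __init__(self):
        self._observers = []

    def on_pipeline_event(self, callback):
        """Register callback(stage, payload). Hypothetical signature."""
        self._observers.append(callback)
        return callback

    def run(self, payload):
        """Visit every stage in order, notifying each observer at each stage."""
        for stage in STAGES:
            for callback in self._observers:
                callback(stage, payload)
        return payload

seen = []
pipeline = ToyPipeline()
pipeline.on_pipeline_event(lambda stage, payload: seen.append(stage))
pipeline.run({"messages": []})
print(" -> ".join(seen))
```

An observer registered this way sees every stage, which is what makes a single lifecycle useful for logging or metrics regardless of whether the request came through the library, the SDK, or the proxy.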
Provider slices
Provider and tool-specific behavior is being moved behind dedicated modules under headroom/providers/ so core orchestration stays focused on lifecycle, sequencing, and policy.
- CLI/tool slices: `headroom/providers/claude`, `copilot`, `codex`, `openclaw`
- Provider runtime slices: `headroom/providers/claude`, `gemini`, plus shared backend/runtime dispatch in `headroom/providers/registry.py`
- Core files stay orchestration-first: `wrap.py`, `client.py`, `cli/proxy.py`, and `proxy/server.py` now delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch instead of inlining those rules.
Proof
Savings on real agent workloads:
| Workload | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
| Codebase exploration | 78,502 | 41,254 | 47% |
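The Savings column follows directly from the Before and After token counts. A short check, using only the figures in the table above:

```python
# (before, after) token counts copied from the table above.
workloads = {
    "Code search (100 results)": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "GitHub issue triage": (54_174, 14_761),
    "Codebase exploration": (78_502, 41_254),
}

def savings_pct(before, after):
    """Percentage of tokens removed, rounded to the nearest whole percent."""
    return round(100 * (1 - after / before))

savings = {name: savings_pct(b, a) for name, (b, a) in workloads.items()}
for name, pct in savings.items():
    print(f"{name}: {pct}%")
```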
Accuracy preserved on standard benchmarks:
| Benchmark | Category | N | Baseline | Headroom | Delta |
|---|---|---|---|---|---|
| GSM8K | Math | 100 | 0.870 | 0.870 | ±0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |
| SQuAD v2 | QA | 100 | n/a | 97% | 19% compression |
| BFCL | Tools | 100 | n/a | 97% | 32% compression |
Reproduce:
python -m headroom.evals suite --tier 1
Community, live:
→ Full benchmarks & methodology
Built for coding agents
| Agent | One-command wrap | Notes |
|---|---|---|
| Claude Code | headroom wrap claude | --memory for cross-agent memory, --code-graph for codebase intel |
| Codex | headroom wrap codex --memory | Shares the same memory store as Claude |
| Cursor | headroom wrap cursor | Prints Cursor config; paste once, done |
| Aider | headroom wrap aider | Starts proxy, launches Aider |
| Copilot CLI | headroom wrap copilot | Starts proxy, launches Copilot |
| OpenClaw | headroom wrap openclaw | Installs Headroom as ContextEngine plugin |
MCP-native too: `headroom mcp install` exposes `headroom_compress`, `headroom_retrieve`, and `headroom_stats` to any MCP client.
Integrations
Drop Headroom into any stack
| Your setup | Hook in with |
|---|---|
| Any Python app | compress(messages, model=…) |
| Any TypeScript app | await compress(messages, { model }) |
| Anthropic / OpenAI SDK | withHeadroom(new Anthropic()) · withHeadroom(new OpenAI()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LiteLLM | litellm.callbacks = [HeadroomCallback()] |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| Strands | Strands guide |
| ASGI apps | app.add_middleware(CompressionMiddleware) |
| Multi-agent | SharedContext().put / .get |
| MCP clients | headroom mcp install |
What's inside
- SmartCrusher: universal JSON (arrays of dicts, nested objects, mixed types).
- CodeCompressor: AST-aware for Python, JS, Go, Rust, Java, C++.
- Kompress-base: our HuggingFace model, trained on agentic traces.
- Image compression: 40–90% reduction via trained ML router.
- CacheAligner: stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
- IntelligentContext: score-based context fitting with learned importance.
- CCR: reversible compression; the LLM retrieves originals on demand.
- Cross-agent memory: shared store, agent provenance, auto-dedup.
- SharedContext: compressed context passing across multi-agent workflows.
- `headroom learn`: plugin-based failure mining for Claude, Codex, Gemini.
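To give a feel for what compressing "arrays of dicts" can mean, the toy below keeps the head and tail of a record list plus any record that looks like an error, and reports how many were omitted. The function name `crush_records`, the flag words, and the head/tail heuristic are assumptions chosen for illustration; SmartCrusher's actual strategy is not described here and is likely more sophisticated.

```python
import json

def crush_records(records, head=2, tail=2, flag_words=("error", "fatal", "fail")):
    """Keep the head, the tail, and any record whose JSON text mentions a flag word."""
    n = len(records)
    keep = set(range(min(head, n))) | set(range(max(n - tail, 0), n))
    for i, record in enumerate(records):
        text = json.dumps(record).lower()
        if any(word in text for word in flag_words):
            keep.add(i)
    kept = [records[i] for i in sorted(keep)]
    return {"kept": kept, "omitted": n - len(kept), "total": n}

records = [{"id": i, "status": "ok"} for i in range(100)]
records[67] = {"id": 67, "status": "FATAL: disk full"}
result = crush_records(records)
print([r["id"] for r in result["kept"]], "omitted:", result["omitted"])
```

Unlike CCR, this toy simply drops the omitted records; pairing it with a store of originals is what would make the step reversible.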
Install
pip install "headroom-ai[all]" # Python, everything
npm install headroom-ai # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest
Granular extras: [proxy], [mcp], [ml] (Kompress-base), [agno], [langchain], [evals]. Requires Python 3.10+.
→ Installation guide: Docker tags, persistent service, PowerShell, devcontainers.
Documentation
| Start here | Go deeper |
|---|---|
| Quickstart | Architecture |
| Proxy | How compression works |
| MCP tools | CCR: reversible compression |
| Memory | Cache optimization |
| Failure learning | Benchmarks |
| Configuration | Limitations |
Compared to
Headroom runs locally, covers every content type (not just CLI or text), works with every major framework, and is reversible.
| | Scope | Deploy | Local | Reversible |
|---|---|---|---|---|
| Headroom | All context: tools, RAG, logs, files, history | Proxy · library · middleware · MCP | Yes | Yes |
| RTK | CLI command outputs | CLI wrapper | Yes | No |
| Compresr, Token Co. | Text sent to their API | Hosted API call | No | No |
| OpenAI Compaction | Conversation history | Provider-native | No | No |
Attribution. Headroom ships with the excellent RTK binary for shell-output rewriting: `git show` → `git show --short`, noisy `ls` → scoped, chatty installers → summarized. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it.
Contributing
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
Devcontainers in .devcontainer/ (default + memory-stack with Qdrant & Neo4j). See CONTRIBUTING.md.
Community
- Live leaderboard: 60B+ tokens saved and counting.
- Discord: questions, feedback, war stories.
- Kompress-base on HuggingFace: the model behind our text compression.
License
Apache 2.0. See LICENSE.
