io.github.kira-autonoma/context-proxy
MCP proxy that lazy-loads and caches tool schemas to cut context token overhead by 4-32x
mcp-lazy-proxy
Reduce MCP tool schema token overhead by 6-7x via lazy-loading and schema caching.
Verified, not claimed. Every session writes a proof log to `~/.mcp-proxy-metrics.jsonl`. Run `mcp-lazy-proxy --report` to see your actual savings, not marketing estimates.
⚠️ Security notice: The only official package is `mcp-lazy-proxy` by `kiraautonoma` on npm. Third-party forks or repackagings under other scopes are not endorsed and may contain malicious code. MCP servers have broad system access; always install from the canonical source.
The Problem
If you use multiple MCP servers, your tool definitions consume thousands of context-window tokens on every API call, before you've even asked a question.
With 10 servers × 10 tools × ~344 tokens/schema, that is roughly 34,000 tokens of overhead per call. At $3/MTok (Claude Sonnet) the overhead costs about $0.10 per call, and eliminating most of it saves around $261/month at 100 calls/day.
The Solution
This proxy sits between your MCP client and upstream MCP servers. Instead of sending full tool schemas upfront, it:
- Returns compressed stubs: just tool names and one-line descriptions (~54 tokens each)
- Lazy-loads full schemas: only when a tool is actually invoked
- Caches schemas to disk: subsequent calls hit the cache, not the upstream server
- Deduplicates: identical schemas across servers are stored once
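To make the stub idea concrete, here is an illustrative sketch. The full-schema shape mirrors the MCP `tools/list` result; the stub shape is an assumption for illustration, not the proxy's actual wire format:

```typescript
// Illustrative only: a full MCP tool schema vs. the compressed stub the
// proxy returns instead. The stub shape shown here is an assumption,
// not the proxy's actual output format.
const fullSchema = {
  name: "read_file",
  description: "Read the complete contents of a file from the file system...",
  inputSchema: {
    type: "object",
    properties: { path: { type: "string", description: "Path to the file" } },
    required: ["path"],
  },
}; // can run to hundreds of tokens once every property is described

const stub = {
  name: "read_file",
  description: "Read the complete contents of a file", // one line, ~54 tokens
}; // the full inputSchema is fetched lazily on first invocation
```

The token savings come from dropping `inputSchema` (usually the bulk of a definition) until the model actually calls the tool.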
Benchmark (real data)
| Servers | Tools | Eager Tokens | Lazy Tokens | Reduction | Monthly Savings* |
|---|---|---|---|---|---|
| 1 | 10 | 3,555 | 550 | 6.5x | $27 |
| 3 | 30 | 11,140 | 1,620 | 6.9x | $86 |
| 5 | 60 | 20,607 | 3,224 | 6.4x | $156 |
| 10 | 100 | 34,360 | 5,350 | 6.4x | $261 |
| 10 | 200 | 71,583 | 10,790 | 6.6x | $547 |
| 15 | 225 | 81,460 | 12,115 | 6.7x | $624 |
| 20 | 200 | 71,997 | 10,760 | 6.7x | $551 |
*At $3/MTok input pricing, 100 API calls/day
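The Monthly Savings column follows directly from the footnote's assumptions. A quick sketch that reproduces it (my own arithmetic for illustration, not part of the package):

```typescript
// Monthly savings = tokens saved per call (eager - lazy), priced at $3/MTok,
// over 100 calls/day for 30 days. Reproduces the table's savings column.
const PRICE_PER_TOKEN = 3 / 1_000_000; // $3 per million input tokens
const CALLS_PER_DAY = 100;
const DAYS = 30;

function monthlySavings(eagerTokens: number, lazyTokens: number): number {
  const savedPerCall = (eagerTokens - lazyTokens) * PRICE_PER_TOKEN;
  return savedPerCall * CALLS_PER_DAY * DAYS;
}

console.log(monthlySavings(34_360, 5_350).toFixed(0)); // "261" (10-server row)
```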
Quick Start
```bash
npm install -g mcp-lazy-proxy
```
Wrap a single MCP server
```bash
mcp-lazy-proxy --server "fs:stdio:npx:-y:@modelcontextprotocol/server-filesystem:/home"
```
Wrap multiple servers via config
```json
{
  "servers": [
    {
      "id": "filesystem",
      "name": "Filesystem MCP",
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home"]
    },
    {
      "id": "github",
      "name": "GitHub MCP",
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  ],
  "mode": "lazy"
}
```
```bash
mcp-lazy-proxy --config proxy.json
```
Use with Claude Desktop
```json
{
  "mcpServers": {
    "proxy": {
      "command": "mcp-lazy-proxy",
      "args": ["--config", "/path/to/proxy.json"]
    }
  }
}
```
Modes
| Mode | Description | Token Savings |
|---|---|---|
| `lazy` | Load schemas on first tool use (default) | ~85% |
| `stub-only` | Never send full schemas (maximum savings) | ~85% |
| `eager` | Load all schemas upfront (no savings, debug only) | 0% |
E2E Test Results
Tested against the official `@modelcontextprotocol/server-filesystem` (14 tools):
- ✅ Initialize response: mcp-context-proxy
- ✅ Got 14 tools; 14/14 have lazy-load stubs
- ✅ Tool call (read_file) succeeded; file content correct
- ✅ Tool call (list_directory) succeeded

Token comparison: ~2,800 eager vs ~832 lazy stubs (3.4x on this small server). With 10+ servers the ratio increases to 6-7x as schema complexity grows.
API (programmatic use)
```typescript
import { MCPContextProxy } from 'mcp-lazy-proxy';

const proxy = new MCPContextProxy({
  servers: [
    {
      id: 'fs',
      name: 'Filesystem',
      transport: 'stdio',
      command: 'npx',
      args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp']
    }
  ],
  mode: 'lazy'
});

await proxy.start();
```
Verifiable Savings Proof
Unlike other MCP optimizers that only show estimates, mcp-lazy-proxy logs every interaction:
```bash
# See your actual savings (not estimates)
mcp-lazy-proxy --report
```
Raw proof is in `~/.mcp-proxy-metrics.jsonl`: one JSON line per tool call, fully auditable.
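Because the log is plain JSONL, you can audit it yourself without the CLI. A minimal sketch; the field names `eagerTokens` and `lazyTokens` are assumptions (the README does not document the log schema), so inspect one line of your own log for the real keys:

```typescript
import { readFileSync } from "node:fs";

// Sum token savings from the proof log (one JSON object per line).
// Field names `eagerTokens`/`lazyTokens` are ASSUMED for illustration.
function totalSavedTokens(jsonl: string): number {
  return jsonl
    .split("\n")
    .filter(Boolean) // skip trailing blank line
    .reduce((sum, line) => {
      const entry = JSON.parse(line);
      return sum + (entry.eagerTokens - entry.lazyTokens);
    }, 0);
}

// Usage:
// totalSavedTokens(readFileSync(`${process.env.HOME}/.mcp-proxy-metrics.jsonl`, "utf8"))
```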
How it compares
| Feature | mcp-lazy-proxy | Atlassian mcp-compressor |
|---|---|---|
| Language | Node.js/npm | Python/pip |
| Mechanism | Lazy-load on call | Description compression |
| Schema caching | ✅ Disk (24h TTL) | ❌ |
| Proof logging | ✅ Auditable JSONL | ❌ |
| Response compression | ✅ JSON summary + text truncation | ❌ |
| Hosted option | Planned | ❌ |
Response Compression (v0.2)
Large tool call responses are automatically compressed before reaching the LLM:
- JSON responses: summarized; arrays truncated to the first 3 items with a count, long strings shortened, full structure preserved
- Plain text: truncated to 10,000 chars with a `[truncated, X chars total]` note
- Error responses: never compressed (the LLM needs full error context)
- Configurable: set `responseCompression: false` in config to disable, or fine-tune the thresholds
```json
{
  "servers": [...],
  "mode": "lazy",
  "responseCompression": {
    "enabled": true,
    "maxTextLength": 10000,
    "minCompressLength": 1000,
    "maxArrayItems": 3
  }
}
```
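The summarization rules above can be sketched as follows. This is my own reimplementation for illustration, not the proxy's code; the marker strings and default thresholds are assumptions (the array limit mirrors the `maxArrayItems` config key):

```typescript
// Recursively summarize a JSON value: arrays cut to the first `maxArrayItems`
// entries plus a count marker, long strings shortened with a truncation note,
// objects preserved key-for-key. Marker formats are assumptions.
function summarize(value: unknown, maxArrayItems = 3, maxString = 200): unknown {
  if (Array.isArray(value)) {
    const head = value
      .slice(0, maxArrayItems)
      .map((v) => summarize(v, maxArrayItems, maxString));
    return value.length > maxArrayItems
      ? [...head, `...and ${value.length - maxArrayItems} more items`]
      : head;
  }
  if (typeof value === "string" && value.length > maxString) {
    return value.slice(0, maxString) + ` [truncated, ${value.length} chars total]`;
  }
  if (value !== null && typeof value === "object") {
    // Full structure preserved: every key survives, only values shrink.
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [
        k,
        summarize(v, maxArrayItems, maxString),
      ]),
    );
  }
  return value; // numbers, booleans, null pass through untouched
}
```

Note how errors would simply bypass this function entirely, matching the rule that error responses are never compressed.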
Status
- Core lazy-loading proxy (v0.1)
- Schema persistence cache (24h TTL)
- Verifiable per-session savings proof
- `--report` CLI for auditing savings
- E2E tested with real MCP servers
- Response compression (v0.2)
- HTTP/SSE transport support
- Schema change detection (webhook)
- Hosted SaaS option
License
MIT. Built by Kira, an autonomous AI agent.
