ContextCutter
Stop feeding entire API responses to your LLM. Give it a handle instead.
When an agent calls a REST API, the full JSON response lands in the context window – even if the agent only needs one field. On a 500-item list, that's 97 KB of tokens consumed to read two values. ContextCutter intercepts those responses before they reach the model, stores them in a fast in-memory store, and returns a compact structural summary (a teaser) plus a deterministic handle ID. The agent then queries only the fields it actually needs.
The result: 86–99% fewer tokens spent on API responses in typical agent workflows.
How it works
┌─────────┐   fetch_json_cutted(url)   ┌──────────────────┐   tools/HTTP   ┌────────────┐
│  Agent  │ ─────────────────────────► │  ContextCutter   │    HTTP GET    │ Remote API │
│  (LLM)  │                            │    MCP Server    │ ─────────────► │            │
│         │ ◄───────────────────────── │  (Rust binary)   │ ◄───────────── │            │
│         │   { handle_id, teaser }    │                  │   JSON blob    └────────────┘
│         │                            │  DashMap store   │
│         │   query_handle(id, path)   │   (in-memory)    │
│         │ ─────────────────────────► │                  │
│         │ ◄───────────────────────── │                  │
└─────────┘   "$.users[0].email"       └──────────────────┘
              → "alice@example.com"
Step 1 – fetch: The agent calls fetch_json_cutted(url). The server fetches the URL, stores the full JSON payload, and responds with a teaser (structural summary) and a handle_id.
Step 2 – query: The agent inspects the teaser to understand the shape of the data, then calls query_handle(handle_id, "$.path.to.field") to retrieve only what it needs.
The full payload never enters the context window.
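For intuition, here is the same two-step pattern expressed with the optional Python SDK documented later in this README (the URL and payload shape are placeholders):

import requests
from context_cutter import store_response, generate_teaser, query_handle

# Step 1 - fetch: the full payload goes into the store, never the prompt.
payload = requests.get("https://api.example.com/users").json()  # placeholder URL
handle = store_response(payload)
print(generate_teaser(handle))  # compact structural summary the model reads

# Step 2 - query: retrieve only the field the agent actually needs.
email = query_handle(handle, "$.users[0].email")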
Token savings
Measured against realistic API response shapes:
| Response type | Full payload | Teaser returned | Tokens saved |
|---|---|---|---|
| 10-item paginated list | 2,005 chars | 287 chars | 86% |
| 50-item repo listing | 11,576 chars | 268 chars | 98% |
| 100-item event stream | 21,005 chars | 283 chars | 99% |
| 500-item batch export | 97,465 chars | 261 chars | >99% |
| Deep nested config blob | 19,943 chars | 341 chars | 98% |
Teaser size stays roughly constant (~250–350 chars) regardless of payload size, because it describes structure, not values.
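As a rough mental model of why that holds (an illustrative sketch, not the server's actual algorithm), a structural summary can be produced like this:

def summarize(value, depth=0):
    # Illustrative sketch only: describe structure, not values.
    # Arrays collapse to their length, deep objects to a key count,
    # long strings to a truncated prefix - so size stays ~constant.
    if isinstance(value, list):
        return f"Array[{len(value)}]"
    if isinstance(value, dict):
        if depth >= 2:
            return f"Object[{len(value)} keys]"
        return {k: summarize(v, depth + 1) for k, v in value.items()}
    if isinstance(value, str) and len(value) > 40:
        return value[:40] + "... (truncated)"
    return value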
Quickstart
The fastest way to try ContextCutter is with npx – no install required:
npx context-cutter-mcp
Add it to your agent client in under a minute:
OpenCode (~/.config/opencode/config.json):
{
"mcp": {
"context-cutter": {
"type": "local",
"command": "npx",
"args": ["-y", "context-cutter-mcp"]
}
}
}
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"context-cutter": {
"command": "npx",
"args": ["-y", "context-cutter-mcp"]
}
}
}
Once connected, ContextCutter registers two tools with your agent automatically. No prompting or configuration needed – the server describes itself via MCP.
See examples/ for Cursor, VS Code, OpenAI Agents SDK, and LangChain configs.
MCP tool reference
fetch_json_cutted
Fetches a URL, stores the JSON response, and returns a structural teaser.
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | – | HTTPS URL to fetch (required) |
| method | string | GET | HTTP method |
| headers | object | {} | Additional request headers |
| body | any | – | Request body (serialized as JSON) |
| timeout_seconds | number | 45 | Request timeout (seconds) |
Returns: { handle_id: "hdl_<12hex>", teaser: { ... } }
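An illustrative invocation, with arguments named per the table above (the endpoint and returned handle are placeholders):

# Hypothetical tool-call arguments an MCP client would send.
arguments = {
    "url": "https://api.example.com/users?page=1",  # placeholder endpoint
    "headers": {"Accept": "application/json"},
    "timeout_seconds": 30,
}
# Server responds with something like:
# {"handle_id": "hdl_3f9c2a7d1e04", "teaser": {...}}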
query_handle
Runs a JSONPath expression against a previously stored payload.
| Parameter | Type | Description |
|---|---|---|
| handle_id | string | Handle returned by fetch_json_cutted |
| json_path | string | JSONPath expression (e.g. $.users[0].email) |
Returns: The matched value(s) as JSON.
Handle IDs are deterministic (SHA-256 of canonicalized JSON) – the same payload always produces the same hdl_<12hex>, making repeated fetches idempotent.
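A sketch of that derivation (the canonicalization details here are an assumption; only "SHA-256 of canonicalized JSON, 12 hex chars" is documented):

import hashlib
import json

def handle_id(payload) -> str:
    # Assumed canonical form: sorted keys, compact separators.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"hdl_{digest[:12]}"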
Proxy mode
The fetch-and-query pattern above requires the agent to explicitly call fetch_json_cutted. If you're using an existing MCP server – such as a ClickUp, GitHub, or Notion integration – its tools return large payloads directly, and there's no interception point.
Proxy mode wraps any HTTP MCP server: ContextCutter sits between your agent and the upstream MCP, transparently forwarding tool calls and intercepting large responses before they reach the context window.
┌─────────┐   tools/call (any tool)    ┌──────────────────┐   tools/call   ┌──────────────┐
│  Agent  │ ─────────────────────────► │  ContextCutter   │ ─────────────► │  Upstream    │
│  (LLM)  │                            │   (proxy mode)   │ ◄───────────── │  MCP server  │
│         │ ◄───────────────────────── │                  │  full payload  └──────────────┘
│         │  { handle_id, preview }    │  intercept if    │
│         │                            │   ≥ threshold    │
│         │   query_handle(id, path)   │                  │
│         │ ─────────────────────────► │                  │
│         │ ◄───────────────────────── │                  │
└─────────┘   "$.field"                └──────────────────┘
Setup
Replace the upstream MCP entry in your config with context-cutter-mcp --proxy <url>:
Claude Desktop (or any claude_desktop_config.json-style client):
{
"mcpServers": {
"clickup": {
"command": "npx",
"args": ["-y", "context-cutter-mcp", "--proxy", "https://mcp.clickup.com/mcp"]
}
}
}
With auth headers (repeat --proxy-header for multiple headers):
{
"mcpServers": {
"clickup": {
"command": "npx",
"args": [
"-y", "context-cutter-mcp",
"--proxy", "https://mcp.clickup.com/mcp",
"--proxy-header", "Authorization: Bearer $TOKEN"
]
}
}
}
The agent sees all the upstream tools exactly as before, with query_handle added automatically.
What the agent sees for large responses
Instead of a raw JSON dump, the agent receives a compact preview:
[context-cutter] Response stored (18.3 KB → handle: hdl_a1b2c3d4e5f6)
Preview:
id: "86cxyz123"
name: "Fix login bug in staging"
status: "in progress"
assignees: Array[2]
description: "Steps to reproduce..." (truncated)
custom_fields: Array[8]
date_created: "1712345678000"
Call query_handle("hdl_a1b2c3d4e5f6", "$.field") to extract specific fields.
Small responses (below --proxy-threshold, default 2 KB) pass through unchanged.
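In pseudocode terms, the proxy's per-result decision looks roughly like this (a sketch of the documented behavior; store_and_preview is a hypothetical helper standing in for the server's internals):

PROXY_THRESHOLD = 2048  # bytes; mirrors the --proxy-threshold default

def forward_result(result_bytes: bytes) -> bytes:
    # Small responses pass through untouched; large ones are stored
    # and replaced by the compact preview shown above.
    if len(result_bytes) < PROXY_THRESHOLD:
        return result_bytes
    return store_and_preview(result_bytes)  # hypothetical helper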
Authentication
For MCPs that require OAuth (e.g. ClickUp), authenticate once through your MCP client's normal auth flow. The token is saved to a file (Claude Code saves to ~/.claude/<server-name>-token). Point the proxy at that file with --proxy-token-file and it handles the rest automatically:
- If the token is valid → used silently
- If the token is missing or expired → a browser opens automatically for re-authentication (OAuth 2.0 + PKCE), the token is saved, and the proxy continues
{
"clickup": {
"command": "npx",
"args": [
"-y", "context-cutter-mcp",
"--proxy", "https://mcp.clickup.com/mcp",
"--proxy-token-file", "~/.claude/clickup-token"
]
}
}
CLI flags
| Flag | Default | Description |
|---|---|---|
| --proxy <url> | – | Upstream HTTP MCP URL to proxy |
| --proxy-threshold <bytes> | 2048 | Responses ≥ this size are intercepted |
| --proxy-header <Key: Value> | – | Extra header forwarded to upstream (repeatable) |
| --proxy-token-file <path> | – | Path to Bearer token file; re-read on every request so token refreshes are automatic. Supports ~ expansion. |
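The re-read-on-every-request behavior can be pictured like this (an illustrative sketch of the documented semantics, not the proxy's actual code):

from pathlib import Path

def bearer_token(token_file: str) -> str | None:
    # Re-read on each request so an externally refreshed token is
    # picked up automatically; '~' is expanded per the flag docs.
    path = Path(token_file).expanduser()
    return path.read_text().strip() if path.exists() else None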
Install
Binary (recommended for production)
Download the pre-built binary for your platform from Releases and place it on PATH:
| Platform | Binary name |
|---|---|
| Linux x86_64 | context-cutter-mcp-x86_64-linux-gnu |
| macOS Intel | context-cutter-mcp-x86_64-apple-darwin |
| macOS Apple Silicon | context-cutter-mcp-aarch64-apple-darwin |
| Windows x86_64 | context-cutter-mcp-x86_64-pc-windows-msvc.exe |
Then point your client at the binary directly instead of using npx.
npx (zero-install)
npx context-cutter-mcp
Downloads the matching GitHub Release binary on first run. Suitable for development and CI.
npm (global install)
npm install -g context-cutter-mcp
context-cutter-mcp
Docker
docker run --rm -i ghcr.io/nikitaclicks/context-cutter-mcp:latest
Build from source
Requires Rust 1.77+:
cargo build --release --bin context-cutter-mcp
./target/release/context-cutter-mcp
Python SDK (optional)
For embedding ContextCutter directly in a Python agent without running a separate process:
pip install context-cutter
from context_cutter import store_response, generate_teaser, query_handle
handle = store_response(api_response_dict)
teaser = generate_teaser(handle) # compact summary for the model
value = query_handle(handle, "$.users[0].email")
The @lazy_handle decorator wraps any function that returns JSON:
import requests
from context_cutter import lazy_handle

@lazy_handle
def get_users() -> dict:
    return requests.get("https://api.example.com/users").json()

result = get_users()
# result = {"handle_id": "hdl_...", "teaser": {...}}
See CONTRIBUTING.md for full Python SDK documentation.
Configuration
Environment variables for the MCP server:
| Variable | Default | Description |
|---|---|---|
| CONTEXT_CUTTER_MAX_HANDLES | 1000 | Max payloads held in the LRU store |
| CONTEXT_CUTTER_TTL_SECS | 3600 | Seconds before a handle expires |
| CONTEXT_CUTTER_MAX_PAYLOAD_BYTES | 10485760 | Max accepted response size (10 MB) |
| CONTEXT_CUTTER_LOG_FORMAT | plain | plain or json structured logs |
| RUST_LOG | info | Tracing filter (e.g. debug, trace) |
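For example, to launch the stdio server with a smaller store and verbose logging (the values here are illustrative):

import os
import subprocess

# Run the documented binary with overridden environment variables.
env = {
    **os.environ,
    "CONTEXT_CUTTER_MAX_HANDLES": "200",
    "CONTEXT_CUTTER_TTL_SECS": "600",
    "RUST_LOG": "debug",
}
subprocess.run(["context-cutter-mcp"], env=env)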
Proxy mode CLI flags are documented under Proxy mode → CLI flags above.
Security
- HTTPS-only URL fetching (SSRF hardening – http:// is rejected)
- Null-byte rejection on all string inputs
- JSONPath expressions capped at 4096 characters
- Payload size enforced before storing (MAX_PAYLOAD_BYTES)
- No credentials stored – headers are not persisted with payloads
Performance
Operation latencies (median, on commodity hardware):
| Operation | Median latency |
|---|---|
| generate_teaser (medium payload) | 35 µs |
| store_response (small payload) | 64 µs |
| query_handle (wildcard path) | 94 µs |
Throughput: ~10,000–27,000 operations/second per operation type.
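A rough way to reproduce the ballpark with the Python SDK (a sketch, not the project's benchmark suite; numbers will vary by machine and payload):

import time
from context_cutter import store_response, query_handle

# Illustrative payload: 100 users, queried by a concrete JSONPath.
payload = {"users": [{"id": i, "email": f"u{i}@example.com"} for i in range(100)]}
handle = store_response(payload)

start = time.perf_counter()
for _ in range(10_000):
    query_handle(handle, "$.users[50].email")
elapsed = time.perf_counter() - start
print(f"{10_000 / elapsed:,.0f} queries/sec")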
Prior art & related work
The problem of tool-result context bloat is well-recognized across the AI engineering community and is being addressed from several directions. The table below situates ContextCutter among the most relevant approaches at the mechanism level.
Comparison with Anthropic's built-in mitigations
| Approach | Who executes filtering | Model must write code? | Requires sandbox? | Scope |
|---|---|---|---|---|
| ContextCutter | Rust MCP proxy – intercepts before the model sees anything | No | No | Any HTTPS JSON API |
| Programmatic Tool Calling (Nov 2025) | Model writes Python; runs in Anthropic's Code Execution sandbox | Yes | Yes | Any tool registered with allowed_callers |
| Web Search Dynamic Filtering (Feb 2026) | Model writes Python; runs in Anthropic's Code Execution sandbox | Yes | Yes | Web search / web fetch tools only |
| Tool Search Tool (Nov 2025) | Host-side deferred loading | No | No | Tool schema definitions – a different problem |
Programmatic Tool Calling and Dynamic Filtering pursue the same goal – keeping intermediate data out of the context window – by letting the model generate filtering code executed in a sandboxed environment. Anthropic reports a 37% token reduction (PTC on complex research tasks) and a 24% token reduction with 11% accuracy improvement (Dynamic Filtering on web search benchmarks). ContextCutter achieves 86–99% savings by intercepting at the transport layer before any model inference, with no code generation or sandbox dependency.
The Tool Search Tool addresses a complementary but distinct problem: schema-level bloat from large tool libraries (one measured case: 106 MySQL tools → 54,600 tokens of schema before a single query [Layered.dev, 2026]). ContextCutter and Tool Search Tool can be used together.
Research context
- SUPO – Summarization-augmented Policy Optimization (ICLR 2026, under review): trains LLM agents via RL to periodically compress tool-use history with LLM-generated summaries, enabling long-horizon tasks beyond a fixed context limit. Related problem (context overflow from sequential tool results) but a learned, fine-tuning-based approach rather than a deterministic proxy. [arXiv preprint]
- NormCode (arXiv 2512.10563, Dec 2025): a semi-formal language for context-isolated AI planning where each step receives only explicitly passed inputs, eliminating cross-step contamination by construction. Operates at the workflow-language level rather than the transport layer. [arXiv]
- Unified Tool Integration for LLMs (arXiv 2508.02979, Aug 2025): a protocol-agnostic function-calling framework with automated schema generation and dual-mode concurrent execution, reporting 60–80% code reduction across integration scenarios. [arXiv]
Development
# Rust
cargo test
cargo clippy -- -D warnings
cargo fmt --check
# Python SDK
pip install -e ".[dev]"
maturin develop --features python
pytest -m "not ai_e2e_live"
# Benchmarks
pytest -m benchmark --benchmark-json benchmark.json
See CONTRIBUTING.md for the full contributor workflow and architecture notes.
Project layout
src/
engine.rs Pure Rust: handle ID, store, teaser, JSONPath query
store.rs Bounded in-memory store (TTL + LRU eviction)
parser.rs Teaser generation and JSONPath helpers
lib.rs Optional PyO3 bindings (--features python)
bin/mcp.rs MCP stdio server binary
python/context_cutter/
core.py store_response, generate_teaser, query_path
interceptor.py @lazy_handle decorator
store.py BaseStore, InMemoryStore, RedisStore
tools.py generate_tool_manifest (OpenAI-style schemas)
examples/
opencode.md Full OpenCode walkthrough with session transcript
claude-desktop.md Claude Desktop showcase
openai-agents-sdk.py
langchain_mcp.md
License
MIT. See LICENSE.
