ContextCutter
Stop feeding entire API responses to your LLM. Give it a handle instead.
When an agent calls a REST API, the full JSON response lands in the context window – even if the agent only needs one field. On a 500-item list, that's 97 KB of tokens consumed to read two values. ContextCutter intercepts those responses before they reach the model, stores them in a fast in-memory store, and returns a compact structural summary (a teaser) plus a deterministic handle ID. The agent then queries only the fields it actually needs.
The result: 86–99% fewer tokens spent on API responses in typical agent workflows.
How it works
┌─────────┐   fetch_json_cutted(url)   ┌──────────────────┐   tools/HTTP   ┌────────────┐
│  Agent  │ ─────────────────────────► │  ContextCutter   │    HTTP GET    │ Remote API │
│  (LLM)  │                            │    MCP Server    │ ─────────────► │            │
│         │ ◄───────────────────────── │  (Rust binary)   │ ◄───────────── │            │
│         │   { handle_id, teaser }    │                  │   JSON blob    └────────────┘
│         │                            │  DashMap store   │
│         │   query_handle(id, path)   │   (in-memory)    │
│         │ ─────────────────────────► │                  │
│         │ ◄───────────────────────── │                  │
└─────────┘   "$.users[0].email"       └──────────────────┘
              → "alice@example.com"
Step 1 – fetch: The agent calls fetch_json_cutted(url). The server fetches the URL, stores the full JSON payload, and responds with a teaser (structural summary) and a handle_id.
Step 2 – query: The agent inspects the teaser to understand the shape of the data, then calls query_handle(handle_id, "$.path.to.field") to retrieve only what it needs.
The full payload never enters the context window.
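For intuition, here is the same two-step pattern expressed with the optional Python SDK documented later in this README (the URL and payload shape are placeholders):

import requests
from context_cutter import store_response, generate_teaser, query_handle

# Step 1 - fetch: the full payload goes into the store, never the prompt.
payload = requests.get("https://api.example.com/users").json()  # placeholder URL
handle = store_response(payload)
print(generate_teaser(handle))  # compact structural summary the model reads

# Step 2 - query: retrieve only the field the agent actually needs.
email = query_handle(handle, "$.users[0].email")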
Token savings
Measured against realistic API response shapes:
| Response type | Full payload | Teaser returned | Tokens saved |
|---|---|---|---|
| 10-item paginated list | 2,005 chars | 287 chars | 86% |
| 50-item repo listing | 11,576 chars | 268 chars | 98% |
| 100-item event stream | 21,005 chars | 283 chars | 99% |
| 500-item batch export | 97,465 chars | 261 chars | >99% |
| Deep nested config blob | 19,943 chars | 341 chars | 98% |
Teaser size stays roughly constant (~250–350 chars) regardless of payload size, because it describes structure, not values.
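As a rough mental model of why that holds (an illustrative sketch, not the server's actual algorithm), a structural summary can be produced like this:

def summarize(value, depth=0):
    # Illustrative sketch only: describe structure, not values.
    # Arrays collapse to their length, deep objects to a key count,
    # long strings to a truncated prefix - so size stays ~constant.
    if isinstance(value, list):
        return f"Array[{len(value)}]"
    if isinstance(value, dict):
        if depth >= 2:
            return f"Object[{len(value)} keys]"
        return {k: summarize(v, depth + 1) for k, v in value.items()}
    if isinstance(value, str) and len(value) > 40:
        return value[:40] + "... (truncated)"
    return value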
Quickstart
The fastest way to try ContextCutter is with npx – no install required:
npx context-cutter-mcp
Add it to your agent client in under a minute:
OpenCode (~/.config/opencode/config.json):
{
"mcp": {
"context-cutter": {
"type": "local",
"command": "npx",
"args": ["-y", "context-cutter-mcp"]
}
}
}
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"context-cutter": {
"command": "npx",
"args": ["-y", "context-cutter-mcp"]
}
}
}
Once connected, ContextCutter registers two tools with your agent automatically. No prompting or configuration needed – the server describes itself via MCP.
See examples/ for Cursor, VS Code, OpenAI Agents SDK, and LangChain configs.
MCP tool reference
fetch_json_cutted
Fetches a URL, stores the JSON response, and returns a structural teaser.
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | – | HTTPS URL to fetch (required) |
| method | string | GET | HTTP method |
| headers | object | {} | Additional request headers |
| body | any | – | Request body (serialized as JSON) |
| timeout_seconds | number | 45 | Request timeout (seconds) |
Returns: { handle_id: "hdl_<12hex>", teaser: { ... } }
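An illustrative invocation, with arguments named per the table above (the endpoint and returned handle are placeholders):

# Hypothetical tool-call arguments an MCP client would send.
arguments = {
    "url": "https://api.example.com/users?page=1",  # placeholder endpoint
    "headers": {"Accept": "application/json"},
    "timeout_seconds": 30,
}
# Server responds with something like:
# {"handle_id": "hdl_3f9c2a7d1e04", "teaser": {...}}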
query_handle
Runs a JSONPath expression against a previously stored payload.
| Parameter | Type | Description |
|---|---|---|
| handle_id | string | Handle returned by fetch_json_cutted |
| json_path | string | JSONPath expression (e.g. $.users[0].email) |
Returns: The matched value(s) as JSON.
Handle IDs are deterministic (SHA-256 of canonicalized JSON) – the same payload always produces the same hdl_<12hex>, making repeated fetches idempotent.
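A sketch of that derivation (the canonicalization details here are an assumption; only "SHA-256 of canonicalized JSON, 12 hex chars" is documented):

import hashlib
import json

def handle_id(payload) -> str:
    # Assumed canonical form: sorted keys, compact separators.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"hdl_{digest[:12]}"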
Proxy mode
The fetch-and-query pattern above requires the agent to explicitly call fetch_json_cutted. If you're using an existing MCP server – such as a ClickUp, GitHub, or Notion integration – its tools return large payloads directly, and there's no interception point.
Proxy mode wraps any HTTP MCP server: ContextCutter sits between your agent and the upstream MCP, transparently forwarding tool calls and intercepting large responses before they reach the context window.
┌─────────┐   tools/call (any tool)    ┌──────────────────┐   tools/call   ┌──────────────┐
│  Agent  │ ─────────────────────────► │  ContextCutter   │ ─────────────► │  Upstream    │
│  (LLM)  │                            │   (proxy mode)   │ ◄───────────── │  MCP server  │
│         │ ◄───────────────────────── │                  │  full payload  └──────────────┘
│         │  { handle_id, preview }    │  intercept if    │
│         │                            │   ≥ threshold    │
│         │   query_handle(id, path)   │                  │
│         │ ─────────────────────────► │                  │
│         │ ◄───────────────────────── │                  │
└─────────┘   "$.field"                └──────────────────┘
Setup
Replace the upstream MCP entry in your config with context-cutter-mcp --proxy <url>:
Claude Desktop (or any claude_desktop_config.json-style client):
{
"mcpServers": {
"clickup": {
"command": "npx",
"args": ["-y", "context-cutter-mcp", "--proxy", "https://mcp.clickup.com/mcp"]
}
}
}
With auth headers (repeat --proxy-header for multiple headers):
{
"mcpServers": {
"clickup": {
"command": "npx",
"args": [
"-y", "context-cutter-mcp",
"--proxy", "https://mcp.clickup.com/mcp",
"--proxy-header", "Authorization: Bearer $TOKEN"
]
}
}
}
The agent sees all the upstream tools exactly as before, with query_handle added automatically.
What the agent sees for large responses
Instead of a raw JSON dump, the agent receives a compact preview:
[context-cutter] Response stored (18.3 KB → handle: hdl_a1b2c3d4e5f6)
Preview:
id: "86cxyz123"
name: "Fix login bug in staging"
status: "in progress"
assignees: Array[2]
description: "Steps to reproduce..." (truncated)
custom_fields: Array[8]
date_created: "1712345678000"
Call query_handle("hdl_a1b2c3d4e5f6", "$.field") to extract specific fields.
Small responses (below --proxy-threshold, default 2 KB) pass through unchanged.
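In pseudocode terms, the proxy's per-result decision looks roughly like this (a sketch of the documented behavior; store_and_preview is a hypothetical helper standing in for the server's internals):

PROXY_THRESHOLD = 2048  # bytes; mirrors the --proxy-threshold default

def forward_result(result_bytes: bytes) -> bytes:
    # Small responses pass through untouched; large ones are stored
    # and replaced by the compact preview shown above.
    if len(result_bytes) < PROXY_THRESHOLD:
        return result_bytes
    return store_and_preview(result_bytes)  # hypothetical helper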
Authentication
For MCPs that require OAuth (e.g. ClickUp), authenticate once through your MCP client's normal auth flow. The token is saved to a file (Claude Code saves to ~/.claude/<server-name>-token). Point the proxy at that file with --proxy-token-file and it handles the rest automatically:
- If the token is valid → used silently
- If the token is missing or expired → a browser opens automatically for re-authentication (OAuth 2.0 + PKCE), the token is saved, and the proxy continues
{
"clickup": {
"command": "npx",
"args": [
"-y", "context-cutter-mcp",
"--proxy", "https://mcp.clickup.com/mcp",
"--proxy-token-file", "~/.claude/clickup-token"
]
}
}
CLI flags
| Flag | Default | Description |
|---|---|---|
| --proxy <url> | – | Upstream HTTP MCP URL to proxy |
| --proxy-threshold <bytes> | 2048 | Responses ≥ this size are intercepted |
| --proxy-header <Key: Value> | – | Extra header forwarded to upstream (repeatable) |
| --proxy-token-file <path> | – | Path to Bearer token file; re-read on every request so token refreshes are automatic. Supports ~ expansion. |
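The re-read-on-every-request behavior can be pictured like this (an illustrative sketch of the documented semantics, not the proxy's actual code):

from pathlib import Path

def bearer_token(token_file: str) -> str | None:
    # Re-read on each request so an externally refreshed token is
    # picked up automatically; '~' is expanded per the flag docs.
    path = Path(token_file).expanduser()
    return path.read_text().strip() if path.exists() else None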
Install
Binary (recommended for production)
Download the pre-built binary for your platform from Releases and place it on PATH:
| Platform | Binary name |
|---|---|
| Linux x86_64 | context-cutter-mcp-x86_64-linux-gnu |
| macOS Intel | context-cutter-mcp-x86_64-apple-darwin |
| macOS Apple Silicon | context-cutter-mcp-aarch64-apple-darwin |
| Windows x86_64 | context-cutter-mcp-x86_64-pc-windows-msvc.exe |
Then point your client at the binary directly instead of using npx.
npx (zero-install)
npx context-cutter-mcp
Downloads the matching GitHub Release binary on first run. Suitable for development and CI.
npm (global install)
npm install -g context-cutter-mcp
context-cutter-mcp
Docker
docker run --rm -i ghcr.io/nikitaclicks/context-cutter-mcp:latest
Build from source
Requires Rust 1.77+:
cargo build --release --bin context-cutter-mcp
./target/release/context-cutter-mcp
Python SDK (optional)
For embedding ContextCutter directly in a Python agent without running a separate process:
pip install context-cutter
from context_cutter import store_response, generate_teaser, query_handle
handle = store_response(api_response_dict)
teaser = generate_teaser(handle) # compact summary for the model
value = query_handle(handle, "$.users[0].email")
The @lazy_handle decorator wraps any function that returns JSON:
import requests
from context_cutter import lazy_handle

@lazy_handle
def get_users() -> dict:
    return requests.get("https://api.example.com/users").json()

result = get_users()
# result = {"handle_id": "hdl_...", "teaser": {...}}
See CONTRIBUTING.md for full Python SDK documentation.
Configuration
Environment variables for the MCP server:
| Variable | Default | Description |
|---|---|---|
| CONTEXT_CUTTER_MAX_HANDLES | 1000 | Max payloads held in the LRU store |
| CONTEXT_CUTTER_TTL_SECS | 3600 | Seconds before a handle expires |
| CONTEXT_CUTTER_MAX_PAYLOAD_BYTES | 10485760 | Max accepted response size (10 MB) |
| CONTEXT_CUTTER_LOG_FORMAT | plain | plain or json structured logs |
| RUST_LOG | info | Tracing filter (e.g. debug, trace) |
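For example, to launch the stdio server with a smaller store and verbose logging (the values here are illustrative):

import os
import subprocess

# Run the documented binary with overridden environment variables.
env = {
    **os.environ,
    "CONTEXT_CUTTER_MAX_HANDLES": "200",
    "CONTEXT_CUTTER_TTL_SECS": "600",
    "RUST_LOG": "debug",
}
subprocess.run(["context-cutter-mcp"], env=env)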
Proxy mode CLI flags are documented under Proxy mode → CLI flags above.
Security
- HTTPS-only URL fetching (SSRF hardening – http:// is rejected)
- Null-byte rejection on all string inputs
- JSONPath expressions capped at 4096 characters
- Payload size enforced before storing (MAX_PAYLOAD_BYTES)
- No credentials stored – headers are not persisted with payloads
Performance
Operation latencies (median, on commodity hardware):
| Operation | Median latency |
|---|---|
| generate_teaser (medium payload) | 35 µs |
| store_response (small payload) | 64 µs |
| query_handle (wildcard path) | 94 µs |
Throughput: ~10,000–27,000 operations/second per operation type.
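A rough way to reproduce the ballpark with the Python SDK (a sketch, not the project's benchmark suite; numbers will vary by machine and payload):

import time
from context_cutter import store_response, query_handle

# Illustrative payload: 100 users, queried by a concrete JSONPath.
payload = {"users": [{"id": i, "email": f"u{i}@example.com"} for i in range(100)]}
handle = store_response(payload)

start = time.perf_counter()
for _ in range(10_000):
    query_handle(handle, "$.users[50].email")
elapsed = time.perf_counter() - start
print(f"{10_000 / elapsed:,.0f} queries/sec")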
Prior art & related work
The problem of tool-result context bloat is well-recognized across the AI engineering community and is being addressed from several directions. The table below situates ContextCutter among the most relevant approaches at the mechanism level.
Comparison with Anthropic's built-in mitigations
| Approach | Who executes filtering | Model must write code? | Requires sandbox? | Scope |
|---|---|---|---|---|
| ContextCutter | Rust MCP proxy – intercepts before the model sees anything | No | No | Any HTTPS JSON API |
| Programmatic Tool Calling (Nov 2025) | Model writes Python; runs in Anthropic's Code Execution sandbox | Yes | Yes | Any tool registered with allowed_callers |
| Web Search Dynamic Filtering (Feb 2026) | Model writes Python; runs in Anthropic's Code Execution sandbox | Yes | Yes | Web search / web fetch tools only |
| Tool Search Tool (Nov 2025) | Host-side deferred loading | No | No | Tool schema definitions – a different problem |
Programmatic Tool Calling and Dynamic Filtering pursue the same goal – keeping intermediate data out of the context window – by letting the model generate filtering code executed in a sandboxed environment. Anthropic reports a 37% token reduction (PTC on complex research tasks) and a 24% token reduction with 11% accuracy improvement (Dynamic Filtering on web search benchmarks). ContextCutter achieves 86–99% savings by intercepting at the transport layer before any model inference, with no code generation or sandbox dependency.
The Tool Search Tool addresses a complementary but distinct problem: schema-level bloat from large tool libraries (one measured case: 106 MySQL tools → 54,600 tokens of schema before a single query [Layered.dev, 2026]). ContextCutter and Tool Search Tool can be used together.
Research context
- SUPO – Summarization-augmented Policy Optimization (ICLR 2026, under review): trains LLM agents via RL to periodically compress tool-use history with LLM-generated summaries, enabling long-horizon tasks beyond a fixed context limit. Related problem (context overflow from sequential tool results) but a learned, fine-tuning-based approach rather than a deterministic proxy. [arXiv preprint]
- NormCode (arXiv 2512.10563, Dec 2025): a semi-formal language for context-isolated AI planning where each step receives only explicitly passed inputs, eliminating cross-step contamination by construction. Operates at the workflow-language level rather than the transport layer. [arXiv]
- Unified Tool Integration for LLMs (arXiv 2508.02979, Aug 2025): a protocol-agnostic function-calling framework with automated schema generation and dual-mode concurrent execution, reporting 60–80% code reduction across integration scenarios. [arXiv]
Development
# Rust
cargo test
cargo clippy -- -D warnings
cargo fmt --check
# Python SDK
pip install -e ".[dev]"
maturin develop --features python
pytest -m "not ai_e2e_live"
# Benchmarks
pytest -m benchmark --benchmark-json benchmark.json
See CONTRIBUTING.md for the full contributor workflow and architecture notes.
Project layout
src/
engine.rs Pure Rust: handle ID, store, teaser, JSONPath query
store.rs Bounded in-memory store (TTL + LRU eviction)
parser.rs Teaser generation and JSONPath helpers
lib.rs Optional PyO3 bindings (--features python)
bin/mcp.rs MCP stdio server binary
python/context_cutter/
core.py store_response, generate_teaser, query_path
interceptor.py @lazy_handle decorator
store.py BaseStore, InMemoryStore, RedisStore
tools.py generate_tool_manifest (OpenAI-style schemas)
examples/
opencode.md Full OpenCode walkthrough with session transcript
claude-desktop.md Claude Desktop showcase
openai-agents-sdk.py
langchain_mcp.md
License
MIT. See LICENSE.
