mcp-refcache

Reference-based caching for FastMCP servers with namespace isolation, access control, and private computation support.

Overview

mcp-refcache is a caching library designed for FastMCP servers that solves critical challenges when building AI agent systems:

Context Explosion Prevention - Large API responses are stored by reference, returning only previews to agents
Private Computation - Agents can use values in computations without ever seeing the actual data
Namespace Isolation - Separate caches for public data, user sessions, and custom scopes
Access Control - Fine-grained permissions for both users and agents (CRUD + Execute)
Cross-Tool Data Flow - References act as a "data bus" between tools without exposing values

Backends: Memory (default), SQLite (persistent, cross-process), Redis (distributed, multi-server)

Token Counting: Built-in support for tiktoken (OpenAI models) and HuggingFace tokenizers for accurate preview sizing

The Problem

When an AI agent calls a tool that returns a large dataset (e.g., 500KB JSON), the entire response goes into the agent's context window, causing:

Token explosion - Expensive and hits context limits
Distraction - Agent gets overwhelmed with data it doesn't need
Security risks - Sensitive data exposed in conversation history

The Solution

# Instead of returning 500KB of data...
{"users": [{"id": 1, "name": "...", ...}, ... 10000 more ...]}

# mcp-refcache returns a reference + preview
{
    "ref_id": "a1b2c3",
    "preview": "[User(id=1), User(id=2), ... and 9998 more]",
    "total_items": 10000,
    "namespace": "session:abc123"
}

The agent can then:

Paginate through the data as needed
Pass the reference to another tool (server resolves it automatically)
Control preview size at server, tool, or per-call level
Use without seeing - Execute permission enables blind computation
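The reference-plus-preview idea above can be sketched in a few lines of plain Python. This is a toy illustration only, not the mcp-refcache implementation: `TinyRefStore`, `put`, `page`, and the response field layout are hypothetical names chosen for this example.

```python
import uuid
from typing import Any


class TinyRefStore:
    """Toy in-memory store: keeps full values server-side, hands out previews."""

    def __init__(self, preview_items: int = 3) -> None:
        self._values: dict[str, list[Any]] = {}
        self._preview_items = preview_items

    def put(self, items: list[Any]) -> dict[str, Any]:
        """Store a list and return a small reference payload for the agent."""
        ref_id = uuid.uuid4().hex[:8]
        self._values[ref_id] = items
        return {
            "ref_id": ref_id,
            "preview": items[: self._preview_items],
            "total_items": len(items),
        }

    def page(self, ref_id: str, page: int, page_size: int) -> list[Any]:
        """Return one page (1-indexed) of the stored list."""
        start = (page - 1) * page_size
        return self._values[ref_id][start : start + page_size]

    def resolve(self, ref_id: str) -> list[Any]:
        """Return the full value; meant for tools, never echoed to the agent."""
        return self._values[ref_id]


store = TinyRefStore()
response = store.put(list(range(10_000)))  # agent sees only this small dict
second_page = store.page(response["ref_id"], page=2, page_size=5)
```

The agent's context only ever holds `response` and whichever pages it asks for; the 10,000-item list stays on the server.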

Showcase

https://github.com/user-attachments/assets/f084212b-ede3-40aa-b306-833ebffe3bf8

Cross-server caching demo: Generate 1000 primes → paginate → pass ref_id to another server for analysis → transform and analyze the result. All without flooding the agent's context.

Installation

# Core library (memory backend)
uv add mcp-refcache

# With Redis backend
uv add "mcp-refcache[redis]"

# With FastMCP integration (cache management tools)
uv add "mcp-refcache[mcp]"

# With SQLite backend (persistent, cross-tool sharing)
# No extra install needed - SQLite is in Python stdlib!

# Everything
uv add "mcp-refcache[all]"

Repository Structure

mcp-refcache/
├── src/mcp_refcache/     # Main library code
├── tests/                # Test suite (80%+ coverage)
├── examples/             # Git submodules with demos (optional)
│   ├── BundesMCP/       # Government API server example
│   ├── finquant-mcp/    # Financial data server example
│   └── fastmcp-template/ # Template for new servers
└── docs/                # Additional documentation

Note: Example servers are managed as git submodules and treated as references by default. They are not included in the PyPI package and are intentionally decoupled from core library releases.

Using Examples

Examples are linked as references in this repository and are not installed with pip. Use them for discovery, architecture patterns, and integration examples, while shipping core fixes from mcp-refcache independently.

Pragmatic Submodule Policy

Keep example servers as submodules for discoverability and separation of concerns.
Prefer dedicated feature branches inside submodules when synchronized updates are needed.
Avoid blocking mcp-refcache patch releases on submodule implementation timelines.
Treat submodule pointers as optional coordination artifacts, not required for core package validation.

Submodule Cleanup Status

Repository policy is now explicit: reference-by-default for example servers.
Core testing and release validation remain centered on the Python package under src/mcp_refcache/.
Submodule edits are coordinated separately unless explicitly required for a given release.

Quick Start

from fastmcp import FastMCP
from mcp_refcache import RefCache, Namespace, Permission

# Create cache with namespaces
cache = RefCache(
    namespaces=[
        Namespace.PUBLIC,
        Namespace.session("conv-123"),
        Namespace.user("user-456"),
    ]
)

mcp = FastMCP("MyServer")

@mcp.tool()
@cache.cached(namespace="session:conv-123")
async def get_large_dataset(query: str) -> dict:
    """Returns large dataset - agent sees only preview."""
    return await fetch_huge_data(query)  # 500KB response

@mcp.tool()
async def process_data(data_ref: str) -> dict:
    """Process data by reference - agent never sees raw data."""
    # The server resolves the reference; the agent only ever passes the ref_id
    data = cache.resolve(data_ref)
    return {"processed": len(data["items"])}

Core Concepts

Preview Size Control

Preview size can be configured at three levels. They are listed here from lowest to highest priority; a more specific level overrides a more general one:

from mcp_refcache import RefCache, PreviewConfig

# Level 1: Server default (lowest priority)
cache = RefCache(
    preview_config=PreviewConfig(max_size=1024)  # tokens or chars
)

# Level 2: Per-tool (medium priority)
@cache.cached(max_size=500)  # Override for this tool
async def generate_large_data(...):
    ...

# Level 3: Per-call (highest priority)
response = cache.get(ref_id, max_size=100)  # Override for this call
# Or via tool:
get_cached_result(ref_id, max_size=100)

This hierarchy allows:

Server admins to set sensible defaults
Tool authors to specify appropriate limits per tool
Agents to request smaller/larger previews as needed
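The override order can be expressed as a small resolution function. This is a sketch of the precedence rule only; `effective_max_size` and its parameter names are hypothetical and not part of the library's API.

```python
from typing import Optional


def effective_max_size(
    server_default: int,
    tool_override: Optional[int] = None,
    call_override: Optional[int] = None,
) -> int:
    """Pick the most specific preview limit that was actually set."""
    if call_override is not None:  # per-call: highest priority
        return call_override
    if tool_override is not None:  # per-tool: medium priority
        return tool_override
    return server_default  # server default: lowest priority
```

With the values used above, a server default of 1024, a tool override of 500, and a per-call override of 100 resolve to 100; dropping the per-call value falls back to 500.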

Namespaces

Namespaces provide isolation and scoping for cached values:

| Namespace | Scope | Typical TTL | Use Case |
|-----------|-------|-------------|----------|
| `public` | Global, shared | Long (hours/days) | API responses, static data |
| `session:<id>` | Single conversation | Short (minutes) | Conversation context |
| `user:<id>` | User across sessions | Medium (hours) | User preferences, history |
| `user:<id>:session:<id>` | User's specific session | Short | Session-specific user data |
| `org:<id>` | Organization | Long | Shared org resources |
| `custom:<name>` | Arbitrary | Configurable | Project-specific needs |
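One simple way to picture namespace isolation is a store that keys every entry by a `(namespace, key)` pair, so the same key can exist independently in several scopes. The sketch below is illustrative only; `NamespacedStore` and its methods are hypothetical and say nothing about how the library stores entries internally.

```python
from typing import Any


class NamespacedStore:
    """Toy store that keys every entry by (namespace, key)."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], Any] = {}

    def set(self, namespace: str, key: str, value: Any) -> None:
        self._data[(namespace, key)] = value

    def get(self, namespace: str, key: str) -> Any:
        return self._data[(namespace, key)]

    def clear_namespace(self, namespace: str) -> int:
        """Drop every entry in one namespace and report how many were removed."""
        doomed = [k for k in self._data if k[0] == namespace]
        for k in doomed:
            del self._data[k]
        return len(doomed)


store = NamespacedStore()
store.set("public", "rates", {"EUR": 1.0})
store.set("session:conv-123", "rates", {"EUR": 0.9})  # same key, different scope
removed = store.clear_namespace("session:conv-123")  # public entry is untouched
```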

Permission Model

from mcp_refcache import Permission, AccessPolicy

# Permission flags (can be combined with |)
Permission.READ      # Resolve reference to see value
Permission.WRITE     # Create new references
Permission.UPDATE    # Modify existing cached values
Permission.DELETE    # Remove/invalidate references
Permission.EXECUTE   # Use value in computation WITHOUT seeing it!
Permission.CRUD      # READ | WRITE | UPDATE | DELETE
Permission.FULL      # CRUD | EXECUTE

The EXECUTE permission enables private computation - agents can use values without reading them.
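Combinable flags like these map naturally onto Python's standard `enum.Flag`. The sketch below is a stand-in built for illustration (the `Perm` class and `allows` helper are hypothetical); it shows how `|` composes permissions and how a check for a required permission might look, not how the library defines `Permission`.

```python
from enum import Flag, auto


class Perm(Flag):
    """Illustrative stand-in for the library's Permission flags."""

    READ = auto()
    WRITE = auto()
    UPDATE = auto()
    DELETE = auto()
    EXECUTE = auto()
    CRUD = READ | WRITE | UPDATE | DELETE
    FULL = CRUD | EXECUTE


def allows(granted: Perm, required: Perm) -> bool:
    """True when every bit in `required` is present in `granted`."""
    return (granted & required) == required


agent_perms = Perm.READ | Perm.EXECUTE
can_execute = allows(agent_perms, Perm.EXECUTE)
can_write = allows(agent_perms, Perm.WRITE)
```

Note that in this model `CRUD` does not imply `EXECUTE`: seeing or changing a value and computing on it are separate grants.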

Access Control

The access control system supports multiple layers:

from mcp_refcache import AccessPolicy, DefaultActor, Permission

# Role-based defaults (backwards compatible)
policy = AccessPolicy(
    user_permissions=Permission.FULL,
    agent_permissions=Permission.READ | Permission.EXECUTE,
)

# With ownership - owner gets special permissions
policy = AccessPolicy(
    user_permissions=Permission.READ,
    owner="user:alice",
    owner_permissions=Permission.FULL,
)

# With explicit allow/deny lists
policy = AccessPolicy(
    user_permissions=Permission.FULL,
    denied_actors=frozenset({"agent:untrusted-*"}),
    allowed_actors=frozenset({"agent:trusted-service"}),
)

# Session binding - lock to specific session
policy = AccessPolicy(
    user_permissions=Permission.FULL,
    bound_session="session-abc123",
)
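To make the layering concrete, here is one plausible evaluation order: explicit deny first, then session binding, then owner permissions, then role defaults. The order is an assumption made for this sketch, and `ToyPolicy`, `Perm`, and `is_allowed` are hypothetical names; consult the library source for the actual resolution rules.

```python
from dataclasses import dataclass
from enum import Flag, auto
from fnmatch import fnmatchcase
from typing import Optional


class Perm(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


@dataclass(frozen=True)
class ToyPolicy:
    user_permissions: Perm = Perm.NONE
    agent_permissions: Perm = Perm.NONE
    owner: Optional[str] = None
    owner_permissions: Perm = Perm.NONE
    denied_actors: frozenset[str] = frozenset()
    bound_session: Optional[str] = None


def is_allowed(
    policy: ToyPolicy, actor: str, session_id: Optional[str], required: Perm
) -> bool:
    """Evaluate the layers in an assumed order; the first failing layer denies."""
    # 1. An explicit deny pattern always wins.
    if any(fnmatchcase(actor, pattern) for pattern in policy.denied_actors):
        return False
    # 2. A session-bound entry is only usable from that session.
    if policy.bound_session is not None and session_id != policy.bound_session:
        return False
    # 3. The owner gets owner permissions; everyone else gets role defaults.
    if policy.owner is not None and actor == policy.owner:
        granted = policy.owner_permissions
    elif actor.startswith("agent"):
        granted = policy.agent_permissions
    else:
        granted = policy.user_permissions
    return (granted & required) == required


policy = ToyPolicy(
    user_permissions=Perm.READ,
    owner="user:alice",
    owner_permissions=Perm.READ | Perm.WRITE | Perm.EXECUTE,
    denied_actors=frozenset({"agent:untrusted-*"}),
)
alice_can_write = is_allowed(policy, "user:alice", None, Perm.WRITE)
bob_can_write = is_allowed(policy, "user:bob", None, Perm.WRITE)
```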

Identity-Aware Actors

Actors represent users, agents, or system processes with optional identity:

from mcp_refcache import DefaultActor

# Anonymous actors (backwards compatible with "user"/"agent" strings)
user = DefaultActor.user()
agent = DefaultActor.agent()

# Identified actors
alice = DefaultActor.user(id="alice", session_id="sess-123")
claude = DefaultActor.agent(id="claude-instance-1")

# Pattern matching for ACLs
alice.matches("user:alice")  # True
alice.matches("user:*")      # True (wildcard)
claude.matches("agent:claude-*")  # True (glob pattern)
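Pattern matching of this kind can be built on the standard library's `fnmatch` module. The sketch below assumes actors render as `kind:id` strings and that anonymous actors render with the placeholder id `anonymous`; both are assumptions for illustration, and `ToyActor` is a hypothetical class rather than the library's `DefaultActor`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class ToyActor:
    kind: str  # "user", "agent", or "system"
    id: Optional[str] = None  # None means anonymous

    def identity(self) -> str:
        """Render as 'kind:id'; anonymous actors use a placeholder id."""
        return f"{self.kind}:{self.id if self.id is not None else 'anonymous'}"

    def matches(self, pattern: str) -> bool:
        """Case-sensitive glob match of the identity string against a pattern."""
        return fnmatchcase(self.identity(), pattern)


alice = ToyActor("user", "alice")
claude = ToyActor("agent", "claude-instance-1")
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.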

Private Computation

Agents can orchestrate computations on sensitive data without accessing it:

# Store with EXECUTE-only for agents
cache.set(
    "user_secrets",
    {"ssn": "123-45-6789"},
    policy=AccessPolicy(
        user_permissions=Permission.FULL,
        agent_permissions=Permission.EXECUTE,  # Can use, can't see!
    )
)

# Tool resolves reference server-side
@mcp.tool()
def validate_identity(secrets_ref: str) -> bool:
    secrets = cache.resolve(secrets_ref)  # Server sees value
    return verify_ssn(secrets["ssn"])     # Agent never sees it
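The distinction between reading a value and computing on it can be shown with a self-contained toy store. This sketch assumes a two-flag permission model and invents `BlindStore`, `read`, and `execute` for illustration; it is not the library's API.

```python
from enum import Flag, auto
from typing import Any, Callable


class Perm(Flag):
    """Illustrative stand-in for two of the library's permission flags."""

    READ = auto()
    EXECUTE = auto()


class BlindStore:
    """Toy store that separates 'see the value' from 'compute on the value'."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[Any, Perm]] = {}

    def set(self, ref_id: str, value: Any, agent_permissions: Perm) -> None:
        self._entries[ref_id] = (value, agent_permissions)

    def read(self, ref_id: str) -> Any:
        """Agent-facing read: requires READ."""
        value, perms = self._entries[ref_id]
        if Perm.READ not in perms:
            raise PermissionError(f"agent may not read {ref_id!r}")
        return value

    def execute(self, ref_id: str, func: Callable[[Any], Any]) -> Any:
        """Run server-side code on the value; only the result leaves the store."""
        value, perms = self._entries[ref_id]
        if Perm.EXECUTE not in perms:
            raise PermissionError(f"agent may not execute on {ref_id!r}")
        return func(value)


store = BlindStore()
store.set("user_secrets", {"ssn": "123-45-6789"}, agent_permissions=Perm.EXECUTE)
looks_valid = store.execute("user_secrets", lambda s: len(s["ssn"]) == 11)
```

Calling `store.read("user_secrets")` here raises `PermissionError`, while `execute` returns only the boolean result. The guarantee is only as strong as the functions allowed to run: a tool that returns the raw value would leak it.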

Backends

mcp-refcache supports multiple storage backends for different deployment scenarios:

Memory Backend (Default)

In-memory caching for testing and simple single-process use cases:

from mcp_refcache import RefCache
from mcp_refcache.backends import MemoryBackend

cache = RefCache(
    name="my-cache",
    backend=MemoryBackend(),  # Default if not specified
)

Use when: Testing, simple scripts, single-process applications.

SQLite Backend

Persistent caching with zero external dependencies. Enables cross-tool reference sharing between multiple MCP servers on the same machine:

from mcp_refcache import RefCache
from mcp_refcache.backends import SQLiteBackend

# Default path: ~/.cache/mcp-refcache/cache.db
cache = RefCache(
    name="my-cache",
    backend=SQLiteBackend(),
)

# Custom path
cache = RefCache(
    name="my-cache",
    backend=SQLiteBackend("/path/to/cache.db"),
)

# Or via environment variable
# export MCP_REFCACHE_DB_PATH=/path/to/cache.db

Features:

WAL mode for concurrent access
Thread-safe with connection-per-thread model
Cross-process reference sharing
XDG-compliant default path
Zero external dependencies (SQLite is in stdlib)

Use when: Single-machine deployments, multiple MCP servers sharing cache, persistent cache across restarts.
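For readers curious how a file-backed cache can be shared between processes, the following sketch uses only the standard `sqlite3` module with WAL mode enabled. The table layout and the `open_cache`, `put`, and `get` helpers are hypothetical; the real `SQLiteBackend` schema may differ.

```python
import json
import os
import sqlite3
import tempfile
import time
from typing import Any, Optional


def open_cache(path: str) -> sqlite3.Connection:
    """Open (or create) the cache file with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "ref_id TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL)"
    )
    conn.commit()
    return conn


def put(
    conn: sqlite3.Connection, ref_id: str, value: Any, ttl: Optional[float] = None
) -> None:
    """Store a JSON-serializable value, optionally expiring after `ttl` seconds."""
    expires_at = time.time() + ttl if ttl is not None else None
    conn.execute(
        "INSERT OR REPLACE INTO entries (ref_id, value, expires_at) VALUES (?, ?, ?)",
        (ref_id, json.dumps(value), expires_at),
    )
    conn.commit()


def get(conn: sqlite3.Connection, ref_id: str) -> Optional[Any]:
    """Return the stored value, or None if it is missing or expired."""
    row = conn.execute(
        "SELECT value, expires_at FROM entries WHERE ref_id = ?", (ref_id,)
    ).fetchone()
    if row is None:
        return None
    value, expires_at = row
    if expires_at is not None and expires_at <= time.time():
        return None
    return json.loads(value)


db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
writer = open_cache(db_path)  # stands in for one MCP server
reader = open_cache(db_path)  # stands in for a second server on the same machine
put(writer, "calculator:abc123", [2, 3, 5, 7])
shared = get(reader, "calculator:abc123")
```

Two connections in one process stand in for two servers here; separate processes pointing at the same file behave the same way.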

Redis Backend

Distributed caching for multi-user, multi-machine scenarios:

from mcp_refcache import RefCache
from mcp_refcache.backends import RedisBackend

# Connect to Redis/Valkey
cache = RefCache(
    name="my-cache",
    backend=RedisBackend(
        host="localhost",
        port=6379,
        password="your-password",  # Optional
    ),
)

# Or via URL
cache = RefCache(
    name="my-cache",
    backend=RedisBackend(url="redis://:password@localhost:6379/0"),
)

Features:

Valkey/Redis compatible
Native TTL via Redis expiration
Connection pooling for thread safety
Cross-server reference sharing
Horizontal scaling ready

Use when: Multi-user deployments, distributed systems, Docker/Kubernetes environments.

Docker Deployment Example

See examples/redis-docker/ for a complete Docker Compose setup with:

Valkey (Redis-compatible) server
Two MCP servers sharing the cache
Health checks and proper dependencies

# Start the stack
cd examples/redis-docker
docker compose up -d

# Zed IDE configuration
# Add to .zed/settings.json:
{
  "context_servers": {
    "redis-calculator": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:8001/sse"]
    },
    "redis-data-analysis": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:8002/sse"]
    }
  }
}

Cross-tool workflow:

redis-calculator: generate_primes(50) → returns ref_id
redis-data-analysis: analyze_data(ref_id) → resolves from shared Redis cache
Both servers see the same cached data!

API Reference

RefCache

cache = RefCache(
    name="my-cache",
    backend="memory",              # or "redis"
    default_namespace="public",
    default_ttl=3600,              # seconds
    max_size=10000,                # max entries
    preview_length=500,            # chars for preview
)

Decorators

@cache.cached(
    namespace="session:123",
    ttl=300,
    policy=AccessPolicy(...),
    preview_type="summary",        # or "truncate", "sample"
)
async def my_tool(...): ...

The @cache.cached() Decorator

The decorator provides full MCP tool integration:

@mcp.tool
@cache.cached(
    namespace="data",        # Namespace for isolation
    max_size=500,            # Per-tool preview size limit
    ttl=3600,                # TTL in seconds
    resolve_refs=True,       # Auto-resolve ref_ids in inputs
)
async def process_data(data: list[int]) -> list[float]:
    """Process data - accepts ref_ids, returns structured response."""
    return [x * 1.5 for x in data]

# Agent can call with ref_id from previous tool:
# process_data(data="calculator:abc123")
# Decorator resolves ref_id → actual list before execution

Features:

Pre-execution: Recursively resolves ref_ids in all inputs
Post-execution: Returns structured response with ref_id
Size-based: Small results return full value, large return preview
Doc injection: Adds caching info to tool docstrings automatically
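The pre- and post-execution steps listed above can be approximated as two small functions: one that walks the inputs and swaps known ref_ids for stored values, and one that decides between returning the full value and a preview. This is a simplified sketch; `resolve_refs`, `build_response`, and the response field names (`value`, `is_complete`) are hypothetical and do not describe the decorator's actual output format.

```python
from typing import Any


def resolve_refs(value: Any, store: dict[str, Any]) -> Any:
    """Recursively swap known ref_id strings for their stored values."""
    if isinstance(value, str):
        return store.get(value, value)  # unknown strings pass through unchanged
    if isinstance(value, dict):
        return {k: resolve_refs(v, store) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(resolve_refs(v, store) for v in value)
    return value


def build_response(ref_id: str, result: list[Any], max_items: int) -> dict[str, Any]:
    """Return the full value when it is small, otherwise a truncated preview."""
    if len(result) <= max_items:
        return {"ref_id": ref_id, "value": result, "is_complete": True}
    return {
        "ref_id": ref_id,
        "preview": result[:max_items],
        "total_items": len(result),
        "is_complete": False,
    }


store = {"calculator:abc123": [1, 2, 3]}
resolved = resolve_refs({"data": "calculator:abc123", "label": "demo"}, store)
small = build_response("data:r1", [x * 1.5 for x in resolved["data"]], max_items=5)
large = build_response("data:r2", list(range(10)), max_items=3)
```

A real implementation would also need to check the caller's permissions before resolving each reference, which this sketch omits.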

Roadmap

v0.1.0 (Current)

v0.2.0 (Planned)

MCP template (cookiecutter/copier for new servers)
Time series backend (InfluxDB, TimescaleDB for financial data)
Redis Cluster/Sentinel support
Metrics/observability hooks (Prometheus, OpenTelemetry)
Reference metadata (tags, descriptions)
Audit logging (who accessed what, when)

v0.3.0

Lazy evaluation (compute-on-first-access references)
Derived references (ref.field.subfield access)
Encryption at rest for sensitive values
Reference aliasing (human-readable names)
Webhooks/events (notify on access, expiry)
Distributed locking (Redis)

Future

Schema validation for cached values
Import/export for backup and migration
Rate limiting per reference
Compression for large values
Multi-region Redis support

Development

# Install dependencies
uv sync

# Enter nix dev shell (optional, recommended)
nix develop

# Run tests
uv run pytest --cov

# Lint and format
uv run ruff check .
uv run ruff format .

# Type check
uv run mypy src/

IDE Setup (Zed)

The project includes Zed IDE configuration in .zed/settings.json with:

Pyright LSP with strict type checking
Ruff for format-on-save
MCP Context Servers for AI-assisted development:
- mcp-nixos - NixOS/Home Manager options lookup
- pypi-query-mcp-server - PyPI package intelligence
- context7 - Up-to-date framework documentation

To use the MCP servers, ensure you have uvx and npx available (included in the nix dev shell).

Integration with FastMCP Caching Middleware

mcp-refcache is complementary to FastMCP's built-in ResponseCachingMiddleware:

| Feature | FastMCP Middleware | mcp-refcache |
|---------|--------------------|--------------|
| Purpose | Reduce API calls (TTL cache) | Manage context & permissions |
| Returns | Full cached response | Reference + preview |
| Pagination | ❌ | ✅ |
| Access Control | ❌ | ✅ (User + Agent) |
| Private Compute | ❌ | ✅ (EXECUTE permission) |
| Namespaces | ❌ | ✅ |

Use both together:

FastMCP middleware: Cache expensive API calls
mcp-refcache: Manage what agents see and can do

Project Status

Current Version: 0.1.0

The core API is stable and ready for use. We're working toward a 1.0.0 release with additional features.

Stability: Core caching and access control features are stable. Preview strategies and FastMCP integration are production-ready.

Production Use: Suitable for production use. Pin to a specific version and review changes carefully when upgrading.

Roadmap

See the Roadmap section above for planned features in upcoming releases.

Support

PyPI: pypi.org/project/mcp-refcache
Contributing: See CONTRIBUTING.md for guidelines

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please read CONTRIBUTING.md for guidelines.

# Install for development
uv sync

# Run tests
uv run pytest --cov

# Lint and format
uv run ruff check . --fix
uv run ruff format .

Code Quality Standards

Test Coverage: Minimum 80% (currently meeting this requirement)
Type Safety: Full type annotations with mypy strict mode
Code Style: Ruff for linting and formatting (PEP 8 compliant)
Documentation: Docstrings for all public APIs (Google style)