Refcache
Reference-based caching for FastMCP servers with namespace isolation, access control, and private computation support.
Ask AI about Refcache
Powered by Claude Β· Grounded in docs
I know everything about Refcache. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
mcp-refcache
Reference-based caching for FastMCP servers with namespace isolation, access control, and private computation support.
Overview
mcp-refcache is a caching library designed for FastMCP servers that solves critical challenges when building AI agent systems:
- Context Explosion Prevention - Large API responses are stored by reference, returning only previews to agents
- Private Computation - Agents can use values in computations without ever seeing the actual data
- Namespace Isolation - Separate caches for public data, user sessions, and custom scopes
- Access Control - Fine-grained permissions for both users and agents (CRUD + Execute)
- Cross-Tool Data Flow - References act as a "data bus" between tools without exposing values
Backends: Memory (default), SQLite (persistent, cross-process), Redis (distributed, multi-server)
Token Counting: Built-in support for tiktoken (OpenAI models) and HuggingFace tokenizers for accurate preview sizing
The Problem
When an AI agent calls a tool that returns a large dataset (e.g., 500KB JSON), the entire response goes into the agent's context window, causing:
- Token explosion - Expensive and hits context limits
- Distraction - Agent gets overwhelmed with data it doesn't need
- Security risks - Sensitive data exposed in conversation history
The Solution
# Instead of returning 500KB of data...
{"users": [{"id": 1, "name": "...", ...}, ... 10000 more ...]}
# mcp-refcache returns a reference + preview
{
"ref_id": "a1b2c3",
"preview": "[User(id=1), User(id=2), ... and 9998 more]",
"total_items": 10000,
"namespace": "session:abc123"
}
The agent can then:
- Paginate through the data as needed
- Pass the reference to another tool (server resolves it automatically)
- Control preview size at server, tool, or per-call level
- Use without seeing - Execute permission enables blind computation
Showcase
https://github.com/user-attachments/assets/f084212b-ede3-40aa-b306-833ebffe3bf8
Cross-server caching demo: Generate 1000 primes β paginate β pass ref_id to another server for analysis β transform and analyze the result. All without flooding the agent's context.
Installation
# Core library (memory backend)
uv add mcp-refcache
# With Redis backend
uv add "mcp-refcache[redis]"
# With FastMCP integration (cache management tools)
uv add "mcp-refcache[mcp]"
# With SQLite backend (persistent, cross-tool sharing)
# No extra install needed - SQLite is in Python stdlib!
# Everything
uv add "mcp-refcache[all]"
Repository Structure
mcp-refcache/
βββ src/mcp_refcache/ # Main library code
βββ tests/ # Test suite (80%+ coverage)
βββ examples/ # Git submodules with demos (optional)
β βββ BundesMCP/ # Government API server example
β βββ finquant-mcp/ # Financial data server example
β βββ fastmcp-template/ # Template for new servers
βββ docs/ # Additional documentation
Note: Examples are git submodules and not included in the PyPI package. They demonstrate real-world usage but are optional.
Using Examples
Examples are included in the source distribution but not installed with pip.
See the examples/ directory in the source code for usage patterns.
Quick Start
from fastmcp import FastMCP
from mcp_refcache import RefCache, Namespace, Permission
# Create cache with namespaces
cache = RefCache(
namespaces=[
Namespace.PUBLIC,
Namespace.session("conv-123"),
Namespace.user("user-456"),
]
)
mcp = FastMCP("MyServer")
@mcp.tool()
@cache.cached(namespace="session:conv-123")
async def get_large_dataset(query: str) -> dict:
"""Returns large dataset - agent sees only preview."""
return await fetch_huge_data(query) # 500KB response
@mcp.tool()
async def process_data(data_ref: str) -> dict:
"""Process data by reference - agent never sees raw data."""
# Server resolves reference, agent only passed ref_id
data = cache.resolve(data_ref)
return {"processed": len(data["items"])}
Core Concepts
Preview Size Control
Preview size can be configured at three levels (highest priority first):
from mcp_refcache import RefCache, PreviewConfig
# Level 1: Server default (lowest priority)
cache = RefCache(
preview_config=PreviewConfig(max_size=1024) # tokens or chars
)
# Level 2: Per-tool (medium priority)
@cache.cached(max_size=500) # Override for this tool
async def generate_large_data(...):
...
# Level 3: Per-call (highest priority)
response = cache.get(ref_id, max_size=100) # Override for this call
# Or via tool:
get_cached_result(ref_id, max_size=100)
This hierarchy allows:
- Server admins to set sensible defaults
- Tool authors to specify appropriate limits per tool
- Agents to request smaller/larger previews as needed
Namespaces
Namespaces provide isolation and scoping for cached values:
| Namespace | Scope | Typical TTL | Use Case |
|---|---|---|---|
public | Global, shared | Long (hours/days) | API responses, static data |
session:<id> | Single conversation | Short (minutes) | Conversation context |
user:<id> | User across sessions | Medium (hours) | User preferences, history |
user:<id>:session:<id> | User's specific session | Short | Session-specific user data |
org:<id> | Organization | Long | Shared org resources |
custom:<name> | Arbitrary | Configurable | Project-specific needs |
Permission Model
from mcp_refcache import Permission, AccessPolicy
# Permission flags (can be combined with |)
Permission.READ # Resolve reference to see value
Permission.WRITE # Create new references
Permission.UPDATE # Modify existing cached values
Permission.DELETE # Remove/invalidate references
Permission.EXECUTE # Use value in computation WITHOUT seeing it!
Permission.CRUD # READ | WRITE | UPDATE | DELETE
Permission.FULL # CRUD | EXECUTE
The EXECUTE permission enables private computation - agents can use values without reading them.
Access Control
The access control system supports multiple layers:
from mcp_refcache import AccessPolicy, DefaultActor, Permission
# Role-based defaults (backwards compatible)
policy = AccessPolicy(
user_permissions=Permission.FULL,
agent_permissions=Permission.READ | Permission.EXECUTE,
)
# With ownership - owner gets special permissions
policy = AccessPolicy(
user_permissions=Permission.READ,
owner="user:alice",
owner_permissions=Permission.FULL,
)
# With explicit allow/deny lists
policy = AccessPolicy(
user_permissions=Permission.FULL,
denied_actors=frozenset({"agent:untrusted-*"}),
allowed_actors=frozenset({"agent:trusted-service"}),
)
# Session binding - lock to specific session
policy = AccessPolicy(
user_permissions=Permission.FULL,
bound_session="session-abc123",
)
Identity-Aware Actors
Actors represent users, agents, or system processes with optional identity:
from mcp_refcache import DefaultActor
# Anonymous actors (backwards compatible with "user"/"agent" strings)
user = DefaultActor.user()
agent = DefaultActor.agent()
# Identified actors
alice = DefaultActor.user(id="alice", session_id="sess-123")
claude = DefaultActor.agent(id="claude-instance-1")
# Pattern matching for ACLs
alice.matches("user:alice") # True
alice.matches("user:*") # True (wildcard)
claude.matches("agent:claude-*") # True (glob pattern)
Private Computation
Agents can orchestrate computations on sensitive data without accessing it:
# Store with EXECUTE-only for agents
cache.set(
"user_secrets",
{"ssn": "123-45-6789"},
policy=AccessPolicy(
user_permissions=Permission.FULL,
agent_permissions=Permission.EXECUTE, # Can use, can't see!
)
)
# Tool resolves reference server-side
@mcp.tool()
def validate_identity(secrets_ref: str) -> bool:
secrets = cache.resolve(secrets_ref) # Server sees value
return verify_ssn(secrets["ssn"]) # Agent never sees it
Backends
mcp-refcache supports multiple storage backends for different deployment scenarios:
Memory Backend (Default)
In-memory caching for testing and simple single-process use cases:
from mcp_refcache import RefCache
from mcp_refcache.backends import MemoryBackend
cache = RefCache(
name="my-cache",
backend=MemoryBackend(), # Default if not specified
)
Use when: Testing, simple scripts, single-process applications.
SQLite Backend
Persistent caching with zero external dependencies. Enables cross-tool reference sharing between multiple MCP servers on the same machine:
from mcp_refcache import RefCache
from mcp_refcache.backends import SQLiteBackend
# Default path: ~/.cache/mcp-refcache/cache.db
cache = RefCache(
name="my-cache",
backend=SQLiteBackend(),
)
# Custom path
cache = RefCache(
name="my-cache",
backend=SQLiteBackend("/path/to/cache.db"),
)
# Or via environment variable
# export MCP_REFCACHE_DB_PATH=/path/to/cache.db
Features:
- WAL mode for concurrent access
- Thread-safe with connection-per-thread model
- Cross-process reference sharing
- XDG-compliant default path
- Zero external dependencies (SQLite is in stdlib)
Use when: Single-machine deployments, multiple MCP servers sharing cache, persistent cache across restarts.
Redis Backend
Distributed caching for multi-user, multi-machine scenarios:
from mcp_refcache import RefCache
from mcp_refcache.backends import RedisBackend
# Connect to Redis/Valkey
cache = RefCache(
name="my-cache",
backend=RedisBackend(
host="localhost",
port=6379,
password="your-password", # Optional
),
)
# Or via URL
cache = RefCache(
name="my-cache",
backend=RedisBackend(url="redis://:password@localhost:6379/0"),
)
Features:
- Valkey/Redis compatible
- Native TTL via Redis expiration
- Connection pooling for thread safety
- Cross-server reference sharing
- Horizontal scaling ready
Use when: Multi-user deployments, distributed systems, Docker/Kubernetes environments.
Docker Deployment Example
See examples/redis-docker/ for a complete Docker Compose setup with:
- Valkey (Redis-compatible) server
- Two MCP servers sharing the cache
- Health checks and proper dependencies
# Start the stack
cd examples/redis-docker
docker compose up -d
# Zed IDE configuration
# Add to .zed/settings.json:
{
"context_servers": {
"redis-calculator": {
"command": "npx",
"args": ["mcp-remote", "http://localhost:8001/sse"]
},
"redis-data-analysis": {
"command": "npx",
"args": ["mcp-remote", "http://localhost:8002/sse"]
}
}
}
Cross-tool workflow:
redis-calculator:generate_primes(50)β returnsref_idredis-data-analysis:analyze_data(ref_id)β resolves from shared Redis cache- Both servers see the same cached data!
API Reference
RefCache
cache = RefCache(
name="my-cache",
backend="memory", # or "redis"
default_namespace="public",
default_ttl=3600, # seconds
max_size=10000, # max entries
preview_length=500, # chars for preview
)
Decorators
@cache.cached(
namespace="session:123",
ttl=300,
policy=AccessPolicy(...),
preview_type="summary", # or "truncate", "sample"
)
async def my_tool(...): ...
The @cache.cached() Decorator
The decorator provides full MCP tool integration:
@mcp.tool
@cache.cached(
namespace="data", # Namespace for isolation
max_size=500, # Per-tool preview size limit
ttl=3600, # TTL in seconds
resolve_refs=True, # Auto-resolve ref_ids in inputs
)
async def process_data(data: list[int]) -> list[float]:
"""Process data - accepts ref_ids, returns structured response."""
return [x * 1.5 for x in data]
# Agent can call with ref_id from previous tool:
# process_data(data="calculator:abc123")
# Decorator resolves ref_id β actual list before execution
Features:
- Pre-execution: Recursively resolves ref_ids in all inputs
- Post-execution: Returns structured response with ref_id
- Size-based: Small results return full value, large return preview
- Doc injection: Adds caching info to tool docstrings automatically
Roadmap
v0.1.0 (Current)
- Core reference-based caching with
@cache.cached()decorator - Memory backend (thread-safe, TTL support)
- SQLite backend (persistent, cross-tool sharing, zero dependencies)
- Redis backend (distributed, multi-user, Docker-ready)
- Preview generation (truncate, sample, paginate)
- Namespace isolation (public, session, user, org, custom)
- CRUD + EXECUTE permission model
- Separate user/agent access control
- TTL per namespace
- FastMCP integration with auto-resolve
- Langfuse observability (TracedRefCache)
- Docker deployment example with Valkey
v0.2.0 (Planned)
- MCP template (cookiecutter/copier for new servers)
- Time series backend (InfluxDB, TimescaleDB for financial data)
- Redis Cluster/Sentinel support
- Metrics/observability hooks (Prometheus, OpenTelemetry)
- Reference metadata (tags, descriptions)
- Audit logging (who accessed what, when)
v0.3.0
- Lazy evaluation (compute-on-first-access references)
- Derived references (
ref.field.subfieldaccess) - Encryption at rest for sensitive values
- Reference aliasing (human-readable names)
- Webhooks/events (notify on access, expiry)
- Distributed locking (Redis)
Future
- Schema validation for cached values
- Import/export for backup and migration
- Rate limiting per reference
- Compression for large values
- Multi-region Redis support
Development
# Install dependencies
uv sync
# Enter nix dev shell (optional, recommended)
nix develop
# Run tests
uv run pytest --cov
# Lint and format
uv run ruff check .
uv run ruff format .
# Type check
uv run mypy src/
IDE Setup (Zed)
The project includes Zed IDE configuration in .zed/settings.json with:
- Pyright LSP with strict type checking
- Ruff for format-on-save
- MCP Context Servers for AI-assisted development:
mcp-nixos- NixOS/Home Manager options lookuppypi-query-mcp-server- PyPI package intelligencecontext7- Up-to-date framework documentation
To use the MCP servers, ensure you have uvx and npx available (included in the nix dev shell).
Integration with FastMCP Caching Middleware
mcp-refcache is complementary to FastMCP's built-in ResponseCachingMiddleware:
| Feature | FastMCP Middleware | mcp-refcache |
|---|---|---|
| Purpose | Reduce API calls (TTL cache) | Manage context & permissions |
| Returns | Full cached response | Reference + preview |
| Pagination | β | β |
| Access Control | β | β (User + Agent) |
| Private Compute | β | β (EXECUTE permission) |
| Namespaces | β | β |
Use both together:
- FastMCP middleware: Cache expensive API calls
- mcp-refcache: Manage what agents see and can do
Project Status
Current Version: 0.1.0
The core API is stable and ready for use. We're working toward a 1.0.0 release with additional features.
Stability: Core caching and access control features are stable. Preview strategies and FastMCP integration are production-ready.
Production Use: Suitable for production use. Pin to a specific version and review changes carefully when upgrading.
Roadmap
See the Roadmap section above for planned features in upcoming releases.
Support
- PyPI: pypi.org/project/mcp-refcache
- Contributing: See CONTRIBUTING.md for guidelines
License
MIT License - see LICENSE for details.
Contributing
Contributions welcome! Please read CONTRIBUTING.md for guidelines.
See CONTRIBUTING.md for detailed guidelines.
# Install for development
uv sync
# Run tests
uv run pytest --cov
# Lint and format
uv run ruff check . --fix
uv run ruff format .
Code Quality Standards
- Test Coverage: Minimum 80% (currently meeting this requirement)
- Type Safety: Full type annotations with mypy strict mode
- Code Style: Ruff for linting and formatting (PEP 8 compliant)
- Documentation: Docstrings for all public APIs (Google style)
