Engram Go
Self-hosted MCP memory server - 43 tools for persistent context across AI coding sessions. Runs on Docker with PostgreSQL + pgvector + Ollama.
Every time you close your AI coding session, it forgets everything. The JWT library you chose. The expiry bug you spent an afternoon on. The pattern you explicitly rejected. Gone. Next session, the agent starts from zero and you start explaining.
# Session start - before touching any code
memory_recall("session handoff recent decisions", project="myapp")
# After settling on a technical choice
memory_store(
    "Chose RS256 over HS256: the API gateway needs to verify tokens without
    holding the signing secret. HS256 would require distributing the key to
    every service. Do not change this without updating the gateway config.",
    memory_type="decision",
    project="myapp"
)
Local-First by Default
Your memories stay on your machine.
Engram stores everything locally by design. Your PostgreSQL keeps every memory. Embeddings run locally via Ollama. Nothing leaves your machine unless you explicitly send it. Choose your setup:
- 100% Local - Ollama handles both embeddings and summarization. Zero external dependencies. Start with docker-compose.local.yml.
- Hybrid - Use a LiteLLM proxy for advanced models (Qwen, Claude). Default docker-compose.yml. Requires LITELLM_URL in .env.
Both setups share the same PostgreSQL backend, API contract, and tool set. Swap profiles without data loss or schema migration.
make init && make up && make setup
# Done. Memory server at localhost:8788 (default: hybrid with LiteLLM).
# For local-only: docker compose -f docker-compose.local.yml up -d
What makes this different
Finds what you mean, not just what you typed. BM25 keyword search and 1024-dimensional semantic vectors run simultaneously. Searching "database lock timeout" finds your note about "WAL mode contention under load" - no shared words, close meaning. When Ollama is unavailable, search degrades gracefully to BM25+recency. Your results never disappear because an external service went down.
Weights by recency automatically. Exponential decay at 1% per hour. Yesterday's decision outranks one from six months ago. Nothing is deleted; old memories step back. Six-month-old memories are still there if nothing more recent matches.
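For intuition, here is a minimal sketch of how the four signals could combine. The weight names and defaults come from memory_weight_history later in this page; the actual formula is internal to Engram, so treat this as an illustration, not the implementation:
from datetime import datetime, timezone

def blended_score(vector_sim, bm25_score, precision, stored_at, weights):
    # Illustrative only - not Engram's internal implementation
    hours_old = (datetime.now(timezone.utc) - stored_at).total_seconds() / 3600
    recency = 0.99 ** hours_old  # 1% decay per hour; old memories fade, never vanish
    return (weights["vector"] * vector_sim
            + weights["bm25"] * bm25_score
            + weights["recency"] * recency
            + weights["precision"] * precision)

# Default weights as reported by memory_weight_history
weights = {"vector": 0.40, "bm25": 0.35, "recency": 0.10, "precision": 0.15}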
Surfaces connected memories without being asked. A knowledge graph links decisions to the bugs they caused and the patterns they require. Recall one; get its neighborhood. Store a bug report, store the architectural pattern that caused it, connect them with a caused_by edge. Now any query about the pattern automatically surfaces the bug - you don't have to remember to ask for both. See the sketch below.
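In tool calls, that workflow might look like this. The memory_connect name and argument shapes are illustrative assumptions; the supported edge types are listed under Expanded Relation Types below:
bug = memory_store("Webhook retries duplicate orders when the handler times out",
                   memory_type="error", project="myapp")
pattern = memory_store("Handlers do external calls before acknowledging the webhook",
                       memory_type="pattern", project="myapp")
# Hypothetical call shape: link the bug to the pattern that caused it
memory_connect(from_id=bug, to_id=pattern, relation="caused_by")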
Stores documents, not just notes. memory_store_document handles up to 500,000 characters. Engram chunks at sentence boundaries and embeds each chunk independently. A 20,000-word architecture document is searchable at the paragraph level - a query about authentication surfaces the auth section, not the whole document.
Quick Start
Prerequisites
- Docker Engine 20.10+ and Docker Compose 2.0+ - check with docker --version and docker compose version
- Go 1.25+ - check with go version; download from https://go.dev/dl/
- 4 GB RAM free - Ollama keeps the embedding model in memory
- 2 GB disk - PostgreSQL volume + Ollama model download (both cached on restart)
Initialize and Start
git clone https://github.com/petersimmons1972/engram-go.git && cd engram-go
make init # generates POSTGRES_PASSWORD, ENGRAM_API_KEY, creates volumes
make up # starts postgres, engram-go, and external LiteLLM (requires LITELLM_URL)
make setup # writes bearer token to ~/.claude/mcp_servers.json
For local-only setup (Ollama-only, no LiteLLM required):
docker compose -f docker-compose.local.yml up -d
make setup
Both setups expose the server at http://localhost:8788. Cold start: ~200ms. Idle memory: 18 MB.
Environment Configuration
If you prefer to author .env manually rather than using make init, see .env.example for all available variables, defaults, and descriptions. Key variables:
| Variable | Purpose | Default |
|---|---|---|
| POSTGRES_PASSWORD | PostgreSQL auth | (generated by make init) |
| ENGRAM_API_KEY | Bearer token for MCP | (generated by make init) |
| ENGRAM_PORT | Server listen port | 8788 |
| LITELLM_URL | LiteLLM proxy endpoint | http://litellm:4000 |
| ENGRAM_EMBED_MODEL | Embedding model name | qwen3-embedding:8b (LiteLLM) or mxbai-embed-large (Ollama) |
| ENGRAM_EMBED_URL | Embedding service endpoint | Inherits from LITELLM_URL |
| ANTHROPIC_API_KEY | Claude API key (optional) | (empty) |
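As a reference point, a hand-authored hybrid-profile .env using the variables above might look like this (secret values are placeholders):
POSTGRES_PASSWORD=<strong-random-password>
ENGRAM_API_KEY=<strong-random-token>
ENGRAM_PORT=8788
LITELLM_URL=http://litellm:4000
ENGRAM_EMBED_MODEL=qwen3-embedding:8b
# Optional - enables the five AI-enhanced tools
ANTHROPIC_API_KEY=<your-key>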
Connect to Claude Code
After make setup:
/mcp
All 38 core tools activate immediately. Five optional AI-enhanced tools (memory_ask, memory_reason, memory_explore, memory_query_document, memory_diagnose) activate when ANTHROPIC_API_KEY is set in .env.
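make setup writes the server entry for you. If you ever need to recreate it by hand, the entry in ~/.claude/mcp_servers.json should look roughly like the sketch below - the entry shape and SSE path here are assumptions, so treat the file make setup produces as the source of truth:
{
  "engram": {
    "type": "sse",
    "url": "http://localhost:8788/sse",
    "headers": {
      "Authorization": "Bearer <ENGRAM_API_KEY>"
    }
  }
}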
Important: RFC1918 and /setup-token
When Engram is in Docker (both profiles), it automatically accepts /setup-token requests from the Docker bridge (RFC1918). If you run Engram outside Docker and need to call /setup-token from a LAN address, add to .env:
ENGRAM_SETUP_TOKEN_ALLOW_RFC1918=1
Without this, /setup-token only accepts loopback (127.0.0.1 / ::1). See Operations → HTTP Endpoints for the full endpoint contract.
Configuration & Deployment
| Topic | Where to Read |
|---|---|
| Detailed install steps, GPU setup, troubleshooting | Getting Started |
| Bearer-token destination, key rotation, and stash recovery | Operations |
| /setup-token endpoint contract, health checks, diagnostics | Operations → HTTP Endpoints |
| Local-first vs hybrid (LiteLLM), personal infra notes | Deployment Notes |
| Backup, security model, data portability, RFC1918 setup | Operations |
| What each command-line binary does, who runs it, when | cmd/README.md |
Read-Only Tool Permissions (Claude Code)
Engram's read-side tools (memory_recall, memory_fetch, memory_query, memory_list, memory_status, memory_history, memory_timeline, memory_projects, audit/episode listings, constraint checks, memory_diagnose) carry the MCP ReadOnlyHint: true annotation. Claude Code's plan mode treats them as safe to invoke without a permission prompt.
Mutating tools (memory_store, memory_correct, memory_forget, memory_consolidate, memory_delete_project, etc.) intentionally prompt for permission on first use. To allow all engram tools without further prompts, add to ~/.claude/settings.json:
{
  "permissions": {
    "allow": ["mcp__engram__*"]
  }
}
For a narrower allowlist, copy the permissions.allow snippet logged by engram at startup - it lists only the read-only tool names.
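A narrower allowlist assembled from the read-only tools named above might look like this (partial sketch - the startup log is authoritative):
{
  "permissions": {
    "allow": [
      "mcp__engram__memory_recall",
      "mcp__engram__memory_fetch",
      "mcp__engram__memory_query",
      "mcp__engram__memory_list",
      "mcp__engram__memory_status",
      "mcp__engram__memory_history",
      "mcp__engram__memory_timeline",
      "mcp__engram__memory_projects"
    ]
  }
}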
Architecture
Your AI client speaks MCP over SSE. Engram exposes 43 tools: 38 run entirely locally (store, recall, connect, correct, episode management, cross-project federation, aggregate analysis, decay audit, adaptive weight tuning, embedding evaluation, and more), plus 5 optional AI-enhanced tools that activate when ANTHROPIC_API_KEY is set. PostgreSQL with pgvector stores everything. Ollama (local) runs the embeddings.
New in v3
RAG Queries: memory_ask
Ask a natural-language question against your stored memories. Returns a synthesized answer with citations - not a list of chunks but a direct response.
memory_ask(
    question="What did we decide about authentication and why?",
    project="myapp"
)
# → "You chose RS256 JWT with 24h expiry stored in httpOnly cookies.
# The decision was driven by the need to verify tokens in the API gateway
# without distributing the signing secret. localStorage was explicitly
# rejected due to XSS risk. (memories: auth-decision-001, security-pattern-003)"
Document Storage: memory_store_document
Store up to 500,000 characters - architecture documents, meeting transcripts, entire codebases. Engram chunks at sentence boundaries and makes every paragraph individually searchable.
memory_store_document(
    content=entire_architecture_document,  # up to 500k chars
    memory_type="architecture",
    project="myapp"
)
# Later: search surfaces the specific section, not the whole document
memory_query_document(doc_id=doc_id, query="authentication flow")
Aggregate Queries
Query the shape of your memory store without reading individual memories.
memory_aggregate(by="memory_type", project="myapp")
# → [{label: "decision", count: 47}, {label: "error", count: 12}, ...]
memory_aggregate(by="failure_class")
# → [{label: "vocabulary_mismatch", count: 8}, {label: "stale_ranking", count: 3}, ...]
Retrieval Miss Tracking
When memory_recall returns nothing useful, log the miss with a failure class. This feeds the retrieval quality benchmark and makes future recall better.
memory_feedback(
    event_id="<id from recall>",
    memory_ids=[],
    failure_class="vocabulary_mismatch"  # or: aggregation_failure, stale_ranking,
                                         #     missing_content, scope_mismatch, other
)
New in v3.1
Decay Audit System
Track retrieval drift over time with canonical query snapshots. Register a set of reference queries, run them on a schedule, and measure how results shift between runs using RBO (rank-biased overlap) and Jaccard similarity. Alerts fire when drift exceeds a threshold.
# Register a reference query
memory_audit_add_query(project="myapp", query="deployment procedures", description="CI/CD runbook recall")
# Take a snapshot and see drift vs previous run
memory_audit_run(project="myapp")
# → [{query: "deployment procedures", rbo_vs_prev: 0.94, additions_count: 1, removals_count: 0, alert: false}]
# Browse snapshot history for a query
memory_audit_compare(query_id="cq-abc123", limit=10)
Five new tools: memory_audit_add_query, memory_audit_list_queries, memory_audit_deactivate_query, memory_audit_run, memory_audit_compare.
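For intuition about the two drift metrics, here is a minimal sketch over two ranked result lists (illustrative only - Engram computes these internally):
def jaccard(a, b):
    # Set overlap of two result lists, ignoring rank
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def rbo(a, b, p=0.9):
    # Rank-biased overlap, truncated at the shorter list: agreement at
    # each depth, weighted so changes near the top of the ranking matter most
    depth = min(len(a), len(b))
    score = sum(p ** (d - 1) * len(set(a[:d]) & set(b[:d])) / d
                for d in range(1, depth + 1))
    return (1 - p) * score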
Adaptive Weight Tuning
Failure-class events now feed a background tuner that adjusts the four search weights per project. Dominant vocabulary_mismatch events shift weight toward BM25; dominant stale_ranking shifts toward recency. Adjustments fire at most once per 7 days after ≥ 50 failure events, within hard per-weight guardrails.
memory_weight_history(project="myapp")
# → {current_weights: {vector: 0.40, bm25: 0.35, recency: 0.10, precision: 0.15},
# history: [{applied_at: "...", weights: {...}, trigger_data: "..."}]}
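The adjustment direction can be pictured with a sketch like the one below; the real step sizes, guardrails, and cadence are internal to Engram:
def tune_weights(weights, failure_counts, step=0.02):
    # Illustrative only: bump the signal that addresses the dominant
    # failure class, then renormalize so the weights sum to 1.
    # Engram applies changes at most once per 7 days, after >= 50 events,
    # within hard per-weight guardrails.
    dominant = max(failure_counts, key=failure_counts.get)
    target = {"vocabulary_mismatch": "bm25",
              "stale_ranking": "recency"}.get(dominant)
    if target is None:
        return weights
    tuned = dict(weights)
    tuned[target] += step
    total = sum(tuned.values())
    return {k: round(v / total, 3) for k, v in tuned.items()}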
Expanded Relation Types
Knowledge graph now supports 11 typed edges (up from 7):
| Type | Meaning |
|---|---|
| caused_by | This memory exists because of that one |
| relates_to | Adjacent context, no causal direction |
| depends_on | This memory requires that one |
| supersedes | This memory replaces that one |
| used_in | This memory is applied in that context |
| resolved_by | Problem resolved by the referenced memory |
| contradicts | Conflict or tension |
| supports | Evidence or reinforcement |
| derived_from | Citation chain - memory derived from another |
| part_of | Hierarchical containment |
| follows | Temporal or sequential ordering |
Pluggable Embedder Interface + Eval Tool
Compare any two Ollama embedding models against your actual stored memories before committing to a migration. Auto-pulls models not yet present in Ollama.
# See what models are installed and recommended
memory_models()
# Compare two 1024-dim compatible models on your real queries
memory_embedding_eval(project="myapp", model_a="mxbai-embed-large", model_b="bge-m3")
# → {model_a_stats: {...}, model_b_stats: {...}, overlap_scores: [...], recommendation: "..."}
other Failure Class
memory_feedback now accepts failure_class="other" for misses that do not fit the five specific categories. Tracked separately in memory_aggregate(by="failure_class").
Documentation
| Page | What It Covers |
|---|---|
| Why Engram? | The problem with AI agent memory and how Engram solves it |
| How It Works | Four-signal search, knowledge graph, context efficiency |
| Getting Started | Install and connect in 5 minutes |
| Connecting Your IDE | Claude Code, Cursor, VS Code, Windsurf, Claude Desktop |
| All 43 Tools | MCP tool reference with usage examples |
| Claude Advisor | AI-powered summarization, consolidation, re-ranking |
| Operations | Backup, security, data portability |
| Document Storage Strategy | Four-tier ingest architecture, chunking, retrieval paths |
v3.1 vs v3.0 vs v2 vs v1
v1 was Python. v2 rewrote in Go. v3.0 adds required authentication, auto-episode starts on every SSE connection, 35 tools, document storage, RAG queries, and aggregate analysis. v3.1 adds decay audit, adaptive weight tuning, expanded relation types, embedder eval, and 8 new tools.
| | v1 (Python) | v2 (Go) | v3.0 (Go) | v3.1 (Go) |
|---|---|---|---|---|
| Container size | 200 MB | 10 MB | 10 MB | 10 MB |
| Cold start | ~3 seconds | ~200ms | ~200ms | ~200ms |
| Idle memory | 120 MB | 18 MB | 18 MB | 18 MB |
| Base image | python:3.12-slim | Chainguard static | Chainguard static | Chainguard static |
| MCP transport | stdio | SSE | SSE | SSE |
| Authentication | optional | optional | required | required |
| Auto-episode | no | no | yes | yes |
| Tool count | 19 | 19 | 35 (30 local + 5 AI-enhanced) | 43 (38 local + 5 AI-enhanced) |
| Max memory size | 50k chars | 50k chars | 500k chars | 500k chars |
| Document mode | no | no | yes - chunked at sentence boundaries | yes - chunked at sentence boundaries |
| RAG queries | no | no | yes - memory_ask | yes - memory_ask |
| Aggregate queries | no | no | yes - memory_aggregate | yes - memory_aggregate |
| Relation types | n/a | n/a | 7 | 11 |
| Decay audit | no | no | no | yes - snapshot-based drift detection |
| Adaptive weights | no | no | no | yes - failure-class driven, per-project |
| Embedder eval | no | no | no | yes - compare any two Ollama models |
| Cloud required | no | no | no | no |
Credits
v3.1 features were developed in dialogue with open-brain-template by Myles Bryning. The decay audit concept, supports / derived_from / part_of / follows relation types, and pluggable embedder registry are derived from open-brain-template's architecture, adapted to engram-go's local-only, Ollama-first constraint. The BM25+vector+recency blend, failure-class taxonomy, and knowledge-graph-based retrieval originated in engram-go and were independently incorporated by open-brain-template.
GPL v3 β see LICENSE
