Engram Go
Self-hosted MCP memory server - 43 tools for persistent context across AI coding sessions. Runs on Docker with PostgreSQL + pgvector + Ollama.
Every time you close your AI coding session, it forgets everything. The JWT library you chose. The expiry bug you spent an afternoon on. The pattern you explicitly rejected. Gone. Next session, the agent starts from zero and you start explaining.
# Session start - before touching any code
memory_recall("session handoff recent decisions", project="myapp")
# After settling on a technical choice
memory_store(
    "Chose RS256 over HS256: the API gateway needs to verify tokens without
    holding the signing secret. HS256 would require distributing the key to
    every service. Do not change this without updating the gateway config.",
    memory_type="decision",
    project="myapp"
)
Local-First by Default
Your memories stay on your machine.
Engram stores everything locally by design. Your PostgreSQL keeps every memory. Embeddings run locally via Ollama. Nothing leaves your machine unless you explicitly send it. Choose your setup:
- 100% Local - Ollama handles both embeddings and summarization. Zero external dependencies. Start with docker-compose.local.yml.
- Hybrid - Use a LiteLLM proxy for advanced models (Qwen, Claude). Default docker-compose.yml. Requires LITELLM_URL in .env.
Both setups share the same PostgreSQL backend, API contract, and tool set. Swap profiles without data loss or schema migration.
make init && make up && make setup
# Done. Memory server at localhost:8788 (default: hybrid with LiteLLM).
# For local-only: docker compose -f docker-compose.local.yml up -d
What makes this different
Finds what you mean, not just what you typed. BM25 keyword search and 1024-dimensional semantic vectors run simultaneously. Searching "database lock timeout" finds your note about "WAL mode contention under load" - no shared words, close meaning. When Ollama is unavailable, search degrades gracefully to BM25+recency. Your results never disappear because an external service went down.
Weights by recency automatically. Exponential decay at 1% per hour. Yesterday's decision outranks one from six months ago. Nothing is deleted; old memories step back. Six-month-old memories are still there if nothing more recent matches.
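For intuition, here is a minimal sketch of how the four signals could combine. The weight names and defaults come from memory_weight_history later in this page; the actual formula is internal to Engram, so treat this as an illustration, not the implementation:
from datetime import datetime, timezone

def blended_score(vector_sim, bm25_score, precision, stored_at, weights):
    # Illustrative only - not Engram's internal implementation
    hours_old = (datetime.now(timezone.utc) - stored_at).total_seconds() / 3600
    recency = 0.99 ** hours_old  # 1% decay per hour; old memories fade, never vanish
    return (weights["vector"] * vector_sim
            + weights["bm25"] * bm25_score
            + weights["recency"] * recency
            + weights["precision"] * precision)

# Default weights as reported by memory_weight_history
weights = {"vector": 0.40, "bm25": 0.35, "recency": 0.10, "precision": 0.15}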
Surfaces connected memories without being asked. A knowledge graph links decisions to the bugs they caused and the patterns they require. Recall one; get its neighborhood. Store a bug report, store the architectural pattern that caused it, connect them with a caused_by edge. Now any query about the pattern automatically surfaces the bug - you don't have to remember to ask for both. See the sketch below.
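In tool calls, that workflow might look like this. The memory_connect name and argument shapes are illustrative assumptions; the supported edge types are listed under Expanded Relation Types below:
bug = memory_store("Webhook retries duplicate orders when the handler times out",
                   memory_type="error", project="myapp")
pattern = memory_store("Handlers do external calls before acknowledging the webhook",
                       memory_type="pattern", project="myapp")
# Hypothetical call shape: link the bug to the pattern that caused it
memory_connect(from_id=bug, to_id=pattern, relation="caused_by")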
Stores documents, not just notes. memory_store_document handles up to 500,000 characters. Engram chunks at sentence boundaries and embeds each chunk independently. A 20,000-word architecture document is searchable at the paragraph level - a query about authentication surfaces the auth section, not the whole document.
Quick Start
Prerequisites
- Docker Engine 20.10+ and Docker Compose 2.0+ - check with docker --version and docker compose version
- Go 1.25+ - check with go version; download from https://go.dev/dl/
- 4 GB RAM free - Ollama keeps the embedding model in memory
- 2 GB disk - PostgreSQL volume + Ollama model download (both cached on restart)
Initialize and Start
git clone https://github.com/petersimmons1972/engram-go.git && cd engram-go
make init # generates POSTGRES_PASSWORD, ENGRAM_API_KEY, creates volumes
make up # starts postgres, engram-go, and external LiteLLM (requires LITELLM_URL)
make setup # writes bearer token to ~/.claude/mcp_servers.json
For local-only setup (Ollama-only, no LiteLLM required):
docker compose -f docker-compose.local.yml up -d
make setup
Both setups expose the server at http://localhost:8788. Cold start: ~200ms. Idle memory: 18 MB.
Environment Configuration
If you prefer to author .env manually rather than using make init, see .env.example for all available variables, defaults, and descriptions. Key variables:
| Variable | Purpose | Default |
|---|---|---|
| POSTGRES_PASSWORD | PostgreSQL auth | (generated by make init) |
| ENGRAM_API_KEY | Bearer token for MCP | (generated by make init) |
| ENGRAM_PORT | Server listen port | 8788 |
| LITELLM_URL | LiteLLM proxy endpoint | http://litellm:4000 |
| ENGRAM_EMBED_MODEL | Embedding model name | qwen3-embedding:8b (LiteLLM) or mxbai-embed-large (Ollama) |
| ENGRAM_EMBED_URL | Embedding service endpoint | Inherits from LITELLM_URL |
| ANTHROPIC_API_KEY | Claude API key (optional) | (empty) |
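As a reference point, a hand-authored hybrid-profile .env using the variables above might look like this (secret values are placeholders):
POSTGRES_PASSWORD=<strong-random-password>
ENGRAM_API_KEY=<strong-random-token>
ENGRAM_PORT=8788
LITELLM_URL=http://litellm:4000
ENGRAM_EMBED_MODEL=qwen3-embedding:8b
# Optional - enables the five AI-enhanced tools
ANTHROPIC_API_KEY=<your-key>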
Connect to Claude Code
After make setup:
/mcp
All 38 core tools activate immediately. Five optional AI-enhanced tools (memory_ask, memory_reason, memory_explore, memory_query_document, memory_diagnose) activate when ANTHROPIC_API_KEY is set in .env.
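make setup writes the server entry for you. If you ever need to recreate it by hand, the entry in ~/.claude/mcp_servers.json should look roughly like the sketch below - the entry shape and SSE path here are assumptions, so treat the file make setup produces as the source of truth:
{
  "engram": {
    "type": "sse",
    "url": "http://localhost:8788/sse",
    "headers": {
      "Authorization": "Bearer <ENGRAM_API_KEY>"
    }
  }
}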
Important: RFC1918 and /setup-token
When Engram is in Docker (both profiles), it automatically accepts /setup-token requests from the Docker bridge (RFC1918). If you run Engram outside Docker and need to call /setup-token from a LAN address, add to .env:
ENGRAM_SETUP_TOKEN_ALLOW_RFC1918=1
Without this, /setup-token only accepts loopback (127.0.0.1 / ::1). See Operations → HTTP Endpoints for the full endpoint contract.
Configuration & Deployment
| Topic | Where to Read |
|---|---|
| Detailed install steps, GPU setup, troubleshooting | Getting Started |
| Bearer-token destination, key rotation, and stash recovery | Operations |
| /setup-token endpoint contract, health checks, diagnostics | Operations → HTTP Endpoints |
| Local-first vs hybrid (LiteLLM), personal infra notes | Deployment Notes |
| Backup, security model, data portability, RFC1918 setup | Operations |
| What each command-line binary does, who runs it, when | cmd/README.md |
Read-Only Tool Permissions (Claude Code)
Engram's read-side tools (memory_recall, memory_fetch, memory_query, memory_list, memory_status, memory_history, memory_timeline, memory_projects, audit/episode listings, constraint checks, memory_diagnose) carry the MCP ReadOnlyHint: true annotation. Claude Code's plan mode treats them as safe to invoke without a permission prompt.
Mutating tools (memory_store, memory_correct, memory_forget, memory_consolidate, memory_delete_project, etc.) intentionally prompt for permission on first use. To allow all engram tools without further prompts, add to ~/.claude/settings.json:
{
  "permissions": {
    "allow": ["mcp__engram__*"]
  }
}
For a narrower allowlist, copy the permissions.allow snippet logged by engram at startup - it lists only the read-only tool names.
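A narrower allowlist assembled from the read-only tools named above might look like this (partial sketch - the startup log is authoritative):
{
  "permissions": {
    "allow": [
      "mcp__engram__memory_recall",
      "mcp__engram__memory_fetch",
      "mcp__engram__memory_query",
      "mcp__engram__memory_list",
      "mcp__engram__memory_status",
      "mcp__engram__memory_history",
      "mcp__engram__memory_timeline",
      "mcp__engram__memory_projects"
    ]
  }
}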
Architecture
Your AI client speaks MCP over SSE. Engram exposes 43 tools: 38 run entirely locally (store, recall, connect, correct, episode management, cross-project federation, aggregate analysis, decay audit, adaptive weight tuning, embedding evaluation, and more), plus 5 optional AI-enhanced tools that activate when ANTHROPIC_API_KEY is set. PostgreSQL with pgvector stores everything. Ollama (local) runs the embeddings.
New in v3
RAG Queries: memory_ask
Ask a natural-language question against your stored memories. Returns a synthesized answer with citations - not a list of chunks but a direct response.
memory_ask(
    question="What did we decide about authentication and why?",
    project="myapp"
)
# → "You chose RS256 JWT with 24h expiry stored in httpOnly cookies.
# The decision was driven by the need to verify tokens in the API gateway
# without distributing the signing secret. localStorage was explicitly
# rejected due to XSS risk. (memories: auth-decision-001, security-pattern-003)"
Document Storage: memory_store_document
Store up to 500,000 characters - architecture documents, meeting transcripts, entire codebases. Engram chunks at sentence boundaries and makes every paragraph individually searchable.
memory_store_document(
    content=entire_architecture_document,  # up to 500k chars
    memory_type="architecture",
    project="myapp"
)
# Later: search surfaces the specific section, not the whole document
memory_query_document(doc_id=doc_id, query="authentication flow")
Aggregate Queries
Query the shape of your memory store without reading individual memories.
memory_aggregate(by="memory_type", project="myapp")
# → [{label: "decision", count: 47}, {label: "error", count: 12}, ...]
memory_aggregate(by="failure_class")
# → [{label: "vocabulary_mismatch", count: 8}, {label: "stale_ranking", count: 3}, ...]
Retrieval Miss Tracking
When memory_recall returns nothing useful, log the miss with a failure class. This feeds the retrieval quality benchmark and makes future recall better.
memory_feedback(
    event_id="<id from recall>",
    memory_ids=[],
    failure_class="vocabulary_mismatch"  # or: aggregation_failure, stale_ranking,
                                         #     missing_content, scope_mismatch, other
)
New in v3.1
Decay Audit System
Track retrieval drift over time with canonical query snapshots. Register a set of reference queries, run them on a schedule, and measure how results shift between runs using RBO (rank-biased overlap) and Jaccard similarity. Alerts fire when drift exceeds a threshold.
# Register a reference query
memory_audit_add_query(project="myapp", query="deployment procedures", description="CI/CD runbook recall")
# Take a snapshot and see drift vs previous run
memory_audit_run(project="myapp")
# → [{query: "deployment procedures", rbo_vs_prev: 0.94, additions_count: 1, removals_count: 0, alert: false}]
# Browse snapshot history for a query
memory_audit_compare(query_id="cq-abc123", limit=10)
Five new tools: memory_audit_add_query, memory_audit_list_queries, memory_audit_deactivate_query, memory_audit_run, memory_audit_compare.
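For intuition about the two drift metrics, here is a minimal sketch over two ranked result lists (illustrative only - Engram computes these internally):
def jaccard(a, b):
    # Set overlap of two result lists, ignoring rank
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def rbo(a, b, p=0.9):
    # Rank-biased overlap, truncated at the shorter list: agreement at
    # each depth, weighted so changes near the top of the ranking matter most
    depth = min(len(a), len(b))
    score = sum(p ** (d - 1) * len(set(a[:d]) & set(b[:d])) / d
                for d in range(1, depth + 1))
    return (1 - p) * score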
Adaptive Weight Tuning
Failure-class events now feed a background tuner that adjusts the four search weights per project. Dominant vocabulary_mismatch events shift weight toward BM25; dominant stale_ranking shifts toward recency. Adjustments fire at most once per 7 days after ≥ 50 failure events, within hard per-weight guardrails.
memory_weight_history(project="myapp")
# → {current_weights: {vector: 0.40, bm25: 0.35, recency: 0.10, precision: 0.15},
# history: [{applied_at: "...", weights: {...}, trigger_data: "..."}]}
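The adjustment direction can be pictured with a sketch like the one below; the real step sizes, guardrails, and cadence are internal to Engram:
def tune_weights(weights, failure_counts, step=0.02):
    # Illustrative only: bump the signal that addresses the dominant
    # failure class, then renormalize so the weights sum to 1.
    # Engram applies changes at most once per 7 days, after >= 50 events,
    # within hard per-weight guardrails.
    dominant = max(failure_counts, key=failure_counts.get)
    target = {"vocabulary_mismatch": "bm25",
              "stale_ranking": "recency"}.get(dominant)
    if target is None:
        return weights
    tuned = dict(weights)
    tuned[target] += step
    total = sum(tuned.values())
    return {k: round(v / total, 3) for k, v in tuned.items()}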
Expanded Relation Types
Knowledge graph now supports 11 typed edges (up from 7):
| Type | Meaning |
|---|---|
| caused_by | This memory exists because of that one |
| relates_to | Adjacent context, no causal direction |
| depends_on | This memory requires that one |
| supersedes | This memory replaces that one |
| used_in | This memory is applied in that context |
| resolved_by | Problem resolved by the referenced memory |
| contradicts | Conflict or tension |
| supports | Evidence or reinforcement |
| derived_from | Citation chain - memory derived from another |
| part_of | Hierarchical containment |
| follows | Temporal or sequential ordering |
Pluggable Embedder Interface + Eval Tool
Compare any two Ollama embedding models against your actual stored memories before committing to a migration. Auto-pulls models not yet present in Ollama.
# See what models are installed and recommended
memory_models()
# Compare two 1024-dim compatible models on your real queries
memory_embedding_eval(project="myapp", model_a="mxbai-embed-large", model_b="bge-m3")
# → {model_a_stats: {...}, model_b_stats: {...}, overlap_scores: [...], recommendation: "..."}
other Failure Class
memory_feedback now accepts failure_class="other" for misses that do not fit the five specific categories. Tracked separately in memory_aggregate(by="failure_class").
Documentation
| Page | What It Covers |
|---|---|
| Why Engram? | The problem with AI agent memory and how Engram solves it |
| How It Works | Four-signal search, knowledge graph, context efficiency |
| Getting Started | Install and connect in 5 minutes |
| Connecting Your IDE | Claude Code, Cursor, VS Code, Windsurf, Claude Desktop |
| All 43 Tools | MCP tool reference with usage examples |
| Claude Advisor | AI-powered summarization, consolidation, re-ranking |
| Operations | Backup, security, data portability |
| Document Storage Strategy | Four-tier ingest architecture, chunking, retrieval paths |
v3.1 vs v3.0 vs v2 vs v1
v1 was Python. v2 rewrote in Go. v3.0 adds required authentication, auto-episode starts on every SSE connection, 35 tools, document storage, RAG queries, and aggregate analysis. v3.1 adds decay audit, adaptive weight tuning, expanded relation types, embedder eval, and 8 new tools.
| | v1 (Python) | v2 (Go) | v3.0 (Go) | v3.1 (Go) |
|---|---|---|---|---|
| Container size | 200 MB | 10 MB | 10 MB | 10 MB |
| Cold start | ~3 seconds | ~200ms | ~200ms | ~200ms |
| Idle memory | 120 MB | 18 MB | 18 MB | 18 MB |
| Base image | python:3.12-slim | Chainguard static | Chainguard static | Chainguard static |
| MCP transport | stdio | SSE | SSE | SSE |
| Authentication | optional | optional | required | required |
| Auto-episode | no | no | yes | yes |
| Tool count | 19 | 19 | 35 (30 local + 5 AI-enhanced) | 43 (38 local + 5 AI-enhanced) |
| Max memory size | 50k chars | 50k chars | 500k chars | 500k chars |
| Document mode | no | no | yes - chunked at sentence boundaries | yes - chunked at sentence boundaries |
| RAG queries | no | no | yes - memory_ask | yes - memory_ask |
| Aggregate queries | no | no | yes - memory_aggregate | yes - memory_aggregate |
| Relation types | n/a | n/a | 7 | 11 |
| Decay audit | no | no | no | yes - snapshot-based drift detection |
| Adaptive weights | no | no | no | yes - failure-class driven, per-project |
| Embedder eval | no | no | no | yes - compare any two Ollama models |
| Cloud required | no | no | no | no |
Credits
v3.1 features were developed in dialogue with open-brain-template by Myles Bryning. The decay audit concept, supports / derived_from / part_of / follows relation types, and pluggable embedder registry are derived from open-brain-template's architecture, adapted to engram-go's local-only, Ollama-first constraint. The BM25+vector+recency blend, failure-class taxonomy, and knowledge-graph-based retrieval originated in engram-go and were independently incorporated by open-brain-template.
GPL v3 β see LICENSE
