io.github.bolnet/memwright
Embedded memory for AI agents with SQLite FTS5, pgvector, and Neo4j graph search.
Embedded memory for AI agents. Open and inspectable. Works with Claude Code.
Why Memwright?
AI agents forget everything between conversations. The typical fix is a managed vector database or cloud memory service. Memwright takes a different approach:
- **Open & inspectable** – Your memories live in a SQLite file. Run `sqlite3 memory.db` and see exactly what the agent knows. No black boxes.
- **3-layer retrieval** – Tag matching, Neo4j entity graph, and pgvector semantic search, fused with Reciprocal Rank Fusion.
- **Token efficient** – 300-500 tokens per recall vs 15,000+ for full history replay.
- **Fully local** – Everything runs on your machine via Docker. No cloud dependency.
Works as a Claude Code MCP server, a Cursor MCP server, or a Python library.
Install
pip install memwright[all]
Tip: On macOS with Homebrew Python, use `pipx install "memwright[all]"` to install it as a standalone CLI tool.
Requirements: Docker Desktop and an embedding API key (OPENROUTER_API_KEY or OPENAI_API_KEY).
Quick Start – Claude Code
Paste this prompt into Claude Code and it handles the rest:
Set up memwright as this project's persistent memory.
1. Install: pip install "memwright[all]" (if pip fails on macOS, use: pipx install "memwright[all]")
2. Make sure Docker Desktop is running
3. Initialize: agent-memory init ~/.agent-memory/PROJECT_NAME
4. Add your embedding API key to ~/.agent-memory/PROJECT_NAME/.env (OPENROUTER_API_KEY or OPENAI_API_KEY)
5. Create .mcp.json in the project root:
{
"mcpServers": {
"memory": {
"command": "agent-memory",
"args": ["serve", "~/.agent-memory/PROJECT_NAME"],
"env": {
"OPENROUTER_API_KEY": "your-key-here",
"PG_CONNECTION_STRING": "postgresql://memwright:memwright@localhost:5432/memwright",
"NEO4J_URI": "bolt://localhost:7687",
"NEO4J_PASSWORD": "memwright"
}
}
}
}
6. Add to CLAUDE.md: Use memory_recall at conversation start, memory_add to store context.
7. Verify: agent-memory doctor ~/.agent-memory/PROJECT_NAME
Manual Setup
# 1. Install
pip install "memwright[all]"
# 2. Initialize (starts Docker containers for pgvector + Neo4j)
agent-memory init ~/.agent-memory/my-project
# 3. Add your embedding API key
echo 'OPENROUTER_API_KEY=sk-or-...' >> ~/.agent-memory/my-project/.env
Add to your project's .mcp.json:
{
"mcpServers": {
"memory": {
"command": "agent-memory",
"args": ["serve", "~/.agent-memory/my-project"],
"env": {
"OPENROUTER_API_KEY": "sk-or-...",
"PG_CONNECTION_STRING": "postgresql://memwright:memwright@localhost:5432/memwright",
"NEO4J_URI": "bolt://localhost:7687",
"NEO4J_PASSWORD": "memwright"
}
}
}
}
Note: The `env` block is required because the MCP server process doesn't auto-load the `.env` file.
Restart Claude Code. You now have 7 memory tools: memory_add, memory_get, memory_recall, memory_search, memory_forget, memory_timeline, memory_stats.
Add to your CLAUDE.md:
## Memory
Use `memory_recall` at the start of each conversation with the user's first message.
Use `memory_add` to store preferences, decisions, and project context.
Quick Start – Cursor
# 1. Install and initialize
pip install "memwright[all]"
agent-memory init ~/.agent-memory/my-project
Add to .cursor/mcp.json:
{
"mcpServers": {
"memory": {
"command": "agent-memory",
"args": ["serve", "~/.agent-memory/my-project"],
"env": {
"OPENROUTER_API_KEY": "sk-or-...",
"PG_CONNECTION_STRING": "postgresql://memwright:memwright@localhost:5432/memwright",
"NEO4J_URI": "bolt://localhost:7687",
"NEO4J_PASSWORD": "memwright"
}
}
}
}
Quick Start – Python Library
from agent_memory import AgentMemory
mem = AgentMemory("./my-agent")
# Store facts
mem.add("User prefers Python over Java",
tags=["preference", "coding"], category="preference")
mem.add("User works at SoFi as Staff SWE",
tags=["career"], category="career", entity="SoFi")
# Recall relevant memories
results = mem.recall("what language does the user prefer?")
for r in results:
    print(f"[{r.match_source}:{r.score:.2f}] {r.content}")
# Get formatted context string for prompt injection
context = mem.recall_as_context("user background", budget=500)
# Contradiction handling β old facts get auto-superseded
mem.add("User works at Google as Principal Eng",
tags=["career"], category="career", entity="SoFi")
# ^ The SoFi memory is now superseded automatically
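The `budget` parameter above caps how much recalled text is injected into a prompt. A minimal sketch of what a token budget could mean in practice, using a rough 4-characters-per-token estimate (an illustrative assumption, not memwright's actual packing logic):

```python
def pack_to_budget(memories, budget):
    """Append memory strings until an approximate token budget is exceeded."""
    packed, used = [], 0
    for text in memories:
        cost = max(1, len(text) // 4)  # rough ~4 chars per token heuristic
        if used + cost > budget:
            break
        packed.append(text)
        used += cost
    return "\n".join(packed)
```

With a budget of 500 tokens, this keeps the top-ranked memories and silently drops the tail, which is why recall stays in the 300-500 token range rather than replaying full history.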
Benchmarks
LOCOMO (Long Conversation Memory)
| System | Score |
|---|---|
| MemMachine | 84.9% |
| Zep | ~75% |
| Letta | 74.0% |
| Mem0 (Graph) | 66.9% |
| Memwright | 62.5% |
| OpenAI Memory | 52.9% |
LOCOMO scores are disputed across vendors. Numbers above are self-reported.
MemoryAgentBench (ICLR 2026)
| Category | Score |
|---|---|
| Accurate Retrieval | 55% |
| Conflict Resolution | 62% |
| Overall | 58.5% |
How Memwright uses LLMs: Embeddings, entity extraction, memory extraction, and contradiction detection all use LLM calls. Retrieval combines tag matching, graph traversal, and vector search with RRF fusion; there is no LLM re-ranking or judge step.
How Retrieval Works
Multi-layer cascade with Reciprocal Rank Fusion:
graph TD
Q[Query] --> T["Tag Match"]
Q --> G["Graph Β· Neo4j"]
Q --> V["Vector Β· pgvector"]
T --> R[RRF Fusion]
G --> R
V --> R
R --> O[Ranked Results]
style Q fill:#C15F3C,stroke:#C15F3C,color:#fff
style R fill:#C15F3C,stroke:#C15F3C,color:#fff
style O fill:#161b22,stroke:#C15F3C,color:#F4F3EE
style T fill:#161b22,stroke:#30363d,color:#F4F3EE
style G fill:#161b22,stroke:#30363d,color:#F4F3EE
style V fill:#161b22,stroke:#30363d,color:#F4F3EE
Entity relationships are traversed to find related memories (e.g., querying "Python" also finds memories about "FastAPI" if they're connected). Graph relationship triples are injected as synthetic context for multi-hop reasoning.
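The fusion step itself can be sketched in a few lines. The `k = 60` constant is the conventional RRF default; the memory IDs and ranked lists below are illustrative, not memwright's actual internals:

```python
def rrf_fuse(ranked_lists, k=60):
    """Combine several ranked lists of memory IDs into one ranking.

    Each item scores 1 / (k + rank) per list it appears in, so items
    ranked well by multiple layers float to the top.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-layer results for one query:
tag_hits = ["m3", "m1"]
graph_hits = ["m2", "m3"]
vector_hits = ["m1", "m3", "m4"]
print(rrf_fuse([tag_hits, graph_hits, vector_hits]))  # m3 first: it ranks in all three layers
```

Because RRF only needs ranks, not comparable scores, it fuses a boolean tag match, a graph traversal, and a cosine-similarity search without any score normalization.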
CLI
agent-memory init ./store # Initialize store + Docker + .env
agent-memory add ./store "text" ... # Add a memory
agent-memory recall ./store "query" # Multi-layer recall
agent-memory search ./store "text" # Search memories
agent-memory list ./store # List memories
agent-memory timeline ./store # Entity timeline
agent-memory stats ./store # Store statistics
agent-memory doctor ./store # Health check
agent-memory serve ./store # Start MCP server
agent-memory export ./store -o bak.json
agent-memory import ./store bak.json
Architecture
AgentMemory
├── SQLite       – Core storage, always on
├── pgvector     – Semantic vector search (PostgreSQL)
├── Neo4j        – Entity graph, multi-hop traversal
├── Retrieval    – Multi-layer cascade with RRF fusion
├── Temporal     – Contradiction detection, supersession
├── Extraction   – Rule-based + optional LLM
├── MCP Server   – Claude Code / Cursor integration
└── CLI + Doctor – Health check for all components
Configuration
AgentMemory stores config.json in the memory store directory:
{
"default_token_budget": 2000,
"min_results": 3,
"pg_connection_string": "postgresql://memwright:memwright@localhost:5432/memwright",
"neo4j_uri": "bolt://localhost:7687",
"neo4j_password": "memwright"
}
Environment variables override config: PG_CONNECTION_STRING, NEO4J_PASSWORD, OPENROUTER_API_KEY / OPENAI_API_KEY.
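That precedence can be sketched as follows; the helper name `load_setting` is illustrative, not part of memwright's API:

```python
import json
import os

def load_setting(store_dir, key, env_var):
    """Return a setting, preferring the environment variable over config.json."""
    env_value = os.environ.get(env_var)
    if env_value is not None:
        return env_value
    with open(os.path.join(store_dir, "config.json")) as f:
        return json.load(f).get(key)
```

So a `NEO4J_PASSWORD` exported in the MCP server's `env` block wins over the `neo4j_password` value stored in `config.json`.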
License
Apache 2.0
mcp-name: io.github.bolnet/memwright
