NCMS
See It Working • How It Works • Fine-Tune Your Own Adapter • Benchmarks • Quickstart Guide
Your AI agents forget everything between sessions. Every conversation starts from zero. Every insight, every architectural decision, every hard-won debugging breakthrough: gone.
NCMS fixes this. Permanently.
pip install ncms
from ncms.interfaces.mcp.server import create_ncms_services, create_mcp_server
memory, bus, snapshots, consolidation = await create_ncms_services()
server = create_mcp_server(memory, bus, snapshots, consolidation)
Three lines. Your agents now have persistent, searchable, shared memory with cognitive scoring: a system that learns while it sleeps, tracks how knowledge evolves through state-change grammar, and optionally runs a fine-tuned ingest-side classifier that replaces brittle regex with a 2.4 MB LoRA adapter you train on your own corpus. No vector database. No embedding pipeline. No external services.
What Makes NCMS Different
| Problem | Traditional Approach | NCMS |
|---|---|---|
| Memory retrieval | Dense vector similarity (lossy) | BM25 + SPLADE + graph expansion + cross-encoder + structured recall (precise) |
| "What's the current state?" | Recency sort or last-write-wins | TLG grammar retrieval β structural proof over typed state-transition edges, 32/32 rank-1 on ADR corpus |
| Admission / state-change / topic tagging | 5 separate regex & LLM code paths | One fine-tuned 2.4 MB LoRA adapter β five classification heads in a single forward pass |
| Agent coordination | Polling shared files, explicit tool calls | Embedded Knowledge Bus (osmotic) |
| Agent goes offline | Knowledge lost until restart | Snapshot surrogate response (always available) |
| Dependencies | Vector DB + graph DB + message broker | Zero. Single pip install. |
| Setup time | Hours of infrastructure | 3 seconds to first query |
See It Working
git clone https://github.com/AliceNN-ucdenver/ncms.git
cd ncms && uv sync
uv run ncms demo
Three collaborative agents run through a complete lifecycle (storing knowledge, asking questions, going offline with surrogate responses, and announcing breaking changes), all in-memory, in under 10 seconds.
uv run ncms dashboard # Real-time observability at http://localhost:8420
How It Works
NCMS organizes agent memory into a Hierarchical Temporal Memory Graph (HTMG): a four-level structure where raw facts crystallize into tracked states, states cluster into temporal episodes, and episodes consolidate into strategic insights. Think of it as giving your agents not just storage, but the ability to understand their knowledge. (V1 architecture)
NCMS Architecture (HTMG)
Every memory enters through an ingest pipeline that classifies it, like a bouncer deciding who gets into the club, but one who went to grad school. Raw facts become ATOMIC nodes. State changes ("Redis upgraded to v7.4") become ENTITY_STATE nodes with bitemporal validity tracking. Related events cluster into EPISODE nodes via a 7-signal hybrid linker. And overnight, dream cycles consolidate episodes into ABSTRACT insights: the system literally learns while it sleeps.
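As a mental model, the four levels can be pictured as plain Python types. This sketch is illustrative only; the field names are assumptions, not NCMS's actual schema:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class NodeLevel(Enum):
    ATOMIC = 1        # L1: raw facts and observations
    ENTITY_STATE = 2  # L2: tracked states with bitemporal validity
    EPISODE = 3       # L3: temporal clusters of related events
    ABSTRACT = 4      # L4: consolidated strategic insights


@dataclass
class MemoryNode:
    level: NodeLevel
    content: str
    # Only L2 nodes carry validity tracking in this sketch
    valid_from: str | None = None
    valid_to: str | None = None
    is_current: bool = True
    children: list[MemoryNode] = field(default_factory=list)
```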
The Fine-Tunable 5-Head SLM: Ingest-Voice Content Classifier
NCMS replaces five separate pieces of brittle pattern-matching code with a single fine-tuned LoRA adapter running a 5-head BERT classifier at ingest. One forward pass produces: admission routing, state-change detection, topic tagging, preference intent, and typed-span role classification. The output drives every downstream ingest decision: domain expansion, L2 entity-state creation, supersession edges, episode formation. (Current state: v9 · Domain plugin architecture)
Five heads, one forward pass (20-65 ms on MPS):
| Head | Output | What it replaces |
|---|---|---|
| admission | persist / ephemeral / discard | 4-feature regex heuristic (65.9% accuracy) |
| state_change | declaration / retirement / none → feeds L2 entity-state induction | 3-pattern state-declaration regex (8/8 FP on YAML templates) |
| topic | per-adapter taxonomy label (not a hardcoded enum) | LLM-based label_detector.py + manual Memory.domains tagging |
| intent | positive / negative / habitual / difficulty / choice / none | regex preference extractor (never shipped) |
| role (per-span) | primary / alternative / casual / not_relevant on each gazetteer-detected span | GLiNER-only entity extraction on closed-vocab domains (no role disambiguation) |
Signals at retrieval. Each head's output lands on memory.structured["intent_slot"] at ingest and becomes a typed signal the scoring pipeline can read:
- intent → boosts memories whose preference label matches the query's pattern intent (Phase H.1)
- state_change → boosts memories tagged as actual state changes on CHANGE_DETECTION queries (Phase H.2); gates the supersession/conflict reconciliation penalty so it only fires on CURRENT_STATE_LOOKUP (Phase G, the canonical bug fix)
- topic → auto-appended to Memory.domains so the domain filter narrows retrieval without manual tagging
- role → reserved for the grounding boost (memories where the query entity has role=primary); off by default pending v10 calibration
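To make the gating concrete, here is a minimal sketch of how a scoring pipeline could read these signals. The intent names and the memory.structured["intent_slot"] location come from the docs above; the weight and function shape are invented for the sketch:

```python
def slm_retrieval_signals(query_intent: str, structured: dict) -> tuple[float, bool]:
    """Return (score bonus, whether the supersession penalty applies).

    Assumes structured["intent_slot"] holds the 5-head output, e.g.
    {"state_change": "declaration", "intent": "positive", "topic": "infra"}.
    The 0.15 weight is invented for this sketch.
    """
    slot = structured.get("intent_slot", {})
    bonus = 0.0
    # Phase H.2: boost genuine state changes on change-detection queries
    if query_intent == "change_detection" and slot.get("state_change") in (
        "declaration", "retirement",
    ):
        bonus += 0.15
    # Phase G: the supersession/conflict penalty fires only on
    # current-state lookups
    return bonus, query_intent == "current_state_lookup"
```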
A per-query diagnostic event emits the signal vector for every search: which heads fired, which contributed to the rank-1 result's score, and whether grammar composition replaced the BM25 head.
3-tier fallback chain: the chain's presence at MemoryService construction is the kill-switch. Set NCMS_DEFAULT_ADAPTER_DOMAIN=<name> to load the LoRA at startup; leave it unset to ingest on the heuristic-only chain. No boolean flag.
primary: LoRA adapter (per-deployment, ~2.4 MB on disk)
  ↓ if adapter missing / head abstained
fallback: E5 zero-shot (cold-start: intent-only head, no training required)
  ↓ if torch unavailable
heuristic: null output (admission=persist, everything else None; ingest keeps working)
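The cascade itself is ordinary priority-chain code. A minimal sketch of the pattern, with illustrative class and callable names (not NCMS's internal API):

```python
class ClassifierChain:
    """Minimal sketch of a 3-tier fallback cascade (illustrative names).

    Each tier is a callable returning a dict of head outputs, or None to
    abstain / signal unavailability.
    """

    def __init__(self, lora=None, e5=None):
        self.lora = lora  # fine-tuned LoRA adapter; absent on cold start
        self.e5 = e5      # zero-shot E5 fallback; absent without torch

    def classify(self, text: str) -> dict:
        for tier in (self.lora, self.e5):
            if tier is not None:
                out = tier(text)
                if out is not None:   # a tier may abstain
                    return out
        # heuristic tier: null output so ingest keeps working
        return {"admission": "persist", "state_change": None,
                "topic": None, "intent": None, "role": None}
```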
Dynamic topics. The topic vocabulary lives in the adapter's manifest.json + taxonomy.yaml, not in the codebase. Swap adapter, swap topics. The dashboard enumerates them directly from the database (SQLiteStore.list_topics_seen()) with zero config coupling.
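Because the vocabulary is data, swapping topics never touches code. For instance, a taxonomy file with a topic_labels list (that key appears in the training example further down; the path here is illustrative) can be enumerated directly:

```python
import yaml  # third-party: pyyaml

# Illustrative path; shipped adapters live at ~/.ncms/adapters/<domain>/v9/
with open("taxonomy.yaml") as f:
    taxonomy = yaml.safe_load(f)

print(taxonomy["topic_labels"])  # e.g. ["framework", "testing", "infra"]
```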
Three reference adapters ship today, each ~2.4 MB at ~/.ncms/adapters/<domain>/v9/:
- conversational/v9: open-vocab domain (no gazetteer; role head idle, GLiNER provides entities)
- clinical/v9: 536 gazetteer entries × 6 slots (medication / procedure / symptom / severity / alternative / frequency)
- software_dev/v9: 712 gazetteer entries × 9 slots (library / language / framework / pattern / tool / database / service / alternative / frequency)
All three are baked into the NemoClaw hub Docker image; the hub defaults to software_dev.
CTLG: query-side cue tagging (planned, sibling adapter)
The 5-head SLM owns ingest voice. Query-voice semantic parsing is a different task, composing typed cues (causal / temporal / ordinal / modal) into a structured TLG query form, and it ships as a separate CTLG adapter loaded alongside the 5-head SLM at runtime. Two adapters in production, one per cognitive role (content classification vs cue tagging), NOT two-for-CTLG.
Why a sibling, not a 6th head: v8 attempted to add the cue tagger as a 6th head on the same encoder. Joint training of per-token BIO sequence labeling alongside the per-CLS classification heads saturated under shared encoder capacity: training loss oscillated, and several previously-healthy heads regressed. v9 dropped the 6th head; the CTLG adapter forks training while keeping the runtime architecture coherent. (CTLG design · v8 saturation retrospective)
Plumbing already in place: EdgeType.CAUSED_BY + EdgeType.ENABLES on graph edges; a cue_tags: list[dict] field on ExtractedLabel; the _extract_and_persist_causal_edges ingestion path gated on cue_tags presence; a rules-first synthesizer at domain/tlg/composition.py; the NCMS_TLG_LLM_FALLBACK_ENABLED knob reserved. The cue head is the only missing piece: corpus annotation + dedicated training.
Retrieval Pipeline
Traditional memory systems compress documents into dense vectors, losing precision. NCMS uses complementary mechanisms that work together without a single embedding:
Tier 0: Intent Classification. Queries are classified into one of 7 intent types (fact lookup, current state, historical, event reconstruction, change detection, pattern, strategic reflection) via a BM25 exemplar index. This shapes which memory types receive a scoring bonus downstream.
Tier 1: BM25 + SPLADE Hybrid Search. BM25 via Tantivy (Rust) provides exact lexical matching. SPLADE adds learned sparse neural retrieval, expanding "API specification" to also match "endpoint", "schema", "contract". Results fuse via Reciprocal Rank Fusion.
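Reciprocal Rank Fusion is the standard rank-based combiner. A minimal reference implementation (k=60 is the conventional constant, not necessarily the value NCMS uses):

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of memory ids via Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fuse a BM25 ranking with a SPLADE ranking
fused = rrf_fuse([["m3", "m1", "m7"], ["m1", "m3", "m9"]])
```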
Tier 1.5: Graph-Expanded Discovery. Entity relationships in the knowledge graph discover related memories that search missed lexically. A query matching "connection pooling" also finds memories about "PostgreSQL replication", because both share the PostgreSQL entity.
Tier 2: ACT-R Cognitive Scoring. Every memory has an activation level computed from access recency, frequency, and contextual relevance. Dream-learned association strengths weight entity connections; reconciliation penalties demote superseded or conflicted states.
Tier 2.5: Score Normalization. Per-query min-max normalization brings all signals to [0,1] scale before combining.
Tier 3: Selective Cross-Encoder Reranking. A 22M-parameter cross-encoder (ms-marco-MiniLM-L-6-v2) reranks candidates, but only for fact lookup, pattern, and strategic reflection queries. State and temporal queries skip reranking to preserve chronological and causal ordering.
Tier 4: Structured Recall. The recall() method layers structured context on top: entity state snapshots, episode membership with sibling expansion, causal chains from the HTMG. One call returns what takes 5+ tool calls elsewhere.
Tier 5: Temporal Linguistic Geometry (TLG). For state-evolution queries ("What's the current authentication scheme?", "What caused the payments delay?", "What came before MFA?"), TLG runs a grammar-based structural proof over typed state-transition edges. It produces an exact answer (or abstains) with a readable syntactic proof, and composes with BM25 via a zero-confidently-wrong invariant: when TLG's confidence is high, its rank-1 answer replaces BM25's head; when it abstains, BM25 ordering is returned unchanged.
Query intent today is BM25-exemplar classification. A small in-memory Tantivy index of ~70 exemplar queries classifies each search into one of 7 intent classes (fact_lookup, current_state_lookup, historical_lookup, event_reconstruction, change_detection, pattern_lookup, strategic_reflection). The SLM signals from ingest (intent / state_change / topic / role) feed retrieval bonuses gated on this classified intent. Query-side compositional parsing (cue tagging → structured TLG queries) is the next step; see the CTLG design. (Pre-paper · v9 findings)
activation(m) = base_level(m) + spreading_activation(m, query) + noise
                - supersession_penalty - conflict_penalty + hierarchy_bonus
base_level(m) = ln( sum( (time_since_access)^(-decay) ) )
spreading(m)  = sum( learned_PMI_weight(entity) )    # dream-learned associations
combined(m)   = bm25 * w_bm25 + splade * w_splade + activation * w_actr + graph * w_graph
→ TLG grammar answer (when has_confident_answer(), replaces rank-1)
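The pseudocode above transcribes almost directly into Python. The weights, noise scale, and penalty values here are placeholders, but the formulas match:

```python
import math
import random

def base_level(times_since_access: list[float], decay: float = 0.5) -> float:
    """ACT-R base-level activation: ln( sum(t ** -decay) ).

    times_since_access: seconds since each past access (all > 0).
    """
    return math.log(sum(t ** -decay for t in times_since_access))

def activation(times_since_access: list[float], spreading: float, *,
               supersession_penalty: float = 0.0, conflict_penalty: float = 0.0,
               hierarchy_bonus: float = 0.0, noise_scale: float = 0.01) -> float:
    noise = random.gauss(0.0, noise_scale)
    return (base_level(times_since_access) + spreading + noise
            - supersession_penalty - conflict_penalty + hierarchy_bonus)

def combined(bm25: float, splade: float, actr: float, graph: float,
             w_bm25: float = 0.4, w_splade: float = 0.3,
             w_actr: float = 0.2, w_graph: float = 0.1) -> float:
    # Placeholder weights; NCMS tunes its own. Inputs are assumed to be the
    # per-query min-max normalized signals from Tier 2.5.
    return bm25 * w_bm25 + splade * w_splade + actr * w_actr + graph * w_graph
```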
Memory Ingestion Pipeline
Entities, preferences, topics, admission routing, and state-change detection all run on the same memory at ingest time, but the SLM (when enabled) is the primary source of truth on admission / state-change / topic, with regex paths kept alive as fallback for cold-start deployments.
Content Classification: Incoming content passes through a dedup gate (SHA-256), then a two-class classifier. NAVIGABLE documents (ADRs, PRDs, YAML configs with headings/structure) get section-aware ingestion: one vocabulary-dense profile memory in the memory store, full document + sections in the document store. ATOMIC fragments (facts, observations, announcements) proceed through the standard pipeline.
5-Head SLM (optional, set NCMS_DEFAULT_ADAPTER_DOMAIN=<name>): Runs before admission. Produces all five classification outputs (intent / role / topic / admission / state_change) in one forward pass. Its admission_head replaces the regex admission scorer when confident; its state_change_head replaces the state-declaration regex; its topic_head auto-populates Memory.domains; its role_head classifies gazetteer-detected spans into primary / alternative / casual / not_relevant for downstream L2 entity-state grounding.
GLiNER NER: Zero-shot Named Entity Recognition using a 209M-parameter DeBERTa model. Extracts entities across any domain, running in parallel with the SLM: GLiNER's output feeds the knowledge graph (spreading activation, co-occurrence edges, entity-state reconciliation) while the SLM's output feeds ingest decisions. The two are complementary: GLiNER handles open-vocabulary NER, the SLM handles typed domain-specific slot extraction.
Admission Routing: A 3-way gate (discard, ephemeral cache, or persist). Either the SLM's admission_head (when confident) or the 4-feature regex heuristic (fallback) decides. Memories with importance >= 8.0 bypass admission entirely.
State Reconciliation: When a new entity state arrives ("Redis upgraded to v7.4"), NCMS classifies its relationship to existing states (supports / refines / supersedes / conflicts) and applies bitemporal truth maintenance (sketched after this pipeline). Superseded states get is_current=False with validity closure.
Episode Formation: Related memories are automatically grouped into temporal episodes via a 7-signal hybrid linker (BM25, SPLADE, entity overlap, domain match, temporal proximity, source agent, structured anchors like JIRA tickets).
Contradiction Detection (opt-in): LLM-powered post-ingest scan for factual contradictions against existing related memories.
Knowledge Consolidation (opt-in): Offline clustering + LLM synthesis of cross-memory patterns into searchable ABSTRACT insights.
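To make the State Reconciliation step concrete (the sketch promised above), bitemporal closure reduces to closing the old validity interval and opening a new one. Field names here are illustrative, not NCMS's schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class EntityState:
    entity: str
    value: str
    valid_from: datetime
    valid_to: datetime | None = None  # open interval while current
    is_current: bool = True


def supersede(old: EntityState, new_value: str) -> EntityState:
    """Close the old state's validity interval and open a new one."""
    now = datetime.now(timezone.utc)
    old.valid_to = now        # validity closure: the old state stays queryable
    old.is_current = False
    return EntityState(entity=old.entity, value=new_value, valid_from=now)


redis_72 = EntityState("Redis", "v7.2",
                       valid_from=datetime(2025, 1, 1, tzinfo=timezone.utc))
redis_74 = supersede(redis_72, "v7.4")   # "Redis upgraded to v7.4"
```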
Dream Cycles (Project Oracle)
Like biological sleep consolidation, NCMS runs three non-LLM passes during "sleep" to create the differential access patterns ACT-R cognitive scoring needs to contribute signal:
- Dream Rehearsal: Selects high-value memories via 5-signal weighted scoring (PageRank centrality 0.40, staleness 0.30, importance 0.20, frequency 0.05, recency 0.05) and injects synthetic access records.
- Association Learning: Computes pointwise mutual information (PMI) from entity co-access patterns in the search log, feeding learned weights into spreading_activation().
- Importance Drift: Compares recent access rates to older rates and adjusts memory.importance within bounded limits. Frequently accessed memories rise; neglected ones gracefully decay.
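PMI here is the standard pointwise mutual information over co-access counts from the search log. A minimal sketch with assumed count inputs:

```python
import math

def pmi(co_count: int, count_a: int, count_b: int, total: int) -> float:
    """PMI(a, b) = ln( p(a, b) / (p(a) * p(b)) ) from access-log counts."""
    p_ab = co_count / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log(p_ab / (p_a * p_b))

# e.g. two entities co-accessed 30 times in 1000 logged searches,
# individually accessed 50 and 80 times: positive PMI => learned association
weight = pmi(30, 50, 80, 1000)  # ~2.01
```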
Knowledge Bus & Agent Sleep/Wake
Agents don't poll for updates. They don't call each other directly. Knowledge flows through domain-routed channels: osmotic knowledge transfer.
# API agent announces a change; the frontend agent gets it automatically
await agent.announce_knowledge(
event="breaking-change",
domains=["api:user-service"],
content="GET /users now returns role field",
breaking=True,
)
Ask/Respond: Non-blocking queries routed by domain.
Announce/Subscribe: Fire-and-forget broadcasts to interested agents.
Surrogate Response: When agents go offline, they publish knowledge snapshots. Other agents can still ask them questions through the snapshot.
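For symmetry with the announce example above, here is what an ask/respond call might look like. The method name ask_knowledge and its parameters are illustrative assumptions, not the documented API:

```python
# Hypothetical sketch: ask_knowledge and its signature are assumed for
# illustration of the non-blocking, domain-routed ask/respond pattern.
answer = await agent.ask_knowledge(
    question="What fields does GET /users return now?",
    domains=["api:user-service"],
)
# If the target agent is asleep, the bus can answer from its published
# snapshot (surrogate response) instead of failing the query.
```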
Fine-Tune Your Own Adapter
Three reference adapters ship today at ~/.ncms/adapters/{conversational,software_dev,clinical}/v9/, but the point of the architecture is that operators train their own for their own domain. The 5-head classifier does its best work when fine-tuned on the kind of content your users actually ingest.
One-command training
# Put your corpus JSONL + taxonomy YAML in a directory:
./my_corpus/
├── gold.jsonl    # hand-labeled examples (start with ~50-75 rows)
└── topics.yaml   # topic_labels: [framework, testing, infra, ...]
                  #   object_to_topic: map surface forms to topics
# Run the four-phase pipeline (takes ~5-15 min on Apple Silicon MPS):
uv run python -m experiments.intent_slot_distillation.train_adapter \
--domain my_domain \
--taxonomy ./my_corpus/topics.yaml \
--adapter-dir ./adapters/my_domain/v1 \
--target-size 500 \
--adversarial-size 300 \
--epochs 6 \
--lora-r 16
What happens:
- Bootstrap: loads your gold + any mixed-content seeds (admission / state-change variety). Auto-labels topic/admission/state_change from the taxonomy map where gold doesn't already have them.
- Expand (SDG): template-based synthetic data expansion. 500 target → ~400 deduped examples with full multi-head labels.
- Adversarial: generates 200-300 hard cases across 7 failure modes (quoted speech, negated positives, past-flip, third-first contrast, double negation, sarcasm, empty/minimal).
- Train + Gate: LoRA fine-tune with class-weighted slot loss. The gate refuses to promote an adapter that doesn't meet thresholds (intent F1 ≥ 0.70, slot F1 ≥ 0.75, confidently-wrong ≤ 10%) or regresses against a named baseline adapter.
Output: a 2.4 MB adapter directory with lora_adapter/ + heads.safetensors + manifest.json + taxonomy.yaml + eval_report.md (PASS/FAIL gate + per-head F1 table).
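The promotion gate reduces to a threshold check over the eval report. A sketch using the thresholds quoted above (the report keys are illustrative):

```python
def gate_passes(report: dict, baseline: dict | None = None) -> bool:
    """Refuse promotion unless the thresholds quoted above are met."""
    ok = (report["intent_f1"] >= 0.70
          and report["slot_f1"] >= 0.75
          and report["confidently_wrong_rate"] <= 0.10)
    if baseline is not None:  # never regress against a named baseline adapter
        ok = ok and report["intent_f1"] >= baseline["intent_f1"]
    return ok
```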
Point NCMS at your adapter
# Via config
NCMS_DEFAULT_ADAPTER_DOMAIN=my_domain \
NCMS_SLM_CHECKPOINT_DIR=./adapters/my_domain/v1 \
uv run ncms serve
# Or via benchmark runner
uv run python -m benchmarks longmemeval --features-on \
--intent-slot-domain my_domain
See Add a domain for the full walk-through of authoring a v9 domain plugin (gazetteer + diversity + archetypes), and v9 domain plugin architecture for the design rationale. Historical reading: P2 plan (the P2 sprint that produced the original 5-head adapter).
Benchmarks
NCMS achieves nDCG@10 = 0.7206 on SciFact (the BEIR dataset most aligned with factual knowledge retrieval), exceeding published ColBERTv2 (0.693, +4.0%) and SPLADE++ (0.710, +1.5%) without dense vectors or an LLM at query time. Cross-domain validation on NFCorpus (biomedical) shows consistent improvement: +10.0% over BM25 (0.3188 → 0.3506).
On SWE-bench Django (503 documents, 170 test queries), structured recall achieves Recall AR nDCG@10 = 0.2032, exceeding search-only AR (0.1759) by +15.5%. (Full SWE-bench results)
TLG: state-evolution retrieval (NEW, 2026-04)
Across 11 intent shapes on the hand-curated ADR / project / clinical corpus:
| Strategy | Top-5 accuracy | Rank-1 accuracy |
|---|---|---|
| BM25 | 13 / 32 (41 %) | 5 / 32 (16 %) |
| BM25 + observed_at DESC | 13 / 32 (41 %) | 0 / 32 (0 %) |
| Entity-scoped + path-rerank | 14 / 32 (44 %) | 6 / 32 (19 %) |
| TLG grammar | 32 / 32 (100 %) | 32 / 32 (100 %) |
Every TLG answer comes with a readable syntactic proof ("successor = ADR-010 (refines)", "walked 6 predecessors; root = ADR-001"). On LongMemEval's conversational subset the grammar correctly abstains (framing mismatch: LME isn't state-evolution content) and falls through to BM25+SPLADE unchanged. Full validation in docs/tlg-validation-findings.md; the reusable four-domain state-evolution benchmark (MSEB v1) ships at docs/mseb-results.md (original design).
5-Head SLM: ingest classifier (v9, 2026-04)
The shipped 5-head LoRA adapter classifies every memory at ingest into typed labels (intent / role / topic / admission / state_change). v8 attempted a 6th head for query-side cue tagging but joint training saturated; v9 ships the 5 production heads only, and the cue tagger is being designed as a separate sibling adapter (see CTLG design).
| Domain | Ingest heads (F1, v9 baseline) | Gazetteer | Head labels trained |
|---|---|---|---|
| conversational/v9 | intent 1.000, topic 1.000, admission 1.000, state_change 1.000 | open-vocab (none) | preference + topic taxonomy |
| software_dev/v9 | intent 1.000, role 0.79, topic 1.000, admission 1.000, state_change 1.000 | 712 entries × 9 slots | framework / library / language / pattern / tool / database / service taxonomy |
| clinical/v9 | intent 1.000, role 0.79, topic 1.000, admission 1.000, state_change 1.000 | 536 entries × 6 slots | medication / procedure / symptom / severity taxonomy |
The role head's "primary / alternative / casual / not_relevant" classification on gazetteer-detected spans is the v9 replacement for the v6 BIO slot tagger; it sources canonical state values for L2 entity-state nodes and feeds the role-grounding retrieval bonus (off-by-default pending v10 calibration evidence).
Compare to zero-shot baselines: E5 label-similarity hits intent F1 0.347-0.612 on the same gold; the LoRA adapters gain +0.22 to +0.49 absolute intent F1 while eliminating the 26.7-56.7% confidently-wrong rate.
For full per-head evidence + retrieval-side ablation results (Phase G + Phase H series), see docs/v9-mseb-slm-lift-findings.md and docs/mseb-results.md.
MSEB v1: state-evolution retrieval across four domains (NEW, 2026-04-21)
A pluggable, gold-audited benchmark for state-evolution memory retrieval: four domains (SWE-bench Verified diffs, PMC clinical case reports, ADR prose, LongMemEval conversations) × four query classes (general / temporal / preference / noise). Head-to-head against mem0 in a 12-cell single-pass run (747 hand-audited gold queries, locked):
| Domain | NCMS (hybrid) | mem0 (dense) | Δ (NCMS − mem0) |
|---|---|---|---|
| MSEB-SoftwareDev (ADR prose) | 0.745 | 0.455 | +0.29 |
| MSEB-Clinical (PMC case reports) | 0.672 | 0.224 | +0.45 |
| MSEB-SWE (SWE-bench Verified diffs) | 0.416-0.456 | 0.256 | +0.16 to +0.20 |
| MSEB-Convo (LongMemEval) | 0.345 | 0.207 | +0.14 |
Hybrid retrieval beats dense retrieval on state-evolution content by +0.14 to +0.45 rank-1 across every domain tested; not a borderline result. All three backends (NCMS tlg-on, NCMS tlg-off, mem0) correctly reject 100% of the 59 adversarial off-topic noise queries. Per-class breakdown, per-head SLM contribution analysis, honest TLG limitations, and the full reproducibility recipe are in docs/mseb-results.md.
./benchmarks/mseb/run_main_12.sh # One-shot: 12 cells, 4 domains × 3 backends
LongMemEval A/B (500-question non-regression check, 2026-04-20)
The SLM on vs. off on LongMemEval is an axis-mismatch test: conversational memory recall isn't the axis the SLM was built for, so the point of this run is confirming zero regression + acceptable latency, not headline accuracy:
| | Baseline (--features-on) | SLM (--intent-slot-domain conversational) | Δ |
|---|---|---|---|
| Recall@5 | 0.4680 | 0.4680 | 0.0000 (bit-identical across all 6 categories) |
| Elapsed | 10,562 s | 11,099 s | +537 s (~48 ms / memory overhead) |
| Memories stored | 10,960 | 10,960 | – |
| Errors / tracebacks / HTTP 4xx | 0 | 0 | – |
The classifier ran ~11k forward passes cleanly; it just didn't move the number because LongMemEval's retrieval path doesn't consume the SLM's outputs on the axes it classifies. Expected and desired: the real benchmark for the SLM's admission + state_change + topic heads is state-evolution retrieval, not conversational recall. See docs/completed/intent-slot-history/intent-slot-sprint-4-findings.md §10 for the full A/B breakdown.
Baseline Comparison (SWE-bench Django)
Compared against Mem0 and Letta on SWE-bench Django (850 issues, 80/20 chronological split). NCMS wins 3 of 4 metrics with zero OpenAI API calls; Mem0 and Letta both use OpenAI text-embedding-3-small dense vectors.
| Metric | NCMS | NCMS Recall | Mem0 | Letta |
|---|---|---|---|---|
| AR nDCG@10 | 0.1750 | 0.2031 | 0.1550 | 0.1412 |
| TTL Accuracy | 0.6529 | – | 0.5941 | 0.7412 |
| CR Temporal MRR | 0.0947 | – | 0.0150 | 0.0616 |
| LRU nDCG@10 | 0.3540 | – | 0.1979 | 0.1245 |
See the full ablation study, weight tuning results, and completed milestones for methodology, per-dataset metrics, and development history.
Get Started
pip install ncms # Core install
pip install "ncms[docs]" # + rich document support (DOCX/PPTX/PDF/XLSX)
pip install "ncms[dashboard]" # + observability dashboard
uv run ncms demo # See it in action
uv run ncms serve # Start MCP server
uv run ncms dashboard # Real-time dashboard
uv run ncms load file.md --domains arch # Matrix-style knowledge download
uv run ncms lint # Diagnose memory store health
uv run ncms export --output-dir wiki # Export as linked markdown wiki
Quickstart Guide: MCP server setup, Claude Code hooks, NeMo agent integration, configuration reference, and local LLM inference.
GPU-Accelerated LLM Inference
NCMS LLM features (contradiction detection, knowledge consolidation) can be accelerated with an NVIDIA DGX Spark running vLLM via the NGC vLLM container.
Deploy Nemotron on DGX Spark:
sudo docker run -d --gpus all --ipc=host --restart unless-stopped \
--name vllm-nemotron-nano \
-p 8000:8000 \
-e VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
-v /root/.cache/huggingface:/root/.cache/huggingface \
nvcr.io/nvidia/vllm:26.01-py3 \
vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
--host 0.0.0.0 --port 8000 --trust-remote-code \
--max-model-len 524288 \
--enable-auto-tool-choice --tool-call-parser qwen3_coder
Point NCMS at the Spark:
NCMS_CONTRADICTION_DETECTION_ENABLED=true \
NCMS_LLM_MODEL=openai/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
NCMS_LLM_API_BASE=http://spark-ee7d.local:8000/v1 \
NCMS_CONSOLIDATION_KNOWLEDGE_ENABLED=true \
NCMS_CONSOLIDATION_KNOWLEDGE_MODEL=openai/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
NCMS_CONSOLIDATION_KNOWLEDGE_API_BASE=http://spark-ee7d.local:8000/v1 \
uv run ncms serve
The Nemotron 3 Nano (30B total, 3B active MoE) fits entirely in the Spark's 128GB unified memory, delivering sub-second LLM inference.
Note: the ingest-side intent-slot SLM (bert-base-uncased + LoRA) runs happily on Apple Silicon MPS, CUDA, or CPU; no DGX required. The DGX is only for the LLM-dependent opt-in features (contradiction detection, knowledge consolidation, synthesis).
Completed Features
Core retrieval (Phases 0-11)
- BM25 + SPLADE + Graph hybrid retrieval (nDCG@10=0.72 SciFact)
- Selective cross-encoder reranking (intent-aware)
- Per-query score normalization
- Structured recall with episode / entity / causal context (+15.5% AR)
- 4-feature admission scoring with 3-way quality gate
- Bitemporal state reconciliation (supports / refines / supersedes / conflicts)
- 7-signal hybrid episode formation
- Intent-aware retrieval (7 intent classes)
- Hierarchical consolidation: episode summaries, state trajectories, recurring patterns
- Dream cycles: rehearsal, PMI association learning, importance drift
- ACT-R cognitive scoring with dream-learned association weights
P1: Temporal Linguistic Geometry (SHIPPED 2026-04-19)
- Grammar-based structural retrieval over typed state-transition edges
- 11 intent shapes (current_state, ordinal, causal_chain, sequence, predecessor, interval, transitive_cause, concurrent, before_named, range, noise)
- Zero-confidently-wrong composition invariant with BM25
- Readable syntactic proofs on every grammar answer
- 32/32 top-5 and rank-1 on ADR state-evolution corpus
- Full integration: NCMS_TEMPORAL_ENABLED, ncms tlg status|induce, --tlg benchmark flag
P2: Intent-Slot SLM (SHIPPED 2026-04-20)
- LoRA multi-head classifier (5 heads, one forward pass per memory)
- Replaces admission regex, state-change regex, LLM topic labeller, manual domain tagging, never-shipped preference extractor
- 3-tier fallback chain (LoRA adapter → E5 zero-shot → heuristic)
- Per-deployment adapter training: 4-phase pipeline with pass/fail gate
- 3 reference adapters shipped (conversational / software_dev / clinical, F1=1.000 on gold)
- Dynamic topics (no closed-vocab enum in code; lives in adapter manifest)
- Benchmark runner integration (--intent-slot-domain on LongMemEval)
- Dashboard event + SQLiteStore.list_topics_seen() for config-free topic enumeration
Content-aware ingestion & document model
- Two-class content gate: ATOMIC fragments vs NAVIGABLE documents
- Document Profile model (one profile memory + sections in document store)
- Content-hash deduplication (SHA-256) at store boundary
- Content size gating with importance-based exemptions
- Entity quality filtering (rejects junk: numeric %, hex IDs, count patterns)
Retrieval enhancements
- Level-first retrieval with intent-driven traversal strategies
- Synthesis pipeline with 5 modes (summary, detail, timeline, comparison, evidence)
- Emergent topic map from L4 abstract clustering
- Temporal query parsing with proximity boost
Tools & interfaces
- 26 MCP tools via FastMCP
- HTTP REST API with bearer token auth
- A2A JSON-RPC 2.0 bridge (agent discovery + task routing)
- CLI: ncms serve|demo|dashboard|info|load|lint|reindex|export|maintenance|watch|topics|state|episodes|topic-map|tlg
- Observability dashboard (SSE + D3 graph + entity / episode / state / intent-slot views)
Ingestion & monitoring
- Filesystem watcher with auto-domain classification (ncms watch)
- Matrix-style knowledge loader (MD, JSON, YAML, CSV, HTML, DOCX, PPTX, PDF, XLSX)
- Index rebuild utility (ncms reindex)
- Read-only diagnostics (ncms lint)
- Wiki export (ncms export)
- Background maintenance scheduler
- OpenTelemetry tracing integration
- Prometheus metrics endpoint
Deployment & integration
- NemoClaw integration (MCP config, OpenClaw skill, sandbox blueprint)
- NeMo Agent Toolkit MemoryEditor adapter
- Bus heartbeat + offline detection with auto-snapshot
- Helm chart for Kubernetes
- All-in-one Docker image with pre-baked models
- docker-compose multi-agent hub
Evaluation
- SciFact ablation: nDCG@10=0.7206, exceeds ColBERTv2 (+4.0%) and SPLADE++ (+1.5%)
- SWE-bench Django: Recall AR 0.2032, +15.5% over search; beats Mem0 and Letta on 3 / 4 metrics
- TLG ADR validation: 32 / 32 top-5 and rank-1 across 11 intent shapes
- Intent-Slot LoRA gate: F1=1.000 on gold across 5 heads, 3 reference domains
- Dream cycle benchmark (SciFact, NFCorpus, ArguAna)
- LongMemEval: Recall@5=0.4680 (500 questions, 6 categories)
- MemoryAgentBench harness (AR, TTL, LRU, selective forgetting)
Roadmap (Post-v1)
P3: SWE state-evolution benchmark (planned)
- MSEB v1: four-domain state-evolution benchmark (SWE-bench Verified / PMC Clinical / ADR prose / LongMemEval), 747 hand-audited gold queries stratified by general/temporal/preference/noise, head-to-head vs mem0; results in docs/mseb-results.md; full-scale rerun next
- Reusable JSONL artefact that other memory systems can consume without knowing NCMS internals
- Gates paper milestone M3 ("confidently-wrong = 0 at scale")
Adapter operations (follow-up from Sprint 4)
- ncms train-adapter / adapter-list / adapter-promote CLIs (thin wrappers over the experiment driver)
- Drift detection (dashboard watches per-head confidence distributions, warns on OOD content)
- Generic-domain adapter (one broad adapter shipped with NCMS as Tier-2 fallback)
- LoRA hyperparameter sweep automation
- Encoder comparison (RoBERTa / DistilBERT) for latency / quality tradeoff
Distributed infrastructure
- NATS / Redis-backed Knowledge Bus transport (implementing existing KnowledgeBusTransportProtocol)
- Neo4j / FalkorDB graph backend (implementing existing KnowledgeGraphProtocol)
- BM25-scored surrogate responses
Production validation (requires real agent workloads)
- Simulated Agent Workday benchmark (3-7 day multi-agent workload for ACT-R validation)
- ACT-R weight crossover demonstration (show ACT-R weight becomes beneficial with dream-learned access patterns)
- Rehearsal Boost Rate measurement (validate ≥85% of rehearsed memories show activation increase)
Dashboard & observability
- Historical replay and time-travel debugging (replay memory state at any point in time)
- Intent-slot confidence histogram + drift alerts
See completed milestones and V1 ablation results for development history.
Research Artefacts
Current state (v9):
- Main paper: architecture, SciFact/SWE-bench results, ablation studies
- v9 MSEB findings: Phase G/H/I SLM-signal ablation results, regex-vs-SLM retrieval audit
- MSEB v1 results: four-domain state-evolution benchmark (NCMS hybrid vs mem0 dense)
- v9 domain plugin architecture: YAML-native domain plugins (gazetteer + diversity + archetypes)
Forward-looking (planned):
- CTLG design: query-side cue tagger as a sibling adapter (post-v8 saturation pivot)
- CTLG cue guidelines: annotation rubric for the cue corpus
- CTLG grammar: composition rules from cue tags to TLGQuery
Background:
- Temporal Linguistic Geometry pre-paper: grammar-theoretic framework for state-evolution retrieval
- Intent-Slot Distillation pre-paper: original P2 motivation for the learned multi-head classifier
Sprint-level historical findings (v6/v7 era) live under docs/completed/.
Acknowledgments
- GLiNER: Zero-shot NER by Zaratiana et al. (NAACL 2024)
- SPLADE: Sparse neural retrieval by Formal et al. (SIGIR 2021), powered by sentence-transformers SparseEncoder
- Tantivy: Rust-based full-text search engine
- peft: LoRA adapter implementation (HuggingFace PEFT)
- transformers: BERT encoder for the intent-slot SLM
- safetensors: Adapter artifact serialization
- ACT-R: Cognitive architecture by John R. Anderson
- Linguistic Geometry: Game-state reduction framework by Boris Stilman; inspiration for TLG's zone / trajectory primitives
- BEIR: Heterogeneous IR benchmark by Thakur et al. (NeurIPS 2021)
- NetworkX: Graph library powering the knowledge graph
- litellm: Universal LLM API proxy
- aiosqlite: Async SQLite wrapper
License
MIT
Built for agents that remember, and reason over how knowledge changes.
By Shawn McCarthy / Chief Archeologist
