AriadneMem
Code for Paper: AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents
Threading the Maze of Lifelong Memory for LLM Agents
AriadneMem is a structured memory system that addresses disconnected evidence and state update challenges in long-horizon LLM agents through a decoupled two-phase pipeline.
🔗 Project Page | 📄 Paper (PDF)
If you find our work useful in your research, please consider giving it a star :star: and citing:
@misc{zhu2026ariadnememthreadingmazelifelong,
title={AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents},
author={Wenhui Zhu and Xiwen Chen and Zhipeng Wang and Jingjing Wang and Xuanzhao Dong and Minzhou Huang and Rui Cai and Hejian Sang and Hao Wang and Peijie Qiu and Yueyue Deng and Prayag Tiwari and Brendan Hogan Rappazzo and Yalin Wang},
year={2026},
eprint={2603.03290},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.03290},
}
🔌 Platform Compatibility
Works seamlessly with any AI platform supporting MCP or Python integration.
| Cursor | Claude | Copilot | Python | MCP Client |
|---|---|---|---|---|
| ✅ Fully Tested | 🤝 Compatible | 🤝 Compatible | ✅ Fully Tested | 🌐 Universal |
✨ Key Features
- Two-Phase Pipeline: Decouples memory retrieval and state updates for enhanced stability.
- Evidence Threading: Successfully bridges disconnected information across long-horizon tasks.
- Plug & Play: Easy integration with modern AI IDEs and development workflows.
Quick Start
💡 Hardware: AriadneMem works on both GPU and CPU. Uses remote LLM APIs (OpenAI/Qwen) and local embedding models.
Installation
pip install -r requirements.txt
Configuration
cp config.py.example config.py
Edit config.py:
OPENAI_API_KEY = "your-api-key"
OPENAI_BASE_URL = None # or Qwen: "https://dashscope.aliyuncs.com/compatible-mode/v1"
LLM_MODEL = "gpt-4o" # or "qwen-plus-2025-07-28"
# Per-component model overrides (optional, falls back to LLM_MODEL)
BUILDER_LLM_MODEL = None # Phase I: e.g. "gpt-4.1-mini" for cost savings
ANSWER_LLM_MODEL = None # Phase II: e.g. "gpt-4o" for better quality
# Reasoning mode
REASONING_MODE = "eco" # "eco" | "pro" | "custom"
# Local Embedding Model (no API needed)
# Lightweight option (fast on CPU):
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
# Or for better retrieval quality (GPU accelerates):
# EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B"
Basic Usage
from main import AriadneMemSystem
from models.memory_entry import Dialogue
# Initialize system
system = AriadneMemSystem(clear_db=True)
# Add dialogues
dialogues = [
Dialogue(speaker="Alice", content="Let's meet at Starbucks tomorrow at 2pm", timestamp="2024-01-15T14:30:00"),
Dialogue(speaker="Bob", content="Sorry, can we change to 3pm?", timestamp="2024-01-15T15:00:00"),
Dialogue(speaker="Alice", content="Sure, 3pm works for me", timestamp="2024-01-15T15:05:00"),
]
system.add_dialogues(dialogues)
# Build memory graph
system.finalize()
# Query
answer = system.ask("What time (in hour) will Alice and Bob meet?")
# Output: "3pm" (correctly handles the state update from 2pm to 3pm)
API Reference
AriadneMemSystem
class AriadneMemSystem:
def __init__(
self,
api_key: str = None, # Uses config.OPENAI_API_KEY if None
model: str = None, # Default LLM model (falls back to config.LLM_MODEL)
base_url: str = None, # Uses config.OPENAI_BASE_URL if None
clear_db: bool = False, # Clear existing database
db_path: str = None, # Custom database path
redundancy_threshold: float = None,
coarsening_threshold: float = None,
builder_model: str = None, # Phase I model override (extraction + coarsening)
answer_model: str = None, # Phase II model override (topology-aware synthesis)
reasoning_mode: str = None # "eco" | "pro" | "custom"
)
def add_dialogue(self, speaker: str, content: str, timestamp: str = None)
def add_dialogues(self, dialogues: List[Dialogue])
def finalize(self) # Build memory graph
def ask(self, question: str) -> str
    def get_all_memories(self) -> List[MemoryEntry]
    def print_memories(self)
Per-Component LLM Models
Different phases can use different LLM models. Set via __init__ params or config.py (init params take priority):
# Option 1: via __init__ (runtime)
system = AriadneMemSystem(
builder_model="gpt-4.1-mini", # Phase I: cheaper model
answer_model="gpt-4o", # Phase II: stronger model
)
# Option 2: via config.py (global default)
BUILDER_LLM_MODEL = "gpt-4.1-mini"
ANSWER_LLM_MODEL = "gpt-4o"
If both are None, all phases use the model argument (or, failing that, config.LLM_MODEL).
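The fallback order can be sketched in plain Python. The resolve_model helper below is hypothetical, not the repo's actual internals; it only illustrates the precedence described above:

```python
def resolve_model(init_param, config_override, default_model):
    """Resolve which LLM model a phase uses.

    Priority: __init__ argument > config.py per-component override
    > the shared default (the `model` argument / config.LLM_MODEL).
    """
    if init_param is not None:
        return init_param
    if config_override is not None:
        return config_override
    return default_model

# No overrides anywhere -> shared default
print(resolve_model(None, None, "gpt-4o"))            # gpt-4o
# __init__ argument beats a config.py override
print(resolve_model("gpt-4.1-mini", "gpt-4o", "gpt-4o"))  # gpt-4.1-mini
```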
Reasoning Mode
Control retrieval depth and prompt verbosity. Set via __init__ or config.py:
| Mode | MAX_REASONING_PATHS | MAX_REASONING_PATH_DEPTH | Reasoning Length |
|---|---|---|---|
"eco" (default) | 10 | 3 | 1-2 sentences |
"pro" | 25 | 3 | 9-10 sentences |
"custom" | User-defined | User-defined | User-defined template |
# Option 1: via __init__ (runtime)
system = AriadneMemSystem(reasoning_mode="pro")
# Option 2: via config.py (global default)
REASONING_MODE = "eco" # "eco" | "pro" | "custom"
Configuration Reference
LLM Configuration
| Parameter | Description | Example |
|---|---|---|
| LLM_MODEL | Default model for all components | gpt-4o, qwen-plus-2025-07-28 |
| BUILDER_LLM_MODEL | Phase I model override (set None to use default) | gpt-4.1-mini |
| ANSWER_LLM_MODEL | Phase II model override (set None to use default) | gpt-4o |
| OPENAI_BASE_URL | API endpoint | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| ENABLE_THINKING | Qwen deep thinking mode | True / False |
| USE_JSON_FORMAT | Force JSON output | True (recommended) |
Reasoning Modes
| Mode | MAX_REASONING_PATHS | Reasoning Depth | Token Cost | Use Case |
|---|---|---|---|---|
"eco" | 10 | 1-2 sentences | Low | Simple queries, batch testing |
"pro" | 25 | 9-10 sentences | High | Multi-hop, complex reasoning |
"custom" | User-defined | User-defined | Varies | Fine-tuned for specific tasks |
# Switch mode in config.py
REASONING_MODE = "eco" # Fast & token-efficient
REASONING_MODE = "pro" # Thorough & detailed
REASONING_MODE = "custom" # Your own settings + prompt template
Phase I Parameters (Memory Construction)
| Parameter | Default | Paper | Description |
|---|---|---|---|
| REDUNDANCY_THRESHOLD | 0.6 | λ_red (Eq. 3) | Entropy-aware gating threshold |
| COARSENING_THRESHOLD | 0.6 | λ_coal (Eq. 6) | Merge vs. link decision threshold |
| WINDOW_SIZE | 40 | - | Dialogues per processing window |
| OVERLAP_SIZE | 2 | - | Window overlap for context continuity |
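As a rough illustration of the idea behind entropy-aware gating, a message whose token distribution carries too little Shannon entropy is blocked before it reaches the LLM extractor. The helper names and the threshold value below are illustrative, not the paper's exact Eq. 2-3 formulation:

```python
import math
from collections import Counter

def token_entropy(text: str) -> float:
    """Shannon entropy (bits) of the message's token distribution."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def passes_gate(text: str, tau: float = 1.5) -> bool:
    """Illustrative gate: H(m) < tau -> block before LLM extraction."""
    return token_entropy(text) >= tau

print(passes_gate("ok ok ok ok"))   # repetitive, low-information -> False
print(passes_gate("Alice moved the meeting to 3pm on Friday"))  # True
```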
Phase II Parameters (Retrieval & Reasoning)
| Parameter | Default | Paper | Description |
|---|---|---|---|
| SEMANTIC_TOP_K | 25 | - | Max nodes from semantic search |
| KEYWORD_TOP_K | 5 | - | Max nodes from keyword search |
| MAX_REASONING_PATH_DEPTH | 3 | L (Eq. 10) | Max hops in DFS path discovery (auto-set by mode) |
| MAX_REASONING_PATHS | 10/25 | - | Max reasoning paths (eco=10, pro=25, auto-set by mode) |
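The hybrid retrieval score (Eq. 7) can be sketched as a weighted sum of semantic and lexical similarity. The α/β weights and the candidate nodes below are invented for illustration; the paper's actual weights may differ:

```python
def hybrid_score(sim_sem: float, sim_lex: float,
                 alpha: float = 0.7, beta: float = 0.3) -> float:
    """Eq. 7-style fusion: score = alpha * sim_sem + beta * sim_lex."""
    return alpha * sim_sem + beta * sim_lex

# Candidate memory nodes with (semantic, lexical) similarities to the query
candidates = {"n1": (0.9, 0.1), "n2": (0.4, 0.95), "n3": (0.2, 0.2)}

# Rank candidates by fused score to pick terminal nodes V_term
ranked = sorted(candidates, key=lambda n: hybrid_score(*candidates[n]),
                reverse=True)
print(ranked)  # ['n1', 'n2', 'n3']
```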
Prompt Templates (Customizable)
Prompt templates are auto-selected based on REASONING_MODE. You can also define a fully custom template:
# System prompt for topology-aware synthesis
ANSWER_SYSTEM_PROMPT = "You are a QA system with graph-based memory..."
# Custom mode: define your own template
REASONING_MODE = "custom"
_CUSTOM_USER_PROMPT_TEMPLATE = """Q: {query}
{entity_hint}{graph_hint}
{context_str}
... your own reasoning instructions ...
"""
Running Tests
Quick Test
python quick_test.py
LoCoMo Benchmark
# Run on 3 sessions with parallel question processing
python test_locomo10.py --num_sessions 3 --parallel_questions
# Run with LLM-as-Judge evaluation
python test_locomo10.py --num_sessions 3 --use_llm_judge
Multi-hop Reasoning Demo
python demo_multihop.py
MCP Server (Cursor Integration)
AriadneMem can be used as an MCP server in Cursor, providing long-term memory tools directly in your AI chat.
stdio mode (recommended for Cursor):
Edit ~/.cursor/mcp.json:
{
"mcpServers": {
"ariadnemem": {
"command": "/path/to/python",
"args": ["/path/to/MCP/server/stdio_server.py"]
}
}
}
For remote compute (e.g. Slurm clusters), use SSH to jump to the GPU node:
{
"mcpServers": {
"ariadnemem": {
"command": "ssh",
"args": [
"-o", "StrictHostKeyChecking=no",
"-o", "LogLevel=ERROR",
"gpu-node-name",
"/path/to/python",
"/path/to/MCP/server/stdio_server.py"
]
}
}
}
HTTP mode (for programmatic clients):
cd MCP
pip install -r requirements.txt
python run.py
See MCP/README.md for full setup guide with step-by-step CoreWeave/Slurm example, tool reference, and troubleshooting.
🚧 Under Active Development: We are currently optimizing memory construction for code and math domains to better handle technical content and formal reasoning.
Key Features
| Feature | Paper Reference | Benefit |
|---|---|---|
| Entropy-Aware Gating | Eq. 2-3 | Filters noise before LLM extraction |
| Conflict-Aware Coarsening | Eq. 5-6 | Merges duplicates while preserving state updates |
| Hybrid Retrieval | Eq. 7 | Semantic + Lexical search for terminal nodes |
| Bridge Discovery | Eq. 9 | Steiner tree approximation for missing links |
| Multi-Hop Path Mining | Eq. 10 | DFS-based reasoning chain discovery |
| Topology-Aware Synthesis | Eq. 11 | Single LLM call with graph-guided reasoning |
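To make the "DFS-based reasoning chain discovery (0 LLM calls)" idea concrete, here is a minimal sketch of bounded depth-first path mining over a toy memory graph. The graph encoding and function name are assumptions for illustration, not the repo's API:

```python
def dfs_paths(graph: dict, start: str, max_depth: int = 3) -> list:
    """Enumerate reasoning chains from a terminal node by bounded DFS,
    in the spirit of Eq. 10. `graph` maps node -> list of neighbors."""
    paths, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        if len(path) > 1:
            paths.append(path)             # every multi-node prefix is a chain
        if len(path) - 1 < max_depth:      # enforce the hop budget L
            for nxt in graph.get(node, []):
                if nxt not in path:        # avoid cycles
                    stack.append((nxt, path + [nxt]))
    return paths

# Toy memory graph with entity/temporal edges between memory nodes
g = {"meet@2pm": ["meet@3pm"], "meet@3pm": ["starbucks"], "starbucks": []}
for p in dfs_paths(g, "meet@2pm"):
    print(" -> ".join(p))
```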
Comparison with Baselines
| Dimension | Flat RAG | Planning-based | AriadneMem |
|---|---|---|---|
| Retrieval | Vector search | Multi-round LLM | Graph + Algorithm |
| Multi-hop | Not supported | 3-4 LLM calls | DFS (0 LLM calls) |
| State Updates | Keep all / Conflict | Keep all | Smart merge + temporal edges |
| LLM Calls/Query | 1 | 4-6 | 1 |
| Latency | Fast | Slow | Fast |
System Architecture
                        AriadneMem Pipeline

  PHASE I: Asynchronous Memory Construction

  [Dialogue Stream D]
          │
          ▼
  ┌─────────────────────────────────┐
  │ Entropy-Aware Gating (Eq. 3)    │  ← Φ_gate: filter low-info
  │ H(m) < τ → block                │
  └─────────────────────────────────┘
          │
          ▼
  ┌─────────────────────────────────┐
  │ Atomic Extraction F_θ (Eq. 4)   │  ← LLM: dialogue → entries
  │ De-linearization transform      │
  └─────────────────────────────────┘
          │
          ▼
  ┌─────────────────────────────────┐
  │ Conflict-Aware Coarsening       │  ← Merge/Link/Add (Eq. 6)
  │ (Eq. 5-6)                       │
  │  • Static duplicates → Merge    │
  │  • State updates → Link edge    │
  └─────────────────────────────────┘
          │
          ▼
  ┌─────────────────────────────────┐
  │ VectorStore (LanceDB)           │  ← Multi-view indexing
  │  • Semantic (dense vectors)     │
  │  • Lexical (keyword/BM25)       │
  │  • Symbolic (metadata)          │
  └─────────────────────────────────┘

  PHASE II: Real-Time Structural Reasoning

  [Query q]
          │
          ▼
  ┌─────────────────────────────────┐
  │ Fast Paths (O(1) lookup)        │  ← Cache/regex short-circuit
  │ Count/List/Relation queries     │
  └─────────────────────────────────┘
          │ (if miss)
          ▼
  ┌─────────────────────────────────┐
  │ Hybrid Retrieval (Eq. 7)        │  ← Find terminal nodes V_term
  │ score = α·sim_sem + β·sim_lex   │
  └─────────────────────────────────┘
          │
          ▼
  ┌─────────────────────────────────┐
  │ Base Graph Construction (Eq. 8) │  ← Entity/temporal edges
  └─────────────────────────────────┘
          │
          ▼
  ┌─────────────────────────────────┐
  │ Bridge Discovery (Eq. 9)        │  ← Steiner tree approximation
  │ Find b* to connect V_term       │    (no LLM calls!)
  └─────────────────────────────────┘
          │
          ▼
  ┌─────────────────────────────────┐
  │ Multi-Hop Path Mining (Eq. 10)  │  ← DFS reasoning chains
  │ Discover logical paths P_q      │
  └─────────────────────────────────┘
          │
          ▼
  ┌─────────────────────────────────┐
  │ Topology-Aware Synthesis        │  ← Single LLM call
  │ (Eq. 11)                        │
  │ a = LLM(q, Serialize(G_q))      │
  └─────────────────────────────────┘
          │
          ▼
  [Answer a]
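The "Fast Paths" stage above can be pictured as a cheap regex router that short-circuits count/list/relation queries before any graph retrieval or LLM call. The patterns and function below are hypothetical, not the repo's actual rules:

```python
import re

# Hypothetical fast-path router: cheap regex checks handle count/list/
# relation queries in O(1) before falling through to hybrid retrieval.
FAST_PATTERNS = {
    "count": re.compile(r"\bhow many\b", re.I),
    "list": re.compile(r"\b(?:list|enumerate) all\b", re.I),
    "relation": re.compile(r"\brelationship between\b", re.I),
}

def classify_fast_path(query: str):
    """Return the fast-path kind, or None to fall through to retrieval."""
    for kind, pattern in FAST_PATTERNS.items():
        if pattern.search(query):
            return kind
    return None

print(classify_fast_path("How many meetings did Alice schedule?"))  # count
print(classify_fast_path("What time will they meet?"))              # None
```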
Project Structure
AriadneMem/
├── main.py                          # Main system entry point
├── config.py                        # Configuration (LLM, thresholds, prompts, modes)
├── requirements.txt                 # Dependencies
│
├── core/
│   ├── ariadne_memory_builder.py    # Phase I: Memory Construction
│   ├── ariadne_graph_retriever.py   # Phase II: Structural Reasoning
│   ├── ariadne_answer_generator.py  # Topology-Aware Synthesis
│   ├── semantic_normalizer.py       # Answer post-processing
│   └── aggregation_builder.py       # Entity aggregation
│
├── models/
│   ├── memory_entry.py              # MemoryEntry, Dialogue dataclasses
│   └── enhanced_structures.py       # EnhancedMemoryIndex, caches
│
├── database/
│   └── vector_store.py              # LanceDB vector store
│
├── utils/
│   ├── llm_client.py                # OpenAI-compatible LLM client
│   └── embedding.py                 # SentenceTransformers embeddings
│
├── dataset/
│   └── locomo10.json                # LoCoMo benchmark data
│
├── MCP/                             # MCP Server (Model Context Protocol)
│   ├── README.md                    # MCP documentation
│   ├── run.py                       # HTTP server entry point
│   ├── requirements.txt             # MCP dependencies
│   ├── mcp_config/
│   │   └── settings.py              # Server settings (inherits from config.py)
│   └── server/
│       ├── stdio_server.py          # stdio transport (recommended for Cursor)
│       ├── http_server.py           # HTTP transport (FastAPI + Streamable HTTP)
│       └── mcp_handler.py           # MCP protocol handler (7 tools)
│
├── test_locomo10.py                 # Full benchmark evaluation
├── quick_test.py                    # Quick functionality test
└── demo_multihop.py                 # Multi-hop reasoning demo
Troubleshooting
Q: How to switch to Qwen models?
# config.py
OPENAI_API_KEY = "your-qwen-api-key"
OPENAI_BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
LLM_MODEL = "qwen-plus-2025-07-28"
ENABLE_THINKING = True # Enable Qwen's deep thinking mode
Q: Multi-hop reasoning not working?
Check:
- Nodes have shared entities or temporal proximity
- Inspect discovered paths via graph_path.reasoning_paths
- Increase MAX_REASONING_PATH_DEPTH for longer chains
Q: How to adjust filtering strength?
# More aggressive filtering (fewer nodes, faster)
REDUNDANCY_THRESHOLD = 0.5
COARSENING_THRESHOLD = 0.5
# More permissive (more nodes, better recall)
REDUNDANCY_THRESHOLD = 0.7
COARSENING_THRESHOLD = 0.7
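Conceptually, COARSENING_THRESHOLD (λ_coal, Eq. 6) drives a merge-vs-link decision: near-duplicates above the threshold merge, unless they encode a state change, which is preserved as a temporal link edge instead. The sketch below is illustrative; the function and its inputs are not the repo's actual code:

```python
def coarsen(similarity: float, is_state_update: bool,
            lam_coal: float = 0.6) -> str:
    """Illustrative merge-vs-link rule behind COARSENING_THRESHOLD.

    Below the threshold the entry is distinct -> add a new node.
    Above it, static duplicates merge, but state updates become a
    temporal edge so the history (e.g. 2pm -> 3pm) is preserved.
    """
    if similarity < lam_coal:
        return "add"
    return "link" if is_state_update else "merge"

print(coarsen(0.9, False))  # static duplicate      -> merge
print(coarsen(0.9, True))   # "2pm" updated to "3pm" -> link
print(coarsen(0.3, False))  # unrelated memory       -> add
```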
Citation
@article{zhu2026ariadnemem,
title = {AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents},
author = {Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Wang, Jingjing and Dong, Xuanzhao and Huang, Minzhou and Cai, Rui and Sang, Hejian and Wang, Hao and Qiu, Peijie and Deng, Yueyue and Tiwari, Prayag and Hogan Rappazzo, Brendan and Wang, Yalin},
journal = {Preprint},
year = {2026},
url = {https://github.com/LLM-VLM-GSL/AriadneMem}
}
Acknowledgments
We would like to thank the following projects and teams:
- Codebase: SimpleMem (special thanks to the authors for their open-source contribution!)
- Embedding Models:
- all-MiniLM-L6-v2 (Sentence Transformers) - Lightweight and CPU-friendly
- Qwen3-Embedding - State-of-the-art retrieval performance
- Vector Database: LanceDB - High-performance columnar storage
- Benchmark: LoCoMo - Long-context memory evaluation framework
License
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to use, share, and adapt this work for non-commercial purposes with proper attribution. For commercial licensing, please contact the authors.
