Flywheel Memory

MCP server giving AI a knowledge graph over Obsidian vaults. 13-layer scoring that learns. Local-first, zero cloud.

Flywheel

Flywheel turns your Obsidian vault into safe local memory for AI agents.

Search your notes with real context, write back safely, and keep your Markdown on your machine.

Get Started · See It Work · What It Does · Skills + Flywheel · Benchmarks · Testing · Documentation · License

Flywheel is a local-first memory layer for AI agents working over Obsidian and plain Markdown. It gives agents grounded search across your notes, safe bounded writes back into the vault, and local indexes that stay on your machine.

Built for people who want AI to work over real notes without handing their vault to a cloud app. Your files stay readable Markdown, semantic search is optional and local, and every write is inspectable and reversible.

Grounded search — find the notes that matter, plus the linked context around them, without making the model trawl through a pile of files.
Safe reversible writes — update a live vault through bounded operations that preserve Markdown structure and can be undone.
Local-first by default — keep your notes on disk, use plain Markdown as the source of truth, and add local semantic search only if you want it.
Portable by construction — note text and entity edits land in Markdown. Flywheel's learned state (memories, link feedback, scoring) lives locally beside the vault in .flywheel/state.db. No cloud, no proprietary export needed — the parts that are your notes stay readable Markdown by definition.

Why not raw file access or naive RAG?

Raw file access gives an agent text, not memory. Flywheel adds better ranking across notes, linked context that helps the model stay grounded, and write operations that are bounded enough to trust inside a live vault.

A 30-second workflow

From the carter-strategy demo:

Ask: "How much have I billed Acme Corp?"
Flywheel searches the right notes, returns connected context, and answers from the vault.
If you want to act on the result, the same session can log follow-ups or update the right note as a visible, bounded change.

Get Started

Quick start

git clone https://github.com/velvetmonkey/flywheel-memory.git
cd flywheel-memory/demos/carter-strategy && claude

Then ask: "How much have I billed Acme Corp?"

| Demo | You are | Ask this |
| --- | --- | --- |
| carter-strategy | Solo consultant | "How much have I billed Acme Corp?" |
| artemis-rocket | Rocket engineer | "What's blocking propulsion?" |
| nexus-lab | PhD researcher | "How does AlphaFold connect to my experiment?" |
| zettelkasten | Zettelkasten student | "How does spaced repetition connect to active recall?" |

Ready to use Flywheel against your own notes instead of the demo? Install on your vault:

Your Vault in 2 Minutes

Install Flywheel as an Agent Skill (the standard onboarding for AI tooling — npx skills uses GitHub as a registry):

1. Install the skill. From any directory:
```
npx -y skills add velvetmonkey/flywheel-memory -g
```
Drop -g to install at project scope (<vault>/.claude/skills/) instead of global (~/.claude/skills/). The skill teaches an agent when to use Flywheel and walks the user through the next two steps automatically.
2. Wire the MCP server. From your vault directory:
```
bash <(curl -fsSL https://raw.githubusercontent.com/velvetmonkey/flywheel-memory/main/skills/flywheel/scripts/install.sh)
```
This merges Flywheel into <vault>/.mcp.json. Windows users: install.ps1 performs the same idempotent merge for PowerShell.
3. Restart your client (claude / codex) from the vault directory. MCP servers register at startup only.

Then ask a question. Flywheel watches the vault, maintains local indexes, and serves structured context to MCP clients. Your source of truth stays in Markdown. If you delete .flywheel/state.db, Flywheel rebuilds note indexes from the vault — learned state (memories, link feedback) regenerates with use.

Manual install (no installers — for Cursor, Windsurf, VS Code, Continue.dev, etc.)

If you'd rather hand-edit .mcp.json (e.g. integrating with a non-Claude-Code client), add this block to your client's MCP config:

{
  "mcpServers": {
    "flywheel": {
      "command": "npx",
      "args": ["-y", "@velvetmonkey/flywheel-memory"]
    }
  }
}

cd /path/to/your/vault && claude

The skill itself is optional in this path — clients without a skills surface still get the full MCP tool set. Skill source: skills/flywheel/.
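
If you would rather script the hand edit, the sketch below shows one way an idempotent merge into .mcp.json could work. It is an illustration, not the logic of install.sh or install.ps1: the function name `merge_flywheel` is hypothetical, and only the server entry itself is taken from the block above.

```python
import json
from pathlib import Path

# Server entry copied from the manual-install block above.
FLYWHEEL_ENTRY = {
    "command": "npx",
    "args": ["-y", "@velvetmonkey/flywheel-memory"],
}


def merge_flywheel(config_path: Path) -> dict:
    """Add or refresh the flywheel entry, leaving other servers untouched."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    # Assigning by key (rather than appending) is what makes reruns idempotent.
    servers["flywheel"] = dict(FLYWHEEL_ENTRY)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `merge_flywheel(Path(".mcp.json"))` from the vault directory twice should leave the file identical after the second run.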

Optional: Tool presets

The agent preset (default) provides a focused set of core tools. Use power for tier 1+2 (adds wikilinks, corrections, note-ops, schema), full to expose the entire tool surface immediately, or auto for the full surface plus the informational discover_tools helper.

| Preset | Tools | Categories | Behaviour |
| --- | --- | --- | --- |
| `agent` (default) | 13 | search, read, write, tasks, memory, diagnostics | Focused tier-1 surface: search, read, write, tasks, memory |
| `power` | 17 | search, read, write, tasks, memory, diagnostics, wikilinks, corrections, note-ops, schema | Tier 1+2: agent + wikilinks, corrections, note-ops, schema |
| `full` | 19 | search, read, write, tasks, memory, diagnostics, wikilinks, corrections, note-ops, schema, graph, temporal | All categories visible at startup |
| `auto` | 20 | search, read, write, graph, schema, wikilinks, corrections, tasks, memory, note-ops, temporal, diagnostics | Full surface + informational `discover_tools` helper |

Claude Code note: the merged `memory` tool is suppressed under Claude Code (CLAUDECODE=1) because Claude Code ships its own memory plane. The agent preset therefore exposes 12 tools under Claude Code instead of 13; the briefing entrypoint still works as memory(action: "brief").

Compose bundles for custom configurations:

{
  "mcpServers": {
    "flywheel": {
      "command": "npx",
      "args": ["-y", "@velvetmonkey/flywheel-memory"],
      "env": {
        "FLYWHEEL_TOOLS": "agent,graph"
      }
    }
  }
}
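
To reason about what a composed value such as "agent,graph" might expose, here is a small sketch. It assumes that each comma-separated token is either a preset name or a single category and that the result is their union; that reading comes from the example above, not from the server's source. The `PRESETS` mapping copies two rows of the preset table, and `resolve_categories` is a hypothetical helper.

```python
# Category sets copied from the preset table; `full` and `auto` are omitted for brevity.
PRESETS = {
    "agent": {"search", "read", "write", "tasks", "memory", "diagnostics"},
    "power": {
        "search", "read", "write", "tasks", "memory", "diagnostics",
        "wikilinks", "corrections", "note-ops", "schema",
    },
}


def resolve_categories(flywheel_tools: str) -> set[str]:
    """Expand preset names and treat any other token as one category (assumed)."""
    categories: set[str] = set()
    for token in flywheel_tools.split(","):
        token = token.strip()
        if not token:
            continue
        categories |= PRESETS.get(token, {token})
    return categories


print(sorted(resolve_categories("agent,graph")))
```

Under these assumptions, "agent,graph" resolves to the six agent categories plus graph.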

Browse all tools -> | Preset recipes ->

Multiple vaults

Serve more than one vault from a single Flywheel instance with FLYWHEEL_VAULTS:

{
  "mcpServers": {
    "flywheel": {
      "command": "npx",
      "args": ["-y", "@velvetmonkey/flywheel-memory"],
      "env": {
        "FLYWHEEL_VAULTS": "personal:/home/you/obsidian/Personal,work:/home/you/obsidian/Work"
      }
    }
  }
}

Search automatically spans all vaults and tags each result with its source vault. Each vault keeps separate indexes, graph state, file watchers, and config.
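
The FLYWHEEL_VAULTS value in the example is a comma-separated list of name:path pairs. A minimal sketch of parsing that shape follows; it assumes the format is exactly what the example shows, and `parse_vaults` is a hypothetical helper rather than Flywheel's own parser.

```python
def parse_vaults(value: str) -> dict[str, str]:
    """Split 'name:path,name:path' into a {name: path} mapping."""
    vaults: dict[str, str] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, so a path like C:\Notes keeps its drive letter.
        name, sep, path = entry.partition(":")
        if not sep or not name or not path:
            raise ValueError(f"expected name:path, got {entry!r}")
        vaults[name] = path
    return vaults


print(parse_vaults("personal:/home/you/obsidian/Personal,work:/home/you/obsidian/Work"))
```

A path that itself contains a comma would not survive this split, so treat the sketch as a way to sanity-check a value before putting it in .mcp.json.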

Full multi-vault configuration -> | Client setup examples ->

Windows users

If you ran install.ps1 in step 2 above, the Windows-specific config (cmd /c npx and FLYWHEEL_WATCH_POLL: "true") is already written into your .mcp.json automatically — no further action needed.

If you're hand-editing .mcp.json instead, three things differ from macOS and Linux:

Use cmd /c npx instead of npx. On Windows, npx is installed as a .cmd script and cannot be spawned directly.
Set VAULT_PATH to your vault's Windows path.
Set FLYWHEEL_WATCH_POLL: "true". Without polling, Flywheel will not reliably pick up changes made from Obsidian on Windows.

See docs/CONFIGURATION.md#windows for the full example.
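
As a quick way to see the three differences together, the sketch below prints a candidate Windows config. How `cmd /c npx` is split between "command" and "args" is an assumption, and the vault path is a placeholder; docs/CONFIGURATION.md#windows remains the authoritative example.

```python
import json


def windows_mcp_config(vault_path: str) -> dict:
    """Build an .mcp.json structure using the three Windows-specific settings."""
    return {
        "mcpServers": {
            "flywheel": {
                # Assumed split of "cmd /c npx" into command and args.
                "command": "cmd",
                "args": ["/c", "npx", "-y", "@velvetmonkey/flywheel-memory"],
                "env": {
                    "VAULT_PATH": vault_path,
                    "FLYWHEEL_WATCH_POLL": "true",
                },
            }
        }
    }


# Placeholder path; replace with your own vault location.
print(json.dumps(windows_mcp_config(r"C:\Users\you\obsidian\MyVault"), indent=2))
```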

If you use Cursor, Windsurf, VS Code, OpenClaw, or another client, see docs/SETUP.md for client-specific configuration. For OpenClaw, use the dedicated OpenClaw integration guide.

See It Work

Voice: The learning loop

From the carter-strategy demo: log a call by voice, watch wikilinks and suggestions appear, accept and reject a few, then log again — the suggestions improve immediately.

https://github.com/user-attachments/assets/cb9e4945-7f0b-410d-85ef-0c42ffc18c6e

https://github.com/user-attachments/assets/bfdae034-6217-426e-bb1d-ff8e2f0d4bc3

https://github.com/user-attachments/assets/4a0635ff-dd73-4fb1-933d-bf384822e2ce

Write: Auto-wikilinks on mutation

> Log that Stacy reviewed the security checklist before the Beta Corp kickoff

flywheel -> edit_section action=add
  path: "daily-notes/2026-01-04.md"
  section: "Log"
  suggestOutgoingLinks: true
  content: "[[Stacy Thompson|Stacy]] reviewed the [[API Security Checklist|security checklist]]
            before the [[Beta Corp Dashboard|Beta Corp]] kickoff
            -> [[GlobalBank API Audit]], [[Acme Data Migration]]"

You type a normal sentence. Flywheel resolves known entities, detects prospective entities (proper nouns, acronyms, CamelCase terms), adds wikilinks, and suggests related links based on aliases, co-occurrence, graph structure, and semantic context. Suggested outgoing links are optional and off by default. Enable them where you want the graph to grow naturally, such as daily notes, meeting logs, or voice capture. Configuration guide ->
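
To make the alias-resolution step concrete, here is a deliberately simple sketch that links the first mention of each known entity. It is not Flywheel's algorithm (which also uses co-occurrence, graph structure, and semantic context); the `ENTITIES` index and the `add_wikilinks` function are hypothetical, with names borrowed from the example above.

```python
import re

# Hypothetical entity index: note title -> aliases that should resolve to it.
ENTITIES = {
    "Stacy Thompson": ["Stacy"],
    "API Security Checklist": ["security checklist"],
    "Beta Corp Dashboard": ["Beta Corp"],
}


def add_wikilinks(text: str, entities: dict[str, list[str]]) -> str:
    """Link the first whole-word mention of each entity in a single pass."""
    # Map every lowercase alias (and the title itself) back to its note title.
    lookup = {}
    for title, aliases in entities.items():
        for alias in [title, *aliases]:
            lookup[alias.lower()] = title
    # Longest alternatives first so a full title wins over a shorter alias.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE
    )
    linked = set()

    def replace(match: re.Match) -> str:
        surface = match.group(0)
        title = lookup[surface.lower()]
        if title in linked:
            return surface  # link each entity only once
        linked.add(title)
        return f"[[{title}]]" if surface == title else f"[[{title}|{surface}]]"

    return pattern.sub(replace, text)


print(add_wikilinks(
    "Stacy reviewed the security checklist before the Beta Corp kickoff", ENTITIES
))
```

The sketch does not skip text that is already inside a wikilink or a code block, which a real implementation would need to do.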

Boundaries

Writes happen through visible tool calls.
Changes stay within the vault unless you explicitly point a tool somewhere else.
Git commits are opt-in.
Proactive linking can be disabled.

Reproduce it yourself: The carter-strategy demo includes a run-demo-test.sh script that runs the full sequence end to end with claude -p, checking tool usage and vault state between steps.

Policy example: Search the vault, then act on it

> Create a policy that finds overdue invoices and logs follow-up tasks in today's daily note

flywheel -> policy action=author
  description: "Find invoices with status:sent, create follow-up task list in daily note"
  ✓ Saved to .flywheel/policies/overdue-invoice-chaser.yaml

> Preview the overdue-invoice-chaser policy

flywheel -> policy action=preview name=overdue-invoice-chaser
  Step 1: vault_search: query "type:invoice status:sent" in invoices/ -> 3 results
  Step 2: edit_section: would append to daily-notes/2026-03-31.md#Tasks
  (no changes made; preview only)

> Execute it

flywheel -> policy action=execute name=overdue-invoice-chaser
  ✓ 2 steps executed, 1 note modified, committed as single git commit

Policies search the vault, then write back. Author them in plain language, preview before running, and undo with one call if needed. Policies guide -> | Examples ->

What It Does

Search with context

One search call returns enough context for the model to answer grounded questions: frontmatter, section-aware snippets, dates, and linked notes that matter. Keyword search (BM25) handles exact terms. Optional local semantic search helps when the right note is related but not explicitly linked yet. Together they reduce file-hopping and make answers more reliable over a real vault. How search works ->
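
For readers unfamiliar with BM25, the sketch below implements the textbook Okapi BM25 formula (with the non-negative IDF variant) over pre-tokenized documents. It is not Flywheel's search code; the function name, the `k1` and `b` defaults, and the sample documents are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes long documents relative to the average.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical tokenized notes.
docs = [
    ["acme", "invoice", "paid"],
    ["rocket", "engine", "test"],
    ["acme", "acme", "kickoff", "notes"],
]
print(bm25_scores(["acme", "invoice"], docs))
```

The first document matches both query terms and should rank highest, while the second shares no terms and scores zero.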

Write safely

Every mutation is conflict-detected with a SHA-256 content hash and reversible with one undo. Writes preserve Markdown structure, so edits do not corrupt tables, callouts, code blocks, frontmatter, links, comments, or math. Auto-wikilinks stay deterministic and traceable. For one-off edits, use the direct write tools. For repeatable workflows that search the vault and act on the results, use policies: saved YAML workflows that branch on vault state and run multiple write steps as a single atomic operation. How scoring works -> | Policies guide ->
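
Hash-based conflict detection can be sketched in a few lines: record the file's SHA-256 when you read it, and refuse to write if the hash has changed by the time you write. This is a generic illustration of the technique, not Flywheel's implementation; `ConflictError`, `content_hash`, and `guarded_write` are hypothetical names.

```python
import hashlib
from pathlib import Path


class ConflictError(Exception):
    """Raised when the file changed after it was read."""


def content_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def guarded_write(path: Path, new_text: str, expected_hash: str) -> str:
    """Write only if the file still matches the hash taken at read time."""
    if content_hash(path) != expected_hash:
        raise ConflictError(f"{path} changed since it was read")
    path.write_text(new_text, encoding="utf-8")
    return content_hash(path)  # hash to use for the next write
```

The check and the write are two separate steps here, so a production version would also need a lock or an atomic rename to close the gap between them.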

Build memory over time

Every accepted link strengthens the graph. Every rejected link updates the scorer. Every write adds more context for the next read. memory(action: "brief") assembles a token-budgeted summary of recent activity, and memory persists observations with confidence decay. The graph can be exported through graph(action: "export") as GraphML for visualization in tools like Gephi or NetworkX — see the carter-strategy demo for an example. Configuration ->
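
Once you have a GraphML export, you can inspect it without extra dependencies. The sketch below counts how many edges touch each node using only the standard library. The `SAMPLE` document is made up for illustration and says nothing about the attributes Flywheel actually writes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical export; a real file would come from graph(action: "export").
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="vault" edgedefault="directed">
    <node id="Acme Corp"/>
    <node id="Acme Data Migration"/>
    <node id="Stacy Thompson"/>
    <edge source="Acme Data Migration" target="Acme Corp"/>
    <edge source="Stacy Thompson" target="Acme Corp"/>
  </graph>
</graphml>"""


def degree_counts(graphml_text: str) -> Counter:
    """Count edge endpoints per node id (in-degree plus out-degree)."""
    root = ET.fromstring(graphml_text)
    degrees = Counter()
    for edge in root.iterfind(".//g:edge", NS):
        degrees[edge.get("source")] += 1
        degrees[edge.get("target")] += 1
    return degrees


print(degree_counts(SAMPLE).most_common(1))
```

If NetworkX is installed, `networkx.read_graphml(path)` loads the same file into a graph object for richer analysis.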

Skills + Flywheel

Skills encode methodology: how to do something. Flywheel encodes knowledge: what you know. They are complementary layers:

| Layer | What it provides | Example |
| --- | --- | --- |
| Skills | Procedures, templates, reasoning frameworks | "How to write a client proposal" |
| Flywheel | Entities, relationships, history, context | "Everything you know about this client" |

An agent calling a proposal-writing skill works better when it can also search your vault for the client's history, past invoices, project notes, and team relationships. Skills tell agents how to work. Flywheel tells them what you know.

Other tools in the agent-context space treat knowledge as a side-effect of skill execution: state files written by a harness, scoped to its conventions. Flywheel treats knowledge as the substrate skills run on top of — an entity graph, temporal history, and grounded retrieval that any skill in any harness can read through MCP. A harness needs Flywheel-shaped state to be accurate; Flywheel works with any harness.

For install steps, see Your Vault in 2 Minutes above. Skill source and example queries: skills/flywheel/.

OpenClaw skills and Flywheel connect through MCP. OpenClaw routes intent and manages session flow; Flywheel provides the structured context and safe writes that make responses accurate. Integration guide ->

The Flywheel Suite

Flywheel Memory is the core memory engine. Flywheel Crank is the Obsidian plugin that visualizes the same local graph and workflows. Start with Flywheel Memory; add Crank when you want a UI around the same vault.

Benchmarks

Agent-first tools should prove their claims. Flywheel ships with reproducible benchmarks against academic retrieval standards:

HotpotQA full end to end: 90.0% document recall on 500 questions / 4,960 docs. Latest artifact: April 10, 2026. Cost in that run: $0.083/question.
LoCoMo full end to end: 81.9% evidence recall and 54.0% answer accuracy on 695 scored questions / 272 sessions. Latest artifact: April 10, 2026. Final token F1: 0.431.
LoCoMo unit retrieval: 84.8% Recall@5 and 90.4% Recall@10 on the full non-adversarial retrieval set.

Every number below ties back to a checked-in report or reproducible harness in the repo.

Multi-hop retrieval vs. academic baselines (HotpotQA, 500 questions, 4,960 documents):

| System | Recall | Training data |
| --- | --- | --- |
| BM25 baseline | ~75% | None |
| TF-IDF + Entity | ~80% | None |
| Baleen (Stanford) | ~85% | HotpotQA |
| MDR (Facebook) | ~88% | HotpotQA |
| Flywheel | 90.0% | None |
| Beam Retrieval | ~93% | End-to-end |

Conversational memory retrieval (LoCoMo, 1,531 scored retrieval queries, 272 session notes):

| Category | Recall@5 | Recall@10 |
| --- | --- | --- |
| Overall | 84.8% | 90.4% |
| Single-hop | 88.1% | 91.7% |
| Commonsense | 95.4% | 98.3% |
| Multi-hop | 58.1% | 72.7% |
| Temporal | 56.9% | 67.4% |
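
For orientation, Recall@k can be computed as sketched below. The definition used here (per-query fraction of gold evidence found in the top k results, averaged over queries) is an assumption; the benchmark harness in demos/locomo/ may define it differently, and the sample ids are invented.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Average, over queries, the share of gold evidence found in the top k."""
    per_query = []
    for ranked, gold in zip(ranked_ids, gold_ids):
        if not gold:
            continue  # skip queries with no labelled evidence
        top_k = set(ranked[:k])
        per_query.append(len(gold & top_k) / len(gold))
    return sum(per_query) / len(per_query) if per_query else 0.0


# Hypothetical retrieval results for two queries.
ranked = [["a", "b", "c"], ["x", "y", "z"]]
gold = [{"a", "c"}, {"q"}]
print(recall_at_k(ranked, gold, k=2), recall_at_k(ranked, gold, k=3))
```

Raising k can only keep recall the same or increase it, which is why the Recall@10 column is never below Recall@5.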

E2E with Claude Sonnet (latest checked-in 695-question run): 97.4% single-hop evidence recall, 73.7% multi-hop evidence recall, 81.9% overall evidence recall, and 54.0% answer accuracy (Claude Haiku judge). Full methodology and caveats ->

Directional, not apples-to-apples. Test settings, sample sizes, retrieval pools, and metrics differ. Flywheel searches 4,960 pooled docs, which is harder than the standard HotpotQA distractor setting of 10 docs and much smaller than fullwiki. Academic retrievers are trained on the benchmark; Flywheel uses no benchmark training data. Expect about 1 percentage point of run-to-run variance from LLM non-determinism. Full caveats ->

demos/hotpotqa/ · demos/locomo/ · Full methodology ->

Testing

3,292 defined tests across 185 test files and about 64.4k lines of test code. CI runs focused jobs on Ubuntu, plus a full matrix on Ubuntu and Windows across Node 22 and 24.

Graph quality: Latest generated report shows balanced-mode 50.6% precision / 66.7% recall / 57.6% F1 on the primary synthetic vault, along with multi-generation, archetype, chaos, and regression coverage. Report ->
Live AI testing: Real claude -p sessions verify tool adoption end to end, not just handler logic.
Write safety: Git-backed conflict detection, atomic rollback, and 100 parallel writes with zero corruption in the checked-in test suite.
Security: Coverage includes SQL injection, path traversal, Unicode normalization, and permission bypass cases.
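
As a sanity check on the graph-quality figures, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages quoted above, so it agrees with the report only to within rounding; `f1_score` is a hypothetical helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; zero when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded inputs from the report: 50.6% precision, 66.7% recall.
print(f"{f1_score(0.506, 0.667):.1%}")
```

The result lands about a tenth of a point below the reported 57.6%, which is the kind of gap that rounding the inputs to one decimal place can produce.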

Full methodology and results ->

Documentation

| Doc | Why read it |
| --- | --- |
| PROVE-IT.md | Start here to see the project working quickly |
| TOOLS.md | Full tool reference |
| COOKBOOK.md | Example prompts by use case |
| SETUP.md | Full setup guide for your vault |
| CONFIGURATION.md | Environment variables, presets, and custom tool sets |
| ALGORITHM.md | Link scoring and search ranking details |
| ARCHITECTURE.md | Indexing, graph, and auto-wikilink design |
| TESTING.md | Benchmarks, methodology, and test coverage |
| TROUBLESHOOTING.md | Diagnostics and recovery |
| SHARING.md | Privacy notes, tracked data, and shareable stats |
| VISION.md | Project direction and longer-term goals |

License

Apache-2.0. See LICENSE for details.