Markdown Fastrag MCP
Fast markdown RAG with multi-provider embeddings (Vertex AI, Gemini, OpenAI, Voyage), incremental indexing with mtime/size fast-path, stale vector pruning, Milvus vector store.
Installation
npx markdown-fastrag-mcp
Markdown-FastRAG-MCP
A semantic search engine for markdown documents. An MCP server with non-blocking background indexing, multi-provider embeddings (Gemini, OpenAI, Vertex AI, Voyage), and Milvus / Zilliz Cloud vector storage, designed for multi-agent concurrent access.
This project is a fork of Zackriya-Solutions/MCP-Markdown-RAG, heavily extended for production multi-agent use. Original project is licensed under Apache 2.0.
Ask "what are the tradeoffs of microservices?" and find your notes about service boundaries, distributed systems, and API design, even if none of them mention "microservices."
graph LR
A["Claude Code"] --> M["Milvus Standalone<br/>(Docker)"]
B["Codex"] --> M
C["Copilot"] --> M
D["Antigravity"] --> M
M --> V["Shared Document Index"]
Quick Start
pip install markdown-fastrag-mcp
Add to your MCP host config:
{
  "mcpServers": {
    "markdown-rag": {
      "command": "uvx",
      "args": ["markdown-fastrag-mcp"],
      "env": {
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "${GEMINI_API_KEY}",
        "MILVUS_ADDRESS": "http://localhost:19530"
      }
    }
  }
}
Tip: Omit `MILVUS_ADDRESS` for local-only use (defaults to SQLite-based Milvus Lite).
Features
- Semantic matching: finds conceptually related content, not just keyword hits
- Multi-provider embeddings: Gemini, OpenAI, Vertex AI, Voyage, or local models
- Async background indexing: non-blocking `index_documents` returns instantly with `job_id`; poll with `get_index_status`
- Event-loop-safe threading: all sync I/O runs in worker threads via `asyncio.to_thread`
- Smart incremental indexing: mtime/size fast-path skips unchanged files without reading them
- 3-way delta scan: classifies files as new/modified/deleted in one walk; new files skip Milvus delete
- Smart chunk merging: small chunks below `MIN_CHUNK_TOKENS` are merged with siblings; parent header context injected
- Empty chunk filtering: frontmatter-only and structural-only chunks (headers/separators with no prose) are dropped at indexing and filtered at search time
- Short chunk drop: final chunks below `MIN_FINAL_TOKENS` (default 150) are dropped with per-chunk stderr logging
- Reconciliation sweep: after each index run, queries all Milvus paths and deletes orphan vectors whose source files no longer exist on disk
- Search dedup: per-file result limiting prevents a single document from dominating results
- Scoped search & pruning: `scope_path` filters results to subdirectories; pruning never wipes unrelated data
- Batch embedding & insert: concurrent batches with 429 retry, chunked Milvus inserts under gRPC 64MB limit
- Shell reindex CLI: `reindex.py` for large-scale indexing with real-time progress logs
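The 3-way delta scan and the mtime/size fast-path listed above can be illustrated with a minimal sketch. This is not the project's actual code: `delta_scan`, the `Delta` container, and the `tracked` map of `{path: (mtime_ns, size)}` are hypothetical names chosen for the example, and the real tracking state may be stored differently.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Delta:
    new: list = field(default_factory=list)
    modified: list = field(default_factory=list)
    deleted: list = field(default_factory=list)
    unchanged: list = field(default_factory=list)


def delta_scan(root: str, tracked: dict) -> Delta:
    """Classify .md files under root against a {path: (mtime_ns, size)} map.

    Hypothetical helper: one directory walk, metadata only.
    """
    delta = Delta()
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".md"):
                continue
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # file contents are never read here
            seen.add(path)
            signature = (st.st_mtime_ns, st.st_size)
            if path not in tracked:
                delta.new.append(path)        # nothing stored yet, so no delete needed
            elif tracked[path] != signature:
                delta.modified.append(path)   # old vectors must be replaced
            else:
                delta.unchanged.append(path)  # fast path: skip entirely
    # Tracked paths that the walk never saw have been removed from disk.
    delta.deleted = sorted(set(tracked) - seen)
    return delta
```

Only the `new` and `modified` lists would go on to chunking and embedding; `deleted` feeds pruning, and `unchanged` costs one `stat` call per file.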
Documentation
| Document | Description |
|---|---|
| Embedding Providers | All 6 providers: setup, auth, tuning, rate limiting |
| Milvus / Zilliz Setup | Lite vs Standalone vs Zilliz Cloud, Docker Compose, troubleshooting |
| Indexing Architecture | Non-blocking flow, to_thread, 3-way delta, reconciliation sweep |
| Optimization | Chunk merging, header injection, batch insert, search dedup |
Tools
| Tool | Description |
|---|---|
| `index_documents` | Start background index job, returns `job_id` instantly |
| `get_index_status` | Poll job status (running / succeeded / failed) |
| `search_documents` | Semantic search with relevance scores and file paths |
| `clear_index` | Reset vector database and tracking state |
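The start-then-poll pattern behind `index_documents` and `get_index_status` could look roughly like the sketch below. It only illustrates the idea of handing blocking work to `asyncio.to_thread` and tracking it by job ID; the `JOBS` registry, `blocking_index`, and the `"unknown"` status are assumptions made for the example, not the server's real internals.

```python
import asyncio
import uuid

JOBS = {}       # job_id -> status dict (hypothetical in-memory registry)
_TASKS = set()  # hold task references so they are not garbage-collected


def blocking_index(path: str) -> int:
    """Stand-in for the synchronous work (walk, chunk, embed, insert)."""
    return len(path)


async def _run_job(job_id: str, path: str) -> None:
    try:
        # The blocking call runs in a worker thread; the event loop stays free.
        result = await asyncio.to_thread(blocking_index, path)
        JOBS[job_id] = {"status": "succeeded", "result": result}
    except Exception as exc:
        JOBS[job_id] = {"status": "failed", "error": str(exc)}


async def index_documents(path: str) -> str:
    """Register the job, schedule it, and return the job ID immediately."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "running"}
    task = asyncio.create_task(_run_job(job_id, path))
    _TASKS.add(task)
    task.add_done_callback(_TASKS.discard)
    return job_id


async def get_index_status(job_id: str) -> dict:
    return JOBS.get(job_id, {"status": "unknown"})


async def main() -> dict:
    job_id = await index_documents("./notes")
    # Poll until the job leaves the "running" state.
    while (status := await get_index_status(job_id))["status"] == "running":
        await asyncio.sleep(0.05)
    return status
```

Because the caller gets a job ID back before any indexing starts, several agents can keep issuing searches while a long index run is in progress.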
How It Works
flowchart LR
A["Markdown Files"] -->|"walk + filter"| B["Delta Scan<br/>mtime/size"]
B -->|changed| C["Chunk + Merge"]
B -->|unchanged| SKIP["Skip"]
B -->|deleted| PRUNE["Prune"]
C --> D["Embed"]
D -->|"batch insert"| E["Milvus"]
F["Query"] --> D
D -->|"k×5"| G["Dedup + Top-K"]
style A fill:#2d3748,color:#e2e8f0
style D fill:#553c9a,color:#e9d8fd
style E fill:#2a4365,color:#bee3f8
style G fill:#22543d,color:#c6f6d5
style PRUNE fill:#742a2a,color:#fed7d7
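The query path in the diagram (over-fetch, then "Dedup + Top-K") can be sketched as follows. This assumes hits arrive as dicts with `path` and `score` keys and that a higher score means more relevant; `query_fn`, `dedup_top_k`, and the `OVERFETCH` constant are hypothetical names, with the factor of 5 read off the "k×5" edge label.

```python
from collections import Counter

OVERFETCH = 5  # assumption: taken from the "k×5" label in the diagram


def dedup_top_k(hits, k, max_per_file=1):
    """Keep the best k hits, allowing at most max_per_file per source file."""
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    if max_per_file <= 0:  # 0 turns per-file limiting off
        return ranked[:k]
    per_file = Counter()
    kept = []
    for hit in ranked:
        if per_file[hit["path"]] < max_per_file:
            per_file[hit["path"]] += 1
            kept.append(hit)
            if len(kept) == k:
                break
    return kept


def search(query_fn, k, max_per_file=1):
    """query_fn(limit=...) is a hypothetical stand-in for the vector search."""
    candidates = query_fn(limit=k * OVERFETCH)
    return dedup_top_k(candidates, k, max_per_file)
```

Fetching more candidates than requested leaves room to discard extra chunks from the same file and still return `k` results.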
Configuration
Core
| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_PROVIDER` | `local` | `gemini`, `openai`, `openai-compatible`, `vertex`, `voyage` |
| `EMBEDDING_DIM` | `768` | Vector dimension |
| `MILVUS_ADDRESS` | `.db/milvus_markdown.db` | Milvus address or local file path |
| `MARKDOWN_WORKSPACE` | (none) | Lock workspace root |
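As an illustration of how the Core variables and their defaults fit together, the sketch below reads them from the environment. The `CoreConfig` dataclass and `load_core_config` function are hypothetical; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CoreConfig:
    embedding_provider: str
    embedding_dim: int
    milvus_address: str
    workspace: Optional[str]


def load_core_config(env: Mapping[str, str] = os.environ) -> CoreConfig:
    """Hypothetical loader: falls back to the documented defaults."""
    return CoreConfig(
        embedding_provider=env.get("EMBEDDING_PROVIDER", "local"),
        embedding_dim=int(env.get("EMBEDDING_DIM", "768")),
        milvus_address=env.get("MILVUS_ADDRESS", ".db/milvus_markdown.db"),
        workspace=env.get("MARKDOWN_WORKSPACE") or None,  # unset means no lock
    )
```

With no variables set, this yields the local provider, 768 dimensions, and the file-based Milvus path, which matches the local-only setup described in Quick Start.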
Indexing
| Variable | Default | Description |
|---|---|---|
| `MARKDOWN_CHUNK_SIZE` | `2048` | Token chunk size |
| `MARKDOWN_CHUNK_OVERLAP` | `100` | Token overlap between chunks |
| `MIN_CHUNK_TOKENS` | `300` | Small-chunk merge threshold |
| `MIN_FINAL_TOKENS` | `150` | Drop final chunks below this token count |
| `DEDUP_MAX_PER_FILE` | `1` | Max results per file (0 = off) |
| `EMBEDDING_BATCH_SIZE` | `250` | Texts per API call |
| `EMBEDDING_CONCURRENT_BATCHES` | `4` | Parallel batches |
| `EMBEDDING_BATCH_DELAY_MS` | `0` | Delay (ms) between batch waves |
| `MILVUS_INSERT_BATCH` | `5000` | Rows per Milvus insert (gRPC 64MB limit) |
Tip: Defaults work well for most vaults. Adjust `MIN_CHUNK_TOKENS` / `MIN_FINAL_TOKENS` if short notes are being dropped unexpectedly. Changes require a force reindex (`reindex.py --force`).

See Embedding Providers for full auth and tuning options.
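To see why short notes can disappear, it may help to picture the two thresholds as a merge pass followed by a filter. The sketch below is a simplified guess at that behavior, not the project's chunker: `merge_and_filter` is a hypothetical name, whitespace-separated words stand in for real token counts, and parent header injection is left out.

```python
def count_tokens(text: str) -> int:
    # Assumption: words approximate tokens; a real tokenizer would differ.
    return len(text.split())


def merge_and_filter(chunks, min_chunk_tokens=300, min_final_tokens=150):
    """Merge consecutive small chunks, then drop results that stay too short."""
    merged = []
    buffer = ""
    for chunk in chunks:
        # Accumulate sibling chunks until the merge threshold is reached.
        buffer = f"{buffer}\n\n{chunk}" if buffer else chunk
        if count_tokens(buffer) >= min_chunk_tokens:
            merged.append(buffer)
            buffer = ""
    if buffer:
        merged.append(buffer)  # trailing leftover may still be short
    # Final pass: anything under the floor is discarded.
    return [c for c in merged if count_tokens(c) >= min_final_tokens]
```

Under this model, a note whose total length is below `MIN_FINAL_TOKENS` has nothing to merge with and is dropped, which is the case the tip above suggests tuning for.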
Performance
| Metric | Result |
|---|---|
| Unchanged files: hash computations | 0 (mtime/size fast-path) |
| Changed file: embed + insert | ~3 seconds |
| No changes: full scan | instant |
| Full reindex (1300 files, 23K chunks) | ~7–8 minutes |
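For a rough sense of scale, the full-reindex row can be combined with the default batch settings from the Indexing table. The arithmetic below is only a back-of-the-envelope sketch: `batch_plan` is a hypothetical helper, 23,000 is a rounding of "23K chunks", and it assumes embedding batches are dispatched in waves of `EMBEDDING_CONCURRENT_BATCHES`.

```python
import math


def batch_plan(total_chunks, batch_size=250, concurrent=4, insert_batch=5000):
    """Estimate API calls, concurrent waves, and Milvus inserts for a run."""
    api_calls = math.ceil(total_chunks / batch_size)
    waves = math.ceil(api_calls / concurrent)
    inserts = math.ceil(total_chunks / insert_batch)
    return {"api_calls": api_calls, "waves": waves, "milvus_inserts": inserts}


print(batch_plan(23_000))
# {'api_calls': 92, 'waves': 23, 'milvus_inserts': 5}
```

At these defaults a full reindex would need about 92 embedding requests and 5 inserts, so provider latency and rate limits are likely to account for most of the wall-clock time.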
License
Apache 2.0; see LICENSE for full text.
This project is a fork of MCP-Markdown-RAG by Zackriya Solutions. Original project is licensed under Apache 2.0; this fork maintains the same license.
Key additions over upstream:
- Multi-provider embeddings (Gemini, Vertex AI, OpenAI, Voyage)
- Milvus vector store replacing Qdrant
- Non-blocking background indexing with `asyncio.to_thread`
- 3-way delta scan (new/modified/deleted)
- Smart chunk merging with parent header injection
- Empty chunk filtering (frontmatter-only / structural-only drop)
- Short chunk drop (final chunks below 150 tokens with per-chunk logging)
- Reconciliation sweep (Milvus↔disk ghost vector cleanup)
- Scoped search & pruning, batch embedding, shell CLI
- VS Code Copilot MCP compatibility (dummy params for zero-required-arg tools)
