Markdown Fastrag MCP

Fast markdown RAG with multi-provider embeddings (Vertex AI, Gemini, OpenAI, Voyage), incremental indexing with mtime/size fast-path, stale vector pruning, Milvus vector store.

Installation

npx markdown-fastrag-mcp

Markdown-FastRAG-MCP

A semantic search engine for markdown documents. An MCP server with non-blocking background indexing, multi-provider embeddings (Gemini, OpenAI, Vertex AI, Voyage), and Milvus / Zilliz Cloud vector storage — designed for multi-agent concurrent access.

This project is a fork of Zackriya-Solutions/MCP-Markdown-RAG, heavily extended for production multi-agent use. The original project is licensed under Apache 2.0.

Ask "what are the tradeoffs of microservices?" and find your notes about service boundaries, distributed systems, and API design — even if none of them mention "microservices."

graph LR
    A["Claude Code"] --> M["Milvus Standalone<br/>(Docker)"]
    B["Codex"] --> M
    C["Copilot"] --> M
    D["Antigravity"] --> M
    M --> V["Shared Document Index"]

Quick Start

pip install markdown-fastrag-mcp

Add to your MCP host config:

{
  "mcpServers": {
    "markdown-rag": {
      "command": "uvx",
      "args": ["markdown-fastrag-mcp"],
      "env": {
        "EMBEDDING_PROVIDER": "gemini",
        "GEMINI_API_KEY": "${GEMINI_API_KEY}",
        "MILVUS_ADDRESS": "http://localhost:19530"
      }
    }
  }
}

Tip: Omit MILVUS_ADDRESS for local-only use. It then defaults to the SQLite-based Milvus Lite, stored in the local file .db/milvus_markdown.db.
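
As a rough illustration of the tip above, the sketch below shows one way a server could decide between a remote Milvus instance and a local Milvus Lite file. The helper name `resolve_milvus_address` and the "URL scheme means remote" rule are assumptions made for this example, not the project's actual logic; only the variable name and the default path come from this README.

```python
import os

# Default taken from the Core configuration table in this README.
DEFAULT_MILVUS_ADDRESS = ".db/milvus_markdown.db"


def resolve_milvus_address(env=None):
    """Return (address, mode), where mode is "remote" or "lite".

    Hypothetical helper: treats any http(s) URL as a remote Milvus
    server and anything else as a local Milvus Lite file path.
    """
    env = os.environ if env is None else env
    address = env.get("MILVUS_ADDRESS") or DEFAULT_MILVUS_ADDRESS
    is_remote = address.startswith(("http://", "https://"))
    return address, ("remote" if is_remote else "lite")


if __name__ == "__main__":
    print(resolve_milvus_address({}))
    print(resolve_milvus_address({"MILVUS_ADDRESS": "http://localhost:19530"}))
```

Either form of address can then be handed to the vector store client; how the real server does this is described in the Milvus / Zilliz Setup document.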

Features

Semantic matching: finds conceptually related content, not just keyword hits
Multi-provider embeddings: Gemini, OpenAI, OpenAI-compatible endpoints, Vertex AI, Voyage, or local models
Async background indexing: index_documents is non-blocking and returns a job_id immediately; poll progress with get_index_status
Event-loop-safe threading: all synchronous I/O runs in worker threads via asyncio.to_thread
Smart incremental indexing: an mtime/size fast-path skips unchanged files without reading them
3-way delta scan: classifies files as new, modified, or deleted in a single walk; new files skip the Milvus delete step
Smart chunk merging: chunks below MIN_CHUNK_TOKENS are merged with sibling chunks, and parent header context is injected
Empty chunk filtering: frontmatter-only and structural-only chunks (headers or separators with no prose) are dropped at indexing time and filtered out at search time
Short chunk drop: final chunks below MIN_FINAL_TOKENS (default 150) are dropped, with each drop logged to stderr
Reconciliation sweep: after each index run, queries all paths stored in Milvus and deletes orphan vectors whose source files no longer exist on disk
Search dedup: a per-file result limit prevents a single document from dominating results
Scoped search & pruning: scope_path restricts results to a subdirectory; pruning never wipes unrelated data
Batch embedding & insert: concurrent embedding batches with retry on HTTP 429 responses; Milvus inserts are chunked to stay under the 64 MB gRPC limit
Shell reindex CLI: reindex.py for large-scale indexing with real-time progress logs
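
The incremental indexing and 3-way delta scan features can be pictured with a small sketch. It assumes the tracking state is a plain dict mapping each path to an (mtime, size) pair; the function names and the state layout are hypothetical and are not taken from the project's source.

```python
import os
from typing import Dict, List, Tuple

Fingerprint = Tuple[int, int]  # (mtime in nanoseconds, size in bytes)


def scan_markdown(root: str) -> Dict[str, Fingerprint]:
    """Walk root once and fingerprint every .md file without reading it."""
    current: Dict[str, Fingerprint] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".md"):
                path = os.path.join(dirpath, name)
                stat = os.stat(path)
                current[path] = (stat.st_mtime_ns, stat.st_size)
    return current


def classify(previous: Dict[str, Fingerprint],
             current: Dict[str, Fingerprint]) -> Dict[str, List[str]]:
    """Split paths into new / modified / deleted; unchanged files are skipped."""
    delta: Dict[str, List[str]] = {"new": [], "modified": [], "deleted": []}
    for path, fingerprint in current.items():
        if path not in previous:
            delta["new"].append(path)        # nothing to delete in the store
        elif previous[path] != fingerprint:
            delta["modified"].append(path)   # delete old vectors, then re-embed
    delta["deleted"] = [p for p in previous if p not in current]
    return {kind: sorted(paths) for kind, paths in delta.items()}
```

In this sketch only the "modified" and "deleted" groups would trigger a delete against the vector store, which is the reason given above for new files skipping that step.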

📚 Documentation

Document	Description
Embedding Providers	All 6 providers: setup, auth, tuning, rate limiting
Milvus / Zilliz Setup	Lite vs Standalone vs Zilliz Cloud, Docker Compose, troubleshooting
Indexing Architecture	Non-blocking flow, `to_thread`, 3-way delta, reconciliation sweep
Optimization	Chunk merging, header injection, batch insert, search dedup

Tools

Tool	Description
`index_documents`	Start background index job, returns `job_id` instantly
`get_index_status`	Poll job status (`running` / `succeeded` / `failed`)
`search_documents`	Semantic search with relevance scores and file paths
`clear_index`	Reset vector database and tracking state
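
The start-then-poll pattern behind index_documents and get_index_status can be sketched as follows. This is a minimal illustration, not the server's code: the names `start_index_job`, `get_job_status`, and `_index_blocking`, and the in-memory `JOBS` dict, are assumptions made for the example. Only the status values and the use of asyncio.to_thread come from this README.

```python
import asyncio
import uuid

JOBS = {}        # job_id -> {"status": "running" | "succeeded" | "failed", ...}
_TASKS = set()   # keep references so running tasks are not garbage collected


def _index_blocking(path):
    """Stand-in for the blocking work (file walk, embedding calls, inserts)."""
    return {"indexed_path": path}


async def _run_job(job_id, path):
    try:
        # Run the synchronous work in a worker thread so the event loop stays free.
        JOBS[job_id]["result"] = await asyncio.to_thread(_index_blocking, path)
        JOBS[job_id]["status"] = "succeeded"
    except Exception as exc:
        JOBS[job_id]["status"] = "failed"
        JOBS[job_id]["error"] = str(exc)


async def start_index_job(path):
    """Register a job, schedule it in the background, and return immediately."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "running"}
    task = asyncio.create_task(_run_job(job_id, path))
    _TASKS.add(task)
    task.add_done_callback(_TASKS.discard)
    return {"job_id": job_id}


def get_job_status(job_id):
    """Return the job record, or None for an unknown job_id."""
    return JOBS.get(job_id)


async def _demo():
    job = await start_index_job("docs/")
    while get_job_status(job["job_id"])["status"] == "running":
        await asyncio.sleep(0.01)
    return get_job_status(job["job_id"])


if __name__ == "__main__":
    print(asyncio.run(_demo()))
```

A client would call the start function once, then poll the status function at whatever interval suits it until the status leaves `running`.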

How It Works

flowchart LR
    A["📁 Markdown Files"] -->|"walk + filter"| B["🔍 Delta Scan<br/>mtime/size"]
    B -->|changed| C["✂️ Chunk + Merge"]
    B -->|unchanged| SKIP["⏭️ Skip"]
    B -->|deleted| PRUNE["🗑️ Prune"]
    C --> D["🧠 Embed"]
    D -->|"batch insert"| E["💾 Milvus"]

    F["🔎 Query"] --> D
    D -->|"k×5"| G["📊 Dedup + Top-K"]

    style A fill:#2d3748,color:#e2e8f0
    style D fill:#553c9a,color:#e9d8fd
    style E fill:#2a4365,color:#bee3f8
    style G fill:#22543d,color:#c6f6d5
    style PRUNE fill:#742a2a,color:#fed7d7
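
The query path in the diagram over-fetches candidates (the "k×5" edge) and then applies per-file dedup before returning the top K. A minimal sketch of that last step is shown here, assuming hits arrive as (score, path, text) tuples and that a higher score means a better match; both are assumptions for illustration, since the real hit format and metric depend on the Milvus collection settings.

```python
from collections import defaultdict


def dedup_top_k(hits, k, max_per_file=1):
    """Keep at most max_per_file hits per source file, then return the best k.

    hits: iterable of (score, path, text); higher score = more relevant (assumed).
    max_per_file=0 disables the per-file limit, mirroring DEDUP_MAX_PER_FILE.
    """
    kept = []
    per_file = defaultdict(int)
    for score, path, text in sorted(hits, key=lambda h: h[0], reverse=True):
        if max_per_file and per_file[path] >= max_per_file:
            continue  # this file already contributed its share
        per_file[path] += 1
        kept.append((score, path, text))
        if len(kept) == k:
            break
    return kept


if __name__ == "__main__":
    candidates = [  # imagine these came from a search with limit = k * 5
        (0.91, "notes/api.md", "chunk 1"),
        (0.88, "notes/api.md", "chunk 2"),
        (0.75, "notes/services.md", "chunk 1"),
    ]
    for hit in dedup_top_k(candidates, k=2):
        print(hit)
```

Over-fetching matters because dedup discards candidates; asking the store for only K hits could leave fewer than K results once one file's extra chunks are removed.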

Configuration

Core

Variable	Default	Description
`EMBEDDING_PROVIDER`	`local`	One of `local`, `gemini`, `openai`, `openai-compatible`, `vertex`, `voyage`
`EMBEDDING_DIM`	`768`	Vector dimension
`MILVUS_ADDRESS`	`.db/milvus_markdown.db`	Milvus address or local file path
`MARKDOWN_WORKSPACE`	—	Lock workspace root

Indexing

Variable	Default	Description
`MARKDOWN_CHUNK_SIZE`	`2048`	Token chunk size
`MARKDOWN_CHUNK_OVERLAP`	`100`	Token overlap between chunks
`MIN_CHUNK_TOKENS`	`300`	Small-chunk merge threshold
`MIN_FINAL_TOKENS`	`150`	Drop final chunks below this token count
`DEDUP_MAX_PER_FILE`	`1`	Max results per file (`0` = off)
`EMBEDDING_BATCH_SIZE`	`250`	Texts per API call
`EMBEDDING_CONCURRENT_BATCHES`	`4`	Parallel batches
`EMBEDDING_BATCH_DELAY_MS`	`0`	Delay (ms) between batch waves
`MILVUS_INSERT_BATCH`	`5000`	Rows per Milvus insert (keeps each request under the 64 MB gRPC limit)

Tip: The defaults work well for most vaults. Adjust MIN_CHUNK_TOKENS / MIN_FINAL_TOKENS if short notes are being dropped unexpectedly. Changes to these values take effect only after a forced reindex (reindex.py --force).
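
To make the interaction between MIN_CHUNK_TOKENS and MIN_FINAL_TOKENS concrete, here is one possible merge-then-drop policy. It is a sketch under stated assumptions: tokens are approximated by whitespace splitting, a small chunk is merged forward into its next sibling, and header injection is left out. The project's actual tokenizer and merge rules may differ.

```python
import sys


def count_tokens(text):
    """Crude token estimate (whitespace split); a real tokenizer is assumed upstream."""
    return len(text.split())


def merge_and_filter(chunks, min_chunk_tokens=300, min_final_tokens=150):
    """Merge undersized chunks into the following sibling, then drop short leftovers."""
    merged = []
    for chunk in chunks:
        if merged and count_tokens(merged[-1]) < min_chunk_tokens:
            merged[-1] = merged[-1] + "\n\n" + chunk  # previous chunk is still too small
        else:
            merged.append(chunk)

    final = []
    for chunk in merged:
        tokens = count_tokens(chunk)
        if tokens < min_final_tokens:
            # Log each dropped chunk to stderr, as the feature list describes.
            print(f"dropping chunk with {tokens} tokens", file=sys.stderr)
            continue
        final.append(chunk)
    return final


if __name__ == "__main__":
    sample = ["a " * 100, "b " * 250, "c " * 400, "d " * 50]
    print([count_tokens(c) for c in merge_and_filter(sample)])
```

With the default thresholds, the 100-token chunk is absorbed into its neighbor while the trailing 50-token chunk has nothing to merge into and is dropped, which is the situation the tip above refers to.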

See Embedding Providers for full auth and tuning options.
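
The three batching variables (EMBEDDING_BATCH_SIZE, EMBEDDING_CONCURRENT_BATCHES, EMBEDDING_BATCH_DELAY_MS) suggest a wave-based scheduler. The sketch below shows how such a scheduler could look, with retry on rate limiting. The `RateLimited` exception, the `embed_batch` callable, and the exponential backoff policy are all hypothetical stand-ins; no real provider SDK is called.

```python
import asyncio


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from an embedding provider."""


def make_batches(items, size):
    """Split items into consecutive lists of at most size elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def embed_all(texts, embed_batch, batch_size=250, concurrent=4,
                    delay_ms=0, max_retries=5, backoff_s=0.5):
    """Embed texts in waves of `concurrent` batches, retrying rate-limited calls.

    embed_batch: async callable taking a list of texts and returning one
    vector per text (assumed interface for this example).
    """
    async def with_retry(batch):
        for attempt in range(max_retries):
            try:
                return await embed_batch(batch)
            except RateLimited:
                await asyncio.sleep(backoff_s * 2 ** attempt)  # exponential backoff
        raise RuntimeError("still rate limited after all retries")

    batches = make_batches(texts, batch_size)
    vectors = []
    for start in range(0, len(batches), concurrent):
        wave = batches[start:start + concurrent]
        # gather preserves input order, so vectors stay aligned with texts.
        for part in await asyncio.gather(*(with_retry(b) for b in wave)):
            vectors.extend(part)
        if delay_ms and start + concurrent < len(batches):
            await asyncio.sleep(delay_ms / 1000)  # pause between waves
    return vectors
```

Raising the delay or lowering the number of concurrent batches trades throughput for fewer rate-limit responses; the Embedding Providers document covers the provider-specific tuning.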

Performance

Metric	Result
Unchanged files — hash computations	0 (mtime/size fast-path)
Changed file — embed + insert	~3 seconds
No changes — full scan	instant
Full reindex (1300 files, 23K chunks)	~7–8 minutes

License

Apache 2.0 — see LICENSE for full text.

This project is a fork of MCP-Markdown-RAG by Zackriya Solutions. The original project is licensed under Apache 2.0; this fork maintains the same license.

Key additions over upstream:

Multi-provider embeddings (Gemini, Vertex AI, OpenAI, Voyage)
Milvus vector store replacing Qdrant
Non-blocking background indexing with asyncio.to_thread
3-way delta scan (new/modified/deleted)
Smart chunk merging with parent header injection
Empty chunk filtering (frontmatter-only / structural-only drop)
Short chunk drop (final chunks below 150 tokens with per-chunk logging)
Reconciliation sweep (Milvus↔disk ghost vector cleanup)
Scoped search & pruning, batch embedding, shell CLI
VS Code Copilot MCP compatibility (dummy params for zero-required-arg tools)