Local RAG MCP
Installation
npx local-rag-mcp
Local RAG/MCP Knowledge Base Assistant
The Problem
- Growing Documentation: Knowledge scattered across files
- Information Retrieval: Hard to find answers without keywords
- Privacy Concerns: Cloud solutions may not comply with policies
Users → Search → Answer = ✗
The Solution
A local, intelligent Q&A system using:
- RAG: Semantic search over documentation
- MCP: Dynamic document access
- Local LLM: Privacy-preserving answers (Ollama)
Key Benefits
- Privacy-first (runs locally)
- No API costs
- Fast semantic search
- Intelligent document access
- Complete data control
Architecture - Top Level
┌──────────────────┐
│  User Interface  │ (CLI)
└─────────┬────────┘
          │
    ┌─────┴─────┐
    ▼           ▼
  [RAG]       [MCP]
  Query       Tools
    │           │
    └─────┬─────┘
          ▼
    [Ollama LLM]
Architecture - Storage
┌────────────────┐
│  FAISS Index   │ Vector Database
│  + MCP Tools   │
└────────┬───────┘
         │
    ┌────▼─────┐
    │  docs/   │
    │directory │
    └──────────┘
RAG Pipeline
- Document Loading โ Read .md, .txt, .pdf, .docx
- Chunking โ Split into 700-char chunks
- Embedding โ Use SentenceTransformers
- Indexing โ Build FAISS vector index
- Query โ Retrieve top 5 similar chunks
- Prompt Building โ Create context-aware prompt
- LLM Generation โ Get answer from model
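The chunking step can be sketched as a sliding character window. This is a minimal illustration only: the function name `chunk_text` is hypothetical, and the defaults mirror the `CHUNK_SIZE = 700` and `CHUNK_OVERLAP = 100` values listed under Configuration Options. The project's own `chunk.py` may split differently (for example on sentence or paragraph boundaries).

```python
def chunk_text(text: str, chunk_size: int = 700, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: consecutive chunks share `overlap` characters so
    that a sentence cut at a boundary still appears whole in one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    stride = chunk_size - overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the default settings each window starts 600 characters after the previous one, so a 1,500-character document yields three chunks, the last one shorter than the others.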
Why FAISS?
- Fast vector similarity search
- Lightweight and memory-efficient
- No external services (runs in-process, no database server)
- Perfect for local deployments
- Millions of vectors supported
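To make "vector similarity search" concrete, here is a plain-Python sketch of exact top-k retrieval by cosine similarity. It is a stand-in for illustration, not FAISS itself: FAISS performs the same kind of nearest-neighbor lookup with optimized native code, and the source does not say which FAISS index type or distance metric the project uses. The names `normalize` and `top_k` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query, vectors, k=5):
    """Return (index, score) pairs for the k vectors most similar to query.

    Brute-force cosine similarity: every stored vector is compared with the
    query, which is fine for a small demo but is exactly the work a vector
    index is meant to speed up.
    """
    q = normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(v))))
        for i, v in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In the pipeline described above, `vectors` would hold one embedding per chunk and `query` the embedding of the user's question; the returned indices point back at the chunks to include in the prompt.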
MCP - Model Context Protocol
MCP provides a standardized interface for LLM tool access:
read_document(file_path)
list_documents()
search_documents(query)
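As a rough idea of what two of these tools might do, the sketch below implements `list_documents` and `search_documents` as ordinary functions over a `docs/` folder. Several things here are assumptions rather than the project's actual code: the extra `docs_dir` parameter, the restriction to plain-text formats, and the simple keyword match inside `search_documents` (the real tool may well use the FAISS index instead). In a FastMCP server such functions would typically be registered as tools with a decorator; that wiring is omitted here.

```python
from pathlib import Path

DOCS_DIR = Path("docs")          # assumed location of the knowledge base
TEXT_SUFFIXES = {".md", ".txt"}  # this sketch skips .pdf and .docx parsing


def list_documents(docs_dir: Path = DOCS_DIR) -> list[str]:
    """Return the names of the plain-text documents in docs_dir, sorted."""
    return sorted(
        p.name
        for p in docs_dir.iterdir()
        if p.is_file() and p.suffix.lower() in TEXT_SUFFIXES
    )


def search_documents(query: str, docs_dir: Path = DOCS_DIR) -> list[str]:
    """Return names of documents whose text contains query (case-insensitive)."""
    needle = query.lower()
    hits = []
    for name in list_documents(docs_dir):
        text = (docs_dir / name).read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            hits.append(name)
    return hits
```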
MCP Benefits
- Tool Use by LLM
- Real-time document access
- Standardized interface
- Easy to extend
- Local tool execution
Tech Stack
Language: Python 3.10+
Vector DB: FAISS
Embeddings: SentenceTransformers
LLM: Ollama (local)
MCP: FastMCP
Project Structure
src/
├── config.py           Configuration
├── main.py             CLI entry point
├── assistant.py        Main orchestrator
├── rag/
│   ├── ingest.py       Load documents
│   ├── chunk.py        Split text
│   ├── embed.py        Generate embeddings
│   ├── build_index.py  Build FAISS index
│   └── query.py        Retrieve & generate
├── mcp/
│   ├── server.py       MCP tool definitions
│   └── client.py       MCP client wrapper
└── docs/               Documentation
Index Building (Setup)
$ python main.py build-index
1. Load documents
   ↓
2. Split into chunks
   ↓
3. Generate embeddings
   ↓
4. Build FAISS index
   ↓
5. Save files
Query Processing (Runtime)
User Question
   ↓
Embed question
   ↓
Search FAISS → Top 5 chunks
   ↓
LLM decides: Use MCP tools?
   ↓
Build prompt + context
   ↓
Call Ollama
   ↓
Return answer + sources
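The last three steps of this flow (build prompt, call Ollama, return answer plus sources) might look roughly like the sketch below. It uses only the standard library and Ollama's local HTTP endpoint `/api/generate` on the default port 11434. The chunk shape (`{"source": ..., "text": ...}`), the prompt wording, and all function names are assumptions made for illustration, and the MCP tool-decision step is left out.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(f"[{c['source']}]\n{c['text']}" for c in chunks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def unique_sources(chunks: list[dict]) -> list[str]:
    """Source file names in first-seen order, without duplicates."""
    return list(dict.fromkeys(c["source"] for c in chunks))


def build_request(prompt: str, model: str = "qwen3:0.6b") -> urllib.request.Request:
    """Prepare a non-streaming generate request (sent as POST because data is set)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask_ollama(prompt: str, model: str = "qwen3:0.6b", timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama instance and return its answer text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A caller would pass the top 5 retrieved chunks to `build_prompt`, send the result through `ask_ollama`, and print the answer followed by `unique_sources(chunks)`. `ask_ollama` needs a local Ollama server with the model already pulled.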
Core Features
- Semantic Search: Find by meaning, not keywords
- Multi-format: .md, .txt, .pdf, .docx files
- Source Attribution: Shows document sources
- MCP Tools: LLM can read full documents
- No External APIs: Runs locally only
- Fast Retrieval: Sub-second search
Configuration Options
CHUNK_SIZE = 700
CHUNK_OVERLAP = 100
EMBEDDING_MODEL = "all-MiniLM-L6-v2"
OLLAMA_MODEL = "qwen3:0.6b"
TOP_K = 5
Live Demo - Starting
$ python main.py
Output:
Company Knowledge Base
Ask questions about documentation
Type 'exit' to stop
Demo - Query 1
Q: What are company values?
A: Innovation, integrity, collaboration
Sources:
• Loan Rangers Team.md
• Information Security.md
Demo - Query 2
Q: What documents do we have?
A: [Uses MCP list_documents]
• Loan Rangers Team.md
• Information Security.md
• Services.md
Demo - Query 3
Q: Full security policy?
A: [Uses MCP read_document]
[Full document content...]
Security - Local vs Cloud
Cloud: Data → Internet → Server
- Network transmission
- External storage
- Subscription costs
Local: Data → Local System
- No transmission
- Local storage only
- No subscription costs
Implementation Safeguards
- MCP Sandbox: Prevents path traversal
- Local Storage: Documents stay on device
- No Telemetry: No tracking
- Offline Ready: Works without internet
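One common way to get the path-traversal protection mentioned above is to resolve every requested path and refuse anything that ends up outside the docs directory. The sketch below shows that approach; it is an assumption about how such a sandbox could work, not a copy of the project's `server.py`, and the helper name `resolve_inside` and the `docs_dir` parameter are hypothetical.

```python
from pathlib import Path


def resolve_inside(docs_dir: Path, file_path: str) -> Path:
    """Resolve file_path relative to docs_dir and reject anything outside it.

    resolve() collapses ".." segments and follows symlinks, so inputs such as
    "../secret.txt" or an absolute path are caught by the containment check.
    """
    root = docs_dir.resolve()
    candidate = (root / file_path).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{file_path!r} is outside the docs directory")
    return candidate


def read_document(file_path: str, docs_dir: Path = Path("docs")) -> str:
    """Return the text of a document, but only if it lives under docs_dir."""
    return resolve_inside(docs_dir, file_path).read_text(encoding="utf-8")
```

Checking the resolved path rather than the raw string matters: a string test for ".." would miss absolute paths and symlinks that point outside the folder.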
Performance Benchmarks
Index Building: ~30s (one-time)
Query Embedding: ~50ms
FAISS Search: ~5ms
LLM Generation: 2-5s
Total Cycle: 2-6s
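Figures like these depend heavily on hardware and model choice, so it can be useful to measure the stages on your own machine. The following is a small, generic timing helper built on `time.perf_counter`; the names `stage` and `format_report` are hypothetical and not part of the project.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name: str, timings: dict):
    """Record how long the enclosed block takes, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000.0


def format_report(timings: dict) -> str:
    """Render one line per stage plus a total, in insertion order."""
    lines = [f"{name}: {ms:.1f} ms" for name, ms in timings.items()]
    lines.append(f"total: {sum(timings.values()):.1f} ms")
    return "\n".join(lines)
```

Wrapping each step of a query in `with stage("embed", timings):`, `with stage("search", timings):` and so on, then printing `format_report(timings)`, gives a per-stage breakdown comparable to the table above.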
Tuning for Speed
# Faster (smaller model):
OLLAMA_MODEL = "qwen3:0.6b"
# Faster retrieval:
TOP_K = 3
CHUNK_SIZE = 500
Deployment - Single Machine
1. Install Ollama & Python deps
2. Copy docs/ to server
3. Build index
4. Run with nohup
$ nohup python main.py > log &
Scaling - Option 1: FastAPI
 [HTTP Clients]        [HTTP Clients + WebLLM]
       ↓                         ↓
   [FastAPI]                 [FastAPI]
       ↓                         ↓
[Ollama + FAISS]              [FAISS]
Scaling - Option 2: Distributed
[Clients] → [Load Balancer]
                   ↓
         [Multiple Retrievers]
Storage Scaling
| Docs size | Index size | Build time |
|---|---|---|
| 10 MB | ~2 MB | ~5 s |
| 100 MB | ~20 MB | ~30 s |
| 1 GB | ~200 MB | ~5 min |
Phase 2: Enhanced Features
- Web UI (Streamlit)
- API endpoints
- Multi-language support
- Document versioning
- Fine-tuned embeddings
Phase 3: Advanced
- Conversation memory
- Multi-hop reasoning
- Metadata filtering
- Feedback loop
- Analytics dashboard
Phase 4: Enterprise
- User authentication
- Audit logging
- Role-based access
- LLM fine-tuning
- Cost analysis
Why This Works
| Aspect | Traditional | Our RAG |
|---|---|---|
| Understanding | Keywords | Semantic |
| Answers | Documents | Direct |
| Privacy | Cloud | Local |
| Cost | Subscription | One-time |
| Speed | Slow | Sub-second retrieval |
What You Have Now
- Local privacy-first knowledge base
- Fast semantic search (FAISS)
- Intelligent tool use (MCP)
- Maintainable Python code
- Foundation for enterprise features
Quick Reference
# Build index
python main.py build-index
# Run interactively
python main.py
# Check config
cat config.py
Resources
- Code: MobilaName/local-rag-mcp
- FAISS: facebook/faiss
- Ollama: ollama.ai
- FastMCP: github.com/jlowin/fastmcp
- Transformers: huggingface.co
