Local Rag System

Local RAG System for DevOps/SRE - Complete Documentation

Local RAG System for DevOps/SRE

Table of Contents

1. Introduction and Project Goals
2. System Architecture
3. Infrastructure Components
4. LLM Model Selection
5. Documentation Indexing
6. MCP Integrations and Extensions
7. Deployment and Operations
8. MCP Setup Guide
9. Best Practices and Troubleshooting
10. Complete Code Reference

1. Introduction and Project Goals

Business Objective

Create a fully local, private RAG (Retrieval-Augmented Generation) system that enables:

Fast technical answers about DevOps/SRE/Cloud without browsing documentation.
Data privacy - everything runs locally, zero data sent to external APIs.
No API costs - unlimited queries without token fees.
Technical knowledge - indexing O'Reilly books, AWS/Kubernetes/Terraform documentation.
Tool integration - internet access, Kubernetes, Docker, filesystem.

Core Principles

100% local - no data leaves your computer.
Production-ready - Docker Compose, health checks, monitoring.
Scalable - easy to add new documents.
Flexible - swappable components (models, vector databases).
Apple Silicon optimized - MLX utilization for maximum performance.

2. System Architecture

Component Overview

┌─────────────────────────────────────────────────────────────┐
│                       USER / LM STUDIO                      │
│                   (Interface + LLM Model)                   │
│                   + MCP Tool Integration                    │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ├──> MCP Tool Calls (RAG, Web Search, K8s)
                 │
┌────────────────▼────────────────────────────────────────────┐
│              LANGCHAIN RAG SERVER (Docker)                  │
│  • FastAPI endpoints (/query, /search, /health)             │
│  • LangChain orchestration                                  │
│  • HuggingFace Embeddings (sentence-transformers)           │
│  • Connection pooling                                       │
└────┬───────────────────────────────────┬────────────────────┘
     │                                    │
     │ Vector Search                      │ LLM Generation
     │                                    │
┌────▼─────────────────────┐   ┌────────▼───────────────────┐
│     QDRANT (Docker)      │   │   LM STUDIO LOCAL SERVER   │
│   • 23,389 chunks        │   │   • Qwen3 Coder 30B MLX    │
│   • Similarity search    │   │   • Magistral Small 2509   │
│   • Web Dashboard        │   │   • Tool calling support   │
│   • Port 6333/6334       │   │   • Port 1234              │
└──────────────────────────┘   └────────────────────────────┘

Detailed Query Flow

1. The user asks a question in the LM Studio chat.
2. LM Studio determines whether external tools are needed (RAG, web search, etc.).
3. If RAG is needed, LM Studio calls the RAG MCP Server.
4. The RAG MCP Server forwards the query to the RAG FastAPI Server.
5. The RAG Server converts the question into an embedding using sentence-transformers.
6. Qdrant searches for the 3-5 most similar documentation fragments.
7. The RAG Server builds a prompt combining:
   - System instruction
   - Context from Qdrant (retrieved fragments)
   - User's question
8. The RAG Server calls the LM Studio Local Server for LLM generation.
9. LM Studio (the LLM) generates an answer using:
   - Context from Qdrant (priority: specific examples, code, facts).
   - Pretrained knowledge (general understanding, syntax, best practices).
10. The answer is returned to the user with source citations.
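
The prompt-assembly step in the flow above can be sketched roughly as follows. This is a minimal illustration, not the code in rag_server.py: the names `Fragment`, `build_prompt`, and `SYSTEM_INSTRUCTION`, as well as the wording of the instruction itself, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One retrieved chunk plus the metadata needed for a citation."""
    text: str
    source: str
    page: int


# Hypothetical system instruction; the real wording lives in rag_server.py.
SYSTEM_INSTRUCTION = (
    "You are a DevOps/SRE assistant. Prefer the context below; "
    "fall back to general knowledge only when the context is insufficient."
)


def build_prompt(question: str, fragments: list[Fragment]) -> str:
    """Combine system instruction, retrieved context, and the user's question."""
    context = "\n\n".join(
        f"[{i}] ({f.source}, page {f.page})\n{f.text}"
        for i, f in enumerate(fragments, start=1)
    )
    return (
        f"{SYSTEM_INSTRUCTION}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer (cite sources by their [number]):"
    )


example = build_prompt(
    "How do I create an EC2 instance in Terraform?",
    [Fragment("Use the aws_instance resource ...", "terraformcookbook.pdf", 83)],
)
print(example)
```

Numbering each fragment and carrying its file name and page into the prompt is one simple way to let the model return the source citations mentioned in the last step.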

Two Knowledge Sources

Source	Description	Strengths	Weaknesses
Qdrant Retrieved Context	Specific fragments from indexed documents. Exact quotes, examples, code snippets.	Current, precise, verifiable.	Limited to indexed content.
Pretrained Model Knowledge	General knowledge about programming, DevOps, clouds, best practices, common patterns.	Broad, structural understanding.	May be outdated (training cutoff date).

Technology Stack

Backend: Python 3.12+, FastAPI, LangChain, Uvicorn
Vector store and embeddings: Qdrant (vector database), Sentence Transformers all-MiniLM-L6-v2 (embedding model)
Containerization: Docker & Docker Compose
LLM: LM Studio, MLX quantized models
MCP: Model Context Protocol for tool integration

3. Infrastructure Components

3.1. Qdrant Vector Database

Role: Store and search vector representations of documentation.
Specifications:
- Image: qdrant/qdrant:latest
- Ports: 6333 (HTTP API), 6334 (gRPC)
- Storage: Docker volume qdrant_storage
- Collection: devops_docs (23,389 chunks)
- Vector dimensions: 384 (all-MiniLM-L6-v2)
- Health Check: http://localhost:6333 (every 10s)
- Resource Usage:
  - CPU: ~0.5-1 core
  - RAM: ~200-500 MB
  - Disk: ~2-5 GB

3.2. LangChain RAG Server

Role: RAG pipeline orchestration, API endpoint for queries.
Specifications:
- Framework: FastAPI + LangChain
- Port: 8000
- Base Image: python:3.12-slim
- API Endpoints:
  - GET /: Service information
  - GET /health: Detailed health check
  - GET /config: Current configuration
  - POST /query: Main RAG query endpoint
  - POST /search: Direct search without an LLM

3.3. LM Studio Local Server

Role: Host LLM models, inference, tool calling.
Specifications:
- Port: 1234 (OpenAI-compatible API)
- Platform: Apple Silicon M4 Pro, 48 GB RAM
- Installed Models:
  - Qwen3 Coder 30B MLX 6BIT (~25 GB)
  - Magistral Small 2509 MLX 5BIT (~17 GB)

4. LLM Model Selection

4.1. Selection Criteria for DevOps/SRE

Technical accuracy: Precision in Terraform, K8s, AWS.
Code generation: HCL, YAML, Docker Compose.
Tool calling support: Integration with MCP servers.
M4 performance: MLX optimization.

4.2. Model Comparison

Model	Specifications	Strengths	Weaknesses	Best for...
Qwen3 Coder 30B MLX (6-bit) ⭐⭐⭐⭐⭐	30B, 25GB, 30-40 tok/s	Best for code/infra, great Terraform/K8s knowledge, fast on MLX	Requires ~35 GB RAM, slower than smaller models	Generating Terraform modules, debugging K8s manifests, code review.
Magistral Small 2509 MLX (5-bit) ⭐⭐⭐⭐⭐	22B, 17GB, 40-50 tok/s	Excellent reasoning, lighter and faster than Qwen3, good at technical writing	Slightly weaker in pure code generation	Architectural decisions, complex problem solving, best practice recommendations.

4.3. Deployment Recommendations

For 48 GB RAM:
- Primary: Qwen3 Coder 30B (for code/infrastructure)
- Secondary: Magistral Small 2509 (for reasoning/decisions)
For 32 GB RAM:
- Primary: Magistral Small 2509 (universal)
For 64+ GB RAM:
- Premium: Qwen3 Coder 30B 8BIT (max quality)
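
The recommendations above can be written down as a small lookup, sketched here under one assumption that the text does not state: machines that fall between two tiers are treated like the lower tier. The function name `recommend_models` is hypothetical.

```python
def recommend_models(ram_gb: int) -> list[str]:
    """Return the recommended models for a RAM size, primary model first."""
    if ram_gb >= 64:
        return ["Qwen3 Coder 30B 8BIT"]
    if ram_gb >= 48:
        return ["Qwen3 Coder 30B", "Magistral Small 2509"]
    if ram_gb >= 32:
        return ["Magistral Small 2509"]
    # Below 32 GB this guide gives no recommendation.
    return []


for ram in (32, 48, 64):
    print(ram, recommend_models(ram))
```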

5. Documentation Indexing

5.1. Document Preparation

Supported formats: PDF, TXT, Markdown, HTML.
Folder structure:

documents/
├── devops/
│   ├── terraformcookbook.pdf
│   ├── kubernetes-best-practices.pdf
│   └── aws_resources.pdf
├── sre/
│   └── site-reliability-engineering.pdf
└── cloud/
    └── aws-well-architected.pdf

Recommended sources: O'Reilly books, official documentation, internal company documentation.

5.2. Indexing Process

Loading: Read documents from folders (PyPDFLoader).
Chunking:
- Chunk size: 1000 characters
- Overlap: 200 characters (to preserve context)
- Result: 23,389 chunks from 8,677 pages.
Embedding Generation:
- Model: sentence-transformers/all-MiniLM-L6-v2
- Dimensions: 384
Storing in Qdrant:
- Collection: devops_docs
- Distance metric: Cosine Similarity
- Indexing time: ~30-60 minutes.
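
To make the chunk size and overlap settings concrete, here is a simplified sliding-window splitter. It is only a sketch: index_documents.py most likely relies on a LangChain text splitter that also respects paragraph and sentence boundaries, and the function name `chunk_text` is an assumption.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(2500))
pieces = chunk_text(sample)
print([len(piece) for piece in pieces])  # → [1000, 1000, 900]
```

Each window starts 800 characters after the previous one, so the last 200 characters of a chunk reappear at the start of the next. This is what keeps a sentence that straddles a boundary retrievable from at least one chunk.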

5.3. Running the Indexing Script

# Ensure Docker stack is running
docker compose up -d

# Run indexing (local Python)
python index_documents.py

# Or via Docker
docker compose exec langchain-server python /app/index_documents.py

# Verify indexing
curl http://localhost:8000/health | jq '.collection_vectors_count'
# Should show: 23389

6. MCP Integrations and Extensions

6.1. Model Context Protocol (MCP) Overview

What is MCP: A protocol created by Anthropic that enables LLMs to access external tools.
Architecture: LLM → MCP Server → External Service/API
Supported in: LM Studio, Claude Desktop, VS Code, Cursor

6.2. Available MCP Integrations

MCP Server	Description	Use Cases
RAG DevOps Docs	Search through your indexed DevOps/SRE documentation	Terraform questions, K8s best practices, AWS configurations
Web Search	Multi-engine web search (Bing, Brave, DuckDuckGo)	Latest package versions, breaking changes, new features
Kubernetes	Native K8s cluster management	List pods, get logs, check deployments, helm operations
Filesystem	Local file system access	Read configs, search code, analyze project structure
Docker	Container management	List containers, check logs, manage images

6.3. Web Search Integration

Implementation: mrkrsl/web-search-mcp
Available Tools:
- full-web-search: Comprehensive search with full content extraction.
- get-web-search-summaries: Quick search with snippets.
- get-single-web-page-content: Extract content from a specific URL.

6.4. Kubernetes MCP Server

Implementation: containers/kubernetes-mcp-server
Key Features:
- Pod management (list, logs, exec).
- CRUD for any K8s resource (Deployments, Services, etc.).
- Helm operations (install, list, uninstall).
Security Modes:
- Read-only: View only.
- Disable destructive: View and create, but no updates/deletes.
- Full access: Full permissions (for dev environments).
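
One way to keep the chosen security mode explicit is to generate the `kubernetes` entry of mcp.json from it, as sketched below. The `--disable-destructive` flag appears in this guide's mcp.json; the `--read-only` flag is an assumption that should be confirmed with `npx kubernetes-mcp-server@latest --help`. The names `MODE_FLAGS` and `kubernetes_entry` are hypothetical.

```python
# Flag per security mode. "--disable-destructive" is used in this guide's
# mcp.json; "--read-only" is an assumption to verify against the server's --help.
MODE_FLAGS = {
    "read-only": ["--read-only"],
    "disable-destructive": ["--disable-destructive"],
    "full-access": [],
}


def kubernetes_entry(mode: str, kubeconfig: str) -> dict:
    """Build the mcp.json entry for the Kubernetes MCP server in the given mode."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_FLAGS)}")
    return {
        "command": "npx",
        "args": ["-y", "kubernetes-mcp-server@latest", *MODE_FLAGS[mode]],
        "env": {"KUBECONFIG": kubeconfig},
    }


print(kubernetes_entry("disable-destructive", "/Users/YOUR_USERNAME/.kube/config"))
```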

7. Deployment and Operations

7.1. Quick Start

# Clone the repository into local-mcp (the directory name used throughout this guide)
git clone https://github.com/pshq-ripe/local-rag-system local-mcp
cd local-mcp

# Start Docker services
docker compose up -d

# Check health
curl http://localhost:8000/health
curl http://localhost:6333

# Index documents (first time only)
python index_documents.py

# Verify indexing completed
curl http://localhost:8000/health | jq

7.2. Docker Compose Setup

Services:
- qdrant: The vector database.
- langchain-server: The RAG server.
- LM Studio: Runs natively on the host (outside of Docker).

7.3. Project Structure

local-mcp/
├── docker-compose.yaml        # Main orchestration file
├── Dockerfile                 # RAG server container image
├── requirements.txt           # Python dependencies
├── rag_server.py              # FastAPI RAG server
├── index_documents.py         # Document indexing script
├── Makefile                   # Convenience commands
│
├── documents/                 # Source documents for indexing
│   ├── devops/
│   ├── sre/
│   └── cloud/
│
├── logs/                      # Application logs
│
└── mcp-servers/              # MCP server implementations
    ├── rag-mcp-server/       # RAG MCP integration
    └── README.md             # MCP setup instructions

7.4. Networking

Docker Network: rag-network (bridge type).
Port Mapping:
- 6333 → Qdrant HTTP API
- 6334 → Qdrant gRPC
- 8000 → RAG Server API
- 1234 → LM Studio (host)
Host Access: host.docker.internal allows the RAG server to communicate with LM Studio.
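
A quick way to confirm the port mapping above from the host is a plain TCP connection test. The sketch below uses only the standard library; the names `SERVICE_PORTS`, `is_port_open`, and `check_services` are illustrative. An open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports as listed in the mapping above.
SERVICE_PORTS = {
    "Qdrant HTTP API": 6333,
    "Qdrant gRPC": 6334,
    "RAG Server API": 8000,
    "LM Studio": 1234,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Probe every known service port and report which ones accept connections."""
    return {name: is_port_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Calling `check_services()` on the host returns a name-to-boolean mapping. Running the same check from inside the RAG container with `host="host.docker.internal"` and port 1234 would show whether LM Studio is reachable from Docker.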

7.5. Makefile Commands

# Build images
make build

# Start stack
make up

# Stop stack
make down

# View logs
make logs

# Restart RAG server
make restart

# Index documents
make index

# Health check
make health

# Test query
make test

# Clean everything
make clean

7.6. Health Checks & Monitoring

Qdrant Health: GET http://localhost:6333
RAG Server Health: GET http://localhost:8000/health

{
  "status": "healthy",
  "qdrant_connected": true,
  "lm_studio_connected": true,
  "collection_exists": true,
  "collection_vectors_count": 23389,
  "qa_chain_initialized": true
}
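
For monitoring scripts, a payload like the one above can be reduced to a list of failing checks. This is a hedged sketch: the field names come from the sample response, while the function name `failing_checks` and the rule that an empty collection counts as a failure are assumptions.

```python
import json

# Boolean fields from the sample /health response that should all be true.
EXPECTED_TRUE = (
    "qdrant_connected",
    "lm_studio_connected",
    "collection_exists",
    "qa_chain_initialized",
)


def failing_checks(health: dict) -> list[str]:
    """Return the names of health fields that indicate a problem."""
    problems = [key for key in EXPECTED_TRUE if not health.get(key, False)]
    if health.get("status") != "healthy":
        problems.append("status")
    if health.get("collection_vectors_count", 0) <= 0:
        problems.append("collection_vectors_count")  # nothing indexed yet
    return problems


sample = json.loads("""
{
  "status": "healthy",
  "qdrant_connected": true,
  "lm_studio_connected": true,
  "collection_exists": true,
  "collection_vectors_count": 23389,
  "qa_chain_initialized": true
}
""")
print(failing_checks(sample))  # → []
```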

8. MCP Setup Guide

8.1. Prerequisites

# Ensure Node.js is installed (v18+)
node --version

# Ensure npm is available
npm --version

# Ensure Docker is running
docker compose version

8.2. MCP Servers Installation

Option 1: Install All MCP Servers (Recommended)

# Create MCP servers directory
mkdir -p ~/lm-studio-mcp
cd ~/lm-studio-mcp

# Run the complete setup script
curl -o setup-mcp.sh https://raw.githubusercontent.com/pshq-ripe/scripts/setup-mcp.sh
chmod +x setup-mcp.sh
./setup-mcp.sh

Option 2: Manual Installation

8.2.1. Web Search MCP Server

cd ~/lm-studio-mcp
git clone https://github.com/mrkrsl/web-search-mcp.git
cd web-search-mcp

npm install
npm run build

# Test
node dist/index.js --help

8.2.2. RAG MCP Server (Custom)

cd ~/lm-studio-mcp
mkdir rag-mcp-server
cd rag-mcp-server

# Create package.json
cat > package.json << 'EOF'
{
  "name": "rag-mcp-server",
  "version": "1.0.0",
  "type": "module",
  "description": "MCP server for DevOps RAG documentation",
  "main": "rag-mcp-server.js",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^0.5.0",
    "node-fetch": "^3.3.2"
  }
}
EOF

# Install dependencies
npm install

# Copy rag-mcp-server.js from repository
curl -o rag-mcp-server.js https://raw.githubusercontent.com/pshq-ripe/mcp-servers/rag-mcp-server.js
chmod +x rag-mcp-server.js

# Test
node rag-mcp-server.js

8.2.3. Kubernetes MCP Server

# No installation needed - uses npx
# Will be installed on first use

8.2.4. Filesystem MCP Server

# No installation needed - uses npx
# Will be installed on first use

8.2.5. Docker MCP Server (Optional)

# No installation needed - uses npx
# Will be installed on first use

8.3. LM Studio MCP Configuration

8.3.1. Locate mcp.json

# On macOS, mcp.json is located at one of:
# 1. ~/Library/Application Support/LMStudio/mcp.json
# 2. ~/.config/lmstudio/mcp.json
# 3. ~/.lmstudio/mcp.json

# Find it:
find ~ -name "mcp.json" 2>/dev/null | grep -i lmstudio

8.3.2. Create/Update mcp.json

{
  "mcpServers": {
    "rag-devops-docs": {
      "command": "node",
      "args": [
        "/Users/YOUR_USERNAME/lm-studio-mcp/rag-mcp-server/rag-mcp-server.js"
      ],
      "env": {
        "RAG_SERVER_URL": "http://localhost:8000"
      }
    },
    "web-search": {
      "command": "node",
      "args": [
        "/Users/YOUR_USERNAME/lm-studio-mcp/web-search-mcp/dist/index.js"
      ],
      "env": {
        "MAX_BROWSERS": "3",
        "BROWSER_HEADLESS": "true",
        "DEFAULT_TIMEOUT": "6000",
        "MAX_CONTENT_LENGTH": "100000",
        "ENABLE_RELEVANCE_CHECKING": "true",
        "RELEVANCE_THRESHOLD": "0.3"
      }
    },
    "kubernetes": {
      "command": "npx",
      "args": [
        "-y",
        "kubernetes-mcp-server@latest",
        "--disable-destructive"
      ],
      "env": {
        "KUBECONFIG": "/Users/YOUR_USERNAME/.kube/config"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/YOUR_USERNAME/projects"
      ]
    },
    "docker": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-server-docker"
      ]
    }
  }
}

Important: Replace YOUR_USERNAME with your actual username!

# Get your username
whoami

# Or use full path
echo $HOME
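
The placeholder replacement can also be scripted. The sketch below substitutes `/Users/YOUR_USERNAME` in a template and parses the result, so a leftover placeholder or a JSON syntax error is caught before LM Studio reads the file. The function name `render_mcp_config` and the shortened template are illustrative.

```python
import json


def render_mcp_config(template: str, home: str) -> dict:
    """Replace the /Users/YOUR_USERNAME placeholder and parse the result as JSON."""
    rendered = template.replace("/Users/YOUR_USERNAME", home)
    if "YOUR_USERNAME" in rendered:
        raise ValueError("an unreplaced YOUR_USERNAME placeholder remains")
    return json.loads(rendered)  # raises json.JSONDecodeError on invalid syntax


template = """
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem",
               "/Users/YOUR_USERNAME/projects"]
    }
  }
}
"""
config = render_mcp_config(template, "/Users/alice")
print(config["mcpServers"]["filesystem"]["args"][-1])  # → /Users/alice/projects
```

In practice `home` could be `str(pathlib.Path.home())`, and the resulting dictionary could be written back with `json.dump(config, file, indent=2)`.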

8.3.3. Restart LM Studio

# Close LM Studio completely
killall "LM Studio"

# Restart
open -a "LM Studio"

8.4. Verify MCP Setup

8.4.1. Check LM Studio Logs

In LM Studio:

Go to Developer tab
Check Developer Logs
Look for:

[Plugin(mcp/rag-devops-docs)] stdout: [Tools Prvdr.] Register with LM Studio
[Plugin(mcp/web-search)] stdout: [Tools Prvdr.] Register with LM Studio
[Plugin(mcp/kubernetes)] stdout: [Tools Prvdr.] Register with LM Studio

8.4.2. Test RAG MCP Server

# Test directly
cd ~/lm-studio-mcp/rag-mcp-server
node rag-mcp-server.js

# Should print: "RAG MCP server running on stdio"
# Ctrl+C to exit

# Test RAG Server is accessible
curl http://localhost:8000/health | jq

8.4.3. Test in LM Studio Chat

Load a model with tool calling support (Qwen3 Coder or Magistral), then ask:

Search my DevOps documentation for information about creating EC2 instances in Terraform.

Expected behavior:

Model recognizes it needs documentation
Calls search_devops_docs tool
Returns answer with sources (e.g., "terraformcookbook.pdf, page 83")

8.5. Troubleshooting MCP

Problem: MCP Server not found

# Verify file exists
ls -la ~/lm-studio-mcp/rag-mcp-server/rag-mcp-server.js

# Check permissions
chmod +x ~/lm-studio-mcp/rag-mcp-server/rag-mcp-server.js

# Test execution
node ~/lm-studio-mcp/rag-mcp-server/rag-mcp-server.js

Problem: Module not found errors

cd ~/lm-studio-mcp/rag-mcp-server

# Clean and reinstall
rm -rf node_modules package-lock.json
npm install

# Verify dependencies
npm list

Problem: RAG Server connection refused

# Check Docker stack
docker compose ps

# Check RAG Server
curl http://localhost:8000/health

# Restart if needed
docker compose restart langchain-server

Problem: Tool not appearing in LM Studio

Verify mcp.json syntax (use JSON validator)
Check file paths are absolute (not relative)
Restart LM Studio completely
Check Developer Logs for errors
Ensure model supports tool calling (Qwen3 Coder, Magistral)
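
The first two checks in this list (valid JSON, absolute paths) can be automated. The following sketch parses an mcp.json document and reports arguments that look like file paths but are relative. The heuristic for "looks like a path" (contains a slash and does not start with `-` or `@`) and the function name `relative_path_args` are assumptions made for the example.

```python
import json
from pathlib import PurePosixPath


def relative_path_args(config_text: str) -> list[str]:
    """Return 'server: arg' entries whose path-like arguments are not absolute."""
    config = json.loads(config_text)  # raises json.JSONDecodeError on bad syntax
    offenders = []
    for name, server in config.get("mcpServers", {}).items():
        for arg in server.get("args", []):
            # Heuristic: skip flags ("-y") and scoped npm packages ("@scope/pkg").
            looks_like_path = "/" in arg and not arg.startswith(("-", "@"))
            if looks_like_path and not PurePosixPath(arg).is_absolute():
                offenders.append(f"{name}: {arg}")
    return offenders


broken = """
{
  "mcpServers": {
    "rag-devops-docs": {"command": "node", "args": ["./rag-mcp-server/rag-mcp-server.js"]},
    "filesystem": {"command": "npx",
                   "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/alice/projects"]}
  }
}
"""
print(relative_path_args(broken))  # → ['rag-devops-docs: ./rag-mcp-server/rag-mcp-server.js']
```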

8.6. Advanced MCP Configuration

Custom System Prompt for Better Tool Usage

In LM Studio → Chat Settings → System Prompt:

You are a DevOps/SRE expert assistant with access to comprehensive tools:

- search_devops_docs: Search indexed documentation (Terraform, K8s, AWS, Docker)
- web_search: Search the internet for latest information
- kubernetes operations: Manage K8s clusters
- filesystem: Access local files and code

When answering questions about DevOps/Infrastructure:
1. Use search_devops_docs FIRST to check documentation
2. Use web_search for latest versions or breaking changes
3. Cite specific sources (book names, page numbers, URLs)
4. Combine documentation facts with your general knowledge

For general questions, answer directly without tools.

Testing Individual MCP Servers

# Test Web Search
cd ~/lm-studio-mcp/web-search-mcp
npm test  # if available

# Test Kubernetes
kubectl get pods  # Ensure kubectl works
npx kubernetes-mcp-server@latest --help

# Test RAG
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "Test query"}' | jq

9. Best Practices and Troubleshooting

9.1. Model Selection Strategy

Code generation: Qwen3 Coder 30B
Architectural decisions: Magistral Small 2509
Quick queries: Magistral Small 2509
Debugging: Qwen3 Coder 30B

9.2. RAG Query Optimization

max_results (k): Default is 3. Increasing it gives the model more context but lengthens the prompt and slows down the response.
temperature: Default is 0.7. For DevOps tasks, stay within 0.5-0.7; values toward the lower end give more predictable answers.
score_threshold: Default is 0.5. Use higher values (0.7) for more precise matches and lower values (0.3) for broader results.
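
The interplay of `max_results` and `score_threshold` can be illustrated with a toy retrieval over two-dimensional vectors. In the real system Qdrant performs this search server-side over 384-dimensional embeddings; the function names and sample vectors below are purely illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(
    query: list[float],
    chunks: dict[str, list[float]],
    k: int = 3,
    score_threshold: float = 0.5,
) -> list[tuple[str, float]]:
    """Return up to k (name, score) pairs scoring at least score_threshold, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in chunks.items()]
    kept = [item for item in scored if item[1] >= score_threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:k]


chunks = {
    "terraform": [1.0, 0.0],   # same direction as the query
    "kubernetes": [0.6, 0.8],  # partially related
    "unrelated": [0.0, 1.0],   # orthogonal to the query
}
query = [1.0, 0.0]
print([name for name, _ in top_matches(query, chunks)])
print([name for name, _ in top_matches(query, chunks, score_threshold=0.7)])
```

With the default threshold the partially related chunk is kept; raising the threshold to 0.7 leaves only the closest match, which mirrors the precision versus breadth trade-off described above.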

9.3. Common Issues and Solutions

Problem: RAG Server won't start

Symptoms: Container crashes immediately, "Connection refused" in logs.

Solution:

# Check Qdrant
docker compose logs qdrant

# Verify network
docker network inspect local-mcp_rag-network

# Check dependencies
docker compose ps

# Restart
docker compose restart

Problem: Collection not found (404)

Symptoms: /query returns 503, /health shows collection_exists: false.

Solution:

# Run indexing
make index
# Or
python index_documents.py

# Verify
curl http://localhost:6333/collections

# Restart server
make restart

Problem: LM Studio connection error

Symptoms: "Connection error" in /query response, timeout errors.

Solution:

# Check LM Studio Local Server is running
curl http://localhost:1234/v1/models

# Verify model is loaded in LM Studio
# Check firewall settings
# Test host.docker.internal resolves

Problem: MCP tools not working

Symptoms: Model doesn't call tools, tools not visible in UI.

Solution:

Verify model supports tool calling (Qwen3 Coder, Magistral)
Check mcp.json syntax and paths
Restart LM Studio completely
Check Developer Logs for errors
Test MCP servers individually (see section 8.6)

Problem: Slow inference

Symptoms: Query takes >30 seconds, timeouts.

Solution:

# Switch to smaller/faster model (Magistral 5BIT)
# Reduce max_results (3 → 2)
# Check system resources (Activity Monitor)
# Close other applications
# Consider MLX quantized models

Problem: Out of memory

Symptoms: LM Studio crashes, "Model loading stopped" error, system freeze.

Solution:

# Switch to smaller model
# Close other apps
# Use a more aggressive (lower-bit) quantization (6BIT → 4BIT)
# Check available RAM: vm_stat
# Consider upgrading RAM

9.4. Backup and Maintenance

Backup Qdrant Data

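# Note: Docker Compose usually prefixes volume names with the project name
# (e.g. local-mcp_qdrant_storage). Check the real name with: docker volume ls

# Backup
docker run --rm \
  -v qdrant_storage:/data \
  -v "$(pwd)":/backup \
  ubuntu tar czf "/backup/qdrant-$(date +%Y%m%d).tar.gz" /data

# Restore (replace the date with the one in your archive name)
docker run --rm \
  -v qdrant_storage:/data \
  -v "$(pwd)":/backup \
  ubuntu tar xzf /backup/qdrant-20251117.tar.gz -C /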

Update Dependencies

# Update Docker images
docker compose pull

# Rebuild
docker compose build --no-cache

# Update Python dependencies
pip install --upgrade -r requirements.txt

# Update MCP servers
cd ~/lm-studio-mcp/web-search-mcp
git pull
npm install
npm run build

10. Complete Code Reference

10.1. Key Files

docker-compose.yaml: Docker orchestration
Dockerfile: RAG server container
requirements.txt: Python dependencies
rag_server.py: FastAPI RAG server
index_documents.py: Document indexing script
Makefile: Convenience commands
mcp.json: MCP configuration for LM Studio
rag-mcp-server.js: RAG MCP integration

10.2. Environment Variables

Variable	Default	Description
`QDRANT_URL`	`http://qdrant:6333`	Qdrant server URL
`LM_STUDIO_URL`	`http://host.docker.internal:1234/v1`	LM Studio API URL
`COLLECTION_NAME`	`devops_docs`	Qdrant collection name
`EMBEDDING_MODEL`	`sentence-transformers/all-MiniLM-L6-v2`	Embedding model
`CHUNK_SIZE`	`1000`	Document chunk size
`CHUNK_OVERLAP`	`200`	Chunk overlap size
`TEMPERATURE`	`0.7`	LLM temperature
`MAX_RETRIEVAL_RESULTS`	`3`	Max chunks to retrieve
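
A possible way to load these variables with their documented defaults is sketched below. The variable names and default values come from the table; the `Settings` class and `load_settings` function are hypothetical and may not match how rag_server.py actually reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    qdrant_url: str
    lm_studio_url: str
    collection_name: str
    embedding_model: str
    chunk_size: int
    chunk_overlap: int
    temperature: float
    max_retrieval_results: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, falling back to the table's defaults."""
    env = os.environ if env is None else env
    return Settings(
        qdrant_url=env.get("QDRANT_URL", "http://qdrant:6333"),
        lm_studio_url=env.get("LM_STUDIO_URL", "http://host.docker.internal:1234/v1"),
        collection_name=env.get("COLLECTION_NAME", "devops_docs"),
        embedding_model=env.get(
            "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        temperature=float(env.get("TEMPERATURE", "0.7")),
        max_retrieval_results=int(env.get("MAX_RETRIEVAL_RESULTS", "3")),
    )


defaults = load_settings({})
print(defaults.collection_name, defaults.chunk_size, defaults.temperature)
```

Passing an explicit mapping, as in `load_settings({"TEMPERATURE": "0.5"})`, makes the loader easy to test without touching the real environment.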

10.3. API Endpoints

Endpoint	Method	Description
`/`	GET	Service information
`/health`	GET	Detailed health check
`/config`	GET	Current configuration
`/query`	POST	Main RAG query
`/search`	POST	Direct vector search

10.4. MCP Tools Available

Tool	MCP Server	Description
`search_devops_docs`	rag-devops-docs	Search indexed documentation
`check_rag_health`	rag-devops-docs	Check RAG system status
`full-web-search`	web-search	Comprehensive web search
`get-web-search-summaries`	web-search	Quick web search
`pods_list`	kubernetes	List Kubernetes pods
`pods_log`	kubernetes	Get pod logs
`helm_install`	kubernetes	Install Helm chart
(many more)	kubernetes	K8s operations
`read_file`	filesystem	Read local file
`search_files`	filesystem	Search in files
`list_containers`	docker	List Docker containers
`container_logs`	docker	Get container logs

Summary

This system is designed to be:

✅ Production-ready: Docker, health checks, graceful degradation
✅ Private: RAG and LLM inference run 100% locally; only the web search tool contacts the internet
✅ Scalable: Easy to add documents and MCP servers
✅ Performant: MLX optimization, native Go MCP, proper indexing
✅ Secure: Read-only modes, isolated networks, local-only access

Achieved Goals:

23,389 chunks of DevOps/SRE documentation indexed
5 MCP servers integrated (RAG, Web, K8s, Docker, FS)
2 LLM models (Qwen3 Coder, Magistral) ready for work
Sub-second query latency for most queries
Comprehensive tooling for DevOps workflows

Next Steps:

Add more documents to expand knowledge base
Test different models for specific use cases
Expand MCP integrations (Git, Grafana, Slack)
Fine-tune retrieval parameters based on usage patterns
Set up automated backups and monitoring

For issues, questions, or contributions, please see the repository's issue tracker.
