W3 MCP Qdrant Server
Python MCP server for vector search using Qdrant vector database and Ollama embeddings.
Status: ✅ Working with Qdrant vector search, Ollama embeddings, and advanced query techniques
Features
- qdrant_search - Search for similar documents using text queries (auto-embedded via Ollama)
- ✨ Query Expansion - Generate N query variations, search all, merge with RRF
- ✨ HyDE - Hypothetical Document Embeddings for semantic enrichment
- ✨ Reranking - Use an LLM to reorder results by relevance
- qdrant_list_collections - List and manage Qdrant collections
Supports flexible output formats (Markdown or JSON) with configurable similarity thresholds and advanced search options.
Quick Start
1. Prerequisites Setup
Qdrant Server
# Using Docker (Recommended)
docker run -p 6333:6333 qdrant/qdrant:latest
Or install locally: Qdrant Quick Start
Ollama Server
# Install: https://ollama.ai
ollama pull bge-m3
ollama pull mistral
ollama serve
Available embedding models:
- bge-m3 (1024 dims) - ✅ recommended - best quality-speed balance
- nomic-embed-text (768 dims) - balanced, good for general use
- mxbai-embed-large (1024 dims) - highest quality
- all-minilm (384 dims) - ultra-lightweight, good for mobile
2. Clean Setup (Important!)
cd /path/to/w3-mcp-server-qdrant
# Remove old lockfile and venv
rm -rf uv.lock .venv venv
# Unset old environment variable
unset VIRTUAL_ENV
3. Install Dependencies with uv
# Install all Python dependencies using uv
uv sync
That's it! uv sync installs all dependencies including MCP, pydantic, qdrant-client, and httpx.
4. Configure Environment
Create a .env file from template:
cp .env.example .env
Edit .env:
# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY= # Optional: set only if your Qdrant instance requires an API key
# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBED_MODEL=bge-m3:latest
OLLAMA_RERANK_MODEL=mistral # For query expansion, HyDE, and reranking
Or export environment variables:
export QDRANT_URL=http://localhost:6333
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBED_MODEL=bge-m3:latest
export OLLAMA_RERANK_MODEL=mistral
5. Verify Installation
# Check Qdrant
curl http://localhost:6333/healthz
# Check Ollama
curl http://localhost:11434/api/tags
# Check Python env
uv run python -c "from mcp.server.fastmcp import FastMCP; print('✅ MCP ready')"
6. Test with MCP Inspector
# Start MCP Inspector (interactive web UI)
uv run mcp dev server.py
Opens a URL like:
http://localhost:6274/?MCP_PROXY_AUTH_TOKEN=...
Features:
- ✅ Available tools listed in sidebar
- ✅ Test each tool interactively with JSON input
- ✅ Real-time request/response viewing
- ✅ Server logs and debugging
- ✅ No extra dependencies needed
Usage
Option A: MCP Inspector (Development)
Best way to test and debug:
cd /path/to/w3-mcp-server-qdrant
# Start inspector
uv run mcp dev server.py
Opens the web UI at http://localhost:6274:
- See available tools
- Test each tool with JSON input
- View request/response in real-time
- See server logs
Option B: Direct Python
# Run server (stdio mode)
uv run python server.py
Option C: Claude Code Integration
Method 1: Local Source (Development)
Edit ~/.claude/claude_config.json:
{
"mcpServers": {
"qdrant": {
"type": "stdio",
"command": "uv",
"args": ["run", "server.py"],
"cwd": "/path/to/w3-mcp-server-qdrant",
"env": {
"QDRANT_URL": "http://localhost:6333",
"OLLAMA_BASE_URL": "http://localhost:11434",
"OLLAMA_EMBED_MODEL": "bge-m3:latest",
"OLLAMA_RERANK_MODEL": "mistral"
}
}
}
}
Advantages:
- ✅ Run latest development version
- ✅ Easy to modify and test changes
- ✅ Direct access to source code
Method 2: PyPI Installation (When Published)
Install from PyPI (always fetch latest version):
uv run --with w3-mcp-server-qdrant --refresh w3-mcp-server-qdrant
Edit ~/.claude/claude_config.json:
{
"mcpServers": {
"qdrant": {
"type": "stdio",
"command": "uv",
"args": ["run", "--with", "w3-mcp-server-qdrant", "--refresh", "w3-mcp-server-qdrant"],
"env": {
"QDRANT_URL": "http://localhost:6333",
"OLLAMA_BASE_URL": "http://localhost:11434",
"OLLAMA_EMBED_MODEL": "bge-m3:latest",
"OLLAMA_RERANK_MODEL": "mistral"
}
}
}
}
Advantages:
- ✅ No need to clone repository
- ✅ Easy version management
- ✅ Automatic dependency isolation
Then restart Claude Code.
Tools Documentation
qdrant_search
Search for similar documents in a collection using a text query (auto-embedded via Ollama).
Supports advanced search techniques: query expansion, hypothetical document embeddings (HyDE), and LLM-based reranking.
Basic Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| collection_name | string | required | Name of the collection to search |
| query_text | string | required | Text to search for (auto-embedded via Ollama) |
| limit | integer | 5 | Max results to return (1-100) |
| score_threshold | float | 0.0 | Minimum similarity threshold (0.0-1.0) |
| fields | string | "" | Comma-separated metadata fields to return (empty = all) |
| response_format | string | "markdown" | "markdown" or "json" |
Advanced Parameters - Query Expansion
Generate N query variations, search all in parallel, merge results with Reciprocal Rank Fusion:
| Parameter | Type | Default | Description |
|---|---|---|---|
| expand_query | boolean | false | Enable query expansion |
| expand_query_count | integer | 3 | Number of variations to generate (1-10) |
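Reciprocal Rank Fusion (RRF) scores each document by summing 1/(k + rank) over every result list it appears in, so documents ranked highly by several query variations float to the top. A minimal sketch of the merge step (the function and the conventional k=60 constant are illustrative assumptions, not the server's verbatim internals):

```python
from collections import defaultdict

def rrf_merge(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # top ranks contribute most
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked 1st in one variation's results and 3rd in another's:
# 1/(60+1) + 1/(60+3) ≈ 0.0323 - fused scores are small by design
```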
Advanced Parameters - HyDE
Generate a hypothetical document matching the query intent, then embed it:
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_hyde | boolean | false | Enable HyDE |
| hyde_combine_original | boolean | true | Also search original query + HyDE doc |
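HyDE sidesteps the vocabulary gap between short queries and long documents: an LLM first drafts a hypothetical passage that would answer the query, and that passage (rather than the terse query) is embedded for the vector search. A rough sketch of the pattern against Ollama's HTTP API (the prompt wording is an assumption; /api/generate and /api/embeddings are standard Ollama endpoints):

```python
import httpx

OLLAMA = "http://localhost:11434"

def hyde_embedding(query: str) -> list[float]:
    # 1. Ask the LLM for a hypothetical passage answering the query
    doc = httpx.post(f"{OLLAMA}/api/generate", json={
        "model": "mistral",
        "prompt": f"Write a short passage that answers: {query}",
        "stream": False,
    }, timeout=120.0).json()["response"]
    # 2. Embed the hypothetical passage instead of the raw query
    return httpx.post(f"{OLLAMA}/api/embeddings", json={
        "model": "bge-m3:latest",
        "prompt": doc,
    }, timeout=120.0).json()["embedding"]
```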
Advanced Parameters - Reranking
Use LLM to reorder results by relevance to the original query:
| Parameter | Type | Default | Description |
|---|---|---|---|
| rerank | boolean | false | Enable LLM reranking |
| rerank_top_n | integer | 10 | Number of results to rerank (1-100) |
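Reranking takes the rerank_top_n candidates from the vector search and asks the LLM to judge each one against the original query, trading latency for precision. One simple way to do this is pointwise scoring (the prompt and parsing here are illustrative, not the server's exact approach):

```python
import httpx

def llm_relevance(query: str, doc_text: str, model: str = "mistral") -> float:
    """Ask the LLM for a 0-10 relevance score for one candidate document."""
    prompt = (
        "Rate from 0 to 10 how relevant the document is to the query. "
        "Answer with a single number.\n"
        f"Query: {query}\nDocument: {doc_text}"
    )
    reply = httpx.post("http://localhost:11434/api/generate", json={
        "model": model, "prompt": prompt, "stream": False,
    }, timeout=120.0).json()["response"]
    try:
        return float(reply.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0  # unparsable reply counts as irrelevant

# candidates: [(doc_id, text), ...] from the initial vector search
# reranked = sorted(candidates, key=lambda c: llm_relevance(q, c[1]), reverse=True)
```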
Examples
Example 1: Basic search
{
"collection_name": "docs",
"query_text": "machine learning",
"limit": 5
}
Example 2: Query expansion (good recall)
{
"collection_name": "docs",
"query_text": "machine learning",
"expand_query": true,
"expand_query_count": 5,
"limit": 5
}
Example 3: HyDE (semantic understanding)
{
"collection_name": "docs",
"query_text": "machine learning",
"use_hyde": true,
"hyde_combine_original": true,
"limit": 5
}
Example 4: Full combo (best quality, slower)
{
"collection_name": "docs",
"query_text": "machine learning",
"expand_query": true,
"expand_query_count": 3,
"use_hyde": true,
"rerank": true,
"rerank_top_n": 15,
"limit": 5
}
Output Format
Returns JSON with search metadata and ranked results:
{
"query": "machine learning",
"collection": "docs",
"total": 3,
"search_method": "rrf+hyde+expand+rerank",
"results": [
{
"index": 1,
"id": "doc_123",
"score": 0.0273,
"metadata": {
"title": "Machine Learning Basics",
"author": "Jane Doe"
}
}
]
}
Note: search_method field indicates which techniques were applied:
- basic - simple vector search
- rrf - multiple searches merged with Reciprocal Rank Fusion
- rrf+hyde - RRF with HyDE
- rrf+expand - RRF with query expansion
- rrf+hyde+expand+rerank - all techniques combined
qdrant_list_collections
List all collections in Qdrant with metadata.
Parameters:
- response_format (string): "markdown" or "json" (default: "markdown")
Example:
{
"response_format": "json"
}
Output:
{
"collections": [
{
"name": "tech_docs",
"points_count": 1250,
"vector_size": 768
},
{
"name": "papers",
"points_count": 3840,
"vector_size": 1024
}
]
}
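For reference, this kind of listing can be assembled from qdrant-client's public collection APIs (a sketch of the idea; the server's own implementation may differ, and this assumes one unnamed vector per collection):

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
for desc in client.get_collections().collections:
    info = client.get_collection(desc.name)
    # points_count and the configured vector size for each collection
    print(desc.name, info.points_count, info.config.params.vectors.size)
```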
Configuration
QDRANT_URL
Specifies the URL of your Qdrant server.
Set via:
- Environment variable:
  export QDRANT_URL=http://localhost:6333
  uv run python server.py
- .env file:
  QDRANT_URL=http://localhost:6333
- In claude_config.json:
  "env": { "QDRANT_URL": "http://localhost:6333" }
OLLAMA_BASE_URL
Specifies the URL of your Ollama server.
Default: http://localhost:11434
OLLAMA_EMBED_MODEL
Specifies which embedding model to use for embedding search queries and documents.
Default: bge-m3:latest
Recommended embedding models:
- bge-m3 (1024 dims) - ✅ Recommended - best quality-to-speed ratio
- nomic-embed-text (768 dims) - balanced, good for most use cases
- all-minilm (384 dims) - fast, lightweight
- mxbai-embed-large (1024 dims) - highest quality but slower
OLLAMA_RERANK_MODEL
Specifies which LLM model to use for advanced features (query expansion, HyDE, reranking).
Default: mistral
Recommended models:
- mistral (7B) - ✅ Recommended - good quality, reasonable speed
- qwen2.5-coder (7B) - high quality but optimized for code
- llama3.2 (3B) - smaller, faster but lower quality
- neural-chat (7B) - good for instruction-following
Note: Only used when expand_query=true, use_hyde=true, or rerank=true
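However they are supplied, all of these settings resolve to plain environment variables; inside the server they would typically be read with os.getenv using the defaults listed above (a sketch, not necessarily the server's exact code):

```python
import os

QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY", "")  # optional
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_EMBED_MODEL = os.getenv("OLLAMA_EMBED_MODEL", "bge-m3:latest")
OLLAMA_RERANK_MODEL = os.getenv("OLLAMA_RERANK_MODEL", "mistral")
```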
Project Structure
w3-mcp-server-qdrant/
├── server.py               # MCP server entry point
├── pyproject.toml          # Project config
├── .env.example            # Environment variables template
├── README.md               # This file
└── tests/
    └── test_mcp_server.py  # Integration tests
How It Works
Architecture
MCP Client (Claude, IDE, etc.)
        ↓
MCP Server (server.py)
 ├── Ollama: text → embedding vector
 └── Qdrant: vector search
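The entry point is a FastMCP app that registers the tools and serves them over stdio; schematically it looks like this (a sketch of the shape with trimmed parameters, not the project's exact source):

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("qdrant")

@mcp.tool()
def qdrant_search(collection_name: str, query_text: str, limit: int = 5) -> str:
    """Embed query_text via Ollama, search Qdrant, format the results."""
    ...  # embedding + vector search, as in the flow below

@mcp.tool()
def qdrant_list_collections(response_format: str = "markdown") -> str:
    """List Qdrant collections with metadata."""
    ...

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```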
Search Flow
- User provides text query
- Ollama embeds query → embedding vector
- Qdrant searches for similar vectors
- Results returned with scores and metadata
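Condensed into code, the flow looks roughly like this (a sketch using httpx and qdrant-client; the endpoints are Ollama's and Qdrant's public APIs, while the helper itself is illustrative):

```python
import httpx
from qdrant_client import QdrantClient

def search(query: str, collection: str, limit: int = 5):
    # 1-2. Embed the query text via Ollama
    vector = httpx.post("http://localhost:11434/api/embeddings", json={
        "model": "bge-m3:latest", "prompt": query,
    }, timeout=120.0).json()["embedding"]
    # 3. Nearest-neighbor search in Qdrant
    client = QdrantClient(url="http://localhost:6333")
    hits = client.search(collection_name=collection, query_vector=vector, limit=limit)
    # 4. Each hit carries an id, a similarity score, and payload metadata
    return [(hit.id, hit.score, hit.payload) for hit in hits]
```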
Examples
Search documents
# Via Claude/MCP interface
qdrant_search(
collection_name="tech_docs",
query_text="machine learning algorithms",
limit=5,
score_threshold=0.6,
response_format="markdown"
)
List collections
# Via Claude/MCP interface
qdrant_list_collections(response_format="json")
Development
Run tests using uv
uv run pytest tests/
Code formatting with uv
uv run black server.py
uv run ruff check server.py
Testing with MCP Inspector
uv run mcp dev server.py
Web UI at http://localhost:6274 shows:
- Available tools and schemas
- Real-time request/response
- Server logs
- Interactive testing
Performance Tips
Basic Search Optimization
- Score threshold: Use score_threshold to filter low-relevance results and reduce noise
- Result limit: Adjust the limit parameter (1-100) to balance quality vs. speed
- Embedding model: Choose based on the quality vs. speed tradeoff:
  - nomic-embed-text: balanced (recommended)
  - all-minilm: fast, lightweight
  - mxbai-embed-large: higher quality but slower
Advanced Features Trade-offs
| Feature | Quality | Speed | Use Case |
|---|---|---|---|
| Basic search | ⭐⭐ | ⚡⚡⚡ | Clear, specific queries |
| Query expansion | ⭐⭐⭐ | ⚡⚡ | Ambiguous queries, high recall needed |
| HyDE | ⭐⭐⭐ | ⚡⚡ | Semantic understanding important |
| Reranking | ⭐⭐⭐⭐ | ⚡ | Precision critical, can wait 1-2s |
| All combined | ⭐⭐⭐⭐⭐ | ⚡ | Best quality, time not critical |
Performance Strategy
- Fast path: basic search with limit=5
- Balanced: expand_query=true, expand_query_count=3
- High quality: add use_hyde=true
- Maximum quality: add rerank=true (slowest, ~5-10s)
Troubleshooting
Qdrant connection error
# Check if Qdrant is running
curl http://localhost:6333/healthz
# Start Qdrant with Docker
docker run -p 6333:6333 qdrant/qdrant:latest
Ollama embedding failed
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Pull embedding model
ollama pull bge-m3  # or whichever OLLAMA_EMBED_MODEL you configured
# Start Ollama
ollama serve
Collection not found
- Ensure collection exists in Qdrant
- Create the collection through the Qdrant UI or external tools (see the sketch below)
- Verify collection name matches exactly
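If you need to create a collection outside the server, a minimal qdrant-client sketch (the collection name is illustrative; the vector size must match your embedding model's output dimension, e.g. 1024 for bge-m3):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="docs",  # illustrative name
    # size must match the embedding model's output dimension
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
```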
MCP module not found
# Install dependencies with uv
uv sync
Server hangs on startup
- Check if Qdrant server is running and accessible
- Check if Ollama server is running
- Try: curl http://localhost:6333/healthz and curl http://localhost:11434/api/tags
Implemented Features
- Query expansion with LLM-generated variations
- HyDE (Hypothetical Document Embeddings)
- Reciprocal Rank Fusion (RRF) for result merging
- LLM-based result reranking
- Parallel async embedding and search
Future Enhancements
- Support for additional embedding models
- Batch vector operations
- Collection creation/deletion tools
- Vector update and delete operations
- Semantic search filters
- Caching for query expansions
- Custom RRF weights configuration
License
MIT