Docs To MCP Server
MCP server that exposes documentation search tools for AI assistants
@florexlabs/docs-to-mcp
Convert any documentation URL into a ready-to-run MCP server.
100% local by default: no API keys needed. Embeddings run locally with Transformers.js.
URL → crawl → clean HTML → markdown → chunks → embeddings → vector store → MCP server
Prerequisites
- Node.js >= 18
- Docker (for ChromaDB): `docker run -p 8000:8000 chromadb/chroma`
- Playwright browsers: `npx playwright install chromium`
- No API keys needed for default local embeddings
Quick Start
# Install Playwright browsers (one-time)
npx playwright install chromium
# Start ChromaDB
docker run -p 8000:8000 chromadb/chroma
# Initialize a project from a docs URL
npx @florexlabs/docs-to-mcp init https://docs.example.com --out ./my-docs-to-mcp
cd my-docs-to-mcp
npm install
# Crawl, build, and start β no API keys needed!
npm run crawl
npm run build
npm run start
Installation
npm install -g @florexlabs/docs-to-mcp
Or use directly with npx:
npx @florexlabs/docs-to-mcp <command>
Embedding Providers
Local (default)
Uses Transformers.js with the Xenova/all-MiniLM-L6-v2 model. Runs 100% on your machine via ONNX runtime. No API keys, no external services, no cost.
docs-to-mcp build # uses local by default
docs-to-mcp build --model Xenova/all-MiniLM-L6-v2 # explicit model
The model is downloaded automatically on first use (~80MB) and cached locally.
OpenAI (opt-in)
For higher-quality embeddings on large documentation sets, you can use OpenAI:
export OPENAI_API_KEY=sk-...
docs-to-mcp build --provider openai
docs-to-mcp build --provider openai --model text-embedding-3-large
Commands
docs-to-mcp init <url>
Generate a new MCP server project from a documentation URL.
docs-to-mcp init https://docs.example.com --out ./my-docs-to-mcp
Options:
- `--out <dir>` – Output directory (default: `./docs-to-mcp-project`)
- `--depth <n>` – Crawl depth (default: `3`)
- `--limit <n>` – Max pages (default: `50`)
- `--provider <name>` – Embedding provider: `local` or `openai` (default: `local`)
- `--model <name>` – Embedding model
- `--collection <name>` – Collection name (default: `docs`)
docs-to-mcp crawl <url>
Crawl a documentation site, parse HTML to markdown, and chunk it.
docs-to-mcp crawl https://docs.example.com --out ./data --depth 3 --limit 50
Options:
- `--out <dir>` – Output directory (default: `./data`)
- `--depth <n>` – Crawl depth (default: `3`)
- `--limit <n>` – Max pages (default: `50`)
- `--verbose` – Verbose output
docs-to-mcp build
Embed chunks and upsert into ChromaDB.
docs-to-mcp build # local embeddings (default)
docs-to-mcp build --provider openai # use OpenAI instead
Options:
- `--collection <name>` – Collection name (default: `docs`)
- `--provider <name>` – `local` or `openai` (default: `local`)
- `--model <name>` – Embedding model
- `--data <dir>` – Data directory (default: `./data`)
- `--force` – Force rebuild
- `--verbose` – Verbose output
docs-to-mcp start
Start the MCP server (stdio transport).
docs-to-mcp start --collection docs
docs-to-mcp dev
Start the MCP server in development mode with logging.
docs-to-mcp dev --collection docs
MCP Tools
The server exposes three tools:
| Tool | Description |
|---|---|
| `search_docs(query, topK?)` | Semantic search across indexed documentation |
| `get_source(url)` | Get all chunks from a specific source URL |
| `list_sources()` | List all indexed documentation sources |
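Under the hood, a semantic search tool like `search_docs` ranks stored chunks by similarity between the query embedding and each chunk embedding. A minimal sketch of that ranking step (the `cosine` and `rank` names and the `Chunk` shape are illustrative, not the package's actual API):

```typescript
interface Chunk { url: string; text: string; embedding: number[] }

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the topK chunks most similar to the query embedding.
function rank(query: number[], chunks: Chunk[], topK = 5): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, topK);
}
```

In the real pipeline this nearest-neighbor search is delegated to ChromaDB rather than computed in-process.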
Connecting to MCP Clients
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"my-docs": {
"command": "npx",
"args": ["@florexlabs/docs-to-mcp", "start", "--collection", "docs"],
"env": {
"CHROMA_URL": "http://localhost:8000"
}
}
}
}
Cursor
Add to .cursor/mcp.json:
{
"mcpServers": {
"my-docs": {
"command": "npx",
"args": ["@florexlabs/docs-to-mcp", "start", "--collection", "docs"],
"env": {
"CHROMA_URL": "http://localhost:8000"
}
}
}
}
Environment Variables
CHROMA_URL=http://localhost:8000
# Only needed with --provider openai:
OPENAI_API_KEY=sk-...
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
Architecture
packages/
  cli/           – CLI commands (init, crawl, build, start, dev)
  crawler/       – Playwright-based same-origin doc crawler
  parser/        – HTML cleanup (Cheerio) + markdown conversion (Turndown)
  chunker/       – Heading-aware markdown chunking
  embeddings/    – Local (Transformers.js) + OpenAI providers
  vector-store/  – ChromaDB adapter
  mcp-server/    – MCP server with search tools
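The heading-aware chunking strategy can be sketched roughly as follows (a simplified illustration, not the actual `chunker` package code): split the markdown at each heading so every chunk keeps its heading as context.

```typescript
interface DocChunk { heading: string; content: string }

// Split markdown into chunks at each heading, keeping the heading with its body.
function chunkByHeadings(markdown: string): DocChunk[] {
  const chunks: DocChunk[] = [];
  let heading = "";
  let lines: string[] = [];
  const flush = () => {
    const content = lines.join("\n").trim();
    if (content || heading) chunks.push({ heading, content });
    lines = [];
  };
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      flush();                               // close the previous section
      heading = line.replace(/^#+\s*/, "");  // strip the '#' markers
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}
```

A production chunker would additionally cap chunk size and carry parent headings into nested sections; this sketch only shows the boundary logic.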
Security Notes
- Only crawls same-origin links by default
- Never executes scraped content
- URLs are sanitized and normalized
- Local embeddings stay on your machine – nothing leaves your network
- If using OpenAI, embeddings are sent to OpenAI's API
- Do not crawl private documentation unless you understand where data goes
- No shell execution from user-controlled input
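The same-origin restriction can be enforced with the standard WHATWG URL API; a minimal sketch (`isSameOrigin` is illustrative, not the crawler's exported API):

```typescript
// Keep only links that share an origin (scheme + host + port) with the seed URL.
// Relative links are resolved against the seed before comparing.
function isSameOrigin(seed: string, link: string): boolean {
  try {
    return new URL(seed).origin === new URL(link, seed).origin;
  } catch {
    return false; // unparseable URLs are never crawled
  }
}
```

Resolving the link against the seed means relative paths like `/guide` are treated as in-scope, while absolute URLs on other hosts are rejected.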
Development
pnpm install
pnpm test
pnpm build
License
MIT
