coderag
MCP server for CodeRAG - Model Context Protocol integration for RAG
Installation
npx @sylphx/coderag-mcp
CodeRAG
Lightning-fast hybrid code search for AI assistants
Zero dependencies • <50ms search • Hybrid TF-IDF + Vector • MCP ready
Quick Start • Features • MCP Setup • API
Why CodeRAG?
Traditional code search tools are either slow (full-text grep), inaccurate (keyword matching), or complex (require external services).
CodeRAG is different:
❌ Old way: Docker + ChromaDB + Ollama + 30-second startup
✅ CodeRAG: npx @sylphx/coderag-mcp (instant)
| Feature | grep/ripgrep | Cloud RAG | CodeRAG |
|---|---|---|---|
| Semantic understanding | ❌ | ✅ | ✅ |
| Zero external deps | ✅ | ❌ | ✅ |
| Offline support | ✅ | ❌ | ✅ |
| Startup time | Instant | 10-30s | <1s |
| Search latency | ~100ms | ~500ms | <50ms |
Features
Search
- Hybrid Search - TF-IDF + optional vector embeddings
- StarCoder2 Tokenizer - Code-aware tokenization (4.7MB, trained on code)
- Smoothed IDF - No term is weighted down to zero, which keeps ranking stable
- <50ms Latency - Measured on codebases of up to 10,000 files (see Benchmarks)
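The smoothing formula is not spelled out here. As a minimal sketch, assuming the common add-one variant `idf(t) = ln((N + 1) / (df(t) + 1)) + 1` (an assumption, not necessarily what CodeRAG implements), the helper below shows why a term that appears in every document still keeps a non-zero weight:

```typescript
// Hypothetical helper: add-one smoothed IDF (assumed formula, not confirmed by CodeRAG).
// totalDocs: number of indexed documents; docFreq: documents containing the term.
function smoothedIdf(totalDocs: number, docFreq: number): number {
  return Math.log((totalDocs + 1) / (docFreq + 1)) + 1
}

// Unsmoothed IDF for comparison: drops to 0 when a term is in every document.
function plainIdf(totalDocs: number, docFreq: number): number {
  return Math.log(totalDocs / docFreq)
}

const everywhereWeight = smoothedIdf(100, 100) // term in all 100 documents
const rareWeight = smoothedIdf(100, 1)         // term in a single document
const unsmoothedWeight = plainIdf(100, 100)    // same common term, no smoothing
```

With these inputs the common term keeps a weight of 1 under smoothing, while the unsmoothed variant assigns it 0 and the rare term receives the largest weight.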
Indexing
- 1000-2000 files/sec - Fast initial indexing
- SQLite Persistence - Instant startup (<100ms) with cached index
- Incremental Updates - Smart diff detection, no full rebuilds
- File Watching - Real-time index updates on file changes
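How changes are detected is not described beyond "smart diff detection". One plausible approach, sketched here under the assumption that every file is tracked by a content hash, is to compare the previous snapshot with the current one and re-index only what differs. `Snapshot`, `IndexDiff`, and `diffSnapshots` are hypothetical names, not part of the CodeRAG API:

```typescript
// Hypothetical snapshot: file path -> content hash (the hashing scheme is an assumption).
type Snapshot = Map<string, string>

interface IndexDiff {
  added: string[]
  changed: string[]
  removed: string[]
}

// Compare two snapshots and report which files need (re)indexing or removal.
function diffSnapshots(previous: Snapshot, current: Snapshot): IndexDiff {
  const diff: IndexDiff = { added: [], changed: [], removed: [] }
  current.forEach((hash, path) => {
    const oldHash = previous.get(path)
    if (oldHash === undefined) diff.added.push(path)
    else if (oldHash !== hash) diff.changed.push(path)
  })
  previous.forEach((_hash, path) => {
    if (!current.has(path)) diff.removed.push(path)
  })
  return diff
}

const previousSnapshot: Snapshot = new Map([['a.ts', 'h1'], ['b.ts', 'h2'], ['c.ts', 'h3']])
const currentSnapshot: Snapshot = new Map([['a.ts', 'h1'], ['b.ts', 'h9'], ['d.ts', 'h4']])
const snapshotDiff = diffSnapshots(previousSnapshot, currentSnapshot)
```

Only `b.ts` (changed) and `d.ts` (added) would be re-tokenized in this example, and `c.ts` would be dropped from the index; `a.ts` is left untouched.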
Integration
- MCP Server - Works with Claude Desktop, Cursor, VS Code, Windsurf
- Vector Search - Optional OpenAI embeddings for semantic search
- AST Chunking - Smart code splitting using Synth parsers (15+ languages)
- Low Memory Mode - SQL-based search for resource-constrained environments
Quick Start
Option 1: MCP Server (Recommended for AI Assistants)
```bash
npx @sylphx/coderag-mcp --root=/path/to/project
```
Or add to your MCP config:
```json
{
  "mcpServers": {
    "coderag": {
      "command": "npx",
      "args": ["-y", "@sylphx/coderag-mcp", "--root=/path/to/project"]
    }
  }
}
```
See MCP Server Setup for Claude Desktop, Cursor, VS Code, etc.
Option 2: As a Library
```bash
npm install @sylphx/coderag
# or
bun add @sylphx/coderag
```
```typescript
import { CodebaseIndexer, PersistentStorage } from '@sylphx/coderag'

// Create indexer with persistent storage
const storage = new PersistentStorage({ codebaseRoot: './my-project' })
const indexer = new CodebaseIndexer({
  codebaseRoot: './my-project',
  storage,
})

// Index codebase (instant on subsequent runs)
await indexer.index({ watch: true })

// Search
const results = await indexer.search('authentication logic', { limit: 10 })
console.log(results)
// [{ path: 'src/auth/login.ts', score: 0.87, matchedTerms: ['authentication', 'logic'], snippet: '...' }]
```
Packages
| Package | Description | Install |
|---|---|---|
| @sylphx/coderag | Core search library | npm i @sylphx/coderag |
| @sylphx/coderag-mcp | MCP server for AI assistants | npx @sylphx/coderag-mcp |
MCP Server Setup
Claude Desktop
Add to claude_desktop_config.json:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
```json
{
  "mcpServers": {
    "coderag": {
      "command": "npx",
      "args": ["-y", "@sylphx/coderag-mcp", "--root=/path/to/project"]
    }
  }
}
```
Cursor
Add to ~/.cursor/mcp.json (macOS) or %USERPROFILE%\.cursor\mcp.json (Windows):
```json
{
  "mcpServers": {
    "coderag": {
      "command": "npx",
      "args": ["-y", "@sylphx/coderag-mcp", "--root=/path/to/project"]
    }
  }
}
```
VS Code
Add the following to VS Code settings (JSON). In .vscode/mcp.json, drop the outer "mcp" wrapper so that "servers" is the top-level key:
```json
{
  "mcp": {
    "servers": {
      "coderag": {
        "command": "npx",
        "args": ["-y", "@sylphx/coderag-mcp", "--root=${workspaceFolder}"]
      }
    }
  }
}
```
Windsurf
Add to ~/.codeium/windsurf/mcp_config.json:
```json
{
  "mcpServers": {
    "coderag": {
      "command": "npx",
      "args": ["-y", "@sylphx/coderag-mcp", "--root=/path/to/project"]
    }
  }
}
```
Claude Code
```bash
claude mcp add coderag -- npx -y @sylphx/coderag-mcp --root=/path/to/project
```
MCP Tool: codebase_search
Search project source files with hybrid TF-IDF + vector ranking.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | Yes | - | Search query |
| `limit` | number | No | 10 | Max results |
| `include_content` | boolean | No | true | Include code snippets |
| `file_extensions` | string[] | No | - | Filter by extension (e.g., [".ts", ".tsx"]) |
| `path_filter` | string | No | - | Filter by path pattern |
| `exclude_paths` | string[] | No | - | Exclude paths (e.g., ["node_modules", "dist"]) |
Example
```json
{
  "query": "user authentication login",
  "limit": 5,
  "file_extensions": [".ts", ".tsx"],
  "exclude_paths": ["node_modules", "dist", "test"]
}
```
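To make the parameter semantics concrete, here is a minimal sketch of how the arguments might be typed and how the extension and exclusion filters could be applied to candidate paths. The interface mirrors the parameter table; `matchesFilters` and its plain substring matching are assumptions for illustration, since the exact pattern semantics of `path_filter` and `exclude_paths` are not documented here:

```typescript
// Arguments of codebase_search, mirroring the parameter table.
interface CodebaseSearchArgs {
  query: string
  limit?: number            // default 10
  include_content?: boolean // default true
  file_extensions?: string[]
  path_filter?: string
  exclude_paths?: string[]
}

// Hypothetical filter: assumes plain substring matching for path patterns.
function matchesFilters(path: string, args: CodebaseSearchArgs): boolean {
  const exts = args.file_extensions
  if (exts && exts.length > 0 && !exts.some((ext) => path.endsWith(ext))) return false
  if (args.path_filter && !path.includes(args.path_filter)) return false
  if (args.exclude_paths && args.exclude_paths.some((p) => path.includes(p))) return false
  return true
}

const exampleArgs: CodebaseSearchArgs = {
  query: 'user authentication login',
  limit: 5,
  file_extensions: ['.ts', '.tsx'],
  exclude_paths: ['node_modules', 'dist', 'test'],
}
const candidatePaths = ['src/auth/login.ts', 'dist/auth/login.js', 'src/auth/login.test.ts', 'README.md']
const keptPaths = candidatePaths.filter((p) => matchesFilters(p, exampleArgs))
```

Under these assumptions only `src/auth/login.ts` survives: the other paths either have a non-matching extension or contain an excluded segment.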
Response Format
LLM-optimized output (minimal tokens, maximum content):
````markdown
# Search: "user authentication login" (3 results)

## src/auth/login.ts:15-28
```typescript
15: export async function authenticate(credentials) {
16:   const user = await findUser(credentials.email)
17:   return validatePassword(user, credentials.password)
18: }
```

## src/middleware/auth.ts:42-55 [md→typescript]
```typescript
42: // Embedded code from markdown docs
43: const authMiddleware = (req, res, next) => {
```

## src/utils/large.ts:1-200 [truncated]
```typescript
1: // First 70% shown...
... [800 chars truncated] ...
195: // Last 20% shown
```
````
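The `[truncated]` entry suggests that large snippets keep their head and tail and drop the middle. A minimal sketch of such a scheme follows, assuming a character budget split 70% head / 20% tail as in the sample output. The function name, the `maxChars` parameter, and the exact marker text are illustrative assumptions, not the server's confirmed behavior:

```typescript
// Hypothetical truncation: keep the first 70% and last 20% of the budget, drop the middle.
function truncateMiddle(content: string, maxChars: number): string {
  if (content.length <= maxChars) return content
  // Integer arithmetic avoids floating-point surprises in the split.
  const headLen = Math.floor((maxChars * 7) / 10)
  const tailLen = Math.floor((maxChars * 2) / 10)
  const dropped = content.length - headLen - tailLen
  const head = content.slice(0, headLen)
  const tail = content.slice(content.length - tailLen)
  return `${head}\n... [${dropped} chars truncated] ...\n${tail}`
}

const longSnippet = 'x'.repeat(1000)
const shortened = truncateMiddle(longSnippet, 200)
const untouched = truncateMiddle('short', 200)
```

Keeping both ends preserves the imports and signature at the top of a file as well as its closing lines, which is usually more useful to an LLM than a hard cut at the budget.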
API Reference
CodebaseIndexer
Main class for indexing and searching.
```typescript
import { CodebaseIndexer, PersistentStorage } from '@sylphx/coderag'

const storage = new PersistentStorage({ codebaseRoot: './project' })
const indexer = new CodebaseIndexer({
  codebaseRoot: './project',
  storage,
  maxFileSize: 1024 * 1024, // 1MB default
})

// Index with file watching
await indexer.index({ watch: true })

// Search with options
const results = await indexer.search('query', {
  limit: 10,
  includeContent: true,
  fileExtensions: ['.ts', '.js'],
  excludePaths: ['node_modules'],
})

// Stop watching
await indexer.stopWatch()
```
PersistentStorage
SQLite-backed storage for instant startup.
```typescript
import { PersistentStorage } from '@sylphx/coderag'

const storage = new PersistentStorage({
  codebaseRoot: './project', // Creates .coderag/ folder
  dbPath: './custom.db',     // Optional custom path
})
```
Low-Level TF-IDF Functions
```typescript
import { buildSearchIndex, searchDocuments, initializeTokenizer } from '@sylphx/coderag'

// Initialize StarCoder2 tokenizer (4.7MB, one-time download)
await initializeTokenizer()

// Build index
const documents = [
  { uri: 'file://auth.ts', content: 'export function authenticate...' },
  { uri: 'file://user.ts', content: 'export class User...' },
]
const index = await buildSearchIndex(documents)

// Search
const results = await searchDocuments('authenticate user', index, { limit: 5 })
```
Vector Search (Optional)
For semantic search with embeddings:
```typescript
import { hybridSearch, createEmbeddingProvider } from '@sylphx/coderag'

// Requires OPENAI_API_KEY environment variable
// `indexer` is a CodebaseIndexer instance, created as in the CodebaseIndexer section
const results = await hybridSearch('authentication flow', indexer, {
  vectorWeight: 0.7, // 70% vector, 30% TF-IDF
  limit: 10,
})
```
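The `vectorWeight` comment implies a weighted combination of the two rankings. As a sketch of what such a fusion could look like, assuming both score sets are already normalized to [0, 1] and combined by a weighted sum (the actual fusion method is not documented here), consider the hypothetical `fuseScores` helper below:

```typescript
// Hypothetical fusion: weighted sum of two score maps, both assumed to be in [0, 1].
type Scores = Map<string, number>

function fuseScores(tfidf: Scores, vector: Scores, vectorWeight: number): [string, number][] {
  const paths = new Set<string>()
  tfidf.forEach((_score, path) => paths.add(path))
  vector.forEach((_score, path) => paths.add(path))
  const fused: [string, number][] = []
  paths.forEach((path) => {
    // A file missing from one ranking contributes 0 for that component.
    const t = tfidf.get(path) ?? 0
    const v = vector.get(path) ?? 0
    fused.push([path, vectorWeight * v + (1 - vectorWeight) * t])
  })
  // Highest combined score first.
  return fused.sort((a, b) => b[1] - a[1])
}

const tfidfScores: Scores = new Map([['src/auth/login.ts', 1], ['src/utils/log.ts', 0.5]])
const vectorScores: Scores = new Map([['src/auth/login.ts', 0.5], ['src/auth/session.ts', 1]])
const ranked = fuseScores(tfidfScores, vectorScores, 0.7)
```

With a weight of 0.7, `src/auth/session.ts` ranks first on its vector score alone, ahead of `src/auth/login.ts`, which matches both rankers but less strongly on the vector side.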
Configuration
MCP Server Options
| Option | Default | Description |
|---|---|---|
| `--root=<path>` | Current directory | Codebase root path |
| `--max-size=<bytes>` | 1048576 (1MB) | Max file size to index |
| `--no-auto-index` | false | Disable auto-indexing on startup |
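As an illustration of how these flags map to settings, the sketch below parses them into an options object with the defaults from the table. `ServerOptions` and `parseServerArgs` are hypothetical names, and this is not the server's actual parsing code:

```typescript
interface ServerOptions {
  root: string
  maxSize: number
  autoIndex: boolean
}

// Hypothetical parser for the flags in the table; unknown arguments are ignored.
function parseServerArgs(argv: string[], cwd: string): ServerOptions {
  const options: ServerOptions = { root: cwd, maxSize: 1048576, autoIndex: true }
  for (const arg of argv) {
    if (arg.startsWith('--root=')) {
      options.root = arg.slice('--root='.length)
    } else if (arg.startsWith('--max-size=')) {
      const bytes = Number(arg.slice('--max-size='.length))
      // Keep the default when the value is not a positive number.
      if (Number.isFinite(bytes) && bytes > 0) options.maxSize = bytes
    } else if (arg === '--no-auto-index') {
      options.autoIndex = false
    }
  }
  return options
}

const defaultOptions = parseServerArgs([], '/work')
const customOptions = parseServerArgs(['--root=/repo', '--max-size=2097152', '--no-auto-index'], '/work')
```

In a Node.js entry point the call would typically be `parseServerArgs(process.argv.slice(2), process.cwd())`.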
Environment Variables
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Enable vector search with OpenAI embeddings |
| `OPENAI_BASE_URL` | Custom OpenAI-compatible endpoint |
| `EMBEDDING_MODEL` | Embedding model (default: text-embedding-3-small) |
| `EMBEDDING_DIMENSIONS` | Custom embedding dimensions |
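The following sketch shows one way these variables could be resolved into an embedding configuration, assuming that vector search is simply skipped when no API key is present. `EmbeddingConfig` and `resolveEmbeddingConfig` are hypothetical names; only the variable names and the default model come from the table:

```typescript
interface EmbeddingConfig {
  apiKey: string
  baseUrl: string | undefined
  model: string
  dimensions: number | undefined
}

// Hypothetical resolver: returns undefined when no API key is set (TF-IDF only).
function resolveEmbeddingConfig(env: Record<string, string | undefined>): EmbeddingConfig | undefined {
  const apiKey = env['OPENAI_API_KEY']
  if (!apiKey) return undefined
  const rawDims = env['EMBEDDING_DIMENSIONS']
  const dims = rawDims ? Number(rawDims) : undefined
  return {
    apiKey,
    baseUrl: env['OPENAI_BASE_URL'],
    model: env['EMBEDDING_MODEL'] ?? 'text-embedding-3-small',
    // Ignore values that are not positive integers.
    dimensions: dims !== undefined && Number.isInteger(dims) && dims > 0 ? dims : undefined,
  }
}

const noVectorSearch = resolveEmbeddingConfig({})
const withDefaults = resolveEmbeddingConfig({ OPENAI_API_KEY: 'sk-example' })
const withOverrides = resolveEmbeddingConfig({ OPENAI_API_KEY: 'sk-example', EMBEDDING_DIMENSIONS: '512' })
```

In a Node.js process the argument would be `process.env`.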
Performance
| Metric | Value |
|---|---|
| Initial indexing | ~1000-2000 files/sec |
| Startup with cache | <100ms |
| Search latency | <50ms |
| Memory per 1000 files | ~1-2 MB |
| Tokenizer size | 4.7MB (StarCoder2) |
Benchmarks
Tested on MacBook Pro M1, 16GB RAM:
| Codebase | Files | Index Time | Search Time |
|---|---|---|---|
| Small (100 files) | 100 | 0.5s | <10ms |
| Medium (1000 files) | 1,000 | 2s | <30ms |
| Large (10000 files) | 10,000 | 15s | <50ms |
Architecture
```
coderag/
├── packages/
│   ├── core/                          # @sylphx/coderag
│   │   ├── src/
│   │   │   ├── indexer.ts             # Main indexer with file watching
│   │   │   ├── tfidf.ts               # TF-IDF with StarCoder2 tokenizer
│   │   │   ├── code-tokenizer.ts      # StarCoder2 tokenization
│   │   │   ├── hybrid-search.ts       # Vector + TF-IDF fusion
│   │   │   ├── incremental-tfidf.ts   # Smart incremental updates
│   │   │   ├── storage-persistent.ts  # SQLite storage
│   │   │   ├── vector-storage.ts      # LanceDB vector storage
│   │   │   ├── embeddings.ts          # OpenAI embeddings
│   │   │   ├── ast-chunking.ts        # Synth AST chunking
│   │   │   └── language-config.ts     # Language registry (15+ languages)
│   │   └── package.json
│   │
│   └── mcp-server/                    # @sylphx/coderag-mcp
│       ├── src/
│       │   └── index.ts               # MCP server
│       └── package.json
```
How It Works
- Indexing: Scans codebase, tokenizes with StarCoder2, builds TF-IDF index
- AST Chunking: Splits code at semantic boundaries (functions, classes, etc.)
- Storage: Persists to SQLite (`.coderag/` folder) for instant startup
- Watching: Detects file changes, performs incremental updates
- Search: Hybrid TF-IDF + optional vector search with score fusion
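The indexing and search steps can be illustrated with a deliberately small TF-IDF sketch. It makes several simplifying assumptions: a lowercase word splitter stands in for the StarCoder2 tokenizer, the IDF uses an add-one smoothing formula that may differ from CodeRAG's, and chunking, persistence, and vector search are left out. All names (`tokenize`, `buildTinyIndex`, `searchTinyIndex`) are hypothetical:

```typescript
// Stand-in tokenizer: lowercase word split (CodeRAG uses StarCoder2; this is a simplification).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? []
}

interface Doc { uri: string; content: string }
interface TinyIndex {
  docs: { uri: string; termFreq: Map<string, number> }[]
  docFreq: Map<string, number> // term -> number of documents containing it
}

// Indexing: count term frequencies per document and document frequencies per term.
function buildTinyIndex(docs: Doc[]): TinyIndex {
  const docFreq = new Map<string, number>()
  const indexed = docs.map((doc) => {
    const termFreq = new Map<string, number>()
    for (const term of tokenize(doc.content)) termFreq.set(term, (termFreq.get(term) ?? 0) + 1)
    termFreq.forEach((_count, term) => docFreq.set(term, (docFreq.get(term) ?? 0) + 1))
    return { uri: doc.uri, termFreq }
  })
  return { docs: indexed, docFreq }
}

// Search: sum tf * smoothed idf over the query terms, best score first.
function searchTinyIndex(query: string, index: TinyIndex, limit: number): { uri: string; score: number }[] {
  const n = index.docs.length
  const terms = tokenize(query)
  return index.docs
    .map((doc) => {
      let score = 0
      for (const term of terms) {
        const tf = doc.termFreq.get(term) ?? 0
        const idf = Math.log((n + 1) / ((index.docFreq.get(term) ?? 0) + 1)) + 1
        score += tf * idf
      }
      return { uri: doc.uri, score }
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
}

const tinyIndex = buildTinyIndex([
  { uri: 'file://auth.ts', content: 'export function authenticate...' },
  { uri: 'file://user.ts', content: 'export class User...' },
])
const hits = searchTinyIndex('authenticate', tinyIndex, 5)
```

Here the query matches only `file://auth.ts`; documents that share no term with the query are filtered out rather than returned with a zero score.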
Supported Languages
AST-based chunking with semantic boundary detection:
| Category | Languages |
|---|---|
| JavaScript | JavaScript, TypeScript, JSX, TSX |
| Systems | Python, Go, Java, C |
| Markup | Markdown, HTML, XML |
| Data/Config | JSON, YAML, TOML, INI |
| Other | Protobuf |
Embedded Code Support: Automatically parses code blocks in Markdown and <script>/<style> tags in HTML.
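To illustrate the Markdown half of this feature, the sketch below pulls fenced code blocks and their language tags out of a Markdown string so that each block could be indexed under its own language. It uses a regular expression as a stand-in for a real Markdown parser; `extractFencedBlocks` is a hypothetical name, and CodeRAG's Synth-based parsing is likely more robust:

```typescript
interface EmbeddedBlock {
  language: string
  code: string
}

// Hypothetical extractor: a regex stand-in for a real Markdown parser.
// Matches blocks opened by three backticks plus an optional language tag.
function extractFencedBlocks(markdown: string): EmbeddedBlock[] {
  const fence = /^```([\w-]*)[ \t]*\n([\s\S]*?)^```[ \t]*$/gm
  const blocks: EmbeddedBlock[] = []
  let match: RegExpExecArray | null
  while ((match = fence.exec(markdown)) !== null) {
    blocks.push({ language: match[1] || 'text', code: match[2] ?? '' })
  }
  return blocks
}

const markdownDoc = ['# Auth', '', '```typescript', 'const ok = true', '```', '', 'Some prose.'].join('\n')
const embeddedBlocks = extractFencedBlocks(markdownDoc)
```

A block found this way corresponds to the `[md→typescript]` style of result shown in the response format: the file is Markdown, but the matched snippet is treated as TypeScript.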
Development
```bash
# Clone
git clone https://github.com/SylphxAI/coderag.git
cd coderag

# Install
bun install

# Build
bun run build

# Test
bun run test

# Lint & Format
bun run lint
bun run format
```
Contributing
Contributions are welcome! Please:
- Open an issue to discuss changes
- Fork and create a feature branch
- Run `bun run lint` and `bun run test`
- Submit a pull request
License
MIT © Sylphx
Powered by Sylphx
Built with @sylphx/synth • @sylphx/mcp-server-sdk • @sylphx/doctor • @sylphx/bump
