Codeindex
Semantic code indexing and retrieval for AI agents. Pre-indexes codebases with tree-sitter AST parsing, embedding-based search, and LLM summaries. CLI + MCP server.
Ask AI about Codeindex
Powered by Claude Β· Grounded in docs
I know everything about Codeindex. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
codeindex
Semantic code indexing and retrieval for AI agents.
AI coding tools spend tokens and round-trips searching codebases with grep and glob. codeindex pre-indexes your project and returns ranked code regions (not just files) with their dependency connections in a single query.
Quick Start
# Build from source
cargo install --path crates/cli
# In any project
codeindex init
codeindex index
codeindex query "where is authentication handled"
What It Does
codeindex parses your source code with tree-sitter, extracts meaningful regions (functions, classes, structs, traits, interfaces, methods), maps their dependencies, and stores everything in a fast local index.
Queries return:
- Ranked regions with file paths, line numbers, and signatures
- Dependency graphs showing what each region calls, what calls it, and what types it references
- Optional LLM summaries describing what each region does
Supported Languages
| Language | Regions | Dependencies |
|---|---|---|
| Rust | functions, structs, enums, traits, impl blocks, methods | use imports, function calls, trait impls |
| TypeScript/JavaScript | functions, classes, interfaces, methods | import statements |
| Python | functions, classes, methods | from/import statements |
| Go | functions, methods, structs, interfaces | import declarations |
Adding a language means implementing one trait with two methods. The plugin architecture uses tree-sitter grammars.
Commands
codeindex init # Initialize index in current project
codeindex index # Index (or re-index) the project
codeindex index --incremental # Only process changed files
codeindex status # Show index health
# Querying
codeindex query "auth handler" # Natural language search
codeindex query ":symbol AuthRequest" # Direct symbol lookup
codeindex query ":deps src/auth.rs::login" # Dependency graph
codeindex query ":file src/auth.rs" # All regions in a file
# Options
codeindex query "..." --top 10 # More results
codeindex query "..." --depth 2 # Deeper dependency expansion
codeindex query "..." --json # Machine-readable output
codeindex query "..." --format compact # One line per result
AI Agent Integration
MCP Server (Claude Code)
codeindex ships as an MCP server for native integration with Claude Code and other MCP-compatible tools.
{
"mcpServers": {
"codeindex": {
"command": "codeindex",
"args": ["mcp-server"]
}
}
}
This exposes three tools:
codeindex_queryβ search the codebasecodeindex_depsβ dependency graph for a symbolcodeindex_statusβ index health check
CLI (Codex, any agent with shell access)
Any agent that can run shell commands can use:
codeindex query "where is the database connection pool" --json
How It Works
Indexing
- Walks the project (respects
.gitignore) - Parses each file with tree-sitter
- Extracts regions (functions, classes, etc.) and dependencies (imports, calls, type references)
- Stores metadata in SQLite, embeddings in an HNSW vector index
- Optionally generates LLM summaries via Claude Haiku
Querying
- Parses the query (natural language vs structured)
- Runs keyword search (SQLite FTS5) and semantic search (HNSW) in parallel
- Fuses results with Reciprocal Rank Fusion
- Expands the dependency graph around top results
- Optionally re-ranks with Haiku for explanation
Index Freshness
- File watcher daemon:
codeindex watchmonitors for changes with debounced re-indexing - Git hooks:
codeindex init --git-hooksinstalls post-checkout/merge/commit hooks
Architecture
codeindex/
βββ crates/
β βββ core/ # Library: storage, indexing, querying, embeddings
β βββ tree-sitter-langs/ # Language plugins (Rust, TS, Python, Go)
β βββ cli/ # CLI binary
β βββ mcp-server/ # MCP server binary
β βββ daemon/ # File watcher + git hooks
Core library + thin frontends. The CLI, MCP server, and daemon all share the same on-disk index via file-based locking.
Configuration
codeindex config set embedding.provider voyage # Use Voyage API for embeddings
codeindex config set summary.enabled true # Enable Haiku summaries
Config lives at .codeindex/config.json. Works with zero configuration β defaults to a local embedding model with no API keys required.
| Setting | Default | Options |
|---|---|---|
embedding.provider | local | local, voyage |
summary.enabled | false | true, false |
summary.model | claude-haiku-4-5-20251001 | Any Anthropic model |
daemon.debounce_ms | 500 | Any integer |
Performance
Tested on a 475-file Go project (rigbox-sidecar):
| Metric | Result |
|---|---|
| Full index | 1.2s |
| Regions extracted | 5,593 |
| Dependencies mapped | 2,095 |
| Symbol lookup | <10ms |
Building
git clone https://github.com/rigbox-dev/codeindex.git
cd codeindex
cargo build --release
cargo test --workspace
The release binary is at target/release/codeindex.
License
MIT
