Codixing
Code retrieval engine for AI agents (MCP server)
Website: codixing.com · Docs
Code retrieval engine that saves your AI agent 73% of its token budget. Replaces grep with ranked, AST-aware search, so models spend tokens reasoning, not reading.
Install
curl -fsSL https://codixing.com/install.sh | sh
Installs codixing to /usr/local/bin. macOS (Apple Silicon) and Linux (x86_64). Binaries also on the releases page.
Claude Code plugin (optional)
claude plugin marketplace add ferax564/codixing
claude plugin install codixing@codixing
Adds 5 slash commands: /codixing-setup, /codixing-explore, /codixing-review, /codixing-preflight, /codixing-release.
MCP server (optional; for Cursor, Windsurf, Continue.dev, Codex)
Add to your project's .mcp.json:
{
  "mcpServers": {
    "codixing": {
      "command": "npx",
      "args": ["-y", "codixing-mcp", "--root", ".", "--no-daemon-fork"]
    }
  }
}
Or for OpenAI Codex CLI: codex mcp add codixing -- codixing-mcp --root .
Why Not Just Grep?
AI coding agents use grep, find, and cat for code navigation. These tools return everything, always: a single rg b2Vec2 on a real codebase returns 2,240 hits (225 KB), burning context before any reasoning happens.
Codixing returns the top 20 results in 1.3 KB: same signal, 99% less waste.
The cost of noise
Tested on 6 real-world repos (tokio, ripgrep, axum, django, fastapi, react; 9,493 files in total):
| Metric | grep/cat/find | Codixing | Savings |
|---|---|---|---|
| Tool calls per session | 58 | 26 | 55% fewer |
| Output tokens | ~84,600 | ~22,900 | 73% fewer |
| Est. cost (Opus @ $15/M) | $1.27 | $0.34 | $0.93/session |
At 50 agent sessions/day, that's $1,400/month back in your pocket, and the agent finds the right code more often.
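The savings figures can be re-derived from the table. The sketch below uses only the token counts and the $15/M rate quoted above; the 30-day month is an assumption, and the function names are illustrative.

```python
# Re-derives the cost figures from the table above.
# Token counts and the $15 per million rate come from the table;
# the 30-day month is an assumption made for this sketch.
PRICE_PER_MILLION_USD = 15.0


def session_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Cost in USD of feeding `tokens` tool-output tokens to the model."""
    return tokens * price_per_million / 1_000_000


def monthly_savings(baseline_tokens: int, reduced_tokens: int,
                    sessions_per_day: int, days: int = 30) -> float:
    """USD saved per month when each session shrinks from baseline to reduced."""
    per_session = session_cost(baseline_tokens) - session_cost(reduced_tokens)
    return per_session * sessions_per_day * days


grep_cost = session_cost(84_600)        # grep/cat/find column
codixing_cost = session_cost(22_900)    # Codixing column
saved = monthly_savings(84_600, 22_900, sessions_per_day=50)
print(f"${grep_cost:.2f} vs ${codixing_cost:.2f} per session, ${saved:,.0f} per month")
```

Without intermediate rounding the monthly figure comes out slightly under $1,400; the README's number follows from rounding the per-session saving to $0.93 first.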
What you get
| Capability | grep/rg | Codixing |
|---|---|---|
| Bounded, ranked output | No | Yes (BM25 + PageRank) |
| Symbol definitions (not just mentions) | No | Yes (AST-parsed symbol table) |
| Dependency graph queries | No | Yes (transitive imports, call graph) |
| Natural language search | No | Yes (BM25 + optional embeddings) |
| Token budget management | No | Yes (auto-truncation) |
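To make "bounded, ranked output" concrete, here is a minimal sketch of budget-based truncation. It is not Codixing's implementation: the `Hit` type, the 4-characters-per-token estimate, and the greedy cut-off are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical search result; not a Codixing type."""
    path: str
    score: float
    text: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def truncate_to_budget(hits: list[Hit], token_budget: int) -> tuple[list[Hit], int]:
    """Keep the highest-scoring hits until the next one would exceed the budget."""
    kept, used = [], 0
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        cost = estimate_tokens(hit.text)
        if used + cost > token_budget:
            break  # stop at the first hit that does not fit
        kept.append(hit)
        used += cost
    return kept, used


hits = [Hit("a.rs", 0.9, "x" * 40), Hit("b.rs", 0.5, "y" * 40), Hit("c.rs", 0.7, "z" * 40)]
kept, used = truncate_to_budget(hits, token_budget=25)
print([h.path for h in kept], used)
```

The point of the contrast with grep is that output size is decided by the budget rather than by how many lines happen to match.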
Agent golden path
If you are wiring Codixing into an AI agent, start with these tools instead of exposing the whole surface at once:
| Task | Use this first | Why |
|---|---|---|
| Find relevant code from a concept | code_search / codixing search | Ranked, token-bounded retrieval for natural language and code terms |
| Jump to a known definition | find_symbol / codixing symbols | Definitions only, not every textual mention |
| Check blast radius before editing | search_usages --complete or predict_impact / codixing impact | Deterministic callers/importers/tests instead of top-K guesses |
| Understand a feature | feature_hub or get_context_for_task | One call combines search, dependencies, dependents, and tests |
| Inspect exact text | grep_code / codixing grep | Literal/regex scan for strings, errors, TODOs, and generated names |
| Focus on current work | focus_map / codixing graph --map | Graph-ranked context biased toward changed or seed files |
The rest of the MCP tools are specialist tools. Use search_tools and
get_tool_schema when an agent needs to discover a narrower capability.
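One way to apply this advice in an agent harness is a small allowlist keyed by task. The tool names below come from the table; the task labels and helper functions are hypothetical, and `predict_impact` is picked for the blast-radius row although the table also offers `search_usages --complete`.

```python
# Tool names are taken from the golden-path table above.
# The task labels (dictionary keys) are hypothetical names for this sketch.
GOLDEN_PATH = {
    "find_concept": "code_search",
    "jump_to_definition": "find_symbol",
    "blast_radius": "predict_impact",
    "understand_feature": "feature_hub",
    "inspect_text": "grep_code",
    "focus_current_work": "focus_map",
}

# Discovery tools the README suggests for narrower capabilities.
DISCOVERY_TOOLS = ("search_tools", "get_tool_schema")


def exposed_tools() -> list[str]:
    """Tools to expose to the agent up front."""
    return sorted(set(GOLDEN_PATH.values()) | set(DISCOVERY_TOOLS))


def pick_tool(task: str) -> str:
    """First tool to try for a task; unknown tasks fall back to discovery."""
    return GOLDEN_PATH.get(task, "search_tools")


print(exposed_tools())
print(pick_tool("blast_radius"), pick_tool("something_else"))
```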
Getting Started
60-second setup
# 1. Install
curl -fsSL https://codixing.com/install.sh | sh
# 2. Index your project
codixing init .
# Indexed 2,847 files, 14,203 chunks, 8,891 symbols in 1.2s
# 3. Search
codixing search "authentication handler"
# src/auth/handler.rs:42 [score: 0.94]
# pub fn handle_auth_request(req: Request) -> Result<Token>
That's it. Your agent now uses ranked search instead of grep.
CLI commands
# Search (natural language or symbol names)
codixing search "error handling middleware"
# Symbol lookup (definitions only, not mentions)
codixing symbols Engine
# Dependency graph
codixing callers src/engine.rs # who imports this file?
codixing callees src/engine.rs # what does this file import?
# Keep index fresh (re-indexes only changed files)
codixing sync
# Architecture map
codixing graph --token-budget 4000
Hybrid search (optional)
codixing init builds BM25 + symbol graph by default. For natural-language queries
("how does the auth flow work?"), opt into semantic embeddings with --embed:
codixing init . --embed --model bge-small-en # one-time, ~2 min on a medium repo
codixing search "how does auth work" --strategy fast
ONNX-based embedding models (bge-small-en, bge-base-en, etc.) require ONNX Runtime (pip install onnxruntime, or download from the onnxruntime releases). The static model2vec model and BM25-only installs do not need it.
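The feature list in this README describes hybrid search as BM25 plus optional vector embeddings fused with Reciprocal Rank Fusion (RRF). The sketch below shows generic RRF over two ranked lists. The constant k = 60 is the value commonly used in the literature, not a confirmed Codixing setting, and the file names are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each item scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["a.rs", "b.rs", "c.rs"]      # hypothetical lexical results
vector_ranking = ["b.rs", "d.rs", "a.rs"]    # hypothetical embedding results
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
# → ['b.rs', 'a.rs', 'd.rs', 'c.rs']
```

RRF only needs ranks, not raw scores, which is why it can combine BM25 scores and cosine similarities without calibrating one against the other.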
CLI Commands
The most common commands (run codixing --help for the full list):
codixing search "query" # Semantic code search
codixing grep "pattern" # Literal/regex text scan (path:line:col:text)
codixing symbols Widget # Find symbol definitions
codixing usages add_chunk # Find call sites and imports
codixing callers src/engine.rs # Who imports this file
codixing callees src/engine.rs # What this file imports
codixing graph --map # Architecture overview
codixing graph --communities # Louvain community detection
codixing graph --surprises 10 # Top N surprising edges
codixing graph --html graph.html # Interactive HTML visualization
codixing path src/a.rs src/b.rs # Shortest import chain
codixing impact src/engine.rs # Blast radius analysis
codixing api src/engine.rs # Public API surface
codixing types Engine # Type relationships
codixing examples add_chunk # Usage examples from tests + callers
codixing context src/engine.rs # Cross-file context assembly
codixing init . # Index a project
codixing sync # Incremental re-index
codixing audit # Find stale files
Full reference: codixing.com/docs
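For scripting against the CLI, the `path:line:col:text` format noted for `codixing grep` is easy to parse. This is a minimal sketch that assumes one hit per line and paths without colons (so Windows drive letters would need extra handling); `GrepHit` and `parse_grep_line` are illustrative names, not part of Codixing.

```python
from typing import NamedTuple, Optional


class GrepHit(NamedTuple):
    path: str
    line: int
    col: int
    text: str


def parse_grep_line(raw: str) -> Optional[GrepHit]:
    """Parse one `path:line:col:text` line; return None if it does not match."""
    # Split on the first three colons only, so colons in the matched text survive.
    parts = raw.rstrip("\n").split(":", 3)
    if len(parts) != 4 or not (parts[1].isdigit() and parts[2].isdigit()):
        return None
    path, line, col, text = parts
    return GrepHit(path, int(line), int(col), text)


hit = parse_grep_line("src/engine.rs:42:5:    // TODO: handle empty index")
print(hit)
```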
MCP server (optional)
For editors with MCP support, the codixing-mcp binary exposes 67 JSON-RPC 2.0 tools. The main ones, by category:
| Category | Tools |
|---|---|
| Search | code_search, find_symbol, grep_code, search_usages, read_symbol, find_similar, stitch_context |
| Graph | get_repo_map, focus_map, get_references, get_transitive_deps, symbol_callers, symbol_callees, predict_impact, find_orphans, explain |
| Files | read_file, write_file, edit_file, delete_file, apply_patch, list_files, outline_file |
| Analysis | find_tests, find_source_for_test, get_complexity, review_context, rename_symbol, run_tests, get_context_for_task, check_staleness, generate_onboarding, audit_freshness |
| Git | git_diff, get_hotspots, search_changes, get_blame |
| Session | remember, recall, forget, get_session_summary, session_status, session_reset_focus |
| Meta | index_status, search_tools, get_tool_schema, enrich_docs |
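As a sketch of what a tool invocation looks like on the wire, the snippet below builds a JSON-RPC 2.0 request using the `tools/call` method defined by the MCP specification. The `query` argument name is an assumption; the real schema for `code_search` can be fetched with `get_tool_schema`. A real client would also complete the MCP `initialize` handshake before sending this.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "query" is an assumed parameter name, used here only for illustration.
request = tool_call("code_search", {"query": "authentication handler"})
print(json.dumps(request))
```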
Daemon mode
Daemon mode loads the engine once and serves calls over a Unix socket (or named pipe on Windows), which makes calls 4-5x faster. The daemon auto-starts on first connection and self-terminates after 30 minutes idle:
codixing-mcp --root /path/to/project # auto-starts daemon
codixing-mcp --root /path/to/project --daemon # explicit daemon start
codixing-mcp --root /path/to/project --no-daemon-fork # disable auto-start
The daemon auto-updates the index within ~100ms of any file save.
LSP Server
codixing-lsp brings code intelligence to any LSP-capable editor: VS Code, Neovim, Emacs, Sublime Text, JetBrains.
Capabilities: Hover, Go-to-definition, References, Call hierarchy (incoming/outgoing), Workspace symbols, Document symbols, Live reindex on save, Cyclomatic complexity diagnostics, Code actions, Inlay hints, Completions, Signature help, Rename refactoring, Semantic tokens.
codixing-lsp --root /path/to/project
Neovim:
{ cmd = { "codixing-lsp", "--root", vim.fn.getcwd() } }
Emacs (eglot):
(add-to-list 'eglot-server-programs
'((rust-mode python-mode) . ("codixing-lsp" "--root" "/your/project")))
VS Code / Cursor Extension
The editors/vscode/ directory contains a TypeScript extension with: Index Workspace, Sync Index, Search, Show Repo Map, Start Daemon, Register MCP Server.
cd editors/vscode && npm install && npm run compile
# Then F5 in VS Code to launch the Extension Development Host
Pre-built VSIX: Download codixing.vsix from the releases page and install:
code --install-extension codixing.vsix
Performance
| Metric | BM25-only | Hybrid (BgeSmallEn) |
|---|---|---|
| Init (138 files) | 0.21s | 120s (one-time) |
| MCP cold start | 24ms | 107ms |
| Search latency | 30-42ms | 36-40ms |
| Top-1 accuracy | 7/10 | 10/10 |
Retrieval accuracy (OpenClaw, 20 curated file-localization queries, 2026-04-28):
| Tool | Recall@10 | MRR | Notes |
|---|---|---|---|
| Codixing | 0.802 | 0.827 | symbols, usages, search, and cross-imports routed by query type |
| codebase-memory-mcp v0.6.0 | 0.374 | 0.243 | Local CLI benchmark; semantic tool was not exposed by the downloaded build |
| grep | 0.191 | 0.168 | Baseline recursive text scan |
Raw results: external_competitor_benchmark.md. To reproduce the full table, set CODEBASE_MEMORY_MCP=/path/to/codebase-memory-mcp for a local v0.6.0 binary, then run run_external_competitors.sh.
Large codebase (368K LoC, 7,607 files): Init 7.9s, search 94ms, 99% token reduction vs grep.
Linux kernel (63K C/H files, 30M+ lines, 84K-node dependency graph): 1.57s cold-start search, 0.79s warm via the MCP daemon path. Zero-deserialization mmap for instant startup. Note: fresh-process CLI invocations on a 2GB+ hybrid index pay startup cost on every call, so prefer the MCP daemon or --no-embeddings for the CLI path.
SWE-bench Lite (300 tasks, 12 repos): Recall@5 = 74.3% (vs grep 41.3%).
See benchmarks/ for detailed methodology and reproduction scripts.
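For readers unfamiliar with the metrics in the tables above, here is a sketch of Recall@k and MRR using their standard definitions. Whether the benchmark scripts use exactly these definitions is not confirmed here; the query data is made up.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant files that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 10) -> tuple[float, float]:
    """Average Recall@k and MRR over (ranked results, relevant set) pairs."""
    n = len(runs)
    recall = sum(recall_at_k(ranked, rel, k) for ranked, rel in runs) / n
    mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / n
    return recall, mrr


runs = [
    (["a.rs", "b.rs", "c.rs"], {"b.rs"}),           # found at rank 2
    (["x.py", "y.py", "z.py"], {"x.py", "q.py"}),   # one of two found, at rank 1
]
print(evaluate(runs))
# → (0.75, 0.75)
```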
Key Features
- 29 languages: Tree-sitter AST for Rust, Python, TypeScript, Go, Java, C, C++, C#, Ruby, Swift, Kotlin, Scala, Zig, PHP, Bash, Matlab; line-based parsing for config/diagram formats (YAML, TOML, Dockerfile, Makefile, Mermaid, XML); structured doc parsers for Markdown, HTML, reStructuredText, AsciiDoc, and plain text
- Documentation indexing: indexes Markdown, HTML, reStructuredText (.rst), AsciiDoc (.adoc), and plain text (.txt + bare README/LICENSE/AUTHORS/CHANGELOG) alongside code with section-aware chunking, CHANGELOG-aware version-section splitting, breadcrumb metadata, and doc-to-code graph linking; use --docs-only to restrict results to docs or --code-only to exclude them
- Hybrid search: BM25 + optional vector embeddings, fused with Reciprocal Rank Fusion
- Symbol-level call graph: Function-to-function call edges extracted from AST, including Rust trait dispatch, Python class inheritance, and TypeScript interface implementations
- Dependency graph: Import + call extraction, PageRank scoring, Personalized PageRank for focus-aware maps, Louvain community detection, shortest path queries, surprise/anomaly edge scoring
- Interactive graph visualization: codixing graph --html generates a self-contained HTML file with force-directed layout, community coloring, confidence-styled edges, surprise highlights, search/filter, zoom/pan
- Graph exports for external tools: codixing graph --graphml (Gephi/yEd), --cypher (Neo4j), --obsidian (markdown vault with one note per community) for downstream analysis and knowledge-base integration
- Git hooks: codixing hook install wires post-commit hooks for automatic index sync after every commit; codixing hook status/uninstall manage the lifecycle
- Caller cascade: codixing callers <file> --depth N walks the import graph N hops to surface the full transitive caller cascade
- TOML output filter pipeline: Project-local .codixing/filter_rules.toml compresses MCP tool output for token-tight agent loops, with tee recovery to disk for full output when agents need it
- Edge confidence: Every dependency edge tagged Verified/High/Medium/Low based on extraction method (AST-resolved, call extraction, doc reference, external)
- Ranked cross-imports: PageRank + git recency scoring for relevance-ranked graph queries across directory boundaries
- Memory relations: memory_relate tool creates typed edges between agent memory entries, enabling associative recall across sessions
- Feature hub: One-call feature exploration combining search + callers + callees + tests for unified understanding
- Change impact analysis: codixing impact computes the blast radius (direct dependents, transitive dependents, and affected tests) for any file
- Semantic concept graph: Vocabulary gap bridging via behavioral signatures; embedding-free --semantic strategy matches code by what it does, not just what it's named
- API surface analysis: codixing api lists public symbols with visibility tracking (pub, pub(crate), export, etc.)
- Type-aware search: codixing types shows type relationships (implements, extends, returns, contains)
- Usage example mining: codixing examples finds real usage from tests, callers, and doc blocks
- Cross-file context assembly: codixing context follows import chains and callees to assemble understanding context
- Query-personalized PageRank: Query-time graph boost seeds PageRank from query-relevant nodes for context-aware ranking
- Learned query reformulation: Project-specific vocabulary expansion learns from codebase patterns
- CLI + MCP: Full CLI surface for direct use (run codixing --help); 67 MCP tools for editor integration (search, graph traversal, file operations, code review, git analysis, session memory, federation discovery)
- File freshness audit: audit_freshness tool identifies stale and orphaned files across releases
- Preflight gates: Plugin enforces existence scanning before proposing new features
- TypeScript import resolution: Resolve .js → .ts imports with node16/bundler moduleResolution support, enabling 0.8+ R@10 on cross-package code discovery
- Background embedding drain: Instant BM25 search after codixing init, hybrid vector search transparently upgrades as embeddings complete in the background
- Model2Vec with code-aware preprocessing: Static embeddings via potion-base-8M (no ONNX needed, instant init). CamelCase/snake_case splitting before tokenization reduces subword fragments by 50-70%, achieving MRR 1.000 on concept queries
- Jina Code Int8: jina-embeddings-v2-base-code int8-quantized for ARM64 (768 dims, 8ms/query, nDCG@10 0.949). Set JINA_CODE_INT8_ONNX env var to the model path
- Embedding speed measurement: New bench-embed CLI subcommand for profiling embedding performance across custom models
- Daemon mode: Engine stays in memory, auto-starts on first connection, Unix socket (macOS/Linux) or named pipe (Windows) IPC, file watcher for live index updates, 30-min idle timeout
- Field-weighted BM25: Configurable per-field boosting (entity_names 3×, signature 2×, scope_chain 1.5×, content 1×)
- Search pipeline: Composable search stages (definition boost, test demotion, path match, graph boost, recency boost, graph semantic propagation via GraphPropagationStage, file-level dedup via FileDedupStage, truncation) with 6 strategies including trigram exact-match
- Multi-query RRF fusion: Auto-generates query reformulations for natural-language queries (3+ words) and fuses results via Reciprocal Rank Fusion; also available via explicit queries parameter on code_search
- Git recency signal: Mildly boosts recently modified files (+10% linear decay over 180 days) via lazy-loaded git log timestamps
- Overlapping chunks: Bridge chunks at AST-aware chunk boundaries capture cross-function context; configurable overlap_ratio (default 0.0)
- File path boosting: Detects explicit file paths and backtick code references in queries and boosts matching results (2.5×)
- Kernel-scale performance: Tested on the Linux kernel (63K C/H files, 30M+ lines, 84K-node graph): 1.57s cold-start search, 0.79s warm via the MCP daemon. Mmap symbol table AND trigram index (zero-deserialization), compact chunk metadata (11× smaller), lazy trigram loading
- Trigram pre-filtering: File-level trigram inverted index (Russ Cox/trigrep technique) skips files before disk I/O; 110× faster literal grep at 1K files, 52× faster at 10K files; persistent bitcode storage, regex HIR walking with OR-branch support, parallel rayon verification
- LSP rename + semantic tokens: Cross-file rename refactoring with conflict detection; semantic highlighting for Rust, Python, TypeScript, Go
- Queue-based embedding: Optional RustQueue-backed pipeline with crash recovery, parallel ONNX workers (N× throughput), deferred embedding (--defer-embeddings), and streaming mpsc pattern that fixes OOM on large repos
- Streaming embeddings: Fixed-window batch processing (256 chunks) with progress reporting; incremental vector reuse via content hashing
- Federation auto-discovery: Auto-detects Cargo, npm, pnpm, Go workspaces, git submodules, and nested projects
- Read-only concurrent access: Multiple instances share the same index; periodic reload detects writer updates automatically
- Incremental embedding: sync skips re-embedding unchanged chunks (content hash comparison)
- Progress notifications: Long-running MCP tools emit notifications/progress with streaming partial results so agents see live status
- Windows support: Named pipe daemon, brute-force vector fallback when usearch (POSIX-only) is unavailable
- GitHub Action: Automated code review with impact analysis on PRs
- Token budgets: All output respects token limits; adaptive truncation at score cliffs
- Cross-repo federation: Unified search across multiple indexed projects with CLI management and workspace auto-discovery (codixing federation init/add/remove/list/search/discover)
- Cross-package import graph: cross-imports command finds files in one directory that import from another via single O(E) graph walk
- HTTP API server: REST endpoints (search, symbols, grep, hotspots, complexity, outline, graph) with SSE streaming (crates/server/)
- Single binary: No JVM, no Docker, no external databases, no API keys. macOS, Linux, and Windows
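The code-aware preprocessing mentioned in the Model2Vec item splits CamelCase and snake_case identifiers before tokenization. A minimal sketch of such a splitter follows; the regular expression and the lowercasing step are assumptions for illustration, not Codixing's actual preprocessing.

```python
import re

# Alternatives, in order: an acronym followed by a capitalized word (HTTP in
# HTTPResponse), a capitalized or lowercase word, a trailing acronym, digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier: str) -> list[str]:
    """Split a code identifier into lowercase word parts."""
    return [part.lower() for part in _WORD.findall(identifier)]


print(split_identifier("handleAuthRequest"))    # → ['handle', 'auth', 'request']
print(split_identifier("parse_HTTPResponse"))   # → ['parse', 'http', 'response']
print(split_identifier("MAX_TOKEN_BUDGET"))     # → ['max', 'token', 'budget']
```

Splitting first means a query such as "auth request" shares whole words with `handleAuthRequest`, rather than relying on a subword tokenizer to cut the identifier in useful places.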
Supported Languages
| Tier | Languages |
|---|---|
| Tier 1 (full AST + graph) | Rust, Python, TypeScript, TSX, JavaScript, Go, Java, C, C++, C# |
| Tier 2 (full AST + graph) | Ruby, Swift, Kotlin, Scala |
| Tier 3 (full AST + graph) | Zig, PHP, Bash, Matlab |
| Config (symbol extraction) | YAML, TOML, Dockerfile, Makefile |
| Diagram / Markup (symbol extraction) | Mermaid, XML/Draw.io |
| Docs (section-aware chunking) | Markdown, HTML, reStructuredText (.rst), AsciiDoc (.adoc, .asciidoc), plain text (.txt + bare README/LICENSE/AUTHORS/CHANGELOG) |
Architecture
┌──────────────────────────────────────────────────────────────────┐
│                         Codixing Engine                          │
│                                                                  │
│  Tree-sitter  →  cAST Chunker  →  Tantivy (BM25)                 │
│  AST Parser      (18 langs)       + Code Tokenizer               │
│                                                                  │
│  Symbol Table (DashMap)    Code Graph (petgraph + PageRank)      │
│                                                                  │
│  Retriever: BM25 · Hybrid (RRF) · Thorough (MMR) · Explore       │
│  + Exact (trigram) · Graph boost · Definition 3.5× · Session     │
│  SearchPipeline: composable stages, 6 strategies                 │
│                                                                  │
│  API: CLI · MCP (67 tools, JSON-RPC 2.0) · LSP · HTTP            │
│  Daemon (Unix socket / Windows named pipe) · File Watcher        │
└──────────────────────────────────────────────────────────────────┘
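The "Exact (trigram)" path relies on the file-level trigram pre-filter described in the feature list. The sketch below illustrates the general technique on in-memory strings: intersect the posting sets for every trigram of the literal, then verify only the surviving files. It is a simplified illustration, not Codixing's index format, and all names are hypothetical.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: trigram -> set of file paths containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, content in files.items():
        for gram in trigrams(content):
            index[gram].add(path)
    return index


def search_literal(literal: str, files: dict[str, str],
                   index: dict[str, set[str]]) -> list[str]:
    """Find files containing literal, scanning only trigram-filtered candidates."""
    grams = trigrams(literal)
    if not grams:
        candidates = set(files)  # literals under 3 chars cannot be pre-filtered
    else:
        # A file can only match if it contains every trigram of the literal.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # The filter can admit false positives, so verify each candidate.
    return sorted(path for path in candidates if literal in files[path])


files = {
    "a.rs": "fn handle_auth() {}",
    "b.rs": "fn handle_other() {}",
    "c.rs": "auth",
}
index = build_index(files)
print(search_literal("auth", files, index))
# → ['a.rs', 'c.rs']
```

In a real index the verification step is where disk I/O happens, so every file the intersection rules out is a file that never has to be read.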
Development
cargo build --workspace
cargo test --workspace # 1239 tests
cargo clippy --workspace -- -D warnings
cargo fmt --check
License
Licensed under the Apache License, Version 2.0. See LICENSE for the full text.
