io.github.BigJai/tokennuke
Code indexing MCP: 15 tools, 10 languages, hybrid search, call graphs, O(1) retrieval.
Ask AI about io.github.BigJai/tokennuke
Powered by Claude Β· Grounded in docs
I know everything about io.github.BigJai/tokennuke. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
TokenNuke
Intelligent code indexing MCP server. 15 tools, 10 languages, tree-sitter AST extraction, hybrid search (FTS5 + vector), call graphs, remote repo indexing, incremental indexing.
Save 99% of tokens β get exact function source via byte-offset seek instead of reading entire files.
Formerly
codemunch-pro. Same code, better name.
Install
pip install tokennuke
Quick Start
Claude Desktop / Cline
Add to your MCP client config:
{
"mcpServers": {
"tokennuke": {
"command": "tokennuke"
}
}
}
HTTP Server
tokennuke --transport streamable-http --port 5002
15 MCP Tools
| Tool | Description |
|---|---|
index_folder | Index a local directory (incremental, SHA-256 based) |
index_repo | Index a GitHub/GitLab repo (tarball download, no git needed) |
list_repos | List all indexed repositories with stats |
invalidate_cache | Force re-index a repository |
file_tree | Get directory tree with file counts |
file_outline | List symbols in a single file |
repo_outline | List all symbols in repo (summary) |
get_symbol | Get full source of one symbol (O(1) byte seek) |
get_symbols | Batch get multiple symbols |
search_symbols | Hybrid search (FTS5 + vector RRF) |
search_text | Full-text search in file contents |
get_callees | What does this function call? |
get_callers | Who calls this function? |
diff_symbols | What changed since last index? (PR review) |
dependency_map | What does this file depend on? What depends on it? |
10 Languages
Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby
All via tree-sitter-language-pack β zero compilation, pre-built binaries.
Key Features
O(1) Symbol Retrieval
Every symbol stores its byte offset and length. get_symbol seeks directly to the function source β no reading entire files. A 200-byte function from a 40KB file = 99.5% token savings.
Incremental Indexing
Files are hashed (SHA-256). Only changed files are re-parsed. Re-indexing a 10K file repo after changing one file takes milliseconds.
Hybrid Search (FTS5 + Vector)
Combines BM25 keyword matching with semantic vector similarity using Reciprocal Rank Fusion. Search "authentication middleware" and find auth_middleware, verify_token, and login_handler.
Call Graphs
Traces function calls through the AST. get_callees("main") shows what main calls. get_callers("authenticate") shows who calls authenticate. Supports depth traversal.
Remote Repo Indexing
Index any public GitHub or GitLab repo by URL β no git binary needed. Downloads the tarball via API, extracts, and indexes. Cached locally with SHA-based freshness checks. Supports private repos with auth tokens and sparse paths.
Full-Text Content Search
Search raw file contents β string literals, TODO comments, config values, error messages. Not just symbol names.
How It Works
- Parse β tree-sitter builds an AST for each source file
- Extract β Walk AST to find functions, classes, methods, types, interfaces
- Store β SQLite database per repo with FTS5 virtual tables
- Embed β FastEmbed (ONNX, CPU-only) generates 384-dim vectors for semantic search
- Graph β Call expressions extracted from function bodies, edges stored and resolved
- Serve β FastMCP exposes 15 tools via stdio or HTTP
Architecture
~/.tokennuke/
βββ myproject_a1b2c3d4e5f6.db # Per-repo SQLite database
βββ otherproject_7890abcdef.db
βββ ...
Each DB contains:
βββ files # Indexed files with SHA-256 hashes
βββ symbols # Functions, classes, methods, types
βββ symbols_fts # FTS5 full-text search index
βββ symbols_vec # sqlite-vec 384-dim vector index
βββ call_edges # Call graph (caller β callee)
βββ file_content_fts # Raw file content search
Use Cases
- AI Coding Agents: Give your agent surgical access to codebases without burning context
- Code Review: Find all callers of a function before changing its signature
- Onboarding: Search symbols semantically β "where is error handling?" finds relevant code
- Refactoring: Map call graphs before moving functions between modules
- Documentation: Extract all public APIs with signatures and docstrings
License
MIT
