Yomu
Frontend-specialized semantic code search MCP server. Reads and understands your codebase. (読む, "to read")
yomu
Frontend code search for AI agents. Find code by concept when you don't know the name.
The problem
You need to find the chat streaming hook in vercel/ai (3,535 files), but you don't know the function is called useChat.
Typical agent workflow:
glob "**/chat*" → 12 files, none of them the hook (the file is named use-chat.ts).
grep "stream.*hook" → 0 files.
grep "chat" → too many results. Try reading a few...
read packages/react/src/use-chat.ts → Found it. What does it import?
read packages/ai/src/ui/process-ui-message-stream.ts → Now I have context.
3-5 tool calls, trial and error, noise in the context window.
With yomu:
yomu search "streaming chat hooks"
## packages/react/src/use-chat.ts
Imports: @ai-sdk/provider-utils, @ai-sdk/ui-utils
Siblings: UseChatOptions [type_def], UseChatHelpers [type_def]
1. useChat [hook] - 58:210 (similarity: 0.85)
export function useChat({ api, ...options }: UseChatOptions): UseChatHelpers {
...full function body...
}
## packages/rsc/src/streamable-value/use-streamable-value.ts
2. useStreamableValue [hook] - 12:45 (similarity: 0.72)
...
## examples/ai-e2e-next/.../chat-context.tsx
3. useSharedChatContext [hook] - 8:22 (similarity: 0.68)
...
1 call. The implementation is the first result, out of 130 files that contain "useChat" and 9,015 total chunks in the index.
Each result includes the full code body, file imports, and sibling definitions. No follow-up reads needed.
Why not just grep?
Claude Code's developers found that agentic search (letting the model use glob and grep iteratively) outperformed RAG for code navigation. They're right. When the agent can retry with different keywords, read directory structures, and refine its search, grep works remarkably well.
yomu doesn't compete with that workflow. It reduces the iterations:
| Approach | Calls | Context window cost |
|---|---|---|
| grep/glob (iterative) | 3-5 | Each miss adds noise |
| yomu search | 1 | Code + imports + siblings in one response |
The classic RAG problems (index sync lag, stale embeddings, cold starts) are addressed:
| Solution | Details |
|---|---|
| No sync lag | Every search checks index freshness and re-chunks automatically if files changed |
| No API key required | Local embedding model, FTS5 full-text fallback when model unavailable |
| Incremental embedding | 50 chunks per search call, most-imported files first. No upfront build |
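The "Incremental embedding" row describes a per-call budget. The sketch below shows roughly what that selection step could look like. It assumes an in-memory list of chunks that already carries an import count; yomu itself keeps this data in SQLite. The `Chunk` struct, its fields, and `select_for_embedding` are hypothetical names, not yomu's API.

```rust
/// Hypothetical chunk record with only the fields this sketch needs.
#[derive(Debug, Clone)]
struct Chunk {
    id: u32,
    /// How many files import the chunk's file (assumed precomputed at index time).
    import_count: u32,
    /// Whether the chunk already has a stored embedding.
    embedded: bool,
}

/// Per-call embedding budget stated in the README.
const EMBED_BUDGET: usize = 50;

/// Pick up to `budget` not-yet-embedded chunks, most-imported first.
fn select_for_embedding(chunks: &[Chunk], budget: usize) -> Vec<u32> {
    let mut pending: Vec<&Chunk> = chunks.iter().filter(|c| !c.embedded).collect();
    // `sort_by` is stable, so chunks with equal import counts keep index order.
    pending.sort_by(|a, b| b.import_count.cmp(&a.import_count));
    pending.into_iter().take(budget).map(|c| c.id).collect()
}

fn main() {
    // 120 synthetic chunks; every tenth one is already embedded.
    let chunks: Vec<Chunk> = (0..120)
        .map(|i| Chunk { id: i, import_count: i % 7, embedded: i % 10 == 0 })
        .collect();
    let batch = select_for_embedding(&chunks, EMBED_BUDGET);
    println!("embedding {} chunks this call, starting with chunk {}", batch.len(), batch[0]);
}
```

Each call drains the most-imported pending chunks, so widely shared code becomes searchable before leaf files. The README does not say how yomu breaks ties.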
grep is the right tool when you know the name. yomu is for the moment before that: when you know the concept but not the identifier.
When to use yomu (and when not to)
| yomu | grep/glob |
|---|---|
| You don't know what the code is called | You know the exact name (grep "useAuth" is faster) |
| grep returns too many results and you want the implementation | You need regex matching or exact string search |
| You want code + imports + related types in one call | The codebase is small and familiar |
yomu doesn't replace grep. It covers the case grep can't: searching by concept.
Setup
Install
brew install thkt/tap/yomu
Or build from source (requires Rust 1.85+):
cargo build --release
Configure
Semantic search uses a local embedding model (Ruri v3, ~1.2 GB). If the model is already cached locally, search uses it automatically; no API key is required.
If the model is not installed or unavailable, search falls back to text-only mode (FTS5). All other commands (index, rebuild, impact, status) work without the model.
No manual indexing. search auto-indexes on first call.
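This README does not spell out how search decides that the index is stale. The sketch below shows one common approach. It assumes a per-file fingerprint made of modification time and size; yomu may instead use content hashes or another signal. The `FileState` type and `stale_files` function are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical per-file fingerprint recorded at index time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileState {
    mtime_secs: u64,
    size: u64,
}

/// Returns (files to re-chunk, files whose chunks should be dropped).
/// A file needs re-chunking if it is new or its fingerprint changed.
fn stale_files(
    indexed: &HashMap<String, FileState>,
    on_disk: &HashMap<String, FileState>,
) -> (Vec<String>, Vec<String>) {
    let mut changed: Vec<String> = on_disk
        .iter()
        .filter(|(path, state)| indexed.get(*path) != Some(*state))
        .map(|(path, _)| path.clone())
        .collect();
    let mut removed: Vec<String> = indexed
        .keys()
        .filter(|path| !on_disk.contains_key(*path))
        .cloned()
        .collect();
    // HashMap iteration order is unspecified; sort for deterministic output.
    changed.sort();
    removed.sort();
    (changed, removed)
}

fn main() {
    let indexed = HashMap::from([
        ("src/a.ts".to_string(), FileState { mtime_secs: 100, size: 10 }),
        ("src/b.ts".to_string(), FileState { mtime_secs: 100, size: 20 }),
        ("src/old.ts".to_string(), FileState { mtime_secs: 90, size: 5 }),
    ]);
    let on_disk = HashMap::from([
        ("src/a.ts".to_string(), FileState { mtime_secs: 100, size: 10 }), // unchanged
        ("src/b.ts".to_string(), FileState { mtime_secs: 150, size: 22 }), // edited
        ("src/new.ts".to_string(), FileState { mtime_secs: 160, size: 7 }), // added
    ]);
    let (changed, removed) = stale_files(&indexed, &on_disk);
    println!("re-chunk: {:?}, drop: {:?}", changed, removed);
}
```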
Platform notes
| Platform | Build command |
|---|---|
| macOS (Apple Silicon) | cargo build --release (default: mlx backend) |
| Linux / x86 | cargo build --release --no-default-features --features candle |
Commands
Global flags
| Flag | Description |
|---|---|
| --json | Output as JSON (all commands) |
--json can appear before or after the subcommand:
$ yomu --json status
{"files":42,"chunks":187,"embedded_chunks":187,"embeddable_chunks":187,"embed_percentage":100,"references":156,"last_indexed":"2025-03-29 01:23:45"}
yomu search [query]: Search by concept
Returns ranked results with full context. Each result includes:
| Included | Why |
|---|---|
| Full code body | No follow-up read needed |
| File imports | Dependency context without opening another file |
| Sibling defs | Other functions/types in the same file |
| Chunk type | component / hook / type_def / css_rule / rust_fn |
Options:
| Flag | Default | Description |
|---|---|---|
| --limit | 10 | Max results (max: 100) |
| --offset | 0 | Pagination offset (max: 500) |
| --from | none | Search for code similar to a file or symbol (src/foo.rs or src/foo.rs:my_fn). Query becomes optional |
| --no-embed | false | Skip embedding lookups; use FTS5 only. Same effect as YOMU_EMBED=0 |
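The sketch below shows how `--limit` and `--offset` slice a ranked result list. The table gives the defaults and caps but not what happens beyond them, so this sketch assumes out-of-range values are rejected rather than clamped. `SearchOpts` and `page` are hypothetical names.

```rust
/// Hypothetical mirror of the documented `yomu search` paging flags.
#[derive(Debug, Clone, Copy)]
struct SearchOpts {
    limit: usize,
    offset: usize,
}

impl Default for SearchOpts {
    /// Documented defaults: --limit 10, --offset 0.
    fn default() -> Self {
        SearchOpts { limit: 10, offset: 0 }
    }
}

/// Documented caps for --limit and --offset.
const MAX_LIMIT: usize = 100;
const MAX_OFFSET: usize = 500;

/// Return one page of an already-ranked result list.
/// Assumption: values above the caps are an error, not silently clamped.
fn page<T: Clone>(ranked: &[T], opts: SearchOpts) -> Result<Vec<T>, String> {
    if opts.limit > MAX_LIMIT {
        return Err(format!("--limit {} exceeds max {}", opts.limit, MAX_LIMIT));
    }
    if opts.offset > MAX_OFFSET {
        return Err(format!("--offset {} exceeds max {}", opts.offset, MAX_OFFSET));
    }
    Ok(ranked.iter().skip(opts.offset).take(opts.limit).cloned().collect())
}

fn main() {
    let ranked: Vec<u32> = (1..=25).collect(); // stand-in for ranked results
    let first = page(&ranked, SearchOpts::default()).unwrap();
    let third = page(&ranked, SearchOpts { offset: 20, ..SearchOpts::default() }).unwrap();
    println!("page 1: {:?}", first);
    println!("page 3: {:?}", third);
}
```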
--from uses the stored embeddings of the target β no re-embedding needed:
yomu search --from src/query/mod.rs # files similar to this file
yomu search --from src/query/mod.rs:rerank # files similar to this function
yomu search --from src/query/mod.rs "filter" # hybrid: similar to file + FTS on "filter"
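Because the target's vectors are already stored, `--from` can look up its query vector instead of embedding new text. The sketch below rests on three assumptions, none confirmed by this README: similarity is cosine, a target `file:symbol` is split at the last `:`, and stored vectors are keyed by that string. `parse_from`, `cosine`, `rank_similar`, and the example paths are hypothetical.

```rust
/// Split a `--from` target such as `src/foo.rs` or `src/foo.rs:my_fn` into
/// (file, optional symbol). Assumption: the symbol follows the last ':'.
fn parse_from(target: &str) -> (&str, Option<&str>) {
    match target.rsplit_once(':') {
        Some((file, sym)) if !file.is_empty() && !sym.is_empty() => (file, Some(sym)),
        _ => (target, None),
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Rank stored (key, vector) entries by similarity to the stored vector of `target`.
/// The query vector is looked up, not generated, so no embedding model is needed here.
fn rank_similar<'a>(target: &str, store: &'a [(&'a str, Vec<f32>)]) -> Vec<(&'a str, f32)> {
    let Some((_, query)) = store.iter().find(|(key, _)| *key == target) else {
        return Vec::new(); // target has no stored embedding yet
    };
    let mut ranked: Vec<(&'a str, f32)> = store
        .iter()
        .filter(|(key, _)| *key != target)
        .map(|(key, v)| (*key, cosine(query, v)))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}

fn main() {
    let (file, symbol) = parse_from("src/foo.rs:my_fn");
    println!("file = {file}, symbol = {symbol:?}");

    // Hypothetical stored vectors keyed by "file:symbol".
    let store: Vec<(&str, Vec<f32>)> = vec![
        ("src/foo.rs:my_fn", vec![1.0, 0.0, 0.2]),
        ("src/bar.rs:helper", vec![0.9, 0.1, 0.3]),
        ("src/baz.rs:unrelated", vec![0.0, 1.0, 0.0]),
    ];
    for (key, score) in rank_similar("src/foo.rs:my_fn", &store) {
        println!("{score:.3}  {key}");
    }
}
```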
yomu impact <target>: Blast radius of a change
Shows which files depend on a target file or symbol.
$ yomu impact "packages/ai/src/ui/ui-messages.ts" --symbol UIMessage --depth 2
## Impact analysis: `packages/ai/src/ui/ui-messages.ts`
### Direct symbol references
- packages/ai/src/ui/process-ui-message-stream.ts
- packages/react/src/use-chat.ts
...34 files
### All transitive dependents
#### Depth 1
- packages/ai/src/ui/process-ui-message-stream.ts
...37 files
#### Depth 2
- packages/react/src/use-chat.ts
...18 files
Total: 55 dependent file(s)
Real output from vercel/ai. One call replaces manually tracing imports.
Options: --symbol (optional, filter to specific export), --depth (default: 3, max: 10)
--json returns a structured response. dependents[].references is populated only for direct (depth=1) edges, where it lists every (ref_kind, via_symbol) pair through which that file imports the target. Transitive (depth>=2) dependents reach the target through intermediate files, so their references array is empty ([]):
{
"target": "src/storage.rs",
"in_index": true,
"dependents": [
{
"file_path": "src/indexer.rs",
"depth": 1,
"references": [
{"ref_kind": "named", "via_symbol": "Db"},
{"ref_kind": "named", "via_symbol": "open_db"}
]
},
{
"file_path": "src/main.rs",
"depth": 2,
"references": []
}
],
"symbol_refs": [],
"total": 2
}
ref_kind is one of named / default / namespace / type_only / side_effect. via_symbol is null for namespace/side-effect imports where no individual symbol is named at the import site.
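The depth semantics above are what a breadth-first walk over a reverse import graph produces: direct importers sit at depth 1, everything reached through them sits at depth 2 and beyond, and `--depth` caps the walk. The sketch below uses an in-memory edge list, whereas yomu keeps its graph in SQLite (`storage/graph.rs`). `dependents` is an illustrative name, not yomu's API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first walk from `target` over reversed import edges.
/// `imports` holds (importer, imported) pairs. Each dependent is reported once,
/// at the shortest depth through which it reaches `target`; `seen` also stops
/// the walk from looping on import cycles.
fn dependents(imports: &[(&str, &str)], target: &str, max_depth: usize) -> Vec<(String, usize)> {
    // Reverse the edges: imported file -> files that import it.
    let mut importers: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in imports {
        importers.entry(to).or_default().push(from);
    }
    let mut seen: HashSet<&str> = HashSet::from([target]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(target, 0)]);
    let mut out = Vec::new();
    while let Some((file, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // depth cap reached; do not expand further
        }
        for &dep in importers.get(file).into_iter().flatten() {
            if seen.insert(dep) {
                out.push((dep.to_string(), depth + 1));
                queue.push_back((dep, depth + 1));
            }
        }
    }
    out
}

fn main() {
    // Edges assumed so as to reproduce the depths in the JSON example above:
    // src/indexer.rs imports src/storage.rs, src/main.rs imports src/indexer.rs.
    let edges = [("src/indexer.rs", "src/storage.rs"), ("src/main.rs", "src/indexer.rs")];
    for (file, depth) in dependents(&edges, "src/storage.rs", 3) {
        println!("depth {depth}: {file}");
    }
}
```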
yomu index / yomu rebuild / yomu status
| Command | Details |
|---|---|
| index | Update the chunk index. No API calls, ~2.5s on 3,535 files. Usually not needed: search auto-indexes |
| rebuild | Full re-parse from scratch |
| status | Files, chunks, embedding coverage, references |
How it works
Source files → tree-sitter AST → Semantic chunks → Local embeddings (Ruri v3) → Hybrid search
| Stage | Details |
|---|---|
| Indexing | tree-sitter splits code at function/component/type boundaries. Each chunk is one searchable unit. The import graph is built in the same pass. On vercel/ai: 3,535 files β 9,015 chunks + 5,026 import references in 2.5s, zero API calls |
| Embedding | Chunks are embedded incrementally via a local model (Ruri v3, 310M params). 50 chunks per search call, prioritized by import count, so the most-used code becomes searchable first. No upfront build required |
| Search | Three-tier hybrid: vector similarity → name/path matching → FTS5 full-text. Reranked with IDF-weighted keyword scoring. Frequently-imported files rank higher, and test files are pushed down |
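The sketch below illustrates IDF-weighted keyword reranking with textbook IDF; it is not yomu's scoring code. Several choices in it are assumptions: the smoothing `ln(1 + N/df)` computed over the candidate set, the naive alphanumeric tokenizer, the additive blend with the first-stage score, and the `weight` parameter. The import-count boost and test-file penalty are left out.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercased alphanumeric tokens (a deliberately naive tokenizer).
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// idf(term) = ln(1 + N / df), computed over the candidate set only.
fn idf_weights(docs: &[HashSet<String>]) -> HashMap<String, f64> {
    let mut df: HashMap<String, usize> = HashMap::new();
    for doc in docs {
        for term in doc {
            *df.entry(term.clone()).or_insert(0) += 1;
        }
    }
    let n = docs.len() as f64;
    df.into_iter()
        .map(|(term, count)| (term, (1.0 + n / count as f64).ln()))
        .collect()
}

/// Final score = first-stage score + weight * sum of IDF over matched query terms.
fn rerank(query: &str, candidates: &[(&str, f64)], weight: f64) -> Vec<(String, f64)> {
    let docs: Vec<HashSet<String>> = candidates.iter().map(|(text, _)| tokens(text)).collect();
    let weights = idf_weights(&docs);
    let query_terms = tokens(query);
    let mut scored: Vec<(String, f64)> = candidates
        .iter()
        .zip(&docs)
        .map(|((text, base), doc)| {
            // Rare terms (low df) contribute more than common ones.
            let keyword: f64 = query_terms
                .iter()
                .filter(|t| doc.contains(*t))
                .map(|t| weights[t])
                .sum();
            (text.to_string(), base + weight * keyword)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}

fn main() {
    // Hypothetical candidates: (chunk text, first-stage score).
    let candidates = [
        ("function useChat streaming chat hook", 0.70),
        ("function formatDate utility", 0.72),
        ("chat message list component", 0.65),
    ];
    for (text, score) in rerank("streaming chat", &candidates, 0.1) {
        println!("{score:.3}  {text}");
    }
}
```

Here "streaming" appears in only one candidate, so it outweighs the more common "chat". That lifts the hook above a candidate with a slightly higher first-stage score.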
Supported file types
| Type | Parser | Chunk types |
|---|---|---|
| TypeScript / TSX | tree-sitter | component, hook, type_def, test_case, other |
| JavaScript / JSX | tree-sitter | component, hook, type_def, test_case, other |
| Rust | tree-sitter | rust_fn, rust_struct, rust_enum, rust_trait, rust_impl |
| CSS | tree-sitter | css_rule (selectors, @media, @keyframes) |
| HTML | tree-sitter | html_element |
Other files fall back to character-based chunking with overlap.
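Character-based chunking with overlap slides a fixed-size window, so text cut at one chunk boundary also appears in the neighbouring chunk. The sketch below illustrates this. The window size and overlap are placeholders, since this README does not give yomu's values, and `char_chunks` is a hypothetical name.

```rust
/// Split `text` into windows of `size` characters; each window starts
/// `size - overlap` characters after the previous one, so consecutive
/// chunks share `overlap` characters. Indexes by `char`, not byte, so
/// multi-byte UTF-8 text is never cut mid-character.
fn char_chunks(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "require size > 0 and overlap < size");
    let chars: Vec<char> = text.chars().collect();
    let mut chunks: Vec<String> = Vec::new();
    if chars.is_empty() {
        return chunks;
    }
    let step = size - overlap;
    let mut start = 0;
    loop {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}

fn main() {
    // Placeholder sizes, not yomu's configuration.
    for chunk in char_chunks("abcdefghijk", 4, 1) {
        println!("{chunk:?}");
    }
    println!("{:?}", char_chunks("読む読む", 3, 1));
}
```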
Limitations
| Limitation | Details |
|---|---|
| Model not auto-downloaded | The ~1.2 GB embedding model must be pre-cached; search does not download it automatically |
| SCSS/Sass not supported | Only plain CSS |
| Cold start | First search call takes a few seconds for chunking + initial embedding |
| Large files skipped | Files over 1 MB are excluded from indexing |
| Embedding opt-out | Pass --no-embed to yomu search, or set YOMU_EMBED=0; search falls back to text-only mode |
Development
Setup
Run once after cloning:
git config --local core.hooksPath .githooks
This installs a pre-commit hook that runs cargo fmt --check and cargo clippy --all-targets --all-features -- -D warnings before each commit. Violations abort the commit. To skip for one commit: git commit --no-verify.
Common commands
cargo test # all tests
cargo clippy --all-targets --all-features -- -D warnings # lint (matches CI)
cargo fmt -- --check # format check
Architecture
src/
├── main.rs          CLI entry point (clap)
├── lib.rs           Crate root, public API
├── config.rs        Runtime configuration
├── modernbert.rs    ModernBERT model (mlx backend)
├── tools/           Application facade: orchestrates indexer, query, storage per command
├── indexer/
│   ├── mod.rs       Orchestration: incremental index, embed budget
│   ├── chunker/     tree-sitter AST → semantic chunks
│   ├── embedder.rs  Local embedding (ModernBERT/Ruri v3 via mlx-rs or candle)
│   └── walker.rs    File discovery, .gitignore filtering
├── resolver.rs      Import path resolution (tsconfig aliases, index.ts probing)
├── query/           Hybrid search + IDF reranking
└── storage/
    ├── mod.rs       Schema, CRUD, types
    ├── search.rs    Vector similarity, name matching, FTS5
    ├── embed.rs     Embedding storage, vec_chunks
    └── graph.rs     Import graph traversal, dependents, siblings
Single binary, zero runtime dependencies. SQLite and sqlite-vec are statically linked.
License
MIT
