Rememex
semantic search for your local files: find by meaning, not keywords. 120+ file types, OCR, MCP server for AI agents. 100% private.
a semantic upgrade to your file system. you type meaning, it finds files. nothing leaves your machine.
named after Vannevar Bush's Memex (1945), a vision of a device that stores and retrieves all human knowledge.
windows 10+ only for now. uses UWP OCR and mica backdrop.
why rememex?
| | rememex | ripgrep | Everything | Sourcegraph | Microsoft Recall |
|---|---|---|---|---|---|
| search type | semantic + keyword hybrid | regex / literal text | filename (content via `content:`) | keyword + symbol + semantic | screenshots your entire life every 5 seconds |
| understands meaning | ✅ | ❌ | ❌ | ✅ | ✅ (it saw everything. literally everything.) |
| local & private | ✅ everything on your machine | ✅ | ✅ | cloud or self-hosted | "local" (pinky promise) |
| file types | 120+ (code, docs, images, configs) | text files | all files (index by name) | code repos | your screen. all of it. always. |
| image OCR | ✅ built-in | ❌ | ❌ | ❌ | ✅ (it OCRs your passwords too) |
| EXIF / GPS | ✅ reverse geocodes to city names | ❌ | ❌ | ❌ | knows where you are anyway |
| MCP server | ✅ built-in for AI agents | ❌ | ❌ | ? | no but copilot watches you type |
| price | free, open source | free, open source | free | starts at $49/user/mo | free* (*costs your dignity) |
| vibes | finds what you mean | finds what you type | finds filenames | enterprise™ | big brother as a feature |
what it does
- indexes 120+ file types (code, docs, images, configs, whatever)
- OCR on images via windows built-in engine
- reads EXIF → reverse geocodes GPS to city names. search "photos from istanbul" and it works
- EXIF dates → human words. "summer morning" finds a photo from july at 8am
- hybrid search: vector + full-text + JINA cross-encoder reranker
- smart chunking per language (rust at `fn`/`struct`, python at `def`/`class`, etc)
- semantic containers for isolation (work/personal/research)
- MCP server for AI agents. details → MCP.md · agent instructions → AGENT.md
- annotations: attach searchable notes to any file, from the UI or via MCP. agents and humans share the same knowledge layer
- optional cloud embeddings: plug in OpenAI, Gemini, Cohere, or any compatible API. default is still 100% local
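the per-language chunking bullet above can be sketched in a few lines. a minimal illustration, not the real chunking.rs logic (the boundary keywords and function name are assumptions):

```rust
/// Minimal sketch of boundary-based chunking: start a new chunk whenever a
/// line opens a new top-level item (`fn` or `struct` here). The real engine
/// handles many languages plus token budgets; this only shows the idea.
fn chunk_rust_source(src: &str) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    for line in src.lines() {
        let trimmed = line.trim_start();
        let starts_item = trimmed.starts_with("fn ")
            || trimmed.starts_with("pub fn ")
            || trimmed.starts_with("struct ")
            || trimmed.starts_with("pub struct ");
        if starts_item && !current.trim().is_empty() {
            chunks.push(current.clone()); // close the previous chunk
            current.clear();
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.trim().is_empty() {
        chunks.push(current); // flush the trailing chunk
    }
    chunks
}

fn main() {
    let src = "struct A;\nfn one() {}\nfn two() {}\n";
    let chunks = chunk_rust_source(src);
    assert_eq!(chunks.len(), 3); // one chunk per top-level item
    println!("{} chunks", chunks.len());
}
```

splitting at declaration boundaries is what keeps a function's body and its signature in the same embedded chunk, which is why "the quality filter threshold" can match code that never says the word "threshold".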
architecture
indexing
```mermaid
graph LR
    W[file watcher] -->|change event| SI[index single file]
    WB[WalkBuilder] -->|bulk scan| B[collect files]
    B --> C{image?}
    C -->|yes| D[UWP OCR + EXIF]
    C -->|no| E[file_io reader]
    D --> F[git context]
    E --> F
    F --> G["semantic chunking (per-language)"]
    G --> H[embedding provider]
    H -->|local ONNX or remote API| I[(lancedb)]
    I --> J[ANN + FTS index build]
```
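the image? branch in the diagram is routing by file type before text extraction. a hedged sketch (the extension list and function names are illustrative, not the actual ocr.rs/file_io.rs API):

```rust
/// Sketch of the pipeline's image/text branch: route each file by extension
/// to either an OCR step or a plain reader. Names are stand-ins.
fn is_image(path: &str) -> bool {
    let exts = ["png", "jpg", "jpeg", "bmp", "gif", "tiff", "webp"];
    path.rsplit('.')
        .next()
        .map(|e| {
            let ext = e.to_ascii_lowercase();
            exts.iter().any(|x| *x == ext)
        })
        .unwrap_or(false)
}

fn extract_text(path: &str) -> String {
    if is_image(path) {
        format!("ocr({path})") // stand-in for the UWP OCR + EXIF step
    } else {
        format!("read({path})") // stand-in for the file_io reader
    }
}

fn main() {
    assert!(is_image("photo.JPG"));
    assert!(!is_image("main.rs"));
    println!("{}", extract_text("photo.jpg"));
}
```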
search
```mermaid
graph LR
    Q[query] --> QR[query router]
    QR -->|weights + hyde flag| HYDE{hyde?}
    HYDE -->|conceptual| LLM[LLM hypothetical doc]
    HYDE -->|other| EMB[embed query]
    LLM --> EMB
    Q --> EXP[expand query variants]
    EMB --> VS[vector search]
    EXP --> FTS[full-text search]
    VS --> HM["hybrid merge (RRF)"]
    FTS --> HM
    EMB --> AS[annotation search]
    AS --> AM[merge annotations]
    HM --> AM
    AM --> RR[JINA reranker]
    RR --> SC[score normalization]
    SC --> MMR[MMR diversity]
    MMR --> R[ranked results]
    UI[tauri UI] --> Q
    MCP[MCP server] -->|stdio| Q
```
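the hybrid merge node uses reciprocal rank fusion (RRF). a minimal sketch of textbook RRF, assuming the common k = 60 constant (rememex's actual constant and per-list weights are not shown here):

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each result list contributes 1/(k + rank) to a
/// document's fused score, so a document ranked well by either vector or
/// full-text search rises to the top without score calibration.
fn rrf_merge(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (rank, doc) in list.iter().enumerate() {
            // ranks are 1-based in the RRF formula
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}

fn main() {
    let vector = vec!["a.rs", "b.rs", "c.rs"]; // vector search ranking
    let fts = vec!["b.rs", "d.rs", "a.rs"]; // full-text ranking
    let merged = rrf_merge(&[vector, fts], 60.0);
    // "a.rs" and "b.rs" appear in both lists, so they outrank "c.rs"/"d.rs"
    assert_eq!(merged[0].0, "b.rs");
    println!("{merged:?}");
}
```

RRF is why the two searches don't need comparable scores: only ranks matter, so a BM25 full-text list and a cosine-similarity vector list fuse cleanly.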
run it
```sh
npm install
npm run tauri dev    # dev is slow
npm run tauri build  # release build, use this for real speed
```
Alt+Space to toggle. config & docs → CONFIG.md
RAM usage peaks during initial indexing; this is expected. once indexing completes, it drops and stays stable.
try it with real data
we ship a test dataset so you can see what semantic search actually feels like. 2,483 resume PDFs across 24 professions, accountants to teachers.
```sh
# unzip test-set/data.zip somewhere
# create a new container in rememex, point it at the unzipped folder
# wait for indexing (~30 min on local embeddings)
```
we indexed it and ran these queries. all results below used the most basic config β no cloud APIs, no fine-tuning:
| setting | value |
|---|---|
| embedding model | Multilingual-E5-Base (local ONNX, ~170MB) |
| reranker | off |
| chunk size | 512 tokens, 64 overlap |
| query router | on |
| MMR diversity | on (~65% balance) |
| HyDE | off |
| embedding provider | local β zero API calls |
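the chunk size row above (512 tokens, 64 overlap) describes a sliding window: consecutive chunks share their boundary tokens so no context is lost at the seams. a rough sketch, assuming simple token slices (the real tokenizer is model-specific, not whitespace):

```rust
/// Sliding-window chunking with overlap: step = size - overlap, so each
/// chunk shares its last `overlap` tokens with the start of the next one.
fn chunk_with_overlap(tokens: &[&str], size: usize, overlap: usize) -> Vec<Vec<String>> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + size).min(tokens.len());
        chunks.push(tokens[start..end].iter().map(|s| s.to_string()).collect());
        if end == tokens.len() {
            break; // last window reached the end of the input
        }
        start += step;
    }
    chunks
}

fn main() {
    let words: Vec<&str> = (0..10).map(|_| "tok").collect();
    // size 4, overlap 1 -> windows over tokens [0..4], [3..7], [6..10]
    let chunks = chunk_with_overlap(&words, 4, 1);
    assert_eq!(chunks.len(), 3);
    assert_eq!(chunks[2].len(), 4);
    println!("{} chunks", chunks.len());
}
```

with 512/64, the step is 448 tokens, so a sentence straddling a chunk boundary still appears whole in one of the two overlapping windows.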
real results, real scores:
| query | top result | score | why it's interesting |
|---|---|---|---|
| "software engineer who knows Python and machine learning" | AGRICULTURE/62994611.pdf → Python, TensorFlow, Keras, Scikit-learn, Pandas | 55.2 | filed under AGRICULTURE. rememex found it anyway |
| "nurse with emergency room experience" | ADVOCATE/46772262.pdf → Certified Emergency Nurse, Trauma Nurse Specialist | 58.2 | filed under ADVOCATE. wrong folder, right person |
| "someone who can cook Italian and French cuisine" | CHEF/10276858.pdf → Italian cuisine, fine dining, ethnic foods preparation | 34.7 | query said "French" too; top result has Italian, #3 result has "French cuisine talent". it splits the match across candidates |
| "MBA graduate with sales leadership" | DIGITAL-MEDIA/20330739.pdf → built $25MM sales teams, Exec Director of Sales | 55.5 | MBA + sales, found in DIGITAL-MEDIA folder. categories don't matter |
| "graphic designer with Photoshop and Illustrator" | DESIGNER/29147100.pdf → Adobe Photoshop, Illustrator, InDesign, portfolio link | 66.1 | highest score. exact skill match + portfolio |
| "civil engineer with AutoCAD and project management" | CONSTRUCTION/32025286.pdf → AutoCAD Civil 3D, cost analysis, full project admin | 59.4 | construction admin, not "civil engineer" by title. meaning > title |
the point: grep needs the exact keyword. rememex finds meaning, even when the words are different, even when the file is in the wrong folder.
agentic benchmark
same 5 tasks, same codebase. grep vs rememex MCP:
| task | grep | rememex |
|---|---|---|
| "find where GPS coords become city names" | grep "GPS" → 0. grep "geocode" → found file, need to open. 3 steps | 1 step |
| "find the quality filter threshold" | grep "threshold" → 0 (code says >= 25.0). failed | 1 step |
| "find dedup logic for best chunk per file" | grep "dedup" → 0. grep "best" → noise. 3-5 steps | 1 step |
| "find config migration handling" | grep "legacy" → wrong file. wrong answer | 1 step |
| "find embedding batch size constant" | grep "batch_size" → 0 (it's EMBED_BATCH_SIZE). failed | 1 step |
grep needs the exact keyword. rememex needs the idea.
agents using rememex are expected to use 5-10x fewer tokens and complete tasks significantly faster: fewer search attempts, fewer wrong files opened, fewer round-trips. the benchmark above shows 1 step vs 3-5; that's both speed and cost.
project structure
```
rememex/
├── src/                         # react/ts frontend
│   ├── components/              # UI components
│   │   ├── Sidebar.tsx          # sidebar: containers, annotations, filters
│   │   ├── SearchBar.tsx        # search input
│   │   ├── ResultsList.tsx      # virtualized search results
│   │   ├── StatusBar.tsx        # indexing status bar
│   │   ├── TitleBar.tsx         # custom window title bar
│   │   ├── Settings.tsx         # settings panel
│   │   └── settings/            # modular settings sub-panels
│   ├── locales/                 # i18n translations (en, tr)
│   ├── Modal.tsx                # modal dialog component
│   ├── i18n.tsx                 # internationalization setup
│   ├── types.ts                 # shared TypeScript types
│   └── App.tsx                  # main app shell
├── src-tauri/
│   └── src/
│       ├── indexer/             # core engine
│       │   ├── mod.rs           # indexer orchestration, batch embed, reranker
│       │   ├── chunking.rs      # per-language semantic splitting
│       │   ├── embedding.rs     # fastembed ONNX inference
│       │   ├── embedding_provider.rs  # local/remote provider trait
│       │   ├── search.rs        # hybrid vector + full-text + reranker
│       │   ├── pipeline.rs      # search pipeline scoring
│       │   ├── annotations.rs   # annotation CRUD operations
│       │   ├── ocr.rs           # UWP OCR bridge
│       │   ├── file_io.rs       # file reading (text, pdf, binary)
│       │   ├── git.rs           # git log integration
│       │   └── db.rs            # lancedb operations
│       ├── bin/mcp.rs           # MCP server binary (stdio)
│       ├── commands.rs          # tauri IPC commands
│       ├── config.rs            # config loading / migration
│       ├── state.rs             # shared app state types
│       ├── watcher.rs           # notify-based file watcher
│       └── lib.rs               # app setup, tray, shortcuts
├── config.schema.json           # JSON schema for config validation
├── AGENT.md                     # agent instructions for MCP
├── MCP.md                       # MCP server documentation
├── CONFIG.md                    # configuration reference
└── ROADMAP.md                   # what's done, what's next
```
stack
rust (tauri 2), react/ts, lancedb, Multilingual-E5-Base, JINA Reranker v2, rayon
roadmap
see ROADMAP.md
docs
| doc | what |
|---|---|
| CONFIG.md | configuration reference, all options, file types, provider setup |
| MCP.md | MCP server setup, tool reference, editor configs |
| AGENT.md | instructions for AI agents using rememex as a tool |
| BEST_PRACTICES.md | tips for getting the best out of rememex |
| CONTRIBUTING.md | how to contribute, bug reports, PR guidelines |
star history
contributors
license
MIT
