Rememex
semantic search for your local files: find by meaning, not keywords. 120+ file types, OCR, MCP server for AI agents. 100% private.
a semantic upgrade to your file system. you type meaning, it finds files. nothing leaves your machine.
named after Vannevar Bush's Memex (1945), a vision of a device that stores and retrieves all human knowledge.
windows 10+ only for now. uses UWP OCR and mica backdrop.
why rememex?
| | rememex | ripgrep | Everything | Sourcegraph | Microsoft Recall |
|---|---|---|---|---|---|
| search type | semantic + keyword hybrid | regex / literal text | filename (content via `content:`) | keyword + symbol + semantic | screenshots your entire life every 5 seconds |
| understands meaning | ✅ | ❌ | ❌ | ✅ | ✅ (it saw everything. literally everything.) |
| local & private | ✅ everything on your machine | ✅ | ✅ | cloud or self-hosted | "local" (pinky promise) |
| file types | 120+ (code, docs, images, configs) | text files | all files (index by name) | code repos | your screen. all of it. always. |
| image OCR | ✅ built-in | ❌ | ❌ | ❌ | ✅ (it OCRs your passwords too) |
| EXIF / GPS | ✅ reverse geocodes to city names | ❌ | ❌ | ❌ | knows where you are anyway |
| MCP server | ✅ built-in for AI agents | ❌ | ❌ | ? | no but copilot watches you type |
| price | free, open source | free, open source | free | starts at $49/user/mo | free* (*costs your dignity) |
| vibes | finds what you mean | finds what you type | finds filenames | enterprise™ | big brother as a feature |
what it does
- indexes 120+ file types (code, docs, images, configs, whatever)
- OCR on images via windows built-in engine
- reads EXIF → reverse geocodes GPS to city names. search "photos from istanbul" and it works
- EXIF dates → human words. "summer morning" finds a photo from july at 8am
- hybrid search: vector + full-text + JINA cross-encoder reranker
- smart chunking per language (rust at `fn`/`struct`, python at `def`/`class`, etc)
- semantic containers for isolation (work/personal/research)
- MCP server for AI agents. details → MCP.md · agent instructions → AGENT.md
- annotations: attach searchable notes to any file, from the UI or via MCP. agents and humans share the same knowledge layer
- optional cloud embeddings: plug in OpenAI, Gemini, Cohere, or any compatible API. default is still 100% local
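the per-language chunking bullet above can be sketched in a few lines. a minimal illustration, not the real chunking.rs logic (the boundary keywords and function name are assumptions):

```rust
/// Minimal sketch of boundary-based chunking: start a new chunk whenever a
/// line opens a new top-level item (`fn` or `struct` here). The real engine
/// handles many languages plus token budgets; this only shows the idea.
fn chunk_rust_source(src: &str) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    for line in src.lines() {
        let trimmed = line.trim_start();
        let starts_item = trimmed.starts_with("fn ")
            || trimmed.starts_with("pub fn ")
            || trimmed.starts_with("struct ")
            || trimmed.starts_with("pub struct ");
        if starts_item && !current.trim().is_empty() {
            chunks.push(current.clone()); // close the previous chunk
            current.clear();
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.trim().is_empty() {
        chunks.push(current); // flush the trailing chunk
    }
    chunks
}

fn main() {
    let src = "struct A;\nfn one() {}\nfn two() {}\n";
    let chunks = chunk_rust_source(src);
    assert_eq!(chunks.len(), 3); // one chunk per top-level item
    println!("{} chunks", chunks.len());
}
```

splitting at declaration boundaries is what keeps a function's body and its signature in the same embedded chunk, which is why "the quality filter threshold" can match code that never says the word "threshold".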
architecture
indexing
```mermaid
graph LR
    W[file watcher] -->|change event| SI[index single file]
    WB[WalkBuilder] -->|bulk scan| B[collect files]
    B --> C{image?}
    C -->|yes| D[UWP OCR + EXIF]
    C -->|no| E[file_io reader]
    D --> F[git context]
    E --> F
    F --> G["semantic chunking (per-language)"]
    G --> H[embedding provider]
    H -->|local ONNX or remote API| I[(lancedb)]
    I --> J[ANN + FTS index build]
```
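the image? branch in the diagram is routing by file type before text extraction. a hedged sketch (the extension list and function names are illustrative, not the actual ocr.rs/file_io.rs API):

```rust
/// Sketch of the pipeline's image/text branch: route each file by extension
/// to either an OCR step or a plain reader. Names are stand-ins.
fn is_image(path: &str) -> bool {
    let exts = ["png", "jpg", "jpeg", "bmp", "gif", "tiff", "webp"];
    path.rsplit('.')
        .next()
        .map(|e| {
            let ext = e.to_ascii_lowercase();
            exts.iter().any(|x| *x == ext)
        })
        .unwrap_or(false)
}

fn extract_text(path: &str) -> String {
    if is_image(path) {
        format!("ocr({path})") // stand-in for the UWP OCR + EXIF step
    } else {
        format!("read({path})") // stand-in for the file_io reader
    }
}

fn main() {
    assert!(is_image("photo.JPG"));
    assert!(!is_image("main.rs"));
    println!("{}", extract_text("photo.jpg"));
}
```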
search
```mermaid
graph LR
    Q[query] --> QR[query router]
    QR -->|weights + hyde flag| HYDE{hyde?}
    HYDE -->|conceptual| LLM[LLM hypothetical doc]
    HYDE -->|other| EMB[embed query]
    LLM --> EMB
    Q --> EXP[expand query variants]
    EMB --> VS[vector search]
    EXP --> FTS[full-text search]
    VS --> HM["hybrid merge (RRF)"]
    FTS --> HM
    EMB --> AS[annotation search]
    AS --> AM[merge annotations]
    HM --> AM
    AM --> RR[JINA reranker]
    RR --> SC[score normalization]
    SC --> MMR[MMR diversity]
    MMR --> R[ranked results]
    UI[tauri UI] --> Q
    MCP[MCP server] -->|stdio| Q
```
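the hybrid merge node uses reciprocal rank fusion (RRF). a minimal sketch of textbook RRF, assuming the common k = 60 constant (rememex's actual constant and per-list weights are not shown here):

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each result list contributes 1/(k + rank) to a
/// document's fused score, so a document ranked well by either vector or
/// full-text search rises to the top without score calibration.
fn rrf_merge(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (rank, doc) in list.iter().enumerate() {
            // ranks are 1-based in the RRF formula
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}

fn main() {
    let vector = vec!["a.rs", "b.rs", "c.rs"]; // vector search ranking
    let fts = vec!["b.rs", "d.rs", "a.rs"]; // full-text ranking
    let merged = rrf_merge(&[vector, fts], 60.0);
    // "a.rs" and "b.rs" appear in both lists, so they outrank "c.rs"/"d.rs"
    assert_eq!(merged[0].0, "b.rs");
    println!("{merged:?}");
}
```

RRF is why the two searches don't need comparable scores: only ranks matter, so a BM25 full-text list and a cosine-similarity vector list fuse cleanly.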
run it
```sh
npm install
npm run tauri dev    # dev is slow
npm run tauri build  # release build, use this for real speed
```
Alt+Space to toggle. config & docs → CONFIG.md
RAM usage peaks during initial indexing; this is expected. once indexing completes, it drops and stays stable.
try it with real data
we ship a test dataset so you can see what semantic search actually feels like. 2,483 resume PDFs across 24 professions, accountants to teachers.
```sh
# unzip test-set/data.zip somewhere
# create a new container in rememex, point it at the unzipped folder
# wait for indexing (~30 min on local embeddings)
```
we indexed it and ran these queries. all results below used the most basic config β no cloud APIs, no fine-tuning:
| setting | value |
|---|---|
| embedding model | Multilingual-E5-Base (local ONNX, ~170MB) |
| reranker | off |
| chunk size | 512 tokens, 64 overlap |
| query router | on |
| MMR diversity | on (~65% balance) |
| HyDE | off |
| embedding provider | local β zero API calls |
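the chunk size row above (512 tokens, 64 overlap) describes a sliding window: consecutive chunks share their boundary tokens so no context is lost at the seams. a rough sketch, assuming simple token slices (the real tokenizer is model-specific, not whitespace):

```rust
/// Sliding-window chunking with overlap: step = size - overlap, so each
/// chunk shares its last `overlap` tokens with the start of the next one.
fn chunk_with_overlap(tokens: &[&str], size: usize, overlap: usize) -> Vec<Vec<String>> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + size).min(tokens.len());
        chunks.push(tokens[start..end].iter().map(|s| s.to_string()).collect());
        if end == tokens.len() {
            break; // last window reached the end of the input
        }
        start += step;
    }
    chunks
}

fn main() {
    let words: Vec<&str> = (0..10).map(|_| "tok").collect();
    // size 4, overlap 1 -> windows over tokens [0..4], [3..7], [6..10]
    let chunks = chunk_with_overlap(&words, 4, 1);
    assert_eq!(chunks.len(), 3);
    assert_eq!(chunks[2].len(), 4);
    println!("{} chunks", chunks.len());
}
```

with 512/64, the step is 448 tokens, so a sentence straddling a chunk boundary still appears whole in one of the two overlapping windows.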
real results, real scores:
| query | top result | score | why it's interesting |
|---|---|---|---|
| "software engineer who knows Python and machine learning" | AGRICULTURE/62994611.pdf → Python, TensorFlow, Keras, Scikit-learn, Pandas | 55.2 | filed under AGRICULTURE. rememex found it anyway |
| "nurse with emergency room experience" | ADVOCATE/46772262.pdf → Certified Emergency Nurse, Trauma Nurse Specialist | 58.2 | filed under ADVOCATE. wrong folder, right person |
| "someone who can cook Italian and French cuisine" | CHEF/10276858.pdf → Italian cuisine, fine dining, ethnic foods preparation | 34.7 | query said "French" too; top result has Italian, #3 result has "French cuisine talent". it splits the match across candidates |
| "MBA graduate with sales leadership" | DIGITAL-MEDIA/20330739.pdf → built $25MM sales teams, Exec Director of Sales | 55.5 | MBA + sales, found in DIGITAL-MEDIA folder. categories don't matter |
| "graphic designer with Photoshop and Illustrator" | DESIGNER/29147100.pdf → Adobe Photoshop, Illustrator, InDesign, portfolio link | 66.1 | highest score. exact skill match + portfolio |
| "civil engineer with AutoCAD and project management" | CONSTRUCTION/32025286.pdf → AutoCAD Civil 3D, cost analysis, full project admin | 59.4 | construction admin, not "civil engineer" by title. meaning > title |
the point: grep needs the exact keyword. rememex finds meaning, even when the words are different, even when the file is in the wrong folder.
agentic benchmark
same 5 tasks, same codebase. grep vs rememex MCP:
| task | grep | rememex |
|---|---|---|
| "find where GPS coords become city names" | grep "GPS" → 0. grep "geocode" → found file, need to open. 3 steps | 1 step |
| "find the quality filter threshold" | grep "threshold" → 0 (code says >= 25.0). failed | 1 step |
| "find dedup logic for best chunk per file" | grep "dedup" → 0. grep "best" → noise. 3-5 steps | 1 step |
| "find config migration handling" | grep "legacy" → wrong file. wrong answer | 1 step |
| "find embedding batch size constant" | grep "batch_size" → 0 (it's EMBED_BATCH_SIZE). failed | 1 step |
grep needs the exact keyword. rememex needs the idea.
agents using rememex are expected to use 5-10x fewer tokens and complete tasks significantly faster: fewer search attempts, fewer wrong files opened, fewer round-trips. the benchmark above shows 1 step vs 3-5; that's both speed and cost.
project structure
```
rememex/
├── src/                         # react/ts frontend
│   ├── components/              # UI components
│   │   ├── Sidebar.tsx          # sidebar: containers, annotations, filters
│   │   ├── SearchBar.tsx        # search input
│   │   ├── ResultsList.tsx      # virtualized search results
│   │   ├── StatusBar.tsx        # indexing status bar
│   │   ├── TitleBar.tsx         # custom window title bar
│   │   ├── Settings.tsx         # settings panel
│   │   └── settings/            # modular settings sub-panels
│   ├── locales/                 # i18n translations (en, tr)
│   ├── Modal.tsx                # modal dialog component
│   ├── i18n.tsx                 # internationalization setup
│   ├── types.ts                 # shared TypeScript types
│   └── App.tsx                  # main app shell
├── src-tauri/
│   └── src/
│       ├── indexer/             # core engine
│       │   ├── mod.rs           # indexer orchestration, batch embed, reranker
│       │   ├── chunking.rs      # per-language semantic splitting
│       │   ├── embedding.rs     # fastembed ONNX inference
│       │   ├── embedding_provider.rs  # local/remote provider trait
│       │   ├── search.rs        # hybrid vector + full-text + reranker
│       │   ├── pipeline.rs      # search pipeline scoring
│       │   ├── annotations.rs   # annotation CRUD operations
│       │   ├── ocr.rs           # UWP OCR bridge
│       │   ├── file_io.rs       # file reading (text, pdf, binary)
│       │   ├── git.rs           # git log integration
│       │   └── db.rs            # lancedb operations
│       ├── bin/mcp.rs           # MCP server binary (stdio)
│       ├── commands.rs          # tauri IPC commands
│       ├── config.rs            # config loading / migration
│       ├── state.rs             # shared app state types
│       ├── watcher.rs           # notify-based file watcher
│       └── lib.rs               # app setup, tray, shortcuts
├── config.schema.json           # JSON schema for config validation
├── AGENT.md                     # agent instructions for MCP
├── MCP.md                       # MCP server documentation
├── CONFIG.md                    # configuration reference
└── ROADMAP.md                   # what's done, what's next
```
stack
rust (tauri 2), react/ts, lancedb, Multilingual-E5-Base, JINA Reranker v2, rayon
roadmap
see ROADMAP.md
docs
| doc | what |
|---|---|
| CONFIG.md | configuration reference, all options, file types, provider setup |
| MCP.md | MCP server setup, tool reference, editor configs |
| AGENT.md | instructions for AI agents using rememex as a tool |
| BEST_PRACTICES.md | tips for getting the best out of rememex |
| CONTRIBUTING.md | how to contribute, bug reports, PR guidelines |
star history
contributors
license
MIT
