Kreuzakt
A search engine for humans and computers for your most boring documents
Kreuzakt - a simple replacement for Paperless
Kreuzakt is a project that takes the best parts of Paperless, drastically improves the OCR using VLMs (vision language models), and throws out 99% of the complexity. Take every boring document in your life and make them all instantly easy to find, and (optionally) let AIs search them to answer questions for you.
What's Different:
- Kreuzakt uses a single Docker container with an SQLite database; there aren't a ton of moving parts
- Rather than use Tesseract, Kreuzakt uses LLMs to do OCR via Kreuzberg (by default through OpenRouter, but Ollama/local LLMs work as well). This drastically improves OCR accuracy, and by extension, search accuracy.
- Kreuzakt provides a remote MCP server: connect Claude Desktop, Cursor, or any other MCP client to Kreuzakt and ask questions about your documents
- Kreuzakt also uses an LLM to derive a title / description / original date for every document, out of the box. Zero manual curation / toil work.
- Metadata can always be regenerated from the source documents; the only thing you need to migrate is the originals
What's the Same:
- Kreuzakt always preserves your original documents; it never edits them directly
- Ingestion based on file watches works the same: drop documents into the 'ingest' folder and they will automatically be processed
Self-hosting with Docker Compose
```yaml
services:
  kreuzakt:
    image: ghcr.io/anaisbetts/kreuzakt:latest
    ports:
      - "3000:3000"
    environment:
      OPENROUTER_KEY: ${OPENROUTER_KEY}
      TZ: Europe/Berlin # Set your local timezone
    volumes:
      - ./docs:/data
    restart: unless-stopped
```
Drop this in a `docker-compose.yml`, set `OPENROUTER_KEY` in your environment or a `.env` file, and run `docker compose up -d`. The web UI is at http://localhost:3000.
The `./docs` folder will be initialized with subdirectories including `./docs/ingest`, `./docs/originals`, and `./docs/thumbnails`.
Ok now what do I do?
- Run `docker compose up -d`
- Drop all of your documents into the ingest folder; they will eventually all move to the originals folder. You can see the progress at `/settings`. If you have a lot of documents it might take a bit.
- If you've got an existing Paperless install, you can run the import
- You can also simply drag-drop a bunch of files onto the main page
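Ingestion really is just a file copy. A minimal sketch, assuming the compose file above with `./docs` mounted at `/data` (the filename is a made-up example):

```shell
# Stand-in for a real scanned PDF, just for illustration
mkdir -p scans docs/ingest
echo "demo" > scans/tax-notice-2024.pdf

# Drop it into the watched ingest folder; Kreuzakt OCRs it and
# moves it to docs/originals once processing finishes.
cp scans/tax-notice-2024.pdf docs/ingest/
```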
How much is this gonna cost me?
I'm too lazy to do the math on exactly how much it costs per page, but for perspective: importing 440 documents from Paperless (a few of which were up to 80 pages long) cost me ~$5, i.e. roughly a penny per document.
Volume mounts
Everything lives under /data by default: the SQLite database, originals, thumbnails, and the ingest folder. If you want to split things up, override with individual env vars and mount each path separately:
| Variable | Default | Description |
|---|---|---|
| INGEST_DIR | /data/ingest | Watched folder for new documents |
| IMPORT_DIR | /data/import | Staging folder for orchestrated imports (e.g. Paperless); not watched |
| ORIGINALS_DIR | /data/originals | Stored original files |
| THUMBNAILS_DIR | /data/thumbnails | Generated thumbnails |
| DB_PATH | /data/docs-ai.db | SQLite database |
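If you do split things up, a compose snippet might look like this. This is a sketch: the variable names come from the table above, but the host paths are invented examples.

```yaml
services:
  kreuzakt:
    image: ghcr.io/anaisbetts/kreuzakt:latest
    environment:
      INGEST_DIR: /ingest
      ORIGINALS_DIR: /originals
      THUMBNAILS_DIR: /thumbnails
      DB_PATH: /db/docs-ai.db
    volumes:
      - /mnt/scans:/ingest            # example: watched folder on a fast disk
      - /mnt/archive/docs:/originals  # example: originals on bulk storage
      - ./thumbnails:/thumbnails
      - ./db:/db
```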
Optional environment variables
| Variable | Default | Description |
|---|---|---|
| OPENROUTER_KEY | – | API key for OpenRouter (recommended) |
| OPENAI_API_KEY | – | Alternative: direct OpenAI key |
| OPENAI_BASE_URL | https://openrouter.ai/api/v1 | Base URL for any OpenAI-compatible API (e.g. Ollama at http://host.docker.internal:11434/v1) |
| OCR_VLM_MODEL | openai/gpt-5.4-mini | Model used for OCR |
| METADATA_LLM_MODEL | openai/gpt-5.4 | Model used for title/description extraction |
| PORT | 3000 | Port inside the container |
| TZ | UTC | Timezone for date display (e.g. Europe/Berlin, America/New_York). Use any tz database name. |
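For example, to point Kreuzakt at a local Ollama instance through the OpenAI-compatible API, the environment block might look like this. A sketch: the model tags are placeholders for whatever vision-capable and text models you have pulled, and the dummy API key is an assumption (many OpenAI-compatible clients require a non-empty key, which Ollama ignores).

```yaml
environment:
  OPENAI_BASE_URL: http://host.docker.internal:11434/v1
  OPENAI_API_KEY: ollama            # assumption: dummy value, Ollama ignores it
  OCR_VLM_MODEL: qwen2.5vl:7b       # placeholder: any vision-capable model
  METADATA_LLM_MODEL: llama3.1:8b   # placeholder: any text model
```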
MCP setup
Kreuzakt exposes a remote MCP endpoint at /mcp (Streamable HTTP). Replace the hostname in the snippets below with wherever you serve the app, for example https://docs.your-tailnet.ts.net/mcp when using Tailscale Serve. Most clients will not talk to plain http, so terminating TLS (Serve, a reverse proxy, etc.) is the usual approach.
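If you use Tailscale, for example, one command on the Docker host puts the app behind HTTPS on your tailnet (a sketch; check `tailscale serve --help` for the syntax your CLI version expects):

```shell
# Proxy https://<machine>.<tailnet>.ts.net to localhost:3000,
# with Tailscale terminating TLS; --bg keeps it running
tailscale serve --bg 3000
```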
Claude Desktop: npx mcp-remote@latest …
mcp-remote bridges the HTTP MCP endpoint for clients that expect a local process.
```json
{
  "mcpServers": {
    "docs": {
      "command": "npx",
      "args": ["mcp-remote@latest", "https://docs.your-tailnet.ts.net/mcp"]
    }
  }
}
```
Cursor: "type": "http" in MCP config
Add to .cursor/mcp.json or your project's MCP settings.
```json
{
  "mcpServers": {
    "docs": {
      "type": "http",
      "url": "https://docs.your-tailnet.ts.net/mcp"
    }
  }
}
```
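To sanity-check the endpoint without a full MCP client, you can send the JSON-RPC initialize handshake by hand. A sketch based on the MCP Streamable HTTP transport (the protocol version string and URL are assumptions; adjust to yours):

```shell
curl -s https://docs.your-tailnet.ts.net/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl","version":"0.0.0"}}}'
```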
Example prompts
- "Find invoices from Deutsche Telekom."
- "What was my health insurance number again?"
- "How much did I pay in taxes last year?"
Local development
Prerequisites: Bun (the project runs Next.js and scripts through Bun; see package.json).
- Install dependencies: `bun install`
- Copy `.env.local.example` to `.env.local` and set at least one way to reach an OpenAI-compatible API. The usual choice is `OPENROUTER_KEY`. For a local LLM, set `OPENAI_DEV_URL`, `OPENAI_DEV_KEY`, and optionally `OCR_VLM_DEV_MODEL` / `METADATA_LLM_DEV_MODEL`. See `.env.local.example` for all variables the app and tooling recognize.
- Start the dev server: `bun dev`. The app listens on port 3000 by default (`PORT`). Runtime data defaults to `./data` (SQLite, ingest, originals, thumbnails) unless you override `DATA_DIR` or individual path variables.
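A minimal `.env.local` for the common cases might look like this (a sketch; `.env.local.example` is the authoritative list, and the values below are placeholders):

```shell
# Simplest path: OpenRouter
OPENROUTER_KEY=your-openrouter-key-here

# Or, for a local LLM, uncomment and adjust (placeholders):
# OPENAI_DEV_URL=http://localhost:11434/v1
# OPENAI_DEV_KEY=ollama
# OCR_VLM_DEV_MODEL=qwen2.5vl:7b
# METADATA_LLM_DEV_MODEL=llama3.1:8b
```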
Other useful commands:
- `bun test` – unit tests
- `bun run test:integration` – integration tests (loads `.env.local` via `--env-file`; requires Paperless-related vars when those tests run)
- `bun storybook` – UI development on port 6006
So.... why's it called "Kreuzakt"?
It uses the library Kreuzberg, and it is a tool to help you with your "Akte" (files/documents). It's a portmanteau, just like "Berghain", which combines "Kreuzberg" and "Friedrichshain", the two districts in Berlin that the club sits between. (Today you learned!)
