Ocr MCP

FastMCP server providing advanced OCR capabilities with current state-of-the-art models (DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, Qwen-Image-Layered decomposition), WIA scanner control, and multi-format document processing for PDFs, CBZ comics, and images.

OCR-MCP

An AI OCR web app and MCP server in one repository. The web app gives people drag-and-drop OCR, scanning, and batch processing. The FastMCP 3.1 MCP server exposes OCR, preprocessing, and workflows as tools to agentic clients such as Claude, Cursor, and Windsurf. Both share the same 10+ engines, WIA scanner support (Windows), and pipelines.

Topics: ocr, mcp, fastmcp, document-processing, scanner, wia, pdf, computer-vision, model-context-protocol, llm

What it does

- Web app: React (`web_sota/`) + FastAPI (`backend/app.py`). Upload or scan, pick an engine, and get text, PDF, or JSON. Ports 10858 (Vite) and 10859 (API). The in-app Help (`/help`) documents the web UI, the MCP server, and the OCR backends.
- MCP server: FastMCP 3.1 over stdio, with tools for OCR, preprocessing, scanner control, and workflows. Sampling and agentic workflows (SEP-1577) are supported. It uses the same Python environment and engines as the web backend. For agents, the Mistral key is typically set as `MISTRAL_API_KEY` in the client config; the web Settings page only affects the FastAPI process.
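
As a rough illustration of the client-side configuration mentioned above, the sketch below builds an `mcpServers` entry of the kind used by several stdio MCP clients and passes `MISTRAL_API_KEY` through the entry's environment. The launch command (`uv run ocr-mcp`), the repo path, and the key value are assumptions made for illustration; docs/INSTALL.md is the place to check the actual command and client config.

```python
import json
from typing import Optional


def build_mcp_server_entry(repo_dir: str, mistral_api_key: Optional[str] = None) -> dict:
    """Build one stdio server entry for an MCP client config (illustrative only)."""
    entry = {
        "command": "uv",
        # Assumption: the server starts with `uv run ocr-mcp` from the repo root.
        # The real entry point may differ; see docs/INSTALL.md.
        "args": ["--directory", repo_dir, "run", "ocr-mcp"],
    }
    if mistral_api_key:
        # The key goes into the client config, not the web Settings page,
        # because the web Settings only affect the FastAPI process.
        entry["env"] = {"MISTRAL_API_KEY": mistral_api_key}
    return entry


# Hypothetical path and placeholder key, for illustration only.
config = {
    "mcpServers": {
        "ocr-mcp": build_mcp_server_entry("/path/to/ocr-mcp", "your-mistral-key"),
    }
}
print(json.dumps(config, indent=2))
```

The printed JSON can be compared against the client config described in the Install doc; omit the key argument if the Mistral OCR backend is not used.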

Features: 10+ backends (PaddleOCR-VL-1.5, DeepSeek-OCR-2, Mistral OCR, …) · Auto backend selection · Preprocessing (deskew, enhance, crop) · Layout & table extraction · Quality assessment · WIA scanner · Batch & pipelines · Multi-format export

Docs

| Doc | Description |
| --- | --- |
| Install | Install, run MCP, Web UI (`start.ps1`, ports 10858/10859), PyYAML notes, client config |
| Backend deps | Web FastAPI backend: same venv as `ocr-mcp`, `pyproject.toml`, PyTorch, `OCR_AUTO_INSTALL_DEPS` |
| Technical | Architecture, tools, config, development, packaging |
| OCR models | Engines, capabilities, hardware (see also AI_MODELS.md) |
| Backend requirements | Per-model pip packages, system deps, env/config |
| AI features | Sampling, SEP-1577, agentic workflows, prompts |
| In-app Help | Source for `/help`: webapp vs MCP vs backends (mirrors INSTALL / TECHNICAL) |
| SOTA Compliance | 🚀 Verified SOTA v12.0 Architecture |

Also: JUSTFILE.md (just recipes) · OCR-MCP_MASTER_PLAN.md (roadmap) · tests/README.md (testing)

Quick start

uv sync
just run

Web UI (recommended): from repo root run web_sota\start.ps1 (PowerShell). It clears ports 10858/10859, runs uv sync, restores PyYAML if needed (see docs/INSTALL.md), starts the FastAPI backend in a new window, starts Vite in another window, then opens http://localhost:10858 in your browser.

Or: just webapp if your justfile wraps the same flow.

If the start script fails, use two terminals from the ocr-mcp repo root:

Terminal 1 (backend):
$env:PYTHONPATH = (Get-Location).Path; uv run uvicorn backend.app:app --host 127.0.0.1 --port 10859
Terminal 2 (frontend):
cd web_sota; npm run dev -- --port 10858 --host

Then open http://localhost:10858
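
If the page does not load, it can help to confirm that both processes are actually listening. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is made up for this example and is not part of the repo. It only checks that a TCP connection succeeds on the two ports named above, not that the services behind them are healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from this README: 10858 (Vite) and 10859 (FastAPI).
    for name, port in (("Vite frontend", 10858), ("FastAPI backend", 10859)):
        state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

If only one of the two ports is reported as not reachable, restart the corresponding terminal command above.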

Tests: run `uv sync --extra dev`, then either `uv run python -m pytest` or `python scripts/run_tests.py --suite quick`. See tests/README.md.

License

MIT — see LICENSE.