io.github.hwang-yh-cto/space-ocr
Multilingual structured OCR with verified per-character bboxes for AI agents.
space-ocr-mcp
MCP (Model Context Protocol) server for space-ocr: structured OCR with verified per-character bounding boxes.
Why
Unlike calling Gemini/GPT-4V directly, space-ocr re-anchors LLM output to real Google Vision API symbols, so bounding boxes are not hallucinated. AI agents that act on the extracted data (auto-fill, verification UI, accounting reconciliation) can trust the coordinates.
Tools
- `ocr_extract` – Extract structured fields from a document image. Pass `template_id` for built-in document types or `fields` for custom schemas.
- `list_templates` – List built-in document templates (`receipt`, `invoice`, `purchase_order`, `delivery`, `quote`, `bankbook`, `resident_card`, `driver_license`, `passport`).
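As a sketch, an MCP tool call to `ocr_extract` might carry arguments like the following. The `image` parameter name is an assumption for illustration; `template_id` and `fields` come from the tool descriptions above:

```json
{
  "name": "ocr_extract",
  "arguments": {
    "image": "https://example.com/receipt.jpg",
    "template_id": "receipt"
  }
}
```

For a custom schema, you would pass `fields` instead of `template_id`, describing the keys you want extracted.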
Install & run
npx -y space-ocr-mcp
Set `SPACE_OCR_API_KEY` (issue one at space-ocr.com → Settings → API Keys).
Claude Desktop config
~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "space-ocr": {
      "command": "npx",
      "args": ["-y", "space-ocr-mcp"],
      "env": { "SPACE_OCR_API_KEY": "YOUR_API_KEY" }
    }
  }
}
Restart Claude Desktop. You should see the space-ocr tools available.
Cursor / Windsurf / other MCP clients
Use the same command / args / env pattern in their MCP configuration UI.
Image inputs
`ocr_extract` accepts:
- A public URL (`https://...`)
- A local file path (`/path/to/file.jpg`, auto base64-encoded)
- A base64 string
- A `data:image/...;base64,...` URI
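The four input forms above can be sketched as a small normalization helper, roughly what a server would do before sending the image to the API. The helper name `toDataUri`, the assumed `image/jpeg` MIME default, and the base64 heuristic are illustrative, not the server's actual code:

```typescript
import { readFileSync } from "node:fs";

// Hypothetical sketch: normalize the four accepted input forms.
// The real server's logic may differ (e.g. it could sniff the MIME type).
function toDataUri(input: string): string {
  // 1. Public URLs and data: URIs pass through unchanged.
  if (/^https?:\/\//.test(input) || input.startsWith("data:")) {
    return input;
  }
  // 2. A bare base64 string is wrapped in a data: URI
  //    (MIME type assumed; real detection would inspect the bytes).
  if (input.length > 64 && /^[A-Za-z0-9+/]+=*$/.test(input)) {
    return `data:image/jpeg;base64,${input}`;
  }
  // 3. Anything else is treated as a local file path and base64-encoded.
  return `data:image/jpeg;base64,${readFileSync(input).toString("base64")}`;
}
```

Whichever form you pass, the Vision-anchored bounding boxes in the response refer to pixel coordinates in the decoded image.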
Pricing
¥10 per call (flat), billed against the same Charge Amount balance as the REST API. Failed calls are auto-refunded. A call made with insufficient balance returns an error and incurs no charge.
License
MIT
