🔍 DeepSeek-OCR-WebUI

Visit Application →

🌐 English | 简体中文 | 繁體中文 | 日本語

Intelligent OCR System · Vue 3 Modern UI · Batch Processing · Multi-Mode Support

Features • Quick Start • Screenshots • Contributors

🎉 v4.1 Update: UI Improvements & Model Version Display

Header shows OCR-2 model badge · Footer displays v4.1 · OCR-2

🏷️ OCR-2 Model Badge — Header now shows a prominent OCR-2 badge so users instantly know the model version
🎨 Table Rendering Fix — OCR-detected tables now display with white backgrounds, dark text, and zebra striping for clear readability (previously appeared as dark/unreadable blocks)
📡 Health API model_version — /health endpoint now returns "model_version": "DeepSeek-OCR-2" for programmatic version detection
🔖 Footer Version — Updated to v4.1 · OCR-2
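
The `model_version` field makes it possible to check which model a running instance serves before sending work to it. Below is a minimal sketch using only the Python standard library. The function names are illustrative, the `"unknown"` fallback for servers that omit the field is an assumption, and the sample payload is hand-written rather than captured from a real server.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:8001", timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def model_version(health: dict) -> str:
    """Return the reported model version, or "unknown" if the field is missing.

    The fallback is an assumption for servers that do not report the field.
    """
    return health.get("model_version", "unknown")


# Offline demonstration with a hand-written payload (not real server output).
sample = {"model_version": "DeepSeek-OCR-2"}
print(model_version(sample))
```

Against a running instance, `model_version(fetch_health())` performs the same check over HTTP.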

🎉 v4.0 Update: DeepSeek-OCR-2 Model Upgrade!

🚀 Major model upgrade to DeepSeek-OCR-2 (Visual Causal Flow) — better accuracy, higher resolution!

✨ What's New in v4.0

🧠 DeepSeek-OCR-2 Model - Upgraded to the latest DeepSeek-OCR-2 with Visual Causal Flow architecture
🔬 Higher Resolution - Dynamic resolution up to (0-6)×768×768 + 1×1024×1024 (was 640×640)
⚡ Flash Attention 2 - Native flash_attention_2 support on CUDA for optimal inference speed
🎯 Improved Accuracy - Better document understanding, chart parsing, and text recognition
🔄 Full Backward Compatibility - All 7 recognition modes, REST API, and frontend unchanged
🐳 Docker v4.0 - New all-in-one image with pre-downloaded OCR-2 model (Dockerfile.v4.0)
📦 Unified Tokenizer - Switched from AutoProcessor to AutoTokenizer (aligned with official OCR-2 API)

🔧 Technical Changes

Component	v3.6 (OCR v1)	v4.0 (OCR-2)
Model	`deepseek-ai/DeepSeek-OCR`	`deepseek-ai/DeepSeek-OCR-2`
`image_size`	640	768
Attention	`eager`	`flash_attention_2` (CUDA)
Tokenizer	`AutoProcessor`	`AutoTokenizer`
Resolution	Fixed crops	Dynamic (0-6)×768 + 1×1024
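
To get a feel for the resolution change, the arithmetic can be worked through directly. This sketch assumes that "(0-6)×768 + 1×1024" means zero to six local crops of 768×768 pixels plus one global view of 1024×1024 pixels; the constant and function names are illustrative and not taken from the project code.

```python
LOCAL_TILE = 768      # side length of each local crop in v4.0
GLOBAL_VIEW = 1024    # side length of the single global view in v4.0
MAX_LOCAL_TILES = 6   # upper bound of the "(0-6)" range
OLD_IMAGE_SIZE = 640  # v3.6 image_size, for comparison


def input_pixels(local_tiles: int) -> int:
    """Total pixels fed to the model for a given number of local crops."""
    if not 0 <= local_tiles <= MAX_LOCAL_TILES:
        raise ValueError(f"local_tiles must be between 0 and {MAX_LOCAL_TILES}")
    return local_tiles * LOCAL_TILE**2 + GLOBAL_VIEW**2


print(input_pixels(0))                # global view only: 1048576
print(input_pixels(MAX_LOCAL_TILES))  # six crops plus global view: 4587520
print(OLD_IMAGE_SIZE**2)              # one 640x640 view: 409600
```

Under that reading, the largest v4.0 input carries roughly eleven times the pixels of a single 640×640 view.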

💡 All existing features from v3.6 (concurrency, rate limiting, queue management, Vue 3 frontend) are fully preserved.

🎉 v3.6 Update: Backend Concurrency & Rate Limiting!

🚀 Performance optimization with smart queue management and rate limiting!

✨ What's New in v3.6

⚡ Backend Concurrency Optimization - Non-blocking inference with ThreadPoolExecutor
🔒 Rate Limiting - Per-client and per-IP request limits (X-Client-ID header support)
📊 Queue Management - Real-time queue status with position tracking
🏥 Enhanced Health API - Queue depth, status (healthy/busy/full), and rate limit info
🌐 New Languages - Added Traditional Chinese (zh-TW) and Japanese (ja-JP)
🎯 429 Error Handling - Graceful handling when queue is full or rate limited
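
As an illustration of how per-client limits keyed on `X-Client-ID` could work, here is a small sliding-window limiter. It is a sketch of the general technique, not the project's implementation: the class name, the key format, and the choice to fall back to the IP address when the header is missing are all assumptions (the server may well enforce both limits at once).

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per key within the last `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # a web layer would answer with HTTP 429 here
        hits.append(now)
        return True


def client_key(headers: dict, remote_ip: str) -> str:
    """Prefer the X-Client-ID header; fall back to the caller's IP (assumption)."""
    client_id = headers.get("X-Client-ID")
    return f"client:{client_id}" if client_id else f"ip:{remote_ip}"
```

A request handler would call `limiter.allow(client_key(headers, ip))` and reject the request when it returns `False`.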

🙏 Contributors: @cloudman6 (PR #41)

🎉 v3.5 Major Update: Brand New Vue 3 Frontend!

🚀 Complete UI Overhaul with Modern Vue 3 + TypeScript Architecture!

✨ What's New in v3.5

🎨 Brand New Vue 3 UI - Modern, responsive design with Naive UI components
⚡ TypeScript Support - Full type safety and better developer experience
📦 Dexie.js Database - Local IndexedDB for offline page management
🔄 Real-time Processing Queue - Visual OCR progress with queue management
🏥 Health Check System - Backend status monitoring with visual indicators
📄 Enhanced PDF Support - Smooth PDF rendering with page-by-page processing
🌐 i18n Ready - Built-in internationalization (EN/CN/TW/JP)
🧪 E2E Testing - Comprehensive Playwright test coverage

👥 Contributors

🌟 Special Thanks to Our Amazing Contributors! 🌟

This project is the result of an outstanding collaboration. The Vue 3 frontend was contributed through PR #34 and merged into the main project.

Contributor	Role	Contribution
CloudMan	🏆 Vue 3 Frontend Lead Developer	164 commits · Complete UI Rewrite
neosun100	🎯 Project Maintainer	Backend · Docker · Integration

💡 About the Vue 3 Frontend: @cloudman6 contributed an exceptional Vue 3 + TypeScript frontend with 164 commits, including comprehensive E2E tests, modern UI components, and production-ready architecture. This collaboration transformed DeepSeek-OCR-WebUI into a professional-grade application!

📖 Introduction

DeepSeek-OCR-WebUI is an intelligent document recognition web application powered by the DeepSeek-OCR-2 model (DeepSeek-OCR in releases before v4.0). It provides a modern, intuitive interface for converting images and PDFs to structured text with high accuracy.

✨ Core Highlights

Feature	Description
🎯 7 Recognition Modes	Doc to Markdown, General OCR, Plain Text, Chart Parser, Image Description, Find & Locate, Custom Prompt
🖼️ Bounding Box Visualization	Find mode with automatic position annotation
📦 Batch Processing	Process multiple images/pages sequentially
📄 PDF Support	Upload PDFs, auto-convert to images
🎨 Modern Vue 3 UI	Responsive design with Naive UI
🌐 Multilingual	EN, 简体中文, 繁體中文, 日本語
🍎 Apple Silicon	Native MPS acceleration for M1/M2/M3/M4
🐳 Docker Ready	One-command deployment
⚡ GPU Acceleration	NVIDIA CUDA support

🚀 Features

7 Recognition Modes

Mode	Icon	Description	Use Cases
Doc to Markdown	📄	Preserve format and layout	Contracts, papers, reports
General OCR	📝	Extract all visible text	Image text extraction
Plain Text	📋	Pure text without format	Simple text recognition
Chart Parser	📊	Recognize charts and formulas	Data charts, math formulas
Image Description	🖼️	Generate detailed descriptions	Image understanding
Find & Locate	🔍	Find and annotate positions	Invoice field locating
Custom Prompt	✨	Customize recognition needs	Flexible tasks

🆕 Vue 3 Frontend Features

┌─────────────────────────────────────────────────────────────┐
│  📁 Page Sidebar          │  📄 Document Viewer             │
│  ├─ Thumbnail List        │  ├─ High-res Image Display      │
│  ├─ Drag & Drop Reorder   │  ├─ OCR Overlay Toggle          │
│  ├─ Batch Selection       │  ├─ Zoom Controls               │
│  └─ Quick Actions         │  └─ Status Indicators           │
├─────────────────────────────────────────────────────────────┤
│  🔄 Processing Queue      │  📝 Result Panel                │
│  ├─ Real-time Progress    │  ├─ Markdown Preview            │
│  ├─ Cancel/Retry          │  ├─ Word/PDF Export             │
│  └─ Health Monitoring     │  └─ Copy to Clipboard           │
└─────────────────────────────────────────────────────────────┘

🖼️ Screenshots

Home Page

Vue3 Home Page

Clean, modern landing page with quick access to all features

Processing Interface

Vue3 Processing Page

Full-featured document processing with sidebar, viewer, and results panel

Quick Start Guide

Step-by-step guide: Import files → Select pages → Choose OCR mode → Get results

📦 Quick Start

🐳 Docker (Recommended)

# Pull and run
docker pull neosun/deepseek-ocr:v4.1
docker run -d \
  --name deepseek-ocr \
  --gpus all \
  -p 8001:8001 \
  --shm-size=8g \
  neosun/deepseek-ocr:v4.1

# Access: http://localhost:8001

Available Docker Tags

Tag	Description
`latest`	Latest stable (= v4.1)
`v4.1`	UI improvements & model version display
`v4.0`	DeepSeek-OCR-2 model upgrade
`v3.6`	Backend concurrency & rate limiting
`v3.5`	Vue 3 frontend version
`v3.3.1-fix-bfloat16`	BFloat16 compatibility fix

🍎 Mac (Apple Silicon)

# Clone and setup
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI

# Create conda environment
conda create -n deepseek-ocr python=3.11
conda activate deepseek-ocr

# Install dependencies
pip install -r requirements-mac.txt

# Start service
./start.sh
# Access: http://localhost:8001

🐧 Linux (Native)

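# Run from the cloned DeepSeek-OCR-WebUI directory
# Install CUDA 11.8 builds of PyTorch (NVIDIA GPU)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# Install project dependencies and start the service
pip install -r requirements.txt
./start.sh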

🔌 API & Integration

REST API

import requests

# Single image OCR
with open("image.png", "rb") as f:
    response = requests.post(
        "http://localhost:8001/ocr",
        files={"file": f},
        data={"prompt_type": "ocr"}
    )
    # Surface HTTP errors such as 429 (queue full or rate limited)
    response.raise_for_status()
    print(response.json()["text"])

# PDF OCR (all pages)
with open("document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8001/ocr-pdf",
        files={"file": f},
        data={"prompt_type": "document"}
    )
    response.raise_for_status()
    print(response.json()["merged_text"])

Endpoints:

GET /health - Health check
POST /ocr - Single image OCR
POST /ocr-pdf - PDF OCR (all pages)
POST /pdf-to-images - Convert PDF to images

📖 Full API Documentation: API.md
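
Because the server answers with 429 when its queue is full or a rate limit is hit, API clients may want to retry with a growing delay. The helper below is a minimal sketch of that idea. `post_with_retry` and its parameters are illustrative names, the exponential backoff schedule is an assumption rather than something the server requires, and the request itself is passed in as a callable so the logic stays independent of any HTTP library.

```python
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call `send` until it returns a status other than 429 or attempts run out.

    `send` performs one request and returns (status_code, decoded_json_body).
    """
    status, body = 0, {}
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
    return status, body
```

With `requests`, `send` could be a small function that posts the file to `/ocr` and returns `(response.status_code, response.json())`.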

MCP (Model Context Protocol)

Enable AI assistants like Claude Desktop to use OCR:

{
  "mcpServers": {
    "deepseek-ocr": {
      "command": "python",
      "args": ["/path/to/mcp_server.py"]
    }
  }
}
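
If the assistant's configuration file already lists other MCP servers, the entry above has to be merged in rather than pasted over the file. The following sketch shows one way to do that on the parsed JSON. `add_mcp_server` is an illustrative helper, `other-tool` is a made-up existing entry, and the `/path/to/mcp_server.py` placeholder is kept as-is and must be replaced with the real path.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `config` with one entry added under "mcpServers"."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    return {**config, "mcpServers": servers}


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-tool": {"command": "node", "args": ["server.js"]}}}

merged = add_mcp_server(existing, "deepseek-ocr", "python", ["/path/to/mcp_server.py"])
print(json.dumps(merged, indent=2))
```

The original dictionary is left untouched, so the merged result can be reviewed before it is written back to disk.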

📖 MCP Setup Guide: MCP_SETUP.md

🌐 Multilingual Support

Language	Code	Status
🇺🇸 English	en-US	✅ Default
🇨🇳 简体中文	zh-CN	✅
🇹🇼 繁體中文	zh-TW	✅
🇯🇵 日本語	ja-JP	✅

Switch language via the selector in the top-right corner.

📊 Version History

v4.1 (2026-02-20) - UI Improvements & Model Version Display

🏷️ UI & API Enhancements:

✅ OCR-2 model badge in header for instant version recognition
✅ Table rendering fix: white background, dark text, zebra striping
✅ Health API returns model_version: "DeepSeek-OCR-2"
✅ Footer updated to v4.1 · OCR-2

v4.0 (2026-02-20) - DeepSeek-OCR-2 Model Upgrade

🧠 Major Model Upgrade:

✅ Upgraded to DeepSeek-OCR-2 (Visual Causal Flow)
✅ Dynamic resolution: (0-6)×768×768 + 1×1024×1024
✅ Flash Attention 2 on CUDA for optimal inference speed
✅ Switched from AutoProcessor to AutoTokenizer
✅ image_size upgraded from 640 to 768
✅ New Dockerfile.v4.0 with pre-downloaded OCR-2 model
✅ Full backward compatibility with all v3.6 features

v3.6 (2026-01-20) - Backend Concurrency & Rate Limiting

⚡ Performance Optimization:

✅ Non-blocking inference with ThreadPoolExecutor
✅ Concurrency control with asyncio.Semaphore (OCR: 1, PDF: 2)
✅ Queue system with MAX_OCR_QUEUE_SIZE and dynamic status
✅ Per-IP and per-Client-ID rate limiting (X-Client-ID header)
✅ 429 error handling (queue full, client limit, IP limit)
✅ Health indicator with 3 status colors (green/yellow/red)
✅ OCR queue popover with real-time position display