Polyglot

Local GPU translation MCP server — TranslateGemma via Ollama, 57 languages, zero cloud dependency

What it does

Translates text between 57 languages using TranslateGemma running locally on your GPU via Ollama. No API keys, no cloud, no rate limits — everything stays on your machine.

Quick Start

1. Install Ollama

Download from ollama.com and start it:

ollama serve

2. Pull a model

ollama pull translategemma:12b   # 8.1 GB — best quality/speed balance
# or
ollama pull translategemma:4b    # 3.3 GB — faster, lower quality
# or
ollama pull translategemma:27b   # 17 GB  — highest quality

Tip: You can skip this step — Polyglot auto-pulls the model on first use.

3. Add to your MCP client

Claude Code / Claude Desktop — add to claude_desktop_config.json or .mcp.json:

{
  "mcpServers": {
    "polyglot": {
      "command": "npx",
      "args": ["-y", "@mcptoolshop/polyglot-mcp"]
    }
  }
}

From source:

git clone https://github.com/mcp-tool-shop-org/polyglot-mcp.git
cd polyglot-mcp
npm install && npm run build
node dist/index.js

That's it. Ask Claude to translate something and it will use the translate tool automatically.

Tools

Polyglot exposes five MCP tools:

`translate`

Translate text between any supported language pair.

Parameter	Required	Description
`text`	yes	Text to translate
`from`	yes	Source language code or name (e.g., `en`, `English`)
`to`	yes	Target language code or name (e.g., `ja`, `Japanese`)
`model`	no	Ollama model (default: `translategemma:12b`)
`glossary`	no	Custom term overrides as `{"source": "translation"}` — merged with the built-in software glossary

Long text is automatically split into chunks at paragraph and sentence boundaries, translated in sequence, and reassembled. All translations are validated for quality (empty output, echo detection, truncation, garbled text).
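
For clients that build requests by hand, a call to this tool might look like the sketch below. The JSON-RPC envelope follows the MCP `tools/call` method and the argument names come from the parameter table above; the helper name `buildTranslateRequest`, the sample text, and the glossary entry are made up for illustration.

```typescript
// Sketch of an MCP `tools/call` request for the `translate` tool.
// Argument names come from the parameter table; all values are illustrative.
interface TranslateArgs {
  text: string;
  from: string;
  to: string;
  model?: string;
  glossary?: Record<string, string>;
}

// Hypothetical helper: wraps the arguments in a JSON-RPC 2.0 request object.
function buildTranslateRequest(id: number, args: TranslateArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "translate", arguments: args },
  };
}

const request = buildTranslateRequest(1, {
  text: "Run the CLI to start the server.",
  from: "en",
  to: "ja",
  glossary: { server: "サーバー" }, // hypothetical custom term override
});

console.log(JSON.stringify(request, null, 2));
```

In normal use the MCP client builds this request itself; the sketch only shows how the parameters fit together.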

`translate_markdown`

Translate an entire markdown document while preserving structure. Code blocks, HTML elements, badges, URLs, and table formatting are kept intact — only prose content (headings, paragraphs, taglines, table cells) is translated.

Parameter	Required	Description
`markdown`	yes	The full markdown content to translate
`from`	yes	Source language code or name
`to`	yes	Target language code or name
`model`	no	Ollama model (default: `translategemma:12b`)
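
The structure-preserving idea can be sketched as a segmentation pass that marks which parts of a document may be sent to the model. This is a minimal illustration that handles fenced code blocks only; the names `Segment` and `segmentMarkdown` are hypothetical, and Polyglot's own segmentation (HTML, badges, URLs, tables) is more involved.

```typescript
// Sketch: split markdown into segments and mark fenced code as non-translatable.
// Only fenced code blocks are handled here; this is not Polyglot's implementation.
interface Segment {
  text: string;
  translatable: boolean;
}

const FENCE = "`".repeat(3); // three backticks, built this way to keep the example fence-safe

function segmentMarkdown(markdown: string): Segment[] {
  const segments: Segment[] = [];
  let buffer: string[] = [];
  let inFence = false;

  const flush = (translatable: boolean): void => {
    if (buffer.length > 0) {
      segments.push({ text: buffer.join("\n"), translatable });
      buffer = [];
    }
  };

  for (const line of markdown.split("\n")) {
    const isFence = line.trim().indexOf(FENCE) === 0;
    if (isFence && !inFence) {
      flush(true); // close the prose collected so far
      buffer.push(line); // the opening fence belongs to the code segment
      inFence = true;
    } else if (isFence && inFence) {
      buffer.push(line); // the closing fence belongs to the code segment
      flush(false);
      inFence = false;
    } else {
      buffer.push(line);
    }
  }
  flush(!inFence); // trailing prose, or an unterminated code block
  return segments;
}

// Reassembly: joining the segment texts with newlines restores the original layout.
function reassemble(segments: Segment[]): string {
  return segments.map((segment) => segment.text).join("\n");
}
```

Only segments with `translatable: true` would be passed to the model; the rest are copied through unchanged before reassembly.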

`list_languages`

List all 57 supported languages with their codes.

`check_status`

Check if Ollama is running and which TranslateGemma models are installed. Attempts auto-start if Ollama isn't running.

`translate_all`

Translate markdown content into multiple languages at once (default: 7 — Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese). Runs translations concurrently with GPU-safe semaphore limiting.

Parameter	Required	Description
`markdown`	yes	The full markdown content to translate
`from`	no	Source language code (default: `en`)
`languages`	no	Array of target language codes (default: all 7)
`model`	no	Ollama model (default: `translategemma:12b`)
`concurrency`	no	Max concurrent translations (default: 2, max: 3)
`navBar`	no	Inject language nav bar (default: true)

Features

Auto-start & Auto-pull

Ollama is started automatically if it isn't running, and the TranslateGemma model is pulled automatically if it isn't installed. Once Ollama itself is installed, no further manual setup is required.

Retry with Exponential Backoff

Transient Ollama failures (network blips, temporary overload) are automatically retried up to 2 times with exponential backoff (1 s, 2 s). Non-retryable errors (bad model name, invalid input) fail immediately.
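
The retry policy described above can be sketched as follows. `withRetry` and `isRetryable` are hypothetical names, and the check on a `retryable` property is an assumption about how errors are marked, not Polyglot's actual code.

```typescript
// Sketch of retry with exponential backoff: up to 2 retries, waiting 1 s, then 2 s.
// `withRetry` and `isRetryable` are illustrative names, not Polyglot's API.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: retryable errors carry a boolean `retryable` property.
function isRetryable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { retryable?: unknown }).retryable === true
  );
}

async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries: number = 2,
  baseDelayMs: number = 1000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Non-retryable errors and exhausted retries are rethrown immediately.
      if (!isRetryable(err) || attempt >= maxRetries) throw err;
      await wait(baseDelayMs * Math.pow(2, attempt)); // 1000 ms, then 2000 ms
    }
  }
}
```

The `wait` parameter is injectable so the delay schedule can be checked in tests without actually sleeping.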

Smart Chunking

Long text is split at natural boundaries — paragraphs, then sentences — so translation context is preserved. Chunk sizes adapt to the model: 2K chars for 2B/4B models, 4K for 12B, 6K for 27B.
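
A minimal sketch of this kind of boundary-aware chunking is shown below. The function names are hypothetical, the size table reads "2K/4K/6K" as 2,000/4,000/6,000 characters, and the sentence splitter is a simple punctuation heuristic rather than Polyglot's actual logic.

```typescript
// Sketch: pack paragraphs into chunks under a size limit, falling back to
// sentence boundaries for paragraphs that are too long on their own.
// Names and the exact limits are assumptions for illustration.
function chunkLimitFor(model: string): number {
  if (model.endsWith(":27b")) return 6000;
  if (model.endsWith(":12b")) return 4000;
  return 2000; // 2B/4B models
}

function splitIntoChunks(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";

  const flush = (): void => {
    if (current !== "") chunks.push(current);
    current = "";
  };
  const add = (piece: string, joiner: string): void => {
    if (current !== "" && current.length + joiner.length + piece.length > maxChars) flush();
    current = current === "" ? piece : current + joiner + piece;
  };

  for (const paragraph of text.split(/\n{2,}/)) {
    if (paragraph.length <= maxChars) {
      add(paragraph, "\n\n");
      continue;
    }
    // Paragraph too long: start a fresh chunk and pack it sentence by sentence.
    flush();
    const sentences = (paragraph.match(/[^.!?]+[.!?]*/g) || [paragraph])
      .map((sentence) => sentence.trim())
      .filter((sentence) => sentence !== "");
    for (const sentence of sentences) add(sentence, " ");
    flush();
  }
  flush();
  return chunks; // a single sentence longer than maxChars stays as one oversized chunk
}

console.log(splitIntoChunks("One. Two.\n\nThree four five six. Seven eight nine ten.", 25));
```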

Segment Cache

Translated segments are cached by content hash (SHA-256 of source text + target language + model). Unchanged segments skip re-translation entirely. The cache lives in .polyglot-cache.json with a 30-day TTL.
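
As a rough illustration, a content-addressed key and TTL check along these lines could be written with Node's built-in `crypto` module. The field separator, the `CacheEntry` shape, and the function names are assumptions; the actual on-disk format of the cache file is not documented here.

```typescript
import { createHash } from "node:crypto";

// Sketch of a content-addressed cache key plus a 30-day freshness check.
// The separator and entry shape are assumptions, not Polyglot's file format.
const TTL_MS = 30 * 24 * 60 * 60 * 1000;

interface CacheEntry {
  translation: string;
  createdAt: number; // epoch milliseconds
}

function cacheKey(source: string, targetLang: string, model: string): string {
  // A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing identically.
  return createHash("sha256")
    .update([source, targetLang, model].join("\u0000"))
    .digest("hex");
}

function isFresh(entry: CacheEntry, now: number = Date.now()): boolean {
  return now - entry.createdAt < TTL_MS;
}

console.log(cacheKey("Hello", "ja", "translategemma:12b"));
```

Because the target language and model are part of the key, switching either one produces a cache miss rather than a stale translation.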

Translation Memory (Fuzzy Cache)

When an exact cache hit isn't found, Polyglot checks for near-miss segments using Levenshtein similarity. If a cached source is ≥85% similar to the current segment, the existing translation is reused. This avoids re-translating segments that changed only slightly, for example after minor README edits.
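
The similarity test can be sketched with a standard two-row Levenshtein distance. Normalizing the distance by the longer string's length is an assumption; the text above does not say how Polyglot turns the distance into a percentage.

```typescript
// Sketch of Levenshtein-based similarity with an 85% reuse threshold.
// Normalizing by the longer string is an assumption, not Polyglot's documented formula.
function levenshtein(a: string, b: string): number {
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);

  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      current.push(Math.min(previous[j] + 1, current[j - 1] + 1, substitution));
    }
    previous = current;
  }
  return previous[b.length];
}

function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

function isReusable(cachedSource: string, segment: string, threshold: number = 0.85): boolean {
  return similarity(cachedSource, segment) >= threshold;
}

console.log(isReusable("Install the package.", "Install the packages."));
```

With this normalization, a one-character edit in a 21-character sentence still scores about 95%, so the cached translation would be reused.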

Concurrency Semaphore

All Ollama calls are guarded by a counting semaphore (default limit: 1) to prevent GPU OOM on systems with limited VRAM. Override with POLYGLOT_CONCURRENCY:

POLYGLOT_CONCURRENCY=2 npx @mcptoolshop/polyglot-mcp
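
For readers unfamiliar with the pattern, a counting semaphore for async work can be sketched in a few lines. The class below is a generic illustration with hypothetical names, not Polyglot's semaphore.ts.

```typescript
// Sketch of a counting semaphore that caps the number of concurrent async tasks.
// Generic illustration; the class shape is not taken from Polyglot's source.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.available = Math.max(1, limit);
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.available > 0) {
      this.available--;
    } else {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // pass the slot directly to the next waiter
      else this.available++;
    }
  }
}
```

In a setup like the one described above, the limit would come from the environment, for example `new Semaphore(Number(process.env.POLYGLOT_CONCURRENCY) || 1)`, and every model call would go through `run`.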

MCP Progress Tokens

All tools report progress via MCP notifications/progress when the client provides a progressToken. The translate tool reports per chunk, translate_markdown per segment batch, translate_all per language, and check_status per step.
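
The shape of such a notification, as defined by the MCP specification, is sketched below. The builder function and the sample token are hypothetical; only the method name and the `progressToken`, `progress`, and `total` fields come from the protocol.

```typescript
// Sketch of an MCP `notifications/progress` message.
// `progressNotification` is an illustrative helper, not part of Polyglot.
type ProgressToken = string | number;

interface ProgressParams {
  progressToken: ProgressToken;
  progress: number;
  total?: number;
}

function progressNotification(progressToken: ProgressToken, progress: number, total?: number) {
  const params: ProgressParams = { progressToken, progress };
  if (total !== undefined) params.total = total; // total is optional in the protocol
  return { jsonrpc: "2.0", method: "notifications/progress", params };
}

// Example: chunk 2 of 5 finished for a request that supplied the token "tok-1".
console.log(JSON.stringify(progressNotification("tok-1", 2, 5)));
```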

Software Glossary

A built-in glossary of 12 technical terms (API, CLI, SDK, etc.) ensures consistent translation of software terminology. Custom glossary entries can be passed per-request and are merged with the defaults.

Batch Translation

translateBatch groups multiple segments into a single prompt where possible, reducing round-trips. It falls back to translating each segment individually if the model mangles the batch separator.
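
The batch-then-fallback idea can be sketched as follows. The separator string, the `TranslateFn` type, and the function name `translateBatchSketch` are all hypothetical; the real separator and prompt format are not documented here.

```typescript
// Sketch of batch mode: join segments with a separator, split the reply, and
// fall back to per-segment calls if the segment count does not survive.
// The separator and names are assumptions for illustration.
const SEPARATOR = "<<<SEG>>>";

type TranslateFn = (text: string) => Promise<string>;

async function translateBatchSketch(segments: string[], translate: TranslateFn): Promise<string[]> {
  if (segments.length > 1) {
    const reply = await translate(segments.join("\n" + SEPARATOR + "\n"));
    const parts = reply.split(SEPARATOR).map((part) => part.trim());
    if (parts.length === segments.length && parts.every((part) => part !== "")) {
      return parts; // one round-trip covered every segment
    }
  }
  // Single segment, or the separator was mangled: translate one at a time.
  const results: string[] = [];
  for (const segment of segments) results.push(await translate(segment));
  return results;
}
```

Checking that the number of returned parts matches the number of inputs is what makes the fallback safe: a reply with a missing or altered separator is never split into misaligned segments.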

Configurable Default Model

Set the POLYGLOT_MODEL environment variable to override the default model:

POLYGLOT_MODEL=translategemma:27b npx @mcptoolshop/polyglot-mcp

Structured Errors

All errors use PolyglotError with a machine-readable code (MODEL_NOT_FOUND, OLLAMA_UNAVAILABLE, TRANSLATION_FAILED, etc.), a human-readable message, an optional hint, and a retryable flag.
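
An error type with these four fields might look like the sketch below. The class name `StructuredError`, the constructor signature, and the `describeError` helper are illustrative assumptions, and the code union lists only the three codes named above.

```typescript
// Sketch of a structured error carrying a code, message, optional hint, and retryable flag.
// Not Polyglot's PolyglotError; the signature and helper are assumptions.
type ErrorCode = "MODEL_NOT_FOUND" | "OLLAMA_UNAVAILABLE" | "TRANSLATION_FAILED";

class StructuredError extends Error {
  readonly code: ErrorCode;
  readonly retryable: boolean;
  readonly hint: string | undefined;

  constructor(code: ErrorCode, message: string, retryable: boolean, hint?: string) {
    super(message);
    this.name = "StructuredError";
    this.code = code;
    this.retryable = retryable;
    this.hint = hint;
  }
}

// Formats an error for display, appending the hint when one is present.
function describeError(err: StructuredError): string {
  const base = "[" + err.code + "] " + err.message;
  return err.hint ? base + " (hint: " + err.hint + ")" : base;
}

const missingModel = new StructuredError(
  "MODEL_NOT_FOUND",
  "Model is not installed",
  false, // a bad model name is not worth retrying
  "Run: ollama pull translategemma:12b",
);
console.log(describeError(missingModel));
```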

Output Validation

Every translation is validated automatically. Empty output throws a retryable error. Source-text echo is flagged, severe truncation and hallucination blowup raise warnings, and garbled encoding and model meta-commentary are detected. Warnings appear in the MCP tool response.
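
A few of these checks can be sketched with simple heuristics. The length-ratio thresholds (0.3 and 4), the warning labels, and the function name are illustrative assumptions; Polyglot's actual rules and thresholds are not documented here, and meta-commentary detection is left out.

```typescript
// Sketch of output validation heuristics. Thresholds and labels are assumptions.
function validateTranslation(source: string, output: string): string[] {
  const original = source.trim();
  const translated = output.trim();
  if (translated === "") throw new Error("empty translation output");

  const warnings: string[] = [];
  const ratio = translated.length / Math.max(1, original.length);
  if (translated === original) warnings.push("echo"); // model returned the source unchanged
  if (ratio < 0.3) warnings.push("truncation"); // output suspiciously short
  if (ratio > 4) warnings.push("blowup"); // output suspiciously long
  if (translated.indexOf("\uFFFD") !== -1) warnings.push("garbled"); // Unicode replacement character
  return warnings;
}

console.log(validateTranslation("Hello world", "Hello world"));
```

Length ratios vary a lot between language pairs, so fixed thresholds like these would need tuning per pair in practice.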

Streaming

OllamaClient.generateStream() yields tokens via NDJSON as Ollama produces them. The translate() function accepts an onToken callback for real-time progress display. Both streaming and non-streaming paths share retry logic.
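
Ollama's streaming responses are newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag. The sketch below shows one way to turn raw stream data into token callbacks; `createNdjsonParser` is a hypothetical helper, not the `generateStream()` implementation.

```typescript
// Sketch: incremental NDJSON parsing for an Ollama-style token stream.
// Handles JSON lines that are split across network chunks.
interface GenerateChunk {
  response?: string;
  done?: boolean;
}

function createNdjsonParser(onToken: (token: string) => void): (data: string) => void {
  let pending = "";
  return (data: string): void => {
    pending += data;
    const lines = pending.split("\n");
    pending = lines.pop() || ""; // keep the incomplete trailing line for the next call
    for (const line of lines) {
      if (line.trim() === "") continue;
      const chunk = JSON.parse(line) as GenerateChunk;
      if (chunk.response) onToken(chunk.response);
    }
  };
}

// Usage: feed decoded text from the HTTP response body as it arrives.
const received: string[] = [];
const feed = createNdjsonParser((token) => received.push(token));
feed('{"response":"Bon","done":false}\n{"resp');
feed('onse":"jour","done":false}\n{"response":"","done":true}\n');
console.log(received.join(""));
```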

Supported Languages

Afrikaans, Albanian, Arabic, Bengali, Bulgarian, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Marathi, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh.

Performance

On an RTX 5080 (16 GB VRAM) with TranslateGemma 12B (Q4):

Metric	Value
First translation (cold model load)	~15 s
Subsequent translations	~600 ms
VRAM usage	~8.1 GB
Long text (per chunk)	~600 ms

Architecture

MCP Client (Claude Code, etc.)
      │
      │  MCP protocol (stdio)
      ▼
┌──────────────────┐
│    index.ts      │  MCP server — 5 tools: translate, translate_markdown,
│                  │  translate_all, list_languages, check_status
├──────────────────┤
│  translate.ts    │  Prompt building, chunking, batch mode, streaming
├──────────────────┤
│translateMarkdown │  Markdown-aware segmentation, table parsing, reassembly
├──────────────────┤
│ translateAll.ts  │  Multi-language orchestrator with nav bar injection
├──────────────────┤
│  semaphore.ts    │  Counting semaphore for GPU-safe concurrency
├──────────────────┤
│   validate.ts    │  Output validation (empty, echo, truncation, garble)
├──────────────────┤
│   ollama.ts      │  HTTP client — auto-start, auto-pull, retry, streaming
├──────────────────┤
│   cache.ts       │  Segment cache + fuzzy translation memory
├──────────────────┤
│  glossary.ts     │  Software term dictionary
├──────────────────┤
│   polish.ts      │  Post-translation artifact cleanup
├──────────────────┤
│  languages.ts    │  57 language definitions
├──────────────────┤
│   errors.ts      │  PolyglotError structured error class
└──────────────────┘
      │
      │  HTTP (localhost:11434)
      ▼
   Ollama + TranslateGemma (GPU)

Security & Data Scope

Aspect	Detail
Data touched	Text sent to local Ollama API (`localhost:11434`), `.polyglot-cache.json` segment cache
Data NOT touched	No files outside working directory, no browser data, no OS credentials
Network	HTTP to `localhost:11434` only — zero external/internet egress
Telemetry	None collected or sent

See SECURITY.md for the vulnerability reporting policy.

Development

npm install             # install deps
npm run typecheck       # type-check without emitting
npm test                # run 256 unit tests (vitest)
npm run build           # compile TypeScript to dist/
npm run verify          # typecheck + test + build + pack (full gate)

License

MIT — see LICENSE.

Built by MCP Tool Shop