offload-mcp
MCP server for offloading routine coding-assistant work to a cheaper model.
The default model chain uses Gemma because the models are useful, open, and fun to experiment with. Running them locally can be heavy on RAM, GPU, and setup; the Gemini API (key from Google AI Studio) makes them easy to use for small routine tasks at almost no cost. You can use any supported model ID.
Install
Get a free API key from https://aistudio.google.com/apikey.
Choose one install method.
Option 1: npx (recommended)
npx downloads and runs offload-mcp@latest on demand. You do not need to install the package globally. Your MCP client runs this command whenever it starts the server.
JSON-style MCP config:
{
  "mcpServers": {
    "offload-mcp": {
      "command": "npx",
      "args": ["offload-mcp@latest"],
      "env": { "GOOGLE_AI_API_KEY": "your_key" }
    }
  }
}
TOML-style MCP config:
[mcp_servers.offload-mcp]
command = "npx"
args = ["offload-mcp@latest"]
env = { GOOGLE_AI_API_KEY = "your_key" }
To test that npm can resolve the package:
npx offload-mcp@latest
That starts an MCP stdio server, so it will wait for an MCP client instead of printing a normal CLI screen.
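If you want a visible response instead of a silent wait, you can pipe a single MCP initialize request into it. This is a minimal smoke test assuming the standard MCP stdio transport (one JSON-RPC message per line); the exact output shape is up to the server:
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.0"}}}' | npx offload-mcp@latest
A JSON-RPC result naming the server confirms that npm resolved the package and the server speaks MCP.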
Option 2: global npm install
Install once:
npm install -g offload-mcp
Then use the binary directly in your MCP config.
JSON-style MCP config:
{
  "mcpServers": {
    "offload-mcp": {
      "command": "offload-mcp",
      "env": { "GOOGLE_AI_API_KEY": "your_key" }
    }
  }
}
TOML-style MCP config:
[mcp_servers.offload-mcp]
command = "offload-mcp"
env = { GOOGLE_AI_API_KEY = "your_key" }
To update a global install later:
npm update -g offload-mcp
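To check which version is currently installed (standard npm, nothing specific to this package):
npm list -g offload-mcp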
Use
Ask your assistant to offload routine work:
offload a commit message for the current diff
offload this translation to Mexican Spanish: <text>
use offload to summarize src/index.ts
For local diffs and files, offload_source is the key path: the MCP server reads the input directly, so the content does not have to pass through your primary model first:
offload_source(task="commit_message", source="git_diff")
offload_source(task="pr_description", source="git_staged_diff")
offload_source(task="code_summary", source="file", path="src/index.ts")
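These calls are ordinary MCP tool invocations. In raw JSON-RPC terms (the standard MCP tools/call method; the id here is arbitrary), the first one corresponds to:
{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"offload_source","arguments":{"task":"commit_message","source":"git_diff"}}}
Your MCP client builds this message for you; the notation above is just shorthand for it.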
Footer example:
—— Offloaded via gemma-4-31b-it · 307 model tokens · ~1,420 primary input tokens avoided · [offload-mcp](https://github.com/peterhadorn/offload-mcp)
The model tokens count comes from the API response. The primary input tokens avoided count is an estimate and appears only when using offload_source.
Tasks
commit_message
pr_description
code_summary
translate
changelog_entry
naming_suggestion
classify
extract_data
code_review_single
docstring
subject_lines
freeform
Use freeform for anything else:
offload(task="freeform", content="ECONNREFUSED 10.0.1.5:5432", prompt="Rewrite as a user-friendly error message. Output only the message.")
Status
offload_status shows local usage counters:
Today: 47/14400 calls (0.3%), 28,500 model tokens processed
Month: 312 calls over 8 days (avg 39/day), 187,400 model tokens processed
Estimated primary input avoided: today ~12,800 tokens, month ~74,200 tokens
Tasks today:
commit_message: 18
docstring: 12
code_summary: 9
Stats are stored locally at ~/.offload-mcp/usage.json by default. Only counters are stored, not task content.
Config
| Env var | Default | Description |
|---|---|---|
| GOOGLE_AI_API_KEY | - | Required |
| OFFLOAD_MODEL | gemma-4-31b-it | Preferred model |
| OFFLOAD_FALLBACK_MODELS | gemma-4-26b-a4b-it | Comma-separated fallback models |
| OFFLOAD_TIMEOUT_MS | 20000 | Per-model request timeout (ms) |
| OFFLOAD_RETRIES_PER_MODEL | 1 | Attempts per model before falling back (1 = no retry) |
| OFFLOAD_RPD_LIMIT | 14400 | Local daily call limit; lower it if your Gemini API account has a stricter quota |
| OFFLOAD_LOG_PATH | ~/.offload-mcp/usage.json | Local usage stats file |
By default, requests try gemma-4-31b-it first and fall back to gemma-4-26b-a4b-it on timeouts, rate limits, and transient server errors. Set OFFLOAD_FALLBACK_MODELS= to disable fallback.
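For example, to pin the preferred model, disable fallback, and allow slower responses, extend the TOML-style config from the install section (the values here are illustrative):
[mcp_servers.offload-mcp]
command = "offload-mcp"
env = { GOOGLE_AI_API_KEY = "your_key", OFFLOAD_MODEL = "gemma-4-31b-it", OFFLOAD_FALLBACK_MODELS = "", OFFLOAD_TIMEOUT_MS = "30000" }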
Data
offload-mcp sends task content to the configured Gemini API model. Do not offload secrets, private customer data, proprietary code, or regulated data unless your policy allows it.
offload_source with source="file" reads any file path the MCP server process can access. Treat the path and cwd parameters as trusted local input from your MCP client.
MIT
