CopilotLocalRouter
Route simple AI coding tasks to a local Ollama model – saving cloud tokens without interrupting your workflow. Run as an MCP stdio server via `dnx CopilotLocalRouter`.
What is this?
GitHub Copilot, Claude Code, and Cursor send every request to an expensive cloud model – even trivial ones like "write a method to add two integers" or "explain what this loop does." CopilotLocalRouter sits between your AI assistant and the cloud, intercepting simple tasks and handling them locally with Ollama.
Complex tasks (multi-file refactors, architecture decisions) are passed through to the cloud unchanged.
```
Your AI Assistant (Copilot / Claude / Cursor)
           │
           ▼  MCP stdio transport
┌─────────────────────┐
│  CopilotLocalRouter │
│  ┌───────────────┐  │
│  │  Classifier   │  │ ◄── scores prompt complexity
│  └───────┬───────┘  │
└──────────┼──────────┘
           │
  Simple / Medium ─────────► Ollama (local LLM, ~50ms)
           │
        Complex
           │
           ▼
  [skip] signal ─────────► Cloud model (unchanged)
```
Features
- 5 MCP tools – generate, explain, refactor, review, and test generation
- Automatic routing – heuristic classifier scores every prompt; no manual tagging required
- Transparent fallback – returns `[skip]` so your AI assistant silently falls back to the cloud
- Circuit breaker – if Ollama goes down, requests fall through to cloud immediately
- Prompt cache – identical prompts served from an LRU cache (SHA256-keyed, 60 min TTL)
- Cost tracking – logs estimated token savings every N requests
- OTel metrics – `System.Diagnostics.Metrics` compatible; connect any OTel collector
- Zero config default – works out of the box with `qwen2.5-coder` on `localhost:11434`
Table of Contents
- Prerequisites
- Installation
- Quick Start
- Supported AI Clients
- Available Tools
- Recommended Models
- Configuration
- How Routing Works
- Architecture
- Resilience
- Telemetry
- Contributing
- Troubleshooting
- Changelog
- License
Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| .NET SDK | 10.0+ | Includes `dnx` – no separate tool install needed |
| Ollama | 0.2.0+ | Must be running locally or accessible on the network |
| AI Assistant | Any | GitHub Copilot, Claude Code, Cursor, or any MCP-compatible client |
Installation
There is nothing to install. `dnx` is a tool-execution command included with the .NET 10 SDK that works like `npx` – it downloads and runs a .NET tool on demand. Add the config block for your AI client (see Quick Start) and `dnx` handles the rest automatically.
Pinning a version: Use `CopilotLocalRouter@0.1.0` in the `args` array to lock to a specific release. Omitting the version always uses the latest.
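For example, a pinned client config would carry the version suffix directly in `args`:

```json
"command": "dnx",
"args": ["CopilotLocalRouter@0.1.0"]
```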
Build from Source
```
git clone https://github.com/michaelstonis/CopilotLocalRouter.git
cd CopilotLocalRouter
dotnet build
dotnet test
```
Quick Start
1. Start Ollama and pull a model:
```
ollama pull qwen2.5-coder
```
2. Add to your AI client's MCP config (no prior install needed β dnx downloads the tool on first run):
VS Code / GitHub Copilot β .vscode/mcp.json
```json
{
  "servers": {
    "copilot-local-router": {
      "command": "dnx",
      "args": ["CopilotLocalRouter"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5-coder"
      }
    }
  }
}
```
Claude Code β .claude/mcp.json
```json
{
  "mcpServers": {
    "copilot-local-router": {
      "command": "dnx",
      "args": ["CopilotLocalRouter"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5-coder"
      }
    }
  }
}
```
Or via CLI:
```
claude mcp add copilot-local-router dnx -- CopilotLocalRouter
```
Cursor β .cursor/mcp.json
```json
{
  "mcpServers": {
    "copilot-local-router": {
      "command": "dnx",
      "args": ["CopilotLocalRouter"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5-coder"
      }
    }
  }
}
```
3. Restart your AI client and verify:
Ask your assistant: "What MCP tools are available?"
You should see: `local_code_generate`, `local_code_explain`, `local_code_refactor`, `local_code_review`, `local_test_generate`.
Supported AI Clients
| Client | Support | Config file |
|---|---|---|
| VS Code + GitHub Copilot | ✅ Full | `.vscode/mcp.json` |
| Claude Code | ✅ Full | `.claude/mcp.json` |
| Cursor | ✅ Full | `.cursor/mcp.json` |
| Any MCP stdio client | ✅ Full | Client-specific |
Available Tools
| Tool name | When it's used | Skips to cloud when |
|---|---|---|
| `local_code_generate` | Write a function, class, method, or boilerplate | Task spans multiple files or requires architectural decisions |
| `local_code_explain` | Explain what code does, how an algorithm works | Explanation requires deep multi-file context |
| `local_code_refactor` | Clean up, rename, extract, simplify within one file | Refactor spans multiple files or involves breaking changes |
| `local_code_review` | Review code for bugs, smells, and naming issues | Full security audit or codebase-wide review is requested |
| `local_test_generate` | Write unit tests for a function or class | Integration tests or complex mocking is required |
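Under the hood, an MCP client invokes these with a standard JSON-RPC `tools/call` request over stdio. A sketch of such a call (the `arguments` field name below is an assumption for illustration, not the tool's documented schema):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "local_code_generate",
    "arguments": { "prompt": "Write a method that adds two integers" }
  }
}
```

If the classifier scores the prompt as complex, the tool's result is the `[skip]` signal and the assistant proceeds with its normal cloud model.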
Recommended Models
| Model | Size | Best for | Command |
|---|---|---|---|
| `qwen2.5-coder` | 4.7 GB | Code generation, refactoring, tests | `ollama pull qwen2.5-coder` |
| `gemma3:4b` | 3.3 GB | Explanations, code review, low-RAM machines | `ollama pull gemma3:4b` |
| `deepseek-coder-v2` | 8.9 GB | Highest quality code tasks | `ollama pull deepseek-coder-v2` |
| `codellama` | 3.8 GB | General-purpose coding (older baseline) | `ollama pull codellama` |
Tip: `qwen2.5-coder` is the recommended default – it offers the best balance of speed, quality, and memory footprint for code tasks.
Configuration
All configuration is done via environment variables in your MCP client config, or as appsettings.json overrides when building from source.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `qwen2.5-coder` | Default model for all tools |
| `ROUTER_ENABLED` | `true` | Set to `false` to disable routing (all tasks go to cloud) |
Routing Thresholds
Control how aggressively tasks are routed locally by setting these in appsettings.json (source builds) or by contributing an env-var override:
| Setting | Default | Effect |
|---|---|---|
| `Router:SimpleConfidenceThreshold` | 0.75 | Minimum score to route a "simple" task locally |
| `Router:MediumConfidenceThreshold` | 0.50 | Minimum score to route a "medium" task locally |
| `Router:MaxTokensSimple` | 500 | Token count upper bound for "simple" classification |
| `Router:MaxTokensMedium` | 1500 | Token count upper bound for "medium" classification |
More aggressive local routing (accept more tasks locally, at lower quality confidence):

```json
"SimpleConfidenceThreshold": 0.60,
"MediumConfidenceThreshold": 0.40
```

More conservative (only high-confidence simple tasks handled locally):

```json
"SimpleConfidenceThreshold": 0.90,
"MediumConfidenceThreshold": 0.70
```
Cost Tracking
```json
"CostTracking": {
  "Enabled": true,
  "CloudInputTokenRate": 0.003,
  "CloudOutputTokenRate": 0.015,
  "Currency": "USD",
  "LogSummaryEveryNRequests": 100
}
```
Rates are per 1,000 tokens. The estimator uses conservative sizing based on average Ollama response sizes.
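The arithmetic behind the estimate is simple: rates are per 1,000 tokens, so the avoided cloud cost of one locally-handled request is `input_tokens/1000 * input_rate + output_tokens/1000 * output_rate`. A minimal sketch in Python (the function name is illustrative, not the tool's actual API):

```python
def estimated_savings(input_tokens: int, output_tokens: int,
                      input_rate: float = 0.003,
                      output_rate: float = 0.015) -> float:
    """Estimated cloud cost (USD) avoided by answering a request locally.

    Rates are per 1,000 tokens, matching the CostTracking defaults above.
    """
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate

# Example: 400 input tokens and 200 output tokens at the default rates
# is 0.4 * 0.003 + 0.2 * 0.015 = 0.0042 USD saved.
```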
How Routing Works
Every prompt is scored by a heuristic classifier across four signals:
- Keyword analysis – task verbs (`generate`, `explain`, `refactor`) and complexity indicators (`across all`, `entire codebase`, `architecture`)
- Token count – prompts over 500 tokens are unlikely to be simple tasks
- Structural signals – multi-file references, import counts, line counts
- Task type match – each tool has a baseline complexity expectation
Scores above `SimpleConfidenceThreshold` → local. Scores above `MediumConfidenceThreshold` (but below the simple threshold) → local, medium tier. Everything else → `[skip]`.
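To make the tiering concrete, here is a toy Python model of the four signals and thresholds above. The weights, keyword sets, and scoring scheme are illustrative assumptions – the real classifier's internals live in `CopilotLocalRouter.Core/Classification` and are not documented here:

```python
# Hypothetical signal weights; the real classifier's values are not documented.
SIMPLE_VERBS = {"generate", "explain", "refactor", "review", "test"}
COMPLEX_HINTS = {"across all", "entire codebase", "architecture"}

def score_prompt(prompt: str, token_count: int) -> float:
    """Score a prompt from 0.0 (complex) to 1.0 (simple)."""
    text = prompt.lower()
    score = 0.5
    if any(verb in text for verb in SIMPLE_VERBS):
        score += 0.3   # a task verb suggests a bounded, single-purpose request
    if any(hint in text for hint in COMPLEX_HINTS):
        score -= 0.4   # codebase-wide language suggests a complex task
    if token_count > 500:
        score -= 0.3   # long prompts are rarely simple tasks
    return max(0.0, min(1.0, score))

def route(score: float, simple: float = 0.75, medium: float = 0.50) -> str:
    """Apply the documented thresholds to a score."""
    if score >= simple:
        return "local"
    if score >= medium:
        return "local (medium tier)"
    return "[skip]"
```

For example, "generate a method to add two integers" scores above the simple threshold and stays local, while "refactor the architecture across all services" falls below the medium threshold and returns `[skip]`.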
Architecture
```
src/
  CopilotLocalRouter.Core/             # Domain logic
    Classification/                    # Heuristic task classifier
    Routing/                           # Tiered request router
    Agents/                            # AgentManager – Ollama conversation executor
    Resilience/                        # Circuit breaker, LRU prompt cache, normalizer
    Telemetry/                         # Metrics, cost estimator, quality scorer
    Configuration/                     # RouterOptions, ModelProfile
    Interfaces/                        # IRequestRouter, IAgentManager, ITaskClassifier
  CopilotLocalRouter.Ollama/           # OllamaSharp IChatClient integration
  CopilotLocalRouter.McpTools/         # MCP tool definitions (5 tools)
  CopilotLocalRouter.Host/             # Startup, DI, appsettings, health check
tests/
  CopilotLocalRouter.Core.Tests/       # Unit tests – classifier, resilience, telemetry
  CopilotLocalRouter.McpTools.Tests/   # Integration tests – MCP tool end-to-end
benchmarks/
  CopilotLocalRouter.Benchmarks/       # BenchmarkDotNet – classification + cache key perf
```
Key dependencies:
| Package | Version | Role |
|---|---|---|
| `ModelContextProtocol` | 1.2.0 | MCP stdio server |
| `OllamaSharp` | 5.4.25 | Ollama IChatClient implementation |
| `Microsoft.Extensions.AI` | 10.5.0 | Middleware pipeline, IChatClient abstractions |
| `Microsoft.Extensions.Hosting` | 10.0.7 | DI, configuration, lifetime management |
Resilience
| Feature | Behaviour |
|---|---|
| Circuit breaker | Opens after 3 consecutive Ollama failures; auto-resets after 30 seconds |
| Prompt cache | LRU cache, 500 entries, 60-minute TTL, SHA256-keyed |
| Retry | Up to 2 retries with 1s / 3s exponential backoff |
| Graceful degradation | Any failure returns `[skip]` – the user's request is never blocked |
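The breaker contract in the table above can be modeled in a few lines. This is only an illustrative Python sketch of the documented behaviour (open after 3 consecutive failures, allow a probe call after 30 seconds); the actual implementation is the C# code in `CopilotLocalRouter.Core/Resilience`:

```python
import time

class CircuitBreaker:
    """Toy model: opens after N consecutive failures, auto-resets after a delay."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock          # injectable for testing
        self._failures = 0
        self._opened_at = None

    def allow(self) -> bool:
        """True if a call may proceed; False while the breaker is open."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.reset_after:
            self._opened_at = None   # half-open: let the next call probe Ollama
            self._failures = 0
            return True
        return False

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()

    def record_success(self):
        self._failures = 0
        self._opened_at = None
```

While `allow()` returns `False`, the router skips Ollama entirely and emits `[skip]`, so the assistant falls back to the cloud with no added latency.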
Telemetry
OTel-compatible metrics emitted via `System.Diagnostics.Metrics` (meter name: `CopilotLocalRouter`):
| Metric | Type | Description |
|---|---|---|
| `router.requests.total` | Counter | All requests received |
| `router.requests.local` | Counter | Requests handled by Ollama |
| `router.requests.skipped` | Counter | Requests returned to cloud |
| `router.cache.hits` | Counter | Prompt cache hits |
| `router.cache.misses` | Counter | Prompt cache misses |
| `router.circuit.state_changes` | Counter | Circuit breaker state transitions |
| `router.classification.duration_ms` | Histogram | Time to classify a prompt |
| `router.agent.duration_ms` | Histogram | Time for Ollama to respond |
| `router.agent.response_tokens` | Histogram | Tokens in Ollama response |
| `router.agents.active` | Gauge | Concurrent Ollama calls in flight |
Connect any OpenTelemetry collector (Prometheus, Jaeger, Grafana, etc.) to the standard OTel endpoint.
Contributing
Contributions are welcome. Please open an issue first if you're planning a significant change.
```
git clone https://github.com/michaelstonis/CopilotLocalRouter.git
cd CopilotLocalRouter
dotnet restore
dotnet build
dotnet test
```
Guidelines:
- All new routing logic should have classifier unit tests
- MCP tool changes require integration tests in `CopilotLocalRouter.McpTools.Tests`
- Keep tool descriptions tightly tuned – they directly affect AI agent tool selection
Troubleshooting
See `docs/troubleshooting.md` for common issues including:
- Ollama not reachable / tools not appearing in assistant
- All tasks returning `[skip]` (threshold tuning)
- Circuit breaker stuck open
- Model not found errors
Changelog
See CHANGELOG.md for release history.
License
MIT – see LICENSE for details.
