CopilotLocalRouter
Route simple AI coding tasks to a local Ollama model – saving cloud tokens without interrupting your workflow. Run as an MCP stdio server via `dnx CopilotLocalRouter`.
What is this?
GitHub Copilot, Claude Code, and Cursor send every request to an expensive cloud model – even trivial ones like "write a method to add two integers" or "explain what this loop does." CopilotLocalRouter sits between your AI assistant and the cloud, intercepting simple tasks and handling them locally with Ollama.
Complex tasks (multi-file refactors, architecture decisions) are passed through to the cloud unchanged.
```
Your AI Assistant (Copilot / Claude / Cursor)
           │
           ▼  MCP stdio transport
┌─────────────────────┐
│  CopilotLocalRouter │
│  ┌───────────────┐  │
│  │  Classifier   │  │ ◄── scores prompt complexity
│  └───────┬───────┘  │
└──────────┼──────────┘
           │
  Simple / Medium ─────────► Ollama (local LLM, ~50ms)
           │
        Complex
           │
           ▼
  [skip] signal ─────────► Cloud model (unchanged)
```
Features
- 5 MCP tools – generate, explain, refactor, review, and test generation
- Automatic routing – heuristic classifier scores every prompt; no manual tagging required
- Transparent fallback – returns `[skip]` so your AI assistant silently falls back to the cloud
- Circuit breaker – if Ollama goes down, requests fall through to cloud immediately
- Prompt cache – identical prompts served from an LRU cache (SHA256-keyed, 60 min TTL)
- Cost tracking – logs estimated token savings every N requests
- OTel metrics – `System.Diagnostics.Metrics` compatible; connect any OTel collector
- Zero config default – works out of the box with `qwen2.5-coder` on `localhost:11434`
Table of Contents
- Prerequisites
- Installation
- Quick Start
- Supported AI Clients
- Available Tools
- Recommended Models
- Configuration
- How Routing Works
- Architecture
- Resilience
- Telemetry
- Contributing
- Troubleshooting
- Changelog
- License
Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| .NET SDK | 10.0+ | Includes `dnx` – no separate tool install needed |
| Ollama | 0.2.0+ | Must be running locally or accessible on the network |
| AI Assistant | Any | GitHub Copilot, Claude Code, Cursor, or any MCP-compatible client |
Installation
There is nothing to install. `dnx` is a tool-execution command included with the .NET 10 SDK that works like `npx` – it downloads and runs a .NET tool on demand. Add the config block for your AI client (see Quick Start) and `dnx` handles the rest automatically.
Pinning a version: Use `CopilotLocalRouter@0.1.0` in the `args` array to lock to a specific release. Omitting the version always uses the latest.
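For example, a pinned client config would carry the version suffix directly in `args`:

```json
"command": "dnx",
"args": ["CopilotLocalRouter@0.1.0"]
```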
Build from Source
```
git clone https://github.com/michaelstonis/CopilotLocalRouter.git
cd CopilotLocalRouter
dotnet build
dotnet test
```
Quick Start
1. Start Ollama and pull a model:
```
ollama pull qwen2.5-coder
```
2. Add to your AI client's MCP config (no prior install needed β dnx downloads the tool on first run):
VS Code / GitHub Copilot β .vscode/mcp.json
```json
{
  "servers": {
    "copilot-local-router": {
      "command": "dnx",
      "args": ["CopilotLocalRouter"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5-coder"
      }
    }
  }
}
```
Claude Code β .claude/mcp.json
```json
{
  "mcpServers": {
    "copilot-local-router": {
      "command": "dnx",
      "args": ["CopilotLocalRouter"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5-coder"
      }
    }
  }
}
```
Or via CLI:
```
claude mcp add copilot-local-router dnx -- CopilotLocalRouter
```
Cursor β .cursor/mcp.json
```json
{
  "mcpServers": {
    "copilot-local-router": {
      "command": "dnx",
      "args": ["CopilotLocalRouter"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5-coder"
      }
    }
  }
}
```
3. Restart your AI client and verify:
Ask your assistant: "What MCP tools are available?"
You should see: `local_code_generate`, `local_code_explain`, `local_code_refactor`, `local_code_review`, `local_test_generate`.
Supported AI Clients
| Client | Support | Config file |
|---|---|---|
| VS Code + GitHub Copilot | ✅ Full | `.vscode/mcp.json` |
| Claude Code | ✅ Full | `.claude/mcp.json` |
| Cursor | ✅ Full | `.cursor/mcp.json` |
| Any MCP stdio client | ✅ Full | Client-specific |
Available Tools
| Tool name | When it's used | Skips to cloud when |
|---|---|---|
| `local_code_generate` | Write a function, class, method, or boilerplate | Task spans multiple files or requires architectural decisions |
| `local_code_explain` | Explain what code does, how an algorithm works | Explanation requires deep multi-file context |
| `local_code_refactor` | Clean up, rename, extract, simplify within one file | Refactor spans multiple files or involves breaking changes |
| `local_code_review` | Review code for bugs, smells, and naming issues | Full security audit or codebase-wide review is requested |
| `local_test_generate` | Write unit tests for a function or class | Integration tests or complex mocking is required |
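Under the hood, an MCP client invokes these with a standard JSON-RPC `tools/call` request over stdio. A sketch of such a call (the `arguments` field name below is an assumption for illustration, not the tool's documented schema):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "local_code_generate",
    "arguments": { "prompt": "Write a method that adds two integers" }
  }
}
```

If the classifier scores the prompt as complex, the tool's result is the `[skip]` signal and the assistant proceeds with its normal cloud model.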
Recommended Models
| Model | Size | Best for | Command |
|---|---|---|---|
| `qwen2.5-coder` | 4.7 GB | Code generation, refactoring, tests | `ollama pull qwen2.5-coder` |
| `gemma3:4b` | 3.3 GB | Explanations, code review, low-RAM machines | `ollama pull gemma3:4b` |
| `deepseek-coder-v2` | 8.9 GB | Highest quality code tasks | `ollama pull deepseek-coder-v2` |
| `codellama` | 3.8 GB | General-purpose coding (older baseline) | `ollama pull codellama` |
Tip: `qwen2.5-coder` is the recommended default – it offers the best balance of speed, quality, and memory footprint for code tasks.
Configuration
All configuration is done via environment variables in your MCP client config, or as appsettings.json overrides when building from source.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `qwen2.5-coder` | Default model for all tools |
| `ROUTER_ENABLED` | `true` | Set to `false` to disable routing (all tasks go to cloud) |
Routing Thresholds
Control how aggressively tasks are routed locally by setting these in appsettings.json (source builds) or by contributing an env-var override:
| Setting | Default | Effect |
|---|---|---|
| `Router:SimpleConfidenceThreshold` | 0.75 | Minimum score to route a "simple" task locally |
| `Router:MediumConfidenceThreshold` | 0.50 | Minimum score to route a "medium" task locally |
| `Router:MaxTokensSimple` | 500 | Token count upper bound for "simple" classification |
| `Router:MaxTokensMedium` | 1500 | Token count upper bound for "medium" classification |
More aggressive local routing (accept more tasks locally, at lower quality confidence):

```json
"SimpleConfidenceThreshold": 0.60,
"MediumConfidenceThreshold": 0.40
```

More conservative (only high-confidence simple tasks handled locally):

```json
"SimpleConfidenceThreshold": 0.90,
"MediumConfidenceThreshold": 0.70
```
Cost Tracking
```json
"CostTracking": {
  "Enabled": true,
  "CloudInputTokenRate": 0.003,
  "CloudOutputTokenRate": 0.015,
  "Currency": "USD",
  "LogSummaryEveryNRequests": 100
}
```
Rates are per 1,000 tokens. The estimator uses conservative sizing based on average Ollama response sizes.
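The arithmetic behind the estimate is simple: rates are per 1,000 tokens, so the avoided cloud cost of one locally-handled request is `input_tokens/1000 * input_rate + output_tokens/1000 * output_rate`. A minimal sketch in Python (the function name is illustrative, not the tool's actual API):

```python
def estimated_savings(input_tokens: int, output_tokens: int,
                      input_rate: float = 0.003,
                      output_rate: float = 0.015) -> float:
    """Estimated cloud cost (USD) avoided by answering a request locally.

    Rates are per 1,000 tokens, matching the CostTracking defaults above.
    """
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate

# Example: 400 input tokens and 200 output tokens at the default rates
# is 0.4 * 0.003 + 0.2 * 0.015 = 0.0042 USD saved.
```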
How Routing Works
Every prompt is scored by a heuristic classifier across four signals:
- Keyword analysis – task verbs (`generate`, `explain`, `refactor`) and complexity indicators (`across all`, `entire codebase`, `architecture`)
- Token count – prompts over 500 tokens are unlikely to be simple tasks
- Structural signals – multi-file references, import counts, line counts
- Task type match – each tool has a baseline complexity expectation
Scores above `SimpleConfidenceThreshold` → local. Scores above `MediumConfidenceThreshold` (but below the simple threshold) → local, medium tier. Everything else → `[skip]`.
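To make the tiering concrete, here is a toy Python model of the four signals and thresholds above. The weights, keyword sets, and scoring scheme are illustrative assumptions – the real classifier's internals live in `CopilotLocalRouter.Core/Classification` and are not documented here:

```python
# Hypothetical signal weights; the real classifier's values are not documented.
SIMPLE_VERBS = {"generate", "explain", "refactor", "review", "test"}
COMPLEX_HINTS = {"across all", "entire codebase", "architecture"}

def score_prompt(prompt: str, token_count: int) -> float:
    """Score a prompt from 0.0 (complex) to 1.0 (simple)."""
    text = prompt.lower()
    score = 0.5
    if any(verb in text for verb in SIMPLE_VERBS):
        score += 0.3   # a task verb suggests a bounded, single-purpose request
    if any(hint in text for hint in COMPLEX_HINTS):
        score -= 0.4   # codebase-wide language suggests a complex task
    if token_count > 500:
        score -= 0.3   # long prompts are rarely simple tasks
    return max(0.0, min(1.0, score))

def route(score: float, simple: float = 0.75, medium: float = 0.50) -> str:
    """Apply the documented thresholds to a score."""
    if score >= simple:
        return "local"
    if score >= medium:
        return "local (medium tier)"
    return "[skip]"
```

For example, "generate a method to add two integers" scores above the simple threshold and stays local, while "refactor the architecture across all services" falls below the medium threshold and returns `[skip]`.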
Architecture
```
src/
  CopilotLocalRouter.Core/             # Domain logic
    Classification/                    # Heuristic task classifier
    Routing/                           # Tiered request router
    Agents/                            # AgentManager – Ollama conversation executor
    Resilience/                        # Circuit breaker, LRU prompt cache, normalizer
    Telemetry/                         # Metrics, cost estimator, quality scorer
    Configuration/                     # RouterOptions, ModelProfile
    Interfaces/                        # IRequestRouter, IAgentManager, ITaskClassifier
  CopilotLocalRouter.Ollama/           # OllamaSharp IChatClient integration
  CopilotLocalRouter.McpTools/         # MCP tool definitions (5 tools)
  CopilotLocalRouter.Host/             # Startup, DI, appsettings, health check
tests/
  CopilotLocalRouter.Core.Tests/       # Unit tests – classifier, resilience, telemetry
  CopilotLocalRouter.McpTools.Tests/   # Integration tests – MCP tool end-to-end
benchmarks/
  CopilotLocalRouter.Benchmarks/       # BenchmarkDotNet – classification + cache key perf
```
Key dependencies:
| Package | Version | Role |
|---|---|---|
| `ModelContextProtocol` | 1.2.0 | MCP stdio server |
| `OllamaSharp` | 5.4.25 | Ollama IChatClient implementation |
| `Microsoft.Extensions.AI` | 10.5.0 | Middleware pipeline, IChatClient abstractions |
| `Microsoft.Extensions.Hosting` | 10.0.7 | DI, configuration, lifetime management |
Resilience
| Feature | Behaviour |
|---|---|
| Circuit breaker | Opens after 3 consecutive Ollama failures; auto-resets after 30 seconds |
| Prompt cache | LRU cache, 500 entries, 60-minute TTL, SHA256-keyed |
| Retry | Up to 2 retries with 1s / 3s exponential backoff |
| Graceful degradation | Any failure returns `[skip]` – the user's request is never blocked |
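The breaker contract in the table above can be modeled in a few lines. This is only an illustrative Python sketch of the documented behaviour (open after 3 consecutive failures, allow a probe call after 30 seconds); the actual implementation is the C# code in `CopilotLocalRouter.Core/Resilience`:

```python
import time

class CircuitBreaker:
    """Toy model: opens after N consecutive failures, auto-resets after a delay."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock          # injectable for testing
        self._failures = 0
        self._opened_at = None

    def allow(self) -> bool:
        """True if a call may proceed; False while the breaker is open."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.reset_after:
            self._opened_at = None   # half-open: let the next call probe Ollama
            self._failures = 0
            return True
        return False

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()

    def record_success(self):
        self._failures = 0
        self._opened_at = None
```

While `allow()` returns `False`, the router skips Ollama entirely and emits `[skip]`, so the assistant falls back to the cloud with no added latency.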
Telemetry
OTel-compatible metrics emitted via `System.Diagnostics.Metrics` (meter name: `CopilotLocalRouter`):
| Metric | Type | Description |
|---|---|---|
| `router.requests.total` | Counter | All requests received |
| `router.requests.local` | Counter | Requests handled by Ollama |
| `router.requests.skipped` | Counter | Requests returned to cloud |
| `router.cache.hits` | Counter | Prompt cache hits |
| `router.cache.misses` | Counter | Prompt cache misses |
| `router.circuit.state_changes` | Counter | Circuit breaker state transitions |
| `router.classification.duration_ms` | Histogram | Time to classify a prompt |
| `router.agent.duration_ms` | Histogram | Time for Ollama to respond |
| `router.agent.response_tokens` | Histogram | Tokens in Ollama response |
| `router.agents.active` | Gauge | Concurrent Ollama calls in flight |
Connect any OpenTelemetry collector (Prometheus, Jaeger, Grafana, etc.) to the standard OTel endpoint.
Contributing
Contributions are welcome. Please open an issue first if you're planning a significant change.
```
git clone https://github.com/michaelstonis/CopilotLocalRouter.git
cd CopilotLocalRouter
dotnet restore
dotnet build
dotnet test
```
Guidelines:
- All new routing logic should have classifier unit tests
- MCP tool changes require integration tests in `CopilotLocalRouter.McpTools.Tests`
- Keep tool descriptions tightly tuned – they directly affect AI agent tool selection
Troubleshooting
See `docs/troubleshooting.md` for common issues including:
- Ollama not reachable / tools not appearing in assistant
- All tasks returning `[skip]` (threshold tuning)
- Circuit breaker stuck open
- Model not found errors
Changelog
See CHANGELOG.md for release history.
License
MIT – see LICENSE for details.
