CotForce-MCP
"Give brains to your small models."
CotForce enforces step-by-step Chain-of-Thought, turning 4B parameter models into methodical reasoners.
Why this exists
A 4-billion-parameter Gemma cannot solve SEND + MORE = MONEY. It's a classic cryptarithmetic puzzle: 8 unique digits, 5 columns, 4 carry values. A bare 4B model guesses randomly. It hallucinates digits. It loses track of carries after column 2.
The same model, with CotForce:
Step 1: Analyze the leftmost column. S+M+C3 = MO. Max sum is 19998. ∴ M=1.
Step 2: S+1+C3 = 10+O. With M=1 and carry, O must be 0.
Step 3: D+E = Y+10C1 → C1=1. Now R+C1=9 → C1=0→R=9 (used), C1=1→R=8.
...
Step 11: All digits assigned. 9567 + 1085 = 10652. Verified.
11 structured reasoning steps. Zero hallucinations. Correct answer.
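The final assignment can be checked mechanically; a quick TypeScript sanity check (the digit mapping comes from the steps above, the helper is just for illustration):

```typescript
// Digit assignment produced by the run above: 9567 + 1085 = 10652
const digits: Record<string, number> = { S: 9, E: 5, N: 6, D: 7, M: 1, O: 0, R: 8, Y: 2 };

// Turn a word like "SEND" into the number its letters encode.
const toNumber = (word: string): number =>
  word.split("").reduce((n, ch) => n * 10 + digits[ch], 0);

console.log(toNumber("SEND") + toNumber("MORE") === toNumber("MONEY")); // true
```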
CotForce doesn't make small models smarter. It forces them to think before they speak, which is often all they need.
Two modes, one line of config
CotForce uses the MCP sampling protocol (sampling/createMessage) to call LLMs. If your client supports it (Claude Desktop, Cursor), nothing extra is needed.
If not, or if you're using a local model like Gemma via LMStudio, switch to direct HTTP mode:
{
"mcpServers": {
"cotforce": {
"command": "node",
"args": ["node_modules/@slbdn/cotforce-mcp/index.js"],
"env": {
"MODE": "direct",
"API_BASE_URL": "http://localhost:1234/v1",
"MODEL": "gemma-4-e4b-it-mlx"
}
}
}
}
That's it. The same 4B Gemma that couldn't solve SEND+MORE=MONEY above now solves it with CotForce, working locally through LMStudio.
Features
- Rigid CoT enforcement: forces any LLM to output valid JSON `{reasoning, result}` via strict system prompts and few-shot examples.
- Adaptive multi-layer parser: plug-in architecture with 5 built-in parsers (direct JSON, fenced blocks, XML/labels, brace-balanced, truncated recovery) in a priority-sorted pipeline. Add custom parsers via the `CotParser` interface; select parsers via the `COT_PARSERS` env var.
  - Direct JSON (with code-fence stripping)
  - JSON inside markdown fenced blocks
  - XML / heuristic label extraction (`<reasoning>`, `Reasoning:`)
  - Brace-balancing scanner for nested JSON objects
  - Truncated-JSON recovery for responses that hit the token limit
- Zod runtime validation: validates tool arguments and parsed CoT output with strict schemas.
- Automatic retry with temperature increase: up to 3 attempts (configurable) with increasing temperature and correction suffixes (see the sketch after this list).
- Per-request rejection memo: no global mutable state; safe under concurrent tool calls.
- Token budgeting with tiktoken: accurate token counting using OpenAI's `cl100k_base` encoding, with fallback to a character heuristic. Tweak via `REASONING_OVERHEAD`.
- Configurable model: set the `MODEL` environment variable to hint a specific model; leave it unset for the host default.
- Model-specific prompts: automatically selects tuned system prompts for Claude, GPT-4, Gemini, and Grok based on `MODEL`.
- Universal compatibility: works with MCP sampling (Claude Desktop) or direct LLM HTTP calls (OpenAI, LMStudio, Ollama, any OpenAI-compatible API). Set `API_KEY` to use direct mode.
- Structured logging: timestamped, level-filtered logs to stderr (supports `LOG_LEVEL`).
- Output truncation detection: detects when the LLM response hits the token limit and retries with a conciseness hint (`TRUNCATION_THRESHOLD`).
- Token usage exposure: every response includes input / output / budget token counts so callers can optimize.
- User-supplied result schema: optional `resultSchema` parameter validates the `result` field against a type-map; mismatches trigger a retry.
- Structured metrics: in-memory counters for requests, success/fail rates, truncations, retries, latency, and token usage. Logged on shutdown.
- Comprehensive test suite: 151 tests covering parser pipeline, token budgeting, metrics, schema validation, retry loop, progress notifications, caching, and MCP server integration.
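To make the retry behaviour concrete, here is a minimal sketch of the temperature schedule implied by `BASE_TEMP`, `TEMP_INCREMENT`, and `MAX_RETRIES` (the function is illustrative, not part of the package's exported API):

```typescript
// Sketch only: temperature used on each attempt, per the documented defaults.
// Attempt 0 is the initial call; MAX_RETRIES additional attempts follow on parse failure.
function attemptTemperature(attempt: number, baseTemp = 0.1, increment = 0.2): number {
  return baseTemp + attempt * increment;
}

// With the defaults (BASE_TEMP=0.1, TEMP_INCREMENT=0.2, MAX_RETRIES=2):
// attempt 0 -> 0.1, attempt 1 -> 0.3, attempt 2 -> 0.5
for (let attempt = 0; attempt <= 2; attempt++) {
  console.log(attempt, attemptTemperature(attempt));
}
```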
Installation
npm install @slbdn/cotforce-mcp
# or
git clone https://github.com/islobodan/cotforce-mcp
cd cotforce-mcp
npm install
npm run build
Requires Node.js ≥ 18.
Quick start β Claude Desktop
Add to claude_desktop_config.json:
{
"mcpServers": {
"cotforce": {
"command": "npx",
"args": ["-y", "@slbdn/cotforce-mcp"],
"env": {
"MODEL": "claude-3-5-sonnet"
}
}
}
}
No clone, no build. npx -y pulls and runs directly from npm.
Configuration
The server is configured via environment variables (all optional):
| Variable | Default | Description |
|---|---|---|
| `MODEL` | (not set) | Model name hint (e.g. `claude-3-5-sonnet`, `gpt-4o`). If empty, no hint is sent; the MCP host decides. |
| `MAX_RETRIES` | `2` | Number of retry attempts before returning raw output. |
| `BASE_TEMP` | `0.1` | Initial sampling temperature. |
| `TEMP_INCREMENT` | `0.2` | Temperature added per retry attempt. |
| `TIMEOUT` | `60000` / `120000` | Sampling timeout in ms (60s). Direct HTTP mode uses a longer default (120s) since local models are slower. |
| `CACHE_TTL` | `3600000` | Result cache TTL in ms (default 1 hour). Set to `0` to disable. |
| `CACHE_MAX_ENTRIES` | `100` | Maximum cached results before evicting the oldest. |
| `COT_PARSERS` | (all) | Comma-separated parser names to use (e.g. `direct-json,fenced-block`). Skips the others. |
| `TRUNCATION_THRESHOLD` | `0.95` | Ratio of output/budget that triggers truncation detection. Attempts truncated-JSON recovery first, then retries with 1.5x budget. |
| `REASONING_OVERHEAD` | `800` | Fixed token overhead added to the budget formula. Increase for verbose models. |
| `FALLBACK_MODELS` | (not set) | Comma-separated list of fallback models (e.g. `gpt-4o,claude-3-5-sonnet`). Cycled on failure. |
| `MODE` | `auto` | `auto`, `sampling`, or `direct`. `auto` uses direct HTTP when `API_KEY` is set and the client lacks sampling support. |
| `API_KEY` | (not set) | LLM API key for direct HTTP mode. Optional for local endpoints (LMStudio, Ollama); required for remote providers (OpenAI, Anthropic, etc.). |
| `API_BASE_URL` | `https://api.openai.com` | Base URL for direct HTTP mode. Change for LMStudio (`http://localhost:1234/v1`) or other providers. |
| `LOG_LEVEL` | `INFO` | One of `DEBUG`, `INFO`, `WARN`, `ERROR`. |
Example
MODEL=gpt-4o MAX_RETRIES=3 BASE_TEMP=0.2 TEMP_INCREMENT=0.15 LOG_LEVEL=DEBUG npx @slbdn/cotforce-mcp
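For the `MODE` row above, the auto-detection rule works out to something like this (a minimal sketch of the documented behaviour; the function and argument names are illustrative, not the server's actual API):

```typescript
// Sketch of MODE resolution as documented: "auto" falls back to direct HTTP
// when an API_KEY is set and the connected client does not support sampling.
type Mode = "sampling" | "direct";

function resolveMode(
  mode: string | undefined,          // MODE env var ("auto" by default)
  apiKeySet: boolean,                // API_KEY env var present?
  clientSupportsSampling: boolean,   // did the MCP client advertise sampling?
): Mode {
  if (mode === "direct") return "direct";
  if (mode === "sampling") return "sampling";
  // MODE=auto (default)
  if (apiKeySet && !clientSupportsSampling) return "direct";
  return "sampling";
}
```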
Usage
As an MCP Tool
Add to your MCP client configuration. A .mcp.json file is included in the package for auto-discovery by clients like Cursor, VS Code, and Windsurf. Copy the relevant config below to your client's settings:
With MCP sampling (Claude Desktop):
{
"mcpServers": {
"cotforce": {
"command": "node",
"args": ["/path/to/cotforce-mcp/index.js"],
"env": {
"MODEL": "claude-3-5-sonnet",
"MAX_RETRIES": "2"
}
}
}
}
With direct LLM HTTP (LMStudio, OpenAI, Ollama):
{
"mcpServers": {
"cotforce": {
"command": "node",
"args": ["/path/to/cotforce-mcp/index.js"],
"env": {
"MODE": "direct",
"API_BASE_URL": "http://localhost:1234/v1",
"MODEL": "local-model",
"MAX_RETRIES": "2"
}
}
}
}
Note: `API_KEY` is optional for local endpoints like LMStudio or Ollama. It is required for remote providers like OpenAI or Anthropic.
The root `index.js` is a launcher that delegates to `dist/index.js`. It guards against missing builds with a helpful error message.
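A launcher of that shape looks roughly like the sketch below (illustrative only, not the actual contents of `index.js`):

```typescript
// Sketch of a dist/ launcher: delegate to the compiled entry point and
// print a helpful message if the project has not been built yet.
try {
  await import("./dist/index.js");
} catch {
  console.error("dist/index.js not found. Run `npm run build` first.");
  process.exit(1);
}
```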
Troubleshooting
Response truncated mid-reasoning
What you see: `finish_reason: "length"` in the LLM response. The reasoning cuts off before the `result` field.
Why: The token budget is too tight. Complex reasoning (like SEND+MORE=MONEY) can need 3000+ output tokens; the default budget floor is 4096, but the model-level cap can vary.
Fix: Increase the budget overhead:
REASONING_OVERHEAD=1600 # default is 800, raise for verbose models
Or skip token-heavy parser layers to save budget for reasoning:
COT_PARSERS=direct-json,fenced-block # skip heuristic and brace-balanced
MCP client timeout
What you see: `MCP error -32001: Request timed out` before the solution appears.
Why: Complex CoT reasoning takes time, often 60-90 seconds for local models like Gemma. This error can come from two places:
- CotForce's own timeout: default 120s for direct HTTP mode, controlled by the `TIMEOUT` env var.
- The MCP client's timeout: LM Studio, Claude Desktop, Cursor, etc. each have their own default timeout for tool calls (often 30-60s). This is separate from CotForce's timeout.
Fix: check both sides:
Increase CotForce's timeout:
TIMEOUT=180000 # 3 minutes
Check your MCP client's timeout setting:
LM Studio: add `"timeout"` to `mcp.json` (milliseconds):
{
"mcpServers": {
"cotforce": {
"command": "node",
"args": ["index.js"],
"env": {
"TIMEOUT": "180000"
},
"timeout": 300000
}
}
}
Claude Desktop: the tool call timeout is not directly configurable. A workaround is to adjust CotForce's `TIMEOUT` so the call completes within the client's window, or use a faster model.
Cursor / VS Code: check the MCP extension or `.vscode/mcp.json` for a `timeout` or `requestTimeout` setting.
Call the Tool
{
"name": "solve_problem",
"arguments": {
"prompt": "What is 7 * 8 + 2?"
}
}
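If you are driving the server programmatically rather than from a chat client, the same call can be made with the official MCP TypeScript SDK. A minimal sketch, assuming `@modelcontextprotocol/sdk` and its stdio transport (not something shipped by CotForce itself):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio, the same way an MCP client config would.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@slbdn/cotforce-mcp"],
});

const client = new Client({ name: "cotforce-demo", version: "1.0.0" });
await client.connect(transport);

const result = await client.callTool({
  name: "solve_problem",
  arguments: { prompt: "What is 7 * 8 + 2?" },
});
console.log(result);
```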
With Result Schema Validation
{
"name": "solve_problem",
"arguments": {
"prompt": "List the prime numbers between 10 and 20",
"resultSchema": {
"primes": "object",
"count": "number"
}
}
}
If the result field doesn't match the schema, the server retries with a correction hint.
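The schema appears to be a flat type-map: each key names a field that must exist in `result`, and each value is the expected JavaScript type name (note that `"object"` covers the array of primes above). A minimal sketch of such a check, not the server's actual validator:

```typescript
// Sketch: does the parsed `result` object match a { field: typeName } map?
function matchesResultSchema(
  result: Record<string, unknown>,
  schema: Record<string, string>,
): boolean {
  return Object.entries(schema).every(
    ([field, expectedType]) => field in result && typeof result[field] === expectedType,
  );
}

// { primes: [11, 13, 17, 19], count: 4 } satisfies { primes: "object", count: "number" }
// because typeof an array is "object".
```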
More Examples
See EXAMPLES.md for 16 diverse examples including:
- Logic puzzles, probability, word problems
- Code analysis, regex, SQL queries
- Creative writing, recipe adaptation
- Nested JSON with schema validation
- Usage with different models and fallbacks
Example Response
{
"content": [{
"type": "text",
"text": "π€ Agentic CoT Result:\n\n**Reasoning:** Step 1: Multiply 7 * 8 = 56. Step 2: Add 2 to get 58.\n\n**Answer:** 58\n\nπ Token Usage: 42 in / 150 out / 4096 budget"
}]
}
If parsing fails after all retries, the server returns the raw LLM output with a warning.
Custom Parsers
The parser is a priority-sorted pipeline of plugins. Five built-in parsers run in order:
| Priority | Name | What it does |
|---|---|---|
| 10 | direct-json | Parses whole output as JSON (strips ```json fences) |
| 20 | fenced-block | Extracts JSON from markdown code blocks |
| 30 | heuristic | Looks for `<reasoning>`/`<result>` XML tags or `Reasoning:`/`Result:` labels |
| 40 | brace-balanced | Finds first balanced {} in arbitrary text |
| 50 | truncated-recovery | Salvages reasoning from truncated JSON (hit token limit) |
Filter parsers via COT_PARSERS env var:
COT_PARSERS=direct-json,fenced-block node index.js
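To make the layers concrete, here are illustrative raw outputs each built-in parser is designed to catch. The sample strings are invented; only the layer descriptions come from the table above:

```typescript
// Illustrative raw LLM outputs, matched to the layer that would recover them.
const samples: Record<string, string> = {
  // 10 direct-json: the whole reply is the JSON object (possibly fenced).
  "direct-json": '{"reasoning": "7*8=56, 56+2=58", "result": 58}',
  // 20 fenced-block: JSON buried inside a markdown code block.
  "fenced-block": 'Sure!\n```json\n{"reasoning": "...", "result": 58}\n```',
  // 30 heuristic: XML tags or plain labels instead of JSON.
  "heuristic": "<reasoning>7*8=56, then +2</reasoning><result>58</result>",
  // 40 brace-balanced: a JSON object embedded in surrounding prose.
  "brace-balanced": 'Here you go: {"reasoning": "...", "result": 58} Hope that helps!',
  // 50 truncated-recovery: output cut off at the token limit, closing braces missing.
  "truncated-recovery": '{"reasoning": "Step 1: multiply 7 by 8 to get 56, then',
};
```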
Write a custom parser:
import { CotParser, AgenticCotSchema } from "@slbdn/cotforce-mcp";
class YamlParser implements CotParser {
name = "yaml";
priority = 35; // runs after heuristic, before brace-balanced
parse(raw: string): { reasoning: string; result: unknown } | null {
// Custom YAML parsing logic here
return null; // return null if this output isn't YAML
}
}
Then register it programmatically:
import { defaultParserPipeline, ParserPipeline } from "@slbdn/cotforce-mcp";
const pipeline = defaultParserPipeline();
pipeline.addParser(new YamlParser());
const result = pipeline.parse(rawText);
API
Tool: solve_problem
- Input: `{ prompt: string }`, the problem to solve.
- Output: either:
  - Success: structured CoT result.
  - Soft failure: raw LLM output if parsing fails after all retries.
Sampling / LLM Calling
CotForce supports two modes for calling the LLM:
MCP Sampling (default with compatible clients):
- Uses MCP-native `sampling/createMessage`
- Client selects and calls the model
- Requires client support (Claude Desktop, etc.)
Direct HTTP (for clients without sampling support):
- Calls OpenAI-compatible `/v1/chat/completions` directly
- Works with OpenAI, LMStudio, Ollama, and any compatible provider
- Activated automatically in `MODE=auto` when `API_KEY` is set and the client lacks sampling
- Or force with `MODE=direct`
Both modes use the same system prompt with few-shot examples and strict schema constraints.
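In direct mode the request is a plain OpenAI-style chat completion. A rough sketch (it assumes `API_BASE_URL` already ends in `/v1`, as in the LMStudio example, and all variable names are illustrative):

```typescript
// Sketch of a direct-mode call to an OpenAI-compatible endpoint.
const baseUrl = process.env.API_BASE_URL ?? "http://localhost:1234/v1";

const response = await fetch(`${baseUrl}/chat/completions`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // API_KEY is optional for local endpoints (LMStudio, Ollama).
    ...(process.env.API_KEY ? { Authorization: `Bearer ${process.env.API_KEY}` } : {}),
  },
  body: JSON.stringify({
    model: process.env.MODEL ?? "gemma-4-e4b-it-mlx",
    messages: [
      { role: "system", content: "...the CoT-enforcing system prompt..." },
      { role: "user", content: "What is 7 * 8 + 2?" },
    ],
    temperature: 0.1, // BASE_TEMP default
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```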
Architecture
cotforce-mcp/
├── src/
│   ├── index.ts            # MCP server, tool handlers, routing logic
│   └── lib/
│       ├── parser.ts       # Parser pipeline: CotParser interface + 5 plugin parsers + Zod schemas
│       ├── tokens.ts       # tiktoken integration + budget computation
│       ├── prompts.ts      # Model-specific system prompts
│       ├── metrics.ts      # In-memory request/performance counters
│       └── llm.ts          # Direct HTTP LLM client (OpenAI-compatible)
├── tests/
│   ├── cache.test.ts       # 10 unit tests for result caching
│   ├── parser.test.ts      # 47 unit tests for parser layers
│   ├── tokens.test.ts      # 23 unit tests for token budgeting
│   ├── schema.test.ts      # 8 unit tests for result schema validation
│   ├── metrics.test.ts     # 9 unit tests for metrics tracking
│   ├── prompts.test.ts     # 12 unit tests for model-specific prompts
│   ├── llm.test.ts         # 6 tests for direct mode detection
│   ├── retry.test.ts       # 4 integration tests for retry loop
│   ├── progress.test.ts    # 5 unit tests for progress notifications
│   └── server.test.ts      # 9 integration tests via @slbdn/mcp-tester
├── index.js                # Root launcher (delegates to dist/)
├── dist/                   # Compiled TypeScript output
└── package.json
How It Works
- System prompt enforces JSON output with `reasoning` and `result`. Model-specific variants are tuned for Claude, GPT-4, Gemini, and Grok.
- Parser pipeline runs 5 built-in parsers in priority order (direct JSON, fenced blocks, XML/labels, brace-balanced, truncated recovery). First valid match wins. Custom parsers can be added via the `COT_PARSERS` env var and the `CotParser` interface.
- Retry logic: if parsing fails, injects a correction suffix and increases the temperature. Supports fallback models (`FALLBACK_MODELS`) when the primary model refuses.
- Rejection memory stores a snippet of the last failure to contextualise the next call (scoped per-request, thread-safe).
- Token budgeting uses `estimateTokens()` (lightweight heuristic) for budget math and `countTokens()` (tiktoken) for exact counts. Sets `maxTokens` dynamically (between 4096 and 8192) via the formula `overhead + inputTokens × 4` (see the sketch below). Detects truncation via `finish_reason: "length"` and attempts JSON recovery before retrying.
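The budget formula in the last item works out as follows (a sketch of the documented formula only; the function name is illustrative):

```typescript
// maxTokens = clamp(REASONING_OVERHEAD + inputTokens * 4, 4096, 8192)
function computeBudget(inputTokens: number, overhead = 800): number {
  const raw = overhead + inputTokens * 4;
  return Math.min(8192, Math.max(4096, raw));
}

computeBudget(100);        // 800 + 400   = 1200  -> clamped up to 4096
computeBudget(1200);       // 800 + 4800  = 5600  -> 5600
computeBudget(1200, 1600); // 1600 + 4800 = 6400  -> 6400 (REASONING_OVERHEAD=1600)
computeBudget(5000);       // 800 + 20000 = 20800 -> clamped down to 8192
```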
Development
git clone https://github.com/islobodan/cotforce-mcp
cd cotforce-mcp
npm install
npm run build # compile TypeScript to dist/
npm run dev # tsc --watch
npm run typecheck # type-check src/ and tests/
Scripts
| Script | Purpose |
|---|---|
| `npm run build` | Compile TypeScript (`src/` → `dist/`) |
| `npm run dev` | Watch-mode compilation |
| `npm run typecheck` | TypeScript type-checking for source and tests |
| `npm test` | Run the full Jest test suite (133 tests) |
| `npm run test:smoke` | Quick smoke test via the mcp-tester CLI |
| `npm run test:tools` | List available tools via the mcp-tester CLI |
Testing
The test suite uses Jest with ts-jest (ESM) and @slbdn/mcp-tester for MCP server integration testing:
- Parser tests (`tests/parser.test.ts`): 47 unit tests covering all 5 parser plugins, edge cases, and `AgenticCotSchema` validation.
- Token tests (`tests/tokens.test.ts`): 16 unit tests for `tiktoken` integration, budget computation, and `REASONING_OVERHEAD` tuning.
- Schema tests (`tests/schema.test.ts`): 8 unit tests for user-supplied `resultSchema` validation.
- Metrics tests (`tests/metrics.test.ts`): 9 unit tests for request counters, latency tracking, and token usage averages.
- Prompt tests (`tests/prompts.test.ts`): 10 unit tests for model-specific prompt selection.
- LLM tests (`tests/llm.test.ts`): 3 unit tests for direct HTTP mode detection.
- Server tests (`tests/server.test.ts`): 11 integration tests for tool discovery, argument validation, server lifecycle, and concurrent calls.
Custom Jest matchers are available via @slbdn/mcp-tester:
expect(tools).toHaveTool("solve_problem");
expect(tools).toHaveToolWithSchema("solve_problem");
expect(result).toReturnTextContaining("Reasoning:");
Limitations & Honest Assessment
- No true production monitoring: only structured logs; no aggregated metrics.
- Token budget formula is heuristic: may need tuning for very verbose models.
- Model hints are suggestions: the MCP host decides which model to use.
- See the TODO list for planned improvements.
License
MIT © Slobodan Ivkovic
Support
If you find CotForce-MCP useful, consider starring the repo and sharing your feedback!
