Ollama Fastmcp Wrapper
A proxy service that bridges Ollama with FastMCP thus allowing MCP servers to be plugged and used locally.
Ask AI about Ollama Fastmcp Wrapper
Powered by Claude Β· Grounded in docs
I know everything about Ollama Fastmcp Wrapper. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
Ollama-FastMCP Wrapper
A proxy service that bridges Ollama with FastMCP thus allowing models to be used locally in conjunction with MCP servers and their tools directly on the local machine.
β¨ Features
- Connect/disconnect to multiple MCP servers at runtime (using FastMCP).
- Expose FastMCP tools as callable functions to Ollama models.
- Use Ollama locally with tool-augmented reasoning.
- Historical conversation with the LLM model persistable on disk with async I/O.
- Automatically summarise historical conversation with the model.
- Run as:
- API Server (via FastAPI + Uvicorn)
- Interactive CLI
:question: What is MCP?
The Model Context Protocol (MCP) is a protocol that lets you build servers to expose data and functionality to LLM applications in a secure, standardized way. Limited to this wrapper, MCP use is limited to the Tools part.
β‘ Installation
-
Clone this repository and install dependencies:
git clone https://github.com/andreamoro/ollama-fastmcp-wrapper.gitInstall dependencies according to the package manager you are using. If it's uv:
cd ollama-fastmcp-wrapper uv syncOtherwise go with the traditional (obsolete) pip:
cd ollama-fastmcp-wrapper pip install -r requirements.txt
Β Β Β Β Β Β Python requirements include: Β Β Β Β Β Β - aiofiles (async file I/O) Β Β Β Β Β Β - fastapi Β Β Β Β Β Β - fastmcp Β Β Β Β Β Β - uvicorn Β Β Β Β Β Β - ollama
- Make sure you have:
- Ollama Client up and running:
ollama serve - An MCP server to use the Tools capability
- Without an MCP server, you can use this wrapper as a conversation interface
- Two example MCP servers are included in the
mcp_servers/directory:- math_server.py - Basic arithmetic operations (add, subtract, multiply, divide)
- ipinfo_server.py - IP geolocation lookup with 20 preset organizations
β΅ Usage
Run the wrapper:
uv run python ollama_wrapper.py
You'll be asked which mode to start in:
- API mode β starts a REST API on
http://127.0.0.1:8000 - CLI mode β starts a terminal-based chat loop
π Demo Scripts
The demos/ directory contains comprehensive usage examples in both shell and Python formats:
- basic_chat - Simple chat without tools
- math_operations - Using the math MCP server
- ipinfo_lookup - IP geolocation queries
- server_management - Connect/disconnect/list servers
- history_management - Conversation persistence
- temperature_test_multi_model.py - Advanced temperature testing across multiple models
See demos/README.md for detailed usage instructions and examples.
π₯οΈ API Mode
Start the API:
uv run python ollama_wrapper.py
# choose "api"
π For comprehensive API documentation, usage patterns, and examples, see API_USAGE.md
API Endpoints
Root:
GET /β API documentation and endpoint listing
Chat:
POST /chatβ Send a chat request (with optional MCP tools)
Model Management:
GET /modelβ Get current session modelGET /model/listβ List all available Ollama modelsPOST /model/switch/{model_name}β Switch session model and reset contextGET /ollama/configβ Get Ollama instance configuration (host, label, active model)GET /ollama/statusβ Quick health check for Ollama (5s timeout)
History:
GET /historyβ Get current conversation historyGET /history/clearβ Clear the current conversation historyGET /history/load/{file_name}β Load conversation history from diskGET /history/overwrite/{file_name}β Overwrite an existing conversation fileGET /history/save/{file_name}β Save conversation history to disk
Servers:
GET /serversβ List available FastMCP servers from configPOST /servers/{server_name}/connectβ Connect to an MCP serverPOST /servers/{server_name}/disconnectβ Disconnect from an MCP serverGET /servers/{server_name}/toolsβ List available tools for a specific MCP server
Quick Start Example
# Start API server
uv run python ollama_wrapper.py api
# Simple chat request (uses session model from config)
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello! Tell me about Python"}'
# Connect to MCP server and use tools
curl -X POST http://localhost:8000/servers/math/connect
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{
"message": "What is 25 * 4?",
"mcp_server": "math"
}'
For detailed usage patterns including model switching, multi-model testing, temperature testing, and more, see API_USAGE.md.
π¬ CLI Mode
uv run python ollama_wrapper.py
# choose "cli"
π For comprehensive CLI documentation, commands, and usage patterns, see CLI_USAGE.md
Quick Start
# Start CLI
uv run python ollama_wrapper.py cli
# Chat naturally
You: Hello! How are you?
Bot: I'm doing well, thank you for asking!
# Use commands
You: /help # Show available commands
You: /model # Change model interactively
You: /clear # Clear conversation context
You: /exit # Exit CLI
For detailed usage patterns including model switching, persistent conversations, and temperature adjustment, see CLI_USAGE.md.
βοΈ Configuration
Configuration Files
Version 0.4.0 introduces a separated configuration structure:
-
Wrapper Configuration (
wrapper_config.toml- root directory):[wrapper] transport = "HTTP" # Transport method: "HTTP" or "STDIO" (default: HTTP) host = "0.0.0.0" # Server host address (default: 0.0.0.0) port = 8000 # Server port (default: 8000) history_file = "" # Path to conversation history file (default: none) overwrite_history = false # Overwrite history file on exit (default: false) max_history_messages = 20 # Maximum messages before summarization kicks in (default: 20) [ollama] host = "localhost" # Ollama instance host (default: localhost) port = 11434 # Ollama instance port (default: 11434) timeout = 300 # Request timeout in seconds (default: 300) label = "" # Optional label to identify this Ollama instance model = { default = "llama3.2:3b", temperature = 0.2 } # Model settingsWrapper Settings:
max_history_messages: Maximum number of messages before automatic summarization (default: 20)- When the message count exceeds this limit, older messages are summarized to save context
- Helps maintain conversation continuity while keeping token usage manageable
- Adjust based on your model's context window and use case
Ollama Settings:
host: Ollama instance host (use with port for remote/tunneled instances)port: Ollama instance port (default: 11434)timeout: Request timeout in seconds (default: 300). Prevents the wrapper from hanging indefinitely when SSH tunnels drop or remote Ollama becomes unresponsive.label: Human-readable label to identify this Ollama instance (prompted at startup if not set)- For server-side settings (parallelism, VRAM, network access), see OLLAMA_SERVER.md
Model Settings:
default: Default model name if not specified in requeststemperature: Controls response randomness (0.0-2.0)- Low (0.0-0.3): Consistent, deterministic responses (recommended for factual tasks)
- Medium (0.7-1.0): Balanced creativity
- High (1.5-2.0): Very creative, less predictable
- Temperature can be overridden per request via API
-
MCP Servers Configuration (
mcp_servers/mcp_servers_config.toml):[[servers]] name = "math" command = "uv" args = ["run", "--with", "fastmcp", "mcp_servers/math_server.py"] host = "http://localhost:5000/mcp" port = 5000 enabled = true [[servers]] name = "ipinfo" command = "uv" args = ["run", "--with", "fastmcp", "mcp_servers/ipinfo_server.py"] host = "http://localhost:5001/mcp" port = 5001 enabled = true token_file = "mcp_tokens.toml" # Optional: specify token file (default: mcp_tokens.toml) -
API Tokens (
mcp_servers/mcp_tokens.toml- gitignored):# Copy from mcp_tokens.toml.example and add your tokens [ipinfo] token = "your_ipinfo_token_here"
Configuration Priority
Command-line arguments take precedence over config file settings:
- If you specify
--hostor--porton the command line, those values will be used - If not specified on command line, values from
wrapper_config.tomlwill be used - If not in config file, default values will be used
Transport Methods
- STDIO transport β spawn server locally
- HTTP transport β connect to remote MCP server
Command-Line Arguments
Positional Arguments:
mode- Operation mode:apiorcli(default:api)model- Ollama model to use (e.g.,llama3.2:3b,gemma3:1b)
Optional Arguments:
-c, --wrapper-config <file>- Path to wrapper configuration file (default:wrapper_config.toml)--mcp-config <filename>- MCP servers config filename in mcp_servers/ directory (default:mcp_servers_config.toml)--history-file <file>- Path to conversation history file to load/save-o, --overwrite-history- Allow overwriting existing history file-t, --transport <method>- Transport method:HTTPorSTDIO(default: from config orHTTP)--wrapper-host <address>- Wrapper API server host address (default: from config or0.0.0.0)--wrapper-port <number>- Wrapper API server port number (default: from config or8000)
Ollama Connection Arguments:
--ollama-host <address>- Ollama instance host (default: from config orlocalhost)--ollama-port <number>- Ollama instance port (default: from config or11434)--ollama-timeout <seconds>- Request timeout in seconds (default: from config or300)--ollama-label <label>- Label to identify the Ollama instance (default:local-server)
Examples
# Use custom wrapper config file
uv run python ollama_wrapper.py api -c my_wrapper_config.toml
# Override config file settings
uv run python ollama_wrapper.py api --wrapper-host 127.0.0.1 --wrapper-port 9000
# Specify transport method
uv run python ollama_wrapper.py api -t STDIO
# Load conversation history
uv run python ollama_wrapper.py cli --history-file my_conversation.json
# Start with specific model and auto-save history
uv run python ollama_wrapper.py api llama3.2:3b --history-file conversation.json -o
# Use alternate MCP servers config
uv run python ollama_wrapper.py api --mcp-config alternate_servers.toml
# Connect to remote Ollama via SSH tunnel (with custom timeout)
uv run python ollama_wrapper.py api --ollama-host localhost --ollama-port 11435 \
--ollama-timeout 600 --ollama-label "remote-vps-via-tunnel"
π Architecture Diagram
flowchart TD
O[Ollama] --> W[Ollama-FastMCP Wrapper]
W -->|API mode| A[API]
W -->|CLI mode| C[CLI]
A --> M{ MCP Server Available?}
C --> Z
M -->|Yes| F[FastMCP Tools]
F --> Z[Chat]
M -->|No| Z
:bulb: Release History / Roadmap
Please check the Changelog file for more information.
License
MIT License Β© 2025 Andrea MORO
