LukeLamb/claude-ollama-mcp
Lets Claude query and manage a local Ollama server: list/show/pull/delete models, run generate/chat completions. Zero npm deps, pure Node over the HTTP API.
Claude Ollama
Lets Claude Desktop query and manage a local Ollama server. List installed models, inspect them, run one-shot generate/chat completions against any local model, or pull/delete models from the registry – all without opening a terminal.
Typical use: comparing Claude's answer to a local model on the same prompt, running cheap bulk completions against a quantized model, or checking custom training-checkpoint models you've imported into Ollama.
Requirements
- A running Ollama server (`ollama serve` or the Ollama app).
- Default endpoint is `http://localhost:11434`. Override via the `ollama_url` user config in Claude Desktop's extension settings if you run Ollama on a different host or port (a quick connectivity check is sketched after this list).
- No npm dependencies – pure Node over the HTTP API.
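If you are unsure whether the endpoint is reachable, the snippet below is a minimal sketch of a connectivity check against Ollama's version endpoint. It assumes Node 18+ (built-in `fetch`) and the default URL; `OLLAMA_URL` is just this sketch's variable, not something the extension reads.

```js
// check-ollama.mjs – quick reachability check for the Ollama HTTP API.
// Assumes the default endpoint; adjust OLLAMA_URL to match your ollama_url setting.
const OLLAMA_URL = process.env.OLLAMA_URL || "http://localhost:11434";

const res = await fetch(`${OLLAMA_URL}/api/version`); // GET /api/version
if (!res.ok) throw new Error(`HTTP ${res.status} from ${OLLAMA_URL}`);
const { version } = await res.json();
console.log(`Ollama ${version} is reachable at ${OLLAMA_URL}`);
```

Run it with `node check-ollama.mjs`; if it errors out, the extension will not be able to reach the server either.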
Install (Claude Desktop)
- Download the latest `Ollama.mcpb` from the Releases page.
- In Claude Desktop: Settings → Extensions → Extension Developer → Install Extension → pick the `.mcpb`.
- (Optional) In the extension's settings, set `Ollama server URL` if you run Ollama on a non-default host/port. Leave blank for `http://localhost:11434`.
Tools
| Tool | Annotation | Purpose |
|---|---|---|
| `ollama_status` | read-only | Health check + server version |
| `list_models` | read-only | Local models with size, digest, family, parameter size, quantization |
| `list_running` | read-only | Models currently loaded in VRAM |
| `show_model` | read-only | Model details: modelfile, parameters, template, capabilities |
| `generate` | open-world | One-shot text completion (non-streaming) |
| `chat` | open-world | Chat completion with message history (non-streaming) |
| `pull_model` | open-world | Download a model from the registry |
| `delete_model` | destructive | Remove a locally-installed model |
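All of these tools sit on top of Ollama's documented HTTP API. As a rough sketch of what the two most common ones presumably boil down to (the extension's actual request code may differ; the model name and prompt below are placeholders), `list_models` maps to `GET /api/tags` and `generate` to a non-streaming `POST /api/generate`:

```js
// Sketch only: direct calls to the Ollama HTTP API that list_models and
// generate are presumably thin wrappers around. Node 18+, ESM (top-level await).
const OLLAMA_URL = "http://localhost:11434";

// list_models ~ GET /api/tags – installed models with size, digest, details
const { models } = await fetch(`${OLLAMA_URL}/api/tags`).then((r) => r.json());
for (const m of models) {
  console.log(m.name, m.size, m.details?.quantization_level);
}

// generate ~ POST /api/generate with stream: false – blocks until the completion is done
const gen = await fetch(`${OLLAMA_URL}/api/generate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.1",             // placeholder – any locally installed model
    prompt: "Say hi in one word.", // placeholder prompt
    stream: false,
  }),
}).then((r) => r.json());
console.log(gen.response);
```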
Example prompts
"Which local models do I have installed, and which one is currently loaded in VRAM?"
"Run
forge:b6c1on this prompt: ''. Compare that output to your own answer.""Show me the modelfile for
forge:b7c1β I want to check the temperature setting.""Pull
llama3.1:70b." (expect a long wait for large models)"Delete the
forge:b5c3model β I don't need that checkpoint anymore."
Privacy policy
This extension runs entirely on your local machine and sends HTTP requests only to your Ollama server (default `http://localhost:11434`). No data leaves your machine unless you explicitly configure `ollama_url` to point at a remote Ollama instance, in which case the prompts and responses travel to that server.
The information visible to Claude includes:
- All prompts and chat messages you pass to `generate` and `chat` (these go to the Ollama server, which may log them depending on its configuration).
- Full text of completions returned by Ollama.
- Metadata for every installed model (names, digests, sizes, quantization, modelfile contents).
- Which models are currently loaded in VRAM and their size footprint.
If you have installed models containing proprietary fine-tunes or modelfiles with sensitive metadata, note that Claude will see that information when you call `show_model` or `list_models`.
`delete_model` is destructive and cannot be undone from this extension – the model must be re-pulled from the registry (or re-imported from source blobs) if deleted by mistake.
Troubleshooting
"cannot reach Ollama at http://localhost:11434 β is the server running?" β Start Ollama with ollama serve or launch the Ollama app. Verify with curl http://localhost:11434/ (should return "Ollama is running").
`pull_model` hangs for a long time – Ollama's pull API with `stream: false` blocks until the full download completes, which for multi-GB models can take many minutes. If you're pulling a huge model, run `ollama pull <name>` in a terminal instead – you'll see streaming progress there, and subsequent MCP calls will find the model already installed.
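For reference, the blocking behavior comes from the non-streaming request; Ollama's pull endpoint can also stream newline-delimited JSON progress objects. The sketch below is not something this extension exposes, it is shown only to illustrate the difference: it pulls a placeholder model with `stream: true` and prints progress, with the `model` field name following current Ollama API docs.

```js
// Streaming pull sketch: POST /api/pull with stream: true returns NDJSON
// progress lines instead of blocking until the whole download finishes.
const OLLAMA_URL = "http://localhost:11434";

async function pullWithProgress(model) {
  const res = await fetch(`${OLLAMA_URL}/api/pull`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, stream: true }),
  });
  const decoder = new TextDecoder();
  let buf = "";
  for await (const chunk of res.body) { // web ReadableStream is async-iterable in Node 18+
    buf += decoder.decode(chunk, { stream: true });
    let nl;
    while ((nl = buf.indexOf("\n")) >= 0) {
      const line = buf.slice(0, nl).trim();
      buf = buf.slice(nl + 1);
      if (!line) continue;
      const { status, completed, total } = JSON.parse(line);
      if (completed && total) {
        console.log(`${status}: ${((completed / total) * 100).toFixed(1)}%`);
      } else {
        console.log(status);
      }
    }
  }
}

await pullWithProgress("llama3.1:8b"); // placeholder model name
```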
Custom/remote Ollama endpoint – Set `ollama_url` in the extension's settings (e.g. `http://192.168.1.42:11434`). Requires a restart of the extension.
`list_running` shows a model after you stopped using it – Ollama keeps models hot in VRAM for a configurable TTL (default 5 minutes). The `expires_at` timestamp tells you when it'll unload. This is Ollama's behavior, not the extension's.
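If you want to check the unload timer yourself, `GET /api/ps` (presumably the endpoint behind `list_running`) reports each loaded model's VRAM footprint and `expires_at`; a minimal sketch, assuming the default endpoint:

```js
// Inspect loaded models and when Ollama plans to unload them (GET /api/ps).
// Node 18+ with built-in fetch; default endpoint assumed.
const OLLAMA_URL = "http://localhost:11434";

const { models } = await fetch(`${OLLAMA_URL}/api/ps`).then((r) => r.json());
for (const m of models) {
  const vramGiB = (m.size_vram / 1024 ** 3).toFixed(1);
  console.log(`${m.name}: ${vramGiB} GiB in VRAM, unloads at ${m.expires_at}`);
}
```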
Development
Single ~400-line Node.js script, zero npm dependencies. Rebuild the `.mcpb`:

```sh
cd bundle-source
zip -j ../Ollama.mcpb manifest.json package.json server.js README.md LICENSE icon.png glama.json
```
License
MIT. See LICENSE.
Related
- claude-terminal-mcp – shell, filesystem, and background jobs.
- claude-rocm-mcp – AMD GPU monitoring; pairs well for checking whether Ollama's loaded model is saturating VRAM.
- claude-sessions-mcp – tmux session management for long-running jobs.
- claude-linux-mcp – X11 desktop control.
