ollama-almasrv-mcp
MCP (Model Context Protocol) server for accessing Ollama models on a remote GPU server.
Designed for ALMASRV (NVIDIA RTX PRO 6000 96GB VRAM) but works with any Ollama instance behind a compatible HTTP gateway.
Features
- 6 MCP tools: chat, think, embed, similarity, models, health
- 10 models: 4 local (llama3.3:70b, qwen3:32b, llama3.2:3b, mxbai-embed-large) + 6 cloud
- 1024-dim embeddings compatible with SQL Server 2025 VECTOR(1024)
- Thinking models with reasoning traces (qwen3:32b, kimi-k2-thinking, glm-5, kimi-k2.5, minimax-m2.5)
- Configurable endpoints via environment variables
Quick Install
pip install ollama-almasrv-mcp
Setup with Claude Code
# Add to Claude Code (user-level, available everywhere)
claude mcp add --scope user ollama-almasrv -- ollama-almasrv-mcp
# Or with custom server URLs
claude mcp add --scope user \
-e ALMASRV_GATEWAY_URL=http://your-server:8030 \
-e ALMASRV_EMBED_URL=http://your-server:8031 \
ollama-almasrv -- ollama-almasrv-mcp
Available Tools
| Tool | Description |
|---|---|
| ollama_chat | Chat with any Ollama model (default: qwen3:32b) |
| ollama_think | Chat with thinking models that return reasoning traces |
| ollama_embed | Generate 1024-dim embedding vectors |
| ollama_similarity | Calculate cosine similarity between two texts |
| ollama_models | List all available models |
| ollama_health | Check gateway and embedding service health |
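For reference, the metric that ollama_similarity reports can be sketched in plain Python. This illustrates cosine similarity itself, not the server's implementation; the function name `cosine_similarity` and the toy vectors in the usage note are assumptions made for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score 1.0, orthogonal vectors score 0.0, and opposite vectors score -1.0. In practice the inputs would be two 1024-dim embeddings rather than toy vectors such as `[1.0, 0.0]`.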
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| ALMASRV_GATEWAY_URL | http://192.168.50.78:8030 | Ollama gateway (chat/think/models) |
| ALMASRV_EMBED_URL | http://192.168.50.78:8031 | Embedding service (embed/similarity) |
Requirements
- Python >= 3.10
- A running Ollama instance with a compatible HTTP gateway
- Gateway endpoints: /chat, /models, /health
- Embedding endpoints: /embeddings, /similarity, /health
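To check a custom gateway against the endpoint lists above, a sketch along these lines may help. It uses only the Python standard library; the helper names `endpoint_urls` and `is_healthy` are hypothetical, and it assumes that /health answers a plain GET with HTTP 200, which this README does not specify.

```python
import urllib.error
import urllib.request

# Paths taken from the Requirements list.
GATEWAY_PATHS = ("/chat", "/models", "/health")
EMBED_PATHS = ("/embeddings", "/similarity", "/health")


def endpoint_urls(gateway_url: str, embed_url: str) -> dict[str, list[str]]:
    """Build the full URLs a compatible gateway and embedding service should expose."""
    gateway = gateway_url.rstrip("/")
    embed = embed_url.rstrip("/")
    return {
        "gateway": [gateway + path for path in GATEWAY_PATHS],
        "embed": [embed + path for path in EMBED_PATHS],
    }


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Assumption: a healthy service replies to GET /health with status 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, `is_healthy("http://your-server:8030")` returns False rather than raising when the server is unreachable, which keeps a startup check simple.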
License
MIT
