Ollama Almasrv MCP

An MCP (Model Context Protocol) server for accessing Ollama models on a remote GPU server (RTX PRO 6000, 96 GB VRAM). It exposes 10 models (4 local, 6 cloud) through six tools: chat, think, embed, similarity, health, and models.

Installation

npx ollama-almasrv-mcp

ollama-almasrv-mcp

MCP (Model Context Protocol) server for accessing Ollama models on a remote GPU server.

Designed for ALMASRV (NVIDIA RTX PRO 6000 96GB VRAM) but works with any Ollama instance behind a compatible HTTP gateway.

Features

6 MCP tools: chat, think, embed, similarity, models, health
10 models: 4 local (llama3.3:70b, qwen3:32b, llama3.2:3b, mxbai-embed-large) + 6 cloud
1024-dim embeddings compatible with SQL Server 2025 VECTOR(1024)
Thinking models with reasoning traces (qwen3:32b, kimi-k2-thinking, glm-5, kimi-k2.5, minimax-m2.5)
Configurable endpoints via environment variables
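
The 1024-dimension figure matters when the vectors are stored in a fixed-width column such as VECTOR(1024). The following is a minimal sketch of a client-side guard that checks the length before a vector is written. The helper name `to_vector_literal` and the sample values are hypothetical, and the sketch assumes the database accepts a JSON array string as the vector literal; confirm that against the SQL Server documentation before relying on it.

```python
import json

EXPECTED_DIM = 1024  # matches VECTOR(1024) from the feature list


def to_vector_literal(embedding: list[float], dim: int = EXPECTED_DIM) -> str:
    """Serialize an embedding as a JSON array string after checking its length.

    Hypothetical helper: it is not part of ollama-almasrv-mcp.
    """
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    return json.dumps([float(x) for x in embedding])


# Stand-in for a vector returned by ollama_embed (made-up values).
fake_embedding = [0.0] * EXPECTED_DIM
literal = to_vector_literal(fake_embedding)
```

Rejecting a wrong-sized vector in the client gives a clearer error than a failed insert on the database side.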

Quick Install

pip install ollama-almasrv-mcp

Setup with Claude Code

# Add to Claude Code (user-level, available everywhere)
claude mcp add --scope user ollama-almasrv -- ollama-almasrv-mcp

# Or with custom server URLs
claude mcp add --scope user \
  -e ALMASRV_GATEWAY_URL=http://your-server:8030 \
  -e ALMASRV_EMBED_URL=http://your-server:8031 \
  ollama-almasrv -- ollama-almasrv-mcp

Available Tools

| Tool | Description |
| --- | --- |
| `ollama_chat` | Chat with any Ollama model (default: qwen3:32b) |
| `ollama_think` | Chat with thinking models that return reasoning traces |
| `ollama_embed` | Generate 1024-dim embedding vectors |
| `ollama_similarity` | Calculate cosine similarity between two texts |
| `ollama_models` | List all available models |
| `ollama_health` | Check gateway and embedding service health |

Configuration

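| Environment Variable | Default | Description |
| --- | --- | --- |
| `ALMASRV_GATEWAY_URL` | `http://192.168.50.78:8030` | Ollama gateway (chat/think/models) |
| `ALMASRV_EMBED_URL` | `http://192.168.50.78:8031` | Embedding service (embed/similarity) |

Both defaults point to a private LAN address (192.168.50.78), so any other deployment needs to set both variables.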

Requirements

Python >= 3.10
A running Ollama instance with a compatible HTTP gateway
Gateway endpoints: /chat, /models, /health
Embedding endpoints: /embeddings, /similarity, /health
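
Before registering the server, it can help to confirm that a gateway actually exposes the endpoints listed above. The sketch below builds the six URLs and probes /health using only the standard library. The function names are hypothetical, and it assumes /health answers a plain GET with HTTP 200, which the requirements do not specify.

```python
import urllib.error
import urllib.request

# Paths taken from the requirements list.
GATEWAY_PATHS = ("/chat", "/models", "/health")
EMBED_PATHS = ("/embeddings", "/similarity", "/health")


def endpoint_urls(gateway_url: str, embed_url: str) -> list[str]:
    """Return the full URL of every endpoint the server expects."""
    gateway = gateway_url.rstrip("/")
    embed = embed_url.rstrip("/")
    return [gateway + p for p in GATEWAY_PATHS] + [embed + p for p in EMBED_PATHS]


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health responds with HTTP 200.

    Assumption: the health endpoint accepts GET and signals success with 200.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False
```

For example, `is_healthy("http://your-server:8030")` would check the gateway and `is_healthy("http://your-server:8031")` the embedding service.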

License

MIT