k8s-gpu-mcp-server
Just-in-Time SRE Diagnostic Agent for NVIDIA GPU Clusters on Kubernetes
Overview
k8s-gpu-mcp-server is an ephemeral diagnostic agent that provides surgical,
real-time NVIDIA GPU hardware introspection for Kubernetes clusters via the
Model Context Protocol (MCP).
Unlike traditional monitoring systems, this agent is designed for AI-assisted troubleshooting by SREs debugging complex hardware failures that standard Kubernetes APIs cannot detect.
Key Features
- Low-Footprint, Always-Available - Persistent HTTP server (~15-20MB idle) performs GPU work only when tools are invoked
- HTTP Transport - JSON-RPC 2.0 over HTTP/SSE (production default)
- Deep Hardware Access - Direct NVML integration for GPU diagnostics
- AI-Native - Built for Claude Desktop, Cursor, and MCP-compatible hosts
- MCP Prompts - Pre-built GPU diagnostic workflows for guided troubleshooting
- Secure by Default - Read-only operations with explicit operator mode
- Production Ready - Real Tesla T4 testing, 550+ tests passing
Quick Start
One-Line Installation
# Using npx (recommended)
npx k8s-gpu-mcp-server@latest
# Or install globally
npm install -g k8s-gpu-mcp-server
Manual Configuration: Cursor / VS Code
Add to ~/.cursor/mcp.json (Cursor) or VS Code MCP config:
{
  "mcpServers": {
    "k8s-gpu-mcp": {
      "command": "npx",
      "args": ["-y", "k8s-gpu-mcp-server@latest"]
    }
  }
}
Manual Configuration: Claude Desktop
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "k8s-gpu-mcp": {
      "command": "npx",
      "args": ["-y", "k8s-gpu-mcp-server@latest"]
    }
  }
}
Install from Source
# Clone and build
git clone https://github.com/ArangoGutierrez/k8s-gpu-mcp-server.git
cd k8s-gpu-mcp-server
make agent
# Test with mock GPUs (no hardware required)
cat examples/gpu_inventory.json | ./bin/agent --nvml-mode=mock
# Test with real GPU (requires NVIDIA driver)
cat examples/gpu_inventory.json | ./bin/agent --nvml-mode=real
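You can also drive the MCP protocol by hand over stdio. A minimal sketch, assuming the agent accepts newline-delimited JSON-RPC 2.0 on stdin the way the piped examples above suggest (some MCP servers expect an initialize request first):
# List the registered MCP tools in mock mode (no hardware required)
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | ./bin/agent --nvml-mode=mock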
Deploy to Kubernetes
# Deploy with Helm OCI (recommended)
helm install k8s-gpu-mcp-server \
oci://ghcr.io/arangogutierrez/charts/k8s-gpu-mcp-server \
--namespace gpu-diagnostics --create-namespace
# Or from local chart
helm install k8s-gpu-mcp-server ./deployment/helm/k8s-gpu-mcp-server \
--namespace gpu-diagnostics --create-namespace
# Find agent pod on target node
NODE_NAME=<node-name>
POD=$(kubectl get pods -n gpu-diagnostics \
-l app.kubernetes.io/name=k8s-gpu-mcp-server \
--field-selector spec.nodeName=$NODE_NAME \
-o jsonpath='{.items[0].metadata.name}')
# Start diagnostic session
kubectl exec -it -n gpu-diagnostics $POD -- /agent --mode=read-only
Note: GPU access requires runtimeClassName: nvidia, configured by the GPU Operator or nvidia-ctk. For clusters without a RuntimeClass, use the fallback: --set gpu.runtimeClass.enabled=false --set gpu.resourceRequest.enabled=true
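Before choosing the fallback, you can check whether the cluster already exposes the RuntimeClass with standard kubectl (nothing project-specific here):
# Does the nvidia RuntimeClass exist? (created by GPU Operator or nvidia-ctk)
kubectl get runtimeclass nvidia
# Or list every RuntimeClass in the cluster
kubectl get runtimeclass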
Configure Claude Desktop with kubectl (Advanced)
For deployed agents, add to your Claude Desktop configuration:
{
  "mcpServers": {
    "k8s-gpu-agent": {
      "command": "kubectl",
      "args": ["exec", "-i", "deploy/k8s-gpu-mcp-server", "-n", "gpu-diagnostics", "--", "/agent"]
    }
  }
}
Then ask Claude: "What's the temperature of the GPUs?"
Full Quick Start Guide → | Kubernetes Deployment →
Architecture
┌──────────────────────────────────────────────────────────────────────┐
│                      MCP Client (Claude/Cursor)                      │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │ stdio / HTTP
                                    ▼
┌──────────────────────────────────────────────────────────────────────┐
│                          Gateway Pod (:8080)                         │
│                Router → Circuit Breaker → HTTP Client                │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │ HTTP (pod-to-pod)
               ┌────────────────────┼────────────────────┐
               ▼                    ▼                    ▼
      ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
      │  Agent (Node 1)  │ │  Agent (Node 2)  │ │  Agent (Node N)  │
      │   9 MCP Tools    │ │   9 MCP Tools    │ │   9 MCP Tools    │
      │    NVML → GPU    │ │    NVML → GPU    │ │    NVML → GPU    │
      └──────────────────┘ └──────────────────┘ └──────────────────┘
Design Principles:
- HTTP-First: Gateway routes via HTTP to agent pods (~50ms latency)
- Low Footprint: Persistent HTTP server, ~15-20MB memory
- Observable: Circuit breaker, Prometheus metrics, distributed tracing
- Interface Abstraction: Testable, flexible, portable (538 tests)
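Because the transport is plain JSON-RPC 2.0 over HTTP, the gateway can be probed without any MCP client. A minimal sketch, assuming the service name matches the Helm release and the MCP endpoint is served at /mcp on port 8080 (both are assumptions; check the architecture docs for the exact path):
# Port-forward the gateway service locally (service name assumed from the Helm release)
kubectl port-forward -n gpu-diagnostics svc/k8s-gpu-mcp-server 8080:8080 &
# Ask the gateway which tools it exposes (endpoint path /mcp is an assumption)
curl -s -X POST http://localhost:8080/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'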
Architecture Documentation →
Available Tools
| Tool | Description | Category | Status |
|---|---|---|---|
| get_gpu_inventory | Hardware inventory + telemetry | NVML | ✅ Available |
| get_gpu_health | GPU health monitoring with scoring | NVML | ✅ Available |
| analyze_xid_errors | Parse GPU XID error codes from kernel logs | NVML | ✅ Available |
| get_nvlink_topology | NVLink interconnect topology and health | NVML | ✅ Available |
| get_gpu_timeline | Historical GPU metrics from flight recorder | NVML + Blackbox | ✅ Available |
| describe_gpu_node | Node-level GPU diagnostics with K8s metadata | K8s + NVML | ✅ Available |
| get_pod_gpu_allocation | GPU-to-Pod correlation via resource requests | K8s | ✅ Available |
| explain_failure | Root cause analysis for failed GPU workloads | K8s + Incidents | ✅ Available |
| get_incident_report | Detailed incident report with timeline and snapshots | K8s + Incidents | ✅ Available |
| kill_gpu_process | Terminate GPU process | Operator | 🚧 M4 (Operator) |
| reset_gpu | GPU reset | Operator | 🚧 M4 (Operator) |
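Non-MCP clients can invoke any of these with a standard tools/call request. A sketch against the port-forwarded gateway from the Architecture section; the node argument name is illustrative, so read the real input schema from the tools/list response first:
# Call get_gpu_health via raw JSON-RPC (argument names are assumptions)
curl -s -X POST http://localhost:8080/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"get_gpu_health","arguments":{"node":"gpu-worker-5"}}}'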
Available Prompts
MCP Prompts provide guided diagnostic workflows that orchestrate multiple tools.
See pkg/prompts/prompts.go for prompt definitions.
| Prompt | Description |
|---|---|
| gpu-health-check | Comprehensive GPU health assessment with recommendations |
| diagnose-xid-errors | Analyze NVIDIA XID errors with remediation guidance |
| gpu-triage | Standard SRE triage workflow: inventory → health → XID analysis |
Example usage with Claude:
You: "Run the GPU triage workflow for node gpu-worker-5"
Claude: [Executes gpu-triage prompt]
→ Calls get_gpu_inventory, get_gpu_health, analyze_xid_errors
→ Returns structured triage report with recommendations
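Prompts are served over the standard MCP prompt methods, so any MCP client (not just Claude) can run the same workflows. A sketch using prompts/list and prompts/get; the node argument is illustrative:
# Discover the built-in prompts
curl -s -X POST http://localhost:8080/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":3,"method":"prompts/list"}'
# Fetch the gpu-triage workflow for a specific node
curl -s -X POST http://localhost:8080/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":4,"method":"prompts/get","params":{"name":"gpu-triage","arguments":{"node":"gpu-worker-5"}}}'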
Operation Modes
| Mode | Flag | Description |
|---|---|---|
| Read-Only (default) | --mode=read-only | All diagnostic tools, no mutations |
| Operator | --mode=operator | Enables future mutating operations (kill process, reset GPU) |
Read-only mode is the default and recommended for most use cases. Operator mode enables future M4 tools that perform write operations on GPUs.
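In practice the mode is just a startup flag on the agent binary:
# Default: all diagnostic tools, no mutations
./bin/agent --mode=read-only
# Opt in to the future M4 mutating tools (kill_gpu_process, reset_gpu)
./bin/agent --mode=operator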
Flight Recorder
The agent includes a built-in flight recorder (pkg/blackbox) that continuously
captures GPU telemetry (temperature, power, utilization, memory) into per-GPU
ring buffers. This enables tools like get_gpu_timeline and get_incident_report
to query historical GPU metrics around the time of a failure.
The flight recorder starts automatically with the agent and requires no additional configuration. Data is retained in memory for the configured window (default: 30 minutes).
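A sketch of pulling history back out through get_gpu_timeline; the argument names here are assumptions, so check the tool's schema via tools/list before relying on them:
# Query the last 30 minutes of telemetry for GPU 0 (argument names are assumptions)
curl -s -X POST http://localhost:8080/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":5,"method":"tools/call","params":{"name":"get_gpu_timeline","arguments":{"gpu_index":0,"duration_minutes":30}}}'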
MCP Usage Guide →
Project Status
Current Milestone: M3: Kubernetes Integration
Progress: ~90% Complete (HTTP Transport ✅, Gateway ✅, K8s Tools ✅)
Completed Milestones
- ✅ M1: Foundation & API - Completed Jan 3, 2026
- ✅ M2: Hardware Introspection - Completed Jan 10, 2026
  - Real NVML integration, tested on Tesla T4
  - GPU health monitoring, XID error analysis
  - npm/Helm distribution
Recent Updates (Jan 2026)
- Jan 17: MCP Prompts support - 3 built-in GPU diagnostic workflows
- Jan 16: Documentation 360 review for external contributors
- Jan 15: K8s tools complete (describe_gpu_node, get_pod_gpu_allocation)
- Jan 14: HTTP Transport Epic complete - 150× latency improvement
- Jan 14: Cross-node networking fix (Calico VXLAN)
- Jan 13: Gateway mode with circuit breaker & Prometheus metrics
Testing
Unit Tests (No GPU Required)
make test # Run all unit tests (538 tests passing)
make coverage # Generate coverage report
make coverage-html # View coverage in browser
Integration Tests (Requires GPU)
make test-integration # Run on GPU hardware
# Or manually:
go test -tags=integration -v ./pkg/nvml/
Latest Test Results:
✅ 538 total tests passing
✅ Race detector enabled (-race)
✅ Coverage: 58-80% by package
Integration tested on Tesla T4:
- GPU: Tesla T4 (15GB)
- Temperature: 29°C
- Power: 13.9W
- All NVML operations verified
Build
# Build for local platform
make agent
# Build for Linux (with real NVML)
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 make agent
# Build container image
make image
# Multi-arch release builds
make dist
Binary Sizes:
- Mock mode: 4.3MB (CGO disabled)
- Real mode: 7.9MB (CGO enabled)
Installation
Using npm (Recommended)
# Run directly with npx
npx k8s-gpu-mcp-server@latest
# Or install globally
npm install -g k8s-gpu-mcp-server
From Source
git clone https://github.com/ArangoGutierrez/k8s-gpu-mcp-server.git
cd k8s-gpu-mcp-server
make agent
sudo mv bin/agent /usr/local/bin/k8s-gpu-mcp-server
Using Go
go install github.com/ArangoGutierrez/k8s-gpu-mcp-server/cmd/agent@latest
Container Image
docker pull ghcr.io/arangogutierrez/k8s-gpu-mcp-server:latest
Helm Chart (OCI)
# Install from GHCR OCI registry
helm install k8s-gpu-mcp-server \
oci://ghcr.io/arangogutierrez/charts/k8s-gpu-mcp-server \
--namespace gpu-diagnostics --create-namespace
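After installing, confirm the release is healthy and that an agent pod landed on each GPU node (standard helm/kubectl, using the pod label shown in the Quick Start):
helm status k8s-gpu-mcp-server -n gpu-diagnostics
kubectl get pods -n gpu-diagnostics \
  -l app.kubernetes.io/name=k8s-gpu-mcp-server -o wide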
Contributing
We welcome contributions! Please see our Development Guide for details.
Quick Contribution Guide
- Check open issues
- Fork and create a feature branch: git checkout -b feat/my-feature
- Make changes, add tests
- Run checks: make all
- Commit with DCO: git commit -s -S -m "feat(scope): description"
- Open a PR with labels and milestone
Full Development Guide →
Documentation
- Quick Start Guide - Get running in 5 minutes
- Kubernetes Deployment - K8s deployment and configuration
- Architecture - System design and components
- Security Model - RBAC and security configuration
- MCP Usage - How to consume the MCP server
- Development Guide - Contributing guidelines
- Examples - Sample JSON-RPC requests
Technology Stack
- Language: Go 1.25+ (latest stable)
- MCP Protocol: mcp-go v0.43.2
- GPU Library: go-nvml v0.13.0-1
- Testing: testify v1.10.0
- Container: Distroless Debian 12 (coming in M3)
Use Cases
1. Debugging Stuck Training Jobs
SRE: "Why is the training job on node-5 stuck?"
Claude → k8s-gpu-mcp-server → Detects XID 48 (ECC Error)
Claude: "Node-5 has uncorrectable memory errors. Drain immediately."
2. Thermal Management
SRE: "Are any GPUs thermal throttling?"
Claude → k8s-gpu-mcp-server → Checks temps and throttle status
Claude: "GPU 3 is at 86°C and thermal throttling. Check cooling."
3. Topology Validation
SRE: "Is NVLink properly configured for multi-GPU training?"
Claude → k8s-gpu-mcp-server → Inspects NVLink topology
Claude: "All 8 GPUs connected via NVLink, 600GB/s bandwidth."
4. Zombie Process Hunting
SRE: "GPU memory is full but no pods are running"
Claude → k8s-gpu-mcp-server → Lists GPU processes
Claude: "Found zombie process PID 12345 using 8GB. Kill it?"
Achievements
- ✅ Go 1.25 - Latest Go version
- ✅ Real NVML - Tested on Tesla T4
- ✅ 550+ Tests Passing - Race detector enabled, 58-80% coverage
- ✅ HTTP-First Architecture - 150× faster than exec routing
- ✅ Gateway + Circuit Breaker - Production-grade reliability
- ✅ MCP Prompts - Guided diagnostic workflows for SRE troubleshooting
- ✅ Prometheus Metrics - Per-node latency tracking
- ✅ ~8MB Binary - 84% under 50MB target
- ✅ MCP 2025-06-18 - Latest protocol version
License
Apache License 2.0 - See LICENSE for details.
Acknowledgments
- NVIDIA NVML - GPU Management Library
- Model Context Protocol - MCP Specification
- mcp-go - MCP Go Implementation
- Anthropic Claude - AI Assistant
- Cursor - AI-Powered IDE
Contact
Maintainer: @ArangoGutierrez
Issues: GitHub Issues
Discussions: GitHub Discussions
⭐ Star us on GitHub - it helps!
