Databricks MCP Guide
Building Custom MCP Servers for Databricks
A comprehensive guide to creating production-ready MCP (Model Context Protocol) servers that integrate with Databricks using FastAPI and the Databricks SDK.
Table of Contents
- Introduction
- What is MCP?
- Why Build an MCP Server for Databricks?
- Prerequisites
- Architecture Options
- Getting Started
- Step-by-Step Tutorial
- Configuration Files
- Development Workflow
- Testing Your Server
- Deploying to Databricks Apps
- Connecting to Claude CLI
- Troubleshooting
- Next Steps
- Resources
Introduction
This guide will walk you through creating a custom MCP server that enables AI agents like Claude to interact with your Databricks workspace. You'll learn how to:
- Set up a FastAPI + FastMCP application
- Create tools that call Databricks APIs
- Deploy your server to Databricks Apps
- Connect it to Claude CLI for AI-powered workspace management
By the end of this tutorial, you'll have a working MCP server that can list clusters, execute SQL queries, and run jobs, all through natural language commands to an AI agent.
What is MCP?
Model Context Protocol (MCP) is an open standard that connects AI agents to tools, data sources, and contextual information. Think of it as a universal adapter that lets AI assistants like Claude interact with your services in a standardized way.
Key Concepts
Tools: Functions that AI agents can call
@mcp_server.tool
def list_clusters() -> dict:
    """List all Databricks clusters"""
    ...  # returns cluster information
Prompts: Reusable instructions for AI agents
# Analyze Cluster Performance
You are an expert at analyzing Databricks clusters...
Transport: How agents communicate with your server
- stdio: Standard input/output (for CLI tools like Claude Desktop)
- HTTP/SSE: Web-based communication (for deployed applications)
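For example, the same FastMCP server can be exposed over either transport. A minimal sketch (assuming FastMCP 2.x, where run() accepts a transport argument and http_app() returns an ASGI app):
from fastmcp import FastMCP

mcp = FastMCP(name="demo")

@mcp.tool
def ping() -> str:
    """Simple liveness check."""
    return "pong"

if __name__ == "__main__":
    # stdio: for desktop clients that spawn the server as a subprocess
    mcp.run(transport="stdio")
    # For deployed servers, expose an ASGI app instead and serve it
    # over HTTP with uvicorn: app = mcp.http_app()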
How MCP Works
┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│   AI Agent   │       │  MCP Server  │       │  Databricks  │
│   (Claude)   │       │ (Your Code)  │       │  Workspace   │
└──────┬───────┘       └──────┬───────┘       └──────┬───────┘
       │                      │                      │
       │ 1. "List my clusters"│                      │
       │─────────────────────>│                      │
       │                      │                      │
       │                      │ 2. Call Databricks API
       │                      │─────────────────────>│
       │                      │                      │
       │                      │ 3. Return cluster data
       │                      │<─────────────────────│
       │ 4. Return formatted results                 │
       │<─────────────────────│                      │
       │                      │                      │
Why Build an MCP Server for Databricks?
Use Cases
1. Natural Language Workspace Management
User: "Show me all running clusters and their costs"
Claude: [Uses list_clusters tool] Here are your 3 running clusters...
2. Automated Troubleshooting
User: "Why did my job fail?"
Claude: [Uses get_job_run tool] The job failed due to a cluster timeout...
3. Data Analysis Assistance
User: "Query my sales table and create a summary"
Claude: [Uses execute_sql tool] Here's the data from your sales table...
4. Infrastructure as Code with AI
User: "Create a cluster for machine learning workloads"
Claude: [Uses create_cluster tool] Created cluster ml-cluster-001...
Benefits
- Unified Interface: One tool for all Databricks operations
- AI-Powered: Natural language instead of remembering API calls
- Extensible: Add custom business logic and integrations
- Secure: Leverages Databricks authentication and Unity Catalog permissions
Prerequisites
Required Software
- Python 3.11 or higher
  python --version  # Should show 3.11+
- uv (Python package manager)
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Databricks CLI (v0.205+, required for the databricks apps commands used later)
  curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
- Node.js 18+ (optional, for frontend)
  node --version  # Should show v18+
- Bun (optional, fast JavaScript runtime)
  curl -fsSL https://bun.sh/install | bash
Databricks Requirements
- Databricks Workspace: A Databricks account with workspace access
- Personal Access Token: For authentication (generate one in your workspace under User Settings > Developer > Access tokens)
- SQL Warehouse (optional): For SQL execution tools
- Databricks Apps enabled: For production deployment
Knowledge Prerequisites
- Basic Python programming
- Familiarity with REST APIs
- Basic understanding of async/await in Python
- Command line comfort
Architecture Options
Option 1: FastAPI + FastMCP (Recommended for Production)
When to use: Full-stack applications with web UI and MCP integration
┌───────────────────────────────────────────────┐
│           Databricks Apps Platform            │
│  ┌─────────────────────────────────────────┐  │
│  │          Combined Application           │  │
│  │                                         │  │
│  │  ┌──────────┐      ┌────────────────┐   │  │
│  │  │ FastAPI  │      │    FastMCP     │   │  │
│  │  │  /api/*  │      │     /mcp/*     │   │  │
│  │  │          │      │                │   │  │
│  │  │ REST API │      │  MCP Protocol  │   │  │
│  │  │  Web UI  │      │    AI Tools    │   │  │
│  │  └────┬─────┘      └───────┬────────┘   │  │
│  │       └────────┬───────────┘            │  │
│  │                │                        │  │
│  │      ┌─────────▼──────────┐             │  │
│  │      │   Databricks SDK   │             │  │
│  │      └────────────────────┘             │  │
│  └─────────────────────────────────────────┘  │
└───────────────────────────────────────────────┘
Features:
- Web UI for users + MCP for AI agents
- OAuth authentication via Databricks Apps
- Scalable and highly available
- Unity Catalog permission integration
Option 2: Standalone stdio MCP Server
When to use: CLI/desktop integration (Claude Desktop, VS Code)
┌──────────────┐     stdio      ┌──────────────┐
│  Claude CLI  │ <────────────> │  MCP Server  │
└──────────────┘                └──────┬───────┘
                                       │
                                       │ API calls
                                       │
                                ┌──────▼───────┐
                                │  Databricks  │
                                └──────────────┘
Features:
- Simple deployment
- Direct integration with desktop AI apps
- PAT or CLI profile authentication
- Great for personal productivity
Comparison
| Feature | FastAPI + FastMCP | stdio MCP |
|---|---|---|
| Web UI | ✅ Yes | ❌ No |
| Production Ready | ✅ Yes | ⚠️ Personal use |
| OAuth Support | ✅ Yes | ❌ No |
| Complexity | Medium | Low |
| Best For | Teams, production | Individual, CLI |
This guide focuses on FastAPI + FastMCP as it's the most versatile option.
Getting Started
Step 1: Environment Setup
# Create project directory
mkdir my-databricks-mcp
cd my-databricks-mcp
# Initialize uv project
uv init
# Install core dependencies
uv add fastapi uvicorn databricks-sdk fastmcp mcp pyyaml python-dotenv
# Install development dependencies
uv add --dev pytest pytest-asyncio ruff
Step 2: Authenticate with Databricks
# Configure Databricks CLI
databricks configure --token
# Enter your workspace URL and token when prompted
# Workspace URL: https://your-workspace.cloud.databricks.com
# Token: dapi...
# Test authentication
databricks current-user me
Step 3: Create Environment File
# Create .env.local
cat > .env.local << 'EOF'
# Databricks authentication
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# Optional: Default SQL warehouse
DATABRICKS_WAREHOUSE_ID=your-warehouse-id
# Server configuration
DATABRICKS_APP_PORT=8000
LOG_LEVEL=INFO
EOF
Step-by-Step Tutorial
Step 1: Create Project Structure
# Create directory structure
mkdir -p server/routers server/services prompts tests scripts claude_scripts
# Create __init__.py files
touch server/__init__.py
touch server/routers/__init__.py
touch server/services/__init__.py
touch tests/__init__.py
Your structure should look like:
my-databricks-mcp/
├── server/
│   ├── __init__.py
│   ├── routers/
│   │   └── __init__.py
│   └── services/
│       └── __init__.py
├── prompts/
├── tests/
│   └── __init__.py
├── scripts/
├── claude_scripts/
├── .env.local
└── pyproject.toml
Step 2: Create Databricks Client Wrapper
# server/services/databricks_client.py
import os
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import DatabricksError
def get_workspace_client() -> WorkspaceClient:
    """Get an authenticated Databricks workspace client.

    Uses environment variables when set:
    - DATABRICKS_HOST: Workspace URL
    - DATABRICKS_TOKEN: Personal access token (local development)

    Inside Databricks Apps no PAT is present; in that case this falls
    back to the SDK's default authentication chain, which picks up the
    credentials the platform injects.
    """
    host = os.environ.get('DATABRICKS_HOST')
    token = os.environ.get('DATABRICKS_TOKEN')
    if token:
        return WorkspaceClient(host=host, token=token)
    return WorkspaceClient(host=host) if host else WorkspaceClient()
def verify_authentication() -> dict:
"""Verify Databricks authentication is working."""
try:
client = get_workspace_client()
user = client.current_user.me()
return {
'success': True,
'user_name': user.user_name,
'user_id': user.id,
'workspace_url': client.config.host
}
except DatabricksError as e:
return {
'success': False,
'error': f'Authentication failed: {str(e)}'
}
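Before wiring the wrapper into MCP, you can sanity-check it from a short script. A minimal sketch (scripts/check_auth.py is a hypothetical filename, not used elsewhere in this guide):
# scripts/check_auth.py - quick connectivity check
from server.services.databricks_client import verify_authentication

if __name__ == '__main__':
    status = verify_authentication()
    if status['success']:
        print(f"Authenticated as {status['user_name']} at {status['workspace_url']}")
    else:
        print(f"Auth check failed: {status['error']}")
Run it with: uv run python scripts/check_auth.py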
Step 3: Create Your First Tool
# server/tools.py
import os
from databricks.sdk.errors import DatabricksError, NotFound, PermissionDenied
from server.services.databricks_client import get_workspace_client
def load_tools(mcp_server):
"""Register all MCP tools with the server."""
@mcp_server.tool
def list_clusters() -> dict:
"""List all Databricks clusters in the workspace.
Returns comprehensive cluster information including:
- Cluster ID and name
- Current state (RUNNING, TERMINATED, etc.)
- Size (number of workers)
- Spark version and node types
- Creator and creation time
Returns:
Dictionary with success status and list of clusters
"""
try:
client = get_workspace_client()
clusters = []
for cluster in client.clusters.list():
clusters.append({
'cluster_id': cluster.cluster_id,
'cluster_name': cluster.cluster_name,
'state': cluster.state.value if cluster.state else 'UNKNOWN',
'num_workers': cluster.num_workers,
'spark_version': cluster.spark_version,
'node_type_id': cluster.node_type_id,
'creator_user_name': cluster.creator_user_name,
'start_time': cluster.start_time,
})
return {
'success': True,
'clusters': clusters,
'count': len(clusters)
}
except PermissionDenied:
return {
'success': False,
'error': 'Permission denied to list clusters',
'error_code': 'PERMISSION_DENIED',
'suggestion': 'Ensure you have cluster read permissions in your workspace'
}
except DatabricksError as e:
return {
'success': False,
'error': str(e),
'error_code': 'DATABRICKS_ERROR'
}
except Exception as e:
return {
'success': False,
'error': f'Unexpected error: {str(e)}',
'error_code': 'INTERNAL_ERROR'
}
@mcp_server.tool
def execute_sql(
query: str,
warehouse_id: str = None,
catalog: str = None,
schema: str = None
) -> dict:
"""Execute a SQL query on Databricks SQL warehouse.
Runs the provided SQL statement and returns results in a structured format.
Automatically uses default warehouse from environment if not specified.
Args:
query: SQL statement to execute (SELECT, CREATE, INSERT, etc.)
warehouse_id: SQL warehouse ID (optional, uses DATABRICKS_WAREHOUSE_ID env var if not provided)
catalog: Unity Catalog catalog name (optional, e.g., 'main')
schema: Schema/database name (optional, e.g., 'default')
Returns:
Dictionary with query results including:
- columns: List of column names
- rows: List of row dictionaries
- row_count: Number of rows returned
Example:
execute_sql("SELECT * FROM my_table LIMIT 10")
"""
try:
client = get_workspace_client()
# Use environment variable as fallback
if not warehouse_id:
warehouse_id = os.environ.get('DATABRICKS_WAREHOUSE_ID')
if not warehouse_id:
return {
'success': False,
'error': 'No warehouse_id provided and DATABRICKS_WAREHOUSE_ID not set',
'error_code': 'MISSING_WAREHOUSE_ID',
'suggestion': 'Provide warehouse_id or set DATABRICKS_WAREHOUSE_ID environment variable'
}
# Execute statement
response = client.statement_execution.execute_statement(
warehouse_id=warehouse_id,
statement=query,
catalog=catalog,
schema=schema,
wait_timeout='30s'
)
# Parse results
if response.result and response.result.data_array:
columns = [col.name for col in response.manifest.schema.columns]
rows = []
for row_data in response.result.data_array:
row_dict = {col: row_data[i] for i, col in enumerate(columns)}
rows.append(row_dict)
return {
'success': True,
'columns': columns,
'rows': rows,
'row_count': len(rows),
'statement_id': response.statement_id
}
else:
return {
'success': True,
'message': 'Query executed successfully (no results returned)',
'statement_id': response.statement_id
}
except DatabricksError as e:
return {
'success': False,
'error': str(e),
'error_code': 'DATABRICKS_ERROR',
'query': query
}
except Exception as e:
return {
'success': False,
'error': f'Unexpected error: {str(e)}',
'error_code': 'INTERNAL_ERROR',
'query': query
}
Step 4: Create Prompt Loader
# server/prompts.py
import glob
import os
def load_prompts(mcp_server):
"""Dynamically load prompts from the prompts directory.
Each .md file in prompts/ becomes an MCP prompt with:
- Name: filename without extension
- Description: First line of file (without # prefix)
"""
prompt_dir = 'prompts'
if not os.path.exists(prompt_dir):
print(f"Warning: {prompt_dir} directory not found")
return
prompt_files = glob.glob(f'{prompt_dir}/*.md')
for prompt_file in prompt_files:
# Extract prompt name from filename
prompt_name = os.path.splitext(os.path.basename(prompt_file))[0]
# Read prompt content
with open(prompt_file, 'r', encoding='utf-8') as f:
content = f.read()
# Extract title from first line
lines = content.strip().split('\n')
title = lines[0].strip().lstrip('#').strip() if lines else prompt_name
# Create closure to capture values properly
def make_prompt_handler(prompt_content, name, desc):
@mcp_server.prompt(name=name, description=desc)
async def handle_prompt():
return prompt_content
return handle_prompt
# Register prompt
make_prompt_handler(content, prompt_name, title)
print(f"Loaded prompt: {prompt_name}")
Step 5: Create Example Prompt
# Create prompts/analyze_cluster.md
cat > prompts/analyze_cluster.md << 'EOF'
# Analyze Cluster Performance
You are an expert at analyzing Databricks cluster configurations and identifying optimization opportunities.
## Your Task
When a user asks you to analyze a cluster, you should:
1. **Use the `list_clusters` tool** to get current cluster information
2. **Review the configuration** for each cluster:
- Node type and count
- Spark version
- Current state
- Runtime duration
3. **Identify optimization opportunities**:
- Oversized clusters (too many workers for workload)
- Undersized clusters (bottlenecks)
- Old Spark versions (missing optimizations)
- Long-running idle clusters (cost waste)
4. **Provide recommendations** with:
- Expected cost savings
- Performance impact
- Implementation steps
- Risk level (Low/Medium/High)
## Example Analysis
**Cluster**: data-processing-cluster
**Current**: 10 workers, i3.2xlarge, Spark 11.3
**Issues**: Oversized (avg CPU < 30%), old Spark version
**Recommendation**: Reduce to 6 workers, upgrade to Spark 13.3
**Savings**: ~$2,400/month
**Risk**: Low (test with smaller dataset first)
EOF
Step 6: Create Main Application
# server/app.py
import os
from pathlib import Path
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from fastmcp import FastMCP
import yaml
from server.tools import load_tools
from server.prompts import load_prompts
# Load environment variables for local development
env_file = Path('.env.local')
if env_file.exists():
from dotenv import load_dotenv
load_dotenv(env_file)
# Load configuration
if os.path.exists('config.yaml'):
    with open('config.yaml') as f:
        config = yaml.safe_load(f)
else:
    config = {}
servername = config.get('servername', 'databricks-mcp')
# Create MCP server
mcp_server = FastMCP(name=servername)
# Register tools and prompts
load_tools(mcp_server)
load_prompts(mcp_server)
# Create ASGI app for MCP
mcp_asgi_app = mcp_server.http_app()
# Create FastAPI app
app = FastAPI(
title="Databricks MCP Server",
description="MCP server for Databricks workspace management",
version="0.1.0",
lifespan=mcp_asgi_app.lifespan,
)
# CORS middleware for frontend development
app.add_middleware(
CORSMiddleware,
allow_origins=["http://localhost:5173", "http://localhost:3000"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Health check endpoint
@app.get('/api/health')
def health_check():
"""Check if server and Databricks connection are healthy."""
from server.services.databricks_client import verify_authentication
auth_status = verify_authentication()
return {
'status': 'healthy' if auth_status['success'] else 'unhealthy',
'databricks': auth_status
}
# Combine MCP and FastAPI routes
combined_app = FastAPI(
title="Combined MCP App",
routes=[
*mcp_asgi_app.routes, # MCP routes at /mcp/*
*app.routes, # API routes
],
lifespan=mcp_asgi_app.lifespan,
)
# Entry point for uvicorn
if __name__ == "__main__":
import uvicorn
port = int(os.environ.get('DATABRICKS_APP_PORT', 8000))
uvicorn.run(combined_app, host="0.0.0.0", port=port)
Step 7: Create Configuration File
# Create config.yaml
cat > config.yaml << 'EOF'
servername: my-databricks-mcp
EOF
Step 8: Test Locally
# Load environment variables
source .env.local
# Start the server
uv run uvicorn server.app:combined_app --reload --port 8000
Open another terminal and test:
# Test health endpoint
curl http://localhost:8000/api/health
# Test MCP initialization
curl -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {"name": "test-client"}
}
}'
# List available tools
curl -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list"
}'
Configuration Files
pyproject.toml
[project]
name = "my-databricks-mcp"
version = "0.1.0"
description = "MCP server for Databricks integration"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.104.1",
"uvicorn[standard]>=0.24.0",
"databricks-sdk==0.59.0",
"fastmcp>=0.2.0",
"mcp>=1.12.0",
"pyyaml>=6.0.2",
"python-dotenv>=1.0.0",
]
[project.optional-dependencies]
dev = [
"pytest>=7.4.0",
"pytest-asyncio>=0.21.0",
"ruff>=0.1.0",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.ruff]
line-length = 100
target-version = "py311"
app.yaml (for Databricks Apps deployment)
command: ["python", "-m", "server.app"]
source_code_path: "."
environment:
# Automatically injected by Databricks Apps
- name: DATABRICKS_HOST
value_from: workspace
- name: DATABRICKS_TOKEN
value_from: pat
- name: DATABRICKS_APP_PORT
value_from: app_port
# Custom environment variables
- name: LOG_LEVEL
value: INFO
.gitignore
cat > .gitignore << 'EOF'
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
.venv
.uv
# Environment
.env
.env.local
.env.*.local
# IDE
.vscode/
.idea/
*.swp
*.swo
# Testing
.pytest_cache/
.coverage
htmlcov/
# Distribution
dist/
build/
*.egg-info/
# Databricks
.databricks/
EOF
Development Workflow
Development Scripts
Create helper scripts for common tasks:
# scripts/watch.sh - Start development server with hot reload
cat > scripts/watch.sh << 'EOF'
#!/bin/bash
source .env.local
export DATABRICKS_HOST
export DATABRICKS_TOKEN
uv run uvicorn server.app:combined_app --reload --port 8000
EOF
chmod +x scripts/watch.sh
# scripts/fix.sh - Format and lint code
cat > scripts/fix.sh << 'EOF'
#!/bin/bash
uv run ruff format server/ tests/
uv run ruff check --fix server/ tests/
EOF
chmod +x scripts/fix.sh
# scripts/test.sh - Run tests
cat > scripts/test.sh << 'EOF'
#!/bin/bash
uv run pytest tests/ -v
EOF
chmod +x scripts/test.sh
Common Commands
# Start development server
./scripts/watch.sh
# Format code
./scripts/fix.sh
# Run tests
./scripts/test.sh
# Install new dependency
uv add package-name
# Remove dependency
uv remove package-name
Testing Your Server
MCP Inspector (Interactive Testing)
The best way to test your MCP server interactively:
# Create testing script
cat > claude_scripts/inspect_local_mcp.sh << 'EOF'
#!/bin/bash
source .env.local
export DATABRICKS_HOST
export DATABRICKS_TOKEN
# Launch MCP Inspector
npx @modelcontextprotocol/inspector \
uv run uvicorn server.app:combined_app --port 8000
EOF
chmod +x claude_scripts/inspect_local_mcp.sh
# Run inspector
./claude_scripts/inspect_local_mcp.sh
This opens a web interface where you can:
- See all available tools and prompts
- Call tools with custom parameters
- View responses in real-time
- Debug errors interactively
curl Testing
# Create curl test script
cat > claude_scripts/test_local_mcp_curl.sh << 'EOF'
#!/bin/bash
source .env.local
export DATABRICKS_HOST
export DATABRICKS_TOKEN
# Start server in background
uv run uvicorn server.app:combined_app --port 8000 &
SERVER_PID=$!
sleep 3
echo "Testing MCP server..."
# Initialize session
echo "1. Initialize session"
curl -s -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {"name": "test-client"}
}
}' | jq '.'
# List tools
echo -e "\n2. List tools"
curl -s -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list"
}' | jq '.result.tools[] | {name: .name, description: .description}'
# Call list_clusters tool
echo -e "\n3. Call list_clusters tool"
curl -s -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "list_clusters",
"arguments": {}
}
}' | jq '.'
# Cleanup
kill $SERVER_PID
EOF
chmod +x claude_scripts/test_local_mcp_curl.sh
# Run test
./claude_scripts/test_local_mcp_curl.sh
Deploying to Databricks Apps
Step 1: Prepare for Deployment
# Create deployment script
cat > scripts/deploy.sh << 'EOF'
#!/bin/bash
set -e
echo "π Deploying to Databricks Apps..."
# Load environment
source .env.local
# Generate requirements.txt from pyproject.toml
echo "π¦ Generating requirements.txt..."
uv pip compile pyproject.toml -o requirements.txt
# Deploy via Databricks CLI
echo "π§ Deploying application..."
databricks apps deploy my-databricks-mcp \
--source-code-path . \
--config-file app.yaml
# Get app URL
echo "β
Deployment complete!"
APP_URL=$(databricks apps get my-databricks-mcp --json | jq -r '.url')
echo "π App URL: $APP_URL"
echo "π MCP Endpoint: $APP_URL/mcp/"
echo "π Logs: $APP_URL/logz"
EOF
chmod +x scripts/deploy.sh
Step 2: Deploy
# Run deployment
./scripts/deploy.sh
Step 3: Verify Deployment
# Check app status
databricks apps get my-databricks-mcp
# View logs
databricks apps logs my-databricks-mcp
Step 4: Test Remote Server
# Create remote testing script
cat > claude_scripts/test_remote_mcp_curl.sh << 'EOF'
#!/bin/bash
source .env.local
# Get app URL
APP_URL=$(databricks apps get my-databricks-mcp --json | jq -r '.url')
MCP_URL="${APP_URL}/mcp/"
echo "Testing remote MCP server at: $MCP_URL"
# Get OAuth token
TOKEN=$(databricks auth token --host "$DATABRICKS_HOST" | jq -r '.access_token')
# Initialize session
INIT_RESPONSE=$(curl -s -X POST "$MCP_URL" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {"name": "test-client"}
}
}')
echo "Initialize response:"
echo "$INIT_RESPONSE" | jq '.'
# Extract session ID
SESSION_ID=$(echo "$INIT_RESPONSE" | jq -r '.result.sessionId // empty')
# List tools
echo -e "\nListing tools:"
curl -s -X POST "$MCP_URL" \
-H "Authorization: Bearer $TOKEN" \
-H "mcp-session-id: $SESSION_ID" \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list"
}' | jq '.result.tools[] | {name: .name, description: .description}'
EOF
chmod +x claude_scripts/test_remote_mcp_curl.sh
# Test remote deployment
./claude_scripts/test_remote_mcp_curl.sh
Connecting to Claude CLI
Understanding MCP Proxy
Databricks Apps use OAuth authentication, which requires a proxy to connect with Claude CLI. The proxy handles:
- Getting OAuth tokens from Databricks CLI
- Managing MCP session initialization
- Translating stdio (Claude) ↔ HTTP/SSE (Databricks Apps)
Step 1: Create MCP Proxy
# server/proxy.py
import os
import sys
import json
import subprocess
import requests
def get_oauth_token(databricks_host):
"""Get OAuth token via Databricks CLI."""
result = subprocess.run(
['databricks', 'auth', 'token', '--host', databricks_host],
capture_output=True,
text=True,
check=True
)
return json.loads(result.stdout).get('access_token')
class MCPProxy:
def __init__(self, databricks_host, app_url):
self.databricks_host = databricks_host
self.app_url = app_url if app_url.endswith('/mcp/') else app_url + '/mcp/'
self.session = requests.Session()
self._oauth_token = None
self._session_id = None
def _get_token(self):
"""Get or refresh OAuth token."""
if not self._oauth_token:
self._oauth_token = get_oauth_token(self.databricks_host)
return self._oauth_token
    def _initialize_session(self):
        """Initialize MCP session with the server."""
        token = self._get_token()
        headers = {
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        }
        # Send initialize request; with streamable HTTP the server returns
        # the session ID in the mcp-session-id response header
        init_request = {
            'jsonrpc': '2.0',
            'id': 0,
            'method': 'initialize',
            'params': {
                'protocolVersion': '2024-11-05',
                'capabilities': {},
                'clientInfo': {'name': 'databricks-mcp-proxy'}
            }
        }
        response = self.session.post(self.app_url, headers=headers, json=init_request)
        self._session_id = response.headers.get('mcp-session-id')
def proxy_request(self, request_data):
"""Forward MCP request to server."""
if not self._session_id:
self._initialize_session()
token = self._get_token()
headers = {
'Authorization': f'Bearer {token}',
'mcp-session-id': self._session_id,
'Content-Type': 'application/json'
}
response = self.session.post(self.app_url, headers=headers, json=request_data)
return response.json()
def run(self):
"""Run proxy stdio loop."""
for line in sys.stdin:
request = json.loads(line)
response = self.proxy_request(request)
print(json.dumps(response))
sys.stdout.flush()
def main():
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--databricks-host', required=True)
parser.add_argument('--databricks-app-url', required=True)
args = parser.parse_args()
proxy = MCPProxy(args.databricks_host, args.databricks_app_url)
proxy.run()
if __name__ == '__main__':
main()
Step 2: Update pyproject.toml
Add proxy as script entry point:
[project.scripts]
dbx-mcp-proxy = "server.proxy:main"
Step 3: Add to Claude CLI
# Set environment variables
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export APP_URL=$(databricks apps get my-databricks-mcp --json | jq -r '.url')
export DATABRICKS_APP_URL="${APP_URL}/mcp/"
# Add to Claude CLI
claude mcp add my-databricks-mcp --scope user -- \
uv run dbx-mcp-proxy \
--databricks-host "$DATABRICKS_HOST" \
--databricks-app-url "$DATABRICKS_APP_URL"
Step 4: Verify Connection
# Start Claude CLI
claude
# In Claude, try:
# "List my Databricks clusters"
# Claude should use your list_clusters tool!
Troubleshooting
Issue: Authentication Failed
Symptoms: Authentication failed or 401 Unauthorized
Solutions:
- Verify environment variables:
  echo $DATABRICKS_HOST
  echo $DATABRICKS_TOKEN
- Test authentication directly:
  databricks current-user me
- Regenerate the token if it has expired
Issue: MCP Tools Not Showing
Symptoms: tools/list returns empty array
Solutions:
- Check tools are loaded by adding debug logging in server/app.py:
  load_tools(mcp_server)
  print(f"Loaded tools: {list(mcp_server._tools.keys())}")
- Verify decorator syntax:
  @mcp_server.tool  # Correct
  def my_tool():
      pass
  # Not @tool or @mcp.tool
Issue: MCP Endpoint Not Found
Symptoms: 404 Not Found on /mcp/ endpoint
Solutions:
- Ensure MCP routes come first:
  combined_app = FastAPI(
      routes=[
          *mcp_asgi_app.routes,  # MCP routes MUST come first
          *app.routes,
      ],
      lifespan=mcp_asgi_app.lifespan,
  )
- Check the URL has a trailing slash: /mcp/ not /mcp
Issue: Deployment Fails
Symptoms: databricks apps deploy fails
Solutions:
- Verify app.yaml exists and is valid
- Check the source code path is correct (. for the current directory)
- Ensure requirements.txt is generated: uv pip compile pyproject.toml -o requirements.txt
Issue: Proxy Connection Fails
Symptoms: Claude CLI can't connect to server
Solutions:
- Verify Databricks CLI authentication:
  databricks auth token --host $DATABRICKS_HOST
- Check the app URL is correct and ends with /mcp/:
  echo $DATABRICKS_APP_URL
  # Should be: https://app-name.databricksapps.com/mcp/
- Test the proxy manually:
  echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' | \
    uv run dbx-mcp-proxy \
      --databricks-host $DATABRICKS_HOST \
      --databricks-app-url $DATABRICKS_APP_URL
Next Steps
Add More Tools
Expand your MCP server with additional Databricks operations:
Job Management:
@mcp_server.tool
def run_job(job_id: str, parameters: dict = None) -> dict:
"""Trigger a Databricks job run."""
@mcp_server.tool
def get_job_run_status(run_id: str) -> dict:
"""Get the status of a job run."""
Unity Catalog:
@mcp_server.tool
def list_catalogs() -> dict:
"""List all Unity Catalog catalogs."""
@mcp_server.tool
def create_catalog(name: str, comment: str = None) -> dict:
"""Create a new Unity Catalog catalog."""
Notebook Operations:
@mcp_server.tool
def list_notebooks(path: str) -> dict:
"""List notebooks in a workspace directory."""
@mcp_server.tool
def export_notebook(path: str, format: str = "SOURCE") -> dict:
"""Export a notebook in specified format."""
Add Web UI
Create a React or Vue frontend:
# Create React app
cd client
npm create vite@latest . -- --template react-ts
npm install
# Update package.json scripts
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
}
# Start development
npm run dev
Add Tests
Create comprehensive test suite:
# tests/test_tools.py
import pytest
from unittest.mock import Mock, patch
from server.tools import load_tools
from fastmcp import FastMCP
@pytest.fixture
def mcp_server():
server = FastMCP(name="test")
load_tools(server)
return server
def test_list_clusters_success(mcp_server):
"""Test successful cluster listing."""
with patch('server.tools.get_workspace_client') as mock_client:
# Setup mock
mock_cluster = Mock()
mock_cluster.cluster_id = 'test-123'
mock_cluster.cluster_name = 'test-cluster'
mock_cluster.state.value = 'RUNNING'
mock_client.return_value.clusters.list.return_value = [mock_cluster]
# Call tool
tool = mcp_server._tools['list_clusters']
result = tool.func()
# Assertions
assert result['success'] is True
assert len(result['clusters']) == 1
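You can cover the error paths the same way. A sketch that reuses the private _tools access from the test above and assumes PermissionDenied accepts a plain message:
def test_list_clusters_permission_denied(mcp_server):
    """Tool should return a structured error when access is denied."""
    from databricks.sdk.errors import PermissionDenied

    with patch('server.tools.get_workspace_client') as mock_client:
        # Raise when the tool iterates clusters
        mock_client.return_value.clusters.list.side_effect = PermissionDenied('denied')

        tool = mcp_server._tools['list_clusters']
        result = tool.func()

        assert result['success'] is False
        assert result['error_code'] == 'PERMISSION_DENIED'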
Optimize for Production
Add Caching:
from datetime import datetime, timedelta
_cache_time = None
_cache_data = None
@mcp_server.tool
def list_clusters_cached() -> dict:
"""List clusters with 5-minute cache."""
global _cache_time, _cache_data
now = datetime.now()
if _cache_data and _cache_time and (now - _cache_time) < timedelta(minutes=5):
return _cache_data
result = list_clusters()
_cache_data = result
_cache_time = now
return result
Add Logging:
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@mcp_server.tool
def list_clusters() -> dict:
"""List clusters with logging."""
logger.info("Listing clusters...")
try:
        result = ...  # the cluster-listing logic from Step 3
logger.info(f"Found {len(result['clusters'])} clusters")
return result
except Exception as e:
logger.error(f"Failed to list clusters: {e}")
raise
Resources
Official Documentation
- MCP Protocol: https://modelcontextprotocol.io/
- Databricks SDK: https://databricks-sdk-py.readthedocs.io/
- FastAPI: https://fastapi.tiangolo.com/
- FastMCP: https://github.com/jlowin/fastmcp
- Databricks Apps: https://docs.databricks.com/en/dev-tools/databricks-apps/
Example Projects
- Databricks MCP Examples: https://github.com/databricks/databricks-mcp-examples
- Custom MCP Template: Based on the projects in this guide
Community
- MCP Discord: https://discord.gg/mcp
- Databricks Community: https://community.databricks.com/
Tools
- MCP Inspector: npx @modelcontextprotocol/inspector
- Databricks CLI: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
- uv Package Manager: https://github.com/astral-sh/uv
Conclusion
You've now built a production-ready MCP server for Databricks! You can:
- ✅ Create tools that call Databricks APIs
- ✅ Load prompts from markdown files
- ✅ Deploy to Databricks Apps
- ✅ Connect to Claude CLI with OAuth authentication
Next Steps:
- Add more tools for your specific use cases
- Create custom prompts for your workflows
- Build a web UI for non-AI users
- Share your MCP server with your team
Happy building! 🎉