Databricks MCP Guide
Building Custom MCP Servers for Databricks
A comprehensive guide to creating production-ready MCP (Model Context Protocol) servers that integrate with Databricks using FastAPI and the Databricks SDK.
Table of Contents
- Introduction
- What is MCP?
- Why Build an MCP Server for Databricks?
- Prerequisites
- Architecture Options
- Getting Started
- Step-by-Step Tutorial
- Configuration Files
- Development Workflow
- Testing Your Server
- Deploying to Databricks Apps
- Connecting to Claude CLI
- Troubleshooting
- Next Steps
- Resources
Introduction
This guide will walk you through creating a custom MCP server that enables AI agents like Claude to interact with your Databricks workspace. You'll learn how to:
- Set up a FastAPI + FastMCP application
- Create tools that call Databricks APIs
- Deploy your server to Databricks Apps
- Connect it to Claude CLI for AI-powered workspace management
By the end of this tutorial, you'll have a working MCP server that can list clusters, execute SQL queries, and run jobs, all through natural language commands to an AI agent.
What is MCP?
Model Context Protocol (MCP) is an open standard that connects AI agents to tools, data sources, and contextual information. Think of it as a universal adapter that lets AI assistants like Claude interact with your services in a standardized way.
Key Concepts
Tools: Functions that AI agents can call
@mcp_server.tool
def list_clusters() -> dict:
    """List all Databricks clusters"""
    ...  # returns cluster information
Prompts: Reusable instructions for AI agents
# Analyze Cluster Performance
You are an expert at analyzing Databricks clusters...
Transport: How agents communicate with your server
- stdio: Standard input/output (for CLI tools like Claude Desktop)
- HTTP/SSE: Web-based communication (for deployed applications)
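For example, the same FastMCP server can be exposed over either transport. A minimal sketch (assuming FastMCP 2.x, where run() accepts a transport argument and http_app() returns an ASGI app):
from fastmcp import FastMCP

mcp = FastMCP(name="demo")

@mcp.tool
def ping() -> str:
    """Simple liveness check."""
    return "pong"

if __name__ == "__main__":
    # stdio: for desktop clients that spawn the server as a subprocess
    mcp.run(transport="stdio")
    # For deployed servers, expose an ASGI app instead and serve it
    # over HTTP with uvicorn: app = mcp.http_app()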
How MCP Works
┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│   AI Agent   │       │  MCP Server  │       │  Databricks  │
│   (Claude)   │       │ (Your Code)  │       │  Workspace   │
└──────┬───────┘       └──────┬───────┘       └──────┬───────┘
       │                      │                      │
       │ 1. "List my clusters"│                      │
       │─────────────────────>│                      │
       │                      │                      │
       │                      │ 2. Call Databricks API
       │                      │─────────────────────>│
       │                      │                      │
       │                      │ 3. Return cluster data
       │                      │<─────────────────────│
       │ 4. Return formatted results                 │
       │<─────────────────────│                      │
       │                      │                      │
Why Build an MCP Server for Databricks?
Use Cases
1. Natural Language Workspace Management
User: "Show me all running clusters and their costs"
Claude: [Uses list_clusters tool] Here are your 3 running clusters...
2. Automated Troubleshooting
User: "Why did my job fail?"
Claude: [Uses get_job_run tool] The job failed due to a cluster timeout...
3. Data Analysis Assistance
User: "Query my sales table and create a summary"
Claude: [Uses execute_sql tool] Here's the data from your sales table...
4. Infrastructure as Code with AI
User: "Create a cluster for machine learning workloads"
Claude: [Uses create_cluster tool] Created cluster ml-cluster-001...
Benefits
- Unified Interface: One tool for all Databricks operations
- AI-Powered: Natural language instead of remembering API calls
- Extensible: Add custom business logic and integrations
- Secure: Leverages Databricks authentication and Unity Catalog permissions
Prerequisites
Required Software
- Python 3.11 or higher
  python --version  # Should show 3.11+
- uv (Python package manager)
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Databricks CLI (v0.205+, required for the databricks apps commands used later)
  curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
- Node.js 18+ (optional, for frontend)
  node --version  # Should show v18+
- Bun (optional, fast JavaScript runtime)
  curl -fsSL https://bun.sh/install | bash
Databricks Requirements
- Databricks Workspace: A Databricks account with workspace access
- Personal Access Token: For authentication (generate one in your workspace under User Settings > Developer > Access tokens)
- SQL Warehouse (optional): For SQL execution tools
- Databricks Apps enabled: For production deployment
Knowledge Prerequisites
- Basic Python programming
- Familiarity with REST APIs
- Basic understanding of async/await in Python
- Command line comfort
Architecture Options
Option 1: FastAPI + FastMCP (Recommended for Production)
When to use: Full-stack applications with web UI and MCP integration
┌───────────────────────────────────────────────┐
│           Databricks Apps Platform            │
│  ┌─────────────────────────────────────────┐  │
│  │          Combined Application           │  │
│  │                                         │  │
│  │  ┌──────────┐      ┌────────────────┐   │  │
│  │  │ FastAPI  │      │    FastMCP     │   │  │
│  │  │  /api/*  │      │     /mcp/*     │   │  │
│  │  │          │      │                │   │  │
│  │  │ REST API │      │  MCP Protocol  │   │  │
│  │  │  Web UI  │      │    AI Tools    │   │  │
│  │  └────┬─────┘      └───────┬────────┘   │  │
│  │       └────────┬───────────┘            │  │
│  │                │                        │  │
│  │      ┌─────────▼──────────┐             │  │
│  │      │   Databricks SDK   │             │  │
│  │      └────────────────────┘             │  │
│  └─────────────────────────────────────────┘  │
└───────────────────────────────────────────────┘
Features:
- Web UI for users + MCP for AI agents
- OAuth authentication via Databricks Apps
- Scalable and highly available
- Unity Catalog permission integration
Option 2: Standalone stdio MCP Server
When to use: CLI/desktop integration (Claude Desktop, VS Code)
┌──────────────┐     stdio      ┌──────────────┐
│  Claude CLI  │ <────────────> │  MCP Server  │
└──────────────┘                └──────┬───────┘
                                       │
                                       │ API calls
                                       │
                                ┌──────▼───────┐
                                │  Databricks  │
                                └──────────────┘
Features:
- Simple deployment
- Direct integration with desktop AI apps
- PAT or CLI profile authentication
- Great for personal productivity
Comparison
| Feature | FastAPI + FastMCP | stdio MCP |
|---|---|---|
| Web UI | ✅ Yes | ❌ No |
| Production Ready | ✅ Yes | ⚠️ Personal use |
| OAuth Support | ✅ Yes | ❌ No |
| Complexity | Medium | Low |
| Best For | Teams, production | Individual, CLI |
This guide focuses on FastAPI + FastMCP as it's the most versatile option.
Getting Started
Step 1: Environment Setup
# Create project directory
mkdir my-databricks-mcp
cd my-databricks-mcp
# Initialize uv project
uv init
# Install core dependencies
uv add fastapi uvicorn databricks-sdk fastmcp mcp pyyaml python-dotenv
# Install development dependencies
uv add --dev pytest pytest-asyncio ruff
Step 2: Authenticate with Databricks
# Configure Databricks CLI
databricks configure --token
# Enter your workspace URL and token when prompted
# Workspace URL: https://your-workspace.cloud.databricks.com
# Token: dapi...
# Test authentication
databricks current-user me
Step 3: Create Environment File
# Create .env.local
cat > .env.local << 'EOF'
# Databricks authentication
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# Optional: Default SQL warehouse
DATABRICKS_WAREHOUSE_ID=your-warehouse-id
# Server configuration
DATABRICKS_APP_PORT=8000
LOG_LEVEL=INFO
EOF
Step-by-Step Tutorial
Step 1: Create Project Structure
# Create directory structure
mkdir -p server/routers server/services prompts tests scripts claude_scripts
# Create __init__.py files
touch server/__init__.py
touch server/routers/__init__.py
touch server/services/__init__.py
touch tests/__init__.py
Your structure should look like:
my-databricks-mcp/
├── server/
│   ├── __init__.py
│   ├── routers/
│   │   └── __init__.py
│   └── services/
│       └── __init__.py
├── prompts/
├── tests/
│   └── __init__.py
├── scripts/
├── claude_scripts/
├── .env.local
└── pyproject.toml
Step 2: Create Databricks Client Wrapper
# server/services/databricks_client.py
import os
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import DatabricksError
def get_workspace_client() -> WorkspaceClient:
    """Get an authenticated Databricks workspace client.

    Uses environment variables when set:
    - DATABRICKS_HOST: Workspace URL
    - DATABRICKS_TOKEN: Personal access token (local development)

    Inside Databricks Apps no PAT is present; in that case this falls
    back to the SDK's default authentication chain, which picks up the
    credentials the platform injects.
    """
    host = os.environ.get('DATABRICKS_HOST')
    token = os.environ.get('DATABRICKS_TOKEN')
    if token:
        return WorkspaceClient(host=host, token=token)
    return WorkspaceClient(host=host) if host else WorkspaceClient()
def verify_authentication() -> dict:
"""Verify Databricks authentication is working."""
try:
client = get_workspace_client()
user = client.current_user.me()
return {
'success': True,
'user_name': user.user_name,
'user_id': user.id,
'workspace_url': client.config.host
}
except DatabricksError as e:
return {
'success': False,
'error': f'Authentication failed: {str(e)}'
}
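Before wiring the wrapper into MCP, you can sanity-check it from a short script. A minimal sketch (scripts/check_auth.py is a hypothetical filename, not used elsewhere in this guide):
# scripts/check_auth.py - quick connectivity check
from server.services.databricks_client import verify_authentication

if __name__ == '__main__':
    status = verify_authentication()
    if status['success']:
        print(f"Authenticated as {status['user_name']} at {status['workspace_url']}")
    else:
        print(f"Auth check failed: {status['error']}")
Run it with: uv run python scripts/check_auth.py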
Step 3: Create Your First Tool
# server/tools.py
import os
from databricks.sdk.errors import DatabricksError, NotFound, PermissionDenied
from server.services.databricks_client import get_workspace_client
def load_tools(mcp_server):
"""Register all MCP tools with the server."""
@mcp_server.tool
def list_clusters() -> dict:
"""List all Databricks clusters in the workspace.
Returns comprehensive cluster information including:
- Cluster ID and name
- Current state (RUNNING, TERMINATED, etc.)
- Size (number of workers)
- Spark version and node types
- Creator and creation time
Returns:
Dictionary with success status and list of clusters
"""
try:
client = get_workspace_client()
clusters = []
for cluster in client.clusters.list():
clusters.append({
'cluster_id': cluster.cluster_id,
'cluster_name': cluster.cluster_name,
'state': cluster.state.value if cluster.state else 'UNKNOWN',
'num_workers': cluster.num_workers,
'spark_version': cluster.spark_version,
'node_type_id': cluster.node_type_id,
'creator_user_name': cluster.creator_user_name,
'start_time': cluster.start_time,
})
return {
'success': True,
'clusters': clusters,
'count': len(clusters)
}
except PermissionDenied:
return {
'success': False,
'error': 'Permission denied to list clusters',
'error_code': 'PERMISSION_DENIED',
'suggestion': 'Ensure you have cluster read permissions in your workspace'
}
except DatabricksError as e:
return {
'success': False,
'error': str(e),
'error_code': 'DATABRICKS_ERROR'
}
except Exception as e:
return {
'success': False,
'error': f'Unexpected error: {str(e)}',
'error_code': 'INTERNAL_ERROR'
}
@mcp_server.tool
def execute_sql(
query: str,
warehouse_id: str = None,
catalog: str = None,
schema: str = None
) -> dict:
"""Execute a SQL query on Databricks SQL warehouse.
Runs the provided SQL statement and returns results in a structured format.
Automatically uses default warehouse from environment if not specified.
Args:
query: SQL statement to execute (SELECT, CREATE, INSERT, etc.)
warehouse_id: SQL warehouse ID (optional, uses DATABRICKS_WAREHOUSE_ID env var if not provided)
catalog: Unity Catalog catalog name (optional, e.g., 'main')
schema: Schema/database name (optional, e.g., 'default')
Returns:
Dictionary with query results including:
- columns: List of column names
- rows: List of row dictionaries
- row_count: Number of rows returned
Example:
execute_sql("SELECT * FROM my_table LIMIT 10")
"""
try:
client = get_workspace_client()
# Use environment variable as fallback
if not warehouse_id:
warehouse_id = os.environ.get('DATABRICKS_WAREHOUSE_ID')
if not warehouse_id:
return {
'success': False,
'error': 'No warehouse_id provided and DATABRICKS_WAREHOUSE_ID not set',
'error_code': 'MISSING_WAREHOUSE_ID',
'suggestion': 'Provide warehouse_id or set DATABRICKS_WAREHOUSE_ID environment variable'
}
# Execute statement
response = client.statement_execution.execute_statement(
warehouse_id=warehouse_id,
statement=query,
catalog=catalog,
schema=schema,
wait_timeout='30s'
)
# Parse results
if response.result and response.result.data_array:
columns = [col.name for col in response.manifest.schema.columns]
rows = []
for row_data in response.result.data_array:
row_dict = {col: row_data[i] for i, col in enumerate(columns)}
rows.append(row_dict)
return {
'success': True,
'columns': columns,
'rows': rows,
'row_count': len(rows),
'statement_id': response.statement_id
}
else:
return {
'success': True,
'message': 'Query executed successfully (no results returned)',
'statement_id': response.statement_id
}
except DatabricksError as e:
return {
'success': False,
'error': str(e),
'error_code': 'DATABRICKS_ERROR',
'query': query
}
except Exception as e:
return {
'success': False,
'error': f'Unexpected error: {str(e)}',
'error_code': 'INTERNAL_ERROR',
'query': query
}
Step 4: Create Prompt Loader
# server/prompts.py
import glob
import os
def load_prompts(mcp_server):
"""Dynamically load prompts from the prompts directory.
Each .md file in prompts/ becomes an MCP prompt with:
- Name: filename without extension
- Description: First line of file (without # prefix)
"""
prompt_dir = 'prompts'
if not os.path.exists(prompt_dir):
print(f"Warning: {prompt_dir} directory not found")
return
prompt_files = glob.glob(f'{prompt_dir}/*.md')
for prompt_file in prompt_files:
# Extract prompt name from filename
prompt_name = os.path.splitext(os.path.basename(prompt_file))[0]
# Read prompt content
with open(prompt_file, 'r', encoding='utf-8') as f:
content = f.read()
# Extract title from first line
lines = content.strip().split('\n')
title = lines[0].strip().lstrip('#').strip() if lines else prompt_name
# Create closure to capture values properly
def make_prompt_handler(prompt_content, name, desc):
@mcp_server.prompt(name=name, description=desc)
async def handle_prompt():
return prompt_content
return handle_prompt
# Register prompt
make_prompt_handler(content, prompt_name, title)
print(f"Loaded prompt: {prompt_name}")
Step 5: Create Example Prompt
# Create prompts/analyze_cluster.md
cat > prompts/analyze_cluster.md << 'EOF'
# Analyze Cluster Performance
You are an expert at analyzing Databricks cluster configurations and identifying optimization opportunities.
## Your Task
When a user asks you to analyze a cluster, you should:
1. **Use the `list_clusters` tool** to get current cluster information
2. **Review the configuration** for each cluster:
- Node type and count
- Spark version
- Current state
- Runtime duration
3. **Identify optimization opportunities**:
- Oversized clusters (too many workers for workload)
- Undersized clusters (bottlenecks)
- Old Spark versions (missing optimizations)
- Long-running idle clusters (cost waste)
4. **Provide recommendations** with:
- Expected cost savings
- Performance impact
- Implementation steps
- Risk level (Low/Medium/High)
## Example Analysis
**Cluster**: data-processing-cluster
**Current**: 10 workers, i3.2xlarge, Spark 11.3
**Issues**: Oversized (avg CPU < 30%), old Spark version
**Recommendation**: Reduce to 6 workers, upgrade to Spark 13.3
**Savings**: ~$2,400/month
**Risk**: Low (test with smaller dataset first)
EOF
Step 6: Create Main Application
# server/app.py
import os
from pathlib import Path
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from fastmcp import FastMCP
import yaml
from server.tools import load_tools
from server.prompts import load_prompts
# Load environment variables for local development
env_file = Path('.env.local')
if env_file.exists():
from dotenv import load_dotenv
load_dotenv(env_file)
# Load configuration
if os.path.exists('config.yaml'):
    with open('config.yaml') as f:
        config = yaml.safe_load(f)
else:
    config = {}
servername = config.get('servername', 'databricks-mcp')
# Create MCP server
mcp_server = FastMCP(name=servername)
# Register tools and prompts
load_tools(mcp_server)
load_prompts(mcp_server)
# Create ASGI app for MCP
mcp_asgi_app = mcp_server.http_app()
# Create FastAPI app
app = FastAPI(
title="Databricks MCP Server",
description="MCP server for Databricks workspace management",
version="0.1.0",
lifespan=mcp_asgi_app.lifespan,
)
# CORS middleware for frontend development
app.add_middleware(
CORSMiddleware,
allow_origins=["http://localhost:5173", "http://localhost:3000"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Health check endpoint
@app.get('/api/health')
def health_check():
"""Check if server and Databricks connection are healthy."""
from server.services.databricks_client import verify_authentication
auth_status = verify_authentication()
return {
'status': 'healthy' if auth_status['success'] else 'unhealthy',
'databricks': auth_status
}
# Combine MCP and FastAPI routes
combined_app = FastAPI(
title="Combined MCP App",
routes=[
*mcp_asgi_app.routes, # MCP routes at /mcp/*
*app.routes, # API routes
],
lifespan=mcp_asgi_app.lifespan,
)
# Entry point for uvicorn
if __name__ == "__main__":
import uvicorn
port = int(os.environ.get('DATABRICKS_APP_PORT', 8000))
uvicorn.run(combined_app, host="0.0.0.0", port=port)
Step 7: Create Configuration File
# Create config.yaml
cat > config.yaml << 'EOF'
servername: my-databricks-mcp
EOF
Step 8: Test Locally
# Load environment variables
source .env.local
# Start the server
uv run uvicorn server.app:combined_app --reload --port 8000
Open another terminal and test:
# Test health endpoint
curl http://localhost:8000/api/health
# Test MCP initialization
curl -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {"name": "test-client"}
}
}'
# List available tools
curl -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list"
}'
Configuration Files
pyproject.toml
[project]
name = "my-databricks-mcp"
version = "0.1.0"
description = "MCP server for Databricks integration"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.104.1",
"uvicorn[standard]>=0.24.0",
"databricks-sdk==0.59.0",
"fastmcp>=0.2.0",
"mcp>=1.12.0",
"pyyaml>=6.0.2",
"python-dotenv>=1.0.0",
]
[project.optional-dependencies]
dev = [
"pytest>=7.4.0",
"pytest-asyncio>=0.21.0",
"ruff>=0.1.0",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.ruff]
line-length = 100
target-version = "py311"
app.yaml (for Databricks Apps deployment)
command: ["python", "-m", "server.app"]
source_code_path: "."
environment:
# Automatically injected by Databricks Apps
- name: DATABRICKS_HOST
value_from: workspace
- name: DATABRICKS_TOKEN
value_from: pat
- name: DATABRICKS_APP_PORT
value_from: app_port
# Custom environment variables
- name: LOG_LEVEL
value: INFO
.gitignore
cat > .gitignore << 'EOF'
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
.venv
.uv
# Environment
.env
.env.local
.env.*.local
# IDE
.vscode/
.idea/
*.swp
*.swo
# Testing
.pytest_cache/
.coverage
htmlcov/
# Distribution
dist/
build/
*.egg-info/
# Databricks
.databricks/
EOF
Development Workflow
Development Scripts
Create helper scripts for common tasks:
# scripts/watch.sh - Start development server with hot reload
cat > scripts/watch.sh << 'EOF'
#!/bin/bash
source .env.local
export DATABRICKS_HOST
export DATABRICKS_TOKEN
uv run uvicorn server.app:combined_app --reload --port 8000
EOF
chmod +x scripts/watch.sh
# scripts/fix.sh - Format and lint code
cat > scripts/fix.sh << 'EOF'
#!/bin/bash
uv run ruff format server/ tests/
uv run ruff check --fix server/ tests/
EOF
chmod +x scripts/fix.sh
# scripts/test.sh - Run tests
cat > scripts/test.sh << 'EOF'
#!/bin/bash
uv run pytest tests/ -v
EOF
chmod +x scripts/test.sh
Common Commands
# Start development server
./scripts/watch.sh
# Format code
./scripts/fix.sh
# Run tests
./scripts/test.sh
# Install new dependency
uv add package-name
# Remove dependency
uv remove package-name
Testing Your Server
MCP Inspector (Interactive Testing)
The best way to test your MCP server interactively:
# Create testing script
cat > claude_scripts/inspect_local_mcp.sh << 'EOF'
#!/bin/bash
source .env.local
export DATABRICKS_HOST
export DATABRICKS_TOKEN
# Launch MCP Inspector
npx @modelcontextprotocol/inspector \
uv run uvicorn server.app:combined_app --port 8000
EOF
chmod +x claude_scripts/inspect_local_mcp.sh
# Run inspector
./claude_scripts/inspect_local_mcp.sh
This opens a web interface where you can:
- See all available tools and prompts
- Call tools with custom parameters
- View responses in real-time
- Debug errors interactively
curl Testing
# Create curl test script
cat > claude_scripts/test_local_mcp_curl.sh << 'EOF'
#!/bin/bash
source .env.local
export DATABRICKS_HOST
export DATABRICKS_TOKEN
# Start server in background
uv run uvicorn server.app:combined_app --port 8000 &
SERVER_PID=$!
sleep 3
echo "Testing MCP server..."
# Initialize session
echo "1. Initialize session"
curl -s -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {"name": "test-client"}
}
}' | jq '.'
# List tools
echo -e "\n2. List tools"
curl -s -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list"
}' | jq '.result.tools[] | {name: .name, description: .description}'
# Call list_clusters tool
echo -e "\n3. Call list_clusters tool"
curl -s -X POST http://localhost:8000/mcp/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "list_clusters",
"arguments": {}
}
}' | jq '.'
# Cleanup
kill $SERVER_PID
EOF
chmod +x claude_scripts/test_local_mcp_curl.sh
# Run test
./claude_scripts/test_local_mcp_curl.sh
Deploying to Databricks Apps
Step 1: Prepare for Deployment
# Create deployment script
cat > scripts/deploy.sh << 'EOF'
#!/bin/bash
set -e
echo "π Deploying to Databricks Apps..."
# Load environment
source .env.local
# Generate requirements.txt from pyproject.toml
echo "π¦ Generating requirements.txt..."
uv pip compile pyproject.toml -o requirements.txt
# Deploy via Databricks CLI
echo "π§ Deploying application..."
databricks apps deploy my-databricks-mcp \
--source-code-path . \
--config-file app.yaml
# Get app URL
echo "β
Deployment complete!"
APP_URL=$(databricks apps get my-databricks-mcp --json | jq -r '.url')
echo "π App URL: $APP_URL"
echo "π MCP Endpoint: $APP_URL/mcp/"
echo "π Logs: $APP_URL/logz"
EOF
chmod +x scripts/deploy.sh
Step 2: Deploy
# Run deployment
./scripts/deploy.sh
Step 3: Verify Deployment
# Check app status
databricks apps get my-databricks-mcp
# View logs
databricks apps logs my-databricks-mcp
Step 4: Test Remote Server
# Create remote testing script
cat > claude_scripts/test_remote_mcp_curl.sh << 'EOF'
#!/bin/bash
source .env.local
# Get app URL
APP_URL=$(databricks apps get my-databricks-mcp --json | jq -r '.url')
MCP_URL="${APP_URL}/mcp/"
echo "Testing remote MCP server at: $MCP_URL"
# Get OAuth token
TOKEN=$(databricks auth token --host "$DATABRICKS_HOST" | jq -r '.access_token')
# Initialize session
INIT_RESPONSE=$(curl -s -X POST "$MCP_URL" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {"name": "test-client"}
}
}')
echo "Initialize response:"
echo "$INIT_RESPONSE" | jq '.'
# Extract session ID
SESSION_ID=$(echo "$INIT_RESPONSE" | jq -r '.result.sessionId // empty')
# List tools
echo -e "\nListing tools:"
curl -s -X POST "$MCP_URL" \
-H "Authorization: Bearer $TOKEN" \
-H "mcp-session-id: $SESSION_ID" \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list"
}' | jq '.result.tools[] | {name: .name, description: .description}'
EOF
chmod +x claude_scripts/test_remote_mcp_curl.sh
# Test remote deployment
./claude_scripts/test_remote_mcp_curl.sh
Connecting to Claude CLI
Understanding MCP Proxy
Databricks Apps use OAuth authentication, which requires a proxy to connect with Claude CLI. The proxy handles:
- Getting OAuth tokens from Databricks CLI
- Managing MCP session initialization
- Translating stdio (Claude) ↔ HTTP/SSE (Databricks Apps)
Step 1: Create MCP Proxy
# server/proxy.py
import os
import sys
import json
import subprocess
import requests
def get_oauth_token(databricks_host):
"""Get OAuth token via Databricks CLI."""
result = subprocess.run(
['databricks', 'auth', 'token', '--host', databricks_host],
capture_output=True,
text=True,
check=True
)
return json.loads(result.stdout).get('access_token')
class MCPProxy:
def __init__(self, databricks_host, app_url):
self.databricks_host = databricks_host
self.app_url = app_url if app_url.endswith('/mcp/') else app_url + '/mcp/'
self.session = requests.Session()
self._oauth_token = None
self._session_id = None
def _get_token(self):
"""Get or refresh OAuth token."""
if not self._oauth_token:
self._oauth_token = get_oauth_token(self.databricks_host)
return self._oauth_token
    def _initialize_session(self):
        """Initialize MCP session with the server."""
        token = self._get_token()
        headers = {
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        }
        # Send initialize request; with streamable HTTP the server returns
        # the session ID in the mcp-session-id response header
        init_request = {
            'jsonrpc': '2.0',
            'id': 0,
            'method': 'initialize',
            'params': {
                'protocolVersion': '2024-11-05',
                'capabilities': {},
                'clientInfo': {'name': 'databricks-mcp-proxy'}
            }
        }
        response = self.session.post(self.app_url, headers=headers, json=init_request)
        self._session_id = response.headers.get('mcp-session-id')
def proxy_request(self, request_data):
"""Forward MCP request to server."""
if not self._session_id:
self._initialize_session()
token = self._get_token()
headers = {
'Authorization': f'Bearer {token}',
'mcp-session-id': self._session_id,
'Content-Type': 'application/json'
}
response = self.session.post(self.app_url, headers=headers, json=request_data)
return response.json()
def run(self):
"""Run proxy stdio loop."""
for line in sys.stdin:
request = json.loads(line)
response = self.proxy_request(request)
print(json.dumps(response))
sys.stdout.flush()
def main():
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--databricks-host', required=True)
parser.add_argument('--databricks-app-url', required=True)
args = parser.parse_args()
proxy = MCPProxy(args.databricks_host, args.databricks_app_url)
proxy.run()
if __name__ == '__main__':
main()
Step 2: Update pyproject.toml
Add proxy as script entry point:
[project.scripts]
dbx-mcp-proxy = "server.proxy:main"
Step 3: Add to Claude CLI
# Set environment variables
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export APP_URL=$(databricks apps get my-databricks-mcp --json | jq -r '.url')
export DATABRICKS_APP_URL="${APP_URL}/mcp/"
# Add to Claude CLI
claude mcp add my-databricks-mcp --scope user -- \
uv run dbx-mcp-proxy \
--databricks-host "$DATABRICKS_HOST" \
--databricks-app-url "$DATABRICKS_APP_URL"
Step 4: Verify Connection
# Start Claude CLI
claude
# In Claude, try:
# "List my Databricks clusters"
# Claude should use your list_clusters tool!
Troubleshooting
Issue: Authentication Failed
Symptoms: Authentication failed or 401 Unauthorized
Solutions:
- Verify environment variables:
  echo $DATABRICKS_HOST
  echo $DATABRICKS_TOKEN
- Test authentication directly:
  databricks current-user me
- Regenerate the token if it has expired
Issue: MCP Tools Not Showing
Symptoms: tools/list returns empty array
Solutions:
- Check tools are loaded by adding debug logging in server/app.py:
  load_tools(mcp_server)
  print(f"Loaded tools: {list(mcp_server._tools.keys())}")
- Verify decorator syntax:
  @mcp_server.tool  # Correct
  def my_tool():
      pass
  # Not @tool or @mcp.tool
Issue: MCP Endpoint Not Found
Symptoms: 404 Not Found on /mcp/ endpoint
Solutions:
- Ensure MCP routes come first:
  combined_app = FastAPI(
      routes=[
          *mcp_asgi_app.routes,  # MCP routes MUST come first
          *app.routes,
      ],
      lifespan=mcp_asgi_app.lifespan,
  )
- Check the URL has a trailing slash: /mcp/ not /mcp
Issue: Deployment Fails
Symptoms: databricks apps deploy fails
Solutions:
- Verify app.yaml exists and is valid
- Check the source code path is correct (. for the current directory)
- Ensure requirements.txt is generated: uv pip compile pyproject.toml -o requirements.txt
Issue: Proxy Connection Fails
Symptoms: Claude CLI can't connect to server
Solutions:
- Verify Databricks CLI authentication:
  databricks auth token --host $DATABRICKS_HOST
- Check the app URL is correct and ends with /mcp/:
  echo $DATABRICKS_APP_URL
  # Should be: https://app-name.databricksapps.com/mcp/
- Test the proxy manually:
  echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' | \
    uv run dbx-mcp-proxy \
      --databricks-host $DATABRICKS_HOST \
      --databricks-app-url $DATABRICKS_APP_URL
Next Steps
Add More Tools
Expand your MCP server with additional Databricks operations:
Job Management:
@mcp_server.tool
def run_job(job_id: str, parameters: dict = None) -> dict:
"""Trigger a Databricks job run."""
@mcp_server.tool
def get_job_run_status(run_id: str) -> dict:
"""Get the status of a job run."""
Unity Catalog:
@mcp_server.tool
def list_catalogs() -> dict:
"""List all Unity Catalog catalogs."""
@mcp_server.tool
def create_catalog(name: str, comment: str = None) -> dict:
"""Create a new Unity Catalog catalog."""
Notebook Operations:
@mcp_server.tool
def list_notebooks(path: str) -> dict:
"""List notebooks in a workspace directory."""
@mcp_server.tool
def export_notebook(path: str, format: str = "SOURCE") -> dict:
"""Export a notebook in specified format."""
Add Web UI
Create a React or Vue frontend:
# Create React app
cd client
npm create vite@latest . -- --template react-ts
npm install
# Update package.json scripts
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
}
# Start development
npm run dev
Add Tests
Create comprehensive test suite:
# tests/test_tools.py
import pytest
from unittest.mock import Mock, patch
from server.tools import load_tools
from fastmcp import FastMCP
@pytest.fixture
def mcp_server():
server = FastMCP(name="test")
load_tools(server)
return server
def test_list_clusters_success(mcp_server):
"""Test successful cluster listing."""
with patch('server.tools.get_workspace_client') as mock_client:
# Setup mock
mock_cluster = Mock()
mock_cluster.cluster_id = 'test-123'
mock_cluster.cluster_name = 'test-cluster'
mock_cluster.state.value = 'RUNNING'
mock_client.return_value.clusters.list.return_value = [mock_cluster]
# Call tool
tool = mcp_server._tools['list_clusters']
result = tool.func()
# Assertions
assert result['success'] is True
assert len(result['clusters']) == 1
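You can cover the error paths the same way. A sketch that reuses the private _tools access from the test above and assumes PermissionDenied accepts a plain message:
def test_list_clusters_permission_denied(mcp_server):
    """Tool should return a structured error when access is denied."""
    from databricks.sdk.errors import PermissionDenied

    with patch('server.tools.get_workspace_client') as mock_client:
        # Raise when the tool iterates clusters
        mock_client.return_value.clusters.list.side_effect = PermissionDenied('denied')

        tool = mcp_server._tools['list_clusters']
        result = tool.func()

        assert result['success'] is False
        assert result['error_code'] == 'PERMISSION_DENIED'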
Optimize for Production
Add Caching:
from datetime import datetime, timedelta
_cache_time = None
_cache_data = None
@mcp_server.tool
def list_clusters_cached() -> dict:
"""List clusters with 5-minute cache."""
global _cache_time, _cache_data
now = datetime.now()
if _cache_data and _cache_time and (now - _cache_time) < timedelta(minutes=5):
return _cache_data
result = list_clusters()
_cache_data = result
_cache_time = now
return result
Add Logging:
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@mcp_server.tool
def list_clusters() -> dict:
"""List clusters with logging."""
logger.info("Listing clusters...")
try:
        result = ...  # the cluster-listing logic from Step 3
logger.info(f"Found {len(result['clusters'])} clusters")
return result
except Exception as e:
logger.error(f"Failed to list clusters: {e}")
raise
Resources
Official Documentation
- MCP Protocol: https://modelcontextprotocol.io/
- Databricks SDK: https://databricks-sdk-py.readthedocs.io/
- FastAPI: https://fastapi.tiangolo.com/
- FastMCP: https://github.com/jlowin/fastmcp
- Databricks Apps: https://docs.databricks.com/en/dev-tools/databricks-apps/
Example Projects
- Databricks MCP Examples: https://github.com/databricks/databricks-mcp-examples
- Custom MCP Template: Based on the projects in this guide
Community
- MCP Discord: https://discord.gg/mcp
- Databricks Community: https://community.databricks.com/
Tools
- MCP Inspector: npx @modelcontextprotocol/inspector
- Databricks CLI: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
- uv Package Manager: https://github.com/astral-sh/uv
Conclusion
You've now built a production-ready MCP server for Databricks! You can:
- ✅ Create tools that call Databricks APIs
- ✅ Load prompts from markdown files
- ✅ Deploy to Databricks Apps
- ✅ Connect to Claude CLI with OAuth authentication
Next Steps:
- Add more tools for your specific use cases
- Create custom prompts for your workflows
- Build a web UI for non-AI users
- Share your MCP server with your team
Happy building! 🎉