📦

Find Kusto Table MCP

A mcp server for finding the right kusto table when you have thousands of tables across multiple clusters

0 installs

Trust: 51 — Fair

Devtools

Installation

npx find-kusto-table-mcp

Ask AI about Find Kusto Table MCP

I know everything about Find Kusto Table MCP. Ask me about installation, configuration, usage, or troubleshooting.

0/500

Loading tools...

Reviews

Documentation

🚀 Kusto Table Search MCP Server

An experimental Model Context Protocol (MCP) server for Kusto table discovery and query building. Built with FastMCP.

🌟 What Does It Do?

This is an experimental MCP server for AI agents working with Azure Data Explorer (Kusto):

Features

🔍 Table Search: Find tables using natural language queries
🛡️ Schema Validation: Helps prevent AI hallucination by sampling actual table schemas
📓 Jupyter Notebook Integration: Transparent, repeatable investigations with Kqlmagic
⚡ Caching: Lazy loading and LRU caching for performance
🔧 Azure SDK Integration: Connects to real Kusto clusters with Azure authentication
� Interactive Visualizations: Charts and dashboards directly in notebooks
🎯 Query Helpers: Anti-pattern detection and optimization suggestions

🎯 Use Cases

Helping AI agents build Kusto queries using actual table schemas
Exploring large Kusto environments with many tables
Creating transparent, repeatable investigation notebooks
Interactive data analysis with Python/KQL integration
Learning and experimenting with Kusto/KQL

🚀 Quick Start

One-Click Installation for VS Code

Prerequisites

Python 3.9 or higher
Azure credentials with Kusto access (Managed Identity, Azure CLI, or Device Auth)
pipx for isolated installation (recommended)

Installation

Option 1: Using pipx (Recommended)

# Install pipx if you don't have it
python -m pip install --user pipx
python -m pipx ensurepath

# Install the MCP server
pipx install find-kusto-table-mcp

# Run the server
find-kusto-table-mcp

Option 2: Using pip

pip install find-kusto-table-mcp
python -m kusto_mcp

Option 3: From Source (Development)

# Clone the repository
git clone https://github.com/AmeliaRose802/find-kusto-table-mcp.git
cd find-kusto-table-mcp

# Install dependencies
pip install -r requirements.txt

# Setup connection strings (copy from example)
cp connection_strings.json.example connection_strings.json
# Edit connection_strings.json with your cluster details

# Run the server
python kusto_server.py

Configuration for Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "kusto-table-search": {
      "command": "python",
      "args": ["/path/to/cache-kusto-info/kusto_server.py"],
      "env": {}
    }
  }
}

Using with FastMCP Client

from fastmcp import FastMCP
from fastmcp.client import Client
import kusto_server

async with Client(kusto_server.mcp) as client:
    # Search for tables
    result = await client.call_tool(
        "search_kusto_tables",
        arguments={"query": "wireserver requests", "limit": 5}
    )
    print(result)

🎨 Key Features Explained

1. Real Kusto Connectivity (NEW!)

Connect to Azure Data Explorer using the official Azure SDK with automatic authentication fallback:

# Automatic authentication chain:
# 1. Managed Identity (for Azure-hosted services)
# 2. Azure CLI (for local development)  
# 3. Device Auth (interactive fallback)

### Configuration

```bash
# Configure in cache/connection_strings.json
{
    "clusters": {
        "mycluster": {
            "cluster_url": "https://mycluster.westus.kusto.windows.net",
            "use_real_client": true  # Enable Azure SDK
        }
    }
}

# Built-in resilience:
# - Circuit breaker (protects against cascading failures)
# - Exponential backoff retry (3 attempts with jitter)
# - Bulkhead isolation (20 concurrent queries max)
# - Rate limiting (token bucket algorithm)

2. Advanced Analytics Engine (NEW!)

Perform sophisticated analysis on cached query results without polluting context:

# Statistical Analysis
stats = query_handle_statistical_analysis(
    handle="qh_abc123",
    columns=["ResponseTime", "RequestCount"]
)
# Returns: mean, median, stddev, P50/P75/P90/P95/P99, skewness, kurtosis

# Correlation Detection
correlations = query_handle_correlation_analysis(
    handle="qh_abc123",
    numeric_columns=["CPU", "Memory", "Latency"]
)
# Returns: Pearson correlation matrix

# Anomaly Detection (3 methods)
anomalies = query_handle_outlier_detection(
    handle="qh_abc123",
    column="RequestLatency",
    method="iqr",  # or "zscore" or "modified_zscore"
    threshold=1.5
)
# Returns: outlier values, indices, statistics

# Time Series Analysis
trends = query_handle_time_series_analysis(
    handle="qh_abc123",
    time_column="Timestamp",
    value_column="ErrorRate",
    window_size=10
)
# Returns: trend (increasing/decreasing/stable), rate_of_change, moving_averages

3. Query Optimization (NEW!)

Intelligent query analysis with anti-pattern detection and automatic optimization:

# Comprehensive Query Analysis
analysis = query_analyze_optimization(
    query="""
    MyTable 
    | where EventTime > ago(7d)
    | extend ComputedField = tostring(RawData)
    | order by EventTime desc
    | take 100
    """
)
# Returns:
# - Anti-patterns detected (no time filter, SELECT *, inefficient ordering, etc.)
# - Complexity score (0-100+)
# - Estimated cost (low/medium/high/very high)
# - Optimization suggestions
# - Optimized query

# Index Recommendations
indexes = query_suggest_indexes(
    query="MyTable | where UserId == 'user123' and EventTime > ago(1h)",
    table_name="MyTable"
)
# Returns: recommended indexes (range, datetime, hash) with justification

# Query Comparison
comparison = query_compare_performance(
    query1="MyTable | where EventTime > ago(1d) | take 1000",
    query2="MyTable | where EventTime > ago(1d) | summarize count()"
)
# Returns: side-by-side complexity, cost, anti-patterns, recommendation

4. Jupyter Notebook Integration (NEW!)

Create transparent, repeatable investigations using Kqlmagic:

# Create an investigation notebook
create_investigation_notebook(
    name="my_investigation",
    title="Production Issue Analysis",
    description="Investigating errors in ApplicationLogs",
    pattern="investigation"  # or monitoring, troubleshooting, exploration
)

# Add Kqlmagic connection
add_kusto_connection(
    notebook="my_investigation",
    cluster="admeus",
    database="AdmeusDB",
    auth_method="azcli"  # Azure CLI authentication
)

# Add KQL query cells
add_query_to_notebook(
    notebook="my_investigation",
    title="Error Rate Analysis",
    query="ApplicationLogs | where Severity >= 3 | summarize count() by bin(EventTime, 5m)",
    description="Track error rates over time",
    add_python_analysis=True  # Adds pandas analysis cell
)

# Add visualization
add_visualization_to_notebook(
    notebook="my_investigation",
    title="Error Distribution",
    query="ApplicationLogs | summarize count() by ErrorType | render piechart"
)

# Add documentation
add_markdown_to_notebook(
    notebook="my_investigation",
    title="Investigation Summary",
    content="## Findings\n\n- Error rate spiked at 3pm...\n- Root cause: ...\n"
)

Benefits:

✅ Transparent: All queries visible in cells
✅ Repeatable: Re-run entire notebook anytime
✅ Shareable: Export .ipynb and share with team
✅ Interactive: Mix KQL, Python, visualizations
✅ Version-controlled: Commit to git

5. Smart Table Search with Lazy Caching

Traditional approach: Load all schemas upfront (slow, memory-intensive) Our approach: Load schemas on-demand as you search (fast, efficient)

# Search is instant - schemas loaded only when needed
search_kusto_tables(
    query="node health monitoring tables",
    method="hybrid",  # Combines keyword + fuzzy + semantic
    limit=10
)

6. Anti-Hallucination Query Building

AI agents love to invent column names. We prevent this:

# Step 1: Sample table to get ACTUAL column names
schema = sample_table_for_query_building(
    cluster="admeus",
    database="AdmeusDB", 
    table="NodeHealthEvents"
)

# Step 2: Build query using ONLY the real column names
# ✅ Uses: EventTime, NodeName, Status (from actual schema)
# ❌ Never invents: Timestamp, Node, State

7. Investigation Workflows

List and discover investigation notebooks:

# List all investigation notebooks
list_investigation_notebooks()
# Returns: List of all notebooks with metadata (pattern, queries, creation date)

# Open notebook in Jupyter
# notebooks/my_investigation.ipynb

📚 Tool Reference

Core Search Tools

Tool	Purpose	Key Parameters
`search_kusto_tables`	Find tables using natural language	`query`, `method`, `limit`
`get_table_details`	Get comprehensive table info	`cluster`, `database`, `table`
`sample_table_for_query_building`	Anti-hallucination: Get real schema	`cluster`, `database`, `table`

Query Handle Tools

Tool	Purpose	Key Parameters
`execute_query_with_handle`	Execute query, return handle	`query`, `limit`
`query_handle_analyze`	Analyze cached results	`handle`, `operation`, `column`
`query_handle_list`	List all active handles	`include_expired`
`query_handle_validate`	Check if handle is valid	`handle`
`query_handle_get_sample`	Get sample rows from handle	`handle`, `num_rows`

Advanced Analytics Tools (NEW!)

Tool	Purpose	Key Parameters
`query_handle_statistical_analysis`	Statistical analysis on cached results	`handle`, `columns`
`query_handle_correlation_analysis`	Detect correlations between columns	`handle`, `numeric_columns`
`query_handle_outlier_detection`	Anomaly detection (IQR/Z-score)	`handle`, `column`, `method`, `threshold`
`query_handle_time_series_analysis`	Trend detection and time-series analysis	`handle`, `time_column`, `value_column`, `window_size`

Query Optimization Tools (NEW!)

Tool	Purpose	Key Parameters
`query_analyze_optimization`	Comprehensive query analysis	`query`
`query_suggest_indexes`	Index recommendations	`query`, `table_name`
`query_compare_performance`	Side-by-side query comparison	`query1`, `query2`

Template Tools

Tool	Purpose	Key Parameters
`template_create`	Create reusable query template	`name`, `query`, `parameters`
`template_list`	List all templates (10 production templates included)	`tags`, `search`
`template_render`	Generate query from template	`name`, `parameters`
`template_get`	Get template details	`name`
`template_delete`	Delete custom template	`name`

Cache & Performance Tools

Tool	Purpose	Key Parameters
`cache_stats`	Get comprehensive cache statistics	-
`cache_clear`	Clear expired cache entries	-
`performance_stats`	Get performance metrics	`operation` (optional)

🛡️ Anti-Hallucination Best Practices

✅ DO: Always Sample Before Querying

# CORRECT WORKFLOW
1. search_kusto_tables("error logs")
2. sample_table_for_query_building(cluster, db, table)
3. Build query using EXACT column names from step 2
4. execute_query_with_handle(query)

❌ DON'T: Assume Column Names

# WRONG - Assumes "Timestamp" exists
query = "ErrorLogs | where Timestamp > ago(1h)"

# RIGHT - Uses actual column from schema
schema = sample_table_for_query_building(...)
time_col = schema["schema"]["primary_time_column"]  # e.g., "EventTime"
query = f"ErrorLogs | where {time_col} > ago(1h)"

� Running the Server

Local Development (Default)

# Run with STDIO transport (for Claude Desktop)
python kusto_server.py

Alternative Transports

# Run with HTTP transport
fastmcp run kusto_server.py --transport http --port 8000

# Run with SSE transport
fastmcp run kusto_server.py --transport sse

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/test_fastmcp_server.py -v

# Run integration tests only
pytest tests/test_fastmcp_server.py::TestKustoServerIntegration -v

# Run resilience pattern tests
pytest tests/test_resilience.py -v

# Run real Kusto client tests
pytest tests/test_kusto_client.py -v

🆕 Recent Enhancements

See OVERNIGHT_ENHANCEMENTS_V3.md for comprehensive documentation of recent production-ready improvements:

1. Real Kusto Client Integration

Native Azure SDK connectivity (azure-kusto-data 4.0+)
Multi-auth support: Managed Identity → Azure CLI → Device Auth
Built-in resilience: circuit breakers, retry, bulkheads
Impact: Production-ready Kusto connectivity with enterprise reliability

2. Advanced Analytics Engine

Statistical analysis: mean, median, stddev, percentiles (P50-P99)
Correlation detection: Pearson correlation matrices
Anomaly detection: IQR, Z-score, Modified Z-score methods
Time-series analysis: trend detection, rate of change, moving averages
Data quality scoring: completeness, consistency, validity (0-100 score)
Impact: Perform sophisticated analysis without context pollution

3. Query Optimization Analyzer

Anti-pattern detection: 10+ patterns (no time filter, SELECT *, inefficient ordering)
Complexity scoring: 0-100+ with cost estimation
Automatic optimization: query rewriting with best practices
Index recommendations: range, datetime, hash indexes
Side-by-side comparison: compare alternative queries
Impact: Write better queries, reduce costs, improve performance

4. Enhanced Error Recovery System

Circuit breaker: state machine with auto-recovery (5 failure threshold, 60s timeout)
Exponential backoff retry: 3 attempts with jitter (1s → 2s → 4s)
Bulkhead isolation: 20 concurrent query limit
Fallback handlers: graceful degradation
Rate limiting: token bucket algorithm
Impact: Production-grade fault tolerance

5. Query Result Caching with Persistence

Disk-backed storage: survives server restarts
LRU eviction: memory limit 1000 handles
Lazy loading: handles loaded only when accessed
Integrity verification: pickle serialization with validation
Cache statistics: hit rate, disk I/O, memory usage
Impact: Efficient memory usage, reliable caching

6. Advanced Query Template Library

10 production-ready templates: error analysis, performance, security, cost
Comprehensive documentation: parameters, tags, execution times
Common scenarios: monitoring, SLA tracking, threat detection, FinOps
Impact: Accelerate common analysis patterns

Total Additions: ~3000+ lines of production code, 7 new files, 7 new MCP tools, comprehensive test coverage

📊 Performance Characteristics

Table Search: <100ms for cached queries, <500ms for first search
Schema Cache: <10ms for cached lookups, <2s for fresh fetch
Query Handles: <50ms for analytics operations, persistent across restarts
Memory: <200MB typical, <500MB with large caches, LRU eviction at 1000 handles
Disk I/O: Lazy loading, handles loaded only when accessed
Resilience: Circuit breaker opens after 5 failures, 60s timeout, auto-recovery
Concurrency: Bulkhead limits 20 concurrent Kusto queries, prevents resource exhaustion
Retry: Exponential backoff (1s → 2s → 4s) with jitter, 3 max attempts

🏗️ Architecture

kusto_server.py (FastMCP server - 20+ tools)
├── @mcp.tool decorators (20+ tools including analytics & optimization)
├── @mcp.resource (2 help resources)
└── @mcp.prompt (2 reusable prompts)

src/
├── core/                              # Infrastructure
│   ├── logging_config.py              # Structured logging
│   ├── config.py                      # Configuration management
│   ├── performance.py                 # Performance monitoring
│   ├── exceptions.py                  # Custom exceptions
│   └── resilience.py                  # NEW: Circuit breakers, retry, bulkheads, rate limiting
├── services/                          # Business logic
│   ├── schema_cache_service.py        # Lazy schema caching
│   ├── query_handle_service.py        # NEW: Disk-backed result caching with LRU
│   ├── query_template_service.py      # Template management (10 production templates)
│   ├── query_templates.json           # NEW: Production query template library
│   ├── kql_query_builder_service.py   # KQL query generation
│   ├── analytics_engine.py            # NEW: Statistical analysis, correlation, anomalies
│   └── query_optimizer.py             # NEW: Anti-pattern detection, query optimization
├── tools/                             # High-level tools
│   ├── table_discovery.py             # Table search orchestration
│   └── query_sampler.py               # Anti-hallucination sampling
└── utils/                             # Utilities
    ├── anti_hallucination.py          # Schema validation
    ├── kusto_client.py                # NEW: Real Azure SDK + Mock clients with resilience
    └── helpers.py                     # Helper functions

tests/                                 # Comprehensive test suite
├── test_fastmcp_server.py             # Server integration tests
├── test_resilience.py                 # NEW: Circuit breaker, retry, bulkhead tests
├── test_kusto_client.py               # NEW: Real & mock client tests
├── test_ai_query_builder.py           # NEW: AI query building tests
├── test_query_handle_service.py       # Query handle tests
└── test_schema_cache_service.py       # Schema cache tests

🤝 Contributing

Contributions are welcome! Please see DEVELOPMENT_GUIDE.md for development practices and OVERNIGHT_ENHANCEMENTS_V3.md for recent feature additions.

📝 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built with FastMCP by @jlowin
Powered by Azure SDK for Python
Inspired by the need for trustworthy AI-powered query building
Special thanks to the Azure Data Explorer team

🎉 Latest Update: Major production-ready enhancements including real Azure SDK integration, advanced analytics, query optimization, enterprise resilience patterns, persistent caching, and 10 production query templates. See OVERNIGHT_ENHANCEMENTS_V3.md for details.

📧 Support

📖 Documentation
🐛 Issue Tracker
💬 Discussions

Legacy Files (Pre-FastMCP Migration)

The following files are from the previous MCP SDK implementation and are kept for reference:

mcp_table_search_server.py - Original MCP SDK server
enhanced_mcp_server.py - Previous enhanced server (now replaced by kusto_server.py)

Use kusto_server.py for all new development.