Langfuse MCP Better

A Model Context Protocol (MCP) server for Langfuse, enabling AI agents to query Langfuse trace data for enhanced debugging and observability

Langfuse MCP Better (Model Context Protocol)

An enhanced Model Context Protocol (MCP) server for Langfuse with powerful training data extraction capabilities. This fork adds specialized tools for extracting LLM training data from LangGraph applications, supporting fine-tuning and reinforcement learning workflows.

What's New in Better?

🎯 Training Data Extraction: Extract LLM interactions filtered by LangGraph node hierarchy
🔄 Multiple Output Formats: OpenAI, Anthropic, generic prompt/completion, and DPO formats
🎨 Smart Filtering: Filter by node name, node path, model, and time range
📊 Rich Metadata: Token usage, model parameters, timestamps, and node information
🚀 Production Ready: Full test coverage and comprehensive documentation

Based on the excellent langfuse-mcp by Aviv Sinai.

Quick Start

Installation

Install via pip or uvx:

# Using pip
pip install langfuse-mcp-better

# Using uvx (recommended)
uvx langfuse-mcp-better --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com

Cursor IDE Integration

For Cursor IDE, add the following server entry to your MCP configuration (replace the placeholders with your credentials):

{
  "mcpServers": {
    "langfuse-better": {
      "command": "uvx",
      "args": ["langfuse-mcp-better", "--public-key", "YOUR_KEY", "--secret-key", "YOUR_SECRET", "--host", "https://cloud.langfuse.com"]
    }
  }
}

💡 Note: For Cursor IDE, configuring the server manually in .cursor/mcp.json is the most reliable approach. See the Configuration with MCP clients section below for details.

Features

Integration with Langfuse for trace and observation data
Tool suite for AI agents to query trace data
Exception and error tracking capabilities
Session and user activity monitoring
Training data extraction for fine-tuning and reinforcement learning
- LangGraph node hierarchy filtering
- Multiple output formats (OpenAI, Anthropic, generic, DPO)
- Rich metadata including token usage and model parameters

Available Tools

The MCP server provides the following tools for AI agents:

Core Tools

fetch_traces - Find traces based on criteria like user ID, session ID, etc.
fetch_trace - Get a specific trace by ID
fetch_observations - Get observations filtered by type
fetch_observation - Get a specific observation by ID
fetch_sessions - List sessions in the current project
get_session_details - Get detailed information about a session
get_user_sessions - Get all sessions for a user

Exception & Error Tools

find_exceptions - Find exceptions and errors in traces
find_exceptions_in_file - Find exceptions in a specific file
get_exception_details - Get detailed information about an exception
get_error_count - Get the count of errors

Training Data Tools

fetch_llm_training_data - [NEW] Extract LLM training data from LangGraph nodes for fine-tuning and reinforcement learning. Supports multiple output formats (OpenAI, Anthropic, generic, DPO) and filtering by node hierarchy.

Utility Tools

get_data_schema - Get schema information for the data structures
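
For orientation, MCP clients invoke these tools through JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for fetch_llm_training_data, using argument names that this README documents in the training data section. The helper name `build_tool_call` and the request id are illustrative assumptions; an MCP client library would normally construct this message for you.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical request: last 24 hours of calls from the "agent_llm" node
request = build_tool_call(
    1,
    "fetch_llm_training_data",
    {"age": 1440, "langgraph_node": "agent_llm", "output_format": "openai"},
)
print(request)
```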

Setup

Install `uv`

First, make sure uv is installed. For installation instructions, see the uv installation docs.

If you already have an older version of uv installed, you might need to update it with uv self update.

Installation from PyPI

Requirement: The server depends on the Langfuse Python SDK v3. Installations automatically pull langfuse>=3.0.0 and require Python 3.10–3.13.

# Using pip
pip install langfuse-mcp-better

# Using uv
uv pip install langfuse-mcp-better

Development Installation

If you're iterating on this repository, install the local checkout:

# from the repo root
uv pip install --editable .

Recommended local environment

For development we suggest creating an isolated environment pinned to Python 3.11 (the version used in CI):

uv venv --python 3.11 .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
uv pip install --python .venv/bin/python -e .

All subsequent examples assume the virtual environment is activated.

Obtain Langfuse credentials

You'll need your Langfuse credentials:

Public key
Secret key
Host URL (usually https://cloud.langfuse.com or your self-hosted URL)

You can store these in a local .env file instead of passing CLI flags each time:

LANGFUSE_PUBLIC_KEY=your_public_key
LANGFUSE_SECRET_KEY=your_secret_key
LANGFUSE_HOST=https://cloud.langfuse.com

When present, the MCP server reads these values automatically. CLI arguments still override the environment if provided.
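
The precedence described above (CLI flags first, then environment values) can be pictured with the following minimal sketch. It is not the server's actual startup code: the function name `resolve_credentials` is hypothetical, and the fallback to the Cloud host is an assumption based on the documented default endpoint.

```python
import argparse
from typing import Dict, List, Optional

DEFAULT_HOST = "https://cloud.langfuse.com"  # assumed default endpoint


def resolve_credentials(argv: List[str], env: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return credentials, letting CLI flags override environment values."""
    parser = argparse.ArgumentParser(prog="langfuse-mcp-better")
    parser.add_argument("--public-key")
    parser.add_argument("--secret-key")
    parser.add_argument("--host")
    args = parser.parse_args(argv)
    return {
        "public_key": args.public_key or env.get("LANGFUSE_PUBLIC_KEY"),
        "secret_key": args.secret_key or env.get("LANGFUSE_SECRET_KEY"),
        "host": args.host or env.get("LANGFUSE_HOST") or DEFAULT_HOST,
    }


# The environment supplies both keys; the CLI overrides only the public key
env = {"LANGFUSE_PUBLIC_KEY": "pk-env", "LANGFUSE_SECRET_KEY": "sk-env"}
creds = resolve_credentials(["--public-key", "pk-cli"], env)
print(creds)
```

Passing `os.environ` as `env` would apply the same logic to the real process environment.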

Running the Server

Run the server using uvx or the installed command:

# Using uvx (no installation needed)
uvx langfuse-mcp-better --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com

# Using the installed command
langfuse-mcp-better --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com

# Backward compatible command also available
langfuse-mcp --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com

Local checkout tip: During development run uv run python -m langfuse_mcp ... to execute the code in your working tree.

The server writes diagnostic logs to /tmp/langfuse_mcp.log. Remove the --host switch if you are targeting the default Cloud endpoint. Use --log-level (e.g., --log-level DEBUG) and --log-to-console to control verbosity during debugging.

Run with Docker

Option 1: Pull from GitHub Container Registry (Recommended)

Pull and run the pre-built image:

docker pull ghcr.io/avivsinai/langfuse-mcp:latest
docker run --rm -i \
  -e LANGFUSE_PUBLIC_KEY=YOUR_PUBLIC_KEY \
  -e LANGFUSE_SECRET_KEY=YOUR_SECRET_KEY \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  -e LANGFUSE_MCP_LOG_FILE=/logs/langfuse_mcp.log \
  -v "$(pwd)/logs:/logs" \
  ghcr.io/avivsinai/langfuse-mcp:latest

Available tags:

latest - Most recent release
v0.2.0 - Specific version
0.2 - Major.minor version

Option 2: Build from source

Build the image from the repository root so the container installs the current checkout instead of the latest PyPI release:

docker build -t langfuse-logs-mcp .
docker run --rm -i \
  -e LANGFUSE_PUBLIC_KEY=YOUR_PUBLIC_KEY \
  -e LANGFUSE_SECRET_KEY=YOUR_SECRET_KEY \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  -e LANGFUSE_MCP_LOG_FILE=/logs/langfuse_mcp.log \
  -v "$(pwd)/logs:/logs" \
  langfuse-logs-mcp

Why no -t? Allocating a pseudo-TTY can interfere with MCP stdio clients. Use -i only so the server communicates over plain stdin/stdout.

The Dockerfile copies the local source tree and installs it with pip install ., so the container always runs your latest commits - a must while testing features that have not shipped on PyPI.

Configuration with MCP clients

Configure for Cursor

Create a .cursor/mcp.json file in your project root:

{
  "mcpServers": {
    "langfuse-better": {
      "command": "uvx",
      "args": ["langfuse-mcp-better", "--public-key", "YOUR_KEY", "--secret-key", "YOUR_SECRET", "--host", "https://cloud.langfuse.com"]
    }
  }
}

Configure for Claude Desktop

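Add the server to the mcpServers section of your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "langfuse-better": {
      "command": "uvx",
      "args": ["langfuse-mcp-better"],
      "type": "stdio",
      "env": {
        "LANGFUSE_PUBLIC_KEY": "YOUR_KEY",
        "LANGFUSE_SECRET_KEY": "YOUR_SECRET",
        "LANGFUSE_HOST": "https://cloud.langfuse.com"
      }
    }
  }
}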

Output Modes

Each tool supports different output modes to control the level of detail in responses:

compact (default): Returns a summary with large values truncated
full_json_string: Returns the complete data as a JSON string
full_json_file: Saves the complete data to a file and returns a summary with file information
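
To make the difference between the three modes concrete, here is a minimal sketch of how a response could be shaped for each one. It is an illustration only: the function `render`, the truncation threshold, the dump file name, and the summary keys are all assumptions, not the server's real implementation.

```python
import json
import os
import tempfile
from typing import Optional

MAX_CHARS = 50  # hypothetical truncation threshold for compact mode


def _truncate(value):
    """Shorten long strings; leave other values untouched."""
    if isinstance(value, str) and len(value) > MAX_CHARS:
        return value[:MAX_CHARS] + "..."
    return value


def render(data: dict, output_mode: str = "compact", dump_dir: Optional[str] = None):
    """Shape a tool response according to the requested output mode."""
    if output_mode == "compact":
        return {key: _truncate(value) for key, value in data.items()}
    if output_mode == "full_json_string":
        return json.dumps(data)
    if output_mode == "full_json_file":
        dump_dir = dump_dir or tempfile.mkdtemp()
        path = os.path.join(dump_dir, "dump.json")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        return {"file_path": path, "keys": sorted(data)}
    raise ValueError(f"unknown output_mode: {output_mode}")


trace = {"id": "t1", "output": "x" * 200}
compact = render(trace)
full = render(trace, "full_json_string")
info = render(trace, "full_json_file")
print(compact["output"], info["file_path"])
```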

Using the Training Data Tool

The fetch_llm_training_data tool is specifically designed for extracting training data from LangGraph applications. It provides powerful filtering and formatting capabilities for machine learning workflows.

Key Features

🚀 Automatic Pagination & Time Segmentation:
- Request any amount of data (1000, 10000+) - pagination handled automatically
- Query any time range (30 days, 60 days, 90+ days) - automatically splits into 7-day segments
- No API limits or time restrictions exposed to users
🔍 Smart Filtering:
- ls_model_name: Partial matching (case-insensitive) - "Qwen3_235B" matches all variants
- langgraph_node and agent_name: Exact matching for precision
- At least one filter required
Multiple Output Formats: Support for OpenAI, Anthropic, generic, and DPO formats
Rich Metadata: Includes token usage, model parameters, timestamps, and node information
Flexible Combinations: Combine multiple filters for precise data extraction
Transparent: Shows pages_fetched, time_segments_processed, and total_raw_observations in metadata
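
The matching rules above can be sketched as a small predicate over an observation's metadata. This is a hedged illustration rather than the tool's code: `matches_filters` is a hypothetical name, and "partial matching" is assumed here to mean a case-insensitive substring test.

```python
from typing import Optional


def matches_filters(
    metadata: dict,
    langgraph_node: Optional[str] = None,
    agent_name: Optional[str] = None,
    ls_model_name: Optional[str] = None,
) -> bool:
    """Apply exact matching to node/agent and partial matching to the model name."""
    if langgraph_node is None and agent_name is None and ls_model_name is None:
        raise ValueError("at least one filter is required")
    if langgraph_node is not None and metadata.get("langgraph_node") != langgraph_node:
        return False
    if agent_name is not None and metadata.get("agent_name") != agent_name:
        return False
    if ls_model_name is not None:
        actual = str(metadata.get("ls_model_name", ""))
        # Assumed semantics: case-insensitive substring match
        if ls_model_name.lower() not in actual.lower():
            return False
    return True


meta = {
    "langgraph_node": "agent_llm",
    "agent_name": "supervisor",
    "ls_model_name": "Qwen3_235B_A22B_Instruct_2507",
}
print(matches_filters(meta, ls_model_name="qwen3_235b"))  # partial, case-insensitive
print(matches_filters(meta, langgraph_node="agent"))      # exact match required
```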

Output Formats

OpenAI Format (`output_format="openai"`)

Perfect for OpenAI fine-tuning:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI is artificial intelligence..."}
  ],
  "metadata": {
    "model": "gpt-4",
    "usage": {"total_tokens": 150},
    "langgraph_node": "llm_call",
    "agent_name": "supervisor",
    "ls_model_name": "gpt-4-turbo"
  }
}

Anthropic Format (`output_format="anthropic"`)

Optimized for Claude fine-tuning:

{
  "system": "You are a helpful assistant",
  "messages": [
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI is artificial intelligence..."}
  ],
  "metadata": {...}
}

Generic Format (`output_format="generic"`)

Simple prompt/completion pairs:

{
  "prompt": "What is AI?",
  "completion": "AI is artificial intelligence...",
  "metadata": {...}
}

DPO Format (`output_format="dpo"`)

For Direct Preference Optimization:

{
  "prompt": "What is AI?",
  "chosen": "AI is artificial intelligence...",
  "rejected": null,
  "metadata": {
    "_note": "rejected field is null - add negative samples for DPO training"
  }
}
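
The four shapes above can all be derived from the same chat message list. The sketch below shows one way to do that conversion; `to_training_sample` is a hypothetical helper, and the choice to build the prompt from non-system messages only is an assumption inferred from the examples above.

```python
FORMATS = ("openai", "anthropic", "generic", "dpo")


def to_training_sample(messages: list, output_format: str = "openai") -> dict:
    """Convert chat messages (ending with the assistant reply) to one sample."""
    if output_format not in FORMATS:
        raise ValueError(f"unknown output_format: {output_format}")
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    if output_format == "openai":
        return {"messages": list(messages)}
    if output_format == "anthropic":
        return {"system": "\n".join(system_parts), "messages": turns}
    # Prompt/completion styles: everything before the final reply is the prompt
    prompt = "\n".join(m["content"] for m in turns[:-1])
    completion = turns[-1]["content"]
    if output_format == "generic":
        return {"prompt": prompt, "completion": completion}
    # DPO: the observed reply is "chosen"; a rejected sample must be added later
    return {"prompt": prompt, "chosen": completion, "rejected": None}


chat = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI is artificial intelligence..."},
]
for name in FORMATS:
    print(name, to_training_sample(chat, name))
```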

Automatic Pagination & Time Segmentation

The tool handles both pagination and long time ranges automatically, so you do not have to work around per-request page sizes or the 7-day query window yourself:

# Request 5000 samples from last 30 days - no problem!
fetch_llm_training_data(
    age=43200,  # 30 days (exceeds 7-day API limit)
    ls_model_name="gpt-4-turbo",
    limit=5000,  # Automatically fetches across multiple API calls
    output_format="openai"
)

# The tool will:
# 1. Split 30 days into 5 time segments (four 7-day segments plus a 2-day remainder)
# 2. For each segment, paginate through API calls (100 items each)
# 3. Aggregate and return all samples across all segments
# 4. Show metadata: time_segments_processed=5, pages_fetched=50, total_raw_observations=5000

Time Segmentation Details:

Queries > 7 days are automatically split into 7-day segments
Each segment is processed with pagination
Works seamlessly with any time range (30 days, 60 days, 90+ days)
You never see API time limit errors!
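
The segmentation step can be sketched as follows. This is a minimal illustration of splitting a window into chunks of at most 7 days, not the server's implementation: `split_time_range` is a hypothetical name, and walking forward from the start of the window is an assumption.

```python
from datetime import datetime, timedelta, timezone

SEGMENT = timedelta(days=7)  # maximum window per API query, per the text above


def split_time_range(start: datetime, end: datetime, segment: timedelta = SEGMENT) -> list:
    """Split [start, end) into consecutive windows no longer than `segment`."""
    if end <= start:
        raise ValueError("end must be after start")
    segments = []
    cursor = start
    while cursor < end:
        upper = min(cursor + segment, end)
        segments.append((cursor, upper))
        cursor = upper
    return segments


end = datetime(2025, 1, 31, tzinfo=timezone.utc)
start = end - timedelta(minutes=43200)  # age=43200 minutes, i.e. 30 days
segments = split_time_range(start, end)
print(len(segments))  # 5
```

Each resulting window would then be paginated independently, and the per-window results concatenated until the requested limit is reached.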

Usage Examples

Extract all LLM calls from a specific LangGraph node

# Get 1000 LLM interactions from the "agent_llm" node in the last 24 hours
fetch_llm_training_data(
    age=1440,  # 24 hours in minutes
    langgraph_node="agent_llm",
    limit=1000,  # Default: will auto-paginate if needed
    output_format="openai"
)

Filter by agent name

# Get 5000 LLM calls from the "supervisor" agent in the last week
fetch_llm_training_data(
    age=10080,  # 7 days
    agent_name="supervisor",
    limit=5000,  # Automatically handles pagination
    output_format="generic"
)

Filter by model name (partial matching)

# Extract 10,000 Qwen model calls using partial name (30 days automatically segmented)
# "Qwen3_235B" will match all variants like:
#   - Qwen3_235B_A22B_Instruct_2507
#   - Qwen3_235B_A22B_Instruct_2507_ShenZhen
#   - Qwen3_235B_A22B_Instruct_2507_Beijing
fetch_llm_training_data(
    age=43200,  # 30 days (automatically split into 5 time segments)
    ls_model_name="Qwen3_235B",  # Partial name - matches all variants!
    limit=10000,  # Large scale - automatically paginated and segmented
    output_format="openai"
    # include_metadata=False by default - pure training data
)

Combine multiple filters

# Extract data with specific node and model combination
fetch_llm_training_data(
    age=10080,
    langgraph_node="reasoning_node",
    ls_model_name="gpt-4-turbo",
    output_format="openai"
)

Save complete data to file

# Extract data and save to file for offline processing
fetch_llm_training_data(
    age=10080,
    agent_name="supervisor",
    output_format="openai",
    output_mode="full_json_file"  # Saves to configured dump directory
)

LangGraph Integration

The tool expects LangGraph applications to include specific metadata in their observations:

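# In your LangGraph application, add metadata to track nodes
from langfuse import Langfuse

langfuse = Langfuse()

# Placeholder input and output; in practice these come from your LLM call
messages = [{"role": "user", "content": "What is AI?"}]
response = "AI is artificial intelligence..."

# When creating observations, include the required metadata fields.
# Note: langfuse.generation() is the SDK v2-style call; on SDK v3 the
# corresponding entry point is e.g. langfuse.start_generation(...).
generation = langfuse.generation(
    name="llm_call",
    input=messages,
    output=response,
    metadata={
        "langgraph_node": "reasoning_node",      # Required for filtering by node
        "agent_name": "supervisor",              # Required for filtering by agent
        "ls_model_name": "gpt-4-turbo"          # Required for filtering by model
    }
)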

Metadata Fields (Optional)

By default (include_metadata=False), only training data is returned - pure messages/prompts without metadata. This is what you want for model training.

Set include_metadata=True only when you need metadata for:

Data Analysis: Token usage, cost tracking
Quality Control: Filtering by performance metrics
Debugging: Tracing back to original traces
Reproducibility: Understanding data sources

When include_metadata=True, each sample includes:

observation_id, trace_id: For tracing back to source
timestamp: When the LLM call was made
model, model_parameters: Model configuration
usage: Token usage statistics (for cost analysis)
langgraph_node, agent_name, ls_model_name: Source information

⚠️ Important: Metadata is NOT used during model training. Keep it disabled (default) for cleaner training files.
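
If samples were fetched with include_metadata=True for analysis, the metadata can be stripped again before writing a training file. The sketch below writes samples as JSON Lines (one JSON object per line), a layout commonly used for fine-tuning files; `to_jsonl` is a hypothetical helper and not part of this package.

```python
import json


def to_jsonl(samples: list, keep_metadata: bool = False) -> str:
    """Serialize samples as JSON Lines, dropping 'metadata' unless requested."""
    lines = []
    for sample in samples:
        if keep_metadata:
            record = dict(sample)
        else:
            record = {key: value for key, value in sample.items() if key != "metadata"}
        lines.append(json.dumps(record, ensure_ascii=False) + "\n")
    return "".join(lines)


samples = [
    {
        "prompt": "What is AI?",
        "completion": "AI is artificial intelligence...",
        "metadata": {"trace_id": "abc", "usage": {"total_tokens": 150}},
    }
]
training_text = to_jsonl(samples)
print(training_text, end="")
```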

Development

Clone the repository

git clone https://github.com/futumaster/langfuse-mcp-better.git
cd langfuse-mcp-better

Create a virtual environment and install dependencies

uv venv --python 3.11 .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install --python .venv/bin/python -e ".[dev]"

Set up environment variables

export LANGFUSE_SECRET_KEY="your-secret-key"
export LANGFUSE_PUBLIC_KEY="your-public-key"
export LANGFUSE_HOST="https://cloud.langfuse.com"  # Or your self-hosted URL

Testing

Run the unit test suite (mirrors CI):

pytest

To run the demo client:

uv run examples/langfuse_client_demo.py --public-key YOUR_PUBLIC_KEY --secret-key YOUR_SECRET_KEY

Version Management

This project uses dynamic versioning based on Git tags:

The version is automatically determined from git tags using uv-dynamic-versioning
To create a new release:
- Tag your commit with git tag v0.1.2 (following semantic versioning)
- Push the tag with git push --tags
- Create a GitHub release from the tag
The GitHub workflow will automatically build and publish the package with the correct version to PyPI

For a detailed history of changes, please see the CHANGELOG.md file.

Langfuse 3.x migration notes

The MCP server now uses the Langfuse Python SDK v3 resource clients (langfuse.api.trace.list, langfuse.api.observations.get_many, etc.). It must currently run on Python 3.10–3.13 because the upstream SDK still relies on Pydantic v1 internals.
Unit tests use a v3-style fake client that fails if legacy fetch_* helpers are invoked, helping catch regressions early.
Tool responses now include pagination metadata when the Langfuse API returns cursors, while retaining the existing MCP interface.
Diagnostic logs continue to stream to /tmp/langfuse_mcp.log; this is useful when verifying the upgraded integration against a live Langfuse deployment.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Cache Management

We use the cachetools library to implement efficient caching with proper size limits:

Uses cachetools.LRUCache for better reliability
Configurable cache size via the CACHE_SIZE constant
Automatically evicts the least recently used items when caches exceed their size limits
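
The eviction behavior described above can be illustrated with a small standard-library stand-in. The project itself uses cachetools.LRUCache; the `TinyLRUCache` class below is only a hypothetical sketch of the same least-recently-used policy, and the CACHE_SIZE value of 2 is chosen for the demonstration, not taken from the source.

```python
from collections import OrderedDict

CACHE_SIZE = 2  # illustrative value only


class TinyLRUCache:
    """Minimal LRU cache: evicts the least recently used key when full."""

    def __init__(self, maxsize: int):
        self.maxsize = maxsize
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.maxsize:
            self._items.popitem(last=False)  # drop the least recently used

    def __contains__(self, key):
        return key in self._items


cache = TinyLRUCache(CACHE_SIZE)
cache.put("trace-1", {"id": 1})
cache.put("trace-2", {"id": 2})
cache.get("trace-1")              # touch trace-1, so trace-2 is now least recent
cache.put("trace-3", {"id": 3})   # exceeds CACHE_SIZE and evicts trace-2
print("trace-2" in cache)  # False
```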
