OpenTracy
The auto-distillation layer for your LLM calls.
Drop-in OpenAI-compatible SDK. Every request becomes a trace; traces become datasets; datasets become distilled custom models; and the routing layer swaps those models in under your app via aliases, so your cost curve goes down over time without code changes.
Try it in Colab (no install)
Each notebook runs end-to-end on a free Colab runtime; bring your own OpenAI key, and optionally Anthropic / Groq keys.
| # | Notebook | One-line pitch | Colab |
|---|---|---|---|
| 01 | Quickstart | First completion() call, see _cost + _latency_ms, swap providers | |
| 02 | Drop in over the OpenAI SDK | Keep from openai import OpenAI, change only base_url | |
| 03 | Semantic auto-routing | One prompt, the right model of 13; learned, not rule-based | |
| 04 | Ticket classifier (real app) | End-to-end support-ticket classifier with cost breakdown | |
| 05 | Distillation: train your student | Turn trace history into a distilled tiny model | |
| 06 | Serve your distilled model | Four serving paths, from load-the-adapter to alias swap | |
Colab heads-up: traces only show up in the dashboard if you set
OPENTRACY_ENGINE_URL before import opentracy. Every notebook has a commented-out cell at the top with the two lines you need.
Install
pip install opentracy
Quick start
import opentracy as lr
resp = lr.completion(
model="openai/gpt-4o-mini",
messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
print(f"cost: ${resp._cost:.6f} latency: {resp._latency_ms:.0f}ms")
Works with 13 providers out of the box: OpenAI, Anthropic, Gemini, Groq, Mistral, DeepSeek, Together, Fireworks, Cerebras, Sambanova, Perplexity, Cohere, Bedrock.
Connecting to the OpenTracy platform (traces, dashboards, distillation)
By default lr.completion() goes direct to the provider, so calls do not
appear in the OpenTracy dashboard. To route every call through a running
engine (the only way traces, metrics, and the distillation loop get data),
set OPENTRACY_ENGINE_URL before importing the SDK:
import os
os.environ["OPENTRACY_ENGINE_URL"] = "http://<your-opentracy-host>:8080" # engine port
import opentracy as lr
resp = lr.completion(
model="openai/gpt-4o-mini",
messages=[{"role": "user", "content": "Hello"}],
)
# trace now visible in the dashboard on the UI host at :3000
Alternatives:
- Per-call: pass force_engine=True, api_base="http://<host>:8080/v1" to lr.completion(...), as in the sketch below.
- Drop-in OpenAI SDK: no code change beyond base_url (see below).
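For example, forcing a single call through the engine while everything else stays direct:

import opentracy as lr

# Only this call is routed via the engine; other calls still go straight to the provider.
resp = lr.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    force_engine=True,
    api_base="http://<your-opentracy-host>:8080/v1",
)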
API keys for your providers should be saved once via the UI
(Settings → API Keys) or the API (POST /v1/secrets/<provider>); the
engine picks them up immediately from ~/.opentracy/secrets.json.
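Programmatically, saving a key looks roughly like this; a sketch against the documented POST /v1/secrets/{provider} endpoint, where the "api_key" body field is an assumption:

import requests

# Save a provider key once; the engine reads it back from ~/.opentracy/secrets.json.
# NOTE: the {"api_key": ...} body shape is assumed, not confirmed by these docs.
r = requests.post("http://localhost:8000/v1/secrets/openai", json={"api_key": "sk-..."})
r.raise_for_status()

# List which providers now have keys configured.
print(requests.get("http://localhost:8000/v1/secrets").json())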
Routing with fallbacks
router = lr.Router(
model_list=[
{"model_name": "smart", "model": "openai/gpt-4o"},
{"model_name": "smart", "model": "anthropic/claude-sonnet-4-6"},
],
fallbacks=[{"smart": ["deepseek/deepseek-chat"]}],
)
resp = router.completion(model="smart", messages=[{"role": "user", "content": "Hi"}])
Drop-in replacement for the OpenAI SDK
Point any existing OpenAI app at the OpenTracy engine; zero code changes beyond base_url:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="any")
# All 13 providers routed through the OpenTracy engine; every request is a trace.
Distillation: what makes OpenTracy different from a plain gateway
from opentracy import Distiller
d = Distiller()
# Submit a dataset built from your own traces, pick a teacher + student model,
# and OpenTracy trains the distilled model and serves it behind a routing alias
# you can point traffic at. Your app code never changes.
Install the training extras for the distillation pipeline:
pip install opentracy[distill]
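Once a distilled model is serving behind a routing alias, pointing traffic at it is just a model-string swap. A sketch, assuming a hypothetical alias name "support-distilled" registered by your job:

import opentracy as lr

# "support-distilled" is a hypothetical alias; use the alias your
# distillation job registered in the routing layer.
resp = lr.completion(
    model="support-distilled",
    messages=[{"role": "user", "content": "My invoice total looks wrong"}],
    force_engine=True,  # aliases resolve in the engine's routing layer
    api_base="http://localhost:8080/v1",
)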
Self-host the full platform (traces + UI + REST API)
git clone https://github.com/OpenTracy/opentracy.git
cd opentracy
make up # Gateway + ClickHouse analytics + Python API + UI
Engine at http://localhost:8080, Python API at http://localhost:8000, UI at http://localhost:3000.
GPU is optional
The Python API container runs on CPU by default (no nvidia-container-toolkit
required), so it works on plain cloud VMs and laptops without a GPU. Local
training and inference paths fall back to CPU automatically.
To enable GPU acceleration (faster distillation training, local inference):
DOCKER_RUNTIME=nvidia docker compose up -d
This requires the NVIDIA Container Toolkit
on the host. Persist the variable in .env next to docker-compose.yml if you
always run with GPU.
What OpenTracy Does
Requests ──► Gateway (13 providers) ──► Traces (ClickHouse)
                                                 │
                                        ┌────────┴────────┐
                                        ▼                 ▼
                                   Clustering         Analytics
                                   (domains)       (cost/latency)
                                        │
                                  ┌─────┴─────┐
                                  ▼           ▼
                             Evaluations  Distillation
                             (AI metrics) (training data)
- Route – proxy to 13 LLM providers with fallbacks, retries, and cost tracking
- Observe – every request/response stored in ClickHouse with full content
- Cluster – auto-group prompts by domain using embeddings + LLM labeling
- Evaluate – run models against domain datasets with built-in and AI-suggested metrics
- Distill – export input/output pairs per domain for fine-tuning smaller models
Features
Gateway
- 13 LLM Providers through one OpenAI-compatible API
- Python SDK – lr.completion() one-liner
- Router Class – load balancing, fallbacks, retries, 4 strategies
- Streaming – all providers, including Anthropic & Bedrock SSE translation
- Cost Tracking – 70+ models with per-token pricing on every response
- Vision / Multimodal – images via base64 or URL
- Tool Calling – function calls with cross-provider translation
- Semantic Routing – auto-select the best model per prompt (with weights)
Observability
- ClickHouse Analytics – traces, cost, latency, model-level stats
- Full Content Capture – input/output text stored for every request
- Trace Scanning – AI agent detects hallucinations, refusals, quality regressions
- Real-time Dashboard – UI with filters, search, trace detail drawer
Domain Clustering
- Auto-clustering – groups prompts by semantic similarity (KMeans + MiniLM embeddings)
- LLM Labeling – AI agent names each cluster (e.g., "JavaScript Concepts", "Business Strategy")
- Quality Gates – coherence scoring, outlier detection, merge suggestions
- Input + Output Storage – full pairs stored per cluster for distillation
Evaluations
- Run Evaluations – send dataset samples through models, score and compare
- 6 Built-in Metrics – exact match, contains, similarity, LLM-as-judge, latency, cost
- AI Metric Suggestion – harness agent analyzes dataset domain and creates tailored metrics
- Background Execution – evaluations run async with progress tracking
- Model Comparison – side-by-side results with winner determination
Distillation
- BOND Pipeline – teacher → LLM-as-Judge curation → LoRA training (Unsloth) → GGUF export
- Dataset Support – use domain clusters or custom datasets as training source
- UI + API – create and monitor jobs via dashboard or REST endpoints
Harness (AI Agent System)
- Agent Runner – loads .md agent configs, calls the LLM, parses structured output (see the sketch after this list)
- 7 Agents – cluster labeler, coherence scorer, outlier detector, merge checker, trace scanner, eval generator, metrics suggester
- Memory Layer – persistent agent memory with query/summary
- Tool Access – agents can call tools (list traces, query datasets, etc.)
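A sketch of listing and running an agent over the documented harness endpoints; the {"input": ...} body shape is an assumption:

import requests

# List the available agents.
print(requests.get("http://localhost:8000/v1/harness/agents").json())

# Run one agent by name. NOTE: the {"input": ...} body shape is an
# assumption, not confirmed by these docs.
result = requests.post(
    "http://localhost:8000/v1/harness/run/trace_scanner",
    json={"input": "scan recent traces for refusals"},
)
print(result.json())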
Supported Providers
| Provider | Syntax | Env Var |
|---|---|---|
| OpenAI | openai/gpt-4o-mini | OPENAI_API_KEY |
| Anthropic | anthropic/claude-haiku-4-5-20251001 | ANTHROPIC_API_KEY |
| Gemini | gemini/gemini-2.0-flash | GEMINI_API_KEY |
| Mistral | mistral/mistral-small-latest | MISTRAL_API_KEY |
| Groq | groq/llama-3.3-70b-versatile | GROQ_API_KEY |
| DeepSeek | deepseek/deepseek-chat | DEEPSEEK_API_KEY |
| Perplexity | perplexity/sonar | PERPLEXITY_API_KEY |
| Cerebras | cerebras/llama3.1-70b | CEREBRAS_API_KEY |
| SambaNova | sambanova/Meta-Llama-3.1-70B-Instruct | SAMBANOVA_API_KEY |
| Together | together/meta-llama/Llama-3.3-70B-Instruct-Turbo | TOGETHER_API_KEY |
| Fireworks | fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct | FIREWORKS_API_KEY |
| Cohere | cohere/command-r-plus | COHERE_API_KEY |
| AWS Bedrock | bedrock/amazon.titan-text-express-v1 | AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY |
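Since every provider sits behind the same call signature, comparing them is a one-line model-string change. A small sketch using the _cost and _latency_ms fields from the quick start:

import opentracy as lr

prompt = [{"role": "user", "content": "Say hi in five words"}]

# Same prompt across three providers; only the model string changes.
for model in ["openai/gpt-4o-mini", "groq/llama-3.3-70b-versatile", "deepseek/deepseek-chat"]:
    resp = lr.completion(model=model, messages=prompt)
    print(f"{model}: ${resp._cost:.6f}, {resp._latency_ms:.0f} ms")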
Installation
pip install -e ".[openai,anthropic,api]" # SDK + common providers
pip install -e ".[all]" # everything
pip install -e ".[train]" # training/distillation deps (CUDA)
Python SDK
Completion
import opentracy as lr
response = lr.completion(
model="openai/gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}],
)
# Streaming
for chunk in lr.completion(model="openai/gpt-4o-mini", messages=[...], stream=True):
print(chunk.choices[0].delta.content or "", end="")
# Fallbacks
response = lr.completion(
model="openai/gpt-4o-mini",
messages=[...],
fallbacks=["anthropic/claude-haiku-4-5-20251001", "groq/llama-3.3-70b-versatile"],
num_retries=2,
)
Router (Load Balancing)
router = lr.Router(
model_list=[
{"model_name": "smart", "model": "openai/gpt-4o"},
{"model_name": "smart", "model": "anthropic/claude-sonnet-4-20250514"},
{"model_name": "fast", "model": "groq/llama-3.3-70b-versatile"},
],
fallbacks=[{"smart": ["deepseek/deepseek-chat"]}],
strategy="round-robin", # or: least-cost, lowest-latency, weighted-random
)
response = router.completion(model="smart", messages=[...])
Drop-in OpenAI SDK
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="any")
client.chat.completions.create(model="openai/gpt-4o-mini", messages=[...])
client.chat.completions.create(model="anthropic/claude-haiku-4-5-20251001", messages=[...])
client.chat.completions.create(model="mistral/mistral-small-latest", messages=[...])
Running
| Command | What | Requires |
|---|---|---|
| make gateway | Gateway proxy (no weights needed) | Go |
| make gateway-db | Gateway + ClickHouse | Go + Docker |
| make gateway-router | Full semantic routing (model="auto") | Go + weights |
| make up | Full stack (ClickHouse + engine + API + UI), runs in background | Go + Docker + Python + Node |
| make api | Python API only (uvicorn --reload) | Python |
API Keys
Configure via the UI, environment variables, or ~/.opentracy/secrets.json:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
make start-full
API Endpoints
Gateway (Go Engine β port 8080)
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/chat/completions | Chat completion (any provider) |
| POST | /v1/route | Route a prompt without generating |
| GET | /v1/models | List registered models |
| GET | /health | Health check |
Python API (port 8000)
| Method | Endpoint | Description |
|---|---|---|
| Analytics | | |
| GET | /v1/stats/{tenant}/analytics | Full analytics (traces, cost, latency, distributions) |
| Clustering | | |
| POST | /v1/clustering/run | Run clustering pipeline (embed, cluster, label) |
| GET | /v1/clustering/datasets | List domain datasets from latest run |
| GET | /v1/clustering/datasets/{run}/{cluster} | Get traces for a cluster |
| Datasets | | |
| GET | /v1/datasets | List all datasets (eval + domain clusters) |
| POST | /v1/datasets | Create evaluation dataset |
| POST | /v1/datasets/{id}/samples | Add samples to dataset |
| Evaluations | | |
| POST | /v1/evaluations | Create and run evaluation (async) |
| GET | /v1/evaluations | List evaluations |
| GET | /v1/evaluations/{id}/status | Evaluation progress |
| GET | /v1/evaluations/{id}/results | Evaluation results with scores |
| Distillation | | |
| POST | /v1/distillation/{tenant}/jobs | Create distillation job |
| GET | /v1/distillation/{tenant}/jobs | List distillation jobs |
| GET | /v1/distillation/{tenant}/jobs/{id} | Get job status and results |
| Metrics | | |
| GET | /v1/metrics | List built-in + custom metrics |
| POST | /v1/metrics | Create custom metric |
| POST | /v1/auto-eval/suggest-metrics | AI-powered metric suggestion |
| Models | | |
| GET | /v1/models/available | Models available from configured providers |
| Harness | | |
| GET | /v1/harness/agents | List AI agents |
| POST | /v1/harness/run/{name} | Run an agent with input |
| GET | /v1/harness/memory | Query agent memory |
| Secrets | | |
| GET | /v1/secrets | List configured providers |
| POST | /v1/secrets/{provider} | Save API key |
Architecture
go/                           # Go engine (high-performance gateway)
├── cmd/opentracy-engine/     # Entry point
└── internal/
    ├── provider/             # 13 providers
    ├── server/               # HTTP handlers + session management
    ├── clickhouse/           # Trace writer + 8 migrations
    ├── router/               # UniRoute algorithm + LRU cache
    └── embeddings/           # ONNX MiniLM embedder
opentracy/                    # Python layer (analytics, clustering, evals)
├── api/server.py             # FastAPI: analytics, clustering, evaluations, metrics
├── sdk.py                    # completion(), acompletion(), Router class
├── clustering/
│   ├── pipeline.py           # Extract → embed → cluster → label → store
│   ├── labeler.py            # LLM-powered cluster labeling via harness
│   └── quality.py            # Coherence, diversity, noise quality gates
├── harness/
│   ├── runner.py             # Agent executor (JSON parsing, retry, tools)
│   ├── tools.py              # Agent tools (query traces, datasets, etc.)
│   ├── memory_store.py       # Persistent agent memory
│   └── agents/               # 7 agent configs (.md files)
│       ├── cluster_labeler.md
│       ├── coherence_scorer.md
│       ├── outlier_detector.md
│       ├── merge_checker.md
│       ├── trace_scanner.md
│       ├── eval_generator.md
│       └── metrics_suggester.md
├── distillation/
│   ├── pipeline.py           # 4-phase orchestrator (data gen → curation → train → export)
│   ├── data_gen.py           # Teacher model candidate generation
│   ├── curation.py           # LLM-as-Judge scoring & selection
│   ├── trainer.py            # SFT/BOND fine-tuning (Unsloth + LoRA)
│   ├── export.py             # LoRA merge + GGUF conversion
│   ├── repository.py         # ClickHouse persistence
│   ├── router.py             # API endpoints
│   └── schemas.py            # Pydantic models & model catalog
├── evaluations/              # Evaluation runs & results
├── datasets/                 # Dataset CRUD, from-traces, auto-collect
├── metrics/                  # Metric definitions & validation
├── experiments/              # A/B experiments & comparison
├── annotations/              # Human annotation queues
├── auto_eval/                # Automated evaluation configs & triggers
├── eval_agent/               # AI-powered eval setup assistant
├── proposals/                # Decision engine proposals
├── trace_issues/             # Issue scanning & detection
├── training/                 # Custom router training (UniRoute)
├── storage/
│   ├── clickhouse_client.py  # Analytics queries
│   ├── secrets.py            # API key management
│   └── state_manager.py      # File-based state persistence
├── model_prices.py           # 70+ models with pricing
└── mcp/                      # Claude Code MCP server
ui/                           # React dashboard
└── src/features/
    ├── traces/               # Trace explorer with drawer, filters, timeline
    ├── evaluations/          # Run evaluations, metrics, experiments
    └── distill-dataset/      # Dataset management, clustering, export
Evaluation Workflow
# 1. Send traffic through the gateway (traces auto-captured)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Explain closures in JS"}]}'

# 2. Run clustering to group prompts by domain (quote the URL so & is not treated by the shell)
curl -X POST "http://localhost:8000/v1/clustering/run?days=30&min_traces=5"

# 3. AI suggests metrics for a domain dataset
curl -X POST http://localhost:8000/v1/auto-eval/suggest-metrics \
  -H "Content-Type: application/json" \
  -d '{"dataset_id": "cluster:run-id:2"}'

# 4. Run evaluation comparing models on that dataset
curl -X POST http://localhost:8000/v1/evaluations \
  -H "Content-Type: application/json" \
  -d '{"name": "JS eval", "dataset_id": "cluster:run-id:2",
       "models": ["openai/gpt-4o-mini", "mistral/mistral-small-latest"],
       "metrics": ["similarity", "latency", "cost", "llm_judge"]}'

# 5. Check results
curl http://localhost:8000/v1/evaluations/{id}/results
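The same workflow from Python, polling the async run until it finishes; a sketch where the "id" and "status" response fields are assumptions about the API's response shape:

import time
import requests

BASE = "http://localhost:8000"

# Create the evaluation; it runs async in the background.
job = requests.post(f"{BASE}/v1/evaluations", json={
    "name": "JS eval",
    "dataset_id": "cluster:run-id:2",
    "models": ["openai/gpt-4o-mini", "mistral/mistral-small-latest"],
    "metrics": ["similarity", "latency", "cost", "llm_judge"],
}).json()

# NOTE: the "id"/"status" field names are assumed, not confirmed by these docs.
eval_id = job["id"]
while requests.get(f"{BASE}/v1/evaluations/{eval_id}/status").json().get("status") != "completed":
    time.sleep(5)

print(requests.get(f"{BASE}/v1/evaluations/{eval_id}/results").json())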
Distillation
BOND-style distillation pipeline: generate candidates with a teacher model, score them with LLM-as-Judge, fine-tune a student model with LoRA, and export to GGUF.
make install-train # install training deps (requires CUDA)
Via the UI at http://localhost:3000 → Distillation, or via the API:
curl -X POST http://localhost:8000/v1/distillation/default/jobs \
-H "Content-Type: application/json" \
-d '{
"teacher_model": "openai/gpt-4o-mini",
"student_model": "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
"num_candidates": 5,
"dataset_id": "my-dataset"
}'
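The equivalent from Python against the documented job endpoints; again a sketch, where the "id" response field is an assumption:

import requests

BASE = "http://localhost:8000/v1/distillation/default"

# Create the job (mirrors the curl above).
job = requests.post(f"{BASE}/jobs", json={
    "teacher_model": "openai/gpt-4o-mini",
    "student_model": "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "num_candidates": 5,
    "dataset_id": "my-dataset",
}).json()

# Check status and results later. NOTE: the "id" field is an assumed response shape.
print(requests.get(f"{BASE}/jobs/{job['id']}").json())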
Semantic Routing
With pre-trained weights, the router picks the best model per prompt:
make download-weights # download from HuggingFace
make gateway-router # start with semantic routing enabled
from opentracy import load_router  # import path assumed; adjust to your install

router = load_router()
decision = router.route("Explain quantum computing")
print(f"Best model: {decision.selected_model}")
print(f"Expected error: {decision.expected_error:.4f}")
Training Custom Routers
from opentracy import full_training_pipeline, TrainingConfig, PromptDataset, create_client
train_data = PromptDataset.load("train.json")
val_data = PromptDataset.load("val.json")
clients = [
create_client("openai", "gpt-4o"),
create_client("openai", "gpt-4o-mini"),
create_client("groq", "llama-3.1-8b-instant"),
]
result = full_training_pipeline(
train_data, val_data, clients,
TrainingConfig(num_clusters=100, output_dir="./weights"),
)
MCP Integration (Claude Code)
pip install opentracy[mcp]
Add to ~/.claude/settings.json:
{
"mcpServers": {
"opentracy": {
"command": "python",
"args": ["-m", "opentracy.mcp"]
}
}
}
Tools: opentracy_route, opentracy_generate, opentracy_smart_generate, opentracy_list_models, opentracy_compare.
Development
make help # show all commands
make install # install Python SDK + Go deps
make install-train # install training/distillation deps (CUDA)
make up # start full local stack (ClickHouse + engine + API + UI)
make stop # stop all local services
make test # run all tests
make lint # lint all code
License
MIT License - see LICENSE file for details.
