io.github.us-all/datadog
Datadog MCP: 165 tools for metrics, monitors, logs, APM, RUM, incidents, CI/CD, fleet
Datadog MCP Server
The Datadog MCP that answers "why is this happening?", not just "what's the value?"
Aggregation tools that fold 5-7 sequential API calls into one structured response. Full SLO CRUD. Fleet automation. The widest Datadog API coverage in any MCP: 159 tools built on the @us-all MCP standard.
What it does that others don't
- Aggregation tools: `analyze-monitor-state` and `slo-compliance-snapshot` collapse 5-7 sequential API calls into one structured response with a `caveats` array for partial failures. No other Datadog MCP ships this pattern.
- Full SLO CRUD: create, update, delete SLOs (and their corrections). The official Bits AI MCP and community alternatives are read-only on SLOs.
- Fleet Automation: 17 tools across deployments, schedules, and instrumented pods. Only this server.
- Status Pages: 21 tools for full status-page lifecycle (components, degradations, maintenances). Only this server.
- Token-efficient by design: `extractFields` projection, `DD_TOOLS` / `DD_DISABLE` 16-category toggles, and a `search-tools` meta-tool keep LLM context low across 159 tools.
- Apps SDK card: `slo-compliance-snapshot` renders as a visual card on ChatGPT clients via `_meta["openai/outputTemplate"]`. Claude clients receive the same JSON content (non-breaking).
- stdio + Streamable HTTP: defaults to stdio (Claude Desktop / Code). Set `MCP_TRANSPORT=http` for ChatGPT Apps SDK or remote clients (Bearer auth via `MCP_HTTP_TOKEN`).
Try this: 5 prompts
Connect the server to Claude Desktop or Claude Code, then paste any of these:
- SLO health: "List my SLOs and their error budget remaining this month. Group by status: compliant, at-risk, breached."
- Incident triage: "There's an active incident on `checkout-service`. Pull the linked monitors, the recent error spikes from APM, and which deployments touched the service in the last 24h."
- Monitor noise audit: "Find monitors that alerted more than 10 times in the last 7 days but had MTTR under 5 minutes; these are probably flapping."
- RUM error spike: "RUM error rate jumped on the checkout funnel between 14:00 and 14:30 today. Show me the top error groups, affected sessions, and the user actions before the errors."
- Fleet rollout: "Schedule the `datadog-agent` 7.55.0 rollout to the `staging` cluster, weekends only, starting next Saturday."
When to use this vs Datadog's official MCP
Datadog's official MCP (Bits AI MCP, GA 2026-03-09) is complementary, not a replacement:
| | Official Datadog MCP | @us-all/datadog-mcp (this) |
|---|---|---|
| Tool count | 16+ core toolsets | 159 tools across full API surface |
| Deployment | Remote (managed by Datadog) | Self-host stdio (npx / Docker / npm) |
| Auth | Datadog SSO | API + APP key |
| Sites | Public Datadog sites | Any site, incl. internal/sovereign; US5 default |
| SLO writes | No | Yes: create/update/delete SLOs + corrections |
| Fleet automation | No | Yes: 17 tools |
| Status pages | No | Yes: 21 tools |
| Aggregation tools | No | Yes: `analyze-monitor-state`, `slo-compliance-snapshot` |
| MCP Prompts | No | Yes: 4 (`triage-incident`, `audit-monitor-noise`, `analyze-rum-error-spike`, `investigate-slow-trace`) |
| MCP Resources | No | Yes: `dd://service/{serviceName}`, `dd://team/{teamId}`, `dd://synthetics/{testId}`, etc. |
Use the official Bits AI MCP for fast managed onboarding and SSO. Use this when you need full API coverage, SLO/fleet/status-page write parity, or self-hosting (internal sites, isolated networks, dev/CI sandboxes).
Install
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "datadog": {
      "command": "npx",
      "args": ["-y", "@us-all/datadog-mcp"],
      "env": {
        "DD_API_KEY": "<your-api-key>",
        "DD_APP_KEY": "<your-app-key>",
        "DD_SITE": "datadoghq.com"
      }
    }
  }
}
Claude Code
claude mcp add datadog -s user \
-e DD_API_KEY=<your-api-key> -e DD_APP_KEY=<your-app-key> -e DD_SITE=datadoghq.com \
-- npx -y @us-all/datadog-mcp
Docker
docker run -e DD_API_KEY=... -e DD_APP_KEY=... -e DD_SITE=datadoghq.com \
ghcr.io/us-all/datadog-mcp-server:latest
Build from source
git clone https://github.com/us-all/datadog-mcp-server.git
cd datadog-mcp-server && pnpm install && pnpm build
node dist/index.js
Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| `DD_API_KEY` | yes | (none) | Datadog API key |
| `DD_APP_KEY` | yes | (none) | Datadog Application key |
| `DD_SITE` | no | `us5.datadoghq.com` | Datadog site (see table below) |
| `DD_ALLOW_WRITE` | no | `false` | Set `true` to enable mutations (create/update/delete) |
| `DD_TOOLS` | no | (none) | Comma-separated allowlist of categories. Only these load (the biggest token saver). |
| `DD_DISABLE` | no | (none) | Comma-separated denylist. Ignored when `DD_TOOLS` is set. |
| `MCP_TRANSPORT` | no | `stdio` | Set to `http` to enable Streamable HTTP transport |
| `MCP_HTTP_TOKEN` | conditional | (none) | Bearer token. Required when `MCP_TRANSPORT=http` |
| `MCP_HTTP_PORT` | no | `3000` | HTTP listen port |
| `MCP_HTTP_HOST` | no | `127.0.0.1` | HTTP bind host (DNS rebinding protection auto-enabled for localhost) |
| `MCP_HTTP_SKIP_AUTH` | no | `false` | Skip Bearer auth, e.g. behind a reverse proxy that handles it |
Categories (16): metrics, monitors, dashboards, logs, apm, rum, incidents, security, synthetics, ci, infra, fleet, status-pages, oncall, teams, account.
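As a rough illustration of how the two toggles interact, the sketch below resolves the enabled categories from an environment map. It is not the server's actual code: `parseList` and `resolveCategories` are hypothetical names, and the fallback to the denylist when `DD_TOOLS` contains no known category is an assumption.

```typescript
// Hypothetical sketch; the real server's resolution logic may differ.
const ALL_CATEGORIES = [
  "metrics", "monitors", "dashboards", "logs", "apm", "rum", "incidents",
  "security", "synthetics", "ci", "infra", "fleet", "status-pages",
  "oncall", "teams", "account",
] as const;

type Category = (typeof ALL_CATEGORIES)[number];

// Split a comma-separated value and keep only known category names.
function parseList(value: string | undefined): Category[] {
  if (!value) return [];
  const wanted = value.split(",").map((s) => s.trim().toLowerCase());
  return ALL_CATEGORIES.filter((c) => wanted.includes(c));
}

// DD_TOOLS is an allowlist. DD_DISABLE is a denylist that is ignored
// whenever DD_TOOLS yields at least one known category (assumption: an
// allowlist with no known names is treated as unset).
function resolveCategories(env: Record<string, string | undefined>): Category[] {
  const allow = parseList(env["DD_TOOLS"]);
  if (allow.length > 0) return allow;
  const deny = parseList(env["DD_DISABLE"]);
  return ALL_CATEGORIES.filter((c) => !deny.includes(c));
}

// The allowlist wins even though DD_DISABLE also names "metrics".
const enabled = resolveCategories({ DD_TOOLS: "metrics, monitors", DD_DISABLE: "metrics" });
console.log(enabled.join(","));
```

With neither variable set, the sketch returns all 16 categories, matching the default row of the token-efficiency table.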
When MCP_TRANSPORT=http: POST /mcp (Bearer-auth JSON-RPC) + GET /health (public liveness).
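To make the HTTP shape concrete, here is a minimal sketch that builds (but does not send) a Bearer-authenticated JSON-RPC request for `POST /mcp`. The helper name `buildToolsListRequest`, the token value, and the base URL are illustrative assumptions; `tools/list` is a standard MCP method, and a real client session would normally send `initialize` first.

```typescript
// Hypothetical helper: assembles the pieces of a POST /mcp call.
interface McpHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildToolsListRequest(baseUrl: string, token: string, id = 1): McpHttpRequest {
  return {
    // Strip trailing slashes so the path is always exactly /mcp.
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients are expected to accept both response types.
      Accept: "application/json, text/event-stream",
      // Value of MCP_HTTP_TOKEN configured on the server.
      Authorization: `Bearer ${token}`,
    },
    // JSON-RPC 2.0 envelope.
    body: JSON.stringify({ jsonrpc: "2.0", id, method: "tools/list" }),
  };
}

const req = buildToolsListRequest("http://127.0.0.1:3000/", "example-token");
console.log(req.url, req.body);
```

The resulting object can be passed to an HTTP client of your choice, for example `fetch(req.url, req)` in a runtime that provides `fetch`.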
Sites:
| Site | Value | Region |
|---|---|---|
| US1 | datadoghq.com | US (Virginia) |
| US3 | us3.datadoghq.com | US (Virginia) |
| US5 | us5.datadoghq.com | US (Oregon) |
| EU1 | datadoghq.eu | EU (Frankfurt) |
| AP1 | ap1.datadoghq.com | Asia-Pacific (Tokyo) |
Token efficiency
A naive setup loads ~25K tokens of tool schema before any conversation starts. Three knobs reduce that:
| Scenario | Tools | Schema tokens | vs default |
|---|---|---|---|
| default (all categories) | 159 | 25,200 | baseline |
| typical (`DD_TOOLS=metrics,monitors,logs,apm,dashboards`) | 55 | 9,300 | -63% |
| narrow (`DD_TOOLS=metrics,monitors`) | 24 | 3,800 | -85% |
- Category toggles: `DD_TOOLS=metrics,monitors,logs,apm` (biggest win).
- `extractFields` response projection: `get-dashboard { dashboardId: "abc", extractFields: "id,title,widgets.*.definition.type" }`.
- `search-tools` meta-tool: always enabled; lets the LLM discover tools at runtime instead of preloading all schemas.
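The projection syntax can be illustrated with a small re-implementation. This is a sketch under assumptions, not the toolkit's code: it assumes comma-separated dotted paths where `*` fans out over array elements, and the helpers `pick` and `merge` are hypothetical.

```typescript
// Hypothetical re-implementation for illustration only.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Return the part of `value` selected by the path, or undefined if no match.
function pick(value: Json | undefined, segments: string[]): Json | undefined {
  if (value === undefined) return undefined;
  if (segments.length === 0) return value;
  const head = segments[0]!;
  const rest = segments.slice(1);
  if (head === "*" && Array.isArray(value)) {
    // Wildcard: apply the rest of the path to every element (null if no match).
    return value.map((item) => pick(item, rest) ?? null);
  }
  if (!isObject(value) || !(head in value)) return undefined;
  const child = pick(value[head], rest);
  return child === undefined ? undefined : { [head]: child };
}

// Deep-merge two projections so paths sharing a prefix do not clobber each other.
function merge(a: Json | undefined, b: Json): Json {
  if (b === null && a !== undefined) return a;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.map((item, i) => merge(item, b[i] ?? null));
  }
  if (isObject(a) && isObject(b)) {
    const out: { [key: string]: Json } = { ...a };
    for (const [k, v] of Object.entries(b)) out[k] = merge(out[k], v);
    return out;
  }
  return b;
}

function extractFields(value: Json, spec: string): Json {
  let result: Json | undefined;
  for (const path of spec.split(",").map((p) => p.trim()).filter(Boolean)) {
    const picked = pick(value, path.split("."));
    if (picked !== undefined) result = merge(result, picked);
  }
  return result ?? {};
}

const dashboard: Json = {
  id: "abc",
  title: "Checkout",
  description: "long text the model does not need",
  widgets: [
    { id: 1, definition: { type: "timeseries", requests: [] } },
    { id: 2, definition: { type: "toplist", requests: [] } },
  ],
};
const slim = extractFields(dashboard, "id,title,widgets.*.definition.type");
console.log(JSON.stringify(slim));
```

In this example the `description` field and each widget's `requests` array are dropped, which is where the token savings come from on large dashboard payloads.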
Read-only mode
By default, all writes are blocked to prevent accidental mutations by AI agents. The following require DD_ALLOW_WRITE=true:
create-monitor, update-monitor, delete-monitor, mute-monitor, create-dashboard, update-dashboard, delete-dashboard, send-logs, post-event, trigger-synthetics, create-synthetics-test, update-synthetics-test, delete-synthetics-test, create-downtime, cancel-downtime, create-case, update-case-status, send-dora-deployment, send-dora-incident, create-slo, update-slo, delete-slo, plus all fleet/status-page/security writes.
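A write gate of this kind might look like the sketch below. The function name `assertWriteAllowed` and the error message are hypothetical, and the set lists only a subset of the tools named above; the server's real check may be structured differently.

```typescript
// Hypothetical sketch of a write gate; tool names are a subset of the list above.
const WRITE_TOOLS = new Set<string>([
  "create-monitor", "update-monitor", "delete-monitor", "mute-monitor",
  "create-slo", "update-slo", "delete-slo",
]);

function assertWriteAllowed(tool: string, env: Record<string, string | undefined>): void {
  if (!WRITE_TOOLS.has(tool)) return; // reads are always allowed
  if (env["DD_ALLOW_WRITE"] !== "true") {
    throw new Error(`${tool} is a write operation; set DD_ALLOW_WRITE=true to enable it`);
  }
}

// A read passes with the default environment; a write needs the explicit opt-in.
assertWriteAllowed("get-monitor", {});
assertWriteAllowed("create-slo", { DD_ALLOW_WRITE: "true" });
```

Failing closed (anything other than the literal string `true` blocks writes) keeps a typo in the environment from silently enabling mutations.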
MCP Prompts (4)
Workflow templates the model can invoke directly:
- `triage-incident`: given an incident ID, walks linked monitors, recent error spikes, and recent deploys.
- `audit-monitor-noise`: flags flapping monitors via alert frequency × MTTR.
- `analyze-rum-error-spike`: diffs RUM error rates across two windows and attributes the change to top error groups.
- `investigate-slow-trace`: given a slow trace ID, traverses the span tree and surfaces bottleneck spans.
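The flapping heuristic behind `audit-monitor-noise` can be sketched as a simple filter. The `MonitorStats` shape and `findFlapping` function are hypothetical; the thresholds (more than 10 alerts in 7 days, MTTR under 5 minutes) are taken from the sample prompt earlier in this README, not from the prompt's actual implementation.

```typescript
// Hypothetical input shape; the real prompt works from Datadog monitor data.
interface MonitorStats {
  id: number;
  name: string;
  alertCount7d: number; // alerts fired in the last 7 days
  mttrMinutes: number;  // mean time to recovery, in minutes
}

// Frequent alerts that resolve quickly are likely flapping rather than real incidents.
function findFlapping(stats: MonitorStats[], minAlerts = 10, maxMttrMinutes = 5): MonitorStats[] {
  return stats.filter((s) => s.alertCount7d > minAlerts && s.mttrMinutes < maxMttrMinutes);
}

const flagged = findFlapping([
  { id: 1, name: "cpu-high", alertCount7d: 14, mttrMinutes: 2 },
  { id: 2, name: "disk-full", alertCount7d: 3, mttrMinutes: 1 },
  { id: 3, name: "api-5xx", alertCount7d: 20, mttrMinutes: 45 },
]);
console.log(flagged.map((m) => m.name));
```

Only `cpu-high` is flagged here: `disk-full` alerts too rarely, and `api-5xx` takes long enough to recover that it looks like a genuine problem.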
MCP Resources
Read-only entities by URI: dd://monitor/{id}, dd://dashboard/{id}, dd://slo/{id}, dd://incident/{id}, dd://service/{serviceName}, dd://team/{teamId} (team + members), dd://synthetics/{testId}, dd://host/{name}.
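For clients that route these URIs themselves, a parser could look like the following sketch. `parseResourceUri` is a hypothetical helper, and the list of kinds simply mirrors the URIs above; the server may accept other forms.

```typescript
// Kinds mirror the resource URIs listed above.
const RESOURCE_KINDS = [
  "monitor", "dashboard", "slo", "incident",
  "service", "team", "synthetics", "host",
] as const;

type ResourceKind = (typeof RESOURCE_KINDS)[number];

// Split dd://<kind>/<id> into its parts; returns null for anything else.
function parseResourceUri(uri: string): { kind: ResourceKind; id: string } | null {
  const match = /^dd:\/\/([a-z]+)\/(.+)$/.exec(uri);
  if (!match) return null;
  const kind = RESOURCE_KINDS.find((k) => k === match[1]);
  const id = match[2];
  if (kind === undefined || id === undefined) return null;
  return { kind, id };
}

const parsedUri = parseResourceUri("dd://service/checkout-service");
console.log(parsedUri);
```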
Tool reference
159 tools across 16 categories. Use the search-tools meta-tool to discover tools at runtime; the full list follows the summary table.
| Domain | Tools |
|---|---|
| Status Pages | 21 |
| RUM (events + apps + metrics + retention) | 27 |
| Metrics, Hosts, SLOs, Downtimes, Containers, Processes | 19 |
| Fleet Automation | 17 |
| Synthetics, Logs/Spans Metrics, SLO Corrections | 16 |
| Monitors, Dashboards, Notebooks, Events | 16 |
| Incidents, Cases, Error Tracking, Audit | 13 |
| OnCall, Teams, Users, Services, Bots | 11 |
| Security signals + rules + suppressions | 9 |
| APM, CI Visibility, DORA, Network Devices | 9 |
| + aggregations | analyze-monitor-state, slo-compliance-snapshot |
| + meta | search-tools |
Full tool list
Metrics (5)
query-metrics, get-metrics, get-metric-metadata, list-active-metrics, list-metric-tags
Monitors (7)
get-monitors, get-monitor, create-monitor, update-monitor, delete-monitor, mute-monitor, validate-monitor, analyze-monitor-state (aggregation)
Dashboards (5)
get-dashboards, get-dashboard, create-dashboard, update-dashboard, delete-dashboard
Logs (3)
search-logs, aggregate-logs, send-logs
Events (2)
get-events, post-event
Incidents (6)
get-incidents, get-incident, search-incidents, create-incident, update-incident, delete-incident
APM (1)
search-spans
RUM (17)
search-rum-events, aggregate-rum, list-rum-applications, get-rum-application, create-rum-application, update-rum-application, delete-rum-application, list-rum-metrics, get-rum-metric, create-rum-metric, update-rum-metric, delete-rum-metric, list-rum-retention-filters, get-rum-retention-filter, create-rum-retention-filter, update-rum-retention-filter, delete-rum-retention-filter
SLOs (6)
list-slos, get-slo, get-slo-history, create-slo, update-slo, delete-slo, slo-compliance-snapshot (aggregation), plus 5 SLO-correction tools
Synthetics (6)
list-synthetics, get-synthetics-result, trigger-synthetics, create-synthetics-test, update-synthetics-test, delete-synthetics-test
Hosts / Containers / Processes (4)
list-hosts, get-host-totals, list-containers, list-processes
Downtimes (3)
list-downtimes, create-downtime, cancel-downtime
Security (9)
search-security-signals, get-security-signal, list-security-rules, get-security-rule, delete-security-rule, list-security-suppressions, get-security-suppression, create-security-suppression, delete-security-suppression
CI Visibility (4)
search-ci-pipelines, aggregate-ci-pipelines, search-ci-tests, aggregate-ci-tests
Cases (4)
list-cases, get-case, create-case, update-case-status
Error Tracking (2)
list-error-tracking-issues, get-error-tracking-issue
DORA (2)
send-dora-deployment, send-dora-incident
Network Devices (2)
list-network-devices, get-network-device
Notebooks (2)
list-notebooks, get-notebook
OnCall (2)
get-team-oncall, get-oncall-schedule
Services & Software Catalog (2)
list-services, get-service-definition
Teams (6)
list-teams, get-team, create-team, update-team, delete-team, get-team-members
Account & Users (2)
get-usage-summary, list-users
Logs/Spans/APM Retention metrics (15)
5 each for logs-metrics, spans-metrics, apm-retention-filters (list/get/create/update/delete)
Status Pages (21)
Full lifecycle: pages, components, degradations, maintenances. See src/tools/status-pages.ts.
Fleet Automation (17)
Agents, deployments, schedules, instrumented pods. See src/tools/fleet.ts.
Audit (1)
search-audit-logs
Meta (1)
search-tools: query other tools by keyword; always enabled regardless of DD_TOOLS.
Architecture
Claude → MCP stdio → index.ts → tools/*.ts → @datadog/datadog-api-client → Datadog API
Built on @us-all/mcp-toolkit:
- `extractFields`: token-efficient response projections
- `aggregate(fetchers, caveats)`: fan-out helper for aggregation tools
- `createWrapToolHandler`: domain-specific redaction (`DD_API_KEY` / `DD_APP_KEY`) + `DatadogApiException` error extraction
- `search-tools` meta-tool
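The fan-out pattern with a `caveats` array can be sketched as follows. Only the name `aggregate(fetchers, caveats)` comes from this README; the parameter types, return shape, and caveat message format below are assumptions, and the toolkit's real helper may differ.

```typescript
// Hypothetical shape for the helper; only its name appears in this README.
type Fetcher = () => Promise<unknown>;

async function aggregate(
  fetchers: Record<string, Fetcher>,
  caveats: string[],
): Promise<Record<string, unknown>> {
  const result: Record<string, unknown> = {};
  // Run all fetchers concurrently; a failure becomes a caveat instead of
  // rejecting the whole aggregation.
  await Promise.all(
    Object.entries(fetchers).map(async ([key, run]) => {
      try {
        result[key] = await run();
      } catch (err) {
        caveats.push(`${key}: ${err instanceof Error ? err.message : String(err)}`);
      }
    }),
  );
  return result;
}

// Usage with stubbed fetchers: one succeeds, one fails.
const caveats: string[] = [];
const done = aggregate(
  {
    monitor: async () => ({ id: 1, state: "Alert" }),
    events: async () => {
      throw new Error("403 Forbidden");
    },
  },
  caveats,
).then((out) => {
  console.log(out, caveats);
  return out;
});
```

The caller still gets the `monitor` section when `events` fails, and the `caveats` array tells the model which parts of the answer are missing and why.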
Tech stack
Node.js 18+ • TypeScript strict ESM • pnpm • @modelcontextprotocol/sdk • @datadog/datadog-api-client (official) • zod • dotenv • vitest + dd-trace.
Contributing
See CONTRIBUTING.md. New shared patterns belong in @us-all/mcp-toolkit, the single source of truth for the 7-server suite.
