SRE Agent
SRE Agent: An AI-powered MCP server for production incident triage. Takes natural-language symptom reports, plans structured investigations using Gemini, executes parallel workers (logs, metrics, deploys, runbooks), synthesizes root-cause reports, and proposes remediation patches with human approval gates.
Installation
npx sre-agent
Site Reliability Engineer (SRE) Agent
Welcome to the SRE Agent project. This open-source AI agent helps you monitor logs, diagnose production issues, suggest fixes, and post findings to your team so you can move faster when things go wrong.
Quick Start
Prerequisites
- Python 3.13+
- Docker (required for local mode)
1️⃣ Install the SRE Agent
pip install sre-agent
2️⃣ Start the CLI
sre-agent
On first run, the setup wizard will guide you through configuration:

3️⃣ Provide the required setup values
The wizard currently asks for:
- ANTHROPIC_API_KEY
- GITHUB_PERSONAL_ACCESS_TOKEN
- GITHUB_OWNER, GITHUB_REPO, GITHUB_REF
- SLACK_BOT_TOKEN, SLACK_CHANNEL_ID
- AWS credentials (AWS_PROFILE or access keys) and AWS_REGION
By default the agent uses claude-sonnet-4-5-20250929. You can override this by setting the MODEL environment variable.
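When scripting around the wizard, it can help to check which of these values are already present. The following is a minimal sketch, not part of sre-agent: the helpers `missing_settings` and `resolve_model` are hypothetical, and it assumes the setup values are exposed as environment variables under the names listed above.

```python
import os
from typing import Mapping

# Names taken from the setup list above. Treating them as environment
# variables is an assumption of this sketch. AWS credentials are left out
# because they may come from AWS_PROFILE or from access keys.
REQUIRED = [
    "ANTHROPIC_API_KEY",
    "GITHUB_PERSONAL_ACCESS_TOKEN",
    "GITHUB_OWNER",
    "GITHUB_REPO",
    "GITHUB_REF",
    "SLACK_BOT_TOKEN",
    "SLACK_CHANNEL_ID",
    "AWS_REGION",
]
DEFAULT_MODEL = "claude-sonnet-4-5-20250929"


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required names that are unset or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


def resolve_model(env: Mapping[str, str] = os.environ) -> str:
    """Use MODEL when it is set, otherwise fall back to the documented default."""
    return env.get("MODEL") or DEFAULT_MODEL


if __name__ == "__main__":
    print("missing:", missing_settings())
    print("model:", resolve_model())
```

Passing a plain dictionary instead of `os.environ` makes both helpers easy to exercise without touching the real environment.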
4️⃣ Pick a running mode
After setup, the CLI gives you two modes:
- Local: run diagnoses from your machine against a CloudWatch log group.
- Remote Deployment: deploy and run the agent on AWS ECS.
Remote mode currently supports AWS ECS only for deploying the agent runtime.
This is the local shell view:

What Does It Do?
Think about a microservice app where any service can fail at any time. The agent watches error logs, identifies which service is affected, checks the configured GitHub repository, diagnoses likely root causes, suggests fixes, and reports back to Slack.
In short, it handles the heavy lifting so your team can focus on fixing the issue quickly.
Your application can run on Kubernetes, ECS, VMs, or elsewhere. The key requirement is that logs are available in CloudWatch.
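To illustrate what "logs available in CloudWatch" means in practice, the sketch below builds a query for recent error lines in a log group. It is not the agent's own implementation: `error_query` and `fetch_errors` are hypothetical helpers, the filter pattern is an assumption about your log format, and `fetch_errors` needs boto3 plus working AWS credentials. The only AWS operation used is CloudWatch Logs `filter_log_events`.

```python
import time
from typing import Any, Optional


def error_query(log_group: str, minutes: int, now_ms: Optional[int] = None) -> dict[str, Any]:
    """Build filter_log_events arguments covering the last `minutes` minutes."""
    end = int(time.time() * 1000) if now_ms is None else now_ms
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60_000,  # epoch milliseconds
        "endTime": end,
        # Matches events containing either term; adjust to your log format.
        "filterPattern": "?ERROR ?Exception",
    }


def fetch_errors(log_group: str, minutes: int = 10) -> list[str]:
    """Return matching log messages. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so error_query works without it

    client = boto3.client("logs")
    paginator = client.get_paginator("filter_log_events")
    messages: list[str] = []
    for page in paginator.paginate(**error_query(log_group, minutes)):
        messages.extend(event["message"] for event in page["events"])
    return messages
```

Keeping the query construction separate from the network call means the time-window logic can be checked without an AWS account.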
Integration Roadmap
Model provider
- Anthropic
- vLLM
- OpenAI
Logging platform
- AWS CloudWatch
- Google Cloud Observability
- Azure Monitor
Remote code repository
- GitHub
- GitLab
- Bitbucket
Notification channel
- Slack
- Microsoft Teams
Remote deployment mode
- AWS ECS
[!TIP] Looking for a feature or integration that is not listed yet? Open a Feature / Integration request.
Architecture

The diagram shows the boundary between your application environment and the agent responsibilities.
You are responsible for getting logs into your logging platform and setting up how the agent is triggered (for example, CloudWatch metric filters and alarms). Once triggered, the agent handles diagnosis and reporting.
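One way to set up such a trigger is a metric filter that counts error lines plus an alarm on the resulting metric. The sketch below is an assumption-heavy illustration rather than the project's recommended configuration: the namespace, metric name, filter name, alarm name, and threshold are all hypothetical, and `action_arn` stands for whatever alarm target you use to start the agent (the text above does not specify one). It relies only on the boto3 operations `put_metric_filter` and `put_metric_alarm`.

```python
from typing import Any

NAMESPACE = "SREAgentDemo"    # hypothetical metric namespace
METRIC = "ServiceErrorCount"  # hypothetical metric name


def metric_filter_args(log_group: str) -> dict[str, Any]:
    """Arguments for CloudWatch Logs put_metric_filter: count lines containing ERROR."""
    return {
        "logGroupName": log_group,
        "filterName": "sre-agent-demo-errors",
        "filterPattern": "ERROR",
        "metricTransformations": [
            {"metricName": METRIC, "metricNamespace": NAMESPACE, "metricValue": "1"}
        ],
    }


def alarm_args(action_arn: str, threshold: int = 1) -> dict[str, Any]:
    """Arguments for CloudWatch put_metric_alarm on the metric defined above."""
    return {
        "AlarmName": "sre-agent-demo-error-alarm",
        "Namespace": NAMESPACE,
        "MetricName": METRIC,
        "Statistic": "Sum",
        "Period": 60,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [action_arn],
    }


def apply(log_group: str, action_arn: str) -> None:
    """Create both resources. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders work without it

    boto3.client("logs").put_metric_filter(**metric_filter_args(log_group))
    boto3.client("cloudwatch").put_metric_alarm(**alarm_args(action_arn))
```

With these values the alarm fires as soon as one matching line appears within a minute; a noisier service would likely need a higher threshold or a longer period.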
The monitored application is not limited to AWS ECS. It can be deployed anywhere, as long as it sends relevant logs to CloudWatch.
When running with the current stack, the flow is:
- Read error logs from CloudWatch.
- Inspect source code via the configured GitHub MCP integration.
- Produce diagnosis and fix suggestions.
- Send results to Slack.
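The four steps above can be pictured as a small pipeline. This is a conceptual sketch only, not the agent's actual code: `Diagnosis`, `run_flow`, and the four injected callables are hypothetical names, and the real agent delegates the reasoning to a model rather than to plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Diagnosis:
    service: str
    summary: str
    suggested_fix: str


def run_flow(
    service: str,
    read_logs: Callable[[str], list[str]],
    inspect_source: Callable[[str, list[str]], str],
    diagnose: Callable[[str, list[str], str], Diagnosis],
    notify: Callable[[Diagnosis], None],
) -> Optional[Diagnosis]:
    """Run the four steps in order; stop early when there are no error logs."""
    logs = read_logs(service)                  # 1. read error logs
    if not logs:
        return None
    context = inspect_source(service, logs)    # 2. inspect source code
    result = diagnose(service, logs, context)  # 3. diagnosis and fix suggestion
    notify(result)                             # 4. report the result
    return result


if __name__ == "__main__":
    # Stand-in implementations so the sketch runs without any external service.
    sent: list[Diagnosis] = []
    outcome = run_flow(
        "currencyservice",
        read_logs=lambda svc: ["ERROR: conversion failed"],
        inspect_source=lambda svc, logs: "relevant source snippet",
        diagnose=lambda svc, logs, ctx: Diagnosis(svc, logs[0], "add input validation"),
        notify=sent.append,
    )
    print(outcome)
```

Injecting each step as a callable keeps the ordering explicit and lets each integration (logging platform, repository, notification channel) be swapped independently.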
Evaluation
We built an evaluation suite to test both tool-use behaviour and diagnosis quality. You can find details here:
Run the suites with:
uv run sre-agent-run-tool-call-eval
uv run sre-agent-run-diagnosis-quality-eval
Why We Built This
We wanted to learn practical best practices for running AI agents in production: cost, safety, observability, and evaluation. We are sharing the journey in the open and publishing what we learn as we go.
We also write about this work on the Fuzzy Labs blog.
Contributions welcome. Join us and help shape the future of AI-powered SRE.
For Developers
See DEVELOPMENT.md for the full local setup guide.
Install dependencies:
uv sync --dev
Run the interactive CLI locally:
uv run sre-agent
If you want to run a direct diagnosis without the CLI:
docker compose up -d slack
uv run python -m sre_agent.run /aws/containerinsights/no-loafers-for-you/application currencyservice 10
