Evidra
Fail-closed policy guardrails for AI agents running kubectl, terraform, helm, and argocd.
Evidra
Flight recorder and reliability scoring for infrastructure automation
Evidra records intent, outcome, and refusal for every infrastructure mutation across MCP agents, CI pipelines, A2A agents, and scripts. The append-only evidence chain enables risk assessment, behavioral signal detection, and reliability scoring.
CLI and MCP are the authoritative analytics surfaces today.
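To make the idea of an append-only evidence chain concrete, here is a minimal sketch. It assumes each entry is linked to the previous one by a SHA-256 hash; that linking scheme, the entry fields, and the function names `append_entry` and `verify_chain` are illustrative assumptions, not Evidra's actual on-disk format.

```python
import hashlib
import json


def append_entry(chain, payload):
    """Append a payload to the chain, linking it to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    entry = {
        "prev": prev_hash,
        "payload": payload,
        "hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Return True if every entry's hash and back-link are intact."""
    prev_hash = "0" * 64
    for entry in chain:
        body = json.dumps(
            {"prev": entry["prev"], "payload": entry["payload"]}, sort_keys=True
        )
        expected = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_entry(chain, {"type": "prescribe", "command": "kubectl apply -f fix.yaml"})
append_entry(chain, {"type": "report", "exit_code": 0})
print(verify_chain(chain))  # True

chain[0]["payload"]["command"] = "kubectl delete ns prod"  # tamper with history
print(verify_chain(chain))  # False
```

Because every hash covers the previous hash, rewriting an old entry invalidates everything recorded after it, which is what makes after-the-fact edits detectable.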
Two ways to use it:
| | What | How |
|---|---|---|
| DevOps MCP Server | All-in-one: kubectl/helm/terraform/aws with smart output + auto-evidence | evidra-mcp as your agent's MCP server |
| Flight Recorder | Add evidence to any existing workflow, no MCP required | evidra record, evidra import, webhooks, or proxy mode |
Quick Start: MCP Server
{
  "mcpServers": {
    "evidra": {
      "command": "evidra-mcp",
      "args": ["--evidence-dir", "~/.evidra/evidence"]
    }
  }
}
Your agent gets seven default DevOps tools: run_command, collect_diagnostics, write_file, describe_tool, prescribe_smart, report, and get_event. The normal path is still run_command with automatic evidence recording for mutations. Use describe_tool only when you want the full explicit-control schema for prescribe_smart or report. Add --full-prescribe when you also want artifact-aware prescribe_full.
Quick Start: CLI (No MCP)
# Wrap any command; evidence is recorded automatically
evidra record -f deploy.yaml -- kubectl apply -f deploy.yaml
# Import from CI pipelines
evidra import --input record.json
# View reliability scorecard
evidra scorecard --period 30d
Works with any agent framework, CI system, or script. No MCP required.
Security boundary: Evidra does not sandbox the wrapped command. Treat it with the same trust model as direct shell execution.
# Install
brew install samebits/tap/evidra
What Your Agent Gets
Smart output: fewer tokens, same information
Agent: run_command("kubectl get deployment web -n bench")
# Without evidra-mcp (raw JSON): ~2,400 tokens
{"apiVersion":"apps/v1","metadata":{"managedFields":[...],...},"spec":{...},"status":{...}}
# With evidra-mcp (smart output): ~40 tokens
deployment/web (bench): 0/2 ready | image: nginx:99.99 | Available=False
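As a rough illustration of how a raw Deployment object can collapse into the one-line summary above, the following sketch reads a handful of standard Kubernetes fields (`spec.replicas`, `status.readyReplicas`, `status.conditions`). The function name `summarize_deployment` and the exact formatting rules are assumptions; evidra-mcp's real summarizer may work differently.

```python
def summarize_deployment(obj):
    """Collapse a Deployment object (as a dict) into a one-line status summary."""
    meta = obj.get("metadata", {})
    spec = obj.get("spec", {})
    status = obj.get("status", {})

    name = meta.get("name", "?")
    namespace = meta.get("namespace", "default")
    desired = spec.get("replicas", 1)
    ready = status.get("readyReplicas", 0)  # field is absent when nothing is ready

    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    images = ",".join(c.get("image", "?") for c in containers) or "?"

    conditions = {c.get("type"): c.get("status") for c in status.get("conditions", [])}
    available = conditions.get("Available", "Unknown")

    return (
        f"deployment/{name} ({namespace}): {ready}/{desired} ready"
        f" | image: {images} | Available={available}"
    )


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "bench", "managedFields": []},
    "spec": {
        "replicas": 2,
        "template": {"spec": {"containers": [{"name": "web", "image": "nginx:99.99"}]}},
    },
    "status": {"conditions": [{"type": "Available", "status": "False"}]},
}
print(summarize_deployment(deployment))
# deployment/web (bench): 0/2 ready | image: nginx:99.99 | Available=False
```

The token savings come from dropping bookkeeping such as `managedFields` and keeping only the fields an agent needs to decide its next step.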
Auto-evidence for mutations: zero agent code
Agent: run_command("kubectl apply -f fix.yaml")
→ evidra auto-prescribes (intent recorded)
→ kubectl executes
→ evidra auto-reports (outcome recorded)
→ smart output returned to agent
Read-only commands (get, describe, logs) execute directly, with no overhead.
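The flow above can be sketched as a thin wrapper around command execution. This is an illustration under stated assumptions: the read-only verb set is taken only from the examples in the text, the evidence entries are plain dicts, and `run_with_evidence`, `is_mutation`, and `fake_execute` are hypothetical names rather than Evidra's API.

```python
import shlex

# Assumption: an illustrative set of read-only kubectl verbs, taken from the
# examples in the text (get, describe, logs). The real classifier is not shown.
READ_ONLY_VERBS = {"get", "describe", "logs"}


def is_mutation(command):
    """Treat a command as a mutation unless its first argument is a read-only verb."""
    parts = shlex.split(command)
    verb = parts[1] if len(parts) > 1 else ""
    return verb not in READ_ONLY_VERBS


def run_with_evidence(command, execute, evidence):
    """Run `command` via `execute`, recording intent and outcome for mutations."""
    if not is_mutation(command):
        return execute(command)  # read-only: no evidence entries

    evidence.append({"type": "prescribe", "command": command})
    exit_code = execute(command)
    evidence.append({"type": "report", "command": command, "exit_code": exit_code})
    return exit_code


def fake_execute(command):
    """Stand-in executor so the sketch runs without a cluster."""
    return 0


evidence = []
run_with_evidence("kubectl get pods -n bench", fake_execute, evidence)
run_with_evidence("kubectl apply -f fix.yaml", fake_execute, evidence)
print([e["type"] for e in evidence])  # ['prescribe', 'report']
```

The classifier in this sketch errs toward recording: anything it cannot positively identify as read-only, including a command with flags placed before the verb, is treated as a mutation.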
Skills: tested on real infrastructure
Install the Evidra skill to give your agent operational discipline: diagnosis before fix, safety boundaries, domain-specific patterns. Skills are tested on 62 real scenarios via infra-bench before shipping; skills that hurt performance don't ship.
7 default tools, plus optional Full Prescribe
| Tool | Description |
|---|---|
| run_command | Execute kubectl, helm, terraform, aws with smart output |
| collect_diagnostics | Gather pods, describe output, events, and recent logs for one workload |
| write_file | Write config or manifest files under the current workspace or temp directories |
| describe_tool | Show the full schema for deferred protocol tools when you want explicit control |
| prescribe_smart | Smart Prescribe with deferred schema loading; use describe_tool first when needed |
| report | Record outcome; full explicit schema available via describe_tool |
| get_event | Look up evidence |
Enable --full-prescribe to add Full Prescribe when your agent has artifact bytes and you want artifact-aware explicit intent capture.
Most agents only need run_command. Use collect_diagnostics when the model would otherwise spend multiple turns on get / describe / events / logs. Use write_file for agent-authored manifests or Terraform snippets without leaving the MCP surface. Use describe_tool only when you deliberately want the explicit prescribe_smart / report flow instead of the default auto-evidence path.
Why Not Just kubectl-mcp-server?
| | kubectl-mcp-server | evidra-mcp |
|---|---|---|
| Tools | 270 specialized | 7 default tools + optional Full Prescribe |
| Output | Raw JSON (~2400 tokens) | Smart summary (~40 tokens) |
| Evidence | None | Auto prescribe/report for mutations |
| Security | Open | Command allowlist + blocked subcommands |
| Skills | None | Bench-tested, installable |
| Scoring | None | Reliability scorecards + behavioral signals |
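The "Command allowlist + blocked subcommands" row can be pictured with a small deny-by-default check. The binaries below match the tools named in this README, but the blocked subcommands and the function name `check_command` are purely hypothetical examples; Evidra's actual policy lists are not given here.

```python
import shlex

# Hypothetical policy values for illustration only.
ALLOWED_BINARIES = {"kubectl", "helm", "terraform", "aws"}
BLOCKED_SUBCOMMANDS = {"kubectl": {"exec"}, "terraform": {"destroy"}}


def check_command(command):
    """Return (allowed, reason) for a command line, denying by default."""
    try:
        parts = shlex.split(command)
    except ValueError:
        return False, "unparseable command"
    if not parts:
        return False, "empty command"

    binary = parts[0]
    if binary not in ALLOWED_BINARIES:
        return False, f"binary not allowlisted: {binary}"

    # Scan every argument rather than guessing which one is the subcommand,
    # so flags placed before the verb cannot hide it. This may over-block.
    blocked = BLOCKED_SUBCOMMANDS.get(binary, set())
    hit = next((p for p in parts[1:] if p in blocked), None)
    if hit is not None:
        return False, f"blocked subcommand: {binary} {hit}"
    return True, "ok"


print(check_command("kubectl get pods -n bench"))
print(check_command("kubectl -n bench exec web -- sh"))
print(check_command("rm -rf /"))
```

A check like this only makes sense if the approved command is then executed as an argument list without a shell, so that characters such as `;` or `|` are never interpreted.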
For Platform Teams
Self-hosted analytics
docker compose up --build -d
Centralize evidence across agents, pipelines, and controllers:
- Which agents retry the same operation?
- Which scenarios cause the most failures?
- How does model X compare to model Y on real infrastructure?
CI/CD integration
# Wrap any command: the CLI records prescribe/execute/report
evidra record -f deploy.yaml -- kubectl apply -f deploy.yaml
# Import completed operations
evidra import --input record.json
# View reliability scorecard
evidra scorecard --period 30d
References: Self-hosted setup · CLI reference · API reference
For Agent Benchmarking
Test which skills and tools actually improve your agent. 62 real scenarios on real Kubernetes clusters.
# Baseline: no skill
infra-bench certify --track cka --model sonnet --provider bifrost
# With role skill
infra-bench certify --track cka --model sonnet --role k8s-admin
# Result: skills help L1 (75% fewer turns) but break L2 diagnosis
Bench repo: evidra-infra-bench | Dashboard: lab.evidra.cc/bench
Intelligence Layer
From the evidence chain, Evidra computes:
- Risk assessment: pluggable pipeline with multiple assessors
- Behavioral signals: protocol violations, retry loops, blast radius, drift detection
- Reliability scorecards: 0-100 score with band and confidence
Eight behavioral signals documented in the Signal specification.
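As one example of how a behavioral signal might be derived from recorded evidence, the sketch below flags retry loops. The definition used here (the same actor failing the same command at least `threshold` times) and the event keys `actor`, `command`, and `exit_code` are assumptions for illustration; the Signal specification is the authority on how Evidra defines each signal.

```python
from collections import Counter


def detect_retry_loops(events, threshold=3):
    """Return (actor, command) pairs that failed at least `threshold` times."""
    failures = Counter(
        (e["actor"], e["command"]) for e in events if e["exit_code"] != 0
    )
    return sorted(key for key, count in failures.items() if count >= threshold)


events = [
    {"actor": "agent-a", "command": "kubectl apply -f fix.yaml", "exit_code": 1},
    {"actor": "agent-a", "command": "kubectl apply -f fix.yaml", "exit_code": 1},
    {"actor": "agent-b", "command": "helm upgrade web ./chart", "exit_code": 1},
    {"actor": "agent-a", "command": "kubectl apply -f fix.yaml", "exit_code": 1},
    {"actor": "agent-a", "command": "kubectl apply -f fix.yaml", "exit_code": 0},
]
print(detect_retry_loops(events))
# [('agent-a', 'kubectl apply -f fix.yaml')]
```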
Explicit Protocol (Advanced)
For agents that want full control over evidence recording:
prescribe_smart / prescribe_full → canonicalize artifact → assess risk → record intent
execute → run the command (or decline to act)
report → record verdict, exit code, or refusal reason
Three evidence modes:
| Mode | How | Agent awareness |
|---|---|---|
| Proxy Observed | Auto prescribe/report via observed mutation-style tool calls | None needed |
| Smart Prescribe | Agent calls prescribe_smart + report | Minimal (~30 tokens) |
| Full Prescribe | Agent calls prescribe_full with artifact | Full artifact (~300 tokens) |
Most users should use Proxy Observed or the default DevOps surface. Smart Prescribe and Full Prescribe are for teams that want agents to see risk assessments before executing.
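To show why an agent might want to see a risk assessment before executing, here is a small sketch of the explicit flow in which the agent declines to act and reports a refusal. The function signatures, the record fields, and the `toy_assessor` rule are all hypothetical; they mirror the prescribe and report steps described above but are not the MCP tool schemas.

```python
def prescribe(evidence, command, assess_risk):
    """Record intent along with a risk level the agent can inspect before acting."""
    entry = {"type": "prescribe", "command": command, "risk": assess_risk(command)}
    evidence.append(entry)
    return entry


def report(evidence, prescription, exit_code=None, refusal_reason=None):
    """Record the outcome: an exit code if executed, or a reason if declined."""
    entry = {
        "type": "report",
        "command": prescription["command"],
        "exit_code": exit_code,
        "refusal_reason": refusal_reason,
    }
    evidence.append(entry)
    return entry


def toy_assessor(command):
    """Hypothetical assessor: anything containing 'delete' is high risk."""
    return "high" if "delete" in command else "low"


evidence = []
p = prescribe(evidence, "kubectl delete namespace prod", toy_assessor)
if p["risk"] == "high":
    report(evidence, p, refusal_reason="risk too high without approval")
else:
    report(evidence, p, exit_code=0)  # would follow a real execution
print(evidence[-1]["refusal_reason"])  # risk too high without approval
```

Recording the refusal as a first-class outcome means the evidence shows what the agent chose not to do, not only what it ran.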
Proxy Mode: Wrap Mutation-Oriented MCP Servers
Add evidence to an existing MCP server with zero agent changes:
{
  "mcpServers": {
    "infra": {
      "command": "evidra-mcp",
      "args": ["--proxy", "--", "npx", "-y", "@anthropic/mcp-server-kubernetes"]
    }
  }
}
The proxy records evidence when it sees run_command or other mutation-shaped MCP tool calls it can classify heuristically. Unclassified or read-only tool calls pass through without evidence.
Docs
- MCP Setup Guide
- Skill Setup Guide
- CLI Reference
- API Reference
- Architecture
- Protocol Specification
- Executor Contract
- Supported Tools
Development
make build
make test
make lint
make test-mcp-inspector # MCP protocol compliance tests
Environment Variables
| Variable | Description |
|---|---|
| EVIDRA_EVIDENCE_DIR | Evidence storage path (default: ~/.evidra/evidence) |
| EVIDRA_SIGNING_MODE | strict (default) or optional (dev mode) |
| EVIDRA_SIGNING_KEY | Base64 Ed25519 signing key |
| EVIDRA_ENVIRONMENT | Environment label (production, staging) |
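For scripts that sit alongside Evidra and need to find the same evidence directory, a helper along these lines can resolve the variables with the defaults listed in the table. The function name `load_settings` is hypothetical, and rejecting unknown signing modes is an assumption of this sketch rather than documented Evidra behavior.

```python
import os


def load_settings(env=None):
    """Read the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    mode = env.get("EVIDRA_SIGNING_MODE", "strict")
    if mode not in ("strict", "optional"):
        # Assumption: fail loudly on unknown modes instead of guessing.
        raise ValueError(f"unsupported EVIDRA_SIGNING_MODE: {mode}")
    return {
        "evidence_dir": os.path.expanduser(
            env.get("EVIDRA_EVIDENCE_DIR", "~/.evidra/evidence")
        ),
        "signing_mode": mode,
        "signing_key": env.get("EVIDRA_SIGNING_KEY"),  # Base64 Ed25519 key, if set
        "environment": env.get("EVIDRA_ENVIRONMENT"),
    }


settings = load_settings({"EVIDRA_ENVIRONMENT": "staging"})
print(settings["signing_mode"], settings["environment"])  # strict staging
```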
License
Licensed under the Apache License 2.0.
