Pydantic AI Harness
Batteries for your Pydantic AI agent.
Pydantic AI's capabilities and hooks APIs are how you give an agent its harness -- bundles of tools, lifecycle hooks, instructions, and model settings that extend what the agent can do without any framework changes.
Pydantic AI Harness is the official capability library for Pydantic AI, maintained by the Pydantic AI team. Pydantic AI core ships capabilities that require model or framework support, and capabilities fundamental to every agent -- web search, tool search, thinking. Everything else lives here: standalone building blocks you pick and choose to turn your agent into a coding agent, a research assistant, or anything else. This is also where new capabilities start -- as they stabilize and prove themselves broadly essential, they can graduate into core.
The capability matrix tracks where we are. Tell us what to prioritize.
Contents: Installation · Quick start · Capability matrix · An ecosystem agent · Help us prioritize · Build your own · Contributing · Version policy · Pydantic AI references · License
Installation
```bash
uv add pydantic-ai-harness
```
Extras for specific capabilities:
```bash
uv add "pydantic-ai-harness[codemode]"  # CodeMode (adds the Monty sandbox)
```
The `code-mode` extra is also supported as an alias.
Requires Python 3.10+ and `pydantic-ai-slim>=1.89.1`.
Quick start
```bash
uv add "pydantic-ai-slim[anthropic,mcp,duckduckgo,logfire]" "pydantic-ai-harness[code-mode]"
```
```python
import logfire
from pydantic_ai import Agent
from pydantic_ai.capabilities import MCP, WebSearch
from pydantic_ai_harness import CodeMode

# See https://ai.pydantic.dev/logfire/ for setup details.
logfire.configure()
logfire.instrument_pydantic_ai()

agent = Agent(
    'anthropic:claude-opus-4-7',
    capabilities=[
        # Wraps every tool into a single run_code tool, sandboxed by Monty
        # (https://github.com/pydantic/monty -- pulled in by the [code-mode] extra).
        # The model writes Python that calls multiple tools with loops, conditionals,
        # asyncio.gather, and local filtering -- one model round-trip for N tool calls.
        CodeMode(),
        # Connect to any MCP server -- here, the open-source Hacker News server
        # (https://github.com/cyanheads/hn-mcp-server). builtin=False forces the
        # local FastMCP toolset so CodeMode can wrap the tools; without it,
        # providers that natively support MCP server connectors execute the tools
        # server-side and bypass the sandbox.
        MCP('https://hn.caseyjhand.com/mcp', builtin=False),
        # Provider-adaptive web search; builtin=False routes through the local
        # DuckDuckGo fallback (the [duckduckgo] extra above) so CodeMode can batch
        # web searches alongside the HN calls in a single run_code.
        WebSearch(builtin=False),
    ],
)

result = agent.run_sync(
    "Across the top, best, and 'show HN' Hacker News feeds, find the most-discussed "
    "story with at least 100 points. Pull its comment thread, its submitter's profile, "
    "and any web coverage. Summarize what you find in one paragraph."
)
print(result.output)
"""
The most-discussed HN story across top/best/show clearing 100 points is "Vibe coding
and agentic engineering are getting closer than I'd like" by Simon Willison (748 points,
853 comments, on the Best feed), submitted by long-time HNer e12e. The piece argues
that the two modes Willison once kept mentally separate -- throwaway "vibe coding" and
disciplined "agentic engineering" -- are blurring, since agents like Claude Code now
reliably handle non-trivial tasks like "build a JSON API endpoint that runs a SQL query"
with tests and docs on the first pass. The HN thread is unusually substantive, with
commenters debating whether LLMs created or merely *exposed* sloppy engineering
practices and warning of a "normalization of deviance" as engineers stop reviewing diffs.
"""
```
See this run as a public Logfire trace: each run_code span fans out into the tool calls the model issued from inside the sandbox -- it's the easiest way to understand what code mode actually did.
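The payoff of code mode is easy to see in miniature: instead of one model round-trip per tool call, the model emits a single Python snippet that fans the calls out itself. A toy sketch of that pattern (the `get_feed` tool and its data are invented stand-ins, and there is no sandbox here -- this only illustrates the batching idea):

```python
import asyncio


# Toy stand-in for an MCP tool the model could call from inside run_code.
async def get_feed(name: str) -> list[dict]:
    return [{'id': f'{name}-1', 'points': 120}, {'id': f'{name}-2', 'points': 80}]


async def main() -> dict:
    # One "run_code" body: fetch three feeds concurrently, filter locally,
    # and return only the winner -- one round-trip covering three tool calls.
    feeds = await asyncio.gather(*(get_feed(n) for n in ('top', 'best', 'show')))
    stories = [s for feed in feeds for s in feed if s['points'] >= 100]
    return max(stories, key=lambda s: s['points'])


print(asyncio.run(main()))
```

Everything between the tool calls (the filtering, the `max`) happens inside the sandbox, so intermediate results never consume context tokens.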
Capability matrix
We studied leading coding agents, agent frameworks, and Claw-style assistants to map every capability area that matters for production agents. Each one is tracked as an issue in this repo.
Vote on whatever is linked in the Status column -- PRs if we're actively building it, issues if it's planned -- to help us decide what to work on next.
| Category | Capability | Description | Status | Community alternatives |
|---|---|---|---|---|
| Tools & execution | Code mode | Sandboxed Python execution via Monty -- one run_code call replaces N tool calls | :white_check_mark: Docs | |
| | Tool search | Progressive tool discovery for large tool sets | :white_check_mark: Pydantic AI | |
| | File system | Read, write, edit, search files with path traversal prevention | :construction: PR #177 | pydantic-ai-backend (vstorm-co) |
| | Shell | Execute commands with allowlists, denylists, and timeouts | :construction: PR #177 | pydantic-ai-backend (vstorm-co) |
| | Repo context injection | Auto-load CLAUDE.md/AGENTS.md and repo structure | :construction: PR #175 | pydantic-deep (vstorm-co) |
| | Verification loop | Run tests after edits, auto-fix failures | :construction: PR #169 | |
| Context management | Sliding window | Trim conversation history to stay within token limits | :construction: PR #191 | summarization-pydantic-ai (vstorm-co) |
| | Context compaction | LLM-powered summarization of older messages | :construction: PR #191 | summarization-pydantic-ai (vstorm-co) |
| | Limit warnings | Warn agent before hitting context/iteration limits | :construction: PR #191 | summarization-pydantic-ai (vstorm-co) |
| | Tool output management | Truncate, summarize, or spill large tool outputs | :construction: PR #185 | |
| | System reminders | Inject periodic reminders to counteract instruction drift | :construction: PR #181 | |
| Memory & persistence | Memory | Persistent key-value memory across sessions | :construction: PR #179 | pydantic-deep (vstorm-co) |
| | Session persistence | Save and restore full conversation state | :construction: PR #176 | |
| | Checkpointing | Save, rewind, and fork conversation state | :memo: #196 | pydantic-deep (vstorm-co) |
| Agent orchestration | Sub-agents | Delegate subtasks to specialized child agents | :construction: PR #178 | subagents-pydantic-ai (vstorm-co) |
| | Skills | Progressive tool loading -- search, activate, deactivate | :construction: PR #183 | pydantic-ai-skills (DougTrajano), pydantic-deep (vstorm-co) |
| | Planning | Break complex tasks into structured plans before execution | :construction: PR #180 | |
| | Task tracking | Track tasks, subtasks, and dependencies | :memo: #65 | pydantic-ai-todo (vstorm-co) |
| | Teams | Multi-agent teams with shared state and message bus | :memo: #195 | pydantic-deep (vstorm-co) |
| Safety & guardrails | Input guardrails | Validate user input before the agent run starts | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Output guardrails | Validate model output after the run completes | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Cost/token budgets | Enforce token and cost limits per run | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Tool access control | Block tools or require approval before execution | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Async guardrails | Run validation concurrently with model requests | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Secret masking | Detect and redact secrets in agent I/O | :construction: PR #172 | pydantic-ai-shields (vstorm-co) |
| | Approval workflows | Require human approval for sensitive operations | :construction: PR #173 | Pydantic AI (built-in) |
| | Tool budget | Limit total tool calls or cost per run | :construction: PR #168 | |
| Reliability | Stuck loop detection | Detect and break out of repetitive agent loops | :construction: PR #186 | |
| | Tool error recovery | Retry failed tool calls with backoff and budget | :construction: PR #171 | |
| | Tool orphan repair | Fix orphaned tool calls in conversation history | :construction: PR #184 | |
| Reasoning | Adaptive reasoning | Adjust thinking effort based on task complexity | :construction: PR #174 | |
| | Current time | Inject current date/time into system prompt | :construction: PR #170 | |
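To give a feel for what the context-management rows involve, here is a minimal sliding-window trim, written from scratch for illustration (the real capability's API lives in PR #191 and summarization-pydantic-ai; the 4-chars-per-token estimate is a deliberate simplification):

```python
def trim_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages that fit the budget, dropping the oldest first."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest-to-oldest
        cost = max(1, len(msg) // 4)  # crude token estimate
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = ['old ' * 50, 'recent question', 'latest answer']
print(trim_to_window(history, max_tokens=20))  # the long old message is dropped
```

Compaction differs from this only in what happens to the dropped messages: instead of discarding them, an LLM summarizes them into a single message that stays in the window.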
Packages by vstorm-co are endorsed by the Pydantic AI team. We're working with them to upstream some of their implementations into this repo.
An ecosystem agent
The Quick start above is deliberately small. Here's the other end of the spectrum -- an agent wired up with capabilities drawn from across the Pydantic AI ecosystem: this repo, core pydantic-ai, and the community packages we vouch for in the matrix above.
```python
import logfire
from pydantic_ai import Agent
from pydantic_ai.capabilities import MCP, Thinking, WebSearch
from pydantic_ai_harness import CodeMode

# Community packages, alphabetical:
from pydantic_ai_backends import ConsoleCapability
from pydantic_ai_shields import CostTracking, InputGuard, SecretRedaction, ToolGuard
from pydantic_ai_skills import SkillsCapability
from pydantic_ai_summarization import ContextManagerCapability
from pydantic_ai_todo import TodoCapability
from pydantic_deep import MemoryCapability, StuckLoopDetection
from subagents_pydantic_ai import SubAgentCapability, SubAgentConfig

# See https://ai.pydantic.dev/logfire/ for setup details.
logfire.configure()
logfire.instrument_pydantic_ai()

agent = Agent(
    'anthropic:claude-opus-4-7',
    capabilities=[
        # --- Execution ---
        # Wraps every tool into a single run_code, sandboxed by Monty.
        CodeMode(),
        # --- Reasoning ---
        # Provider-adaptive thinking; uses native extended thinking on supporting models.
        Thinking(effort='xhigh'),
        # --- Context management ---
        # Sliding window + LLM compaction. By @vstorm-co:
        # https://github.com/vstorm-co/summarization-pydantic-ai
        # Pydantic AI also ships `AnthropicCompaction` and `OpenAICompaction` for
        # provider-native compaction.
        ContextManagerCapability(max_tokens=180_000),
        # --- Tools ---
        # Connect to any MCP server -- here, the open-source Hacker News server
        # (https://github.com/cyanheads/hn-mcp-server).
        MCP('https://hn.caseyjhand.com/mcp'),
        # Provider-adaptive web search; falls back to a local DuckDuckGo implementation.
        WebSearch(),
        # Filesystem + shell. By @vstorm-co: https://github.com/vstorm-co/pydantic-ai-backend
        ConsoleCapability(),
        # --- Memory & persistence ---
        # Persistent ./MEMORY.md per agent name. By @vstorm-co:
        # https://github.com/vstorm-co/pydantic-deepagents
        MemoryCapability(agent_name='harness-example'),
        # --- Orchestration ---
        # Agent skills (Anthropic's spec) by @DougTrajano:
        # https://github.com/DougTrajano/pydantic-ai-skills
        # @vstorm-co's pydantic-deep also offers skills loading; the two have different
        # spec footprints (Doug's is closer to programmatic skills).
        SkillsCapability(directories=['./skills']),
        # Spawn sub-agents with their own toolsets and instructions. By @vstorm-co:
        # https://github.com/vstorm-co/subagents-pydantic-ai
        SubAgentCapability(subagents=[
            SubAgentConfig(
                name='researcher',
                description='Deep research on a topic',
                instructions='You are a thorough research assistant.',
            ),
        ]),
        # Track tasks and subtasks; in-memory by default, AsyncPostgresStorage available.
        # By @vstorm-co: https://github.com/vstorm-co/pydantic-ai-todo
        TodoCapability(enable_subtasks=True),
        # --- Safety & reliability ---
        # The next four are by @vstorm-co: https://github.com/vstorm-co/pydantic-ai-shields
        # Per-run cost cap with a callback hook.
        CostTracking(budget_usd=5.0),
        # Reject prompts that look like prompt-injection attempts.
        InputGuard(guard=lambda p: 'ignore previous instructions' not in p.lower()),
        # Block or require approval per tool name.
        ToolGuard(blocked=['rm'], require_approval=['write_file']),
        # Detect API keys/tokens in tool I/O and redact before they reach the model.
        SecretRedaction(),
        # Bail out if the agent gets stuck calling the same tools in a loop.
        # By @vstorm-co: https://github.com/vstorm-co/pydantic-deepagents
        StuckLoopDetection(),
    ],
)
```
This snippet is illustrative, not literally copy-pasteable: a few capabilities have setup requirements (a ./skills directory, a Postgres database for TodoCapability's persistent storage), and the community packages move independently of this one. The capability matrix tracks each one's status. As the harness ships first-party versions, the imports above will collapse onto fewer packages -- but the example will keep working, since the API surface is the same.
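Several of the reliability capabilities are simple in principle. Stuck-loop detection, for instance, boils down to noticing when the same tool call keeps repeating. This toy detector is not StuckLoopDetection's actual implementation, just a sketch of the idea:

```python
from collections import deque


class LoopDetector:
    """Flag when the same (tool, args) call repeats `threshold` times in a row."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.recent: deque = deque(maxlen=threshold)

    def record(self, tool: str, args: tuple) -> bool:
        """Record a call; return True if the agent looks stuck."""
        self.recent.append((tool, args))
        # Stuck = window is full and every entry in it is identical.
        return len(self.recent) == self.threshold and len(set(self.recent)) == 1


detector = LoopDetector(threshold=3)
calls = [('search', ('hn',)), ('search', ('hn',)), ('search', ('hn',))]
print([detector.record(*c) for c in calls])  # the third repeat trips the detector
```

A real capability would hook this into the tool-call lifecycle and, on detection, inject a corrective message or end the run rather than just returning a flag.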
Help us prioritize
Vote on whatever is linked in the Status column above. If there's a PR, vote on the PR -- it means we're actively building it. If there's only an issue, vote on the issue.
Want something that's not on the list? Open a capability request.
Build your own
Capabilities are the primary extension point for Pydantic AI. Any of the existing capabilities in this repo can serve as a reference for building your own.
Publishing as a standalone package? Use the pydantic-ai-<name> naming convention. See Publishing capability packages.
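Conceptually, a capability is just an object that bundles tools, instructions, and hooks for the agent to absorb. This stdlib-only sketch mirrors that shape without importing Pydantic AI (`ToyCapability`, its attribute names, and the `current_time` tool are all illustrative, not the real base class -- see the Capabilities docs for the actual interface):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class ToyCapability:
    """Illustrative stand-in: a bundle of tools plus instructions for the model."""
    tools: list[Callable] = field(default_factory=list)
    instructions: str = ''


def current_time() -> str:
    """Tool: return the current UTC time in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


current_time_cap = ToyCapability(
    tools=[current_time],
    instructions='Call current_time whenever the user asks about dates or times.',
)
```

The real API adds lifecycle hooks and model settings on top of this, but the mental model is the same: one self-contained object the agent plugs in.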
Contributing
We welcome capability contributions. Here's how:
- Start with an issue. Open a capability request describing the behavior you want. This lets us discuss the approach and priority before code is written -- we can close an approach without closing the problem.
- Then open a PR. Once the issue exists, you're welcome to open a PR with an implementation. Link the issue in your PR. We review based on community interest -- upvotes on both the issue and PR count.
- Don't chase green CI. Get the approach working, then let us know. We'll take it from there -- we may push to your branch, rewrite, or open a follow-up PR. You'll be credited as the original author. (See the Pydantic AI contributing guide.)
Note: PRs that modify `pyproject.toml` or `uv.lock` from non-team members are auto-closed by CI to prevent supply-chain risk. If you need a new dependency, open an issue.
Development
```bash
make install   # install dependencies
make format    # ruff format
make lint      # ruff check
make typecheck # pyright strict
make test      # pytest
make testcov   # pytest with 100% branch coverage
```
Version policy
Pydantic AI Harness uses 0.x versioning to signal that APIs are still stabilizing. During 0.x:
- Minor releases (0.1 → 0.2) may include breaking changes — renamed parameters, changed defaults, restructured APIs. As the library grows, especially as capabilities gain provider-native support (starting as a local implementation, then auto-switching to the provider's built-in API when available), we may need to reshape APIs we couldn't fully anticipate in the initial design.
- Patch releases (0.1.0 → 0.1.1) will not intentionally break existing behavior.
- All breaking changes are documented in release notes with migration guidance.
- Where practical, we'll keep the previous behavior available under a deprecated name or configuration option before removing it.
This is why Pydantic AI Harness is a separate package from Pydantic AI, which has a stricter version policy. As the core capabilities stabilize, we'll move toward 1.0 with stability guarantees to match.
Pydantic AI references
- Capabilities -- what capabilities are, built-in capabilities, building your own
- Hooks -- lifecycle hooks reference, ordering, error handling
- Extensibility -- publishing packages, third-party ecosystem
- Toolsets -- building tools for capabilities
- API reference -- full API docs
License
MIT -- see LICENSE.

