Pydantic AI Harness
Batteries for your Pydantic AI agent.
Pydantic AI's capabilities and hooks APIs are how you give an agent its harness -- bundles of tools, lifecycle hooks, instructions, and model settings that extend what the agent can do without any framework changes.
Pydantic AI Harness is the official capability library for Pydantic AI, maintained by the Pydantic AI team. Pydantic AI core ships capabilities that require model or framework support, and capabilities fundamental to every agent -- web search, tool search, thinking. Everything else lives here: standalone building blocks you pick and choose to turn your agent into a coding agent, a research assistant, or anything else. This is also where new capabilities start -- as they stabilize and prove themselves broadly essential, they can graduate into core.
The capability matrix tracks where we are. Tell us what to prioritize.
Contents: Installation · Quick start · Capability matrix · An ecosystem agent · Help us prioritize · Build your own · Contributing · Version policy · Pydantic AI references · License
Installation
```bash
uv add pydantic-ai-harness
```
Extras for specific capabilities:
```bash
uv add "pydantic-ai-harness[codemode]"  # CodeMode (adds the Monty sandbox)
```
The `code-mode` extra is also supported as an alias.
Requires Python 3.10+ and `pydantic-ai-slim>=1.89.1`.
Quick start
```bash
uv add "pydantic-ai-slim[anthropic,mcp,duckduckgo,logfire]" "pydantic-ai-harness[code-mode]"
```
```python
import logfire
from pydantic_ai import Agent
from pydantic_ai.capabilities import MCP, WebSearch
from pydantic_ai_harness import CodeMode

# See https://ai.pydantic.dev/logfire/ for setup details.
logfire.configure()
logfire.instrument_pydantic_ai()

agent = Agent(
    'anthropic:claude-opus-4-7',
    capabilities=[
        # Wraps every tool into a single run_code tool, sandboxed by Monty
        # (https://github.com/pydantic/monty -- pulled in by the [code-mode] extra).
        # The model writes Python that calls multiple tools with loops, conditionals,
        # asyncio.gather, and local filtering -- one model round-trip for N tool calls.
        CodeMode(),
        # Connect to any MCP server -- here, the open-source Hacker News server
        # (https://github.com/cyanheads/hn-mcp-server). builtin=False forces the
        # local FastMCP toolset so CodeMode can wrap the tools; without it,
        # providers that natively support MCP server connectors execute the tools
        # server-side and bypass the sandbox.
        MCP('https://hn.caseyjhand.com/mcp', builtin=False),
        # Provider-adaptive web search; builtin=False routes through the local
        # DuckDuckGo fallback (the [duckduckgo] extra above) so CodeMode can batch
        # web searches alongside the HN calls in a single run_code.
        WebSearch(builtin=False),
    ],
)

result = agent.run_sync(
    "Across the top, best, and 'show HN' Hacker News feeds, find the most-discussed "
    "story with at least 100 points. Pull its comment thread, its submitter's profile, "
    "and any web coverage. Summarize what you find in one paragraph."
)
print(result.output)
"""
The most-discussed HN story across top/best/show clearing 100 points is "Vibe coding
and agentic engineering are getting closer than I'd like" by Simon Willison (748 points,
853 comments, on the Best feed), submitted by long-time HNer e12e. The piece argues
that the two modes Willison once kept mentally separate -- throwaway "vibe coding" and
disciplined "agentic engineering" -- are blurring, since agents like Claude Code now
reliably handle non-trivial tasks like "build a JSON API endpoint that runs a SQL query"
with tests and docs on the first pass. The HN thread is unusually substantive, with
commenters debating whether LLMs created or merely *exposed* sloppy engineering
practices and warning of a "normalization of deviance" as engineers stop reviewing diffs.
"""
```
See this run as a public Logfire trace: each run_code span fans out into the tool calls the model issued from inside the sandbox -- it's the easiest way to understand what code mode actually did.
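The payoff of code mode is easy to see in miniature: instead of one model round-trip per tool call, the model emits a single Python snippet that fans the calls out itself. A toy sketch of that pattern (the `get_feed` tool and its data are invented stand-ins, and there is no sandbox here -- this only illustrates the batching idea):

```python
import asyncio


# Toy stand-in for an MCP tool the model could call from inside run_code.
async def get_feed(name: str) -> list[dict]:
    return [{'id': f'{name}-1', 'points': 120}, {'id': f'{name}-2', 'points': 80}]


async def main() -> dict:
    # One "run_code" body: fetch three feeds concurrently, filter locally,
    # and return only the winner -- one round-trip covering three tool calls.
    feeds = await asyncio.gather(*(get_feed(n) for n in ('top', 'best', 'show')))
    stories = [s for feed in feeds for s in feed if s['points'] >= 100]
    return max(stories, key=lambda s: s['points'])


print(asyncio.run(main()))
```

Everything between the tool calls (the filtering, the `max`) happens inside the sandbox, so intermediate results never consume context tokens.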
Capability matrix
We studied leading coding agents, agent frameworks, and Claw-style assistants to map every capability area that matters for production agents. Each one is tracked as an issue in this repo.
Vote on whatever is linked in the Status column -- PRs if we're actively building it, issues if it's planned -- to help us decide what to work on next.
| Category | Capability | Description | Status | Community alternatives |
|---|---|---|---|---|
| Tools & execution | Code mode | Sandboxed Python execution via Monty -- one run_code call replaces N tool calls | :white_check_mark: Docs | |
| | Tool search | Progressive tool discovery for large tool sets | :white_check_mark: Pydantic AI | |
| | File system | Read, write, edit, search files with path traversal prevention | :construction: PR #177 | pydantic-ai-backend (vstorm-co) |
| | Shell | Execute commands with allowlists, denylists, and timeouts | :construction: PR #177 | pydantic-ai-backend (vstorm-co) |
| | Repo context injection | Auto-load CLAUDE.md/AGENTS.md and repo structure | :construction: PR #175 | pydantic-deep (vstorm-co) |
| | Verification loop | Run tests after edits, auto-fix failures | :construction: PR #169 | |
| Context management | Sliding window | Trim conversation history to stay within token limits | :construction: PR #191 | summarization-pydantic-ai (vstorm-co) |
| | Context compaction | LLM-powered summarization of older messages | :construction: PR #191 | summarization-pydantic-ai (vstorm-co) |
| | Limit warnings | Warn agent before hitting context/iteration limits | :construction: PR #191 | summarization-pydantic-ai (vstorm-co) |
| | Tool output management | Truncate, summarize, or spill large tool outputs | :construction: PR #185 | |
| | System reminders | Inject periodic reminders to counteract instruction drift | :construction: PR #181 | |
| Memory & persistence | Memory | Persistent key-value memory across sessions | :construction: PR #179 | pydantic-deep (vstorm-co) |
| | Session persistence | Save and restore full conversation state | :construction: PR #176 | |
| | Checkpointing | Save, rewind, and fork conversation state | :memo: #196 | pydantic-deep (vstorm-co) |
| Agent orchestration | Sub-agents | Delegate subtasks to specialized child agents | :construction: PR #178 | subagents-pydantic-ai (vstorm-co) |
| | Skills | Progressive tool loading -- search, activate, deactivate | :construction: PR #183 | pydantic-ai-skills (DougTrajano), pydantic-deep (vstorm-co) |
| | Planning | Break complex tasks into structured plans before execution | :construction: PR #180 | |
| | Task tracking | Track tasks, subtasks, and dependencies | :memo: #65 | pydantic-ai-todo (vstorm-co) |
| | Teams | Multi-agent teams with shared state and message bus | :memo: #195 | pydantic-deep (vstorm-co) |
| Safety & guardrails | Input guardrails | Validate user input before the agent run starts | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Output guardrails | Validate model output after the run completes | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Cost/token budgets | Enforce token and cost limits per run | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Tool access control | Block tools or require approval before execution | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Async guardrails | Run validation concurrently with model requests | :construction: PR #182 | pydantic-ai-shields (vstorm-co) |
| | Secret masking | Detect and redact secrets in agent I/O | :construction: PR #172 | pydantic-ai-shields (vstorm-co) |
| | Approval workflows | Require human approval for sensitive operations | :construction: PR #173 | Pydantic AI (built-in) |
| | Tool budget | Limit total tool calls or cost per run | :construction: PR #168 | |
| Reliability | Stuck loop detection | Detect and break out of repetitive agent loops | :construction: PR #186 | |
| | Tool error recovery | Retry failed tool calls with backoff and budget | :construction: PR #171 | |
| | Tool orphan repair | Fix orphaned tool calls in conversation history | :construction: PR #184 | |
| Reasoning | Adaptive reasoning | Adjust thinking effort based on task complexity | :construction: PR #174 | |
| | Current time | Inject current date/time into system prompt | :construction: PR #170 | |
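To give a feel for what the context-management rows involve, here is a minimal sliding-window trim, written from scratch for illustration (the real capability's API lives in PR #191 and summarization-pydantic-ai; the 4-chars-per-token estimate is a deliberate simplification):

```python
def trim_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages that fit the budget, dropping the oldest first."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest-to-oldest
        cost = max(1, len(msg) // 4)  # crude token estimate
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = ['old ' * 50, 'recent question', 'latest answer']
print(trim_to_window(history, max_tokens=20))  # the long old message is dropped
```

Compaction differs from this only in what happens to the dropped messages: instead of discarding them, an LLM summarizes them into a single message that stays in the window.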
Packages by vstorm-co are endorsed by the Pydantic AI team. We're working with them to upstream some of their implementations into this repo.
An ecosystem agent
The Quick start above is deliberately small. Here's the other end of the spectrum -- an agent wired up with capabilities drawn from across the Pydantic AI ecosystem: this repo, core pydantic-ai, and the community packages we vouch for in the matrix above.
```python
import logfire
from pydantic_ai import Agent
from pydantic_ai.capabilities import MCP, Thinking, WebSearch
from pydantic_ai_harness import CodeMode

# Community packages, alphabetical:
from pydantic_ai_backends import ConsoleCapability
from pydantic_ai_shields import CostTracking, InputGuard, SecretRedaction, ToolGuard
from pydantic_ai_skills import SkillsCapability
from pydantic_ai_summarization import ContextManagerCapability
from pydantic_ai_todo import TodoCapability
from pydantic_deep import MemoryCapability, StuckLoopDetection
from subagents_pydantic_ai import SubAgentCapability, SubAgentConfig

# See https://ai.pydantic.dev/logfire/ for setup details.
logfire.configure()
logfire.instrument_pydantic_ai()

agent = Agent(
    'anthropic:claude-opus-4-7',
    capabilities=[
        # --- Execution ---
        # Wraps every tool into a single run_code, sandboxed by Monty.
        CodeMode(),
        # --- Reasoning ---
        # Provider-adaptive thinking; uses native extended thinking on supporting models.
        Thinking(effort='xhigh'),
        # --- Context management ---
        # Sliding window + LLM compaction. By @vstorm-co:
        # https://github.com/vstorm-co/summarization-pydantic-ai
        # Pydantic AI also ships `AnthropicCompaction` and `OpenAICompaction` for
        # provider-native compaction.
        ContextManagerCapability(max_tokens=180_000),
        # --- Tools ---
        # Connect to any MCP server -- here, the open-source Hacker News server
        # (https://github.com/cyanheads/hn-mcp-server).
        MCP('https://hn.caseyjhand.com/mcp'),
        # Provider-adaptive web search; falls back to a local DuckDuckGo implementation.
        WebSearch(),
        # Filesystem + shell. By @vstorm-co: https://github.com/vstorm-co/pydantic-ai-backend
        ConsoleCapability(),
        # --- Memory & persistence ---
        # Persistent ./MEMORY.md per agent name. By @vstorm-co:
        # https://github.com/vstorm-co/pydantic-deepagents
        MemoryCapability(agent_name='harness-example'),
        # --- Orchestration ---
        # Agent skills (Anthropic's spec) by @DougTrajano:
        # https://github.com/DougTrajano/pydantic-ai-skills
        # @vstorm-co's pydantic-deep also offers skills loading; the two have different
        # spec footprints (Doug's is closer to programmatic skills).
        SkillsCapability(directories=['./skills']),
        # Spawn sub-agents with their own toolsets and instructions. By @vstorm-co:
        # https://github.com/vstorm-co/subagents-pydantic-ai
        SubAgentCapability(subagents=[
            SubAgentConfig(
                name='researcher',
                description='Deep research on a topic',
                instructions='You are a thorough research assistant.',
            ),
        ]),
        # Track tasks and subtasks; in-memory by default, AsyncPostgresStorage available.
        # By @vstorm-co: https://github.com/vstorm-co/pydantic-ai-todo
        TodoCapability(enable_subtasks=True),
        # --- Safety & reliability ---
        # The next four are by @vstorm-co: https://github.com/vstorm-co/pydantic-ai-shields
        # Per-run cost cap with a callback hook.
        CostTracking(budget_usd=5.0),
        # Reject prompts that look like prompt-injection attempts.
        InputGuard(guard=lambda p: 'ignore previous instructions' not in p.lower()),
        # Block or require approval per tool name.
        ToolGuard(blocked=['rm'], require_approval=['write_file']),
        # Detect API keys/tokens in tool I/O and redact before they reach the model.
        SecretRedaction(),
        # Bail out if the agent gets stuck calling the same tools in a loop.
        # By @vstorm-co: https://github.com/vstorm-co/pydantic-deepagents
        StuckLoopDetection(),
    ],
)
```
This snippet is illustrative, not literally copy-pasteable: a few capabilities have setup requirements (a ./skills directory, a Postgres database for TodoCapability's persistent storage), and the community packages move independently of this one. The capability matrix tracks each one's status. As the harness ships first-party versions, the imports above will collapse onto fewer packages -- but the example will keep working, since the API surface is the same.
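Several of the reliability capabilities are simple in principle. Stuck-loop detection, for instance, boils down to noticing when the same tool call keeps repeating. This toy detector is not StuckLoopDetection's actual implementation, just a sketch of the idea:

```python
from collections import deque


class LoopDetector:
    """Flag when the same (tool, args) call repeats `threshold` times in a row."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.recent: deque = deque(maxlen=threshold)

    def record(self, tool: str, args: tuple) -> bool:
        """Record a call; return True if the agent looks stuck."""
        self.recent.append((tool, args))
        # Stuck = window is full and every entry in it is identical.
        return len(self.recent) == self.threshold and len(set(self.recent)) == 1


detector = LoopDetector(threshold=3)
calls = [('search', ('hn',)), ('search', ('hn',)), ('search', ('hn',))]
print([detector.record(*c) for c in calls])  # the third repeat trips the detector
```

A real capability would hook this into the tool-call lifecycle and, on detection, inject a corrective message or end the run rather than just returning a flag.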
Help us prioritize
Vote on whatever is linked in the Status column above. If there's a PR, vote on the PR -- it means we're actively building it. If there's only an issue, vote on the issue.
Want something that's not on the list? Open a capability request.
Build your own
Capabilities are the primary extension point for Pydantic AI. Any of the existing capabilities in this repo can serve as a reference for building your own.
Publishing as a standalone package? Use the pydantic-ai-<name> naming convention. See Publishing capability packages.
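Conceptually, a capability is just an object that bundles tools, instructions, and hooks for the agent to absorb. This stdlib-only sketch mirrors that shape without importing Pydantic AI (`ToyCapability`, its attribute names, and the `current_time` tool are all illustrative, not the real base class -- see the Capabilities docs for the actual interface):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class ToyCapability:
    """Illustrative stand-in: a bundle of tools plus instructions for the model."""
    tools: list[Callable] = field(default_factory=list)
    instructions: str = ''


def current_time() -> str:
    """Tool: return the current UTC time in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


current_time_cap = ToyCapability(
    tools=[current_time],
    instructions='Call current_time whenever the user asks about dates or times.',
)
```

The real API adds lifecycle hooks and model settings on top of this, but the mental model is the same: one self-contained object the agent plugs in.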
Contributing
We welcome capability contributions. Here's how:
- Start with an issue. Open a capability request describing the behavior you want. This lets us discuss the approach and priority before code is written -- we can close an approach without closing the problem.
- Then open a PR. Once the issue exists, you're welcome to open a PR with an implementation. Link the issue in your PR. We review based on community interest -- upvotes on both the issue and PR count.
- Don't chase green CI. Get the approach working, then let us know. We'll take it from there -- we may push to your branch, rewrite, or open a follow-up PR. You'll be credited as the original author. (See the Pydantic AI contributing guide.)
Note: PRs that modify `pyproject.toml` or `uv.lock` from non-team members are auto-closed by CI to prevent supply-chain risk. If you need a new dependency, open an issue.
Development
```bash
make install   # install dependencies
make format    # ruff format
make lint      # ruff check
make typecheck # pyright strict
make test      # pytest
make testcov   # pytest with 100% branch coverage
```
Version policy
Pydantic AI Harness uses 0.x versioning to signal that APIs are still stabilizing. During 0.x:
- Minor releases (0.1 → 0.2) may include breaking changes — renamed parameters, changed defaults, restructured APIs. As the library grows, especially as capabilities gain provider-native support (starting as a local implementation, then auto-switching to the provider's built-in API when available), we may need to reshape APIs we couldn't fully anticipate in the initial design.
- Patch releases (0.1.0 → 0.1.1) will not intentionally break existing behavior.
- All breaking changes are documented in release notes with migration guidance.
- Where practical, we'll keep the previous behavior available under a deprecated name or configuration option before removing it.
This is why Pydantic AI Harness is a separate package from Pydantic AI, which has a stricter version policy. As the core capabilities stabilize, we'll move toward 1.0 with stability guarantees to match.
Pydantic AI references
- Capabilities -- what capabilities are, built-in capabilities, building your own
- Hooks -- lifecycle hooks reference, ordering, error handling
- Extensibility -- publishing packages, third-party ecosystem
- Toolsets -- building tools for capabilities
- API reference -- full API docs
License
MIT -- see LICENSE.

