Duh
D.U.H. β D.U.H is a Universal Harness. Provider-agnostic AI coding agent.
Ask AI about Duh
Powered by Claude Β· Grounded in docs
I know everything about Duh. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
Demo
(GIF fallback shown above on platforms that do not render <video>. Source assets live under docs/site/assets/.)
$ duh -p "fix the bug in auth.py"
β Thinking...
I'll read the file first to understand the issue.
β Έ Running Read(file="auth.py")
Found the bug: token validation checks `expires_at < now` but should be `<=` β
tokens expiring at exactly the current second are incorrectly rejected.
β Ή Running Edit(file="auth.py", old_string="expires_at < now", new_string="expires_at <= now")
β Applied edit to auth.py (line 42).
β Ό Running Bash(command="python -m pytest tests/test_auth.py -q")
...........
11 passed in 0.43s
Done in 3 turns.
A universal, open-source AI coding agent. One harness, any provider β Anthropic Claude, OpenAI API, ChatGPT Plus/Pro subscription (Codex-family models), Google Gemini, Groq, local Ollama, or a deterministic stub for tests. Drop-in compatible with the Claude Agent SDK NDJSON protocol, so it can replace claude wherever that binary is expected.
What's new in v0.9.0
- duhwave persistent agentic-swarm extension (5 ADRs, 343 tests) β see the new section below
- Recursive Language Model context engine (replaces compaction on read-heavy turns)
- Real OpenAI benchmark on the agile-team pipeline ($0.0015/run on
gpt-4o-mini) - 9 runnable example demos under
examples/duhwave/
Benchmarks
Head-to-head, same model, same task, three LLM judges. Three benchmarks of increasing reading-intensity. Same-model deltas (D.U.H. minus first-party CLI):
| Model | B1 feature /35 | B2 rate limiter /50 | B3 docs /45 |
|---|---|---|---|
| Opus 4.7 | β0.3 | β0.3 | +31.4 * |
| GPT-5.4 | β0.6 | 0.0 | +6.6 |
| Gemini 3.1 | +2.0 | 0.0 | +0.3 |
*claude-code-opus B3 hit a stream-idle timeout at 43 min with only an index page; duh-opus at the same model produced the full docs set.
B1 + B2 (single-session feature, multi-file rate limiter) β harness parity at matched models. B3 (docs over D.U.H.'s own 30K-LOC source tree) β D.U.H. wins at matched models, corroborated by an automated consistency harness that resolves every cited symbol, signature, and import against source: D.U.H. docs are 100% faithful vs Codex's 52% at the same GPT-5.4.
D.U.H. additionally benched seven open-weights models that have no
first-party coding CLI at all (via OpenRouter + Groq native adapters).
Top open-weights B1 result: qwen3-max-thinking at 24.0/35 β within
judge-noise of gemini-cli-3.1 (25.0) and ahead of three other open
agents in the same matrix.
Full methodology, pre-registered hypotheses, rubrics, and raw
artefacts (diffs, session logs, judge JSONs, adversarial + consistency
outputs) in benchmarks/:
double-agent-tdd/β B1benchmark-2-ratelimit/β B2benchmark-3-docs/β B3
Reproduce B1 with cd benchmarks/double-agent-tdd && ./run_all.sh && ./judge_all.sh && python3 aggregate.py.
Analogous ./run.sh <agent> / ./adversarial_all.sh /
./consistency_all.sh / python3 aggregate.py flows for B2 and B3.
duhwave-agile (real OpenAI, May 2026): 5-stage agile-team pipeline
(PM β Architect β Engineer β Tester β Reviewer) running on gpt-4o-mini
in 35.5s for $0.0015. Produces an executable pytest project; 3/5 tests
pass on AI output (real coordination defects surface naturally).
Full result.
Install
# From source (recommended during alpha)
git clone https://github.com/nikhilvallishayee/duh
cd duh
python3.12 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# From PyPI
pip install duh-cli # core: Anthropic + OpenAI + Gemini + Groq + Ollama (native SDKs)
pip install 'duh-cli[rich]' # + Rich TUI rendering
pip install 'duh-cli[bridge]' # + WebSocket remote bridge
pip install 'duh-cli[attachments]' # + PDF attachment support (pdfplumber)
pip install 'duh-cli[security]' # + vulnerability monitoring (ruff, pip-audit, detect-secrets, cyclonedx)
pip install 'duh-cli[litellm]' # + LiteLLM fallback for long-tail providers (opt-in, see ADR-075)
pip install 'duh-cli[all]' # everything above
As of v0.8.0, duh-cli ships native adapters for Anthropic, OpenAI, Gemini, Groq, and Ollama β no LiteLLM in the default install path. LiteLLM is now an opt-in fallback (duh-cli[litellm]) for providers without a native adapter (Azure, Bedrock, Vertex, Together, Cohere, Mistral, β¦). See ADR-075 for the rationale (supply-chain hardening + native feature fidelity: Anthropic cache_control, Gemini thinking_budget, Groq rate-limit headers).
Quick start
duh -p "fix the bug in auth.py" # print mode, auto-detect provider
duh --provider anthropic --model claude-sonnet-4-6 -p "hello" # force Claude
duh --provider openai --model gpt-5.2-codex -p "refactor db" # ChatGPT Plus/Pro Codex (OAuth)
duh # interactive REPL
duh doctor # diagnostics + health checks
duh security scan # vulnerability scan (SARIF output)
duh security init --non-interactive # set up security policy
duh security doctor # scanner health check
Providers
All six first-class providers use native SDKs β no LiteLLM in the default install path (ADR-075).
| Provider | Auth | Example models |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY env var or /connect anthropic | claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5 |
| OpenAI API | OPENAI_API_KEY env var or /connect openai (API key) | gpt-4o, o1, o3 |
| OpenAI ChatGPT (Codex) | /connect openai β PKCE OAuth against auth.openai.com; tokens stored in ~/.config/duh/auth.json (0600). See ADR-051, ADR-052. | gpt-5.2-codex, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini |
| Gemini | GEMINI_API_KEY env var or /connect gemini (free at aistudio.google.com) | gemini-2.5-pro, gemini-2.5-flash, gemini-3.1-pro-preview |
| Groq | GROQ_API_KEY env var or /connect groq (free at console.groq.com) | llama-3.3-70b-versatile, openai/gpt-oss-120b, llama-3.1-8b-instant |
| Ollama | Local daemon on localhost:11434, no key | Any locally-pulled model |
| LiteLLM (opt-in, ADR-075) | pip install 'duh-cli[litellm]' + provider-specific env vars | azure/gpt-4o, bedrock/anthropic.claude-3-5-sonnet, β¦ |
| Stub | DUH_STUB_PROVIDER=1 | Deterministic canned response for tests / offline runs |
Provider and model auto-detect: give --model gpt-5.2-codex and D.U.H. will route to the ChatGPT/Codex Responses endpoint if OAuth exists; otherwise it falls back to the standard OpenAI Chat Completions adapter with your API key. --model gemini-2.5-pro routes to the native google-genai adapter; --model groq/llama-3.3-70b-versatile routes to the native groq SDK.
Slash commands (REPL)
/help /model /connect /models /cost /status /context /changes /git /tasks /brief /search /template /plan /pr /undo /jobs /health /errors /theme /mode /clear /compact /snapshot /exit
Plus Ctrl+K for the command palette (fuzzy search across every slash command, model, provider, and recent file) and Ctrl+T to cycle themes (dark / light / high-contrast).
Run /help in the REPL for the full description of each command. Highlights: /connect openai runs the ChatGPT OAuth flow; /connect gemini / /connect groq prompt for the API key and store it under ~/.config/duh/auth.json (0600); /snapshot takes a ghost filesystem snapshot you can apply or discard; /plan switches to design-first two-phase execution; /pr list|view|diff|checks integrates with gh; /errors shows the last N errors from the bounded in-session buffer; /theme cycles TUI themes.
Built-in tools
Read, Write, Edit, MultiEdit, Bash, Glob, Grep, Skill, ToolSearch, WebFetch, WebSearch, Task, Agent, Swarm, EnterWorktree, ExitWorktree, NotebookEdit, TestImpact, MemoryStore, MemoryRecall, HTTP, Docker, Database, GitHub, TodoWrite, AskUserQuestion, plus LSP (deferred, loaded via ToolSearch).
WebSearch is zero-config: if no API key is set, it falls back to DuckDuckGo (Instant Answer API β HTML scrape) so the tool always returns something usable. Priority chain: SERPER_API_KEY β TAVILY_API_KEY β BRAVE_SEARCH_API_KEY β DDG IA β DDG HTML. Tune the per-request timeout with DUH_WEBSEARCH_TIMEOUT (default 5 seconds).
Multi-agent
D.U.H. spawns child engines on demand. Two entry points:
Agentβ one subagent, synchronous result.Swarmβ 1β5 subagents in parallel viaasyncio.gather(partial failure tolerant).
Both accept agent_type (general / coder / researcher / planner / reviewer / subagent) and model. Since v0.8.0 the model field takes generic tiers resolved per-provider:
| Tier | Description | Resolves to (parent on Anthropic) | β¦on Gemini | β¦on Groq |
|---|---|---|---|---|
small | Cheapest / fastest | claude-haiku-4-5 | gemini-2.5-flash | llama-3.1-8b-instant |
medium | Balanced default | claude-sonnet-4-6 | gemini-2.5-pro | llama-3.3-70b-versatile |
large | Strongest reasoning | claude-opus-4-6 | gemini-3.1-pro-preview | openai/gpt-oss-120b |
inherit (default) | Use parent's model unchanged | β | β | β |
This means a Gemini-parent β "small" child never 404s asking for "haiku". Literal model names (claude-haiku-4-5, gemini-2.5-flash, β¦) are still accepted for backwards compatibility. See docs/wiki/Multi-Agent.md and ADR-012. Coordinator mode (duh --coordinator) turns the main agent into a pure orchestrator that delegates everything to subagents.
duhwave β persistent agentic-swarm extension
duhwave is a layered runtime on top of D.U.H. that turns single-shot agent invocations into a persistent, event-driven swarm:
- Persistent host daemon β
duh wave start+ 10-subcommand CLI (start/stop/ls/inspect/pause/resume/logs/install/uninstall/web) - Event ingress β webhook (HMAC-verified), filewatch, cron, MCP push, manual seam β all routing through one append-only TriggerLog
- Topology DSL β declarative
swarm.toml+ signed.duhwavebundles installed under~/.duh/waves/ - Recursive Language Model substrate β bytes addressed by reference, never summarised; bounded recursion with cycle detection (cites Zhang/Kraska/Khattab arXiv 2512.24601)
- Cross-agent variable handles β workers see selected handles from the coordinator's REPL; results bind back as new handles (cites Yang/Zou/Pan et al. arXiv 2604.25917)
- Three Task execution surfaces β in-process / subprocess / remote HTTP+bearer β one lifecycle, orphan recovery on restart
Five Accepted ADRs (028β032) define the design. 343 unit + integration tests cover the implementation.
β Cookbook: Build your own swarm β ADR Index β Benchmark: agile-team on real OpenAI
Runnable demos
| Demo | What it proves | Needs API key |
|---|---|---|
examples/duhwave/01_rlm_demo.py | RLM substrate single-agent | no |
examples/duhwave/02_swarm_demo.py | Cross-agent handle-passing | no |
examples/duhwave/03_event_driven.py | Webhook β trigger β match | no |
examples/duhwave/04_topology_bundle.py | Bundle install β daemon β manual seam | no |
examples/duhwave/repo_triage/ | Multi-agent showpiece (stub workers) | no |
examples/duhwave/parity_hermes/ | Hermes feature parity (5 patterns) | no |
examples/duhwave/parity_claw/ | Always-on multi-channel (4 channels) | no |
examples/duhwave/agile_team/ | 5-agent agile-team headless run | optional |
examples/duhwave/telegram_assistant/ | Mock Telegram bus + real OpenAI | yes |
examples/duhwave/real_e2e/ | Daemon-driven webhook β real OpenAI agent β outbox | yes |
Security
D.U.H. ships a three-layer pluggable security module (ADR-053 + ADR-054) that addresses every published agent RCE in the 2024-2026 CVE corpus:
Layer 1 β Vulnerability monitoring (duh security CLI):
- 13 scanners across 3 tiers: Minimal (ruff-sec, pip-audit, detect-secrets, cyclonedx-sbom + 5 D.U.H.-specific scanners), Extended (semgrep, osv-scanner, gitleaks, bandit), Paranoid (GitHub Actions CodeQL, Scorecard, Dependabot)
- D.U.H.-specific scanners detect project-file RCE (CVE-2025-59536), MCP tool poisoning (CVE-2025-54136), sandbox bypass (CVE-2025-59532), command injection (CVE-2026-35022), and OAuth hardening violations
- SARIF output for GitHub Code Scanning, delta mode (
--baseline), exception management with expiry duh security initwizard,duh security doctor, pre-push git hook installer
Layer 2 β Runtime hardening (ADR-054):
- Taint propagation:
UntrustedStrsubclass tags every string with its origin (user_input,model_output,tool_output,file_content,mcp_output,network) and propagates through all string operations - Confirmation tokens: HMAC-bound tokens prevent model-originated tainted strings from reaching dangerous tools (Bash, Write, Edit) without explicit user confirmation
- Lethal trifecta check: Sessions with simultaneous read-private + read-untrusted + network-egress capabilities require explicit acknowledgement (
--i-understand-the-lethal-trifecta) - MCP Unicode normalization: NFKC normalization + rejection of zero-width, bidi, tag, and variation selector characters in MCP tool descriptions (GlassWorm defense)
- Per-hook filesystem namespacing: Each hook gets a private temp directory; cross-hook file access is blocked
- PEP 578 audit hook bridge:
sys.addaudithooktelemetry onopen,subprocess.Popen,socket.connect,exec,import pickleβ sub-500ns fast path - Signed plugin manifests: TOFU trust store with sigstore-ready verification and revocation
- Provider differential fuzzer: Hypothesis property tests ensure all 5 adapters parse tool_use identically
Layer 3 β Sandboxing:
Shell commands and MCP stdio servers are wrapped by the host OS sandbox: macOS Seatbelt (sandbox-exec profiles), Linux Landlock (syscall-level filesystem access control), and a network policy layer that blocks outbound traffic unless explicitly allowed (see duh/adapters/sandbox/). Approval behaviour is controlled with --approval-mode suggest|auto-edit|full-auto (reads only / reads+writes / everything), with --dangerously-skip-permissions as the hard bypass.
MCP (Model Context Protocol)
MCP servers connect via four transports: stdio (via the mcp SDK), SSE, streamable HTTP, and WebSocket. Configure with --mcp-config <file-or-json>. See ADR-010 and ADR-040.
Hooks
29 lifecycle events (6 original + 22 extended + AUDIT) including PreToolUse, PostToolUse, PostToolUseFailure, SessionStart, SessionEnd, UserPromptSubmit, PermissionRequest, PreCompact, PostCompact, FileChanged, SubagentStart, Elicitation, and more. Hooks support glob matchers, blocking semantics (a hook can refuse a tool call or rewrite its input), and both shell-command and Python-callable handlers. See ADR-013, ADR-036, ADR-044, ADR-045.
Tests and coverage
6119 tests, 100% line coverage, ~60s on a laptop. Includes 330+ security-specific tests (unit, integration, property-based), CVE replay fixtures, performance regression gates, and a three-tier TUI E2E suite (Rich CaptureConsole snapshots β PTY+pyte byte-level β tmux full-terminal, ADR-074). CI runs on GitHub Actions with a --cov-fail-under=85 floor (current actual: 100%).
.venv/bin/python -m pytest tests/ # full suite
.venv/bin/python -m pytest tests/ --cov=duh # with coverage
DUH_STUB_PROVIDER=1 .venv/bin/python -m pytest tests/ # force stub provider
Selected ADRs
| # | Topic |
|---|---|
| 001 | Project vision: provider-agnostic, <5K LOC kernel |
| 005 | Safety architecture (schema + approval + patterns) |
| 009 | Provider adapter contract (.stream() async generator) |
| 021 | Claude Agent SDK NDJSON protocol compatibility |
| 037 | Seatbelt / Landlock platform sandboxing |
| 039 | Ghost snapshots for /snapshot |
| 040 | MCP stdio / SSE / HTTP / WebSocket transports |
| 045 | Hook blocking + input rewriting |
| 051 | OAuth provider authentication (ChatGPT Plus/Pro) |
| 052 | ChatGPT Codex adapter (Responses API) |
| 053 | Continuous vulnerability monitoring (pluggable scanner module) |
| 054 | LLM-specific security hardening (taint propagation, confirmation tokens, lethal trifecta) |
| 073 | TUI parity sprint (command palette, themes, streaming, virtualization) |
| 074 | TUI E2E testing (3-tier: snapshot, PTY+pyte, tmux) |
| 075 | Drop LiteLLM as default; native Gemini + Groq adapters |
Full list: docs/adrs/ (75 ADRs).
Development
git clone https://github.com/nikhilvallishayee/duh
cd duh
python3.12 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# Run tests
.venv/bin/python -m pytest tests/
# Run with coverage
.venv/bin/python -m pytest tests/ --cov=duh --cov-report=term
# Offline / deterministic mode (no real provider calls)
export DUH_STUB_PROVIDER=1
duh -p "hello" # β "stub-ok"
Source layout:
duh/
kernel/ # agentic loop, sessions, tasks, plan mode, undo, skills, memory
ports/ # abstract provider / tool / approver interfaces
adapters/ # anthropic, openai, openai_chatgpt, gemini, groq, ollama, stub,
# litellm_provider (opt-in fallback), mcp_executor, mcp_transports
sandbox/ # seatbelt, landlock, network policy
auth/ # credential store + OpenAI ChatGPT PKCE OAuth
providers/ # provider registry + model resolution + tier mapping (small/medium/large)
tools/ # 27 built-in tools (see list above)
security/ # vulnerability monitoring, policy resolver, scanner plugins, CI templates
scanners/ # 13 scanners (4 minimal, 5 D.U.H.-custom, 4 extended)
cli/ # parser, main, runner, repl, sdk_runner, ndjson
bridge/ # optional WebSocket remote bridge
ui/ # Rich TUI rendering + command palette + themes
plugins/ # plugin loader, signed manifests, TOFU trust store
hooks.py # 29-event hook system with per-hook FS namespacing
License
Apache 2.0 β see LICENSE.
