Pentest Agents
Autonomous bug-bounty framework for Claude Code: 50 specialist agents, exploit-chain builder, writeup search, and live HackerOne/Bugcrowd integration.
Pentest Agent Suite for Claude Code
Autonomous bug-bounty framework for Claude Code and 6 other AI coding tools: 50 agents, 26 commands, 19 CLI tools, 11 skills, 2 MCP servers.
~760 files · ~118k lines · 50 agents · 26 commands · 19 CLI tools · 11 skills · 2 MCP servers (16 bug-bounty platforms + BYO writeup search) · 2,500 payload lines
A complete bug bounty framework. Battle-tested hunting methodology with concrete payloads, 7-Question Gate validation, autonomous hunt loops, A→B exploit chain building, persistent brain with endpoint tracking, optional semantic writeup search (bring your own index), automatic cost tracking via CC hooks, live platform integration, and a cross-IDE installer that emits the native format for Claude Code, Codex, Gemini, Cursor, Windsurf, VS Code Copilot, and OpenClaw.
Quick Start
# MCP servers are launched via `uv run --with mcp`; no global pip install required.
export HACKERONE_USERNAME=you HACKERONE_TOKEN=your_token
uv run python3 tools/scaffold.py hackerone tesla
cd ~/bounties/hackerone-tesla && claude
/model opus # Opus 4.7 [1M]; subagents inherit via model: "inherit"
/sync hackerone tesla
/brain init && /status
/hunt tesla.com
scaffold.py provisions the workspace for every supported project-scoped
client, not only Claude Code: CLAUDE.md, AGENTS.md, .codex/,
.agents/skills/, .gemini/, .cursor/, .windsurf/, .github/, and
.vscode/mcp.json are generated from the copied workspace assets so paths
resolve inside the bounty workspace.
Install (Claude Code + 6 other AI coding tools)
The framework ships pre-rendered for every supported tool. There are two ways to use it:
1. Use the bundles directly (no install step)
git clone https://github.com/H-mmer/pentest-agents-suite
cd pentest-agents-suite/pentest-agents/providers/codex
codex # or: cd ../gemini && gemini, etc.
The providers/<id>/ tree contains a fully-translated, ready-to-use bundle
for each non-Claude target. Path references inside use .. to reach the
repo's tools/, rules/, and mcp-*-server/ β so the bundle works as
long as it stays inside the cloned repo.
2. Run the installer (writes into your own project or ~/.codex/ etc.)
python3 -m tools.installer install --targets all --scope project
python3 -m tools.installer install --targets codex --scope global
Install mode rewrites paths to absolute references back into the cloned pentest-agents repo, so the install works no matter where the user's own project lives.
| Target | Agents | Slash commands | Rules | MCP | Scopes |
|---|---|---|---|---|---|
| Claude Code | native .claude/agents/*.md | .claude/skills/<name>/SKILL.md | CLAUDE.md | .mcp.json / ~/.claude.json | global + project |
| OpenAI Codex | native .codex/agents/*.toml | .agents/skills/<name>/SKILL.md | AGENTS.md (≤32 KiB) | [mcp_servers.*] in config.toml | global + project |
| Google Gemini | native .gemini/agents/*.md | TOML in .gemini/commands/ | GEMINI.md | mcpServers in settings.json | global + project |
| Cursor | → skills .cursor/skills/agent-*/SKILL.md (no native subagents) | → skills .cursor/skills/cmd-*/SKILL.md | .cursor/rules/*.mdc + AGENTS.md | .cursor/mcp.json | global + project |
| Windsurf | → skills | Workflows | .windsurf/rules/*.md (≤12 KiB / file) | ~/.codeium/windsurf/mcp_config.json | global + project |
| VS Code Copilot | .github/agents/*.agent.md (≤30 KiB / agent) | .github/prompts/*.prompt.md | .github/copilot-instructions.md + .github/instructions/* | .vscode/mcp.json | project + global-MCP |
| OpenClaw | → skills | → skills | ~/.openclaw/workspace/AGENTS.md or <proj>/AGENTS.md | mcp.servers in ~/.openclaw/openclaw.json | global + project (MCP is user-level) |
Cursor, Windsurf, and OpenClaw have no native subagent concept; Claude-format
agents render as skills/rules. Codex commands are emitted as AgentSkills under
.agents/skills/; the deprecated .codex/prompts/ path is not used.
providers/ directory (in the cloned repo):
providers/
├── codex/ AGENTS.md + .codex/{agents,config.toml} + .agents/skills
├── gemini/ GEMINI.md + .gemini/{agents,commands} + settings.json
├── cursor/ AGENTS.md + .cursor/{rules,skills,mcp.json}
├── windsurf/ AGENTS.md + .windsurf/{rules,workflows,skills} + mcp_config.json
├── copilot/ .github/{copilot-instructions.md,instructions,prompts,agents} + .vscode/mcp.json
└── openclaw/ AGENTS.md + .agents/skills/ + openclaw.json
providers/ is generated, not edited by hand. Re-render after editing
.claude/, rules/, or skills/ source:
python3 -m tools.installer render --targets all
python3 -m tools.installer render --check # exits 1 if drift
The test_committed_providers_match_render pytest case enforces drift
detection locally; there is no GitHub Actions CI, by project policy.
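The drift gate can be pictured as a comparison between a fresh render and the committed files. The sketch below is only an illustration of that idea: `find_drift` and `check_exit_code` are hypothetical names, and the real `tools.installer` may also flag stale files that this version ignores.

```python
from pathlib import Path


def find_drift(rendered: dict[str, str], root: Path) -> list[str]:
    """Return relative paths whose on-disk content differs from the fresh render.

    `rendered` maps a relative path to the text the renderer would emit.
    (Hypothetical interface; the real installer's data model is not documented here.)
    """
    drifted = []
    for rel_path, expected in sorted(rendered.items()):
        target = root / rel_path
        # A missing file counts as drift, just like a file with different content.
        if not target.is_file() or target.read_text(encoding="utf-8") != expected:
            drifted.append(rel_path)
    return drifted


def check_exit_code(rendered: dict[str, str], root: Path) -> int:
    """Mirror the documented contract: exit 1 if anything drifted, else 0."""
    return 1 if find_drift(rendered, root) else 0
```

A pytest case in the spirit of test_committed_providers_match_render would assert that `find_drift(...)` returns an empty list for the committed providers/ tree.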
What gets translated
When .claude/ content is rendered for non-Claude targets, the translator:
- Drops the `model:` field: each target uses its own default model.
- Strips Claude-specific prose: "Claude Code" → "the AI coding tool", "the Agent tool" → "the subagent dispatch tool", and `model: "inherit"` is removed entirely.
- Rewrites `$CLAUDE_PROJECT_DIR`: to `..` in `providers/` (relative to the cloned repo), or to absolute paths into the cloned source repo when installing into a user's project.
- Maps `effort:` frontmatter to `model_reasoning_effort` in Codex TOML.
- Caps body length: Copilot agents are truncated at 30,000 chars (Copilot's hard limit). Windsurf rules are chunked at 12,000 chars (workspace) / 6,000 chars (global).
- Adds Copilot subagent links: orchestrator agents (chain-builder, correlator, recon-ranker) get an `agents:` list of siblings so Copilot wires the dispatch graph.
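A few of the translation steps above are simple enough to sketch. The helper names below are hypothetical and the real translator likely works on parsed frontmatter rather than raw text; the limits and substitutions are the ones listed above.

```python
import re

COPILOT_AGENT_LIMIT = 30_000       # chars, Copilot agent cap from the list above
WINDSURF_WORKSPACE_LIMIT = 12_000  # chars per workspace rule chunk

PROSE_SUBSTITUTIONS = {
    "Claude Code": "the AI coding tool",
    "the Agent tool": "the subagent dispatch tool",
}


def drop_model_field(frontmatter: str) -> str:
    """Remove any `model:` line from YAML-style frontmatter text."""
    kept = [
        line for line in frontmatter.splitlines()
        if not re.match(r"\s*model\s*:", line)
    ]
    return "\n".join(kept)


def neutralize_prose(body: str) -> str:
    """Replace Claude-specific phrases with tool-neutral wording."""
    for old, new in PROSE_SUBSTITUTIONS.items():
        body = body.replace(old, new)
    return body


def truncate(body: str, limit: int = COPILOT_AGENT_LIMIT) -> str:
    """Hard-truncate a body to the given character limit."""
    return body[:limit]


def chunk(body: str, limit: int = WINDSURF_WORKSPACE_LIMIT) -> list[str]:
    """Split a body into consecutive pieces of at most `limit` characters."""
    return [body[i:i + limit] for i in range(0, len(body), limit)]
```

This version splits on raw character offsets; a production chunker would probably prefer paragraph or heading boundaries so a rule is never cut mid-sentence.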
Installer management
pentest-agents list # detect which targets are installed
pentest-agents install --targets claude_code,codex --scope global
pentest-agents install --dry-run # preview every file + JSON merge
pentest-agents verify # check manifest vs. disk (drift)
pentest-agents uninstall # reverse, restore .pa-backup files
pentest-agents render --targets all # regenerate providers/<id>/
pentest-agents render --check # drift gate (exit 1 if dirty)
Every install records a manifest (.pentest-agents/manifest.json for project
scope, ~/.config/pentest-agents/manifest.json for global). Uninstall only
removes files we wrote and surgically strips only the MCP/JSON keys we merged β
your other settings are never touched. Conflicting writes back up the original
as <path>.pa-backup and are restored on uninstall.
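The manifest-plus-backup behaviour described above can be sketched as follows. This is a simplified model, not the installer's code: `install_file`, `uninstall`, and the manifest layout are assumptions, and the surgical removal of merged MCP/JSON keys is left out.

```python
import shutil
from pathlib import Path

BACKUP_SUFFIX = ".pa-backup"  # suffix taken from the text above


def install_file(dest: Path, content: str, manifest: dict) -> None:
    """Write one file, backing up any pre-existing file first."""
    backup = None
    if dest.exists():
        backup = dest.with_name(dest.name + BACKUP_SUFFIX)
        shutil.copy2(dest, backup)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(content, encoding="utf-8")
    # Hypothetical manifest shape: one record per file we wrote.
    manifest.setdefault("files", []).append(
        {"path": str(dest), "backup": str(backup) if backup else None}
    )


def uninstall(manifest: dict) -> None:
    """Remove only the recorded files, restoring backups where they exist."""
    for entry in reversed(manifest.get("files", [])):
        dest = Path(entry["path"])
        dest.unlink(missing_ok=True)
        if entry["backup"]:
            shutil.move(entry["backup"], dest)
    manifest["files"] = []
```

Because uninstall walks only the manifest, files the user created alongside the install are never candidates for deletion.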
Workflow
New program: /new → /sync → /brain init → /analyze → /surface → /hunt
Returning: /resume <target> → /hunt or /autopilot
After finding: /validate → /chain → /report → /dupcheck → /submit → /learn
Batch triage: /triage (7-Question Gate on all findings)
MCP Servers (2)
bounty-platforms (16 platforms)
HackerOne (full API), Bugcrowd, Intigriti, Immunefi (public), YesWeHack + 11 stubs. 7 MCP tools: list_platforms, get_program_scope, get_program_policy, search_hacktivity, sync_program, draft_report, submit_report.
writeup-search (BYO index)
Searchable knowledge base agents query during hunting and validation. 4 MCP tools:
- `search_writeups`: semantic search (FAISS) or keyword search for prior art
- `get_writeup`: full writeup content by ID
- `search_techniques`: exploitation techniques by vuln class
- `search_payloads`: curated payloads from `rules/payloads.md`

The writeup index is not bundled. Bulk-redistributing scraped hacktivity violates most platform ToS, so this repo ships the server only. The `search_payloads` + `search_techniques` fallback works out of the box; the semantic/keyword layers activate once you point the server at your own index.
Three search modes (auto-detected, graceful fallback):
| Mode | Requires | Searches |
|---|---|---|
| FAISS (semantic) | faiss-cpu, sentence-transformers, your metadata.db + index.faiss | Your writeup corpus via vector embeddings |
| SQLite (keyword) | Your metadata.db only | Your writeup corpus via LIKE over the text column |
| Local (default) | Nothing (zero deps) | rules/payloads.md + skills/ shipped in this repo |
Point the server at your index by dropping metadata.db (+ optionally index.faiss) into ~/.local/share/pentest-writeups/, or set WRITEUP_DB_DIR=/path/to/dir.
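The auto-detection with graceful fallback could be modelled roughly like this. Treat it as a sketch under assumptions: `detect_mode`, `has_module`, and `resolve_index_dir` are hypothetical names, and the actual server may probe its dependencies differently.

```python
import importlib.util
import os
from pathlib import Path

# Default location named in the text above.
DEFAULT_DIR = Path.home() / ".local" / "share" / "pentest-writeups"


def resolve_index_dir() -> Path:
    """Prefer WRITEUP_DB_DIR when set, otherwise use the default directory."""
    return Path(os.environ.get("WRITEUP_DB_DIR") or DEFAULT_DIR)


def has_module(name: str) -> bool:
    """True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def detect_mode(index_dir: Path, semantic_deps: bool) -> str:
    """Pick the richest mode the environment supports: faiss > sqlite > local."""
    has_db = (index_dir / "metadata.db").is_file()
    has_faiss = (index_dir / "index.faiss").is_file()
    if has_db and has_faiss and semantic_deps:
        return "faiss"
    if has_db:
        return "sqlite"
    return "local"
```

A caller would pass `has_module("faiss") and has_module("sentence_transformers")` as `semantic_deps`, so a missing optional dependency degrades to keyword search instead of failing.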
Expected schema (metadata.db): a SQLite file with at least one table containing columns id, title, url, and one text column (content / text / body / writeup). Row order in the table must match vector order in index.faiss when using semantic mode.
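As a concrete illustration of that schema, the snippet below creates a minimal compatible database and runs the kind of LIKE query the keyword mode describes. The table name `writeups` and both function names are assumptions made for the example; the text above only fixes the column names.

```python
import sqlite3


def create_minimal_db(path: str) -> sqlite3.Connection:
    """Create a database with the columns the schema above asks for."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS writeups ("  # table name is hypothetical
        "id INTEGER PRIMARY KEY, title TEXT, url TEXT, content TEXT)"
    )
    return conn


def keyword_search(conn: sqlite3.Connection, term: str, limit: int = 5) -> list[tuple]:
    """Substring search over the text column, parameterized to avoid injection."""
    rows = conn.execute(
        "SELECT id, title, url FROM writeups "
        "WHERE content LIKE ? ORDER BY id LIMIT ?",
        (f"%{term}%", limit),
    )
    return rows.fetchall()
```

For semantic mode, inserting rows in the same order the vectors are added to index.faiss keeps row N aligned with vector N, which is the ordering constraint stated above.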
Build your own index: rag-builder/
The repo now ships a local RAG/FAISS builder under rag-builder/ that turns a list of GitHub / GitLab repositories into a metadata.db + index.faiss pair the writeup-search MCP server consumes. Destructive operations (clone, embed, write) are always gated behind --execute; running the CLI without it prints the plan and changes nothing, so you can never wipe an existing index by accident.
cd rag-builder
# 1. Inspect the plan: no network, no writes.
python3 build.py status
python3 build.py ingest # dry-run (the default)
# 2. Opt-in pre-flight: probe every URL with `git ls-remote` (network).
python3 build.py ingest --check-remotes # ~5s for 141 repos at 16 workers
# 3. Actually clone + index every repo from repos.yaml into ./data/.
python3 build.py ingest --execute
python3 build.py ingest --execute --check-remotes # skip unreachable first
# 4. Point the MCP server at the output.
export WRITEUP_DB_DIR="$PWD/data"
python3 ../mcp-writeup-server/server.py --test
rag-builder/repos.yaml ships with a 146-entry seed covering CTF archives, bug-bounty reports, payload collections, and research aggregators; edit freely. repos-skipped.yaml is loaded automatically as an exclusion list (override with --skip-list or --no-skip-list). config.yaml controls the embedding model (all-MiniLM-L6-v2 by default), host allowlist, clone size cap, and file-size ceiling. See rag-builder/README.md for the full reference.
CC Hooks (automatic cost tracking)
Configured in settings.json, fires automatically:
- SubagentStop: `cost_hook.py` logs agent name + session to `cost-tracking.json`
- Stop: logs session end
- SessionStart: welcome message
Statusline shows live cost from session token data: $0.57
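A hook of this kind receives a JSON payload on stdin and appends a record to a log file. The sketch below shows one plausible shape for that logging step; it is not the shipped cost_hook.py, and the `agent_name` key in particular is an assumed field name that may not match what the hook payload actually carries.

```python
import json
import time
from pathlib import Path


def record_event(event: dict, log_path: Path) -> dict:
    """Append one hook event to a JSON log file and return the stored entry."""
    entry = {
        "event": event.get("hook_event_name", "unknown"),
        "session": event.get("session_id"),
        "agent": event.get("agent_name"),  # hypothetical field name
        "ts": time.time(),
    }
    # Load the existing list if the log already exists, otherwise start fresh.
    if log_path.exists():
        entries = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append(entry)
    log_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entry
```

A hook script would call `record_event(json.load(sys.stdin), Path("cost-tracking.json"))` and exit 0 so the session is never blocked by a logging failure.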
Commands (26)
Hunting & Analysis
| Command | Description |
|---|---|
| `/hunt <target> [--vuln-class]` | Active hunting: searches writeup DB for techniques first, then tests with concrete payloads |
| `/autopilot <target>` | Autonomous loop with --paranoid/--normal/--yolo checkpoints |
| `/surface <target>` | P1/P2/Kill ranked attack surface |
| `/chain` | Build A→B→C exploit chains via chain-builder agent (9 capability rows + 4 documented deep chains in rules/chain-table.md) |
| `/analyze <target>` | AI analysis: crown jewels, attack paths, blind spots |
| `/mindmap <target>` | Attack surface tree with brain status |
| `/sast <repo>` | Source-code vulnerability hunting (entry → flow → gap → exploit pipeline) |
Validation & Reporting
| Command | Description |
|---|---|
| `/validate <finding>` | 7-Question Gate: PASS/KILL/DOWNGRADE/CHAIN REQUIRED |
| `/triage` | Batch-validate ALL findings, kill weak ones |
| `/quality <draft>` | Score report 1-10 (blocks below 7) |
| `/report [format]` | Reports (hard gate: requires /validate PASS) |
| `/dupcheck <desc>` | Hacktivity + writeup DB for duplicates |
| `/submit <finding>` | Submit (hard gate: /validate PASS + /quality ≥ 7) |
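The two hard gates in the table reduce to a pair of predicates. The sketch below encodes them directly; the `Finding` record and its field names are hypothetical and stand in for however the framework actually stores verdicts and scores.

```python
from dataclasses import dataclass
from typing import Optional

QUALITY_THRESHOLD = 7  # /quality blocks below 7, per the table above


@dataclass
class Finding:
    """Hypothetical finding record used only for this illustration."""
    title: str
    validate_verdict: Optional[str] = None  # PASS / KILL / DOWNGRADE / CHAIN REQUIRED
    quality_score: Optional[int] = None     # 1-10, or None if not yet scored


def can_report(finding: Finding) -> bool:
    """/report requires a /validate PASS."""
    return finding.validate_verdict == "PASS"


def can_submit(finding: Finding) -> bool:
    """/submit requires a /validate PASS plus a quality score of at least 7."""
    return (
        can_report(finding)
        and finding.quality_score is not None
        and finding.quality_score >= QUALITY_THRESHOLD
    )
```

An unscored finding is treated as failing the gate, which matches the "hard gate" wording: absence of evidence never unlocks a submission.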
Session & Memory
| Command | Description |
|---|---|
| `/resume <target>` | Resume: untested endpoints + suggestions |
| `/remember` | Log finding/pattern for cross-target learning |
| `/learn <id> <status>` | Record response; auto-boosts paid techniques |
| `/brain` | init, brief, status, endpoint, endpoints, record, exhausted |
Infrastructure
| Command | Description |
|---|---|
| `/new`, `/sync`, `/status` | Setup + dashboard |
| `/pipeline`, `/quickscan`, `/fullscan` | Scanning pipelines |
| `/correlate` | Chain discovery across findings |
| `/cost`, `/monitor` | Cost tracking, target change detection |
Agents (50)
H1 Weakness Specialists (19)
xss-hunter (#60/#61/#62), sqli-hunter (#67), csrf-hunter (#57), ssrf-hunter (#75), ssti-hunter (#74), idor-hunter (#55), auth-tester (#27), info-disclosure (#18), open-redirect (#38), rce-hunter (#70), xxe-hunter (#63), file-upload (#39), cors-hunter (#58), subdomain-takeover (#145), business-logic (#28), race-condition (#29), privilege-escalation (#26), oauth-hunter (#1/#22/#106/#137), llm-ai-hunter (chains under #18/#55/#61/#70/#106)
Hunting & Analysis (3)
- validator: 7-Question Gate + never-submit list (PASS/KILL/DOWNGRADE/CHAIN)
- chain-builder: A→B chain walk against the capability table, searches writeup DB for proven chains
- recon-ranker: P1/P2/Kill surface ranking
Infrastructure / Recon (10)
recon, vuln-scanner, config-auditor, cloud-recon, js-analyzer, waf-profiler, graphql-audit, nuclei-writer, browser-agent (Burp MCP), browser-stealth-agent (Camoufox)
Meta / Validation (9)
brain, correlator, quality-check, monitor, poc-builder, report-writer, scope-check, browser-verifier (client-side PoC proof), dast-devils-advocate (adversarial downgrade)
SAST Pipeline (8)
sast-file-ranker, sast-entry-mapper, sast-danger-mapper, sast-flow-tracer, sast-gap-analyzer, sast-devils-advocate, sast-hunter, sast-exploit-builder
Specialized (1)
web3-auditor: Solidity grep arsenal, Foundry PoC, DeFi patterns
Hunting Skills (5 deep methodology skills + 6 reference skills = 11)
The hunt-* skills are vuln-class-specific methodology files distilled from
public bug-bounty reports. Each has a verified 2024-2026 CVE catalog and
sub-techniques. The matching specialist agent reads its skill via
Read $CLAUDE_PROJECT_DIR/skills/hunt-<class>/SKILL.md before testing.
| Skill | Lines | Pairs With | Highlights |
|---|---|---|---|
| `skills/hunt-rce/SKILL.md` | 1,135 | rce-hunter | 1,218-report distillation. RSC CVE-2025-55182, runc Leaky Vessels, BentoML pickle, LangChain REPL, Tekton/OpenProject git arg injection, ingress-nginx, container/runtime, ML serving, agentic LLM tool-use, OSS supply chain |
| `skills/hunt-idor/SKILL.md` | 969 | idor-hunter | 1,117-report distillation. Sam Curry automotive chain, OneUptime CVE-2026-30956, Zitadel V2Beta/Mgmt API, Inforcer tenant enum, Apache Answer UUIDv1 prediction, Indico BOLA, GraphQL field-level pivots, agentic AI cross-tenant |
| `skills/hunt-xss/SKILL.md` | 968 | xss-hunter | DOMPurify mXSS family, Auth0 nextjs-auth0 returnTo, RSC DoS family, markdown-to-jsx, listmonk admin-ATO, Trix rich-text editor (H1 #2819573 / #2521419), Jupyter notebook XSS (GHSA-rch3-82jr-f9w9), n8n MCP OAuth XSS (GHSA-537j-gqpc-p7fq), LinkedIn-class iframe-in-article (H1 #2212950), 10 sub-techniques (A-J), Semgrep / ast-grep / ripgrep / CodeQL patterns |
| `skills/hunt-oauth/SKILL.md` | 770 | oauth-hunter | 365-report distillation. ruby-saml parser differentials, Authentik regex redirect_uri, workers-oauth-provider PKCE downgrade, Entra ID actor token, Hono JWT alg confusion, nOAuth, Tekton token exfil, Argo CD project token, tinyauth |
| `skills/hunt-llm-ai/SKILL.md` | 930 | llm-ai-hunter | OWASP LLM Top 10 v2025 + Agentic AI Top 10. Microsoft 365 Copilot ASCII Smuggling, LangChain GmailToolkit indirect injection (CVE-2025-46059), LangChain PythonREPLTool semantic RCE (CVE-2025-68613), BentoML pickle, Ollama RCE family, Open WebUI SSE injection, MLflow path traversal |
Reference skills (read by methodology-aware agents): hunting-methodology,
recon-methodology, report-writing, sast-methodology,
triage-validation, vuln-classes.
CLI Tools (19)
| Tool | Purpose |
|---|---|
| brain.py | Brain with endpoint tracking + circuit breaker |
| intel_engine.py | Hacktivity patterns + tech→vuln mapping |
| journal.py | JSONL session journal for /resume |
| target_selector.py | Program ROI ranking |
| cost_hook.py | CC hook: auto-logs agent completions via SubagentStop |
| statusline.py | Dashboard (--compact/--watch/--json) |
| scope_check.py | Scope validation with --list |
| scope_hook.py | PreToolUse hook: blocks out-of-scope Bash commands (exact + wildcard) |
| cvss_version_guard.py | Enforces H1 = CVSS 3.1, other platforms = CVSS 4.0 |
| file_path_guard.py | Blocks hallucinated file paths in reports |
| file_safety.py | Shared safety checks for agent-written files |
| dedup_findings.py | Dedup + hacktivity cross-reference |
| global_brain.py | Cross-engagement knowledge (incremental hash-based sync) |
| response_tracker.py | Response learning + auto-boost paid techniques |
| scaffold.py | Workspace scaffolding with update mode |
| capture.py | Screenshots + video (WSL2) |
| cost.py | Token cost tracking + ROI |
| camofox_ctl.sh | Camoufox (stealth Firefox) lifecycle (Cloudflare/Akamai bypass) |
| pentest-statusline.sh | CC statusline: findings, brain, context, cost |
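The "exact + wildcard" matching attributed to scope_hook.py can be illustrated with the standard library. This is a sketch of the matching idea only: `host_of` and `in_scope` are hypothetical helpers, the pattern list stands in for whatever scope.yaml contains, and the real hook also has to extract targets from Bash command lines.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def host_of(target: str) -> str:
    """Extract a lowercase hostname from a URL or a bare host string."""
    # urlparse only fills in the hostname when a '//' authority marker is present.
    parsed = urlparse(target if "//" in target else f"//{target}")
    return (parsed.hostname or "").lower()


def in_scope(target: str, patterns: list[str]) -> bool:
    """True if the target's host equals a pattern or matches it as a wildcard."""
    host = host_of(target)
    return any(
        host == pattern.lower() or fnmatch(host, pattern.lower())
        for pattern in patterns
    )
```

With patterns `["tesla.com", "*.tesla.com"]`, a look-alike such as `eviltesla.com` is rejected because the wildcard requires a literal `.tesla.com` suffix.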
Rules Library (rules/)
Single source of truth for every agent β all hunters, validators, and report-writers read the relevant files at session start.
| File | Lines | Purpose |
|---|---|---|
| `hunting.md` | 360 | 31 hunting rules (Rule 0 harm check, Rule 8 sibling check, Rule 9 A→B signal, Rule 19 never-submit, Rule 24 mutation matrix, Rule 28 detection-token rotation, Rule 30 no cross-region inference, Rule 31 unauth state-change battery) |
| `payloads.md` | 2,605 | XSS (incl. Detection Mechanism Rotation Ladder) / SSRF / SQLi / IDOR / OAuth / upload / race / SSTI / deser / JWT / LFI / prototype pollution / NoSQLi / DeFi |
| `techniques.md` | 389 | Proven attack techniques extracted from real paid engagements |
| `waf-bypass-protocol.md` | 166 | WAF bypass iteration ladder for Akamai/Cloudflare/Imperva |
| `vendor-status.md` | 127 | Patched vendor vectors, framework fingerprints, cooldown tables |
| `chain-table.md` | 192 | Capability→next-bug chain table for /chain (9 capability rows + 4 documented deep chains) |
| `never-submit.md` | 42 | Never-submit list + conditionally-valid-with-chain table |
| `mistakes.md` | 665 | Top 10 most common mistakes; every agent reads this at session start |
Key Features
- Writeup search MCP: Agents query prior art during hunting; bring your own FAISS/SQLite writeup index, or fall back to the shipped payload/technique library
- CC hooks: SubagentStop/Stop auto-log costs, statusline shows live `$X.XX` from token data
- PreToolUse scope hook: Bash commands are matched (exact + wildcard) against `scope.yaml`; out-of-scope targets are blocked before the tool call fires
- 7-Question Gate: Every finding validated; first NO = KILL
- Depth Engine: `/autopilot` enforces an anti-shallow protocol, with no claim of "exhausted" until the exhaustion matrix is complete
- Stacked-encoding mandate: `/hunt` and `/autopilot` require multi-layer encoding in every payload attempt before declaring a surface clean
- CVSS policy guard: HackerOne findings use CVSS 3.1; every other platform uses CVSS 4.0, enforced by `cvss_version_guard.py`
- Circuit breaker: 5× consecutive 403/429 → auto-backoff 60s
- Endpoint tracking: Brain records every endpoint tested per target
- Hard validation gates: /report and /submit refuse without /validate PASS
- Never-submit filter: Pipeline auto-kills informational findings
- Incremental sync: Global brain hash-based, skips unchanged files
- Feedback loop: /learn auto-boosts paid techniques globally
- Session journal: JSONL log for /resume continuity
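The circuit-breaker rule in the list above (five consecutive 403/429 responses trigger a 60-second backoff) fits in a small state machine. The class below is a sketch of that rule with hypothetical names; how brain.py actually tracks responses is not shown in this README.

```python
class CircuitBreaker:
    """Open for `backoff_seconds` after `threshold` consecutive 403/429 responses."""

    BLOCK_CODES = (403, 429)

    def __init__(self, threshold: int = 5, backoff_seconds: float = 60.0) -> None:
        self.threshold = threshold
        self.backoff_seconds = backoff_seconds
        self.consecutive = 0   # current run of blocking responses
        self.open_until = 0.0  # timestamp until which requests should pause

    def record(self, status_code: int, now: float) -> None:
        """Update the breaker with one response observed at time `now`."""
        if status_code in self.BLOCK_CODES:
            self.consecutive += 1
            if self.consecutive >= self.threshold:
                self.open_until = now + self.backoff_seconds
                self.consecutive = 0
        else:
            # Any other status breaks the streak.
            self.consecutive = 0

    def wait_time(self, now: float) -> float:
        """Seconds the caller should still wait before sending another request."""
        return max(0.0, self.open_until - now)
```

Passing `now` explicitly (for example `time.monotonic()`) keeps the class deterministic and easy to test without sleeping.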
Requirements
- Python 3.10+, `uv` (MCP servers launch via `uv run --with mcp`)
- Optional: `uv pip install faiss-cpu sentence-transformers` (for writeup semantic search)
- Security tools: nmap, httpx, subfinder, nuclei, ffuf, katana, sqlmap
- GraphQL hunter tools: `graphql-path-enum`, installed via `cargo install --git https://gitlab.com/dee-see/graphql-path-enum` (auto-installed by `setup-mcp.sh` if `cargo` is present)
- Evidence: grim/scrot, wf-recorder/ffmpeg
- jq (for statusline)
License
For authorized security testing only. Follow responsible disclosure.
