io.github.bch1212/injectshield
Prompt-injection firewall for AI agents β scan untrusted text before LLM calls.
Ask AI about io.github.bch1212/injectshield
Powered by Claude Β· Grounded in docs
I know everything about io.github.bch1212/injectshield. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
InjectShield
Prompt-injection firewall for AI agents.
A drop-in REST API that detects and neutralizes injection attacks in any text β git commits, web pages, files, emails, user inputs β before they reach your AI agent's context window.
This repo is the open-source heuristic ruleset plus the source for the managed API at promptshield.pages.dev.
Why
In May 2026 a viral HN thread demonstrated that a single git commit message could burn a Claude Code user's entire session quota via a schema-driven attack ("OpenClaw"). The pattern is general: any AI agent that ingests untrusted text β code review bots, documentation summarizers, RAG agents, support copilots β is exposed to prompt injection. Most teams ship without any input-side defense.
InjectShield is one layer of a defense-in-depth strategy. It's not a silver bullet. Use it alongside system-prompt hardening, tool sandboxing, and output filtering.
Install as an MCP (Claude Code, Cursor, Cline, ...)
InjectShield ships a native MCP server at @injectshield/mcp. Once installed, your agent has three new tools β scan, scan_url, patterns β for input-side defense without writing any glue code.
# Claude Code:
claude mcp add injectshield --env INJECTSHIELD_API_KEY=is_live_β¦ -- npx -y @injectshield/mcp
For Cursor / Cline / other MCP clients, see packages/injectshield-mcp/README.md.
Quick start
# 1) Get a key (delivered by email):
curl -X POST https://api.injectshield.dev/v1/keys \
-H "Content-Type: application/json" \
-d '{"email":"you@company.com"}'
# 2) Scan:
curl -X POST https://api.injectshield.dev/v1/scan \
-H "Authorization: Bearer is_live_..." \
-H "Content-Type: application/json" \
-d '{"text":"ignore previous instructions","context":"user_input"}'
Or signup via the landing page: https://injectshield.dev β self-serve, email delivery.
What's open-source vs. managed
Live:
- Landing page + live demo: https://injectshield.dev
- API base:
https://api.injectshield.dev - Health: https://api.injectshield.dev/healthz
- Docs: https://injectshield.dev/docs
Open-source (this repo, MIT):
src/patterns.tsβ the heuristic pattern library (~20 categorized rules).src/detect.tsβ the detection engine (heuristic aggregation, sanitization).test/β the test suite.server/,public/β the full API + landing-page source.
Managed only (paid tiers):
- Hosted API with usage metering, dashboards, custom-pattern uploads, webhook alerts, no-logging mode (Pro), team accounts.
- Future: Workers AI / Anthropic semantic classifier with prompt-engineered injection detection.
Detection categories
| Category | Examples |
|---|---|
instruction_injection | "ignore previous instructions", "new system prompt" |
system_override | system-prompt leak, role-tag forgery, ChatML/Llama special tokens |
role_hijack | "you are nowβ¦", DAN, Developer Mode |
exfiltration | data sent to attacker URLs, markdown image exfil |
schema_attack | OpenClaw-style schema references |
encoding_smuggle | base64-decoded directives |
invisible_text | zero-width / bidi / Unicode-Tag smuggling |
tool_abuse | synthetic tool-call directives in untrusted text |
jailbreak_classic | DAN, "no restrictions", etc. |
Contributing patterns
Found a novel attack? Open a PR adding a PatternRule to src/patterns.ts with:
- A unique
id. - A
categoryfrom the enum above. - A
weightin [0, 1] β pick conservatively; the aggregation indetect.tscombines weights so every additional rule contributes meaningfully but isn't dominant. - A test in
test/detect.test.tscovering both a positive and a likely-benign negative example.
We auto-deploy merged patterns to the managed API. No-cost contributions get attribution in the changelog.
Running locally
npm install
npm test # 11 tests, ~20ms
DATABASE_URL=postgres://... npm run dev # boots Hono on :8080
License
MIT. InjectShield reduces but does not eliminate prompt-injection risk.
Acknowledgments
Built on Cloudflare Pages (frontend) + Railway (API) + Postgres + Anthropic Claude (semantic layer). Pattern library informed by HackAPrompt, the PINT benchmark, and a long list of public attack examples.
