
io.github.bch1212/injectshield

Prompt-injection firewall for AI agents — scan untrusted text before LLM calls.

InjectShield

Prompt-injection firewall for AI agents.

A drop-in REST API that detects and neutralizes injection attacks in any text — git commits, web pages, files, emails, user inputs — before they reach your AI agent's context window.

This repo is the open-source heuristic ruleset plus the source for the managed API at promptshield.pages.dev.

Why

In May 2026 a viral HN thread demonstrated that a single git commit message could burn a Claude Code user's entire session quota via a schema-driven attack ("OpenClaw"). The pattern is general: any AI agent that ingests untrusted text — code review bots, documentation summarizers, RAG agents, support copilots — is exposed to prompt injection. Most teams ship without any input-side defense.

InjectShield is one layer of a defense-in-depth strategy. It's not a silver bullet. Use it alongside system-prompt hardening, tool sandboxing, and output filtering.

Install as an MCP (Claude Code, Cursor, Cline, ...)

InjectShield ships a native MCP server at @injectshield/mcp. Once installed, your agent has three new tools — scan, scan_url, patterns — for input-side defense without writing any glue code.

# Claude Code:
claude mcp add injectshield --env INJECTSHIELD_API_KEY=is_live_… -- npx -y @injectshield/mcp

For Cursor / Cline / other MCP clients, see packages/injectshield-mcp/README.md.

Quick start

# 1) Get a key (delivered by email):
curl -X POST https://api.injectshield.dev/v1/keys \
  -H "Content-Type: application/json" \
  -d '{"email":"you@company.com"}'

# 2) Scan:
curl -X POST https://api.injectshield.dev/v1/scan \
  -H "Authorization: Bearer is_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text":"ignore previous instructions","context":"user_input"}'

Or sign up via the landing page at https://injectshield.dev (self-serve; the key is delivered by email).
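The scan call in step 2 can also be issued from application code. The following is a minimal TypeScript sketch that mirrors the curl example; `buildScanRequest`, `scanText`, and the `FetchLike` type are hypothetical names introduced here, not part of any InjectShield SDK. The response schema is not described in this README, so the sketch returns the parsed JSON untyped instead of guessing at its fields.

```typescript
// Sketch only: mirrors the curl example in step 2 of the quick start.
// All names below are illustrative; the response shape is left as `unknown`
// because this README does not document it.

interface ScanRequestBody {
  text: string;
  context: string; // e.g. "user_input", as in the curl example
}

interface ScanInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal subset of the fetch API that scanText needs. The global `fetch`
// in Node 18+ and browsers should satisfy this shape.
type FetchLike = (
  url: string,
  init: ScanInit
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const API_BASE = "https://api.injectshield.dev";

// Builds the URL and request options without sending anything.
function buildScanRequest(
  apiKey: string,
  body: ScanRequestBody
): { url: string; init: ScanInit } {
  return {
    url: `${API_BASE}/v1/scan`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request through the supplied fetch implementation and returns
// the parsed JSON body as-is. Throws on any non-2xx status.
async function scanText(
  fetchImpl: FetchLike,
  apiKey: string,
  text: string,
  context: string
): Promise<unknown> {
  const { url, init } = buildScanRequest(apiKey, { text, context });
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`scan failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Passing the fetch implementation as a parameter keeps the helper easy to exercise in tests without network access; in an application you would typically pass the global `fetch`.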

What's open-source vs. managed

Live:

Landing page + live demo: https://injectshield.dev
API base: https://api.injectshield.dev
Health: https://api.injectshield.dev/healthz
Docs: https://injectshield.dev/docs

Open-source (this repo, MIT):

src/patterns.ts — the heuristic pattern library (~20 categorized rules).
src/detect.ts — the detection engine (heuristic aggregation, sanitization).
test/ — the test suite.
server/, public/ — the full API + landing-page source.

Managed only (paid tiers):

Hosted API with usage metering, dashboards, custom-pattern uploads, webhook alerts, no-logging mode (Pro), team accounts.
Future: Workers AI / Anthropic semantic classifier with prompt-engineered injection detection.

Detection categories

Category	Examples
`instruction_injection`	"ignore previous instructions", "new system prompt"
`system_override`	system-prompt leak, role-tag forgery, ChatML/Llama special tokens
`role_hijack`	"you are now…", DAN, Developer Mode
`exfiltration`	data sent to attacker URLs, markdown image exfil
`schema_attack`	OpenClaw-style schema references
`encoding_smuggle`	base64-decoded directives
`invisible_text`	zero-width / bidi / Unicode-Tag smuggling
`tool_abuse`	synthetic tool-call directives in untrusted text
`jailbreak_classic`	DAN, "no restrictions", etc.
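
To make one of these categories concrete, here is a rough TypeScript sketch of how an `invisible_text` check might work. It is an illustration under assumptions, not the rule shipped in src/patterns.ts: the character ranges, the `findInvisibleText` name, and the result shape are all chosen here for the example.

```typescript
// Illustrative only: not the rule shipped in src/patterns.ts.
// Matches zero-width characters and directional marks (U+200B..U+200F),
// bidi embeddings/overrides (U+202A..U+202E), word joiner and invisible
// operators (U+2060..U+2064), bidi isolates (U+2066..U+2069), the BOM
// (U+FEFF), and the Unicode Tags block (U+E0000..U+E007F).
const INVISIBLE_CHARS =
  /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF\u{E0000}-\u{E007F}]/gu;

interface InvisibleTextFinding {
  count: number;     // number of invisible code points found
  sanitized: string; // input with those code points removed
}

// Hypothetical helper: counts and strips invisible code points.
function findInvisibleText(input: string): InvisibleTextFinding {
  const matches = input.match(INVISIBLE_CHARS) || [];
  return {
    count: matches.length,
    sanitized: input.replace(INVISIBLE_CHARS, ""),
  };
}
```

A check this broad will also flag legitimate uses, such as the zero-width joiner in emoji sequences or directional marks in right-to-left text, so a real rule would likely need an allowlist or a lower weight.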

Contributing patterns

Found a novel attack? Open a PR adding a PatternRule to src/patterns.ts with:

A unique id.
A category from the enum above.
A weight in [0, 1]. Pick conservatively: the aggregation in detect.ts combines weights so that each additional matching rule raises the score without any single rule dominating.
A test in test/detect.test.ts covering both a positive and a likely-benign negative example.

Merged patterns are auto-deployed to the managed API. Contributors are credited in the changelog.
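
As a rough guide to what a contribution might look like, the sketch below shows a rule with the three fields listed above and one plausible way weights could combine. Everything here is an assumption for illustration: the real `PatternRule` type in src/patterns.ts may have different fields (the `pattern` field is a guess), and the noisy-OR formula is just one aggregation with the stated property, not necessarily what detect.ts does.

```typescript
// Hypothetical shapes: the real PatternRule type lives in src/patterns.ts
// and may differ. Category names are taken from the table in this README.
type Category =
  | "instruction_injection" | "system_override" | "role_hijack"
  | "exfiltration" | "schema_attack" | "encoding_smuggle"
  | "invisible_text" | "tool_abuse" | "jailbreak_classic";

interface PatternRule {
  id: string;         // unique rule id
  category: Category;
  weight: number;     // in [0, 1]
  pattern: RegExp;    // assumed field; not named in the README
}

// Example rule; the id, weight, and regex are made up for this sketch.
const exampleRule: PatternRule = {
  id: "example-ignore-previous",
  category: "instruction_injection",
  weight: 0.6,
  pattern: /ignore\s+(all\s+)?previous\s+instructions/i,
};

// Noisy-OR aggregation (an assumption, not confirmed by the source):
// score = 1 - product of (1 - w) over matched weights. Each extra match
// raises the score, and the result stays within [0, 1].
function aggregate(weights: number[]): number {
  return 1 - weights.reduce((acc, w) => acc * (1 - w), 1);
}

// Scores a text against a rule list using the aggregation above.
function score(text: string, rules: PatternRule[]): number {
  const hits = rules.filter((r) => r.pattern.test(text)).map((r) => r.weight);
  return aggregate(hits);
}
```

Under this formula, two matched rules with weights 0.6 and 0.5 combine to 0.8 rather than summing past 1, which is one reason to keep individual weights conservative.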

Running locally

npm install
npm test         # 11 tests, ~20ms
DATABASE_URL=postgres://... npm run dev   # boots Hono on :8080

License

MIT. InjectShield reduces but does not eliminate prompt-injection risk.

Acknowledgments

Built on Cloudflare Pages (frontend) + Railway (API) + Postgres + Anthropic Claude (semantic layer). Pattern library informed by HackAPrompt, the PINT benchmark, and a long list of public attack examples.