AI.Sentinel.Mcp
MCP (Model Context Protocol) proxy for AI.Sentinel — wraps another MCP server and scans tool calls in both directions.
Ask AI about AI.Sentinel.Mcp
Powered by Claude · Grounded in docs
I know everything about AI.Sentinel.Mcp. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
AI.Sentinel
Security monitoring middleware for IChatClient (Microsoft.Extensions.AI). Wraps any LLM client transparently, scans every prompt and response through 55 detectors, and blocks, alerts, or logs threats — with an embedded real-time dashboard.
Not building on
IChatClient? AI.Sentinel also ships as a drop-in hook for Claude Code (sentinel-hook), GitHub Copilot (sentinel-copilot-hook), and any MCP host — Cursor, Continue, Cline, Windsurf — via thesentinel-mcpstdio proxy. Same 55 detectors, same audit trail, zero code changes in the host.
Approval workflows for high-stakes tool calls (
delete_database,send_payment,rotate_secrets) — pluggable backends include in-memory, SQLite (persists across CLI invocations), and native Microsoft Entra PIM (approvers act in the portal they already use). Operators approve/deny from the dashboard or the PIM portal; the conversation resumes when approval lands. See the approvals docs.
Why you need it
When you connect an LLM to your application you inherit a new attack surface. Users can craft messages that override the model's instructions (prompt injection), the model can leak credentials or PII it saw in context (credential exposure), or return fabricated citations and wildly inconsistent numbers (hallucination). None of these are bugs in your code — they happen at the model boundary, which your existing middleware stack doesn't see.
AI.Sentinel sits at that boundary:
User prompt → [AI.Sentinel: scan] → LLM → [AI.Sentinel: scan] → Your app
It scans both directions on every call. If something looks wrong it can quarantine the message before it reaches the model, or quarantine the response before it reaches the user. If it only looks suspicious it alerts your logging/event system. Everything is stored in an in-process audit ring buffer and surfaced on a live dashboard.

The embedded dashboard ships in AI.Sentinel.AspNetCore. Mount it on any ASP.NET Core app with one line — no JavaScript framework, no extra service to run.
Packages
| Package | Description |
|---|---|
AI.Sentinel | Core — pipeline, 55 detectors, intervention engine, audit store |
AI.Sentinel.Detectors.Sdk | SDK for writing and testing custom detectors — SentinelContextBuilder, FakeEmbeddingGenerator, worked examples |
AI.Sentinel.AspNetCore | Embedded dashboard (no JS framework, HTMX + SSE) |
AI.Sentinel.Cli | dotnet tool install AI.Sentinel.Cli — offline replay CLI for forensics + CI |
AI.Sentinel.ClaudeCode / AI.Sentinel.ClaudeCode.Cli | Claude Code native hook adapter — wire into settings.json hooks to scan UserPromptSubmit, PreToolUse, PostToolUse |
AI.Sentinel.Copilot / AI.Sentinel.Copilot.Cli | GitHub Copilot native hook adapter — wire into hooks.json to scan userPromptSubmitted, preToolUse, postToolUse |
AI.Sentinel.Mcp / AI.Sentinel.Mcp.Cli | dotnet tool install AI.Sentinel.Mcp.Cli — stdio MCP proxy that scans tools/call + prompts/get for any MCP-speaking host (Cursor, Continue, Cline, Windsurf, Copilot) |
dotnet add package AI.Sentinel
dotnet add package AI.Sentinel.AspNetCore # optional, for the dashboard
Quick start
// Program.cs
builder.Services.AddAISentinel(opts =>
{
opts.OnCritical = SentinelAction.Quarantine; // throw SentinelException
opts.OnHigh = SentinelAction.Alert; // publish mediator notification
opts.OnMedium = SentinelAction.Log;
opts.OnLow = SentinelAction.Log;
});
builder.Services.AddChatClient(pipeline =>
pipeline.UseAISentinel()
.Use(new OpenAIChatClient(...)));
// optional dashboard
app.MapAISentinel("/ai-sentinel");
Catch quarantined messages:
try
{
var response = await chatClient.GetResponseAsync(messages);
}
catch (SentinelException ex)
{
// ex.PipelineResult has the full detection details
logger.LogWarning("Blocked: {Severity}", ex.PipelineResult?.MaxSeverity);
}
Named pipelines
Register multiple isolated pipelines under string names; pick one per chat client at construction time. Useful for multi-LLM-endpoint apps and dev/staging/prod tier configurations.
// Default + two named variants
services.AddAISentinel(opts => opts.EmbeddingGenerator = realGen);
services.AddAISentinel("strict", opts =>
{
opts.OnCritical = SentinelAction.Quarantine;
opts.Configure<JailbreakDetector>(c => c.SeverityFloor = Severity.High);
});
services.AddAISentinel("lenient", opts =>
{
opts.OnCritical = SentinelAction.Log;
opts.Configure<RepetitionLoopDetector>(c => c.Enabled = false);
});
// Pick one per chat client
services.AddChatClient("openai-strict", b =>
b.UseAISentinel("strict").Use(new OpenAIChatClient(...)));
services.AddChatClient("openai-lenient", b =>
b.UseAISentinel("lenient").Use(new OpenAIChatClient(...)));
Each named pipeline gets its own SentinelOptions, IDetectionPipeline, and InterventionEngine.
The audit store, forwarders, and alert sink are shared — operational dashboards see all pipelines
through one feed. User-added detectors via opts.AddDetector<T>() register globally; per-pipeline
detector tuning rides on opts.Configure<T>(c => ...).
Each named pipeline starts from a fresh SentinelOptions() — no inheritance from the default.
For shared base config, extract a helper:
Action<SentinelOptions> baseCfg = opts => opts.EmbeddingGenerator = realGen;
services.AddAISentinel(baseCfg);
services.AddAISentinel("strict", opts => { baseCfg(opts); opts.OnCritical = SentinelAction.Quarantine; });
Phase A limitations (planned for Phase B when a real user need surfaces):
- Always register the default unnamed
AddAISentinel(...)first. The shared audit store, forwarders, alert sink, and tool-call guard are wired by the default call. Skipping it and registering only named pipelines causes the named chat client to throw a missing-shared-infrastructure error the first time it's resolved (during request handling). - Tool-call authorization is global, not per-name.
opts.RequireToolPolicy(...)calls on named pipelines are silently ignored — only the default pipeline's bindings are consulted byIToolCallGuard. Configure tool policies on the default for now. - No request-time selector. The pipeline is fixed at chat-client construction time; multi-tenant routing where the tenant ID arrives with the request requires Phase B.
How it works
Every call to GetResponseAsync or GetStreamingResponseAsync runs two pipeline passes:
- Prompt scan — before the request reaches the LLM
- Response scan — after the LLM responds, before the result is returned
Each pass runs all enabled detectors in parallel (Task.WhenAll), aggregates a Threat Risk Score (0–100), and calls the Intervention Engine which takes the configured action for the highest severity found.
IChatClient.GetResponseAsync(messages)
│
├─ [1] DetectionPipeline.RunAsync(prompt context)
│ ├─ PromptInjectionDetector
│ ├─ JailbreakDetector
│ ├─ ... (28 more, parallel)
│ └─ ThreatRiskScore + detections
│
├─ InterventionEngine.Apply(result) → Quarantine / Alert / Log / PassThrough
├─ AuditStore.AppendAsync(entry)
│
├─ inner IChatClient.GetResponseAsync(messages)
│
├─ [2] DetectionPipeline.RunAsync(response context)
├─ InterventionEngine.Apply(result)
└─ AuditStore.AppendAsync(entry)
Detectors (55)
Detectors run in three modes:
- Rule-based — fast regex or heuristic, always active, sub-microsecond per call
- Semantic — uses embedding cosine similarity via
EmbeddingGenerator. Language-agnostic. Active only withopts.EmbeddingGeneratorconfigured. - LLM escalation — fires a second-pass LLM classifier (stub detectors, active only with
opts.EscalationClient)
Security (31)
| ID | Detector | Type | Detects |
|---|---|---|---|
SEC‑01 | PromptInjection | Rule-based | Override/injection phrase patterns (ignore all previous instructions, you are now a different AI, etc.) |
SEC‑02 | CredentialExposure | Rule-based | API keys, tokens, private keys, secrets in output |
SEC‑03 | ToolPoisoning | Rule-based | Suspicious tool-call manipulation patterns |
SEC‑04 | DataExfiltration | Rule-based | Base64 blobs, high-entropy encoded data |
SEC‑05 | Jailbreak | Rule-based | Jailbreak attempt phrases (DAN, roleplay exploits) |
SEC‑06 | PrivilegeEscalation | Rule-based | Role/permission escalation requests |
SEC‑07 | CovertChannel | Semantic | Encoding-based hidden payloads |
SEC‑08 | EntropyCovertChannel | LLM escalation | Statistical entropy anomalies in output |
SEC‑09 | IndirectInjection | Semantic | Injection via retrieved documents or tool results |
SEC‑10 | AgentImpersonation | Semantic | Model claiming to be a different agent or system |
SEC‑11 | MemoryCorruption | Semantic | Attempts to corrupt agent memory/context |
SEC‑12 | UnauthorizedAccess | Semantic | Attempts to access restricted resources |
SEC‑13 | ShadowServer | Semantic | Redirection to unauthorised endpoints |
SEC‑14 | InformationFlow | Semantic | Cross-context data leakage |
SEC‑15 | PhantomCitationSecurity | Semantic | Security-context hallucinated authority sources |
SEC‑16 | GovernanceGap | Semantic | Policy/compliance bypass attempts |
SEC‑17 | SupplyChainPoisoning | Semantic | Compromised dependency suggestions |
SEC‑18 | ToolDescriptionDivergence | Stub | Tool description changed at runtime vs. original declaration (requires tool-descriptor snapshot) |
SEC‑20 | SystemPromptLeakage | Rule-based | Verbatim fragments of the system prompt echoed in conversation history |
SEC‑23 | PiiLeakage | Rule-based | PII: SSN, credit card, IBAN, BSN, UK NINO, passport, DE tax ID, email+name, phone, DOB |
SEC‑24 | AdversarialUnicode | Rule-based | Zero-width spaces, homoglyphs, invisible characters used to smuggle hidden instructions |
SEC‑25 | CodeInjection | Rule-based | SQL injection, shell metacharacters, path traversal in LLM-generated code |
SEC‑26 | PromptTemplateLeakage | Rule-based | {{variable}}, <SYSTEM>, [INST] and other prompt scaffolding markers |
SEC‑27 | LanguageSwitchAttack | Rule-based | Abrupt script/language switch mid-response — injection vector via non-Latin text |
SEC‑28 | RefusalBypass | Rule-based | Model complied with a request it should have refused (caller-supplied forbidden patterns) |
SEC‑19 | ToolCallFrequency | Rule-based | Counts ChatRole.Tool messages; flags sessions with excessive tool invocations |
SEC‑21 | ExcessiveAgency | Semantic | Detects autonomous-action language ("I deleted", "I deployed", "I executed") |
SEC‑22 | HumanTrustManipulation | Semantic | Spots rapport/authority manipulation ("you can trust me", "I am your advisor") |
SEC‑29 | OutputSchema | Rule-based | Response doesn't deserialize as the caller-supplied ExpectedResponseType (OWASP LLM05) |
SEC‑30 | ShorthandEmergence | Semantic | Counts unknown all-caps tokens that may signal emergent covert language |
SEC‑31 | VectorRetrievalPoisoning | Semantic | Detects malicious instructions embedded in RAG-retrieved document chunks (OWASP LLM08) |
Hallucination (9)
| ID | Detector | Type | Detects |
|---|---|---|---|
HAL‑01 | PhantomCitation | Rule-based | Fake DOIs, arXiv IDs, .invalid/.nonexistent domains |
HAL‑02 | SelfConsistency | Rule-based | Numeric inconsistency (values differing by >10×) |
HAL‑03 | CrossAgentContradiction | Semantic | Contradictions between agents in a multi-agent session |
HAL‑04 | SourceGrounding | Semantic | Claims unsupported by provided context |
HAL‑05 | ConfidenceDecay | Semantic | Confidence degradation across turns |
HAL‑06 | StaleKnowledge | Semantic | Time-sensitive facts stated as current ("the latest version is X", "the current CEO is Y") |
HAL‑07 | IntraSessionContradiction | Semantic | Model contradicts itself within the same conversation |
HAL‑08 | GroundlessStatistic | Rule-based | Specific percentages or statistics asserted without any source in the provided context |
HAL‑09 | UncertaintyPropagation | Semantic | Flags hedged statements that contradict a definitive assertion in the same response |
Operational (15)
| ID | Detector | Type | Detects |
|---|---|---|---|
OPS‑01 | BlankResponse | Rule-based | Empty or whitespace-only responses |
OPS‑02 | RepetitionLoop | Rule-based | Same sentence repeated 3+ times |
OPS‑03 | IncompleteCodeBlock | Rule-based | Unclosed code fences |
OPS‑04 | PlaceholderText | Rule-based | TODO, [INSERT HERE], Lorem ipsum leftovers |
OPS‑05 | ContextCollapse | Semantic | Loss of conversational context across turns |
OPS‑06 | AgentProbing | Semantic | Attempts to map agent capabilities or system prompt |
OPS‑07 | QueryIntent | Semantic | Malicious intent hidden in benign-looking queries |
OPS‑08 | ResponseCoherence | Semantic | Response that doesn't address the question asked |
OPS‑09 | TruncatedOutput | Rule-based | Detects mid-sentence truncation and unclosed code fences |
OPS‑10 | WaitingForContext | Semantic | Finds stall phrases when the user prompt was substantive |
OPS‑11 | UnboundedConsumption | Rule-based | Compares response length to prompt length; flags unbounded expansion |
OPS‑12 | SemanticRepetition | Semantic | Same idea restated with different wording — extends RepetitionLoop beyond literal string matching |
OPS‑13 | PersonaDrift | Semantic | Model's tone, persona, or stated identity shifts significantly across turns — context poisoning signal |
OPS‑14 | Sycophancy | Semantic | Model reverses a stated position purely because the user pushed back — epistemic cowardice |
OPS‑15 | WrongLanguage | Rule-based | Response language doesn't match the user's language (script/charset detection) |
Semantic detectors are no-ops until
opts.EmbeddingGeneratoris configured. They use embedding cosine similarity and are language-agnostic — no LLM round-trip required.
LLM escalation detectors are no-ops until
opts.EscalationClientis configured. Set it to a cheap fast model (e.g. GPT-4o-mini) to activate them.
Streaming:
GetStreamingResponseAsyncbuffers the complete response before yielding tokens so the response scan can quarantine before any token reaches the application. Time-to-first-token equals full model response latency on this path.
OWASP LLM Top 10 (2025) Coverage
| OWASP | Threat | Detectors |
|---|---|---|
| LLM01 | Prompt Injection | PromptInjectionDetector, IndirectInjectionDetector, ToolPoisoningDetector |
| LLM02 | Sensitive Info Disclosure | CredentialExposureDetector, PiiLeakageDetector, SystemPromptLeakageDetector, PromptTemplateLeakageDetector |
| LLM03 | Supply Chain | SupplyChainPoisoningDetector |
| LLM04 | Data & Model Poisoning | DataExfiltrationDetector, InformationFlowDetector |
| LLM05 | Improper Output Handling | CodeInjectionDetector, OutputSchemaDetector |
| LLM06 | Excessive Agency | ExcessiveAgencyDetector, ToolCallFrequencyDetector |
| LLM07 | System Prompt Leakage | SystemPromptLeakageDetector, GovernanceGapDetector |
| LLM08 | Vector & Embedding Weaknesses | VectorRetrievalPoisoningDetector |
| LLM09 | Misinformation | PhantomCitationDetector, GroundlessStatisticDetector, StaleKnowledgeDetector, UncertaintyPropagationDetector |
| LLM10 | Unbounded Consumption | UnboundedConsumptionDetector, RepetitionLoopDetector |
Tool-Call Authorization
AI.Sentinel ships with IToolCallGuard — a preventive control evaluated before every tool
call across all four surfaces. Decision model is binary Allow | Deny. Same policy
abstraction (IAuthorizationPolicy) as planned ZeroAlloc.Mediator.Authorization.
[AuthorizationPolicy("admin-only")]
public sealed class AdminOnlyPolicy : IAuthorizationPolicy
{
public bool IsAuthorized(ISecurityContext ctx) => ctx.Roles.Contains("admin");
}
services.AddSingleton<IAuthorizationPolicy, AdminOnlyPolicy>();
services.AddAISentinel(opts =>
{
opts.RequireToolPolicy("Bash", "admin-only");
opts.RequireToolPolicy("delete_*", "admin-only");
opts.DefaultToolPolicy = ToolPolicyDefault.Allow; // default
});
builder.Services.AddChatClient(pipeline =>
pipeline.UseAISentinel()
.UseToolCallAuthorization()
.UseFunctionInvocation()
.Use(new OpenAIChatClient(...)));
| Surface | Caller resolution default | Deny semantics |
|---|---|---|
| In-process | IServiceProvider.GetService<ISecurityContext>() → Anonymous | throw ToolCallAuthorizationException |
| Claude Code | HookConfig.CallerContextProvider → Anonymous | HookOutput(Block, reason) |
| Copilot | CopilotHookConfig.CallerContextProvider → Anonymous | HookOutput(Block, reason) |
| MCP proxy | DI provider → SENTINEL_MCP_CALLER_ID/_ROLES env → Anonymous | McpProtocolException(InvalidRequest, reason) |
Default behaviour: if no policies are registered, every call is allowed (drop-in upgrade).
Prompt hardening (OWASP LLM01 — preventive)
SentinelOptions.SystemPrefix prepends a hardening system message to every outbound chat call,
telling the model to treat retrieved/external content as data, not instructions. Detection still
runs on the user's raw prompt; the model receives the hardened version. The caller's ChatMessage
collection is never mutated — a hardened copy is forwarded to the inner client.
services.AddAISentinel(opts =>
{
// First-line OWASP LLM01 mitigation. English default; override for other languages.
opts.SystemPrefix = SentinelOptions.DefaultSystemPrefix;
});
Default behaviour: SystemPrefix == null (no hardening) — opt-in, drop-in upgrade for existing
AI.Sentinel users. If the caller's messages already start with a system message, the prefix is
merged into it as "{SystemPrefix}\n\n{original system text}" — single system message preserved.
Audit storage + forwarding
AI.Sentinel ships with two related capabilities for audit data:
- Storage (
IAuditStore) — singular, queryable, source of truth. Default is in-memory ring buffer; opt into SQLite for persistence across restarts. - Forwarding (
IAuditForwarder) — plural, fire-and-forget, mirrors every audit entry to one or more external systems (NDJSON file, Azure Sentinel, OpenTelemetry).
Default behaviour (no extra registration): in-memory ring buffer + zero forwarders. Existing AI.Sentinel users see no behaviour change.
Persistent storage with SQLite
services.AddAISentinel(opts => { ... });
services.AddSentinelSqliteStore(opts =>
{
opts.DatabasePath = "/var/lib/ai-sentinel/audit.db";
opts.RetentionPeriod = TimeSpan.FromDays(90); // optional time-based cleanup
});
Single-file SQLite DB. WAL mode enabled (concurrent reads while writer active). Hash chain survives restarts. Last-registration-wins for IAuditStore.
Forwarding to external systems
Forwarders are fire-and-forget — never block the proxy, never throw. Failures swallow + log to stderr + increment audit.forward.dropped counter.
// NDJSON file (in core, zero dependencies — direct file append, no buffering)
services.AddSentinelNdjsonFileForwarder(opts =>
opts.FilePath = "/var/log/ai-sentinel/audit.ndjson");
Operators ship the NDJSON file via Filebeat / Vector / Fluent Bit — universal coverage.
// Azure Sentinel (auto-wrapped with BufferingAuditForwarder<T>)
services.AddSentinelAzureSentinelForwarder(opts =>
{
opts.DcrEndpoint = new Uri("https://my-dce.westus2.ingest.monitor.azure.com");
opts.DcrImmutableId = "dcr-abc123";
opts.StreamName = "Custom-AISentinelAudit_CL";
// opts.Credential default = new DefaultAzureCredential()
});
Direct Logs Ingestion API. Static-token auth supported via DCR; OAuth2 / mTLS not in v1 (see backlog). Requires DCR + custom table set up in your Log Analytics workspace.
// OpenTelemetry (vendor-neutral; OTel SDK handles batching)
services.AddSentinelOpenTelemetryForwarder();
services.AddOpenTelemetry().WithLogging(b => b.AddOtlpExporter());
Routes to any OTLP-speaking backend: Splunk, Datadog, Elastic, NewRelic, more. Uses your existing OTel logging pipeline.
Buffering decorator
AzureSentinelAuditForwarder is automatically wrapped — per-entry HTTP roundtrips would crater throughput. Default buffering: batch=100, interval=5s, channel capacity=10000. Drops on overflow with rate-limited stderr log + audit.forward.dropped counter for monitoring. Override via .WithBuffering(...) in the future (currently a v1.1 backlog item).
NdjsonFileAuditForwarder and OpenTelemetryAuditForwarder are NOT auto-buffered — direct file append is already fast, and the OTel SDK does its own BatchLogRecordExportProcessor batching.
New packages
| Package | Purpose | Dependencies |
|---|---|---|
AI.Sentinel.Sqlite | Persistent SqliteAuditStore | Microsoft.Data.Sqlite |
AI.Sentinel.AzureSentinel | AzureSentinelAuditForwarder | Azure.Monitor.Ingestion, Azure.Identity |
AI.Sentinel.OpenTelemetry | OpenTelemetryAuditForwarder | OpenTelemetry, Microsoft.Extensions.Logging.Abstractions |
Configuration
builder.Services.AddAISentinel(opts =>
{
// Action per severity level
opts.OnCritical = SentinelAction.Quarantine; // throws SentinelException
opts.OnHigh = SentinelAction.Alert; // publishes mediator notification + alert sink
opts.OnMedium = SentinelAction.Log; // logs via ILogger
opts.OnLow = SentinelAction.Log;
// opts.OnLow = SentinelAction.PassThrough; // silent
// Optional: embedding provider for 38 semantic detectors (language-agnostic detection)
opts.EmbeddingGenerator = new OpenAIEmbeddingGenerator(...);
// Optional: custom embedding cache (default: in-memory LRU, 1 024 entries)
// options.EmbeddingCache = new MyRedisEmbeddingCache(...);
// Optional: LLM second-pass classifier for 2 stub detectors (ToolDescriptionDivergenceDetector)
opts.EscalationClient = new OpenAIChatClient("gpt-4o-mini", ...);
// Audit ring buffer size (in-process, no external store required)
opts.AuditCapacity = 10_000; // default
// Agent identity labels for audit entries
opts.DefaultSenderId = new AgentId("web-user");
opts.DefaultReceiverId = new AgentId("assistant");
// Optional: POST alert payloads to a webhook on Quarantine/Alert actions
opts.AlertWebhook = new Uri("https://hooks.example.com/sentinel");
// Optional: suppress repeat alerts for the same detector+session.
// null (default) suppresses for the entire session lifetime.
// Set a TimeSpan to re-alert after the window expires.
opts.AlertDeduplicationWindow = TimeSpan.FromMinutes(5);
// Optional: per-session token-bucket circuit breaker.
// MaxCallsPerSecond = steady-state refill rate; BurstSize = initial token count.
// Pass "sentinel.session_id" in ChatOptions.AdditionalProperties for per-user buckets.
// Without a session key, all calls share a global bucket.
opts.MaxCallsPerSecond = 5; // allow 5 calls/sec per session (steady state)
opts.BurstSize = 20; // up-front burst before throttling kicks in
// Optional: inactivity window after which per-session dedup + rate-limiter
// state is evicted. Applies to both AlertDeduplicationWindow's _seen
// dictionary and the rate-limiter bucket map. Default: 1 hour.
// Lower this for high-cardinality session keys (many unique users),
// raise it for long-lived sessions.
opts.SessionIdleTimeout = TimeSpan.FromHours(1);
// Optional: validate structured LLM responses against a caller-supplied type (SEC-29).
// The type must be annotated with [ZeroAllocSerializable(SerializationFormat.SystemTextJson)].
// Requires calling services.AddSerializerDispatcher() (from ZeroAlloc.Serialisation).
opts.ExpectedResponseType = typeof(MyResponse);
});
Actions
SentinelAction | Behaviour |
|---|---|
Quarantine | Throws SentinelException with full PipelineResult. Stops the call. Also fires the alert sink. |
Alert | Publishes ThreatDetectedNotification + InterventionAppliedNotification via IMediator. Fires the alert sink. Call continues. |
Log | Writes to ILogger<InterventionEngine>. Call continues. |
PassThrough | No action. Detections are still audited. |
Alert sink behaviour: when opts.AlertWebhook is set, WebhookAlertSink POSTs a JSON payload (type, severity, detector, reason, action, session) to the configured URL on Quarantine or Alert actions. The DeduplicatingAlertSink wraps the webhook sink and suppresses repeat alerts for the same detector+session, controlled by opts.AlertDeduplicationWindow.
Embedding cache
Scan-time embeddings are cached by default in a 1 024-entry in-memory LRU store
(InMemoryLruEmbeddingCache). The cache is keyed by input text and avoids
redundant API calls for repeated messages in the same process.
To use a persistent or shared cache, implement IEmbeddingCache and set it on
SentinelOptions:
options.EmbeddingCache = new MyRedisEmbeddingCache(redis, ttl: TimeSpan.FromHours(1));
Tuning individual detectors
Use opts.Configure<T>(c => ...) to disable a detector or clamp its severity output:
services.AddAISentinel(opts =>
{
// Disable a detector entirely — zero CPU cost, no audit entries
opts.Configure<WrongLanguageDetector>(c => c.Enabled = false);
// Elevate any firing of JailbreakDetector to at least High
opts.Configure<JailbreakDetector>(c => c.SeverityFloor = Severity.High);
// Cap a noisy detector's output to Low
opts.Configure<RepetitionLoopDetector>(c => c.SeverityCap = Severity.Low);
});
Floor and Cap apply only to firing results — Clean results pass through unchanged
(no fabricated findings). Multiple Configure<T> calls for the same detector merge by
mutation, so base configuration and per-environment overrides compose naturally.
Dashboard
Mount the built-in dashboard with one line:
app.UseAISentinel("/ai-sentinel");
// Protect it with your own middleware:
app.UseAISentinel("/ai-sentinel", branch =>
branch.Use(RequireInternalNetwork));
The dashboard shows:
- Threat Risk Score — live ring gauge (0–100, SAFE / WATCH / ALERT / ISOLATE)
- Live event feed — every detection with severity badge, detector ID, reason, and session-id drill-down link
- Severity trend chart — 15-minute rolling sparkline, stroke colour modulates by max severity
- Category chips — filter by
Security/Hallucination/Operational/Authorization(DetectorId prefix) - Live free-text search — case-insensitive substring match against the Reason column, 300 ms debounced
- Per-session timeline drill-down — click any session id in the feed → URL gains
?session=<id>→ feed and export filter to that session, with a clearable pill above the table - NDJSON export — 📥 Export button downloads the current filtered view as NDJSON (
/api/export.ndjsonhonors all active filters;X-Sentinel-Truncated: trueheader signals if the store cap was hit) - Auto dark mode — pure-CSS
prefers-color-schemeswitch; severity / authz accents preserved across themes - Detector hit stats — which detectors fire most
All filter parameters intersect: chips ∧ session ∧ search. The URL captures full operator state — every view is shareable and bookmarkable.
Looks like
| Desktop (light) | Filtered drill-down | Desktop (dark) |
|---|---|---|
![]() | ![]() | ![]() |
| Tablet | Mobile |
|---|---|
![]() | ![]() |
No npm, no JS build step — served entirely from embedded resources using HTMX + SSE + raw CSS.
Events / Mediator integration
If your DI container has an IMediator (ZeroAlloc.Mediator, MediatR-compatible), AI.Sentinel publishes two notification types on Alert-level events:
// Fired when a threat is detected
readonly record struct ThreatDetectedNotification(
SessionId SessionId,
AgentId SenderId,
AgentId ReceiverId,
PipelineResult PipelineResult,
DateTimeOffset DetectedAt);
// Fired when an intervention is applied
readonly record struct InterventionAppliedNotification(
SessionId SessionId,
SentinelAction Action,
Severity Severity,
string Reason,
DateTimeOffset AppliedAt);
Register a handler to forward these to Slack, PagerDuty, your SIEM, or anywhere else.
CLI: sentinel (offline replay)
AI.Sentinel.Cli is a dotnet tool that replays saved conversations through the full detector pipeline — useful for incident forensics, CI regression testing, and detector tuning.
dotnet tool install -g AI.Sentinel.Cli
sentinel scan conversation.json
Accepts OpenAI Chat Completion JSON ({"messages": [...]}) or AI.Sentinel audit NDJSON. Auto-detects by default.
sentinel scan conversation.json
[--format <openai|audit|auto>] # default: auto
[--output <text|json>] # default: text
[--expect <detectorId>] # repeatable, e.g. --expect SEC-01
[--min-severity <Low|Medium|High|Critical>]
[--baseline <prior-result.json>] # diff against a prior run
Exit codes: 0 scan completed (no failing assertions), 1 assertion failed or baseline regression, 2 I/O or parse error.
The CLI's core types — SentinelReplayClient, ConversationLoader, ReplayRunner, ReplayResult — are all public, so callers can reference AI.Sentinel.Cli programmatically from their own xUnit tests to assert detection behavior on saved conversations.
IDE / Agent integration
AI.Sentinel ships native hook adapters for the two major AI coding agents that support out-of-process hook scripts. Both adapters read hook payloads from stdin, run the detector pipeline, and signal block/warn/allow via exit code + stdout JSON.
Claude Code
dotnet tool install -g AI.Sentinel.ClaudeCode.Cli
Add to ~/.claude/settings.json (or your project's .claude/settings.json):
{
"hooks": {
"UserPromptSubmit": [
{ "hooks": [{ "type": "command", "command": "sentinel-hook user-prompt-submit" }] }
],
"PreToolUse": [
{ "matcher": "*", "hooks": [{ "type": "command", "command": "sentinel-hook pre-tool-use" }] }
],
"PostToolUse": [
{ "matcher": "*", "hooks": [{ "type": "command", "command": "sentinel-hook post-tool-use" }] }
]
}
}
GitHub Copilot
dotnet tool install -g AI.Sentinel.Copilot.Cli
Add to your repo's hooks.json (per Copilot hook documentation):
{
"version": 1,
"hooks": {
"userPromptSubmitted": [
{ "type": "command", "bash": "sentinel-copilot-hook user-prompt-submitted", "timeoutSec": 10 }
],
"preToolUse": [
{ "type": "command", "bash": "sentinel-copilot-hook pre-tool-use", "timeoutSec": 10 }
],
"postToolUse": [
{ "type": "command", "bash": "sentinel-copilot-hook post-tool-use", "timeoutSec": 10 }
]
}
}
MCP proxy
For any MCP-speaking agent (Cursor, Continue, Cline, Windsurf, Copilot's MCP path), install the proxy and point your MCP host at it instead of at the target server directly:
dotnet tool install -g AI.Sentinel.Mcp.Cli
Example entry in your MCP host config (mcpServers block or equivalent):
{
"mcpServers": {
"filesystem-guarded": {
"command": "sentinel-mcp",
"args": ["proxy", "--target", "uvx", "mcp-server-filesystem", "/home/me"],
"env": {
"SENTINEL_HOOK_ON_CRITICAL": "Block",
"SENTINEL_MCP_DETECTORS": "security"
}
}
}
}
The proxy spawns the target command as a subprocess, intercepts tools/call and prompts/get, scans through the Sentinel detector pipeline, and blocks threats with a JSON-RPC error to the host.
Additional env vars (beyond the shared SENTINEL_HOOK_ON_* table above):
| Variable | Default | Values |
|---|---|---|
SENTINEL_MCP_DETECTORS | security | security (9 regex security detectors) or all (every detector — higher false-positive rate on structured data) |
SENTINEL_MCP_MAX_SCAN_BYTES | 262144 | Truncation cap on tool-result text passed to the detector pipeline. Counts UTF-8 bytes (see v1.1 note below). Full content still forwarded to the host. |
MCP proxy v1.1
In addition to tools/call and prompts/get, the proxy now intercepts resources/read,
mirrors target server capabilities (only advertises tools / prompts / resources to the
host when the upstream target advertises them), and supports HTTP transports alongside
stdio.
New environment variables:
| Variable | Default | Purpose |
|---|---|---|
SENTINEL_MCP_SCAN_MIMES | text/,application/json,application/xml,application/yaml | MIME allowlist for resources/read scanning. Comma-separated. A trailing / matches any subtype (e.g. text/ matches text/plain and text/html). Resources outside the allowlist forward verbatim without scanning. |
SENTINEL_MCP_HTTP_HEADERS | (none) | key=value;key=value headers applied to every HTTP-transport request. Use for static-token auth (e.g. Authorization=Bearer xyz). Malformed pairs are skipped silently. |
SENTINEL_MCP_TIMEOUT_SEC | 5 | Subprocess shutdown grace in seconds. After this window the proxy logs transport_dispose action=grace_expired and returns; the MCP host's own kill policy is the second line of defence. |
SENTINEL_MCP_LOG_JSON | (off) | Set to 1 for NDJSON stderr output. Default is key=value lines. Useful when piping proxy logs into a log aggregator. |
New CLI flags:
sentinel-mcp proxy [--on-critical Block|Warn|Allow]
[--on-high Block|Warn|Allow]
[--on-medium Block|Warn|Allow]
[--on-low Block|Warn|Allow]
--target /path/to/server arg1 ... # stdio mode (existing)
sentinel-mcp proxy [...flags...] --target https://example.com/mcp # HTTP mode (new)
Precedence: CLI flag > SENTINEL_MCP_ON_* env var > existing shared SENTINEL_HOOK_ON_*
env var > default. The shared SENTINEL_HOOK_ON_* env vars continue to work unchanged.
When --target starts with http:// or https:// the proxy uses
HttpClientTransport (Streamable HTTP with automatic SSE fallback) instead of spawning
a subprocess. Combine with SENTINEL_MCP_HTTP_HEADERS for token auth.
Auth scope: SENTINEL_MCP_HTTP_HEADERS covers static-token auth only
(bearer tokens, API keys, tenant headers). OAuth2 flows and mTLS client certificates
are not supported in v1.1 — see the deferred items in docs/BACKLOG.md if you need them.
Behaviour change in v1.1: SENTINEL_MCP_MAX_SCAN_BYTES now counts UTF-8 bytes
(was a char count, which double-counted for multi-byte characters). ASCII content
is unchanged; emoji / CJK / accented text reaches the cap sooner.
Severity → action mapping
Both adapters share the same env-var contract — configure once, applies to both:
| Variable | Default | Values |
|---|---|---|
SENTINEL_HOOK_ON_CRITICAL | Block | Block / Warn / Allow |
SENTINEL_HOOK_ON_HIGH | Block | Block / Warn / Allow |
SENTINEL_HOOK_ON_MEDIUM | Warn | Block / Warn / Allow |
SENTINEL_HOOK_ON_LOW | Allow | Block / Warn / Allow |
SENTINEL_HOOK_VERBOSE | false | 1 / true / yes → emit a one-line diagnostic to stderr on every invocation |
Block → hook exits 2, which both Claude Code and Copilot surface as "call blocked" with the detector ID + reason on stderr. Warn → exit 0 with the reason on stderr (visible in the agent's log). Allow → silent pass.
Diagnostics: did the hook fire?
Set SENTINEL_HOOK_VERBOSE=1 to emit a grep-friendly one-liner to stderr on every invocation, including Allow outcomes:
[sentinel-hook] event=user-prompt-submit decision=Allow session=sess-42
[sentinel-hook] event=user-prompt-submit decision=Block detector=SEC-01 severity=Critical session=sess-42
Useful when wiring the hook for the first time ("did it run?") or when a block was expected but didn't happen. Leave off in steady state — the normal Block/Warn reason is already on stderr.
Native binary (optional, faster cold start)
The hook CLIs are Native-AOT ready. Both .csproj files gate PackAsTool behind PublishAot, so a single source tree produces either the dotnet tool NuGet package or a ~6.5 MB single-file native binary, depending on how you publish it.
dotnet publish src/AI.Sentinel.ClaudeCode.Cli -c Release -r win-x64 -p:PublishAot=true
Replace win-x64 with linux-x64, osx-arm64, etc. Output lands under bin/Release/net8.0/<rid>/publish/sentinel-hook[.exe]. Expect ~10× faster cold start than the dotnet-tool entry point — worth it if the hook fires on every tool call. Point the agent's hook command at the binary's full path instead of sentinel-hook.
Prereqs on Windows: the Visual Studio "Desktop development with C++" workload (for
link.exe+ Windows SDK). Linux needsclang/libcdev packages; macOS needs Xcode CLT.
Programmatic use
The underlying libraries (AI.Sentinel.ClaudeCode and AI.Sentinel.Copilot) expose HookAdapter / CopilotHookAdapter and the vendor-agnostic HookPipelineRunner as public types. Reference the library packages (not the .Cli tool packages) to write your own host integration in C#.
OpenTelemetry
AI.Sentinel emits metrics and traces out of the box via the ai.sentinel meter and activity source. Wire them into your existing OTel pipeline with the standard .NET instrumentation API:
builder.Services.AddOpenTelemetry()
.WithMetrics(m => m.AddMeter("ai.sentinel"))
.WithTracing(t => t.AddSource("ai.sentinel"));
Metrics
| Metric | Type | Description |
|---|---|---|
sentinel.scans | Counter | Total pipeline scans executed |
sentinel.scan.ms | Histogram | Pipeline scan duration in milliseconds |
sentinel.threats | Counter | Threats detected (tagged by severity and detector) |
sentinel.alerts.suppressed | Counter | Alerts suppressed by the deduplication window (tagged by detector) |
sentinel.rate_limit.exceeded | Counter | Calls rejected by the per-session rate limiter (tagged by session) |
Traces
Each GetResponseAsync / GetStreamingResponseAsync call produces a sentinel.scan span (one per direction — prompt and response) with the following attributes:
| Attribute | Description |
|---|---|
sentinel.severity | Max severity found in this scan |
sentinel.is_clean | true when no threats were detected |
sentinel.threat_count | Number of distinct detector hits |
sentinel.top_detector | ID of the highest-severity detector that fired |
All metrics and spans are zero-cost when no listener is registered — there is no overhead when OTel is not configured.
Audit store
All detections (regardless of severity) are written to a ring buffer audit store in process memory. Capacity defaults to 10,000 entries; oldest entries are overwritten when full.
Query the store directly:
var store = app.Services.GetRequiredService<IAuditStore>();
await foreach (var entry in store.QueryAsync(new AuditQuery
{
MinSeverity = Severity.Medium,
From = DateTimeOffset.UtcNow.AddHours(-1),
PageSize = 100
}, CancellationToken.None))
{
Console.WriteLine($"{entry.Timestamp:HH:mm:ss} [{entry.Severity}] {entry.DetectorId}: {entry.Reason}");
}
Benchmarks
All measurements: .NET 9.0.15, Windows 11, Release, Job.Default, MemoryDiagnoser + ThreadingDiagnoser.
Individual detectors
| Scenario | Mean | Allocated |
|---|---|---|
PromptInjection — clean input | ~59 ns | 0 B |
PromptInjection — malicious input | ~231 ns | 480 B |
RepetitionLoopDetector — clean input | ~106 ns | 296 B |
Detection pipeline (DetectionPipeline.RunAsync, no-op inner client)
| Detector set | Input | Mean | Allocated |
|---|---|---|---|
| Empty (baseline) | clean | ~16 ns | 32 B |
| Security-only (13 detectors) | clean | ~958 ns | 472 B |
| Security-only (13 detectors) | malicious | ~2,388 ns | 2,616 B |
| All detectors (25 rule-based) | clean | ~1,855 ns | 1,568 B |
| All detectors (25 rule-based) | malicious | ~3,462 ns | 4,008 B |
End-to-end (SentinelChatClient.GetResponseAsync, no-op inner client, single short message)
| Detector set | Input | Mean | Allocated |
|---|---|---|---|
| Empty | clean | ~994 ns | 1.24 KB |
| Security-only | clean | ~2,636 ns | 2.26 KB |
| All detectors | clean | ~6,268 ns | 4.53 KB |
| All detectors | malicious | ~8,653 ns | 7.25 KB |
Audit store
| Scenario | Mean | Allocated |
|---|---|---|
| Sequential append | ~118 ns | 0 B |
| 8 concurrent appends | ~1,468 ns | 400 B |
E2E numbers exclude real LLM latency (measured against a no-op inner client). Add your model's round-trip time on top.
Run the full suite yourself:
dotnet run --project benchmarks/AI.Sentinel.Benchmarks -c Release -- --filter "*"
License
MIT © Marcel Roozekrans




