📦

AI.Sentinel.Mcp

MCP (Model Context Protocol) proxy for AI.Sentinel — wraps another MCP server and scans tool calls in both directions.

0 installs

Trust: 34 — Low

Browser

Ask AI about AI.Sentinel.Mcp

I know everything about AI.Sentinel.Mcp. Ask me about installation, configuration, usage, or troubleshooting.

0/500

Loading tools...

Reviews

Documentation

AI.Sentinel

Security monitoring middleware for IChatClient (Microsoft.Extensions.AI). Wraps any LLM client transparently, scans every prompt and response through 55 detectors, and blocks, alerts, or logs threats — with an embedded real-time dashboard.

Not building on IChatClient? AI.Sentinel also ships as a drop-in hook for Claude Code (sentinel-hook), GitHub Copilot (sentinel-copilot-hook), and any MCP host — Cursor, Continue, Cline, Windsurf — via the sentinel-mcp stdio proxy. Same 55 detectors, same audit trail, zero code changes in the host.

Approval workflows for high-stakes tool calls (delete_database, send_payment, rotate_secrets) — pluggable backends include in-memory, SQLite (persists across CLI invocations), and native Microsoft Entra PIM (approvers act in the portal they already use). Operators approve/deny from the dashboard or the PIM portal; the conversation resumes when approval lands. See the approvals docs.

Why you need it

When you connect an LLM to your application you inherit a new attack surface. Users can craft messages that override the model's instructions (prompt injection), the model can leak credentials or PII it saw in context (credential exposure), or return fabricated citations and wildly inconsistent numbers (hallucination). None of these are bugs in your code — they happen at the model boundary, which your existing middleware stack doesn't see.

AI.Sentinel sits at that boundary:

User prompt → [AI.Sentinel: scan] → LLM → [AI.Sentinel: scan] → Your app

It scans both directions on every call. If something looks wrong it can quarantine the message before it reaches the model, or quarantine the response before it reaches the user. If it only looks suspicious it alerts your logging/event system. Everything is stored in an in-process audit ring buffer and surfaced on a live dashboard.

AI.Sentinel dashboard — TRS gauge, severity counters, live event feed

The embedded dashboard ships in AI.Sentinel.AspNetCore. Mount it on any ASP.NET Core app with one line — no JavaScript framework, no extra service to run.

Packages

Package	Description
`AI.Sentinel`	Core — pipeline, 55 detectors, intervention engine, audit store
`AI.Sentinel.Detectors.Sdk`	SDK for writing and testing custom detectors — `SentinelContextBuilder`, `FakeEmbeddingGenerator`, worked examples
`AI.Sentinel.AspNetCore`	Embedded dashboard (no JS framework, HTMX + SSE)
`AI.Sentinel.Cli`	`dotnet tool install AI.Sentinel.Cli` — offline replay CLI for forensics + CI
`AI.Sentinel.ClaudeCode` / `AI.Sentinel.ClaudeCode.Cli`	Claude Code native hook adapter — wire into `settings.json` hooks to scan UserPromptSubmit, PreToolUse, PostToolUse
`AI.Sentinel.Copilot` / `AI.Sentinel.Copilot.Cli`	GitHub Copilot native hook adapter — wire into `hooks.json` to scan userPromptSubmitted, preToolUse, postToolUse
`AI.Sentinel.Mcp` / `AI.Sentinel.Mcp.Cli`	`dotnet tool install AI.Sentinel.Mcp.Cli` — stdio MCP proxy that scans `tools/call` + `prompts/get` for any MCP-speaking host (Cursor, Continue, Cline, Windsurf, Copilot)

dotnet add package AI.Sentinel
dotnet add package AI.Sentinel.AspNetCore   # optional, for the dashboard

Quick start

// Program.cs
builder.Services.AddAISentinel(opts =>
{
    opts.OnCritical = SentinelAction.Quarantine; // throw SentinelException
    opts.OnHigh     = SentinelAction.Alert;      // publish mediator notification
    opts.OnMedium   = SentinelAction.Log;
    opts.OnLow      = SentinelAction.Log;
});

builder.Services.AddChatClient(pipeline =>
    pipeline.UseAISentinel()
            .Use(new OpenAIChatClient(...)));

// optional dashboard
app.MapAISentinel("/ai-sentinel");

Catch quarantined messages:

try
{
    var response = await chatClient.GetResponseAsync(messages);
}
catch (SentinelException ex)
{
    // ex.PipelineResult has the full detection details
    logger.LogWarning("Blocked: {Severity}", ex.PipelineResult?.MaxSeverity);
}

Named pipelines

Register multiple isolated pipelines under string names; pick one per chat client at construction time. Useful for multi-LLM-endpoint apps and dev/staging/prod tier configurations.

// Default + two named variants
services.AddAISentinel(opts => opts.EmbeddingGenerator = realGen);
services.AddAISentinel("strict", opts =>
{
    opts.OnCritical = SentinelAction.Quarantine;
    opts.Configure<JailbreakDetector>(c => c.SeverityFloor = Severity.High);
});
services.AddAISentinel("lenient", opts =>
{
    opts.OnCritical = SentinelAction.Log;
    opts.Configure<RepetitionLoopDetector>(c => c.Enabled = false);
});

// Pick one per chat client
services.AddChatClient("openai-strict", b =>
    b.UseAISentinel("strict").Use(new OpenAIChatClient(...)));
services.AddChatClient("openai-lenient", b =>
    b.UseAISentinel("lenient").Use(new OpenAIChatClient(...)));

Each named pipeline gets its own SentinelOptions, IDetectionPipeline, and InterventionEngine. The audit store, forwarders, and alert sink are shared — operational dashboards see all pipelines through one feed. User-added detectors via opts.AddDetector<T>() register globally; per-pipeline detector tuning rides on opts.Configure<T>(c => ...).

Each named pipeline starts from a fresh SentinelOptions() — no inheritance from the default. For shared base config, extract a helper:

Action<SentinelOptions> baseCfg = opts => opts.EmbeddingGenerator = realGen;
services.AddAISentinel(baseCfg);
services.AddAISentinel("strict", opts => { baseCfg(opts); opts.OnCritical = SentinelAction.Quarantine; });

Phase A limitations (planned for Phase B when a real user need surfaces):

Always register the default unnamed AddAISentinel(...) first. The shared audit store, forwarders, alert sink, and tool-call guard are wired by the default call. Skipping it and registering only named pipelines causes the named chat client to throw a missing-shared-infrastructure error the first time it's resolved (during request handling).
Tool-call authorization is global, not per-name. opts.RequireToolPolicy(...) calls on named pipelines are silently ignored — only the default pipeline's bindings are consulted by IToolCallGuard. Configure tool policies on the default for now.
No request-time selector. The pipeline is fixed at chat-client construction time; multi-tenant routing where the tenant ID arrives with the request requires Phase B.

How it works

Every call to GetResponseAsync or GetStreamingResponseAsync runs two pipeline passes:

Prompt scan — before the request reaches the LLM
Response scan — after the LLM responds, before the result is returned

Each pass runs all enabled detectors in parallel (Task.WhenAll), aggregates a Threat Risk Score (0–100), and calls the Intervention Engine which takes the configured action for the highest severity found.

IChatClient.GetResponseAsync(messages)
  │
  ├─ [1] DetectionPipeline.RunAsync(prompt context)
  │       ├─ PromptInjectionDetector
  │       ├─ JailbreakDetector
  │       ├─ ... (28 more, parallel)
  │       └─ ThreatRiskScore + detections
  │
  ├─ InterventionEngine.Apply(result)   → Quarantine / Alert / Log / PassThrough
  ├─ AuditStore.AppendAsync(entry)
  │
  ├─ inner IChatClient.GetResponseAsync(messages)
  │
  ├─ [2] DetectionPipeline.RunAsync(response context)
  ├─ InterventionEngine.Apply(result)
  └─ AuditStore.AppendAsync(entry)

Detectors (55)

Detectors run in three modes:

Rule-based — fast regex or heuristic, always active, sub-microsecond per call
Semantic — uses embedding cosine similarity via EmbeddingGenerator. Language-agnostic. Active only with opts.EmbeddingGenerator configured.
LLM escalation — fires a second-pass LLM classifier (stub detectors, active only with opts.EscalationClient)

Security (31)

ID	Detector	Type	Detects
`SEC‑01`	PromptInjection	Rule-based	Override/injection phrase patterns (`ignore all previous instructions`, `you are now a different AI`, etc.)
`SEC‑02`	CredentialExposure	Rule-based	API keys, tokens, private keys, secrets in output
`SEC‑03`	ToolPoisoning	Rule-based	Suspicious tool-call manipulation patterns
`SEC‑04`	DataExfiltration	Rule-based	Base64 blobs, high-entropy encoded data
`SEC‑05`	Jailbreak	Rule-based	Jailbreak attempt phrases (DAN, roleplay exploits)
`SEC‑06`	PrivilegeEscalation	Rule-based	Role/permission escalation requests
`SEC‑07`	CovertChannel	Semantic	Encoding-based hidden payloads
`SEC‑08`	EntropyCovertChannel	LLM escalation	Statistical entropy anomalies in output
`SEC‑09`	IndirectInjection	Semantic	Injection via retrieved documents or tool results
`SEC‑10`	AgentImpersonation	Semantic	Model claiming to be a different agent or system
`SEC‑11`	MemoryCorruption	Semantic	Attempts to corrupt agent memory/context
`SEC‑12`	UnauthorizedAccess	Semantic	Attempts to access restricted resources
`SEC‑13`	ShadowServer	Semantic	Redirection to unauthorised endpoints
`SEC‑14`	InformationFlow	Semantic	Cross-context data leakage
`SEC‑15`	PhantomCitationSecurity	Semantic	Security-context hallucinated authority sources
`SEC‑16`	GovernanceGap	Semantic	Policy/compliance bypass attempts
`SEC‑17`	SupplyChainPoisoning	Semantic	Compromised dependency suggestions
`SEC‑18`	ToolDescriptionDivergence	Stub	Tool description changed at runtime vs. original declaration (requires tool-descriptor snapshot)
`SEC‑20`	SystemPromptLeakage	Rule-based	Verbatim fragments of the system prompt echoed in conversation history
`SEC‑23`	PiiLeakage	Rule-based	PII: SSN, credit card, IBAN, BSN, UK NINO, passport, DE tax ID, email+name, phone, DOB
`SEC‑24`	AdversarialUnicode	Rule-based	Zero-width spaces, homoglyphs, invisible characters used to smuggle hidden instructions
`SEC‑25`	CodeInjection	Rule-based	SQL injection, shell metacharacters, path traversal in LLM-generated code
`SEC‑26`	PromptTemplateLeakage	Rule-based	`{{variable}}`, `<SYSTEM>`, `[INST]` and other prompt scaffolding markers
`SEC‑27`	LanguageSwitchAttack	Rule-based	Abrupt script/language switch mid-response — injection vector via non-Latin text
`SEC‑28`	RefusalBypass	Rule-based	Model complied with a request it should have refused (caller-supplied forbidden patterns)
`SEC‑19`	ToolCallFrequency	Rule-based	Counts `ChatRole.Tool` messages; flags sessions with excessive tool invocations
`SEC‑21`	ExcessiveAgency	Semantic	Detects autonomous-action language ("I deleted", "I deployed", "I executed")
`SEC‑22`	HumanTrustManipulation	Semantic	Spots rapport/authority manipulation ("you can trust me", "I am your advisor")
`SEC‑29`	OutputSchema	Rule-based	Response doesn't deserialize as the caller-supplied `ExpectedResponseType` (OWASP LLM05)
`SEC‑30`	ShorthandEmergence	Semantic	Counts unknown all-caps tokens that may signal emergent covert language
`SEC‑31`	VectorRetrievalPoisoning	Semantic	Detects malicious instructions embedded in RAG-retrieved document chunks (OWASP LLM08)

Hallucination (9)

ID	Detector	Type	Detects
`HAL‑01`	PhantomCitation	Rule-based	Fake DOIs, arXiv IDs, `.invalid`/`.nonexistent` domains
`HAL‑02`	SelfConsistency	Rule-based	Numeric inconsistency (values differing by >10×)
`HAL‑03`	CrossAgentContradiction	Semantic	Contradictions between agents in a multi-agent session
`HAL‑04`	SourceGrounding	Semantic	Claims unsupported by provided context
`HAL‑05`	ConfidenceDecay	Semantic	Confidence degradation across turns
`HAL‑06`	StaleKnowledge	Semantic	Time-sensitive facts stated as current ("the latest version is X", "the current CEO is Y")
`HAL‑07`	IntraSessionContradiction	Semantic	Model contradicts itself within the same conversation
`HAL‑08`	GroundlessStatistic	Rule-based	Specific percentages or statistics asserted without any source in the provided context
`HAL‑09`	UncertaintyPropagation	Semantic	Flags hedged statements that contradict a definitive assertion in the same response

Operational (15)

ID	Detector	Type	Detects
`OPS‑01`	BlankResponse	Rule-based	Empty or whitespace-only responses
`OPS‑02`	RepetitionLoop	Rule-based	Same sentence repeated 3+ times
`OPS‑03`	IncompleteCodeBlock	Rule-based	Unclosed code fences
`OPS‑04`	PlaceholderText	Rule-based	`TODO`, `[INSERT HERE]`, `Lorem ipsum` leftovers
`OPS‑05`	ContextCollapse	Semantic	Loss of conversational context across turns
`OPS‑06`	AgentProbing	Semantic	Attempts to map agent capabilities or system prompt
`OPS‑07`	QueryIntent	Semantic	Malicious intent hidden in benign-looking queries
`OPS‑08`	ResponseCoherence	Semantic	Response that doesn't address the question asked
`OPS‑09`	TruncatedOutput	Rule-based	Detects mid-sentence truncation and unclosed code fences
`OPS‑10`	WaitingForContext	Semantic	Finds stall phrases when the user prompt was substantive
`OPS‑11`	UnboundedConsumption	Rule-based	Compares response length to prompt length; flags unbounded expansion
`OPS‑12`	SemanticRepetition	Semantic	Same idea restated with different wording — extends RepetitionLoop beyond literal string matching
`OPS‑13`	PersonaDrift	Semantic	Model's tone, persona, or stated identity shifts significantly across turns — context poisoning signal
`OPS‑14`	Sycophancy	Semantic	Model reverses a stated position purely because the user pushed back — epistemic cowardice
`OPS‑15`	WrongLanguage	Rule-based	Response language doesn't match the user's language (script/charset detection)

Semantic detectors are no-ops until opts.EmbeddingGenerator is configured. They use embedding cosine similarity and are language-agnostic — no LLM round-trip required.

LLM escalation detectors are no-ops until opts.EscalationClient is configured. Set it to a cheap fast model (e.g. GPT-4o-mini) to activate them.

Streaming: GetStreamingResponseAsync buffers the complete response before yielding tokens so the response scan can quarantine before any token reaches the application. Time-to-first-token equals full model response latency on this path.

OWASP LLM Top 10 (2025) Coverage

OWASP	Threat	Detectors
LLM01	Prompt Injection	`PromptInjectionDetector`, `IndirectInjectionDetector`, `ToolPoisoningDetector`
LLM02	Sensitive Info Disclosure	`CredentialExposureDetector`, `PiiLeakageDetector`, `SystemPromptLeakageDetector`, `PromptTemplateLeakageDetector`
LLM03	Supply Chain	`SupplyChainPoisoningDetector`
LLM04	Data & Model Poisoning	`DataExfiltrationDetector`, `InformationFlowDetector`
LLM05	Improper Output Handling	`CodeInjectionDetector`, `OutputSchemaDetector`
LLM06	Excessive Agency	`ExcessiveAgencyDetector`, `ToolCallFrequencyDetector`
LLM07	System Prompt Leakage	`SystemPromptLeakageDetector`, `GovernanceGapDetector`
LLM08	Vector & Embedding Weaknesses	`VectorRetrievalPoisoningDetector`
LLM09	Misinformation	`PhantomCitationDetector`, `GroundlessStatisticDetector`, `StaleKnowledgeDetector`, `UncertaintyPropagationDetector`
LLM10	Unbounded Consumption	`UnboundedConsumptionDetector`, `RepetitionLoopDetector`

Tool-Call Authorization

AI.Sentinel ships with IToolCallGuard — a preventive control evaluated before every tool call across all four surfaces. Decision model is binary Allow | Deny. Same policy abstraction (IAuthorizationPolicy) as planned ZeroAlloc.Mediator.Authorization.

[AuthorizationPolicy("admin-only")]
public sealed class AdminOnlyPolicy : IAuthorizationPolicy
{
    public bool IsAuthorized(ISecurityContext ctx) => ctx.Roles.Contains("admin");
}

services.AddSingleton<IAuthorizationPolicy, AdminOnlyPolicy>();
services.AddAISentinel(opts =>
{
    opts.RequireToolPolicy("Bash",       "admin-only");
    opts.RequireToolPolicy("delete_*",   "admin-only");
    opts.DefaultToolPolicy = ToolPolicyDefault.Allow; // default
});

builder.Services.AddChatClient(pipeline =>
    pipeline.UseAISentinel()
            .UseToolCallAuthorization()
            .UseFunctionInvocation()
            .Use(new OpenAIChatClient(...)));

Surface	Caller resolution default	Deny semantics
In-process	`IServiceProvider.GetService<ISecurityContext>()` → Anonymous	throw `ToolCallAuthorizationException`
Claude Code	`HookConfig.CallerContextProvider` → Anonymous	`HookOutput(Block, reason)`
Copilot	`CopilotHookConfig.CallerContextProvider` → Anonymous	`HookOutput(Block, reason)`
MCP proxy	DI provider → `SENTINEL_MCP_CALLER_ID/_ROLES` env → Anonymous	`McpProtocolException(InvalidRequest, reason)`

Default behaviour: if no policies are registered, every call is allowed (drop-in upgrade).

Prompt hardening (OWASP LLM01 — preventive)

SentinelOptions.SystemPrefix prepends a hardening system message to every outbound chat call, telling the model to treat retrieved/external content as data, not instructions. Detection still runs on the user's raw prompt; the model receives the hardened version. The caller's ChatMessage collection is never mutated — a hardened copy is forwarded to the inner client.

services.AddAISentinel(opts =>
{
    // First-line OWASP LLM01 mitigation. English default; override for other languages.
    opts.SystemPrefix = SentinelOptions.DefaultSystemPrefix;
});

Default behaviour: SystemPrefix == null (no hardening) — opt-in, drop-in upgrade for existing AI.Sentinel users. If the caller's messages already start with a system message, the prefix is merged into it as "{SystemPrefix}\n\n{original system text}" — single system message preserved.

Audit storage + forwarding

AI.Sentinel ships with two related capabilities for audit data:

Storage (IAuditStore) — singular, queryable, source of truth. Default is in-memory ring buffer; opt into SQLite for persistence across restarts.
Forwarding (IAuditForwarder) — plural, fire-and-forget, mirrors every audit entry to one or more external systems (NDJSON file, Azure Sentinel, OpenTelemetry).

Default behaviour (no extra registration): in-memory ring buffer + zero forwarders. Existing AI.Sentinel users see no behaviour change.

Persistent storage with SQLite

services.AddAISentinel(opts => { ... });
services.AddSentinelSqliteStore(opts =>
{
    opts.DatabasePath    = "/var/lib/ai-sentinel/audit.db";
    opts.RetentionPeriod = TimeSpan.FromDays(90); // optional time-based cleanup
});

Single-file SQLite DB. WAL mode enabled (concurrent reads while writer active). Hash chain survives restarts. Last-registration-wins for IAuditStore.

Forwarding to external systems

Forwarders are fire-and-forget — never block the proxy, never throw. Failures swallow + log to stderr + increment audit.forward.dropped counter.

// NDJSON file (in core, zero dependencies — direct file append, no buffering)
services.AddSentinelNdjsonFileForwarder(opts =>
    opts.FilePath = "/var/log/ai-sentinel/audit.ndjson");

Operators ship the NDJSON file via Filebeat / Vector / Fluent Bit — universal coverage.

// Azure Sentinel (auto-wrapped with BufferingAuditForwarder<T>)
services.AddSentinelAzureSentinelForwarder(opts =>
{
    opts.DcrEndpoint    = new Uri("https://my-dce.westus2.ingest.monitor.azure.com");
    opts.DcrImmutableId = "dcr-abc123";
    opts.StreamName     = "Custom-AISentinelAudit_CL";
    // opts.Credential default = new DefaultAzureCredential()
});

Direct Logs Ingestion API. Static-token auth supported via DCR; OAuth2 / mTLS not in v1 (see backlog). Requires DCR + custom table set up in your Log Analytics workspace.

// OpenTelemetry (vendor-neutral; OTel SDK handles batching)
services.AddSentinelOpenTelemetryForwarder();
services.AddOpenTelemetry().WithLogging(b => b.AddOtlpExporter());

Routes to any OTLP-speaking backend: Splunk, Datadog, Elastic, NewRelic, more. Uses your existing OTel logging pipeline.

Buffering decorator

AzureSentinelAuditForwarder is automatically wrapped — per-entry HTTP roundtrips would crater throughput. Default buffering: batch=100, interval=5s, channel capacity=10000. Drops on overflow with rate-limited stderr log + audit.forward.dropped counter for monitoring. Override via .WithBuffering(...) in the future (currently a v1.1 backlog item).

NdjsonFileAuditForwarder and OpenTelemetryAuditForwarder are NOT auto-buffered — direct file append is already fast, and the OTel SDK does its own BatchLogRecordExportProcessor batching.

New packages

Package	Purpose	Dependencies
`AI.Sentinel.Sqlite`	Persistent `SqliteAuditStore`	`Microsoft.Data.Sqlite`
`AI.Sentinel.AzureSentinel`	`AzureSentinelAuditForwarder`	`Azure.Monitor.Ingestion`, `Azure.Identity`
`AI.Sentinel.OpenTelemetry`	`OpenTelemetryAuditForwarder`	`OpenTelemetry`, `Microsoft.Extensions.Logging.Abstractions`

Configuration

builder.Services.AddAISentinel(opts =>
{
    // Action per severity level
    opts.OnCritical = SentinelAction.Quarantine;  // throws SentinelException
    opts.OnHigh     = SentinelAction.Alert;        // publishes mediator notification + alert sink
    opts.OnMedium   = SentinelAction.Log;          // logs via ILogger
    opts.OnLow      = SentinelAction.Log;
    // opts.OnLow   = SentinelAction.PassThrough;  // silent

    // Optional: embedding provider for 38 semantic detectors (language-agnostic detection)
    opts.EmbeddingGenerator = new OpenAIEmbeddingGenerator(...);
    // Optional: custom embedding cache (default: in-memory LRU, 1 024 entries)
    // options.EmbeddingCache = new MyRedisEmbeddingCache(...);

    // Optional: LLM second-pass classifier for 2 stub detectors (ToolDescriptionDivergenceDetector)
    opts.EscalationClient = new OpenAIChatClient("gpt-4o-mini", ...);

    // Audit ring buffer size (in-process, no external store required)
    opts.AuditCapacity = 10_000; // default

    // Agent identity labels for audit entries
    opts.DefaultSenderId   = new AgentId("web-user");
    opts.DefaultReceiverId = new AgentId("assistant");

    // Optional: POST alert payloads to a webhook on Quarantine/Alert actions
    opts.AlertWebhook = new Uri("https://hooks.example.com/sentinel");

    // Optional: suppress repeat alerts for the same detector+session.
    // null (default) suppresses for the entire session lifetime.
    // Set a TimeSpan to re-alert after the window expires.
    opts.AlertDeduplicationWindow = TimeSpan.FromMinutes(5);

    // Optional: per-session token-bucket circuit breaker.
    // MaxCallsPerSecond = steady-state refill rate; BurstSize = initial token count.
    // Pass "sentinel.session_id" in ChatOptions.AdditionalProperties for per-user buckets.
    // Without a session key, all calls share a global bucket.
    opts.MaxCallsPerSecond = 5;   // allow 5 calls/sec per session (steady state)
    opts.BurstSize = 20;          // up-front burst before throttling kicks in

    // Optional: inactivity window after which per-session dedup + rate-limiter
    // state is evicted. Applies to both AlertDeduplicationWindow's _seen
    // dictionary and the rate-limiter bucket map. Default: 1 hour.
    // Lower this for high-cardinality session keys (many unique users),
    // raise it for long-lived sessions.
    opts.SessionIdleTimeout = TimeSpan.FromHours(1);

    // Optional: validate structured LLM responses against a caller-supplied type (SEC-29).
    // The type must be annotated with [ZeroAllocSerializable(SerializationFormat.SystemTextJson)].
    // Requires calling services.AddSerializerDispatcher() (from ZeroAlloc.Serialisation).
    opts.ExpectedResponseType = typeof(MyResponse);
});

Actions

`SentinelAction`	Behaviour
`Quarantine`	Throws `SentinelException` with full `PipelineResult`. Stops the call. Also fires the alert sink.
`Alert`	Publishes `ThreatDetectedNotification` + `InterventionAppliedNotification` via `IMediator`. Fires the alert sink. Call continues.
`Log`	Writes to `ILogger<InterventionEngine>`. Call continues.
`PassThrough`	No action. Detections are still audited.

Alert sink behaviour: when opts.AlertWebhook is set, WebhookAlertSink POSTs a JSON payload (type, severity, detector, reason, action, session) to the configured URL on Quarantine or Alert actions. The DeduplicatingAlertSink wraps the webhook sink and suppresses repeat alerts for the same detector+session, controlled by opts.AlertDeduplicationWindow.

Embedding cache

Scan-time embeddings are cached by default in a 1 024-entry in-memory LRU store (InMemoryLruEmbeddingCache). The cache is keyed by input text and avoids redundant API calls for repeated messages in the same process.

To use a persistent or shared cache, implement IEmbeddingCache and set it on SentinelOptions:

options.EmbeddingCache = new MyRedisEmbeddingCache(redis, ttl: TimeSpan.FromHours(1));

Tuning individual detectors

Use opts.Configure<T>(c => ...) to disable a detector or clamp its severity output:

services.AddAISentinel(opts =>
{
    // Disable a detector entirely — zero CPU cost, no audit entries
    opts.Configure<WrongLanguageDetector>(c => c.Enabled = false);

    // Elevate any firing of JailbreakDetector to at least High
    opts.Configure<JailbreakDetector>(c => c.SeverityFloor = Severity.High);

    // Cap a noisy detector's output to Low
    opts.Configure<RepetitionLoopDetector>(c => c.SeverityCap = Severity.Low);
});

Floor and Cap apply only to firing results — Clean results pass through unchanged (no fabricated findings). Multiple Configure<T> calls for the same detector merge by mutation, so base configuration and per-environment overrides compose naturally.

Dashboard

Mount the built-in dashboard with one line:

app.UseAISentinel("/ai-sentinel");

// Protect it with your own middleware:
app.UseAISentinel("/ai-sentinel", branch =>
    branch.Use(RequireInternalNetwork));

The dashboard shows:

Threat Risk Score — live ring gauge (0–100, SAFE / WATCH / ALERT / ISOLATE)
Live event feed — every detection with severity badge, detector ID, reason, and session-id drill-down link
Severity trend chart — 15-minute rolling sparkline, stroke colour modulates by max severity
Category chips — filter by Security / Hallucination / Operational / Authorization (DetectorId prefix)
Live free-text search — case-insensitive substring match against the Reason column, 300 ms debounced
Per-session timeline drill-down — click any session id in the feed → URL gains ?session=<id> → feed and export filter to that session, with a clearable pill above the table
NDJSON export — 📥 Export button downloads the current filtered view as NDJSON (/api/export.ndjson honors all active filters; X-Sentinel-Truncated: true header signals if the store cap was hit)
Auto dark mode — pure-CSS prefers-color-scheme switch; severity / authz accents preserved across themes
Detector hit stats — which detectors fire most

All filter parameters intersect: chips ∧ session ∧ search. The URL captures full operator state — every view is shareable and bookmarkable.

Looks like

Desktop (light)	Filtered drill-down	Desktop (dark)

Tablet	Mobile

No npm, no JS build step — served entirely from embedded resources using HTMX + SSE + raw CSS.

Events / Mediator integration

If your DI container has an IMediator (ZeroAlloc.Mediator, MediatR-compatible), AI.Sentinel publishes two notification types on Alert-level events:

// Fired when a threat is detected
readonly record struct ThreatDetectedNotification(
    SessionId      SessionId,
    AgentId        SenderId,
    AgentId        ReceiverId,
    PipelineResult PipelineResult,
    DateTimeOffset DetectedAt);

// Fired when an intervention is applied
readonly record struct InterventionAppliedNotification(
    SessionId      SessionId,
    SentinelAction Action,
    Severity       Severity,
    string         Reason,
    DateTimeOffset AppliedAt);

CLI: `sentinel` (offline replay)

AI.Sentinel.Cli is a dotnet tool that replays saved conversations through the full detector pipeline — useful for incident forensics, CI regression testing, and detector tuning.

dotnet tool install -g AI.Sentinel.Cli
sentinel scan conversation.json

Accepts OpenAI Chat Completion JSON ({"messages": [...]}) or AI.Sentinel audit NDJSON. Auto-detects by default.

sentinel scan conversation.json
  [--format <openai|audit|auto>]                 # default: auto
  [--output <text|json>]                          # default: text
  [--expect <detectorId>]                         # repeatable, e.g. --expect SEC-01
  [--min-severity <Low|Medium|High|Critical>]
  [--baseline <prior-result.json>]                # diff against a prior run

Exit codes: 0 scan completed (no failing assertions), 1 assertion failed or baseline regression, 2 I/O or parse error.

The CLI's core types — SentinelReplayClient, ConversationLoader, ReplayRunner, ReplayResult — are all public, so callers can reference AI.Sentinel.Cli programmatically from their own xUnit tests to assert detection behavior on saved conversations.

IDE / Agent integration

AI.Sentinel ships native hook adapters for the two major AI coding agents that support out-of-process hook scripts. Both adapters read hook payloads from stdin, run the detector pipeline, and signal block/warn/allow via exit code + stdout JSON.

Claude Code

dotnet tool install -g AI.Sentinel.ClaudeCode.Cli

Add to ~/.claude/settings.json (or your project's .claude/settings.json):

{
  "hooks": {
    "UserPromptSubmit": [
      { "hooks": [{ "type": "command", "command": "sentinel-hook user-prompt-submit" }] }
    ],
    "PreToolUse": [
      { "matcher": "*", "hooks": [{ "type": "command", "command": "sentinel-hook pre-tool-use" }] }
    ],
    "PostToolUse": [
      { "matcher": "*", "hooks": [{ "type": "command", "command": "sentinel-hook post-tool-use" }] }
    ]
  }
}

GitHub Copilot

dotnet tool install -g AI.Sentinel.Copilot.Cli

Add to your repo's hooks.json (per Copilot hook documentation):

{
  "version": 1,
  "hooks": {
    "userPromptSubmitted": [
      { "type": "command", "bash": "sentinel-copilot-hook user-prompt-submitted", "timeoutSec": 10 }
    ],
    "preToolUse": [
      { "type": "command", "bash": "sentinel-copilot-hook pre-tool-use", "timeoutSec": 10 }
    ],
    "postToolUse": [
      { "type": "command", "bash": "sentinel-copilot-hook post-tool-use", "timeoutSec": 10 }
    ]
  }
}

MCP proxy

For any MCP-speaking agent (Cursor, Continue, Cline, Windsurf, Copilot's MCP path), install the proxy and point your MCP host at it instead of at the target server directly:

dotnet tool install -g AI.Sentinel.Mcp.Cli

Example entry in your MCP host config (mcpServers block or equivalent):

{
  "mcpServers": {
    "filesystem-guarded": {
      "command": "sentinel-mcp",
      "args": ["proxy", "--target", "uvx", "mcp-server-filesystem", "/home/me"],
      "env": {
        "SENTINEL_HOOK_ON_CRITICAL": "Block",
        "SENTINEL_MCP_DETECTORS": "security"
      }
    }
  }
}

The proxy spawns the target command as a subprocess, intercepts tools/call and prompts/get, scans through the Sentinel detector pipeline, and blocks threats with a JSON-RPC error to the host.

Additional env vars (beyond the shared SENTINEL_HOOK_ON_* table above):

Variable	Default	Values
`SENTINEL_MCP_DETECTORS`	`security`	`security` (9 regex security detectors) or `all` (every detector — higher false-positive rate on structured data)
`SENTINEL_MCP_MAX_SCAN_BYTES`	`262144`	Truncation cap on tool-result text passed to the detector pipeline. Counts UTF-8 bytes (see v1.1 note below). Full content still forwarded to the host.

MCP proxy v1.1

In addition to tools/call and prompts/get, the proxy now intercepts resources/read, mirrors target server capabilities (only advertises tools / prompts / resources to the host when the upstream target advertises them), and supports HTTP transports alongside stdio.

New environment variables:

Variable	Default	Purpose
`SENTINEL_MCP_SCAN_MIMES`	`text/,application/json,application/xml,application/yaml`	MIME allowlist for `resources/read` scanning. Comma-separated. A trailing `/` matches any subtype (e.g. `text/` matches `text/plain` and `text/html`). Resources outside the allowlist forward verbatim without scanning.
`SENTINEL_MCP_HTTP_HEADERS`	(none)	`key=value;key=value` headers applied to every HTTP-transport request. Use for static-token auth (e.g. `Authorization=Bearer xyz`). Malformed pairs are skipped silently.
`SENTINEL_MCP_TIMEOUT_SEC`	`5`	Subprocess shutdown grace in seconds. After this window the proxy logs `transport_dispose action=grace_expired` and returns; the MCP host's own kill policy is the second line of defence.
`SENTINEL_MCP_LOG_JSON`	(off)	Set to `1` for NDJSON stderr output. Default is `key=value` lines. Useful when piping proxy logs into a log aggregator.

New CLI flags:

sentinel-mcp proxy [--on-critical Block|Warn|Allow]
                   [--on-high     Block|Warn|Allow]
                   [--on-medium   Block|Warn|Allow]
                   [--on-low      Block|Warn|Allow]
                   --target /path/to/server arg1 ...        # stdio mode (existing)
sentinel-mcp proxy [...flags...] --target https://example.com/mcp   # HTTP mode (new)

Precedence: CLI flag > SENTINEL_MCP_ON_* env var > existing shared SENTINEL_HOOK_ON_* env var > default. The shared SENTINEL_HOOK_ON_* env vars continue to work unchanged.

When --target starts with http:// or https:// the proxy uses HttpClientTransport (Streamable HTTP with automatic SSE fallback) instead of spawning a subprocess. Combine with SENTINEL_MCP_HTTP_HEADERS for token auth.

Auth scope: SENTINEL_MCP_HTTP_HEADERS covers static-token auth only (bearer tokens, API keys, tenant headers). OAuth2 flows and mTLS client certificates are not supported in v1.1 — see the deferred items in docs/BACKLOG.md if you need them.

Behaviour change in v1.1: SENTINEL_MCP_MAX_SCAN_BYTES now counts UTF-8 bytes (was a char count, which double-counted for multi-byte characters). ASCII content is unchanged; emoji / CJK / accented text reaches the cap sooner.

Severity → action mapping

Both adapters share the same env-var contract — configure once, applies to both:

Variable	Default	Values
`SENTINEL_HOOK_ON_CRITICAL`	`Block`	`Block` / `Warn` / `Allow`
`SENTINEL_HOOK_ON_HIGH`	`Block`	`Block` / `Warn` / `Allow`
`SENTINEL_HOOK_ON_MEDIUM`	`Warn`	`Block` / `Warn` / `Allow`
`SENTINEL_HOOK_ON_LOW`	`Allow`	`Block` / `Warn` / `Allow`
`SENTINEL_HOOK_VERBOSE`	`false`	`1` / `true` / `yes` → emit a one-line diagnostic to stderr on every invocation

Block → hook exits 2, which both Claude Code and Copilot surface as "call blocked" with the detector ID + reason on stderr. Warn → exit 0 with the reason on stderr (visible in the agent's log). Allow → silent pass.

Diagnostics: did the hook fire?

Set SENTINEL_HOOK_VERBOSE=1 to emit a grep-friendly one-liner to stderr on every invocation, including Allow outcomes:

[sentinel-hook] event=user-prompt-submit decision=Allow session=sess-42
[sentinel-hook] event=user-prompt-submit decision=Block detector=SEC-01 severity=Critical session=sess-42

Useful when wiring the hook for the first time ("did it run?") or when a block was expected but didn't happen. Leave off in steady state — the normal Block/Warn reason is already on stderr.

Native binary (optional, faster cold start)

The hook CLIs are Native-AOT ready. Both .csproj files gate PackAsTool behind PublishAot, so a single source tree produces either the dotnet tool NuGet package or a ~6.5 MB single-file native binary, depending on how you publish it.

dotnet publish src/AI.Sentinel.ClaudeCode.Cli -c Release -r win-x64 -p:PublishAot=true

Replace win-x64 with linux-x64, osx-arm64, etc. Output lands under bin/Release/net8.0/<rid>/publish/sentinel-hook[.exe]. Expect ~10× faster cold start than the dotnet-tool entry point — worth it if the hook fires on every tool call. Point the agent's hook command at the binary's full path instead of sentinel-hook.

Prereqs on Windows: the Visual Studio "Desktop development with C++" workload (for link.exe + Windows SDK). Linux needs clang/libc dev packages; macOS needs Xcode CLT.

Programmatic use

The underlying libraries (AI.Sentinel.ClaudeCode and AI.Sentinel.Copilot) expose HookAdapter / CopilotHookAdapter and the vendor-agnostic HookPipelineRunner as public types. Reference the library packages (not the .Cli tool packages) to write your own host integration in C#.

OpenTelemetry

AI.Sentinel emits metrics and traces out of the box via the ai.sentinel meter and activity source. Wire them into your existing OTel pipeline with the standard .NET instrumentation API:

builder.Services.AddOpenTelemetry()
    .WithMetrics(m => m.AddMeter("ai.sentinel"))
    .WithTracing(t => t.AddSource("ai.sentinel"));

Metrics

Metric	Type	Description
`sentinel.scans`	Counter	Total pipeline scans executed
`sentinel.scan.ms`	Histogram	Pipeline scan duration in milliseconds
`sentinel.threats`	Counter	Threats detected (tagged by `severity` and `detector`)
`sentinel.alerts.suppressed`	Counter	Alerts suppressed by the deduplication window (tagged by `detector`)
`sentinel.rate_limit.exceeded`	Counter	Calls rejected by the per-session rate limiter (tagged by `session`)

Traces

Each GetResponseAsync / GetStreamingResponseAsync call produces a sentinel.scan span (one per direction — prompt and response) with the following attributes:

Attribute	Description
`sentinel.severity`	Max severity found in this scan
`sentinel.is_clean`	`true` when no threats were detected
`sentinel.threat_count`	Number of distinct detector hits
`sentinel.top_detector`	ID of the highest-severity detector that fired

All metrics and spans are zero-cost when no listener is registered — there is no overhead when OTel is not configured.

Audit store

All detections (regardless of severity) are written to a ring buffer audit store in process memory. Capacity defaults to 10,000 entries; oldest entries are overwritten when full.

Query the store directly:

var store = app.Services.GetRequiredService<IAuditStore>();

await foreach (var entry in store.QueryAsync(new AuditQuery
{
    MinSeverity = Severity.Medium,
    From        = DateTimeOffset.UtcNow.AddHours(-1),
    PageSize    = 100
}, CancellationToken.None))
{
    Console.WriteLine($"{entry.Timestamp:HH:mm:ss} [{entry.Severity}] {entry.DetectorId}: {entry.Reason}");
}

Benchmarks

All measurements: .NET 9.0.15, Windows 11, Release, Job.Default, MemoryDiagnoser + ThreadingDiagnoser.

Individual detectors

Scenario	Mean	Allocated
`PromptInjection` — clean input	~59 ns	0 B
`PromptInjection` — malicious input	~231 ns	480 B
`RepetitionLoopDetector` — clean input	~106 ns	296 B

Detection pipeline (DetectionPipeline.RunAsync, no-op inner client)

Detector set	Input	Mean	Allocated
Empty (baseline)	clean	~16 ns	32 B
Security-only (13 detectors)	clean	~958 ns	472 B
Security-only (13 detectors)	malicious	~2,388 ns	2,616 B
All detectors (25 rule-based)	clean	~1,855 ns	1,568 B
All detectors (25 rule-based)	malicious	~3,462 ns	4,008 B

End-to-end (SentinelChatClient.GetResponseAsync, no-op inner client, single short message)

Detector set	Input	Mean	Allocated
Empty	clean	~994 ns	1.24 KB
Security-only	clean	~2,636 ns	2.26 KB
All detectors	clean	~6,268 ns	4.53 KB
All detectors	malicious	~8,653 ns	7.25 KB

Audit store

Scenario	Mean	Allocated
Sequential append	~118 ns	0 B
8 concurrent appends	~1,468 ns	400 B

E2E numbers exclude real LLM latency (measured against a no-op inner client). Add your model's round-trip time on top.

Run the full suite yourself:

dotnet run --project benchmarks/AI.Sentinel.Benchmarks -c Release -- --filter "*"