Agentic Chatops
3-tier agentic ChatOps (n8n + GPT-4o + Claude Code) implementing all 21 patterns from "Agentic Design Patterns" — solo operator managing 137 devices
agentic-chatops
AI agents that triage infrastructure alerts, investigate root causes, and propose fixes — while a solo operator sleeps.
For the complete technical reference, see README.extensive.md.

The Problem
One person. 310+ infrastructure objects across 6 sites. 3 firewalls, 12 Kubernetes nodes, self-hosted everything. When an alert fires at 3am, there's no team to call. There never is.
The Solution
Three agentic subsystems that handle the detective work — ChatOps (infrastructure), ChatSecOps (security), ChatDevOps (CI/CD) — built on n8n orchestration, Matrix as the human interface, and a 3-tier agent architecture. The human stays in the loop for every infrastructure change. The system never acts without a thumbs-up or poll vote.
What Makes This Different
Self-Improving Prompts — now with A/B trials (nobody else does this)
The system evaluates its own performance and auto-patches its prompts. Every session is scored by an LLM-as-a-Judge on 5 quality dimensions (gemma3:12b local-first since 2026-04-19, Haiku for calibration). When a dimension averages below threshold over 30 days, the preference-iterating patcher (IFRNLLEI01PRD-645, 2026-04-20) generates 3 candidate instruction variants (concise / detailed / examples) plus a no-patch control, and assigns each future matching session to one arm via a deterministic BLAKE2b hash. A daily cron runs a one-sided Welch t-test once every arm reaches 15 samples; the winner is promoted only if it beats control by ≥ 0.05 points with p < 0.1. Otherwise the trial is aborted. Prompt-level policy iteration — no model weights are ever fine-tuned.
```
Session → LLM Judge (5 dims) → dimension trending below threshold
→ prompt-patch-trial.py generates 3 candidate variants + 1 control
→ future sessions hash-routed to arms → Welch t-test at 15+ samples/arm
→ winner promoted to config/prompt-patches.json (source: "trial:N:idx=I")
→ next eval cycle scores the new patch → loop continues
```
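The hash-routing step can be sketched in a few lines. This is a minimal sketch: `assign_arm`, the arm labels, and the trial/session id format are illustrative assumptions, not the production patcher's API.

```python
import hashlib

# Three candidate variants plus the no-patch control arm, as described above.
ARMS = ["concise", "detailed", "examples", "control"]

def assign_arm(session_id: str, trial_id: str) -> str:
    """Deterministically route a session to one trial arm via BLAKE2b.

    Same (trial, session) pair always lands on the same arm, so no
    routing table has to be persisted between cron runs.
    """
    digest = hashlib.blake2b(
        f"{trial_id}:{session_id}".encode(), digest_size=8
    ).digest()
    return ARMS[int.from_bytes(digest, "big") % len(ARMS)]
```

Because the assignment is a pure function of the ids, replaying history reproduces the exact arm split, which keeps the later Welch t-test honest.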
AI Planner Wired to Proven Ansible Playbooks
Before Claude Code investigates, a Haiku planner generates a 3–5 step investigation plan. The planner queries AWX for matching Ansible playbooks from 41 proven templates (maintenance, cert sync, K8s drain, PVE updates, DMZ deployments). Plans naturally include "Run AWX Template 64 with dry_run=true" as remediation steps — bridging AI reasoning with proven automation.
Predictive Alerting
Instead of only reacting after alerts fire, the system queries LibreNMS API daily for trending risk across both sites. Devices are scored on disk usage trends, alert frequency, and health signals. A daily top-10 risk report posts to Matrix before problems become incidents.
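The daily risk scan can be approximated in a few lines. The weights, field names, and normalization below are illustrative assumptions for the mechanism only, not the production scoring model:

```python
def risk_score(device: dict) -> float:
    """Blend trend signals into one 0..1 risk score (illustrative weights)."""
    disk = min(device.get("disk_growth_pct_per_week", 0) / 10, 1.0)
    alerts = min(device.get("alerts_last_30d", 0) / 20, 1.0)
    health = 1.0 - device.get("health", 1.0)  # health: 1.0 == fully healthy
    return round(0.5 * disk + 0.3 * alerts + 0.2 * health, 3)

def top_n(devices: list, n: int = 10) -> list:
    """The daily top-10 report is just the highest-scoring devices."""
    return sorted(devices, key=risk_score, reverse=True)[:n]
```

The point of the sketch is the shape of the pipeline: pull metrics, score every device the same way, post only the ranked head of the list.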
5-Signal RAG + GraphRAG + Staleness + Temporal Filter + mtime-Sort
Retrieval uses Reciprocal Rank Fusion across 5 signals (semantic + keyword + compiled wiki + MemPalace transcripts + chaos baselines), plus a GraphRAG knowledge graph (360 entities, 193 relationships). Retrieval short-circuits via two intent detectors: temporal window ("last 48h", "72 hours ending YYYY-MM-DD") filters wiki on source_mtime, and mtime-sort intent ("name any three memory files created in the last 48h") bypasses semantic retrieval entirely and returns an mtime-ranked window. Results older than 7 days get age-proportional staleness warnings. A Haiku synth step composes cross-chunk answers when top rerank < threshold (3–4× faster p95 than the Ollama ensemble). SYNTH_HAIKU_FORCE_FAIL env supports 5 failure modes (429 / auth / timeout / network / empty) that all fall back cleanly to local qwen2.5.
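Reciprocal Rank Fusion itself is compact. A minimal sketch of the fusion step, using the textbook formulation with the conventional k = 60 (the production signal names and constant are not shown in this README):

```python
from collections import defaultdict

def rrf(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists: each signal contributes 1/(k + rank) per doc."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

A document that several signals rank highly beats one that a single signal ranks first, which is exactly why RRF tolerates one noisy signal out of five.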
Karpathy-Style Compiled Knowledge Base
Following Andrej Karpathy's LLM Knowledge Bases pattern: raw data from 7+ sources (117 memory files, 55 CLAUDE.md files, 33 incidents, 27 lessons, 101 OpenClaw memories, 17 skills, ~5,200 lab docs) is compiled into a browsable 44-article wiki with auto-maintained indexes, daily SHA-256 incremental recompilation, and contradiction detection. All articles embedded into RAG as the 3rd fusion signal.
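The SHA-256 incremental recompile can be sketched as follows. `changed_sources` and the JSON state-file layout are hypothetical names for illustration, not the actual compiler's interface:

```python
import hashlib
import json
import pathlib

def changed_sources(paths, state_file="wiki-hashes.json"):
    """Return only the sources whose SHA-256 differs from the stored state,
    so a daily recompile touches just what actually changed."""
    state_path = pathlib.Path(state_file)
    old = json.loads(state_path.read_text()) if state_path.exists() else {}
    new, dirty = {}, []
    for p in map(pathlib.Path, paths):
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        new[str(p)] = digest
        if old.get(str(p)) != digest:
            dirty.append(p)  # new or modified since last run
    state_path.write_text(json.dumps(new, indent=2))
    return dirty
```

Hashing content rather than trusting timestamps means a `touch`ed-but-unchanged file is still skipped.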
Full Observability Stack with OTel
88,448 tool calls instrumented across 108 tool types with per-tool error rates and latency percentiles. 39K OTel spans across 94 traces exported to OpenObserve (OTLP). 10 Grafana dashboards (64+ panels) covering ChatOps, ChatSecOps, ChatDevOps, and trace analysis. 18,220 infrastructure commands logged across 232 devices.
Formal Evaluation Pipeline
58 scenarios across 3 eval sets (22 regression + 20 discovery + 16 holdout) + 54 adversarial red-team tests. Prompt Scorecard grades 19 surfaces daily on 6 dimensions. Agent Trajectory scoring on 8 infra / 4 dev steps. A/B variant testing (react_v1 vs react_v2). CI eval gate blocks bad merges. Monthly eval flywheel cycle.
Structured Agentic Substrate — 9 adoptions from the OpenAI Agents SDK
The 2026-04-20 audit of openai/openai-agents-python flagged 11 gaps; 9 were implemented (issues IFRNLLEI01PRD-635..643). The system now has a versioned, typed, recoverable substrate the old string-based Matrix pipeline couldn't offer:
- Schema versioning on 9 session/audit tables + a central registry (`scripts/lib/schema_version.py`) mirroring the SDK's `RunState.CURRENT_SCHEMA_VERSION`/`SCHEMA_VERSION_SUMMARIES` pattern. Writers stamp `schema_version=CURRENT`; readers `check_row()` fail-fast on future versions.
- 13 typed events (`session_events.py`) in a new `event_log` table — `tool_started/ended`, `handoff_requested/completed/cycle_detected/compaction`, `reasoning_item_created`, `mcp_approval_*`, `agent_updated`, `message_output_created`, `tool_guardrail_rejection`, `agent_as_tool_call`. Replaces free-form Matrix strings with Grafana-queryable structured telemetry.
- Per-turn lifecycle hooks — `session-start.sh`, `post-tool-use.sh`, `user-prompt-submit.sh`, `session-end.sh` (new — the `on_final_output` equivalent) feeding a `session_turns` table with per-turn cost, tokens, duration, tool count.
- 3-behavior tool-guardrail taxonomy (`allow`/`reject_content`/`deny`) in `unified-guard.sh` + `audit-bash.sh` + `protect-files.sh`. `reject_content` sends Claude a retry hint instead of a wall; `deny` hard-halts. Every rejection is a typed event.
- `HandoffInputData` envelope (`scripts/lib/handoff.py`) — zlib-compressed base64 payload carrying `input_history`, `pre_handoff_items`, `new_items`, `run_context`. 176 KB history → 752 B on the wire (0.43% ratio). Eliminates the "re-derive context via RAG" cost on escalation.
- Transcript compaction (`scripts/compact-handoff-history.py`) — opt-in per escalation. Local `gemma3:12b` with Haiku fallback; circuit-breaker aware.
- Agent-as-tool wrapper (`scripts/agent_as_tool.py`) — wraps the 10 sub-agent definitions as callable tools so the orchestrator LLM can conditionally invoke them in the ambiguous-risk (0.4–0.6) band, complementing the deterministic routing.
- Handoff depth counter + cycle detection (`scripts/lib/handoff_depth.py`) — `handoff_depth >= 5` forces `[POLL]`; `>= 10` hard-halts; any agent appearing twice in the chain is refused and logged as `handoff_cycle_detected`.
- Immutable per-turn snapshots (`scripts/lib/snapshot.py`) — a snapshot is captured BEFORE each mutating tool call (`Bash`, `Edit`, `Write`, `Task`; read-only tools skipped); `rollback_to(id)` restores any prior `sessions` row. 7-day retention.
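The envelope trick (JSON, then zlib, then base64) needs only the standard library. A minimal sketch, with `pack_envelope`/`unpack_envelope` as illustrative names rather than the actual `scripts/lib/handoff.py` API:

```python
import base64
import json
import zlib

def pack_envelope(payload: dict) -> str:
    """JSON -> zlib -> base64: large, repetitive histories shrink to a
    short ASCII-safe string that fits in any message field."""
    raw = json.dumps(payload, separators=(",", ":")).encode()
    return base64.b64encode(zlib.compress(raw, level=9)).decode()

def unpack_envelope(blob: str) -> dict:
    """Exact inverse: base64 -> zlib -> JSON."""
    return json.loads(zlib.decompress(base64.b64decode(blob)))
```

Conversation histories compress extremely well because they repeat tool names, hostnames, and phrasing, which is how a 176 KB history can collapse to under a kilobyte.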
Four new SQLite tables (event_log, handoff_log, session_state_snapshot, session_turns) bring the total to 35. Migrations 006–011 apply idempotently on both fresh and legacy DBs. Two follow-ups since then — the A/B prompt patcher (IFRNLLEI01PRD-645, prompt_patch_trial + session_trial_assignment) and the CLI-session RAG capture pipeline (-646/-647/-648, no new tables; chunks + tool calls + knowledge rows tagged issue_id='cli-<uuid>' on the existing schema) — bring the live total to 39.
CLI-Session RAG Capture — interactive claude sessions flow into RAG too (2026-04-20)
Before this, only YT-backed Runner sessions had their transcripts/tool-calls/extracted knowledge written into the shared RAG tables. Interactive claude CLI sessions (human-in-the-loop dev work) were only captured by poll-claude-usage.sh for cost/tokens — their content was lost to retrieval.
A 3-tier pipeline (IFRNLLEI01PRD-646/-647/-648) closes the gap. A single cron line chains three idempotent steps over every CLI JSONL:
- `archive-session-transcript.py` chunks exchange pairs → `session_transcripts` + `nomic-embed-text` embeddings + doc-chain refined summary at `chunk_index=-1` (sessions ≥ 5000 assistant chars).
- `parse-tool-calls.py` extracts `tool_use`/`tool_result` pairs → `tool_call_log` (issue_id resolves to `cli-<uuid>` via patched path inference).
- `extract-cli-knowledge.py` runs `gemma3:12b` in strict-JSON mode over the summary rows → `incident_knowledge` with `project='chatops-cli'`, embedded for retrieval.
Retrieval weights chatops-cli rows at CLI_INCIDENT_WEIGHT=0.75 by default so real infra incidents still win close ties. A byte-offset watermark skips unchanged files. Soak test (10 files): 12 chunks + 245 tool-call rows + 4 knowledge extractions — gemma correctly classified one sample as subsystem=sqlite-schema, tags=[schema, migration, versioning, data] at 0.95 confidence.
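The byte-offset watermark can be illustrated with a short sketch; `new_lines` and the watermark-file layout are hypothetical names for the mechanism, not the pipeline's real interface:

```python
import json
import os

def new_lines(path: str, watermark_file: str) -> list:
    """Read only the bytes of a JSONL past the last processed offset,
    so unchanged files are skipped without reparsing anything."""
    marks = {}
    if os.path.exists(watermark_file):
        marks = json.load(open(watermark_file))
    offset = marks.get(path, 0)
    if os.path.getsize(path) <= offset:
        return []  # file unchanged since last run -> skip entirely
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    marks[path] = offset + len(chunk)  # advance the watermark
    json.dump(marks, open(watermark_file, "w"))
    return chunk.decode().splitlines()
```

Each cron run therefore costs O(new bytes), not O(total history), which is what makes re-running the chain over every CLI JSONL cheap enough to be idempotent.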
Skill Authoring Uplift — 6 dimensions closed vs google/agents-cli (2026-04-23)
A deep audit against google/agents-cli flagged 6 skill-authoring dimensions where we trailed (phase-gate choreography, discoverability, anti-guidance, inline behavioral anti-patterns, governance/versioning, skill index). An 11-commit uplift (IFRNLLEI01PRD-712 umbrella, Phases A–J) closed every gap. 0 reverts.
- Master phase-gate skill — new `.claude/skills/chatops-workflow/SKILL.md` codifies the Phase 0–6 incident lifecycle (triage → drift-check → context → propose → approve → execute → post-incident). Force-injected into every Runner session's Build Prompt (marker-delimited for surgical removal; rollback anchor preserved at `/tmp/runner-pre-IMMUTABLE.json`).
- Auto-generated skill index — `scripts/render-skill-index.py` emits a drift-gated `docs/skills-index.md` from all SKILL.md + agent frontmatter. Guarded by `test-656-skill-index-fresh.sh`, refreshed as a pre-step of the daily 04:30 UTC wiki-compile cron.
- Versioned + audited skills — every SKILL.md + agent frontmatter now carries `version: 1.x.0` + `requires: {bins, env}`. `scripts/audit-skill-requires.sh` + a Prometheus exporter feed two new alerts (`SkillPrereqMissing`, `SkillMetricsExporterStale`). `scripts/audit-skill-versions.sh` walks git history for body-changed-without-bump cases; semver convention at `docs/runbooks/skill-versioning.md`.
- Anti-guidance trailing clauses — every primary skill/agent description now ends with "Do NOT use for X (use /other-skill instead)". Measurably reduces over-routing to adjacent-sounding agents.
- Shortcuts-to-Resist tables inlined on 11 agents (46 rows drawn from `memory/feedback_*.md` with source citations) — behavioral inoculation at the surface where the model is about to act.
- Proving-Your-Work directive — new `check_evidence()` in `scripts/classify-session-risk.py` emits an `evidence_missing` risk signal that forces `[POLL]` when CONFIDENCE ≥ 0.8 but the reply carries no tool output / code fence. Mirrored in the Runner's Prepare Result node to strip unearned `[AUTO-RESOLVE]` markers and prepend a `GUARDRAIL EVIDENCE-MISSING:` banner.
- User-vocabulary map — `config/user-vocabulary.json` (20 entries: `"the firewall"` → `nl-fw01;gr-fw01`, `"xs4all"` → `"budget"` post-2026-04-21 rename, etc.) scanned by the prompt-submit hook; every match emits a typed `vocabulary` event to `event_log`.
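The vocabulary-map scan is essentially a phrase lookup over the submitted prompt. A minimal sketch, using an illustrative two-entry subset of the 20-entry map and a hypothetical `scan_vocabulary` name:

```python
# Illustrative subset of config/user-vocabulary.json (not the full 20 entries).
VOCAB = {
    "the firewall": "nl-fw01;gr-fw01",
    "xs4all": "budget",  # post-2026-04-21 rename
}

def scan_vocabulary(prompt: str, vocab: dict = VOCAB) -> list:
    """Mimic the prompt-submit hook: one typed event per phrase hit."""
    low = prompt.lower()
    return [
        {"event": "vocabulary", "phrase": phrase, "maps_to": canonical}
        for phrase, canonical in vocab.items()
        if phrase in low
    ]
```

Emitting the match as a typed event (rather than silently rewriting the prompt) keeps the operator's original wording intact while still telling downstream agents which device was meant.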
Scorecard delta: 3.94 → 4.94 average; 13/16 dimensions at 5/5 (was 9/16). Full memo: docs/scorecard-post-agents-cli-adoption.md. E2E hardened in the same batch via a J1–J5 pass: a live vocabulary event captured by firing the real prompt-submit hook, promtool test rules executed inside the live Prometheus pod, and force-injection proven by a real Runner session whose first tool call grepped for Phase 0 in the injected skill body.
NVIDIA DLI Cross-Audit + P0+P1 Implementation (2026-04-29)
The 19-transcript NVIDIA Deep Learning Institute Agentic AI Systems course (Vadim Kudlai) was the last major agentic-AI source not yet evaluated against this platform. The 12-dimension cross-audit on 2026-04-29 initially graded the system A (4.4/5.0) — the lowest of the 9 sources audited. A same-day implementation of all 7 P0+P1 items lifted it to A+ (4.83/5.0), putting the system at A+ across all 9 sources (aggregate A+ 4.79).
Shipped in 4 commits (G1–G4) under YouTrack umbrella IFRNLLEI01PRD-747 with children -748..-751. Six commits direct-pushed to main, zero reverts. 57/57 new QA tests pass.
- G1 — Long-horizon reasoning replay eval (`scripts/long-horizon-replay.py`) replays the 30 longest historical sessions weekly (Mon 05:00 UTC), scoring trace_coherence, tool_efficiency, poll_correctness, cost_per_turn_z. New `long_horizon_replay_results` table; `LongHorizonReplayStale` alert.
- G1 — Jailbreak corpus + Greek extension — 39 fixtures across the 5 NVIDIA-DLI-08 vectors (asterisk-obfuscation, persona-shift, retroactive-history-edit, context-injection, lost-in-middle-bait), including 8 Greek operator-language fixtures. Pure-regex `scripts/lib/jailbreak_detector.py`; weekly regression cron (Wed 05:00 UTC); `JailbreakBypassDetected` alert on any miss.
- G2 — Intermediate semantic rail (DARK-FIRST) — `scripts/lib/intermediate_rail.py` (heuristic + Ollama dual-backend) inserted as a `Check Intermediate Rail` Code node between Build Plan and Classify Risk in the Runner workflow (now 50 nodes). Emits an `intermediate_rail_check` event per session; `IntermediateRailDriftHigh` alert at >20% out-of-dist over 24h. Observe-only — does NOT block; soft-gate evaluation deferred ≥ 7 days post-data.
- G2 — Grammar-constrained decoding — JSON Schemas at `scripts/lib/grammars/` passed to Ollama via the `format` field when `OLLAMA_USE_GRAMMAR=1` (default on). Falls back to `format=json` on schema rejection. Circuit-breaker semantics preserved.
- G3 — Team-formation skill (`.claude/skills/team-formation/SKILL.md` v1.0.0) + `scripts/lib/team_formation.py` propose a sub-agent roster per `(alert_category, risk_level, hostname)`. Build Prompt injects a `## Team Charter (advisory)` section; the same JSON is emitted as a `team_charter` event_log row. The KNOWN_AGENTS inventory is enforced against `.claude/agents/*.md`.
- G3 — Inference-Time-Scaling explicit budget — an `EXTENDED_THINKING_BUDGET_S` env var (+ optional per-category override) drives a `## Reasoning Budget` Build Prompt section; an `its_budget_consumed` event captures observed turns/thinking_chars at session end.
- G4 — Server-side session-replay endpoint — new workflow `claude-gateway-session-replay.json` (`idlJEGboDYLmx25kBo`) ACTIVE. POST `/session-replay` accepts `{session_id, prompt}`, validates format, sqlite3-checks session existence inside the SSH command (the n8n task-runner sandbox blocks `child_process` in Code nodes), runs `claude -r`, returns JSON. HTTP 404 on unknown session, HTTP 400 on malformed input. `session_replay_invoked` event.
event_log schema bumped 1 → 4 (13 → 17 event types). 18 → 19 schema-versioned tables. 5 cron entries installed. 5 YouTrack issues all moved to Done via direct REST POST (the tonyzorin/youtrack-mcp:latest container's update_issue_state omits the $type: "StateBundleElement" discriminator — bug documented in memory/feedback_youtrack_mcp_state_bug.md).
Full state-of-the-platform reference: docs/agentic-platform-state-2026-04-29.md.
QA Suite — 411/0 PASS (99.52%), 44 suite files
scripts/qa/run-qa-suite.sh runs 44 suite files (~3–5 min) with JSON scorecard + summary output, guarded by a per-suite QA_PER_SUITE_TIMEOUT wrapper (IFRNLLEI01PRD-724) that caps any slow or wedged suite at 120 s and emits a synthetic FAIL record so the orchestrator never hangs silently:
- Per-issue suites — sanity + QA + integration for every adoption, plus 16 tests for the preference-iterating patcher (-645) and 12 tests for the CLI-session RAG pipeline (-646/-647/-648).
- Writer coverage — every script that `INSERT`s into a versioned table is asserted to stamp `schema_version=1`; same for all 5 n8n-workflow INSERT sites.
- Pattern-by-pattern coverage — 53 deny-pattern tests + 32 reject-pattern tests.
- Payload shape — every one of the 13 event types round-trips through the CLI + Python paths.
- Concurrent-bump fuzz — 8 parallel `handoff_depth.bump()` calls with a no-lost-updates assertion. Surfaced and fixed a real race condition.
- Mock HTTP server (`scripts/qa/lib/mock_http.py`) — stdlib-only fake ollama/anthropic endpoints for testing successful compaction offline.
- 6 e2e scenarios — happy path (all 9 adoptions in one flow), cycle prevention, crash + rollback, schema forward-compat, envelope-to-subagent, compaction in handoff.
- Benchmarks — p95 latencies for event emit (111 ms), handoff bump (108 ms), envelope encode (76 ms), snapshot capture (86 ms), unified-guard hook (198 ms), migration on a 10K-row legacy DB (~200 ms).
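The per-suite timeout guard can be sketched with `subprocess` alone. This is an illustrative stand-in for the shell wrapper, not its actual implementation; `run_suite` and the record shape are assumptions:

```python
import subprocess
import time

def run_suite(cmd: list, timeout_s: float = 120.0) -> dict:
    """Run one suite, capping it at timeout_s. A suite that never
    finishes yields a synthetic FAIL record instead of hanging the
    orchestrator."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        status = "PASS" if proc.returncode == 0 else "FAIL"
    except subprocess.TimeoutExpired:
        status = "FAIL"  # synthetic record: the suite was killed at the cap
    return {"cmd": cmd[0], "status": status,
            "elapsed_s": round(time.monotonic() - start, 2)}
```

Recording the timeout as an ordinary FAIL row means the JSON scorecard stays complete even when a suite wedges, which is the property the wrapper exists to guarantee.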
Architecture
Alert → n8n → OpenClaw (GPT-5.1, 7–21 s) → Haiku Planner (+AWX) → Claude Code (Opus 4.6, 5–15 min) → Human (Matrix)
| Component | Role |
|---|---|
| n8n | 27 workflows — alert intake, session management, knowledge population, teacher-agent runner, server-side session-replay |
| OpenClaw v2026.4.11 (GPT-5.1) | Tier 1 — fast triage with 17 skills + Active Memory, handles 80%+ without escalation |
| Claude Code (Opus 4.6) | Tier 2 — 11 sub-agents + master chatops-workflow phase-gate skill, ReAct reasoning, interactive [POLL] approval |
| AWX | 41 Ansible playbooks wired into the AI planner |
| Matrix (Synapse) | Human-in-the-loop — polls, reactions, replies |
| Prometheus + Grafana | 11 dashboards, 64+ panels, 16+ metric exporters, 4 alert-rule files |
| OpenObserve | OTel tracing — 39K spans, OTLP export |
| Ollama (RTX 3090 Ti) | Local embeddings — nomic-embed-text, query rewriting |
| Compiled Wiki | 44 articles from 7+ sources, daily recompilation |
Safety — 7 Layers
The system investigates freely but never executes infrastructure changes without human approval:
- Claude Code hooks — 7 injection-detection groups + 59 destructive/exfiltration patterns blocked deterministically. Now emits the 3-behavior taxonomy (`allow`/`reject_content`/`deny`) — recoverable patterns get a retry hint instead of a wall. Every rejection lands in `event_log` as a typed `tool_guardrail_rejection` event. The `evidence_missing` risk signal (IFRNLLEI01PRD-718) fires in-band when `CONFIDENCE ≥ 0.8` is claimed without a visible tool-output block, forcing `[POLL]` and stripping unearned `[AUTO-RESOLVE]` markers.
- safe-exec.sh — code-level blocklist that prompt injection cannot bypass
- exec-approvals.json — 36 specific skill patterns (no wildcards)
- Evaluator-Optimizer — Haiku screens high-stakes responses before posting
- Confidence gating — < 0.5 stops, < 0.7 escalates
- Budget ceilings — EUR 5/session warning, $25/day plan-only mode
- Credential scanning — 16 PII patterns redacted, 39 credentials tracked with rotation

Plus: the handoff depth counter forces [POLL] at depth ≥ 5 and hard-halts at ≥ 10, and any agent cycling back into its own chain is refused. A weekly audit-risk-decisions.sh invariant check rejects any reject_content event with an empty message (which would blind the agent).
Key Numbers
| Metric | Value |
|---|---|
| Operational activation audit | A (91.8%) — 23 tables populated, 148K+ rows |
| Agentic design patterns | 21/21 at A+ (tri-source audit: 11/11 dimensions) |
| OpenAI Agents SDK adoption batch | 9/9 implemented (issues 635–643), 45 files changed, 6 migrations, 4 new tables |
| Preference-iterating prompt patcher | Live (issue 645) — N-candidate A/B trials, Welch t-test, auto-promote |
| CLI-session RAG capture | Live (issues 646/647/648) — transcripts + tool calls + knowledge extraction |
| QA suite | 468/0 PASS (99.57%), 2 benign skips, across 51 suite files — ~3–5 min run, JSON scorecard, per-suite timeout guard (411 baseline + 57 new NVIDIA G1–G4 tests across 7 suites) |
| Skill-authoring scorecard vs google/agents-cli | 4.94 / 5.00 (was 3.94) — 13/16 dimensions at 5/5; 6 targeted gap dimensions closed |
| NVIDIA DLI 12-dim scorecard | A+ (4.83 / 5.0) — was A (4.4) before 2026-04-29; 9/12 dimensions at A+, 1 at B (multi-tenant, intentional single-operator design); 9-source aggregate A+ (4.79) |
| Handoff envelope compression | 0.43% ratio (176 KB input_history → 752 B on the wire, zlib+b64) |
| AWX/Ansible runbooks | 41 playbooks wired into Plan-and-Execute |
| Tool-call instrumentation | 88,448 calls across 108 types, per-tool error rates + latency p50/p95 |
| OTel tracing | 39K spans → OpenObserve + Prometheus metrics |
| Typed session events | 17 event classes, queryable event_log table + Prom exporter (event_log schema_version=4) |
| GraphRAG knowledge graph | 360 entities, 193 relationships |
| Self-improving prompt patches | 5 active (auto-generated from eval scores) |
| Predictive risk scoring | 123 devices scanned daily, 23 at elevated risk |
| Holistic health check | 96%+ — 142 checks (functional + e2e + cross-site) |
| Session-holistic E2E | 100% (23/23) — covers 18 YT issues with before/after scoring |
| SQLite tables | 43 (42 + long_horizon_replay_results [-748]); 19 schema-versioned via the central CURRENT_SCHEMA_VERSION registry |
| Industry benchmark | 4.10/5.00 (82%) — 15 dimensions, 23 industry sources, E2E certified (39/39) |
| RAGAS golden set | 33 queries (15 hard-eval tagged) — multi-hop / temporal / negation / meta / cross-corpus |
| Weekly hard-eval (50-q) | judge-graded hit@5 = 0.90, p50 5.7 s, p95 13.6 s |
| RAGAS RAG quality | Faithfulness 0.88, Precision 0.86, Recall 0.88 (18 evaluations via Claude Haiku) |
| NIST behavioral telemetry | 5/5 AG-MS.1 signals active (action velocity, permission escalation, cross-boundary, delegation depth, exception rate) |
| Adversarial red-team | 54 tests (32 baseline + 22 adversarial), quarterly schedule, 12 bypass vectors hardened |
| Governance compliance | EU AI Act limited-risk assessment, QMS (Art. 17), NIST oversight boundary framework |
| Supply chain security | CycloneDX SBOM in CI, model provenance chain, agent decommissioning procedure |
Documentation
| Document | What it covers |
|---|---|
| Operational Activation Audit | Scores data activation β 21/21 tables, 109K rows |
| Tri-Source Audit | 11/11 dimensions A+ (Gulli + Anthropic + industry) |
| External Source Mapping | atlas-agents + claude-code-from-source techniques applied |
| Agentic Patterns Audit | 21/21 pattern scorecard |
| Evaluation Process | 3-set eval, flywheel, CI gate |
| ACI Tool Audit | 10 MCP tools against 8-point checklist |
| Compiled Wiki | 45 auto-compiled articles |
| Industry Benchmark | 15-dimension scored assessment against 23 industry sources |
| Skill-Authoring Scorecard | 16-dimension scorecard vs google/agents-cli — 3.94 → 4.94, 6 gap dimensions closed |
| Skill Versioning Runbook | Per-skill semver convention (patch/minor/MAJOR tied to the SKILL contract) + audit-skill-versions.sh |
| Skills Index | Auto-generated from all SKILL.md + agent frontmatter; drift-gated by test-656 |
| Agentic Platform State | Single source-of-record describing the post-NVIDIA-batch platform; merges the audit + cert + rescored docs into one canonical "where the system is right now" reference |
| NVIDIA DLI Cross-Audit (source) | Original 12-dimension cross-audit + 9-source master scorecard + P0/P1/P2 gap-closure roadmap |
| NVIDIA P0+P1 Certification | E2E certification: 57/57 G1–G4 tests, integration audits, live smoke fires, schema-bump trace, operator-gate closure |
| NVIDIA DLI Cross-Audit (re-scored) | Per-dimension delta after implementation — A (4.4) → A+ (4.83) |
| EU AI Act Assessment | Risk classification + article mapping |
| Tool Risk Classification | 153 MCP tools classified (NIST AG-MP.1) |
| Agent Decommissioning | Per-tier lifecycle procedures |
| Installation Guide | Setup steps + cron configuration |
Quick Start
```shell
git clone https://github.com/papadopouloskyriakos/agentic-chatops.git
cd agentic-chatops
cp .env.example .env   # add your credentials
```
See the Installation Guide for full setup.
References
- Agentic Design Patterns by Antonio Gulli (Springer, 2025) — 21 patterns, all implemented
- Claude Certified Architect — Foundations (Anthropic) — sub-agent design
- Industry References — Anthropic, OpenAI, LangChain, Microsoft
- atlas-agents + claude-code-from-source — external techniques applied
- google/agents-cli — reference implementation of skill-authoring discipline (phase-gate master skill, auto-generated skills index, "Do NOT use for X" anti-guidance, Shortcuts-to-Resist, Proving-Your-Work). Six gap dimensions adopted 2026-04-23 under IFRNLLEI01PRD-712.
License
Sanitized mirror of a private GitLab repository. Provided as-is for educational and reference purposes.
Built by a solo infrastructure operator who got tired of waking up at 3am for alerts that an AI could triage.
