Automatised Pipeline
Codebase intelligence as an MCP server — tree-sitter AST → LadybugDB graph → Louvain communities → hybrid BM25 + TF-IDF + RRF search. 23 tools · 10 stages · 220 tests · Rust · Clean Architecture. The read-only intelligence layer between finding and PRD.
Documentation
What An Agent Can Ask · Getting Started · Pipeline · Tools · Architecture · Zetetic Standard
Companion projects:
Cortex — persistent memory that consolidates and reconsolidates across sessions
zetetic-team-subagents — 97 genius reasoning agents + 18 team specialists
prd-spec-generator — TypeScript PRD generator that consumes our graph intelligence
Every AI coding assistant hits the same wall: you ask it to change handle_tool_call, and it either hallucinates a function that was renamed last week, edits something in the wrong community of the codebase, or silently breaks a call chain three modules away. Agents operate on strings; codebases have structure. The gap is where bugs live.
automatised-pipeline is a Rust MCP server that indexes any Rust / Python / TypeScript codebase into a LadybugDB property graph, resolves imports and call chains across files, detects functional communities via Leiden-class community detection, traces execution flows from entry points, builds a hybrid BM25 + sparse TF-IDF + RRF search index, and exposes all of it to AI agents through 23 MCP tools.
It is the codebase intelligence layer that sits between a finding ("this bug exists") and a PRD ("here is the fix, here is what it affects, here is what it must never break"). It is read-only intelligence — it never writes code, opens PRs, or runs CI. It tells the system what is true about the code so the next stage can reason without guessing.
One pipeline stage = one MCP tool. 10 stages. 23 tools. 12,000+ lines of Rust. 220 tests. Zero warnings. Every constant sourced.
What an agent can ask it
analyze_codebase(path: "/path/to/project", output_dir: "/tmp/run")
→ index + resolve + cluster + build search index in one call
→ 430 nodes, 400 edges, 216 communities, 35 processes on our own codebase
search_codebase(graph_path, query: "process incoming tool requests")
→ hybrid ranked results: BM25 lexical + sparse TF-IDF semantic + RRF fusion
→ returns: handle_tool_call (score 0.021), dispatch_request (0.020), ...
get_context(graph_path, qualified_name: "src/main.rs::handle_tool_call")
→ 360° view: community membership, process participation,
incoming calls, outgoing calls, types used, types that use it
→ did-you-mean suggestions when the symbol isn't found exactly
get_impact(graph_path, qualified_name)
→ blast radius: every process that transits this symbol, every community it touches
→ the answer to "what breaks if I change this?"
detect_changes(graph_path, diff_text OR base_ref+head_ref)
→ git diff → affected symbols → impacted communities → touched processes
→ risk score for the change
validate_prd_against_graph(prd_path, graph_path)
→ does the PRD reference real symbols? (symbol hallucination check)
→ does "scoped to X" match the actual community count?
→ does "doesn't affect main" hold against the call graph?
check_security_gates(graph_path, changed_symbols)
→ auth-critical community touch · unsafe symbol · public API change ·
unresolved imports · test coverage gap
verify_semantic_diff(before_graph_path, after_graph_path)
→ what nodes/edges appeared, what disappeared, what dangles,
new cycles via Tarjan SCC, regression score with verdict
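The RRF fusion step behind search_codebase can be sketched in a few lines. This is a minimal, self-contained illustration, not the code in search/rrf.rs; the function name and the example rankings are hypothetical, and only k = 60 is the documented constant:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion (Cormack et al. 2009): score(d) = sum over
/// rankings of 1 / (k + rank(d)), with 1-based ranks.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            // rank is i + 1; a document absent from a ranking contributes nothing
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    // Hypothetical result lists: one from BM25, one from sparse TF-IDF.
    let bm25 = vec!["handle_tool_call", "dispatch_request", "main"];
    let tfidf = vec!["handle_tool_call", "main"];
    println!("{:?}", rrf_fuse(&[bm25, tfidf], 60.0));
}
```

A document ranked highly by both retrievers accumulates the largest fused score, which is why RRF needs no score normalization between BM25 and TF-IDF.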
Getting started
Prerequisites
- Rust 1.94+ (`rustup install stable`)
- CMake (LadybugDB builds its C++ core from source — ~5 minutes first build, cached after)
Clone + build
git clone https://github.com/cdeust/automatised-pipeline.git
cd automatised-pipeline
cargo build --release
# First build: ~5 minutes (compiles LadybugDB C++ core)
# Subsequent builds: <1 second incremental
Register the MCP server
The repo ships a .mcp.json that Claude Code picks up automatically when you open the directory:
{
"mcpServers": {
"ai-architect": {
"command": "cargo",
"args": ["run", "--quiet", "--release", "--manifest-path", "Cargo.toml"]
}
}
}
Or register globally:
claude mcp add ai-architect -- /absolute/path/to/target/release/ai-architect-mcp
First run
# Run the binary directly to verify the handshake
./target/release/ai-architect-mcp
# Or exercise it via stdio JSON-RPC:
printf '%s\n' \
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
'{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
'{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"health_check","arguments":{}}}' \
| ./target/release/ai-architect-mcp
The pipeline
Every stage is a tool. Stages build on each other but are independently callable. The pipeline is serial in logical order, but MCP calls are stateless — you can re-run stages 3a-3d on a fresh codebase without re-running stages 1-2.
| # | Tool(s) | What it does |
|---|---|---|
| 0 | health_check | Handshake + protocol + tool count |
| 1 | extract_finding, refine_finding | Deterministic finding extraction + orchestrator-aware prompt refinement |
| 2 | start_verification, append_clarification, finalize_verification, abort_verification | Human-gated clarification loop with SHA-256 transcript digest, atomic single-file session state |
| 3a | index_codebase, query_graph, get_symbol | tree-sitter AST → LadybugDB graph (16 node labels, 36+ relationship tables) |
| 3b | resolve_graph, lsp_resolve | Import/call/impl resolution with confidence scoring + optional LSP deep resolution (rust-analyzer / pyright / typescript-language-server) |
| 3c | cluster_graph, get_processes, get_impact | Leiden-class community detection (Louvain + C2 repair) + BFS execution-flow tracing from entry points |
| 3d | search_codebase, get_context, analyze_codebase, detect_changes | Hybrid BM25 + sparse TF-IDF + RRF search · 360° symbol view · all-in-one analysis · git-diff impact |
| 4 | prepare_prd_input | Bundle verified finding + graph intel → artifact for prd-spec-generator |
| 6 | validate_prd_against_graph | Symbol hallucination · community consistency · process-impact contradiction |
| 8 | check_security_gates | Auth-critical community · unsafe symbol · public-API change · unresolved-import intro · test-coverage gap |
| 9 | verify_semantic_diff | Before/after graph diff with Tarjan SCC cycle detection and regression scoring |
Stages 5 (PRD generation), 7 (implementation), 10 (benchmark), 11 (deployment), and 12 (PR) belong to other systems in the pipeline: prd-spec-generator, the coding agent, CI, and `gh`. This project is the read-only intelligence half.
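The stage-3c process tracing in the table above (BFS execution-flow tracing from entry points) reduces to a breadth-first walk over call edges. A minimal sketch with a hypothetical call graph, not the clustering.rs implementation:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// BFS from an entry point over a call graph, collecting every symbol
/// reachable along call edges: one "process" in stage-3c terms.
fn trace_process(calls: &HashMap<&str, Vec<&str>>, entry: &str) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut order = Vec::new();
    let mut queue = VecDeque::from([entry]);
    seen.insert(entry);
    while let Some(sym) = queue.pop_front() {
        order.push(sym.to_string());
        // Leaf symbols simply have no outgoing call edges.
        for &callee in calls.get(sym).map(|v| v.as_slice()).unwrap_or(&[]) {
            if seen.insert(callee) {
                queue.push_back(callee);
            }
        }
    }
    order
}

fn main() {
    // Hypothetical call graph: main calls handle_tool_call, which fans out.
    let calls = HashMap::from([
        ("main", vec!["handle_tool_call"]),
        ("handle_tool_call", vec!["dispatch_request", "validate_args"]),
    ]);
    println!("{:?}", trace_process(&calls, "main"));
}
```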
23 MCP Tools
Every tool takes structured JSON arguments via the MCP protocol and returns a structured JSON response. No LLM is called from inside any tool — intelligence is the agent's job; the tool's job is safe, fast data movement with invariants.
Stage 0: health_check
Stage 1: extract_finding · refine_finding
Stage 2: start_verification · append_clarification · finalize_verification · abort_verification
Stage 3a: index_codebase · query_graph · get_symbol
Stage 3b: resolve_graph · lsp_resolve
Stage 3c: cluster_graph · get_processes · get_impact
Stage 3d: search_codebase · get_context · analyze_codebase · detect_changes
Stage 4: prepare_prd_input
Stage 6: validate_prd_against_graph
Stage 8: check_security_gates
Stage 9: verify_semantic_diff
Each tool has a JSON Schema enforced at the wire, reason codes on error (no cryptic protocol errors), and a receipt-style response with timing and counts.
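For illustration, a receipt-style response could look like the following. This is a hypothetical shape; the actual field names come from each tool's JSON Schema, which this sketch does not reproduce, and the timing value is invented:

```json
{
  "ok": true,
  "tool": "index_codebase",
  "counts": { "nodes": 430, "edges": 400 },
  "elapsed_ms": 1834
}
```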
Architecture
Rust MCP server, hand-rolled stdio JSON-RPC 2.0 (no SDK — we own the wire). Clean Architecture with module boundaries.
transport (stdio, JSON-RPC framing)
        ↓
server/main.rs (request dispatch, tool registry)
        ↓
handlers (do_* functions, one per tool)
        ↓
core modules:
  graph_store      — LadybugDB port (Cypher + UNWIND + prepared statements)
  parser/{rust,python,typescript,mod} — tree-sitter AST extractors
  indexer          — walk + parse + persist pipeline
  resolver         — cross-file import/call/impl resolution
  lsp_{client,resolver} — optional LSP deep resolution
  clustering       — inline Louvain + C2 repair + process tracing
  search/{bm25,vector,rrf,mod} — hybrid search (Tantivy + sparse TF-IDF + RRF)
  prd_input        — stage 4: bundle for prd-spec-generator
  prd_validator    — stage 6: validate PRD claims against graph
  security_gates   — stage 8: auth/unsafe/API/imports/coverage checks
  semantic_diff    — stage 9: before/after graph regression scoring
  git_diff         — diff parser + symbol mapping
Crates
Eight crates. Nothing speculative; everything justified.
| Crate | Purpose | License | Why |
|---|---|---|---|
| serde + serde_json | Wire serialization | MIT | JSON-RPC, artifact persistence |
| sha2 | Stage-2 transcript digest | MIT | Tamper detection |
| lbug (LadybugDB) | Embedded property graph + Cypher | MIT | Native Cypher, FTS-ready, the Kùzu successor |
| tree-sitter | Incremental parser runtime | MIT | First-class Rust bindings |
| tree-sitter-rust · -python · -typescript | Language grammars | MIT | Semantic structure without a compiler |
| tantivy | Lucene-grade BM25 | MIT | Real ranked text search, <10ms startup |
Deliberately not included: async runtime (we're stdio-blocking), HTTP client, LLM SDK, embedding model runtime (sparse TF-IDF replaces it at zero dep cost).
Storage
Graphs are per-finding by design (Lamport's isolation invariant): each finding gets its own LadybugDB instance at `<output_dir>/runs/<run_id>/findings/<finding_id>/graph/`. Zero-coordination concurrency, trivial cleanup, no cross-finding state leakage. Redundant indexing for shared codebases is acknowledged and mitigated in a later optional cache layer — not shoehorned into the core.
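Under that layout, deriving the per-finding graph directory is a pure path computation. A sketch with a hypothetical helper name (the real code may assemble this differently):

```rust
use std::path::PathBuf;

/// Per-finding graph location, following the documented layout:
/// <output_dir>/runs/<run_id>/findings/<finding_id>/graph/
fn graph_dir(output_dir: &str, run_id: &str, finding_id: &str) -> PathBuf {
    PathBuf::from(output_dir)
        .join("runs")
        .join(run_id)
        .join("findings")
        .join(finding_id)
        .join("graph")
}

fn main() {
    let p = graph_dir("/tmp/run", "r-001", "f-042");
    println!("{}", p.display());
}
```

Because each finding owns a disjoint subtree, concurrent runs never contend on the same database files, and cleanup is a single directory removal.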
The zetetic standard
Inherited from zetetic-team-subagents. Not a prompt suggestion — an enforcement rule that holds in code.
| Pillar | Question |
|---|---|
| Logical | Is it consistent? |
| Critical | Is it true? |
| Rational | Is it useful? |
| Essential | Is it necessary? |
In this codebase it concretely means:
- Every algorithm traces to a source. Louvain → Blondel et al. 2008. Leiden C2 repair → Traag et al. 2019. RRF → Cormack, Clarke, Büttcher 2009. SCC → Tarjan 1972. BM25 via Tantivy → Robertson et al. 1994.
- Every named constant has a `// source:` comment. `RRF_K = 60` cites Cormack 2009. `BULK_BATCH_SIZE = 500` cites Kùzu/LadybugDB tuning. `PARSE_TIMEOUT_MICROS = 5_000_000` is justified in the block above it.
- No invented numbers. Where a value was chosen by judgment, the comment says so ("heuristic, not paper-backed") and cites its operational justification.
- Tool responses cite the spec that governs each error reason. `unsafe finding_id` (spec §5.1.4, §9.3 Q4): must match `[A-Za-z0-9._-]+` — callers see which rule they violated.
- When a capability can't be proved at spec time, the tool degrades gracefully and says so in plain language. Example: `lsp_resolve` on a stub binary returns `lsp_probe_failed: found on PATH but didn't respond as an LSP server (stdout closed immediately; likely a stub, proxy, or non-LSP binary)` — not a cryptic protocol error.
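As a concrete illustration of the sourced-constant convention (the comment wording here is paraphrased, not copied from the codebase), the declarations might look like:

```rust
// source: Cormack, Clarke & Büttcher 2009: k = 60 from the original RRF paper.
const RRF_K: f64 = 60.0;

// source: LadybugDB/Kùzu bulk-insert tuning: UNWIND batch size.
const BULK_BATCH_SIZE: usize = 500;

// Heuristic, not paper-backed: bounds tree-sitter cost on pathological
// input while leaving headroom for large generated files.
const PARSE_TIMEOUT_MICROS: u64 = 5_000_000;

fn main() {
    println!("{RRF_K} {BULK_BATCH_SIZE} {PARSE_TIMEOUT_MICROS}");
}
```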
Security
Four CRITICAL, four HIGH, three MEDIUM findings were surfaced by a security-auditor agent pass and fixed in commit 512d683:
- Cypher injection via `insert_edge` → centralized `cypher_str()` escaping (`\` first, then `'`)
- Git argument injection → `validate_git_ref` rejects `--`, newlines, NUL; `--` separator before refs
- Arbitrary binary execution via `lsp_command` → strict allowlist (`rust-analyzer`, `pyright`, `pyright-langserver`, `typescript-language-server`)
- Symlink traversal → `fs::symlink_metadata` + `MAX_DEPTH`
- Resource exhaustion → `MAX_FILES=100_000`, `MAX_FILE_BYTES=10 MB`, `MAX_TOTAL_BYTES=2 GB`, `MAX_DEPTH=64`
- Tree-sitter pathological input → `set_timeout_micros(5_000_000)` + `MAX_PARSE_BYTES=1 MB`
- `query_graph` read-only → forbidden-keyword whole-word filter (CREATE/DELETE/MERGE/SET/REMOVE/DROP/ALTER/CALL/LOAD)
- `graph_path` filesystem safety → `validate_graph_path_safe()` before any `remove_dir_all`
- LSP `rootUri` → RFC 3986 percent-encoding
- Diff line overflow → `DIFF_LINE_MAX = u64::MAX / 2` guard
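The first fix above hinges on escape order. A sketch of order-sensitive escaping; the function body is illustrative, and only the name `cypher_str()` and the "backslash first, then quote" order come from the list above:

```rust
/// Escape a string for inlining into a single-quoted Cypher literal.
/// Backslashes must be escaped before quotes; the reverse order would
/// re-escape the backslashes introduced for the quotes.
fn cypher_str(raw: &str) -> String {
    raw.replace('\\', "\\\\").replace('\'', "\\'")
}

fn main() {
    // A closing-quote injection attempt becomes inert literal text.
    println!("{}", cypher_str("'}) DETACH DELETE n //"));
}
```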
Each fix has a test that asserts the exploit is now rejected. Run cargo test to see 220 tests pass including the exploit-regression suite.
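In the same spirit, a hedged sketch of a ref guard like `validate_git_ref`, with exploit-regression-style assertions; the exact checks and error messages in the real code may differ:

```rust
/// Reject git refs that could be parsed as options or smuggle control bytes.
fn validate_git_ref(r: &str) -> Result<(), &'static str> {
    if r.is_empty() {
        return Err("empty ref");
    }
    if r.starts_with('-') {
        // Covers both `-x` flags and `--` option injection.
        return Err("ref looks like a command-line option");
    }
    if r.contains('\n') || r.contains('\0') {
        return Err("control byte in ref");
    }
    Ok(())
}

fn main() {
    assert!(validate_git_ref("main").is_ok());
    assert!(validate_git_ref("--upload-pack=/bin/sh").is_err()); // argument injection
    assert!(validate_git_ref("main\nevil").is_err());            // newline smuggling
    println!("guards hold");
}
```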
Scale
Verified by the dba agent through compile-and-run probes against lbug 0.15.3:
| Strategy | ms/edge |
|---|---|
| Raw string per edge (naive) | 5.36 |
| Prepared statement, no transaction | 5.48 |
| `BEGIN TRANSACTION` + prepared + `COMMIT` | 0.70 |
| UNWIND + typed `LogicalType::Struct` | 0.143 |
The bulk-insert path uses UNWIND with a typed struct schema (the engineer who wrote the first version used LogicalType::Any, which fails the binder — the typed struct form works). Prepared statements are cached in a RefCell<HashMap<query, PreparedStatement>> on the GraphStore. Sparse TF-IDF replaces the dense N × V × 4B matrix — 30.5× smaller on our own codebase (108 KB vs 3.2 MB), and it scales linearly with non-zero terms rather than vocab size. Clustering eliminated probe_node_label_for_process (per-node Cypher round-trip) in favor of a single in-memory HashMap<id, label> population pass.
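The sparse representation stores only non-zero (term, weight) pairs per document instead of a dense row per vocabulary entry. A self-contained sketch assuming ln-based idf; the actual weighting in search/vector.rs may differ:

```rust
use std::collections::HashMap;

/// Sparse TF-IDF: one HashMap of non-zero term weights per document;
/// memory scales with distinct terms, not vocabulary size.
fn tfidf(docs: &[Vec<&str>]) -> Vec<HashMap<String, f64>> {
    let n = docs.len() as f64;
    // Document frequency per term.
    let mut df: HashMap<&str, f64> = HashMap::new();
    for doc in docs {
        let mut uniq = doc.clone();
        uniq.sort();
        uniq.dedup();
        for t in uniq {
            *df.entry(t).or_insert(0.0) += 1.0;
        }
    }
    docs.iter()
        .map(|doc| {
            let mut tf: HashMap<&str, f64> = HashMap::new();
            for &t in doc {
                *tf.entry(t).or_insert(0.0) += 1.0;
            }
            tf.into_iter()
                .map(|(t, f)| (t.to_string(), f / doc.len() as f64 * (n / df[t]).ln()))
                .collect()
        })
        .collect()
}

fn main() {
    let docs = vec![vec!["handle", "tool", "call"], vec!["dispatch", "tool"]];
    // A term appearing in every document gets idf = ln(1) = 0 and drops out.
    println!("{:?}", tfidf(&docs));
}
```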
A 500-file synthetic Rust fixture indexes in ~38 seconds end-to-end (parse + resolve + cluster + search index), down from the pre-audit implied "5 min – 1 hour" bracket.
Integration with the rest of the stack
┌──────────────────────────────────────────┐
│             Claude Code agent            │
└────────────┬─────────────────────────────┘
             │ MCP (stdio JSON-RPC)
             ▼
┌───────────────────────────────────────────────────┐
│ automatised-pipeline                  ← this repo │
│ stage 0 · 1 · 2 · 3a-d · 4 · 6 · 8 · 9            │
│ Rust · LadybugDB · tree-sitter · Tantivy          │
└───────┬───────────────────┬───────────────────────┘
        │                   │
        │                   └──── stage 5 (PRD gen)
        │                         [prd-spec-generator]
        │                         TypeScript / Node
        ▼
┌─────────────────┐
│     Cortex      │
│  memory engine  │
│  PostgreSQL +   │
│    pgvector     │
└────────┬────────┘
         │ cross-session memory for findings,
         │ decisions, lessons learned
         ▼
┌──────────────────────────────┐
│   zetetic-team-subagents     │
│  97 genius + 18 specialists  │
│   problem-shape routing      │
└──────────────────────────────┘
- Cortex — every architectural decision made during a pipeline run gets remembered. When the next finding touches a similar area, Cortex surfaces the prior reasoning before you re-derive it.
- zetetic-team-subagents — the genius agents (Shannon, Lamport, Simon, Popper, Feynman, Fermi, dba, architect, security-auditor, engineer) designed this project stage by stage. Every major decision in `stages/*.md` traces to an agent dispatch.
- prd-spec-generator — consumes our stage-4 `prd_input.json` artifact via disk or MCP-to-MCP query of `search_codebase` / `get_context` / `get_impact`. Each in its ideal language: our performance-critical graph work in Rust, their document generation in TypeScript.
Testing
cargo test # 220 tests, full suite
cargo test --release --test scalability_bench # 500-file synthetic fixture
cargo test --release --test lbug_bulk_investigation # dba's 9 UNWIND probes
cargo test --release --test stage3a_integration # end-to-end per sub-stage
cargo test --release --test stage9_integration # before/after diff
cargo check # zero warnings required
cargo build --release # release binary
Every stage has an integration test with fixture data. The lbug_bulk_investigation test is intentionally preserved — it's the compile-and-run proof that dba's UNWIND pattern works, kept for regression protection and documentation.
Repository layout
automatised-pipeline/
├── src/
│   ├── main.rs            — MCP server, 23 tool handlers
│   ├── tool_schemas.rs    — JSON Schemas for every tool
│   ├── lib.rs             — re-exports for integration tests
│   ├── graph_store.rs     — LadybugDB port (UNWIND + prepared + cached)
│   ├── parser/
│   │   ├── mod.rs         — language dispatch
│   │   └── rust.rs · python.rs · typescript.rs
│   ├── indexer.rs         — walk + parse + persist
│   ├── resolver.rs        — cross-file resolution
│   ├── lsp_client.rs      — minimal LSP probe + client
│   ├── lsp_resolver.rs    — LSP-backed deep resolution
│   ├── clustering.rs      — Louvain + C2 repair + BFS process tracing
│   ├── search/
│   │   ├── mod.rs         — orchestration, get_context, 3-layer qn lookup
│   │   └── bm25.rs · vector.rs · rrf.rs
│   ├── prd_input.rs       — stage 4
│   ├── prd_validator.rs   — stage 6
│   ├── security_gates.rs  — stage 8
│   ├── semantic_diff.rs   — stage 9
│   └── git_diff.rs        — diff parsing + symbol mapping
├── stages/                — locked spec per stage (Shannon specs, then engineer implements)
│   ├── stage-1.md · stage-2.md · stage-3.md · stage-3b.md · stage-3c.md
│   ├── stage-6.md · stage-8.md
│   ├── stage-1.review.md · stage-3-db-evaluation.md · stage-3-research.md
│   └── decisions/         — Popper / Lamport / Simon verdicts per decision
├── tests/
│   ├── stage{3a,3b,3c,3d,4,6,8,9}_integration.rs
│   ├── multilang_integration.rs
│   ├── stage3d_hybrid_search.rs
│   ├── scalability_bench.rs
│   ├── lbug_bulk_investigation.rs
│   ├── tfidf_size_report.rs
│   └── fixtures/multilang/ — sample.rs · sample.py · sample.ts
├── .claude/
│   ├── agents/            — 18 specialists + 97 genius agents
│   ├── skills/ · commands/ · tools/ · hooks/
│   └── scripts/
├── .mcp.json
├── NOTES.md               — stages table + growth rule
├── Cargo.toml
└── README.md
The zetetic decisions behind the build
Every major architectural decision was made by a genius agent with a specific problem shape. Stored in stages/decisions/*.md and in Cortex.
| Decision | Agent | Verdict |
|---|---|---|
| Rust vs C/C++ for the glue layer | Popper | Conjecture "Rust is the right language" is unfalsified. lbug + tree-sitter already run native C/C++; Rust is the glue where the borrow checker pays the most. |
| Graph-per-finding vs graph-per-codebase | Lamport | Per-finding. Isolation holds by construction with zero coordination; the redundant-indexing cost is mitigable in an optional cache layer later. |
| Stage 3a decomposition | Simon | Five steps, satisficed against the growth rule; first useful query at step 4. |
| DB backend choice | dba | LadybugDB (lbug 0.15.3) — the only option simultaneously maintained, native Cypher, embedded, with FTS + vector + algo extensions. |
| Stage 2 clarification loop shape | Shannon | Four-tool state machine with atomic single-file session (no crash window between separate files), unconditional one-round-minimum before finalize. |
| lbug UNWIND pattern | dba | LogicalType::Struct { fields } works; LogicalType::Any fails the binder — 38× speedup verified by compile-and-run probes. |
Agents are spawned via zetetic-team-subagents; each genius is a reasoning pattern (not a persona) with canonical moves and primary-source citations.
Status
Private repo by design. Not ready for public release until the full hardening pass is done: security-audit fixes are in, correctness fixes are in, scale fixes are in, and stages 4/6/8/9 are live. But every capability marked "live" above has been verified end-to-end only on this machine, not yet in a production context.
What works today: indexing Rust / Python / TypeScript codebases end-to-end, resolving cross-file relationships, clustering into communities, tracing processes from entry points, hybrid search, PRD input preparation, PRD claim validation, security gate checking, before/after regression detection.
What's deferred:
- Cross-file indexer batching to unlock the full 38× UNWIND win (currently 1.17× aggregate; per-edge rate is already 0.143 ms)
- `is_unsafe` extraction in the Rust parser (stage 8 S2 runs in info-skip mode pending this)
- LSP-based deep method resolution on inferred types
- Multi-repo / workgroup operations (GitNexus `group_*`)
- Rename / refactor tools (we are read-only by design)
License
MIT — see LICENSE.
Built by cdeust. Every stage designed by a genius agent. Every constant sourced.
