Alembic
Reduce noisy shell, CI, diff, and MCP-adjacent output into compact answers your coding agent can actually use. Alembic is a local, skill-first tool for Codex and Claude that cuts context waste without adding a network dependency.
reduce noisy shell output before your coding agent burns context on it

Alembic is a skill-first, local noise-reduction tool for Codex and Claude. It takes test floods, CI logs, MCP chatter, stack traces, diffs, watch output, Docker logs, and Kubernetes logs, then compresses them into the smallest artifact that still answers the question you asked.
Why Alembic
Most agent loops do not fail because the model is weak. They fail because the model is forced to read too much noise.
Alembic exists to fix that:
- reduce passing-test floods to the one line that matters
- collapse CI logs into the failing job, step, command, and reason
- turn giant MCP JSON payloads into the rows you actually asked for
- shrink watch-mode churn so repeated checks stop burning context
- benchmark the savings on your own repo instead of asking you to trust a screenshot
Why this matters:
- less output to read means less context wasted
- smaller answers are usually faster to process in tight loops
- noisy logs stop hiding the real failure
- benchmark mode gives you a direct proof of value on your own workflows
Before And After
Alembic is useful when raw output is much larger than the decision you need to make.
Tests
Before:
[... 180 passing test lines omitted ...]
FAIL src/auth/session.test.ts > rejects expired token
AssertionError: expected 401 to equal 403
[... stack trace omitted ...]
After:
FAIL src/auth/session.test.ts > rejects expired token
Why this helps: the agent keeps the failing test and loses the flood.
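The same idea can be approximated by hand with standard tools. The sketch below is not Alembic's implementation; it assumes failing lines start with `FAIL`, as in the sample above, and the `PASS` lines and the `raw_output` variable are made-up stand-ins for the omitted flood.

```shell
#!/usr/bin/env bash
# Illustration only: approximate "keep failures, drop passes" with grep.
# Assumes failing test lines start with "FAIL"; the PASS lines are invented.
raw_output='PASS src/auth/login.test.ts > accepts valid token
PASS src/auth/logout.test.ts > clears session
FAIL src/auth/session.test.ts > rejects expired token
AssertionError: expected 401 to equal 403'

# Keep only lines that begin with FAIL; fall back to PASS when none match.
reduced=$(printf '%s\n' "$raw_output" | grep '^FAIL' || echo 'PASS')
printf '%s\n' "$reduced"
```

A one-line filter like this works for a single test runner's format; the point of a dedicated tool is handling many formats without hand-written patterns.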
GitHub Actions
Before:
[... 400 setup, cache, install, and cleanup lines omitted ...]
build Run pnpm test 2026-04-11T15:10:04.000Z pnpm test
build Run pnpm test 2026-04-11T15:10:07.000Z FAIL src/auth/session.test.ts > rejects expired token
build Run pnpm test 2026-04-11T15:10:07.100Z AssertionError: expected 401 to equal 403
build Run pnpm test 2026-04-11T15:10:08.000Z Error: Process completed with exit code 1.
After:
FAIL Job: build
Step: Run pnpm test
Command: pnpm test
Reason: FAIL src/auth/session.test.ts > rejects expired token
Exit: 1
Why this helps: it turns a whole workflow log into the single failing step and reason.
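To make the shape of that reduction concrete, here is a rough sketch using `awk`. It is not how Alembic parses logs; it assumes the log is tab-separated as job, step, then timestamp plus message, and that the first message containing ` FAIL ` is the reason. The `log` and `summary` variables are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: pull job, step, and reason from a tab-separated CI log.
# Assumed layout per line: job <TAB> step <TAB> timestamp message
log=$(printf '%s\t%s\t%s\n' \
  'build' 'Run pnpm test' '2026-04-11T15:10:04.000Z pnpm test' \
  'build' 'Run pnpm test' '2026-04-11T15:10:07.000Z FAIL src/auth/session.test.ts > rejects expired token' \
  'build' 'Run pnpm test' '2026-04-11T15:10:07.100Z AssertionError: expected 401 to equal 403')

summary=$(printf '%s\n' "$log" | awk -F '\t' '
  $3 ~ / FAIL / {
    sub(/^[^ ]+ /, "", $3)   # strip the leading timestamp from the message
    print "Job: " $1
    print "Step: " $2
    print "Reason: " $3
    exit                     # the first failing line is enough for this sketch
  }')
printf '%s\n' "$summary"
```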
MCP Tool Response Data
MCP tool calls return large JSON payloads full of wrapper fields. Alembic auto-detects JSON responses and compacts them to a readable list.
Before (raw response from an MCP tool call):
{"issues":[{"id":"PAYMENTS-17","title":"TypeError: Cannot read properties of undefined (reading 'id')","level":"error","status":"unresolved","count":"137","userCount":29,"firstSeen":"2026-04-01T09:00:00.000Z","lastSeen":"2026-04-10T14:22:00.000Z","culprit":"payments/checkout.tsx in submitPayment","project":{"slug":"web-app"}},{"id":"API-42","title":"TimeoutError: upstream request exceeded 30s","level":"warning","status":"regressed","count":"54","userCount":7,"firstSeen":"2026-04-08T10:15:00.000Z","lastSeen":"2026-04-10T13:57:00.000Z","culprit":"api/routes/orders.ts in createOrder","project":{"slug":"backend-api"}}]}
After:
[unresolved]
PAYMENTS-17 | TypeError: Cannot read properties of undefined (reading 'id') | error | 137 events
[regressed]
API-42 | TimeoutError: upstream request exceeded 30s | warning | 54 events
Why this helps: wrapper fields, timestamps, user counts, culprit paths, and project metadata disappear. The rows you asked for stay.
MCP Error Logs
Before:
[... session setup omitted ...]
[MCP] tool=list_issues server=sentry request_id=abc123
[MCP] error=permission_denied project=web-app endpoint=issues
[MCP] retry attempt 1/3
[MCP] retry attempt 2/3 error=timeout
After:
MCP FAIL list_issues
Server: sentry
Target: project=web-app endpoint=issues
Reason: [MCP] error=permission_denied project=web-app endpoint=issues
Next: [MCP] retry attempt 1/3
Why this helps: the failing tool, target, and reason are preserved. Repeated retry chatter collapses into a single next-step line.
Docker
Before:
[... image pull and layer cache lines omitted ...]
Step 7/10 : RUN pnpm build
---> Running in 123456789abc
src/server.ts:12:3 error TS2304: Cannot find name 'window'.
The command '/bin/sh -c pnpm build' returned a non-zero code: 2
[... cleanup omitted ...]
After:
DOCKER FAIL unknown service
Step: Step 7/10 : RUN pnpm build
Command: pnpm build
Reason: src/server.ts:12:3 error TS2304: Cannot find name 'window'.
Why this helps: build failure stays; layer churn disappears.
Kubernetes
Before:
[... describe output omitted ...]
Name: api-7d9f8c6d8f-2xk9m
Warning Failed 2m (x4 over 5m) kubelet Error: ImagePullBackOff
Warning Failed 2m (x4 over 5m) kubelet Failed to pull image "ghcr.io/openai/alembic:missing"
[... events omitted ...]
After:
K8S FAIL api-7d9f8c6d8f-2xk9m
Container: unknown container
Reason: Warning Failed 2m (x4 over 5m) kubelet Error: ImagePullBackOff
Why this helps: you keep the failing pod and reason without scrolling through event spam.
Install
Pick your agent. Follow the row.
| Agent | Install |
|---|---|
| Claude Code | Add this repo from the Claude marketplace, install the Alembic plugin, then run npm run install:claude. |
| Codex | Clone this repo, open /plugins, search Alembic, and install it. Optional: run npm run install:codex if you also want a personal skill copy outside this repo. |
| Cursor | npx skills add FrangSierra/Alembic -a cursor |
| Copilot | npx skills add FrangSierra/Alembic -a github-copilot |
Install once. Use it from there.
What You Get
| Feature | Claude Code | Codex | Cursor | Copilot |
|---|---|---|---|---|
| Alembic skill available in your personal skills list | Y | Y | Y | Y |
| Repo-local plugin defaults and `/alembic ...` commands | Y | Y | N | N |
| Good default setup for tests, CI logs, diffs, MCP payloads, and watch output | Y | Y | Y | Y |
npx skills add installs the Alembic skill for Cursor and Copilot, but does not install plugin hooks or repo-local defaults.
If Claude Code already has the plugin but you still do not see the skill, run:
npm run install:claude
You can re-run that command safely to refresh an existing Alembic skill install.
Use --force only if ~/.claude/skills/alembic already exists but does not look like an Alembic install and you want to replace it:
node scripts/install-claude.mjs --force
Add Project Rules With alembic init
Installing the skill makes Alembic available. It does not, by itself, teach every repo when to use it.
If you want a project to consistently invoke Alembic in the right places, run this once in that repo:
alembic init
alembic init claude
What it does:
- `alembic init` creates or updates `AGENTS.md`
- `alembic init claude` creates or updates `CLAUDE.md`
- inserts a managed Alembic block instead of overwriting the rest of the file
- tells agents to use `$alembic` for noisy output like test suites, CI logs, MCP payloads, Docker output, Kubernetes logs, diffs, and other shell chatter
- tells agents to keep raw output only when the user explicitly asked for it
- can mention the optional effort suggestion workflow as a recommendation step, not as proof of the app's live thread setting
You can re-run it safely. Alembic keeps its own managed block up to date and leaves the rest of your project rules alone.
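One way to picture a managed block is as a region between two markers that can be swapped out while everything else is left alone. The sketch below is a guess at the general technique, not Alembic's actual code; the sample rules file, the placeholder block text, and the variable names are all hypothetical. Only the two marker comments come from this README.

```shell
#!/usr/bin/env bash
# Illustration only: replace the text between the Alembic markers and leave
# the rest of the file untouched. Not Alembic's actual implementation.
rules_file=$(mktemp)
cat > "$rules_file" <<'EOF'
# Project rules
- Run the linter before committing.
<!-- ALEMBIC:START -->
old managed text
<!-- ALEMBIC:END -->
- Keep commits small.
EOF

new_block='## Alembic
- placeholder for the managed rules'

# Pass the block through the environment so embedded newlines survive.
updated=$(block="$new_block" awk '
  /<!-- ALEMBIC:START -->/ { print; print ENVIRON["block"]; skipping = 1; next }
  /<!-- ALEMBIC:END -->/   { skipping = 0 }
  !skipping                { print }
' "$rules_file")
printf '%s\n' "$updated"
rm -f "$rules_file"
```

Because only the span between the markers is rewritten, running the same update twice yields the same file, which is what makes a re-run safe.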
If you prefer to manage project rules by hand, this is the exact block Alembic writes:
<!-- ALEMBIC:START -->
## Alembic
- Use `$alembic` whenever shell output is noisy and the user did not ask for the raw transcript. This includes test suites, watch-mode churn, CI or GitHub Actions logs, MCP tool payloads, long diffs, `rg` or grep results, Docker builds and container logs, Kubernetes `describe` or logs output, migration output, and sandbox or approval chatter.
- Resolve the helper before invoking it. Prefer `alembic` if it is already on `PATH`. Otherwise use `node <helper> ...` with the first existing helper in this order: `$HOME/.codex/skills/alembic/runtime/bin/alembic.mjs`, `$HOME/.claude/skills/alembic/runtime/bin/alembic.mjs`, `./tooling/alembic/bin/alembic.mjs`, `./alembic/runtime/bin/alembic.mjs`, `./plugins/alembic/skills/alembic/runtime/bin/alembic.mjs`. When probing in shell, check existence with `-e`, not executability with `-x`.
- If Alembic effort suggestion is enabled and the user has already shaped the task enough to judge implementation scope, you may run the Alembic effort heuristic as a recommendation step before a costly implementation pass.
- You may skip the effort check for obviously tiny, mechanical, reversible edits like a typo fix, a single rename, or a one-line text change.
- Use `<alembic-helper> effort --format json --requested <intended-or-user-provided-level> "<task summary>"` for that check, where `<alembic-helper>` is either `alembic` or `node <resolved-helper-path>`.
- Treat the result as a recommendation for the task.
- Prefer Alembic detail levels to control output volume: `flame` for full reduced summaries, `ember` for compact one-line reductions, and `ash` to suppress routine success noise.
- Do not reduce outputs the user explicitly wants verbatim, and do not use Alembic as a substitute for inspecting exact facts when raw output is the task.
<!-- ALEMBIC:END -->
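The helper resolution rule in that block can be sketched as a small bash function. The function name `resolve_alembic` and the `helper` variable are hypothetical; the candidate paths and the `-e` check are taken directly from the block above.

```shell
#!/usr/bin/env bash
# Sketch of the resolution order described in the managed block.
# resolve_alembic is a hypothetical name, not something Alembic ships.
resolve_alembic() {
  # Prefer an alembic command that is already on PATH.
  if command -v alembic >/dev/null 2>&1; then
    echo "alembic"
    return 0
  fi
  # Otherwise take the first helper that exists, in the documented order.
  local candidate
  for candidate in \
    "$HOME/.codex/skills/alembic/runtime/bin/alembic.mjs" \
    "$HOME/.claude/skills/alembic/runtime/bin/alembic.mjs" \
    "./tooling/alembic/bin/alembic.mjs" \
    "./alembic/runtime/bin/alembic.mjs" \
    "./plugins/alembic/skills/alembic/runtime/bin/alembic.mjs"; do
    # Check existence with -e: the helper is run through node, so it does
    # not need the executable bit that -x would require.
    if [ -e "$candidate" ]; then
      echo "node $candidate"
      return 0
    fi
  done
  return 1
}

helper=$(resolve_alembic) || echo "Alembic helper not found" >&2
```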
Repo Helper Only
If you only want to use the helper inside this repo:
node tooling/alembic/bin/alembic.mjs --help
Optional shell alias:
alias alembic='node /absolute/path/to/tooling/alembic/bin/alembic.mjs'
The repo also keeps a self-contained validation bundle in alembic/ so installs and generated copies can be tested without a build step.
Prove It In 30 Seconds
The best proof of concept is not a screenshot. It is running Alembic on a command you already use.
Start with:
alembic benchmark exec --preset tests -- npm test
Or if you want a diff-focused proof:
alembic benchmark diff --preset safety --base main
What benchmark mode gives you:
- the raw command output
- Alembic's reduced answer
- byte reduction
- line reduction
- token reduction or token estimate
- a preservation check, so you can see whether the important signal survived
Typical benchmark output looks like:
alembic benchmark: exec
Verdict: PASS
Compared: npm test raw output vs Alembic-reduced output
Source used: wrapped command output
Reduction: 6,459 bytes / 105 lines -> 5 bytes / 1 lines (99.9% bytes, 99.0% lines)
Tokens: 1,257 -> 2 (99.8% saved)
Preservation: status preserved; no failing tests to compare
If your install is running without the tokenizer dependency, Alembic still shows a local `Token estimate:` line so the benchmark remains useful in self-contained skill installs.
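The percentages in the sample output follow from the raw and reduced sizes. The sketch below recomputes them, assuming ordinary rounding to one decimal place; `percent_saved` is a hypothetical helper name, and whether Alembic rounds or truncates is not stated here.

```shell
#!/usr/bin/env bash
# Recompute the savings shown in the sample benchmark output.
# percent_saved is a hypothetical helper used only for this sketch.
percent_saved() {
  # $1 = raw size, $2 = reduced size; prints the saving with one decimal.
  awk -v raw="$1" -v reduced="$2" \
    'BEGIN { printf "%.1f\n", (raw - reduced) / raw * 100 }'
}

bytes_saved=$(percent_saved 6459 5)
lines_saved=$(percent_saved 105 1)
tokens_saved=$(percent_saved 1257 2)
echo "bytes: ${bytes_saved}%  lines: ${lines_saved}%  tokens: ${tokens_saved}%"
# prints: bytes: 99.9%  lines: 99.0%  tokens: 99.8%
```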
Detail Levels
Alembic supports three output levels:
| Level | What it does | Best for | Why |
|---|---|---|---|
| `flame` | Full reduced output | normal interactive work | keeps the important details while still cutting noise |
| `ember` | Compact one-line or near-one-line output | tight agent loops, repeated checks, quick status reads | minimizes token use without going silent |
| `ash` | Silence on success, still show failures, prompts, and action-needed states | watch mode, background validation, noisy pass-heavy commands | removes the churn that usually wastes the most context |
Quick mental model:
- use `flame` when a human will read the answer
- use `ember` when you mostly want status
- use `ash` when successful output is not useful
Examples:
alembic --level flame "Summarize this output."
alembic --level ember "Summarize this output."
alembic --level ash "Summarize this output."
alembic exec --level ash --preset tests -- npm test
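If you script around Alembic, the mental model above can be encoded as a small lookup. This is only a sketch: the context names (`interactive`, `loop`, `watch`) and the `pick_level` function are hypothetical, and only the three level names and the `alembic exec` command line come from this README. The script prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: map a usage context to an Alembic detail level.
pick_level() {
  case "$1" in
    interactive) echo "flame" ;;  # a human will read the answer
    loop)        echo "ember" ;;  # mostly want status
    watch)       echo "ash"   ;;  # successful output is not useful
    *)           echo "flame" ;;  # assumed fallback for unknown contexts
  esac
}

level=$(pick_level watch)
command_line="alembic exec --level $level --preset tests -- npm test"
echo "$command_line"
# prints: alembic exec --level ash --preset tests -- npm test
```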
Hook Flags And Persisted Defaults
Alembic plugin installs can persist a few defaults so you do not have to repeat them every session.
Config values:
- `enabled`: `on` or `off`
- `detailLevel`: `flame`, `ember`, or `ash`
- `effortAdvisor`: `off` or `suggest`
Inspect or change them from the CLI:
alembic hook-config show --host claude
alembic hook-config set --host claude --level ember --effort off
alembic hook-config set --host codex --level ash
alembic hook-config set --host codex --disable
alembic hook-config reset --host claude
You can also change them from the prompt with plugin installs:
/alembic
/alembic status
/alembic on
/alembic off
/alembic flame
/alembic ember
/alembic ash
/alembic effort suggest
/alembic effort off
/alembic reset
What these flags are for:
- `enabled on|off`: turn Alembic hook defaults on or off without uninstalling anything
- `detailLevel`: decide how much reduced output you want by default
- `effortAdvisor suggest`: let Alembic recommend a likely reasoning level for the task before you commit to an expensive implementation pass
- `effortAdvisor off`: keep Alembic focused only on output reduction
Why this matters:
- teams often want different defaults for interactive work versus loops or watch-mode churn
- some users want Alembic always visible, others only when things fail
- many teams overuse `xhigh`; `suggest` gives a lightweight local check before a costly implementation pass
- when Alembic recommends a change, it gives you a clearer signal about whether to stay put or switch levels before continuing
- the best place to use `suggest` is after Plan mode has shaped the work, then before you commit to "yes, implement"
- `alembic init` lets you persist those expectations inside a repo so agents know when to reach for Alembic without relying on memory
Effort Advisor
Alembic also ships a local effort advisor for host-side integrations, but it is important to frame it correctly:
- it is a task-effort heuristic, not a live thread inspector
- it helps answer "what level should we probably run this at?" rather than "what level is the app definitely using right now?"
- it is most useful when you want to avoid wasting tokens on `high` or `xhigh` before the task shape is clear
- it becomes most believable after Plan mode, once the task has been decomposed enough to judge whether implementation is still small, bounded, and reversible or has become broad, risky, and uncertain
alembic effort --requested xhigh "Rename a few files and update imports."
alembic effort --requested medium "Plan complete: implement the approved auth migration across API, DB, and rollout validation."
alembic effort --format json --requested low "After the plan, validate whether implementing the flaky auth race fix still fits low or should move higher."
alembic benchmark effort --set smoke
How it works:
- it is local and deterministic
- it scores engineering difficulty, not prompt length alone
- it only asks for a change when the mismatch is meaningful
- `--requested` should be the level you intend to use or the level the user tells you is currently set
- low-confidence calls stay non-blocking
Recommended workflow:
- Use Plan mode first so the task stops being a vague request and becomes a concrete implementation shape.
- Run `alembic effort` on the post-plan implementation summary, using the level you intend to run or the level the user explicitly gives you.
- If Alembic recommends a cheaper or stronger level, decide whether you want to switch before continuing.
- If the task is tiny and reversible, skip the check and just do the work.
Examples:
- Small rename after planning:
alembic effort --requested xhigh "After plan review: rename AccountCard to ProfileCard and update imports."
This is the kind of task Alembic should usually push downward so you do not burn xhigh on a mechanical edit.
- Big implementation after planning:
alembic effort --requested medium "After plan approval: implement the billing retry redesign, add migration, update workers, and validate rollback safety."
This is where Alembic can justify asking for high or xhigh before you say "yes, implement."
Current calibration goals:
- long prompt alone does not mean `xhigh`
- bounded read-only UI work caps at `high`
- local async bugs with clear repro do not auto-jump to `xhigh`
- `xhigh` is reserved for tasks that combine uncertainty with blast radius, coupling, statefulness, or hard validation
Current execution lesson:
- the heuristic itself is useful
- it should be described as a recommendation system, not as a mechanism that can verify the app's current effort level
- prompt-only enforcement is not fully deterministic once an agent has already started exploring
- if you want the highest real-world success rate, treat Plan mode as the primary checkpoint and everything else as a fallback
This is useful when your team tends to default to expensive reasoning for everything, or the opposite problem: underpowering complex debugging and migration work.
Presets
Alembic ships focused output contracts for the most common noisy workflows:
| Preset | What you get |
|---|---|
| `tests` | PASS, or FAIL plus failing test names |
| `actions` | failing GitHub Actions job, step, command, reason, exit code |
| `mcp` | failing MCP tool/server, target, error, and next step |
| `docker` | failing Docker step/service and smallest useful clue |
| `k8s` | failing pod/container, reason, and next clue |
| `safety` | SAFE, REVIEW, or UNSAFE, then only risky files and why |
| `files` | file paths only |
| `summary` | changed files with a one-line summary each |
| `logs` | highest-signal errors plus the smallest useful summary |
| `approval` | blocked action, reason, next step |
Rules:
- use `--preset` when one of these contracts already matches the job
- use `--question` when you need a custom output shape
- if both are present, `--question` wins
Useful examples:
alembic --preset logs
alembic exec --preset tests -- npm test
gh run view --log | alembic --preset actions
cat mcp-trace.log | alembic --preset mcp
docker build . 2>&1 | alembic --preset docker --drop-level INFO,DEBUG
kubectl describe pod api-123 | alembic --preset k8s --keep-level ERROR,WARN
alembic benchmark diff --preset safety --base main
rg -n "TODO" . | alembic --preset files
Approval noise is another good fit:
Before:
Command failed in sandbox.
Network access is restricted.
This command requires escalated permissions.
Please re-run with sandbox_permissions=require_escalated.
Command failed in sandbox.
Network access is restricted.
This command requires escalated permissions.
Please re-run with sandbox_permissions=require_escalated.
npm install failed because registry access is blocked.
After:
BLOCKED npm install failed because registry access is blocked.
Reason: Network access is restricted.
Next: Please re-run with sandbox_permissions=require_escalated.
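Much of the saving in this example comes from dropping verbatim repeats. A minimal sketch of that one step, using a common `awk` idiom rather than anything Alembic-specific (the `raw_output` and `deduped` variable names are hypothetical):

```shell
#!/usr/bin/env bash
# Illustration only: drop verbatim repeats while keeping first-seen order.
raw_output='Command failed in sandbox.
Network access is restricted.
This command requires escalated permissions.
Please re-run with sandbox_permissions=require_escalated.
Command failed in sandbox.
Network access is restricted.
This command requires escalated permissions.
Please re-run with sandbox_permissions=require_escalated.
npm install failed because registry access is blocked.'

# awk prints a line only the first time it appears.
deduped=$(printf '%s\n' "$raw_output" | awk '!seen[$0]++')
printf '%s\n' "$deduped"
```

Deduplication alone leaves five lines; the `approval` preset goes further by labeling the blocked action, reason, and next step.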
Usage
Prefer exec when exit status matters:
alembic exec --level ash --preset tests -- npm test
Use pipeline mode when it is simpler:
git diff -- src | alembic --level ember --preset summary
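One likely reason to prefer `exec` when exit status matters is ordinary shell behavior: by default a pipeline reports the status of its last command, so a failing command on the left can look like a success. The sketch below demonstrates this in bash with `false` standing in for a failing test run and `cat` standing in for any downstream filter; it makes no claim about how Alembic itself sets its exit code.

```shell
#!/usr/bin/env bash
# Default behavior: the pipeline's status is that of its last command (cat).
default_status=0
false | cat || default_status=$?

# With pipefail, bash reports the last non-zero status in the pipeline.
set -o pipefail
pipefail_status=0
false | cat || pipefail_status=$?
set +o pipefail

echo "default: $default_status, pipefail: $pipefail_status"
# prints: default: 0, pipefail: 1
```

If you stay in pipeline mode inside a script that branches on the result, enabling `pipefail` is one generic way to keep the upstream failure visible.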
MCP Tool Response Data
JSON responses from MCP tool calls are auto-detected. Pipe the response and describe what you want:
# Compact list with default fields (ID, title, status)
echo "$MCP_RESPONSE" | alembic --question "List each item: ID, title, status."
# Grouped by a field
echo "$MCP_RESPONSE" | alembic --question "List each Sentry issue: ID, title, status, level. Group by status."
# Filtered
echo "$MCP_RESPONSE" | alembic --question "List each Sentry issue: ID, title, status where status is unresolved or regressed."
Force with --mode data if auto-detection misses:
echo "$MCP_RESPONSE" | alembic --mode data --question "List each item: ID, title, status."
Other Useful Commands
alembic doctor
alembic benchmark exec --level flame --preset tests -- npm test
gh run view --log | alembic --level ember --preset actions
cat mcp-trace.log | alembic --level ember --preset mcp
docker build . 2>&1 | alembic --level ember --preset docker --drop-level INFO,DEBUG
kubectl describe pod api-123 | alembic --level ash --preset k8s --keep-level ERROR,WARN
alembic benchmark diff --level ember --preset safety --base main
Benchmark Your Own Workflow
Benchmark mode is the best way to decide whether Alembic belongs in your loop.
Use it whenever you want numbers instead of vibes:
alembic benchmark exec --preset tests -- npm test
alembic benchmark exec --preset actions -- gh run view --log
alembic benchmark diff --preset safety --base main
alembic benchmark effort --set smoke
What each benchmark proves:
- `benchmark exec`: how much a real command shrinks
- `benchmark diff`: how much branch or working-tree review noise shrinks
- `benchmark effort`: whether the effort heuristic is still calibrated
These sample numbers come from real local runs:
| Workflow | Raw tokens | Current tokens | Token save | Byte save |
|---|---|---|---|---|
| `npm test` -> PASS | 1,257 | 2 | 99.8% | 99.9% |
| Sentry top 50 issues | 96,439 | 344 | 99.6% | 99.5% |
| Sentry grouped by status | 96,439 | 881 | 99.1% | 99.4% |
| Sentry filtered statuses | 96,439 | 244 | 99.7% | 99.7% |
| First-commit diff summary | 154,386 | 787 | 99.5% | 99.5% |
Why this section matters:
- line count alone can lie, especially on JSONL-heavy MCP output
- bytes and tokens show the real savings
- it gives users a fast, concrete proof right after install
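The first point is easy to check by hand. In the sketch below, a made-up single-line JSON payload (the `payload` variable is hypothetical and much smaller than a real MCP response) counts as one line no matter how many bytes it carries:

```shell
#!/usr/bin/env bash
# A single-line JSON payload: tiny by line count, larger by byte count.
payload='{"issues":[{"id":"PAYMENTS-17","status":"unresolved"},{"id":"API-42","status":"regressed"}]}'

line_count=$(printf '%s\n' "$payload" | wc -l | tr -d ' ')
byte_count=$(printf '%s\n' "$payload" | wc -c | tr -d ' ')
echo "lines: $line_count, bytes: $byte_count"
```

A payload with thousands of issues would still report a single line, which is why the benchmark tables lead with tokens and bytes.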
Validation
Local validation:
alembic doctor
Repo validation:
npm test
npm run check:generated
npm run release:validate
Install validation:
- `npm run install:codex`
- `npm run install:claude`
- run `alembic doctor` from the repo or the installed bundle
Current boundaries:
- Alembic is a skill, not a live MCP client
- it reduces noisy shell output, agent output, and MCP-adjacent output after the host emits it
- it has no provider or network dependency in the default path
Attribution
Alembic began with inspiration from Distill. What started as a small project built on that foundation slowly grew, iteration by iteration, into its own tool for reducing noisy agent workflows and making local validation easier to trust.
Big thanks to the original Distill project for the spark that helped this repo get started.
