Alembic

Reduce noisy shell, CI, diff, and MCP-adjacent output into compact answers your coding agent can actually use. Alembic is a local, skill-first tool for Codex and Claude that cuts context waste without adding a network dependency.

reduce noisy shell output before your coding agent burns context on it

Before & After • Why Alembic • Install • First Benchmark • Levels • Flags • Effort • Presets • Usage • Benchmarks

Alembic is a skill-first, local noise-reduction tool for Codex and Claude. It takes test floods, CI logs, MCP chatter, stack traces, diffs, watch output, Docker logs, and Kubernetes logs, then compresses them into the smallest artifact that still answers the question you asked.

Why Alembic

Many agent loops do not stall because the model is weak. They stall because the model is forced to read too much noise before it reaches the signal.

Alembic exists to fix that:

reduce passing-test floods to the one line that matters
collapse CI logs into the failing job, step, command, and reason
turn giant MCP JSON payloads into the rows you actually asked for
shrink watch-mode churn so repeated checks stop burning context
benchmark the savings on your own repo instead of asking you to trust a screenshot

Why this matters:

less output to read means less context wasted
smaller answers are usually faster to process in tight loops
noisy logs stop hiding the real failure
benchmark mode gives you a direct proof of value on your own workflows

Before And After

Alembic is useful when raw output is much larger than the decision you need to make.

Tests

Before:

[... 180 passing test lines omitted ...]
FAIL src/auth/session.test.ts > rejects expired token
AssertionError: expected 401 to equal 403
[... stack trace omitted ...]

After:

FAIL src/auth/session.test.ts > rejects expired token

Why this helps: the agent keeps the failing test and loses the flood.
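The test reduction boils down to keeping failure headlines and dropping everything else. Here is a minimal Python sketch of that idea, assuming Jest/Vitest-style `FAIL <file> > <test name>` headlines; `reduce_test_output` is a hypothetical name, not Alembic's API, and the real `tests` preset may use different rules.

```python
import re

# A failure headline such as "FAIL src/auth/session.test.ts > rejects expired token".
FAIL_LINE = re.compile(r"^\s*FAIL\s+\S")


def reduce_test_output(raw: str) -> str:
    """Return "PASS" when no failure headline appears, else only the headlines."""
    failures = [line.strip() for line in raw.splitlines() if FAIL_LINE.match(line)]
    if not failures:
        return "PASS"
    # Runners often repeat headlines in a final summary; keep the first copy of each.
    return "\n".join(dict.fromkeys(failures))


if __name__ == "__main__":
    raw = "\n".join(
        ["PASS src/util.test.ts"] * 3
        + [
            "FAIL src/auth/session.test.ts > rejects expired token",
            "AssertionError: expected 401 to equal 403",
            "    at <stack frame>",
        ]
    )
    print(reduce_test_output(raw))
```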

GitHub Actions

Before:

[... 400 setup, cache, install, and cleanup lines omitted ...]
build	Run pnpm test	2026-04-11T15:10:04.000Z pnpm test
build	Run pnpm test	2026-04-11T15:10:07.000Z FAIL src/auth/session.test.ts > rejects expired token
build	Run pnpm test	2026-04-11T15:10:07.100Z AssertionError: expected 401 to equal 403
build	Run pnpm test	2026-04-11T15:10:08.000Z Error: Process completed with exit code 1.

After:

FAIL Job: build
Step: Run pnpm test
Command: pnpm test
Reason: FAIL src/auth/session.test.ts > rejects expired token
Exit: 1

Why this helps: it turns a whole workflow log into the single failing step and reason.
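The Actions reduction groups log lines by job and step, then reports the step that exited non-zero. A minimal sketch, assuming the tab-separated `job`, `step`, timestamped-message layout shown in the Before sample; `reduce_actions_log` and `render_failure` are hypothetical names, not Alembic's API.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Lines look like "<job>\t<step>\t<timestamp> <message>".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\S+Z\s?")
EXIT_CODE = re.compile(r"Process completed with exit code (\d+)")


@dataclass
class StepFailure:
    job: str
    step: str
    command: str
    reason: str
    exit_code: int


def reduce_actions_log(raw: str) -> Optional[StepFailure]:
    """Return the first step whose log reports a non-zero exit code."""
    steps: Dict[Tuple[str, str], List[str]] = {}
    for line in raw.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a job/step/message line
        job, step, message = parts
        steps.setdefault((job, step), []).append(TIMESTAMP.sub("", message, count=1))

    for (job, step), messages in steps.items():
        codes = [int(m.group(1)) for m in map(EXIT_CODE.search, messages) if m]
        if not codes or codes[-1] == 0:
            continue
        # Step names like "Run pnpm test" carry the command; fall back to the step name.
        command = step[len("Run "):] if step.startswith("Run ") else step
        # Prefer a test-runner headline; otherwise use the first message in the step.
        reason = next((m for m in messages if m.startswith("FAIL")), messages[0])
        return StepFailure(job, step, command, reason, codes[-1])
    return None


def render_failure(failure: StepFailure) -> str:
    return "\n".join([
        f"FAIL Job: {failure.job}",
        f"Step: {failure.step}",
        f"Command: {failure.command}",
        f"Reason: {failure.reason}",
        f"Exit: {failure.exit_code}",
    ])
```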

MCP Tool Response Data

MCP tool calls often return large JSON payloads full of wrapper fields. Alembic auto-detects JSON responses and compacts them into a readable list.

Before (raw response from an MCP tool call):

{"issues":[{"id":"PAYMENTS-17","title":"TypeError: Cannot read properties of undefined (reading 'id')","level":"error","status":"unresolved","count":"137","userCount":29,"firstSeen":"2026-04-01T09:00:00.000Z","lastSeen":"2026-04-10T14:22:00.000Z","culprit":"payments/checkout.tsx in submitPayment","project":{"slug":"web-app"}},{"id":"API-42","title":"TimeoutError: upstream request exceeded 30s","level":"warning","status":"regressed","count":"54","userCount":7,"firstSeen":"2026-04-08T10:15:00.000Z","lastSeen":"2026-04-10T13:57:00.000Z","culprit":"api/routes/orders.ts in createOrder","project":{"slug":"backend-api"}}]}

After:

[unresolved]
  PAYMENTS-17 | TypeError: Cannot read properties of undefined (reading 'id') | error | 137 events
[regressed]
  API-42 | TimeoutError: upstream request exceeded 30s | warning | 54 events

Why this helps: wrapper fields, timestamps, user counts, culprits, and other metadata disappear. The ID, title, level, and event count you asked for stay.
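The compaction above keeps a handful of fields and groups rows by status. A minimal sketch under two assumptions taken from the sample: the payload has a top-level `issues` array, and the fields to keep are fixed (`id`, `title`, `level`, `count`). Alembic takes the output shape from your question instead, so treat `compact_issues` and its `only` filter as a hypothetical illustration.

```python
import json
from typing import Iterable, Optional


def compact_issues(payload: str, group_by: str = "status",
                   only: Optional[Iterable[str]] = None) -> str:
    """Reduce an {"issues": [...]} payload to grouped one-line rows."""
    wanted = set(only) if only is not None else None
    groups = {}  # dicts keep insertion order, so groups appear in first-seen order
    for issue in json.loads(payload).get("issues", []):
        key = issue.get(group_by, "unknown")
        if wanted is not None and key not in wanted:
            continue  # optional filter, e.g. only unresolved or regressed issues
        row = f'{issue["id"]} | {issue["title"]} | {issue["level"]} | {issue["count"]} events'
        groups.setdefault(key, []).append(row)
    lines = []
    for key, rows in groups.items():
        lines.append(f"[{key}]")
        lines.extend(f"  {row}" for row in rows)
    return "\n".join(lines)
```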

MCP Error Logs

Before:

[... session setup omitted ...]
[MCP] tool=list_issues server=sentry request_id=abc123
[MCP] error=permission_denied project=web-app endpoint=issues
[MCP] retry attempt 1/3
[MCP] retry attempt 2/3 error=timeout

After:

MCP FAIL list_issues
Server: sentry
Target: project=web-app endpoint=issues
Reason: [MCP] error=permission_denied project=web-app endpoint=issues
Next: [MCP] retry attempt 1/3

Why this helps: the failing tool, target, and reason are preserved, and the retry chatter collapses to a single Next line.
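The MCP log reduction keeps the first error and the first retry after it. A minimal sketch, assuming the `[MCP] key=value` line format of the sample (real hosts log in many shapes); `reduce_mcp_log` and the `MCP OK` success line are hypothetical.

```python
import re

KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def reduce_mcp_log(raw: str) -> str:
    """Summarize the first MCP error: tool, server, target, reason, next line."""
    tool = server = reason = target = next_line = None
    for line in raw.splitlines():
        if not line.startswith("[MCP]"):
            continue  # drop session setup and other non-MCP chatter
        fields = dict(KEY_VALUE.findall(line))
        tool = tool or fields.get("tool")
        server = server or fields.get("server")
        if reason is None and "error" in fields:
            reason = line
            target = " ".join(f"{k}={v}" for k, v in fields.items() if k != "error")
        elif reason is not None and next_line is None and "retry" in line:
            next_line = line  # keep only the first retry line after the error
    if reason is None:
        return f"MCP OK {tool or 'unknown tool'}"
    out = [
        f"MCP FAIL {tool or 'unknown tool'}",
        f"Server: {server or 'unknown'}",
        f"Target: {target}",
        f"Reason: {reason}",
    ]
    if next_line:
        out.append(f"Next: {next_line}")
    return "\n".join(out)
```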

Docker

Before:

[... image pull and layer cache lines omitted ...]
Step 7/10 : RUN pnpm build
 ---> Running in 123456789abc
src/server.ts:12:3 error TS2304: Cannot find name 'window'.
The command '/bin/sh -c pnpm build' returned a non-zero code: 2
[... cleanup omitted ...]

After:

DOCKER FAIL unknown service
Step: Step 7/10 : RUN pnpm build
Command: pnpm build
Reason: src/server.ts:12:3 error TS2304: Cannot find name 'window'.

Why this helps: the build failure stays and the layer churn disappears.
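The Docker reduction tracks the current build step and the first error-looking line inside it. A minimal sketch, assuming the classic builder's `Step N/M : ...` and `The command '/bin/sh -c ...' returned a non-zero code: N` lines from the sample (BuildKit output is formatted differently); `reduce_docker_build` and the `DOCKER OK` success line are hypothetical.

```python
import re

# Classic (non-BuildKit) builder failure line, as in the sample above.
FAILED = re.compile(r"^The command '/bin/sh -c (.+)' returned a non-zero code: (\d+)$")
ERROR_WORD = re.compile(r"\berror\b", re.IGNORECASE)


def reduce_docker_build(raw: str) -> str:
    """Report the failing step, its command, and the first error clue inside it."""
    step = reason = None
    for line in map(str.strip, raw.splitlines()):
        if line.startswith("Step ") and " : " in line:
            step, reason = line, None  # a new step resets clues from earlier steps
            continue
        failed = FAILED.match(line)
        if failed:
            return "\n".join([
                "DOCKER FAIL unknown service",  # a plain `docker build` has no service name
                f"Step: {step}",
                f"Command: {failed.group(1)}",
                f"Reason: {reason or line}",
            ])
        if reason is None and ERROR_WORD.search(line):
            reason = line  # first error-looking line inside the current step
    return "DOCKER OK"
```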

Kubernetes

Before:

[... describe output omitted ...]
Name:           api-7d9f8c6d8f-2xk9m
Warning  Failed     2m (x4 over 5m)   kubelet  Error: ImagePullBackOff
Warning  Failed     2m (x4 over 5m)   kubelet  Failed to pull image "ghcr.io/openai/alembic:missing"
[... events omitted ...]

After:

K8S FAIL api-7d9f8c6d8f-2xk9m
Container: unknown container
Reason: Warning  Failed     2m (x4 over 5m)   kubelet  Error: ImagePullBackOff

Why this helps: you keep the failing pod and reason without scrolling through event spam.
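The Kubernetes reduction keeps the pod name and the most telling `Warning` event. A minimal sketch over `kubectl describe pod` text, assuming the pod name comes from the first top-level `Name:` line, the container from the first entry under `Containers:`, and the reason from the first warning that contains `Error:`; `reduce_k8s_describe` and the `K8S OK` success line are hypothetical. The sample above has no `Containers:` section, which is why it reports `unknown container`.

```python
from typing import List, Optional


def reduce_k8s_describe(raw: str) -> str:
    """Summarize `kubectl describe pod` text as pod, container, and reason."""
    lines = raw.splitlines()
    pod: Optional[str] = None
    container: Optional[str] = None
    warnings: List[str] = []
    for i, line in enumerate(lines):
        if pod is None and line.startswith("Name:"):
            pod = line.split(":", 1)[1].strip()
        elif line.strip() == "Containers:" and i + 1 < len(lines):
            # The first container is listed on the next line, e.g. "  api:".
            container = lines[i + 1].strip().rstrip(":") or None
        elif line.lstrip().startswith("Warning"):
            warnings.append(line.strip())
    name = pod or "unknown pod"
    if not warnings:
        return f"K8S OK {name}"
    # Prefer a warning that names a concrete error, such as ImagePullBackOff.
    reason = next((w for w in warnings if "Error:" in w), warnings[0])
    return "\n".join([
        f"K8S FAIL {name}",
        f"Container: {container or 'unknown container'}",
        f"Reason: {reason}",
    ])
```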

Install

Pick your agent. Follow the row.

Agent	Install
Claude Code	Add this repo from the Claude marketplace, install the Alembic plugin, then run `npm run install:claude`.
Codex	Clone this repo, open `/plugins`, search `Alembic`, and install it. Optional: run `npm run install:codex` if you also want a personal skill copy outside this repo.
Cursor	`npx skills add FrangSierra/Alembic -a cursor`
Copilot	`npx skills add FrangSierra/Alembic -a github-copilot`

Install once. Use it from there.

What You Get

Feature	Claude Code	Codex	Cursor	Copilot
Alembic skill available in your personal skills list	Y	Y	Y	Y
Repo-local plugin defaults and `/alembic ...` commands	Y	Y	—	—
Good default setup for tests, CI logs, diffs, MCP payloads, and watch output	Y	Y	Y	Y

npx skills add installs the Alembic skill for Cursor and Copilot, but does not install plugin hooks or repo-local defaults.

If Claude Code already has the plugin but you still do not see the skill, run:

npm run install:claude

You can re-run that command safely to refresh an existing Alembic skill install.

Use --force only if ~/.claude/skills/alembic already exists but does not look like an Alembic install and you want to replace it:

node scripts/install-claude.mjs --force

Add Project Rules With `alembic init`

Installing the skill makes Alembic available. It does not, by itself, teach every repo when to use it.

If you want a project to consistently invoke Alembic in the right places, run the matching command once in that repo:

alembic init
alembic init claude

What it does:

alembic init creates or updates AGENTS.md
alembic init claude creates or updates CLAUDE.md
inserts a managed Alembic block instead of overwriting the rest of the file
tells agents to use $alembic for noisy output like test suites, CI logs, MCP payloads, Docker output, Kubernetes logs, diffs, and other shell chatter
tells agents to keep raw output only when the user explicitly asked for it
can describe the optional effort-suggestion workflow as a recommendation step, not as a readout of the app's live effort setting

You can re-run it safely. Alembic keeps its own managed block up to date and leaves the rest of your project rules alone.
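Re-running is safe because the update only touches the text between the `<!-- ALEMBIC:START -->` and `<!-- ALEMBIC:END -->` markers of the managed block. A minimal sketch of that idempotent upsert, assuming a plain marker search; `upsert_managed_block` is hypothetical, and Alembic's own `init` code may differ, for example in how it appends to a file that has no block yet.

```python
import re

START = "<!-- ALEMBIC:START -->"
END = "<!-- ALEMBIC:END -->"
MANAGED = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)


def upsert_managed_block(document: str, body: str) -> str:
    """Insert or refresh the managed block, leaving all other text untouched."""
    block = f"{START}\n{body.strip()}\n{END}"
    if MANAGED.search(document):
        # A function replacement avoids re-interpreting backslashes in `body`.
        return MANAGED.sub(lambda _match: block, document, count=1)
    if not document.strip():
        return block + "\n"
    return document.rstrip() + "\n\n" + block + "\n"
```

Running it twice with the same body returns the same text, which is the property that makes repeated `alembic init` runs harmless.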

If you prefer to manage project rules by hand, this is the exact block Alembic writes:

<!-- ALEMBIC:START -->
## Alembic

- Use `$alembic` whenever shell output is noisy and the user did not ask for the raw transcript. This includes test suites, watch-mode churn, CI or GitHub Actions logs, MCP tool payloads, long diffs, `rg` or grep results, Docker builds and container logs, Kubernetes `describe` or logs output, migration output, and sandbox or approval chatter.
- Resolve the helper before invoking it. Prefer `alembic` if it is already on `PATH`. Otherwise use `node <helper> ...` with the first existing helper in this order: `$HOME/.codex/skills/alembic/runtime/bin/alembic.mjs`, `$HOME/.claude/skills/alembic/runtime/bin/alembic.mjs`, `./tooling/alembic/bin/alembic.mjs`, `./alembic/runtime/bin/alembic.mjs`, `./plugins/alembic/skills/alembic/runtime/bin/alembic.mjs`. When probing in shell, check existence with `-e`, not executability with `-x`.
- If Alembic effort suggestion is enabled and the user has already shaped the task enough to judge implementation scope, you may run the Alembic effort heuristic as a recommendation step before a costly implementation pass.
- You may skip the effort check for obviously tiny, mechanical, reversible edits like a typo fix, a single rename, or a one-line text change.
- Use `<alembic-helper> effort --format json --requested <intended-or-user-provided-level> "<task summary>"` for that check, where `<alembic-helper>` is either `alembic` or `node <resolved-helper-path>`.
- Treat the result as a recommendation for the task.
- Prefer Alembic detail levels to control output volume: `flame` for full reduced summaries, `ember` for compact one-line reductions, and `ash` to suppress routine success noise.
- Do not reduce outputs the user explicitly wants verbatim, and do not use Alembic as a substitute for inspecting exact facts when raw output is the task.
<!-- ALEMBIC:END -->
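The helper-resolution rule in that block is an ordered probe: `PATH` first, then the first candidate file that exists. A minimal sketch in Python, assuming `~` expands to `$HOME`; `resolve_helper` and its injectable `on_path` parameter are hypothetical and exist only to make the order easy to test.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, List, Optional, Sequence

# Probe order copied from the managed block above.
HELPER_CANDIDATES = (
    "~/.codex/skills/alembic/runtime/bin/alembic.mjs",
    "~/.claude/skills/alembic/runtime/bin/alembic.mjs",
    "./tooling/alembic/bin/alembic.mjs",
    "./alembic/runtime/bin/alembic.mjs",
    "./plugins/alembic/skills/alembic/runtime/bin/alembic.mjs",
)


def resolve_helper(
    cwd: str = ".",
    candidates: Sequence[str] = HELPER_CANDIDATES,
    on_path: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the argv prefix that invokes Alembic, or None if no helper exists."""
    if on_path("alembic"):
        return ["alembic"]
    for candidate in candidates:
        path = Path(os.path.expanduser(candidate))
        if not path.is_absolute():
            path = Path(cwd) / path
        # Check existence (like `test -e`), not the execute bit: node runs the .mjs file.
        if path.exists():
            return ["node", str(path)]
    return None
```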

Repo Helper Only

If you only want to use the helper inside this repo:

node tooling/alembic/bin/alembic.mjs --help

Optional shell alias:

alias alembic='node /absolute/path/to/tooling/alembic/bin/alembic.mjs'

The repo also keeps a self-contained validation bundle in alembic/ so installs and generated copies can be tested without a build step.

Prove It In 30 Seconds

The best proof of concept is not a screenshot. It is running Alembic on a command you already use.

Start with:

alembic benchmark exec --preset tests -- npm test

Or if you want a diff-focused proof:

alembic benchmark diff --preset safety --base main

What benchmark mode gives you:

the raw command output
Alembic's reduced answer
byte reduction
line reduction
token reduction or token estimate
a preservation check, so you can see whether the important signal survived

Typical benchmark output looks like:

alembic benchmark: exec
Verdict: PASS
Compared: npm test raw output vs Alembic-reduced output
Source used: wrapped command output
Reduction: 6,459 bytes / 105 lines -> 5 bytes / 1 lines (99.9% bytes, 99.0% lines)
Tokens: 1,257 -> 2 (99.8% saved)
Preservation: status preserved; no failing tests to compare

If your install runs without the tokenizer dependency, Alembic still prints a local `Token estimate:` line, so the benchmark remains useful in self-contained skill installs.
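The percentages in that sample are relative savings, `(before - after) / before`, rounded to one decimal: 6,459 -> 5 bytes is 99.9%, 105 -> 1 lines is 99.0%, and 1,257 -> 2 tokens is 99.8%. A minimal sketch of that arithmetic, plus one plausible preservation check that assumes failing-test headlines are the signal to keep; both function names are hypothetical.

```python
def percent_saved(before: int, after: int) -> float:
    """Relative saving, rounded to one decimal place like the benchmark output."""
    if before <= 0:
        return 0.0
    return round(100 * (before - after) / before, 1)


def failing_tests_preserved(raw: str, reduced: str) -> bool:
    """True when every FAIL headline from the raw output survives the reduction."""
    headlines = [line.strip() for line in raw.splitlines() if line.lstrip().startswith("FAIL ")]
    return all(h in reduced for h in headlines)  # vacuously True when nothing failed


if __name__ == "__main__":
    print(f"{percent_saved(6459, 5)}% bytes, {percent_saved(105, 1)}% lines, "
          f"{percent_saved(1257, 2)}% tokens")
```

The same formula reproduces the token-save column of the benchmark table later in this README, for example 96,439 -> 344 tokens is 99.6%.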

Detail Levels

Alembic supports three output levels:

Level	What it does	Best for	Why
`flame`	Full reduced output	normal interactive work	keeps the important details while still cutting noise
`ember`	Compact one-line or near-one-line output	tight agent loops, repeated checks, quick status reads	minimizes token use without going silent
`ash`	Silence on success, still show failures, prompts, and action-needed states	watch mode, background validation, noisy pass-heavy commands	removes the churn that usually wastes the most context

Quick mental model:

use flame when a human will read the answer
use ember when you mostly want status
use ash when successful output is not useful

Examples:

alembic --level flame "Summarize this output."
alembic --level ember "Summarize this output."
alembic --level ash "Summarize this output."
alembic exec --level ash --preset tests -- npm test
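The three levels differ mainly in what they print on success and on failure. A minimal sketch of that decision; `render_at_level` is hypothetical, and because the table does not say how much detail `ash` shows on failure, this sketch assumes it falls back to the `flame` summary.

```python
from enum import Enum
from typing import List, Optional


class Level(Enum):
    FLAME = "flame"
    EMBER = "ember"
    ASH = "ash"


def render_at_level(level: Level, ok: bool, headline: str, details: List[str]) -> Optional[str]:
    """Return the text to show at a detail level, or None for silence."""
    if level is Level.ASH and ok:
        return None  # ash: routine success produces no output
    if level is Level.EMBER:
        return headline  # ember: a single line, pass or fail
    # flame, and ash on failure (an assumption): headline plus the reduced details
    return "\n".join([headline, *details])
```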

Hook Flags And Persisted Defaults

Alembic plugin installs can persist a few defaults so you do not have to repeat them every session.

Config values:

enabled: on or off
detailLevel: flame, ember, or ash
effortAdvisor: off or suggest
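Each persisted setting accepts a small fixed set of values. A minimal validation sketch, assuming the settings are stored as the literal strings listed above; the on-disk format and location are not documented in this README, and `hook_config_errors` is hypothetical.

```python
from typing import Dict, List

ALLOWED = {
    "enabled": {"on", "off"},
    "detailLevel": {"flame", "ember", "ash"},
    "effortAdvisor": {"off", "suggest"},
}


def hook_config_errors(config: Dict[str, str]) -> List[str]:
    """List problems in a (possibly partial) hook config; empty means valid."""
    errors = []
    for key, value in config.items():
        allowed = ALLOWED.get(key)
        if allowed is None:
            errors.append(f"unknown key {key!r}")
        elif value not in allowed:
            errors.append(f"{key} must be one of {', '.join(sorted(allowed))}, got {value!r}")
    return errors
```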

Inspect or change them from the CLI:

alembic hook-config show --host claude
alembic hook-config set --host claude --level ember --effort off
alembic hook-config set --host codex --level ash
alembic hook-config set --host codex --disable
alembic hook-config reset --host claude

With plugin installs, you can also change them from the prompt:

/alembic
/alembic status
/alembic on
/alembic off
/alembic flame
/alembic ember
/alembic ash
/alembic effort suggest
/alembic effort off
/alembic reset

What these flags are for:

enabled on|off: turn Alembic hook defaults on or off without uninstalling anything
detailLevel: decide how much reduced output you want by default
effortAdvisor suggest: let Alembic recommend a likely reasoning level for the task before you commit to an expensive implementation pass
effortAdvisor off: keep Alembic focused only on output reduction

Why this matters:

teams often want different defaults for interactive work versus loops or watch-mode churn
some users want Alembic always visible, others only when things fail
it is easy to default to xhigh for everything; suggest gives a lightweight local check before a costly implementation pass
when Alembic recommends a change, it gives you a clearer signal about whether to stay put or switch levels before continuing
the best place to use suggest is after Plan mode has shaped the work, then before you commit to "yes, implement"
alembic init lets you persist those expectations inside a repo so agents know when to reach for Alembic without relying on memory

Effort Advisor

Alembic also ships a local effort advisor for host-side integrations, but it is important to frame it correctly:

it is a task-effort heuristic, not a live thread inspector
it helps answer "what level should we probably run this at?" rather than "what level is the app definitely using right now?"
it is most useful when you want to avoid wasting tokens on high or xhigh before the task shape is clear
it is most reliable after Plan mode, once the task has been decomposed enough to judge whether implementation is still small, bounded, and reversible or has become broad, risky, and uncertain

alembic effort --requested xhigh "Rename a few files and update imports."
alembic effort --requested medium "Plan complete: implement the approved auth migration across API, DB, and rollout validation."
alembic effort --format json --requested low "After the plan, validate whether implementing the flaky auth race fix still fits low or should move higher."
alembic benchmark effort --set smoke

How it works:

it is local and deterministic
it scores engineering difficulty, not prompt length alone
it only asks for a change when the mismatch is meaningful
--requested should be the level you intend to use or the level the user tells you is currently set
low-confidence calls stay non-blocking
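The "meaningful mismatch" and "non-blocking" rules can be pictured as a decision step that runs after scoring. Alembic's scoring and thresholds are not documented here, so this sketch assumes an estimated level and a confidence value already exist; `advise`, `min_gap`, and `min_confidence` are hypothetical.

```python
from dataclasses import dataclass

EFFORT_LEVELS = ("low", "medium", "high", "xhigh")


@dataclass(frozen=True)
class Advice:
    action: str     # "keep" or "switch"
    level: str      # level to run at
    blocking: bool  # whether to pause and ask before continuing


def advise(requested: str, estimated: str, confidence: float,
           min_gap: int = 2, min_confidence: float = 0.7) -> Advice:
    """Recommend a switch only for a large mismatch; block only when confident."""
    gap = abs(EFFORT_LEVELS.index(estimated) - EFFORT_LEVELS.index(requested))
    if gap < min_gap:
        return Advice("keep", requested, blocking=False)  # small mismatch: not worth interrupting
    return Advice("switch", estimated, blocking=confidence >= min_confidence)
```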

Recommended workflow:

Use Plan mode first so the task stops being a vague request and becomes a concrete implementation shape.
Run alembic effort on the post-plan implementation summary, using the level you intend to run or the level the user explicitly gives you.
If Alembic recommends a cheaper or stronger level, decide whether you want to switch before continuing.
If the task is tiny and reversible, skip the check and just do the work.

Examples:

Small rename after planning:

alembic effort --requested xhigh "After plan review: rename AccountCard to ProfileCard and update imports."

This is the kind of task Alembic should usually push downward so you do not burn xhigh on a mechanical edit.

Big implementation after planning:

alembic effort --requested medium "After plan approval: implement the billing retry redesign, add migration, update workers, and validate rollback safety."

This is where Alembic can justify asking for high or xhigh before you say "yes, implement."

Current calibration goals:

long prompt alone does not mean xhigh
bounded read-only UI work caps at high
local async bugs with clear repro do not auto-jump to xhigh
xhigh is reserved for tasks that combine uncertainty with blast radius, coupling, statefulness, or hard validation
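These calibration goals read naturally as caps applied to a proposed level. A minimal sketch, assuming boolean task signals named after the goals; the signal names and `cap_level` are hypothetical, and prompt length is deliberately not an input.

```python
from dataclasses import dataclass

EFFORT_LEVELS = ("low", "medium", "high", "xhigh")


@dataclass
class TaskSignals:
    uncertain: bool
    blast_radius: bool
    coupled: bool
    stateful: bool
    hard_to_validate: bool
    read_only_ui: bool = False


def cap_level(proposed: str, signals: TaskSignals) -> str:
    """Lower a proposed level to the highest level the signals justify."""
    risky = (signals.blast_radius or signals.coupled
             or signals.stateful or signals.hard_to_validate)
    # xhigh needs uncertainty combined with at least one risk factor.
    cap = "xhigh" if signals.uncertain and risky else "high"
    if signals.read_only_ui:
        cap = "high"  # bounded read-only UI work caps at high
    # Never raise the proposal; only lower it to the cap.
    return min(proposed, cap, key=EFFORT_LEVELS.index)
```

A local async bug with a clear repro maps to `uncertain=False`, so it stays at or below high even if several risk flags are set.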

Current execution lessons:

the heuristic itself is useful
it should be described as a recommendation system, not as a mechanism that can verify the app's current effort level
prompt-only enforcement is not fully deterministic once an agent has already started exploring
if you want the highest real-world success rate, treat Plan mode as the primary checkpoint and everything else as a fallback

This is useful when your team tends to default to expensive reasoning for everything, or has the opposite problem of underpowering complex debugging and migration work.

Presets

Alembic ships focused output contracts for the most common noisy workflows:

Preset	What you get
`tests`	`PASS`, or `FAIL` plus failing test names
`actions`	failing GitHub Actions job, step, command, reason, exit code
`mcp`	failing MCP tool/server, target, error, and next step
`docker`	failing Docker step/service and smallest useful clue
`k8s`	failing pod/container, reason, and next clue
`safety`	`SAFE`, `REVIEW`, or `UNSAFE`, then only risky files and why
`files`	file paths only
`summary`	changed files with a one-line summary each
`logs`	highest-signal errors plus the smallest useful summary
`approval`	blocked action, reason, next step

Rules:

use --preset when one of these contracts already matches the job
use --question when you need a custom output shape
if both are present, --question wins
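The precedence rule can be sketched with Python's `argparse`. Only the preset names and the `--preset` and `--question` flags come from this README; `parse_output_flags` and the `default` fallback when neither flag is given are assumptions.

```python
import argparse
from typing import List, Optional, Tuple

PRESETS = ["tests", "actions", "mcp", "docker", "k8s",
           "safety", "files", "summary", "logs", "approval"]


def parse_output_flags(argv: List[str]) -> Tuple[str, Optional[str]]:
    """Return ("question", text), ("preset", name), or ("default", None)."""
    parser = argparse.ArgumentParser(prog="alembic-sketch")
    parser.add_argument("--preset", choices=PRESETS)  # unknown preset names are rejected
    parser.add_argument("--question")
    args = parser.parse_args(argv)
    if args.question:
        return ("question", args.question)  # a custom question wins over any preset
    if args.preset:
        return ("preset", args.preset)
    return ("default", None)
```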

Useful examples:

alembic --preset logs
alembic exec --preset tests -- npm test
gh run view --log | alembic --preset actions
cat mcp-trace.log | alembic --preset mcp
docker build . 2>&1 | alembic --preset docker --drop-level INFO,DEBUG
kubectl describe pod api-123 | alembic --preset k8s --keep-level ERROR,WARN
alembic benchmark diff --preset safety --base main
rg -n "TODO" . | alembic --preset files
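The `--drop-level` and `--keep-level` flags in these examples filter lines by severity. How Alembic detects a line's severity is not documented here; this sketch assumes the first severity word on each line decides (case-insensitive, with `WARNING` and `Warning` folded into `WARN`) and that lines without one survive `drop` but not `keep`. `filter_levels` is hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional

SEVERITY = re.compile(r"\b(TRACE|DEBUG|INFO|WARN|WARNING|ERROR|FATAL)\b", re.IGNORECASE)
ALIASES = {"WARNING": "WARN"}


def normalize(level: str) -> str:
    level = level.upper()
    return ALIASES.get(level, level)


def severity(line: str) -> Optional[str]:
    match = SEVERITY.search(line)
    return normalize(match.group(1)) if match else None


def filter_levels(lines: Iterable[str], keep: Optional[Iterable[str]] = None,
                  drop: Optional[Iterable[str]] = None) -> Iterator[str]:
    keep_set = {normalize(k) for k in keep} if keep is not None else None
    drop_set = {normalize(d) for d in drop} if drop is not None else set()
    for line in lines:
        level = severity(line)
        if level in drop_set:
            continue  # e.g. --drop-level INFO,DEBUG
        if keep_set is not None and level not in keep_set:
            continue  # e.g. --keep-level ERROR,WARN (unlabelled lines are dropped too)
        yield line
```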

Approval noise is another good fit:

Before:

Command failed in sandbox.
Network access is restricted.
This command requires escalated permissions.
Please re-run with sandbox_permissions=require_escalated.
Command failed in sandbox.
Network access is restricted.
This command requires escalated permissions.
Please re-run with sandbox_permissions=require_escalated.
npm install failed because registry access is blocked.

After:

BLOCKED npm install failed because registry access is blocked.
Reason: Network access is restricted.
Next: Please re-run with sandbox_permissions=require_escalated.
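The approval reduction de-duplicates repeated sandbox messages and then picks out three roles: the blocked action, the reason, and the next step. A minimal sketch, assuming simple keyword hints for each role; the hint lists and `reduce_approval` are hypothetical, and the real `approval` preset may classify lines differently.

```python
from typing import Iterable, Optional, Tuple

BLOCKED_HINTS = ("failed because", "blocked")
REASON_HINTS = ("restricted", "denied", "not permitted")
NEXT_HINTS = ("re-run", "retry", "approve")


def first_match(lines: Iterable[str], hints: Tuple[str, ...]) -> Optional[str]:
    return next((line for line in lines if any(h in line.lower() for h in hints)), None)


def reduce_approval(raw: str) -> str:
    # Drop blank lines and exact repeats while keeping first-seen order.
    lines = list(dict.fromkeys(line.strip() for line in raw.splitlines() if line.strip()))
    # The last blocked-looking line is usually the most specific one.
    blocked = first_match(reversed(lines), BLOCKED_HINTS) or (lines[0] if lines else "unknown action")
    out = [f"BLOCKED {blocked}"]
    reason = first_match(lines, REASON_HINTS)
    next_step = first_match(lines, NEXT_HINTS)
    if reason:
        out.append(f"Reason: {reason}")
    if next_step:
        out.append(f"Next: {next_step}")
    return "\n".join(out)
```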

Usage

Prefer exec when exit status matters:

alembic exec --level ash --preset tests -- npm test

Use pipeline mode when it is simpler:

git diff -- src | alembic --level ember --preset summary
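The reason to prefer `exec` is that a shell pipeline such as `npm test | alembic ...` reports the exit status of its last command (unless `set -o pipefail` is enabled in bash), so the test runner's failure status can be lost. A minimal sketch of the exec pattern, assuming stdout and stderr are captured together; `exec_and_reduce` is hypothetical.

```python
import subprocess
import sys
from typing import Callable, List, Optional


def exec_and_reduce(argv: List[str], reducer: Callable[[str], Optional[str]]) -> int:
    """Run argv, print the reduced combined output, and return the child's exit code."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    reduced = reducer(proc.stdout)
    if reduced:
        print(reduced)
    return proc.returncode  # the caller can exit with this instead of the reducer's own status


if __name__ == "__main__":
    status = exec_and_reduce(
        [sys.executable, "-c", "print('noise'); raise SystemExit(3)"],
        lambda out: f"reduced {len(out.splitlines())} line(s)",
    )
    print(f"exit status: {status}")
```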

MCP Tool Response Data

JSON responses from MCP tool calls are auto-detected. Pipe the response and describe what you want:

# Compact list of ID, title, and status
echo "$MCP_RESPONSE" | alembic --question "List each item: ID, title, status."

# Grouped by a field
echo "$MCP_RESPONSE" | alembic --question "List each Sentry issue: ID, title, status, level. Group by status."

# Filtered
echo "$MCP_RESPONSE" | alembic --question "List each Sentry issue: ID, title, status where status is unresolved or regressed."

If auto-detection misses a JSON payload, force data mode with --mode data:

echo "$MCP_RESPONSE" | alembic --mode data --question "List each item: ID, title, status."
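Auto-detection can miss when a JSON payload is prefixed by log text, which is when `--mode data` helps. A minimal sketch of one plausible check, assuming detection means "the trimmed text parses as JSON, or every non-empty line does (JSON Lines)"; Alembic's actual rule is not documented here and `looks_like_json` is hypothetical.

```python
import json


def _parses(text: str) -> bool:
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return True


def looks_like_json(text: str) -> bool:
    """Heuristic: a whole JSON document, or JSON Lines (one document per line)."""
    stripped = text.strip()
    if not stripped or stripped[0] not in "{[":
        return False  # leading log text defeats detection
    if _parses(stripped):
        return True
    lines = [line for line in stripped.splitlines() if line.strip()]
    return len(lines) > 1 and all(_parses(line) for line in lines)
```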

Other Useful Commands

alembic doctor
alembic benchmark exec --level flame --preset tests -- npm test
gh run view --log | alembic --level ember --preset actions
cat mcp-trace.log | alembic --level ember --preset mcp
docker build . 2>&1 | alembic --level ember --preset docker --drop-level INFO,DEBUG
kubectl describe pod api-123 | alembic --level ash --preset k8s --keep-level ERROR,WARN
alembic benchmark diff --level ember --preset safety --base main

Benchmark Your Own Workflow

Benchmark mode is the best way to decide whether Alembic belongs in your loop.

Use it whenever you want numbers instead of vibes:

alembic benchmark exec --preset tests -- npm test
alembic benchmark exec --preset actions -- gh run view --log
alembic benchmark diff --preset safety --base main
alembic benchmark effort --set smoke

What each benchmark proves:

benchmark exec: how much a real command shrinks
benchmark diff: how much branch or working-tree review noise shrinks
benchmark effort: whether the effort heuristic is still calibrated

These sample numbers come from real local runs:

Workflow	Raw tokens	Current tokens	Token save	Byte save
`npm test` -> `PASS`	1,257	2	99.8%	99.9%
Sentry top 50 issues	96,439	344	99.6%	99.5%
Sentry grouped by status	96,439	881	99.1%	99.4%
Sentry filtered statuses	96,439	244	99.7%	99.7%
First-commit diff summary	154,386	787	99.5%	99.5%

Why this section matters:

line count alone can lie, especially on JSONL-heavy MCP output
bytes and tokens show the real savings
it gives users a fast, concrete proof right after install

Validation

Local validation:

alembic doctor

Repo validation:

npm test
npm run check:generated
npm run release:validate

Install validation:

npm run install:codex
npm run install:claude
run alembic doctor from the repo or the installed bundle

Current boundaries:

Alembic is a skill, not a live MCP client
it reduces noisy shell output, agent output, and MCP-adjacent output after the host emits it
it has no provider or network dependency in the default path

Attribution

Alembic began with inspiration from Distill. What started as a small project built on that foundation slowly grew, iteration by iteration, into its own tool for reducing noisy agent workflows and making local validation easier to trust.

Big thanks to the original Distill project for the spark that helped this repo get started.
