DevPlanner
A human + AI project management platform where every piece of work lives as a plain-text file on disk. Teams and AI coding agents share the same board, read and write the same Markdown cards, and both see live updates: no proprietary database, no sync conflicts, no silos.
Built for the era where humans and AI agents collaborate on the same codebase, DevPlanner gives both sides a transparent, version-controllable surface for planning, tracking, and shipping software.
Why DevPlanner?
Modern development increasingly involves AI agents (tools like Claude Code, GitHub Copilot, and custom automation) working alongside human developers. Most project management tools are built exclusively for humans: they hide data behind opaque APIs, require GUIs to interact with, and make it hard for agents to read or update task state.
DevPlanner is designed from the ground up for human + agent collaboration:
- Plain text, always: Cards are `.md` files with YAML frontmatter. Any tool (VS Code, `grep`, Claude Code, a shell script) can read and write them natively.
- A live Kanban board: Humans get a visual drag-and-drop UI with real-time updates, rich card editing, and inline diff viewing.
- MCP-native for AI agents: A Model Context Protocol server gives AI assistants structured access to every project, card, and task without screen-scraping.
- Git-backed artifacts: Documents produced during development are tracked in git; stage, commit, and diff from inside the UI.
- Skill evaluation tooling: Built-in framework to evaluate and refine how well AI models follow the DevPlanner skill instructions.
Feature Highlights
| Feature | Description |
|---|---|
| Kanban Board | Drag-and-drop lanes with collapsible columns and lane focus mode for deep-dive card review |
| Doc Manager | Integrated Markdown viewer/editor with a hierarchical file browser for vault artifacts |
| Diff Viewer | Split-pane comparison with inline word highlights; git-aware mode switcher (All / Staged / Unstaged) |
| Git Integration | Per-file status dots, stage/unstage/discard/commit from the bottom bar without leaving the UI |
| Command Palette | Ctrl+K / Cmd+K cross-project search across titles, tasks, tags, descriptions, and links |
| MCP Server | 17 tools + 3 resources for AI agents (Claude Code, GitHub Copilot, etc.) |
| Skill Eval Framework | Bun-based toolchain to score how well LLMs follow DevPlanner skill instructions; supports testing across multiple models and comparing runs over time |
| Real-time Sync | WebSocket broadcasts card changes, moves, and task updates to all connected clients in real time |
| Docker | Single-port containerised deployment (UI + API on port 17103) |
| Card Dispatch | One-click dispatch of any card to Claude Code CLI or Gemini CLI; the agent works in an isolated git worktree, updates the board in real time via MCP, and auto-completes the card when done |
Purpose
DevPlanner bridges the gap between human developers and AI coding agents by providing a shared, transparent project management surface. Cards are Markdown files with YAML frontmatter, lanes are folders, and everything is version-controllable with git. The web UI provides a drag-and-drop Kanban board for visual management, while the file-based storage means any tool, from VS Code to Claude Code, can interact with the data natively.
Architecture
DevPlanner/
├── src/  # Backend (Bun + Elysia, port 17103)
│   ├── server.ts  # Elysia app setup, error handler, route registration
│   ├── mcp-server.ts  # MCP server entry point (stdio transport)
│   ├── routes/  # Thin route handlers
│   │   ├── projects.ts  # Project CRUD
│   │   ├── cards.ts  # Card CRUD, search
│   │   ├── tasks.ts  # Task add, toggle, edit, delete
│   │   ├── links.ts  # Card URL link CRUD
│   │   ├── artifacts.ts  # Vault artifact creation
│   │   ├── history.ts  # Per-project activity history
│   │   ├── activity.ts  # Cross-project activity feed
│   │   ├── stats.ts  # Project health metrics
│   │   ├── preferences.ts  # Workspace preferences
│   │   ├── backup.ts  # Workspace backup
│   │   ├── search.ts  # Global and per-project search
│   │   ├── vault.ts  # Vault file CRUD (`GET /api/vault/content`, `PUT /api/vault/file`, `DELETE /api/vault/file`, `GET /api/vault/tree`)
│   │   ├── vault-git.ts  # Per-file git operations (status, stage, unstage, discard, commit, diff, show)
│   │   ├── config.ts  # Public config endpoint (`GET /api/config/public`)
│   │   └── websocket.ts  # WebSocket upgrade handler
│   ├── services/  # Core business logic and file I/O
│   │   ├── card.service.ts  # Card CRUD, move, reorder, search
│   │   ├── task.service.ts  # Checklist add/toggle/edit/delete (per-card mutex)
│   │   ├── link.service.ts  # Card URL link management
│   │   ├── vault.service.ts  # Obsidian vault artifact creation and file management
│   │   ├── git.service.ts  # Per-file git operations via porcelain v1 status + plumbing commands
│   │   ├── project.service.ts  # Project CRUD
│   │   ├── markdown.service.ts  # Frontmatter parsing, checklist manipulation
│   │   ├── history.service.ts  # In-memory activity event tracking
│   │   ├── history-persistence.service.ts  # History JSON file persistence
│   │   ├── websocket.service.ts  # WebSocket connection management, broadcasting
│   │   ├── file-watcher.service.ts  # External file change detection
│   │   ├── backup.service.ts  # Workspace backup to zip
│   │   ├── preferences.service.ts  # Workspace preferences (digestAnchor, etc.)
│   │   └── config.service.ts  # Singleton env var loader
│   ├── mcp/  # MCP server for AI agents
│   │   ├── tool-handlers.ts  # 17 tool implementations
│   │   ├── resource-providers.ts  # 3 resource providers
│   │   ├── schemas.ts  # JSON schemas for tool inputs
│   │   ├── errors.ts  # LLM-friendly error messages
│   │   └── types.ts  # MCP-specific type definitions
│   ├── types/  # Shared TypeScript interfaces
│   ├── utils/  # Slug generation, prefix utilities
│   └── seed.ts  # Seed data script
├── frontend/  # Frontend (React 19 + Vite + Tailwind CSS 4)
│   └── src/
│       ├── api/client.ts  # Typed fetch wrappers for all endpoints
│       ├── services/  # WebSocket client
│       ├── store/index.ts  # Zustand state management
│       └── components/  # React components (Kanban, card detail, diff viewer, doc manager)
├── tools/
│   └── skill-evals/  # Skill evaluation framework (Bun)
│       ├── run-eval.ts  # Single-skill eval runner
│       ├── run-eval-models.ts  # Model sweep runner
│       ├── compare-runs.ts  # Cross-run comparison report generator
│       └── lib/  # Scorer, reporter, LM client, parser
└── workspace/  # Project data directory (gitignored)
Data Storage
Project data lives in $DEVPLANNER_WORKSPACE (configurable via env var):
workspace/
└── my-project/
    ├── _project.json  # Project metadata, lane config, card ID prefix
    ├── _history.json  # Persisted activity history events
    ├── 01-upcoming/
    │   ├── _order.json  # Card display order for drag-and-drop
    │   └── feature-card.md  # Card: YAML frontmatter + Markdown body
    ├── 02-in-progress/
    ├── 03-complete/
    └── 04-archive/
Cards are .md files with YAML frontmatter for metadata (title, description, priority, assignee, tags, cardNumber, blockedReason) and standard Markdown checkboxes (- [ ] / - [x]) for task tracking. Each card has a unique cardId (e.g., DEV-42) composed from the project prefix and card number.
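To make the card format concrete, here is a minimal sketch of reading a card's task progress and composing its card ID. The helper names (`splitCard`, `countTasks`, `composeCardId`) and the sample field values are hypothetical illustrations, not DevPlanner's actual code, which uses gray-matter for frontmatter parsing.

```typescript
// Illustrative sketch only: helper names and sample values are assumptions,
// not taken from DevPlanner's source.
const sampleCard = `---
title: Feature card
priority: high
assignee: agent
cardNumber: 42
---
Implementation notes go here.

- [x] Write the route handler
- [ ] Add tests
`;

interface TaskProgress {
  done: number;
  total: number;
}

// Split a card into its raw frontmatter text and its Markdown body.
function splitCard(source: string): { frontmatter: string; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { frontmatter: "", body: source };
  return { frontmatter: match[1] ?? "", body: match[2] ?? "" };
}

// Count Markdown checkboxes: "- [ ]" is an open task, "- [x]" is a done task.
function countTasks(body: string): TaskProgress {
  let done = 0;
  let total = 0;
  for (const line of body.split(/\r?\n/)) {
    const match = /^\s*- \[( |x|X)\] /.exec(line);
    if (!match) continue;
    total += 1;
    if ((match[1] ?? " ") !== " ") done += 1;
  }
  return { done, total };
}

// Compose a card ID such as DEV-42 from the project prefix and card number.
function composeCardId(prefix: string, cardNumber: number): string {
  return `${prefix}-${cardNumber}`;
}
```

Because the format is plain text, the same counts could just as well be produced by `grep` or a shell script; the sketch only shows that no DevPlanner-specific tooling is needed to read a card.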
Tech Stack
| Layer | Technology |
|---|---|
| Runtime | Bun |
| Backend | Elysia |
| Markdown parsing | gray-matter |
| AI agent protocol | @modelcontextprotocol/sdk (MCP) |
| Frontend framework | React 19 |
| Build tool | Vite 6 |
| Styling | Tailwind CSS 4 (dark mode) |
| State management | Zustand |
| Drag-and-drop | @dnd-kit |
| Animations | Framer Motion |
| Markdown rendering | marked |
Getting Started
Prerequisites
- Bun (latest)
Setup
# Install backend dependencies
bun install
# Install frontend dependencies
cd frontend && bun install && cd ..
# Create workspace directory
mkdir -p workspace
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `DEVPLANNER_WORKSPACE` | Yes | - | Absolute path to the workspace directory |
| `PORT` | No | 17103 | Backend server port |
| `DEVPLANNER_BACKUP_DIR` | No | {workspace}/_backups | Directory for workspace backups |
| `DISABLE_FILE_WATCHER` | No | false | Set to true to disable file watching (useful for debugging WebSocket vs file watcher issues) |
| `ARTIFACT_BASE_URL` | No | - | Base URL prefix for vault artifact links (e.g. https://viewer.example.com/view?path=10-Projects). Required for vault artifact creation and the Diff Viewer. |
| `ARTIFACT_BASE_PATH` | No | - | Absolute path to the directory where artifact files are written. Required for vault artifact creation and the Diff Viewer. |
| `OLLAMA_BASE_URL` | No | - | Ollama API base URL for skill evaluation (e.g. http://localhost:11434) |
| `OLLAMA_MODEL` | No | - | Default Ollama model for skill eval runs (e.g. qwen2.5-coder:14b) |
| `OLLAMA_FEEDBACK_MODEL` | No | `$OLLAMA_MODEL` | Separate model for LLM feedback step in skill evals |
| `OLLAMA_CALL_DELAY_MS` | No | 500 | Milliseconds to pause between scenario calls in skill evals |
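The required/default rules in the table above can be sketched as a small loader. This is an illustration under stated assumptions, not the code in `config.service.ts`: the `DevPlannerConfig` shape, the `loadConfig` name, and the validation messages are all hypothetical.

```typescript
// Hypothetical config shape covering a few of the documented variables.
interface DevPlannerConfig {
  workspace: string;
  port: number;
  backupDir: string;
  disableFileWatcher: boolean;
}

// Apply the documented defaults; reject a missing or relative workspace path.
function loadConfig(env: Record<string, string | undefined>): DevPlannerConfig {
  const workspace = env["DEVPLANNER_WORKSPACE"];
  if (!workspace) {
    throw new Error("DEVPLANNER_WORKSPACE is required");
  }
  // Accept POSIX absolute paths and Windows drive paths.
  if (!/^(\/|[A-Za-z]:[\\/])/.test(workspace)) {
    throw new Error("DEVPLANNER_WORKSPACE must be an absolute path");
  }

  const rawPort = env["PORT"];
  const port = rawPort === undefined || rawPort === "" ? 17103 : Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT is not a valid port number: ${rawPort}`);
  }

  return {
    workspace,
    port,
    // Default documented as {workspace}/_backups.
    backupDir: env["DEVPLANNER_BACKUP_DIR"] ?? `${workspace}/_backups`,
    disableFileWatcher: env["DISABLE_FILE_WATCHER"] === "true",
  };
}
```

In a Bun or Node process this would typically be called as `loadConfig(process.env)`.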
Using a .env file
A .env.example file is included in the repository with all variables documented. Copy it to .env and edit the values before starting the server:
cp .env.example .env
# Edit .env; at minimum, set DEVPLANNER_WORKSPACE to an absolute path
Note: The `.env` file is gitignored. Never commit local paths or secrets. Bun loads `.env` automatically when you run any `bun run …` command.
Running
# Start both backend and frontend (with hot reload)
bun run dev
# Or start them separately in two terminals:
DEVPLANNER_WORKSPACE=$(pwd)/workspace bun run dev:backend
DEVPLANNER_WORKSPACE=$(pwd)/workspace bun run dev:frontend
# Seed sample data (3 demo projects)
DEVPLANNER_WORKSPACE=$(pwd)/workspace bun run seed
# Run tests
bun test
# Run E2E demo (exercises real-time features; watch the UI!)
bun run demo:e2e
# Start MCP server (for AI agent integration)
DEVPLANNER_WORKSPACE=$(pwd)/workspace bun run mcp
# --- Skill Evaluation ---
# Run all devplanner skill eval scenarios
bun run eval:skill -- --skill .claude/skills/devplanner
# Run a model sweep across all configured Ollama models
bun run eval:models -- --skill .claude/skills/devplanner
# Compare all existing runs and generate a timeline report
bun run eval:compare -- --skill .claude/skills/devplanner
The frontend dev server proxies /api requests to the backend at http://localhost:17103.
Running with Docker
DevPlanner ships with a Dockerfile and docker-compose.yml for containerised use, ideal for self-hosting on a home server or making DevPlanner accessible across a local network or via Tailscale.
The Docker image builds the React frontend and serves it as static files from the Elysia backend. Both the UI and the API are available on a single port (17103), so no separate frontend server is needed.
Quick start
# 1. Copy and edit the environment file
cp .env.example .env
# Edit .env and set DEVPLANNER_WORKSPACE to an absolute path on the host, e.g.:
# DEVPLANNER_WORKSPACE=/home/alice/devplanner-workspace
# 2. Create the workspace directory if it doesn't exist
mkdir -p /home/alice/devplanner-workspace
# 3. Build and start (frontend is built during the Docker build)
docker compose up --build
Everything is then available at http://localhost:17103: the Kanban UI, the REST API, and the WebSocket endpoint.
Local network and Tailscale access
The backend binds to all network interfaces (0.0.0.0), so once the container is running you can access DevPlanner from any machine on the same local network or via Tailscale by replacing localhost with the host machine's local IP or Tailscale IP/hostname:
http://<tailscale-hostname>:17103
No additional proxy or VPN configuration is required when using Tailscale.
Port remapping
To expose DevPlanner on a different host port, set PORT in your .env:
# .env
PORT=8080
Or edit docker-compose.yml directly:
ports:
- "8080:17103" # DevPlanner accessible at :8080 on the host
Workspace persistence
The workspace directory is bind-mounted from the host (the value of DEVPLANNER_WORKSPACE in .env), so all project data persists across container restarts and rebuilds. You can also edit card files directly on the host and the file watcher inside the container will detect the changes in real time.
MCP Server (AI Agent Integration)
DevPlanner provides a Model Context Protocol (MCP) server that enables AI coding assistants like Claude Code and GitHub Copilot to directly interact with projects.
Running the MCP Server
# Start the MCP server with stdio transport
DEVPLANNER_WORKSPACE=$(pwd)/workspace bun run mcp
The server communicates via stdin/stdout and provides:
Card references: All tools and REST endpoints that accept a card slug (`:card` in URLs, `cardSlug` in MCP tools) also accept a card ID (e.g. `DEV-42`, `dev42`, `dev-42`). The API resolves IDs to the canonical slug automatically. IDs are matched case-insensitively by card number within the project.
Tools for project management:
- Core CRUD: `list_projects`, `get_project`, `create_project`, `list_cards`, `get_card`, `create_card`, `update_card`, `move_card`, `add_task`, `toggle_task`
- Smart/Workflow: `get_board_overview`, `get_next_tasks`, `batch_update_tasks`, `search_cards`, `update_card_content`, `get_project_progress`, `archive_card`
- Vault: `create_vault_artifact`, which writes a Markdown file to the Obsidian Vault and attaches it as a link
3 Resources for read-only access:
- `devplanner://projects` - List all projects
- `devplanner://projects/{slug}` - Project details with recent cards
- `devplanner://projects/{slug}/cards/{cardSlug}` - Full card details
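The card-reference rule described above (a slug, or an ID such as `DEV-42`, `dev42`, or `dev-42`) could be implemented roughly as follows. This is a sketch of the matching logic only; `parseCardId`, `resolveCardSlug`, and the `slugByNumber` lookup are hypothetical names, and the real resolver may treat ambiguous slugs differently.

```typescript
// Return the card number if `ref` looks like a card ID for this project prefix,
// or null if it should be treated as a slug. Matching is case-insensitive and
// the hyphen between prefix and number is optional.
function parseCardId(ref: string, prefix: string): number | null {
  const escapedPrefix = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const match = new RegExp(`^${escapedPrefix}-?(\\d+)$`, "i").exec(ref.trim());
  return match ? Number(match[1]) : null;
}

// Resolve a reference to a canonical slug. `slugByNumber` is an assumed
// in-memory index from card number to slug for one project.
function resolveCardSlug(
  ref: string,
  prefix: string,
  slugByNumber: Map<number, string>,
): string | null {
  const cardNumber = parseCardId(ref, prefix);
  if (cardNumber === null) return ref; // Not an ID, so assume it is already a slug.
  return slugByNumber.get(cardNumber) ?? null; // ID with no matching card.
}
```

For example, with a project prefix of `DEV` and card 42 stored as `feature-card.md`, all three ID spellings would resolve to the slug `feature-card`.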
Using with Claude Code
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "devplanner": {
      "command": "bun",
      "args": ["run", "mcp"],
      "cwd": "/absolute/path/to/DevPlanner",
      "env": {
        "DEVPLANNER_WORKSPACE": "/absolute/path/to/workspace"
      }
    }
  }
}
Then restart Claude Desktop. The AI will have access to all DevPlanner tools and can manage projects, cards, and tasks directly.
Env var note: The `env` block is additive: its entries are merged into the spawned process's inherited environment and override any inherited variables of the same name. The project's `.env` file is only auto-loaded by Bun when `cwd` resolves to the DevPlanner project root, so it's best to set required variables explicitly in the `env` block rather than relying on `.env` auto-loading.
Using with Remote Agents (HTTP Gateway)
If your agent runs on a separate machine, it cannot spawn a stdio process on a different host. The solution is to run supergateway as a sidecar on the DevPlanner server. It wraps the stdio MCP server and exposes it as an HTTP/SSE endpoint that any remote MCP client can reach.
Start the gateway
# Locally (reads env from the current shell / .env in project root)
bun run mcp:http
The gateway listens on port 17104 and exposes /mcp (Streamable HTTP transport) by default, or /sse when run in SSE mode.
Docker Compose (recommended)
The mcp-gateway service is included in docker-compose.yml and starts automatically alongside the main DevPlanner service:
docker compose up
Both services share the same workspace volume and environment variables from .env.
Hermes agent config
Hermes uses mcp.client.streamable_http internally, so connect to the /mcp endpoint:
mcp_servers:
  devplanner:
    url: "http://<your-server-hostname>:17104/mcp"
    enabled: true
Replace <your-server-hostname> with the server's Tailscale hostname, local IP, or domain name. No env block is needed; variables are injected on the server side.
The gateway runs in `--streamableHttp` mode by default. If you need SSE for another client (e.g. Claude Desktop connecting remotely), remove `--streamableHttp` from the command; the endpoint then becomes `/sse`.
See docs/features/mcp-http-gateway.md for full details and verification steps.
Example Agent Workflow
1. Agent: list_projects → discovers available work
2. Agent: get_next_tasks(assignee='agent') → finds uncompleted tasks
3. Agent: update_card(assignee='agent') → claims work
4. Agent: toggle_task(...) → reports progress as subtasks complete
5. Agent: move_card(targetLane='complete') → marks done
6. Agent: update_card_content(...) → adds implementation notes
All actions are tracked in DevPlanner's activity history and visible in the web UI.
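The six steps above can be written down as an ordered list of tool calls. The sketch below only builds that plan as data; it does not talk to a server. The tool names, `assignee`, `targetLane`, and `cardSlug` come from this document, while `projectSlug`, `taskIndex`, and `content` are assumed argument names that should be checked against the tool schemas in `src/mcp/schemas.ts`.

```typescript
// One planned MCP tool invocation: a tool name plus its arguments.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Build the example workflow for a single card.
// Argument names other than assignee, targetLane, and cardSlug are assumptions.
function planAgentWorkflow(
  projectSlug: string,
  cardSlug: string,
  taskIndexes: number[],
  notes: string,
): ToolCall[] {
  const card = { projectSlug, cardSlug };
  return [
    { name: "list_projects", arguments: {} },
    { name: "get_next_tasks", arguments: { assignee: "agent" } },
    { name: "update_card", arguments: { ...card, assignee: "agent" } },
    // One toggle per completed subtask, in the order given.
    ...taskIndexes.map((taskIndex) => ({
      name: "toggle_task",
      arguments: { ...card, taskIndex },
    })),
    { name: "move_card", arguments: { ...card, targetLane: "complete" } },
    { name: "update_card_content", arguments: { ...card, content: notes } },
  ];
}
```

Each entry has the `{ name, arguments }` shape that an MCP client's tool-call request expects, so an agent harness could iterate over the plan and send the calls one by one.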
Card Dispatch (AI Coding Agent Integration)
Card Dispatch lets you hand off any card directly to an AI coding agent from the board. The agent works in an isolated git worktree, updates the board in real time via the DevPlanner MCP server, and auto-moves the card to "Complete" when all tasks are done.
How It Works
- Open any card in the detail panel
- Click the Dispatch button (rocket icon) in the card header
- Choose an agent, optionally set a model, and click Dispatch
- The agent spawns in an isolated git worktree on branch `card/<card-slug>`
- Watch tasks check off in real time as the agent works
- Open View Output to see a live terminal feed of agent activity
- On success, the card moves to "Complete" automatically
Supported Agents
| Agent | Adapter Name | CLI | Install |
|---|---|---|---|
| Claude Code | claude-cli | claude | npm install -g @anthropic-ai/claude-code |
| Gemini CLI | gemini-cli | gemini | npm install -g @google/gemini-cli |
Setup
1. Configure the project's repository path
In Project Settings, set the Git Repository Path to the absolute local path of the git repository where the agent will work.
Or via API:
curl -X PATCH http://localhost:17103/api/projects/my-project \
-H "Content-Type: application/json" \
-d '{"repoPath": "/absolute/path/to/your/repo"}'
2. Install an agent CLI
# Claude Code CLI (requires ANTHROPIC_API_KEY)
npm install -g @anthropic-ai/claude-code
# Gemini CLI (requires GEMINI_API_KEY)
npm install -g @google/gemini-cli
3. Set API keys in your environment
# In your shell profile or .env file (never stored in DevPlanner data files)
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=...
4. Configure optional dispatch settings in .env
# Base directory for git worktrees (default: {os.tmpdir()}/devplanner-dispatch/)
DISPATCH_WORKTREE_BASE=/tmp/devplanner-dispatch
# Timeout in milliseconds (default: 1800000 = 30 minutes)
DISPATCH_TIMEOUT_MS=1800000
Completion Behavior
| Agent exit | All tasks done? | Result |
|---|---|---|
| Exit 0 | Yes | Card → 03-complete |
| Exit 0 | No | Tasks swept to done, then card → 03-complete |
| Exit 0 (blocked) | Partial | Card stays in-progress with "review" status |
| Non-zero | - | Card marked blocked with error reason; worktree kept for debugging |
| Gemini exit 53 (turn limit) | - | Card marked "needs review" |
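Read as a decision procedure, the table above might look like the following sketch. It is an interpretation, not DevPlanner's dispatch code: the `DispatchExit` shape, the `reportedBlocked` flag (how a "blocked" exit 0 is detected is not specified here), and the outcome labels are hypothetical. It also assumes the Gemini exit code 53 case takes precedence over the generic non-zero case.

```typescript
type Agent = "claude-cli" | "gemini-cli";

type Outcome =
  | "complete" // card moved to 03-complete
  | "swept-then-complete" // remaining tasks swept to done, then moved
  | "in-progress-review" // card stays in-progress with "review" status
  | "blocked" // card marked blocked, worktree kept
  | "needs-review"; // Gemini turn limit reached

// Hypothetical summary of how an agent run ended.
interface DispatchExit {
  agent: Agent;
  exitCode: number;
  reportedBlocked: boolean; // assumed signal for the "Exit 0 (blocked)" row
  tasksDone: number;
  tasksTotal: number;
}

function resolveOutcome(exit: DispatchExit): Outcome {
  // Assumed precedence: the Gemini turn-limit code is checked before generic failure.
  if (exit.agent === "gemini-cli" && exit.exitCode === 53) return "needs-review";
  if (exit.exitCode !== 0) return "blocked";
  if (exit.reportedBlocked) return "in-progress-review";
  return exit.tasksDone >= exit.tasksTotal ? "complete" : "swept-then-complete";
}
```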
Auto-PR
Check Auto-create Pull Request in the dispatch modal to run gh pr create automatically after a successful dispatch. Requires the gh CLI to be installed and authenticated on the host.
API Overview
All endpoints are under /api and return JSON. Interactive API documentation is available at /swagger when the server is running (auto-generated from route definitions via Scalar). The raw OpenAPI 3.0 JSON spec is served at /swagger/json.
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/projects` | List projects |
| `POST` | `/api/projects` | Create project |
| `PATCH` | `/api/projects/:slug` | Update project |
| `DELETE` | `/api/projects/:slug` | Archive project |
| `GET` | `/api/projects/:slug/cards` | List cards (?lane=, ?since=, ?staleDays=) |
| `POST` | `/api/projects/:slug/cards` | Create card |
| `GET` | `/api/projects/:slug/cards/:card` | Get card details |
| `DELETE` | `/api/projects/:slug/cards/:card` | Archive card |
| `PATCH` | `/api/projects/:slug/cards/:card` | Update card metadata |
| `PATCH` | `/api/projects/:slug/cards/:card/move` | Move card to lane |
| `POST` | `/api/projects/:slug/cards/:card/tasks` | Add checklist item |
| `PATCH` | `/api/projects/:slug/cards/:card/tasks/:index` | Toggle or edit task |
| `DELETE` | `/api/projects/:slug/cards/:card/tasks/:index` | Delete task |
| `POST` | `/api/projects/:slug/cards/:card/links` | Add a URL link to a card |
| `PATCH` | `/api/projects/:slug/cards/:card/links/:linkId` | Update a link |
| `DELETE` | `/api/projects/:slug/cards/:card/links/:linkId` | Delete a link |
| `POST` | `/api/projects/:slug/cards/:card/artifacts` | Write Markdown file to Obsidian Vault and attach as link |
| `GET` | `/api/projects/:slug/cards/search?q=` | Search cards by title, tasks, description (legacy) |
| `GET` | `/api/projects/:slug/search?q=` | Palette search: cards, tasks, descriptions, tags, assignees, links |
| `GET` | `/api/search?q=&projects=` | Global cross-project palette search |
| `PATCH` | `/api/projects/:slug/lanes/:lane/order` | Reorder cards in lane |
| `GET` | `/api/projects/:slug/stats` | Project health stats |
| `GET` | `/api/projects/:slug/history` | Activity history (?limit=, ?since=) |
| `GET` | `/api/activity` | Cross-project activity feed (?since=, ?limit=) |
| `GET` | `/api/preferences` | Get workspace preferences |
| `PATCH` | `/api/preferences` | Update preferences (incl. digestAnchor) |
| `POST` | `/api/backup` | Create workspace backup (zip) |
| `GET` | `/api/vault/content?path=` | Read raw vault artifact file content (requires ARTIFACT_BASE_PATH) |
| `PUT` | `/api/vault/file` | Write or overwrite a vault artifact file |
| `DELETE` | `/api/vault/file` | Delete a vault artifact file |
| `GET` | `/api/vault/tree` | List vault directory tree (requires ARTIFACT_BASE_PATH) |
| `GET` | `/api/vault/git/status?path=` | Single-file git status |
| `POST` | `/api/vault/git/statuses` | Batch git status for multiple files |
| `POST` | `/api/vault/git/stage` | Stage a file (git add) |
| `POST` | `/api/vault/git/unstage` | Unstage a file (git restore --staged, fallback git reset HEAD) |
| `POST` | `/api/vault/git/discard` | Discard unstaged changes (git restore --worktree) |
| `POST` | `/api/vault/git/commit` | Commit staged changes with a message |
| `GET` | `/api/vault/git/diff?path=&mode=working\|staged` | Raw unified diff output |
| `GET` | `/api/vault/git/show?path=&ref=` | File content at a git ref (HEAD, :0 to :3 for staged index, or 40-char commit SHA) |
| `GET` | `/api/config/public` | Public server configuration (artifactBaseUrl) |
| `WS` | `/api/ws` | WebSocket connection for real-time updates |
| `POST` | `/api/projects/:slug/cards/:card/dispatch` | Dispatch card to AI agent (Claude Code CLI or Gemini CLI) |
| `GET` | `/api/projects/:slug/cards/:card/dispatch` | Get active dispatch status for a card |
| `POST` | `/api/projects/:slug/cards/:card/dispatch/cancel` | Cancel an active dispatch |
| `GET` | `/api/projects/:slug/cards/:card/dispatch/output` | Get buffered agent output for an active dispatch |
| `GET` | `/api/dispatches` | List all currently active dispatches |
See SPECIFICATION.md for full API contracts, request/response schemas, and validation rules. For an interactive view, visit /swagger when the server is running.
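As a small illustration of how a client might address these endpoints, the sketch below builds the URL for the card-listing endpoint with its optional `lane`, `since`, and `staleDays` filters. The `buildListCardsUrl` name and the `ListCardsFilter` type are hypothetical, and the accepted value formats (for example, whether `lane` takes the folder name and what timestamp format `since` expects) are assumptions to verify against SPECIFICATION.md.

```typescript
// Optional query filters for GET /api/projects/:slug/cards.
interface ListCardsFilter {
  lane?: string; // assumed to be a lane folder name such as "02-in-progress"
  since?: string; // assumed to be a timestamp string
  staleDays?: number;
}

// Build the request URL; only filters that are set are added to the query string.
function buildListCardsUrl(
  baseUrl: string,
  projectSlug: string,
  filter: ListCardsFilter = {},
): string {
  const params: string[] = [];
  if (filter.lane !== undefined) params.push(`lane=${encodeURIComponent(filter.lane)}`);
  if (filter.since !== undefined) params.push(`since=${encodeURIComponent(filter.since)}`);
  if (filter.staleDays !== undefined) params.push(`staleDays=${filter.staleDays}`);

  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path = `/api/projects/${encodeURIComponent(projectSlug)}/cards`;
  return params.length > 0 ? `${base}${path}?${params.join("&")}` : `${base}${path}`;
}
```

The resulting string can be passed straight to `fetch`; the response is JSON, as with every endpoint under `/api`.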
Features
Current
Kanban & Cards
- Kanban Board UI: Drag-and-drop cards between lanes with collapsible lanes and animated transitions
- Lane Focus Mode: Click any lane header to enter focus mode, a full-height expanded card list with rich metadata; press Escape to exit
- Card Management: Create, edit, archive cards with Markdown content; description, `blockedReason` field, inline title/metadata editing
- Card IDs: Unique identifiers (e.g., `DEV-42`) with project-configurable prefix
- Card Links: Attach structured URL references to cards (docs, specs, tickets, repos) with kind classification
- Task Tracking: Checkbox-based task lists with progress visualization; per-task `addedAt`/`completedAt` timestamps; inline task editing and deletion
Doc Manager & Artifacts
- Doc Manager: Integrated Markdown viewer/editor with a full-screen bottom-drawer panel; hierarchical folder tree for navigating vault artifact directories; breadcrumb path display and "go up" navigation
- Vault Artifacts: Write Markdown files directly to an Obsidian Vault and auto-attach as card links; upload files from the card detail panel via `UploadLinkForm`
- Git Integration: Per-file git status tracking with color-coded dots (`clean` / `untracked` / `modified` / `staged` / `staged-new` / `modified-staged` / `ignored` / `outside-repo`); stage, unstage, discard, and commit from the bottom bar without leaving the editor; status refreshes immediately on save and file selection; batch status fetch for all visible files; configurable auto-refresh interval (5 to 300 s)
- Diff Viewer: Split-pane file comparison with syntax highlighting and inline word-level highlights for precise change spotting; left pane = older version, right pane = newer version; git-aware mode switcher shows available comparisons (All changes / Staged diff / Unstaged changes) based on the file's git state; quick-access diff buttons in the bottom bar; opens vault artifact files directly from card links; manual mode supports drag-and-drop, paste, and file picker
Search & Discovery
- Search & Filter: Command-palette overlay (`Ctrl+K` / `Cmd+K`) with real-time search across card titles, tasks, descriptions, tags, assignees, and links; keyboard navigation; type-filter tabs; scope toggle (this project / all projects); matched text highlighted in results
Infrastructure
- Real-time Sync: WebSocket infrastructure for live updates across all connected clients
- File Watching: Automatic detection of external file changes (e.g., edits made in VS Code or via AI agents)
- Activity History: Per-project history with `?since=` filter; cross-project `/api/activity` feed
- Project Stats: Health metrics endpoint (WIP count, backlog depth, completion velocity)
- Digest Checkpoint: `digestAnchor` preference for precise time-bounded agent queries
- Visual Indicators: Animated feedback for background changes
- Responsive Design: Mobile, tablet, and desktop layouts
- Preferences: Last-selected project and digest anchor persistence
- Backup: Workspace backup to zip via API
- MCP Server: Model Context Protocol integration for AI agents (17 tools, 3 resources)
- Docker: Containerized deployment with single-port serving (UI + API)
- Project Management: Multi-project support with card counts and per-project prefix configuration
Skill Evaluation Framework
- Skill Eval Runner (`bun run eval:skill`): A completion-only harness that sends SKILL.md plus scenario prompts to an Ollama model, parses the JSON plan of API calls, and scores it against expected criteria (endpoint, method, body fields, ordering, NEVER violations); saves a timestamped run folder with raw responses, scored results, and an HTML report
- Model Sweep (`bun run eval:models`): Runs the same eval across multiple Ollama models in sequence; produces a ranked comparison table; `--wait` flag to evict the prior model from the GPU before the next loads
- Run Comparison (`bun run eval:compare`): Generates a visual HTML timeline across all runs for a skill with color-coded pass rates, skill-snapshot diff markers, and model column; pinpoints when a SKILL.md edit helped or hurt
Planned (Post-MVP)
- User Attribution: Track who made each change (agent identity via `X-DevPlanner-Actor` header)
- WebSocket Smart Merge: Prevent edit interruptions when agents update cards being edited
- Multi-Agent Support: Coordinate multiple AI agents with claim/release, session registry
- Sub-task Support: Nested checklists with indentation
- Dashboard View: Project health metrics at a glance
Project Documents
| File | Purpose |
|---|---|
| BRAINSTORM.md | Design decisions, resolved questions, future ideas |
| SPECIFICATION.md | Technical spec: file formats, API contracts, component architecture |
| TASKS.md | Phased build plan with task checklist |
| tools/skill-evals/README.md | Skill Eval Framework: running evals, model sweeps, comparing runs, adding new skills |
