io.github.HomenShum/nodebench
260 MCP tools across 49 domains. AI Flywheel, quality gates, research, web scraping.
NodeBench AI
Entity intelligence for any company, market, or question.
Live: nodebenchai.com
npm: npx nodebench-mcp / npx nodebench-mcp-power / npx nodebench-mcp-admin
GitHub: HomenShum/nodebench-ai
Product
NodeBench is a research and reporting product built around five user-facing surfaces:
- Home = start quickly
- Chat = do the work
- Reports = reusable memory
- Nudges = return at the right moment
- Me = operator context and control
The core idea is simple:
Users do not just need a chatbot that answers once.
They need a system that can:
- take a question, file, URL, or prior thread
- search and synthesize with sources
- turn the run into a reusable artifact
- watch for meaningful change later
- improve the next run from what it learned
What Shipped
- five-surface web app across Home, Chat, Reports, Nudges, and Me
- typed search and reporting pipeline
- live SSE streaming with saved runtime state
- Convex-backed product state for sessions, reports, entities, nudges, files, and related objects
- shared-context handoff and delegation plumbing
- local and deployed server runtime for search, streaming, voice, and shared context routes
- `nodebench-mcp`, `nodebench-mcp-power`, and `nodebench-mcp-admin` distribution lanes
- builder-facing Oracle, dogfood, eval, replay, and control-plane infrastructure
Product At A Glance
USER SURFACES
-------------
Home -> start quickly
Chat -> answer, sources, trace, follow-ups
Reports -> reusable report memory
Nudges -> return loop
Me -> operator context, permissions, controls
BACKEND
-------
Convex tables and product state for sessions, reports, entities, nudges,
files, shared context, and evaluation artifacts
RUNTIME
-------
search pipeline
-> answer packet
-> saved report
-> tracked entity / tracked theme / follow-up task
-> nudge or prep brief
-> resumed chat or reopened report
COMPOUNDING LOOP
----------------
question
-> answer
-> saved report
-> watch item
-> useful nudge
-> better next run
DISTRIBUTION
------------
nodebenchai.com
nodebench-mcp
nodebench-mcp-power
nodebench-mcp-admin
Why This Design
NodeBench is designed around a few product realities:
- A useful answer should not disappear after one chat turn.
- Saved work should become reusable memory, not a dead archive row.
- The product should bring the user back only when something meaningful changes.
- The system should gradually learn how the user works without forcing a heavy onboarding flow.
- Operator context should improve future runs without turning the system into corporate-speak or fake-agreeable sludge.
That drives the current design:
- answer-first execution
- advisor mode by design via dynamic routing:
  - fast executive lane for routine work
  - deeper advisor lane for ambiguity, planning, and harder reasoning
  - similar in spirit to Claude Code's official `opusplan` split: stronger planning lane, cheaper execution lane
- saved artifacts as first-class objects
- visible sources and traceability
- a five-page loop instead of five unrelated tabs
- future `Harness v2` work focused on specification, operator context, and compounding behavior
Plain English:
NodeBench should not spend the most expensive reasoning path on every request.
It should move fast by default, then go deeper when the task, evidence, or user
request justifies it.
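As an illustration of that routing idea, here is a hedged sketch of a lane chooser. The signal names, types, and thresholds are assumptions made up for this example, not NodeBench's actual implementation:

```typescript
// Hypothetical routing signals; a real system would derive these
// from a cheap classifier pass over the request and its context.
type Lane = "executive" | "advisor";

interface RouteSignal {
  hasPriorThread: boolean;    // resuming known work favors the fast lane
  ambiguityScore: number;     // 0..1, how open-ended the request looks
  needsPlanning: boolean;     // multi-step or open-ended request
  evidenceConflicts: boolean; // sources disagree, worth deeper reasoning
}

function chooseLane(s: RouteSignal): Lane {
  // Default to the cheap executive lane; escalate only on clear signals.
  if (s.needsPlanning || s.evidenceConflicts) return "advisor";
  if (s.ambiguityScore > 0.6 && !s.hasPriorThread) return "advisor";
  return "executive";
}
```

The design point is the asymmetry: escalation requires a positive signal, so routine requests never pay for the expensive lane.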
The detailed implementation, verification, and evaluation plan for this mode is documented separately in the repo.
How The Five Pages Compound
NodeBench should not feel like five separate destinations.
The intended product behavior is:
Home
-> start quickly
Chat
-> do the work
-> create the first useful artifact
Reports
-> turn that artifact into reusable memory
Nudges
-> bring the user back when something important changes
Me
-> improve how the next run is handled
Next Home or Chat run
-> starts with more context than before
The shortest version of the compounding loop is:
question
-> answer
-> saved report
-> watch item
-> useful nudge
-> better next run
Plain-English artifact flow:
input
-> answer packet
-> saved report
-> tracked entity / tracked theme / follow-up task
-> nudge or prep brief
-> resumed report or resumed chat
-> user correction or confirmation
-> updated operator context
-> better next run
What each page contributes:
- Home starts the run with the least friction possible
- Chat creates the answer, sources, trace, entities, and next actions
- Reports turns those into a durable report the user can reopen, refresh, and reuse
- Nudges watches that report or entity and decides when it matters again
- Me stores the operator context that improves the next answer
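The compounding loop can be sketched as a typed state machine. Stage names mirror the artifact flow above; the types are illustrative, not NodeBench's actual schema:

```typescript
// Each artifact stage feeds the next; the end of one loop seeds the next run.
type Stage =
  | "input"
  | "answerPacket"
  | "savedReport"
  | "watchItem"
  | "nudge"
  | "resumedRun";

const NEXT: Record<Stage, Stage | null> = {
  input: "answerPacket",
  answerPacket: "savedReport",
  savedReport: "watchItem",
  watchItem: "nudge",
  nudge: "resumedRun",
  resumedRun: null, // a resumed run starts a fresh loop with more context
};

function advance(stage: Stage): Stage | null {
  return NEXT[stage];
}
```

Modeling the flow as explicit stages makes the product claim testable: every run should leave at least one artifact that a later stage can pick up.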
Current Legacy Infrastructure
NodeBench is not starting from zero. The repo already contains a substantial legacy stack that works today.
Current legacy foundation:
- five-surface web product
- Convex-backed canonical data layer
- local and deployed server runtime
- harness v1 planning and execution path
- shared-context handoff and delegation support
- MCP distribution lanes
- builder-facing evaluation and control-plane systems
What that means:
- the problem is not missing architecture
- the problem is product behavior, workflow compression, and clearer cross-surface compounding
Roadmap
The near-term goal is:
keep the working legacy foundation
remove accidental complexity
add specification-aware operator context
ship one clear compounding workflow
Main tasks still to finish:
- make `Home -> Chat -> Reports -> Nudges -> Me` behave like one continuous workflow instead of five adjacent surfaces
- turn harness v1 into the clearer v2 shape described in HARNESS_V2_PROPOSAL
- ship `Layer 0` operator context so the system can learn useful workflow patterns without forcing a heavy onboarding flow
- support permissioned transcript ingestion from NodeBench chats first, then optional external logs such as Claude Code JSONL transcripts for `nodebench-mcp`
- add style-drift guardrails so the system learns judgment and workflow without overfitting to corporate voice, filler, or sycophancy
- add anticipatory prep behavior so the system can prepare the user before important interactions, not only answer after the fact
- make saved reports behave like reusable memory, not storage
- make `Nudges` a real return loop with at least one working daily trigger
- make `Me` clearly improve future runs by exposing what context is being used and why
- finish the `nodebench-mcp` v3 cut-and-split plan so default runtime, power runtime, and admin runtime are clearly separated
- instrument real latency, real cost, real artifact completion, and real reuse across both web and MCP flows
- keep README, runtime behavior, and exposed tool counts in sync so the public story matches the actual system
- keep dogfood, eval, and builder-control infrastructure as internal leverage instead of letting it leak into the main user-facing product
Quick Start
Web app
Open nodebenchai.com and start in Home.
MCP
```shell
# Claude Code
claude mcp add nodebench -- npx -y nodebench-mcp

# Claude Code power lane
claude mcp add nodebench-power -- npx -y nodebench-mcp-power

# Claude Code admin lane
claude mcp add nodebench-admin -- npx -y nodebench-mcp-admin

# Cursor
npx nodebench-mcp --preset cursor

# Generic MCP client
npx nodebench-mcp
```
Local development
```shell
git clone https://github.com/HomenShum/nodebench-ai.git
cd nodebench-ai
npm install
cp .env.example .env.local

# Frontend + Convex + voice server
npm run dev

# Production build
npm run build
```
Architecture
nodebenchai.com (React + Vite + Tailwind)
|
Convex Cloud (sessions, reports, entities, nudges, files, product state)
|
server runtime + search pipeline + SSE
|
answer packet
|
saved report
|
tracked entities / watch conditions / nudges
|
future runs with better operator context
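The runtime streams results over SSE before they land as saved state. As a hedged sketch of consuming that stream, here is a minimal parser for the standard SSE wire format; the event names in the test (`token`, `done`) are assumptions for illustration, and the real NodeBench event schema may differ:

```typescript
// Minimal SSE frame parser: frames are separated by a blank line,
// with "event:" and "data:" fields per the SSE wire format.
interface SseEvent {
  event: string;
  data: string;
}

function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of chunk.split("\n\n")) {
    let event = "message"; // SSE default event type
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    // Frames without a data field are ignored (e.g. keep-alive comments).
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

Pairing a stream like this with Convex-backed persistence is what lets a run be resumed: the saved runtime state survives even if the SSE connection does not.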
Student Learning Lessons
The notebook and diligence stack in this repo are a good example of a common product engineering tradeoff:
- the best user experience is one notebook that feels continuous
- the safest current runtime is still layered and block-addressable underneath
For NodeBench, that means:
- `founder` is a trait and diligence block, not a permanent sixth tab
- diligence should use one generic pipeline, not many narrow `*Identify.ts` features
- the runtime should stay `scratchpad-first -> structuring pass -> deterministic merge`
- user-owned prose should feel local-first and calm while typing
- live agent output should arrive as overlays or decorations first, not as direct document mutations
- accepted agent output should become frozen, user-owned notebook content
- provenance should stay available, but secondary to the reading and writing flow
Why the notebook does not use one giant live editor model yet:
- collaboration is more reliable when the system can address bounded sections
- provenance, evidence, and contribution logs need stable attachment points
- background agent updates should not compete with user keystrokes
- deterministic section-level merge is easier to reason about than whole-page mutation churn
The practical rule in this repo is:
UX should feel monolithic.
Runtime should stay layered.
Typing should be local-first.
Agent output should be overlay-first.
Accepted output should become owned prose.
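The overlay-first acceptance rule above can be sketched in a few lines. The types and field names here are hypothetical, not the repo's actual schema:

```typescript
// A live agent suggestion, rendered as an overlay; it never mutates
// the document directly.
interface Overlay {
  blockId: string;
  text: string;
  source: string; // provenance reference, e.g. a report or run id
}

// User-owned notebook content: frozen on acceptance, with provenance
// kept available but secondary.
interface OwnedBlock {
  blockId: string;
  text: string;
  frozen: true;
  provenance: string;
}

// Accepting materializes a frozen snapshot; rejecting an overlay simply
// discards it, leaving the document untouched.
function acceptOverlay(overlay: Overlay): OwnedBlock {
  return {
    blockId: overlay.blockId,
    text: overlay.text,
    frozen: true,
    provenance: overlay.source,
  };
}
```

The one-way conversion is the point: once accepted, the block no longer tracks the live agent, so background updates cannot compete with user keystrokes.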
Current notebook refactor lessons:
- hide the block machinery from the reading path
- keep chrome quiet and move metadata to hover or focus
- isolate the notebook surface from page-level re-render churn
- favor one memoized notebook boundary over many inline object props
- treat live diligence as read-only reference overlay until the user accepts it
- when accepted, materialize a frozen notebook snapshot with explicit provenance
- anchor live overlays at the notebook surface, not inside the first editable row
- let Convex projection rows carry real source metadata so the UI is not forced to reconstruct trust state from prose alone
- use one generic projection producer for overlays: report save writes the same structured rows that page-load backfill and manual refresh re-run
- when moving beyond report-backed overlays, stream raw scratchpad only in a secondary rail and emit structured projection rows on checkpoint rather than dumping scratchpad prose into the notebook body
- if checkpoint structure comes from an LLM, keep it block-scoped and schema-bound: `scratchpad checkpoint -> JSON -> validation/repair -> deterministic fallback -> projection row`
- let the model structure intermediate JSON, but keep merge, persistence, and notebook ownership deterministic
- ship generic diligence primitives first, then block-specific renderers
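The checkpoint chain above can be sketched as one validation function: parse the LLM output, check it against a narrow shape, repair what can be repaired, and fall back deterministically otherwise. The `ProjectionRow` shape and field names are illustrative assumptions, not the repo's schema:

```typescript
// Target shape a checkpoint must satisfy before it becomes a projection row.
interface ProjectionRow {
  blockId: string;
  summary: string;
  confidence: number; // clamped to 0..1
}

function toProjectionRow(blockId: string, llmJson: string): ProjectionRow {
  // Deterministic fallback: unusable LLM output never reaches the notebook.
  const fallback: ProjectionRow = { blockId, summary: "(pending)", confidence: 0 };
  let parsed: unknown;
  try {
    parsed = JSON.parse(llmJson);
  } catch {
    return fallback;
  }
  if (typeof parsed !== "object" || parsed === null) return fallback;
  const obj = parsed as Record<string, unknown>;
  if (typeof obj.summary !== "string") return fallback;
  // Repair step: clamp out-of-range confidence instead of rejecting the row.
  const raw = typeof obj.confidence === "number" ? obj.confidence : 0;
  const confidence = Math.min(1, Math.max(0, raw));
  return { blockId, summary: obj.summary, confidence };
}
```

Because the function is total and pure, the merge and persistence layers downstream stay deterministic regardless of what the model emits.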
For students reading the code, the most relevant docs are indexed under Related Docs below.
The live notebook refactor is deliberately incremental:
- current shipped slices make the notebook feel more continuous and reduce per-keystroke render churn
- current shipped slices also move live diligence into notebook-surface overlays instead of seeded block-like records and freeze accepted snapshots
- the end state is one notebook experience with layered internals, not a raw block UI and not a brittle giant document runtime
Key tech
- Frontend: React, Vite, TypeScript, Tailwind CSS
- Backend: Convex
- Search: Linkup + Gemini extraction + grounding pipeline
- MCP server: Node.js + TypeScript
- Realtime runtime: SSE + Convex-backed persistence
API Keys
Set these in .env.local for local work or in Convex / Vercel for deployed
environments.
| Key | Required | Purpose |
|---|---|---|
| `GEMINI_API_KEY` | Yes | classification, extraction, synthesis |
| `LINKUP_API_KEY` | Recommended | web search and sourced answers |
| `VITE_CONVEX_URL` | Yes | Convex deployment URL |
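A minimal `.env.local` sketch for the keys above; the values are placeholders, and the Convex URL format is the usual `*.convex.cloud` deployment URL:

```
# Required
GEMINI_API_KEY=your-gemini-key
VITE_CONVEX_URL=https://your-deployment.convex.cloud

# Recommended for sourced web answers
LINKUP_API_KEY=your-linkup-key
```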
Codebase map
Top-3 levels, annotated. See ARCHITECTURE.md for the
pipeline diagram and docs/architecture/README.md
for the 13 canonical architecture docs.
```
nodebench-ai/
├── README.md              ← you are here
├── ARCHITECTURE.md        ← top-level pipeline diagram
├── CONTRIBUTING.md        ← contribution bar
├── CLAUDE.md              ← Claude Code conventions for this repo
├── AGENTS.md              ← agent methodology + eval bench
├── LICENSE                ← MIT
│
├── src/                   ← React frontend (Vite)
│   ├── features/          ← feature-first, 30 folders (Home · Chat · Reports · Nudges · Me · entities · agents · …)
│   │   └── <feature>/     ← views · components · hooks · lib · __tests__ (colocated)
│   ├── shared/            ← shared UI primitives, hooks, utils
│   ├── lib/               ← registry, analytics, error reporting
│   └── layouts/           ← shell + cockpit + public
│
├── server/                ← Node runtime (Express + MCP gateway)
│   ├── pipeline/          ← agent harness runtime + diligence blocks
│   ├── routes/            ← HTTP routes (search, harness, founder episodes)
│   ├── mcpGateway.ts      ← WebSocket MCP gateway
│   └── services/          ← shared services
│
├── convex/                ← Convex backend
│   ├── domains/           ← 19 domain folders (agents · product · research · founder · search · …)
│   ├── schema.ts          ← database schema (includes agentScratchpads)
│   └── crons.ts           ← scheduled jobs
│
├── packages/
│   ├── mcp-local/         ← the published nodebench-mcp npm package (MIT)
│   ├── mcp-client/        ← typed client SDK
│   └── convex-mcp-nodebench/ ← Convex-side MCP auditor
│
├── .claude/
│   ├── README.md          ← map of the .claude/ layout
│   ├── rules/             ← 31 modular rules with related_ cross-refs
│   ├── skills/            ← reusable how-to procedures
│   ├── agents/            ← subagent configs
│   └── commands/          ← custom slash commands
│
├── docs/
│   ├── README.md          ← docs tree map
│   ├── ONBOARDING.md      ← 30-minute new-contributor path
│   ├── architecture/      ← 13 canonical specs + plans/ + README index
│   ├── agents/            ← agent docs + bootstrap configs
│   ├── guides/            ← how-to for builders
│   ├── decisions/         ← ADRs
│   ├── changelog/         ← release notes
│   ├── product/           ← product decisions
│   ├── qa/                ← QA protocols
│   └── archive/           ← superseded content, provenance-only
│
├── tests/
│   ├── e2e/               ← Playwright end-to-end
│   └── fixtures/          ← shared fixtures
│
├── scripts/               ← dogfood, eval harness, one-offs
├── public/                ← static assets served by Vite + Vercel
└── vendor/                ← third-party references
```
Related Docs
Start here: docs/ONBOARDING.md · ARCHITECTURE.md · docs/architecture/README.md
The 13 canonical architecture docs are organized in 4 tiers. See docs/architecture/README.md for the indexed map:
- Tier 1 (core pipeline): AGENT_PIPELINE · DILIGENCE_BLOCKS · USER_FEEDBACK_SECURITY
- Tier 2 (sub-patterns): SCRATCHPAD_PATTERN · PROSEMIRROR_DECORATIONS · AGENT_OBSERVABILITY · SESSION_ARTIFACTS
- Tier 3 (features): FOUNDER_FEATURE · REPORTS_AND_ENTITIES · AUTH_AND_SHARING
- Tier 4 (cross-cutting): MCP_INTEGRATION · EVAL_AND_FLYWHEEL · DESIGN_SYSTEM
Historical specs are preserved in docs/archive/2026-q1/.
Product Suite
NodeBench AI = flagship user surface
nodebench-mcp = workflow lane
Attrition.sh = measured replay + optimization lane
Attrition is not a third flagship. It is the measurable optimization lane for the same NodeBench workflow.
License
MIT
