io.github.wei-ai-lab/clinical-trial-design
Phase 2/3 clinical trial design via gsDesign / gsDesign2: sample size, GS boundaries, multiplicity
clinical-trial-design
A Claude Code plugin and MCP server for end-to-end clinical trial design.
clinical-trial-design helps biostatisticians and clinical trialists design Phase 2 and Phase 3 confirmatory studies through a conversational interface, backed by validated R packages (gsDesign, gsDesign2, graphicalMCP).
v0.0.13 (pre-beta). Nine MCP tools across single-primary endpoint design (binary, continuous, time-to-event under PH and four NPH frameworks), multi-hypothesis design (co-primary, multi-population, graphical multiplicity / Maurer-Bretz), Monte Carlo verification, and Word/PDF reporting. Reasoning-chain schema with sponsor-confidential redaction. The operational kernel solves the relations linking accrual, duration, and N, plus optional `max_n` / `max_duration` feasibility warnings. 288/288 R tests, 18/18 MCP smoke. Published to npm (clinical-trial-design) and the official MCP registry (io.github.wei-ai-lab/clinical-trial-design). Full change history in CHANGELOG.md; API contract in API_STABILITY.md.
What it is
clinical-trial-design has four layers:
| Layer | Role |
|---|---|
| R package (r-package/ClinicalTrialDesign) | Pure R statistical computation engine. Wraps and extends established packages (gsDesign, gsDesign2, graphicalMCP, simtrial) behind a unified result schema. |
| MCP server (mcp-server/) | Exposes the R functions as typed tools over the Model Context Protocol so Claude Code, or any MCP client, can call them. |
| Skill (skills/clinical-trial-design/) | Domain-expert prompt. Translates a user's design brief into the right tool calls and interprets results in clinical-trial terms; includes a 9-step Phase 3 orchestration workflow with waypoints. |
| Benchmark corpus (benchmarks/) | 176 curated public-trial designs across 21 family directories. Each case is human-readable Markdown plus machine-readable YAML with expected outputs and tolerances. |
Plus an eval harness under eval/ (11 reproducible scenarios × six scoring dimensions × multi-vendor Claude family) and an examples gallery under examples/ (5 published trials reproduced end-to-end).
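The "expected outputs and tolerances" idea behind the benchmark corpus can be pictured with a minimal Python sketch. The field names (`events`, `sample_size_total`), the numbers, and the relative-tolerance rule are assumptions made for illustration only; the real case layout is defined by the corpus's own YAML schema and may differ.

```python
def diff_against_expected(actual, expected, tolerance):
    """Return {field: (actual, expected)} for each field outside its relative tolerance."""
    failures = {}
    for field, want in expected.items():
        got = actual.get(field)
        rel_tol = tolerance.get(field, 0.0)  # no tolerance listed means exact match
        if got is None or abs(got - want) > rel_tol * abs(want):
            failures[field] = (got, want)
    return failures


# Hypothetical benchmark case and hypothetical tool result (not taken from the corpus).
case = {
    "expected": {"events": 190, "sample_size_total": 480},
    "tolerance": {"events": 0.05, "sample_size_total": 0.10},
}
result = {"events": 187, "sample_size_total": 540}

failures = diff_against_expected(result, case["expected"], case["tolerance"])
print(failures)
```

In this made-up case the event count is within 5% of the expected value, while the total sample size misses its 10% band and is reported as a failure.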
What clinical-trial-design actually computes
| Family | Status |
|---|---|
| Fixed-sample binary | ✅ super / NI / equivalence |
| Fixed-sample continuous | ✅ super / NI / equivalence |
| Fixed-sample TTE, PH log-rank | ✅ super / NI |
| Fixed-sample TTE, NPH MaxCombo / RMST / milestone | ✅ superiority |
| Group-sequential binary / continuous | ✅ super / NI; futility via test.type |
| Group-sequential TTE, PH | ✅ super / NI; events via Schoenfeld + OBF inflation by default (regulatory-defensible; events_calc selector) |
| Group-sequential TTE, NPH (MaxCombo / WLR / AHR) | ✅ superiority |
| Multi-hypothesis: co-primary endpoints | ✅ fixed-sequence / alpha-split / Bonferroni |
| Multi-hypothesis: multi-population (subgroup + ITT) | ✅ nested or disjoint relations |
| Multi-hypothesis: graphical (Maurer-Bretz) | ✅ initial weights + transition matrix + Rule-3 validator |
| Adaptive (SSR, enrichment, selection) | ⏳ corpus has cases; wrappers are roadmap |
| MAMS / platform / basket / umbrella | ⏳ corpus has cases; wrappers are roadmap |
Status
| Layer | State |
|---|---|
| R package | ✅ 288/288 testthat |
| MCP server | ✅ 9 tools over stdio, esbuild bundle, 18/18 smoke |
| Skill | ✅ 9-step Phase 3 orchestration workflow + waypoints |
| Benchmark corpus | ✅ 176 curated public-trial cases / 21 families |
| Plugin manifest | ✅ .claude-plugin/plugin.json + marketplace.json |
| npm package | ✅ clinical-trial-design@0.0.13 published 2026-04-29 |
| Official MCP registry | ✅ io.github.wei-ai-lab/clinical-trial-design |
| CI release-gate | ✅ .github/workflows/release-gate.yml (R tests + R CMD check + MCP build/smoke + scenario validation) |
| LLM benchmark harness | ✅ 11 scenarios × 6-dimension rubric (eval/) |
| Pre-beta hand-off | Items tracked in BETA_HANDOFF.md |
Tool surface
Nine MCP tools: three single-primary design tools, three multi-hypothesis design tools, and three meta tools. All families share the same unified result schema.
Single-primary design tools (3)
| Tool | Endpoint | Selectors | R backend |
|---|---|---|---|
| design_binary | event / no-event | design_class ∈ {"fixed", "group-sequential"} | gsDesign::nBinomial, gsDesign::gsDesign |
| design_continuous | mean difference | design_class ∈ {"fixed", "group-sequential"} | gsDesign::nNormal, gsDesign::gsDesign |
| design_survival | time-to-event | design_class × model ∈ {"ph", "maxcombo", "rmst", "milestone", "wlr", "ahr"} | gsDesign::nSurv/gsSurv (PH); gsDesign2::fixed_design_* and gs_design_* (NPH) |
All three accept comparison ∈ {"superiority", "non-inferiority", "equivalence"} (equivalence on fixed-sample binary / continuous only), alpha, power, sided, allocation_ratio, GS parameters (k, timing, sfu, sfl, test.type), an optional operational block, and an optional reasoning_chain array (citation trail with source_type tags).
design_survival adds events_calc ∈ {"schoenfeld" (default), "lachin-foulkes", "freedman"} for PH GS designs and accepts control_hazard_rate (events per patient-year) as an alternative to control_median.
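For orientation, the textbook Schoenfeld approximation and the exponential link between a constant hazard rate and a median can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the plugin's R code: the function names are hypothetical, the event count is the fixed-sample value (no group-sequential inflation is applied), and the inputs mirror two of the prompts in the Try it section.

```python
from math import log
from statistics import NormalDist


def schoenfeld_events(hr, alpha, power, sided=2, allocation_ratio=1.0):
    """Fixed-sample Schoenfeld event count for a log-rank test (no GS inflation)."""
    z = NormalDist().inv_cdf
    p = allocation_ratio / (1.0 + allocation_ratio)  # fraction randomized to treatment
    return (z(1 - alpha / sided) + z(power)) ** 2 / (p * (1 - p) * log(hr) ** 2)


def hazard_rate_to_median(rate):
    """Median time implied by a constant (exponential) hazard rate."""
    return log(2) / rate


# Oncology prompt: medians 11 vs 17 months, 2:1, two-sided 5%, 80% power.
events_os = schoenfeld_events(hr=11 / 17, alpha=0.05, power=0.80, sided=2, allocation_ratio=2.0)

# CVOT prompt: HR 0.80, 1:1, one-sided 2.5%, 90% power, 2.5% per patient-year.
events_cvot = schoenfeld_events(hr=0.80, alpha=0.025, power=0.90, sided=1)
median_years = hazard_rate_to_median(0.025)

print(round(events_os, 1), round(events_cvot, 1), round(median_years, 1))
```

The oncology value is the pre-inflation count, so a group-sequential design with spending boundaries would need somewhat more events. The CVOT value lands in the high hundreds, and the implied control median of roughly 28 years shows why supplying an annualized rate is more natural than a median for such trials.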
Multi-hypothesis design tools (3)
| Tool | When to use | Strategies | R backend |
|---|---|---|---|
| design_co_primary | Two or more co-primary endpoints (PFS+OS, CV death+HHF, mixed binary+continuous) | fixed-sequence (hierarchical, default), alpha-split (weighted), bonferroni | Per-endpoint dispatch to single-primary tools at the appropriate effective alpha; total N = max across endpoints |
| design_multi_population | Same endpoint tested across multiple populations (biomarker subgroup + ITT, nested PD-L1 strata) | Same three strategies; relation ∈ {"nested", "disjoint"} | Same per-population dispatch; for nested, total N driven by largest implied-enrolled-N (events / prevalence); for disjoint, total N is the sum |
| design_graphical_multiplicity | Multi-hypothesis with alpha recycling (Maurer-Bretz): mixed primary+secondary, dose-response | Graphical procedure with user-supplied initial weights and transition matrix; built-in Rule-3 validator | graphicalMCP::graph_create; per-hypothesis sample-size at worst-case alpha |
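The "appropriate effective alpha" and the total-N rules in the table can be illustrated with a short Python sketch. The function names and the weight-validation rule are assumptions for illustration; the sketch only restates the three strategies and the max / sum conventions described above, and does not cover the graphical procedure.

```python
def effective_alphas(alpha, n_hypotheses, strategy, weights=None):
    """Per-hypothesis alpha used for sizing under each strategy."""
    if strategy == "fixed-sequence":
        # Hierarchical testing: every hypothesis is sized at the full alpha.
        return [alpha] * n_hypotheses
    if strategy == "bonferroni":
        return [alpha / n_hypotheses] * n_hypotheses
    if strategy == "alpha-split":
        if weights is None or len(weights) != n_hypotheses:
            raise ValueError("alpha-split needs one weight per hypothesis")
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")
        return [alpha * w for w in weights]
    raise ValueError(f"unknown strategy: {strategy}")


def total_n(per_hypothesis_n, relation="nested"):
    """Max across hypotheses (co-primary, nested) or sum (disjoint populations)."""
    return sum(per_hypothesis_n) if relation == "disjoint" else max(per_hypothesis_n)


print(effective_alphas(0.025, 2, "fixed-sequence"))
print(effective_alphas(0.025, 2, "bonferroni"))
print(effective_alphas(0.025, 2, "alpha-split", weights=[0.2, 0.8]))
print(total_n([300, 450]), total_n([300, 450], relation="disjoint"))
```

For nested populations, the inputs to `total_n` are assumed to be the implied enrolled N per population, already scaled by prevalence.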
Meta tools (3)
| Tool | Purpose |
|---|---|
| validate_against_benchmark | Replay a benchmark case through its matching design tool and diff against expected values within tolerance. |
| verify_design | Monte Carlo cross-check of any result. Closed-form simulation under H0 and H1; ±2 pp power / ±0.5 pp Type I tolerance gate. Supports fixed and GS designs on binary, continuous, and PH-survival endpoints. |
| design_report | Render a clinician-readable design summary in markdown (default), Word (format="docx" via officer), or PDF (format="pdf" via rmarkdown + Pandoc). Reasoning chain rendered as a Word table; sponsor-confidential entries surface a redaction warning at the top of the document. |
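The simulation-plus-tolerance idea behind verify_design can be sketched for a fixed two-arm binary design. This is a minimal Python illustration, not the tool's implementation: the pooled z-test, the function names, the 460-per-arm design, and the replicate count are all assumptions chosen to keep the example small.

```python
import random
from math import sqrt
from statistics import NormalDist


def within_gate(simulated, nominal, tol):
    """True when a simulated rate is within +/- tol of its nominal value."""
    return abs(simulated - nominal) <= tol


def simulate_rejection_rate(p_control, p_treatment, n_per_arm, alpha=0.05,
                            n_sims=1000, seed=1):
    """Share of simulated trials in which a two-sided pooled z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        x_c = sum(rng.random() < p_control for _ in range(n_per_arm))
        x_t = sum(rng.random() < p_treatment for _ in range(n_per_arm))
        pooled = (x_c + x_t) / (2 * n_per_arm)
        se = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
        if se > 0 and abs(x_c - x_t) / n_per_arm / se > z_crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical design: 15% vs 9% event rate, 460 patients per arm.
power_hat = simulate_rejection_rate(0.15, 0.09, n_per_arm=460)   # under H1
type1_hat = simulate_rejection_rate(0.15, 0.15, n_per_arm=460)   # under H0

print(power_hat, within_gate(power_hat, 0.80, 0.02))
print(type1_hat, within_gate(type1_hat, 0.05, 0.005))
```

With only 1,000 replicates the Monte Carlo standard error is of the same order as the ±0.5 pp Type I tolerance, so a gate this tight needs far more replicates before a pass or fail is meaningful.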
Operational kernel
Every endpoint design tool accepts an operational block that solves the simple relations accrual_rate × accrual_duration = sample_size_total and total_trial_duration = accrual_duration + follow_up_duration (plus target_events = sample_size_total × cumulative_event_rate(...) for survival, via uniroot over the closed-form pooled exponential-PH event probability, the same kernel gsDesign::nSurv uses internally).
Supply any 0 to 4 of {accrual_rate, accrual_duration, follow_up_duration, total_trial_duration} plus optional caps {max_n, max_duration}. The solver fills in the missing values with an audit trail (given, derived); cap violations surface as structured feasibility_warnings rather than silent over-cap designs.
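The two non-survival relations can be sketched as a small fill-in-the-blanks solver. This Python sketch is an assumption-laden illustration: the function name, the order in which relations are applied, and the warning text are hypothetical, it does not check over-determined inputs for consistency, and it omits the survival event-count relation entirely. Only the keys `given`, `derived`, and `feasibility_warnings` come from the description above.

```python
def solve_operational(sample_size_total, accrual_rate=None, accrual_duration=None,
                      follow_up_duration=None, total_trial_duration=None,
                      max_n=None, max_duration=None):
    """Fill in missing operational values from N and whichever inputs were given."""
    v = {
        "accrual_rate": accrual_rate,
        "accrual_duration": accrual_duration,
        "follow_up_duration": follow_up_duration,
        "total_trial_duration": total_trial_duration,
    }
    given = {k: x for k, x in v.items() if x is not None}
    derived = {}

    def fill(name, value):
        v[name] = value
        derived[name] = value

    # total = accrual + follow-up, solved for accrual_duration when possible
    if (v["accrual_duration"] is None and v["total_trial_duration"] is not None
            and v["follow_up_duration"] is not None):
        fill("accrual_duration", v["total_trial_duration"] - v["follow_up_duration"])
    # rate * duration = N, solved for whichever side is missing
    if v["accrual_duration"] is None and v["accrual_rate"] is not None:
        fill("accrual_duration", sample_size_total / v["accrual_rate"])
    if v["accrual_rate"] is None and v["accrual_duration"] is not None:
        fill("accrual_rate", sample_size_total / v["accrual_duration"])
    # total = accrual + follow-up, solved for the remaining unknown
    if v["accrual_duration"] is not None:
        if v["total_trial_duration"] is None and v["follow_up_duration"] is not None:
            fill("total_trial_duration", v["accrual_duration"] + v["follow_up_duration"])
        elif v["follow_up_duration"] is None and v["total_trial_duration"] is not None:
            fill("follow_up_duration", v["total_trial_duration"] - v["accrual_duration"])

    warnings = []
    if max_n is not None and sample_size_total > max_n:
        warnings.append(f"sample_size_total {sample_size_total} exceeds max_n {max_n}")
    if (max_duration is not None and v["total_trial_duration"] is not None
            and v["total_trial_duration"] > max_duration):
        warnings.append(
            f"total_trial_duration {v['total_trial_duration']} exceeds max_duration {max_duration}")
    return {"given": given, "derived": derived, "feasibility_warnings": warnings}


print(solve_operational(1000, accrual_rate=80, follow_up_duration=3, max_n=1000))
```

With N = 1,000, 80 patients/month, and 3 months of follow-up, the sketch derives a 12.5-month accrual and a 15.5-month total duration with no warning; raising N above the cap produces one warning entry instead of a silently infeasible design.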
Quick start
Prerequisites: R ≥ 4.2, Node ≥ 18. No npm install step (the MCP server ships pre-bundled in mcp-server/dist/index.js) and no remotes::install_local step (the launcher sources r-package/ClinicalTrialDesign/R/*.R directly out of the plugin cache).
1. Install CRAN dependencies (one-time)
R -e 'install.packages(c("gsDesign","gsDesign2","graphicalMCP","jsonlite","officer","rmarkdown"))'
The first four are runtime imports; officer and rmarkdown are Suggests: and only needed for design_report(format="docx") / format="pdf".
Tested dependency versions
clinical-trial-design v0.0.13 was developed and tested against the versions below. CRAN's latest is usually fine; pin to these floors only if you hit a version-skew issue.
| Layer | Dependency | Tested version |
|---|---|---|
| R runtime | R | 4.5.3 (works on R ≥ 4.2) |
| R imports | gsDesign | 3.9.0 |
| | gsDesign2 | 1.1.8 |
| | graphicalMCP | 0.2.9 |
| | jsonlite | 2.0.0 |
| R suggests | officer | 0.6.x (for design_report(format="docx")) |
| | rmarkdown | 2.20+ (for design_report(format="pdf"); Pandoc system dep) |
| | simtrial | 1.0.2 (for verify_design Monte Carlo) |
| | rpact | 4.4.0 |
| | yaml | 2.3.12 |
| | testthat | 3.3.2 |
| Node runtime | Node | 22.22.1 (works on Node ≥ 18) |
| Node bundled | @modelcontextprotocol/sdk | ^1.0.0 (inlined in dist/index.js) |
| | zod | ^3.23.0 (inlined) |
2. Install the plugin
Method A: slash commands (recommended, inside Claude Code)
/plugin marketplace add wei-ai-lab/clinical-trial-design
/plugin install clinical-trial-design@wei-ai-lab
After install, restart Claude Code so it loads the bundled MCP server. Confirm with /plugin (clinical-trial-design should be listed and enabled at version 0.0.13).
Method B: host shell (equivalent, scriptable)
claude plugin marketplace add wei-ai-lab/clinical-trial-design
claude plugin install clinical-trial-design@wei-ai-lab
claude plugin list # confirm: clinical-trial-design@wei-ai-lab, version 0.0.13, enabled
If anything goes wrong, claude plugin validate /full/path/to/clinical-trial-design will tell you whether the marketplace + plugin manifests parse cleanly.
Quick local-dev alternative β skip the marketplace step and load the plugin directly from a checkout:
git clone https://github.com/wei-ai-lab/clinical-trial-design ~/clinical-trial-design
claude --plugin-dir ~/clinical-trial-design
Environment overrides
The MCP server auto-discovers Rscript in the usual locations (/opt/R/<version>/bin/, /usr/local/lib/R/bin, /usr/lib/R/bin, /usr/lib64/R/bin, /usr/local/bin, /usr/bin, /opt/homebrew/bin, /Library/Frameworks/R.framework/Resources/bin). For non-standard installs, override:
- `DESIGNR_RSCRIPT=/full/path/to/Rscript`: explicit path to your R binary.
- `DESIGNR_LAUNCHER=/full/path/to/launcher.R`: override the R launcher (rare).
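The lookup order described above can be pictured with a small Python sketch. It is an illustration only: the real server is a Node bundle, the function name is hypothetical, and the assumption that the `DESIGNR_RSCRIPT` override takes precedence over discovery (and is returned without further checks) is ours rather than something the documentation states.

```python
import glob
import os

# Directories listed in the documentation, after the versioned /opt/R/<version>/bin/ entries.
CANDIDATE_DIRS = [
    "/usr/local/lib/R/bin",
    "/usr/lib/R/bin",
    "/usr/lib64/R/bin",
    "/usr/local/bin",
    "/usr/bin",
    "/opt/homebrew/bin",
    "/Library/Frameworks/R.framework/Resources/bin",
]


def find_rscript(env=None, is_executable=None):
    """Return a path to Rscript, or None when nothing is found."""
    env = os.environ if env is None else env
    if is_executable is None:
        is_executable = lambda p: os.path.isfile(p) and os.access(p, os.X_OK)

    override = env.get("DESIGNR_RSCRIPT")
    if override:
        return override  # assumption: an explicit override wins over discovery

    # Versioned installs first, in reverse lexicographic order of directory name.
    versioned = sorted(glob.glob("/opt/R/*/bin"), reverse=True)
    for directory in versioned + CANDIDATE_DIRS:
        candidate = os.path.join(directory, "Rscript")
        if is_executable(candidate):
            return candidate
    return None


print(find_rscript())
```

The `is_executable` parameter exists only so the search order can be exercised without a real R install.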
Posit Workbench, RStudio Server, VS Code Remote, and other sandboxed Claude Code hosts
Claude Code in these environments doesn't always inherit your shell's environment when it spawns the MCP server. Set the env var in ~/.claude/settings.json (NOT just ~/.bashrc):
{
"env": {
"DESIGNR_RSCRIPT": "/opt/R/4.5.1/bin/Rscript"
}
}
As of v0.0.14, the auto-discovery walk usually finds Posit Workbench's managed installs at /opt/R/<version>/bin/Rscript automatically, so the env override is only needed for non-standard setups. If you still see rscript_spawn_failed, the error message now reports exactly which paths were checked.
(The DESIGNR_* prefix is preserved as a wire-format contract; see API_STABILITY.md.)
Standalone MCP server (without Claude Code)
The MCP server is published to npm as clinical-trial-design. Any MCP-aware client (Claude Desktop, Cursor, Continue, custom MCP host) can launch it via npx:
npx clinical-trial-design@latest
The package bundles the R sources under r/; the launcher resolves them via import.meta.url so it works from a global install, a local install, or npx. CRAN dependencies (above) still need to be in your R user library.
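For readers wiring up a custom MCP host, the shape of a tool invocation may help. The Python sketch below builds a JSON-RPC 2.0 `tools/call` request as defined by the Model Context Protocol. The tool name and the shared parameters come from the tool tables above; `p_control` and `p_treatment` are hypothetical argument names used for illustration, so check the tool's published input schema for the real ones.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = tools_call_request(1, "design_binary", {
    "design_class": "fixed",
    "comparison": "superiority",
    "alpha": 0.05,
    "power": 0.80,
    "sided": 2,
    "allocation_ratio": 1,
    "p_control": 0.15,    # hypothetical argument name
    "p_treatment": 0.09,  # hypothetical argument name
})

print(json.dumps(request, indent=2))
```

A real session starts with the protocol's `initialize` handshake before any `tools/call`; most clients handle that, and the message framing, through an MCP SDK rather than by hand.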
Updating
Method A: slash command (inside Claude Code):
/plugin update clinical-trial-design@wei-ai-lab
Method B: host shell:
claude plugin update clinical-trial-design@wei-ai-lab
Restart Claude Code after updating.
Try it
Five conversational prompts you can paste into Claude Code once the plugin is installed. Each demonstrates a v0.0.13 capability:
- Fixed binary superiority with reasoning chain (CAPTURE-style)

  "Design a Phase 3 trial for refractory unstable angina. Control 30-day event rate ≈ 15%, hoped-for treatment rate ≈ 9%, two-sided α = 0.05, power 80%, 1:1. Cite the precedent for the assumed effect size."

  Expect `design_binary(design_class = "fixed")` with N ≈ 1,000 and a populated `reasoning_chain` (the agent should tag the alpha as `fda_guidance`, the precedent-derived effect as `llm_precedent`).

- Group-sequential survival under PH, regulatory-default events

  "Phase 3 oncology, single-primary OS, 1L metastatic. Median 11 vs 17 mo (HR ≈ 0.65), 2:1 randomization, 5% two-sided, 80% power, three analyses at 50%, 75%, 100% information time, OBF spending. 25 patients/month accrual, 12-month minimum follow-up, 5%/year dropout."

  Expect `design_survival(model="ph", design_class="group-sequential")` with events ≈ 190 (Schoenfeld + OBF inflation), boundaries (2.96, 2.36, 2.01), and a Word/PDF report on follow-up if you ask for one. Pass `events_calc="lachin-foulkes"` if you want the v0.0.7 default behavior; `"schoenfeld"` is the new default and matches regulatory convention.

- CVOT with annualized event rate (v0.0.13's `control_hazard_rate`)

  "Cardiovascular outcomes trial. Control event rate is 2.5% per patient-year, target HR 0.80, 1:1, 2.5% one-sided, 90% power, fixed-sample. We'll enroll 200 patients/month and need at least 12 months of follow-up after the last enrollment."

  Expect `design_survival` to accept `control_hazard_rate = 0.025` directly (no need to translate to a median first), use the `operational` block to solve duration, and report an events count in the high hundreds.

- Co-primary PFS + OS, hierarchical (KEYNOTE-189-style)

  "Phase 3 1L NSCLC. Co-primary PFS and OS, hierarchical (PFS first, then OS). 2:1 randomization. PFS HR 0.50, control median 4.7 mo. OS HR 0.70, control median 17 mo. 80% power per endpoint, α = 0.025 one-sided. Plan a 20-month accrual, 12-month minimum follow-up for PFS / 24 months for OS. Report a Word document at the end."

  Expect `design_co_primary` with `strategy = "fixed-sequence"`. OS will drive the total N. Both endpoints sized at full alpha = 0.025 (NOT alpha-split). Final tool call to `design_report(format = "docx")`.

- Operational kernel + feasibility warning

  "For prompt 1 above, we can enroll 80 patients/month with at least 3 months follow-up, and we can't go above 1,000 patients total."

  Expect the operational block to derive `accrual_duration` ≈ 12.5 months, total study duration ≈ 15.5 months, plus a `feasibility_warnings` entry on the result if N exceeds the 1,000 cap (it doesn't quite; N should land around 960). For a violation case, ask for power 90% with the same constraints and watch the warning surface.
For an end-to-end reproducible example, see examples/:
- `01_capture_binary`: binary fixed superiority
- `02_paradigm_hf_survival`: TTE PH fixed
- `03_keynote024_maxcombo`: TTE NPH MaxCombo
- `04_keynote189_co_primary`: co-primary hierarchical
- `05_keynote042_multi_population`: nested PD-L1 strata
Each is a runnable run.R plus a narrative README.md. The full 18-prompt smoke matrix is in mcp-server/SMOKE.md.
Roadmap
In priority order based on the corpus's family weights and current LLM-benchmark gaps:
- NPH evaluation step: pharma-skills' workflow gate where a PH design is followed by an NPH-evaluation pass that reports power under both PH and NPH assumptions. Currently NPH is a model selector, not an eval step.
- Piecewise control hazard for `design_survival` (currently scalar exponential only).
- `verify_design` for NPH GS designs (`maxcombo` / `wlr` / `ahr` group-sequential).
- Adaptive sample-size re-estimation (corpus: `adaptive-ssr/`): `rpact::getSampleSizeRates` + Promising-Zone rule.
- Adaptive treatment selection / population enrichment (corpus: `adaptive-selection/`, `adaptive-enrichment/`).
- MAMS (corpus: `mams/`): `MAMS::mams` or `rpact::getDesignMams`.
- Recurrent events (corpus: `recurrent-events/`).
- Count / rate endpoints (corpus: `count-rate/`).
- Bayesian designs (corpus: `bayesian/`).
- Platform / basket / umbrella (corpus: `platform/`, `basket/`, `umbrella/`).
Each item above already has ≥ 7 curated benchmark cases ready as regression anchors. See BETA_HANDOFF.md for items pending before the v0.5.0 beta tag.
Related work
RConsortium/pharma-skills is a complementary R Consortium working group skill collection focused on survival group-sequential designs with deep multi-hypothesis support and a Word-report deliverable backed by a Python template. As of v0.0.8, clinical-trial-design ships its own multi-hypothesis tools (design_co_primary, design_multi_population, design_graphical_multiplicity) covering hierarchical alpha control, biomarker subgroup + ITT patterns, and Maurer-Bretz alpha recycling.
The two projects still solve adjacent problems with different shapes: clinical-trial-design is broad and MCP-native (validated tools across the gsDesign / gsDesign2 / graphicalMCP surface, no local R session needed; cost-cheap because the agent doesn't reload skill content per turn), while pharma-skills runs in the user's local R session and requires an lrsim() simulation pass before declaring a design done.
clinical-trial-design's verify_design adopts the same simulation-verification convention (±2 pp power / ±0.5 pp Type I tolerance) so a design produced here can be subjected to the same credibility floor.
Contributing
clinical-trial-design welcomes contributions from both human biostatisticians and AI agents. Two entry points:
- AGENTS.md: codebase tour and conventions written for AI-agent contributors. Covers the four-layer architecture, a concrete walkthrough of adding a new design wrapper, the benchmark anchor schema, the agent-contributor protocol, and the reasoning-chain conventions.
- CONTRIBUTING.md: human-facing process, including the priority list, PR checklist, and review expectations.
The highest-impact contribution is a new benchmark anchor β see .github/ISSUE_TEMPLATE/add-benchmark-case.yml for the machine-fillable template that mirrors benchmarks/schema/design.schema.json.
Trust boundary and hosting
- SECURITY.md: clinical-trial-design's statelessness as a design property. The R package and MCP server are CI-gated against disk writes and network calls (`.github/workflows/security-grep.yml`). Any PR introducing forbidden patterns (`writeLines`, `saveRDS`, `download.file`, `httr::`, `fs.writeFile`, `fetch`, `http.request`, …) fails before merge. Confidential trial inputs you give the agent never leave your conversation through the plugin.
- HOSTING.md: three deployment profiles, namely small-co. on public Claude Code (no persistence wanted), large-enterprise on Claude Code Enterprise + Bedrock private endpoint (corporate transcript retention as audit log), and air-gapped (forthcoming). Persistence and audit are host concerns; the plugin stays the same.
- API_STABILITY.md: what's frozen (MCP tool names, result-JSON top-level shape, `source_type` enum, error-class names) vs flexible (tool descriptions, defaults, internal helpers).
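The forbidden-pattern gate can be pictured as a simple source scan. The Python sketch below is a rough illustration, not the contents of the CI workflow: it uses only the patterns named above (the documented list is elided, so it is incomplete), and plain substring matching is an assumption, since the real check may use regular expressions or path filters.

```python
# Patterns named in the documentation; the published list continues beyond these.
FORBIDDEN_PATTERNS = [
    "writeLines", "saveRDS", "download.file", "httr::",
    "fs.writeFile", "fetch", "http.request",
]


def find_forbidden(source_text):
    """Return (line_number, pattern) for every forbidden pattern found."""
    hits = []
    for line_no, line in enumerate(source_text.splitlines(), start=1):
        for pattern in FORBIDDEN_PATTERNS:
            if pattern in line:
                hits.append((line_no, pattern))
    return hits


# Hypothetical R snippet: the second line writes to disk and should be flagged.
sample = 'x <- gsDesign::nBinomial(p1 = 0.15, p2 = 0.09)\nsaveRDS(x, "design.rds")\n'
hits = find_forbidden(sample)
print(hits)
```

A CI job built on this idea would fail the build whenever the list of hits is non-empty.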
License
Apache License 2.0, covering all R code, the MCP server, skill content, and the benchmark corpus.
