methods-mcp
MCP server for structured methods extraction + reproducibility heuristics on academic papers.
Lightweight, on-demand MCP server for structured methods extraction + reproducibility heuristics on academic papers. Built for the Worldwide AI Science Fellowship build challenge.
⚠️ Status: alpha (0.1.x). The tool surface and output shapes may shift between minor versions. Pin to an exact version in production. Bug reports very welcome via GitHub Issues.
Quick demo
$ uvx --from methods-mcp methods-mcp --version
methods-mcp 0.1.6
# In a Claude Code session:
> /mcp add methods-mcp methods-mcp
> Run methods_repro_review on https://arxiv.org/abs/2509.06917
→ tool: methods_repro_review({"input_str":"https://arxiv.org/abs/2509.06917"})
# Returns a MethodsReproReview object. Read `narrative` first; it explains
# everything else in plain English, so no tool-learning is required:
{
  "status": "ok",
  "narrative":
    "Resolved the paper: 'Paper2Agent' by Miao et al. (arxiv 2509.06917, "
    "2025-09-08). Extracted 11 methods steps at moderate self-reported "
    "confidence (0.72); the procedure is clearly described but hyperparameters "
    "and software versions are absent. Detected the associated code repository "
    "https://github.com/jmiao24/Paper2Agent from an inline link in the paper "
    "text (detection confidence 0.94). The repo scored 0.90/1.00 on the "
    "reproducibility heuristic; verdict: likely reproducible. Present signals: "
    "substantive README, dependencies file, notebooks, figure-plotting script, "
    "recent activity, permissive license. Missing: data/fixtures directory. "
    "Suggested entrypoint: `python make_figures.py`.",
  "metadata": { ... },          # PaperMetadata
  "methods": { ... },           # MethodsStructured (null if extraction failed)
  "code_repo": { ... },         # CodeRepo (null only if input unresolvable)
  "repro_assessment": { ... },  # ReproAssessment (null if no repo detected)
  "errors": []                  # [{step, error_type, message, hint}] on partial
}
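A minimal sketch of how a caller might consume this object once it has been parsed into a Python dict. The helper name `triage` and the sample payload (including its `error_type` value) are hypothetical and only illustrate the shape; the field names `status`, `narrative`, `errors`, and `hint` are the ones shown in the output above.

```python
from typing import Any


def triage(review: dict[str, Any]) -> list[str]:
    """Return readable lines: the narrative first, then one line per error."""
    lines = [f"[{review['status']}] {review['narrative']}"]
    for err in review.get("errors", []):
        # `hint` may be null (None) when no suggestion is available.
        hint = err.get("hint") or "no hint available"
        lines.append(f"{err['step']}: {err['error_type']}: {err['message']} ({hint})")
    return lines


# Hypothetical payload for illustration only; this is not real tool output.
sample = {
    "status": "partial",
    "narrative": "Resolved the paper, but the repository assessment failed.",
    "repro_assessment": None,
    "errors": [
        {
            "step": "assess_repo_reproducibility",
            "error_type": "RateLimitError",
            "message": "GitHub rate limit exceeded",
            "hint": "Set GITHUB_TOKEN to raise the limit.",
        }
    ],
}

for line in triage(sample):
    print(line)
```

Reading `narrative` before any sub-object keeps the caller independent of the sub-object shapes, which may shift while the package is in alpha.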
methods-mcp is a small, sharply-scoped Model Context Protocol server. It gives any AI agent (Claude Code, Claude Desktop, your Agent SDK script, etc.) eight tools that turn an academic paper URL into:
- canonical metadata,
- best-effort full text + section split,
- a Pydantic-validated structured methods object (steps / reagents / equipment / analyses),
- the paper's associated code repository (best-effort discovery),
- a no-execution-required reproducibility verdict for that repo, and
- a multi-mode summary.
The wedge: heavyweight pipelines like Paper2Agent (Stanford) take 30 minutes to hours to digest a paper into agent-ready tools. methods-mcp is the agent-callable, on-demand complement: every tool returns in seconds, no clone, no execution.
Install
uv add methods-mcp
# or, install globally:
uv tool install methods-mcp
# or, classic pip:
pip install methods-mcp
API keys
For best performance, set both:
| Variable | Required? | What you get without it |
|---|---|---|
| `ANTHROPIC_API_KEY` | Required for `extract_methods`, `summarize_paper`, `methods_repro_review` | Those tools raise `RuntimeError: ANTHROPIC_API_KEY not set`. Non-LLM tools (`fetch_paper_text`, `find_code_repo`, `assess_repo_reproducibility`) still work fine. |
| `GITHUB_TOKEN` | Optional but recommended for `assess_repo_reproducibility` / `methods_repro_review` | You're capped at the GitHub unauthenticated rate limit (60 req/hr per IP). Each repo assessment is ~3 calls, so you'll hit the ceiling after ~15–20 repos/hr. With a token: 5,000 req/hr (effectively unlimited). |
export ANTHROPIC_API_KEY=sk-ant-...
export GITHUB_TOKEN=ghp_... # optional but recommended
Neither key is logged or persisted; they're sent only to api.anthropic.com and api.github.com respectively. See SECURITY.md.
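A small preflight sketch based on the table above: it reports which tools would fail without `ANTHROPIC_API_KEY` and estimates the hourly repo-assessment budget from the quoted GitHub limits. The function names are hypothetical and not part of methods-mcp, and the figure of 3 calls per repo is the approximate number quoted above, so treat the result as a rough estimate.

```python
import os

# Tool -> environment variable it needs, per the table above.
REQUIRED_KEYS = {
    "extract_methods": "ANTHROPIC_API_KEY",
    "summarize_paper": "ANTHROPIC_API_KEY",
    "methods_repro_review": "ANTHROPIC_API_KEY",
}


def unavailable_tools(env: dict[str, str]) -> list[str]:
    """List the tools that would fail because their API key is unset or empty."""
    return sorted(tool for tool, key in REQUIRED_KEYS.items() if not env.get(key))


def repo_assessments_per_hour(env: dict[str, str], calls_per_repo: int = 3) -> int:
    """Estimate the hourly assessment budget from GitHub's rate limits."""
    limit = 5000 if env.get("GITHUB_TOKEN") else 60
    return limit // calls_per_repo


env = dict(os.environ)
print("unavailable tools:", unavailable_tools(env))
print("repo assessments per hour:", repo_assessments_per_hour(env))
```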
Use it from Claude Code
/mcp add methods-mcp methods-mcp
Then in any Claude Code chat:
Take https://arxiv.org/abs/2509.06917 and run methods_repro_review. Summarise what the paper does, the methods steps, and how reproducible the repo looks.
Use it from the Claude Agent SDK
import asyncio

from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient

# Register methods-mcp as a stdio MCP server and allow only the composite tool.
options = ClaudeAgentOptions(
    mcp_servers={
        "methods-mcp": {
            "type": "stdio",
            "command": "methods-mcp",
            "args": [],
        }
    },
    allowed_tools=["mcp__methods-mcp__methods_repro_review"],
)


async def main() -> None:
    async with ClaudeSDKClient(options=options) as client:
        await client.query(
            "Run methods_repro_review on https://arxiv.org/abs/2509.06917 "
            "and tell me whether the repo looks reproducible."
        )
        # Stream messages until the agent finishes its response.
        async for msg in client.receive_response():
            print(msg)


asyncio.run(main())
Tools
| Tool | What it does |
|---|---|
| `health` | Server liveness + config check. |
| `get_paper_metadata(input_str)` | Resolve URL / arXiv ID / DOI to canonical metadata. arXiv inputs hit the arXiv export API for title/authors/abstract. |
| `fetch_paper_text(input_str, prefer="auto"\|"html"\|"pdf")` | Full text + section split. Defaults to ar5iv HTML for arXiv papers (cheap, structured), PDF fallback otherwise. |
| `extract_methods(input_str, model=None)` | LLM-driven, Pydantic-validated structured methods extraction. Returns `{steps, reagents, equipment, analyses, confidence}`. |
| `find_code_repo(input_str)` | Discover the paper's code repo via paper text → abstract → Papers With Code. |
| `assess_repo_reproducibility(repo_url, paper_id=None)` | Heuristic, no-clone reproducibility assessment via the GitHub REST API. Weighted signals (README, deps, fixtures, notebooks, figure scripts, recent maintenance, license) → `{verdict, score, recommended_entrypoint}`. |
| `summarize_paper(input_str, mode="tldr"\|"abstract"\|"exec")` | LLM summary in three depths. |
| `methods_repro_review(input_str)` | Composite: metadata + methods + repo + repro in one call. |
All tools return Pydantic v2 models (validated, JSON-serialisable). See src/methods_mcp/schemas.py for the full type surface.
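MCP clients normally build tool calls for you, but it can help to see what goes over the wire. The sketch below serialises a `tools/call` request in the JSON-RPC 2.0 shape defined by the MCP specification, using tool and argument names from the table above. The helper `tools_call_request` is hypothetical and not part of methods-mcp.

```python
import json
from typing import Any


def tools_call_request(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialise an MCP `tools/call` JSON-RPC 2.0 request."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


print(
    tools_call_request(
        1,
        "fetch_paper_text",
        {"input_str": "https://arxiv.org/abs/2509.06917", "prefer": "html"},
    )
)
```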
Design notes
- `extract_methods` uses Anthropic tool-use to coerce the model into emitting an instance of the `MethodsStructured` Pydantic schema. On validation failure we send one repair message with the validation error and try again before raising.
- `assess_repo_reproducibility` does not clone or execute anything. It scores the repo from publicly-readable GitHub metadata + the recursive tree listing. This is the deliberate wedge against batch tools that try to actually rerun the paper.
- `fetch_paper_text` prefers ar5iv HTML over PDF parsing for arXiv papers. Falls back to `pypdf` for non-arXiv inputs.
- The default model is `claude-sonnet-4-6`. Override via `METHODS_MCP_MODEL` env var or per-call `model=` arg.
- `methods_repro_review` returns a self-describing response. Every call sets a top-level `status` (`"ok"` / `"partial"` / `"empty"`) and a `narrative` string that summarises everything retrieved in plain English, including every numeric score in context. A reader who reads only `narrative` + `status` gets the full picture without needing to learn the sub-object shapes. Sub-objects can be `null` when unavailable (e.g. `repro_assessment: null` on a paper with no detected repo; `status` stays `"ok"` because "no repo" isn't a failure). Failed sub-steps contribute a structured entry to `errors` with `{step, error_type, message, hint}`, where `hint` is an actionable plain-English suggestion for recognised patterns (missing API keys, rate-limits, 404s, timeouts, etc.) and `null` otherwise.
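The validate-then-repair-once behaviour described for `extract_methods` can be sketched as follows. This is not the project's code: `extract_with_repair`, `ValidationFailed`, and the two stand-in functions are hypothetical, and plain dicts replace the Pydantic schema so that the example runs with the standard library only.

```python
from typing import Any, Callable


class ValidationFailed(Exception):
    """Raised by the validator when the model output does not match the schema."""


def extract_with_repair(
    call_model: Callable[[list[str]], dict[str, Any]],
    validate: Callable[[dict[str, Any]], dict[str, Any]],
    prompt: str,
) -> dict[str, Any]:
    """Call the model; on a validation failure, retry once with the error attached."""
    messages = [prompt]
    try:
        return validate(call_model(messages))
    except ValidationFailed as exc:
        # Single repair attempt: feed the validation error back to the model.
        messages.append(f"Your previous output was invalid: {exc}. Please fix it.")
        return validate(call_model(messages))  # a second failure propagates


# Stand-ins for illustration (no network, no real model).
def fake_validate(obj: dict[str, Any]) -> dict[str, Any]:
    if "steps" not in obj:
        raise ValidationFailed("missing field 'steps'")
    return obj


def fake_model(messages: list[str]) -> dict[str, Any]:
    # Returns an invalid object first, then a valid one once a repair message exists.
    return {"steps": ["mix", "measure"]} if len(messages) > 1 else {}


print(extract_with_repair(fake_model, fake_validate, "Extract the methods."))
```

Capping the loop at one repair keeps latency bounded, which fits the "every tool returns in seconds" goal.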
Scores & verdicts explained
Tool outputs contain three numeric fields that look similar but mean very different things. They are triage signals for an agent deciding whether a paper is worth digging into, not calibrated claims about correctness.
| Field | Range | How it's computed | How to read it |
|---|---|---|---|
| `methods.confidence` | 0–1 | LLM self-report. The extractor model sets it per instructions in the system prompt: ≥0.8 only if the paper gives explicit reagents/volumes/equipment, ~0.3 if the methods section is sparse. Uncalibrated. | Soft signal for "is this a wet-lab paper with concrete procedure, or a sparse systems paper?" Useful as a flag; don't treat as a trust percentage. |
| `code_repo.confidence` | 0–1 | Varies by `detection_method`. `papers-with-code`: fixed 0.95 (authoritative paper→repo API). `paper-text`: computed as `0.6 + 0.2·(strong-phrase-present) + 0.015·score_margin`, capped at 0.95. `abstract-link`: fixed 0.85. `none`: 0.0. | Tells you how the repo was found and how decisively. High score + `paper-text` means a strong phrase like "code is available at …" sat next to the URL. |
| `repro_assessment.overall_score` | 0–1 | Weighted sum of 8 binary signals, all computed from the GitHub REST API (no clone, no execution): `has_readme` (0.10), `readme_substantial` (0.15), `has_dependencies_file` (0.20), `has_data_or_fixtures` (0.10), `has_notebook` (0.10), `has_figure_script` (0.20), `actively_maintained` (0.10), `permissive_license` (0.05). Each present signal contributes its weight. | The only fully-deterministic score of the three. Still a heuristic, not a proof: a high score means the repo looks well-structured for reproduction. For actual validation see Paper2Agent. |
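To make the two computed scores concrete, here is a sketch that re-derives them from the formulas in the table. The weights and constants are copied from the table; the function names, the rounding, and the treatment of `score_margin` as a plain number are assumptions of this sketch, not the package's implementation.

```python
# Weights copied from the table above; they sum to 1.00.
SIGNAL_WEIGHTS = {
    "has_readme": 0.10,
    "readme_substantial": 0.15,
    "has_dependencies_file": 0.20,
    "has_data_or_fixtures": 0.10,
    "has_notebook": 0.10,
    "has_figure_script": 0.20,
    "actively_maintained": 0.10,
    "permissive_license": 0.05,
}


def overall_score(signals: dict[str, bool]) -> float:
    """Sum the weights of the signals that are present."""
    total = sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name, False))
    return round(total, 2)  # rounding away float noise is this sketch's choice


def paper_text_confidence(strong_phrase_present: bool, score_margin: float) -> float:
    """0.6 + 0.2*(strong phrase present) + 0.015*score_margin, capped at 0.95."""
    raw = 0.6 + 0.2 * int(strong_phrase_present) + 0.015 * score_margin
    return round(min(raw, 0.95), 3)


# Every signal present except the data/fixtures directory.
signals = {name: name != "has_data_or_fixtures" for name in SIGNAL_WEIGHTS}
print(overall_score(signals))
print(paper_text_confidence(True, 4.0))
```

With only `has_data_or_fixtures` missing, the weighted sum is 1.00 − 0.10 = 0.90, which matches the score reported in the quick demo.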
Verdict buckets (repro_assessment.verdict) are thresholds on overall_score:
| Verdict | Score | Meaning |
|---|---|---|
| `likely-reproducible` | ≥ 0.70 | Most repro-friendly signals present. Worth trying to run. |
| `partial` | ≥ 0.45 and < 0.70 | Some infrastructure, likely gaps. Expect to fill in missing pieces. |
| `unlikely` | ≥ 0.20 and < 0.45 | Minimal signal. Possible code dump without the scaffolding to rerun it. |
| `insufficient-info` | < 0.20 or repo unreachable | Not enough to tell. Don't draw conclusions either way. |
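The bucket thresholds above translate directly into a small lookup. In this sketch the function name `verdict_for` is hypothetical, and passing `None` to represent an unreachable repo is an assumption made for illustration.

```python
from typing import Optional


def verdict_for(overall_score: Optional[float]) -> str:
    """Map a score to a verdict bucket; None stands for an unreachable repo."""
    if overall_score is None or overall_score < 0.20:
        return "insufficient-info"
    if overall_score >= 0.70:
        return "likely-reproducible"
    if overall_score >= 0.45:
        return "partial"
    return "unlikely"


for score in (0.90, 0.55, 0.30, 0.10, None):
    print(score, verdict_for(score))
```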
Enum values you'll see in outputs:
- `code_repo.detection_method`: `paper-text` | `abstract-link` | `papers-with-code` | `metadata` | `none`
- `metadata.source`: `arxiv` | `biorxiv` | `doi` | `url` | `unknown`
Security & limitations
What this server actually does when you install and run it:
- Network calls only to: `export.arxiv.org`, `ar5iv.labs.arxiv.org`, `arxiv.org` (PDFs), `api.github.com`, `paperswithcode.com`, `api.anthropic.com`. No telemetry, no analytics, no phone-home.
- Reads `ANTHROPIC_API_KEY` (required for LLM tools) and optionally `GITHUB_TOKEN` from environment variables. These are sent only to Anthropic / GitHub respectively. Never logged, never persisted to disk.
- Writes nothing to your filesystem. No cache directories, no downloaded PDFs, no temp files.
- Executes no user-supplied code. No `eval`, `exec`, `subprocess`, `pickle.loads`, or shell-outs. The reproducibility tool deliberately does not clone or run repositories; it scores from the GitHub REST API only.
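If you want to check the host list above yourself, for example in an egress filter or a test, a sketch like the following may help. The host names are copied from the list; the helper `is_allowed` and the decision to accept only `https` URLs are assumptions of this sketch and do not describe how methods-mcp enforces anything.

```python
from urllib.parse import urlsplit

# Hosts copied from the list above.
ALLOWED_HOSTS = frozenset(
    {
        "export.arxiv.org",
        "ar5iv.labs.arxiv.org",
        "arxiv.org",
        "api.github.com",
        "paperswithcode.com",
        "api.anthropic.com",
    }
)


def is_allowed(url: str) -> bool:
    """True only for https URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


print(is_allowed("https://api.github.com/repos/jmiao24/Paper2Agent"))
print(is_allowed("https://arxiv.org.example.net/abs/2509.06917"))
```

Matching on the exact hostname rather than a substring avoids accepting look-alike domains that merely contain an allowed name.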
Limitations to be aware of:
- Adversarial papers may produce misleading structured output. The `extract_methods` tool sends paper text to Claude. A paper containing prompt-injection content could yield wrong (but schema-valid) structured methods. Treat the output as a research aid, not ground truth.
- The reproducibility verdict is a heuristic, not a proof. A high score means the repo looks well-structured for reproduction; it does not guarantee that running the code reproduces the paper. For full validation see Paper2Agent.
- Intended for local stdio use. The HTTP/SSE transports are provided for development convenience but should only be exposed on trusted networks (no SSRF protection beyond what httpx provides).
Reporting issues:
Security issues: please email flynnlachendro@hotmail.co.uk (also see SECURITY.md). Functional bugs: open a GitHub issue.
Pair with paper-mcp
For broader paper search / citation graph tooling, run paper-mcp (Bhvaik) alongside in the same Claude Code session. paper-mcp does title-keyed search, full-text fetch, citations, and references; methods-mcp adds the structured-methods + reproducibility layer on top. The two were intentionally designed to compose.
Develop locally
git clone https://github.com/FlynnLachendro/methods-mcp
cd methods-mcp
uv sync --extra dev --extra agent
uv run pytest # 49 tests, offline (respx-mocked httpx + unittest.mock for Anthropic)
uv run ruff format .
uv run ruff check . --fix
uv run mypy src
uv run methods-mcp --help
License
MIT. See LICENSE.
Acknowledgements
Built for the Worldwide AI Science Fellowship inaugural cohort. Thanks to Michael Raspuzzi for the open-ended brief.
Built on:
- FastMCP 3.x: the MCP server scaffold.
- Claude Agent SDK: the agent loop in the demo.
- ar5iv.labs.arxiv.org: clean HTML for arXiv papers.
- Anthropic Claude: the LLM behind structured extraction.
