cxg-census-mcp
MCP server for the CZ CELLxGENE Census. Single-cell, ontology-aware. Community, unaffiliated.
An MCP server that lets LLM agents query the CZ CELLxGENE Discover Census single-cell atlas without lying about it: ontology-aware filters, cost caps, full provenance + attribution on every response. Drop it into Cursor / Claude Desktop / Claude Code and ask questions like "compare immune cell composition of healthy vs COVID-19 human lung" in plain English.
Independent / unaffiliated. Not affiliated with, endorsed by, or sponsored by the Chan Zuckerberg Initiative (CZI), EMBL-EBI, the U.S. Census Bureau, or anyone else. "CELLxGENE" is a CZI mark; references here are descriptive (nominative) use only.
No warranty. MIT-licensed source, "as is". Research/exploration tool, not a clinical or diagnostic instrument. Always verify results before publication. See LICENSE for the full trademark and content attribution notice, and SECURITY.md for the threat model and known-issues policy.
Alpha (v0.1.2). See CHANGELOG.md.
Demos
Healthy vs COVID-19 lung, side-by-side. Two parallel queries: the
disease_multi_value_v7 schema-drift rewrite kicks in for the COVID
cohort, and attribution from both sets of contributing datasets surfaces
in the same chat turn.
https://github.com/user-attachments/assets/c836f225-5075-4643-87aa-70d311bc5fd2
Cell-type composition of human lung in one query. Free-text "lung"
resolved to UBERON:0002048, routed through tissue_general, every CURIE
labeled, all in a single Tier-0 call.
https://github.com/user-attachments/assets/b0e10ca7-e46b-4e5f-ae63-11949d328c4d
(Videos render on GitHub. On PyPI they appear as bare URLs; head to the GitHub README to watch.)
More prompts in docs/example-questions.md.
Architecture at a glance
                ┌──────────────────────────────────────────────┐
MCP client      │  tools/        thin MCP wrappers, no logic   │
(Claude,    ◄─► │      │                                       │
 Cursor,        │      ▼                                       │
 Code, …)       │  planner/      FilterSpec → QueryPlan,       │
                │      │         cost estimate, tier routing   │
                │      ▼                                       │
                │  ontology/     OLS4 + hint overlay,          │
                │      │         CL/UBERON/MONDO expansion     │
                │      ▼                                       │
                │  execution/    Tier 0  facet counts          │
                │      │         Tier 1  chunked obs scan      │
                │      │         Tier 2  expression aggregate  │
                │      │         Tier 9  refuse → snippet      │
                │      ▼                                       │
                │  clients/      OLS4 (HTTPS) + Census/SOMA    │
                │                                              │
                │  caches/       OLS, facet, plan, filter LRU  │
                │  models/       Response envelope w/          │
                │                attribution + provenance      │
                └──────────────────────────────────────────────┘
                                        │
                                        ▼
                           ┌────────────────────────┐
                           │  EBI OLS4 (ontology)   │
                           │  CZ CELLxGENE Census   │
                           │  (CC BY 4.0 data)      │
                           └────────────────────────┘
Full architecture notes: docs/architecture.md.
Tool reference: docs/tool-reference.md.
Example questions: docs/example-questions.md.
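To make the tier routing in the diagram concrete, here is a toy sketch of how a planner might map a plan to an execution tier. Only the tier numbers and their roles come from the diagram; the QueryPlan fields, the route function, and the max_cells cap are hypothetical names and values chosen for illustration, not the server's actual planner logic (see docs/architecture.md for that).

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Execution tiers as named in the architecture diagram."""
    FACET_COUNTS = 0
    CHUNKED_OBS_SCAN = 1
    EXPRESSION_AGGREGATE = 2
    REFUSE_WITH_SNIPPET = 9


@dataclass(frozen=True)
class QueryPlan:
    # Hypothetical fields; the real QueryPlan may look different.
    needs_obs_rows: bool
    needs_expression: bool
    estimated_cells: int


def route(plan: QueryPlan, max_cells: int = 1_000_000) -> Tier:
    """Pick the cheapest tier that can answer the plan (illustrative only)."""
    # Pure counts can be served from facets without scanning cells.
    if not plan.needs_obs_rows and not plan.needs_expression:
        return Tier.FACET_COUNTS
    # Assumed cost cap: refuse oversized scans and hand back a snippet instead.
    if plan.estimated_cells > max_cells:
        return Tier.REFUSE_WITH_SNIPPET
    if plan.needs_expression:
        return Tier.EXPRESSION_AGGREGATE
    return Tier.CHUNKED_OBS_SCAN
```

The point of the sketch is the ordering: cheap answers first, refusal before any expensive scan starts.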
Install
From PyPI (recommended):
uv tool install "cxg-census-mcp[census]"
cxg-census-mcp # speaks MCP over stdio
Or with pip:
pip install "cxg-census-mcp[census]"
Without the [census] extra you get mock mode (deterministic fixtures),
handy for offline demos and verifying your MCP client config without pulling
tiledbsoma's ~1 GB of native deps.
From source (for development):
git clone https://github.com/MaxMLang/cxg-census-mcp
cd cxg-census-mcp
uv sync --extra dev --extra census
uv run cxg-census-mcp
MCP client config
Cursor (~/.cursor/mcp.json) and Claude Desktop
(~/Library/Application Support/Claude/claude_desktop_config.json on macOS)
both expect the same shape. Cleanest is uvx once installed from PyPI:
{
"mcpServers": {
"cxg-census": {
"command": "/absolute/path/to/uvx",
"args": ["--from", "cxg-census-mcp[census]", "cxg-census-mcp"]
}
}
}
Use the absolute path to uvx (the output of "which uvx" in your shell). MCP clients spawn the server in a non-interactive subprocess that doesn't source your shell rc, so a bare "uvx" will fail with "No such file or directory".
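If you would rather not hand-edit the path, a small script can resolve uvx and print the config for you. This is a convenience sketch, not something shipped with the package; build_mcp_config is a hypothetical helper name, and the output simply mirrors the PyPI config shown above.

```python
import json
import os
import shutil


def build_mcp_config(uvx_path: str) -> dict:
    """Return an mcpServers block that launches cxg-census-mcp through uvx."""
    # MCP clients do not source your shell rc, so insist on an absolute path.
    if not os.path.isabs(uvx_path):
        raise ValueError(f"expected an absolute path to uvx, got {uvx_path!r}")
    return {
        "mcpServers": {
            "cxg-census": {
                "command": uvx_path,
                "args": ["--from", "cxg-census-mcp[census]", "cxg-census-mcp"],
            }
        }
    }


if __name__ == "__main__":
    found = shutil.which("uvx")  # same lookup as `which uvx`
    if found is None:
        raise SystemExit("uvx not found on PATH; install uv first")
    print(json.dumps(build_mcp_config(os.path.abspath(found)), indent=2))
```

Run it from the shell where uvx works, then paste the printed JSON into your client's config file.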
If you cloned from source instead, point at the checkout:
{
"mcpServers": {
"cxg-census": {
"command": "/absolute/path/to/uv",
"args": ["--directory", "/path/to/cxg-census-mcp", "run", "cxg-census-mcp"]
}
}
}
Claude Code:
claude mcp add cxg-census -- /absolute/path/to/uvx --from "cxg-census-mcp[census]" cxg-census-mcp
Quit and relaunch your client (⌘Q on macOS; closing the window isn't enough) and the server should show up in the MCP panel with 13 tools.
Tools (13 total)
Workflow: census_summary, get_census_versions, count_cells,
list_datasets, gene_coverage, aggregate_expression, preview_obs,
export_snippet, get_server_limits.
Inspection: resolve_term, expand_term, term_definition,
list_available_values.
Plus MCP resources (markdown docs at cxg-census-mcp://docs/{slug}),
prompts (census_workflow, disambiguation), and cooperative
progress / cancellation notifications. Details in
docs/tool-reference.md.
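If your client's MCP panel shows fewer than 13 tools, a quick diff against the names listed above can tell you which ones are missing. The snippet below is a hypothetical helper, not part of the package; it only encodes the tool names from this section.

```python
# Tool names copied from the list above, grouped the same way.
EXPECTED_TOOLS = {
    "workflow": [
        "census_summary", "get_census_versions", "count_cells",
        "list_datasets", "gene_coverage", "aggregate_expression",
        "preview_obs", "export_snippet", "get_server_limits",
    ],
    "inspection": [
        "resolve_term", "expand_term", "term_definition",
        "list_available_values",
    ],
}


def missing_tools(reported):
    """Return the expected tool names absent from what the client reports."""
    expected = {name for group in EXPECTED_TOOLS.values() for name in group}
    return sorted(expected - set(reported))
```

Pass it the tool names your client lists; an empty result means all 13 are registered.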
Configuration
All env vars use the CXG_CENSUS_MCP_ prefix. Most useful:
| Variable | Default | Purpose |
|---|---|---|
| CXG_CENSUS_MCP_CENSUS_VERSION | stable | Census release to pin |
| CXG_CENSUS_MCP_CACHE_DIR | platformdirs default | Disk cache root |
| CXG_CENSUS_MCP_MOCK_MODE | 0 | If 1, never opens a real Census handle |
| CXG_CENSUS_MCP_LOG_LEVEL | WARNING | stdlib log level |
Full list and validation: src/cxg_census_mcp/config.py.
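For orientation, here is a rough sketch of how the four variables above could be read with their documented defaults. It is not the code in src/cxg_census_mcp/config.py; the Settings class and load_settings function are hypothetical, and the real module may validate values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "CXG_CENSUS_MCP_"


@dataclass(frozen=True)
class Settings:
    census_version: str = "stable"
    cache_dir: Optional[str] = None  # None = use the platform default location
    mock_mode: bool = False
    log_level: str = "WARNING"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read prefixed variables from env, falling back to the table's defaults."""
    env = os.environ if env is None else env
    return Settings(
        census_version=env.get(PREFIX + "CENSUS_VERSION", "stable"),
        cache_dir=env.get(PREFIX + "CACHE_DIR"),
        # Assumption: only the literal "1" switches mock mode on.
        mock_mode=env.get(PREFIX + "MOCK_MODE", "0") == "1",
        log_level=env.get(PREFIX + "LOG_LEVEL", "WARNING").upper(),
    )
```

Calling load_settings({"CXG_CENSUS_MCP_MOCK_MODE": "1"}) yields a Settings object with mock_mode set and everything else at its default.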
Development & operations
Quick loop:
make install-all # uv sync --extra dev --extra census
make lint typecheck test # ruff + mypy + pytest (mock mode)
make cov # tests + coverage HTML in ./htmlcov
make audit # pip-audit on locked production deps
Operational tasks (cache pre-warm, schema diff, container build, metrics
dump, plan-cache vacuum, weekly hint/facet refresh) live in the Makefile
and are documented in docs/operational-playbook.md.
Documentation index
| Topic | Where |
|---|---|
| System architecture | docs/architecture.md |
| Tool reference | docs/tool-reference.md |
| Example agent questions | docs/example-questions.md |
| Ontology resolution | docs/ontology-resolution.md |
| Schema-drift handling | docs/schema-drift-format.md |
| Census version pinning | docs/version-pinning.md |
| Progress / cancellation | docs/progress-and-cancellation.md |
| Error model | docs/error-model.md |
| Known limitations | docs/limitations.md |
| Ops runbook | docs/operational-playbook.md |
| Changelog | CHANGELOG.md |
License & attribution
Source code: MIT. The MIT license covers only the code in this repository, not the upstream data, ontologies, or third-party trademarks.
- Data. Tool responses are derived (filtered/aggregated) from the CZ CELLxGENE Discover Census, distributed by the Chan Zuckerberg Initiative under CC BY 4.0. Every response carries an attribution field; downstream users must preserve attribution and indicate that changes were made.
- Ontologies are fetched via the EBI Ontology Lookup Service (OLS4) from CL, UBERON, MONDO, EFO, HANCESTRO, and others; each carries its own license.
- Trademarks ("CELLxGENE", "Cursor", "Claude", "Anthropic", "Model Context Protocol", …) belong to their respective owners. Use here is descriptive only and does not imply affiliation.
This project is a client of the CZ CELLxGENE Discover Census; it does not host, mirror, or redistribute Census data.
Full notice in LICENSE.
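As an illustration of the attribution requirement, downstream code that post-processes a response could carry the attribution field forward and record what it changed. The response shape here is an assumption: only the attribution field is documented above, while the "data" and "changes" keys and the derive helper are hypothetical.

```python
from typing import Any, Callable, Dict


def derive(response: Dict[str, Any],
           transform: Callable[[Any], Any],
           change_note: str) -> Dict[str, Any]:
    """Apply transform to the payload while preserving attribution."""
    if "attribution" not in response:
        raise KeyError("response has no attribution field to preserve")
    return {
        # "data" is an assumed key for the payload.
        "data": transform(response["data"]),
        # Carried over unchanged, as CC BY 4.0 requires.
        "attribution": response["attribution"],
        # Hypothetical log of modifications, to indicate changes were made.
        "changes": list(response.get("changes", [])) + [change_note],
    }
```

For example, derive(resp, lambda rows: rows[:10], "kept first 10 rows") returns a trimmed payload that still names its source and states how it was altered.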
