Marimo Sandbox
FastMCP server: run Python in auditable Marimo notebooks
marimo-sandbox
A FastMCP server that runs Python code inside auditable Marimo
notebooks. Every execution is saved as a human-readable .py file you can open,
inspect, and re-run at any time.
Why
When an AI agent (Claude Code, etc.) runs Python on your behalf, you get back stdout and maybe a traceback. You can't see the full code in context, can't re-run it, can't modify it interactively.
marimo-sandbox fixes this by wrapping every execution in a Marimo notebook:
- Auditable: the exact code that ran is saved as a .py file alongside its output
- Viewable: marimo edit <notebook> opens it in the browser with reactive cells
- Re-runnable: the notebook is standalone; python notebook.py works without the server
- Persistent: all runs stored in SQLite with stdout, stderr, status, code hash, and artifacts
- Safe: static risk analysis runs before every execution; critical patterns can require approval
Install
pip install marimo-sandbox
# or with uv:
uv pip install marimo-sandbox
Requires Python 3.11+ and marimo:
pip install marimo
Add to Claude Code
claude mcp add marimo-sandbox -- python -m marimo_sandbox
Or with uv:
claude mcp add marimo-sandbox -- uvx marimo-sandbox
Set a custom data directory (where notebooks and the database are stored):
claude mcp add marimo-sandbox \
-e MARIMO_SANDBOX_DIR=/your/preferred/path \
-- python -m marimo_sandbox
Tools (17)
run_python
Run Python code and get back results + a notebook you can open.
code Python source to execute
description Short label for this run (shown in list_runs)
timeout_seconds Max execution time (default 60)
sandbox Run in Docker with --network=none (default False)
packages PyPI packages to install before running (e.g. ["pandas", "httpx"])
dry_run If True, return static risk analysis only; do not execute (default False)
require_approval If True, block execution when critical risk patterns are found (default False)
Returns: run_id, status, stdout, stderr, error, duration_ms, notebook_path,
view_command, code_hash, artifacts, and optionally risk_findings, packages_installed,
freeze.
Packages are installed via uv pip install when uv is available, falling back to pip. A
full pip freeze snapshot is captured after installation and stored with the run.
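The uv-first, pip-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not marimo-sandbox's actual code: build_install_command is a hypothetical helper, and the real server may target a different interpreter or pass extra flags.

```python
import shutil
import sys


def build_install_command(packages, uv_path):
    """Return an argv list that installs `packages`, preferring uv over pip.

    Hypothetical helper for illustration; `uv_path` is the path to the uv
    binary, or None when uv is not installed.
    """
    if not packages:
        return []  # nothing to install
    if uv_path:
        return [uv_path, "pip", "install", *packages]
    # Fall back to the pip module of the current interpreter.
    return [sys.executable, "-m", "pip", "install", *packages]


# shutil.which returns None when uv is not on PATH.
command = build_install_command(["pandas", "httpx"], shutil.which("uv"))
print(command)
```

Passing the result of shutil.which("uv") into the helper keeps the decision explicit and makes the fallback easy to test.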
Structured outputs
Your code can expose typed data to agents via the __outputs__ dict:
import pandas as pd
df = pd.read_csv("data.csv")
__outputs__["summary"] = df.describe().to_dict()
__outputs__["row_count"] = len(df)
Retrieve these values later with get_run_outputs.
Static risk analysis
Every call to run_python runs an AST-based risk scan before execution. Findings appear
in risk_findings in the response. Use dry_run=True to get the analysis without running the code.
risk_findings severity tiers:
critical subprocess calls, os.system/popen, eval/exec/compile
high dangerous imports (os, subprocess, socket, requests, …)
medium open() with write/append mode
low os.environ[] access
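To make the four tiers concrete, here is a minimal sketch of an AST-based scan that classifies the patterns listed above. It is an illustration only: the function name scan, the finding tuples, and the exact matching rules are assumptions, and the scanner shipped with marimo-sandbox may detect more patterns or report them differently.

```python
import ast

DANGEROUS_IMPORTS = {"os", "subprocess", "socket", "requests"}
CRITICAL_BUILTINS = {"eval", "exec", "compile"}


def _opens_for_writing(call):
    """True if an open() call has a literal mode containing 'w' or 'a'."""
    mode = call.args[1] if len(call.args) > 1 else None
    for keyword in call.keywords:
        if keyword.arg == "mode":
            mode = keyword.value
    return (
        isinstance(mode, ast.Constant)
        and isinstance(mode.value, str)
        and any(flag in mode.value for flag in "wa")
    )


def scan(code):
    """Return (severity, line, message) findings sorted by line number."""
    findings = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            else:
                modules = [node.module or ""]
            for module in modules:
                root = module.split(".")[0]
                if root in DANGEROUS_IMPORTS:
                    findings.append(("high", node.lineno, f"import of {root}"))
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in CRITICAL_BUILTINS:
                findings.append(("critical", node.lineno, f"{func.id}() call"))
            elif isinstance(func, ast.Name) and func.id == "open" and _opens_for_writing(node):
                findings.append(("medium", node.lineno, "open() in write/append mode"))
            elif isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
                owner, attr = func.value.id, func.attr
                if owner == "subprocess" or (owner, attr) in {("os", "system"), ("os", "popen")}:
                    findings.append(("critical", node.lineno, f"{owner}.{attr}() call"))
        elif isinstance(node, ast.Subscript):
            target = node.value
            if (isinstance(target, ast.Attribute) and target.attr == "environ"
                    and isinstance(target.value, ast.Name) and target.value.id == "os"):
                findings.append(("low", node.lineno, "os.environ[] access"))
    return sorted(findings, key=lambda finding: finding[1])


sample = 'import os\nos.system("ls")\nopen("out.txt", "w")\ntoken = os.environ["TOKEN"]\n'
for severity, line, message in scan(sample):
    print(f"line {line}: [{severity}] {message}")
```

Because the scan only inspects the syntax tree, it never executes the submitted code, which is what makes a dry run safe.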
Use require_approval=True to block execution when critical patterns are found. The response
will include an approval_token; pass it to approve_run to proceed.
approve_run
Confirm a blocked run and execute it. Tokens expire after 1 hour.
approval_token Token returned by run_python when status='awaiting_confirmation'
reason Optional note explaining the approval
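The token lifecycle (issue on a blocked run, accept within one hour, reject afterwards) can be modelled with a small in-memory store. This is a sketch under stated assumptions: ApprovalStore, its method names, and the returned dict shapes are hypothetical, and the real server presumably persists tokens in its SQLite database rather than in a dict.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 3600  # tokens expire after 1 hour, as stated above


class ApprovalStore:
    """Hypothetical in-memory model of pending approvals."""

    def __init__(self, now=time.time):
        self._now = now      # injectable clock, handy for testing expiry
        self._pending = {}   # token -> (run_id, issued_at)

    def request(self, run_id):
        """Issue a token for a run that was blocked by critical findings."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (run_id, self._now())
        return token

    def approve(self, token):
        """Consume a token; succeed only if it exists and has not expired."""
        entry = self._pending.pop(token, None)
        if entry is None:
            return {"error": "unknown token"}
        run_id, issued_at = entry
        if self._now() - issued_at > TOKEN_TTL_SECONDS:
            return {"error": "token expired"}
        return {"run_id": run_id, "status": "approved"}


clock = [0.0]  # fake clock so the example does not have to wait an hour
store = ApprovalStore(now=lambda: clock[0])
fresh = store.request("run_1")
print(store.approve(fresh))
stale = store.request("run_2")
clock[0] = 3601.0
print(store.approve(stale))
```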
list_pending_approvals
List all runs currently awaiting approval, including expiry status and critical finding count.
list_artifacts
List files created by a run's code (everything in the notebook directory except the notebook itself and the result sidecar).
run_id Run to inspect
Returns artifact_count and artifacts β each entry has path, size_bytes, extension.
read_artifact
Read the content of an artifact file. Path traversal is rejected. Large files are refused (default limit: 5 MB).
run_id Run that created the file
artifact_path Relative path from list_artifacts
max_size_bytes Size limit in bytes (default 5 000 000)
Returns content (str) for text files or content_base64 for binary files, plus
media_type, size_bytes, is_text.
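The three checks described for read_artifact (traversal rejection, size limit, text versus binary) could be implemented along these lines. This is a sketch, not the server's code: read_artifact_sketch and its error dicts are hypothetical, and treating "decodes as UTF-8" as the test for text is an assumption.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def read_artifact_sketch(run_dir, artifact_path, max_size_bytes=5_000_000):
    """Read a file under `run_dir`, refusing traversal and oversized files."""
    root = Path(run_dir).resolve()
    target = (root / artifact_path).resolve()
    # Reject anything that resolves outside the run directory, e.g. "../x".
    if not target.is_relative_to(root):
        return {"error": "path traversal rejected"}
    if not target.is_file():
        return {"error": "artifact not found"}
    size = target.stat().st_size
    if size > max_size_bytes:
        return {"error": f"file too large ({size} bytes)"}
    data = target.read_bytes()
    media_type = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
    result = {"media_type": media_type, "size_bytes": size}
    try:
        # Assumption: a file counts as text if it decodes as UTF-8.
        result["content"] = data.decode("utf-8")
        result["is_text"] = True
    except UnicodeDecodeError:
        result["content_base64"] = base64.b64encode(data).decode("ascii")
        result["is_text"] = False
    return result


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.csv").write_bytes(b"a,b\n1,2\n")
    (Path(tmp) / "blob.bin").write_bytes(b"\xff\xfe\x00\x01")
    text_result = read_artifact_sketch(tmp, "report.csv")
    binary_result = read_artifact_sketch(tmp, "blob.bin")
    blocked = read_artifact_sketch(tmp, "../outside.txt")
    too_big = read_artifact_sketch(tmp, "report.csv", max_size_bytes=4)
```

Resolving both paths before comparing them also catches absolute paths and symlinks that point outside the run directory.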
get_run_outputs
Retrieve the structured __outputs__ dict written by the run. Returns {} if the
run hasn't completed successfully or didn't populate __outputs__.
run_id Run to read outputs from
rerun
Re-execute a previous run's code by run_id, optionally with modifications.
run_id Run to re-execute
code Override the code (default: use original)
description Override the description (default: original + " (rerun)")
timeout_seconds Max execution time (default 60)
sandbox Run in Docker sandbox (default False)
packages PyPI packages to install (default: reuse original run's packages)
open_notebook
Open a previous run in Marimo's interactive editor.
run_id ID returned by run_python
port Local port for the editor (default 2718)
Returns a url to open in your browser. You can then edit cells and re-run them.
cancel_run
Cancel a run that is currently executing (async_mode=True). Sends SIGTERM to the
process and marks the run as cancelled in the database.
run_id The run to cancel (must have status 'running')
Returns success, run_id, pid β or error if the run is not found or not running.
list_environments
List cached virtual environments (hash-based venv cache).
Each environment corresponds to a unique set of packages. Environments are reused
automatically when run_python is called with the same package list.
Returns count and environments β each entry has env_hash, packages,
size_bytes, created_at, last_used_at.
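A hash-based cache needs a stable key for each package set. The sketch below shows one plausible way to derive such a key. The normalisation rules (order-insensitive, case-insensitive) and the 16-character truncation are assumptions; the actual env_hash scheme is not documented here.

```python
import hashlib


def env_hash_sketch(packages):
    """Derive a stable cache key from a package list (hypothetical scheme)."""
    # Sort and de-duplicate so ["pandas", "httpx"] and ["httpx", "pandas"]
    # map to the same environment.
    normalized = sorted({name.strip().lower() for name in packages})
    digest = hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
    return digest[:16]


print(env_hash_sketch(["pandas", "httpx"]))
```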
clean_environments
Delete cached virtual environments that haven't been used recently.
older_than_days Delete envs whose last_used_at is older than this many days (default 90)
Returns deleted_count, deleted_hashes, freed_bytes.
diff_runs
Compare two runs and explain what changed between them. By default compares run_id
against its parent run; supply compare_to to choose an explicit reference.
run_id The run to inspect (the "after" run)
compare_to ID of the reference run (the "before" run); defaults to run_id's parent
Returns run_a, run_b, relationship (parent_child / siblings / unrelated),
summary flags, code_diff (including diff_text), env_diff, artifact_diff,
output_diff, duration_diff, and a plain-English explanation.
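The code_diff part of the response can be approximated with the standard library's difflib. This is only a sketch: code_diff_sketch and the changed key are hypothetical, and the file labels run_a and run_b are borrowed from the field names above rather than from the server's real output.

```python
import difflib


def code_diff_sketch(before, after):
    """Return a unified diff between two code strings."""
    lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="run_a",
        tofile="run_b",
        lineterm="",
    )
    return {"changed": before != after, "diff_text": "\n".join(lines)}


diff = code_diff_sketch("x = 1\nprint(x)", "x = 2\nprint(x)")
print(diff["diff_text"])
```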
list_runs
List recent runs with status, description, and timestamp.
limit Max results (default 20)
status Filter: 'success', 'error', or 'pending'
offset Number of runs to skip for pagination (default 0)
Returns total, count, offset, and runs.
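Since runs are stored in SQLite, the limit, status, and offset parameters map naturally onto a LIMIT/OFFSET query. The sketch below uses a hypothetical runs table; the real schema, the ordering, and whether total counts filtered or unfiltered rows are all assumptions.

```python
import sqlite3


def list_runs_sketch(conn, limit=20, status=None, offset=0):
    """Page through a hypothetical `runs` table, newest first."""
    where, params = "", []
    if status is not None:
        where = "WHERE status = ?"
        params.append(status)
    # Assumption: `total` counts rows matching the filter, not all rows.
    total = conn.execute(f"SELECT COUNT(*) FROM runs {where}", params).fetchone()[0]
    rows = conn.execute(
        f"SELECT run_id, status, description FROM runs {where} "
        "ORDER BY created_at DESC LIMIT ? OFFSET ?",
        [*params, limit, offset],
    ).fetchall()
    runs = [dict(zip(("run_id", "status", "description"), row)) for row in rows]
    return {"total": total, "count": len(runs), "offset": offset, "runs": runs}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (run_id TEXT, status TEXT, description TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?)",
    [(f"run_{i}", "error" if i % 3 == 0 else "success", f"demo {i}", i) for i in range(5)],
)
page = list_runs_sketch(conn, limit=2, offset=2)
print(page)
```

To walk the full history, keep increasing offset by count until offset reaches total.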
get_run
Full details of a specific run, including the four provenance fields stored per run:
code_hash, env_hash, freeze, and risk_findings.
run_id Run to look up
include_code Include submitted code (default True)
include_notebook_source Include full .py notebook source (default False)
delete_run
Remove a run's database record and its notebook files from disk.
run_id Run to delete
delete_files Also remove the notebook directory (default True)
purge_runs
Bulk-delete runs older than N days to reclaim disk space.
older_than_days Delete runs older than this many days (default 30)
delete_files Also remove notebook directories (default True)
dry_run Preview what would be deleted without deleting (default False)
When dry_run=False, returns deleted_runs, files_deleted, run_ids.
When dry_run=True, returns dry_run=True, would_delete_runs, run_ids.
check_setup
Verify marimo, Docker, and uv are available and show the data directory.
Notebooks
Generated notebooks live at:
~/.marimo-sandbox/notebooks/{run_id}/notebook.py
Open any of them directly:
marimo edit ~/.marimo-sandbox/notebooks/run_a1b2c3d4/notebook.py
Or run headlessly:
python ~/.marimo-sandbox/notebooks/run_a1b2c3d4/notebook.py
A result sidecar is written alongside the notebook on success:
~/.marimo-sandbox/notebooks/{run_id}/{run_id}_result.json
Any other files your code writes to disk are captured as artifacts and
accessible via list_artifacts / read_artifact.
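If you want to locate a run's files from your own scripts, the layout above translates into a couple of path helpers. This is a sketch: run_paths and load_result are hypothetical names, the sidecar's JSON schema is not specified here, and it is assumed that a custom MARIMO_SANDBOX_DIR keeps the same notebooks/ subdirectory.

```python
import json
import os
from pathlib import Path


def run_paths(run_id, data_dir=None):
    """Return the notebook and sidecar paths for a run (hypothetical helper)."""
    base = data_dir or os.environ.get("MARIMO_SANDBOX_DIR", "~/.marimo-sandbox")
    run_dir = Path(base).expanduser() / "notebooks" / run_id
    return {
        "notebook": run_dir / "notebook.py",
        "sidecar": run_dir / f"{run_id}_result.json",
    }


def load_result(run_id, data_dir=None):
    """Load the result sidecar, or return None if it was never written."""
    sidecar = run_paths(run_id, data_dir)["sidecar"]
    if not sidecar.exists():
        return None  # the sidecar is only written on success
    return json.loads(sidecar.read_text(encoding="utf-8"))


paths = run_paths("run_a1b2c3d4", data_dir="/data")
print(paths["notebook"])
```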
Sandbox mode (Docker)
For untrusted code, run_python(sandbox=True) runs inside Docker with:
- --network=none: no outbound connections
- --memory=512m: memory cap
- --cpus=1: CPU cap
- --read-only: read-only root filesystem
- writable /sandbox mount for the notebook and result file
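Put together, the flags above correspond to a docker run invocation roughly like the one this sketch builds. Only the four resource and isolation flags and the /sandbox mount point come from the list above; the --rm flag, the -v bind-mount form, and the command run inside the container are assumptions, and docker_command is a hypothetical helper.

```python
def docker_command(run_dir, image="marimo-sandbox:latest"):
    """Build an argv list for running a notebook in the sandbox image."""
    return [
        "docker", "run",
        "--rm",                       # assumption: remove the container afterwards
        "--network=none",             # no outbound connections
        "--memory=512m",              # memory cap
        "--cpus=1",                   # CPU cap
        "--read-only",                # read-only root filesystem
        "-v", f"{run_dir}:/sandbox",  # assumption: bind mount as the writable dir
        image,
        "python", "/sandbox/notebook.py",  # assumption: entry command
    ]


print(" ".join(docker_command("/tmp/run_a1b2c3d4")))
```

Building the command as a list rather than a shell string avoids quoting problems when the run directory contains spaces.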
Build the sandbox image first:
docker build -f Dockerfile.sandbox -t marimo-sandbox:latest .
Add packages your code needs to Dockerfile.sandbox and rebuild.
Configuration
| Env var | Default | Description |
|---|---|---|
| MARIMO_SANDBOX_DIR | ~/.marimo-sandbox | Where notebooks and DB are stored |
| MARIMO_SANDBOX_DOCKER_IMAGE | marimo-sandbox:latest | Docker image for sandbox mode |
Notebook structure
Every generated notebook has four fixed cells:
| Cell | Purpose |
|---|---|
| __setup__ | Imports marimo, returns (mo,) |
| __context__ | Displays run metadata (description, run_id, timestamp) |
| __execution__ | Initialises __outputs__: dict = {}; runs your code; returns (sandbox_executed, __outputs__) |
| __record__ | Depends on sandbox_executed and __outputs__, so it only runs on success; writes result sidecar with outputs |
The __record__ → __execution__ dependency means: if your code raises an
exception, __record__ never runs (Marimo's DAG won't execute a cell whose
dependencies failed). The executor detects the missing sidecar and reports an
error with the captured stderr.
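The failure-detection idea can be illustrated with a toy dependency runner. This is a deliberately simplified model, not Marimo's scheduler: run_cells is hypothetical, cells are plain functions, and the dependency of __context__ on __setup__ is an assumption. It shows why a missing sidecar is a reliable failure signal.

```python
def run_cells(cells):
    """Run cells in order, skipping any cell whose dependency did not succeed.

    `cells` is a list of (name, dependencies, function) tuples.
    Returns a dict mapping each cell name to "ok", "failed", or "skipped".
    """
    status = {}
    for name, deps, func in cells:
        if any(status.get(dep) != "ok" for dep in deps):
            status[name] = "skipped"
            continue
        try:
            func()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"
    return status


sidecar = {}  # stands in for the result file on disk


def failing_user_code():
    raise ValueError("boom")


def record():
    sidecar["written"] = True


result = run_cells([
    ("__setup__", [], lambda: None),
    ("__context__", ["__setup__"], lambda: None),
    ("__execution__", ["__setup__"], failing_user_code),
    ("__record__", ["__execution__"], record),
])
print(result)
print("sidecar written:", bool(sidecar))
```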
Development
# Install in editable mode with dev dependencies
uv pip install -e ".[dev]"
# Lint
ruff check src/ tests/
# Type check
mypy src/
# Unit tests (fast, no subprocess)
pytest tests/ -m "not slow" -v
# Integration tests (run real Marimo subprocesses)
pytest tests/ -m slow -v
Known limitations
- Top-level return statements in submitted code are rejected (they exit the cell function before the sentinel is set). Wrap such code in a function.
- sys.exit() in user code is detected and reported as an error.
- Generated notebooks always import marimo. If marimo is not installed in the execution environment, the notebook will fail.
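To check your own code for the first limitation before submitting it, a short AST walk is enough. This is a sketch of one way to detect a return outside any function; has_top_level_return is a hypothetical helper, and the server's own check may work differently.

```python
import ast


def has_top_level_return(code):
    """True if `code` contains a `return` outside any function or lambda."""

    def visit(node):
        for child in ast.iter_child_nodes(node):
            # A return inside a function definition is fine; do not descend.
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
                continue
            if isinstance(child, ast.Return) or visit(child):
                return True
        return False

    # ast.parse accepts a module-level return; only compiling it would fail.
    return visit(ast.parse(code))


rejected = "if True:\n    return 42\n"
wrapped = "def main():\n    return 42\n\nresult = main()\n"
print(has_top_level_return(rejected), has_top_level_return(wrapped))
```

The wrapped variant shows the suggested workaround: move the logic into a function and call it.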
