ApexAgents DockerImage
Contains the code for the APEX-Agents Docker image.
Archipelago Environment (Prime Intellect / APEX-Agents)
Dockerized FastAPI environment for running APEX-Agents tasks with a hot-swappable MCP gateway, world bootstrap, data population, and snapshotting.
This repository builds the image used in the Prime Intellect ApexAgents environment:
viditostwal/archipelago-environment-pi:latest
What This Service Does
The container runs a gateway on port 5001 that:
- hosts MCP tools under /mcp (after configuration),
- configures MCP servers dynamically via /apps,
- populates environment state into /filesystem and /.apps_data,
- snapshots current state (stream or S3),
- bootstraps APEX world/task state from the mercor/apex-agents dataset.
High-Level Architecture
- runner/main.py: FastAPI entrypoint (/health, /apps, /data/*, /bootstrap, /mcp).
- runner/gateway/*: MCP proxy build/warmup/hot-swap logic.
- runner/data/populate/*: ingest tar.gz uploads or S3 sources into subsystems.
- runner/data/snapshot/*: stream or upload snapshots from subsystems.
- runner/helper_functions.py: one-step benchmark bootstrap (bootstrap_world_and_mcp).
- config/mcp_config_all_oss_servers.json: default MCP server config (9 servers).
- mcp_servers/*: individual MCP app servers (calendar/chat/code/docs/filesystem/mail/pdf/slides/sheets).
Included MCP Servers
Default config (config/mcp_config_all_oss_servers.json) wires these stdio servers:
- calendar_server
- chat_server
- code_execution_server
- sheets_server
- filesystem_server
- mail_server
- pdf_server
- slides_server
- docs_server
Quick Start (Docker)
Pull and run latest image
docker pull viditostwal/archipelago-environment-pi:latest
docker run --rm -p 5001:5001 \
--name apex-env \
viditostwal/archipelago-environment-pi:latest
Health check
curl -s http://localhost:5001/health
Expected response: OK
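The container may take a moment to start, so a script that drives it usually polls the health endpoint first. The following is a minimal sketch, assuming /health answers with HTTP 200 once the gateway is ready (the source only states that the body is OK). The names fetch_status and wait_for_health are hypothetical helpers, not part of this repository.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return 0


def wait_for_health(base_url="http://localhost:5001", attempts=30,
                    delay=1.0, fetch=fetch_status) -> bool:
    """Poll <base_url>/health until it returns 200 or attempts run out.

    Assumption: a 200 status means the gateway is ready.
    """
    for _ in range(attempts):
        if fetch(f"{base_url}/health") == 200:
            return True
        time.sleep(delay)
    return False
```

Calling wait_for_health() right after docker run returns True once the gateway responds, or False after roughly attempts × delay seconds.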
API Overview
POST /apps
Hot-swap MCP gateway config.
Request body shape:
{
"mcpServers": {
"filesystem_server": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "main.py"],
"cwd": "/app/mcp_servers/filesystem/mcp_servers/filesystem_server",
"env": {
"APP_FS_ROOT": "/filesystem"
}
}
}
}
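For illustration, the same body can be assembled and sent from Python with only the standard library. This is a sketch under assumptions: the endpoint is taken to accept a JSON body with a Content-Type of application/json, and build_apps_request is a hypothetical helper name. The server entry reuses the values shown above.

```python
import json
import urllib.request


def build_apps_request(mcp_servers: dict,
                       base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Build (but do not send) a POST /apps request for the given servers."""
    body = json.dumps({"mcpServers": mcp_servers}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/apps",
        data=body,
        # Assumption: the gateway expects a JSON request body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Server entry copied from the request body shape above.
FILESYSTEM_SERVER = {
    "transport": "stdio",
    "command": "uv",
    "args": ["run", "python", "main.py"],
    "cwd": "/app/mcp_servers/filesystem/mcp_servers/filesystem_server",
    "env": {"APP_FS_ROOT": "/filesystem"},
}

request = build_apps_request({"filesystem_server": FILESYSTEM_SERVER})

# To send it against a running container:
#   with urllib.request.urlopen(request) as resp:
#       print(resp.status)
```

Building the request separately from sending it keeps the payload easy to inspect before the gateway is swapped.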
POST /data/populate
Populate a subsystem from uploaded tar.gz:
- multipart file field: archive
- query param: subsystem (filesystem, .apps_data, or nested path under those roots)
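A rough sketch of a client for this endpoint follows: it packs a few files into an in-memory tar.gz and wraps them in a multipart request with the archive field and subsystem query parameter described above. The helper names make_archive and build_populate_request, the uploaded filename archive.tar.gz, and the application/gzip part type are assumptions made for illustration.

```python
import io
import tarfile
import urllib.parse
import urllib.request
import uuid


def make_archive(files: dict[str, bytes]) -> bytes:
    """Pack {relative_path: content} into an in-memory tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_populate_request(archive: bytes, subsystem: str,
                           base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Build (but do not send) a multipart POST /data/populate request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Field name "archive" comes from the endpoint description;
        # the filename and part content type are assumptions.
        b'Content-Disposition: form-data; name="archive"; '
        b'filename="archive.tar.gz"\r\n',
        b"Content-Type: application/gzip\r\n\r\n",
        archive,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    query = urllib.parse.urlencode({"subsystem": subsystem})
    return urllib.request.Request(
        f"{base_url}/data/populate?{query}",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

For example, build_populate_request(make_archive({"notes/hello.txt": b"hi"}), "filesystem") yields a request that can be passed to urllib.request.urlopen against a running container.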
POST /data/populate/s3
Populate from S3 URLs:
{
"sources": [
{
"url": "s3://my-bucket/path-or-prefix",
"subsystem": "filesystem"
}
]
}
POST /data/snapshot
Stream a tarball snapshot of both subsystems.
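Because the snapshot arrives as a stream, it can be inspected without buffering the whole archive. The sketch below reads a tar stream member by member. It assumes nothing about whether the tarball is compressed (tarfile's "r|*" mode detects that), and list_snapshot_members is a hypothetical helper name.

```python
import io
import tarfile


def list_snapshot_members(stream) -> list[str]:
    """Return member names from a (possibly compressed) tar stream.

    "r|*" reads sequentially with transparent compression detection,
    so it also works on non-seekable objects such as HTTP responses.
    """
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        return [member.name for member in tar]


# Against a running container (assumption: no request body is required):
#   import urllib.request
#   req = urllib.request.Request("http://localhost:5001/data/snapshot", method="POST")
#   with urllib.request.urlopen(req) as resp:
#       print(list_snapshot_members(resp))
```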
POST /data/snapshot/s3
Upload a snapshot to S3. The format is "files" (default) or "tar.gz".
POST /bootstrap
Single-call APEX bootstrap:
{
"task_selection": "task_9ba58a6197114140877a1df1754d2993"
}
task_selection can be:
- a task ID (task_<...>), or
- a numeric index into tasks_and_rubrics.json.
What bootstrap does:
- Downloads task/world metadata from Hugging Face dataset mercor/apex-agents.
- Downloads the world snapshot zip for the resolved world_id.
- Populates /filesystem and /.apps_data via /data/populate.
- Configures MCP servers using config/mcp_config_all_oss_servers.json.
Returns IDs and output path metadata (task_id, world_id, trajectory_id, grading_run_id, output_dir).
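A client for this endpoint might look like the sketch below. It assumes the response is a flat JSON object whose keys match the five names listed above. The source does not say whether a numeric index should be sent as a JSON number or a string, so the value is passed through unchanged. BootstrapResult, build_bootstrap_request, and parse_bootstrap_response are hypothetical names.

```python
import json
import urllib.request
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BootstrapResult:
    # Field names are taken from the documented return metadata.
    task_id: str
    world_id: str
    trajectory_id: str
    grading_run_id: str
    output_dir: str


def build_bootstrap_request(task_selection,
                            base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Build (but do not send) a POST /bootstrap request."""
    body = json.dumps({"task_selection": task_selection}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/bootstrap",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_bootstrap_response(raw: bytes) -> BootstrapResult:
    """Parse the response body; assumes a flat JSON object (not confirmed)."""
    payload = json.loads(raw)
    return BootstrapResult(**{f.name: payload[f.name] for f in fields(BootstrapResult)})


# Against a running container:
#   req = build_bootstrap_request("task_9ba58a6197114140877a1df1754d2993")
#   with urllib.request.urlopen(req) as resp:
#       result = parse_bootstrap_response(resp.read())
```

Extra keys in the response are ignored, while a missing key raises KeyError, which surfaces a mismatch with the assumed shape early.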
Local Development
Requirements
- Python >=3.13,<3.14
- uv
Install
uv sync --all-groups
Run API locally
uv run uvicorn runner.main:app --host 0.0.0.0 --port 5001
Environment Variables
Configured in runner/utils/settings.py:
- ENV (local|dev|demo|prod, default local)
- DATADOG_LOGGING (false by default)
- DATADOG_API_KEY, DATADOG_APP_KEY
- S3_SNAPSHOTS_BUCKET (default snapshots)
- S3_SNAPSHOTS_PREFIX (default empty)
- S3_DEFAULT_REGION (default us-west-2)
- S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_SESSION_TOKEN
Container defaults in Dockerfile also include:
- APP_FS_ROOT=/filesystem
- GUI_ENABLED=true
- INTERNET_ENABLED=false
- HAS_STATE=true
- STATE_LOCATION=/.apps_data/chat
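To make the documented defaults concrete, here is an illustrative loader that resolves a subset of these variables from the environment. It is not the contents of runner/utils/settings.py, which may parse values differently. The Settings class, load_settings function, and the accepted truthy strings are assumptions.

```python
import os
from dataclasses import dataclass

VALID_ENVS = ("local", "dev", "demo", "prod")


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the values documented above.
    env: str = "local"
    datadog_logging: bool = False
    s3_snapshots_bucket: str = "snapshots"
    s3_snapshots_prefix: str = ""
    s3_default_region: str = "us-west-2"


def load_settings(environ=None) -> Settings:
    """Resolve settings from a mapping (defaults to os.environ)."""
    environ = os.environ if environ is None else environ
    env = environ.get("ENV", "local")
    if env not in VALID_ENVS:
        raise ValueError(f"ENV must be one of {VALID_ENVS}, got {env!r}")
    # Assumption: "1", "true", and "yes" (case-insensitive) enable logging.
    logging_on = environ.get("DATADOG_LOGGING", "false").strip().lower() in ("1", "true", "yes")
    return Settings(
        env=env,
        datadog_logging=logging_on,
        s3_snapshots_bucket=environ.get("S3_SNAPSHOTS_BUCKET", "snapshots"),
        s3_snapshots_prefix=environ.get("S3_SNAPSHOTS_PREFIX", ""),
        s3_default_region=environ.get("S3_DEFAULT_REGION", "us-west-2"),
    )
```

Passing an explicit mapping, for example load_settings({"ENV": "prod"}), makes the resolution easy to check without touching the process environment.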
APEX-Agents Benchmark (Research Summary)
APEX-Agents is Mercor’s benchmark for long-horizon, cross-application professional workflows. It evaluates agents on realistic tasks spanning investment banking, consulting, and law.
Key points (from the official dataset card and paper):
- 480 tasks across 36 worlds.
- World contexts include files plus tool-based apps (calendar/chat/code/docs/filesystem/mail/pdf/sheets/slides).
- Grading is rubric-based with binary criterion-level judgments.
- Benchmark and files are open sourced, and Archipelago is the execution/eval infrastructure.
Primary sources:
- arXiv paper (2601.14242): https://arxiv.org/abs/2601.14242
- Official dataset card: https://huggingface.co/datasets/mercor/apex-agents
- Archipelago repository: https://github.com/Mercor-Intelligence/archipelago
Notes
- This repository corresponds to the environment component (gateway + MCP servers) used in Archipelago-style APEX runs.
- Output examples from local evals are under outputs/evals/.
