ApexAgents DockerImage

Contains the code for the APEX-Agents Docker image.

Archipelago Environment (Prime Intellect / APEX-Agents)

Dockerized FastAPI environment for running APEX-Agents tasks with a hot-swappable MCP gateway, world bootstrap, data population, and snapshotting.

This repository builds the image used in the Prime Intellect ApexAgents environment:

viditostwal/archipelago-environment-pi:latest

What This Service Does

The container runs a gateway on port 5001 that:

hosts MCP tools under /mcp (after configuration),
configures MCP servers dynamically via /apps,
populates environment state into /filesystem and /.apps_data,
snapshots current state (stream or S3),
bootstraps APEX world/task state from the mercor/apex-agents dataset.

High-Level Architecture

runner/main.py: FastAPI entrypoint (/health, /apps, /data/*, /bootstrap, /mcp).
runner/gateway/*: MCP proxy build/warmup/hot-swap logic.
runner/data/populate/*: ingest tar.gz uploads or S3 sources into subsystems.
runner/data/snapshot/*: stream or upload snapshots from subsystems.
runner/helper_functions.py: one-step benchmark bootstrap (bootstrap_world_and_mcp).
config/mcp_config_all_oss_servers.json: default MCP server config (9 servers).
mcp_servers/*: individual MCP app servers (calendar/chat/code/docs/filesystem/mail/pdf/slides/sheets).

Included MCP Servers

Default config (config/mcp_config_all_oss_servers.json) wires these stdio servers:

calendar_server
chat_server
code_execution_server
sheets_server
filesystem_server
mail_server
pdf_server
slides_server
docs_server

Quick Start (Docker)

Pull and run the latest image

docker pull viditostwal/archipelago-environment-pi:latest

docker run --rm -p 5001:5001 \
  --name apex-env \
  viditostwal/archipelago-environment-pi:latest

Health check

curl -s http://localhost:5001/health

Expected response: OK
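
When the container is started from a script, the gateway may not accept connections right away. The following is a minimal sketch of a readiness poll against /health, using only the Python standard library. The function names, retry count, and delay are illustrative assumptions and are not part of this repository.

```python
import time
import urllib.error
import urllib.request


def fetch_health(base_url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of GET /health (raises on connection errors)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status


def wait_until_healthy(base_url: str, attempts: int = 30, delay: float = 1.0,
                       fetch=fetch_health, sleep=time.sleep) -> bool:
    """Poll /health until it answers 200 or the attempts run out."""
    for _ in range(attempts):
        try:
            if fetch(base_url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # container is not accepting connections yet
        sleep(delay)
    return False
```

Calling wait_until_healthy("http://localhost:5001") before any other request avoids racing the container startup. The fetch and sleep parameters exist only so the loop can be exercised without a running container.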

API Overview

`POST /apps`

Hot-swap MCP gateway config.

Request body shape:

{
  "mcpServers": {
    "filesystem_server": {
      "transport": "stdio",
      "command": "uv",
      "args": ["run", "python", "main.py"],
      "cwd": "/app/mcp_servers/filesystem/mcp_servers/filesystem_server",
      "env": {
        "APP_FS_ROOT": "/filesystem"
      }
    }
  }
}
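
As a rough illustration, the body above could be sent from Python as follows. This sketch assumes the endpoint accepts a JSON body with a Content-Type of application/json (typical for FastAPI, but not stated here) and that the gateway is reachable on localhost:5001 as in the Quick Start. The helper names and the timeout are hypothetical, and the response is returned as raw bytes because its shape is not documented in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # assumption: port mapping from the Quick Start

# Same shape as the request body shown above: a single stdio server.
FILESYSTEM_ONLY = {
    "mcpServers": {
        "filesystem_server": {
            "transport": "stdio",
            "command": "uv",
            "args": ["run", "python", "main.py"],
            "cwd": "/app/mcp_servers/filesystem/mcp_servers/filesystem_server",
            "env": {"APP_FS_ROOT": "/filesystem"},
        }
    }
}


def build_apps_request(config: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /apps request for a gateway config."""
    return urllib.request.Request(
        f"{base_url}/apps",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def apply_config(config: dict, base_url: str = BASE_URL, timeout: float = 120.0) -> bytes:
    """Send the config and return the raw response body."""
    with urllib.request.urlopen(build_apps_request(config, base_url), timeout=timeout) as resp:
        return resp.read()
```

Calling apply_config(FILESYSTEM_ONLY) against a running container would swap the gateway to a filesystem-only configuration.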

`POST /data/populate`

Populate a subsystem from uploaded tar.gz:

multipart file field: archive
query param: subsystem (filesystem, .apps_data, or nested path under those roots)
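
A possible client for this endpoint, sketched with the standard library only: it packs a few files into an in-memory tar.gz and wraps them in a multipart body whose file field is named archive, with subsystem passed as a query parameter. The helper names, the archive.tar.gz filename, and the application/gzip part type are assumptions made for illustration.

```python
import io
import tarfile
import urllib.parse
import urllib.request
import uuid


def make_tar_gz(files: dict[str, bytes]) -> bytes:
    """Pack {relative_path: content} into an in-memory tar.gz archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def build_populate_request(archive: bytes, subsystem: str,
                           base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Build (but do not send) a multipart POST /data/populate request."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="archive"; filename="archive.tar.gz"\r\n'
        "Content-Type: application/gzip\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    query = urllib.parse.urlencode({"subsystem": subsystem})
    return urllib.request.Request(
        f"{base_url}/data/populate?{query}",
        data=head + archive + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen performs the upload. Libraries such as requests or httpx build the multipart body automatically if they are available.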

`POST /data/populate/s3`

Populate from S3 URLs:

{
  "sources": [
    {
      "url": "s3://my-bucket/path-or-prefix",
      "subsystem": "filesystem"
    }
  ]
}
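
The payload above can be assembled programmatically. The sketch below adds a client-side sanity check that each subsystem is one of the two roots or a path nested under them. The check itself, including the rejection of ".." segments, is an assumption for illustration; the server's own validation rules are not described in this README.

```python
import json

ROOTS = ("filesystem", ".apps_data")


def is_valid_subsystem(subsystem: str) -> bool:
    """True for a root subsystem or a nested path beneath one (no '..' or empty segments)."""
    parts = subsystem.strip("/").split("/")
    return parts[0] in ROOTS and ".." not in parts and "" not in parts


def build_s3_populate_payload(sources: list[tuple[str, str]]) -> str:
    """Turn (s3_url, subsystem) pairs into the JSON body for POST /data/populate/s3."""
    entries = []
    for url, subsystem in sources:
        if not url.startswith("s3://"):
            raise ValueError(f"not an S3 URL: {url}")
        if not is_valid_subsystem(subsystem):
            raise ValueError(f"unsupported subsystem: {subsystem}")
        entries.append({"url": url, "subsystem": subsystem})
    return json.dumps({"sources": entries})
```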

`POST /data/snapshot`

Stream a tarball snapshot of both subsystems (/filesystem and /.apps_data).
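
Because the snapshot is streamed, a client can write it to disk in chunks instead of buffering it in memory. A minimal sketch follows, assuming the endpoint needs no request body and that the gateway is on localhost:5001. The function names, chunk size, and timeout are hypothetical.

```python
import urllib.request
from typing import BinaryIO


def copy_stream(source: BinaryIO, destination: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Copy source to destination in fixed-size chunks; return the bytes written."""
    total = 0
    while chunk := source.read(chunk_size):
        destination.write(chunk)
        total += len(chunk)
    return total


def download_snapshot(path: str, base_url: str = "http://localhost:5001",
                      timeout: float = 300.0) -> int:
    """POST /data/snapshot and save the streamed tarball to `path`."""
    req = urllib.request.Request(f"{base_url}/data/snapshot", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(path, "wb") as out:
        return copy_stream(resp, out)
```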

`POST /data/snapshot/s3`

Upload a snapshot to S3 (format: "files" by default, or "tar.gz").

`POST /bootstrap`

Single-call APEX bootstrap:

{
  "task_selection": "task_9ba58a6197114140877a1df1754d2993"
}

task_selection can be:

a task ID (task_<...>), or
a numeric index into tasks_and_rubrics.json.
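
To make the two forms concrete, here is an illustrative resolver. It is not the runner's actual code. It assumes tasks_and_rubrics.json has been loaded as a list of records that each carry a task_id key, and that numeric indices are zero-based; both are assumptions.

```python
def resolve_task_selection(selection: str, tasks: list[dict]) -> dict:
    """Resolve a task ID (task_<...>) or a numeric index to a task record."""
    selection = selection.strip()
    if selection.startswith("task_"):
        for task in tasks:
            if task.get("task_id") == selection:
                return task
        raise KeyError(f"unknown task ID: {selection}")
    if selection.isdecimal():
        index = int(selection)  # assumption: zero-based index into the task list
        if index >= len(tasks):
            raise IndexError(f"task index {index} out of range ({len(tasks)} tasks)")
        return tasks[index]
    raise ValueError(f"expected a task ID or a numeric index, got: {selection!r}")
```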

What bootstrap does:

Downloads task/world metadata from Hugging Face dataset mercor/apex-agents.
Downloads the world snapshot zip for the resolved world_id.
Populates /filesystem and /.apps_data via /data/populate.
Configures MCP servers using config/mcp_config_all_oss_servers.json.

Returns IDs and output path metadata (task_id, world_id, trajectory_id, grading_run_id, output_dir).
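
A hedged client-side sketch of the call: it posts task_selection and reads the five fields listed above. It assumes the response is a flat JSON object keyed by those field names, which this README does not confirm. The dataclass, helper names, and timeout are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BootstrapResult:
    task_id: str
    world_id: str
    trajectory_id: str
    grading_run_id: str
    output_dir: str


def build_bootstrap_request(task_selection: str,
                            base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Build (but do not send) the POST /bootstrap request."""
    return urllib.request.Request(
        f"{base_url}/bootstrap",
        data=json.dumps({"task_selection": task_selection}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_bootstrap_response(raw: bytes) -> BootstrapResult:
    """Pick the documented fields out of the response; extra keys are ignored."""
    data = json.loads(raw)
    return BootstrapResult(**{f.name: data[f.name] for f in fields(BootstrapResult)})


def bootstrap(task_selection: str, base_url: str = "http://localhost:5001",
              timeout: float = 600.0) -> BootstrapResult:
    """Run the bootstrap; the long default timeout allows for dataset downloads."""
    req = build_bootstrap_request(task_selection, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_bootstrap_response(resp.read())
```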

Local Development

Requirements

Python >=3.13,<3.14
uv

Install

uv sync --all-groups

Run API locally

uv run uvicorn runner.main:app --host 0.0.0.0 --port 5001

Environment Variables

Configured in runner/utils/settings.py:

ENV (local|dev|demo|prod, default local)
DATADOG_LOGGING (default false)
DATADOG_API_KEY, DATADOG_APP_KEY
S3_SNAPSHOTS_BUCKET (default snapshots)
S3_SNAPSHOTS_PREFIX (default empty)
S3_DEFAULT_REGION (default us-west-2)
S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_SESSION_TOKEN
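
The defaults above can be summarized in code. This is a standard-library sketch of how such settings might be read, not the contents of runner/utils/settings.py. The class name, the truthy-string parsing for DATADOG_LOGGING, and the validation of ENV are assumptions, and the credential variables are omitted for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_ENVS = {"local", "dev", "demo", "prod"}


@dataclass(frozen=True)
class Settings:
    env: str = "local"
    datadog_logging: bool = False
    s3_snapshots_bucket: str = "snapshots"
    s3_snapshots_prefix: str = ""
    s3_default_region: str = "us-west-2"


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = environ.get("ENV", "local")
    if env not in VALID_ENVS:
        raise ValueError(f"ENV must be one of {sorted(VALID_ENVS)}, got {env!r}")
    # Assumption: "1", "true", and "yes" (any case) enable Datadog logging.
    datadog = environ.get("DATADOG_LOGGING", "false").strip().lower() in {"1", "true", "yes"}
    return Settings(
        env=env,
        datadog_logging=datadog,
        s3_snapshots_bucket=environ.get("S3_SNAPSHOTS_BUCKET", "snapshots"),
        s3_snapshots_prefix=environ.get("S3_SNAPSHOTS_PREFIX", ""),
        s3_default_region=environ.get("S3_DEFAULT_REGION", "us-west-2"),
    )
```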

Container defaults in Dockerfile also include:

APP_FS_ROOT=/filesystem
GUI_ENABLED=true
INTERNET_ENABLED=false
HAS_STATE=true
STATE_LOCATION=/.apps_data/chat

APEX-Agents Benchmark (Research Summary)

APEX-Agents is Mercor’s benchmark for long-horizon, cross-application professional workflows. It evaluates agents on realistic tasks spanning investment banking, consulting, and law.

Key points (from the official dataset card and paper):

480 tasks across 36 worlds.
World contexts include files plus tool-based apps (calendar/chat/code/docs/filesystem/mail/pdf/sheets/slides).
Grading is rubric-based with binary criterion-level judgments.
The benchmark and its files are open source, and Archipelago is the execution and evaluation infrastructure.

Primary sources:

arXiv paper (2601.14242): https://arxiv.org/abs/2601.14242
Official dataset card: https://huggingface.co/datasets/mercor/apex-agents
Archipelago repository: https://github.com/Mercor-Intelligence/archipelago

Notes

This repository corresponds to the environment component (gateway + MCP servers) used in Archipelago-style APEX runs.
Output examples from local evals are under outputs/evals/.