Fabric API MCP
MCP server exposing FABRIC Testbed provisioning and resource APIs via FastMCP and token-based FABRIC API integration.
Ask AI about Fabric API MCP
Powered by Claude Β· Grounded in docs
I know everything about Fabric API MCP. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
FABRIC API MCP Server
A production-ready Model Context Protocol (MCP) server that lets LLM clients (Claude Desktop, Claude Code, VS Code Copilot, ChatGPT, Chatbox, etc.) query, provision, and manage resources on the FABRIC Testbed β a nation-wide programmable network research infrastructure.
Key features
- Two modes β run locally (full-featured, SSH to VMs) or connect to a shared remote server
- 30+ tools β query sites/hosts/links, build slices, modify resources, manage SSH keys, reboot nodes, and more
- Stateless & secure β no credentials stored; every call uses a Bearer FABRIC ID token (server mode) or local
fabric_rcconfig (local mode) - Declarative filter DSL β powerful filtering, sorting, and pagination on all query tools
- Production-ready β OpenResty reverse proxy, Prometheus metrics, Grafana dashboards, structured logging with per-request tracing
- Resource cache β optional background refresh for sub-second topology queries
Table of contents
Quick install
Prerequisites: Python 3.11+ and a FABRIC account. Remote mode also requires Node.js and
jq.
Set up FABRIC MCP with a single command:
# Local mode (full-featured: SSH to VMs, post-boot config)
curl -fsSL https://raw.githubusercontent.com/fabric-testbed/fabric_api_mcp/main/install.sh | bash -s -- --local
# Remote mode (connects to remote MCP server via mcp-remote)
curl -fsSL https://raw.githubusercontent.com/fabric-testbed/fabric_api_mcp/main/install.sh | bash -s -- --remote
# Both modes
curl -fsSL https://raw.githubusercontent.com/fabric-testbed/fabric_api_mcp/main/install.sh | bash -s -- --local --remote
# Headless environment (no browser)
curl -fsSL https://raw.githubusercontent.com/fabric-testbed/fabric_api_mcp/main/install.sh | bash -s -- --local --no-browser
The installer:
- Creates
~/work/fabric-api-mcp/with a Python venv, bin directory, and helper scripts - Installs
fabric_api_mcp(which includesfabric-cli) into the venv - Runs
fabric-cli configure setupto authenticate via CILogon and set up your FABRIC config (token, SSH keys,fabric_rc) in~/work/fabric_config/ - Prints the configured project ID and MCP client config snippet
Project selection: By default, your first FABRIC project is used. The installer prints the project ID at the end. To change it later:
~/work/fabric-api-mcp/venv/bin/fabric-cli configure setup --config-dir ~/work/fabric_config --projectname <name> # or by UUID: ~/work/fabric-api-mcp/venv/bin/fabric-cli configure setup --config-dir ~/work/fabric_config --projectid <uuid>
See --help for all options (--config-dir, --venv, --no-browser).
For manual setup or more control, see Local mode setup and Remote mode setup below.
MCP client configuration
After installing (via the one-liner above or manually), add the FABRIC MCP server to your client. Replace <SCRIPT> with the path to your helper script:
- Local mode:
~/work/fabric-api-mcp/bin/fabric-api-local.sh(or wherever you placed it) - Remote mode:
~/work/fabric-api-mcp/bin/fabric-api.sh
Claude Code CLI
claude mcp add fabric-api <SCRIPT>
Claude Desktop
Edit claude_desktop_config.json (Settings β Developer β Edit Config):
{
"mcpServers": {
"fabric-api": {
"command": "<SCRIPT>"
}
}
}
VS Code
Add to .mcp.json in your project root (or workspace settings):
{
"servers": {
"fabric-api": {
"type": "stdio",
"command": "<SCRIPT>"
}
}
}
Chatbox
Chatbox (v1.14+) supports MCP servers. Go to Settings β MCP β Add Server, then paste this JSON:
{
"name": "fabric-api",
"command": "<SCRIPT>",
"args": [],
"env": {}
}
Alternatively, for remote mode (SSE transport), use the URL-based format:
{
"name": "fabric-api",
"url": "https://<YOUR_HOST>/mcp/sse"
}
Tools reference
All tools accept JSON parameters and return JSON responses.
Topology queries
| Tool | Description |
|---|---|
query-sites | List sites with filters, sorting, and pagination |
query-hosts | List hosts with filters, sorting, and pagination |
query-facility-ports | List external facility port connections |
query-links | List L2/L3 network links between sites |
Slice lifecycle
| Tool | Description |
|---|---|
build-slice | Create a slice with nodes, networks, components, switches, and facility ports |
query-slices | Search/list slices or fetch a single slice by name/ID |
get-slivers | List slivers (VMs, network services) within a slice |
modify-slice-resources | Add or remove nodes, components, networks from an existing slice |
accept-modify | Accept the last pending modification |
renew-slice | Extend a slice's lease end time |
delete-slice | Delete a slice by ID |
post-boot-config | Configure networking inside VMs after slice reaches StableOK (local mode only) |
Networking
| Tool | Description |
|---|---|
list-nodes | List nodes in a slice with SSH commands |
list-networks | List networks in a slice with subnet/gateway info |
list-interfaces | List interfaces in a slice with MAC/VLAN/IP details |
get-network-info | Get network details: available IPs, public IPs, gateway, subnet |
make-ip-publicly-routable | Enable external access for FABNetv4Ext/FABNetv6Ext IPs |
User & project management
| Tool | Description |
|---|---|
get-user-info | Fetch user info (self or by UUID) β name, email, bastion login, roles |
show-my-projects | List FABRIC projects for the current user |
list-project-users | List users in a specific project |
get-user-keys | Fetch a user's SSH/public keys |
get-bastion-username | Get the bastion login username |
Node operations (POA)
| Tool | Description |
|---|---|
add-public-key | Add an SSH public key to a sliver |
remove-public-key | Remove an SSH public key from a sliver |
os-reboot | Reboot a sliver's OS |
Authentication
| Mode | How it works |
|---|---|
| Server mode | Every request must include Authorization: Bearer <FABRIC_ID_TOKEN>. The server does not store tokens. |
| Local mode | Credentials are read automatically from your fabric_rc file (FABRIC_TOKEN_LOCATION). No Bearer header needed. |
Get a token: Use fabric-cli tokens create (installed with this package) or download from the FABRIC Portal β Experiments β Manage Tokens. The token JSON contains an id_token field.
Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β MCP Client (Claude Desktop / VS Code / ChatGPT / Chatbox) β
ββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββ
β call_tool + Bearer token (server mode)
β β or stdio (local mode)
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FABRIC MCP Server (FastMCP + FastAPI) β
β ββ Tools (topology, slices, networking, user mgmt, POA) β
β ββ ResourceCache (optional background refresh) β
β ββ Middleware (access log, rate limit, metrics, security) β
β ββ Prometheus /metrics endpoint β
ββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FABRIC APIs β
β ββ Orchestrator (slice lifecycle, resources) β
β ββ Credential Manager (token validation) β
β ββ Core API (user info, projects, roles) β
β ββ Artifact Manager (images, metadata) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ

- Every request carries a x-request-id for end-to-end tracing
- ResourceCache refreshes topology snapshots every 5 minutes for sub-second query responses
Repo layout
.
βββ fabric_api_mcp/ # Python package
β βββ __main__.py # FastMCP entrypoint
β βββ metrics.py # Prometheus metric definitions
β βββ resources_cache.py # Background topology cache
β βββ system.md # System prompt (served via MCP prompt)
β βββ middleware/ # Request processing pipeline
β β βββ access_log.py # HTTP access logging
β β βββ metrics.py # Prometheus HTTP metrics
β β βββ rate_limit.py # Rate limiting
β β βββ security_metrics.py # Auth failure & IP tracking
β βββ tools/ # MCP tool implementations
β βββ topology.py # Site/host/link/facility-port queries
β βββ slices/ # Slice lifecycle, networking, POA
βββ scripts/
β βββ fabric-api-local.sh # Local mode launcher
β βββ fabric-api.sh # Remote mode launcher
βββ nginx/
β βββ nginx.conf # OpenResty base config
β βββ default.conf # Reverse proxy + Vouch auth + Lua role check
βββ vouch/config # Vouch Proxy CILogon OIDC config
βββ monitoring/
β βββ prometheus/prometheus.yml # Scrape config
β βββ grafana/ # Dashboards + provisioning
βββ docker-compose.yml # All 5 services
βββ Dockerfile # MCP server image
βββ pyproject.toml # Package config (pip-installable)
βββ install.sh # One-line installer
βββ env.template # Template for .env
Environment variables
Server respects these (all optional unless stated):
| Var | Default | Purpose |
|---|---|---|
FABRIC_ORCHESTRATOR_HOST | orchestrator.fabric-testbed.net | Orchestrator host |
FABRIC_CREDMGR_HOST | cm.fabric-testbed.net | Credential manager host |
FABRIC_AM_HOST | artifacts.fabric-testbed.net | Artifact manager host |
FABRIC_CORE_API_HOST | uis.fabric-testbed.net | Core API host |
PORT | 5000 | MCP HTTP port (internal) |
HOST | 0.0.0.0 | Bind address |
LOG_LEVEL | INFO | Python log level |
LOG_FORMAT | text | text or json |
UVICORN_ACCESS_LOG | 1 | 1/true to emit access logs |
REFRESH_INTERVAL_SECONDS | 300 | ResourceCache refresh interval |
CACHE_MAX_FETCH | 5000 | Cache fetch limit per cycle |
MAX_FETCH_FOR_SORT | 5000 | Max fetch when client asks to sort |
METRICS_ENABLED | 1 (server) / 0 (local) | Enable Prometheus metrics + /metrics endpoint |
FABRIC_LOCAL_MODE | 0 | 1 to enable local/stdio mode (no Bearer token required) |
FABRIC_MCP_TRANSPORT | stdio (local) / http (server) | Override transport (stdio or http) |
The
system.mdfile is served to clients via an MCP prompt namedfabric-system.
Deploy with Docker Compose (Server Mode)
Step 1: Clone the repository
git clone https://github.com/fabric-testbed/fabric_api_mcp.git
cd fabric-mcp
Step 2: Place your TLS certificates
The NGINX reverse proxy terminates TLS and requires a certificate and private key. Update the volume paths in docker-compose.yml to point to your actual cert files:
volumes:
- /path/to/your/fullchain.pem:/etc/ssl/public.pem
- /path/to/your/privkey.pem:/etc/ssl/private.pem
Or copy/symlink them into the default location:
cp /path/to/your/fullchain.pem ssl/fullchain.pem
cp /path/to/your/privkey.pem ssl/privkey.pem
Step 3: Create the .env file
Docker Compose reads container UIDs/GIDs from a .env file so that Prometheus and Grafana can write to their host bind-mount directories. Copy the template and adjust if needed:
cp env.template .env
The defaults match the standard container users (Prometheus = 65534/nobody, Grafana = 472/grafana). If your host directories are owned by a different UID/GID, update .env accordingly:
# Check current ownership
stat -c '%u:%g' /opt/data/production/services/api-mcp/monitoring/prometheus
stat -c '%u:%g' /opt/data/production/services/api-mcp/monitoring/grafana
# Then edit .env to match, e.g.:
# PROMETHEUS_UID=1000
# PROMETHEUS_GID=1000
Step 4: Create monitoring data directories
Prometheus and Grafana persist data to host bind-mount directories. Create them and set ownership to match the UIDs in your .env before first start:
mkdir -p /opt/data/production/services/api-mcp/monitoring/{prometheus,grafana}
chown 65534:65534 /opt/data/production/services/api-mcp/monitoring/prometheus # prometheus (nobody)
chown 472:472 /opt/data/production/services/api-mcp/monitoring/grafana # grafana
Note: The UIDs above must match
PROMETHEUS_UID/GRAFANA_UIDin your.envfile.
Step 5: Start the services
docker compose up -d
This starts five containers:
fabric-api-mcpβ the MCP server (port 5000, internal only)fabric-api-nginxβ OpenResty reverse proxy (port 443, public)fabric-api-prometheusβ Prometheus metrics collector (internal only, 30-day retention)fabric-api-grafanaβ Grafana dashboards (exposed via NGINX at/grafana/, protected by Vouch Proxy)fabric-api-vouchβ Vouch Proxy for CILogon OIDC authentication (internal only)
Step 6: Verify
# Check containers are running
docker compose ps
# Check health endpoint
curl -k https://localhost/healthz
# Check MCP server logs
docker compose logs -f mcp-server
# Check Prometheus is scraping (internal only β use docker exec)
docker compose exec prometheus wget -qO- http://localhost:9090/api/v1/targets | python3 -m json.tool
# fabric-mcp target should show state: "up"
# Check the raw metrics endpoint (internal, not exposed via NGINX)
docker compose exec mcp-server curl -s http://localhost:5000/metrics | head -20
# Access Grafana via NGINX
# Open https://<your-host>/grafana/ (login: admin/admin)
| Service | URL | Access |
|---|---|---|
| MCP endpoint | https://<your-host>/mcp | Bearer token required |
| Grafana | https://<your-host>/grafana/ | CILogon login (requires facility-operators or facility-viewers role) |
| Prometheus | Internal only (Docker network) | Via docker compose exec prometheus ... |
Configuration files
The full Docker Compose and NGINX configurations are in the repository:
docker-compose.ymlβ defines all 5 services (MCP server, OpenResty, Vouch Proxy, Prometheus, Grafana)nginx/default.confβ OpenResty reverse proxy config with Bearer token passthrough, Vouch auth for Grafana, and Lua role checkingnginx/nginx.confβ base OpenResty configvouch/configβ Vouch Proxy CILogon OIDC settingsenv.templateβ template for.env(container UIDs, CILogon credentials)
Key NGINX requirements for the MCP endpoint:
- Pass
Authorizationheader:proxy_set_header Authorization $http_authorization - Use HTTP/1.1:
proxy_http_version 1.1 - Disable buffering for SSE:
proxy_buffering off
Adding new tools
- Add your tool function to an existing module under
fabric_api_mcp/tools/(or create a new one) and include it in that module'sTOOLSlist. - If you add a new module, import it in
fabric_api_mcp/tools/__init__.pyand append itsTOOLStoALL_TOOLS. __main__.pyauto-registers everything inALL_TOOLS, so no extra wiring is needed after export.
The MCP server runs on port 5000 in the container (
mcp.run(transport="http", host=0.0.0.0, port=5000)).
Local mode setup
Local mode runs the MCP server on your machine using your FABRIC token file and environment β no remote server required. The server reads credentials from your fabric_rc file and supports all tools including post_boot_config (SSH into VMs).
Quick install:
curl -fsSL https://raw.githubusercontent.com/fabric-testbed/fabric_api_mcp/main/install.sh | bash -s -- --localβ automates all the steps below.
Step 1: Create a Python virtual environment
Requires Python 3.11+ (tested with 3.13 and 3.14).
python3 -m venv ~/work/fabric-api-mcp/venv
source ~/work/fabric-api-mcp/venv/bin/activate
You can place the venv anywhere β just remember the path for later steps.
Step 2: Install the package
pip install git+https://github.com/fabric-testbed/fabric_api_mcp.git
This installs fabric_api_mcp and fabric-cli (included as a dependency) into the venv.
Or clone and install in development mode:
git clone https://github.com/fabric-testbed/fabric_api_mcp.git
cd fabric-mcp
pip install -e .
Step 3: Set up the FABRIC config directory
Use the venv's fabric-cli (installed as a dependency in Step 2) to set up your config. This creates the config directory, generates a token, creates bastion and sliver SSH keys, and writes ssh_config and fabric_rc files β all in one step:
~/work/fabric-api-mcp/venv/bin/fabric-cli configure setup --config-dir ~/work/fabric_config
This opens a browser for CILogon authentication. Once complete, it generates all required files in the config directory. Add --no-browser for remote/headless environments.
Important: Use the venv's
fabric-cli(~/work/fabric-api-mcp/venv/bin/fabric-cli), not a system-installed one, to ensure you have the correct version with theconfigurecommand.
Project selection: By default, your first FABRIC project is used. To specify a project:
--projectid <uuid>or--projectname <name>. The selected project ID is stored infabric_rcasFABRIC_PROJECT_ID.
To change your project later, re-run configure with the new project:
~/work/fabric-api-mcp/venv/bin/fabric-cli configure setup --config-dir ~/work/fabric_config --projectname <name>
Alternatively, set up manually. The config directory should contain:
fabric_rcβ environment file that exports FABRIC variables (token location, SSH key paths, etc.)tokens.jsonβ your FABRIC token file (downloaded from the FABRIC Portal β Experiments β Manage Tokens)- SSH keys β bastion key, slice key, and slice key
.pub(see Portal β Experiments β SSH Keys)
A minimal fabric_rc looks like:
export FABRIC_CREDMGR_HOST=cm.fabric-testbed.net
export FABRIC_ORCHESTRATOR_HOST=orchestrator.fabric-testbed.net
export FABRIC_CORE_API_HOST=uis.fabric-testbed.net
export FABRIC_PROJECT_ID=<your-project-uuid>
export FABRIC_TOKEN_LOCATION=~/work/fabric_config/tokens.json
export FABRIC_BASTION_HOST=bastion.fabric-testbed.net
export FABRIC_BASTION_USERNAME=<your_bastion_username>
export FABRIC_BASTION_KEY_LOCATION=~/work/fabric_config/fabric_bastion_key
export FABRIC_SLICE_PRIVATE_KEY_FILE=~/work/fabric_config/slice_key
export FABRIC_SLICE_PUBLIC_KEY_FILE=~/work/fabric_config/slice_key.pub
export FABRIC_LOG_FILE=~/fablib.log
export FABRIC_LOG_LEVEL=INFO
export FABRIC_SSH_COMMAND_LINE="ssh -i {{ _self_.private_ssh_key_file }} -F ~/work/fabric_config/ssh_config {{ _self_.username }}@{{ _self_.management_ip }}"
Replace <your-project-uuid> and <your_bastion_username> with your actual values from the FABRIC portal.
Step 4: Get the helper script
If you cloned the repo, the script is already at scripts/fabric-api-local.sh.
Otherwise, download it:
curl -o ~/fabric-api-local.sh \
https://raw.githubusercontent.com/fabric-testbed/fabric_api_mcp/main/scripts/fabric-api-local.sh
chmod +x ~/fabric-api-local.sh
Step 5: Configure the script for your environment
The script defaults are shown below. Update if your paths differ β either edit the script directly or override at runtime via env vars:
| Var | Default | Purpose |
|---|---|---|
FABRIC_VENV | ~/work/fabric-api-mcp/venv | Path to your Python venv |
FABRIC_RC | ~/work/fabric_config/fabric_rc | Path to your fabric_rc file |
If you used a different venv path in Step 1, update accordingly:
# Override at runtime:
FABRIC_VENV=~/my-other-venv ./scripts/fabric-api-local.sh
# Or edit the script default directly
Step 6: Test
~/fabric-api-local.sh
# or if using cloned repo:
./scripts/fabric-api-local.sh
You should see the MCP server start in stdio mode. Press Ctrl+C to stop.
Step 7: Configure your MCP client
See MCP client configuration β use the path to your fabric-api-local.sh script as <SCRIPT>.
Remote mode setup
Remote mode connects to a Docker Compose-deployed MCP server over HTTPS. It uses mcp-remote to bridge stdio to the remote endpoint and sends a Bearer token with each request.
Quick install:
curl -fsSL https://raw.githubusercontent.com/fabric-testbed/fabric_api_mcp/main/install.sh | bash -s -- --remoteβ automates all the steps below. The installer sets up a Python venv withfabric_api_mcp+fabric-cli(for token management), then installsjqand Node.js.
Step 1: Install prerequisites
Remote mode requires Python 3.11+ (for the venv with fabric-cli), plus jq and npx (mcp-remote):
# macOS
brew install jq node
# Linux
sudo apt install jq nodejs npm
Step 2: Set up the venv
Create a Python venv and install fabric_api_mcp (which includes fabric-cli):
python3 -m venv ~/work/fabric-api-mcp/venv
~/work/fabric-api-mcp/venv/bin/pip install git+https://github.com/fabric-testbed/fabric_api_mcp.git
Step 3: Create your token
Use the venv's fabric-cli to create a token:
~/work/fabric-api-mcp/venv/bin/fabric-cli tokens create --tokenlocation ~/work/fabric-api-mcp/id_token.json
This opens a browser for CILogon authentication, then saves the token automatically.
If running on a remote/headless VM, add
--no-browserand follow the printed URL manually. PressCtrl+Cafter login and paste the authorization code.
Alternatively, download your token from the FABRIC Portal β Experiments β Manage Tokens:
cp /path/to/downloaded/token.json ~/work/fabric-api-mcp/id_token.json
Step 4: Get the helper script
If you cloned the repo, the script is already at scripts/fabric-api.sh.
Otherwise, download it:
curl -o ~/fabric-api.sh \
https://raw.githubusercontent.com/fabric-testbed/fabric_api_mcp/main/scripts/fabric-api.sh
chmod +x ~/fabric-api.sh
Step 5: Configure the script
Update these if your paths or server URL differ from the defaults:
| Var | Default | Purpose |
|---|---|---|
FABRIC_TOKEN_JSON | ~/work/fabric-api-mcp/id_token.json | Path to JSON file containing {"id_token": "..."} |
FABRIC_MCP_URL | https://api-mcp.fabric-testbed.net/mcp | URL of the remote MCP server |
Step 6: Test
~/fabric-api.sh
# or if using cloned repo:
./scripts/fabric-api.sh
The script reads your token and connects to the remote MCP server via mcp-remote.
Step 7: Configure your MCP client
See MCP client configuration β use the path to your fabric-api.sh script as <SCRIPT>.
Local vs Remote β which to use?
| Local mode | Remote mode | |
|---|---|---|
| Script | fabric-api-local.sh | fabric-api.sh |
| Auth | Automatic from fabric_rc | Bearer token via id_token.json |
| Transport | stdio (direct) | stdio β mcp-remote β HTTPS |
| Server | Runs on your machine | Shared Docker Compose deployment |
| Post-boot config | Yes (SSH access to VMs) | No (no SSH access) |
| All tools available | Yes (30+ tools) | All except post-boot-config |
| Dependencies | Python 3.11+ | Python 3.11+ / Node.js / jq |
| Best for | Full-featured development & experimentation | Quick queries, shared team server |
Recommendation: Use local mode for the best experience β it supports all tools including SSH-based post-boot configuration of VMs.
Filter DSL
All query-* tools support a declarative JSON filter DSL with sorting and pagination.
Operators
| Operator | Description | Example |
|---|---|---|
eq | Equals | {"name": {"eq": "UCSD"}} |
ne | Not equals | {"state": {"ne": "Dead"}} |
lt, lte, gt, gte | Numeric comparisons | {"cores_available": {"gte": 32}} |
in | Value in list | {"name": {"in": ["RENC", "UCSD", "STAR"]}} |
contains | Substring, key, or element match | {"components": {"contains": "GPU"}} |
icontains | Case-insensitive contains | {"name": {"icontains": "utah"}} |
regex | Regex match | {"name": {"regex": "(?i)^u.*"}} |
any, all | List quantifiers | {"hosts": {"any": {"icontains": "gpu"}}} |
Logical OR: {"or": [{"name": {"eq": "UCSD"}}, {"name": {"eq": "STAR"}}]}
Sorting & pagination
{"sort": {"field": "cores_available", "direction": "desc"}, "limit": 50, "offset": 0}
Response format: {"items": [...], "total": 150, "count": 50, "offset": 0, "has_more": true}
Quick tool examples
Query hosts at UCSD with GPUs, sorted by free cores
{
"tool": "query-hosts",
"params": {
"filters": {"site": {"eq": "UCSD"}, "components": {"contains": "GPU"}},
"sort": { "field": "cores_available", "direction": "desc" },
"limit": 50
}
}
POA: reboot a nodeβs sliver
{
"tool": "os-reboot",
"params": {
"sliver_id": "<SLIVER-UUID>"
}
}
Build and submit a slice
{
"tool": "build-slice",
"params": {
"name": "demo-slice",
"ssh_keys": ["ssh-ed25519 AAAA... user@example"],
"nodes": [
{
"name": "node1",
"site": "UCSD",
"cores": 4,
"ram": 16,
"disk": 50,
"image": "default_rocky_8",
"components": [
{ "model": "GPU_TeslaT4", "name": "gpu0" }
]
},
{
"name": "node2",
"site": "RENC",
"cores": 8,
"ram": 32,
"disk": 100
}
],
"networks": [
{
"name": "net1",
"type": "L2PTP",
"nodes": ["node1", "node2"],
"bandwidth": 10
}
],
"lifetime": 60
}
}
Valid component models
| Category | Models |
|---|---|
| GPUs | GPU_TeslaT4, GPU_RTX6000, GPU_A40, GPU_A30 |
| NICs | NIC_Basic, NIC_ConnectX_5, NIC_ConnectX_6, NIC_ConnectX_7_100 (100G), NIC_ConnectX_7_400 (400G) |
| Storage | NVME_P4510 |
| FPGAs | FPGA_Xilinx_U280, FPGA_Xilinx_SN1022 |
Network types
| Type | Scope | Description |
|---|---|---|
L2Bridge | Single-site | Local bridge |
L2STS | Cross-site | Site-to-site L2 (default for multi-site) |
L2PTP | Cross-site | Point-to-point with ERO for dedicated QoS |
L2 | Auto | Shorthand β auto-selects L2Bridge or L2STS |
FABNetv4 / FABNetv6 | Per-site | Orchestrator-assigned L3 subnet |
FABNetv4Ext / FABNetv6Ext | Per-site | Externally routable L3 (use make-ip-publicly-routable) |
IPv4 / IPv6 / IPv4Ext / IPv6Ext | β | Aliases for the FABNet types above |
Auto-selection behavior:
- NIC: auto-selected based on network type and bandwidth (100 Gbps β
NIC_ConnectX_6, 25 Gbps βNIC_ConnectX_5, default βNIC_Basic). Override withnicin network spec. - Site: if omitted from a node, a random site with sufficient resources is chosen. Nodes are spread across different sites when possible.
- Multi-site FABNet*: creates per-site networks automatically (e.g.,
mynet-UTAH,mynet-STAR)
IP Assignment by Network Type
| Network Type | Subnet | IP Assignment |
|---|---|---|
| L2 (L2PTP, L2STS, L2Bridge) | User chooses any subnet | Manual assignment inside VMs |
| L3 (FABNetv4, FABNetv6) | Orchestrator assigns | Assign from orchestrator's subnet |
| L3 Ext (FABNetv4Ext, FABNetv6Ext) | Orchestrator assigns | Use make-ip-publicly-routable, configure returned IP |
- FABNetv4Ext: IPv4 subnet is shared across all slices at the site. Requested IP may be in use; orchestrator returns actual available IP. After calling
make-ip-publicly-routable, always re-fetch withget-network-infoand use the returnedpublic_ipsvalue. - FABNetv6Ext: Entire IPv6 subnet is dedicated to your slice. Any IP from the subnet can be requested.
- After modify: When adding FABNetv4Ext/FABNetv6Ext via
modify-slice-resources, wait forModifyOKstate before fetching network info and enabling public routing.
SSH Access to VMs
To access FABRIC VMs, you need:
- Bastion keys β Create at https://portal.fabric-testbed.net/experiments#sshKeys
- Slice SSH keys β The keys specified when creating the slice
- SSH config β Configure your
~/.ssh/config:
UserKnownHostsFile /dev/null
StrictHostKeyChecking no
ServerAliveInterval 120
Host bastion.fabric-testbed.net
User <bastion_login>
ForwardAgent yes
Hostname %h
IdentityFile ~/.ssh/bastion_key
IdentitiesOnly yes
Host * !bastion.fabric-testbed.net
ProxyJump <bastion_login>@bastion.fabric-testbed.net:22
Replace <bastion_login> with your bastion username (from get-user-info tool, e.g., kthare10_0011904101).
Example SSH command:
ssh -i /path/to/slice_key -F /path/to/ssh_config ubuntu@<vm_ipv6_address>
The VM's management IP (IPv6) is available from get-slivers output.
System prompt
Your fabric_api_mcp/system.md is exposed to clients via:
@mcp.prompt(name="fabric-system")
def fabric_system_prompt():
return Path("system.md").read_text().strip()
Put guardrails here (token validation reminders, exclusions, etc.).
Logging
- Structured per-request access logs (opt-in via
UVICORN_ACCESS_LOG=1) - App logs support
textorjsonformat viaLOG_FORMAT - Each HTTP request and tool call carries a request_id (also returned as
x-request-id)
Example JSON log:
{"ts":"2025-11-06T18:22:10+0000","level":"INFO","logger":"fabric.mcp",
"msg":"Tool done in 85.31ms (size=42)","tool":"query-hosts","request_id":"9a7c3e1b12ac"}
Resource cache
The server wires a ResourceCache (if present) to periodically refresh public topology/resource snapshots:
- Interval:
REFRESH_INTERVAL_SECONDS(default 300s) - Fetch limit:
CACHE_MAX_FETCH(default 5000) - Sorting big lists:
MAX_FETCH_FOR_SORT(default 5000)
This accelerates query-sites, query-hosts, query-facility-ports, query-links.
Monitoring & Metrics (Server Mode Only)
The MCP server includes built-in Prometheus metrics and a pre-configured Grafana dashboard. Metrics are enabled by default in server mode and disabled in local mode. Override with METRICS_ENABLED=0 or METRICS_ENABLED=1.
Architecture
Client β NGINX (:443) β MCP Server (:5000)
β
/grafana/ β Grafana (:3000) β Prometheus (:9090) β MCP Server (:5000/metrics)
- All services are on the internal
frontendDocker network /metricsendpoint is internal only β not exposed through NGINX- Prometheus and Grafana have no ports exposed to the host β Grafana is accessed via NGINX at
/grafana/ - Prometheus data is retained for 30 days (~100-500 MB depending on cardinality)
- Data is persisted to NFS at
/opt/data/production/services/api-mcp/monitoring/
Accessing the dashboard
After docker compose up -d:
| Service | URL | Credentials |
|---|---|---|
| Grafana | https://<your-host>/grafana/ | CILogon login (requires facility-operators or facility-viewers role) |
| Prometheus | Internal only | docker compose exec prometheus ... |
In Grafana, the FABRIC MCP dashboard is auto-provisioned and available immediately.
Available metrics
HTTP metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
mcp_http_requests_total | Counter | method, path, status | Total HTTP requests |
mcp_http_request_duration_seconds | Histogram | method, path | Request latency |
mcp_http_requests_in_progress | Gauge | method | Currently active requests |
Tool metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
mcp_tool_calls_total | Counter | tool, user_uuid, user_email, project_name, status | Tool calls (who called what, from which project) |
mcp_tool_call_duration_seconds | Histogram | tool | Tool execution latency |
User identity uses the FABRIC user UUID (a GUID from the JWT uuid claim) and email (email claim), not the CILogon sub URI. Project name is extracted from the first project in the JWT projects claim.
Per-user / access log metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
mcp_requests_by_user_total | Counter | user_uuid, user_email | Total requests per user |
mcp_requests_by_user_path_total | Counter | user_uuid, user_email, method, path | Per-user per-endpoint breakdown |
mcp_rate_limit_hits_total | Counter | key_type | Rate limit 429 responses |
Security metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
mcp_auth_failures_total | Counter | reason, client_ip | Auth failures by reason and source IP |
mcp_auth_success_total | Counter | user_uuid, user_email, client_ip | Successful auth by user + IP |
mcp_requests_by_ip_total | Counter | client_ip | All requests by source IP |
Auth failure reasons: missing_token, malformed_header, invalid_jwt, expired_token.
Grafana dashboard panels
The pre-built FABRIC MCP dashboard is organized into 4 sections with 19 panels:
Overview:
- Request rate (total / 5xx / 429)
- Request latency percentiles (p50, p95, p99)
- Active requests gauge
- Error rate percentage
- Rate limit hits over time
Tool Calls:
- Tool call rate by tool name
- Tool latency by tool (p95)
- Tool errors by tool
- Tool calls by user (email + UUID β tool, with call count)
- Tool calls total by tool (pie chart β tool usage distribution)
- Tool calls by project (project + email β tool breakdown)
Users & Access:
- Top users by total request count (email + UUID)
- Requests by user + endpoint (email + method + path)
Security:
- Auth failures by reason (stacked time series)
- Auth failures by IP (table β spot brute-force or overseas probing)
- Top client IPs by request volume
- User-to-IP mapping (table β spot token reuse from unexpected locations)
Example Prometheus queries
# Request rate over last 5 minutes
sum(rate(mcp_http_requests_total[5m]))
# p95 latency for all tool calls
histogram_quantile(0.95, sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le, tool))
# Which tools did a specific user call?
sum(mcp_tool_calls_total{user_email="user@example.edu"}) by (tool)
# Tool calls by project
sum(mcp_tool_calls_total) by (project_name, tool)
# Auth failures from a specific IP in the last hour
sum(increase(mcp_auth_failures_total{client_ip="203.0.113.42"}[1h]))
# Users authenticating from multiple IPs (possible token sharing)
count(mcp_auth_success_total) by (user_email) > 3
# Top 10 users by request count in the last 24h
topk(10, sum(increase(mcp_requests_by_user_total[24h])) by (user_email, user_uuid))
Disabling metrics
Set METRICS_ENABLED=0 in the MCP server environment. This disables:
- The
/metricsendpoint - All Prometheus metric collection (HTTP, tool, security)
- The
prometheus-clientlibrary is never imported (zero overhead)
Prometheus and Grafana containers can still run but will have no data to scrape.
Grafana authentication (Vouch Proxy + CILogon)
Grafana is protected by Vouch Proxy using CILogon OIDC. Only users with facility-operators or facility-viewers roles can access dashboards. NGINX (OpenResty) forwards the vouch session cookie to the FABRIC Core API to check user roles via Lua.
Setup:
-
Register a CILogon OIDC client at https://cilogon.org/oauth2/register
- Set the callback URL to
https://<your-host>/auth - Note the
client_idandclient_secret
- Set the callback URL to
-
Configure environment variables in your
.envfile (seeenv.template):VOUCH_HOSTNAME=your-mcp-host.fabric-testbed.net CILOGON_CLIENT_ID=your-cilogon-client-id CILOGON_CLIENT_SECRET=your-cilogon-client-secret -
Update the Vouch config β replace placeholders in
vouch/config:VOUCH_HOSTNAMEβ your server hostnameCILOGON_CLIENT_ID/CILOGON_CLIENT_SECRETβ from step 1- Ensure
publicAccess: false(required β iftrue, vouch passes unauthenticated requests)
-
Start services β
docker compose up -dnow starts 5 containers (addsvouch-proxy)
How it works:
- Unauthenticated requests to
/grafana/are redirected to CILogon login via Vouch Proxy - After login, Vouch Proxy sets a session cookie (
fabric-serviceon thefabric-testbed.netdomain) - On each request, NGINX's
auth_requestcalls Vouch to validate the session - A Lua
access_by_lua_blockthen forwards the vouch cookie to the FABRIC Core API:GET /whoamiβ retrieves the user's UUIDGET /people/{uuid}?as_self=trueβ retrieves the user's roles- Checks for
facility-operatorsorfacility-viewersin the roles list
- Role check results are cached for 5 minutes (
lua_shared_dict role_cache) to avoid repeated API calls - Users without the required roles get a 403 Forbidden response
- Grafana is configured for anonymous viewer access (auth is enforced at the NGINX layer)
- The
/mcpendpoint is not affected β it continues using Bearer token auth
Production considerations
- Grafana access control: Grafana is protected by Vouch Proxy + CILogon at the NGINX layer. Only users with
facility-operatorsorfacility-viewersroles can access it. Grafana itself uses anonymous viewer access (the admin password is only needed for dashboard editing via CLI). - No exposed ports: Prometheus and Grafana have no ports exposed to the host. Grafana is served through NGINX at
/grafana/. Prometheus is accessible only from within the Docker network. - Data retention: Prometheus is configured with 30-day retention (
--storage.tsdb.retention.time=30d). Estimated disk usage is ~100-500 MB for 30 days depending on user/tool cardinality. - NFS persistence: Prometheus and Grafana data directories are bind-mounted to
/opt/data/production/services/api-mcp/monitoring/. Container UIDs are configured via.env(copy fromenv.template). Ensure host directory ownership matches the UIDs in your.envfile. - Client IP forwarding: NGINX forwards the real client IP via
X-Real-IPandX-Forwarded-Forheaders. These must be set inside eachlocationblock (NGINX does not inheritproxy_set_headerfrom the server block when a location defines its own). - Alerting: Add Prometheus alerting rules (e.g., alert on auth failure spikes, error rate > 5%) and configure Grafana notification channels (email, Slack, PagerDuty).
Security notes
- Tokens are accepted only via Authorization header; they are not stored.
- Do not print tokens in logs. (Server code avoids this.)
- Terminate TLS at NGINX; keep the MCP service on an internal network.
- Rotate TLS certs and restrict
client_max_body_sizeif desired. - Auth monitoring: Prometheus tracks auth failures (missing/malformed/invalid/expired tokens) by client IP, and successful auth by user (UUID + email) + IP pair. Tool calls are tracked per user and FABRIC project. Use the Grafana security panels or Prometheus queries to detect brute-force attempts, overseas probing, and token reuse from unexpected locations.
License
MIT.
