📦

Argus Automation

Argus Automation — SOTA desktop automation for AI agents. Works with Claude Code, Codex, and OpenClaw.

0 installs

Trust: 39 — Low

Blockchain

Ask AI about Argus Automation

I know everything about Argus Automation. Ask me about installation, configuration, usage, or troubleshooting.

0/500

Loading tools...

Reviews

Documentation

Argus Automation

English | 中文 | 日本語 | Français | Deutsch

SOTA desktop automation for AI agents.
Works with Claude Code, Codex, and OpenClaw.

Argus (Ἄργος Πανόπτης) — the hundred-eyed giant of Greek mythology, the all-seeing guardian who never sleeps. We named this project Argus because it sees your entire desktop through screenshots and controls it with surgical precision — just as the mythological guardian watched over everything entrusted to him.

Why Argus? Every other desktop MCP gives the AI a screenshot and a mouse — then hopes for the best. Argus is different: it's a direct port of Anthropic's own Chicago MCP, the production system behind Claude Code's built-in desktop control. We took the 6,300 lines of battle-tested logic — the security gates, the token budgeting, the batch execution engine — and swapped only the native layer for Windows. The result is arguably the most architecturally advanced computer-use implementation available today, and it's open source.

Two Fundamentally Different Design Philosophies

Every other MCP takes the "hand the model a hammer" approach — provide screenshot + click + type as atomic tools, then hope the model figures out the rest. Every step is: screenshot → look → decide → act → repeat.

Argus takes a fundamentally different approach: model desktop automation as a stateful, governed session — with layered security, token budgeting, and batch execution. The gap is enormous.

Comparison 1: Tool Design — Flat Primitives vs Layered Architecture

CursorTouch (5,000 stars) tools:

Click, Type, Scroll, Move, Shortcut, Screenshot, App, Shell...

Each tool is an independent atomic operation with no context relationship. The model must screenshot → look → decide → act at every single step.

Argus's layered tool design:

Session Layer:     request_access, list_granted_applications
Vision Layer:      screenshot, zoom
Precision Layer:   left_click, double_click, triple_click, right_click,
                   middle_click, left_mouse_down, left_mouse_up
Input Layer:       type, key, hold_key
Efficiency Layer:  computer_batch (N actions → 1 API call)
Navigation Layer:  open_application, switch_display
State Query Layer: cursor_position, read_clipboard, write_clipboard
Wait Layer:        wait

24 top-level tools + 16 batch action types. The essence of this layered design: let the model think at the right abstraction level, instead of starting from pixels every time.

Comparison 2: "Use APIs When You Can" — The Most Underrated Design Principle

This is the most underrated design point. Other MCPs force the model to perceive everything through vision. Argus's principle: if information can be retrieved via a structured API, never waste vision tokens on it. Screenshots are reserved for when you genuinely need visual understanding.

Task	Other MCPs	Argus	What You Save
Know which apps exist	Screenshot → model reads taskbar	`listInstalledApps()` → structured data	1 screenshot + 1 vision inference
Open an application	Screenshot → find icon → click	`open_application("Excel")` → direct API	2-3 screenshots + multiple clicks
Know which app is focused	Screenshot → model reads title bar	`getFrontmostApp()` → returns bundleId	1 screenshot + inference
Know cursor position	Screenshot → model guesses	`cursor_position` → exact coordinates	1 screenshot
Read clipboard	Ctrl+V into Notepad → screenshot → read	`read_clipboard` → returns text	Multiple actions + 2 screenshots
Switch monitor	Screenshot → wrong one → trial and error	`switch_display("Dell U2720Q")`	Trial-and-error loop
Read small text	Model squints at compressed screenshot	`zoom` → high-res regional crop	Misclick costs

Each avoided screenshot saves ~1,500 vision tokens and 3-5 seconds of latency.

Comparison 3: `computer_batch` — The Only Batch Execution Engine

This is a capability no competitor has. Here's how big the gap is:

Other MCPs performing "click field → type text → press Enter":

Call 1: screenshot        → model receives image → inference → next step
Call 2: click(100, 200)   → model receives OK   → inference → next step
Call 3: type("hello")     → model receives OK   → inference → next step
Call 4: key("Return")     → model receives OK   → inference → next step
Call 5: screenshot        → model confirms result

= 5 API round-trips × 3-8 seconds = 15-40 seconds

Argus doing the same thing:

Call 1: screenshot
Call 2: computer_batch([
  { action: "left_click", coordinate: [100, 200] },
  { action: "type", text: "hello" },
  { action: "key", text: "Return" },
  { action: "screenshot" }
])

= 2 API round-trips = 6-16 seconds

60% less latency and tokens. And every action inside the batch still gets a frontmost-app security check — not blind execution.

Comparison 4: Security Model — Production-Grade vs None

Security Dimension	CursorTouch (5k stars)	MCPControl (306 stars)	Argus
App-level permissions	No	No	3-tier (read/click/full)
Frontmost app gate	No (can click any window)	No	Checked before every action
Dangerous key blocking	No	No	Alt+F4, Win+L, Ctrl+Alt+Del
Click target validation	No	No	9×9 pixel staleness guard
Clipboard isolation	No	No	Stash/restore for click-tier apps
App deny-list	No	No	Browsers→read-only, Terminals→click-only

CursorTouch's README literally says "POTENTIALLY DANGEROUS". Argus's security model is designed for commercial products — Anthropic's Cowork and desktop app both use the same architecture.

Head-to-Head Summary

Capability	Argus	CursorTouch (5k stars)	MCPControl (306 stars)	domdomegg (176 stars)	sbroenne (24 stars)
Batch Execution	Yes	No	No	No	No
Token Budget Optimization	Yes	No	No	No	No
3-Tier App Permissions	Yes	No	No	No	No
Frontmost App Gate	Yes	No	No	No	No
Dangerous Key Blocking	Yes	No	No	No	No
Structured APIs (no-screenshot info)	Yes	Partial	Partial	No	Yes
Zoom (high-res detail crop)	Yes	No	No	No	No
Multi-Display Switch	Yes	No	No	No	No
Same Schema as Claude Code Built-in	Yes	No	No	Close	No
Anthropic Upstream Code Reused	6,300+ lines	0	0	0	0
Tool Count	24	19	12	6	10
Language	TypeScript	Python	TypeScript	TypeScript	C#

Quick Start

Prerequisites

Node.js 18+
Windows 10/11
Visual Studio Build Tools (for robotjs)

Install

git clone https://github.com/storyweaver/argus-automation.git
cd argus-automation
npm install
npm run build

Configure in Claude Code

Add to your project's .mcp.json:

{
  "mcpServers": {
    "argus": {
      "command": "node",
      "args": ["C:/path/to/argus-automation/dist/server-mcp.js"]
    }
  }
}

Restart Claude Code. You'll see 24 new tools prefixed with mcp__argus__.

Configure in Codex on Windows

Argus also ships a Codex-specific MCP entry point that keeps the Claude Code tool surface unchanged:

[mcp_servers.argus-codex]
command = "node"
args = ["D:\\claude-d\\gui-automation\\argus-automation\\dist\\server-codex-mcp.js"]
cwd = "D:\\claude-d\\gui-automation\\argus-automation"
tool_timeout_sec = 120

This local server does not call the OpenAI API and does not require OPENAI_API_KEY; Codex invokes it as a local MCP tool server. See docs/CODEX_WINDOWS_CUA.md for the action schema and plugin packaging notes.

Test

npm test          # 75 tests (unit + integration)
npm run test:unit # Unit tests only

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│  Upstream Layer — 6,300+ lines from Anthropic's Chicago MCP         │
│  (only 1 line changed)                                              │
│                                                                     │
│  toolCalls.ts (3,649 lines) — security gates + tool dispatch        │
│  mcpServer.ts — Server factory + session binding                    │
│  tools.ts — 24 tool schema definitions                              │
│  types.ts — complete type system                                    │
│  keyBlocklist.ts — dangerous key interception (win32 branch)        │
│  pixelCompare.ts — 9×9 staleness detection                         │
│  imageResize.ts — token budget algorithm                            │
└──────────────────────────┬──────────────────────────────────────────┘
                           │ ComputerExecutor interface
                           ▼
┌─────────────────────────────────────────────────────────────────────┐
│  Windows Native Layer — ~400 lines, new code                        │
│                                                                     │
│  screen.ts — node-screenshots + sharp (DXGI capture, JPEG, resize)  │
│  input.ts  — robotjs (SendInput mouse/keyboard)                     │
│  window.ts — koffi + Win32 API (window management)                  │
│  clipboard.ts — PowerShell Get/Set-Clipboard                        │
└─────────────────────────────────────────────────────────────────────┘

Tech Stack

Each library is the Windows equivalent of what the macOS version uses:

Module	macOS (Chicago MCP)	Windows (Argus)	Role
Screenshot	SCContentFilter	node-screenshots (DXGI)	Screen capture
Input	enigo (Rust)	robotjs (SendInput)	Mouse & keyboard
Window Mgmt	Swift + NSWorkspace	koffi + Win32 API	Window control
Image Processing	Sharp	Sharp	JPEG compress + resize
MCP Framework	@modelcontextprotocol/sdk	@modelcontextprotocol/sdk	MCP protocol

The 24 Tools

Category	Tools
Session	`request_access`, `list_granted_applications`
Vision	`screenshot`, `zoom`
Mouse Click	`left_click`, `double_click`, `triple_click`, `right_click`, `middle_click`
Mouse Control	`mouse_move`, `left_click_drag`, `left_mouse_down`, `left_mouse_up`, `cursor_position`
Scroll	`scroll`
Keyboard	`type`, `key`, `hold_key`
Clipboard	`read_clipboard`, `write_clipboard`
App/Display	`open_application`, `switch_display`
Batch + Wait	`computer_batch`, `wait`

Security Model

Three-tier per-app permissions — the only MCP server with this level of access control:

Tier	Screenshot	Click	Type/Paste
read (browsers, trading)	Yes	No	No
click (terminals, IDEs)	Yes	Left-click only	No
full (everything else)	Yes	Yes	Yes

Plus: dangerous key blocking, frontmost app gate on every action, session-scoped grants.

Logs

All tool calls logged to:

%LOCALAPPDATA%\argus-automation\logs\mcp-YYYY-MM-DD.log

Known Limitations

CJK text input: Use write_clipboard + key("ctrl+v") for non-ASCII text
App discovery: Currently returns running apps only (registry scan planned)
Pixel validation: Disabled on Windows (async sharp can't satisfy sync interface)
hideBeforeAction: Disabled (minimizing breaks WebView2 child processes)

License

MIT

Acknowledgements

Built with Claude

This entire project — architecture design, 6,300+ lines of upstream code analysis, Windows native layer implementation, 70 tests, and this README — was built in a single Claude Code session powered by Claude Opus 4.6. The AI agent analyzed Anthropic's Chicago MCP source code, identified the platform-agnostic abstraction boundary (the ComputerExecutor interface), reconstructed missing type definitions from usage patterns, implemented the Windows native layer from scratch, and wrote comprehensive tests — all in one continuous session.

Chicago MCP

The upstream code in src/upstream/ is from Anthropic's @ant/computer-use-mcp package (Chicago MCP), extracted from Claude Code v2.1.88. This is Anthropic's production desktop-control architecture; we ported only the native layer to Windows. The architectural brilliance of separating platform-agnostic logic from native implementation is entirely Anthropic's design.

Argus Automation

Reviews

Documentation

Argus Automation

Two Fundamentally Different Design Philosophies

Comparison 1: Tool Design — Flat Primitives vs Layered Architecture

Comparison 2: "Use APIs When You Can" — The Most Underrated Design Principle

Comparison 3: computer_batch — The Only Batch Execution Engine

Comparison 4: Security Model — Production-Grade vs None

Head-to-Head Summary

Quick Start

Prerequisites

Install

Configure in Claude Code

Configure in Codex on Windows

Test

Architecture

Tech Stack

The 24 Tools

Security Model

Logs

Known Limitations

License

Acknowledgements

Built with Claude

Chicago MCP

Comparison 3: `computer_batch` — The Only Batch Execution Engine