Cua Desktop Operator Skill
MCP skill that lets any AI agent operate a Windows desktop: clone-ready, model-agnostic, no cloud required
What Is This?
CUA Desktop Operator Skill is a standalone, clone-ready skill repository that gives any MCP-capable AI agent a structured way to operate a Windows desktop.
The repository root is the skill package: clone it directly into your agent's skills directory and use it as-is.
agent (Codex / Claude Code / Cursor / OpenCode / ...)
  └─► MCP Client
        └─► desktop-operator (local stdio server, this repo)
              └─► Windows Desktop
Why This Exists
Most desktop automation stacks fall into one of two extremes:
| Approach | Problem |
|---|---|
| Brittle scripts | No structured observation model; breaks on any UI change |
| Heavyweight agent systems | Assume a fixed model backend, cloud planner, or custom visual stack |
CUA Desktop Operator takes a different path:
| Principle | What it means |
|---|---|
| Reasoning stays in the agent | The AI model decides; the skill just executes |
| Execution stays local | No cloud round-trip, no external visual model required |
| Interface stays standard | MCP tools are the same regardless of which agent calls them |
| Skill stays portable | Clone once, use from Codex, Claude Code, Cursor, or any MCP client |
The result is a practical desktop operator that can be reused by multiple AI clients without rebuilding the execution layer for each one.
Key Capabilities
- Desktop Control
- Observation-First Workflow
- Reusable Macro Layer
- Cross-Agent Interface
Architecture
flowchart TB
subgraph AGT["AI Agent Layer"]
direction LR
A1["Codex"] & A2["Claude Code"] & A3["Cursor"] & A4["OpenCode"]
end
subgraph SKL["Skill Layer"]
S1["SKILL.md · references/ · scripts/"]
end
subgraph MCPL["MCP Layer"]
M1["desktop-operator · local stdio server"]
end
subgraph RTM["Runtime Layer"]
direction LR
R1["Actions & Observation"] & R2["Macro Engine"] & R3["Artifact Manager"]
end
subgraph WIN["Windows Desktop"]
direction LR
W1["Applications"] & W2["UI Automation"] & W3["Screenshot / State"]
end
AGT -->|reads| SKL
AGT -->|MCP calls| MCPL
MCPL --> RTM
RTM --> WIN
Layer responsibilities
| Layer | Role |
|---|---|
| Skill layer | Tells the agent when and how to use this skill; defines the observe → plan → act → verify workflow; provides client setup guidance |
| MCP layer | Exposes a stable, versioned tool surface over stdio; returns structured results identical across all clients |
| Runtime layer | Performs real desktop actions via Win32 / UI Automation; captures screenshots and window state; manages task-scoped artifact lifecycle |
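The MCP layer speaks JSON-RPC 2.0 over stdio. Under the MCP specification's stdio transport, each message is a single line of JSON terminated by a newline and must not contain embedded newlines. The minimal sketch below shows only that framing, using the standard library; the helper names `encode_message` and `decode_messages` are hypothetical and not part of this repository, and a real session would first perform the `initialize` handshake, which is omitted here.

```python
import json
from typing import Any, Iterable, Iterator


def encode_message(message: dict[str, Any]) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps without indent emits no raw newlines; any newline inside a
    # string value is escaped as "\n", so the line stays intact on the wire.
    line = json.dumps(message, separators=(",", ":"), ensure_ascii=False)
    return (line + "\n").encode("utf-8")


def decode_messages(lines: Iterable[bytes]) -> Iterator[dict[str, Any]]:
    """Parse newline-delimited JSON-RPC messages, skipping blank lines."""
    for raw in lines:
        text = raw.decode("utf-8").strip()
        if text:
            yield json.loads(text)


if __name__ == "__main__":
    # A standard MCP request asking the server to list its tools.
    request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
    wire = encode_message(request)
    print(wire)
    print(list(decode_messages(wire.splitlines(keepends=True))))
```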
Repository Layout
cua_desktop_operator_skill/
├── SKILL.md                     ← Agent reads this first
├── README.md                    ← English documentation
├── README.zh-CN.md              ← Simplified Chinese
├── README.zh-Hant.md            ← Traditional Chinese
├── README.ja.md                 ← Japanese
├── README.ko.md                 ← Korean
├── LICENSE                      ← GNU AGPL v3.0
├── SECURITY.md
├── agents/
│   └── openai.yaml              ← Agent manifest (Codex / OpenCode)
├── references/
│   ├── compatibility.md         ← Cross-agent notes
│   ├── failure-recovery.md      ← Recovery patterns
│   ├── interaction-patterns.md  ← Interaction best practices
│   ├── macro-catalog.md         ← Built-in macro reference
│   ├── mcp-client-setup.md      ← Client configuration guide
│   └── mcp-tool-catalog.md      ← Complete MCP tool reference
├── scripts/
│   ├── setup_runtime.ps1        ← Install dependencies
│   ├── start_mcp_server.ps1     ← Launch MCP server
│   ├── verify_real_tasks.ps1    ← Validate skill end-to-end
│   └── verify_real_tasks.py
├── desktop_operator_core/       ← Runtime library
└── desktop_operator_mcp/        ← MCP server package
Quick Start
Step 1: Clone into your skills directory
# For Codex
git clone https://github.com/Marways7/cua_desktop_operator_skill "$HOME\.codex\skills\cua_desktop_operator_skill"
# For Claude Code
git clone https://github.com/Marways7/cua_desktop_operator_skill "$HOME\.claude\skills\cua_desktop_operator_skill"
# For Cursor
git clone https://github.com/Marways7/cua_desktop_operator_skill "$HOME\.cursor\skills\cua_desktop_operator_skill"
Step 2: Install dependencies
.\scripts\setup_runtime.ps1
Step 3: Start the local MCP server
.\scripts\start_mcp_server.ps1
Step 4: Let your agent read SKILL.md
Point your agent at SKILL.md in the repository root. The agent reads the skill file and learns the available tools, the recommended workflow, and how to connect to the local MCP server.
The skill is self-describing, so in most setups no manual MCP wiring is needed. If the agent cannot reach the server, follow references/mcp-client-setup.md.
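If your client does need an explicit server entry (see references/mcp-client-setup.md for the authoritative per-client steps), many MCP clients, Cursor's `mcp.json` among them, accept an `mcpServers` object that names a `command` and its `args`. The sketch below builds such an entry. It assumes, without confirmation from this README, that `scripts/start_mcp_server.ps1` keeps the stdio server in the foreground; the function name `build_mcp_config` and the example path are hypothetical placeholders. Codex uses a different (TOML) configuration format, so this shape does not apply there.

```python
import json
from pathlib import PureWindowsPath


def build_mcp_config(skill_root: str, server_name: str = "desktop-operator") -> dict:
    """Return an mcpServers entry that launches the local stdio server via PowerShell."""
    # Assumption: start_mcp_server.ps1 runs the server in the foreground on
    # stdio, so the client talks to it through PowerShell's stdin/stdout.
    script = PureWindowsPath(skill_root, "scripts", "start_mcp_server.ps1")
    return {
        "mcpServers": {
            server_name: {
                "command": "powershell",
                "args": ["-NoProfile", "-ExecutionPolicy", "Bypass", "-File", str(script)],
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical clone location; substitute your own skills directory.
    root = r"C:\Users\me\.cursor\skills\cua_desktop_operator_skill"
    print(json.dumps(build_mcp_config(root), indent=2))
```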
MCP Tool Reference
Observation tools
| Tool | Description |
|---|---|
| desktop_observe | Capture a full screenshot, the active window, the window list, an optional cropped target image, and a JSON state artifact |
| desktop_get_last_artifacts | Load the latest screenshot, state, execution, and failure artifact paths |
| desktop_cleanup_artifacts | Remove task-scoped temporary files after successful task completion |
Window management
| Tool | Description |
|---|---|
| desktop_list_windows | Quick inventory of all visible windows |
| desktop_find_window | Find candidate windows by title filter |
| desktop_focus_window | Bring a window to the foreground before keyboard interaction |
| desktop_launch_app | Launch a shell command, executable, URI, or shortcut |
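When several windows match a title filter, acting on the wrong one is a common failure. The sketch below illustrates client-side disambiguation before calling desktop_focus_window: it keeps only windows whose title contains the filter and refuses to guess when the result is not unique. The `WindowInfo` shape and the case-insensitive substring rule are assumptions for illustration; this README does not specify how desktop_find_window matches titles or what fields it returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WindowInfo:
    """Hypothetical shape of one entry from a window list."""
    handle: int
    title: str


def find_windows(windows: list[WindowInfo], title_filter: str) -> list[WindowInfo]:
    """Return windows whose title contains title_filter, ignoring case."""
    needle = title_filter.casefold()
    return [w for w in windows if needle in w.title.casefold()]


def pick_unique(windows: list[WindowInfo], title_filter: str) -> Optional[WindowInfo]:
    """Return the single matching window, or None if there are zero or several."""
    matches = find_windows(windows, title_filter)
    # Ambiguity means the agent should observe again or narrow the filter,
    # rather than focusing an arbitrary candidate.
    return matches[0] if len(matches) == 1 else None
```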
Primitive actions
| Tool | When to use |
|---|---|
| desktop_click_relative | Preferred: click at a position relative to a target window |
| desktop_click_absolute | Last resort: click at absolute screen coordinates |
| desktop_send_keys | Single key or hotkey sequence (Ctrl+C, Alt+F4, etc.) |
| desktop_type_text | Short, plain ASCII text |
| desktop_paste_text | Preferred for CJK or long text: clipboard-backed paste |
| desktop_scroll | Scroll the focused area up or down |
| desktop_wait | Explicit wait while the UI is loading |
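The table distinguishes desktop_type_text (short, plain ASCII) from desktop_paste_text (CJK or long text). A minimal sketch of that choice follows; the 64-character cutoff is a hypothetical value, since the tool table only says "short".

```python
def choose_text_tool(text: str, max_typed_length: int = 64) -> str:
    """Pick desktop_type_text for short ASCII text, desktop_paste_text otherwise."""
    # max_typed_length is a hypothetical cutoff; tune it for your applications.
    if text.isascii() and len(text) <= max_typed_length:
        return "desktop_type_text"
    # Non-ASCII (e.g. CJK) or long text goes through the clipboard-backed tool.
    return "desktop_paste_text"
```

Routing non-ASCII text through the clipboard avoids depending on the active keyboard layout or IME to reproduce each character.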
UI Automation
| Tool | Description |
|---|---|
| desktop_uia_query | Enumerate UIA controls with optional selectors (text, automation ID, control type) |
| desktop_uia_click | Click a UIA control by text, automation ID, or control type |
| desktop_uia_type | Focus a UIA control and type into it |
Workflow tools
| Tool | Description |
|---|---|
| desktop_run_macro | Run a built-in macro; use macro_id="__catalog__" to list all macros |
| desktop_validate_state | Verify that a window or control is present after an action |
Full descriptions: references/mcp-tool-catalog.md
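Every tool above is invoked through the standard MCP `tools/call` request, whose params carry the tool `name` and an `arguments` object. The sketch below builds that request for the documented catalog listing. `macro_id` is the only argument name this README confirms; for any other tool, take argument names from references/mcp-tool-catalog.md or from the `inputSchema` the server returns for `tools/list`. The `tool_call` helper is hypothetical.

```python
import itertools
import json
from typing import Any

# Monotonic JSON-RPC request ids for this client process.
_ids = itertools.count(1)


def tool_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP tools/call request (JSON-RPC 2.0) for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # List every built-in macro, as documented for desktop_run_macro.
    print(json.dumps(tool_call("desktop_run_macro", {"macro_id": "__catalog__"})))
```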
Macro Catalog
Macros encode stable, reusable GUI patterns. Prefer them over raw primitives for well-known flows.
| Macro ID | Category | Purpose |
|---|---|---|
| app_launch | App launch | Launch an app by command, URI, or executable |
| desktop_shortcut_launch | App launch | Launch via a .lnk shortcut path |
| search_box_submit | Search | Focus the search box, type the query, and submit |
| chat_panel_toggle | Chat | Toggle the chat panel by hotkey or relative click |
| media_play_pause | Media | Send the play/pause key to a media player |
| browser_focus_address_bar | Browser | Focus the browser address bar via shortcut |
| submit_or_confirm | Confirm | Press the submit/confirm key sequence |
| open_windows_settings | Settings | Open the Windows Settings app |
Full descriptions: references/macro-catalog.md
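Conceptually, a macro is a named, fixed sequence of primitive tool calls. The sketch below shows how a flow like search_box_submit (focus the search box, type the query, submit) could expand into the primitives listed earlier. It is a hypothetical illustration, not the repository's macro engine: the step order, the argument keys (`title`, `keys`, `text`), and the use of desktop_paste_text for the query are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    """One primitive tool call inside a macro."""
    tool: str
    arguments: dict[str, Any] = field(default_factory=dict)


def search_box_submit(window_title: str, query: str, focus_hotkey: str) -> list[Step]:
    """Hypothetical expansion of search_box_submit into primitive tool calls."""
    return [
        Step("desktop_focus_window", {"title": window_title}),    # keyboard input needs focus
        Step("desktop_send_keys", {"keys": focus_hotkey}),        # move focus to the search box
        Step("desktop_paste_text", {"text": query}),              # clipboard paste also handles CJK
        Step("desktop_send_keys", {"keys": "Enter"}),             # submit the query
        Step("desktop_validate_state", {"title": window_title}),  # confirm the window is still there
    ]
```

Packaging the sequence once means every agent gets the same vetted steps instead of re-deriving them from screenshots on each run.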
Design Principles
| Principle | Details |
|---|---|
| Agent-neutral | One execution layer, many clients: the same MCP tools serve every agent without modification |
| Local-first | No required cloud planner; no required external visual model; runs entirely on the local machine |
| Observe before acting | Every interaction loop starts with desktop_observe; never act blind |
| Small, safe steps | Keep each action bounded; prefer reversible actions; validate after every mutation |
| Reusable over brittle | Use macros for repeatable patterns; fall back to primitives only when needed |
| Portable by default | No hardcoded machine paths; no user-profile assumptions; no repo-local artifact dependencies |
Recommended Workflow for Agents
1. Verify that the desktop-operator MCP server is connected.
   - If it is not, follow references/mcp-client-setup.md before proceeding.
2. Call desktop_observe.
   - Inspect the screenshot path, active window, visible windows, and optional cropped image.
3. Choose the smallest next action, using this priority order:
   1. desktop_focus_window: before any keyboard input
   2. desktop_run_macro: for any recognized reusable pattern
   3. desktop_click_relative: for stable window-relative positions
   4. desktop_uia_click / desktop_uia_type: when a reliable UIA control is visible
   5. desktop_click_absolute: last resort only
4. Execute the action.
5. Call desktop_observe or desktop_validate_state to confirm the result.
6. Repeat from step 2 until the success condition is satisfied.
7. Call desktop_cleanup_artifacts.
   - Skip this only if the user explicitly asked to keep debug traces.
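The workflow above can be sketched as a small driver loop. Reasoning stays in the agent, represented here by a `plan_next` callback that returns the next `(tool, arguments)` pair, or `None` once the success condition holds. `call_tool` stands in for whatever MCP client the agent uses. Both callbacks and the `run_task` name are hypothetical, and step 1 (checking the connection) is assumed to have happened before the loop starts.

```python
from typing import Any, Callable, Optional

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]
Planner = Callable[[dict[str, Any]], Optional[tuple[str, dict[str, Any]]]]


def run_task(call_tool: ToolCaller, plan_next: Planner,
             max_steps: int = 20, keep_artifacts: bool = False) -> bool:
    """Observe, act, and re-observe in small steps until the planner reports success."""
    for _ in range(max_steps):
        observation = call_tool("desktop_observe", {})      # step 2: never act blind
        action = plan_next(observation)                     # step 3: agent picks the smallest action
        if action is None:                                  # success condition satisfied
            if not keep_artifacts:
                call_tool("desktop_cleanup_artifacts", {})  # step 7
            return True
        tool, arguments = action
        call_tool(tool, arguments)                          # step 4
        # Step 5: the next desktop_observe at the top of the loop verifies the result.
    # Out of budget: leave artifacts in place so the failure can be inspected.
    return False
```

Bounding the loop with `max_steps` keeps a confused plan from acting indefinitely, and skipping cleanup on failure preserves the screenshots and state files that explain what went wrong.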
Artifact Management
Task screenshots, JSON state files, and execution logs are treated as temporary artifacts by default.
| Property | Value |
|---|---|
| Default storage | %LOCALAPPDATA%\desktop-operator\artifacts (Windows) / system temp (fallback) |
| Scope | Current task session only |
| Cleanup | Agent calls desktop_cleanup_artifacts after success |
| Override | Set DESKTOP_OPERATOR_ARTIFACTS environment variable |
Artifacts are never committed back to the repository.
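The storage rules in the table resolve in a fixed order: the DESKTOP_OPERATOR_ARTIFACTS override first, then %LOCALAPPDATA%\desktop-operator\artifacts, then the system temp directory. A minimal sketch of that precedence follows. The `resolve_artifacts_dir` name is hypothetical, and so is the `desktop-operator/artifacts` subfolder used under the temp fallback, because the README only says "system temp".

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_artifacts_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the artifact directory: override, then LOCALAPPDATA, then system temp."""
    env = os.environ if env is None else env
    override = env.get("DESKTOP_OPERATOR_ARTIFACTS")
    if override:                                   # 1. explicit override wins
        return Path(override)
    local_app_data = env.get("LOCALAPPDATA")
    if local_app_data:                             # 2. Windows default location
        return Path(local_app_data) / "desktop-operator" / "artifacts"
    # 3. Fallback: system temp (the subfolder name here is a hypothetical choice).
    return Path(tempfile.gettempdir()) / "desktop-operator" / "artifacts"
```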
Validation
Run the built-in validation script to confirm the skill is working end-to-end:
.\scripts\verify_real_tasks.ps1 --task observe
Available validation targets:
| Target | What it tests |
|---|---|
| observe | Screenshot capture and window detection |
| notepad | Launch Notepad, type text, and save |
| browser | Browser address bar and navigation |
| settings | Open Windows Settings |
| media | Media play/pause via macro |
| chat | Chat panel toggle via macro |
| all | Run all targets in sequence |
To keep artifacts for inspection after validation:
.\scripts\verify_real_tasks.ps1 --task all --keep-artifacts
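To drive validation from another tool (for example a CI job on a Windows runner), you can build the same command line programmatically. The sketch below mirrors the `--task` and `--keep-artifacts` arguments shown above and rejects target names that are not in the table. The helper names are hypothetical, the script is assumed to be run from the repository root, and treating a non-zero exit code as failure is an assumption this README does not confirm.

```python
import subprocess

VALID_TARGETS = ("observe", "notepad", "browser", "settings", "media", "chat", "all")


def verify_command(target: str, keep_artifacts: bool = False) -> list[str]:
    """Build the PowerShell command line for scripts/verify_real_tasks.ps1."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown validation target: {target!r}")
    cmd = ["powershell", "-NoProfile", "-ExecutionPolicy", "Bypass",
           "-File", r".\scripts\verify_real_tasks.ps1", "--task", target]
    if keep_artifacts:
        cmd.append("--keep-artifacts")
    return cmd


def run_verification(target: str, keep_artifacts: bool = False) -> int:
    """Run one validation target from the repository root and return its exit code."""
    # Assumption: the script reports failure through a non-zero exit code.
    return subprocess.run(verify_command(target, keep_artifacts), check=False).returncode
```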
Acknowledgements
We are grateful to the open-source community and the researchers whose work made this project possible. Special thanks to:
- microsoft/cua_skill: for pioneering the Computer Use Agent skill concept and the structured skill-packaging approach that inspired this repository's design.
- bytedance/UI-TARS-desktop: for the excellent work on GUI agent research and desktop interaction patterns that influenced our observation-first workflow.
License
This project is distributed under the GNU Affero General Public License v3.0.
AGPL is used here so that redistributed or hosted modified versions remain open under the same license.
Copyright (C) 2026 Marways7 and contributors.
If this project helps you, please consider giving it a star on GitHub.
