Macos Desktop Control
MCP Server for native macOS desktop automation. Screenshot, mouse, keyboard, window management. Works with Claude Code, Codex CLI, Gemini CLI, Cursor, and any MCP client.
Ask AI about Macos Desktop Control
Powered by Claude Β· Grounded in docs
I know everything about Macos Desktop Control. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
macos-desktop-control
MCP server for native macOS desktop automation β screen, mouse, keyboard, window management, and mobile simulators.
No Docker. No virtual display. Controls your actual Mac desktop. AI operates in the background or the foreground β you choose.
What's New in v3.1
Smart screenshot compression β screenshots are now compressed by default to prevent API "Input too long" errors on high-DPI displays (Retina, 4K).
| Preset | Max Width | Quality | Format | Typical Size |
|---|---|---|---|---|
none | original | 100 | PNG | 4-15 MB |
low | 2048 px | 85 | JPEG | 300-500 KB |
medium | 1280 px | 70 | JPEG | 100-400 KB |
high | 800 px | 50 | JPEG | 30-150 KB |
Default is medium. Agent picks the level based on the task β or uses none for pixel-perfect work.
Tile mode β when full resolution is needed, split a screenshot into a grid. Agent fetches tiles one at a time, each small enough for the API.
New tool: screenshot_tile β fetch individual tiles from a tiled screenshot.
Compression also works on sim_screenshot and emu_screenshot.
v3.0
Two operation modes. 30 tools (up from 13). Optional iOS/Android simulator control.
| Mode | How It Works | User Experience |
|---|---|---|
| Foreground | cliclick + AppleScript (same as v2) | You watch the AI operate your screen |
| Background | CGEvent API via CGEventPostToPid | AI works in a target window β your focus stays untouched |
Add target: { app: "Safari" } to any supported tool. Coordinates become window-relative. The AI never steals your foreground.
Quick Start
# 1. Install cliclick
brew install cliclick
# 2. Clone and install
git clone https://github.com/d-wwei/macos-desktop-control.git
cd macos-desktop-control
npm install
# 3. Add to your MCP client (example: Claude Code)
claude mcp add macos-desktop-control -- node /path/to/macos-desktop-control/src/index.js
Grant Accessibility permission to your terminal: System Settings β Privacy & Security β Accessibility.
Features
Foreground Mode (default)
All original v2 capabilities, unchanged.
- Screen capture β full screen, region, or specific display; with compression presets and tile mode
- Mouse β click (left/right/double/triple), move, drag, scroll, with modifier keys
- Keyboard β three typing modes (keystroke, cliclick, direct IME bypass), any key combo via AppleScript key codes
- Window management β list windows, focus by app/title, open apps
- System β run macOS Shortcuts workflows
- Focus protection β
appparameter auto-refocuses the target before each action
Background Mode (target parameter)
Add target: { app: "AppName", title?: "WindowTitle" } to operate without stealing focus.
| Tool | Background Behavior |
|---|---|
screenshot | Captures the target window via screencapture -l<windowId> |
click | Sends CGEvent mouse events directly to the target PID |
type_text | Pastes text via CGEvent Cmd+V to the target PID (saves/restores clipboard) |
key_press | Sends CGEvent keyboard events to the target PID |
scroll | Sends CGEvent scroll wheel events to the target PID |
drag | Flash technique: briefly activates target β drags β restores your foreground app |
open_app | Launches via open -g (background, no focus steal) |
list_windows | Returns CGWindowID + PID for each window (used internally for targeting) |
When target is set, x/y coordinates are window-relative β (0,0) is the top-left corner of the target window. The server converts to screen-absolute coordinates internally.
iOS Simulator (requires Xcode)
Tools register automatically when xcrun simctl is detected.
| Tool | Function |
|---|---|
sim_list_devices | List simulators and their status |
sim_boot / sim_shutdown | Start or stop a simulator |
sim_screenshot | Capture at native device resolution |
sim_tap | Tap at iOS-space coordinates (auto-mapped to Simulator window) |
sim_swipe | Swipe gesture with duration control |
sim_type | Type text into the simulator |
sim_open_url | Open a URL on the simulator |
sim_install_app | Install a .app bundle |
Android Emulator (requires adb)
Tools register automatically when adb is detected. All operations are fully background β adb never steals focus.
| Tool | Function |
|---|---|
emu_list_devices | List connected devices/emulators |
emu_screenshot | Capture via adb exec-out screencap |
emu_tap | Tap at device coordinates |
emu_swipe | Swipe with duration control |
emu_type | Type text |
emu_key | Send keyevent (HOME, BACK, ENTER, etc.) |
emu_open_url | Open a URL via intent |
emu_install_app | Install an APK |
Usage Examples
Background screenshot of a specific app
{ "target": { "app": "Safari" } }
Captures Safari's window even if it's behind other windows. Your foreground stays untouched.
Background click in a window
{ "x": 100, "y": 200, "target": { "app": "Safari", "title": "GitHub" } }
Clicks at position (100, 200) relative to the Safari window titled "GitHub". No focus change.
Background text input
{ "text": "hello world", "target": { "app": "Notes" } }
Types into Notes via clipboard paste (CGEvent Cmd+V). Clipboard is saved and restored.
Compressed screenshot (default behavior in v3.1)
{ "target": { "app": "Chrome" } }
Returns a 1280px-wide JPEG (~150KB) instead of a raw PNG (~5MB). Works out of the box.
High-res screenshot with no compression
{ "target": { "app": "Chrome" }, "compression": "none" }
Returns the raw PNG β same as v3.0 behavior.
Custom compression
{ "target": { "app": "Chrome" }, "compression": "low", "maxWidth": 1920, "quality": 90 }
Explicit maxWidth/quality/format override the preset values.
Tile mode for full-resolution inspection
{ "target": { "app": "Chrome" }, "tile": { "rows": 2, "cols": 2 } }
Returns a manifest with tile metadata. Then fetch individual tiles:
{ "id": "tiles-1711929600000-abc123", "index": 0, "compression": "medium" }
Focus-safe foreground operation
{ "text": "hello", "app": "TextEdit", "mode": "direct" }
Writes text directly via AppleScript β bypasses input method entirely.
Prerequisites
- macOS (tested on Sequoia 15.x and Tahoe 26.x)
- Node.js 18+
- cliclick:
brew install cliclick - Accessibility permission for your terminal app
- Optional: Xcode (for iOS simulator tools)
- Optional: Android SDK with adb (for Android emulator tools)
Client Configuration
Uses stdio transport. Configuration is the same across all MCP clients.
Claude Code
# Project scope
claude mcp add macos-desktop-control -- node /path/to/macos-desktop-control/src/index.js
# Global scope
claude mcp add macos-desktop-control -s user -- node /path/to/macos-desktop-control/src/index.js
Claude Desktop
~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"macos-desktop-control": {
"command": "node",
"args": ["/path/to/macos-desktop-control/src/index.js"]
}
}
}
OpenAI Codex CLI
.codex/mcp.json:
{
"mcpServers": {
"macos-desktop-control": {
"command": "node",
"args": ["/path/to/macos-desktop-control/src/index.js"]
}
}
}
Gemini CLI
~/.gemini/settings.json:
{
"mcpServers": {
"macos-desktop-control": {
"command": "node",
"args": ["/path/to/macos-desktop-control/src/index.js"]
}
}
}
Cursor
.cursor/mcp.json:
{
"mcpServers": {
"macos-desktop-control": {
"command": "node",
"args": ["/path/to/macos-desktop-control/src/index.js"]
}
}
}
VS Code (GitHub Copilot)
.vscode/mcp.json:
{
"servers": {
"macos-desktop-control": {
"command": "node",
"args": ["/path/to/macos-desktop-control/src/index.js"]
}
}
}
Architecture
βββββββββββββββββββββββββββββββββββ
β MCP Server (stdio transport) β
ββββββββββββ¬βββββββββββββββββββββββ
β
ββββββββββββββββββββββββΌβββββββββββββββββββββββ
β β β
Foreground Mode Background Mode Simulator Mode
β β β
ββββββββββββ΄βββββββββββ βββββββββ΄βββββββββ ββββββββββ΄βββββββββ
β cliclick (mouse) β β CGEvent API β β xcrun simctl β
β osascript (keyboard)β β via JXA bridge β β (iOS) β
β screencapture β β CGEventPost- β β β
β shortcuts CLI β β ToPid(pid) β β adb β
βββββββββββββββββββββββ β screencapture β β (Android) β
β -l<windowId> β βββββββββββββββββββ
ββββββββββββββββββ
Background mode internals:
CGWindowListCopyWindowInfovia JXA enumerates windows with CGWindowID, PID, and bounds- Window-relative coordinates are converted to screen-absolute using bounds
CGEventPostToPidsends mouse/keyboard/scroll events directly to the target processscreencapture -l<windowId>captures a specific window without requiring focus
Compared to Alternatives
| Solution | Platform | Background Mode | Simulator Support | Real Desktop |
|---|---|---|---|---|
| This project | macOS | Yes (CGEvent) | iOS + Android | Yes |
| Anthropic Computer Use | Linux | No | No | No (virtual) |
| MCPControl | Windows | No | No | Yes |
| Playwright MCP | Cross-platform | Partial | No | Browser only |
| PyAutoGUI MCP servers | Cross-platform | No | No | Yes |
Why macOS-native
- Background operation β CGEvent API posts events to a target PID without touching focus. PyAutoGUI and cliclick both require the window to be foreground.
- Focus-stealing prevention β
appparameter +ensureAppFocus()handles the approval-dialog problem that all MCP clients share. - IME bypass β
directmode writes text through AppleScript, skipping the input method entirely. PyAutoGUI'stypewriteonly handles ASCII. - Simulator integration β iOS and Android simulators controlled through the same MCP interface. No separate tools needed.
- Lightweight β cliclick (one brew package) + built-in macOS tools. No Python runtime, no ONNX, no heavy dependencies.
When to choose a cross-platform solution
- You need Windows or Linux support
- You need OCR-based element detection
- Background operation is not a requirement for your workflow
Update Management
This project integrates update-kit for update orchestration with policy control, verification, and rollback.
Check for updates:
npx update-kit check --cwd /path/to/macos-desktop-control --json
Apply an update (git pull + syntax verification):
npx update-kit apply --cwd /path/to/macos-desktop-control
Rollback if something goes wrong:
npx update-kit rollback --cwd /path/to/macos-desktop-control
Configuration lives in update.config.json. State and audit logs are stored in .update-kit/ (gitignored).
License
MIT
