Mcpmaccontrol
MCP server for AI-driven macOS control: screenshots, mouse, keyboard, windows, and shell sessions
Installation
npx mcpmaccontrol
Documentation
MCPMacControl
Give your AI eyes and hands on your Mac.
MCPMacControl is an MCP server that turns any AI assistant into an agentic computer user. It can see your screen, move the mouse, type on the keyboard, and run shell commands across any macOS application, not just the browser.
Works with Claude Code, Cursor, Windsurf, and any MCP-compatible client.
No scripting. No AppleScript. Just tell the AI what you want in plain language and watch it drive your Mac.
What can your AI do with it?
- Test your app like a QA engineer: click through flows, fill forms, verify visual state
- Automate legacy software that has no API: navigate menus, extract data from any window
- Debug UI issues by looking at screenshots and reproducing the problem step by step
- Drive native apps (Xcode, Figma, Finder, Mail, Excel): anything with a window
- Run interactive terminal sessions: full PTY with TUI support (vim, htop, docker logs)
"Take a screenshot of Safari, click the search bar, and type anthropic.com"
That's a real prompt. The AI does the rest.
Install
Download (recommended)
Download MCPMacControl.app.zip from the latest release, unzip, and move to /Applications/.
Release builds are signed and notarized with an Apple Developer ID certificate, so macOS permissions persist across updates and no Gatekeeper warnings appear.
Build from source
git clone https://github.com/sstraus/mcpmaccontrol
cd mcpmaccontrol
make build
make install
Configure your MCP client
Add to your MCP client's config (e.g. ~/.claude.json for Claude Code):
{
"mcpServers": {
"mac-control": {
"command": "/Applications/MCPMacControl.app/Contents/MacOS/mcpmaccontrol"
}
}
}
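If you would rather script this step, the entry can be merged into an existing config without disturbing other servers. The sketch below assumes the config file is a JSON object with a top-level mcpServers key, as in the snippet above. The helper names add_mac_control and update_config_file are hypothetical and not part of MCPMacControl.

```python
import json
from pathlib import Path

# Command path taken from this README; adjust if the app lives elsewhere.
COMMAND = "/Applications/MCPMacControl.app/Contents/MacOS/mcpmaccontrol"


def add_mac_control(config: dict, name: str = "mac-control") -> dict:
    """Return a copy of `config` with the MCPMacControl entry added.

    Hypothetical helper: other servers and unrelated keys are preserved.
    """
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": COMMAND}
    return {**config, "mcpServers": servers}


def update_config_file(path: Path) -> None:
    """Read a JSON config (if it exists), add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_mac_control(config), indent=2) + "\n")
```

For Claude Code this could be called as update_config_file(Path.home() / ".claude.json"). Back up the file first, since the sketch rewrites it in place.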
Then ask your AI: "Take a screenshot of Safari and click the search box"
Why not Playwright / browser automation?
Browser automation tools only control browsers. MCPMacControl controls everything on your Mac: native apps, Electron apps, system dialogs, menu bars, the Dock. If it has pixels on screen, Claude can see it and interact with it.
It also includes full PTY shell sessions, so Claude can run vim, docker, ssh, or any interactive terminal tool, not just headless commands.
Why an .app bundle?
Unlike most MCP servers, MCPMacControl needs macOS Screen Recording and Accessibility permissions. macOS grants these permissions per-application, not per-binary. When a plain binary runs from a terminal, macOS attributes the permission to the terminal app (iTerm, Terminal.app), not to the binary itself. This means:
- Permissions don't persist when the binary is rebuilt
- Every terminal app that launches the binary needs separate permission grants
- There's no way for users to manage the binary's permissions in System Settings
Wrapping the binary in a signed .app bundle gives it its own identity in macOS privacy settings. The binary still communicates via stdio like any MCP server - the .app bundle is just how macOS identifies it for permission management.
That's why the command path is /Applications/MCPMacControl.app/Contents/MacOS/mcpmaccontrol instead of /usr/local/bin/mcpmaccontrol.
Permissions
On first launch, macOS will prompt for two permissions:
| Permission | Purpose |
|---|---|
| Screen Recording | Screenshot capture |
| Accessibility | Mouse/keyboard control |
Grant both in System Settings > Privacy & Security. The app registers itself automatically.
Features
- Visual feedback: menu bar icon, native popover showing current operation, sound + orange border flash before automation
- Input safety: verifies the target app is focused before sending keystrokes, preventing input from reaching the wrong window
- App context inheritance: focus propagates to subsequent actions in a batch, so you write app once
- Signed .app bundle: macOS permissions persist across updates, no Gatekeeper warnings
- Region capture: screenshot only what you need to save tokens
- Auto-exit when the AI client disconnects
Tools
| Tool | Description |
|---|---|
| help | Built-in documentation - call first! |
| list_windows | Find windows by app name |
| capture_window | Screenshot window or region with click coordinates |
| capture_screen | Screenshot entire screen |
| do | Execute actions: click, type, key, scroll, focus, minimize, etc. |
| shell | PTY shell sessions: spawn, send_input, get_snapshot, resize, close, list |
| processes | List running processes with filtering for debugging |
Built-in Help
The AI calls help() to learn the API:
- help(): Overview and workflow
- help("actions"): All action types for do()
- help("shell"): Shell/PTY documentation
- help("examples"): Usage examples
How it works
Claude operates in a see-think-act loop, the same way a human would use a computer:
1. capture_window("Safari")          → Screenshot saved to temp file
2. [Claude reads the image]          → "I see a search bar at (400, 50)"
3. do([                              → Click, type, press Enter
     {type:"click", app:"Safari", x:400, y:50},
     {type:"type", text:"anthropic.com"},
     {type:"key", key:"enter"}
   ])
4. capture_window("Safari")          → Verify the result
Pixel coordinates in the screenshot map directly to do() action coordinates, so no conversion is needed.
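The loop above can be sketched as a small driver function. This is an illustration only: call_tool stands in for whatever your MCP client uses to invoke a tool, decide stands in for the model reading the screenshot, and the argument name app for capture_window is an assumption (the README only shows it positionally).

```python
from typing import Any, Callable, Optional

# Both callables are stand-ins supplied by the caller:
#   call_tool(name, arguments) -> result    (e.g. an MCP client's tool call)
#   decide(screenshot) -> list of actions, or None once the goal is reached
ToolCaller = Callable[[str, dict], Any]
Decider = Callable[[Any], Optional[list]]


def see_think_act(call_tool: ToolCaller, decide: Decider, app: str,
                  max_steps: int = 10) -> int:
    """Run capture -> decide -> do until `decide` returns None.

    Returns the number of action batches that were sent.
    """
    for step in range(max_steps):
        # See: "app" as the argument name is an assumption.
        screenshot = call_tool("capture_window", {"app": app})
        # Think: the model (here a plain callable) picks the next actions.
        actions = decide(screenshot)
        if actions is None:
            return step
        # Act: one batch of actions per iteration.
        call_tool("do", {"actions": actions})
    return max_steps
```

The max_steps cap is a design choice of the sketch, so a model that never decides it is finished cannot loop forever.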
Region capture
Capture a portion of a window to save tokens:
capture_window("Safari", region_x: 0, region_y: 0, region_width: 400, region_height: 300)
Click coordinates: click_x = region_x + x_in_image, click_y = region_y + y_in_image.
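As a worked example of that formula, here is a minimal sketch that converts a point found in a region screenshot back into window coordinates and builds the matching click action. The function names are hypothetical; only the arithmetic and the click action fields come from this README.

```python
def region_to_window(region_x: int, region_y: int,
                     x_in_image: int, y_in_image: int) -> tuple:
    """Offset a point inside a region screenshot by the region's origin."""
    return region_x + x_in_image, region_y + y_in_image


def click_in_region(app: str, region_x: int, region_y: int,
                    x_in_image: int, y_in_image: int) -> dict:
    """Build a click action for do() from region-relative coordinates."""
    x, y = region_to_window(region_x, region_y, x_in_image, y_in_image)
    return {"type": "click", "app": app, "x": x, "y": y}
```

For a region captured at region_x=200, region_y=100, a button seen at (50, 30) in the image is clicked at (250, 130) in the window.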
The do Tool
Execute one or more actions in sequence:
do({"actions": [
{"type": "click", "app": "Safari", "x": 400, "y": 50},
{"type": "type", "text": "anthropic.com"},
{"type": "key", "key": "enter"}
]})
Action Types
| Action | Required | Optional | Description |
|---|---|---|---|
| click | app, x, y | button, double | Click at position |
| move | app, x, y | | Move mouse |
| type | text | | Type text string |
| key | key | modifiers | Press key (supports "cmd+shift+g" compound syntax) |
| scroll | app, x, y, delta_y/delta_x | | Scroll in window |
| wait | ms | | Pause (milliseconds) |
| focus | app | | Bring window to front |
| minimize | app | | Minimize to Dock |
| restore | app | | Restore from Dock |
| close | app | | Close window |
| resize | app, width, height | | Resize window |
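The Required column can be turned into a client-side pre-check before a batch is sent. This sketch is not the server's own validation; it mirrors the table and assumes that scroll needs at least one of delta_y or delta_x. It also ignores app context inheritance, so an action that relies on an earlier focus is reported as missing app. The name missing_fields is hypothetical.

```python
# Required fields per action type, copied from the table above.
REQUIRED = {
    "click": {"app", "x", "y"},
    "move": {"app", "x", "y"},
    "type": {"text"},
    "key": {"key"},
    "scroll": {"app", "x", "y"},
    "wait": {"ms"},
    "focus": {"app"},
    "minimize": {"app"},
    "restore": {"app"},
    "close": {"app"},
    "resize": {"app", "width", "height"},
}


def missing_fields(action: dict) -> set:
    """Return the required fields that `action` does not provide."""
    kind = action.get("type")
    if kind not in REQUIRED:
        raise ValueError(f"unknown action type: {kind!r}")
    missing = REQUIRED[kind] - set(action)
    # Assumption: "delta_y/delta_x" in the table means at least one of them.
    if kind == "scroll" and not ({"delta_y", "delta_x"} & set(action)):
        missing.add("delta_y or delta_x")
    return missing
```

An empty set means the action has every field the table lists as required.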
Examples
Click:
{"type": "click", "app": "Safari", "x": 100, "y": 50}
{"type": "click", "app": "Finder", "x": 200, "y": 100, "button": "right"}
{"type": "click", "app": "Finder", "x": 200, "y": 100, "double": true}
Keyboard:
{"type": "type", "text": "Hello World"}
{"type": "key", "key": "enter"}
{"type": "key", "key": "cmd+v"}
{"type": "key", "key": "cmd+shift+g"}
{"type": "key", "key": "v", "modifiers": ["cmd"]}
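The last two keyboard examples suggest that the compound string and the explicit modifiers list describe the same key press. A minimal sketch of that relationship, assuming the final "+"-separated token is the key and everything before it is a modifier; this is an illustration, not the server's parser, and split_compound_key is a hypothetical name.

```python
def split_compound_key(spec: str) -> dict:
    """Expand a spec like "cmd+shift+g" into a key action with modifiers.

    Assumes the last "+"-separated token is the key itself. A literal
    "+" key is not handled by this sketch.
    """
    *modifiers, key = spec.split("+")
    action = {"type": "key", "key": key}
    if modifiers:
        action["modifiers"] = modifiers
    return action
```

Under that assumption, split_compound_key("cmd+v") produces the same action as the explicit modifiers example above, and a plain key such as "enter" passes through without a modifiers field.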
Scroll:
{"type": "scroll", "app": "Safari", "x": 400, "y": 300, "delta_y": -100}
Window Control:
{"type": "focus", "app": "Safari"}
{"type": "minimize", "app": "Finder"}
{"type": "restore", "app": "Finder"}
{"type": "resize", "app": "Safari", "width": 1024, "height": 768}
{"type": "close", "app": "TextEdit"}
Shell Sessions
Run interactive terminal sessions with full PTY and terminal emulation (supports TUI apps like vim, htop):
1. shell(action: "spawn")                                          → Get session ID
2. shell(action: "send_input", session_id: "ID", input: "ls")      → Send text
3. shell(action: "send_input", session_id: "ID", special_key: "enter")
4. shell(action: "get_snapshot", session_id: "ID", format: "ansi") → Get screen
5. shell(action: "close", session_id: "ID")                        → Cleanup
| Action | Parameters | Description |
|---|---|---|
| spawn | command?, cwd?, cols?, rows? | Start session (default: /bin/bash) |
| send_input | session_id, input?, special_key?, wait_ms? | Send text or keys |
| get_snapshot | session_id, format? | Get screen state (text or ansi) |
| resize | session_id, cols?, rows? | Resize terminal |
| list | | List active sessions |
| close | session_id | Close session |
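On the wire, an MCP client sends each of these calls as a JSON-RPC 2.0 tools/call request. The sketch below builds the requests for steps 2 to 5 of the sequence above. The session ID must come from the spawn response, whose exact shape is not documented here, so the caller passes it in. The helper names are hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request IDs; any unique values would do


def tool_call(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


def run_command_requests(session_id: str, command: str) -> list:
    """Requests to type a command, press Enter, read the screen, and close."""
    def shell(**arguments) -> str:
        return tool_call("shell", {"session_id": session_id, **arguments})

    return [
        shell(action="send_input", input=command),
        shell(action="send_input", special_key="enter"),
        shell(action="get_snapshot", format="text"),
        shell(action="close"),
    ]
```

In practice an MCP client library handles this framing for you; the sketch only shows how the tool name and the action parameters from the table fit into a request.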
Headless Mode
Run without menu bar (for CI/SSH):
{
"mcpServers": {
"mac-control": {
"command": "/Applications/MCPMacControl.app/Contents/MacOS/mcpmaccontrol",
"env": {
"MCPMACCONTROL_HEADLESS": "1"
}
}
}
}
Image Optimization
Screenshots are optimized for AI vision:
- WebP lossy format (default quality 25, tunable 1-100 via the quality parameter; use 50+ for small glyphs/icons)
- Scaled to window point dimensions (coordinates match clicks)
- Use region capture to further reduce size and tokens
Development
make build # Build signed .app bundle
make test # Run tests
make install # Install to /Applications
make clean # Remove build artifacts
make verify-sign # Check code signature
To sign with your own Apple Developer certificate:
SIGN_IDENTITY="Developer ID Application: YOUR NAME (TEAMID)" make build
Requirements
- macOS 12+
- Go 1.24+ (for building)
License
MIT
