Autogui
AutoGUI MCP Server - AI-driven screen automation via Model Context Protocol
Installation
npx autogui
AutoGUI
AI-driven screen automation MCP Server. Send natural language tasks, and the internal AI captures screenshots, analyzes them, and performs mouse/keyboard actions autonomously.
Architecture
MCP Client (Claude Code, Claude Desktop, Cursor, etc.)
    | stdio
    v
server.py (FastMCP async orchestration loop)
    |
    v
agent.py (ScreenAgent toolkit: capture, execute, parse, safety)
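The diagram and the description above suggest a capture, analyze, act loop that stops when the model reports `task_complete` or an iteration limit is reached. The following is a minimal sketch of that idea, not the actual `server.py`: the function `run_task` and the callables `capture`, `decide`, and `execute` are hypothetical names, and the sketch is synchronous for brevity even though the real server is described as async.

```python
from typing import Callable


def run_task(
    task: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, list], dict],
    execute: Callable[[dict], None],
    max_iterations: int = 20,
) -> dict:
    """Hypothetical orchestration loop; names are illustrative only."""
    history = []
    for step in range(1, max_iterations + 1):
        screenshot = capture()                       # grab the current screen
        action = decide(task, screenshot, history)   # ask the vision model
        history.append(action)
        if action.get("action") == "task_complete":
            return {"done": True, "steps": step, "history": history}
        execute(action)                              # mouse/keyboard side effect
    # The iteration limit was reached without a task_complete action.
    return {"done": False, "steps": max_iterations, "history": history}
```

Passing the three callables in as parameters keeps the loop testable without a real screen or API key.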
Quick Start
git clone https://github.com/stellariums/AutoGUI.git
cd AutoGUI
pip install -r requirements.txt
Configuration
AutoGUI supports layered configuration: environment variables (highest priority) > config.json > defaults.
Option A: Environment Variables Only (Simplest)
Only three variables are needed to get started:
set AUTOGUI_API_KEY=your-api-key
set AUTOGUI_BASE_URL=https://api.openai.com/v1
set AUTOGUI_MODEL=gpt-4o
Or copy .env.example to .env and pass via your MCP client config (see below).
Option B: Config File
cp config.json.example config.json
Edit config.json. Only the api section is required; everything else has sensible defaults:
{
  "api": {
    "base_url": "https://your-api-endpoint/v1",
    "api_key": "your-api-key",
    "model": "your-model-name"
  }
}
Advanced config options
{
  "screen": {
    "max_width": 1280,
    "max_height": 720,
    "allowed_region": null
  },
  "agent": {
    "max_iterations": 20,
    "delay_between_actions": 1.0,
    "max_history_rounds": 10
  },
  "safety": {
    "enable_confirmation": true,
    "fallback_action": "block",
    "dangerous_keys": ["delete", "backspace", "escape"],
    "dangerous_hotkeys": [["ctrl", "w"], ["alt", "f4"]],
    "dangerous_patterns": ["rm ", "del ", "format ", "shutdown"]
  }
}
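The layering described in this section (environment variables over config.json over defaults) can be sketched roughly as follows. This is an illustration under assumptions, not AutoGUI's actual loader: `load_config`, `DEFAULTS`, and `ENV_MAP` are hypothetical names, only the three documented environment variables are mapped, and the default values are copied from the examples above.

```python
import copy
import json
import os
from pathlib import Path

# Assumed defaults, taken from the example values in this README.
DEFAULTS = {
    "api": {"base_url": None, "api_key": None, "model": None},
    "agent": {"max_iterations": 20, "delay_between_actions": 1.0, "max_history_rounds": 10},
}

# Environment variable -> (section, key). The variable names come from the README.
ENV_MAP = {
    "AUTOGUI_API_KEY": ("api", "api_key"),
    "AUTOGUI_BASE_URL": ("api", "base_url"),
    "AUTOGUI_MODEL": ("api", "model"),
}


def load_config(path="config.json", env=None):
    """Merge defaults, then config.json, then environment variables."""
    env = os.environ if env is None else env
    config = copy.deepcopy(DEFAULTS)

    file = Path(path)
    if file.is_file():  # config.json is optional
        for section, values in json.loads(file.read_text(encoding="utf-8")).items():
            config.setdefault(section, {}).update(values)

    for name, (section, key) in ENV_MAP.items():
        if env.get(name):  # environment variables win when set and non-empty
            config[section][key] = env[name]

    if not config["api"]["api_key"]:
        raise ValueError("API key required: set AUTOGUI_API_KEY or api.api_key in config.json")
    return config
```

Merging section by section rather than replacing the whole dictionary lets a config.json that sets only `api` still pick up the `agent` defaults.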
MCP Client Setup
Claude Code
claude mcp add AutoGUI -- python /path/to/AutoGUI/server.py
Or add to .mcp.json:
{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}
Claude Desktop
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["C:/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}
Cursor
Add to Cursor MCP settings (.cursor/mcp.json):
{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}
MCP Inspector (Testing)
npx @modelcontextprotocol/inspector python server.py
Tool
| Tool | Description |
|---|---|
| autogui_execute_task | Execute a screen automation task via natural language |
Supported Actions
| Action | Description |
|---|---|
| click | Click at position |
| double_click | Double click |
| right_click | Right click |
| type | Input text (supports CJK via clipboard) |
| press | Key combination |
| scroll | Scroll |
| drag | Drag |
| move | Move cursor |
| wait | Wait |
| task_complete | Mark task as done |
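
Before an action from the table above can be executed, the model's reply has to be turned into a validated action. The sketch below shows one way this might look. It assumes the model answers with a JSON object containing an `action` field, which is a hypothetical schema; the README does not document the real reply format, and `parse_action` is an illustrative name.

```python
import json

# The ten action names listed in the Supported Actions table.
SUPPORTED_ACTIONS = {
    "click", "double_click", "right_click", "type", "press",
    "scroll", "drag", "move", "wait", "task_complete",
}


def parse_action(reply: str) -> dict:
    """Extract and validate one action from a model reply (assumed JSON schema)."""
    # Models often wrap JSON in prose, so take the outermost {...} span.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    action = json.loads(reply[start:end + 1])  # JSONDecodeError is a ValueError
    name = action.get("action")
    if name not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    return action
```

Rejecting unknown action names up front means a malformed reply fails with a clear error instead of reaching the mouse and keyboard layer.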
Safety
- Dangerous action detection (rule-based + AI self-labeling)
- Configurable dangerous keys, hotkeys, and text patterns
- Optional region restriction (allowed_region)
- Elicit-based confirmation for dangerous operations
- Configurable fallback: block or allow
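The rule-based part of these checks could look roughly like the sketch below, which uses the `safety` keys from the advanced config. Several things here are assumptions rather than documented behavior: the function names `is_dangerous` and `is_allowed`, the action fields `keys`, `text`, and `dangerous` (the latter standing in for the AI self-label), and the reading of `fallback_action` as the decision taken when no confirmation can be obtained. The `allowed_region` check is left out because its format is not specified here.

```python
def is_dangerous(action: dict, safety: dict) -> bool:
    """Rule-based danger check against the configured keys, hotkeys, and patterns."""
    if action.get("dangerous"):  # hypothetical field for the AI self-label
        return True
    kind = action.get("action")
    if kind == "press":
        keys = [k.lower() for k in action.get("keys", [])]
        if len(keys) == 1 and keys[0] in safety["dangerous_keys"]:
            return True
        # Compare hotkeys order-insensitively, e.g. ["f4", "alt"] == ["alt", "f4"].
        return any(sorted(keys) == sorted(h) for h in safety["dangerous_hotkeys"])
    if kind == "type":
        text = action.get("text", "").lower()
        return any(p in text for p in safety["dangerous_patterns"])
    return False


def is_allowed(action: dict, safety: dict, confirm=None) -> bool:
    """Decide whether to run an action; `confirm` is an optional user prompt callback."""
    if not is_dangerous(action, safety):
        return True
    if safety.get("enable_confirmation") and confirm is not None:
        return bool(confirm(action))
    # Assumed meaning of fallback_action: what to do when nobody can confirm.
    return safety.get("fallback_action", "block") == "allow"
```

Plain substring matching is simple but over-matches: with the pattern `"rm "`, typing "confirm the order" would also be flagged, so a production check would likely need word boundaries.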
FAQ
Q: Screenshot is black or empty
A: Make sure the screen is not locked. On Windows, pyautogui/mss cannot capture the lock screen.
Q: Chinese input not working
A: AutoGUI uses clipboard (pyperclip + Ctrl+V) for text input, which supports CJK characters. Make sure pyperclip is installed.
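The clipboard approach described in this answer can be sketched as below. `type_text` and its `copy`/`hotkey` parameters are hypothetical and exist only to make the example testable; the defaults use `pyperclip.copy` and `pyautogui.hotkey`, which are real functions in those libraries. This is an illustration, not AutoGUI's actual implementation.

```python
def type_text(text, copy=None, hotkey=None):
    """Paste text through the clipboard so CJK characters are entered intact."""
    if copy is None:
        import pyperclip  # imported lazily; must be installed for real use
        copy = pyperclip.copy
    if hotkey is None:
        import pyautogui  # needs a desktop session to send key events
        hotkey = pyautogui.hotkey
    copy(text)           # put the text on the system clipboard
    hotkey("ctrl", "v")  # paste into the focused window
```

Note that this overwrites whatever the user had on the clipboard.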
Q: "API key required" error
A: Set the AUTOGUI_API_KEY environment variable or add api.api_key to config.json.
Q: "Another task is already running" error
A: AutoGUI processes one task at a time. Wait for the current task to finish.
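One common way to enforce one task at a time in an async server is a lock that rejects new work instead of queueing it. The sketch below illustrates that pattern with `asyncio.Lock`. The class `SingleTaskGuard` and its return shape are hypothetical; the README does not say how AutoGUI implements this.

```python
import asyncio


class SingleTaskGuard:
    """Run at most one task at a time and reject overlapping requests."""

    def __init__(self):
        self._lock = asyncio.Lock()

    async def run(self, task_factory):
        # There is no await between this check and acquiring the lock,
        # so two coroutines on the same event loop cannot both pass it.
        if self._lock.locked():
            return {"ok": False, "error": "Another task is already running"}
        async with self._lock:
            return {"ok": True, "result": await task_factory()}
```

Rejecting rather than waiting matches the behavior described in the answer above: a second request fails immediately and the caller retries once the first task has finished.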
Requirements
- Windows 10/11
- Python >= 3.10
- An OpenAI-compatible vision API (GPT-4o, Qwen-VL, etc.)
Acknowledgments
This project is based on tech-shrimp/qwen_autogui, refactored as an MCP Server with added safety checks, layered configuration, and multi-client support.
License
MIT
