AutoControlGUI
A framework for GUI automation
AutoControl
AutoControl is a cross-platform Python GUI automation framework providing mouse control, keyboard input, image recognition, screen capture, action scripting, and report generation – all through a unified API that works on Windows, macOS, and Linux (X11).
Table of Contents
- Features
- Architecture
- Installation
- Requirements
- Quick Start
- Mouse Control
- Keyboard Control
- Image Recognition
- Accessibility Element Finder
- AI Element Locator (VLM)
- OCR (Text on Screen)
- LLM Action Planner
- Runtime Variables & Control Flow
- Remote Desktop
- Clipboard
- Screenshot
- Action Recording & Playback
- JSON Action Scripting
- MCP Server (Use AutoControl from Claude)
- Scheduler (Interval & Cron)
- Global Hotkey Daemon
- Event Triggers
- Run History
- Report Generation
- Remote Automation (Socket / REST)
- Plugin Loader
- Shell Command Execution
- Screen Recording
- Callback Executor
- Package Manager
- Project Management
- Window Management
- GUI Application
- Command-Line Interface
- Platform Support
- Development
- License
Features
- Mouse Automation – move, click, press, release, drag, and scroll with precise coordinate control
- Keyboard Automation – press/release individual keys, type strings, hotkey combinations, key state detection
- Image Recognition – locate UI elements on screen using OpenCV template matching with configurable threshold
- Accessibility Element Finder – query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role
- AI Element Locator (VLM) – describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates
- OCR – extract text from screen regions using Tesseract; wait for, click, or locate rendered text; regex search and full-region dump
- LLM Action Planner – translate a plain-language description into a validated `AC_*` action list using Claude
- Runtime Variables & Control Flow – `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` for data-driven scripts
- Remote Desktop – stream this machine's screen and accept remote input over a token-authenticated TCP protocol, or connect to another machine and view + control it (host + viewer GUIs included). Optional TLS (HTTPS-grade encryption), WebSocket transport (ws:// + wss:// for browser / firewall-friendly clients), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), and chunked file transfer (drag-drop + progress bar; arbitrary destination path; no size cap). Plus folder sync (additive mirror – local deletions never propagate) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README). AnyDesk-style popout: when the viewer authenticates, the live remote desktop opens in its own resizable top-level window so the control panel stays uncluttered. The Remote Desktop tabs are wrapped in `QScrollArea` so the panel stays usable on small windows and stretches edge-to-edge on 4K displays. Driveable headlessly via `je_auto_control` and over MCP through the new `ac_remote_*` tools
- Driver-level input backends (opt-in) – for games / apps that ignore SendInput (Win) or XTest (Linux): Interception driver backend for Windows (HID-layer keyboard / mouse injection via Oblita's WHQL-signed driver, opt-in via `JE_AUTOCONTROL_WIN32_BACKEND=interception`), uinput backend for Linux (kernel `/dev/uinput` synthetic HID device, opt-in via `JE_AUTOCONTROL_LINUX_BACKEND=uinput`), and ViGEm virtual gamepad for Windows games that read controllers (virtual Xbox 360 pad with friendly button / dpad / stick / trigger API, exposed as `AC_gamepad_*` executor commands and `ac_gamepad_*` MCP tools). All three fall back gracefully when the driver isn't installed, so existing deployments keep working unchanged
- Clipboard – read/write system clipboard text on Windows, macOS, and Linux
- Screenshot & Screen Recording – capture full screen or regions as images, record screen to video (AVI/MP4)
- Action Recording & Playback – record mouse/keyboard events and replay them
- JSON-Based Action Scripting – define and execute automation flows using JSON action files (dry-run + step debug)
- Scheduler – run scripts on an interval or cron expression; jobs persist across restarts
- Global Hotkey Daemon – bind OS-level hotkeys to action scripts (Windows today; macOS/Linux stubs in place)
- Event Triggers – fire scripts when an image appears, a window opens, a pixel changes, or a file is modified
- Run History – SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts
- Report Generation – export test records as HTML, JSON, or XML reports with success/failure status
- MCP Server – JSON-RPC 2.0 Model Context Protocol server (stdio + HTTP/SSE) so Claude Desktop / Claude Code / custom tool-use loops can drive AutoControl. ~100 tools, full protocol coverage (resources, prompts, sampling, roots, logging, progress, cancellation, elicitation), bearer-token auth + TLS, audit log, rate limit, plugin hot-reload, CI fake backend. New in this release: `ac_remote_host_start` / `ac_remote_host_stop` / `ac_remote_host_status` / `ac_remote_viewer_connect` / `ac_remote_viewer_disconnect` / `ac_remote_viewer_status` / `ac_remote_viewer_send_input` wrap the same singleton remote-desktop registry the GUI uses, so a model can spin up a host, open a viewer to another machine, and forward mouse / keyboard / type / hotkey actions through the active session
- Remote Automation – TCP socket server and hardened REST API: bearer-token auth, per-IP rate limit + lockout, SQLite audit hook, Prometheus `/metrics`, OpenAPI-style endpoint table (`/health`, `/screen_size`, `/sessions`, `/screenshot`, `/execute`, `/audit/list`, `/audit/verify`, `/inspector/recent`, `/usb/devices`, `/diagnose`, ...), and a vanilla-JS browser dashboard at `/dashboard` (any phone with HTTP reach can monitor the host)
- Plugin Loader – drop `.py` files exposing `AC_*` callables into a directory and register them as executor commands at runtime
- Shell Integration – execute shell commands within automation workflows with async output capture
- Callback Executor – trigger automation functions with callback hooks for chaining operations
- Dynamic Package Loading – extend the executor at runtime by importing external Python packages
- Project & Template Management – scaffold automation projects with keyword/executor directory structure
- Window Management – send keyboard/mouse events directly to specific windows (Windows/Linux)
- GUI Application – built-in PySide6 graphical interface with live language switching (English / 繁體中文 / 简体中文 / 日本語)
- CLI Runner – `python -m je_auto_control.cli run|list-jobs|start-server|start-rest`
- Cross-Platform – unified API across Windows, macOS, and Linux (X11)
- Multi-Host Admin Console – register N AutoControl REST endpoints in one address book, poll them in parallel for health/sessions/jobs, broadcast actions to all of them. Persisted to `~/.je_auto_control/admin_hosts.json` (mode 0600 on POSIX). Bad-token hosts surface as unhealthy with the actual HTTP error
- Tamper-Evident Audit Log – SQLite events table with SHA-256 hash chain (`prev_hash` + `row_hash` per row); editing any past row breaks the chain. `verify_chain()` walks rows top-down and reports the first broken link. Legacy tables get backfilled at startup ("trust on first use")
- WebRTC Packet Inspector – process-global rolling window of `StatsSnapshot` samples (default 600 / ~10 min @ 1 Hz) fed by the existing WebRTC stats pollers. Per-metric `last` / `min` / `max` / `avg` / `p95` for RTT, FPS, bitrate, packet loss, jitter
- USB Device Enumeration – read-only cross-platform device listing. Tries pyusb (libusb) first; falls back to platform-specific (Windows `Get-PnpDevice`, macOS `system_profiler`, Linux `/sys/bus/usb/devices`). Phase 2 (passthrough) intentionally deferred pending design review
- System Diagnostics – single-command "is everything OK?" probe across platform, optional deps, executor command count, audit chain, screenshot, mouse, disk space, REST registry. CLI exits 0 if all green / 1 otherwise; REST `/diagnose`; severity-tagged GUI tab
- USB Hotplug Events – polling-based hotplug watcher (`UsbHotplugWatcher`) with bounded ring buffer + sequence-numbered events; `GET /usb/events?since=N` lets late subscribers catch up. GUI auto-refresh toggle on the USB tab.
- OpenAPI 3.1 + Swagger UI – `GET /openapi.json` (auth-gated, generated from the live route table) + `GET /docs` (browser Swagger UI with bearer token bar). Drift test in CI catches new routes added without metadata.
- Configuration Bundle – single-file JSON export/import of user config (admin hosts, address book, trusted viewers, known hosts, host service, IDs). Atomic write with `<name>.bak.<timestamp>` backups; CLI `python -m je_auto_control.utils.config_bundle export|import`; `POST /config/{export,import}`; GUI buttons on the REST API tab.
- USB Passthrough (experimental, opt-in) – wire-level protocol over a WebRTC `usb` DataChannel (10 opcodes, CREDIT-based flow control, 16 KiB payload cap). Host-side `UsbPassthroughSession` end-to-end on the Linux libusb backend; Windows `WinUSB` backend with full ctypes wiring (hardware-unverified); macOS `IOKit` skeleton. Viewer-side blocking client (`UsbPassthroughClient` → `ClientHandle.control_transfer / bulk_transfer / interrupt_transfer`). Persistent ACL (`~/.je_auto_control/usb_acl.json`, default deny, mode 0600) with host-side prompt QDialog and tamper-evident audit-log integration. Default off – opt in via `enable_usb_passthrough(True)` or `JE_AUTOCONTROL_USB_PASSTHROUGH=1`. Phase 2e external security review checklist included; default-on requires sign-off.
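The hash-chain idea behind the tamper-evident audit log can be sketched with the standard library (field names here are illustrative, not the actual table schema):

```python
import hashlib
import json

def row_hash(prev_hash: str, payload: dict) -> str:
    """Bind a row to its predecessor by hashing them together."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()

def append(chain: list, payload: dict) -> None:
    prev = chain[-1]["row_hash"] if chain else "0" * 64
    chain.append({"prev_hash": prev, "payload": payload,
                  "row_hash": row_hash(prev, payload)})

def verify_chain(chain: list):
    """Walk rows top-down; return the index of the first broken link, or None."""
    prev = "0" * 64
    for index, row in enumerate(chain):
        if row["prev_hash"] != prev or row["row_hash"] != row_hash(prev, row["payload"]):
            return index
        prev = row["row_hash"]
    return None

chain = []
append(chain, {"event": "login"})
append(chain, {"event": "screenshot"})
assert verify_chain(chain) is None
chain[0]["payload"]["event"] = "edited"  # tamper with a past row
assert verify_chain(chain) == 0          # chain breaks at the edited row
```

Because each `row_hash` covers the previous row's hash, rewriting any historical row invalidates every hash after it unless the attacker recomputes the whole chain.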
Architecture
The runtime is layered: client surfaces (CLI, GUI, MCP/REST/socket
servers) sit on top of the headless API (wrapper/ + utils/),
which resolves to a per-OS backend chosen at import time by
wrapper/platform_wrapper.py. The package façade
(je_auto_control/__init__.py) re-exports every public name so users
need only import je_auto_control regardless of which surface or
backend they hit.
flowchart LR
subgraph Clients["Client Surfaces"]
direction TB
Claude[["Claude Desktop /<br/>Claude Code"]]
APIUser[["Custom Anthropic /<br/>OpenAI tool loops"]]
HTTPClient[["HTTP / SSE clients"]]
TCPClient[["Socket / REST clients"]]
Browser[["Browser<br/>(/dashboard · /docs)"]]
GUIUser[["PySide6 GUI"]]
CLIUser[["python -m<br/>je_auto_control[.cli]"]]
Library[["Library users<br/>(import je_auto_control)"]]
end
subgraph Transports["Transports & Servers"]
direction TB
Stdio["MCP stdio<br/>JSON-RPC 2.0"]
HTTPMCP["MCP HTTP /<br/>SSE + auth + TLS"]
REST["REST server :9939<br/>bearer auth · rate-limit ·<br/>OpenAPI · /metrics · /dashboard"]
Socket["Socket server<br/>:9938"]
WebRTC["WebRTC sessions<br/>(remote desktop ·<br/>files · audio · USB)"]
end
subgraph MCP["mcp_server/"]
direction TB
Dispatcher["MCPServer<br/>(JSON-RPC dispatcher)"]
Tools["tools/<br/>~90 ac_* + aliases"]
Resources["resources/<br/>files · history ·<br/>commands · screen-live"]
Prompts["prompts/<br/>built-in templates"]
Context["context · audit ·<br/>rate-limit · log-bridge"]
FakeBE["fake_backend<br/>(CI smoke)"]
end
subgraph Core["Headless Core (wrapper/ + utils/)"]
direction TB
Wrapper["wrapper/<br/>mouse · keyboard · screen ·<br/>image · record · window"]
Executor["executor/<br/>AC_* JSON action engine"]
Vision["vision/ · ocr/ ·<br/>accessibility/"]
Recorder["scheduler/ · triggers/ ·<br/>hotkey/ · plugin_loader/<br/>run_history/"]
IOUtils["clipboard/ · cv2_utils/ ·<br/>shell_process/ · json/"]
end
subgraph Ops["Operations Layer (utils/)"]
direction TB
Admin["admin/<br/>multi-host poll +<br/>broadcast"]
Audit["remote_desktop/<br/>audit_log<br/>(SHA-256 chain)"]
Inspector["remote_desktop/<br/>webrtc_inspector"]
Diag["diagnostics/<br/>self-test"]
ConfigB["config_bundle/<br/>export/import"]
end
subgraph USB["USB"]
direction TB
UsbEnum["usb/<br/>list + hotplug events"]
UsbPass["usb/passthrough/<br/>session · client · ACL ·<br/>libusb · WinUSB · IOKit"]
end
subgraph Remote["Remote Desktop (utils/remote_desktop/)"]
direction TB
RDHost["host · webrtc_host ·<br/>signaling · multi_viewer"]
RDFiles["webrtc_files · file_sync ·<br/>clipboard_sync · audio"]
RDTrust["trust_list · fingerprint ·<br/>turn_config · lan_discovery"]
end
subgraph Backends["Per-OS Backends"]
direction TB
Win["windows/<br/>Win32 ctypes"]
Mac["osx/<br/>pyobjc · Quartz"]
X11["linux_with_x11/<br/>python-Xlib"]
end
Claude --> Stdio
APIUser --> Stdio
HTTPClient --> HTTPMCP
TCPClient --> Socket
TCPClient --> REST
Browser --> REST
Stdio --> Dispatcher
HTTPMCP --> Dispatcher
Dispatcher --> Tools
Dispatcher --> Resources
Dispatcher --> Prompts
Dispatcher -.- Context
Tools -.optional.-> FakeBE
Tools --> Wrapper
Tools --> Executor
Tools --> Vision
Tools --> Recorder
Tools --> IOUtils
Resources --> Recorder
Resources --> Wrapper
REST --> Executor
REST --> Ops
REST --> USB
Socket --> Executor
WebRTC --> Remote
WebRTC --> UsbPass
GUIUser --> Wrapper
GUIUser --> Recorder
GUIUser --> Ops
GUIUser --> USB
GUIUser --> Remote
CLIUser --> Executor
Library --> Wrapper
Library --> Executor
Library --> Ops
Admin --> REST
Inspector -.- WebRTC
Audit -.- REST
Audit -.- USB
UsbPass --> Backends
Wrapper --> Backends
Vision -.- Wrapper
Recorder -.- Executor
je_auto_control/
├── wrapper/                  # Platform-agnostic API layer
│   ├── platform_wrapper.py       # Auto-detects OS and loads the correct backend
│   ├── auto_control_mouse.py     # Mouse operations
│   ├── auto_control_keyboard.py  # Keyboard operations
│   ├── auto_control_image.py     # Image recognition (OpenCV template matching)
│   ├── auto_control_screen.py    # Screenshot, screen size, pixel color
│   ├── auto_control_window.py    # Cross-platform window manager facade
│   └── auto_control_record.py    # Action recording/playback
├── windows/                  # Windows-specific backend (Win32 API / ctypes)
├── osx/                      # macOS-specific backend (pyobjc / Quartz)
├── linux_with_x11/           # Linux-specific backend (python-Xlib)
├── gui/                      # PySide6 GUI application
└── utils/
    ├── mcp_server/           # MCP server (stdio + HTTP/SSE) – server, tools/, resources, prompts, audit, rate_limit, fake_backend, plugin_watcher
    ├── executor/             # JSON action executor engine
    ├── callback/             # Callback function executor
    ├── cv2_utils/            # OpenCV screenshot, template matching, video recording
    ├── accessibility/        # UIA (Windows) / AX (macOS) element finder
    ├── vision/               # VLM-based locator (Anthropic / OpenAI backends)
    ├── ocr/                  # Tesseract-backed text locator
    ├── clipboard/            # Cross-platform clipboard (text + image)
    ├── llm/                  # Plain-language → AC_* action planner
    ├── scheduler/            # Interval + cron scheduler
    ├── hotkey/               # Global hotkey daemon
    ├── triggers/             # Image/window/pixel/file triggers
    ├── run_history/          # SQLite run log + error-screenshot artifacts
    ├── rest_api/             # Stdlib HTTP/REST server – auth · audit · rate-limit · OpenAPI · /metrics · dashboard · Swagger UI
    ├── admin/                # Multi-host AdminConsoleClient (poll + broadcast)
    ├── diagnostics/          # System self-test runner + CLI
    ├── config_bundle/        # Single-file user-config export / import
    ├── usb/                  # Cross-platform enumeration, hotplug events, passthrough/{protocol, session, viewer client, ACL, libusb / WinUSB / IOKit}
    ├── remote_desktop/       # WebRTC host + viewer, signalling, multi-viewer, file/clipboard/audio sync, audit log (hash chain), trust list, TURN config, mDNS discovery, WebRTC stats inspector
    ├── plugin_loader/        # Dynamic AC_* plugin discovery
    ├── socket_server/        # TCP socket server for remote automation
    ├── shell_process/        # Shell command manager
    ├── generate_report/      # HTML / JSON / XML report generators
    ├── test_record/          # Test action recording
    ├── script_vars/          # Script variable interpolation
    ├── watcher/              # Mouse / pixel / log watchers (Live HUD)
    ├── recording_edit/       # Trim, filter, re-scale recorded actions
    ├── json/                 # JSON action file read/write
    ├── project/              # Project scaffolding & templates
    ├── package_manager/      # Dynamic package loading
    ├── logging/              # Logging
    └── exception/            # Custom exception classes
The platform_wrapper.py module automatically detects the current operating system and imports the corresponding backend, so all wrapper functions work identically regardless of platform.
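The dispatch follows the usual `sys.platform` pattern; a minimal sketch (the backend package names match the tree above, but the real `platform_wrapper.py` does more):

```python
import sys

# Map sys.platform prefixes onto the backend packages in the tree above.
_BACKENDS = {
    "win32": "windows",
    "darwin": "osx",
    "linux": "linux_with_x11",
}

def backend_name(platform: str = sys.platform) -> str:
    """Pick the backend package for a given sys.platform value."""
    for prefix, package in _BACKENDS.items():
        if platform.startswith(prefix):
            return package
    raise RuntimeError(f"Unsupported platform: {platform}")

# The wrapper would then do something along the lines of:
#   backend = importlib.import_module(f"je_auto_control.{backend_name()}")
assert backend_name("linux") == "linux_with_x11"
```

Selecting the backend once at import time keeps every call site free of platform checks.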
Installation
Basic Installation
pip install je_auto_control
With GUI Support (PySide6)
pip install je_auto_control[gui]
Linux Prerequisites
On Linux, install the following system packages before installing:
sudo apt-get install cmake libssl-dev
Requirements
- Python >= 3.10
- pip >= 19.3
Dependencies
| Package | Purpose |
|---|---|
| je_open_cv | Image recognition (OpenCV template matching) |
| pillow | Screenshot capture |
| mss | Fast multi-monitor screenshot |
| pyobjc | macOS backend (auto-installed on macOS) |
| python-Xlib | Linux X11 backend (auto-installed on Linux) |
| PySide6 | GUI application (optional, install with [gui]) |
| qt-material | GUI theme (optional, install with [gui]) |
| uiautomation | Windows accessibility backend (optional, loaded on demand) |
| pytesseract + Tesseract | OCR engine (optional, loaded on demand) |
| anthropic | VLM locator – Anthropic backend (optional, loaded on demand) |
| openai | VLM locator – OpenAI backend (optional, loaded on demand) |
See Third_Party_License.md for a full list of third-party components and their licenses.
Quick Start
Mouse Control
import je_auto_control
# Get current mouse position
x, y = je_auto_control.get_mouse_position()
print(f"Mouse at: ({x}, {y})")
# Move mouse to coordinates
je_auto_control.set_mouse_position(500, 300)
# Left click at current position (use key name)
je_auto_control.click_mouse("mouse_left")
# Right click at specific coordinates
je_auto_control.click_mouse("mouse_right", x=800, y=400)
# Scroll down
je_auto_control.mouse_scroll(scroll_value=5)
Keyboard Control
import je_auto_control
# Press and release a single key
je_auto_control.type_keyboard("a")
# Type a whole string character by character
je_auto_control.write("Hello World")
# Hotkey combination (e.g., Ctrl+C)
je_auto_control.hotkey(["ctrl_l", "c"])
# Check if a key is currently pressed
is_pressed = je_auto_control.check_key_is_press("shift_l")
Image Recognition
import je_auto_control
# Find all occurrences of an image on screen
positions = je_auto_control.locate_all_image("button.png", detect_threshold=0.9)
# Returns: [[x1, y1, x2, y2], ...]
# Find a single image and get its center coordinates
cx, cy = je_auto_control.locate_image_center("icon.png", detect_threshold=0.85)
print(f"Found at: ({cx}, {cy})")
# Find an image and automatically click it
je_auto_control.locate_and_click("submit_button.png", mouse_keycode="mouse_left")
Accessibility Element Finder
Query the OS accessibility tree to locate controls by name, role, or app.
Works on Windows (UIA, via uiautomation) and macOS (AX).
import je_auto_control
# List all visible buttons in the Calculator app
elements = je_auto_control.list_accessibility_elements(app_name="Calculator")
# Find a specific element
ok = je_auto_control.find_accessibility_element(name="OK", role="Button")
if ok is not None:
print(ok.bounds, ok.center)
# Click it directly
je_auto_control.click_accessibility_element(name="OK", app_name="Calculator")
Raises AccessibilityNotAvailableError if no accessibility backend is
installed for the current platform.
AI Element Locator (VLM)
When template matching and accessibility both fail, describe the element in plain language and let a vision-language model find its coordinates.
import je_auto_control
# Uses Anthropic by default if ANTHROPIC_API_KEY is set, else OpenAI.
x, y = je_auto_control.locate_by_description("the green Submit button")
# Or click it in one shot
je_auto_control.click_by_description(
"the cookie-banner 'Accept all' button",
screen_region=[0, 800, 1920, 1080], # optional crop
)
Configuration (environment variables only – keys are never persisted or logged):
| Variable | Effect |
|---|---|
| ANTHROPIC_API_KEY | Enables the Anthropic backend |
| OPENAI_API_KEY | Enables the OpenAI backend |
| AUTOCONTROL_VLM_BACKEND | anthropic or openai to force a backend |
| AUTOCONTROL_VLM_MODEL | Override the default model (e.g. claude-opus-4-7, gpt-4o-mini) |
Raises VLMNotAvailableError if neither SDK is installed or no API key
is set.
OCR (Text on Screen)
import je_auto_control as ac
# Locate all matches of a piece of text
matches = ac.find_text_matches("Submit")
# Center of the first match, or None
cx, cy = ac.locate_text_center("Submit")
# Click text in one call
ac.click_text("Submit")
# Block until text appears (or timeout)
ac.wait_for_text("Loading complete", timeout=15.0)
If Tesseract is not on PATH, point at it explicitly:
ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
Dump every recognised text record in a region (or full screen), or search by regex when the text varies:
import je_auto_control as ac
# Every hit in a region as TextMatch records (text, bounding box, confidence)
for match in ac.read_text_in_region(region=[0, 0, 800, 600]):
print(match.text, match.center, match.confidence)
# Regex โ accepts a pattern string or a compiled re.Pattern
for match in ac.find_text_regex(r"Order#\d+"):
print(match.text, match.center)
GUI: OCR Reader tab.
LLM Action Planner
Translate plain-language descriptions into validated AC_* action lists
using an LLM (Anthropic Claude by default). Output is leniently parsed
(strips code fences, extracts the first JSON array from prose) and then
validated by the same schema the executor uses, so the result can be
piped straight into execute_action:
import je_auto_control as ac
from je_auto_control.utils.executor.action_executor import executor
actions = ac.plan_actions(
"click the Submit button, then type 'done' and save",
known_commands=executor.known_commands(),
)
executor.execute_action(actions)
# Or in a single call:
ac.run_from_description("open Notepad and type hello", executor=executor)
| Variable | Effect |
|---|---|
| ANTHROPIC_API_KEY | Enables the Anthropic backend |
| AUTOCONTROL_LLM_BACKEND | anthropic to force a backend |
| AUTOCONTROL_LLM_MODEL | Override the default model (e.g. claude-opus-4-7) |
GUI: LLM Planner tab – description box, QThread-backed Plan
button, action-list preview, and a Run plan button.
Runtime Variables & Control Flow
The executor resolves ${var} placeholders per command call rather
than pre-flattening, so nested body / then / else lists keep their
placeholders and re-bind on every iteration. Combined with new mutation
commands, scripts can drive themselves from data without Python glue:
[
["AC_set_var", {"name": "items", "value": ["alpha", "beta"]}],
["AC_set_var", {"name": "i", "value": 0}],
["AC_for_each", {
"items": "${items}", "as": "name",
"body": [
["AC_inc_var", {"name": "i"}],
["AC_if_var", {
"name": "i", "op": "ge", "value": 2,
"then": [["AC_break"]], "else": []
}]
]
}]
]
AC_if_var operators: eq, ne, lt, le, gt, ge, contains,
startswith, endswith. GUI: Variables tab – live view of
executor.variables with single-set, JSON seed, and clear-all controls.
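The per-call resolution can be sketched as follows (illustrative only; the executor's real substitution rules may differ in detail):

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def resolve(value, variables):
    """Resolve ${var} placeholders in a single value at call time.

    Lists and dicts resolve recursively, so nested body/then/else blocks
    keep their placeholders until the moment each iteration executes.
    """
    if isinstance(value, str):
        whole = _PLACEHOLDER.fullmatch(value)
        if whole:  # "${items}" alone yields the variable itself, any type
            return variables[whole.group(1)]
        return _PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), value)
    if isinstance(value, list):
        return [resolve(item, variables) for item in value]
    if isinstance(value, dict):
        return {key: resolve(val, variables) for key, val in value.items()}
    return value

variables = {"name": "alpha", "items": ["alpha", "beta"]}
assert resolve("${items}", variables) == ["alpha", "beta"]
assert resolve("hello ${name}", variables) == "hello alpha"
```

Resolving at call time rather than once up front is what lets `AC_for_each` re-bind the loop variable on every iteration.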
Remote Desktop
Stream this machine's screen and accept remote input, or view and control another machine. The wire format is a length-prefixed framing on raw TCP (no extra deps), starting with an HMAC-SHA256 challenge / response handshake; viewers that fail auth are dropped before they can see a frame. JPEG frames are produced at the configured FPS / quality and broadcast to authenticated viewers via a shared latest-frame slot, so a slow viewer drops frames instead of blocking the rest. Viewer input is JSON, validated against an allowlist, and applied through the existing wrappers.
# Be remoted – start a host and hand the token + port to whoever views you
from je_auto_control import RemoteDesktopHost
host = RemoteDesktopHost(token="hunter2", bind="127.0.0.1",
port=0, fps=10, quality=70)
host.start()
print("listening on", host.port, "viewers:", host.connected_clients)
# Control another machine – connect a viewer and send input
from je_auto_control import RemoteDesktopViewer
viewer = RemoteDesktopViewer(host="10.0.0.5", port=51234, token="hunter2",
on_frame=lambda jpeg: ...)
viewer.connect()
viewer.send_input({"action": "mouse_move", "x": 100, "y": 200})
viewer.send_input({"action": "type", "text": "hello"})
viewer.disconnect()
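The challenge/response step can be sketched with the standard library (conceptual only; the actual framing and message layout are internal to the protocol):

```python
import hashlib
import hmac
import os

def make_challenge() -> bytes:
    """Host side: a fresh random nonce for each incoming connection."""
    return os.urandom(32)

def challenge_response(token: str, challenge: bytes) -> bytes:
    """Viewer side: prove knowledge of the token without sending it."""
    return hmac.new(token.encode(), challenge, hashlib.sha256).digest()

def verify(token: str, challenge: bytes, response: bytes) -> bool:
    """Host side: constant-time comparison to avoid timing leaks."""
    expected = hmac.new(token.encode(), challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

challenge = make_challenge()
assert verify("hunter2", challenge, challenge_response("hunter2", challenge))
assert not verify("hunter2", challenge, challenge_response("wrong", challenge))
```

A fresh nonce per connection means a captured response cannot be replayed, and the token itself never crosses the wire.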
GUI: Remote Desktop tab with two sub-tabs.
- Host – token field with a Generate button, security warning about the bind address, start / stop controls, refreshing port + viewer-count status, and a 4 fps preview pane below the controls so the user being remoted sees what viewers see.
- Viewer – address / port / token form, Connect / Disconnect, and a custom frame-display widget that paints incoming JPEG frames scaled with KeepAspectRatio. Mouse / wheel / key events on the display are remapped from widget coordinates back to the remote screen's pixel space using the latest frame's dimensions, then forwarded as INPUT messages.
⚠️ Anyone with the host:port and token gets full mouse / keyboard control of the host machine. Default bind is 127.0.0.1; expose externally only via SSH tunnel or TLS front-end. The token is the only line of defence – treat it like a password.
Encrypted transports + alternate protocols. Pass an ssl_context
to either RemoteDesktopHost or RemoteDesktopViewer to wrap every
connection in TLS. For firewall-friendly access, use the in-tree
WebSocket variants (no extra deps) – same protocol, RFC 6455 framing,
and wss:// if you also pass ssl_context:
from je_auto_control import (
WebSocketDesktopHost, WebSocketDesktopViewer,
)
host = WebSocketDesktopHost(token="hunter2", ssl_context=server_ctx)
viewer = WebSocketDesktopViewer(
host="example.com", port=443, token="hunter2",
ssl_context=client_ctx, expected_host_id="123456789",
)
Persistent Host ID. Every host owns a stable 9-digit numeric ID
(persisted at ~/.je_auto_control/remote_host_id), announced in
AUTH_OK and verifiable via the viewer's expected_host_id:
print(host.host_id) # e.g. "123456789"
viewer = RemoteDesktopViewer(
host=..., port=..., token=...,
expected_host_id="123456789", # AuthenticationError on mismatch
)
Audio streaming (host → viewer). Optional sounddevice dep; opt
in with an AudioCaptureConfig on the host, attach an AudioPlayer
(or your own callback) on the viewer:
from je_auto_control.utils.remote_desktop import AudioCaptureConfig
host = RemoteDesktopHost(
token="tok",
audio_config=AudioCaptureConfig(enabled=True), # default mic
)
# Or pick a loopback / monitor device:
# audio_config=AudioCaptureConfig(enabled=True, device=12)
from je_auto_control.utils.remote_desktop import AudioPlayer
player = AudioPlayer(); player.start()
viewer = RemoteDesktopViewer(host=..., on_audio=player.play)
Clipboard sync (text + image, bidirectional). Explicit per-call – no auto-poll loops. Image clipboard works on Windows (CF_DIB via
no auto-poll loops. Image clipboard works on Windows (CF_DIB via
ctypes) and Linux (xclip -t image/png); macOS get is supported via
Pillow ImageGrab, set requires PyObjC.
viewer.send_clipboard_text("hello")
viewer.send_clipboard_image(open("logo.png", "rb").read())
host.broadcast_clipboard_text("greetings")
File transfer with progress. Bidirectional, chunked, arbitrary destination path, no size cap; the GUI viewer also accepts drag-drop:
viewer.send_file(
"local.bin", "/tmp/uploaded.bin",
on_progress=lambda tid, done, total: print(done, total),
)
host.send_file_to_viewers("local.bin", "/tmp/from_host.bin")
⚠️ Path is unrestricted and there is no aggregate size limit. Anyone with the token can write any file to any location and can fill the disk – keep "trusted token holders == trusted users" in mind, or wrap with your own FileReceiver subclass that vets destination paths.
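If you do wrap the receiver, the destination check itself is small; a sketch with pathlib (the `FileReceiver` hook it would plug into is not documented here, so `allowed_destination` is a hypothetical helper):

```python
from pathlib import Path

def allowed_destination(dest: str, root: str) -> bool:
    """Reject any destination that escapes the allowed root directory.

    resolve() collapses '..' segments before the containment check,
    so 'a/../../etc/passwd' cannot sneak past a naive prefix test.
    """
    root_path = Path(root).resolve()
    dest_path = (root_path / dest).resolve()
    return dest_path.is_relative_to(root_path)

# A receiver could call this before opening the destination for writing.
assert allowed_destination("uploads/a.bin", "sandbox")
assert not allowed_destination("../secret.txt", "sandbox")
```

Checking the resolved path (not the raw string) is the important part; prefix tests on unresolved strings are trivially bypassed.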
Clipboard
import je_auto_control as ac
ac.set_clipboard("hello")
text = ac.get_clipboard()
Backends: Windows (Win32 via ctypes), macOS (pbcopy/pbpaste),
Linux (xclip or xsel).
Screenshot
import je_auto_control
# Take a full-screen screenshot and save to file
je_auto_control.pil_screenshot("screenshot.png")
# Take a screenshot of a specific region [x1, y1, x2, y2]
je_auto_control.pil_screenshot("region.png", screen_region=[100, 100, 500, 400])
# Get screen resolution
width, height = je_auto_control.screen_size()
# Get pixel color at coordinates
color = je_auto_control.get_pixel(500, 300)
Action Recording & Playback
import je_auto_control
import time
# Start recording mouse and keyboard events
je_auto_control.record()
time.sleep(10) # Record for 10 seconds
# Stop recording and get the action list
actions = je_auto_control.stop_record()
# Replay the recorded actions
je_auto_control.execute_action(actions)
JSON Action Scripting
Create a JSON action file (actions.json):
[
["AC_set_mouse_position", {"x": 500, "y": 300}],
["AC_click_mouse", {"mouse_keycode": "mouse_left"}],
["AC_write", {"write_string": "Hello from AutoControl"}],
["AC_screenshot", {"file_path": "result.png"}],
["AC_hotkey", {"key_code_list": ["ctrl_l", "s"]}]
]
Execute it:
import je_auto_control
# Execute from file
je_auto_control.execute_action(je_auto_control.read_action_json("actions.json"))
# Or execute from a list directly
je_auto_control.execute_action([
["AC_set_mouse_position", {"x": 100, "y": 200}],
["AC_click_mouse", {"mouse_keycode": "mouse_left"}]
])
Available action commands:
| Category | Commands |
|---|---|
| Mouse | AC_click_mouse, AC_set_mouse_position, AC_get_mouse_position, AC_get_mouse_table, AC_press_mouse, AC_release_mouse, AC_mouse_scroll, AC_mouse_left, AC_mouse_right, AC_mouse_middle |
| Keyboard | AC_type_keyboard, AC_press_keyboard_key, AC_release_keyboard_key, AC_write, AC_hotkey, AC_check_key_is_press, AC_get_keyboard_keys_table |
| Image | AC_locate_all_image, AC_locate_image_center, AC_locate_and_click |
| Screen | AC_screen_size, AC_screenshot |
| Accessibility | AC_a11y_list, AC_a11y_find, AC_a11y_click |
| VLM (AI Locator) | AC_vlm_locate, AC_vlm_click |
| OCR | AC_locate_text, AC_click_text, AC_wait_text, AC_read_text_in_region, AC_find_text_regex |
| LLM planner | AC_llm_plan, AC_llm_run |
| Clipboard | AC_clipboard_get, AC_clipboard_set |
| Window | AC_list_windows, AC_focus_window, AC_wait_window, AC_close_window |
| Flow control | AC_loop, AC_break, AC_continue, AC_if_image_found, AC_if_pixel, AC_if_var, AC_while_image, AC_for_each, AC_wait_image, AC_wait_pixel, AC_sleep, AC_retry |
| Variables | AC_set_var, AC_get_var, AC_inc_var |
| Remote desktop | AC_start_remote_host, AC_stop_remote_host, AC_remote_host_status, AC_remote_connect, AC_remote_disconnect, AC_remote_viewer_status, AC_remote_send_input |
| Record | AC_record, AC_stop_record, AC_set_record_enable |
| Report | AC_generate_html, AC_generate_json, AC_generate_xml, AC_generate_html_report, AC_generate_json_report, AC_generate_xml_report |
| Run history | AC_history_list, AC_history_clear |
| Project | AC_create_project |
| Shell | AC_shell_command |
| Process | AC_execute_process |
| Executor | AC_execute_action, AC_execute_files, AC_add_package_to_executor, AC_add_package_to_callback_executor |
| MCP server | AC_start_mcp_server, AC_start_mcp_http_server |
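Actions are `[command_name]` or `[command_name, kwargs]` pairs; a pre-flight shape check before handing a list to `execute_action` can be sketched as (illustrative; the executor performs its own validation):

```python
def validate_actions(actions, known_commands=None):
    """Return a list of error strings; an empty list means the shape is valid."""
    if not isinstance(actions, list):
        return ["top level must be a list of [command, kwargs] actions"]
    errors = []
    for index, action in enumerate(actions):
        well_formed = (
            isinstance(action, list)
            and 1 <= len(action) <= 2
            and isinstance(action[0], str)
            and (len(action) == 1 or isinstance(action[1], dict))
        )
        if not well_formed:
            errors.append(f"action {index}: expected [name] or [name, kwargs-dict]")
            continue
        if known_commands is not None and action[0] not in known_commands:
            errors.append(f"action {index}: unknown command {action[0]!r}")
    return errors

script = [
    ["AC_set_mouse_position", {"x": 500, "y": 300}],
    ["AC_break"],  # zero-argument commands omit the kwargs dict
]
assert validate_actions(script) == []
assert validate_actions([["AC_write", "hi"]])  # kwargs must be a dict
```

Pass the executor's own command set (e.g. from `executor.known_commands()` as shown in the LLM Planner section) as `known_commands` to also catch typos in command names.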
MCP Server (Use AutoControl from Claude)
Expose AutoControl as a Model Context Protocol server so any MCP-compatible client (Claude Desktop, Claude Code, custom Anthropic / OpenAI tool-use loops) can drive the host machine. Stdlib-only – JSON-RPC 2.0 over stdio or HTTP+SSE.
Register with Claude Code:
claude mcp add autocontrol -- python -m je_auto_control.utils.mcp_server
Register with Claude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"autocontrol": {
"command": "python",
"args": ["-m", "je_auto_control.utils.mcp_server"]
}
}
}
Start programmatically:
import je_auto_control as ac
# Stdio (blocks until stdin closes)
ac.start_mcp_stdio_server()
# Or HTTP / SSE with bearer-token auth + optional TLS
ac.start_mcp_http_server(host="127.0.0.1", port=9940,
auth_token="hunter2")
Inspect the catalogue without starting the server:
je_auto_control_mcp --list-tools
je_auto_control_mcp --list-tools --read-only
je_auto_control_mcp --list-resources
je_auto_control_mcp --list-prompts
What ships:
| Surface | Coverage |
|---|---|
| Tools (~90) | mouse · keyboard · drag · screen / multi-monitor · screenshot-as-image · diff · OCR · image · windows (move/min/max/restore/...) · clipboard text+image · process / shell · recording · screen recording · scheduler / triggers / hotkeys · accessibility tree · VLM locator · executor · history |
| Aliases | click, type, screenshot, find_image, drag, shell, wait_image, ... – toggle with JE_AUTOCONTROL_MCP_ALIASES=0 |
| Resources | autocontrol://files/<name>, autocontrol://history, autocontrol://commands, autocontrol://screen/live (with resources/subscribe) |
| Prompts | automate_ui_task, record_and_generalize, compare_screenshots, find_widget, explain_action_file |
| Protocol | tools / resources / prompts / sampling / roots / logging / progress / cancellation / list_changed / elicitation |
| Transports | stdio, HTTP POST /mcp, SSE streaming when Accept: text/event-stream |
| Safety | tool annotations · JE_AUTOCONTROL_MCP_READONLY · JE_AUTOCONTROL_MCP_CONFIRM_DESTRUCTIVE · audit log · token-bucket rate limiter · auto-screenshot on error |
| Ops | bearer-token auth · TLS via ssl_context · PluginWatcher hot-reload · JE_AUTOCONTROL_FAKE_BACKEND=1 for CI |
See docs/source/Eng/doc/mcp_server/mcp_server_doc.rst for the full reference (or the Traditional Chinese version).
⚠️ The MCP server can move the mouse, send keystrokes, capture the screen, and execute arbitrary
AC_* actions. Only register it with MCP clients you trust. HTTP defaults to 127.0.0.1; binding to 0.0.0.0 requires an explicit reason and must be paired with auth_token plus ssl_context.
Scheduler (Interval & Cron)
import je_auto_control as ac
# Interval job – run every 30 seconds
job = ac.default_scheduler.add_job(
script_path="scripts/poll.json", interval_seconds=30, repeat=True,
)
# Cron job – 09:00 on weekdays (minute hour dom month dow)
cron_job = ac.default_scheduler.add_cron_job(
script_path="scripts/daily.json", cron_expression="0 9 * * 1-5",
)
ac.default_scheduler.start()
Both flavours coexist; job.is_cron tells them apart.
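The five-field semantics of an expression like "0 9 * * 1-5" can be sketched with a small matcher. This is an illustration of the format only, not je_auto_control's actual scheduler code:

```python
# Illustrative five-field cron matcher (minute hour dom month dow);
# sketches the semantics of "0 9 * * 1-5", not the library's internals.
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', a number, a range 'a-b', or a comma list."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches(expr: str, when: datetime) -> bool:
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, when.isoweekday() % 7))  # 0 = Sunday

# 2024-01-01 was a Monday, so 09:00 matches "0 9 * * 1-5"
print(cron_matches("0 9 * * 1-5", datetime(2024, 1, 1, 9, 0)))  # True
```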
Global Hotkey Daemon
Bind OS-level hotkeys to action JSON scripts (Windows backend today; macOS / Linux raise NotImplementedError on start(), with Strategy-pattern seams in place).
from je_auto_control import default_hotkey_daemon
default_hotkey_daemon.bind("ctrl+alt+1", "scripts/greet.json")
default_hotkey_daemon.start()
Event Triggers
Poll-based triggers that fire a script when a condition becomes true:
from je_auto_control import (
default_trigger_engine, ImageAppearsTrigger,
WindowAppearsTrigger, PixelColorTrigger, FilePathTrigger,
)
default_trigger_engine.add(ImageAppearsTrigger(
trigger_id="", script_path="scripts/click_ok.json",
image_path="templates/ok_button.png", threshold=0.85, repeat=True,
))
default_trigger_engine.start()
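The engine evaluates each trigger's condition on a polling loop and fires the associated script on a rising edge; one-shot triggers fire at most once, repeating ones fire on every true poll. A conceptual sketch (class and method names here are illustrative, not je_auto_control internals):

```python
# Conceptual sketch of a poll-based trigger loop; names are
# illustrative, not je_auto_control's internal classes.
import time
from typing import Callable, List

class PollTrigger:
    def __init__(self, condition: Callable[[], bool],
                 action: Callable[[], None], repeat: bool = False):
        self.condition = condition
        self.action = action
        self.repeat = repeat
        self.fired = False

    def poll(self) -> None:
        # Fire when the condition holds; one-shot triggers stay fired.
        if self.condition() and (self.repeat or not self.fired):
            self.fired = True
            self.action()

def run_engine(triggers: List[PollTrigger], interval: float = 0.5,
               ticks: int = 3) -> None:
    for _ in range(ticks):
        for trigger in triggers:
            trigger.poll()
        time.sleep(interval)

hits = []
trigger = PollTrigger(condition=lambda: True,
                      action=lambda: hits.append("fired"), repeat=False)
run_engine([trigger], interval=0.0, ticks=3)
print(hits)  # one-shot: fires only once across three polls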
Run History
Every run from the scheduler, trigger engine, hotkey daemon, REST API,
and manual GUI replay is recorded to ~/.je_auto_control/history.db.
Errors automatically attach a screenshot under
~/.je_auto_control/artifacts/run_{id}_{ms}.png for post-mortem.
from je_auto_control import default_history_store
for run in default_history_store.list_runs(limit=20):
print(run.id, run.source, run.status, run.artifact_path)
The GUI Run History tab exposes filter/refresh/clear and double-click-to-open on the artifact column.
Report Generation
import je_auto_control
# Enable test recording first
je_auto_control.test_record_instance.set_record_enable(True)
# ... perform automation actions ...
je_auto_control.set_mouse_position(100, 200)
je_auto_control.click_mouse("mouse_left")
# Generate reports
je_auto_control.generate_html_report("test_report") # -> test_report.html
je_auto_control.generate_json_report("test_report") # -> test_report.json
je_auto_control.generate_xml_report("test_report") # -> test_report.xml
# Or get report content as string
html_string = je_auto_control.generate_html()
json_string = je_auto_control.generate_json()
xml_string = je_auto_control.generate_xml()
Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red.
Remote Automation (Socket / REST)
Two servers are available – a raw TCP socket server and a stdlib HTTP/REST
server. Both default to 127.0.0.1; binding to 0.0.0.0 is an explicit,
documented opt-in.
import je_auto_control as ac
# TCP socket server (default: 127.0.0.1:9938)
ac.start_autocontrol_socket_server(host="127.0.0.1", port=9938)
# REST API server (default: 127.0.0.1:9939)
ac.start_rest_api_server(host="127.0.0.1", port=9939)
# Endpoints:
# GET /health liveness probe
# GET /jobs scheduler job list
# POST /execute body: {"actions": [...]}
Client example:
import socket
import json
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("localhost", 9938))
# Send an automation command
command = json.dumps([
["AC_set_mouse_position", {"x": 500, "y": 300}],
["AC_click_mouse", {"mouse_keycode": "mouse_left"}]
])
sock.sendall(command.encode("utf-8"))
# Receive response
response = sock.recv(8192).decode("utf-8")
print(response)
sock.close()
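The REST server can be driven with the standard library alone. A sketch of posting an action list to the /execute endpoint; the endpoint and body shape come from the listing above, while the response format is not specified here, so it is simply returned as text:

```python
# Stdlib REST client sketch for POST /execute; assumes the REST server
# shown above is already running on 127.0.0.1:9939.
import json
import urllib.request

def build_execute_body(actions) -> bytes:
    """Serialize the {"actions": [...]} body expected by /execute."""
    return json.dumps({"actions": actions}).encode("utf-8")

def post_actions(actions, host="127.0.0.1", port=9939) -> str:
    req = urllib.request.Request(
        f"http://{host}:{port}/execute",
        data=build_execute_body(actions),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    print(post_actions([
        ["AC_set_mouse_position", {"x": 500, "y": 300}],
        ["AC_click_mouse", {"mouse_keycode": "mouse_left"}],
    ]))
```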
Plugin Loader
Drop .py files defining top-level AC_* callables into a directory,
then register them as executor commands at runtime:
from je_auto_control import (
load_plugin_directory, register_plugin_commands,
)
commands = load_plugin_directory("./my_plugins")
register_plugin_commands(commands)
# Now usable from any JSON action script:
# [["AC_greet", {"name": "world"}]]
Warning: Plugin files execute arbitrary Python on load. Only load from directories you control.
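A plugin file is an ordinary module whose top-level AC_*-prefixed callables become executor commands. A hypothetical my_plugins/greet.py backing the AC_greet example above might look like:

```python
# my_plugins/greet.py - a hypothetical plugin module; any top-level
# callable whose name starts with AC_ can be registered as a command.
def AC_greet(name: str) -> str:
    """Executor command usable as [["AC_greet", {"name": "world"}]]."""
    message = f"Hello, {name}!"
    print(message)
    return message

def _helper():
    # No AC_ prefix, so not exposed as an executor command.
    pass
```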
Shell Command Execution
import je_auto_control
# Using the default shell manager
je_auto_control.default_shell_manager.exec_shell("echo Hello")
je_auto_control.default_shell_manager.pull_text() # Print captured output
# Or create a custom ShellManager
shell = je_auto_control.ShellManager(shell_encoding="utf-8")
shell.exec_shell("ls -la")
shell.pull_text()
shell.exit_program()
Screen Recording
import je_auto_control
import time
# Method 1: ScreenRecorder (manages multiple recordings)
recorder = je_auto_control.ScreenRecorder()
recorder.start_new_record(
recorder_name="my_recording",
path_and_filename="output.avi",
codec="XVID",
frame_per_sec=30,
resolution=(1920, 1080)
)
time.sleep(10)
recorder.stop_record("my_recording")
# Method 2: RecordingThread (simple single recording, outputs MP4)
recording = je_auto_control.RecordingThread(video_name="my_video", fps=20)
recording.start()
time.sleep(10)
recording.stop()
Callback Executor
Execute an automation function and trigger a callback upon completion:
import je_auto_control
def my_callback():
print("Action completed!")
# Execute set_mouse_position then call my_callback
je_auto_control.callback_executor.callback_function(
trigger_function_name="AC_set_mouse_position",
callback_function=my_callback,
x=500, y=300
)
# With callback parameters
def on_done(message):
print(f"Done: {message}")
je_auto_control.callback_executor.callback_function(
trigger_function_name="AC_click_mouse",
callback_function=on_done,
callback_function_param={"message": "Click finished"},
callback_param_method="kwargs",
mouse_keycode="mouse_left"
)
Package Manager
Dynamically load external Python packages into the executor at runtime:
import je_auto_control
# Add all functions/classes from a package to the executor
je_auto_control.package_manager.add_package_to_executor("os")
# Now you can use os functions in JSON action scripts:
# ["os_getcwd", {}]
# ["os_listdir", {"path": "."}]
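The command names follow a `<package>_<attribute>` scheme. That mapping can be illustrated with a small sketch; this shows the naming convention only, not the package manager's actual code:

```python
# Illustrates the <package>_<function> naming scheme used for
# dynamically loaded commands; not the package manager's actual code.
import importlib
import inspect

def collect_commands(package_name: str) -> dict:
    """Map every public callable in a module to '<package>_<name>'."""
    module = importlib.import_module(package_name)
    commands = {}
    for name, obj in inspect.getmembers(module, callable):
        if not name.startswith("_"):
            commands[f"{package_name}_{name}"] = obj
    return commands

commands = collect_commands("os")
print("os_getcwd" in commands)  # True
```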
Project Management
Scaffold a project directory structure with template files:
import je_auto_control
# Create a project structure
je_auto_control.create_project_dir(project_path="./my_project", parent_name="AutoControl")
# This creates:
# my_project/
# └── AutoControl/
#     ├── keyword/
#     │   ├── keyword1.json          # Template action file
#     │   ├── keyword2.json          # Template action file
#     │   └── bad_keyword_1.json     # Error handling template
#     └── executor/
#         ├── executor_one_file.py   # Execute single file example
#         ├── executor_folder.py     # Execute folder example
#         └── executor_bad_file.py   # Error handling example
Window Management
Send events directly to specific windows (Windows and Linux only):
import je_auto_control
# Send keyboard event to a window by title
je_auto_control.send_key_event_to_window("Notepad", keycode="a")
# Send mouse event to a native window handle (window_handle is a
# platform window handle, e.g. obtained via the window-listing API)
je_auto_control.send_mouse_event_to_window(window_handle, mouse_keycode="mouse_left", x=100, y=50)
GUI Application
Launch the built-in graphical interface (requires [gui] extra):
import je_auto_control
je_auto_control.start_autocontrol_gui()
Or from the command line:
python -m je_auto_control
Command-Line Interface
AutoControl can be used directly from the command line:
# Execute a single action file
python -m je_auto_control -e actions.json
# Execute all action files in a directory
python -m je_auto_control -d ./action_files/
# Execute a JSON string directly
python -m je_auto_control --execute_str '[["AC_screenshot", {"file_path": "test.png"}]]'
# Create a project template
python -m je_auto_control -c ./my_project
A richer subcommand CLI built on the headless APIs:
# Run a script, optionally with variables, and/or a dry-run
python -m je_auto_control.cli run script.json
python -m je_auto_control.cli run script.json --var name=alice --dry-run
# List scheduler jobs
python -m je_auto_control.cli list-jobs
# Start the socket or REST server
python -m je_auto_control.cli start-server --port 9938
python -m je_auto_control.cli start-rest --port 9939
--var name=value is parsed as JSON when possible (so count=10 becomes
an int), otherwise treated as a string.
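The JSON-or-string fallback described above can be sketched as:

```python
# Sketch of the --var name=value parsing rule described above:
# the value is tried as JSON first, then kept as a plain string.
import json

def parse_var(arg: str):
    name, _, raw = arg.partition("=")
    try:
        return name, json.loads(raw)
    except json.JSONDecodeError:
        return name, raw

print(parse_var("count=10"))    # ('count', 10)
print(parse_var("name=alice"))  # ('name', 'alice')
print(parse_var("flag=true"))   # ('flag', True)
```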
Platform Support
| Platform | Status | Backend | Notes |
|---|---|---|---|
| Windows 10 / 11 | Supported | Win32 API (ctypes) | Full feature support |
| macOS 10.15+ | Supported | pyobjc / Quartz | Action recording not available; send_key_event_to_window / send_mouse_event_to_window not supported |
| Linux (X11) | Supported | python-Xlib | Full feature support |
| Linux (Wayland) | Not supported | โ | May be added in a future release |
| Raspberry Pi 3B / 4B | Supported | python-Xlib | Runs on X11 |
Development
Setting Up
git clone https://github.com/Intergration-Automation-Testing/AutoControl.git
cd AutoControl
pip install -r dev_requirements.txt
Running Tests
# Unit tests
python -m pytest test/unit_test/
# Integration tests
python -m pytest test/integrated_test/
Project Links
- Homepage: https://github.com/Intergration-Automation-Testing/AutoControl
- Documentation: https://autocontrol.readthedocs.io/en/latest/
- PyPI: https://pypi.org/project/je_auto_control/
License
MIT License © JE-Chen. See Third_Party_License.md for the licenses of bundled and optional third-party dependencies.