OScribe
Vision-based desktop automation MCP server. Control any application via screenshot + AI vision.
Supported Platforms & Applications
Operating Systems
- macOS
- Windows
Native Applications
- Finder: File management
- Explorer: File operations
- System Settings: macOS & Windows
Web Browsers (CDP-enhanced)
- Chrome: 200-300+ elements
- Brave: Full CDP support
- Edge, Arc, Opera: Chromium-based
Note: Chrome 136+ requires automatic profile sync (~20-30s) due to CDP security changes.
Table of Contents
- Supported Platforms & Applications
- Why OScribe?
- Demo
- Features
- Quick Start
- MCP Integration
- How It Works
- Configuration
- Troubleshooting
- License
- Acknowledgements
Why OScribe?
"If you can see it, OScribe can click it."
OScribe is your fallback when traditional automation tools fail:
- Legacy apps without APIs
- Games and canvas apps without DOM
- Third-party software you can't modify
- Ad-hoc automation without infrastructure setup
Demo
Helltaker - Full Chapter 1 Automated
Claude plays through the entire first chapter of Helltaker using OScribe MCP tools - navigating menus, solving puzzles, and progressing through dialogue, all via screenshot + vision.
Features
- Vision-based - Locate UI elements by description using Claude vision
- UI Automation - Get element coordinates via Windows accessibility tree
- MCP Server - Integrates with Claude Desktop, Claude Code, Cursor, Windsurf
- Native Input - Uses robotjs for reliable mouse/keyboard control
- Multi-monitor - Supports multiple screens with DPI awareness
- Windows - Currently tested on Windows only
- Electron Support - Full UI element detection in Electron apps (via NVDA)
Quick Start
Guided Installation (Recommended)
Run our interactive installer that checks and installs all prerequisites for you:
# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/install.mjs | node
# Windows (PowerShell as Administrator)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/install.mjs -OutFile install.mjs; node install.mjs
The installer will:
- Check Node.js version (22+ required)
- Check/install Python
- Check/install build tools (VS Build Tools or Xcode CLI)
- Install OScribe
Manual Installation
If you prefer manual installation or already have prerequisites:
npm install -g oscribe
Then configure your MCP client (see MCP Integration below).
Installation
System Prerequisites
OScribe uses robotjs for native mouse/keyboard control, which requires compilation tools:
Windows
- Node.js 22+ - Download
- Python 3.x - Download (check "Add to PATH" during install)
- Visual Studio Build Tools - Install with C++ workload:
# Option 1: Via npm (recommended)
npm install -g windows-build-tools
# Option 2: Manual install
# Download from https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Select "Desktop development with C++" workload
macOS
- Node.js 22+ - Download or brew install node
- Xcode Command Line Tools: xcode-select --install
- Python 3.x - Usually pre-installed, verify with python3 --version
Verify Prerequisites
Before installing, run the diagnostic script to check all prerequisites:
# macOS/Linux - Run directly without installation
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs | node
# Windows (PowerShell)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs -OutFile doctor.mjs; node doctor.mjs
The doctor script checks:
- Node.js version (22+)
- Python installation
- Build tools (VS Build Tools on Windows, Xcode CLI on macOS)
It provides step-by-step fix instructions for any missing prerequisites.
After OScribe is installed, you can also run:
oscribe doctor
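The Node.js check in the list above can be pictured with a small sketch. This is not the doctor script's actual code; `meetsNodeRequirement` is a hypothetical helper that only illustrates comparing a version string against the documented 22+ requirement.

```typescript
// Hypothetical helper, not taken from OScribe's doctor script.
// Parses a version string such as "v22.3.0" and compares its major number.
function meetsNodeRequirement(version: string, minMajor: number = 22): boolean {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (match === null) {
    // Unrecognized format: treat it as not satisfying the requirement.
    return false;
  }
  return Number(match[1]) >= minMajor;
}

console.log(meetsNodeRequirement("v22.3.0"));
console.log(meetsNodeRequirement("v20.11.1"));
```

In a real Node.js script the input would typically be `process.version`.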
Additional Requirements
- Claude Desktop, Claude Code, or any MCP client (provides OAuth authentication)
From npm (Recommended)
# Global installation
npm install -g oscribe
# Verify installation
oscribe --version
From Source
git clone https://github.com/mikealkeal/oscribe.git
cd oscribe
npm install
npm run build
npm link # Makes 'oscribe' command available globally
Platform Support
| Platform | Status |
|---|---|
| Windows | Fully supported |
| macOS | Supported |
| Linux | Not tested yet |
Windows Details
- PowerShell (included)
- UI Automation via PowerShell + .NET
- NVDA support for Electron apps
macOS Details
- Native screencapture command
- UI Automation via AXUIElement API (ax-reader binary)
- Requires: Accessibility permissions (System Settings → Privacy & Security → Accessibility)
- Add Terminal or your IDE to allowed apps
- IMPORTANT for VSCode users: You must also authorize VSCode in "App Management" (Login Items & Extensions)
- Open System Settings → General → Login Items & Extensions
- Find "Visual Studio Code"
- Toggle ON the switch
- Enter your password or use Touch ID to confirm
- This is required for OScribe MCP to control your system from Claude Code
- Native apps (Chrome, Safari, Finder) work well
- Electron apps (VS Code, etc.) have limited element detection (same as Windows without NVDA)
Usage
CLI Commands
Vision-Based Clicking (The Core of OScribe!)
oscribe click "Submit button" # Click by description - the magic!
oscribe click "File menu" # Works on any visible element
oscribe click "Export as PNG" --screen 1 # Target specific monitor
oscribe click "Close" --dry-run # Preview without clicking
Input & Automation
oscribe type "hello world" # Type text
oscribe hotkey "ctrl+c" # Press keyboard shortcut
oscribe hotkey "ctrl+shift+esc" # Multiple modifiers
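A hotkey string such as "ctrl+shift+esc" is a list of modifiers followed by one key. The sketch below shows one way to split it. The README does not show how OScribe parses these strings, so `parseHotkey` and the accepted modifier names are assumptions made for illustration.

```typescript
// Hypothetical parser; OScribe's real parsing logic is not shown in this README.
interface ParsedHotkey {
  modifiers: string[];
  key: string;
}

// Assumed modifier names, chosen for the example only.
const KNOWN_MODIFIERS = new Set(["ctrl", "shift", "alt", "cmd", "win"]);

function parseHotkey(combo: string): ParsedHotkey {
  const parts = combo
    .split("+")
    .map((part) => part.trim().toLowerCase())
    .filter((part) => part.length > 0);

  // The last segment is the key; everything before it is a modifier.
  const key = parts.pop();
  if (key === undefined) {
    throw new Error("Empty hotkey");
  }
  for (const modifier of parts) {
    if (!KNOWN_MODIFIERS.has(modifier)) {
      throw new Error(`Unknown modifier: ${modifier}`);
    }
  }
  return { modifiers: parts, key };
}

console.log(parseHotkey("ctrl+shift+esc"));
```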
Screenshots
oscribe screenshot # Capture primary screen
oscribe screenshot -o capture.png # Save to file
oscribe screenshot --screen 1 # Capture second monitor
oscribe screenshot --list # List available screens
oscribe screenshot --describe # Describe screen content with AI
Window Management
oscribe windows # List open windows
oscribe focus "Chrome" # Focus window by name
oscribe focus "Calculator" # Works with partial matches
MCP Server
oscribe serve # Start MCP server (stdio transport)
Global Options
--verbose, -v # Detailed output
--dry-run # Simulate without executing
--quiet, -q # Minimal output
--screen N # Target specific screen (default: 0)
Examples
# Take screenshot and save
oscribe screenshot -o desktop.png
# Type with delay between keystrokes
oscribe type "slow typing" --delay 100
# Use second monitor
oscribe screenshot --screen 1 --describe
# Dry run to see what would happen
oscribe type "test" --dry-run
MCP Integration
OScribe exposes tools via Model Context Protocol for AI agents. Works with Claude Desktop, Claude Code, Cursor, Windsurf, and any MCP-compatible client.
Quick Setup
Claude Desktop
Edit your config file:
| OS | Config Path |
|---|---|
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
Add OScribe to mcpServers:
{
"mcpServers": {
"oscribe": {
"command": "npx",
"args": ["-y", "oscribe", "serve"]
}
}
}
Or if installed globally (npm install -g oscribe):
{
"mcpServers": {
"oscribe": {
"command": "oscribe",
"args": ["serve"]
}
}
}
Then restart Claude Desktop. An MCP tools icon in the interface indicates that the tools are available.
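If the config file already lists other servers, the OScribe entry has to be merged in rather than pasted over them. The following sketch shows that merge on an in-memory object. `withOscribe` is a hypothetical helper name; the two entries it produces are the ones shown above.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a copy of the config with an "oscribe" entry added.
function withOscribe(config: McpConfig, installedGlobally: boolean): McpConfig {
  const entry: McpServerEntry = installedGlobally
    ? { command: "oscribe", args: ["serve"] }
    : { command: "npx", args: ["-y", "oscribe", "serve"] };
  // Spread copies keep any servers and settings that were already present.
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), oscribe: entry },
  };
}

const sampleConfig: McpConfig = {
  mcpServers: { other: { command: "node", args: ["other.js"] } },
};
console.log(JSON.stringify(withOscribe(sampleConfig, false), null, 2));
```

Reading and writing the actual file is left out on purpose; the path depends on the OS, as listed in the table above.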
Claude Code / Cursor / Windsurf
Add a .mcp.json file in your project root:
{
"mcpServers": {
"oscribe": {
"command": "npx",
"args": ["-y", "oscribe", "serve"]
}
}
}
Or if installed globally:
{
"mcpServers": {
"oscribe": {
"command": "oscribe",
"args": ["serve"]
}
}
}
Available MCP Tools
| Tool | Description | Parameters |
|---|---|---|
| os_screenshot | Capture screenshot + cursor position | screen? (default: 0) |
| os_inspect | Get UI elements via Windows UI Automation | window? |
| os_inspect_at | Get element info at coordinates | x, y |
| os_move | Move mouse cursor | x, y |
| os_click | Click at current cursor position | window?, button? |
| os_click_at | Move + click in one action | x, y, window?, button? |
| os_type | Type text | text |
| os_hotkey | Press keyboard shortcut | keys (e.g., "ctrl+c") |
| os_scroll | Scroll in direction | direction, amount? |
| os_windows | List open windows + screens | - |
| os_focus | Focus window by name | window |
| os_wait | Wait for duration (UI loading) | ms (max 30000) |
| os_nvda_status | Check NVDA screen reader status (Electron support) | - |
| os_nvda_install | Download NVDA portable for Electron apps | - |
| os_nvda_start | Start NVDA in silent mode | - |
| os_nvda_stop | Stop NVDA screen reader | - |
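MCP clients invoke these tools with a JSON-RPC `tools/call` request whose `params` carry the tool name and its arguments. The sketch below builds such requests for two tools from the table. The helper names `toolCall` and `waitCall` are hypothetical, and clamping `ms` on the client side is an assumption; the README only states the 30000 ms maximum.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and arguments in an MCP tools/call request.
function toolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Assumed client-side guard: keep os_wait within the documented 30000 ms maximum.
function waitCall(id: number, ms: number): ToolCallRequest {
  const clamped = Math.min(Math.max(0, Math.round(ms)), 30000);
  return toolCall(id, "os_wait", { ms: clamped });
}

console.log(JSON.stringify(toolCall(1, "os_click_at", { x: 640, y: 360 })));
console.log(JSON.stringify(waitCall(2, 45000)));
```

In practice an MCP client such as Claude Desktop builds these messages itself; the sketch is only meant to make the Parameters column concrete.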
MCP Usage Example
Once configured, Claude can automate your desktop:
"Take a screenshot and describe what you see"
"Inspect the UI elements and click the Submit button"
"List all windows and focus on Chrome"
"Type 'hello world' and press Ctrl+Enter"
Workflow: Claude uses os_screenshot to see the screen, os_inspect to get element coordinates, then os_move + os_click for precise interaction.
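The workflow above can be sketched as a function that turns inspected elements into a sequence of tool calls. The `UiElement` shape (a name plus a bounding box) is an assumption; the README does not document the exact format returned by os_inspect. `planClick` is a hypothetical name.

```typescript
// Assumed element shape: a label plus a bounding box in screen coordinates.
interface UiElement {
  name: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ToolStep {
  tool: string;
  args: Record<string, unknown>;
}

// Hypothetical planner: find an element by label, then move to its center and click.
function planClick(elements: UiElement[], label: string): ToolStep[] {
  const wanted = label.toLowerCase();
  const target = elements.find((el) => el.name.toLowerCase().includes(wanted));
  if (target === undefined) {
    // Nothing matched: look at the screen again before deciding what to do.
    return [{ tool: "os_screenshot", args: {} }];
  }
  const x = Math.round(target.x + target.width / 2);
  const y = Math.round(target.y + target.height / 2);
  return [
    { tool: "os_move", args: { x, y } },
    { tool: "os_click", args: {} },
  ];
}

const inspected: UiElement[] = [{ name: "Submit", x: 100, y: 200, width: 80, height: 30 }];
console.log(JSON.stringify(planClick(inspected, "submit")));
```

Clicking the center of the bounding box is a design choice of this sketch; it tends to be more forgiving than clicking a corner when coordinates are slightly off.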
Configuration
Config directory: ~/.oscribe/
Files
- config.json - Application settings
config.json
{
"defaultScreen": 0,
"dryRun": false,
"logLevel": "info",
"cursorSize": 128
}
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| defaultScreen | number | 0 | Default monitor to capture |
| dryRun | boolean | false | Simulate actions without executing |
| logLevel | string | "info" | Log level: debug, info, warn, error |
| cursorSize | number | 128 | Cursor size in screenshots (32-256) |
| nvda.autoDownload | boolean | false | Auto-download NVDA when needed |
| nvda.autoStart | boolean | true | Auto-start NVDA for Electron apps |
| nvda.customPath | string | - | Custom NVDA installation path |
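The defaults and ranges in the table can be expressed as a small validation step. The project itself manages config with Zod; the sketch below uses plain TypeScript so it has no dependencies, covers only the four top-level options, and uses a hypothetical function name, `resolveConfig`. The rule that defaultScreen must be a non-negative integer is an assumption.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface OscribeConfig {
  defaultScreen: number;
  dryRun: boolean;
  logLevel: LogLevel;
  cursorSize: number;
}

// Defaults as documented in the table above.
const DEFAULTS: OscribeConfig = {
  defaultScreen: 0,
  dryRun: false,
  logLevel: "info",
  cursorSize: 128,
};

function isLogLevel(value: unknown): value is LogLevel {
  return value === "debug" || value === "info" || value === "warn" || value === "error";
}

// Hypothetical resolver: applies defaults, then checks each provided option.
function resolveConfig(raw: Record<string, unknown>): OscribeConfig {
  const config: OscribeConfig = { ...DEFAULTS };

  const screen = raw["defaultScreen"];
  if (screen !== undefined) {
    // Assumption: screen indexes are non-negative integers.
    if (typeof screen !== "number" || !Number.isInteger(screen) || screen < 0) {
      throw new Error("defaultScreen must be a non-negative integer");
    }
    config.defaultScreen = screen;
  }

  const dryRun = raw["dryRun"];
  if (dryRun !== undefined) {
    if (typeof dryRun !== "boolean") throw new Error("dryRun must be a boolean");
    config.dryRun = dryRun;
  }

  const level = raw["logLevel"];
  if (level !== undefined) {
    if (!isLogLevel(level)) throw new Error("logLevel must be debug, info, warn, or error");
    config.logLevel = level;
  }

  const cursor = raw["cursorSize"];
  if (cursor !== undefined) {
    // Documented range: 32-256. The negated form also rejects NaN.
    if (typeof cursor !== "number" || !(cursor >= 32 && cursor <= 256)) {
      throw new Error("cursorSize must be between 32 and 256");
    }
    config.cursorSize = cursor;
  }

  return config;
}

console.log(resolveConfig({ logLevel: "debug", cursorSize: 64 }));
```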
How It Works
OScribe uses a multi-layer approach for desktop automation (Windows):
- Screenshot Layer - Captures screen using PowerShell + .NET System.Drawing
- UI Automation Layer - Gets element coordinates via Windows accessibility tree:
  - Uses Windows UI Automation API via PowerShell
  - Returns interactive elements with screen coordinates
  - Works like a DOM for desktop apps
- Input Layer - Uses robotjs for:
  - Mouse movement and clicks
  - Keyboard input and hotkeys
  - Adapts to Windows mouse button swap settings
Best strategy: Use os_screenshot which returns UI elements with coordinates, then os_move + os_click for precise interaction.
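The Input Layer's handling of swapped mouse buttons can be illustrated as a mapping from the button the caller means to the button that is actually sent. This is a sketch under assumptions: the README does not show how OScribe detects the swap setting or applies it, so `buttonsSwapped` is simply passed in and `physicalButton` is a hypothetical name.

```typescript
type MouseButton = "left" | "right" | "middle";

// Hypothetical mapping: when Windows has primary/secondary buttons swapped,
// a logical "left" (primary) click must be sent as the other physical button.
function physicalButton(logical: MouseButton, buttonsSwapped: boolean): MouseButton {
  if (!buttonsSwapped || logical === "middle") {
    return logical;
  }
  return logical === "left" ? "right" : "left";
}

console.log(physicalButton("left", true));
console.log(physicalButton("left", false));
```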
Development
Setup
git clone https://github.com/mikealkeal/oscribe.git
cd oscribe
npm install
Scripts
npm run build # Build TypeScript
npm run dev # Development mode (watch)
npm run typecheck # Type check only
npm run lint # Run ESLint
npm run lint:fix # Fix linting issues
npm run format # Format with Prettier
npm run clean # Remove dist folder
Project Structure
oscribe/
├── bin/
│   └── oscribe.ts              # CLI entry point
├── src/
│   ├── core/
│   │   ├── screenshot.ts       # Multi-platform screen capture
│   │   ├── input.ts            # Mouse/keyboard control (robotjs)
│   │   ├── windows.ts          # Window management
│   │   └── uiautomation.ts     # Windows UI Automation (accessibility)
│   ├── cli/
│   │   ├── commands/           # CLI command implementations
│   │   └── index.ts            # Command registration
│   ├── mcp/
│   │   └── server.ts           # MCP server (12 tools)
│   ├── config/
│   │   └── index.ts            # Config management with Zod
│   └── index.ts                # Main exports
├── package.json
├── tsconfig.json
├── .env.example
└── LICENSE
Tech Stack
- Runtime: Node.js 22+ (ESM)
- Language: TypeScript 5.7+ (strict mode)
- Validation: Zod
- CLI: Commander + Chalk + Ora
- Vision: Anthropic SDK (Claude Sonnet 4)
- Input: robotjs (native automation)
- Screenshot: screenshot-desktop + platform-specific tools
- MCP: @modelcontextprotocol/sdk
Troubleshooting
Installation Issues
npm install fails with node-gyp errors:
First, run the diagnostic script (no installation required):
# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs | node
# Windows (PowerShell)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs -OutFile doctor.mjs; node doctor.mjs
This is usually due to missing build tools. robotjs requires native compilation.
# Error examples:
# - "gyp ERR! find Python"
# - "gyp ERR! find VS"
# - "node-pre-gyp ERR! build error"
Windows fix:
# 1. Install Python (if missing)
# Download from https://www.python.org/downloads/
# IMPORTANT: Check "Add Python to PATH" during installation
# 2. Install Visual Studio Build Tools
npm install -g windows-build-tools
# Or manually: download from https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Select "Desktop development with C++" workload
# 3. Retry installation
npm install -g oscribe
macOS fix:
# 1. Install Xcode Command Line Tools
xcode-select --install
# 2. Retry installation
npm install -g oscribe
Still failing? Try clearing npm cache:
npm cache clean --force
npm install -g oscribe
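The error examples and fixes in this section can be summarized as a lookup from log text to a suggested next step. The sketch below is only an illustration of that mapping, not part of OScribe or its doctor script; `buildToolHint` is a hypothetical name, and the hint for the generic node-pre-gyp error is a simplification.

```typescript
// Hypothetical helper: maps an npm install log to the fix described above.
function buildToolHint(log: string): string {
  if (log.includes("gyp ERR! find Python")) {
    return "Install Python 3.x and make sure it is on PATH";
  }
  if (log.includes("gyp ERR! find VS")) {
    return 'Install Visual Studio Build Tools with the "Desktop development with C++" workload';
  }
  if (log.includes("node-pre-gyp ERR! build error")) {
    return "Install the platform build tools, then retry: npm install -g oscribe";
  }
  // No known marker found: fall back to the last-resort step from this section.
  return "Run npm cache clean --force, then retry: npm install -g oscribe";
}

console.log(buildToolHint("gyp ERR! find Python - Python is not set"));
```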
MCP Server Issues
Server not starting:
- Check Node.js version: node --version (requires 22+)
- Rebuild if needed: npm run build
- Check path in your MCP config file
Tools not appearing in Claude Desktop:
- Restart Claude Desktop after config changes
- Check claude_desktop_config.json syntax (valid JSON)
- Look for the MCP tools icon in the Claude Desktop interface
Windows Issues
Clicks not working:
- OScribe auto-detects swapped mouse buttons
- No manual configuration needed
UI elements not detected:
- Some apps don't expose UI Automation elements
- Use os_screenshot to see what's visible
- Coordinates are returned in the screenshot response
Electron apps showing few UI elements:
Electron/Chromium apps require NVDA screen reader to expose their full accessibility tree:
# Install NVDA portable (one-time)
oscribe nvda install
# Start NVDA silently (no audio)
oscribe nvda start
Or via MCP tools: os_nvda_install → os_nvda_start
NVDA runs in silent mode (no speech, no sounds). The agent will prompt to install NVDA when needed.
Manual NVDA installation:
If you prefer to install NVDA yourself, download from nvaccess.org and set the path in config:
{
"nvda": {
"customPath": "C:/Program Files/NVDA"
}
}
License
BSL 1.1 (Business Source License 1.1)
- Free for personal use
- Free for open-source projects
- Commercial use requires a paid license (until 2029)
- Converts to MIT on 2029-01-30 (then free for everyone)
See LICENSE for full terms.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Guidelines
- Follow the existing code style (ESLint + Prettier configured)
- Add tests for new features
- Update documentation as needed
- Ensure npm run build succeeds
- Check types with npm run typecheck
Areas for Contribution
- Additional platform support (BSD, other Unix variants)
- More sophisticated element location strategies
- Performance optimizations
- Additional MCP tools
- Better error messages
- Documentation improvements
Support
- Bug reports: GitHub Issues
- Questions: GitHub Discussions
- Documentation: This README + inline code comments
Roadmap
- npm package distribution
- Web interface for remote control
- Recording and playback of automation sequences
- Multi-provider vision support (GPT-4V, Gemini)
- Plugin system for custom tools
- Docker container distribution
Acknowledgements
OScribe is built on top of these great open-source projects:
- robotjs - Native mouse/keyboard control
- screenshot-desktop - Cross-platform screen capture
- @anthropic-ai/sdk - Claude API client
- @modelcontextprotocol/sdk - MCP server framework
- ffmpeg - GIF generation (optional, external)
Maintained by Mickaël Bellun
