io.github.HurleySk/can-see
Let AI agents see and interact with terminal/CLI apps via PNG screenshots
can-see
MCP server that lets AI agents see and interact with terminal/CLI applications through virtual terminals and PNG screenshots.
Built for Claude Code and any MCP-compatible agent.
Why?
Some things are easier to show than describe. When debugging a TUI app, an interactive CLI wizard, or anything with visual terminal output, can-see lets the agent see exactly what you see: colors, layout, cursor position, and all.
How it works
- Launch a CLI app in a virtual terminal (node-pty + @xterm/headless)
- Screenshot the terminal as a PNG image (rendered via node-canvas)
- Send keys/text to interact with the app
- Screenshot again to see the result
- Close the session when done
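The steps above can be sketched as a small driver function. This is only an illustration under stated assumptions: `ToolCaller`, `inspectApp`, and `RecordingClient` are hypothetical names, and the argument names `command`, `args`, and `keys` are guesses at the tool inputs (only `sessionId` is named in this README). A real MCP client library would take the place of `ToolCaller`.

```typescript
// Hypothetical minimal client surface: invoke a tool by name with arguments.
type ToolArgs = Record<string, unknown>;
interface ToolCaller {
  callTool(name: string, args: ToolArgs): Promise<unknown>;
}

// Runs launch -> screenshot -> send_keys -> screenshot -> close.
// Argument names other than sessionId are assumptions for illustration.
async function inspectApp(
  client: ToolCaller,
  command: string,
  args: string[],
  keys: string[],
): Promise<{ before: unknown; after: unknown }> {
  const launched = (await client.callTool("launch", { command, args })) as {
    sessionId: string;
  };
  const sessionId = launched.sessionId;
  try {
    const before = await client.callTool("screenshot", { sessionId });
    await client.callTool("send_keys", { sessionId, keys });
    const after = await client.callTool("screenshot", { sessionId });
    return { before, after };
  } finally {
    // Close the session even if one of the steps above throws.
    await client.callTool("close", { sessionId });
  }
}

// Stand-in client that records tool names so the sketch runs without a server.
class RecordingClient implements ToolCaller {
  calls: string[] = [];
  async callTool(name: string, _args: ToolArgs): Promise<unknown> {
    this.calls.push(name);
    return name === "launch" ? { sessionId: "abc-123" } : { ok: true };
  }
}
```

Putting `close` in a `finally` block is one way to make sure a failed step does not leave a session running.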
Installation
npm install -g can-see
Prerequisites
can-see depends on node-canvas (Cairo) and node-pty, which require native compilation. Most systems will need:
- Windows: Visual Studio Build Tools with the C++ workload (npm install --global windows-build-tools, or install from Visual Studio Installer)
- macOS: Xcode Command Line Tools (xcode-select --install)
- Linux: sudo apt install build-essential libcairo2-dev libjpeg-dev libpango1.0-dev libgif-dev librsvg2-dev
Configuration
Claude Code
Add to your project's .mcp.json:
{
  "mcpServers": {
    "can-see": {
      "command": "npx",
      "args": ["-y", "can-see"]
    }
  }
}
Or if installed globally:
{
  "mcpServers": {
    "can-see": {
      "command": "can-see"
    }
  }
}
Other MCP clients
can-see uses stdio transport. Point your MCP client at the can-see binary or npx -y can-see.
Tools
| Tool | Description |
|---|---|
| launch | Start a CLI app in a virtual terminal. Returns a sessionId. Accepts optional env to set environment variables. |
| screenshot | Capture the terminal as a PNG image. |
| screenshot_region | Capture a specific rectangular area of the terminal. |
| screenshot_text_region | Find text in the viewport and capture the surrounding area as a PNG. |
| capture_baseline | Snapshot terminal state for later diff comparison. |
| diff_screenshot | Compare current state against baseline with highlighted changes. |
| get_cell_info | Query character, colors, and attributes at specific cell(s). Supports compact mode for reduced output. |
| read_text | Read the terminal buffer as plain text. |
| read_scrollback | Read text that scrolled above the visible viewport. |
| wait_for_text | Wait until specific text appears in the terminal buffer. |
| wait_for_idle | Wait until terminal output has been stable for a given duration. Supports stableMs for content-comparison mode (for apps with timers/spinners), excludeRows to ignore specific rows, and excludePattern (regex) for dynamic row exclusion. |
| wait_for_color | Wait until a specific color appears at a position. |
| wait_for_exit | Wait until the process exits and return its exit code and signal. |
| start_recording | Begin capturing frames for an animated GIF. |
| stop_recording | Stop recording and return the animated GIF with metadata (frameCount, durationMs). Auto-trims frames or saves to file if GIF exceeds inline size limit. |
| send_keys | Send keystrokes (e.g., Enter, Ctrl+C, ['Down', 'Down', 'Enter']). |
| send_text | Type a string of text into the app. |
| get_process_status | Get process status: distinguish "app is idle" from "app has exited". Returns PID, running state, exit code. |
| list_sessions | List all active terminal sessions. |
| close | Kill the app and clean up. Always close when done. |
| close_all | Kill all active sessions at once. Useful for cleanup between test runs. |
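To picture what the content-comparison mode of wait_for_idle might be doing, here is a rough sketch that compares two text snapshots while skipping rows expected to change. It is not the server's implementation: `isStable` and `StabilityOptions` are hypothetical names, and treating excludeRows as zero-based indices is an assumption.

```typescript
// Option names mirror the table; the exact semantics shown here are assumed.
interface StabilityOptions {
  excludeRows?: number[]; // assumed zero-based row indices to ignore
  excludePattern?: string; // regex source; rows matching it are ignored
}

// Returns true when two snapshots agree on every row that is not excluded.
function isStable(
  prev: string[],
  next: string[],
  opts: StabilityOptions = {},
): boolean {
  if (prev.length !== next.length) return false;
  const skip = new Set(opts.excludeRows ?? []);
  const pattern = opts.excludePattern ? new RegExp(opts.excludePattern) : null;
  for (let row = 0; row < prev.length; row++) {
    if (skip.has(row)) continue;
    const before = prev[row] ?? "";
    const after = next[row] ?? "";
    // Treat a row as dynamic if either version of it matches the pattern.
    if (pattern && (pattern.test(before) || pattern.test(after))) continue;
    if (before !== after) return false;
  }
  return true;
}
```

With a ticking timer on the second row, `isStable(a, b)` would report a change, while passing `{ excludeRows: [1] }` or `{ excludePattern: "^Elapsed" }` would let the comparison ignore it.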
Supported keys
Enter, Tab, Escape, Backspace, Space, Up, Down, Left, Right, Home, End, Delete, PageUp, PageDown, Ctrl+A through Ctrl+Z.
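For intuition about what send_keys has to deliver to the app, the sketch below maps the key names listed above to the byte sequences an xterm-style terminal conventionally sends. The table and the `encodeKey` helper are illustrative assumptions; can-see's internal mapping may differ (for example, for Home, End, or Backspace).

```typescript
// Conventional xterm-style sequences; can-see's own table may differ.
const NAMED_KEYS = new Map<string, string>([
  ["Enter", "\r"],
  ["Tab", "\t"],
  ["Escape", "\x1b"],
  ["Backspace", "\x7f"],
  ["Space", " "],
  ["Up", "\x1b[A"],
  ["Down", "\x1b[B"],
  ["Right", "\x1b[C"],
  ["Left", "\x1b[D"],
  ["Home", "\x1b[H"],
  ["End", "\x1b[F"],
  ["Delete", "\x1b[3~"],
  ["PageUp", "\x1b[5~"],
  ["PageDown", "\x1b[6~"],
]);

// Translates one key name into the characters a terminal would send.
function encodeKey(key: string): string {
  const letter = /^Ctrl\+([A-Z])$/.exec(key)?.[1];
  if (letter !== undefined) {
    // Ctrl+A through Ctrl+Z correspond to control codes 0x01 through 0x1A.
    return String.fromCharCode(letter.charCodeAt(0) - 64);
  }
  const sequence = NAMED_KEYS.get(key);
  if (sequence === undefined) {
    throw new Error(`Unsupported key: ${key}`);
  }
  return sequence;
}
```

Under these conventions, `encodeKey("Ctrl+C")` yields the single control character 0x03, which is what a terminal sends as an interrupt.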
Environment variables
| Variable | Default | Description |
|---|---|---|
| DEFAULT_COLS | 120 | Terminal width in columns |
| DEFAULT_ROWS | 30 | Terminal height in rows |
| IDLE_TIMEOUT_MS | 300000 | Auto-close idle sessions after this many ms (5 min) |
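As a rough sketch of how such variables are typically resolved, the helper below reads a setting from an environment map and falls back to the default from the table. `readSetting` is a hypothetical name, and falling back on missing or non-numeric values is an assumption, not documented behavior of can-see.

```typescript
// Defaults copied from the table above.
const DEFAULTS = {
  DEFAULT_COLS: 120,
  DEFAULT_ROWS: 30,
  IDLE_TIMEOUT_MS: 300000,
} as const;

type SettingName = keyof typeof DEFAULTS;

// Reads a positive integer setting, falling back to the documented default
// when the variable is unset or not a positive number (assumed behavior).
function readSetting(
  env: Record<string, string | undefined>,
  name: SettingName,
): number {
  const raw = env[name];
  const parsed = raw === undefined ? NaN : parseInt(raw, 10);
  return parsed > 0 ? parsed : DEFAULTS[name];
}
```

In Node, the first argument would usually be `process.env`, so `readSetting(process.env, "DEFAULT_COLS")` gives 120 unless the variable is set to another positive integer.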
Example usage
From an MCP-connected agent:
Agent: I'll launch your app to see what's happening.
→ launch("node", ["app.js"]) → sessionId: "abc-123"
Agent: Let me wait for the app to start.
→ wait_for_text("abc-123", "Ready") → Found "Ready" after 1200ms
Agent: Let me read the current output.
→ read_text("abc-123") → "Welcome to MyApp\nReady\n> "
Agent: I can see the prompt. Let me select option 2.
→ send_keys("abc-123", ["Down", "Enter"])
Agent: Waiting for the screen to settle.
→ wait_for_idle("abc-123") → Terminal idle for 520ms
Agent: Let me check the result.
→ screenshot("abc-123") → [PNG image showing result]
Agent: Done, closing the session.
→ close("abc-123")
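On the wire, each step in a transcript like this is an MCP tool invocation, which is a JSON-RPC 2.0 request with the method tools/call. The sketch below builds the first two requests. The `toolCall` helper is hypothetical, and the argument names `command`, `args`, and `text` are assumptions; check the tool schemas reported by the server for the real ones.

```typescript
// Shape of an MCP tool invocation: JSON-RPC 2.0 with method "tools/call".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Builds a request for the named tool; each request gets a fresh id.
function toolCall(
  toolName: string,
  toolArgs: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
}

// First two steps of the transcript; argument names are assumed.
const launchRequest = toolCall("launch", { command: "node", args: ["app.js"] });
const waitRequest = toolCall("wait_for_text", {
  sessionId: "abc-123",
  text: "Ready",
});
```

An MCP client library normally builds and sends these messages for you, so this is mainly useful for reading protocol logs when debugging.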
Changelog
0.5.0
New tools:
- wait_for_exit: wait for process exit, get exit code and signal
- close_all: kill all active sessions at once
- get_process_status: distinguish "app is idle" from "app has exited"
- screenshot_text_region: find text in viewport, capture surrounding area as PNG
Enhancements:
- launch accepts env parameter for custom environment variables
- wait_for_idle supports excludePattern (regex) for dynamic row exclusion in stableMs mode
- stop_recording returns frameCount and durationMs metadata alongside GIF
- get_cell_info supports compact option for reduced output ({char, fg, bold} only)
Bug fixes:
- Fixed wait_for_text and wait_for_color race condition where text/color present in the final buffer was missed when the process exited simultaneously
- Added mutual exclusion validation when both stableMs and idleMs are passed to wait_for_idle
License
MIT
