Agentic Vision
Persistent visual memory for AI agents: capture screenshots, embed with CLIP ViT-B/32, compare, recall. MCP server + Rust core library.
Quickstart · Problems Solved · Why · Benchmarks · How It Works · Install · API · Papers
AI agents can't see across sessions.
Your agent takes a screenshot, analyzes it, and forgets. Next session: blank slate. It can't compare what a page looks like now versus yesterday. It can't recall what the error dialog said three conversations ago. It can't search its own visual history.
Text-based memory exists. Visual memory didn't, until now.
AgenticVision gives AI agents persistent visual memory. Capture images, embed them with CLIP ViT-B/32, store them in a compact binary format, and query them by similarity, time, or description. Every capture is a first-class MCP resource that any LLM can access.
Problems Solved (Read This First)
- Problem: agents cannot remember what they saw last session.
  Solved: .avis keeps persistent visual history across sessions and model changes.
- Problem: visual regressions are noticed late or missed.
  Solved: built-in compare and diff workflows surface change quickly.
- Problem: screenshots pile up with no searchable structure.
  Solved: each capture is embedded, timestamped, and queryable by similarity and metadata.
- Problem: image context stays trapped in one tool.
  Solved: MCP tools/resources expose visual memory to any compatible client.
- Problem: what an agent sees is disconnected from what it remembers.
  Solved: memory linking connects visual captures directly to cognitive graph nodes.
cargo install agentic-vision-cli agentic-vision-mcp
CLI + MCP binaries. 21 MCP tools. Persistent .avis files. Works with Claude Desktop, VS Code, Cursor, Windsurf, and any MCP-compatible client.
Benchmarks
Rust core. CLIP ViT-B/32 via ONNX Runtime. Binary .avis format. Real numbers from cargo test --release:
| Operation | Time | Notes |
|---|---|---|
| Image capture (file → embed → store) | 47 ms | CLIP ViT-B/32, 512-dim |
| Similarity search (top-5) | 1-2 ms | Brute-force cosine, f64 precision |
| Visual diff (pixel-level) | <1 ms | 8×8 grid region detection |
| MCP tool round-trip | 7.2 ms | Including process startup (~6.1 ms) |
| Storage per capture | ~4.26 KB | Embedding + JPEG thumbnail |
| Capacity per GB | ~250K | Observations |
All benchmarks were run on Apple M4, macOS 26.2, Rust 1.90.0 with --release, using ONNX Runtime for CLIP inference. A fallback mode is available when the ONNX model is not present.
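The "Capacity per GB" row follows from the "Storage per capture" row by simple division. The sketch below reproduces that arithmetic, assuming the table's KB and GB are binary units (KiB and GiB); that reading is an assumption, and the helper names are illustrative rather than part of the library.

```rust
/// Estimate how many captures fit in a storage budget.
/// Both arguments are in bytes.
fn captures_per_budget(budget_bytes: f64, bytes_per_capture: f64) -> u64 {
    (budget_bytes / bytes_per_capture).floor() as u64
}

/// Capacity of one GiB at ~4.26 KiB per capture (binary units assumed).
fn captures_per_gib() -> u64 {
    let gib = 1024.0 * 1024.0 * 1024.0;
    let per_capture = 4.26 * 1024.0;
    captures_per_budget(gib, per_capture)
}

/// Same estimate with decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes).
fn captures_per_gb_decimal() -> u64 {
    captures_per_budget(1.0e9, 4.26e3)
}
```

With binary units this gives roughly 246K captures per GiB, and with decimal units roughly 235K, so both readings are in line with the rounded ~250K figure.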
Why AgenticVision
Agents need visual continuity. A debugging agent should remember what the UI looked like before and after a code change. A monitoring agent should detect visual regressions. A research agent should build a visual knowledge base over time.
Capture once, query forever. Every image is embedded into a 512-dimensional CLIP vector and stored with its JPEG thumbnail, timestamp, and description. Query by cosine similarity, time range, or text search, in milliseconds.
Binary format, not a database. The .avis file is a single portable binary: 64-byte header, JSON payload, JPEG thumbnails. Copy it, share it, back it up. No server, no database, no dependencies.
Works with every MCP client. AgenticVision-MCP exposes 21 tools, 6 resources, and 4 prompts via the Model Context Protocol. Any LLM that speaks MCP gains visual memory automatically.
Links to AgenticMemory. The vision_link tool connects visual captures to AgenticMemory cognitive graph nodes, bridging what an agent sees with what it knows.
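The cosine-similarity query described above can be illustrated with a brute-force top-k search. This is a minimal sketch, not the library's code: it assumes embeddings are held as f32 slices and accumulates in f64 (matching the "f64 precision" note in the benchmarks), and the function names are hypothetical.

```rust
/// Cosine similarity between two embeddings, accumulated in f64.
/// Returns 0.0 for mismatched lengths or zero-norm vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    if a.len() != b.len() {
        return 0.0;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in a.iter().zip(b.iter()) {
        let (x, y) = (*x as f64, *y as f64);
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}

/// Brute-force top-k: score every stored embedding, sort descending, keep k.
/// Returns (index into `store`, similarity) pairs.
fn top_k(query: &[f32], store: &[Vec<f32>], k: usize) -> Vec<(usize, f64)> {
    let mut scored: Vec<(usize, f64)> = store
        .iter()
        .enumerate()
        .map(|(i, emb)| (i, cosine_similarity(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A linear scan like this touches every stored vector, which is simple and predictable at the scale of a single .avis file; an approximate index would only pay off for much larger stores.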
Ghost Writer
New in v0.2.4 -- Auto-syncs visual context to your AI coding tools every 5 seconds.
| Client | Config Location | Status |
|---|---|---|
| Claude Code | ~/.claude/memory/VISION_CONTEXT.md | Full support |
| Cursor | ~/.cursor/memory/agentic-vision.md | Full support |
| Windsurf | ~/.windsurf/memory/agentic-vision.md | Full support |
| Cody | ~/.sourcegraph/cody/memory/agentic-vision.md | Full support |
Syncs: recent captures, observations, visual tool calls. Zero configuration. Context survives sessions automatically.
MCP Hardening
New in v0.2.5 -- Production-grade stdio transport.
- Content-Length framing with 8 MiB limit
- JSON-RPC 2.0 validation
- Atomic writes (temp + rename + fsync)
- No silent fallbacks
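The "temp + rename + fsync" pattern in the list above can be sketched with the standard library alone. This is an illustration of the general technique under stated assumptions, not the server's actual code: the function name and the ".tmp" naming scheme are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `path` so readers see either the old file or the new one.
fn atomic_write(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // The temp file sits next to the target so the rename stays on one
    // filesystem. ASSUMPTION: replacing the extension with "tmp" is unique enough.
    let tmp = path.with_extension("tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?; // flush file contents to disk before the rename
    }
    fs::rename(&tmp, path)?;
    // Best effort: sync the parent directory so the rename itself is durable.
    // Errors are ignored because some platforms cannot open directories this way.
    if let Some(dir) = path.parent() {
        if let Ok(d) = File::open(dir) {
            let _ = d.sync_all();
        }
    }
    Ok(())
}
```

If the process dies before the rename, the original file is untouched; if it dies after, the new file is complete. That is the property that keeps a single-file store from being left half-written.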
How It Works
- Capture: vision_capture accepts images from files, base64, screenshots, or the system clipboard. Each image is resized, embedded via CLIP ViT-B/32 into a 512-dimensional vector, compressed to a JPEG thumbnail, and stored in the .avis binary file. Screenshots support optional region capture; clipboard capture reads the current image from the OS clipboard.
- Query: vision_query retrieves captures by time range, description, recency, and quality constraints (min_quality, sort_by). Results include capture metadata, quality scores, thumbnails, and similarity scores.
- Compare: vision_compare places two captures side-by-side for LLM analysis. vision_diff performs pixel-level differencing with 8×8 grid region detection to identify exactly what changed.
- Link: vision_link connects captures to AgenticMemory nodes, bridging visual observations with the agent's cognitive graph. An agent can recall "what did the UI look like when I made that decision?"
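To make the grid-based region detection used by vision_diff concrete, here is a minimal sketch of one way such a diff could work. It assumes 8-bit grayscale buffers of equal size and flags a cell when its mean absolute pixel difference exceeds a threshold; the input format, the rule, and the function name are all assumptions, since the docs only state that an 8×8 grid is used.

```rust
const GRID: usize = 8;

/// Compare two same-sized grayscale images and return the (row, col) grid
/// cells whose mean absolute pixel difference exceeds `threshold`.
fn changed_cells(
    a: &[u8],
    b: &[u8],
    width: usize,
    height: usize,
    threshold: f64,
) -> Vec<(usize, usize)> {
    assert_eq!(a.len(), width * height);
    assert_eq!(b.len(), width * height);
    let mut sum = [[0u64; GRID]; GRID];
    let mut count = [[0u64; GRID]; GRID];
    for y in 0..height {
        let row = y * GRID / height; // grid row containing this pixel
        for x in 0..width {
            let col = x * GRID / width; // grid column containing this pixel
            let i = y * width + x;
            sum[row][col] += (a[i] as i32 - b[i] as i32).unsigned_abs() as u64;
            count[row][col] += 1;
        }
    }
    let mut changed = Vec::new();
    for row in 0..GRID {
        for col in 0..GRID {
            let n = count[row][col];
            if n > 0 && sum[row][col] as f64 / n as f64 > threshold {
                changed.push((row, col));
            }
        }
    }
    changed
}
```

Reporting cells rather than individual pixels keeps the result small (at most 64 entries) and gives an LLM a coarse "where did it change" answer it can reason about.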
The .avis binary format uses a 64-byte fixed header (magic 0x41564953, version, counts, timestamps) followed by a JSON payload containing captures with embedded JPEG thumbnails and 512-dim float vectors. Single-file, portable, no external dependencies.
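As an illustration of the header described above, the sketch below checks that a buffer is at least 64 bytes long and begins with the magic value 0x41564953 (the ASCII bytes "AVIS" when read big-endian). The position of the magic at offset 0 and its byte order are assumptions, so the check accepts either order; the remaining header fields are not parsed because their layout is not documented here.

```rust
const AVIS_MAGIC: u32 = 0x4156_4953; // ASCII "AVIS" when read big-endian
const HEADER_LEN: usize = 64;

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic(u32),
}

/// Check that `bytes` starts with a plausible .avis header.
/// ASSUMPTION: the magic occupies the first four bytes; byte order is not
/// stated in the docs, so both orders are accepted here.
fn check_header(bytes: &[u8]) -> Result<(), HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    let raw = [bytes[0], bytes[1], bytes[2], bytes[3]];
    let be = u32::from_be_bytes(raw);
    let le = u32::from_le_bytes(raw);
    if be == AVIS_MAGIC || le == AVIS_MAGIC {
        Ok(())
    } else {
        Err(HeaderError::BadMagic(be))
    }
}
```

A check like this is a cheap way to reject a truncated or unrelated file before attempting to parse the JSON payload.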
MCP surface area
21 Tools (core 11 + grounding 3 + workspace 5 + observation 1 + session 1):
| Tool | Description |
|---|---|
| vision_capture | Capture and embed an image (file, base64, screenshot, clipboard), with metadata redaction and quality scoring |
| vision_compare | Side-by-side comparison of two captures |
| vision_query | Query captures by time, description, recency |
| vision_ocr | Extract text from a captured image |
| vision_similar | Find visually similar captures (cosine similarity) |
| vision_track | Track visual changes to a target over time |
| vision_diff | Pixel-level diff between two captures |
| vision_health | Quality + staleness + memory-link coverage summary |
| vision_link | Link a capture to an AgenticMemory node |
| session_start | Begin a named observation session |
| session_end | End the current session |
6 Resources:
| URI | Description |
|---|---|
| avis://capture/{id} | Single capture with metadata and thumbnail |
| avis://session/{id} | All captures in a session |
| avis://timeline/{start}/{end} | Captures within a time range |
| avis://similar/{id} | Visually similar captures |
| avis://stats | Storage statistics and counts |
| avis://recent | Most recent captures |
4 Prompts:
| Prompt | Description |
|---|---|
| observe | Guided visual observation workflow |
| compare | Structured comparison between captures |
| track | Change tracking over time |
| describe | Detailed image description |
Install
One-liner (desktop profile, backwards-compatible):
curl -fsSL https://agentralabs.tech/install/vision | bash
Environment profiles (one command per environment):
# Desktop MCP clients (auto-merge Claude Desktop + Claude Code when detected)
curl -fsSL https://agentralabs.tech/install/vision/desktop | bash
# Terminal-only (no desktop config writes)
curl -fsSL https://agentralabs.tech/install/vision/terminal | bash
# Remote/server hosts (no desktop config writes)
curl -fsSL https://agentralabs.tech/install/vision/server | bash
| Channel | Command | Result |
|---|---|---|
| GitHub installer (official) | curl -fsSL https://agentralabs.tech/install/vision \| bash | Installs release binaries when available, otherwise source fallback; merges MCP config |
| GitHub installer (desktop profile) | curl -fsSL https://agentralabs.tech/install/vision/desktop \| bash | Explicit desktop profile behavior |
| GitHub installer (terminal profile) | curl -fsSL https://agentralabs.tech/install/vision/terminal \| bash | Installs binaries only; no desktop config writes |
| GitHub installer (server profile) | curl -fsSL https://agentralabs.tech/install/vision/server \| bash | Installs binaries only; server-safe behavior |
| crates.io + Cargo deps (official) | cargo install agentic-vision-cli agentic-vision-mcp + cargo add agentic-vision | Installs avis, MCP server binary, and adds the core library crate to your project |
| npm (wasm) | npm install @agenticamem/vision | WASM-based vision SDK for Node.js and browser |
Server auth and artifact sync
For cloud/server runtime:
export AGENTIC_TOKEN="$(openssl rand -hex 32)"
All MCP clients must send Authorization: Bearer <same-token>.
If .avis/.amem/.acb files are on another machine, sync them to the server first.
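The bearer-token requirement above amounts to comparing the presented Authorization value against the shared token. The sketch below shows one way a server could do that check; it is illustrative only, the function names are hypothetical, and it does not describe how the AgenticVision server implements auth.

```rust
/// Compare two byte strings without stopping at the first mismatch,
/// so timing does not reveal where the strings differ.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Return true if `header_value` is `Bearer <token>` carrying the expected token.
fn is_authorized(header_value: &str, expected_token: &str) -> bool {
    match header_value.strip_prefix("Bearer ") {
        Some(presented) => {
            constant_time_eq(presented.trim().as_bytes(), expected_token.as_bytes())
        }
        None => false,
    }
}
```

In practice `expected_token` would come from the AGENTIC_TOKEN environment variable, and `header_value` from the client's Authorization header.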
CLI + MCP Server (for Claude Desktop, VS Code, Cursor, Windsurf):
cargo install agentic-vision-cli agentic-vision-mcp
Core library (for Rust projects):
cargo add agentic-vision
Configure Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "agentic-vision": {
      "command": "agentic-vision-mcp",
      "args": ["--vision", "~/.vision.avis", "serve"]
    }
  }
}
See INSTALL.md for full installation guide, VS Code / Cursor configuration, build from source, and troubleshooting.
Do not use /tmp for vision files: macOS and Linux clear this directory periodically. Use ~/.vision.avis for persistent storage.
Deployment Model
- Standalone by default: AgenticVision is independently installable and operable. Integration with AgenticMemory or AgenticCodebase is optional, never required.
- Autonomic operations by default: daemon/runtime maintenance uses safe profile-based defaults with cache hygiene, migration safeguards, and health-ledger snapshots.
| Area | Default behavior | Controls |
|---|---|---|
| Autonomic profile | Conservative local-first posture | CORTEX_AUTONOMIC_PROFILE=desktop |
| Cache + registry maintenance | Periodic expiry cleanup and registry GC | CORTEX_MAINTENANCE_TICK_SECS, CORTEX_REGISTRY_GC_EVERY_TICKS, CORTEX_REGISTRY_GC_KEEP_DELTAS |
| Storage migration | Policy-gated with checkpointed auto-safe path | CORTEX_STORAGE_MIGRATION_POLICY=auto-safe |
| Storage budget policy | 20-year projection + capture rollup under pressure | CORTEX_STORAGE_BUDGET_MODE=auto-rollup |
| Maintenance throttling | SLA-aware under sustained cache pressure | CORTEX_SLA_MAX_CACHE_ENTRIES_BEFORE_GC_THROTTLE |
| Health ledger | Periodic operational snapshots (default: ~/.agentra/health-ledger) | CORTEX_HEALTH_LEDGER_DIR, AGENTRA_HEALTH_LEDGER_DIR, CORTEX_HEALTH_LEDGER_EMIT_SECS |
Quickstart
MCP (Claude Desktop, VS Code, Cursor)
After configuring the MCP server (see Install), ask your agent:
"Take a screenshot and remember it."
The LLM calls vision_capture automatically. Then later:
"What did the screen look like earlier?"
The LLM calls vision_query to retrieve and display past captures.
Rust API
use agentic_vision::{VisionStore, CaptureSource};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open the visual memory file
    let mut store = VisionStore::open("observations.avis")?;

    // Capture from file, with a description
    let id = store.capture(
        CaptureSource::File("screenshot.png"),
        "Homepage after deploy",
    )?;

    // Find the 5 most similar captures
    let matches = store.similar(id, 5)?;
    for m in matches {
        println!("  {} (similarity: {:.3})", m.description, m.score);
    }
    Ok(())
}
Common Workflows
- Track UI regression -- Around a deploy, capture before/after screenshots and compare:
  vision_capture (before deploy screenshot, label: "pre-deploy")
  vision_capture (after deploy screenshot, label: "post-deploy")
  vision_diff id_a=<before_id> id_b=<after_id> # Pixel-level region diff
- Build visual evidence trail -- During debugging, attach screenshots to memory nodes:
  vision_capture source=screenshot, labels=["bug-123", "dialog-state"]
  vision_link capture_id=<id> memory_node_id=<node> relationship="evidence_for"
- Find similar UI states -- When diagnosing a recurring visual bug:
  vision_similar capture_id=<current_issue_id> top_k=5 min_similarity=0.8
- Audit capture quality -- Periodic maintenance to clean up stale or low-quality captures:
  vision_health stale_after_hours=168 low_quality_threshold=0.45
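To illustrate what the two vision_health parameters in the last workflow might mean, the sketch below flags captures older than a staleness cutoff or below a quality threshold. The struct, its fields, and the exact comparison rules are hypothetical; the real tool's semantics may differ.

```rust
/// Hypothetical capture summary; field names are illustrative only.
struct CaptureInfo {
    id: u64,
    age_hours: f64,
    quality: f64,
}

#[derive(Debug, PartialEq)]
enum Flag {
    Stale,
    LowQuality,
}

/// Return the captures that trip at least one check, with the reasons.
/// ASSUMPTION: "stale" means older than the cutoff, and "low quality"
/// means a score strictly below the threshold.
fn audit(
    captures: &[CaptureInfo],
    stale_after_hours: f64,
    low_quality_threshold: f64,
) -> Vec<(u64, Vec<Flag>)> {
    captures
        .iter()
        .filter_map(|c| {
            let mut flags = Vec::new();
            if c.age_hours > stale_after_hours {
                flags.push(Flag::Stale);
            }
            if c.quality < low_quality_threshold {
                flags.push(Flag::LowQuality);
            }
            if flags.is_empty() {
                None
            } else {
                Some((c.id, flags))
            }
        })
        .collect()
}
```

With stale_after_hours=168, anything older than one week is flagged, which matches the weekly-maintenance framing of the workflow.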
Validation
| Suite | Tests | Notes |
|---|---|---|
| Rust core (agentic-vision) | 38 | Unit + integration (includes screenshot/clipboard) |
| Python SDK tests | 47 | Edge cases, format validation |
| MCP integration suite | 3 | Python → Rust stdio transport |
| Multi-agent suite | 3 | Shared file, vision-memory linking, rapid handoff |
| Total | 91 | All passing |
Two research papers:
- Paper I: Cortex: Web Cartography (10 pages, 8 figures, 13 tables)
- Paper II: AgenticVision-MCP: Persistent Visual Memory via MCP (8 pages, 4 figures, 7 tables)
Repository Structure
This is a Cargo workspace monorepo containing the core library, CLI, MCP server, and FFI bindings.
agentic-vision/
├── Cargo.toml                  # Workspace root
├── crates/
│   ├── agentic-vision/         # Core library (crates.io: agentic-vision v0.2.2)
│   ├── agentic-vision-cli/     # CLI (crates.io: agentic-vision-cli v0.2.2)
│   ├── agentic-vision-mcp/     # MCP server (crates.io: agentic-vision-mcp v0.2.2)
│   └── agentic-vision-ffi/     # FFI bindings (crates.io: agentic-vision-ffi v0.2.2)
├── tests/                      # Integration tests (Python → Rust, multi-agent)
├── models/                     # ONNX model directory (CLIP ViT-B/32)
├── publication/                # Research papers (I, II)
├── assets/                     # SVG diagrams and visuals
└── docs/                       # Guides and reference
Running Tests
# All workspace tests (unit + integration)
cargo test --workspace
# Core library only
cargo test -p agentic-vision
# MCP server only
cargo test -p agentic-vision-mcp
# Python integration tests
python tests/integration/test_mcp_clients.py
python tests/integration/test_multi_agent.py
MCP Server Quick Start
cargo install agentic-vision-cli agentic-vision-mcp
Configure Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "agentic-vision": {
      "command": "agentic-vision-mcp",
      "args": ["--vision", "~/.vision.avis", "serve"]
    }
  }
}
Configure VS Code / Cursor (.vscode/settings.json):
{
  "mcp.servers": {
    "agentic-vision": {
      "command": "agentic-vision-mcp",
      "args": ["--vision", "~/.vision.avis", "serve"]
    }
  }
}
agentic-vision-mcp supports both line-delimited JSON-RPC and Content-Length framed MCP stdio messages.
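For readers unfamiliar with Content-Length framing, the sketch below reads one framed message from a buffered stream and enforces the 8 MiB limit mentioned under MCP Hardening. It covers only the Content-Length mode, not the line-delimited one, and it is a generic illustration of the framing scheme rather than the server's implementation; the function name is hypothetical.

```rust
use std::io::{self, BufRead, Read};

const MAX_FRAME_BYTES: usize = 8 * 1024 * 1024; // 8 MiB limit from the hardening notes

/// Read one Content-Length framed message body from `reader`.
/// Returns Ok(None) on a clean end of input before any header line.
fn read_frame<R: BufRead>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut content_length: Option<usize> = None;
    let mut saw_header_line = false;
    loop {
        let mut line = String::new();
        if reader.read_line(&mut line)? == 0 {
            return if saw_header_line {
                Err(io::Error::new(io::ErrorKind::UnexpectedEof, "eof inside headers"))
            } else {
                Ok(None)
            };
        }
        saw_header_line = true;
        let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
        if line.is_empty() {
            break; // blank line ends the header block
        }
        if let Some((name, value)) = line.split_once(':') {
            if name.trim().eq_ignore_ascii_case("content-length") {
                let n: usize = value.trim().parse().map_err(|_| {
                    io::Error::new(io::ErrorKind::InvalidData, "bad Content-Length")
                })?;
                content_length = Some(n);
            }
        }
    }
    let len = content_length
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "missing Content-Length"))?;
    if len > MAX_FRAME_BYTES {
        // Reject before allocating, so an oversized header cannot exhaust memory.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds 8 MiB"));
    }
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    Ok(Some(body))
}
```

The returned bytes would then be parsed and validated as a JSON-RPC 2.0 message; that step is omitted here to keep the example dependency-free.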
Roadmap: Next (Remote Server Support)
The next release is planned to add HTTP/SSE transport for remote deployments. Track progress in #2.
| Feature | Status |
|---|---|
| --token bearer auth | Planned |
| --multi-tenant per-user vision files | Planned |
| /health endpoint | Planned |
| --tls-cert / --tls-key native HTTPS | Planned |
| OCR with Tesseract (--features ocr) | Planned |
| Clipboard TIFF fix | Planned |
| delete / export / compact CLI commands | Planned |
| Docker image + compose | Planned |
| Remote deployment docs | Planned |
Planned CLI shape (not available in current release):
agentic-vision-mcp serve-http --port 8081 --token "<token>"
agentic-vision-mcp serve-http --multi-tenant --data-dir /data/users --port 8081 --token "<token>"
The .avis File
Your agent's visual memory. Everything it's seen.
| Size | ~5-8 GB over 20 years |
| Format | Binary captures with embeddings |
| Works with | Any vision-capable model |
v0.2: Grounding & Workspaces
Grounding: Agent cannot claim "page shows X" without capture evidence.
Workspaces: Compare across sites and time periods.
Contributing
See CONTRIBUTING.md. The fastest ways to help:
- Try it and file issues
- Add an MCP tool: extend the visual memory surface
- Write an example: show a real use case
- Improve docs: every clarification helps someone
Privacy and Security
- All captures stay local in .avis files -- no telemetry, no cloud sync by default.
- Metadata scrubbing removes EXIF and location data from captured images before storage.
- Storage budget policy prevents unbounded disk growth with 20-year projection and capture rollup.
- Server mode requires an explicit AGENTIC_TOKEN environment variable for bearer auth.
- Quality scoring helps identify and prune low-value captures to keep the store lean.
Built by Agentra Labs
