mcp-video-analyzer
Featured in awesome-mcp-servers.
MCP server for video analysis – extracts transcripts, key frames, OCR text, and metadata from video URLs. Supports Loom, direct video files (.mp4, .webm), and more.
No existing video MCP combines transcripts + visual frames + metadata in one tool. This one does.
Installation
Prerequisites
- Node.js 18+ – required to run the server via `npx`
- yt-dlp (optional) – enables frame extraction via ffmpeg. Install with `pip install yt-dlp`
- Chrome/Chromium (optional) – fallback for frame extraction if yt-dlp is unavailable

Without yt-dlp or Chrome, the server still works – you'll get transcripts, metadata, and comments, just no frames.
Claude Code (CLI)
```bash
claude mcp add video-analyzer -- npx mcp-video-analyzer@latest
```
Then restart Claude Code or start a new conversation.
VS Code / Cursor
Add to your MCP settings file:
- VS Code: File → Preferences → Settings → search "MCP", or edit `~/.vscode/mcp.json` (`%APPDATA%\Code\User\mcp.json` on Windows)
- Cursor: Settings → MCP Servers → Add
```json
{
  "servers": {
    "mcp-video-analyzer": {
      "type": "stdio",
      "command": "npx",
      "args": ["mcp-video-analyzer@latest"]
    }
  }
}
```
Then reload the window (Ctrl+Shift+P → "Developer: Reload Window").
Claude Desktop
Add to your Claude Desktop config file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "video-analyzer": {
      "command": "npx",
      "args": ["mcp-video-analyzer@latest"]
    }
  }
}
```
Then restart Claude Desktop.
Verify it works
Once installed, ask your AI assistant:
Analyze this video: https://www.loom.com/share/bdebdfe44b294225ac718bad241a94fe
If the server is connected, it will automatically call the analyze_video tool.
Tools
analyze_video – Full video analysis
Extracts everything from a video URL in one call:
> Analyze this video: https://www.loom.com/share/abc123...
Returns:
- Transcript with timestamps and speakers
- Key frames extracted via scene-change detection (automatically deduplicated)
- OCR text extracted from frames (code, error messages, UI text visible on screen)
- Annotated timeline merging transcript + frames + OCR into a unified "what happened when" view
- Metadata (title, duration, platform)
- Comments from viewers
- Chapters and AI summary (when available)
The AI will automatically call this tool when it sees a video URL – no need to ask.
Options:
- `detail` – analysis depth: `"brief"` (metadata + truncated transcript, no frames), `"standard"` (default), `"detailed"` (dense sampling, more frames)
- `fields` – array of specific fields to return, e.g. `["metadata", "transcript"]`. Available: `metadata`, `transcript`, `frames`, `comments`, `chapters`, `ocrResults`, `timeline`, `aiSummary`
- `maxFrames` (1-60, default depends on detail level) – cap on extracted frames
- `threshold` (0.0-1.0, default 0.1) – scene-change sensitivity
- `forceRefresh` – bypass cache and re-analyze
- `skipFrames` – skip frame extraction for transcript-only analysis
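Put together as a TypeScript shape, the options look roughly like this (a sketch inferred from the list above; the real interfaces live in src/types.ts):

```typescript
// Sketch of the analyze_video options, inferred from the documented list above.
// Names match the docs; the exact types are defined in src/types.ts.
type AnalyzeField =
  | "metadata" | "transcript" | "frames" | "comments"
  | "chapters" | "ocrResults" | "timeline" | "aiSummary";

interface AnalyzeVideoOptions {
  url: string;                                // Loom share link or direct .mp4/.webm URL
  detail?: "brief" | "standard" | "detailed"; // analysis depth, default "standard"
  fields?: AnalyzeField[];                    // return only these fields
  maxFrames?: number;                         // 1-60; default depends on detail level
  threshold?: number;                         // 0.0-1.0 scene-change sensitivity, default 0.1
  forceRefresh?: boolean;                     // bypass the 10-minute cache
  skipFrames?: boolean;                       // transcript-only analysis
}
```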
get_transcript – Transcript only
> Get the transcript from this video
Quick transcript extraction. Falls back to Whisper transcription when no native transcript is available.
get_metadata – Metadata only
> What's this video about?
Returns metadata, comments, chapters, and AI summary without downloading the video.
get_frames – Frames only
> Extract frames from this video with dense sampling
Two modes:
- Scene-change detection (default) – captures visual transitions (sketched below)
- Dense sampling (`dense: true`) – 1 frame/sec for full coverage
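Scene-change mode corresponds to ffmpeg's scene-detection `select` filter (the extraction logic lives in src/processors/frame-extractor.ts). A minimal sketch of the idea; the exact arguments here are illustrative, not the server's real invocation:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Illustrative: keep only frames whose scene-change score exceeds `threshold`
// (the same 0.0-1.0 knob exposed by the `threshold` option, default 0.1).
async function extractSceneFrames(videoPath: string, outDir: string, threshold = 0.1): Promise<void> {
  await run("ffmpeg", [
    "-i", videoPath,
    "-vf", `select='gt(scene,${threshold})'`, // select frames right after a visual transition
    "-vsync", "vfr",                          // emit one image per selected frame
    `${outDir}/scene_%03d.jpg`,
  ]);
}
```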
analyze_moment – Deep-dive on a time range
> Analyze what happens between 1:30 and 2:00 in this video
Combines burst frame extraction + filtered transcript + OCR + annotated timeline for a focused segment. Use when you need to understand exactly what happens at a specific moment.
get_frame_at – Single frame at a timestamp
> Show me the frame at 1:23 in this video
The AI reads the transcript, spots a critical moment, and requests the exact frame to see what's on screen.
get_frame_burst – N frames in a time range
> Show me 10 frames between 0:15 and 0:17 of this video
For motion, vibration, animations, or fast scrolling – burst mode captures N frames in a narrow window so the AI can see frame-by-frame changes.
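Conceptually, burst mode just seeks to N evenly spaced timestamps inside the window and grabs one frame at each. A sketch of the timestamp math (illustrative, not the server's actual code):

```typescript
// N timestamps evenly spaced across [start, end] seconds, endpoints included.
function burstTimestamps(start: number, end: number, n: number): number[] {
  if (n === 1) return [(start + end) / 2];
  const step = (end - start) / (n - 1);
  return Array.from({ length: n }, (_, i) => start + i * step);
}

// "10 frames between 0:15 and 0:17" → [15, 15.22, 15.44, ..., 17] (approx.)
burstTimestamps(15, 17, 10);
```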
Detail Levels
| Level | Frames | Transcript | OCR | Timeline | Use case |
|---|---|---|---|---|---|
| brief | None | First 10 entries | No | No | Quick check – what's this video about? |
| standard | Up to 20 (scene-change) | Full | Yes | Yes | Default – full analysis |
| detailed | Up to 60 (1fps dense) | Full | Yes | Yes | Deep analysis – every second captured |
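In configuration terms, the three levels map to a small settings object. An illustrative sketch derived from the table above (the real values live in src/config/detail-levels.ts):

```typescript
// Derived from the table above; illustrative, see src/config/detail-levels.ts.
const DETAIL_LEVELS = {
  brief:    { maxFrames: 0,  dense: false, ocr: false, timeline: false, transcriptEntries: 10 },
  standard: { maxFrames: 20, dense: false, ocr: true,  timeline: true,  transcriptEntries: Infinity },
  detailed: { maxFrames: 60, dense: true,  ocr: true,  timeline: true,  transcriptEntries: Infinity },
} as const;
```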
Caching
Results are cached in memory for 10 minutes. Subsequent calls with the same URL and options return instantly. Use forceRefresh: true to bypass the cache.
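The cache amounts to a small TTL map with LRU eviction, keyed on URL + options. A minimal sketch (the actual implementation is src/utils/cache.ts; the 100-entry cap here is an assumed value):

```typescript
// Minimal in-memory TTL cache with LRU eviction. Map preserves insertion
// order, so the first key is always the least recently used.
class TTLCache<V> {
  private entries = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs = 10 * 60 * 1000, private maxEntries = 100) {}

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (Date.now() > hit.expires) { this.entries.delete(key); return undefined; }
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value; // least recently used
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```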
Supported Platforms
| Platform | Transcript | Metadata | Comments | Frames | Auth |
|---|---|---|---|---|---|
| Loom | Yes | Yes | Yes | Yes | None |
| Direct URL (.mp4, .webm) | No | Duration only | No | Yes | None |
Frame Extraction Strategies
Frame extraction uses a two-strategy fallback chain – no single dependency is required:
| Strategy | How it works | Speed | Requirements |
|---|---|---|---|
| yt-dlp + ffmpeg (primary) | Downloads video, extracts frames via scene detection | Fast, precise | yt-dlp (pip install yt-dlp) |
| Browser (fallback) | Opens video in headless Chrome, seeks to timestamps, takes screenshots | Slower, no download needed | Chrome or Chromium installed |
The fallback is automatic – if yt-dlp is not available, the server tries browser-based extraction via puppeteer-core. If neither is available, analysis still returns transcript + metadata + comments, just no frames.
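In code, the chain is a straightforward try/fallthrough. An illustrative sketch (the function names are hypothetical; the real logic lives in frame-extractor.ts and browser-frame-extractor.ts):

```typescript
type Frame = { timestamp: number; path: string };

// Hypothetical wiring of the two-strategy chain: try yt-dlp + ffmpeg first,
// fall back to headless Chrome, and degrade to "no frames" rather than failing.
async function extractFramesWithFallback(
  url: string,
  ytDlpExtract: (url: string) => Promise<Frame[]>,   // primary: download + ffmpeg
  browserExtract: (url: string) => Promise<Frame[]>, // fallback: puppeteer-core screenshots
): Promise<Frame[]> {
  try {
    return await ytDlpExtract(url);     // fails fast if yt-dlp is not installed
  } catch {
    try {
      return await browserExtract(url); // seek to timestamps, screenshot each one
    } catch {
      return [];                        // transcript + metadata + comments still returned
    }
  }
}
```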
Post-Processing Pipeline
After frame extraction, the pipeline automatically applies:
| Step | What it does | Why |
|---|---|---|
| Frame deduplication | Removes near-identical consecutive frames using perceptual hashing (dHash + Hamming distance) | Screencasts often have long static moments – dedup removes redundant frames, saving tokens |
| OCR | Extracts text visible on screen from each frame (via tesseract.js) | Captures code, error messages, terminal output, UI text that the transcript doesn't cover |
| Annotated timeline | Merges transcript timestamps + frame timestamps + OCR text into a single chronological view | Gives the AI a unified "what was said, what changed visually, and what text appeared" at each moment |
The OCR step requires tesseract.js (included as a dependency). If it fails to load, analysis continues without OCR – no frames or transcript are lost.
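For a sense of how the dedup step works: dHash reduces each frame to a 64-bit fingerprint of horizontal brightness gradients, and consecutive frames whose fingerprints differ by only a few bits are treated as near-identical. A minimal sketch using sharp (the real implementation is src/processors/frame-dedup.ts; the 10-bit threshold is an assumed value):

```typescript
import sharp from "sharp";

// dHash: shrink to 9x8 grayscale, then compare each pixel to its right
// neighbour → 8 comparisons per row × 8 rows = 64 bits.
async function dHash(imagePath: string): Promise<bigint> {
  const px = await sharp(imagePath).grayscale().resize(9, 8, { fit: "fill" }).raw().toBuffer();
  let hash = 0n;
  for (let row = 0; row < 8; row++)
    for (let col = 0; col < 8; col++)
      hash = (hash << 1n) | (px[row * 9 + col] > px[row * 9 + col + 1] ? 1n : 0n);
  return hash;
}

// Number of differing bits between two 64-bit hashes.
function hammingDistance(a: bigint, b: bigint): number {
  let x = a ^ b, bits = 0;
  while (x > 0n) { bits += Number(x & 1n); x >>= 1n; }
  return bits;
}

// Drop a frame when it is within `maxDistance` bits of the previous one.
async function dedupFrames(paths: string[], maxDistance = 10): Promise<string[]> {
  const kept: string[] = [];
  let prev: bigint | undefined;
  for (const p of paths) {
    const h = await dHash(p);
    if (prev === undefined || hammingDistance(h, prev) > maxDistance) kept.push(p);
    prev = h;
  }
  return kept;
}
```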
Complementary Tools
Chrome DevTools MCP
For live web debugging alongside video analysis, pair this server with the Chrome DevTools MCP:
```bash
claude mcp add chrome-devtools -- npx chrome-devtools-mcp@latest
```
When to use each:
| Scenario | Tool |
|---|---|
| Bug report recorded as a Loom video | mcp-video-analyzer – extract transcript, frames, and error text from the recording |
| Live debugging a web page | Chrome DevTools MCP – inspect DOM, console, network, take screenshots |
| Video shows UI issue, need to reproduce it | Use both: analyze the video first, then open the page in Chrome DevTools to reproduce |
The two MCPs complement each other: video analyzer understands recorded content, DevTools interacts with live pages.
Example Output
The examples/loom-demo/ folder contains real outputs from analyzing a public Loom video (Boost In-App Demo Video, 2:55).
| File | What it shows |
|---|---|
| `metadata.json` | Title, duration, platform |
| `transcript.json` | 42 timestamped entries with speaker IDs |
| `timeline.json` | Unified chronological view (transcript + frames merged) |
| `moment-transcript-0m30s-0m45s.json` | Filtered transcript for analyze_moment (0:30–0:45) |
| `full-analysis.json` | Complete analyze_video output |
Frame images (19 total in examples/loom-demo/frames/):
- `scene_*.jpg` – scene-change detection (key visual transitions)
- `dense_*.jpg` – 1fps dense sampling (every 10th frame saved as sample)
- `burst_*.jpg` – burst extraction for moment analysis (0:30–0:45)
Regenerate after changes with `npx tsx examples/generate.ts` (requires yt-dlp + network access).
Development
```bash
# Install dependencies
npm install

# Run all checks (format, lint, typecheck, knip, tests)
npm run check

# Build
npm run build

# Run E2E tests (requires network)
npm run test:e2e

# Open MCP Inspector for manual testing
npm run inspect
```
Architecture
```
src/
├── index.ts                       # Entry point (shebang + stdio)
├── server.ts                      # FastMCP server + tool registration
├── tools/                         # MCP tool definitions (7 tools)
│   ├── analyze-video.ts           # Full analysis with detail levels + caching
│   ├── analyze-moment.ts          # Deep-dive on a time range
│   ├── get-transcript.ts          # Transcript-only with Whisper fallback
│   ├── get-metadata.ts            # Metadata + comments + chapters
│   ├── get-frames.ts              # Frames-only (scene-change or dense)
│   ├── get-frame-at.ts            # Single frame at timestamp
│   └── get-frame-burst.ts         # N frames in a time range
├── adapters/                      # Platform-specific logic
│   ├── adapter.interface.ts       # IVideoAdapter interface + registry
│   ├── loom.adapter.ts            # Loom: authless GraphQL
│   └── direct.adapter.ts          # Direct URL: any mp4/webm link
├── processors/                    # Shared processing
│   ├── frame-extractor.ts         # ffmpeg scene detection + dense + burst extraction
│   ├── browser-frame-extractor.ts # Headless Chrome fallback for frames
│   ├── audio-transcriber.ts       # Whisper fallback (HF transformers → CLI → OpenAI)
│   ├── image-optimizer.ts         # sharp resize/compress
│   ├── frame-dedup.ts             # Perceptual dedup (dHash + Hamming distance)
│   ├── frame-ocr.ts               # OCR text extraction (tesseract.js)
│   └── annotated-timeline.ts      # Unified timeline (transcript + frames + OCR)
├── config/
│   └── detail-levels.ts           # brief / standard / detailed config
├── utils/
│   ├── cache.ts                   # In-memory TTL cache with LRU eviction
│   ├── field-filter.ts            # Selective field filtering for responses
│   ├── url-detector.ts            # Platform detection from URL
│   ├── vtt-parser.ts              # WebVTT → transcript entries
│   └── temp-files.ts              # Temp directory management
└── types.ts                       # Shared TypeScript interfaces
```
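The adapter layer is the extension point for new platforms. A hypothetical sketch of what the contract might look like, based on the capabilities in the Supported Platforms table (the authoritative version is src/adapters/adapter.interface.ts):

```typescript
// Hypothetical shape of the adapter contract; see src/adapters/adapter.interface.ts
// for the real definitions. Capabilities mirror the Supported Platforms table:
// Loom provides transcript/metadata/comments/frames, direct URLs only duration + frames.
interface VideoMetadata { title?: string; durationSeconds?: number; platform: string }
interface TranscriptEntry { start: number; end: number; speaker?: string; text: string }

interface IVideoAdapter {
  canHandle(url: string): boolean;                  // platform detection (utils/url-detector.ts)
  getMetadata(url: string): Promise<VideoMetadata>; // title, duration, platform
  getTranscript(url: string): Promise<TranscriptEntry[] | null>; // null triggers the Whisper fallback
  getVideoStreamUrl(url: string): Promise<string>;  // playable URL for frame extraction
}

// A registry maps an incoming URL to the first adapter that claims it.
const registry: IVideoAdapter[] = [];
function resolveAdapter(url: string): IVideoAdapter | undefined {
  return registry.find((adapter) => adapter.canHandle(url));
}
```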
License
MIT
