Percept
Give your AI agent ears.
Open-source ambient voice intelligence for AI agents
Quick Start • Getting Started • API • Architecture • CLI • Protocol
Ambient Voice Pipeline
https://github.com/GetPercept/percept/raw/main/demo.mp4
MCP Integration (Claude Desktop)
https://github.com/GetPercept/percept/raw/main/demo-mcp.mov
Percept is an open-source ambient voice pipeline that connects wearable microphones to AI agents. Wear a pendant, speak naturally, and your agent executes voice commands, summarizes meetings, identifies speakers, and builds a searchable knowledge graph, all processed locally on your machine.
What makes Percept different: it's not just transcription. The Context Intelligence Layer (CIL) transforms raw speech into structured, actionable context (entity extraction, relationship graphs, speaker resolution, and semantic search) so your agent actually understands what's being said.
Quick Start
```shell
# Install
pip install getpercept

# Start the server (receiver on :8900, dashboard on :8960)
percept serve

# Point your Omi webhook to:
#   https://your-host:8900/webhook/transcript
```
Say "Hey Jarvis, remind me to check email" and watch it work.
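To verify the receiver is up without wearing a device, you can POST a hand-built transcript to the webhook. A minimal sketch; the payload fields (`session_id`, `segments`, `text`, `speaker`) follow an Omi-style shape but are assumptions here, so check the API Reference for the exact schema:

```python
import json
import urllib.request

# Hypothetical transcript payload: field names are assumptions,
# not the documented schema (see the API Reference).
payload = {
    "session_id": "demo-session",
    "segments": [
        {"text": "Hey Jarvis, remind me to check email", "speaker": "SPEAKER_0"}
    ],
}

req = urllib.request.Request(
    "http://localhost:8900/webhook/transcript",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With `percept serve` running, send it:
# urllib.request.urlopen(req)
```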
Features
Voice Pipeline
- Wake Word Detection: "Hey Jarvis" (configurable via DB settings) triggers voice commands
- 7 Action Types: email, text, reminders, search, calendar, notes, orders, all by voice
- Auto Summaries: meeting summaries sent via iMessage after 60s of silence
- Speaker Identification: say "that was Sarah" to teach it who's talking
- Ambient Logging: full transcript history with timestamps and speaker labels
- Local-First: faster-whisper runs on your machine; audio never leaves your hardware
Context Intelligence Layer (CIL)
- Entity Extraction: two-pass pipeline, fast regex plus LLM semantic extraction
- Relationship Graph: auto-builds entity relationships (mentioned_with, works_on, client_of)
- Entity Resolution: 5-tier cascade: exact → fuzzy → contextual → recency → semantic
- Semantic Search: NVIDIA NIM embeddings + LanceDB vector store
- SQLite Persistence: conversations, utterances, speakers, contacts, actions, relationships
- FTS5 Full-Text Search: Porter-stemmed search across all utterances
- TTL Auto-Purge: configurable retention (utterances 30d, summaries 90d, relationships 180d)
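The TTL purge is conceptually one DELETE per table, keyed on row age. A sketch of the idea, assuming a simplified schema with a Unix-timestamp `created_at` column (table and column names here are illustrative, not Percept's actual schema):

```python
import sqlite3
import time

# Retention windows in days, mirroring the defaults above.
TTL_DAYS = {"utterances": 30, "summaries": 90, "relationships": 180}

def purge_expired(conn: sqlite3.Connection) -> int:
    """Delete rows older than each table's TTL. Assumes a `created_at`
    column holding a Unix timestamp (a simplification of the real schema)."""
    deleted = 0
    now = time.time()
    for table, days in TTL_DAYS.items():
        cutoff = now - days * 86400
        cur = conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
        deleted += cur.rowcount
    conn.commit()
    return deleted
```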
Security & Privacy
- Speaker Authorization: allowlist of authorized speakers; only approved voices trigger commands
- Webhook Authentication: Bearer token or URL token (?token=) on all webhook endpoints
- Security Audit Log: all blocked attempts logged with timestamp, speaker, transcript snippet, and reason
- Command Safety Classifier: 6-category pattern matching blocks exfiltration, credential access, destructive commands, network changes, info leaks, and prompt injection. Pen tested: 7/7 attacks blocked
- PII Detection & Redaction: auto-scans transcripts for SSNs, credit cards, phone numbers, emails, and DOBs, redacting before storage. Luhn-validated card detection
- Local-First: audio and transcripts never leave your machine; no cloud dependency
- Full security documentation
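Luhn validation, used above to cut false positives in card-number detection, is a short checksum over the digits. A sketch of the standard algorithm (illustrative, not Percept's actual detector):

```python
def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right,
    subtract 9 from doubles over 9, and require the sum to be % 10 == 0."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:  # shorter than any real card number
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A candidate that matches a card-number regex but fails this check can be left unredacted, which keeps phone numbers and IDs from being misclassified as cards.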
Intent Parser
- Two-Tier Hybrid: fast regex (handles ~80% of commands instantly) with LLM fallback
- Spoken Number Support: "thirty minutes" → 1800s, "an hour and a half" → 5400s
- Contact Resolution: "email Sarah" auto-resolves from the contacts registry
- Spoken Email Normalization: "jane at example dot com" → jane@example.com
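The spoken-number and spoken-email conversions above can be sketched in a few lines. This is an illustrative reimplementation of the idea, not Percept's actual parser:

```python
import re

WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "fifteen": 15,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50, "an": 1, "a": 1,
}
UNITS = {"second": 1, "minute": 60, "hour": 3600}

def spoken_duration_seconds(phrase: str) -> int:
    """Convert phrases like 'thirty minutes' or 'an hour and a half'
    into seconds. Illustrative only."""
    tokens = re.findall(r"[a-z]+|\d+", phrase.lower())
    total, quantity, last_unit = 0, 0, 0
    for tok in tokens:
        if tok.isdigit():
            quantity += int(tok)
        elif tok in WORDS:
            quantity += WORDS[tok]
        elif tok.rstrip("s") in UNITS:
            last_unit = UNITS[tok.rstrip("s")]
            total += quantity * last_unit
            quantity = 0
        elif tok == "half":
            # "and a half" adds half of the most recent unit
            total += (quantity or 1) * last_unit // 2
            quantity = 0
    return total

def normalize_spoken_email(phrase: str) -> str:
    """'jane at example dot com' -> 'jane@example.com' (illustrative)."""
    return (phrase.lower()
            .replace(" at ", "@")
            .replace(" dot ", ".")
            .replace(" ", ""))
```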
Architecture
```text
Mic (Omi Pendant / Apple Watch)
  ↓ BLE
Phone App (streams audio)
  ↓ Webhook
Percept Receiver (FastAPI, port 8900)
  ├── Webhook authentication (Bearer token / URL token)
  ├── Speaker authorization gate (allowlist check)
  ├── Wake word detection (from DB settings)
  ├── Intent parser (regex + LLM, injection-resistant)
  ├── Conversation segmentation (3s command / 60s summary)
  ├── Entity extraction + relationship graph
  ├── SQLite persistence (conversations, utterances, speakers, actions)
  ├── LanceDB vector indexing (NVIDIA NIM embeddings)
  ├── Security audit log (blocked attempts)
  └── Action dispatch → OpenClaw / stdout / webhook
  ↓
Dashboard (port 8960)
  ├── Live transcript feed
  ├── Conversation history + search
  ├── Analytics (words/day, speakers, actions)
  ├── Settings management (wake words, contacts, speakers)
  └── Data export + purge
```
Integrations
OpenClaw Skill
Install the percept-meetings skill to give your OpenClaw agent meeting context:
```shell
clawhub install percept-meetings
```
Your agent can then search meetings, find action items, and follow up, drawing on Zoom, Granola, and Omi sources. See ClawHub for details.
Granola Meeting Notes
Import your Granola meeting notes into Percept's searchable knowledge base:
```shell
percept granola-sync
```
Reads from ~/Library/Application Support/Granola/cache-v3.json and maps documents + transcripts into Percept's conversations table. Your Omi ambient audio and Granola structured notes become one unified, searchable knowledge base, all queryable through the MCP tools or CLI.
Supports --since 2026-02-01, --dry-run, and Enterprise API mode (GRANOLA_API_KEY).
Zoom Cloud Recordings
Import Zoom meeting transcripts automatically:
```shell
# Sync last 7 days of recordings
percept zoom-sync --days 7

# Import a specific meeting or VTT file
percept zoom-import <meeting_id>
percept zoom-import /path/to/meeting.vtt --topic "Weekly Standup"
```
Requires a Zoom Server-to-Server OAuth app (setup guide). Also supports a webhook server for auto-import when recordings complete:
```shell
percept zoom-serve --port 8902
```
ChatGPT Custom GPT
Expose Percept as a ChatGPT Actions API for any Custom GPT:
```shell
# Start the API server
percept chatgpt-api --port 8901

# Export OpenAPI schema for Custom GPT import
percept chatgpt-api --export-schema openapi.json
```
5 REST endpoints: /api/search, /api/transcripts, /api/speakers, /api/entities, /api/status. Bearer token auth via PERCEPT_API_TOKEN.
Browser Audio Capture: Give Any AI Agent Ears for the Browser
Any audio playing in a browser tab, captured and understood by your AI agent. Meetings, podcasts, YouTube, webinars, earnings calls, online courses, customer support calls: if it plays in Chrome, your agent hears it.
No API keys. No OAuth. No per-platform integrations. One extension captures everything.
Works with any AI agent framework (Claude, ChatGPT, OpenClaw, LangChain, CrewAI, or your own). If your agent can make HTTP requests or run shell commands, it can receive browser audio.
Any Browser Tab Audio → Chrome Extension → PCM16 @ 16kHz → Your AI Pipeline
Use cases
- Meetings: Zoom, Meet, Teams auto-detected and captured
- Train your agent on any subject: play a Stanford lecture, a podcast series, or a YouTube playlist. Your agent builds a searchable knowledge graph from everything it hears: entities, relationships, key concepts, timestamps. "What did the professor say about T-cell response in lecture 3?" Just play the content; your agent learns.
- Learning: YouTube tutorials, Coursera, Udemy become searchable notes your agent can reference
- Podcasts & webinars: capture and summarize while you listen
- Competitive intel: earnings calls, product demos, investor presentations turned into structured insights
- Customer calls: browser-based support tools (Zendesk, Intercom) auto-summarized, with action items extracted
- Any audio content: if it plays in a tab, your agent gets a transcript
Auto-detected meeting platforms
Google Meet • Zoom (web) • Microsoft Teams • Webex • Whereby • Around • Cal.com • Riverside • StreamYard • Ping • Daily.co • Jitsi • Discord. Meetings are auto-flagged, but capture works on any tab.
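Auto-detection of meeting tabs amounts to matching each tab's hostname against a list of known platforms. A sketch of the idea with an abbreviated host list (not the extension's actual matcher):

```python
from urllib.parse import urlparse

# Abbreviated host list -- the extension recognizes more platforms.
MEETING_HOSTS = (
    "meet.google.com", "zoom.us", "teams.microsoft.com",
    "webex.com", "whereby.com", "meet.jit.si", "discord.com",
)

def is_meeting_tab(url: str) -> bool:
    """True if the tab's hostname matches a known meeting platform,
    including subdomains like us02web.zoom.us."""
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in MEETING_HOSTS)
```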
Quick start
Option 1: Chrome Extension (recommended: one click, persistent capture)
- Open chrome://extensions/, enable Developer mode, choose Load unpacked, and select src/browser_capture/extension/
- Join any meeting in Chrome
- Click the Percept icon, then Start Capturing This Tab
- Audio streams to http://localhost:8900/audio/browser as base64 PCM16 JSON
- Close the popup; capture continues in the background
Option 2: CLI via Chrome DevTools Protocol
```shell
pip install aiohttp

# List open tabs (meeting tabs are flagged)
percept capture-browser tabs

# Auto-detect and capture meeting tabs
percept capture-browser capture

# Continuous watch mode: auto-starts when you join a meeting
percept capture-browser watch --interval 15

# Check what's capturing / stop
percept capture-browser status
percept capture-browser stop
```
Requires Chrome running with --remote-debugging-port=9222.
Standalone skill (for any OpenClaw agent)
```shell
clawhub install browser-audio-capture
```
This installs the Chrome extension + CLI as a skill that any OpenClaw agent can use, regardless of model (Claude, GPT, Gemini, Llama, etc.).
Audio output format
Audio POSTs to your endpoint as JSON:
```json
{
  "sessionId": "browser_1709234567890",
  "audio": "<base64 PCM16>",
  "sampleRate": 16000,
  "format": "pcm16",
  "source": "browser_extension",
  "tabUrl": "https://meet.google.com/abc-defg-hij",
  "tabTitle": "Weekly Standup"
}
```
Point it at any transcription service (Whisper, Deepgram, AssemblyAI, NVIDIA Riva) or pipe it straight into your agent's context. The endpoint is configurable in offscreen.js (PERCEPT_URL).
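On the receiving end, handling a capture POST is just JSON parsing plus a base64 decode into 16-bit little-endian samples. A minimal, framework-agnostic sketch of the decode step:

```python
import base64
import json
import struct

def decode_browser_audio(body: bytes) -> list[int]:
    """Decode one browser-capture POST body into PCM16 samples."""
    msg = json.loads(body)
    assert msg["format"] == "pcm16"
    raw = base64.b64decode(msg["audio"])
    # PCM16 is little-endian signed 16-bit: 2 bytes per sample.
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))
```

The resulting sample list (at `sampleRate` Hz, 16000 by default) can be buffered and handed to whatever transcriber you run.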
Supported Hardware
| Device | Status | Notes |
|---|---|---|
| Omi Pendant | Live | Primary device. BLE to phone, all-day battery. "Critical to our story" |
| Apple Watch | Beta | watchOS app built (push-to-talk, raise-to-speak); needs real-device testing |
| Browser (CDP) | Live | Chrome extension captures audio from any browser tab: meetings, YouTube, podcasts, courses, anything |
| AirPods | Planned | Via phone mic passthrough |
| Any Webhook Source | Ready | Standard HTTP webhook interface; any device that POSTs transcripts |
Supported Actions
| Action | Voice Example | Resolution |
|---|---|---|
| Email | "Hey Jarvis, email Sarah about the meeting" | Contact lookup → email |
| Text | "Hey Jarvis, text Rob I'm running late" | Contact lookup → phone |
| Reminder | "Hey Jarvis, remind me in thirty minutes to call the dentist" | Spoken number parsing |
| Search | "Hey Jarvis, look up the weather in Cape Town" | Web search |
| Note | "Hey Jarvis, remember the API key is in the shared doc" | Context capture |
| Calendar | "Hey Jarvis, schedule a call with Mike tomorrow at 2pm" | Calendar integration |
| Summary | "Hey Jarvis, summarize this conversation" | On-demand summary |
CLI Quick Reference
```shell
percept serve                         # Start receiver + dashboard
percept listen                        # Start receiver, output JSON events
percept status                        # Pipeline health check
percept transcripts                   # List recent transcripts
percept transcripts --today           # Today's transcripts only
percept actions                       # List recent voice actions
percept search "budget"               # Semantic search over conversations
percept audit                         # Data stats (conversations, utterances, storage)
percept purge --older-than 90         # Delete old data
percept config                        # Show configuration
percept config --set whisper.model_size=small

percept speakers list                 # Show authorized + known speakers
percept speakers authorize SPEAKER_0  # Authorize a speaker
percept speakers revoke SPEAKER_0     # Revoke a speaker
percept config set webhook_secret <token>  # Set webhook auth token
percept security-log                  # View blocked attempts

# Meeting source connectors
percept granola-sync                  # Import from Granola (local cache)
percept granola-sync --api            # Import via Granola Enterprise API
percept zoom-sync --days 7            # Sync recent Zoom recordings
percept zoom-import <id>              # Import specific Zoom meeting
percept zoom-import file.vtt          # Import local VTT transcript
percept chatgpt-api                   # Start ChatGPT Actions API (port 8901)

# Browser audio capture
percept capture-browser tabs          # List tabs (flags meetings)
percept capture-browser capture       # Start capturing (auto-detects meetings)
percept capture-browser watch         # Auto-detect mode (continuous)
percept capture-browser status        # Show active captures
percept capture-browser stop          # Stop all captures
```
See CLI Reference for full details.
MCP Server (Claude Desktop / Anthropic Ecosystem)
Percept exposes all capabilities as MCP (Model Context Protocol) tools, so Claude can natively search your conversations, check transcripts, and more.
```shell
# Start MCP server (stdio transport)
percept mcp
```
Claude Desktop Configuration
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "percept": {
      "command": "/path/to/percept/.venv/bin/python",
      "args": ["/path/to/percept/run_mcp.py"]
    }
  }
}
```
Restart Claude Desktop after editing. The Percept tools will appear automatically.
Available MCP Tools
| Tool | Description |
|---|---|
| percept_search | Full-text search across conversations |
| percept_transcripts | List recent transcripts |
| percept_actions | Voice command history |
| percept_speakers | Known speakers with word counts |
| percept_status | Pipeline health check |
| percept_security_log | Blocked attempts log |
| percept_conversations | Conversations with summaries |
| percept_listen | Live transcript stream |
MCP Resources
- percept://status: current pipeline status
- percept://speakers: known speakers list
Dashboard
The web dashboard runs on port 8960 and provides:
- Live transcript feed: real-time stream of what's being said
- Conversation history: searchable archive with speaker labels
- Analytics: words/day, segments/hour, speaker breakdown, action history
- Settings page: manage wake words, speakers, contacts, and transcriber config from the DB
- Entity graph: browse extracted entities and relationships
- Search: FTS5 keyword search with LanceDB vector search fallback
- Data management: export all data as JSON, purge by TTL or manually
Transcription
| Transcriber | Status | Use Case |
|---|---|---|
| Omi on-device | Default | Omi app transcribes locally, sends text via webhook |
| faster-whisper | Built | Local transcription for raw audio (base model, int8, M-series optimized) |
| NVIDIA Parakeet | Tested | NVIDIA NIM ASR via gRPC. Superior accuracy; requires API key |
| Deepgram | Planned | Cloud ASR option |
Three-tier strategy: Local (faster-whisper) → NVIDIA (Parakeet NIM) → Cloud (Deepgram)
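The three-tier strategy is a plain fallback cascade: try each backend in priority order and return the first transcript. A sketch of the pattern; the backend callables are placeholders, not Percept's real transcriber interfaces:

```python
from typing import Callable, Optional

def transcribe_with_fallback(
    audio: bytes,
    backends: list[Callable[[bytes], str]],
) -> Optional[str]:
    """Try each transcription backend in priority order; return the
    first successful transcript, or None if every tier fails."""
    for backend in backends:
        try:
            return backend(audio)
        except Exception:
            continue  # fall through to the next tier
    return None
```

In practice the list would be built from configuration, e.g. local faster-whisper first, then Parakeet NIM, then a cloud ASR.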
Data Model (SQLite)
| Table | Purpose | Role |
|---|---|---|
| conversations | Full conversation records with transcripts, summaries | Core |
| utterances | Atomic speech units (FTS5 indexed, Porter stemming) | CIL atomic unit |
| speakers | Speaker profiles with word counts, relationships | Identity |
| contacts | Name → email/phone lookup with aliases | Resolution |
| actions | Voice command history with status tracking | Audit |
| entity_mentions | Entity occurrences per conversation | CIL extraction |
| relationships | Weighted entity graph (source, target, type, evidence) | CIL knowledge |
| authorized_speakers | Speaker allowlist for command authorization | Security |
| security_log | Blocked attempts (unauthorized, invalid auth, injection) | Security |
| settings | Runtime config (wake words, timeouts, transcriber) | Config |
Percept Protocol
The Percept Protocol defines a framework-agnostic JSON schema for the voice → intent → action handoff:
- 6 event types: transcript, conversation, intent, action_request, action_response, summary
- 3 transports: JSON Lines on stdout, WebSocket, Webhook
- Unix composable:
```shell
percept listen | jq 'select(.type == "intent")' | my-agent
```
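The jq filter above translates directly to a few lines of Python reading JSON Lines from stdin; only the `type` field is assumed from the event list above, and all other fields pass through untouched:

```python
import json

def intent_events(stream):
    """Yield only `intent` events from a JSON Lines stream,
    skipping blank or malformed lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get("type") == "intent":
            yield event

# Typical use: pipe `percept listen` into a script that iterates
# intent_events(sys.stdin) and dispatches each event to your agent.
```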
Documentation
| Doc | Description |
|---|---|
| Getting Started | Install, configure Omi, first voice command |
| Configuration | Config file, wake words, transcriber, CIL settings, environment variables |
| CLI Reference | Every command, every flag, with examples |
| API Reference | Webhook endpoints, dashboard API, request/response formats |
| Architecture | Pipeline diagram, CIL design, data flow, extending Percept |
| Percept Protocol | JSON event protocol for agent integration |
| OpenClaw Integration | Using Percept with OpenClaw |
| Decisions | Architecture Decision Records: what we chose and why |
| Roadmap | Current status and what's next |
| Contributing | Dev setup, PR guidelines, good first issues |
Built for OpenClaw
Percept is designed as a first-class OpenClaw skill, but works standalone with any agent framework: LangChain, CrewAI, AutoGen, or a simple webhook.
```shell
# With OpenClaw
openclaw skill install percept

# Without OpenClaw: pipe events anywhere
percept listen --format json | your-agent-consumer
```
Five skill components: percept-listen, percept-voice-cmd, percept-summarize, percept-speaker-id, percept-ambient
See OpenClaw Integration for details.
Project Structure
```text
percept/
├── src/
│   ├── receiver.py          # FastAPI server, webhooks, wake word, action dispatch
│   ├── transcriber.py       # faster-whisper transcription, conversation tracking
│   ├── intent_parser.py     # Two-tier intent parser (regex + LLM fallback)
│   ├── database.py          # SQLite persistence (11 tables, FTS5, WAL mode)
│   ├── context_engine.py    # CIL: context packet assembly, entity resolution
│   ├── entity_extractor.py  # CIL: two-pass entity extraction + relationship building
│   ├── vector_store.py      # NVIDIA NIM embeddings + LanceDB semantic search
│   ├── context.py           # Context extraction, conversation file saving
│   └── cli.py               # CLI entry point (9 commands)
├── config/config.json       # Server, whisper, audio settings
├── data/
│   ├── percept.db           # SQLite database (WAL mode)
│   ├── vectors/             # LanceDB vector store
│   ├── conversations/       # Conversation markdown files
│   ├── summaries/           # Auto-generated summaries
│   ├── speakers.json        # Speaker ID → name mapping
│   └── contacts.json        # Contact registry
├── dashboard/
│   ├── server.py            # Dashboard FastAPI backend (port 8960)
│   └── index.html           # Dashboard web UI
├── protocol/
│   ├── PROTOCOL.md          # Event protocol specification
│   └── schemas/             # JSON Schema for 6 event types
├── landing/                 # getpercept.ai landing page (port 8950)
├── watch-app/               # Apple Watch app (push-to-talk, raise-to-speak)
├── scripts/                 # Utility scripts (backfill, vector indexing)
├── research/                # Research notes (OpenHome, Zuna BCI, etc.)
└── docs/                    # Full documentation
```
Contributing
We'd love your help:
- Star the repo: it helps more than you think
- Try it: install, use it for a day, file issues
- Build: language packs, hardware integrations, new action types
- Share: blog about it, tweet about it
See Contributing Guide for dev setup and PR guidelines.
License
MIT. Do whatever you want with it.
"Fei-Fei Li gave AI eyes with ImageNet. We're giving AI agents ears."
