Osborn - Voice AI Research & Development Assistant
Voice-enabled research and coding assistant powered by LiveKit + Claude Agent SDK. Talk to your code, research deeply, and build plans before executing.
Features
- Voice Interface: Real-time voice conversation using LiveKit Agents 1.2.x
- Three Voice Modes:
  - Pipeline (default): STT (Deepgram Flux semantic turn detection) → ClaudeLLM (persistent session) wrapped by `PipelineDirectLLM`, plus a parallel Gemini Flash AFC observer → TTS. Interruption context is enriched into the next user message.
  - Direct: STT → ClaudeLLM → TTS, no parallel observer.
  - Realtime: OpenAI Realtime / Gemini Live native speech-to-speech, with the model acting as a thin teleprompter calling `ask_fast_brain` for every turn.
- Persistent Session: Single Claude subprocess per voice session, with no JSONL replay after the first message. Uses `query()` with an `AsyncIterable<SDKUserMessage>` MessageChannel and a long-lived background consumer.
- Multi-Agent Orchestration: Sonnet orchestrator delegates to three named sub-agents: researcher (Sonnet, read-only investigation), reasoner (Opus, deep analysis and planning), and writer (Sonnet with verify-first workflow).
- Research Mode: Read code, search web, run commands, fetch YouTube transcripts, save findings to session workspace.
- Claude Agent SDK v0.2.91: Full tool access (Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, Task, TodoWrite). File checkpointing + rewind via `enableFileCheckpointing`.
- Permission System: Approve/deny operations via voice or UI. An `agent_type`-aware `PreToolUse` hook routes the writer sub-agent through `canUseTool` for explicit prompts; researcher/reasoner/main are restricted to workspace paths. `canUseTool` auto-approves workspace writes EXCEPT `spec.md` (the fast brain owns it).
- Session Management: Browse and resume conversations from ALL Claude Code projects on your machine, sorted by mtime via `listAllClaudeSessions()`.
- Fast Brain (Teleprompter): Single gateway `askFastBrain()` handling every realtime user turn. Provider chain: Gemini Flash primary (~1-2s, 1M tokens, no cold start) → Anthropic Haiku → Agent SDK fallback. 12 JSONL tools including `read_agent_results`, `read_agent_text`, `deep_read_results`, `deep_read_text`, `search_jsonl`, `read_subagents`, `get_session_stats`, `send_to_chat`.
- Pipeline Fast Brain: Gemini Flash AFC (Automatic Function Calling) observer runs in parallel with Claude. Three tools: `search_session` (ripgrep over the summary index + byte-offset reads), `get_recent`, and `emergency_stop`, which aborts and restarts the Claude subprocess on destructive-action user signals.
- Summary Index: Compact line-oriented index over JSONL session files with byte-offset reads (<5ms search). Lazy build + file watcher in pipeline mode.
- Ripgrep + BM25 Search: Bundled `@vscode/ripgrep` binary for fast regex search across JSONL files; in-memory `minisearch` BM25 index over recent session messages.
- Meeting Integration: Recall.ai bot joins Zoom/Google Meet and routes real-time transcripts to Claude as `[Meeting - Speaker]: text`.
- JSONL Session Access: 25+ functions in `session-access.ts` for reading FULL untruncated tool results, agent reasoning, and sub-agent transcripts from `~/.claude/projects/`.
- Non-Blocking Research: `executeResearch()` runs background Claude queries; the SDK handles internal queuing. Progress is debounced (8s batch) and contextualized through the fast brain before voice relay.
- Parallel Sub-Agents: Orchestrator spawns concurrent `Task` sub-agents for independent research streams.
- Visual Documents: Mermaid diagrams, comparison tables, and analysis docs generated via `generateVisualDocument()` and surfaced in the files panel.
- Gemini Auto-Recovery: Automatic session recovery from crashes (1008/1011) with a conversation-history briefing on resume.
- MCP Integration: Stdio servers + Smithery cloud servers via the in-process `smithery-proxy` (bypasses Claude SDK HTTP bug #18296).
- Research Artifacts: Plans, diagrams (Mermaid), notes, and analysis files that persist across session resumes via `listWorkspaceArtifacts()`.
- Claude OAuth Flow: For headless / cloud deployments. `claude-auth.ts` walks env → credentials file → CLI check → interactive `claude setup-token` via `node-pty`. The auth URL surfaces via the `claude_auth_url` data channel; the user pastes the code back via the auth modal; the token persists in `~/.claude/.credentials.json`.
- Per-User Cloud Sandboxes: Self-hosted Daytona on a Hostinger VPS provisions isolated Linux sandboxes per user (Claude Code + osborn pre-installed). Each user authenticates Claude via their own OAuth flow inside the sandbox. Tokens persist on the sandbox filesystem. Local vs Cloud toggle in dashboard settings; see `DAYTONA-SETUP.md`.
- `waitForToolboxReady()`: Bridges Daytona's metadata-vs-reverse-proxy race so Resume actually delivers a working sandbox instead of a 502.
- `autoStopInterval: 0`: Sandboxes don't auto-stop because self-hosted Daytona has a backup-system bug that fills the disk on every cycle. Defense-in-depth via a daily backup-prune cron on the VPS.
- `killCurrentLLM()` cleanup: Hard-kills the persistent Claude subprocess on disconnect so the SDK doesn't keep draining the MessageChannel into a dead session.
- Self-Healing CWD Fallback: `[OSBORN_CWD env, config.workingDirectory, process.cwd()]` walked in priority order, picking the first that `existsSync()`. Cures the misleading "Claude Code executable not found" error, which is actually a `child_process.spawn` ENOENT on a stale cwd.
- LiveKit Cloud Turn Detector Shim: `CloudTurnDetector` implements `_TurnDetector` directly via `fetch` to `LIVEKIT_REMOTE_EOT_URL`; no `JobContext` / worker framework required.
Cloud Sandboxes (Fly.io Sprites)
As of April 2026, Sprites (`frontend/src/lib/sprites.ts`) replaces the self-hosted Daytona setup for per-user cloud sandboxes. Sprites are persistent Linux sandboxes (Ubuntu, Node 22, 100GB NVMe) managed by Fly.io.
Required env: `SPRITES_API_TOKEN` in `frontend/.env.local`.
Key behaviors:
- First-run provisioning: ~6 minutes (npm install from scratch)
- Checkpoint restore: ~10-20 seconds (CRIU)
- Auto-hibernates after ~30s idle; wakes on HTTP request
- Service registration may 503 for ~45s after creation; a retry loop handles this
- Unique sprite names per create (since v0.8.38): `generateUniqueSpriteName` appends a base36 timestamp suffix to bypass any per-name "stuck routing" entries in Sprites' API gateway. `findUserSandbox(userId, knownSandboxId?)` reads the actual sprite name from Supabase as the source of truth; the deterministic name is a fallback only.
- Warm-wake LiveKit kick (since v0.8.38): When `startSandbox()` resumes a warm sprite that has the marker bootstrap, it calls `restartService()` to give the agent a fresh process with a fresh LiveKit WebSocket. The CRIU snapshot preserves the local socket, but LiveKit Cloud has evicted the agent during hibernation; without a process restart the agent is a "ghost" in the room.
- Marker-bootstrap install loop avoidance: Bootstrap writes `/home/sprite/.osborn-installed-version` after a successful install. Subsequent restarts compare WANT vs the marker and skip the install when they match.
- Two-click delete confirmation: Sprites does NOT soft-delete (probed 6 different undelete endpoint shapes; all 404). The trash icon arms on first click and deletes on a second click within 4s; it auto-disarms otherwise.
- `fs/write` is asymmetric with the container view: writes via the fs API land on the persistent disk and are NOT visible to the running container (overlay layer). Don't use it to "inject" files into a sprite; bake them into the bootstrap or copy them via service exec.
- `process.cwd()` is `/home/sprite/workspace` (per `OSBORN_CWD`). Files shipped with the npm package must resolve via ESM `__dirname`, not cwd. See the `meeting-output.html` handler in `agent/src/index.ts` for the 3-candidate path pattern.
Planned: Pre-warm pool to reduce new-user wait to ~30s. Agent-side LiveKit reconnect watchdog so warm-wake doesn't require a service restart.
The Daytona setup (`DAYTONA_API_KEY`, `daytona.ts`) is preserved but no longer active.
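The unique-name scheme and the 503 retry behavior described above can be sketched as follows. This is illustrative, not the shipped code: the `osborn-` prefix, attempt count, and delay are assumptions.

```typescript
// Sketch of the Sprites behaviors described above. The "osborn-" prefix,
// attempt count, and delay are illustrative assumptions, not shipped values.

// Append a base36 timestamp so every create gets a fresh name, bypassing
// stale per-name "stuck routing" entries in the Sprites API gateway.
function generateUniqueSpriteName(userId: string): string {
  return `osborn-${userId}-${Date.now().toString(36)}`;
}

// Service registration may 503 for ~45s after creation; retry until it sticks.
async function retryUntilReady<T>(
  fn: () => Promise<T>,
  attempts = 15,
  delayMs = 3000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Because the suffix is time-derived, two creates for the same user milliseconds apart could still collide; the real implementation may add extra entropy.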
- Multi-User Auth: Google + GitHub OAuth via Supabase SSR. Dashboard with recent chats, settings, agent health. Auto-connect on login.
- File Attachments: Upload images/files to Supabase Storage (`osborn-storage` bucket). Images render inline via `MessageContent`; files render as download cards.
- Setup Wizard: 6-step in-dashboard wizard generates `agent/.env` and `frontend/.env.local` for first-run local users.
- Files Explorer Modal: Full-screen file viewer with type badges, copy/copy-all, and inline rendering of plans, diagrams, notes, and HTML.
- Mobile-First UI: Responsive dark theme (amber/charcoal), hamburger menu drawer, compact visualizer, sheet-style permission modals.
- Permission Modal: Git-style diff viewer (`diff` + `diff2html`) with line numbers, addition/deletion counts, expand/collapse, and dismissable auth errors.
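The persistent-session feature above hinges on one pattern: a push queue exposed as an `AsyncIterable`, so a single long-lived `query()` call keeps consuming user turns instead of replaying JSONL per message. A minimal sketch of that generic pattern (not Osborn's actual class):

```typescript
// A push-based queue that can be handed to any consumer expecting an
// AsyncIterable. push() delivers to a waiting consumer immediately or
// buffers; close() terminates the iteration.
class PushQueue<T> implements AsyncIterable<T> {
  private buffer: T[] = [];
  private resolvers: Array<(r: IteratorResult<T>) => void> = [];
  private closed = false;

  push(item: T): void {
    const resolve = this.resolvers.shift();
    if (resolve) resolve({ value: item, done: false });
    else this.buffer.push(item);
  }

  close(): void {
    this.closed = true;
    for (const resolve of this.resolvers.splice(0)) {
      resolve({ value: undefined as unknown as T, done: true });
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: (): Promise<IteratorResult<T>> => {
        if (this.buffer.length > 0) {
          return Promise.resolve({ value: this.buffer.shift()!, done: false });
        }
        if (this.closed) {
          return Promise.resolve({ value: undefined as unknown as T, done: true });
        }
        return new Promise((resolve) => this.resolvers.push(resolve));
      },
    };
  }
}
```

In Osborn's terms, the queue would be handed to the SDK once at session start (as the `AsyncIterable<SDKUserMessage>` prompt), and each subsequent voice turn becomes a `push()` into the same live subprocess.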
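The self-healing CWD fallback chain in the feature list reduces to a few lines. A sketch under the stated priority order (the function name is illustrative):

```typescript
import { existsSync } from "node:fs";

// Walk candidate working directories in priority order and return the first
// that actually exists on disk. This avoids child_process.spawn failing with
// ENOENT on a stale cwd, which surfaces as a misleading "executable not
// found" error.
function resolveWorkingDirectory(candidates: Array<string | undefined>): string {
  for (const dir of candidates) {
    if (dir && existsSync(dir)) return dir;
  }
  // process.cwd() always exists for a live process, so it is a safe default.
  return process.cwd();
}

// Priority order from the feature list:
// resolveWorkingDirectory([process.env.OSBORN_CWD, config.workingDirectory, process.cwd()])
```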
Research Mode
The agent operates in a single research mode. It reads code, searches the web, runs commands, fetches YouTube transcripts, and saves findings to a session workspace co-located with Claude's native JSONL files (~/.claude/projects/{slug}/osb/{sessionId}/). Write operations are restricted to the workspace directory for safety. Research artifacts appear in the always-visible Files panel.
Architecture
Frontend (Next.js 14) <--> LiveKit Cloud <--> Agent (local or cloud sandbox)
├── ClaudeLLM (persistent SDK session)
│   ├── researcher sub-agent (Sonnet, read-only)
│   ├── reasoner sub-agent (Opus, planning)
│   └── writer sub-agent (Sonnet, verify-first)
├── Fast Brain (Gemini Flash primary,
│   Anthropic Haiku fallback, 12 JSONL tools)
├── Pipeline Fast Brain (Gemini Flash AFC observer
│   with search_session, get_recent, emergency_stop)
├── OpenAI/Gemini Realtime (voice)
├── Recall.ai (meeting bot integration)
├── Smithery cloud MCP proxy
└── Self-hosted Daytona sandboxes (per-user)
Quick Start
Option 1: Using Hosted Frontend
# Install and run the agent
npx osborn
# Copy the room code shown (e.g., "abc123")
# Visit https://osborn.app
# Enter the room code and click Join
Option 2: Local Development
- Clone and install:
git clone https://github.com/Erriccc/osborn.git
cd osborn
cd agent && npm install
cd ../frontend && npm install
- Configure environment variables:
agent/.env:
LIVEKIT_URL=wss://your-livekit-url
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
ANTHROPIC_API_KEY=your-anthropic-key # Or rely on the Claude OAuth flow
GOOGLE_API_KEY=your-google-key # Recommended: Gemini Flash is the fast-brain primary
OPENAI_API_KEY=your-openai-key # At least one of OpenAI or Google required
DEEPGRAM_API_KEY=your-deepgram-key # STT for direct/pipeline modes
# Optional:
RECALL_API_KEY=your-recall-key # Zoom / Google Meet bot integration
SMITHERY_API_KEY=your-smithery-key # Cloud MCP servers
LIVEKIT_REMOTE_EOT_URL=... # LiveKit Cloud remote turn detection
OSBORN_CWD=/path/to/project # Override config.workingDirectory
OSBORN_API_PORT=8741 # Auto-bumps on EADDRINUSE
frontend/.env.local:
NEXT_PUBLIC_LIVEKIT_URL=wss://your-livekit-url
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
- Run:
# Terminal 1: Agent
cd agent && npm run dev
# Terminal 2: Frontend
cd frontend && npm run dev
# Open http://localhost:3000
Configuration
Create ~/.osborn/config.yaml:
workingDirectory: /path/to/project
voiceMode: pipeline # 'direct' | 'realtime' | 'pipeline' (default)
defaultProvider: openai # or 'gemini' (for realtime mode)
realtime:
provider: openai # or 'gemini'
openaiVoice: alloy
geminiVoice: Puck
direct:
stt:
provider: deepgram
tts:
provider: deepgram
voice: aura-asteria-en
mcpServers:
github:
enabled: true
command: npx
args: ['-y', '@modelcontextprotocol/server-github']
env:
GITHUB_PERSONAL_ACCESS_TOKEN: ${GITHUB_TOKEN}
You can also configure these via the in-dashboard Setup Wizard (SetupWizard.tsx), a 6-step flow that generates agent/.env and frontend/.env.local and verifies agent health for first-run users.
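The `${GITHUB_TOKEN}` placeholder in the MCP server config above implies env-var interpolation when the config is loaded. A hypothetical helper (not Osborn's actual code) showing the idea:

```typescript
// Hypothetical sketch: expand ${VAR} placeholders in config values from the
// environment before handing them to an MCP server. Whether missing vars
// become "" or raise an error is an assumption here; they expand to "".
function expandEnvVars(
  value: string,
  env: Record<string, string | undefined> = process.env,
): string {
  return value.replace(/\$\{(\w+)\}/g, (_match, name: string) => env[name] ?? "");
}

// e.g. expandEnvVars("${GITHUB_TOKEN}") reads GITHUB_TOKEN from process.env
```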
Voice Commands
Research
- "Research how authentication works in this project"
- "Show me the data flow from API to database"
- "Create a diagram of the component architecture"
- "Write up a plan for adding OAuth"
- "Search for best practices on rate limiting"
- "What dependencies does this project use?"
- "Get the transcript for this YouTube video"
Project Structure
osborn/
├── agent/                         # LiveKit voice agent (backend)
│   ├── src/
│   │   ├── index.ts               # Main entry: room events, session creation, voice queue, HTTP API,
│   │   │                          #   killCurrentLLM, self-healing CWD chain, executeResearch
│   │   ├── claude-llm.ts          # ClaudeLLM persistent session, three sub-agents,
│   │   │                          #   agent_type-aware PreToolUse hook, canUseTool gating
│   │   ├── pipeline-direct-llm.ts # PipelineDirectLLM wraps ClaudeLLM + interruption enrichment
│   │   ├── pipeline-fastbrain.ts  # Gemini Flash AFC observer with search_session/get_recent/emergency_stop
│   │   ├── fast-brain.ts          # askFastBrain orchestrator, 12 JSONL tools, spec consolidation
│   │   ├── session-access.ts      # JSONL session reader (~25 functions, full untruncated)
│   │   ├── jsonl-search.ts        # ripgrep + BM25 search over JSONL (bundled @vscode/ripgrep)
│   │   ├── summary-index.ts       # Compact summary index with byte-offset reads
│   │   ├── prompts.ts             # Centralized prompts (~15 exports)
│   │   ├── config.ts              # Config, sessions, workspace helpers, MCP catalog
│   │   ├── recall-client.ts       # Recall.ai meeting bot integration
│   │   ├── claude-auth.ts         # Claude OAuth flow (env → file → CLI → pty setup-token)
│   │   ├── smithery-proxy.ts      # Smithery cloud MCP proxy (bypasses SDK HTTP bug #18296)
│   │   ├── voice-io.ts            # STT/TTS/VAD/Realtime model factory
│   │   ├── turn-detector-shim.ts  # CloudTurnDetector for LiveKit Cloud (no JobContext)
│   │   ├── status-manager.ts      # Background TaskStatus tracker (singleton)
│   │   ├── codex-llm.ts           # Optional @openai/codex-sdk LLM wrapper
│   │   ├── codex-handler.ts       # Standalone Codex handler
│   │   ├── bridge-llm.ts          # Gemini/GPT-4o LiveKit LLM factory for pipelined configs
│   │   ├── claude-handler.ts      # Standalone Agent SDK handler (predates ClaudeLLM)
│   │   └── meeting-output.html    # Recall.ai bot audio output page
│   └── package.json
├── frontend/                      # Next.js 14 web frontend
│   ├── src/
│   │   ├── components/
│   │   │   ├── VoiceRoom.tsx          # Main voice UI (~2000 lines)
│   │   │   ├── MarkdownMessage.tsx    # Markdown + Mermaid renderer
│   │   │   ├── SessionBrowser.tsx     # Past session browser (all projects)
│   │   │   ├── FilesExplorerModal.tsx # Full-screen files explorer
│   │   │   ├── LogsDrawer.tsx         # Bottom drawer for debug log
│   │   │   └── SetupWizard.tsx        # 6-step env-file generator
│   │   ├── lib/
│   │   │   ├── daytona.ts         # Server-only Daytona sandbox provisioning
│   │   │   ├── setup.ts           # Pure setup utilities
│   │   │   ├── sessions.ts        # Client-safe session helpers
│   │   │   └── supabase*.ts       # Supabase client factories
│   │   ├── app/
│   │   │   ├── api/token/route.ts     # LiveKit JWT token generation
│   │   │   ├── api/instance/route.ts  # User instance CRUD
│   │   │   ├── api/sandbox/route.ts   # Daytona sandbox CRUD + keepalive
│   │   │   ├── dashboard/page.tsx     # Recent chats, settings, agent health
│   │   │   ├── chat/page.tsx          # Voice chat wrapper
│   │   │   └── page.tsx               # Landing page (OAuth + guest)
│   │   └── middleware.ts          # Supabase auth middleware
│   └── package.json
├── CLAUDE.md                      # AI coding assistant guidance
├── PROGRESS.md                    # Current feature status
├── CHANGELOG.md                   # Version history
├── DAYTONA-SETUP.md               # Self-hosted Daytona deployment notes
└── README.md
Current Status (v0.8.6)
| Component | Status |
|---|---|
| Voice Interface (LiveKit Agents 1.2.x) | Working |
| Persistent ClaudeLLM session (no per-message JSONL replay) | Working |
| Multi-Agent Orchestration (researcher/reasoner/writer) | Working |
| Pipeline Mode (Claude + Gemini fast brain observer) | Working |
| OpenAI Realtime | Working |
| Gemini Live (with auto-recovery) | Working |
| Direct Mode (STT + Claude + TTS) | Working |
| Claude Agent SDK v0.2.91 | Working |
| Permission System (agent_type-aware PreToolUse + canUseTool) | Working |
| spec.md write block (fast brain owns it) | Working |
| Skill auto-install for writer agent | Working |
| Session Management (all-projects scanner) | Working |
| Research Artifacts (files panel) | Working |
| Recall.ai Meeting Integration | Working |
| Non-blocking research (SDK-managed queuing) | Working |
| Parallel sub-agents (Task tool) | Working |
| Fast Brain (Gemini Flash primary, 12 JSONL tools) | Working |
| Pipeline Fast Brain (Gemini AFC + emergency_stop) | Working |
| Summary Index (byte-offset reads, <5ms search) | Working |
| Ripgrep + BM25 search over JSONL | Working |
| JSONL Session Access (full untruncated data) | Working |
| Post-Research JSONL Consolidation | Working |
| Visual Documents (Mermaid, comparison, analysis) | Working |
| Proactive Conversational Loop | Working |
| Claude OAuth flow (pty) | Working |
| Files Panel + Files Explorer Modal | Working |
| MCP Integration (Smithery cloud proxy) | Working |
| Cloud Sandboxes (self-hosted Daytona) | Working |
| Per-User Claude OAuth (sandbox-side) | Working |
| waitForToolboxReady (Daytona race fix) | Working |
| autoStopInterval: 0 (Daytona disk-fill fix) | Working |
| killCurrentLLM() subprocess cleanup | Working |
| Self-healing CWD fallback chain | Working |
| LiveKit Cloud turn detector shim | Working |
| Local/Cloud mode toggle | Working |
| Setup Wizard (6-step in-dashboard) | Working |
Tech Stack
- Voice: LiveKit Agents 1.2.x + `@livekit/rtc-node` 0.13.x
- Realtime AI: OpenAI Realtime API / Gemini Live API
- Coding Agent: Claude via `@anthropic-ai/claude-agent-sdk` v0.2.91
- Sub-Agents: researcher (Sonnet), reasoner (Opus), writer (Sonnet, verify-first)
- Fast Brain (primary): Gemini Flash via `@google/genai` (~1-2s, 1M tokens)
- Fast Brain (fallback): Anthropic Haiku via `@anthropic-ai/sdk`, then Agent SDK
- Pipeline Fast Brain: Gemini Flash AFC observer with `search_session`, `get_recent`, `emergency_stop`
- Search: bundled `@vscode/ripgrep` + `minisearch` BM25
- Meeting: Recall.ai (Zoom / Google Meet bot integration)
- MCP: stdio + Smithery cloud via in-process `@smithery/api/mcp` proxy
- Frontend: Next.js 14 + React + Tailwind CSS, `react-markdown` + `mermaid` + `highlight.js`
- Auth: Supabase SSR (Google + GitHub OAuth)
- Cloud Sandboxes: Self-hosted Daytona on a Hostinger VPS via raw HTTP
- Claude OAuth: `node-pty` interactive `claude setup-token` flow for headless deployments
- STT: Deepgram Flux (semantic turn detection)
- TTS: Deepgram Aura, with OpenAI / ElevenLabs / Gemini options
- Optional: `@openai/codex-sdk` (alternative coding agent; wired but not used by default)
License
MIT
