HomeAssist
Voice-controlled personal AI infrastructure. Always listening. Low latency. Actually useful.
HomeAssist V3 is a voice-first assistant tailored to my life, workflows, and data: modular, agentic, and built to feel like a real companion/mentor (not a reactive chatbot). It doesn't just answer: it proactively surfaces what matters, delivers briefings, and nudges me at the right moment.
V3 improvements: Dramatically faster response times and proactive jump-in briefings that speak immediately on wake word detection.
Say the wake word and speak naturally; HomeAssist handles lights, music, calendar, questions, and more. The first steps on the road to building a real-life Jarvis.
What It Does
HomeAssist is a hands-free voice interface that:
- Listens continuously for a customizable wake word
- Transcribes speech in real-time with streaming ASR
- Thinks with LLMs to understand requests and generate responses
- Speaks responses via neural text-to-speech
- Controls smart home devices through an extensible tool system
- Remembers you across sessions with persistent and semantic memory
You can interrupt it mid-sentence (barge-in), ask multi-step questions, and have natural back-and-forth conversations.
Capabilities
Voice Interaction
| Feature | Description |
|---|---|
| Wake word detection | Hands-free activation with custom trigger phrases |
| Real-time transcription | Low-latency streaming speech-to-text |
| Neural TTS | Natural-sounding responses (multiple voice options) |
| Barge-in | Interrupt the assistant by speaking |
| Send phrases | Say "send it" or "sir" to submit your message |
| Auto-send | Automatic submission after silence timeout |
| Termination phrase | Say "over out" at any time to end the session instantly |
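The send/termination phrase dispatch above can be sketched as a small classifier over the live transcript. This is a minimal illustration, not the project's actual detection code; the phrase lists mirror the table, and the matching rules (suffix match after stripping punctuation) are assumptions.

```python
SEND_PHRASES = ("send it", "sir")
TERMINATION_PHRASE = "over out"

def classify_utterance(text: str) -> str:
    """Classify a transcript fragment: submit it, end the session, or keep listening."""
    t = text.lower().strip().rstrip(".!? ")
    if t.endswith(TERMINATION_PHRASE):
        return "terminate"
    if any(t.endswith(p) for p in SEND_PHRASES):
        return "send"
    return "continue"
```

In practice the termination check runs in a separate fast-detection process (see Architecture below) so "over out" can cut off TTS playback immediately.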
Smart Home & Services
| Tool | What It Controls |
|---|---|
| Lighting | Kasa smart lights: on/off, brightness, color, scenes |
| Spotify | Music playback, search, queue management |
| Calendar | Google Calendar: view events, create appointments |
| Weather | Current conditions and forecasts |
| Web Search | Google search with AI-summarized results |
| SMS | Send text messages via macOS Messages |
| Notifications | Email summaries, news digests, custom alerts |
| System Info | Explain the assistant's architecture and capabilities |
| Cursor | Local editor automation helpers |
Memory & Context
| System | Purpose |
|---|---|
| Conversation context | Maintains history within a session |
| Persistent memory | Remembers facts, preferences, and patterns about you |
| Vector memory | Semantic search over past conversations |
| Briefing announcements | Proactive updates spoken on wake word |
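The vector memory row above boils down to semantic search: embed past conversation snippets, then rank them by cosine similarity against the current query's embedding. A minimal sketch (the real system stores embeddings in Supabase; the function names here are illustrative):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, memory, k=3):
    # memory: list of (snippet_text, embedding) pairs.
    ranked = sorted(memory, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The top-k snippets are then injected into the LLM prompt alongside persistent facts, giving the assistant recall across sessions.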
Tool Chaining
Multi-step requests are handled automatically:
"Find that song we talked about and text me the link"
The assistant recognizes this needs multiple tools (search → SMS), executes them in sequence, and confirms completion.
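The chaining loop can be sketched as: ask a planner model for the next tool call, execute it, feed the result back, and repeat until the planner says it is done. This is a simplified illustration of the pattern, not the project's orchestrator; `llm_plan` stands in for a call to the tool-chaining model (gpt-4o-mini by default, per the LLM Strategy section).

```python
def run_tool_chain(request, llm_plan, tools, max_steps=5):
    """Iteratively execute tool calls until the planner reports completion.

    llm_plan(request, history) -> {"tool": name, "args": {...}} or {"done": summary}
    tools: mapping of tool name -> callable
    """
    history = []
    for _ in range(max_steps):
        step = llm_plan(request, history)
        if "done" in step:
            return step["done"]          # planner composed the final answer
        result = tools[step["tool"]](**step["args"])
        history.append((step["tool"], result))  # feed result back to the planner
    return "Stopped after max_steps without completion."
```

The `max_steps` cap bounds cost and prevents a confused planner from looping forever.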
Architecture
Design Philosophy
- Modular providers: each component (transcription, TTS, wake word, LLM) is swappable via configuration
- State machine coordination: audio components are orchestrated through explicit state transitions to prevent conflicts
- Process isolation: wake word + fast termination detection run in separate processes to prevent model corruption
- Tool abstraction: smart home control via MCP (Model Context Protocol) for clean separation
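The state machine coordination point can be sketched as an explicit transition table: each audio component may only move between whitelisted states, so the microphone, TTS, and LLM never fight over the audio device. The state names and transitions below are illustrative; see `assistant_framework/utils/state_machine.py` for the real implementation.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()        # waiting for the wake word
    LISTENING = auto()   # streaming ASR active
    THINKING = auto()    # LLM generating a response
    SPEAKING = auto()    # TTS playing (barge-in allowed)

# Allowed transitions; barge-in is the SPEAKING -> LISTENING edge.
TRANSITIONS = {
    State.IDLE: {State.LISTENING},
    State.LISTENING: {State.THINKING, State.IDLE},
    State.THINKING: {State.SPEAKING, State.IDLE},
    State.SPEAKING: {State.LISTENING, State.IDLE},
}

def can_transition(src: State, dst: State) -> bool:
    return dst in TRANSITIONS[src]
```

Rejecting illegal transitions (e.g. IDLE straight to SPEAKING) is what keeps the audio lifecycle conflict-free.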
System Flow
┌─────────────────┐    Wake word   ┌─────────────────┐
│   Microphone    │───────────────▶│  Transcription  │
│  (always on)    │                │   (streaming)   │
└─────────────────┘                └────────┬────────┘
                                            │
                                            ▼
┌─────────────────┐    Response    ┌─────────────────┐
│   TTS Speaker   │◀───────────────│   LLM + Tools   │
│ (neural voice)  │                │ (orchestrator)  │
└─────────────────┘                └────────┬────────┘
                                            │
                                            ▼
                                   ┌─────────────────┐
                                   │   MCP Server    │
                                   │ (tool actions)  │
                                   └─────────────────┘
Component Overview
| Component | Provider Options | Default |
|---|---|---|
| Wake Word | OpenWakeWord | hey_honey_v2 |
| Transcription | AssemblyAI, OpenAI Whisper | AssemblyAI |
| Response/LLM | OpenAI Realtime API (WebSocket) | gpt-realtime |
| TTS | Piper, macOS, Google Cloud, Chatterbox | Piper |
| Tools | MCP Server | HTTP on localhost:3000 |
LLM Strategy
| Scenario | Model | Rationale |
|---|---|---|
| Direct conversation (streaming) | gpt-realtime | Low-latency, natural streaming dialogue |
| Tool decisions (in-session) | gpt-realtime | Lets the realtime model decide and call MCP tools directly |
| Tool chaining / iterative tool execution | gpt-4o-mini | Fast, cost-effective multi-step tool planning + execution loops |
| Final answer (after tools) | gpt-4o-mini | Compose the final response from tool results |
Tool Calling Subagent API
The Tool-Calling Mini inference server (https://inference.stuart-labs.com) exposes a fine-tuned Qwen3 model specialized in tool-call routing for the same 12 tools used by HomeAssist. It accepts natural-language messages via POST /v1/chat/completions and returns structured tool_calls with extracted arguments, handling tool selection and parameter extraction on-device rather than through OpenAI.
This provides an alternative delegation path for the tool subagent (tool_subagent_model in config): tool-call inference can be routed to the local model instead of gpt-4o-mini. Auth uses rotating HMAC-SHA256 API keys (5-minute TTL) derived from a shared refresh token.
Full API spec: TOOL_CALLING_MINI_API.md
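A client for this path might look like the sketch below. The key-derivation scheme (HMAC-SHA256 over the current 5-minute time window, keyed by the refresh token), the `Authorization` header format, and the model id are all assumptions for illustration; consult TOOL_CALLING_MINI_API.md for the real contract.

```python
import hashlib
import hmac
import json
import time
import urllib.request

def derive_api_key(refresh_token: str, window_s: int = 300) -> str:
    # Hypothetical derivation: HMAC-SHA256 of the current 5-minute window
    # index, keyed by the shared refresh token. The real scheme may differ.
    window = str(int(time.time() // window_s))
    return hmac.new(refresh_token.encode(), window.encode(), hashlib.sha256).hexdigest()

def route_tool_call(message: str, refresh_token: str):
    """Send one user message and return any structured tool_calls."""
    body = json.dumps({
        "model": "qwen3-tool-caller",  # assumed model id
        "messages": [{"role": "user", "content": message}],
    }).encode()
    req = urllib.request.Request(
        "https://inference.stuart-labs.com/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {derive_api_key(refresh_token)}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())
    return reply["choices"][0]["message"].get("tool_calls", [])
```

Because keys rotate every five minutes, the client re-derives the key per request instead of caching it.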
Project Structure
HomeAssist/
├── assistant_framework/         # Core voice assistant
│   ├── orchestrator.py          # Main coordination logic
│   ├── config.py                # All configuration
│   ├── providers/               # Pluggable implementations
│   │   ├── transcription/       # Speech-to-text
│   │   ├── response/            # LLM responses
│   │   ├── tts/                 # Text-to-speech
│   │   ├── wakeword/            # Wake word detection
│   │   ├── termination/         # Fast termination detection ("over out")
│   │   └── context/             # Conversation history
│   ├── utils/                   # Shared utilities
│   │   ├── state_machine.py     # Audio lifecycle management
│   │   ├── barge_in.py          # Interrupt detection
│   │   ├── briefing_manager.py
│   │   └── briefing_processor.py
│   └── interfaces/              # Abstract base classes
│       └── termination.py       # TerminationInterface
│
├── mcp_server/                  # Tool server
│   ├── server.py                # MCP entry point
│   ├── tools/                   # Tool implementations
│   │   ├── kasa_lighting.py
│   │   ├── spotify.py
│   │   ├── calendar.py
│   │   ├── weather.py
│   │   └── ...
│   └── tools_config.py          # Enable/disable tools
│
├── scripts/scheduled/           # Background jobs
│   ├── email_summarizer/        # Email digest pipeline
│   ├── news_summary/            # News summary pipeline
│   └── calendar_briefing/       # Calendar reminder announcements
│
├── audio_data/                  # Model files
│   ├── wake_word_models/
│   └── piper_models/
│
└── state_management/            # Runtime state
    ├── persistent_memory.json
    └── conversation_summary.json
Quick Start
1. Configure environment
cp env.example .env
# Edit .env with your API keys
Required keys:
- `OPENAI_API_KEY`: LLM responses
- `ASSEMBLYAI_API_KEY`: Transcription
- `SUPABASE_URL` + `SUPABASE_KEY`: Memory storage
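A quick way to catch a missing key before launch is a small startup check; this is a sketch, not part of the project:

```python
import os

# The required keys listed above.
REQUIRED = ["OPENAI_API_KEY", "ASSEMBLYAI_API_KEY", "SUPABASE_URL", "SUPABASE_KEY"]

def missing_keys(env=os.environ):
    """Return the names of required keys that are unset or empty."""
    return [k for k in REQUIRED if not env.get(k)]
```

Running this before starting the assistant gives a clear error instead of a mid-session API failure.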
2. Install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
3. Run
python3 -m assistant_framework.main continuous
4. Interact
- Say "Hey Honey" (or your configured wake word)
- Speak your request
- Say "send it" or wait for auto-send
- Listen to the response (interrupt anytime by speaking)
- Say "over out" to end immediately (including during speech, if fast termination is enabled)
Configuration
All settings are in assistant_framework/config.py. Key sections:
| Section | Controls |
|---|---|
| Provider Selection | Which implementation for each component |
| Wake Word | Trigger phrases, sensitivity, multiple wake words |
| Termination | Fast termination detection config (parallel "over out" detection) |
| Transcription | ASR provider settings |
| Response/LLM | Model selection, temperature, system prompt |
| TTS | Voice selection, speed, chunking |
| Barge-In | Interrupt sensitivity |
| Memory | Persistent facts, vector search settings |
| Briefing Processor | Proactive announcement generation |
See SETUP.md for detailed configuration reference.
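To give a feel for the shape of these sections, here is a hypothetical fragment of `assistant_framework/config.py`. The setting names and values are illustrative assumptions (only `WAKEWORD_CONFIG` is named elsewhere in this README); the real file may be structured differently.

```python
# Hypothetical config fragment; actual names may differ.

PROVIDERS = {
    "transcription": "assemblyai",    # or "whisper"
    "response": "openai_realtime",
    "tts": "piper",                   # or "macos", "google", "chatterbox"
    "wakeword": "openwakeword",
}

WAKEWORD_CONFIG = {
    "models": ["hey_honey_v2"],
    "threshold": 0.5,                 # assumed sensitivity value
}

TERMINATION_CONFIG = {
    "phrase": "over out",
    "parallel_process": True,         # run detection in a separate process
}
```

Swapping a provider is then a one-line change in `PROVIDERS`, with no code edits elsewhere.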
Extending
Adding a new tool
1. Create `mcp_server/tools/your_tool.py` implementing the tool interface
2. Register it in `mcp_server/tools_config.py`
3. The assistant automatically discovers the tool and can use it
Adding a new provider
1. Implement the appropriate interface in `assistant_framework/interfaces/`
2. Create a provider class in `assistant_framework/providers/`
3. Register it in `assistant_framework/factory.py`
4. Select it via config
Custom wake word
1. Train a model using OpenWakeWord
2. Place the `.onnx` file in `audio_data/wake_word_models/`
3. Update `WAKEWORD_CONFIG` in config
Requirements
- Python 3.10+
- macOS or Linux with audio support
- Microphone and speakers
- API keys for cloud services (OpenAI, AssemblyAI, etc.)
License
MIT License. See the LICENSE file for details.
