HomeAssist
Voice-controlled personal AI infrastructure. Always listening. Low latency. Actually useful.
HomeAssist V3 is a voice-first assistant tailored to my life, workflows, and data: modular, agentic, and built to feel like a real companion/mentor (not a reactive chatbot). It doesn't just answer: it proactively surfaces what matters, delivers briefings, and nudges me at the right moment.
V3 improvements: Dramatically faster response times and proactive jump-in briefings that speak immediately on wake word detection.
Say the wake word and speak naturally; HomeAssist handles lights, music, calendar, questions, and more. The first steps on the road to building a real-life Jarvis.
What It Does
HomeAssist is a hands-free voice interface that:
- Listens continuously for a customizable wake word
- Transcribes speech in real-time with streaming ASR
- Thinks with LLMs to understand requests and generate responses
- Speaks responses via neural text-to-speech
- Controls smart home devices through an extensible tool system
- Remembers you across sessions with persistent and semantic memory
You can interrupt it mid-sentence (barge-in), ask multi-step questions, and have natural back-and-forth conversations.
Capabilities
Voice Interaction
| Feature | Description |
|---|---|
| Wake word detection | Hands-free activation with custom trigger phrases |
| Real-time transcription | Low-latency streaming speech-to-text |
| Neural TTS | Natural-sounding responses (multiple voice options) |
| Barge-in | Interrupt the assistant by speaking |
| Send phrases | Say "send it" or "sir" to submit your message |
| Auto-send | Automatic submission after silence timeout |
| Termination phrase | Say "over out" at any time to end the session instantly |
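The send/termination phrase dispatch above can be sketched as a small classifier over the live transcript. This is a minimal illustration, not the project's actual detection code; the phrase lists mirror the table, and the matching rules (suffix match after stripping punctuation) are assumptions.

```python
SEND_PHRASES = ("send it", "sir")
TERMINATION_PHRASE = "over out"

def classify_utterance(text: str) -> str:
    """Classify a transcript fragment: submit it, end the session, or keep listening."""
    t = text.lower().strip().rstrip(".!? ")
    if t.endswith(TERMINATION_PHRASE):
        return "terminate"
    if any(t.endswith(p) for p in SEND_PHRASES):
        return "send"
    return "continue"
```

In practice the termination check runs in a separate fast-detection process (see Architecture below) so "over out" can cut off TTS playback immediately.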
Smart Home & Services
| Tool | What It Controls |
|---|---|
| Lighting | Kasa smart lights: on/off, brightness, color, scenes |
| Spotify | Music playback, search, queue management |
| Calendar | Google Calendar: view events, create appointments |
| Weather | Current conditions and forecasts |
| Web Search | Google search with AI-summarized results |
| SMS | Send text messages via macOS Messages |
| Notifications | Email summaries, news digests, custom alerts |
| System Info | Explain the assistant's architecture and capabilities |
| Cursor | Local editor automation helpers |
Memory & Context
| System | Purpose |
|---|---|
| Conversation context | Maintains history within a session |
| Persistent memory | Remembers facts, preferences, and patterns about you |
| Vector memory | Semantic search over past conversations |
| Briefing announcements | Proactive updates spoken on wake word |
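The vector memory row above boils down to semantic search: embed past conversation snippets, then rank them by cosine similarity against the current query's embedding. A minimal sketch (the real system stores embeddings in Supabase; the function names here are illustrative):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, memory, k=3):
    # memory: list of (snippet_text, embedding) pairs.
    ranked = sorted(memory, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The top-k snippets are then injected into the LLM prompt alongside persistent facts, giving the assistant recall across sessions.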
Tool Chaining
Multi-step requests are handled automatically:
"Find that song we talked about and text me the link"
The assistant recognizes this needs multiple tools (search → SMS), executes them in sequence, and confirms completion.
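The chaining loop can be sketched as: ask a planner model for the next tool call, execute it, feed the result back, and repeat until the planner says it is done. This is a simplified illustration of the pattern, not the project's orchestrator; `llm_plan` stands in for a call to the tool-chaining model (gpt-4o-mini by default, per the LLM Strategy section).

```python
def run_tool_chain(request, llm_plan, tools, max_steps=5):
    """Iteratively execute tool calls until the planner reports completion.

    llm_plan(request, history) -> {"tool": name, "args": {...}} or {"done": summary}
    tools: mapping of tool name -> callable
    """
    history = []
    for _ in range(max_steps):
        step = llm_plan(request, history)
        if "done" in step:
            return step["done"]          # planner composed the final answer
        result = tools[step["tool"]](**step["args"])
        history.append((step["tool"], result))  # feed result back to the planner
    return "Stopped after max_steps without completion."
```

The `max_steps` cap bounds cost and prevents a confused planner from looping forever.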
Architecture
Design Philosophy
- Modular providers: each component (transcription, TTS, wake word, LLM) is swappable via configuration
- State machine coordination: audio components are orchestrated through explicit state transitions to prevent conflicts
- Process isolation: wake word + fast termination detection run in separate processes to prevent model corruption
- Tool abstraction: smart home control via MCP (Model Context Protocol) for clean separation
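The state machine coordination point can be sketched as an explicit transition table: each audio component may only move between whitelisted states, so the microphone, TTS, and LLM never fight over the audio device. The state names and transitions below are illustrative; see `assistant_framework/utils/state_machine.py` for the real implementation.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()        # waiting for the wake word
    LISTENING = auto()   # streaming ASR active
    THINKING = auto()    # LLM generating a response
    SPEAKING = auto()    # TTS playing (barge-in allowed)

# Allowed transitions; barge-in is the SPEAKING -> LISTENING edge.
TRANSITIONS = {
    State.IDLE: {State.LISTENING},
    State.LISTENING: {State.THINKING, State.IDLE},
    State.THINKING: {State.SPEAKING, State.IDLE},
    State.SPEAKING: {State.LISTENING, State.IDLE},
}

def can_transition(src: State, dst: State) -> bool:
    return dst in TRANSITIONS[src]
```

Rejecting illegal transitions (e.g. IDLE straight to SPEAKING) is what keeps the audio lifecycle conflict-free.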
System Flow
┌─────────────────┐    Wake word   ┌─────────────────┐
│   Microphone    │───────────────▶│  Transcription  │
│  (always on)    │                │   (streaming)   │
└─────────────────┘                └────────┬────────┘
                                            │
                                            ▼
┌─────────────────┐    Response    ┌─────────────────┐
│   TTS Speaker   │◀───────────────│   LLM + Tools   │
│ (neural voice)  │                │ (orchestrator)  │
└─────────────────┘                └────────┬────────┘
                                            │
                                            ▼
                                   ┌─────────────────┐
                                   │   MCP Server    │
                                   │ (tool actions)  │
                                   └─────────────────┘
Component Overview
| Component | Provider Options | Default |
|---|---|---|
| Wake Word | OpenWakeWord | hey_honey_v2 |
| Transcription | AssemblyAI, OpenAI Whisper | AssemblyAI |
| Response/LLM | OpenAI Realtime API (WebSocket) | gpt-realtime |
| TTS | Piper, macOS, Google Cloud, Chatterbox | Piper |
| Tools | MCP Server | HTTP on localhost:3000 |
LLM Strategy
| Scenario | Model | Rationale |
|---|---|---|
| Direct conversation (streaming) | gpt-realtime | Low-latency, natural streaming dialogue |
| Tool decisions (in-session) | gpt-realtime | Lets the realtime model decide and call MCP tools directly |
| Tool chaining / iterative tool execution | gpt-4o-mini | Fast, cost-effective multi-step tool planning + execution loops |
| Final answer (after tools) | gpt-4o-mini | Compose the final response from tool results |
Tool Calling Subagent API
The Tool-Calling Mini inference server (https://inference.stuart-labs.com) exposes a fine-tuned Qwen3 model specialized in tool-call routing for the same 12 tools used by HomeAssist. It accepts natural-language messages via POST /v1/chat/completions and returns structured tool_calls with extracted arguments, handling tool selection and parameter extraction on-device rather than through OpenAI.
This provides an alternative delegation path for the tool subagent (tool_subagent_model in config): tool-call inference can be routed to the local model instead of gpt-4o-mini. Auth uses rotating HMAC-SHA256 API keys (5-minute TTL) derived from a shared refresh token.
Full API spec: TOOL_CALLING_MINI_API.md
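A client for this path might look like the sketch below. The key-derivation scheme (HMAC-SHA256 over the current 5-minute time window, keyed by the refresh token), the `Authorization` header format, and the model id are all assumptions for illustration; consult TOOL_CALLING_MINI_API.md for the real contract.

```python
import hashlib
import hmac
import json
import time
import urllib.request

def derive_api_key(refresh_token: str, window_s: int = 300) -> str:
    # Hypothetical derivation: HMAC-SHA256 of the current 5-minute window
    # index, keyed by the shared refresh token. The real scheme may differ.
    window = str(int(time.time() // window_s))
    return hmac.new(refresh_token.encode(), window.encode(), hashlib.sha256).hexdigest()

def route_tool_call(message: str, refresh_token: str):
    """Send one user message and return any structured tool_calls."""
    body = json.dumps({
        "model": "qwen3-tool-caller",  # assumed model id
        "messages": [{"role": "user", "content": message}],
    }).encode()
    req = urllib.request.Request(
        "https://inference.stuart-labs.com/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {derive_api_key(refresh_token)}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())
    return reply["choices"][0]["message"].get("tool_calls", [])
```

Because keys rotate every five minutes, the client re-derives the key per request instead of caching it.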
Project Structure
HomeAssist/
├── assistant_framework/         # Core voice assistant
│   ├── orchestrator.py          # Main coordination logic
│   ├── config.py                # All configuration
│   ├── providers/               # Pluggable implementations
│   │   ├── transcription/       # Speech-to-text
│   │   ├── response/            # LLM responses
│   │   ├── tts/                 # Text-to-speech
│   │   ├── wakeword/            # Wake word detection
│   │   ├── termination/         # Fast termination detection ("over out")
│   │   └── context/             # Conversation history
│   ├── utils/                   # Shared utilities
│   │   ├── state_machine.py     # Audio lifecycle management
│   │   ├── barge_in.py          # Interrupt detection
│   │   ├── briefing_manager.py
│   │   └── briefing_processor.py
│   └── interfaces/              # Abstract base classes
│       └── termination.py       # TerminationInterface
│
├── mcp_server/                  # Tool server
│   ├── server.py                # MCP entry point
│   ├── tools/                   # Tool implementations
│   │   ├── kasa_lighting.py
│   │   ├── spotify.py
│   │   ├── calendar.py
│   │   ├── weather.py
│   │   └── ...
│   └── tools_config.py          # Enable/disable tools
│
├── scripts/scheduled/           # Background jobs
│   ├── email_summarizer/        # Email digest pipeline
│   ├── news_summary/            # News summary pipeline
│   └── calendar_briefing/       # Calendar reminder announcements
│
├── audio_data/                  # Model files
│   ├── wake_word_models/
│   └── piper_models/
│
└── state_management/            # Runtime state
    ├── persistent_memory.json
    └── conversation_summary.json
Quick Start
1. Configure environment
cp env.example .env
# Edit .env with your API keys
Required keys:
- `OPENAI_API_KEY`: LLM responses
- `ASSEMBLYAI_API_KEY`: Transcription
- `SUPABASE_URL` + `SUPABASE_KEY`: Memory storage
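A quick way to catch a missing key before launch is a small startup check; this is a sketch, not part of the project:

```python
import os

# The required keys listed above.
REQUIRED = ["OPENAI_API_KEY", "ASSEMBLYAI_API_KEY", "SUPABASE_URL", "SUPABASE_KEY"]

def missing_keys(env=os.environ):
    """Return the names of required keys that are unset or empty."""
    return [k for k in REQUIRED if not env.get(k)]
```

Running this before starting the assistant gives a clear error instead of a mid-session API failure.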
2. Install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
3. Run
python3 -m assistant_framework.main continuous
4. Interact
- Say "Hey Honey" (or your configured wake word)
- Speak your request
- Say "send it" or wait for auto-send
- Listen to the response (interrupt anytime by speaking)
- Say "over out" to end immediately (including during speech, if fast termination is enabled)
Configuration
All settings are in assistant_framework/config.py. Key sections:
| Section | Controls |
|---|---|
| Provider Selection | Which implementation for each component |
| Wake Word | Trigger phrases, sensitivity, multiple wake words |
| Termination | Fast termination detection config (parallel "over out" detection) |
| Transcription | ASR provider settings |
| Response/LLM | Model selection, temperature, system prompt |
| TTS | Voice selection, speed, chunking |
| Barge-In | Interrupt sensitivity |
| Memory | Persistent facts, vector search settings |
| Briefing Processor | Proactive announcement generation |
See SETUP.md for detailed configuration reference.
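To give a feel for the shape of these sections, here is a hypothetical fragment of `assistant_framework/config.py`. The setting names and values are illustrative assumptions (only `WAKEWORD_CONFIG` is named elsewhere in this README); the real file may be structured differently.

```python
# Hypothetical config fragment; actual names may differ.

PROVIDERS = {
    "transcription": "assemblyai",    # or "whisper"
    "response": "openai_realtime",
    "tts": "piper",                   # or "macos", "google", "chatterbox"
    "wakeword": "openwakeword",
}

WAKEWORD_CONFIG = {
    "models": ["hey_honey_v2"],
    "threshold": 0.5,                 # assumed sensitivity value
}

TERMINATION_CONFIG = {
    "phrase": "over out",
    "parallel_process": True,         # run detection in a separate process
}
```

Swapping a provider is then a one-line change in `PROVIDERS`, with no code edits elsewhere.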
Extending
Adding a new tool
1. Create `mcp_server/tools/your_tool.py` implementing the tool interface
2. Register it in `mcp_server/tools_config.py`
3. The assistant automatically discovers the tool and can use it
Adding a new provider
1. Implement the appropriate interface in `assistant_framework/interfaces/`
2. Create a provider class in `assistant_framework/providers/`
3. Register it in `assistant_framework/factory.py`
4. Select it via config
Custom wake word
1. Train a model using OpenWakeWord
2. Place the `.onnx` file in `audio_data/wake_word_models/`
3. Update `WAKEWORD_CONFIG` in config
Requirements
- Python 3.10+
- macOS or Linux with audio support
- Microphone and speakers
- API keys for cloud services (OpenAI, AssemblyAI, etc.)
License
MIT License. See the LICENSE file for details.
