# Voice Triage POC

Local-first voice triage POC (whisper.cpp + RAG + workflows)
Local-first voice triage scaffold for a dev PC. You can run either:

- a local web app (`voice_triage web`) with start/stop mic controls and chat-style transcript/response display
- a terminal demo (`voice_triage demo`) for quick CLI checks
## What is implemented

- Python 3.11+ project managed with `uv`
- Local web server + UI: `uv run voice_triage web`
- REST API server (UI-independent): `uv run voice_triage api`
- Browser microphone capture with Start/Stop controls (no push-to-talk prompt)
- Per-turn ASR using local `whisper.cpp`
- Multi-turn conversation engine with routing and move-home state progression
- Heuristic extractor (intent + UK postcode + basic fields)
- Minimal RAG with a SQLite-backed chunk index from `./kb`
- Optional BYO inference backend via REST (`VOICE_TRIAGE_INFERENCE_BACKEND=byo`)
- Workflow handlers (`move_home`, `electoral_register`, `council_tax` stubs)
- SQLite persistence for turns (`./data/voice_triage.db`)
- Tests with `pytest`
- Lint/type tooling (`ruff`, `mypy`) + pre-commit + GitHub Actions CI
- Developer references for REST API, BYO inference, telephony, and MCP under `docs/`
## Quickstart

- Install Python 3.11+ and `uv`.
- Create and activate a project-local virtual environment.

  PowerShell:

  ```powershell
  uv venv .venv
  .\.venv\Scripts\Activate.ps1
  ```

  Bash:

  ```bash
  uv venv .venv
  source .venv/bin/activate
  ```

- Sync dependencies into `.venv`:

  ```
  uv sync --dev
  ```
- Bake runtime paths into `.venv/.env` (recommended):

  ```powershell
  .\scripts\configure_venv_env.ps1 `
    -WhisperBin ".venv/tools/whispercpp/Release/whisper-cli.exe" `
    -WhisperModel ".venv/tools/whispercpp/models/ggml-base.en.bin" `
    -WhisperUseGpu $true `
    -WhisperGpuLayers 60 `
    -WhisperThreads 6 `
    -WhisperTimeoutSeconds 45 `
    -InferenceBackend "local" `
    -ByoInferenceUrl "" `
    -ByoApiStyle "generic" `
    -ByoModel "" `
    -ByoApiKey "" `
    -ByoSystemPrompt "" `
    -PiperBin ".venv/Scripts/piper.exe" `
    -PiperModel ".venv/tools/piper/models/en_GB-alba-medium.onnx" `
    -PiperDefaultVoiceId "en_GB-alba-medium" `
    -PiperTimeoutSeconds 30 `
    -WebHost "0.0.0.0" `
    -WebPort 8443 `
    -SslCertFile ".venv/certs/dev-cert.pem" `
    -SslKeyFile ".venv/certs/dev-key.pem"
  ```

  This writes `.venv/.env` (project-local), and the app auto-loads it.
- Generate TLS certs (required for LAN browser mic access):

  ```powershell
  .\scripts\generate_dev_tls_cert.ps1
  ```

  By default this includes localhost, 127.0.0.1, and detected LAN IPv4 addresses.
- (Optional) Keep tools inside `.venv` for a fully self-contained local runtime:
  - whisper.cpp binary: `.venv/tools/whispercpp/whisper-cli.exe` (or `.venv/tools/whispercpp/main.exe`)
  - whisper model: `.venv/tools/whispercpp/models/ggml-base.en.bin`
  - piper binary: `.venv/tools/piper/piper.exe`
  - piper model: `.venv/tools/piper/models/en_GB-northern_english_male-medium.onnx`
  - any additional `*.onnx` files in `.venv/tools/piper/models` appear in the web voice dropdown
  - the default web voice can be set with `PIPER_DEFAULT_VOICE_ID` (defaults to `en_GB-alba-medium`)
- (Optional) Add local knowledge base files under `./kb/*.md` or `./kb/*.txt`.

  Reindex the knowledge base on demand (no restart required):

  ```
  uv run voice_triage reindex
  ```
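The project's actual index format is not shown in this README. As a rough sketch of what a SQLite-backed chunk index over `./kb` can look like (the schema, fixed-size chunking, and keyword scoring below are all assumptions for illustration):

```python
import sqlite3
from pathlib import Path

def build_index(kb_dir: str, db_path: str, chunk_size: int = 500) -> None:
    """Split each kb file into fixed-size text chunks and store them in SQLite."""
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS chunks")
    con.execute("CREATE TABLE chunks (source TEXT, body TEXT)")
    for path in sorted(Path(kb_dir).glob("*.md")) + sorted(Path(kb_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Fixed-size character chunks; a real indexer may split on headings instead.
        for i in range(0, len(text), chunk_size):
            con.execute(
                "INSERT INTO chunks VALUES (?, ?)", (path.name, text[i : i + chunk_size])
            )
    con.commit()
    con.close()

def search(db_path: str, query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Naive keyword search: rank chunks by how many query words they contain."""
    words = [w.lower() for w in query.split()]
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT source, body FROM chunks").fetchall()
    con.close()
    scored = sorted(
        rows, key=lambda r: sum(w in r[1].lower() for w in words), reverse=True
    )
    return scored[:limit]
```

Rebuilding the table from scratch on each run is what makes an on-demand `reindex` command safe to call repeatedly.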
## Run the local website

Start the server:

```
uv run voice_triage web --host 127.0.0.1 --port 8000
```

Then open:

```
http://127.0.0.1:8000
```

The web UI now uses the versioned REST surface under `/api/v1/*`.

For LAN access:

```
uv run voice_triage web --host 0.0.0.0 --port 8443 --ssl-certfile .venv/certs/dev-cert.pem --ssl-keyfile .venv/certs/dev-key.pem
```

Then browse from another device on your LAN:

```
https://<your-pc-lan-ip>:8443
```
## Run the REST API only

```
uv run voice_triage api --host 127.0.0.1 --port 8000 --no-ssl
```

Useful endpoints:

- `POST /api/v1/session`
- `POST /api/v1/session/{session_id}/turn` (audio upload)
- `POST /api/v1/session/{session_id}/turn/text` (pre-transcribed text)
- `GET /api/v1/voices`
- `GET /api/v1/config` (client runtime config, including VAD tuning)
- `POST /api/v1/reindex` (rebuild RAG index from the current `./kb`)
- `POST /api/v1/session/{session_id}/voice`
- `GET /api/v1/tts/{audio_id}`
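With the API server running, the session and text-turn endpoints can be exercised from stdlib Python roughly like this. The JSON field names (`session_id`, `text`) are assumptions for illustration; check `docs/API_REFERENCE.md` for the actual schema:

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8000"  # started with: uv run voice_triage api ... --no-ssl

def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request against the local API."""
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_post(path, payload)) as resp:
        return json.loads(resp.read())

# Example flow (requires the server to be running; field names are assumptions):
# session = post_json("/api/v1/session", {})
# turn = post_json(f"/api/v1/session/{session['session_id']}/turn/text",
#                  {"text": "I want to update my address"})
```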
Important for browser microphone access:

- Many browsers block `getUserMedia` on plain `http://<lan-ip>`. `localhost` is usually allowed without HTTPS, but LAN IPs generally require HTTPS.
- For LAN clients, use HTTPS with a cert that includes your LAN IP in its SANs.
Web flow:

- Click `Start Listening`
- Speak naturally
- The browser VAD auto-detects end-of-turn silence and sends the turn
- See:
  - user transcript (what ASR heard)
  - assistant response
  - assistant audio playback (Piper)
  - selectable Piper voice (dropdown)
- The app keeps listening for the next turn; click `Stop Listening` to end hands-free mode
## CLI demo (legacy)

```
uv run voice_triage demo
```
## Optional MCP server

Run the MCP stdio server:

```
uv run voice_triage mcp
```

MCP also exposes `reindex_kb` for on-demand knowledge base refresh.

If the MCP SDK is missing, install the optional dependency:

```
uv add --optional mcp "mcp>=1.0.0"
```
## Common commands

```
make sync
make format
make lint
make typecheck
make docstrings
make test
make telephony-contract
make telephony-smoke-local
# with a custom base URL:
# make telephony-smoke-remote BASE_URL=https://your-public-host
make demo
make web
make api
make web-lan
make stop-web
make cert-dev
make web-ssl
```
## Developer references

- REST API reference: `docs/API_REFERENCE.md`
- BYO inference reference: `docs/BYO_INFERENCE_REFERENCE.md`
- Telephony provider how-tos: `docs/TELEPHONY_REFERENCE.md`
- Telephony guide index: `docs/telephony/README.md`
- Telephony integration status: `docs/TELEPHONY_INTEGRATIONS.md`
- MCP reference: `docs/MCP_REFERENCE.md`
## Audio prerequisites (brief)

- Ubuntu/Debian: `sudo apt-get install -y portaudio19-dev`
- macOS: `brew install portaudio`
- Windows: no extra package is usually needed when using standard Python wheels.
## CUDA acceleration for whisper.cpp

For faster STT turn latency on NVIDIA GPUs, use a CUDA-enabled whisper.cpp build and set:

```
WHISPERCPP_USE_GPU=1
WHISPERCPP_GPU_LAYERS=60
WHISPERCPP_THREADS=6
WHISPERCPP_TIMEOUT_SECONDS=45
```
Notes:

- Increase `WHISPERCPP_GPU_LAYERS` toward `99` if VRAM allows.
- If your build fails with GPU flags, set `WHISPERCPP_USE_GPU=0`.
- You can pass additional whisper CLI flags via `WHISPERCPP_EXTRA_ARGS` in `.venv/.env`.
## BYO inference backend

Set:

```
VOICE_TRIAGE_INFERENCE_BACKEND=byo
VOICE_TRIAGE_BYO_INFERENCE_URL=http://localhost:9000/infer
VOICE_TRIAGE_BYO_INFERENCE_TIMEOUT_SECONDS=12
VOICE_TRIAGE_BYO_API_STYLE=generic
```

Expected BYO endpoint request payload:

```json
{"query": "How do I order a garden waste bin?"}
```

Expected response payload:

```json
{"answer": "...", "metadata": {"provider": "my-model"}}
```
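A BYO endpoint matching the generic contract above can be stubbed with stdlib Python for local testing. The echo "answer" is a placeholder, not a real model:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferHandler(BaseHTTPRequestHandler):
    """Stub of the generic BYO contract:
    request {"query": ...} -> response {"answer": ..., "metadata": {...}}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Placeholder "model": echo the query back as the answer.
        body = json.dumps(
            {"answer": f"You asked: {payload['query']}",
             "metadata": {"provider": "stub"}}
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep console output quiet
        pass

# To serve on the URL configured above:
# HTTPServer(("127.0.0.1", 9000), InferHandler).serve_forever()
```

Point `VOICE_TRIAGE_BYO_INFERENCE_URL=http://localhost:9000/infer` at it to verify the request/response plumbing before swapping in a real backend.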
Ollama (OpenAI-compatible) example:

```
VOICE_TRIAGE_INFERENCE_BACKEND=byo
VOICE_TRIAGE_BYO_API_STYLE=openai
VOICE_TRIAGE_BYO_INFERENCE_URL=http://127.0.0.1:11434/v1/chat/completions
VOICE_TRIAGE_BYO_MODEL=llama3.1:8b
VOICE_TRIAGE_BYO_INFERENCE_TIMEOUT_SECONDS=20
VOICE_TRIAGE_BYO_API_KEY=
```

When the BYO backend is unavailable, the local RAG fallback is used automatically.
## Notes

- If `./kb` has no indexed chunks, RAG answers with a clear fallback message asking the user to rephrase or ask about supported council service topics.
- TTS normalizes UK currency values for more natural speech (for example, `£33.50` becomes "33 pounds and 50 pence").
- Browser mic recording requires microphone permission for `localhost`.
- If Piper is not configured, text responses still work and the UI shows a TTS error message.
- If LAN clients cannot connect, allow inbound TCP `8443` in Windows Firewall.
- If Ctrl+C fails in one terminal, run `uv run voice_triage stop-web` from another terminal.
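The currency normalization described in the notes can be sketched as follows (the function name and exact wording rules are assumptions; the project's normalizer may cover more cases, such as singular "1 pound"):

```python
import re

def normalize_currency(text: str) -> str:
    """Rewrite UK currency amounts like £33.50 into speakable words for TTS."""
    def speak(m: re.Match) -> str:
        pounds, pence = int(m.group(1)), m.group(2)
        if pence is None or int(pence) == 0:
            return f"{pounds} pounds"
        return f"{pounds} pounds and {int(pence)} pence"
    # Match "£<pounds>" with an optional two-digit pence part.
    return re.sub(r"£(\d+)(?:\.(\d{2}))?", speak, text)
```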
