Glm Asr
All-in-One Speech Recognition Service based on GLM-ASR-Nano | Web UI • REST API • MCP Server • Long Audio Support
Installation
npx glm-asr
GLM-ASR
All-in-One Speech Recognition Service based on GLM-ASR-Nano
Web UI • REST API • SSE Streaming • Swagger Docs
Features
- High Accuracy - Based on GLM-ASR-Nano-2512 (1.5B), outperforms Whisper V3
- 17 Languages - Chinese, English, Cantonese, Japanese, Korean, and more
- Long Audio - VAD smart segmentation for unlimited audio length
- SSE Streaming - Real-time progress and results for long audio
- Web UI - Modern dark-mode interface with 4 language support
- REST API - Full API with Swagger documentation
- GPU Management - Manual load/unload for memory control
- Docker Ready - One-command deployment with pre-loaded model
Quick Start
Docker (Recommended)
```bash
docker run -d --gpus all -p 7860:7860 neosun/glm-asr:v2.0.1
```
Access:
- Web UI: http://localhost:7860
- Swagger Docs: http://localhost:7860/docs
- ReDoc: http://localhost:7860/redoc
Docker Compose
```bash
git clone https://github.com/neosun100/glm-asr.git
cd glm-asr
docker compose up -d
```
API Reference
Base URL
http://localhost:7860
Endpoints
Health Check
GET /health
```json
{"status": "ok", "model_loaded": true}
```
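Because the container may take a while before the model is ready, a client can poll `/health` before sending audio. The following is a minimal sketch using only the Python standard library; the helper names `fetch_health` and `wait_until_ready`, the retry count, and the delay are illustrative assumptions, not part of the service.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:7860"  # base URL used throughout this README


def fetch_health(base_url: str = BASE_URL) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(fetch: Callable[[], dict] = fetch_health,
                     attempts: int = 30, delay_s: float = 2.0) -> bool:
    """Poll until status is "ok" and model_loaded is true, or give up.

    `attempts` and `delay_s` are hypothetical defaults chosen for illustration.
    """
    for _ in range(attempts):
        try:
            health = fetch()
            if health.get("status") == "ok" and health.get("model_loaded"):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet
            # (urllib.error.URLError is a subclass of OSError).
            pass
        time.sleep(delay_s)
    return False
```

Passing the fetch function as a parameter keeps the polling logic testable without a running server.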
Transcribe (Sync) - For short audio
POST /api/transcribe
Content-Type: multipart/form-data
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | File | required | Audio file (wav/mp3/flac/m4a/ogg/webm) |
| max_new_tokens | int | 512 | Max output tokens (1-2048) |
```bash
curl -X POST http://localhost:7860/api/transcribe \
  -F "file=@audio.mp3" \
  -F "max_new_tokens=512"
```
```json
{"status": "success", "text": "Transcribed text here..."}
```
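The same request can be made from Python without third-party packages. This is a sketch under assumptions: the helper names `build_multipart` and `transcribe` are hypothetical, the file part is sent as `application/octet-stream` (the README does not say whether the server inspects the part's content type), and any response whose `status` is not `"success"` is treated as a failure.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(file_name: str, file_bytes: bytes, fields: dict) -> tuple:
    """Encode plain form fields plus one file part named "file".

    Returns (body, content_type_header).
    """
    boundary = uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        body += head.encode() + str(value).encode() + b"\r\n"
    head = (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{file_name}"\r\n'
            'Content-Type: application/octet-stream\r\n\r\n')
    body += head.encode() + file_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, max_new_tokens: int = 512,
               base_url: str = "http://localhost:7860") -> str:
    """POST an audio file to /api/transcribe and return the text."""
    if not 1 <= max_new_tokens <= 2048:  # range documented in the table above
        raise ValueError("max_new_tokens must be between 1 and 2048")
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            os.path.basename(path), f.read(), {"max_new_tokens": max_new_tokens})
    req = urllib.request.Request(
        f"{base_url}/api/transcribe", data=body,
        headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    if result.get("status") != "success":
        raise RuntimeError(f"transcription failed: {result}")
    return result["text"]
```

With the service running, `transcribe("audio.mp3")` would return the `text` field of the response.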
Transcribe (SSE Stream) - For long audio
POST /api/transcribe/stream
Content-Type: multipart/form-data
Returns Server-Sent Events with real-time progress:
| Event Type | Description | Example |
|---|---|---|
| start | Processing started | {"type": "start"} |
| progress | Segment progress | {"type": "progress", "current": 3, "total": 10, "duration": 22.5} |
| partial | Segment result | {"type": "partial", "text": "Segment text..."} |
| done | Complete | {"type": "done", "text": "Full transcription..."} |
| error | Error occurred | {"type": "error", "message": "Error details"} |
```bash
curl -X POST http://localhost:7860/api/transcribe/stream \
  -F "file=@long_audio.mp3"
```
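A client has to split the stream into events and react to each `type`. The README does not show the wire format, so this sketch assumes standard SSE framing: each event arrives as one or more `data:` lines carrying the JSON object from the table, terminated by a blank line. The function names `iter_sse_events` and `collect_transcript` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE event.

    Assumes "data: <json>" lines with a blank line ending each event.
    """
    data_lines = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            # SSE strips a single optional space after the colon.
            data_lines.append(line[5:].removeprefix(" "))
        elif line == "" and data_lines:
            yield json.loads("\n".join(data_lines))
            data_lines = []
        # Other fields ("event:", "id:", comment lines) are ignored here.


def collect_transcript(events: Iterable[dict]) -> str:
    """Report progress and return the final text from the "done" event."""
    for event in events:
        kind = event.get("type")
        if kind == "progress":
            print(f'segment {event["current"]}/{event["total"]}')
        elif kind == "done":
            return event["text"]
        elif kind == "error":
            raise RuntimeError(event["message"])
    raise RuntimeError("stream ended without a done event")
```

The response object returned by `urllib.request.urlopen` iterates as byte lines, so it can be passed to `iter_sse_events` directly once the multipart request has been sent.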
GPU Status
GET /gpu/status
```json
{
  "model_loaded": true,
  "device": "cuda",
  "gpu_memory_used_mb": 4320.5,
  "gpu_memory_total_mb": 24576.0
}
```
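As a small illustration of how the status fields fit together, the sketch below turns a `/gpu/status` response into a one-line summary with the share of GPU memory in use. The function name `gpu_memory_summary` is hypothetical; the field names are those shown in the response above.

```python
def gpu_memory_summary(status: dict) -> str:
    """Format a /gpu/status response as a short human-readable line."""
    used = status["gpu_memory_used_mb"]
    total = status["gpu_memory_total_mb"]
    pct = 100.0 * used / total if total else 0.0  # avoid division by zero
    state = "loaded" if status["model_loaded"] else "unloaded"
    return f'{status["device"]}: model {state}, {used:.1f}/{total:.1f} MB ({pct:.1f}%)'


example = {
    "model_loaded": True,
    "device": "cuda",
    "gpu_memory_used_mb": 4320.5,
    "gpu_memory_total_mb": 24576.0,
}
print(gpu_memory_summary(example))
# cuda: model loaded, 4320.5/24576.0 MB (17.6%)
```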
Load/Unload Model
POST /gpu/load
POST /gpu/unload
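One way to use these two endpoints is to hold the model on the GPU only while a batch of work runs. The sketch below wraps them in a context manager; `post` and `model_on_gpu` are hypothetical helper names, and since the README does not document the response bodies of `/gpu/load` and `/gpu/unload`, the code reads and discards them.

```python
import urllib.request
from contextlib import contextmanager

BASE_URL = "http://localhost:7860"


def post(base_url: str, path: str) -> None:
    """Send an empty POST; the response body is undocumented, so it is ignored."""
    req = urllib.request.Request(f"{base_url}{path}", data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()


@contextmanager
def model_on_gpu(send=post, base_url: str = BASE_URL):
    """Load the model on entry and unload it on exit, even if the body raises."""
    send(base_url, "/gpu/load")
    try:
        yield
    finally:
        send(base_url, "/gpu/unload")
```

Used as `with model_on_gpu(): ...`, this frees GPU memory for other workloads as soon as the block finishes.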
Interactive Documentation
- Swagger UI: http://localhost:7860/docs
- ReDoc: http://localhost:7860/redoc
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| MODEL_CHECKPOINT | zai-org/GLM-ASR-Nano-2512 | HuggingFace model path |
| PORT | 7860 | Service port |
| HF_HOME | /app/cache | Model cache directory |
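To make the defaults concrete, here is a sketch of how a Python service could resolve these three variables. It is not taken from the project's source: `ServiceConfig` and `load_config` are hypothetical names, and only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServiceConfig:
    model_checkpoint: str
    port: int
    hf_home: str


def load_config(env: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Read the documented variables, falling back to the documented defaults."""
    return ServiceConfig(
        model_checkpoint=env.get("MODEL_CHECKPOINT", "zai-org/GLM-ASR-Nano-2512"),
        port=int(env.get("PORT", "7860")),
        hf_home=env.get("HF_HOME", "/app/cache"),
    )
```

Calling `load_config({"PORT": "8000"})` would override only the port and keep the other defaults.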
docker-compose.yml
```yaml
services:
  glm-asr:
    image: neosun/glm-asr:v2.0.1
    container_name: glm-asr
    ports:
      - "7860:7860"
    volumes:
      - ./cache:/app/cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
Tech Stack
| Component | Technology |
|---|---|
| Model | GLM-ASR-Nano-2512 (1.5B) |
| Backend | FastAPI + Uvicorn |
| Streaming | Server-Sent Events (SSE) |
| Frontend | HTML5 + Vanilla JS |
| Container | Docker + NVIDIA CUDA |
| API Docs | Swagger / ReDoc |
Benchmark
GLM-ASR-Nano achieves the lowest average error rate (4.10) among comparable models.

Changelog
v2.0.1 (2025-12-28)
- Migrated to FastAPI async framework
- SSE streaming for real-time progress
- Complete Swagger API documentation
- Dual API mode: sync + streaming
- Fixed browser timeout for long audio
- Modern dark UI with progress display
v1.1.0 (2025-12-15)
- VAD smart segmentation (silero-vad)
- Support unlimited audio length
v1.0.0 (2025-12-14)
- Initial release
- Web UI with 4 language support
- REST API with Swagger docs
- Docker all-in-one image
