kokoro-rocm-wyoming
A Docker image running Kokoro-82M TTS on AMD GPUs via ROCm, with a Wyoming protocol server for Home Assistant integration.
Stack
| Component | Version |
|---|---|
| ROCm | 6.1.2 |
| PyTorch | 2.5.1 |
| Target GPU | AMD RX 6700 XT (gfx1031) |
| Kokoro model | hexgrad/Kokoro-82M |
| Protocol | Wyoming (TCP, port 10300) |
Quick start
docker compose up -d
The Wyoming server will be available at <host-ip>:10300.
Home Assistant setup
- In Home Assistant, go to Settings → Devices & Services → Add Integration
- Search for Wyoming Protocol
- Enter your host IP and port 10300
- Kokoro voices will appear in your voice assistant configuration
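Before adding the integration, it can help to confirm that the server answers on the port. The following is a minimal sketch, not part of this repository: it assumes the server replies to a Wyoming `describe` event with an `info` event, and that events are framed as one JSON header line optionally followed by `data_length` bytes of JSON and `payload_length` bytes of binary payload. The helper names `encode_event`, `read_event`, and `probe` are hypothetical.

```python
import json
import socket


def encode_event(event_type: str) -> bytes:
    """Serialize a data-less Wyoming event header as a single JSON line."""
    return (json.dumps({"type": event_type}) + "\n").encode("utf-8")


def read_event(stream) -> dict:
    """Read one event: a JSON header line, then optional data and payload bytes."""
    header = json.loads(stream.readline())
    data = dict(header.get("data") or {})
    data_length = header.get("data_length") or 0
    if data_length:
        # Extra event data is sent as a separate JSON block after the header.
        data.update(json.loads(stream.read(data_length)))
    payload_length = header.get("payload_length") or 0
    if payload_length:
        stream.read(payload_length)  # binary payload is not needed for a probe
    return {"type": header["type"], "data": data}


def probe(host: str, port: int = 10300, timeout: float = 5.0) -> dict:
    """Send `describe` and return the first event the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_event("describe"))
        with sock.makefile("rb") as stream:
            return read_event(stream)
```

Calling `probe("<host-ip>")` from another machine on the network should return a dict whose `type` is `info` if the assumptions above hold. A connection error or timeout points at networking or the container rather than at Home Assistant.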
Configuration
Edit config.yaml before building to change the default voice, language, speed, or the list of voices advertised to Home Assistant:
tts:
  device: cuda          # ROCm presents as 'cuda' to PyTorch via HIP
  language: a           # a=American English, b=British English, etc.
  default_voice: af_heart
  default_speed: 1.0
voices:
  - name: af_heart
    description: "Heart (Female, American English)"
    language: en-us
  # add more voices here
Available language codes: a (American English), b (British English), e (Spanish), f (French), h (Hindi), i (Italian), j (Japanese), p (Portuguese), z (Mandarin).
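A typo in the language code or default voice is easy to catch before building. The sketch below is illustrative only: `check_config` is a hypothetical helper, not part of this project. It assumes the config has already been parsed into plain Python objects (for example with PyYAML's `yaml.safe_load`) and takes the `tts` mapping and the `voices` list as separate arguments so that it does not depend on how the file nests them.

```python
# Language codes as listed in this README.
LANGUAGE_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Mandarin",
}


def check_config(tts: dict, voices: list) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    if tts.get("language") not in LANGUAGE_CODES:
        problems.append(f"unknown language code: {tts.get('language')!r}")

    # The default voice should be one of the voices advertised to Home Assistant.
    names = [voice.get("name") for voice in voices]
    if tts.get("default_voice") not in names:
        problems.append(
            f"default_voice {tts.get('default_voice')!r} is not in the voices list"
        )

    speed = tts.get("default_speed", 1.0)
    if not isinstance(speed, (int, float)) or speed <= 0:
        problems.append(f"default_speed must be a positive number, got {speed!r}")

    return problems
```

With the example config shown above, `check_config` returns an empty list. Each string in a non-empty result names one setting to fix.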
Building
The image is built automatically by Gitea Actions on every push to main and on v* tags. To build locally:
docker build -t kokoro-rocm-wyoming .
Model weights are downloaded from HuggingFace at build time. Voice files are fetched on first use and cached in the hf_cache Docker volume.
GPU passthrough
The compose file passes through /dev/kfd and /dev/dri and adds the video and render groups. If ROCm does not detect the 6700 XT (gfx1031), uncomment the override in docker-compose.yml, which makes the ROCm runtime treat the card as gfx1030:
environment:
- HSA_OVERRIDE_GFX_VERSION=10.3.0
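To see whether the passthrough worked, PyTorch can be queried from inside the container. This is a hedged sketch: `rocm_status` is a hypothetical helper, and it relies only on `torch.cuda.is_available()`, `torch.cuda.get_device_name()`, and `torch.version.hip`, which is set on ROCm builds of PyTorch and `None` otherwise.

```python
def rocm_status() -> dict:
    """Summarize whether PyTorch can see a GPU through ROCm/HIP."""
    try:
        import torch
    except ImportError:
        # Outside the container PyTorch may simply not be installed.
        return {
            "torch_installed": False,
            "gpu_available": False,
            "hip_version": None,
            "device_name": None,
        }

    available = torch.cuda.is_available()  # ROCm devices are reported via the 'cuda' API
    return {
        "torch_installed": True,
        "gpu_available": available,
        "hip_version": getattr(torch.version, "hip", None),
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }
```

If `hip_version` is set but `gpu_available` is `False`, the ROCm build is installed but no usable device was found, which is the situation the override above is meant to address.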
Audio output
Kokoro outputs 24 kHz 16-bit mono PCM. The Wyoming server streams chunks to Home Assistant as they are generated, so long utterances start playing before synthesis is complete.
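At 24 kHz, 16-bit, mono, one second of audio is 24,000 samples × 2 bytes = 48,000 bytes. The sketch below shows that arithmetic and one way to wrap raw PCM in a WAV container for inspection. It uses only the standard library and hypothetical helper names, and it assumes the synthesizer hands back float samples in the range -1.0 to 1.0. It is not the server's actual streaming code.

```python
import io
import struct
import wave

SAMPLE_RATE = 24_000  # Hz
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # mono


def floats_to_pcm16(samples) -> bytes:
    """Clip floats to [-1.0, 1.0] and pack them as little-endian signed 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_duration(pcm: bytes) -> float:
    """Duration in seconds of a raw PCM buffer in the format above."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV header so ordinary audio players can open it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

`pcm_duration` is handy for estimating how much audio a chunk of a given size carries, for example when reasoning about how quickly playback can start.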
License
Model weights: Apache 2.0