# kokoro-rocm-wyoming

A Docker image running [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) TTS on AMD GPUs via ROCm, with a [Wyoming protocol](https://github.com/rhasspy/wyoming) server for Home Assistant integration.

## Stack

| Component | Version / Value |
|-----------|-----------------|
| ROCm | 6.1.2 |
| PyTorch | 2.5.1 |
| Target GPU | AMD RX 6700 XT (gfx1031) |
| Kokoro model | hexgrad/Kokoro-82M |
| Protocol | Wyoming (TCP, port 10200) |

## Quick start

```bash
docker compose up -d
```

The Wyoming server will be available at `<host-ip>:10200`.
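
To confirm that something is listening on the port before adding the integration, a plain TCP connection test is usually enough. The sketch below is only that: `wyoming_port_open` is a hypothetical helper name, `127.0.0.1` assumes the check runs on the Docker host itself, and the function does not speak the Wyoming protocol, it only checks that the port accepts connections.

```python
import socket


def wyoming_port_open(host: str, port: int = 10200, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens the socket in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: run on the same machine as the container; replace with <host-ip> otherwise.
    print(wyoming_port_open("127.0.0.1"))
```

A `False` result right after `docker compose up -d` may simply mean the container is still starting.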

## Home Assistant setup

1. In Home Assistant, go to **Settings → Devices & Services → Add Integration**
2. Search for **Wyoming Protocol**
3. Enter your host IP and port `10200`
4. Kokoro voices will appear in your voice assistant configuration

## Configuration

Edit `config.yaml` before building to change the default voice, language, speed, or the list of voices advertised to Home Assistant.

```yaml
tts:
  device: cuda            # ROCm presents as 'cuda' to PyTorch via HIP
  language: a             # a=American English, b=British English, etc.
  default_voice: af_heart
  default_speed: 1.0
voices:
  - name: af_heart
    description: "Heart (Female, American English)"
    language: en-us
  # add more voices here
```

Available language codes: `a` (American English), `b` (British English), `e` (Spanish), `f` (French), `h` (Hindi), `i` (Italian), `j` (Japanese), `p` (Portuguese), `z` (Mandarin).
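
Because the config is baked in at build time, it can help to sanity-check the values before building. The following is a minimal sketch, not part of this repository: `check_tts_settings` is a hypothetical helper, and the rules it applies (the default voice should appear in the voice list, the speed should be positive) are assumptions rather than checks the server is known to perform. It takes the `tts` mapping and the `voices` list as separate arguments, so it does not depend on how they are nested in `config.yaml`.

```python
# Language codes as listed in this README.
LANGUAGE_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Mandarin",
}


def check_tts_settings(tts: dict, voices: list) -> list:
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = []

    language = tts.get("language")
    if language not in LANGUAGE_CODES:
        problems.append(f"unknown language code: {language!r}")

    # Assumption: the default voice should be one of the advertised voices.
    names = {voice.get("name") for voice in voices}
    default_voice = tts.get("default_voice")
    if default_voice not in names:
        problems.append(f"default_voice {default_voice!r} is not in the voices list")

    # Assumption: a speed of zero or below is never intended.
    speed = tts.get("default_speed", 1.0)
    if not isinstance(speed, (int, float)) or speed <= 0:
        problems.append(f"default_speed should be a positive number, got {speed!r}")

    return problems


if __name__ == "__main__":
    tts = {"device": "cuda", "language": "a", "default_voice": "af_heart", "default_speed": 1.0}
    voices = [{"name": "af_heart", "language": "en-us"}]
    print(check_tts_settings(tts, voices))
```

The two arguments would normally come from parsing `config.yaml` with a YAML library; the dictionaries in the usage example mirror the sample config above.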

## Building

The image is built automatically by Gitea Actions on every push to `main` and on `v*` tags. To build locally:

```bash
docker build -t kokoro-rocm-wyoming .
```

Model weights are downloaded from Hugging Face at build time. Voice files are fetched on first use and cached in the `hf_cache` Docker volume.

## GPU passthrough

The compose file passes through `/dev/kfd` and `/dev/dri` and adds the `video` and `render` groups. If ROCm does not detect the 6700 XT, uncomment the override in `docker-compose.yml`:

```yaml
environment:
  - HSA_OVERRIDE_GFX_VERSION=10.3.0
```
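
When troubleshooting detection, it may be worth first confirming that the device nodes actually reached the container. The snippet below is a rough sketch meant to be run inside the container; `missing_device_nodes` is a hypothetical helper, and it only checks that the paths exist, not that ROCm can use them.

```python
import os


def missing_device_nodes(paths=("/dev/kfd", "/dev/dri")):
    """Return the entries of `paths` that do not exist on this filesystem."""
    return [path for path in paths if not os.path.exists(path)]


if __name__ == "__main__":
    missing = missing_device_nodes()
    print("missing device nodes:", missing or "none")
    # Show whether the override from docker-compose.yml is set in this environment.
    print("HSA_OVERRIDE_GFX_VERSION =", os.environ.get("HSA_OVERRIDE_GFX_VERSION", "(unset)"))
```

If both nodes are present, calling `torch.cuda.is_available()` inside the container is a more direct check, since ROCm builds of PyTorch expose the GPU under the `cuda` device name.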

## Audio output

Kokoro outputs 24 kHz 16-bit mono PCM. The Wyoming server streams chunks to Home Assistant as they are generated, so long utterances start playing before synthesis is complete.
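
For anyone capturing the raw stream for debugging, the format above fixes the arithmetic: 24,000 samples per second at 2 bytes each is 48,000 bytes per second of audio. The sketch below uses only the Python standard library to compute the duration of a PCM buffer and wrap it in a WAV container. The helper names are hypothetical, and the samples are assumed to be signed little-endian, which is what WAV expects for 16-bit PCM.

```python
import io
import wave

SAMPLE_RATE = 24_000  # Hz
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # mono


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw PCM buffer in the format described above."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM bytes in a WAV header so ordinary players can open them."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    one_second_of_silence = b"\x00\x00" * SAMPLE_RATE
    print(pcm_duration_seconds(one_second_of_silence))
    with open("silence.wav", "wb") as handle:
        handle.write(pcm_to_wav(one_second_of_silence))
```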

## License

Model weights: [Apache 2.0](https://huggingface.co/hexgrad/Kokoro-82M)