# kokoro-rocm-wyoming

A Docker image running [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) TTS on AMD GPUs via ROCm, with a [Wyoming protocol](https://github.com/rhasspy/wyoming) server for Home Assistant integration.

## Stack

| Component | Version |
|-----------|---------|
| ROCm | 6.1.2 |
| PyTorch | 2.5.1 |
| Target GPU | AMD RX 6700 XT (gfx1031) |
| Kokoro model | hexgrad/Kokoro-82M |
| Protocol | Wyoming (TCP, port 10200) |

## Quick start

```bash
docker compose up -d
```

The Wyoming server will be available at `:10200`.

## Home Assistant setup

1. In Home Assistant, go to **Settings → Devices & Services → Add Integration**
2. Search for **Wyoming Protocol**
3. Enter your host IP and port `10200`
4. Kokoro voices will appear in your voice assistant configuration

## Configuration

Edit `config.yaml` before building to change the default voice, language, speed, or the list of voices advertised to Home Assistant.

```yaml
tts:
  device: cuda          # ROCm presents as 'cuda' to PyTorch via HIP
  language: a           # a=American English, b=British English, etc.
  default_voice: af_heart
  default_speed: 1.0

voices:
  - name: af_heart
    description: "Heart (Female, American English)"
    language: en-us
  # add more voices here
```

Available language codes: `a` (American English), `b` (British English), `e` (Spanish), `f` (French), `h` (Hindi), `i` (Italian), `j` (Japanese), `p` (Portuguese), `z` (Mandarin).

## Building

The image is built automatically by Gitea Actions on every push to `main` and on `v*` tags.

To build locally:

```bash
docker build -t kokoro-rocm-wyoming .
```

Model weights are downloaded from HuggingFace at build time. Voice files are fetched on first use and cached in the `hf_cache` Docker volume.

## GPU passthrough

The compose file passes through `/dev/kfd` and `/dev/dri` and adds the `video` and `render` groups.
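
To confirm the passthrough from inside the container, a small check along these lines may help. It is a sketch rather than part of this image: the function names are hypothetical, and it only assumes that the two device nodes named above should be visible and that the ROCm build of PyTorch reports the GPU through its `torch.cuda` API.

```python
from pathlib import Path

# Device nodes the compose file is described as passing through.
REQUIRED_NODES = ("dev/kfd", "dev/dri")


def missing_gpu_nodes(root="/"):
    """Return the required device nodes that do not exist under `root`.

    `root` is a parameter only so the check can be exercised against a
    scratch directory; inside the container the default "/" is what matters.
    """
    base = Path(root)
    return ["/" + node for node in REQUIRED_NODES if not (base / node).exists()]


def torch_sees_gpu():
    """Return True/False from PyTorch, or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    # ROCm builds of PyTorch expose the GPU through the torch.cuda namespace.
    return torch.cuda.is_available()
```

Calling both functions from a Python shell inside the running container gives a quick picture: an empty list from `missing_gpu_nodes()` means both nodes are present, and `torch_sees_gpu()` returning `False` with the nodes present points at detection rather than passthrough.
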
If ROCm does not detect the 6700 XT, uncomment the override in `docker-compose.yml`:

```yaml
environment:
  - HSA_OVERRIDE_GFX_VERSION=10.3.0
```

## Audio output

Kokoro outputs 24 kHz 16-bit mono PCM. The Wyoming server streams chunks to Home Assistant as they are generated, so long utterances start playing before synthesis is complete.

## License

Model weights: [Apache 2.0](https://huggingface.co/hexgrad/Kokoro-82M)
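
## Appendix: audio chunk arithmetic

The format given under Audio output (24 kHz, 16-bit, mono) fixes the byte rate of the stream, which is handy when sizing buffers or estimating latency. The sketch below works through that arithmetic and splits a PCM buffer into fixed-duration chunks. The 100 ms chunk length and the helper names are assumptions for illustration only; the server's actual chunking is not documented here and may differ.

```python
SAMPLE_RATE = 24_000   # Hz, from the Audio output section
SAMPLE_WIDTH = 2       # bytes per sample (16-bit)
CHANNELS = 1           # mono

BYTES_PER_SECOND = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS  # 48,000


def pcm_duration_seconds(num_bytes):
    """Playback time represented by `num_bytes` of raw PCM in this format."""
    return num_bytes / BYTES_PER_SECOND


def iter_chunks(pcm, chunk_ms=100):
    """Yield consecutive slices of `pcm`, each covering `chunk_ms` of audio.

    chunk_ms is a hypothetical value chosen for the example. The final
    chunk may be shorter if the buffer does not divide evenly.
    """
    frames_per_chunk = SAMPLE_RATE * chunk_ms // 1000
    chunk_bytes = frames_per_chunk * SAMPLE_WIDTH * CHANNELS
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of silence: 48,000 zero bytes.
one_second = bytes(BYTES_PER_SECOND)
chunks = list(iter_chunks(one_second))
print(len(chunks), len(chunks[0]), pcm_duration_seconds(len(one_second)))
```

At this byte rate, each 100 ms chunk is 4,800 bytes, so a listener can begin playback after a small fraction of a long utterance has been synthesized.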