# kokoro-rocm-wyoming

A Docker image running [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) TTS on AMD GPUs via ROCm, with a [Wyoming protocol](https://github.com/rhasspy/wyoming) server for Home Assistant integration.

## Stack

| Component | Version / details |
|-----------|---------|
| ROCm | 6.1.2 |
| PyTorch | 2.5.1 |
| Target GPU | AMD RX 6700 XT (gfx1031) |
| Kokoro model | hexgrad/Kokoro-82M |
| Protocol | Wyoming (TCP, port 10200) |

## Quick start

```bash
docker compose up -d
```

The Wyoming server will be available at `<host-ip>:10200`.
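A small client can check the server from another machine before Home Assistant is involved. The sketch below is a minimal, stdlib-only Python client. It follows the event framing described in the Wyoming protocol README: a JSON header line with `type` and optional `data_length` / `payload_length`, followed by that many bytes of JSON data and binary payload. The `synthesize` helper, its default voice, and the 30-second timeout are illustrative assumptions, not part of this image. It also assumes the server answers a `synthesize` event with `audio-start`, `audio-chunk`, and `audio-stop` events, as Wyoming TTS servers generally do. For anything beyond a smoke test, the official `wyoming` Python package implements the same framing.

```python
import io
import json
import socket
from typing import Optional, Tuple

# One Wyoming event as (type, data, payload).
Event = Tuple[str, dict, bytes]


def encode_event(event_type: str, data: Optional[dict] = None, payload: bytes = b"") -> bytes:
    """Serialize an event: a JSON header line, then the data JSON, then the binary payload."""
    header: dict = {"type": event_type}
    data_bytes = json.dumps(data).encode("utf-8") if data else b""
    if data_bytes:
        header["data_length"] = len(data_bytes)
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + data_bytes + payload


def read_event(stream: io.BufferedIOBase) -> Optional[Event]:
    """Read one event from a binary stream; return None at end of stream."""
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    # Data may arrive inline in the header, as a separate block, or both (merged).
    data = dict(header.get("data") or {})
    data_length = header.get("data_length") or 0
    if data_length:
        data.update(json.loads(stream.read(data_length)))
    payload_length = header.get("payload_length") or 0
    payload = stream.read(payload_length) if payload_length else b""
    return header["type"], data, payload


def synthesize(host: str, text: str, voice: str = "af_heart", port: int = 10200) -> bytes:
    """Hypothetical smoke-test helper: request speech and return the raw PCM streamed back."""
    with socket.create_connection((host, port), timeout=30) as sock:
        with sock.makefile("rwb") as stream:
            stream.write(encode_event("synthesize", {"text": text, "voice": {"name": voice}}))
            stream.flush()
            pcm = bytearray()
            while (event := read_event(stream)) is not None:
                event_type, _, payload = event
                if event_type == "audio-chunk":
                    pcm += payload
                elif event_type == "audio-stop":
                    break  # Wyoming servers may keep the connection open after replying
            return bytes(pcm)
```

For example, `synthesize("192.168.1.50", "Hello from Kokoro")` should return a non-empty `bytes` object if the container is up and the GPU pipeline works; the IP address is a placeholder.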

## Home Assistant setup

1. In Home Assistant, go to **Settings → Devices & Services → Add Integration**
2. Search for **Wyoming Protocol**
3. Enter the IP address of the Docker host and port `10200`
4. Kokoro voices will appear in your voice assistant configuration
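When the integration is added, Home Assistant sends a `describe` event and builds its voice list from the server's `info` reply. The sketch below extracts the voices such a reply advertises. It assumes the `info` data has the shape used by the `wyoming` Python package: a `tts` list of programs, each with a `voices` list whose entries carry `name`, `installed`, and `languages`. It also assumes entries marked as not installed are skipped. The function name and the sample program name `kokoro` are hypothetical.

```python
from typing import Dict, List


def advertised_voices(info_data: dict) -> Dict[str, List[str]]:
    """Map each installed TTS voice in an `info` event's data to its language list."""
    voices: Dict[str, List[str]] = {}
    for program in info_data.get("tts") or []:
        if not program.get("installed", True):
            continue  # skip TTS programs the server marks as not installed
        for voice in program.get("voices") or []:
            if voice.get("installed", True):
                voices[voice["name"]] = list(voice.get("languages") or [])
    return voices


# Example `info` data shaped like the `voices` entry in config.yaml (program name is hypothetical).
example_info = {
    "tts": [{
        "name": "kokoro",
        "installed": True,
        "voices": [{"name": "af_heart", "description": "Heart (Female, American English)",
                    "installed": True, "languages": ["en-us"]}],
    }]
}
print(advertised_voices(example_info))  # {'af_heart': ['en-us']}
```

If the result for a real `info` reply matches the `voices` list in `config.yaml` but step 4 still shows nothing, the problem is more likely on the Home Assistant side than in the container.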

## Configuration

Edit `config.yaml` before building to change the default voice, language, speed, or the list of voices advertised to Home Assistant.

```yaml
tts:
  device: cuda          # ROCm presents as 'cuda' to PyTorch via HIP
  language: a           # a=American English, b=British English, etc.
  default_voice: af_heart
  default_speed: 1.0
  voices:
    - name: af_heart
      description: "Heart (Female, American English)"
      language: en-us
    # add more voices here
```

Available language codes: `a` (American English), `b` (British English), `e` (Spanish), `f` (French), `h` (Hindi), `i` (Italian), `j` (Japanese), `p` (Portuguese), `z` (Mandarin).
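Kokoro voice IDs begin with the code of the language they belong to (`af_heart` is American English, `bf_...` voices are British English, `jf_...` Japanese, and so on). That makes a quick sanity check possible: it catches a `default_voice` that is missing from `voices`, or a voice that does not match `language`. This sketch works on the already-parsed `tts:` mapping, for example from PyYAML's `yaml.safe_load`. The `check_tts_config` name is hypothetical. The rule that every advertised voice should share the configured language is an assumption about how this server uses the config, so treat the output as warnings rather than errors.

```python
from typing import List

# Language codes accepted by the `language` key, as listed in this README.
LANGUAGE_CODES = {
    "a": "American English", "b": "British English", "e": "Spanish",
    "f": "French", "h": "Hindi", "i": "Italian", "j": "Japanese",
    "p": "Portuguese", "z": "Mandarin",
}


def check_tts_config(tts: dict) -> List[str]:
    """Return human-readable warnings for a parsed `tts:` mapping (empty list if none)."""
    warnings: List[str] = []
    lang = tts.get("language")
    if lang not in LANGUAGE_CODES:
        warnings.append(f"unknown language code {lang!r}")
    names = [voice.get("name", "") for voice in tts.get("voices") or []]
    default = tts.get("default_voice")
    if default not in names:
        warnings.append(f"default_voice {default!r} is not in the advertised voices")
    for name in names:
        # Assumed convention: Kokoro voice IDs start with their language code, e.g. 'af_heart' -> 'a'.
        if name[:1] != lang:
            warnings.append(f"voice {name!r} does not match language code {lang!r}")
    speed = tts.get("default_speed", 1.0)
    if isinstance(speed, bool) or not isinstance(speed, (int, float)) or speed <= 0:
        warnings.append(f"default_speed must be a positive number, got {speed!r}")
    return warnings
```

An empty list means the config is consistent under these assumptions. For the example config above, `check_tts_config` returns `[]`.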

## Building

The image is built automatically by Gitea Actions on every push to `main` and on `v*` tags. To build locally:

```bash
docker build -t kokoro-rocm-wyoming .
```

Model weights are downloaded from Hugging Face at build time. Voice files are fetched on first use and cached in the `hf_cache` Docker volume.

## GPU passthrough

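The compose file passes through `/dev/kfd` and `/dev/dri` and adds the `video` and `render` groups to the container. The RX 6700 XT (gfx1031) is not an officially supported ROCm target. If ROCm does not detect the card, uncomment the override in `docker-compose.yml`, which makes the runtime treat it as gfx1030: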

```yaml
environment:
  - HSA_OVERRIDE_GFX_VERSION=10.3.0
```
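The override value can be derived from the LLVM target name that `rocminfo` prints in its `Name:` field for the card. `gfx1031` splits into major 10, minor 3, stepping 1, and the override names the family's base target `gfx1030` as `10.3.0`. The sketch below encodes that reasoning. The family table (RDNA2 `gfx103x` to `10.3.0`, RDNA3 `gfx110x` to `11.0.0`) reflects common community practice for consumer cards, not an official AMD mapping. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Base target per GPU family as commonly used for consumer cards (assumption, not an AMD table).
FAMILY_OVERRIDE = {(10, 3): "10.3.0", (11, 0): "11.0.0"}


def parse_gfx_target(name: str) -> Tuple[int, int, int]:
    """Split an LLVM target such as 'gfx1031' or 'gfx90a' into (major, minor, stepping)."""
    match = re.fullmatch(r"gfx(\d+)(\d)([0-9a-f])", name.strip().lower())
    if match is None:
        raise ValueError(f"not a gfx target name: {name!r}")
    major, minor, stepping = match.groups()
    return int(major), int(minor), int(stepping, 16)  # stepping is a hex digit, e.g. 'a' = 10


def suggest_override(name: str) -> Optional[str]:
    """Suggest an HSA_OVERRIDE_GFX_VERSION value, or None if this heuristic has nothing to offer."""
    major, minor, stepping = parse_gfx_target(name)
    if stepping == 0:
        return None  # already the family's base target
    return FAMILY_OVERRIDE.get((major, minor))
```

For this image's target GPU, `suggest_override("gfx1031")` returns `"10.3.0"`, which matches the commented-out value in the compose file.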

## Audio output

Kokoro generates 24 kHz audio, which the server sends to Home Assistant as 16-bit mono PCM. Chunks are streamed as they are generated, so long utterances start playing before synthesis is complete.
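At 24 kHz with 16-bit (2-byte) samples on one channel, one second of speech is 24,000 × 2 × 1 = 48,000 bytes. This makes chunk durations easy to check on the client side. The minimal stdlib sketch below computes that duration and wraps a sequence of `audio-chunk` payloads in a WAV container for listening. The helper names are hypothetical, and the defaults assume the output format stated above.

```python
import io
import wave
from typing import Iterable, Union

# Assumed output format: 24 kHz, 16-bit (2-byte) samples, mono.
RATE, WIDTH, CHANNELS = 24_000, 2, 1


def pcm_seconds(num_bytes: int, rate: int = RATE, width: int = WIDTH, channels: int = CHANNELS) -> float:
    """Duration of raw PCM in seconds: bytes / (rate * bytes per sample * channels)."""
    return num_bytes / (rate * width * channels)


def write_wav(chunks: Iterable[bytes], out: Union[str, io.BufferedIOBase], rate: int = RATE,
              width: int = WIDTH, channels: int = CHANNELS) -> int:
    """Wrap streamed audio-chunk payloads in a WAV container; return the number of frames written."""
    frames = 0
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        for chunk in chunks:
            wav.writeframes(chunk)  # header sizes are patched when the file is closed
            frames += len(chunk) // (width * channels)
    return frames
```

Under these assumptions, `write_wav([pcm], "kokoro-test.wav")` turns raw bytes collected from a Wyoming client into a file any audio player can open, and `pcm_seconds(len(pcm))` gives its length.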

## License

Model weights: [Apache 2.0](https://huggingface.co/hexgrad/Kokoro-82M)