Files
rocm-chatterbox-whisper/config.yaml
scott 59731084cd
Build ROCm Image / build (push) Successful in 2m49s
Multi-pass warmup and smaller chunk_size to fix HA timeout
torch.compile with dynamic=True still specializes per shape family on the
first call. The warmup ran only a single text length, so real requests had
to JIT-compile their own shapes (15-22s for the first chunk). HA freezes
because it receives no AudioChunk for 22 seconds.

Fix:
- Run 3 warmup passes (short/medium/long text) so torch.compile builds
  a dynamic shape graph covering the range HA actually sends. Real
  requests then hit a cached compilation and synthesize in 3-8s.
- Reduce default chunk_size from 300 to 120 chars so the first text
  chunk is shorter, producing faster synthesis and earlier first audio.
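The multi-pass warmup described above can be sketched as follows. This is a hypothetical illustration, not the project's actual code: `warm_up`, `WARMUP_TEXTS`, and the `synthesize` callable (standing in for the real compiled TTS pipeline) are all names invented here.

```python
# Hypothetical sketch of the multi-pass warmup. One pass per length
# bucket, so torch.compile(dynamic=True) records the full range of
# input shapes before real HA requests arrive.
WARMUP_TEXTS = [
    "Hi.",                                           # short
    "The kitchen light is now on.",                  # medium
    "The thermostat was set to seventy degrees and "
    "the front door has been locked for the night.", # long
]

def warm_up(synthesize):
    """Run each warmup text through the synthesis path once.

    The audio output is discarded; the point is only to trigger
    compilation for short, medium, and long input shapes up front.
    """
    for text in WARMUP_TEXTS:
        synthesize(text)  # result ignored; only the compilation matters
```

Each pass pays the one-time compile cost at startup instead of on a user's first request, which is what brings the first-chunk latency down into the 3-8s range the commit message cites.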

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 15:04:46 -04:00


model:
  # Options: chatterbox, chatterbox-turbo
  repo_id: chatterbox-turbo
tts_engine:
  # Device: cuda, cpu, or leave empty for auto-detect
  device: ""
  predefined_voices_path: voices
  reference_audio_path: reference_audio
  # Fallback voice (stem name, e.g. "default" matches default.wav)
  default_voice_id: default.wav
  generation_defaults:
    # Turbo model: uses temperature only (exaggeration/cfg_weight ignored)
    # Standard model: uses exaggeration and cfg_weight (temperature ignored)
    temperature: 0.8
    exaggeration: 0.5
    cfg_weight: 0.5
    # seed: 0 = random each call, >0 = reproducible output
    seed: 0
wyoming:
  host: "0.0.0.0"
  port: 10200
  # Max characters per synthesis chunk (split at sentence boundaries)
  chunk_size: 120
paths:
  model_cache: /app/hf_cache
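The `chunk_size: 120` setting caps each synthesis chunk at 120 characters, split at sentence boundaries. A minimal sketch of such a splitter, assuming a simple regex over `.`/`!`/`?` terminators (the project's actual splitting logic is not shown on this page, and `chunk_text` is a name invented here):

```python
import re

def chunk_text(text: str, chunk_size: int = 120) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    breaking only at sentence boundaries (illustrative sketch)."""
    # Split after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk if appending would exceed the limit.
        if current and len(current) + 1 + len(sentence) > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```

Note that a single sentence longer than `chunk_size` is kept whole in this sketch; a smaller limit mainly shortens the *first* chunk, which is what gets first audio back to HA sooner.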