torch.compile with dynamic=True still specializes per shape family on the first call. The warmup was running only one text length, leaving real requests to JIT-compile their own shapes (15-22 s for the first chunk). HA freezes because it receives no AudioChunk for 22 seconds.

Fix:
- Run 3 warmup passes (short/medium/long text) so torch.compile builds a dynamic-shape graph covering the range HA actually sends. Real requests then hit a cached compilation and synthesize in 3-8 s.
- Reduce the default chunk_size from 300 to 120 characters so the first text chunk is shorter, producing faster synthesis and earlier first audio.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
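A minimal sketch of the three-pass warmup described above. `synthesize` stands in for the server's real TTS call, and the sample texts are illustrative assumptions (chosen only to span short, medium, and long lengths), not taken from the repo:

```python
# Hypothetical warmup texts spanning the length range HA typically sends.
WARMUP_TEXTS = [
    "Hi.",                                            # short
    "The kitchen light has been turned on for you.",  # medium
    "Reminder: the washing machine finished its cycle twenty minutes ago, "
    "and the dryer is still running with about fifteen minutes left.",  # long
]

def warm_up(synthesize):
    """Run one pass per text length so torch.compile's dynamic-shape graph
    covers the whole range before the first real request arrives."""
    for text in WARMUP_TEXTS:
        # First calls trigger (re)compilation; later real requests of
        # similar length then hit the cached graph instead of JIT-compiling.
        synthesize(text)
```

The key point is not the exact texts but that the lengths bracket real traffic, so a single-shape specialization never gets promoted to the only cached graph.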
30 lines
790 B
YAML
model:
  # Options: chatterbox, chatterbox-turbo
  repo_id: chatterbox-turbo

tts_engine:
  # Device: cuda, cpu, or leave empty for auto-detect
  device: ""
  predefined_voices_path: voices
  reference_audio_path: reference_audio
  # Fallback voice (stem name, e.g. "default" matches default.wav)
  default_voice_id: default.wav

generation_defaults:
  # Turbo model: uses temperature only (exaggeration/cfg_weight ignored)
  # Standard model: uses exaggeration and cfg_weight (temperature ignored)
  temperature: 0.8
  exaggeration: 0.5
  cfg_weight: 0.5
  # seed: 0 = random each call, >0 = reproducible output
  seed: 0

wyoming:
  host: "0.0.0.0"
  port: 10200
  # Max characters per synthesis chunk (split at sentence boundaries)
  chunk_size: 120

paths:
  model_cache: /app/hf_cache
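The `chunk_size` setting caps characters per synthesis chunk, splitting at sentence boundaries. A hedged sketch of such a splitter — the function name, regex, and greedy-packing strategy are assumptions for illustration, not the server's actual implementation:

```python
import re

def chunk_text(text: str, chunk_size: int = 120) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    breaking only at sentence boundaries where possible."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)      # flush the full chunk
            current = sentence          # start a new one
        else:
            current = candidate         # keep packing sentences
    if current:
        chunks.append(current)
    # Note: a single sentence longer than chunk_size stays whole here;
    # a production splitter would fall back to word-level splitting.
    return chunks
```

With the new default of 120, the first chunk of a typical HA announcement is one or two sentences, so the first AudioChunk arrives sooner.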