torch.compile with dynamic=True still specializes per shape family on
first call. The warmup was running only one text length, leaving real
requests to JIT-compile their own shapes (15-22s for the first chunk).
Home Assistant (HA) freezes because it receives no AudioChunk for up to
22 seconds.
Fix:
- Run 3 warmup passes (short/medium/long text) so torch.compile builds
a dynamic-shape graph covering the length range HA actually sends. Real
requests then hit a cached compilation and synthesize in 3-8s.
- Reduce the default chunk_size from 300 to 120 chars so the first text
chunk is shorter, producing faster synthesis and earlier first audio.
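A minimal sketch of the multi-length warmup idea, assuming a
`synthesize` callable that wraps the compiled TTS model (the function
name and warmup texts here are illustrative, not the actual code):

```python
# Hypothetical sketch: run warmup at short/medium/long text lengths so
# torch.compile(dynamic=True) records a dynamic-shape graph covering the
# range of lengths Home Assistant actually sends.

WARMUP_TEXTS = [
    "Hi.",                                                      # short
    "This is a medium-length warmup sentence for the model.",   # medium
    "This is a deliberately longer warmup sentence that pushes "
    "the token count toward the upper end of typical requests.",  # long
]

def warmup(synthesize):
    # The first calls trigger compilation; subsequent real requests of
    # any length within this range then hit the cached dynamic graph.
    for text in WARMUP_TEXTS:
        synthesize(text)
```

Running all three passes at startup front-loads the 15-22s compilation
cost, so user-facing requests only pay the 3-8s synthesis time.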
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wyoming-only server built around the official chatterbox TTS model.
Includes ROCm/AMD GPU support, sentence-level streaming, config.yaml
management, and Gitea CI for container builds.
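A hypothetical config.yaml fragment showing the kind of settings this
layout implies (key names and values are illustrative assumptions, not
the server's actual schema):

```yaml
# Illustrative sketch only; actual keys may differ.
tts:
  chunk_size: 120   # chars per streamed text chunk
  device: cuda      # ROCm maps onto the torch "cuda" device on AMD GPUs
server:
  protocol: wyoming
```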
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>