- Warmup now uses a ~170-char representative sentence so torch.compile
JIT-compiles for typical token sequence lengths. Previously the warmup
text was just "Warmup.", which compiled only for very short shapes,
causing a full re-compile (17s) on the first real HA request and
pushing total synthesis past 30s.
- Compile model.ve (the voice encoder) in addition to s3gen; both are
convolutional and hit the MIOpen workspace=0 bug.
- Fix _patch_timing: the attribute is model.ve, not model.voice_encoder,
so the timing wrap was silently skipping the speaker embedding.
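Sketched in isolation, the warmup described above is one throwaway synthesis of a realistically long sentence at startup. The `warm_up` helper, the `synthesize` callable, and the sample text below are hypothetical stand-ins, not the project's actual code:

```python
# Hypothetical sketch of the warmup: synthesize one ~170-char sentence
# so torch.compile / MIOpen see a realistic sequence length instead of
# the tiny shape produced by the old "Warmup." text.
WARMUP_TEXT = (
    "The quick brown fox jumps over the lazy dog while the curious cat "
    "watches from a sunny windowsill, wondering whether the commotion "
    "in the garden below is worth investigating today."
)

def warm_up(synthesize):
    """Run one throwaway synthesis so later requests hit warm caches.

    `synthesize` is whatever callable turns text into audio; the
    result is returned only so callers can discard or inspect it.
    """
    return synthesize(WARMUP_TEXT)
```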
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Warmup: run a synthesis before accepting Wyoming connections so MIOpen
benchmarks and caches all conv layer shapes. Without this, the first HA
request triggers hundreds of benchmark runs and times out.
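The ordering this commit enforces, warmup strictly before the server accepts connections, can be sketched with plain asyncio. The `warm_up` and `serve_forever` callables here are hypothetical stand-ins for the blocking warmup synthesis and the Wyoming server loop:

```python
import asyncio

async def main(warm_up, serve_forever):
    # Run the blocking warmup in a worker thread and wait for it to
    # finish BEFORE the server starts accepting connections, so the
    # first real request hits warm MIOpen caches instead of triggering
    # hundreds of benchmark runs.
    await asyncio.to_thread(warm_up)
    # Only now start serving Wyoming clients.
    await serve_forever()
```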
fp16: wrap the synthesis call in try/except so a failed fp16 autocast
is retried in fp32 instead of silently dropping the request.
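The fp16 fallback is a plain retry pattern. Below is a torch-free sketch of the control flow only; `synthesize` and its `dtype` parameter are hypothetical stand-ins for the autocast-wrapped synthesis call:

```python
def synthesize_with_fallback(synthesize, text):
    """Try half precision first; if it raises, retry in full precision
    rather than dropping the request.

    `synthesize(text, dtype=...)` stands in for the real
    autocast-wrapped synthesis call.
    """
    try:
        return synthesize(text, dtype="fp16")
    except RuntimeError:
        # A failed autocast (e.g. an op unsupported in half precision)
        # falls back to fp32 so the request still succeeds.
        return synthesize(text, dtype="fp32")
```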
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wyoming-only server built around the official chatterbox TTS model.
Includes ROCm/AMD GPU support, sentence-level streaming, config.yaml
management, and Gitea CI for container builds.
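The sentence-level streaming mentioned above amounts to splitting the input on sentence boundaries and yielding one audio chunk per sentence, so playback can start before the full text is synthesized. A minimal sketch, where the splitting regex and the `synthesize` callable are illustrative rather than the project's actual code:

```python
import re

def stream_sentences(text, synthesize):
    # Split on whitespace that follows sentence-ending punctuation,
    # then yield one audio chunk per sentence as it is synthesized.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield synthesize(sentence)
```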
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>