- Warmup now uses a ~170-char representative sentence so torch.compile
JIT-compiles for typical token sequence lengths. Previously the warmup
text was just "Warmup.", which compiled only for very short shapes,
causing a full re-compile (17s) on the first real HA request and
pushing total synthesis past 30s.
- Compile model.ve (the voice encoder) in addition to s3gen; both are
convolutional and hit the MIOpen workspace=0 bug.
- Fix _patch_timing: the attribute is model.ve, not model.voice_encoder,
so the timing wrap was silently skipping the speaker embedding.
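Sketched in isolation, the warmup described above is one throwaway synthesis of a realistically long sentence at startup. The `warm_up` helper, the `synthesize` callable, and the sample text below are hypothetical stand-ins, not the project's actual code:

```python
# Hypothetical sketch of the warmup: synthesize one ~170-char sentence
# so torch.compile / MIOpen see a realistic sequence length instead of
# the tiny shape produced by the old "Warmup." text.
WARMUP_TEXT = (
    "The quick brown fox jumps over the lazy dog while the curious cat "
    "watches from a sunny windowsill, wondering whether the commotion "
    "in the garden below is worth investigating today."
)

def warm_up(synthesize):
    """Run one throwaway synthesis so later requests hit warm caches.

    `synthesize` is whatever callable turns text into audio; the
    result is returned only so callers can discard or inspect it.
    """
    return synthesize(WARMUP_TEXT)
```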
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Warmup: run a synthesis before accepting Wyoming connections so MIOpen
benchmarks and caches all conv layer shapes. Without this, the first HA
request triggers hundreds of benchmark runs and times out.
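The ordering this commit enforces, warmup strictly before the server accepts connections, can be sketched with plain asyncio. The `warm_up` and `serve_forever` callables here are hypothetical stand-ins for the blocking warmup synthesis and the Wyoming server loop:

```python
import asyncio

async def main(warm_up, serve_forever):
    # Run the blocking warmup in a worker thread and wait for it to
    # finish BEFORE the server starts accepting connections, so the
    # first real request hits warm MIOpen caches instead of triggering
    # hundreds of benchmark runs.
    await asyncio.to_thread(warm_up)
    # Only now start serving Wyoming clients.
    await serve_forever()
```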
fp16: wrap the synthesis call in try/except so a failed fp16 autocast
is retried in fp32 instead of silently dropping the request.
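The fp16 fallback is a plain retry pattern. Below is a torch-free sketch of the control flow only; `synthesize` and its `dtype` parameter are hypothetical stand-ins for the autocast-wrapped synthesis call:

```python
def synthesize_with_fallback(synthesize, text):
    """Try half precision first; if it raises, retry in full precision
    rather than dropping the request.

    `synthesize(text, dtype=...)` stands in for the real
    autocast-wrapped synthesis call.
    """
    try:
        return synthesize(text, dtype="fp16")
    except RuntimeError:
        # A failed autocast (e.g. an op unsupported in half precision)
        # falls back to fp32 so the request still succeeds.
        return synthesize(text, dtype="fp32")
```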
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wyoming-only server built around the official chatterbox TTS model.
Includes ROCm/AMD GPU support, sentence-level streaming, config.yaml
management, and Gitea CI for container builds.
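The sentence-level streaming mentioned above amounts to splitting the input on sentence boundaries and yielding one audio chunk per sentence, so playback can start before the full text is synthesized. A minimal sketch, where the splitting regex and the `synthesize` callable are illustrative rather than the project's actual code:

```python
import re

def stream_sentences(text, synthesize):
    # Split on whitespace that follows sentence-ending punctuation,
    # then yield one audio chunk per sentence as it is synthesized.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield synthesize(sentence)
```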
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>