rocm-chatterbox-whisper

scott/rocm-chatterbox-whisper

Commit Graph

Author

SHA1

Message

Date

scott

514bbad0e9

Enable cudnn.benchmark to fix MIOpen workspace=0 on convolutions

Build ROCm Image / build (push) Has been cancelled

Timing showed s3gen.inference (HiFiGAN vocoder) taking 22s and ref audio
processing ~18s - both dominated by Conv1d ops hitting MIOpen fallback.

With benchmark=False (default), PyTorch passes ptr=0 size=0 workspace to
MIOpen causing GemmFwdRest to fail and fall back to a slow path every call.
With benchmark=True, PyTorch evaluates convolution algorithms with proper
workspace allocation and caches the best result via MIOPEN_USER_DB_PATH.

First inference will be slower while benchmarking; subsequent calls use cache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 13:24:05 -04:00
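
The change described in 514bbad0e9 might look roughly like the sketch below. The commit page does not show the diff, so the function name `enable_conv_autotune` and the cache directory are hypothetical; only `torch.backends.cudnn.benchmark` and the `MIOPEN_USER_DB_PATH` environment variable come from the message. On ROCm builds of PyTorch the cudnn backend flags are routed to MIOpen, and the environment variable most likely needs to be set before the first convolution runs.

```python
import os


def enable_conv_autotune(db_path):
    """Turn on convolution autotuning and point MIOpen at a persistent cache.

    Returns True if torch was importable and the flag was set, else False.
    `db_path` is an assumed location; the commit does not name one.
    """
    # Persist tuned kernels across restarts so only the first run pays the cost.
    os.makedirs(db_path, exist_ok=True)
    os.environ.setdefault("MIOPEN_USER_DB_PATH", db_path)

    try:
        import torch
    except ImportError:
        # Keeps the sketch importable on machines without PyTorch.
        return False

    # With benchmark=True PyTorch searches convolution algorithms and caches
    # the winner per input shape.
    torch.backends.cudnn.benchmark = True
    return True
```

Calling this once at server startup, before the model is loaded, would match the behaviour the message describes: a slower first inference, then cached algorithm choices. Inputs whose shapes vary a lot may trigger repeated benchmarking.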

scott

bfe20b7742

Add timing instrumentation to pinpoint synthesis bottleneck

Build ROCm Image / build (push) Successful in 3m21s

Wraps s3tokenizer, voice_encoder, and s3gen.inference with timing logs
so we can see exactly which step is consuming the missing ~33 seconds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 13:09:14 -04:00
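
A minimal sketch of the kind of timing wrapper bfe20b7742 describes, assuming a context manager around each step. The helper name `timed`, the logger name, and the optional `sink` dictionary are illustrative and not taken from the repository.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("synth.timing")  # hypothetical logger name


@contextmanager
def timed(step, sink=None):
    """Log how long the wrapped block took; optionally record it in `sink`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info("%s took %.3fs", step, elapsed)
        if sink is not None:
            sink[step] = elapsed


# Usage: wrap each stage that the commit message names.
timings = {}
with timed("s3gen.inference", timings):
    time.sleep(0.01)  # stand-in for the real model call
```

GPU kernels are launched asynchronously, so wall-clock numbers for a single step are only reliable if the device is synchronized (for example with `torch.cuda.synchronize()`) before the timer stops. Whether the actual commit does this is not stated.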

scott

16ea2853f5

Initial implementation: Chatterbox TTS with ROCm and Wyoming

Build ROCm Image / build (push) Successful in 15m27s

Wyoming-only server built around the official chatterbox TTS model.
Includes ROCm/AMD GPU support, sentence-level streaming, config.yaml
management, and Gitea CI for container builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 09:51:09 -04:00

3 Commits