rocm-chatterbox-whisper/engine.py at 7babd0584e520f5f1872aeb8d73e9d9a9efe761a


Build ROCm Image / build (push) Successful in 2m47s


Replace MIOpen convolution path with torch.compile on s3gen

The GemmFwdRest workspace=0 issue sits in the MIOpen convolution path itself:
PyTorch's ROCm backend does not allocate workspace for these convolutions, so
HiFiGAN falls back to a slow solver regardless of benchmark settings.

torch.compile(s3gen, dynamic=True) replaces MIOpen's conv path with
Triton-generated kernels, bypassing the issue entirely. dynamic=True
handles variable audio lengths without recompiling per request, and the
warmup call triggers JIT compilation so the first HA request is fast.
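The compile-plus-warmup pattern described above can be sketched as follows. This is a minimal illustration, not the code in engine.py. `ToyVocoder` is a hypothetical Conv1d stand-in for s3gen, whose real forward signature is not shown here. `compile_with_warmup`, its parameters, and the 80-mel input layout are also assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ToyVocoder(nn.Module):
    """Hypothetical stand-in for s3gen: maps (batch, n_mels, frames) mel
    features to (batch, 1, frames * upsample) audio with a small conv stack."""

    def __init__(self, n_mels: int = 80, hidden: int = 64, upsample: int = 4):
        super().__init__()
        self.pre = nn.Conv1d(n_mels, hidden, kernel_size=7, padding=3)
        # kernel = 2*stride, padding = stride/2 gives exactly frames * stride outputs
        self.up = nn.ConvTranspose1d(
            hidden, hidden, kernel_size=upsample * 2, stride=upsample, padding=upsample // 2
        )
        self.post = nn.Conv1d(hidden, 1, kernel_size=7, padding=3)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.pre(mel))
        x = torch.relu(self.up(x))
        return torch.tanh(self.post(x))


def compile_with_warmup(
    model: nn.Module, warmup_frames: int = 64, n_mels: int = 80, backend: str = "inductor"
) -> nn.Module:
    """Compile with dynamic shapes, then run one dummy input so JIT compilation
    happens at startup instead of on the first real request (hypothetical helper)."""
    model.eval()
    # dynamic=True asks the compiler to treat input sizes symbolically, so a
    # different number of frames should not force a recompile per request
    # (sizes 0 and 1, such as batch=1, are still specialized).
    compiled = torch.compile(model, dynamic=True, backend=backend)
    with torch.inference_mode():
        compiled(torch.randn(1, n_mels, warmup_frames))  # triggers compilation
    return compiled
```

Usage: call `compile_with_warmup(s3gen_like_module)` once at startup, then serve requests with the returned module. On ROCm the default `"inductor"` backend is the one that generates Triton kernels. The `backend` parameter is exposed only so the sketch can also run with `backend="eager"` on a machine without a compiler toolchain.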

Also removes fp16 autocast (Triton handles precision internally) and
cudnn.benchmark (no longer needed without MIOpen convs).
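The inference call without the removed settings might look like the sketch below. The "before" lines in the comments are an assumed reconstruction of the removed code, not a quote from the previous engine.py. `vocode` is a hypothetical helper name. The sketch shows that with no autocast context, the module runs in its own dtype (float32 here). It also shows that `torch.backends.cudnn.benchmark` (which controls MIOpen benchmarking on ROCm builds) is simply left at its default.

```python
import torch
import torch.nn as nn

# Assumed shape of the removed code (illustrative only):
#   torch.backends.cudnn.benchmark = True
#   with torch.autocast("cuda", dtype=torch.float16):
#       wav = s3gen(mel)


def vocode(s3gen: nn.Module, mel: torch.Tensor) -> torch.Tensor:
    """Run the (possibly compiled) vocoder with no autocast context, so
    outputs keep the module's own dtype (hypothetical helper)."""
    with torch.inference_mode():
        return s3gen(mel)
```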

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 14:27:09 -04:00

4.2 KiB
