rocm-chatterbox-whisper/engine.py at a8e3e62dbce7acecc77e1da3f3e5611baf4e9be2

Enable cudnn.benchmark to fix MIOpen workspace=0 on convolutions

Timing showed s3gen.inference (the HiFiGAN vocoder) taking 22s and
reference audio processing taking ~18s, both dominated by Conv1d ops
hitting the MIOpen fallback path.
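
A minimal sketch of how per-stage timings like these could be collected. The `timed` helper and the `timings` dict are hypothetical names, not taken from engine.py; passing `torch.cuda.synchronize` as `sync` is an assumption about how queued GPU work would be flushed before reading the clock.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings, sync=None):
    """Store the wall-clock seconds spent in the wrapped block in timings[label].

    `sync` is an optional zero-argument callable (for example
    torch.cuda.synchronize) invoked before and after the block so that
    asynchronous GPU kernels are charged to the stage that launched them.
    """
    if sync is not None:
        sync()  # drain work queued by earlier stages
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()  # wait for this stage's kernels to finish
        timings[label] = time.perf_counter() - start


# Hypothetical usage around the stage named in the commit message:
#   timings = {}
#   with timed("s3gen.inference", timings, sync=torch.cuda.synchronize):
#       ...  # run the vocoder here
#   print(timings)
```

Without the synchronize calls, the measured time may reflect only kernel launch overhead rather than the convolutions themselves.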

With benchmark=False (the default), PyTorch passes a ptr=0, size=0
workspace to MIOpen, which causes GemmFwdRest to fail and fall back to a
slow path on every call. With benchmark=True, PyTorch evaluates
convolution algorithms with proper workspace allocation and caches the
best result via MIOPEN_USER_DB_PATH.

The first inference will be slower while benchmarking runs; subsequent
calls use the cache.
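
The change described above might look roughly like the following. This is a sketch, not the actual contents of engine.py: `enable_conv_autotune` and its `db_path` parameter are hypothetical, and it assumes the environment variable must be set before the first convolution runs so that MIOpen picks it up.

```python
import os


def enable_conv_autotune(db_path=None):
    """Turn on convolution algorithm benchmarking in PyTorch.

    On ROCm builds, torch.backends.cudnn.* settings apply to MIOpen.
    If db_path is given, it is exported as MIOPEN_USER_DB_PATH so tuning
    results can persist between runs (assumption: this is called before
    any convolution has executed).
    Returns True if the flag was set, False if torch is not installed.
    """
    if db_path is not None:
        os.makedirs(db_path, exist_ok=True)
        os.environ["MIOPEN_USER_DB_PATH"] = db_path
    try:
        import torch
    except ImportError:
        return False
    torch.backends.cudnn.benchmark = True
    return True


# Hypothetical usage at startup, before loading any models:
#   enable_conv_autotune("/path/to/miopen-cache")
```

Running one warm-up inference with representative input shapes after this call would move the one-time benchmarking cost out of the first user-facing request.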

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 13:24:05 -04:00

3.9 KiB
