Add fp16 autocast to synthesis for faster GPU throughput
All checks were successful
Build ROCm Image / build (push) Successful in 2m49s
The 6700 XT has significantly higher fp16 throughput than fp32.
autocast("cuda") uses fp16 for matmuls and convolutions (HiFiGAN,
S3 tokenizer, flow matching) while keeping fp32 for precision-sensitive
ops like softmax and layer norm.
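A minimal sketch of the same mixed-precision pattern. It uses CPU autocast with bfloat16 so it runs without a GPU; the commit itself uses `torch.autocast("cuda", dtype=torch.float16)`.

```python
import torch

# Autocast runs matmuls/convolutions in a low-precision dtype while
# precision-sensitive ops stay in fp32. CPU autocast with bfloat16 is
# used here only so the sketch runs without a GPU.
x = torch.randn(4, 8)
w = torch.randn(8, 8)

with torch.inference_mode(), torch.autocast("cpu", dtype=torch.bfloat16):
    y = x @ w      # matmul is autocast to bfloat16

y32 = x @ w        # outside autocast: stays fp32

print(y.dtype)     # torch.bfloat16
print(y32.dtype)   # torch.float32
```

The same two-context `with` form is what the diff below adds around `chatterbox_model.generate`.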
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -122,7 +122,7 @@ def synthesize(
         kwargs["exaggeration"] = exaggeration
         kwargs["cfg_weight"] = cfg_weight

-    with torch.inference_mode():
+    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.float16):
         wav = chatterbox_model.generate(text=text, **kwargs)

     if torch.cuda.is_available():