Add fp16 autocast to synthesis for faster GPU throughput
All checks were successful
Build ROCm Image / build (push) Successful in 2m49s
The 6700 XT has significantly higher fp16 throughput than fp32.
autocast("cuda") uses fp16 for matmuls and convolutions (HiFiGAN,
S3 tokenizer, flow matching) while keeping fp32 for precision-sensitive
ops like softmax and layer norm.
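A minimal sketch of the same mixed-precision pattern. It uses CPU autocast with bfloat16 so it runs without a GPU; the commit itself uses `torch.autocast("cuda", dtype=torch.float16)`.

```python
import torch

# Autocast runs matmuls/convolutions in a low-precision dtype while
# precision-sensitive ops stay in fp32. CPU autocast with bfloat16 is
# used here only so the sketch runs without a GPU.
x = torch.randn(4, 8)
w = torch.randn(8, 8)

with torch.inference_mode(), torch.autocast("cpu", dtype=torch.bfloat16):
    y = x @ w      # matmul is autocast to bfloat16

y32 = x @ w        # outside autocast: stays fp32

print(y.dtype)     # torch.bfloat16
print(y32.dtype)   # torch.float32
```

The same two-context `with` form is what the diff below adds around `chatterbox_model.generate`.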
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -122,7 +122,7 @@ def synthesize(
         kwargs["exaggeration"] = exaggeration
         kwargs["cfg_weight"] = cfg_weight

-    with torch.inference_mode():
+    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.float16):
         wav = chatterbox_model.generate(text=text, **kwargs)

     if torch.cuda.is_available():