Revert FP16 autocast — increases TTFA on first request
All checks were successful
Build ROCm Image / build (push) Successful in 3m21s
autocast triggers fp16 kernel selection on the first call for each tensor shape. Since the warmup uses short text, real requests re-trigger selection and are net slower. Keeping FP32 + the conditionals cache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -116,8 +116,7 @@ def synthesize(
     kwargs["cfg_weight"] = cfg_weight

     with torch.inference_mode():
-        with torch.amp.autocast(device_type="cuda", dtype=torch.float16):
-            wav = chatterbox_model.generate(text=text, **kwargs)
+        wav = chatterbox_model.generate(text=text, **kwargs)

     if torch.cuda.is_available():
         torch.cuda.synchronize()