Revert FP16 autocast — increases TTFA on first request
All checks were successful
Build ROCm Image / build (push) Successful in 3m21s
autocast triggers FP16 kernel selection on the first call for each tensor shape. Since the warmup uses short text, real requests with new shapes re-trigger selection and end up slower overall, increasing time to first audio. Keeping FP32 + the conditionals cache. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
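The shape-dependence described above can be sketched with a plain-Python analogy (this is an illustration of per-shape caching, not PyTorch internals; `select_kernel` and `generate` are hypothetical names): expensive kernel selection is memoized per input shape, so a warmup with one sequence length does nothing for requests that arrive with a different one.

```python
from functools import lru_cache

selections = []  # records every time the "slow" selection path runs

@lru_cache(maxsize=None)
def select_kernel(shape):
    # Stands in for the expensive first-call kernel selection;
    # runs once per distinct shape, then the result is cached.
    selections.append(shape)
    return f"kernel_for_{shape}"

def generate(seq_len):
    # Each distinct sequence length maps to its own cache entry.
    return select_kernel((1, seq_len))

generate(8)    # warmup with short text: slow path runs once
generate(8)    # same shape again: cache hit, fast
generate(512)  # first real request, new shape: slow path runs again
```

Running this leaves two entries in `selections`, showing that the short-text warmup did not cover the real request's shape, which is why the FP16 path was net slower for the first user-visible call.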
@@ -116,8 +116,7 @@ def synthesize(
     kwargs["cfg_weight"] = cfg_weight

     with torch.inference_mode():
-        with torch.amp.autocast(device_type="cuda", dtype=torch.float16):
-            wav = chatterbox_model.generate(text=text, **kwargs)
+        wav = chatterbox_model.generate(text=text, **kwargs)

     if torch.cuda.is_available():
         torch.cuda.synchronize()