Add startup warmup and make fp16 autocast fault-tolerant
All checks were successful
Build ROCm Image / build (push) Successful in 3m10s
Warmup: run one synthesis before accepting Wyoming connections so that MIOpen benchmarks and caches all conv layer shapes. Without this, the first Home Assistant request triggers hundreds of benchmark runs and times out.

fp16: wrap generation in try/except so a failed autocast is retried in fp32 rather than dropping the request silently.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
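The warmup half of the commit is not part of the hunk below; only the fp16 fallback is. A minimal sketch of the warmup idea, reusing the model object and generate() call visible in the diff (the function name and warmup text here are illustrative, not from the commit):

import torch

def warmup(chatterbox_model) -> None:
    # One throwaway synthesis at startup, mirroring the production code
    # path so MIOpen benchmarks and caches the same fp16 conv kernels it
    # will use for real requests.
    with torch.inference_mode(), torch.autocast(
        "cuda", dtype=torch.float16, enabled=torch.cuda.is_available()
    ):
        chatterbox_model.generate(text="Warming up.")
    if torch.cuda.is_available():
        # Block until all queued GPU work finishes, so the benchmark
        # sweep is complete before the server accepts connections.
        torch.cuda.synchronize()

Calling this once before the Wyoming server starts listening means the first real request hits already-benchmarked kernels.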
@@ -122,8 +122,13 @@ def synthesize(
     kwargs["exaggeration"] = exaggeration
     kwargs["cfg_weight"] = cfg_weight
 
-    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.float16):
-        wav = chatterbox_model.generate(text=text, **kwargs)
+    try:
+        with torch.inference_mode(), torch.autocast("cuda", dtype=torch.float16, enabled=torch.cuda.is_available()):
+            wav = chatterbox_model.generate(text=text, **kwargs)
+    except Exception:
+        logger.warning("fp16 autocast failed, retrying in fp32", exc_info=True)
+        with torch.inference_mode():
+            wav = chatterbox_model.generate(text=text, **kwargs)
 
     if torch.cuda.is_available():
         torch.cuda.synchronize()