9b62fce5c5c4ef917e36fe96ce868b1d813d92c0
Converting t3/s3gen/ve to fp16 once at load time means:
- Warmup runs in fp16, covering the right dtypes for all real requests
- No per-call autocast casting overhead
- ~2x faster matrix ops and convolutions on RDNA 2 hardware

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
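
The load-time conversion described in this commit message could look roughly like the sketch below. It is not the code from this commit. It assumes the loaded model object exposes `t3`, `s3gen`, and `ve` as attributes and that each one provides a `.half()` method that returns the module, as PyTorch's `torch.nn.Module.half()` does. The function name `prepare_fp16` and the `warmup` callback are hypothetical, introduced only for illustration.

```python
def prepare_fp16(model, submodule_names=("t3", "s3gen", "ve"), warmup=None):
    """Cast the named submodules to fp16 once, then optionally run a warmup.

    Assumption: each submodule has a .half() method that returns the module,
    as torch.nn.Module.half() does. The attribute names come from the commit
    message; how the real model object stores them is not shown there.
    """
    for name in submodule_names:
        module = getattr(model, name)
        # One-time cast at load time, so no per-call autocast is needed later.
        setattr(model, name, module.half())

    if warmup is not None:
        # Warmup runs after the cast, so it exercises the same dtype
        # that real requests will use.
        warmup(model)

    return model
```

A caller would invoke this once, right after loading the weights and moving them to the device, for example `model = prepare_fp16(model, warmup=run_warmup)` with a hypothetical `run_warmup` function. Inputs passed to the converted modules afterwards would generally need to be fp16 as well.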