Converting t3/s3gen/ve to fp16 once at load time means:
- Warmup runs in fp16, covering the right dtypes for all real requests
- No per-call autocast casting overhead
- ~2x faster matrix ops and convolutions on RDNA 2 hardware

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
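
For illustration, a minimal sketch of the load-time conversion described above, assuming the three components are ordinary PyTorch `nn.Module` objects. The `load_fp16` helper and the tiny stand-in layers are hypothetical and are not taken from the actual change; only the names t3, s3gen and ve come from this message.

```python
import torch
from torch import nn


def load_fp16(modules: dict[str, nn.Module], device: str = "cpu") -> dict[str, nn.Module]:
    """Cast each component to fp16 once, at load time (hypothetical helper)."""
    for name, module in modules.items():
        # .half() casts floating-point parameters and buffers in place,
        # so no torch.autocast context is needed on later calls.
        modules[name] = module.half().to(device).eval()
    return modules


# Hypothetical stand-ins for the real t3 / s3gen / ve models.
components = load_fp16({
    "t3": nn.Linear(8, 8),
    "s3gen": nn.Conv1d(8, 8, kernel_size=3),
    "ve": nn.Linear(8, 4),
})

# Collect the parameter dtypes per component to confirm the cast took effect.
param_dtypes = {
    name: {p.dtype for p in module.parameters()}
    for name, module in components.items()
}
```

With the weights already in fp16 and no autocast context active, inputs presumably also need to be cast to fp16 at the call site, and a warmup pass run after this conversion exercises the same dtypes as real requests.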