scott 9b62fce5c5 [dev-fp16] Convert model weights to fp16 at load time
Converting t3/s3gen/ve to fp16 once at load time means:
- Warmup runs in fp16, so it exercises the same dtypes that real requests will use
- No per-call autocast casting overhead
- ~2x faster matrix ops and convolutions on RDNA 2 hardware
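
A minimal sketch of the one-time conversion described above, assuming the models are PyTorch `nn.Module` objects (the commit does not say so explicitly). The helper name `to_fp16_at_load` and the small stand-in layers are hypothetical; only the `t3`/`s3gen`/`ve` names come from the commit message.

```python
import torch
from torch import nn


def to_fp16_at_load(*modules: nn.Module) -> None:
    """Cast each module to fp16 in place, once, right after loading."""
    for module in modules:
        module.half()  # casts floating-point parameters and buffers to float16
        module.eval()  # inference mode for layers such as dropout


# Hypothetical stand-ins for the real t3 / s3gen / ve submodules.
t3 = nn.Linear(16, 16)
s3gen = nn.Conv1d(4, 4, kernel_size=3)
ve = nn.Sequential(nn.Linear(8, 8), nn.ReLU())

to_fp16_at_load(t3, s3gen, ve)

# Collect the parameter dtypes to confirm nothing was left in fp32.
param_dtypes = {p.dtype for m in (t3, s3gen, ve) for p in m.parameters()}
print(param_dtypes)
```

With the weights stored in fp16 and no autocast context, inputs would also need to be cast to fp16 before the forward pass, which is why running warmup after the conversion matters.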

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:34:33 -04:00
Size: 213 KiB
Languages: Python 100%