rocm-chatterbox-whisper

Build ROCm Image / build (push) Successful in 2m39s

[dev-fp16] Only convert T3 to fp16, leave s3gen/ve in fp32

s3gen.speaker_encoder (CAMPPlus xvector) hardcodes float32 inputs in
its inference() method, causing dtype mismatch when weights are fp16.
T3 (the autoregressive GPT-2-medium LLM) has no such constraint and
is the token-generation bottleneck that benefits most from fp16.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 20:41:24 -04:00
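The selective conversion described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `ToyModel` and `convert_t3_only` are hypothetical names, the `nn.Linear` layers are placeholders for the real submodules, and it assumes the model exposes `t3`, `s3gen`, and `ve` as attributes.

```python
import torch
from torch import nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for a TTS model with three submodules."""

    def __init__(self) -> None:
        super().__init__()
        self.t3 = nn.Linear(8, 8)     # placeholder for the autoregressive LLM
        self.s3gen = nn.Linear(8, 8)  # placeholder; its speaker encoder expects float32 inputs
        self.ve = nn.Linear(8, 8)     # placeholder for the remaining fp32 component


def convert_t3_only(model: nn.Module) -> nn.Module:
    """Cast only the `t3` submodule to fp16; leave the other submodules in fp32."""
    model.t3.half()  # nn.Module.half() casts floating-point parameters and buffers in place
    return model


model = convert_t3_only(ToyModel())

# Collect the parameter dtype of each top-level submodule to confirm the split.
dtypes = {name: next(mod.parameters()).dtype for name, mod in model.named_children()}
print(dtypes)
```

With a split like this, any tensor passed between the fp16 and fp32 parts would need an explicit cast at the boundary (for example `x.float()` before an fp32 submodule), otherwise the same kind of dtype mismatch the commit describes can reappear.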

.gitea/workflows

Build dev-fp16 branch to :dev-fp16 tag (not :latest)

2026-04-05 20:34:54 -04:00

.gitignore

Initial implementation: Chatterbox TTS with ROCm and Wyoming

2026-04-05 09:51:09 -04:00

config.py

Multi-pass warmup and smaller chunk_size to fix HA timeout

2026-04-05 15:04:46 -04:00

config.yaml

Multi-pass warmup and smaller chunk_size to fix HA timeout