Switch to ONNX runtime with chatterbox-turbo-ONNX (fp16)

Replaces the PyTorch/chatterbox-tts stack with direct ONNX inference using ResembleAI/chatterbox-turbo-ONNX fp16 weights.

- engine.py: full rewrite (ONNX sessions, autoregressive KV-cache LM loop, voice conditionals cache via speech_encoder outputs)
- wyoming_handler.py: remove torch dep, use np.asarray for audio bytes
- requirements-rocm-init.txt: onnxruntime-rocm replaces torch wheels
- requirements-rocm.txt: drop chatterbox/torch deps, keep audio utils
- Dockerfile.rocm: remove chatterbox-tts install step

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
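
For readers without the wyoming_handler.py hunk in view, here is a minimal sketch of what producing audio bytes with np.asarray instead of torch could look like. It assumes the engine returns a float waveform in [-1.0, 1.0] and that the consumer expects little-endian 16-bit PCM. The function name audio_to_pcm16_bytes is hypothetical and does not come from this commit.

```python
import numpy as np


def audio_to_pcm16_bytes(audio) -> bytes:
    """Convert a float waveform to little-endian 16-bit PCM bytes.

    Hypothetical helper: assumes samples are floats nominally in [-1.0, 1.0].
    """
    # np.asarray accepts lists, tuples, or existing arrays without needing torch.
    samples = np.asarray(audio, dtype=np.float32).reshape(-1)
    # Clip so out-of-range samples saturate instead of wrapping around.
    samples = np.clip(samples, -1.0, 1.0)
    # Scale to the int16 range and serialize as little-endian bytes.
    return (samples * 32767.0).astype("<i2").tobytes()
```

Clipping before the cast matters because casting an out-of-range float to int16 would otherwise wrap or produce undefined values rather than saturate.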
2026-04-06 19:08:26 -04:00
parent 4c79a82428
commit 2b1398109d
5 changed files with 209 additions and 103 deletions
--- a/requirements-rocm.txt
+++ b/requirements-rocm.txt
@@ -2,21 +2,10 @@
 numpy>=1.24.0,<2.0.0
 soundfile
 librosa==0.11.0
-pyloudnorm

-# ML dependencies (pinned to match chatterbox without overwriting ROCm torch)
-transformers==4.46.3
-diffusers==0.29.0
-safetensors>=0.4.1
+# ONNX model dependencies
+transformers>=4.40.0
 huggingface-hub
-omegaconf
-
-# Chatterbox dependencies (installed separately since chatterbox uses --no-deps)
-conformer==0.3.2
-s3tokenizer==0.3.0
-spacy-pkuseg
-pykakasi==2.3.0
-resemble-perth==1.0.1

 # Wyoming protocol
 wyoming>=1.5.4