Switch to ONNX runtime with chatterbox-turbo-ONNX (fp16)

Replaces the PyTorch/chatterbox-tts stack with direct ONNX inference using ResembleAI/chatterbox-turbo-ONNX fp16 weights.

- engine.py: full rewrite (ONNX sessions, autoregressive KV-cache LM loop, voice conditionals cache via speech_encoder outputs)
- wyoming_handler.py: remove torch dep, use np.asarray for audio bytes
- requirements-rocm-init.txt: onnxruntime-rocm replaces torch wheels
- requirements-rocm.txt: drop chatterbox/torch deps, keep audio utils
- Dockerfile.rocm: remove chatterbox-tts install step

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
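
For readers unfamiliar with the "autoregressive KV-cache LM loop" mentioned for engine.py, the general shape can be sketched as follows. This is not the code from engine.py: `greedy_decode`, `toy_step`, and the list-based cache are hypothetical stand-ins, and greedy argmax sampling is assumed purely for illustration. In an ONNX setup the step function would typically wrap a session call that takes the previous key/value tensors as inputs and returns updated ones.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical cache type: one entry per processed token. A real model
# would hold per-layer key/value tensors here instead.
Cache = List[int]
StepFn = Callable[[int, Optional[Cache]], Tuple[List[float], Cache]]


def greedy_decode(step_fn: StepFn, start_token: int, stop_token: int,
                  max_new_tokens: int) -> List[int]:
    """Generate tokens one at a time, reusing the cache between steps."""
    generated: List[int] = []
    cache: Optional[Cache] = None  # no cache exists before the first step
    token = start_token
    for _ in range(max_new_tokens):
        # Only the newest token is fed in; earlier context lives in the cache.
        logits, cache = step_fn(token, cache)
        token = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if token == stop_token:
            break
        generated.append(token)
    return generated


def toy_step(token: int, cache: Optional[Cache]) -> Tuple[List[float], Cache]:
    """Stand-in for a model call: always predicts (token + 1) mod 5."""
    new_cache = (cache or []) + [token]  # cache grows by one entry per step
    logits = [0.0] * 5
    logits[(token + 1) % 5] = 1.0
    return logits, new_cache
```

Calling `greedy_decode(toy_step, start_token=0, stop_token=4, max_new_tokens=10)` walks the toy model forward until it predicts the stop token. The point of the pattern is that each iteration passes only the latest token plus the cache, rather than re-running the whole prefix.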
2026-04-06 19:08:26 -04:00
parent 4c79a82428
commit 2b1398109d
5 changed files with 209 additions and 103 deletions
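
Since the Dockerfile.rocm change relies on onnxruntime-rocm being installed before anything can pull in the CPU-only wheel, a quick runtime check can confirm which build actually ended up in the image. The sketch below is an illustration under assumptions, not part of this commit: `pick_providers` and `check_runtime` are hypothetical helper names, and it assumes the ROCm build reports `ROCMExecutionProvider` through `onnxruntime.get_available_providers()`.

```python
from typing import List, Sequence

ROCM = "ROCMExecutionProvider"
CPU = "CPUExecutionProvider"


def pick_providers(available: Sequence[str]) -> List[str]:
    """Prefer ROCm when present; keep CPU as a fallback when available."""
    ordered = [p for p in (ROCM, CPU) if p in available]
    if not ordered:
        raise RuntimeError(f"no usable execution provider in {list(available)}")
    return ordered


def check_runtime() -> List[str]:
    """Query the installed onnxruntime build for its providers."""
    # Imported lazily so pick_providers stays usable without onnxruntime.
    import onnxruntime as ort
    return pick_providers(ort.get_available_providers())
```

If `check_runtime()` returns only the CPU provider inside the container, the CPU wheel most likely replaced the ROCm one during a later install step. The returned list can be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`.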
--- a/Dockerfile.rocm
+++ b/Dockerfile.rocm
@@ -17,18 +17,15 @@ RUN apt-get update && apt-get install -y --no-install-recommends \

 WORKDIR /app

-# Step 1: Install ROCm-compatible PyTorch stack first.
-# This must happen before anything else to prevent pip from pulling CPU wheels.
+# Step 1: Install onnxruntime-rocm first so it claims the onnxruntime namespace
+# before any other package can pull in the CPU-only onnxruntime wheel.
 COPY requirements-rocm-init.txt .
 RUN pip3 install -r requirements-rocm-init.txt

-# Step 2: Install remaining dependencies (pinned to avoid overwriting torch).
+# Step 2: Install remaining dependencies.
 COPY requirements-rocm.txt .
 RUN pip3 install -r requirements-rocm.txt

-# Step 3: Install chatterbox with --no-deps so pip cannot replace ROCm torch.
-RUN pip3 install --no-deps chatterbox-tts
-
 # Application source
 COPY engine.py config.py wyoming_handler.py wyoming_voices.py main.py ./