Switch to ONNX runtime with chatterbox-turbo-ONNX (fp16)

Replaces the PyTorch/chatterbox-tts stack with direct ONNX inference using ResembleAI/chatterbox-turbo-ONNX fp16 weights.

- engine.py: full rewrite (ONNX sessions, autoregressive KV-cache LM loop, voice conditionals cache via speech_encoder outputs)
- wyoming_handler.py: remove torch dep, use np.asarray for audio bytes
- requirements-rocm-init.txt: onnxruntime-rocm replaces torch wheels
- requirements-rocm.txt: drop chatterbox/torch deps, keep audio utils
- Dockerfile.rocm: remove chatterbox-tts install step

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 19:08:26 -04:00
parent 4c79a82428
commit 2b1398109d
5 changed files with 209 additions and 103 deletions
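
For reference, the wyoming_handler.py change converts float audio to 16-bit PCM bytes with NumPy instead of torch. A minimal sketch of that conversion follows, assuming the engine returns float samples nominally in [-1.0, 1.0]. The helper name `float_audio_to_pcm16` is hypothetical, and the clipping step is an addition that is not part of this commit.

```python
import numpy as np


def float_audio_to_pcm16(audio) -> bytes:
    """Hypothetical helper: convert float audio to 16-bit PCM bytes."""
    # Accept lists or NumPy arrays and drop singleton batch/channel dimensions.
    samples = np.asarray(audio, dtype=np.float32).squeeze()
    # Assumption (not in the commit): clip so out-of-range samples saturate
    # rather than wrap around when cast to int16.
    samples = np.clip(samples, -1.0, 1.0)
    # Same scaling and cast as the expression in the handler.
    return (samples * 32767).astype("int16").tobytes()


# Four samples in a (1, 4) array become 8 bytes of int16 PCM.
pcm = float_audio_to_pcm16(np.array([[0.0, 0.5, -1.0, 1.5]]))
```

Without the clip, a sample slightly above 1.0 would overflow the int16 range on the cast, so whether to add it depends on whether the ONNX decoder output is already bounded.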
--- a/wyoming_handler.py
+++ b/wyoming_handler.py
@@ -3,6 +3,8 @@ import logging
 import time
 from typing import Dict, Optional

+import numpy as np
+
 from wyoming.audio import AudioChunk, AudioStart, AudioStop
 from wyoming.event import Event
 from wyoming.info import Describe, Info
@@ -151,7 +153,7 @@ class ChatterboxWyomingHandler(AsyncEventHandler):
                continue

            audio_bytes = (
-                audio_tensor.cpu().numpy().squeeze() * 32767
+                np.asarray(audio_tensor).squeeze() * 32767
            ).astype("int16").tobytes()

            if first_chunk: