Switch to ONNX runtime with chatterbox-turbo-ONNX (fp16)
Some checks failed
Build ROCm Image / build (push) Has been cancelled

Replaces the PyTorch/chatterbox-tts stack with direct ONNX inference
using ResembleAI/chatterbox-turbo-ONNX fp16 weights.

- engine.py: full rewrite — ONNX sessions, autoregressive KV-cache LM
  loop, voice conditionals cache via speech_encoder outputs
- wyoming_handler.py: remove torch dep, use np.asarray for audio bytes
- requirements-rocm-init.txt: onnxruntime-rocm replaces torch wheels
- requirements-rocm.txt: drop chatterbox/torch deps, keep audio utils
- Dockerfile.rocm: remove chatterbox-tts install step

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-04-06 19:08:26 -04:00
parent 4c79a82428
commit 2b1398109d
5 changed files with 209 additions and 103 deletions

View File

@@ -3,6 +3,8 @@ import logging
import time
from typing import Dict, Optional
import numpy as np
from wyoming.audio import AudioChunk, AudioStart, AudioStop
from wyoming.event import Event
from wyoming.info import Describe, Info
@@ -151,7 +153,7 @@ class ChatterboxWyomingHandler(AsyncEventHandler):
continue
audio_bytes = (
audio_tensor.cpu().numpy().squeeze() * 32767
np.asarray(audio_tensor).squeeze() * 32767
).astype("int16").tobytes()
if first_chunk: