Commit Graph

45 Commits

SHA1 Message Date
69f5489532 Merge branch 'main' into dev 2026-04-06 17:41:40 -04:00
f292ace76c Trigger rebuild to restore latest tag
All checks were successful
Build ROCm Image / build (push) Successful in 14m47s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 17:33:16 -04:00
766ca9d278 Fix image tagging: dev branch tags as dev, not latest
All checks were successful
Build ROCm Image / build (push) Successful in 25s
main branch → :latest + :sha
other branches → :<branch-name> + :sha

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 17:29:59 -04:00
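
The tag-selection rule above is small enough to show directly; a sketch of the equivalent logic (illustrative only; the real mapping lives in the CI workflow, and the image name here is hypothetical):

    def image_tags(branch: str, sha: str,
                   image: str = "registry.example/chatterbox-rocm") -> list[str]:
        # main gets :latest; every other branch gets :<branch-name>; both get :sha
        primary = "latest" if branch == "main" else branch
        return [f"{image}:{primary}", f"{image}:{sha}"]

    print(image_tags("main", "766ca9d278"))  # [...:latest, ...:766ca9d278]
    print(image_tags("dev", "766ca9d278"))   # [...:dev,    ...:766ca9d278]
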
9a017df4ca Trigger CI builds on dev branch
All checks were successful
Build ROCm Image / build (push) Successful in 17m21s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 17:10:56 -04:00
fe3c77ff4f Upgrade to ROCm 7.2, Python 3.11, PyTorch 2.11.0
- Base image: rocm/dev-ubuntu-22.04:6.1 → 7.2
- Python 3.10 → 3.11 via deadsnakes PPA
- torch/torchaudio: 2.5.1 → 2.11.0
- torchvision: 0.20.1 → 0.26.0
- pytorch_triton_rocm: 3.1.0 → 3.3.0
- transformers: 4.46.3 → >=4.50.0
- diffusers: 0.29.0 → >=0.32.0
- safetensors: >=0.4.1 → >=0.4.5
- config: temperature 0.8→0.9, seed 0→1960

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 17:09:56 -04:00
967ed41239 Revert FP16 autocast — increases TTFA on first request
All checks were successful
Build ROCm Image / build (push) Successful in 3m21s
autocast triggers fp16 kernel selection on the first call for each tensor
shape. Since the warmup uses short text, real requests re-trigger
selection and end up slower overall. Keeping FP32 plus the conditionals cache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:30:49 -04:00
29b66e24bb Cache voice conditionals and add FP16 autocast
All checks were successful
Build ROCm Image / build (push) Successful in 3m17s
Voice conditionals (s3tokenizer + voice encoder + mel embeddings) are
expensive to compute but depend only on the reference audio, not the
text. Previously they ran on every synthesis chunk — 3x wasted work for
a 3-chunk request. Now computed once at startup and reused.

Also wrap generate() in torch.amp.autocast(float16) for ~2x speedup on
all model computation (T3 LLM, S3Gen CFM, HiFiGAN vocoder).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:22:13 -04:00
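
A rough sketch of the caching pattern described above; the prepare_conditionals/generate names are assumptions about the chatterbox API, not the actual method names:

    import torch

    class Engine:
        def __init__(self, model, ref_audio_path):
            self.model = model
            # Voice conditionals depend only on the reference audio, so compute
            # them once at startup instead of once per synthesis chunk.
            self.conds = model.prepare_conditionals(ref_audio_path)  # assumed API

        def synthesize(self, text):
            # fp16 autocast over the whole generate path (T3, S3Gen CFM, HiFiGAN)
            with torch.amp.autocast("cuda", dtype=torch.float16):
                return self.model.generate(text, conds=self.conds)   # assumed API
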
0fac076de1 Fix safetensors version conflict with transformers 4.46.3
All checks were successful
Build ROCm Image / build (push) Successful in 14m11s
transformers 4.46.3 requires safetensors>=0.4.1, not ==0.4.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:34:23 -04:00
8de67c8bd9 Switch to ROCm 6.1 + torch 2.5.1 to fix MIOpen workspace=0 slowness
Some checks failed
Build ROCm Image / build (push) Failing after 11s
ROCm 7.2 + PyTorch 2.11.0 has a bug where PyTorch passes workspace=0 to
MIOpen convolutions, forcing fallback to the slow GemmFwdRest solver.
This caused s3gen.inference to take 15-22s instead of <5s, making
synthesis 3-4x slower than real-time audio playback.

ROCm 6.1 allocates workspace correctly so MIOpen picks fast GEMM solvers
without needing torch.compile workarounds.

Changes:
- Base image: rocm/dev-ubuntu-22.04:7.2 → 6.1
- torch 2.11.0 → 2.5.1 (rocm6.1 wheel index)
- Add pytorch_triton_rocm==3.1.0
- transformers 5.2.0 → 4.46.3, safetensors 0.5.3 → 0.4.0
- s3tokenizer unpinned → 0.3.0
- resemble-perth==1.0.1 directly (v1.0.1 is pip-installable; drop stub)
- Drop Dockerfile perth_stub steps
- Drop torch.compile and timing patches from engine.py (not needed)
- Drop multi-pass warmup from main.py (torch JIT warmup not needed)
- Drop ROCm 7.2-specific env vars from docker-compose.yml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:27:21 -04:00
23a0b914fa Add per-event logging and top-level exception catching
All checks were successful
Build ROCm Image / build (push) Successful in 6m2s
Log every event type on arrival and wrap handle_event in try/except
so silent crashes are visible. Helps diagnose the streaming protocol
hang where no logs appear after supports_synthesize_streaming=True.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:10:32 -04:00
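
A sketch of the pattern, assuming wyoming's AsyncEventHandler; the inner _handle dispatcher is hypothetical:

    import logging
    from wyoming.event import Event
    from wyoming.server import AsyncEventHandler

    _LOGGER = logging.getLogger(__name__)

    class ChatterboxHandler(AsyncEventHandler):
        async def handle_event(self, event: Event) -> bool:
            # Log every event type on arrival so protocol stalls are visible.
            _LOGGER.debug("event: %s", event.type)
            try:
                return await self._handle(event)   # hypothetical inner dispatcher
            except Exception:
                # Without this, a crash here is silent and HA just hangs.
                _LOGGER.exception("error while handling %s", event.type)
                return True
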
3d3e8bdabf Add supports_synthesize_streaming=True to TtsProgram
All checks were successful
Build ROCm Image / build (push) Successful in 4m56s
Without this flag HA buffers all audio until AudioStop before forwarding
to the media player. With it, HA streams AudioChunk events to the player
as they arrive, so playback starts on the first chunk rather than after
the full text is synthesized.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:52:02 -04:00
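
Roughly where the flag lives, assuming wyoming's Info/TtsProgram/TtsVoice dataclasses; the names, versions, and descriptions below are placeholders:

    from wyoming.info import Attribution, Info, TtsProgram, TtsVoice

    attribution = Attribution(name="Resemble AI",
                              url="https://github.com/resemble-ai/chatterbox")

    info = Info(tts=[
        TtsProgram(
            name="chatterbox",
            description="Chatterbox TTS",
            attribution=attribution,
            installed=True,
            version="1.0.0",                       # placeholder
            # The flag this commit adds: lets HA forward AudioChunk events to
            # the media player as they arrive instead of buffering to AudioStop.
            supports_synthesize_streaming=True,
            voices=[TtsVoice(
                name="default",
                description="Cloned reference voice",  # placeholder
                attribution=attribution,
                installed=True,
                version="1.0.0",                   # required (see commit 4b21d6c252)
                languages=["en"],
            )],
        )
    ])
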
d0f13dea8d Log incoming HA text in synthesis request line
All checks were successful
Build ROCm Image / build (push) Successful in 3m50s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:26:52 -04:00
8c5d3c4f06 Suppress MIOpen workspace warning noise via MIOPEN_LOG_LEVEL=2
Some checks failed
Build ROCm Image / build (push) Has been cancelled
The GemmFwdRest workspace=0 warnings are expected (PyTorch ROCm passes
null workspace; MIOpen falls back to a working solver). They are not
actionable and clutter the logs. Level 2 keeps error-level output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:24:18 -04:00
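
MIOPEN_LOG_LEVEL is ordinary process environment, presumably set in the image or compose file here; it could equally be set before torch loads MIOpen, e.g.:

    import os

    # Per the commit, level 2 keeps error-level output while dropping the
    # expected GemmFwdRest workspace=0 warning noise.
    os.environ.setdefault("MIOPEN_LOG_LEVEL", "2")

    import torch  # imported after the variable is set
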
a196294d4a Fix Wyoming protocol: remove SynthesizeStopped from Synthesize path
Some checks failed
Build ROCm Image / build (push) Has been cancelled
The plain Synthesize event (HA's standard TTS path) should NOT be
followed by SynthesizeStopped. That event belongs only to the streaming
protocol (SynthesizeStart/Chunk/Stop). Sending it after Synthesize
confuses HA's Wyoming client, causing it to hang indefinitely.

Also:
- Guard Synthesize path against duplicate events during streaming
- Send audio as one AudioChunk per sentence (matches working reference)
- Remove numpy import (no longer needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:22:47 -04:00
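
A sketch of the corrected plain-Synthesize path, assuming wyoming's event classes; the engine object, the output format constants, and the naive sentence splitter are illustrative. Note that no SynthesizeStopped is sent here:

    from wyoming.audio import AudioChunk, AudioStart, AudioStop
    from wyoming.tts import Synthesize

    RATE, WIDTH, CHANNELS = 24000, 2, 1   # assumed 16-bit mono output

    def split_sentences(text):
        # naive splitter, purely for illustration
        return [s.strip() + "." for s in text.split(".") if s.strip()]

    async def handle_synthesize(handler, engine, synthesize: Synthesize) -> bool:
        await handler.write_event(
            AudioStart(rate=RATE, width=WIDTH, channels=CHANNELS).event())
        for sentence in split_sentences(synthesize.text):
            pcm = engine.synthesize(sentence)   # 16-bit PCM bytes (assumed)
            # One AudioChunk per sentence, matching the working reference server.
            await handler.write_event(
                AudioChunk(audio=pcm, rate=RATE, width=WIDTH,
                           channels=CHANNELS).event())
        # Plain Synthesize ends with AudioStop only; SynthesizeStopped belongs to
        # the streaming protocol (SynthesizeStart/Chunk/Stop) and confuses HA here.
        await handler.write_event(AudioStop().event())
        return True
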
59731084cd Multi-pass warmup and smaller chunk_size to fix HA timeout
All checks were successful
Build ROCm Image / build (push) Successful in 2m49s
torch.compile with dynamic=True still specializes per shape family on
first call. The warmup was running one text length, leaving real requests
to JIT-compile their own shapes (15-22s for first chunk). HA freezes
because it gets no AudioChunk for 22 seconds.

Fix:
- Run 3 warmup passes (short/medium/long text) so torch.compile builds
  a dynamic shape graph covering the range HA actually sends. Real
  requests then hit a cached compilation and synthesize in 3-8s.
- Reduce default chunk_size from 300 to 120 chars so the first text
  chunk is shorter, producing faster synthesis and earlier first audio.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 15:04:46 -04:00
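
A condensed sketch of the warmup idea; the texts and the engine API are illustrative, not the actual ones:

    CHUNK_SIZE = 120  # chars: a shorter first text chunk means earlier first audio

    WARMUP_TEXTS = [
        "Warmup.",                                                        # short
        "This medium length sentence warms up a typical request shape.",  # medium
        "This considerably longer warmup sentence exists so that torch.compile "
        "sees token sequences close to what Home Assistant actually sends.",  # long
    ]

    def warmup(engine):
        # One pass per shape family so torch.compile(dynamic=True) has built a
        # dynamic-shape graph before the first real Home Assistant request.
        for text in WARMUP_TEXTS:
            engine.synthesize(text)   # assumed engine API
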
169e003a34 Fix warmup text length and ve attribute for torch.compile
All checks were successful
Build ROCm Image / build (push) Successful in 3m35s
- Warmup now uses a ~170-char representative sentence so torch.compile
  JIT-compiles for typical token sequence lengths. Previously "Warmup."
  compiled for very short shapes, causing a full re-compile (17s) on the
  first real HA request and pushing total synthesis past 30s.
- Compile model.ve (voice encoder) in addition to s3gen — both are
  convolutional and hit the MIOpen workspace=0 bug.
- Fix _patch_timing: attribute is model.ve not model.voice_encoder,
  so the timing wrap was silently skipping the speaker embedding.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:51:08 -04:00
5766870304 Fix UnboundLocalError: move torch._dynamo import to module level
All checks were successful
Build ROCm Image / build (push) Successful in 2m39s
An import inside a function creates a local binding that shadows the
module-level torch import, breaking all earlier torch references in
the same function scope.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:34:45 -04:00
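
The failure is easy to reproduce in isolation; a minimal example of the shadowing bug and the module-level fix:

    import torch
    import torch._dynamo   # the fix: module-level import creates no local binding

    def broken():
        x = torch.zeros(1)            # UnboundLocalError: the import below makes
        import torch._dynamo          # 'torch' a *local* name for the whole
        torch._dynamo.reset()         # function body, including the line above
        return x

    def fixed():
        x = torch.zeros(1)            # fine: 'torch' resolves to the global module
        torch._dynamo.reset()
        return x
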
7babd0584e Replace MIOpen convolution path with torch.compile on s3gen
All checks were successful
Build ROCm Image / build (push) Successful in 2m47s
The GemmFwdRest workspace=0 issue is in MIOpen itself — PyTorch's ROCm
backend does not allocate workspace for convolutions, causing HiFiGAN to
use a slow fallback solver regardless of benchmark settings.

torch.compile(s3gen, dynamic=True) replaces MIOpen's conv path with
Triton-generated kernels, bypassing the issue entirely. dynamic=True
handles variable audio lengths without recompiling per request. The warmup
triggers JIT compilation so first HA request is fast.

Also removes fp16 autocast (Triton handles precision internally) and
cudnn.benchmark (no longer needed without MIOpen convs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:27:09 -04:00
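
Roughly what the compile step amounts to; the model attribute names follow the commit messages, and the surrounding code is assumed:

    import torch

    def compile_conv_models(model):
        # Route the convolution-heavy submodules through Triton-generated kernels
        # instead of MIOpen. dynamic=True avoids a recompile for every new audio
        # length; the startup warmup then pays the JIT cost once.
        model.s3gen = torch.compile(model.s3gen, dynamic=True)
        model.ve = torch.compile(model.ve, dynamic=True)   # added in 169e003a34
        return model
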
cd33b1c161 Fix MIOpen MLIR kernel compilation crash during benchmark search
All checks were successful
Build ROCm Image / build (push) Successful in 18s
Two changes:
- ulimits nofile=65536: MIOpen exhaustive search compiles many MLIR
  kernels in parallel, each opening temp files in /tmp. Default container
  limit (1024) is too low and ld.lld fails with 'too many open files'.
- MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0: disables the MLIR-based ImplicitGEMM
  solvers that generate the failing kernels, leaving Direct/Winograd/GEMM.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:21:32 -04:00
e69b072b70 Add MIOPEN_DISABLE_CACHE=1 to prevent SQLite crash on benchmark
All checks were successful
Build ROCm Image / build (push) Successful in 19s
cudnn.benchmark triggers MIOpen exhaustive kernel search which then
crashes writing results to SQLite. Disabling the cache skips the write.
PyTorch's in-memory benchmark cache still applies so warmup results are
reused for all requests within a container run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:14:44 -04:00
7436c49d44 Remove custom MIOpen cache path — let MIOpen use its defaults
All checks were successful
Build ROCm Image / build (push) Successful in 3m25s
The named volume overlay was causing SQLite 'unable to open database file'
crashes. MIOpen's default cache location (~/.config/miopen) works reliably
inside the container. The startup warmup repopulates it each run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:04:05 -04:00
60279389f2 Create miopen_cache dir in Dockerfile before volume mount
All checks were successful
Build ROCm Image / build (push) Successful in 3m11s
MIOpen crashes with SQLite 'unable to open database file' when the
directory doesn't exist at container start. mkdir + chmod 777 ensures
the path is present and writable before the named volume overlays it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:52:03 -04:00
bdde4a2480 Add startup warmup and make fp16 autocast fault-tolerant
All checks were successful
Build ROCm Image / build (push) Successful in 3m10s
Warmup: run a synthesis before accepting Wyoming connections so MIOpen
benchmarks and caches all conv layer shapes. Without this, the first HA
request triggers hundreds of benchmark runs and times out.

fp16: wrap in try/except so a failed autocast retries in fp32 rather
than dropping the request silently.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:48:41 -04:00
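
A sketch of the fault-tolerant autocast described above; the generate call is an assumed API:

    import logging
    import torch

    _LOGGER = logging.getLogger(__name__)

    def synthesize(model, text):
        try:
            # Try fp16 first for throughput...
            with torch.amp.autocast("cuda", dtype=torch.float16):
                return model.generate(text)            # assumed API
        except Exception:
            # ...but retry in fp32 rather than dropping the request silently.
            _LOGGER.exception("fp16 synthesis failed; retrying in fp32")
            return model.generate(text)
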
f20699aed3 Add fp16 autocast to synthesis for faster GPU throughput
All checks were successful
Build ROCm Image / build (push) Successful in 2m49s
The 6700 XT has significantly higher fp16 throughput than fp32.
autocast("cuda") uses fp16 for matmuls and convolutions (HiFiGAN,
S3 tokenizer, flow matching) while keeping fp32 for precision-sensitive
ops like softmax and layer norm.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:34:21 -04:00
a8e3e62dbc Stream audio in 4096-sample sub-chunks for immediate HA playback
All checks were successful
Build ROCm Image / build (push) Successful in 4m20s
Previously the entire synthesized audio for a sentence was sent as one
AudioChunk event. HA buffered it until it arrived in full, so playback didn't
start until synthesis was complete. Splitting into 4096-sample chunks lets
HA begin playing as data arrives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:25:54 -04:00
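
The split is a simple slice loop; a sketch assuming int16 mono numpy output and wyoming's AudioChunk event:

    import numpy as np
    from wyoming.audio import AudioChunk

    SAMPLES_PER_CHUNK = 4096
    RATE, WIDTH, CHANNELS = 24000, 2, 1   # assumed output format

    async def send_audio(handler, audio: np.ndarray):
        # audio: int16 mono samples for one sentence. Emit 4096-sample slices so
        # HA can begin playback before the whole sentence has been sent.
        for start in range(0, len(audio), SAMPLES_PER_CHUNK):
            piece = audio[start:start + SAMPLES_PER_CHUNK]
            await handler.write_event(
                AudioChunk(audio=piece.tobytes(), rate=RATE, width=WIDTH,
                           channels=CHANNELS).event())
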
514bbad0e9 Enable cudnn.benchmark to fix MIOpen workspace=0 on convolutions
Some checks failed
Build ROCm Image / build (push) Has been cancelled
Timing showed s3gen.inference (HiFiGAN vocoder) taking 22s and ref audio
processing ~18s, both dominated by Conv1d ops hitting the MIOpen fallback.

With benchmark=False (default), PyTorch passes ptr=0 size=0 workspace to
MIOpen causing GemmFwdRest to fail and fall back to a slow path every call.
With benchmark=True, PyTorch evaluates convolution algorithms with proper
workspace allocation and caches the best result via MIOPEN_USER_DB_PATH.

First inference will be slower while benchmarking; subsequent calls use cache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:24:05 -04:00
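
The switch itself is a single flag set before the first inference; illustrative placement:

    import torch

    # Benchmark conv algorithms with real workspace allocation and cache the
    # winner per input shape, instead of handing MIOpen a ptr=0/size=0 workspace.
    torch.backends.cudnn.benchmark = True
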
bfe20b7742 Add timing instrumentation to pinpoint synthesis bottleneck
All checks were successful
Build ROCm Image / build (push) Successful in 3m21s
Wraps s3tokenizer, voice_encoder, and s3gen.inference with timing logs
so we can see exactly which step is consuming the missing ~33 seconds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:09:14 -04:00
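
One way to get per-step timings; the wrapper is illustrative and the wrapped attribute names follow the commit messages:

    import functools
    import logging
    import time

    _LOGGER = logging.getLogger(__name__)

    def _timed(name, fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # For exact GPU numbers you would also torch.cuda.synchronize() here.
            t0 = time.perf_counter()
            result = fn(*args, **kwargs)
            _LOGGER.info("%s took %.2fs", name, time.perf_counter() - t0)
            return result
        return wrapper

    def patch_timing(model):
        # Wrap the suspected hot spots so the log shows where the ~33s goes.
        model.s3gen.inference = _timed("s3gen.inference", model.s3gen.inference)
        model.ve.forward = _timed("voice_encoder", model.ve.forward)
        return model   # the s3tokenizer call would be wrapped the same way
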
b990cacd31 Enable HSA_OVERRIDE_GFX_VERSION=10.3.0 for RX 6700 XT
All checks were successful
Build ROCm Image / build (push) Successful in 18s
gfx1031 is not natively supported in ROCm 7.2. Without the override
the GPU falls back to software emulation causing 40+ second synthesis.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:07:46 -04:00
2a80555c60 Disable MIOpen GEMM solver to fix null workspace fallback
All checks were successful
Build ROCm Image / build (push) Successful in 35s
PyTorch passes ptr=0 size=0 workspace to MIOpen convolutions, causing
GemmFwdRest to warn and fall back to a slow path on every operation.
MIOPEN_DEBUG_CONV_GEMM=0 skips GEMM entirely and uses Direct/Winograd
solvers which have no workspace requirement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:00:21 -04:00
f15cdcf049 Pin base image to rocm/dev-ubuntu-22.04:7.2, restore torch 2.11.0
All checks were successful
Build ROCm Image / build (push) Successful in 16m4s
Using :latest was pulling a ROCm 6.x image whose MIOpen was incompatible
with our ROCm 7.2 PyTorch wheels. Pinning to the 7.2 tag gives matching
MIOpen libraries and should resolve the workspace/fallback performance issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:35:58 -04:00
b68bccb20f Revert to torch 2.5.1 + ROCm 6.1 (known working combination)
Some checks failed
Build ROCm Image / build (push) Has been cancelled
PyTorch 2.11.0 with ROCm 7.2 wheels against rocm/dev-ubuntu-22.04:latest
causes MIOpen version mismatches that force every convolution onto a slow
zero-workspace fallback path (41s synthesis). The existing working project
uses torch 2.5.1 + ROCm 6.1 successfully on the same base image.

Also remove MIOPEN_FIND_ENFORCE override - unnecessary with matched versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:34:06 -04:00
7a966c8532 Fix MIOPEN_FIND_ENFORCE: 3 -> 1 (DB_UPDATE)
Some checks failed
Build ROCm Image / build (push) Has been cancelled
Enforce=3 (SEARCH_DB_UPDATE) runs exhaustive kernel benchmarking on
every single GPU operation, making inference impossibly slow. Enforce=1
searches once, writes to cache, then reuses cached results on all
subsequent calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:29:54 -04:00
f45fa0496e Fix MIOpen workspace warnings and add kernel cache persistence
Some checks failed
Build ROCm Image / build (push) Has been cancelled
MIOPEN_FIND_ENFORCE=3 tells MIOpen to only select solvers that fit in
available workspace, eliminating the GemmFwdRest fallback warnings and
the associated performance hit. Persisting the MIOpen cache via a named
volume avoids kernel recompilation on every container start.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:20:18 -04:00
f81e5f42fb Switch to inline cache to avoid registry blob size limits
Some checks failed
Build ROCm Image / build (push) Has been cancelled
mode=max was hitting a 400 Bad Request when pushing the large ROCm
PyTorch layer (~GB) as a separate cache blob. Inline cache embeds
metadata in the already-pushed image instead, so no separate upload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:14:35 -04:00
d9a540f8e8 Add registry layer cache and fix Dockerfile cache order
Some checks failed
Build ROCm Image / build (push) Failing after 19m47s
- CI: cache-from/cache-to with mode=max stores all intermediate layers
  in the registry so subsequent builds skip unchanged layers (especially
  the slow ROCm PyTorch download)
- Dockerfile: move COPY perth_stub.py below pip install layers so a
  stub change doesn't bust the cache for everything above it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:53:32 -04:00
4b21d6c252 Fix TtsVoice missing required version argument
Some checks failed
Build ROCm Image / build (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:52:14 -04:00
de6156a336 Replace resemble-perth with a no-op stub
All checks were successful
Build ROCm Image / build (push) Successful in 17m32s
resemble-perth uses uv-build which is incompatible with the old system
pip in the ROCm base image. Since watermarking is unnecessary for
self-hosted private use, stub out the perth module so chatterbox's
import is satisfied without any build complexity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:16:08 -04:00
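
The stub only has to satisfy chatterbox's import and watermark call; a guess at its shape (the class and method names chatterbox expects are assumptions):

    # perth_stub.py: installed into site-packages as 'perth' so that
    # "import perth" succeeds without the real resemble-perth package.

    class PerthImplicitWatermarker:
        """No-op stand-in: returns the audio unchanged instead of watermarking it."""

        def apply_watermark(self, wav, sample_rate=None, **kwargs):
            return wav
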
dc7a3cf769 Upgrade to ROCm 7.2 and PyTorch 2.11.0
Some checks failed
Build ROCm Image / build (push) Failing after 7m25s
- Update torch/torchaudio to 2.11.0 with ROCm 7.2 wheel index
- Drop torchvision (unused for TTS) and pytorch_triton_rocm (bundled in 2.11)
- Update HSA_OVERRIDE_GFX_VERSION docs; RX 7000+ natively supported in ROCm 7.2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:06:39 -04:00
d7247d31fe Install resemble-perth with --no-build-isolation
Some checks failed
Build ROCm Image / build (push) Failing after 5m28s
pip's isolated build environments don't have the uv binary available,
causing uv-build to fail. Installing with --no-build-isolation lets pip
use the already-installed uv from the system environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:58:21 -04:00
88c2084d19 Symlink uv binary to /usr/local/bin for pip build envs
Some checks failed
Build ROCm Image / build (push) Failing after 1m30s
pip's isolated build environments inherit system PATH but don't get
the uv binary automatically. Symlinking via uv.find_uv_bin() makes it
available so resemble-perth's uv-build backend can execute.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:53:14 -04:00
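
The symlink step presumably amounted to something like this, run during the image build (the target path is illustrative):

    import os
    import uv   # the 'uv' PyPI package; exposes find_uv_bin()

    # Put the uv binary at a fixed, well-known path so pip's isolated build
    # environments (which only inherit PATH) can execute it.
    os.symlink(uv.find_uv_bin(), "/usr/local/bin/uv")
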
5d1689e7f4 Install uv before pip deps to support uv-build backend
Some checks failed
Build ROCm Image / build (push) Failing after 4m25s
resemble-perth uses uv as its build backend; without uv installed
the metadata-generation step fails.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:43:26 -04:00
84e87dceb2 Fix chatterbox deps to match official pyproject.toml
Some checks failed
Build ROCm Image / build (push) Failing after 4m21s
- Update transformers to 5.2.0 (required by official chatterbox)
- Add omegaconf (pulled by s3gen/flow.py)
- Install resemble-perth from git source
- Pin safetensors to 0.5.3
- Remove onnx (not a chatterbox dep)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:34:42 -04:00
7b34b202da Add missing chatterbox deps to requirements
All checks were successful
Build ROCm Image / build (push) Successful in 13m41s
resemble-perth, conformer, s3tokenizer, onnx, spacy-pkuseg, pykakasi,
and pyloudnorm are all chatterbox deps that were skipped by --no-deps
and need to be installed explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:15:08 -04:00
16ea2853f5 Initial implementation: Chatterbox TTS with ROCm and Wyoming
All checks were successful
Build ROCm Image / build (push) Successful in 15m27s
Wyoming-only server built around the official chatterbox TTS model.
Includes ROCm/AMD GPU support, sentence-level streaming, config.yaml
management, and Gitea CI for container builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:51:09 -04:00
4b15e44181 Initial commit 2026-04-05 09:38:32 -04:00