- Merge: voice conditionals cache and warmup pre-computation from main
- Add MIOPEN_LOG_LEVEL=2 to suppress GemmFwdRest workspace=0 warnings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

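As a sketch, the variable this commit adds looks like the following (exact placement, Dockerfile `ENV` vs. the compose `environment` block, is an assumption, not stated in the message):

```shell
# Quiet the non-actionable GemmFwdRest workspace=0 warnings while
# keeping error-level output, as described in this commit.
export MIOPEN_LOG_LEVEL=2
```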
ROCm 7.2 + PyTorch 2.11.0 has a bug where PyTorch passes workspace=0 to
MIOpen convolutions, forcing fallback to the slow GemmFwdRest solver.
This caused s3gen.inference to take 15-22s instead of <5s, making
synthesis 3-4x slower than real-time audio playback.
ROCm 6.1 allocates workspace correctly so MIOpen picks fast GEMM solvers
without needing torch.compile workarounds.
Changes:
- Base image: rocm/dev-ubuntu-22.04:7.2 → 6.1
- torch 2.11.0 → 2.5.1 (rocm6.1 wheel index)
- Add pytorch_triton_rocm==3.1.0
- transformers 5.2.0 → 4.46.3, safetensors 0.5.3 → 0.4.0
- s3tokenizer unpinned → 0.3.0
- Install resemble-perth==1.0.1 directly (v1.0.1 is pip-installable; drop stub)
- Drop Dockerfile perth_stub steps
- Drop torch.compile and timing patches from engine.py (not needed)
- Drop multi-pass warmup from main.py (torch JIT warmup not needed)
- Drop ROCm 7.2-specific env vars from docker-compose.yml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

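The pins above correspond roughly to an install like the sketch below (the torchaudio version and the rocm6.1 wheel index URL are inferred from the bullet list, not quoted from the Dockerfile):

```shell
pip install torch==2.5.1 torchaudio==2.5.1 pytorch_triton_rocm==3.1.0 \
    --index-url https://download.pytorch.org/whl/rocm6.1
pip install transformers==4.46.3 safetensors==0.4.0 s3tokenizer==0.3.0 \
    resemble-perth==1.0.1
```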
The GemmFwdRest workspace=0 warnings are expected (PyTorch ROCm passes
null workspace; MIOpen falls back to a working solver). They are not
actionable and clutter the logs. Level 2 keeps error-level output.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two changes:
- ulimits nofile=65536: MIOpen exhaustive search compiles many MLIR
kernels in parallel, each opening temp files in /tmp. Default container
limit (1024) is too low and ld.lld fails with 'too many open files'.
- MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0: disables the MLIR-based ImplicitGEMM
solvers that generate the failing kernels, leaving Direct/Winograd/GEMM.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

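A minimal sketch of the solver toggle (the docker run flag in the comment mirrors the compose `ulimits` setting; placement is assumed):

```shell
# Disable the MLIR-based ImplicitGEMM solvers whose parallel kernel
# compiles were tripping the open-file limit; Direct/Winograd/GEMM remain.
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0

# The companion fix raises the container's descriptor limit, e.g.:
#   docker run --ulimit nofile=65536:65536 ...
```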
cudnn.benchmark triggers MIOpen's exhaustive kernel search, which then
crashes when writing results to SQLite. Disabling the cache skips the write.
PyTorch's in-memory benchmark cache still applies so warmup results are
reused for all requests within a container run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The named volume overlay was causing SQLite 'unable to open database file'
crashes. MIOpen's default cache location (~/.config/miopen) works reliably
inside the container. The startup warmup repopulates it each run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gfx1031 is not natively supported in ROCm 7.2. Without the override,
the GPU falls back to software emulation, causing 40+ second synthesis.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

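The override value itself is not quoted in the message; for gfx1031 (RX 6700 XT family) the commonly used setting is the gfx1030 ISA spoof below. The exact value is an assumption:

```shell
# Present the gfx1031 GPU as gfx1030, which has native ROCm support.
# 10.3.0 is the community-standard override for this chip (assumed here).
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```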
PyTorch passes a ptr=0, size=0 workspace to MIOpen convolutions, causing
GemmFwdRest to warn and fall back to a slow path on every operation.
MIOPEN_DEBUG_CONV_GEMM=0 skips GEMM entirely and uses Direct/Winograd
solvers which have no workspace requirement.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

PyTorch 2.11.0 with ROCm 7.2 wheels against rocm/dev-ubuntu-22.04:latest
causes MIOpen version mismatches that force every convolution onto a slow
zero-workspace fallback path (41s synthesis). The existing working project
uses torch 2.5.1 + ROCm 6.1 successfully on the same base image.
Also remove the MIOPEN_FIND_ENFORCE override; it is unnecessary with matched versions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Enforce=3 (SEARCH_DB_UPDATE) runs exhaustive kernel benchmarking on
every single GPU operation, making inference unusably slow. Enforce=1
searches once, writes to cache, then reuses cached results on all
subsequent calls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

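Per the behavior described in this commit, the setting becomes:

```shell
# 1 = search once, write to cache, reuse on subsequent calls (behavior
# as described above); replaces the previous value of 3, which
# re-benchmarked every operation.
export MIOPEN_FIND_ENFORCE=1
```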
MIOPEN_FIND_ENFORCE=3 tells MIOpen to only select solvers that fit in
available workspace, eliminating the GemmFwdRest fallback warnings and
the associated performance hit. Persisting the MIOpen cache via a named
volume avoids kernel recompilation on every container start.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Update torch/torchaudio to 2.11.0 with ROCm 7.2 wheel index
- Drop torchvision (unused for TTS) and pytorch_triton_rocm (bundled in 2.11)
- Update HSA_OVERRIDE_GFX_VERSION docs; RX 7000+ natively supported in ROCm 7.2
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Wyoming-only server built around the official chatterbox TTS model.
Includes ROCm/AMD GPU support, sentence-level streaming, config.yaml
management, and Gitea CI for container builds.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>