PyTorch passes ptr=0 size=0 workspace to MIOpen convolutions, causing
GemmFwdRest to warn and fall back to a slow path on every operation.
MIOPEN_DEBUG_CONV_GEMM=0 skips the GEMM solvers entirely and uses the
Direct/Winograd solvers, which have no workspace requirement.
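A minimal sketch of the fix, assuming the variable is set in the image's Dockerfile (any mechanism that puts it in the container environment works):

```dockerfile
# Skip MIOpen's GEMM convolution solvers so Direct/Winograd are chosen,
# which need no workspace and avoid the GemmFwdRest fallback warning.
ENV MIOPEN_DEBUG_CONV_GEMM=0
```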
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Using :latest was pulling a ROCm 6.x image whose MIOpen was incompatible
with our ROCm 7.2 PyTorch wheels. Pinning to the 7.2 tag gives matching
MIOpen libraries and should resolve the workspace/fallback performance issue.
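A sketch of the pin, assuming the registry tags follow the `rocm/dev-ubuntu-22.04:<rocm-version>` scheme:

```dockerfile
# Pin the base image to the ROCm release matching the PyTorch wheels;
# :latest can silently move to an incompatible MIOpen.
FROM rocm/dev-ubuntu-22.04:7.2
```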
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PyTorch 2.11.0 with ROCm 7.2 wheels against rocm/dev-ubuntu-22.04:latest
causes MIOpen version mismatches that force every convolution onto a slow
zero-workspace fallback path (41s synthesis). The existing working project
uses torch 2.5.1 + ROCm 6.1 successfully on the same base image.
Also remove MIOPEN_FIND_ENFORCE override - unnecessary with matched versions.
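The known-good combination above can be installed roughly like this (the rocm6.1 index is the standard PyTorch wheel index for these versions; the exact torchaudio pin is assumed to match torch):

```dockerfile
# Revert to the torch/ROCm pairing that is known to work on this base image.
RUN pip install torch==2.5.1 torchaudio==2.5.1 \
      --index-url https://download.pytorch.org/whl/rocm6.1
```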
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Enforce=3 (SEARCH) runs exhaustive kernel benchmarking on
every single GPU operation, making inference prohibitively slow. Enforce=1
searches once, writes to cache, then reuses cached results on all
subsequent calls.
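A sketch of the change, assuming the variable lives in the Dockerfile alongside the other MIOpen settings:

```dockerfile
# 1 = search once per config, cache the result in the find-db, reuse thereafter
# (3 forced a fresh exhaustive search instead of trusting the cache).
ENV MIOPEN_FIND_ENFORCE=1
```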
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MIOPEN_FIND_ENFORCE=3 tells MIOpen to only select solvers that fit in
available workspace, eliminating the GemmFwdRest fallback warnings and
the associated performance hit. Persisting the MIOpen cache via a named
volume avoids kernel recompilation on every container start.
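A hypothetical compose fragment for the cache volumes; the in-container paths assume MIOpen's default locations under the root user's home (`~/.config/miopen` for the find-db, `~/.cache/miopen` for compiled kernels):

```yaml
services:
  tts:
    volumes:
      - miopen-cache:/root/.cache/miopen
      - miopen-config:/root/.config/miopen
volumes:
  miopen-cache:
  miopen-config:
```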
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mode=max was hitting a 400 Bad Request when pushing the large ROCm
PyTorch layer (multiple GB) as a separate cache blob. Inline cache embeds
metadata in the already-pushed image instead, so no separate upload.
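The switch can be sketched as a buildx invocation (the image ref is a placeholder); `type=inline` writes the cache metadata into the image config instead of pushing a separate blob:

```shell
docker buildx build \
  --cache-from type=registry,ref=registry.example.com/chatterbox-tts:latest \
  --cache-to type=inline \
  --tag registry.example.com/chatterbox-tts:latest \
  --push .
```

Inline cache only records the final stage's layers, so it caches less than mode=max but avoids the oversized separate upload.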
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CI: cache-from/cache-to with mode=max stores all intermediate layers
in the registry so subsequent builds skip unchanged layers (especially
the slow ROCm PyTorch download)
- Dockerfile: move COPY perth_stub.py below pip install layers so a
stub change doesn't bust the cache for everything above it
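The Dockerfile ordering change can be sketched like this (paths are assumptions):

```dockerfile
# Heavy, rarely-changing layers first so the registry cache covers them...
RUN pip install --no-cache-dir -r requirements.txt

# ...and the frequently-edited stub last, so touching it only rebuilds from here.
COPY perth_stub.py /app/perth_stub.py
```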
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
resemble-perth uses uv-build, which is incompatible with the old system
pip in the ROCm base image. Since watermarking is unnecessary for
self-hosted private use, stub out the perth module so chatterbox's
import is satisfied without any build complexity.
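A minimal sketch of such a stub, assuming chatterbox only instantiates `perth.PerthImplicitWatermarker` and calls its `apply_watermark` method (the interface is inferred, not taken from the perth source):

```python
# perth_stub.py -- no-op stand-in installed as the `perth` module so that
# chatterbox's import succeeds without building resemble-perth.

class PerthImplicitWatermarker:
    """Drop-in replacement that skips watermarking entirely."""

    def apply_watermark(self, wav, sample_rate=44100, **kwargs):
        # Watermarking is unnecessary for self-hosted private use:
        # return the audio unchanged.
        return wav
```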
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update torch/torchaudio to 2.11.0 with ROCm 7.2 wheel index
- Drop torchvision (unused for TTS) and pytorch_triton_rocm (bundled in 2.11)
- Update HSA_OVERRIDE_GFX_VERSION docs; RX 7000+ natively supported in ROCm 7.2
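For cards that still need the override (anything without native ROCm support), the variable would be set as before; the value below is only an example:

```dockerfile
# Not needed on RX 7000+ with ROCm 7.2; uncomment for unsupported GPUs.
# ENV HSA_OVERRIDE_GFX_VERSION=11.0.0
```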
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pip's isolated build environments don't have the uv binary available,
causing uv-build to fail. Installing with --no-build-isolation lets pip
use the already-installed uv from the system environment.
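A sketch of the install step under that flag; uv must be present in the system environment first, since isolation is disabled:

```dockerfile
RUN pip install uv && \
    pip install --no-build-isolation resemble-perth
```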
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pip's isolated build environments inherit system PATH but don't get
the uv binary automatically. Symlinking via uv.find_uv_bin() makes it
available so resemble-perth's uv-build backend can execute.
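A sketch of the symlink, assuming `/usr/local/bin` is on the PATH that the isolated build env inherits (`find_uv_bin` is the uv package's helper for locating its bundled binary):

```dockerfile
RUN ln -sf "$(python -c 'from uv import find_uv_bin; print(find_uv_bin())')" \
      /usr/local/bin/uv
```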
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
resemble-perth uses uv as its build backend; without uv installed
the metadata-generation step fails.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update transformers to 5.2.0 (required by official chatterbox)
- Add omegaconf (pulled by s3gen/flow.py)
- Install resemble-perth from git source
- Pin safetensors to 0.5.3
- Remove onnx (not a chatterbox dep)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
resemble-perth, conformer, s3tokenizer, onnx, spacy-pkuseg, pykakasi,
and pyloudnorm are all chatterbox deps that were skipped by --no-deps
and need to be installed explicitly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wyoming-only server built around the official chatterbox TTS model.
Includes ROCm/AMD GPU support, sentence-level streaming, config.yaml
management, and Gitea CI for container builds.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>