Commit Graph

11 Commits

Author SHA1 Message Date
8c5d3c4f06 Suppress MIOpen workspace warning noise via MIOPEN_LOG_LEVEL=2
Some checks failed
Build ROCm Image / build (push) Has been cancelled
The GemmFwdRest workspace=0 warnings are expected (PyTorch ROCm passes
null workspace; MIOpen falls back to a working solver). They are not
actionable and clutter the logs. Level 2 keeps error-level output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:24:18 -04:00
cd33b1c161 Fix MIOpen MLIR kernel compilation crash during benchmark search
All checks were successful
Build ROCm Image / build (push) Successful in 18s
Two changes:
- ulimits nofile=65536: MIOpen exhaustive search compiles many MLIR
  kernels in parallel, each opening temp files in /tmp. Default container
  limit (1024) is too low and ld.lld fails with 'too many open files'.
- MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0: disables the MLIR-based ImplicitGEMM
  solvers that generate the failing kernels, leaving Direct/Winograd/GEMM.
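
The descriptor-limit half of this fix can be checked from inside the container. The following is a minimal Python sketch, assuming a POSIX system. The helper name `raise_nofile_limit` is hypothetical and not part of this repository; the commit itself raises the limit through the container's ulimits setting, not in code.

```python
import resource


def raise_nofile_limit(target=65536):
    """Raise the soft open-file limit toward `target`, capped by the hard limit.

    Hypothetical helper for illustration; returns the (soft, hard) pair in
    effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to raise.
        return soft, hard
    # An unprivileged process cannot raise the soft limit above the hard limit.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            # Some platforms reject the value; keep the existing limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

If the returned soft limit is still near 1024, the container-level ulimit has probably not been applied.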

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:21:32 -04:00
e69b072b70 Add MIOPEN_DISABLE_CACHE=1 to prevent SQLite crash on benchmark
All checks were successful
Build ROCm Image / build (push) Successful in 19s
cudnn.benchmark triggers MIOpen's exhaustive kernel search, which then
crashes while writing results to SQLite. Disabling the cache skips the write.
PyTorch's in-memory benchmark cache still applies so warmup results are
reused for all requests within a container run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:14:44 -04:00
7436c49d44 Remove custom MIOpen cache path — let MIOpen use its defaults
All checks were successful
Build ROCm Image / build (push) Successful in 3m25s
The named volume overlay was causing SQLite 'unable to open database file'
crashes. MIOpen's default cache location (~/.config/miopen) works reliably
inside the container. The startup warmup repopulates it each run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:04:05 -04:00
b990cacd31 Enable HSA_OVERRIDE_GFX_VERSION=10.3.0 for RX 6700 XT
All checks were successful
Build ROCm Image / build (push) Successful in 18s
gfx1031 is not natively supported in ROCm 7.2. Without the override,
the GPU falls back to software emulation, causing 40+ second synthesis.
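
One way to try the override without editing the container configuration is to set it for a single child process. The following is a rough sketch, assuming the variable only has to be present in the environment of the process that loads the ROCm runtime. The function name `run_with_gfx_override` is hypothetical and not part of this repository.

```python
import os
import subprocess
import sys


def run_with_gfx_override(args, version="10.3.0"):
    """Run `args` as a child process with HSA_OVERRIDE_GFX_VERSION set.

    Hypothetical helper for illustration; the parent environment is copied,
    not modified.
    """
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return subprocess.run(args, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Probe command: print the variable as seen by the child process.
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ['HSA_OVERRIDE_GFX_VERSION'])",
    ]
    print(run_with_gfx_override(probe).stdout.strip())
```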

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:07:46 -04:00
2a80555c60 Disable MIOpen GEMM solver to fix null workspace fallback
All checks were successful
Build ROCm Image / build (push) Successful in 35s
PyTorch passes ptr=0 size=0 workspace to MIOpen convolutions, causing
GemmFwdRest to warn and fall back to a slow path on every operation.
MIOPEN_DEBUG_CONV_GEMM=0 skips GEMM entirely and uses Direct/Winograd
solvers which have no workspace requirement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:00:21 -04:00
b68bccb20f Revert to torch 2.5.1 + ROCm 6.1 (known working combination)
Some checks failed
Build ROCm Image / build (push) Has been cancelled
PyTorch 2.11.0 with ROCm 7.2 wheels against rocm/dev-ubuntu-22.04:latest
causes MIOpen version mismatches that force every convolution onto a slow
zero-workspace fallback path (41s synthesis). The existing working project
uses torch 2.5.1 + ROCm 6.1 successfully on the same base image.

Also remove MIOPEN_FIND_ENFORCE override - unnecessary with matched versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:34:06 -04:00
7a966c8532 Fix MIOPEN_FIND_ENFORCE: 3 -> 1 (NONE)
Some checks failed
Build ROCm Image / build (push) Has been cancelled
Enforce=3 (SEARCH) runs exhaustive kernel benchmarking on
every single GPU operation, making inference impossibly slow. Enforce=1
searches once, writes to cache, then reuses cached results on all
subsequent calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:29:54 -04:00
f45fa0496e Fix MIOpen workspace warnings and add kernel cache persistence
Some checks failed
Build ROCm Image / build (push) Has been cancelled
MIOPEN_FIND_ENFORCE=3 tells MIOpen to only select solvers that fit in
available workspace, eliminating the GemmFwdRest fallback warnings and
the associated performance hit. Persisting the MIOpen cache via a named
volume avoids kernel recompilation on every container start.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:20:18 -04:00
dc7a3cf769 Upgrade to ROCm 7.2 and PyTorch 2.11.0
Some checks failed
Build ROCm Image / build (push) Failing after 7m25s
- Update torch/torchaudio to 2.11.0 with ROCm 7.2 wheel index
- Drop torchvision (unused for TTS) and pytorch_triton_rocm (bundled in 2.11)
- Update HSA_OVERRIDE_GFX_VERSION docs; RX 7000+ natively supported in ROCm 7.2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:06:39 -04:00
16ea2853f5 Initial implementation: Chatterbox TTS with ROCm and Wyoming
All checks were successful
Build ROCm Image / build (push) Successful in 15m27s
Wyoming-only server built around the official chatterbox TTS model.
Includes ROCm/AMD GPU support, sentence-level streaming, config.yaml
management, and Gitea CI for container builds.
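
Sentence-level streaming, as mentioned above, can be sketched roughly as follows. This is an illustration only, not the repository's implementation: `split_sentences`, `stream_synthesis`, and the `synthesize` callback are hypothetical names, and the regex-based splitting is an assumption about how sentence boundaries might be found.

```python
import re
from typing import Callable, Iterator, List

# Split after ., ! or ? when followed by whitespace (a simplifying assumption).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Return the non-empty sentences found in `text`."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def stream_synthesis(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield one audio chunk per sentence so playback can start early.

    `synthesize` stands in for whatever call turns a sentence into audio.
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```

Because the generator yields after each sentence, a caller can forward the first chunk to the client while later sentences are still being synthesized.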

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:51:09 -04:00