Fix MIOpen MLIR kernel compilation crash during benchmark search
All checks were successful
Build ROCm Image / build (push) Successful in 18s
Two changes:
- ulimits nofile=65536: MIOpen's exhaustive search compiles many MLIR kernels in parallel, and each compilation opens temp files in /tmp. The default container limit (1024) is too low, so ld.lld fails with 'too many open files'.
- MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0: disables the MLIR-based ImplicitGEMM solvers that generate the failing kernels, leaving the Direct, Winograd, and GEMM solvers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
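The file-descriptor failure described in the first bullet can be reproduced in miniature. The Python sketch below is an illustration, not part of this commit. The function name `count_fds_before_emfile` and the `kernel*.o` file names are hypothetical stand-ins for the per-kernel temp files. The sketch does three things:

- temporarily sets the process's soft `RLIMIT_NOFILE`;
- creates files in a temporary directory until `open()` fails with `EMFILE` ("Too many open files");
- closes everything and restores the original limit.

It assumes a Unix-like system, since Python's `resource` module is not available on Windows.

```python
import errno
import os
import resource
import tempfile


def count_fds_before_emfile(soft_limit: int) -> int:
    """Create temp files until open() fails with EMFILE ("Too many open files").

    Temporarily sets this process's soft RLIMIT_NOFILE to `soft_limit`
    (capped at the hard limit). Returns how many files were opened; all of
    them are closed and the original limits restored before returning.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)  # the soft limit may not exceed the hard limit
    fds = []
    with tempfile.TemporaryDirectory() as tmpdir:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
        try:
            while True:
                # Hypothetical names standing in for per-kernel temp objects.
                path = os.path.join(tmpdir, f"kernel{len(fds)}.o")
                try:
                    fds.append(os.open(path, os.O_CREAT | os.O_WRONLY, 0o600))
                except OSError as exc:
                    if exc.errno == errno.EMFILE:
                        return len(fds)
                    raise
        finally:
            # Close all descriptors and restore the original limit
            # before the temporary directory is removed.
            for fd in fds:
                os.close(fd)
            resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    print(count_fds_before_emfile(1024))
```

With a soft limit of 1024, the count comes out slightly below 1024. Descriptors the process already holds (stdin, stdout, stderr, and so on) count against the same limit. Many parallel compiler and linker processes each need their own share of descriptors, which is why the compose `ulimits.nofile` setting raises this ceiling for processes started in the container.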
@@ -17,6 +17,10 @@ services:
     shm_size: 8g
     security_opt:
       - seccomp=unconfined
+    ulimits:
+      nofile:
+        soft: 65536
+        hard: 65536
     volumes:
       - ./config.yaml:/app/config.yaml
       - ./voices:/app/voices
@@ -26,11 +30,12 @@ services:
       - HF_HUB_ENABLE_HF_TRANSFER=1
       # Required for RX 6700 XT (gfx1031) - not natively supported in ROCm 7.2.
       - HSA_OVERRIDE_GFX_VERSION=10.3.0
-      # Disable MIOpen's SQLite cache. Without this, cudnn.benchmark triggers an
-      # exhaustive kernel search and then crashes trying to write results to SQLite.
-      # PyTorch's own in-memory benchmark cache still works so warmup results are
-      # reused for all subsequent requests within the same container run.
+      # Disable MIOpen's SQLite cache — avoids crashes writing benchmark results.
+      # PyTorch's in-memory benchmark cache still applies within a container run.
       - MIOPEN_DISABLE_CACHE=1
+      # Disable MLIR-based ImplicitGEMM solvers. These compile MLIR kernels on the
+      # fly and hit 'too many open files' during the exhaustive benchmark search.
+      - MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=0
       # - HF_TOKEN=your_token_here
 
 volumes:
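A small startup check can confirm that a running container actually picked up both changes. The sketch below is hypothetical: the script and the names `EXPECTED_ENV`, `MIN_NOFILE` and `setup_problems` are not part of this repository. It does two things:

- compares the environment variables set in the compose file above with their expected values;
- reads the soft `RLIMIT_NOFILE` through Python's `resource` module.

It only inspects the process environment and limits; it does not query MIOpen to see which solvers are actually enabled.

```python
import os
import resource

# Values set in the compose file by this commit (and its context lines).
EXPECTED_ENV = {
    "HSA_OVERRIDE_GFX_VERSION": "10.3.0",
    "MIOPEN_DISABLE_CACHE": "1",
    "MIOPEN_DEBUG_CONV_IMPLICIT_GEMM": "0",
}
MIN_NOFILE = 65536  # matches ulimits.nofile.soft in the compose file


def setup_problems(env, nofile_soft):
    """Return human-readable problems with the container setup (empty if OK)."""
    problems = []
    for key, want in EXPECTED_ENV.items():
        got = env.get(key)
        if got != want:
            problems.append(f"{key}={got!r}, expected {want!r}")
    # RLIM_INFINITY means "no limit", which is always sufficient.
    if nofile_soft != resource.RLIM_INFINITY and nofile_soft < MIN_NOFILE:
        problems.append(
            f"RLIMIT_NOFILE soft limit is {nofile_soft}, expected >= {MIN_NOFILE}"
        )
    return problems


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    for problem in setup_problems(os.environ, soft):
        print("WARNING:", problem)
```

You could run it inside the container, for example with `docker compose exec <service> python check_setup.py`, where the service and script names are placeholders. It should print nothing when both the `ulimits` block and the environment variables took effect.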