Fix warmup text length and ve attribute for torch.compile

- Warmup now uses a ~170-char representative sentence so torch.compile
  JIT-compiles for typical token sequence lengths. Previously "Warmup."
  compiled for very short shapes, causing a full re-compile (17s) on the
  first real HA request and pushing total synthesis past 30s.
- Compile model.ve (voice encoder) in addition to s3gen — both are
  convolutional and hit the MIOpen workspace=0 bug.
- Fix _patch_timing: attribute is model.ve not model.voice_encoder,
  so the timing wrap was silently skipping the speaker embedding.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:51:08 -04:00
parent 5766870304
commit 169e003a34
2 changed files with 19 additions and 10 deletions


@@ -23,7 +23,13 @@ def _warmup(voices: dict) -> None:
     audio_prompt = resolve_voice(None, voices) if voices else None
     logger.info("Running warmup synthesis to populate MIOpen kernel cache...")
     try:
-        engine.synthesize(text="Warmup.", audio_prompt_path=audio_prompt)
+        engine.synthesize(
+            text=(
+                "This is a warmup synthesis request used to pre-compile neural network kernels "
+                "for typical text lengths, so that the first real request runs at full speed."
+            ),
+            audio_prompt_path=audio_prompt,
+        )
         logger.info("Warmup complete — MIOpen cache populated")
     except Exception:
         logger.warning("Warmup synthesis failed (non-fatal)", exc_info=True)
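Why the longer warmup sentence helps can be illustrated with a toy model of shape-specialized compilation: kernels compiled for one sequence-length bucket are not reused for another, so a warmup whose token count lands in a different bucket than real traffic buys nothing. The bucket size and rounding here are arbitrary illustration, not actual torch.compile or MIOpen behavior:

```python
def padding_bucket(n_tokens, bucket=32):
    """Round a token count up to the next compile-shape bucket boundary.

    Toy model only: real torch.compile specializes on concrete shapes
    (recompiling on a new one), but the bucketing intuition is the same.
    """
    return ((n_tokens + bucket - 1) // bucket) * bucket


# "Warmup." is a handful of tokens; a typical announcement is dozens.
# Under shape-specialized compilation they land in different buckets,
# so the short warmup leaves the real request facing a cold compile.
assert padding_bucket(3) != padding_bucket(45)
```

A warmup text sized like real requests (~170 chars here) puts the warmup and the first Home Assistant request in the same bucket, which is what eliminates the 17s recompile described above.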