fix(timing): warmup pass before timing loop to amortise torch.compile JIT #69
Closed
ivanbasov wants to merge 6 commits into
Conversation
…fault torch.compile=on combined with DataLoader spawn workers during LER validation causes a segfault (20 leaked semaphores, core dumped). Set PREDECODER_TORCH_COMPILE=0 for the Train all orientations step. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vent segfault" This reverts commit 7f0f6c8.
… JIT Without this, the first batch in the timing loop bears the full torch.compile lazy-compilation cost (~887 ms vs ~1 ms steady-state), skewing Phase Timing numbers — especially at low sample counts like PREDECODER_INFERENCE_NUM_SAMPLES=1. The warmup only runs when torch.compile is active and TRT is not in use. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts the warmup block into a named helper so it can be tested in isolation. Five tests cover: fires when compile is active (CPU), skipped when compile is off, skipped when TRT context is present, CUDA sync called on GPU device, CUDA sync not called on CPU device. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary

- Adds a one-time warmup forward pass on `pipeline_module` before the timing loop in `compute_logical_error_rate`
- Triggers `torch.compile` lazy compilation up front so the JIT cost does not inflate the first-batch timing measurement
- Runs only when `trt_context is None` and `_applied_compile` is set (the torch-only path with compile enabled)

Motivation

Without this, the first batch in the timing loop bears the full torch.compile lazy-compilation cost, skewing Phase Timing numbers, especially at low sample counts (`PREDECODER_INFERENCE_NUM_SAMPLES=1`). With large sample counts the JIT cost gets amortised naturally, but at small counts it dominates and makes the Phase Timing numbers misleading. Proposed by Igor Almeida Baratta; approved by Ben Howe.
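The extracted helper might look roughly like the sketch below. All names (`maybe_warmup`, the `applied_compile` / `trt_context` parameters, the injectable `synchronize` callable) are hypothetical illustrations of the behaviour the commits describe, not the PR's actual code:

```python
def maybe_warmup(pipeline_module, sample_batch, *, applied_compile,
                 trt_context, synchronize=None):
    """Run a single untimed forward pass when torch.compile is active.

    Skipped when compile is off, or when a TensorRT context is in use
    (the TRT path does not go through torch.compile's lazy JIT).
    `synchronize` is an optional callable (e.g. torch.cuda.synchronize)
    so queued GPU work finishes before the timing loop starts; passing
    it as a parameter keeps the helper testable on CPU-only machines.
    Returns True iff a warmup pass was actually run.
    """
    if trt_context is not None or not applied_compile:
        return False  # nothing to warm up on this path
    pipeline_module(sample_batch)  # triggers lazy compilation, untimed
    if synchronize is not None:
        synchronize()  # drain device work before measurement begins
    return True
```

Injecting the sync callable is what makes the five unit tests in the commit list cheap to write: the "CUDA sync called / not called" cases become assertions on a stub rather than requiring a GPU.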
Test plan

- Existing tests pass (`test_inference_latency_timing.py`, `test_tensorrt_fallback.py`)
- Run with `PREDECODER_INFERENCE_NUM_SAMPLES=1` and confirm the first-batch model-forward time matches steady-state

🤖 Generated with Claude Code