Skip to content

PR #5961 Review Fixes — Per-Fusion Statement Tracking#6015

Merged
mdavis36 merged 4 commits intomd/phase2-per-fusionfrom
md/phase2-per-fusion-review-fixes
Feb 26, 2026
Merged

PR #5961 Review Fixes — Per-Fusion Statement Tracking#6015
mdavis36 merged 4 commits intomd/phase2-per-fusionfrom
md/phase2-per-fusion-review-fixes

Conversation

@mdavis36
Copy link
Collaborator

Summary

Review fixes for PR #5961 (Per-Fusion statement tracking):

  • O(n²) → O(n): Optimize removeStatementsOwnedBy with std::erase_if
  • Per-Fusion counts: Convert numExprs()/numVals() to return per-Fusion counts instead of global
  • StatementGuard fixes: Snapshot and compare per-Fusion counts for correct LIFO rollback in shared containers
  • LIFO assertions: Verify tail elements belong to this Fusion before popping

Tests

All tests pass:

  • ✅ StatementGuardTest.ExecuteAfterGuard
  • ✅ StatementGuardTest.LazySpecialValsNotDangling
  • ✅ FusionCopy_CUDA
  • ✅ FusionMove_CUDA

Replace manual iterate-and-erase loops in removeStatementsOwnedBy with
std::erase_if, which does single-pass partition-and-truncate instead of
repeated O(n) deque shifts. This fixes O(n²) performance when a Fusion
with many statements is destroyed.
…ortcuts

Change Fusion::numExprs() and Fusion::numVals() to return counts from
per_fusion_exprs_/per_fusion_vals_ (filtered to this Fusion) instead of
global container counts. This is necessary for correct StatementGuard
behavior with shared containers.

Add numValsExcludingShortcuts() helper that subtracts active shortcut val
pointers (zero_val_, one_val_, etc.) from the per-Fusion count. These
singletons should persist across guard scopes, so the count must exclude
them for LIFO pop-back consistency in removeStatementsCreatedAfter.
…tainers

Update Fusion::removeStatementsCreatedAfter to compare per-Fusion counts
(from exprsOwnedBy(this) and numValsExcludingShortcuts()) instead of global
deque sizes. This correctly handles shared containers where other Fusions'
statements would inflate the global counts.

Add NVF_ERROR assertions to verify the LIFO invariant: the tail element
of the global deque must belong to this Fusion. If violated, another
Fusion appended concurrently (should be prevented by PR #5971 locking).

Remove now-unnecessary deque size validation checks.
Change StatementGuard constructor to snapshot numExprs() (already per-Fusion)
and numValsExcludingShortcuts() instead of numVals() (now also per-Fusion).
This ensures the snapshot is consistent with the LIFO rollback logic in
removeStatementsCreatedAfter, which correctly handles shortcut singletons.
@mdavis36
Copy link
Collaborator Author

!test

@github-actions
Copy link

Description

  • Optimize statement removal performance from O(n²) to O(n) using std::erase_if

  • Convert numExprs/numVals to return per-Fusion counts instead of global container sizes

  • Add numValsExcludingShortcuts() helper for correct LIFO rollback in StatementGuard

  • Add LIFO assertions to verify tail elements belong to current Fusion before popping

Changes walkthrough

Relevant files
Enhancement
fusion.cpp
Fix removeStatementsCreatedAfter for per-Fusion counts and LIFO
verification

csrc/fusion.cpp

  • Update removeStatementsCreatedAfter to use per-Fusion counts via
    exprsOwnedBy(this)
  • Replace global deque size checks with LIFO assertions for tail element
    ownership
  • Change val removal to use numValsExcludingShortcuts() for correct
    shortcut handling
  • Add NVF_ERROR assertions to verify LIFO invariant before popping
    expressions and values
  • +11/-17 
    container.cpp
    Optimize removeStatementsOwnedBy performance with std::erase_if

    csrc/ir/container.cpp

  • Replace manual iterate-and-erase loops with std::erase_if for O(n)
    performance
  • Optimize both val and expr removal in removeStatementsOwnedBy function
  • Fix O(n²) performance issue when destroying Fusions with many
    statements
  • +14/-14 
    statement_guard.cpp
    Update StatementGuard to use per-Fusion counts excluding shortcuts

    csrc/statement_guard.cpp

  • Change StatementGuard to snapshot numValsExcludingShortcuts() instead
    of numVals()
  • Ensure snapshot consistency with LIFO rollback logic in
    removeStatementsCreatedAfter
  • Properly handle shortcut singletons that persist across guard scopes
  • +1/-1     
    fusion.h
    Convert count methods to per-Fusion and add shortcut exclusion helper

    csrc/fusion.h

  • Convert numExprs() to return per-Fusion count using exprsOwnedBy(this)
  • Convert numVals() to return per-Fusion count using valsOwnedBy(this)
  • Add numValsExcludingShortcuts() helper that subtracts shortcut val
    pointers
  • Update comments to clarify per-Fusion counting behavior
  • +16/-3   

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    LIFO Invariant Validation

    The new NVF_ERROR checks verify that tail elements belong to the correct Fusion before popping. This is crucial for maintaining the LIFO invariant in per-fusion statement tracking. The error messages are clear and provide context for debugging.

    c->per_fusion_exprs_[this].count(e) > 0,
    "removeStatementsCreatedAfter: tail expr belongs to another Fusion");
    Performance Optimization

    The optimization from manual iterator manipulation to std::erase_if is a good improvement. This changes the complexity from O(n²) to O(n) for the removeStatementsOwnedBy function. The lambda-based approach is cleaner and more maintainable.

    std::erase_if(vals_up_, [&](const std::unique_ptr<Val>& v) {
      if (owned.count(v.get()) > 0) {
        vals_.erase(v.get());
        return true;
      }
      return false;
    });
    Shortcut Value Handling

    The new numValsExcludingShortcuts() method correctly handles the edge case where shortcut values (zero_val_, one_val_, etc.) should persist across StatementGuard scopes. This ensures LIFO ordering works correctly by excluding these singleton values from the count used for rollback.

    int64_t numValsExcludingShortcuts() const noexcept {
      int64_t count = std::ssize(ir_container()->valsOwnedBy(this));
      count -= (zero_val_ != nullptr) + (one_val_ != nullptr) +
          (true_val_ != nullptr) + (false_val_ != nullptr) +
          (magic_zero_val_ != nullptr);
      return count;
    }

    @mdavis36 mdavis36 marked this pull request as ready for review February 26, 2026 15:07
    @mdavis36 mdavis36 merged commit bc595c5 into md/phase2-per-fusion Feb 26, 2026
    49 of 52 checks passed
    @mdavis36 mdavis36 deleted the md/phase2-per-fusion-review-fixes branch February 26, 2026 15:09
    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 26, 2026

    Greptile Summary

    Applies review fixes from PR #5961 for per-Fusion statement tracking in shared containers:

    • Performance: Optimizes removeStatementsOwnedBy from O(n²) to O(n) using std::erase_if
    • Per-Fusion semantics: Converts numExprs()/numVals() to return per-Fusion counts instead of global counts
    • LIFO correctness: Updates removeStatementsCreatedAfter to use per-Fusion counts with assertions verifying tail elements belong to this Fusion before popping
    • Shortcut vals: Adds numValsExcludingShortcuts() to properly handle singleton shortcut vals (zero_val_, one_val_, etc.) that should persist across StatementGuard scopes

    All changes are well-implemented with defensive assertions to catch violations of the LIFO invariant. The optimizations are algorithmically sound and the per-Fusion counting ensures correct behavior when multiple Fusions share an IrContainer.

    Confidence Score: 5/5

    • This PR is safe to merge with no issues found
    • All changes are correct, well-tested (tests pass), include defensive assertions, and implement sound algorithmic optimizations. The per-Fusion counting logic properly handles edge cases like shortcut vals, and LIFO assertions ensure proper StatementGuard nesting
    • No files require special attention

    Important Files Changed

    Filename Overview
    csrc/ir/container.cpp Optimizes removeStatementsOwnedBy from O(n²) to O(n) using std::erase_if, correctly maintains both deque and set consistency
    csrc/fusion.h Converts numExprs()/numVals() to per-Fusion counts; adds numValsExcludingShortcuts() for correct StatementGuard rollback with singleton vals
    csrc/fusion.cpp Updates removeStatementsCreatedAfter to use per-Fusion counts and adds LIFO assertions to verify tail elements belong to this Fusion
    csrc/statement_guard.cpp Updates snapshot to use numValsExcludingShortcuts() for correct handling of singleton shortcut vals across guard scopes

    Last reviewed commit: 0abaf11

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    4 files reviewed, no comments

    Edit Code Review Agent Settings | Greptile

    mdavis36 added a commit that referenced this pull request Mar 3, 2026
    ## Summary
    
    Review fixes for PR #5961 (Per-Fusion statement tracking):
    
    - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with
    `std::erase_if`
    - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return
    per-Fusion counts instead of global
    - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for
    correct LIFO rollback in shared containers
    - **LIFO assertions**: Verify tail elements belong to this Fusion before
    popping
    
    ## Tests
    
    All tests pass:
    - ✅ StatementGuardTest.ExecuteAfterGuard
    - ✅ StatementGuardTest.LazySpecialValsNotDangling
    - ✅ FusionCopy_CUDA
    - ✅ FusionMove_CUDA
    mdavis36 added a commit that referenced this pull request Mar 4, 2026
    ## Summary
    
    Review fixes for PR #5961 (Per-Fusion statement tracking):
    
    - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with
    `std::erase_if`
    - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return
    per-Fusion counts instead of global
    - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for
    correct LIFO rollback in shared containers
    - **LIFO assertions**: Verify tail elements belong to this Fusion before
    popping
    
    ## Tests
    
    All tests pass:
    - ✅ StatementGuardTest.ExecuteAfterGuard
    - ✅ StatementGuardTest.LazySpecialValsNotDangling
    - ✅ FusionCopy_CUDA
    - ✅ FusionMove_CUDA
    mdavis36 added a commit that referenced this pull request Mar 4, 2026
    ## Summary
    
    Review fixes for PR #5961 (Per-Fusion statement tracking):
    
    - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with
    `std::erase_if`
    - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return
    per-Fusion counts instead of global
    - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for
    correct LIFO rollback in shared containers
    - **LIFO assertions**: Verify tail elements belong to this Fusion before
    popping
    
    ## Tests
    
    All tests pass:
    - ✅ StatementGuardTest.ExecuteAfterGuard
    - ✅ StatementGuardTest.LazySpecialValsNotDangling
    - ✅ FusionCopy_CUDA
    - ✅ FusionMove_CUDA
    mdavis36 added a commit that referenced this pull request Mar 5, 2026
    ## Summary
    
    Review fixes for PR #5961 (Per-Fusion statement tracking):
    
    - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with
    `std::erase_if`
    - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return
    per-Fusion counts instead of global
    - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for
    correct LIFO rollback in shared containers
    - **LIFO assertions**: Verify tail elements belong to this Fusion before
    popping
    
    ## Tests
    
    All tests pass:
    - ✅ StatementGuardTest.ExecuteAfterGuard
    - ✅ StatementGuardTest.LazySpecialValsNotDangling
    - ✅ FusionCopy_CUDA
    - ✅ FusionMove_CUDA
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    1 participant