Skip to content

[IR Container] Phase 2.4 Per-fusion statement tracking #5961

Open
mdavis36 wants to merge 2 commits intomd/phase2-shared-ptrfrom
md/phase2-per-fusion
Open

[IR Container] Phase 2.4 Per-fusion statement tracking #5961
mdavis36 wants to merge 2 commits intomd/phase2-shared-ptrfrom
md/phase2-per-fusion

Conversation

@mdavis36
Copy link
Collaborator

@mdavis36 mdavis36 commented Feb 12, 2026

Summary

Add per-Fusion ownership tracking maps to IrContainer so each Fusion can efficiently query only its own Vals and Exprs within a shared container. Update all Fusion-level accessors to filter by ownership.

This is the highest-risk change in the Phase 2 chain. Every accessor call path is touched — vals(), deterministic_vals(), deterministic_exprs(), unordered_exprs(), and more now return ownership-filtered results rather than raw container contents. For single-Fusion containers the results are identical, but the implementation changes underneath every consumer.

Relationship to Phase 2

This is the critical invariant that makes shared containers safe:

Invariant: Fusion accessors filter by ownership

  Fusion A ─┐
             ├──→ shared_ptr<IrContainer> ──→ {val_0(A), val_1(A), val_0'(B), val_1'(B)}
  Fusion B ─┘

  A.vals()  →  {val_0, val_1}       // Only A's vals
  B.vals()  →  {val_0', val_1'}     // Only B's vals
  container->vals()  →  {val_0, val_1, val_0', val_1'}  // ALL vals (raw)

Without per-Fusion filtering, a shared container copy would break every consumer — Fusion::copy would iterate all vals (including other Fusions'), deterministic_vals() would return interleaved results, and StatementGuard rollback would destroy statements belonging to other Fusions.

This invariant is what allows Phase 2 to maintain independent IR graphs despite shared storage:

  • Invariant : fusion->vals() returns {v : v in container AND v->container() == fusion}
  • Invariant : Fusion::clear() only clears THIS Fusion's state (not ir_container()->clear())

CI Risk

Highest of all Phase 2 PRs. Every accessor-dependent code path is touched. For single-Fusion containers, filtered results are identical to unfiltered — but regressions would surface in accessor-heavy code paths (scheduling, lowering, codegen).

@mdavis36 mdavis36 changed the base branch from main to md/phase2-shared-ptr February 12, 2026 22:08
@github-actions
Copy link

github-actions bot commented Feb 12, 2026

Review updated until commit 2e491e9

Description

  • Add per-Fusion ownership tracking maps (per_fusion_vals_, per_fusion_exprs_) to IrContainer for efficient per-Fusion statement queries in shared containers

  • Update Fusion accessors (vals(), deterministic_vals(), deterministic_exprs(), unordered_exprs(), numExprs(), numVals()) to return ownership-filtered results

  • Modify Fusion::clear() to use removeStatementsOwnedBy(this) instead of clearing entire container, and update Fusion::swap() to transfer ownership

  • Fix StatementGuard to use numValsExcludingShortcuts() for correct LIFO rollback in shared containers

Changes walkthrough

Relevant files
Enhancement
fusion.cpp
Update Fusion methods for per-Fusion ownership tracking   

csrc/fusion.cpp

  • Update Fusion::swap() to call transferStatementOwnership() after
    container swap
  • Change Fusion::clear() to use
    ir_container_->removeStatementsOwnedBy(this) instead of
    ir_container()->clear()
  • Add per-Fusion tracking in registerVal(), registerExpr(), removeVal(),
    removeExpr()
  • Update removeStatementsCreatedAfter() to use per-Fusion counts and add
    ownership assertions
  • +25/-27 
    container.cpp
    Add ownership tracking methods to IrContainer                       

    csrc/ir/container.cpp

  • Add valsOwnedBy(), exprsOwnedBy() methods to query statements per
    Fusion
  • Add transferStatementOwnership() for Fusion swap operations
  • Add removeStatementsOwnedBy() with O(n) std::erase_if optimization
  • Add deterministicValsOwnedBy(), deterministicExprsOwnedBy(),
    deterministicValsMapOwnedBy(), deterministicExprsMapOwnedBy() methods
  • Update clear() and swap() to handle per-Fusion tracking maps
  • +131/-0 
    fusion.h
    Update Fusion accessors for ownership-filtered results     

    csrc/fusion.h

  • Change accessor return types from const to non-const (e.g.,
    deterministic_vals())
  • Update accessors to call ownership-filtered container methods
    (deterministicValsOwnedBy(), etc.)
  • Change numExprs() and numVals() to return per-Fusion counts
  • Add numValsExcludingShortcuts() to exclude singleton shortcut vals
    from LIFO count
  • +26/-15 
    container.h
    Add per-Fusion tracking data structures and method declarations

    csrc/ir/container.h

  • Add per_fusion_vals_ and per_fusion_exprs_ maps to track ownership
  • Declare new ownership query methods: valsOwnedBy(), exprsOwnedBy(),
    transferStatementOwnership(), removeStatementsOwnedBy()
  • Declare deterministic ownership methods: deterministicValsOwnedBy(),
    deterministicExprsOwnedBy(), deterministicValsMapOwnedBy(),
    deterministicExprsMapOwnedBy()
  • +18/-0   
    Bug fix
    statement_guard.cpp
    Update StatementGuard to use per-Fusion val count excluding shortcuts

    csrc/statement_guard.cpp

  • Change prev_num_vals_ to use numValsExcludingShortcuts() instead of
    numVals() for correct LIFO rollback
  • +1/-1     

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Iterator Invalidation Risk

    In removeStatementsOwnedBy, the code uses std::erase_if on vals_up_ and exprs_up_ deques while iterating over owned sets. While std::erase_if handles this correctly, the pattern of erasing elements during iteration over a different container (owned) should be validated to ensure no edge cases exist when the same pointer appears in multiple fusion ownership sets.

    void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) {
      auto vals_it = per_fusion_vals_.find(fusion);
      if (vals_it != per_fusion_vals_.end()) {
        const auto& owned = vals_it->second;
        std::erase_if(vals_up_, [&](const std::unique_ptr<Val>& v) {
          if (owned.count(v.get()) > 0) {
            vals_.erase(v.get());
            return true;
          }
          return false;
        });
        per_fusion_vals_.erase(vals_it);
      }
    
      auto exprs_it = per_fusion_exprs_.find(fusion);
      if (exprs_it != per_fusion_exprs_.end()) {
        const auto& owned = exprs_it->second;
        std::erase_if(exprs_up_, [&](const std::unique_ptr<Expr>& e) {
          if (owned.count(e.get()) > 0) {
            exprs_.erase(e.get());
            return true;
          }
          return false;
        });
        per_fusion_exprs_.erase(exprs_it);
      }
    }
    Ownership Transfer Completeness

    In Fusion::swap, after calling transferStatementOwnership, the code still iterates over a.vals() and b.vals() to update ir_container_ pointers. Since vals() now returns ownership-filtered results, verify this iteration correctly captures all transferred statements or if additional handling is needed.

    a.ir_container()->transferStatementOwnership(&b, &a);
    b.ir_container()->transferStatementOwnership(&a, &b);
    
    if (a.ir_container_) {
      for (auto val : a.vals()) {
        val->ir_container_ = &a;
      }
      for (auto expr : a.deterministic_exprs()) {
        expr->ir_container_ = &a;
    Shortcut Val Count Logic

    The numValsExcludingShortcuts() method subtracts shortcut val pointers from the count. However, shortcut vals are registered in per_fusion_vals_ during registerVal(). Need to verify that the subtraction logic correctly accounts for all cases (e.g., when some shortcut vals are nullptr vs not yet created) and that this doesn't cause underflow.

    int64_t numValsExcludingShortcuts() const noexcept {
      int64_t count = std::ssize(ir_container()->valsOwnedBy(this));
      count -= (zero_val_ != nullptr) + (one_val_ != nullptr) +
          (true_val_ != nullptr) + (false_val_ != nullptr) +
          (magic_zero_val_ != nullptr);
      return count;
    }

    @mdavis36 mdavis36 force-pushed the md/phase2-shared-ptr branch from 3a199c8 to 53e5045 Compare February 12, 2026 22:09
    @mdavis36
    Copy link
    Collaborator Author

    !test

    @mdavis36 mdavis36 changed the title [IR Container] Phase 2 Per-fusion statement tracking [IR Container] Phase 2.4 Per-fusion statement tracking Feb 18, 2026
    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from 33629cb to 8b162d9 Compare February 18, 2026 03:13
    @mdavis36 mdavis36 force-pushed the md/phase2-shared-ptr branch from 53e5045 to f8ff364 Compare February 18, 2026 03:13
    @mdavis36
    Copy link
    Collaborator Author

    !test

    @mdavis36 mdavis36 marked this pull request as ready for review February 18, 2026 06:37
    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Greptile Summary

    This PR implements per-Fusion ownership tracking maps (per_fusion_vals_, per_fusion_exprs_) inside IrContainer, and updates every Fusion-level accessor to return only statements owned by that Fusion. It is the critical invariant that will allow a shared container to host multiple independent Fusions safely in later Phase 2 PRs. For current single-Fusion containers the filtered results are semantically identical to the previous unfiltered ones.

    Key changes and issues found:

    • removeStatementsOwnedBy skips Val::uses_ cleanup (csrc/ir/container.cpp:232-243): Expressions are destroyed via erase_if without calling in->removeUse(e) for their inputs. Safe today (all vals belonging to this Fusion are destroyed in the same call), but will create dangling pointers in uses_ sets of cross-Fusion vals once shared containers are used in earnest.
    • LIFO invariant undocumented precondition (csrc/fusion.cpp:430-443): removeStatementsCreatedAfter pops from the back of the global exprs_up_/vals_up_ deques and asserts ownership of the tail. This silently assumes no other Fusion interleaved allocations into the shared deque during the guard scope. For multi-Fusion shared containers with concurrent StatementGuard usage, this will fire NVF_ERROR. The precondition should be documented or enforced with an early check.
    • numValsExcludingShortcuts misses typed shortcuts (csrc/fusion.h:564-570): Only the 5 named fields (zero_val_, etc.) are excluded. Typed shortcuts from zeroVal(DataType) / oneVal(DataType) are tracked in per_fusion_vals_ but not subtracted, risking a dangling pointer in managed_data_ if they are first-created within a StatementGuard scope.
    • deterministicValsMapOwnedBy / deterministicExprsMapOwnedBy now emit sequential indices scoped to the fusion's owned subset, which is semantically identical to the old global-deque indices for single-Fusion containers but is a subtle semantic shift to be aware of.
    • exprsOwnedBy in container.h is missing NVF_API while valsOwnedBy has it — minor visibility inconsistency in shared-library builds.

    Confidence Score: 3/5

    • Safe to merge for single-Fusion containers; carries forward-looking logic hazards for the shared-container phases that follow.
    • For the current single-Fusion use case the filtering changes are semantically transparent and all existing tests should pass. However, three substantive issues were found that will become real bugs as soon as shared containers with cross-Fusion references are activated: the missing Val::uses_ cleanup in removeStatementsOwnedBy, the undocumented LIFO precondition in removeStatementsCreatedAfter, and the incomplete shortcut-val exclusion in numValsExcludingShortcuts. These do not block the current PR but will need resolution before or alongside the Phase 2 PRs that introduce actual sharing.
    • csrc/ir/container.cpp (missing uses_ cleanup), csrc/fusion.cpp (LIFO assumption), and csrc/fusion.h (numValsExcludingShortcuts completeness)

    Important Files Changed

    Filename Overview
    csrc/ir/container.cpp Adds per-Fusion ownership maps and all filtered accessors. Key concern: removeStatementsOwnedBy destroys exprs without cleaning up Val::uses_, which is safe today (single-Fusion) but will silently corrupt use-chains when shared containers with cross-Fusion val references are active.
    csrc/ir/container.h Adds new public API methods and private per_fusion_vals_/per_fusion_exprs_ maps. Declaration looks correct; exprsOwnedBy is missing NVF_API annotation unlike valsOwnedBy, which may affect visibility in shared-library builds.
    csrc/fusion.cpp All registration, removal, and swap paths updated to maintain per-Fusion ownership maps. removeStatementsCreatedAfter carries a LIFO assumption on the global deque that will fail with interleaved multi-Fusion allocation; currently guarded by NVF_ERROR but not documented as a precondition.
    csrc/fusion.h All Fusion-level accessors now delegate to ownership-filtered container methods. numValsExcludingShortcuts excludes only the 5 named shortcut fields but misses typed shortcut vals (zeroVal(dtype), oneVal(dtype)), creating a potential dangling pointer if those are lazily created inside a StatementGuard.
    csrc/statement_guard.cpp Baseline snapshot now uses numValsExcludingShortcuts() instead of numVals(), correctly preventing pre-existing shortcut vals from being rolled back. Change is minimal and correct for the single-Fusion case.

    Sequence Diagram

    sequenceDiagram
        participant FusionA as Fusion A
        participant FusionB as Fusion B
        participant C as shared IrContainer
        participant SG as StatementGuard
    
        FusionA->>C: registerVal(v1) → per_fusion_vals_[A].insert(v1)
        FusionA->>C: registerExpr(e1) → per_fusion_exprs_[A].insert(e1)
        FusionB->>C: registerVal(v2) → per_fusion_vals_[B].insert(v2)
        FusionB->>C: registerExpr(e2) → per_fusion_exprs_[B].insert(e2)
    
        Note over C: vals_up_ = [v1, v2]<br/>exprs_up_ = [e1, e2]
    
        SG->>FusionA: StatementGuard(A) captures numExprs(A)=1, numValsExcludingShortcuts(A)=1
        FusionA->>C: registerVal(v3) → per_fusion_vals_[A].insert(v3)
        FusionA->>C: registerExpr(e3) → per_fusion_exprs_[A].insert(e3)
    
        Note over C: vals_up_ = [v1, v2, v3]<br/>exprs_up_ = [e1, e2, e3]
    
        SG->>FusionA: ~StatementGuard → removeStatementsCreatedAfter(1, 1)
        FusionA->>C: exprs_up_.back()=e3 ✓ owned by A → pop, erase
        FusionA->>C: vals_up_.back()=v3 ✓ owned by A → pop, erase
    
        Note over C: Restored: vals_up_=[v1,v2], exprs_up_=[e1,e2]
    
        FusionA->>C: removeStatementsOwnedBy(A) [Fusion::clear()]
        C->>C: erase_if vals_up_ where owned[A] → destroys v1
        Note over C: ⚠ Val::uses_ on v1 not cleaned up<br/>(safe only if v1 not cross-referenced)
        C->>C: erase_if exprs_up_ where owned[A] → destroys e1
    
    Loading

    Last reviewed commit: 50cb886

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    4 files reviewed, 5 comments

    Edit Code Review Agent Settings | Greptile

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Additional Comments (2)

    csrc/fusion.cpp
    Shared-container safety gap in rollback

    In removeStatementsCreatedAfter, statements are popped from the back of the container-global exprs_up_ deque (line 445), but only the current Fusion's per-fusion map is cleaned up (line 449: c->per_fusion_exprs_[this].erase(e)).

    In a shared container scenario where another Fusion has appended statements after the StatementGuard snapshot, the statement at the back of exprs_up_ could belong to the other Fusion. This code would:

    1. Destroy the unique_ptr (freeing the Expr memory)
    2. Remove it from exprs_ (global set)
    3. Only erase from per_fusion_exprs_[this] — leaving a dangling pointer in the other Fusion's per-fusion set

    The same issue applies to the vals_up_ loop below (line 467).

    This is safe for single-Fusion containers today, but since this PR's stated goal is enabling shared-container safety, the rollback path should also clean up the correct owner's per-fusion map — e.g., by looking up which Fusion actually owns each statement being removed.


    csrc/fusion.h
    numExprs/numVals still return container-global counts

    numExprs() and numVals() forward directly to the container's total counts, not per-Fusion counts. This is inconsistent with the fact that vals(), unordered_exprs(), and deterministic_* now return per-Fusion filtered results.

    More importantly, StatementGuard uses numExprs() and numVals() to snapshot statement counts and later passes them to removeStatementsCreatedAfter, which pops from the global deques. In a shared container where another Fusion adds statements concurrently, the snapshot will include the other Fusion's additions, and the rollback will remove them — destroying statements that don't belong to this Fusion.

    For single-Fusion containers this is fine, but these methods should be updated to return per-Fusion counts (or at least documented as container-global) before shared containers are actually used.

    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from 8b162d9 to b8d202d Compare February 26, 2026 00:29
    @mdavis36
    Copy link
    Collaborator Author

    !test

    mdavis36 added a commit that referenced this pull request Feb 26, 2026
    ## Summary
    
    Review fixes for PR #5961 (Per-Fusion statement tracking):
    
    - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with
    `std::erase_if`
    - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return
    per-Fusion counts instead of global
    - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for
    correct LIFO rollback in shared containers
    - **LIFO assertions**: Verify tail elements belong to this Fusion before
    popping
    
    ## Tests
    
    All tests pass:
    - ✅ StatementGuardTest.ExecuteAfterGuard
    - ✅ StatementGuardTest.LazySpecialValsNotDangling
    - ✅ FusionCopy_CUDA
    - ✅ FusionMove_CUDA
    @mdavis36 mdavis36 force-pushed the md/phase2-shared-ptr branch from c965408 to 5339bf5 Compare March 3, 2026 02:51
    mdavis36 added a commit that referenced this pull request Mar 3, 2026
    ## Summary
    
    Review fixes for PR #5961 (Per-Fusion statement tracking):
    
    - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with
    `std::erase_if`
    - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return
    per-Fusion counts instead of global
    - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for
    correct LIFO rollback in shared containers
    - **LIFO assertions**: Verify tail elements belong to this Fusion before
    popping
    
    ## Tests
    
    All tests pass:
    - ✅ StatementGuardTest.ExecuteAfterGuard
    - ✅ StatementGuardTest.LazySpecialValsNotDangling
    - ✅ FusionCopy_CUDA
    - ✅ FusionMove_CUDA
    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from bc595c5 to 9f944b4 Compare March 3, 2026 02:51
    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Mar 3, 2026

    Additional Comments (2)

    csrc/ir/container.cpp, line 220
    Iterator invalidation UB in transferStatementOwnership

    The per_fusion_vals_.find(from) iterator vals_it is stored, and then per_fusion_vals_[to] is called via operator[]. If to is a new key (not yet in the map), the insertion can trigger a rehash, which invalidates all existing iterators — including vals_it. The subsequent to_vals.insert(vals_it->second.begin(), vals_it->second.end()) and per_fusion_vals_.erase(vals_it) both dereference the now-dangling iterator, producing undefined behaviour.

    This is actively triggered by Fusion::swap, where the post-swap container has only one key (e.g. &b) and calling operator[&a] inserts a second entry, doubling the bucket count and rehashing.

    The same pattern repeats for per_fusion_exprs_ (lines 215–220).

    The fix is to extract the source set before any mutation of the map:

    auto vals_it = per_fusion_vals_.find(from);
    if (vals_it != per_fusion_vals_.end()) {
      // Copy the source set before operator[] can rehash and invalidate vals_it.
      auto from_vals = std::move(vals_it->second);
      per_fusion_vals_.erase(vals_it);
      per_fusion_vals_[to].insert(from_vals.begin(), from_vals.end());
    }
    
    auto exprs_it = per_fusion_exprs_.find(from);
    if (exprs_it != per_fusion_exprs_.end()) {
      auto from_exprs = std::move(exprs_it->second);
      per_fusion_exprs_.erase(exprs_it);
      per_fusion_exprs_[to].insert(from_exprs.begin(), from_exprs.end());
    }

    csrc/fusion.cpp, line 444
    LIFO invariant breaks silently under shared containers

    The expr-removal loop unconditionally pops from c->exprs_up_.back() and asserts (hard-abort via NVF_ERROR) that the tail entry belongs to this Fusion. The same invariant governs the vals loop (lines 447–466).

    For a single-Fusion container this holds, but once two Fusions share a container (the stated goal of Phase 2), another Fusion may register statements into the shared deque after this StatementGuard scope opened. In that case the tail could belong to the other Fusion, NVF_ERROR fires, and the process aborts rather than correctly rolling back only this Fusion's statements.

    Effectively, StatementGuard becomes incompatible with concurrent statement creation across Fusions sharing a container. Since this is Phase 2 infrastructure that explicitly targets shared containers, this is worth surfacing early — even if addressed in a later PR in the chain.

    One possible approach for future-proofing: instead of a deque tail pop, iterate c->exprs_up_ in reverse and erase only entries owned by this, or collect the set of exprs created during the scope and remove them directly by pointer rather than relying on positional ordering.

    @mdavis36 mdavis36 force-pushed the md/phase2-shared-ptr branch from 5339bf5 to 24e4f69 Compare March 4, 2026 21:49
    mdavis36 added a commit that referenced this pull request Mar 4, 2026
    ## Summary
    
    Review fixes for PR #5961 (Per-Fusion statement tracking):
    
    - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with
    `std::erase_if`
    - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return
    per-Fusion counts instead of global
    - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for
    correct LIFO rollback in shared containers
    - **LIFO assertions**: Verify tail elements belong to this Fusion before
    popping
    
    ## Tests
    
    All tests pass:
    - ✅ StatementGuardTest.ExecuteAfterGuard
    - ✅ StatementGuardTest.LazySpecialValsNotDangling
    - ✅ FusionCopy_CUDA
    - ✅ FusionMove_CUDA
    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from 9f944b4 to 7b3ce1f Compare March 4, 2026 22:04
    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Mar 4, 2026

    Additional Comments (3)

    csrc/ir/container.cpp, line 215
    Iterator invalidated by insertion before use

    per_fusion_vals_[to] performs an insertion when to is not yet a key in the map. Per the C++ standard, any insertion that triggers a rehash invalidates all existing iterators. Since vals_it was obtained before this call, if the insertion causes a rehash, both the subsequent vals_it->second read and the per_fusion_vals_.erase(vals_it) call are undefined behaviour.

    The same hazard applies in the per_fusion_exprs_ block a few lines below, where exprs_it is similarly invalidated by per_fusion_exprs_[to].

    A safe alternative is to extract the node first, which is both iterator-safe and avoids copying the entire set:

    void IrContainer::transferStatementOwnership(
        const Fusion* from,
        const Fusion* to) {
      auto vals_node = per_fusion_vals_.extract(from);
      if (vals_node) {
        auto& to_vals = per_fusion_vals_[to];
        to_vals.insert(
            std::make_move_iterator(vals_node.mapped().begin()),
            std::make_move_iterator(vals_node.mapped().end()));
      }
    
      auto exprs_node = per_fusion_exprs_.extract(from);
      if (exprs_node) {
        auto& to_exprs = per_fusion_exprs_[to];
        to_exprs.insert(
            std::make_move_iterator(exprs_node.mapped().begin()),
            std::make_move_iterator(exprs_node.mapped().end()));
      }
    }

    std::unordered_map::extract removes the node without rehashing the table, so no other iterators are invalidated, and the existing set memory is moved rather than copied.


    csrc/ir/container.cpp, line 244
    Missing Val::uses_ cleanup when removing exprs

    removeStatementsCreatedAfter (the StatementGuard rollback path) carefully calls in->removeUse(e) for every input Val before erasing each Expr. removeStatementsOwnedBy (the Fusion::clear() path) does not perform this cleanup.

    For single-Fusion containers this is currently safe because every owned Val is also destroyed in the same call, so dangling uses_ entries are never read. However, for shared containers — the stated end-goal of the Phase 2 chain — it is entirely possible that a Val owned by Fusion B references (via its uses_ set) an Expr owned by Fusion A. When Fusion A is cleared, Fusion B's Val retains a now-dangling pointer in its uses_ set, making any subsequent traversal of Val::uses_ unsafe.

    Removing exprs first (before destroying vals) and calling in->removeUse(e) for each input would make this safe for the multi-Fusion case and keep the two removal paths consistent with each other.


    csrc/fusion.cpp, line 436
    operator[] in assertion silently mutates the map

    c->per_fusion_exprs_[this] calls the non-const operator[], which inserts a default-constructed empty set if this is not yet a key in per_fusion_exprs_. Logically this should never fire (we only enter the loop body when exprsOwnedBy(this) is non-empty, so this must already be present), but the operator[] call still mutates the map as a side-effect of running the assertion.

    The same pattern appears in the val loop on line 448 (c->per_fusion_vals_[this].count(v)).

    Prefer a read-only lookup to avoid the accidental mutation:

        NVF_ERROR(
            c->per_fusion_exprs_.count(this) > 0 &&
                c->per_fusion_exprs_.at(this).count(e) > 0,
            "removeStatementsCreatedAfter: tail expr belongs to another Fusion");
    

    @mdavis36 mdavis36 force-pushed the md/phase2-shared-ptr branch from 24e4f69 to fab1748 Compare March 4, 2026 23:09
    mdavis36 added a commit that referenced this pull request Mar 4, 2026
    ## Summary
    
    Review fixes for PR #5961 (Per-Fusion statement tracking):
    
    - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with
    `std::erase_if`
    - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return
    per-Fusion counts instead of global
    - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for
    correct LIFO rollback in shared containers
    - **LIFO assertions**: Verify tail elements belong to this Fusion before
    popping
    
    ## Tests
    
    All tests pass:
    - ✅ StatementGuardTest.ExecuteAfterGuard
    - ✅ StatementGuardTest.LazySpecialValsNotDangling
    - ✅ FusionCopy_CUDA
    - ✅ FusionMove_CUDA
    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from 7b3ce1f to 2e491e9 Compare March 4, 2026 23:09
    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Mar 4, 2026

    Additional Comments (3)

    csrc/ir/container.cpp, line 216
    Iterator invalidation UB in transferStatementOwnership

    per_fusion_vals_[to] (line 205) uses operator[], which inserts a new entry when to is not yet in the map. Inserting into an std::unordered_map may trigger a rehash, which invalidates all iterators — including vals_it obtained on line 203. The subsequent vals_it->second.begin() / .end() call on line 206 is therefore undefined behaviour whenever to was not previously in the map. The same issue affects exprs_it / per_fusion_exprs_[to] on lines 212–214.

    Safe fix — extract the from set before touching the map for to:

    void IrContainer::transferStatementOwnership(
        const Fusion* from,
        const Fusion* to) {
      auto vals_it = per_fusion_vals_.find(from);
      if (vals_it != per_fusion_vals_.end()) {
        // Move out before any insertion that could rehash
        auto from_set = std::move(vals_it->second);
        per_fusion_vals_.erase(vals_it);
        per_fusion_vals_[to].insert(from_set.begin(), from_set.end());
      }
    
      auto exprs_it = per_fusion_exprs_.find(from);
      if (exprs_it != per_fusion_exprs_.end()) {
        auto from_set = std::move(exprs_it->second);
        per_fusion_exprs_.erase(exprs_it);
        per_fusion_exprs_[to].insert(from_set.begin(), from_set.end());
      }
    }
    

    Alternatively, std::unordered_map::extract can be used — it transfers the node without rehashing, keeping all iterators valid.


    csrc/ir/container.cpp, line 244
    Missing Val::uses_ cleanup when removing owned expressions

    removeStatementsOwnedBy destroys exprs via std::erase_if (which calls unique_ptr destructors) but never removes those exprs from the uses_ lists of their input Vals. Compare with removeStatementsCreatedAfter, which explicitly calls in->removeUse(e) for every input before popping each expr.

    For a shared container this is a latent bug: if a Val owned by Fusion B has Fusion A's Expr in its uses_ list, and Fusion::clear() for Fusion A calls removeStatementsOwnedBy, Fusion B's Val is left with a dangling pointer in uses_.

    For the current single-Fusion use case all Vals are destroyed in the same call, so the stale uses_ pointers are never read. However, given that the stated goal of Phase 2 is shared containers, the cleanup should mirror removeStatementsCreatedAfter:

    std::erase_if(exprs_up_, [&](const std::unique_ptr<Expr>& e) {
      if (owned.count(e.get()) > 0) {
        for (Val* in : e->inputs()) {
          in->removeUse(e.get());
        }
        exprs_.erase(e.get());
        return true;
      }
      return false;
    });
    

    csrc/fusion.cpp, line 466
    LIFO invariant holds only for single-Fusion containers

    Both loops pop from the global exprs_up_ / vals_up_ deque tail and then assert the popped element belongs to this Fusion. This is sound for a single-Fusion container, but the invariant breaks as soon as two Fusions interleave statement registration in a shared container.

    Consider:

    Guard created for Fusion A  (num_exprs_before = N)
    Fusion A registers expr_a   → tail of exprs_up_ is expr_a
    Fusion B registers expr_b   → tail of exprs_up_ is expr_b   ← different Fusion!
    Guard for A is destroyed:
      exprsOwnedBy(A).size() > N  → true
      exprs_up_.back() == expr_b  → NVF_ERROR fires
    

    The NVF_ERROR will surface the violation rather than silently corrupt state, which is good. But since this invariant is a load-bearing assumption for the StatementGuard rollback mechanism, it is worth recording it explicitly (e.g., a NVF_CHECK(sharingCount() == 1, ...) guard at the top of the function) or documenting that StatementGuard usage must be restricted to single-Fusion containers until a per-Fusion deque is introduced.

    @mdavis36
    Copy link
    Collaborator Author

    mdavis36 commented Mar 4, 2026

    Critical issues found:

    • Iterator invalidation UB in IrContainer::transferStatementOwnership: per_fusion_vals_[to] (operator[]) can trigger a rehash, invalidating the vals_it iterator obtained a line earlier; subsequent vals_it->second access is undefined behaviour. Same issue in the exprs branch.
    • Missing Val::uses_ cleanup in IrContainer::removeStatementsOwnedBy: exprs are destroyed via erase_if without removing themselves from their input Vals' uses_ lists — a latent dangling-pointer bug for shared containers.
    • LIFO invariant fragility in Fusion::removeStatementsCreatedAfter: rollback pops from the global exprs_up_/vals_up_ tail and asserts ownership; this fails (via NVF_ERROR) if another Fusion appends statements to the shared container after this Fusion's guard scope began. Should add sharingCount() == 1 guard or documentation.

    These are pretty much all addressed by changes in later chained PRs. They latter two are specifically addressed in #5983 as this came back to bite me w/ parallel compilation for shared containers across segments.

    @mdavis36 mdavis36 force-pushed the md/phase2-shared-ptr branch from fab1748 to dcc99d7 Compare March 5, 2026 00:07
    mdavis36 added 2 commits March 4, 2026 16:07
    Add per_fusion_vals_ / per_fusion_exprs_ maps to IrContainer so each
    Fusion can efficiently query only its own statements in a shared
    container. Fusion forwarding methods (vals(), unordered_exprs(),
    deterministic_vals(), etc.) now return per-Fusion filtered results.
    Fusion::clear() uses removeStatementsOwnedBy(this) instead of
    ir_container()->clear().
    ## Summary
    
    Review fixes for PR #5961 (Per-Fusion statement tracking):
    
    - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with
    `std::erase_if`
    - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return
    per-Fusion counts instead of global
    - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for
    correct LIFO rollback in shared containers
    - **LIFO assertions**: Verify tail elements belong to this Fusion before
    popping
    
    ## Tests
    
    All tests pass:
    - ✅ StatementGuardTest.ExecuteAfterGuard
    - ✅ StatementGuardTest.LazySpecialValsNotDangling
    - ✅ FusionCopy_CUDA
    - ✅ FusionMove_CUDA
    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from 2e491e9 to 50cb886 Compare March 5, 2026 00:07
    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Mar 5, 2026

    Additional Comments (3)

    csrc/ir/container.cpp, line 243
    removeStatementsOwnedBy silently skips Val::uses_ cleanup

    When removing expressions via removeStatementsOwnedBy, the inputs' uses_ sets are never updated. In contrast, removeStatementsCreatedAfter (the StatementGuard path) explicitly calls in->removeUse(e) for each input:

    // removeStatementsCreatedAfter (correct):
    for (Val* in : e->inputs()) {
      in->removeUse(e);
    }

    For the current single-Fusion case (Fusion::clear()), this is safe because all vals owned by this Fusion are destroyed in the same call — no live Val* retains the dangling uses_ entry. However, once shared containers land in a later phase, a Val owned by Fusion B could be an input to an Expr owned by Fusion A. After removeStatementsOwnedBy(A) destroys that expr, Fusion B's val retains a dangling pointer in its uses_ set, which any subsequent val->uses() traversal will dereference.

    Consider adding the same cleanup here to make the invariant hold regardless of sharing:

    std::erase_if(exprs_up_, [&](const std::unique_ptr<Expr>& e) {
      if (owned.count(e.get()) > 0) {
        for (Val* in : e->inputs()) {
          in->removeUse(e.get());
        }
        exprs_.erase(e.get());
        return true;
      }
      return false;
    });

    csrc/fusion.cpp, line 443
    LIFO invariant breaks silently under shared containers with interleaved allocation

    removeStatementsCreatedAfter pops from the back of the global exprs_up_ deque and then asserts the popped element is owned by this:

    Expr* e = c->exprs_up_.back().get();
    NVF_ERROR(
        c->per_fusion_exprs_[this].count(e) > 0,
        "removeStatementsCreatedAfter: tail expr belongs to another Fusion");

    This is safe today (single-Fusion containers) but fails in the target Phase 2 multi-Fusion scenario. Consider Fusion A and B sharing a container where StatementGuard is nested or interleaved:

    exprs_up_ (global tail): [..., A1, B1, A2]
    

    When A's guard destructs, it checks exprs_up_.back()A2 ✓. But if B's guard destructs first, it sees A2 at the back → NVF_ERROR fires. There is no safe ordering guarantee once allocations interleave.

    The NVF_ERROR prevents silent corruption, which is good, but it means StatementGuard cannot be used safely from two Fusions sharing a container. The same issue applies to the vals_up_ loop. A comment calling this out explicitly (or a NVF_ERROR early in the function checking !ir_container()->hasMultipleFusions()) would help future readers understand the precondition before relaxing it.


    csrc/fusion.h, line 570
    numValsExcludingShortcuts misses typed shortcut vals (zeroVal(dtype), oneVal(dtype))

    The function subtracts only the 5 named shortcut fields (zero_val_, one_val_, true_val_, false_val_, magic_zero_val_). Typed shortcuts created via zeroVal(DataType) and oneVal(DataType) are stored separately (e.g., in managed_data_) and are also registered in per_fusion_vals_ — but they are not excluded from this count.

    If a typed shortcut val (e.g., zeroVal(DataType::Float)) is lazily created inside a StatementGuard scope, it will be appended to the back of vals_up_ and included in numValsExcludingShortcuts(). The removeStatementsCreatedAfter loop will then pop and destroy it, leaving managed_data_ holding a dangling Val* for the rest of the Fusion's lifetime. Any subsequent use of that typed constant (e.g., during lowering) would dereference freed memory.

    The existing axioms() call in the StatementGuard constructor is the pattern used to pre-initialize lazy state before the guard captures the baseline. Consider extending that pattern to also pre-initialize any typed shortcut vals that callers are likely to create inside the guard, or document the precondition that typed shortcut vals must not be first-created within a StatementGuard scope.

    @mdavis36
    Copy link
    Collaborator Author

    mdavis36 commented Mar 5, 2026

    !test

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    1 participant