[IR Container] Phase 2.4 Per-fusion statement tracking by mdavis36 · Pull Request #5961 · NVIDIA/Fuser

mdavis36 · 2026-02-12T22:08:17Z

Summary

Add per-Fusion ownership tracking maps to IrContainer so each Fusion can efficiently query only its own Vals and Exprs within a shared container. Update all Fusion-level accessors to filter by ownership.

This is the highest-risk change in the Phase 2 chain. Every accessor call path is touched — vals(), deterministic_vals(), deterministic_exprs(), unordered_exprs(), and more now return ownership-filtered results rather than raw container contents. For single-Fusion containers the results are identical, but the implementation changes underneath every consumer.

Relationship to Phase 2

This is the critical invariant that makes shared containers safe:

Invariant: Fusion accessors filter by ownership

  Fusion A ─┐
             ├──→ shared_ptr<IrContainer> ──→ {val_0(A), val_1(A), val_0'(B), val_1'(B)}
  Fusion B ─┘

  A.vals()  →  {val_0, val_1}       // Only A's vals
  B.vals()  →  {val_0', val_1'}     // Only B's vals
  container->vals()  →  {val_0, val_1, val_0', val_1'}  // ALL vals (raw)

Without per-Fusion filtering, a shared container copy would break every consumer — Fusion::copy would iterate all vals (including other Fusions'), deterministic_vals() would return interleaved results, and StatementGuard rollback would destroy statements belonging to other Fusions.

This invariant is what allows Phase 2 to maintain independent IR graphs despite shared storage:

Invariant : fusion->vals() returns {v : v in container AND v->container() == fusion}
Invariant : Fusion::clear() only clears THIS Fusion's state (not ir_container()->clear())

CI Risk

Highest of all Phase 2 PRs. Every accessor-dependent code path is touched. For single-Fusion containers, filtered results are identical to unfiltered — but regressions would surface in accessor-heavy code paths (scheduling, lowering, codegen).

github-actions · 2026-02-12T22:09:11Z

Review updated until commit 2e491e9

Description

Add per-Fusion ownership tracking maps (per_fusion_vals_, per_fusion_exprs_) to IrContainer for efficient per-Fusion statement queries in shared containers
Update Fusion accessors (vals(), deterministic_vals(), deterministic_exprs(), unordered_exprs(), numExprs(), numVals()) to return ownership-filtered results
Modify Fusion::clear() to use removeStatementsOwnedBy(this) instead of clearing entire container, and update Fusion::swap() to transfer ownership
Fix StatementGuard to use numValsExcludingShortcuts() for correct LIFO rollback in shared containers

Changes walkthrough

Relevant files

Enhancement

fusion.cpp `Update Fusion methods for per-Fusion ownership tracking` csrc/fusion.cpp Update `Fusion::swap()` to call `transferStatementOwnership()` after container swap Change `Fusion::clear()` to use `ir_container_->removeStatementsOwnedBy(this)` instead of `ir_container()->clear()` Add per-Fusion tracking in `registerVal()`, `registerExpr()`, `removeVal()`, `removeExpr()` Update `removeStatementsCreatedAfter()` to use per-Fusion counts and add ownership assertions	+25/-27
container.cpp `Add ownership tracking methods to IrContainer` csrc/ir/container.cpp Add `valsOwnedBy()`, `exprsOwnedBy()` methods to query statements per Fusion Add `transferStatementOwnership()` for Fusion swap operations Add `removeStatementsOwnedBy()` with O(n) `std::erase_if` optimization Add `deterministicValsOwnedBy()`, `deterministicExprsOwnedBy()`, `deterministicValsMapOwnedBy()`, `deterministicExprsMapOwnedBy()` methods Update `clear()` and `swap()` to handle per-Fusion tracking maps	+131/-0
fusion.h `Update Fusion accessors for ownership-filtered results` csrc/fusion.h Change accessor return types from `const` to non-const (e.g., `deterministic_vals()`) Update accessors to call ownership-filtered container methods (`deterministicValsOwnedBy()`, etc.) Change `numExprs()` and `numVals()` to return per-Fusion counts Add `numValsExcludingShortcuts()` to exclude singleton shortcut vals from LIFO count	+26/-15
container.h `Add per-Fusion tracking data structures and method declarations` csrc/ir/container.h Add `per_fusion_vals_` and `per_fusion_exprs_` maps to track ownership Declare new ownership query methods: `valsOwnedBy()`, `exprsOwnedBy()`, `transferStatementOwnership()`, `removeStatementsOwnedBy()` Declare deterministic ownership methods: `deterministicValsOwnedBy()`, `deterministicExprsOwnedBy()`, `deterministicValsMapOwnedBy()`, `deterministicExprsMapOwnedBy()`	+18/-0

Bug fix

statement_guard.cpp `Update StatementGuard to use per-Fusion val count excluding shortcuts` csrc/statement_guard.cpp Change `prev_num_vals_` to use `numValsExcludingShortcuts()` instead of `numVals()` for correct LIFO rollback	+1/-1

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

Iterator Invalidation Risk

In removeStatementsOwnedBy, the code uses std::erase_if on vals_up_ and exprs_up_ deques while iterating over owned sets. While std::erase_if handles this correctly, the pattern of erasing elements during iteration over a different container (owned) should be validated to ensure no edge cases exist when the same pointer appears in multiple fusion ownership sets.

void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) {
  auto vals_it = per_fusion_vals_.find(fusion);
  if (vals_it != per_fusion_vals_.end()) {
    const auto& owned = vals_it->second;
    std::erase_if(vals_up_, [&](const std::unique_ptr<Val>& v) {
      if (owned.count(v.get()) > 0) {
        vals_.erase(v.get());
        return true;
      }
      return false;
    });
    per_fusion_vals_.erase(vals_it);
  }

  auto exprs_it = per_fusion_exprs_.find(fusion);
  if (exprs_it != per_fusion_exprs_.end()) {
    const auto& owned = exprs_it->second;
    std::erase_if(exprs_up_, [&](const std::unique_ptr<Expr>& e) {
      if (owned.count(e.get()) > 0) {
        exprs_.erase(e.get());
        return true;
      }
      return false;
    });
    per_fusion_exprs_.erase(exprs_it);
  }
}

Ownership Transfer Completeness

In Fusion::swap, after calling transferStatementOwnership, the code still iterates over a.vals() and b.vals() to update ir_container_ pointers. Since vals() now returns ownership-filtered results, verify this iteration correctly captures all transferred statements or if additional handling is needed.

a.ir_container()->transferStatementOwnership(&b, &a);
b.ir_container()->transferStatementOwnership(&a, &b);

if (a.ir_container_) {
  for (auto val : a.vals()) {
    val->ir_container_ = &a;
  }
  for (auto expr : a.deterministic_exprs()) {
    expr->ir_container_ = &a;

Shortcut Val Count Logic

The numValsExcludingShortcuts() method subtracts shortcut val pointers from the count. However, shortcut vals are registered in per_fusion_vals_ during registerVal(). Need to verify that the subtraction logic correctly accounts for all cases (e.g., when some shortcut vals are nullptr vs not yet created) and that this doesn't cause underflow.

int64_t numValsExcludingShortcuts() const noexcept {
  int64_t count = std::ssize(ir_container()->valsOwnedBy(this));
  count -= (zero_val_ != nullptr) + (one_val_ != nullptr) +
      (true_val_ != nullptr) + (false_val_ != nullptr) +
      (magic_zero_val_ != nullptr);
  return count;
}

mdavis36 · 2026-02-12T22:09:32Z

!test

mdavis36 · 2026-02-18T06:37:34Z

!test

greptile-apps · 2026-02-18T06:44:04Z

Greptile Summary

This PR implements per-Fusion ownership tracking maps (per_fusion_vals_, per_fusion_exprs_) inside IrContainer, and updates every Fusion-level accessor to return only statements owned by that Fusion. It is the critical invariant that will allow a shared container to host multiple independent Fusions safely in later Phase 2 PRs. For current single-Fusion containers the filtered results are semantically identical to the previous unfiltered ones.

Key changes and issues found:

removeStatementsOwnedBy skips Val::uses_ cleanup (csrc/ir/container.cpp:232-243): Expressions are destroyed via erase_if without calling in->removeUse(e) for their inputs. Safe today (all vals belonging to this Fusion are destroyed in the same call), but will create dangling pointers in uses_ sets of cross-Fusion vals once shared containers are used in earnest.
LIFO invariant undocumented precondition (csrc/fusion.cpp:430-443): removeStatementsCreatedAfter pops from the back of the global exprs_up_/vals_up_ deques and asserts ownership of the tail. This silently assumes no other Fusion interleaved allocations into the shared deque during the guard scope. For multi-Fusion shared containers with concurrent StatementGuard usage, this will fire NVF_ERROR. The precondition should be documented or enforced with an early check.
numValsExcludingShortcuts misses typed shortcuts (csrc/fusion.h:564-570): Only the 5 named fields (zero_val_, etc.) are excluded. Typed shortcuts from zeroVal(DataType) / oneVal(DataType) are tracked in per_fusion_vals_ but not subtracted, risking a dangling pointer in managed_data_ if they are first-created within a StatementGuard scope.
deterministicValsMapOwnedBy / deterministicExprsMapOwnedBy now emit sequential indices scoped to the fusion's owned subset, which is semantically identical to the old global-deque indices for single-Fusion containers but is a subtle semantic shift to be aware of.
exprsOwnedBy in container.h is missing NVF_API while valsOwnedBy has it — minor visibility inconsistency in shared-library builds.

Confidence Score: 3/5

Safe to merge for single-Fusion containers; carries forward-looking logic hazards for the shared-container phases that follow.
For the current single-Fusion use case the filtering changes are semantically transparent and all existing tests should pass. However, three substantive issues were found that will become real bugs as soon as shared containers with cross-Fusion references are activated: the missing Val::uses_ cleanup in removeStatementsOwnedBy, the undocumented LIFO precondition in removeStatementsCreatedAfter, and the incomplete shortcut-val exclusion in numValsExcludingShortcuts. These do not block the current PR but will need resolution before or alongside the Phase 2 PRs that introduce actual sharing.
csrc/ir/container.cpp (missing uses_ cleanup), csrc/fusion.cpp (LIFO assumption), and csrc/fusion.h (numValsExcludingShortcuts completeness)

Important Files Changed

Filename	Overview
csrc/ir/container.cpp	Adds per-Fusion ownership maps and all filtered accessors. Key concern: `removeStatementsOwnedBy` destroys exprs without cleaning up `Val::uses_`, which is safe today (single-Fusion) but will silently corrupt use-chains when shared containers with cross-Fusion val references are active.
csrc/ir/container.h	Adds new public API methods and private `per_fusion_vals_`/`per_fusion_exprs_` maps. Declaration looks correct; `exprsOwnedBy` is missing `NVF_API` annotation unlike `valsOwnedBy`, which may affect visibility in shared-library builds.
csrc/fusion.cpp	All registration, removal, and swap paths updated to maintain per-Fusion ownership maps. `removeStatementsCreatedAfter` carries a LIFO assumption on the global deque that will fail with interleaved multi-Fusion allocation; currently guarded by `NVF_ERROR` but not documented as a precondition.
csrc/fusion.h	All `Fusion`-level accessors now delegate to ownership-filtered container methods. `numValsExcludingShortcuts` excludes only the 5 named shortcut fields but misses typed shortcut vals (`zeroVal(dtype)`, `oneVal(dtype)`), creating a potential dangling pointer if those are lazily created inside a `StatementGuard`.
csrc/statement_guard.cpp	Baseline snapshot now uses `numValsExcludingShortcuts()` instead of `numVals()`, correctly preventing pre-existing shortcut vals from being rolled back. Change is minimal and correct for the single-Fusion case.

Sequence Diagram

sequenceDiagram
    participant FusionA as Fusion A
    participant FusionB as Fusion B
    participant C as shared IrContainer
    participant SG as StatementGuard

    FusionA->>C: registerVal(v1) → per_fusion_vals_[A].insert(v1)
    FusionA->>C: registerExpr(e1) → per_fusion_exprs_[A].insert(e1)
    FusionB->>C: registerVal(v2) → per_fusion_vals_[B].insert(v2)
    FusionB->>C: registerExpr(e2) → per_fusion_exprs_[B].insert(e2)

    Note over C: vals_up_ = [v1, v2]<br/>exprs_up_ = [e1, e2]

    SG->>FusionA: StatementGuard(A) captures numExprs(A)=1, numValsExcludingShortcuts(A)=1
    FusionA->>C: registerVal(v3) → per_fusion_vals_[A].insert(v3)
    FusionA->>C: registerExpr(e3) → per_fusion_exprs_[A].insert(e3)

    Note over C: vals_up_ = [v1, v2, v3]<br/>exprs_up_ = [e1, e2, e3]

    SG->>FusionA: ~StatementGuard → removeStatementsCreatedAfter(1, 1)
    FusionA->>C: exprs_up_.back()=e3 ✓ owned by A → pop, erase
    FusionA->>C: vals_up_.back()=v3 ✓ owned by A → pop, erase

    Note over C: Restored: vals_up_=[v1,v2], exprs_up_=[e1,e2]

    FusionA->>C: removeStatementsOwnedBy(A) [Fusion::clear()]
    C->>C: erase_if vals_up_ where owned[A] → destroys v1
    Note over C: ⚠ Val::uses_ on v1 not cleaned up<br/>(safe only if v1 not cross-referenced)
    C->>C: erase_if exprs_up_ where owned[A] → destroys e1

_{Last reviewed commit: 50cb886}

greptile-apps

_{4 files reviewed, 5 comments}

_{Edit Code Review Agent Settings | Greptile}

csrc/ir/container.cpp

csrc/fusion.cpp

greptile-apps · 2026-02-18T06:44:19Z

Additional Comments (2)

csrc/fusion.cpp
Shared-container safety gap in rollback

In removeStatementsCreatedAfter, statements are popped from the back of the container-global exprs_up_ deque (line 445), but only the current Fusion's per-fusion map is cleaned up (line 449: c->per_fusion_exprs_[this].erase(e)).

In a shared container scenario where another Fusion has appended statements after the StatementGuard snapshot, the statement at the back of exprs_up_ could belong to the other Fusion. This code would:

Destroy the unique_ptr (freeing the Expr memory)
Remove it from exprs_ (global set)
Only erase from per_fusion_exprs_[this] — leaving a dangling pointer in the other Fusion's per-fusion set

The same issue applies to the vals_up_ loop below (line 467).

This is safe for single-Fusion containers today, but since this PR's stated goal is enabling shared-container safety, the rollback path should also clean up the correct owner's per-fusion map — e.g., by looking up which Fusion actually owns each statement being removed.

csrc/fusion.h
numExprs/numVals still return container-global counts

numExprs() and numVals() forward directly to the container's total counts, not per-Fusion counts. This is inconsistent with the fact that vals(), unordered_exprs(), and deterministic_* now return per-Fusion filtered results.

More importantly, StatementGuard uses numExprs() and numVals() to snapshot statement counts and later passes them to removeStatementsCreatedAfter, which pops from the global deques. In a shared container where another Fusion adds statements concurrently, the snapshot will include the other Fusion's additions, and the rollback will remove them — destroying statements that don't belong to this Fusion.

For single-Fusion containers this is fine, but these methods should be updated to return per-Fusion counts (or at least documented as container-global) before shared containers are actually used.

mdavis36 · 2026-02-26T03:57:17Z

!test

## Summary Review fixes for PR #5961 (Per-Fusion statement tracking): - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with `std::erase_if` - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return per-Fusion counts instead of global - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for correct LIFO rollback in shared containers - **LIFO assertions**: Verify tail elements belong to this Fusion before popping ## Tests All tests pass: - ✅ StatementGuardTest.ExecuteAfterGuard - ✅ StatementGuardTest.LazySpecialValsNotDangling - ✅ FusionCopy_CUDA - ✅ FusionMove_CUDA

greptile-apps · 2026-03-03T02:56:35Z

Additional Comments (2)

csrc/ir/container.cpp, line 220
Iterator invalidation UB in transferStatementOwnership

The per_fusion_vals_.find(from) iterator vals_it is stored, and then per_fusion_vals_[to] is called via operator[]. If to is a new key (not yet in the map), the insertion can trigger a rehash, which invalidates all existing iterators — including vals_it. The subsequent to_vals.insert(vals_it->second.begin(), vals_it->second.end()) and per_fusion_vals_.erase(vals_it) both dereference the now-dangling iterator, producing undefined behaviour.

This is actively triggered by Fusion::swap, where the post-swap container has only one key (e.g. &b) and calling operator[&a] inserts a second entry, doubling the bucket count and rehashing.

The same pattern repeats for per_fusion_exprs_ (lines 215–220).

The fix is to extract the source set before any mutation of the map:

auto vals_it = per_fusion_vals_.find(from);
if (vals_it != per_fusion_vals_.end()) {
  // Copy the source set before operator[] can rehash and invalidate vals_it.
  auto from_vals = std::move(vals_it->second);
  per_fusion_vals_.erase(vals_it);
  per_fusion_vals_[to].insert(from_vals.begin(), from_vals.end());
}

auto exprs_it = per_fusion_exprs_.find(from);
if (exprs_it != per_fusion_exprs_.end()) {
  auto from_exprs = std::move(exprs_it->second);
  per_fusion_exprs_.erase(exprs_it);
  per_fusion_exprs_[to].insert(from_exprs.begin(), from_exprs.end());
}

csrc/fusion.cpp, line 444
LIFO invariant breaks silently under shared containers

The expr-removal loop unconditionally pops from c->exprs_up_.back() and asserts (hard-abort via NVF_ERROR) that the tail entry belongs to this Fusion. The same invariant governs the vals loop (lines 447–466).

For a single-Fusion container this holds, but once two Fusions share a container (the stated goal of Phase 2), another Fusion may register statements into the shared deque after this StatementGuard scope opened. In that case the tail could belong to the other Fusion, NVF_ERROR fires, and the process aborts rather than correctly rolling back only this Fusion's statements.

Effectively, StatementGuard becomes incompatible with concurrent statement creation across Fusions sharing a container. Since this is Phase 2 infrastructure that explicitly targets shared containers, this is worth surfacing early — even if addressed in a later PR in the chain.

One possible approach for future-proofing: instead of a deque tail pop, iterate c->exprs_up_ in reverse and erase only entries owned by this, or collect the set of exprs created during the scope and remove them directly by pointer rather than relying on positional ordering.

## Summary Review fixes for PR #5961 (Per-Fusion statement tracking): - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with `std::erase_if` - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return per-Fusion counts instead of global - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for correct LIFO rollback in shared containers - **LIFO assertions**: Verify tail elements belong to this Fusion before popping ## Tests All tests pass: - ✅ StatementGuardTest.ExecuteAfterGuard - ✅ StatementGuardTest.LazySpecialValsNotDangling - ✅ FusionCopy_CUDA - ✅ FusionMove_CUDA

greptile-apps · 2026-03-04T22:10:11Z

Additional Comments (3)

csrc/ir/container.cpp, line 215
Iterator invalidated by insertion before use

per_fusion_vals_[to] performs an insertion when to is not yet a key in the map. Per the C++ standard, any insertion that triggers a rehash invalidates all existing iterators. Since vals_it was obtained before this call, if the insertion causes a rehash, both the subsequent vals_it->second read and the per_fusion_vals_.erase(vals_it) call are undefined behaviour.

The same hazard applies in the per_fusion_exprs_ block a few lines below, where exprs_it is similarly invalidated by per_fusion_exprs_[to].

A safe alternative is to extract the node first, which is both iterator-safe and avoids copying the entire set:

void IrContainer::transferStatementOwnership(
    const Fusion* from,
    const Fusion* to) {
  auto vals_node = per_fusion_vals_.extract(from);
  if (vals_node) {
    auto& to_vals = per_fusion_vals_[to];
    to_vals.insert(
        std::make_move_iterator(vals_node.mapped().begin()),
        std::make_move_iterator(vals_node.mapped().end()));
  }

  auto exprs_node = per_fusion_exprs_.extract(from);
  if (exprs_node) {
    auto& to_exprs = per_fusion_exprs_[to];
    to_exprs.insert(
        std::make_move_iterator(exprs_node.mapped().begin()),
        std::make_move_iterator(exprs_node.mapped().end()));
  }
}

std::unordered_map::extract removes the node without rehashing the table, so no other iterators are invalidated, and the existing set memory is moved rather than copied.

csrc/ir/container.cpp, line 244
Missing Val::uses_ cleanup when removing exprs

removeStatementsCreatedAfter (the StatementGuard rollback path) carefully calls in->removeUse(e) for every input Val before erasing each Expr. removeStatementsOwnedBy (the Fusion::clear() path) does not perform this cleanup.

For single-Fusion containers this is currently safe because every owned Val is also destroyed in the same call, so dangling uses_ entries are never read. However, for shared containers — the stated end-goal of the Phase 2 chain — it is entirely possible that a Val owned by Fusion B references (via its uses_ set) an Expr owned by Fusion A. When Fusion A is cleared, Fusion B's Val retains a now-dangling pointer in its uses_ set, making any subsequent traversal of Val::uses_ unsafe.

Removing exprs first (before destroying vals) and calling in->removeUse(e) for each input would make this safe for the multi-Fusion case and keep the two removal paths consistent with each other.

csrc/fusion.cpp, line 436
operator[] in assertion silently mutates the map

c->per_fusion_exprs_[this] calls the non-const operator[], which inserts a default-constructed empty set if this is not yet a key in per_fusion_exprs_. Logically this should never fire (we only enter the loop body when exprsOwnedBy(this) is non-empty, so this must already be present), but the operator[] call still mutates the map as a side-effect of running the assertion.

The same pattern appears in the val loop on line 448 (c->per_fusion_vals_[this].count(v)).

Prefer a read-only lookup to avoid the accidental mutation:

    NVF_ERROR(
        c->per_fusion_exprs_.count(this) > 0 &&
            c->per_fusion_exprs_.at(this).count(e) > 0,
        "removeStatementsCreatedAfter: tail expr belongs to another Fusion");

## Summary Review fixes for PR #5961 (Per-Fusion statement tracking): - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with `std::erase_if` - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return per-Fusion counts instead of global - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for correct LIFO rollback in shared containers - **LIFO assertions**: Verify tail elements belong to this Fusion before popping ## Tests All tests pass: - ✅ StatementGuardTest.ExecuteAfterGuard - ✅ StatementGuardTest.LazySpecialValsNotDangling - ✅ FusionCopy_CUDA - ✅ FusionMove_CUDA

greptile-apps · 2026-03-04T23:17:26Z

Additional Comments (3)

csrc/ir/container.cpp, line 216
Iterator invalidation UB in transferStatementOwnership

per_fusion_vals_[to] (line 205) uses operator[], which inserts a new entry when to is not yet in the map. Inserting into an std::unordered_map may trigger a rehash, which invalidates all iterators — including vals_it obtained on line 203. The subsequent vals_it->second.begin() / .end() call on line 206 is therefore undefined behaviour whenever to was not previously in the map. The same issue affects exprs_it / per_fusion_exprs_[to] on lines 212–214.

Safe fix — extract the from set before touching the map for to:

void IrContainer::transferStatementOwnership(
    const Fusion* from,
    const Fusion* to) {
  auto vals_it = per_fusion_vals_.find(from);
  if (vals_it != per_fusion_vals_.end()) {
    // Move out before any insertion that could rehash
    auto from_set = std::move(vals_it->second);
    per_fusion_vals_.erase(vals_it);
    per_fusion_vals_[to].insert(from_set.begin(), from_set.end());
  }

  auto exprs_it = per_fusion_exprs_.find(from);
  if (exprs_it != per_fusion_exprs_.end()) {
    auto from_set = std::move(exprs_it->second);
    per_fusion_exprs_.erase(exprs_it);
    per_fusion_exprs_[to].insert(from_set.begin(), from_set.end());
  }
}

Alternatively, std::unordered_map::extract can be used — it transfers the node without rehashing, keeping all iterators valid.

csrc/ir/container.cpp, line 244
Missing Val::uses_ cleanup when removing owned expressions

removeStatementsOwnedBy destroys exprs via std::erase_if (which calls unique_ptr destructors) but never removes those exprs from the uses_ lists of their input Vals. Compare with removeStatementsCreatedAfter, which explicitly calls in->removeUse(e) for every input before popping each expr.

For a shared container this is a latent bug: if a Val owned by Fusion B has Fusion A's Expr in its uses_ list, and Fusion::clear() for Fusion A calls removeStatementsOwnedBy, Fusion B's Val is left with a dangling pointer in uses_.

For the current single-Fusion use case all Vals are destroyed in the same call, so the stale uses_ pointers are never read. However, given that the stated goal of Phase 2 is shared containers, the cleanup should mirror removeStatementsCreatedAfter:

std::erase_if(exprs_up_, [&](const std::unique_ptr<Expr>& e) {
  if (owned.count(e.get()) > 0) {
    for (Val* in : e->inputs()) {
      in->removeUse(e.get());
    }
    exprs_.erase(e.get());
    return true;
  }
  return false;
});

csrc/fusion.cpp, line 466
LIFO invariant holds only for single-Fusion containers

Both loops pop from the global exprs_up_ / vals_up_ deque tail and then assert the popped element belongs to this Fusion. This is sound for a single-Fusion container, but the invariant breaks as soon as two Fusions interleave statement registration in a shared container.

Consider:

Guard created for Fusion A  (num_exprs_before = N)
Fusion A registers expr_a   → tail of exprs_up_ is expr_a
Fusion B registers expr_b   → tail of exprs_up_ is expr_b   ← different Fusion!
Guard for A is destroyed:
  exprsOwnedBy(A).size() > N  → true
  exprs_up_.back() == expr_b  → NVF_ERROR fires

The NVF_ERROR will surface the violation rather than silently corrupt state, which is good. But since this invariant is a load-bearing assumption for the StatementGuard rollback mechanism, it is worth recording it explicitly (e.g., a NVF_CHECK(sharingCount() == 1, ...) guard at the top of the function) or documenting that StatementGuard usage must be restricted to single-Fusion containers until a per-Fusion deque is introduced.

mdavis36 · 2026-03-04T23:54:27Z

Critical issues found:

Iterator invalidation UB in IrContainer::transferStatementOwnership: per_fusion_vals_[to] (operator[]) can trigger a rehash, invalidating the vals_it iterator obtained a line earlier; subsequent vals_it->second access is undefined behaviour. Same issue in the exprs branch.

Missing Val::uses_ cleanup in IrContainer::removeStatementsOwnedBy: exprs are destroyed via erase_if without removing themselves from their input Vals' uses_ lists — a latent dangling-pointer bug for shared containers.

LIFO invariant fragility in Fusion::removeStatementsCreatedAfter: rollback pops from the global exprs_up_/vals_up_ tail and asserts ownership; this fails (via NVF_ERROR) if another Fusion appends statements to the shared container after this Fusion's guard scope began. Should add sharingCount() == 1 guard or documentation.

These are pretty much all addressed by changes in later chained PRs. They latter two are specifically addressed in #5983 as this came back to bite me w/ parallel compilation for shared containers across segments.

Add per_fusion_vals_ / per_fusion_exprs_ maps to IrContainer so each Fusion can efficiently query only its own statements in a shared container. Fusion forwarding methods (vals(), unordered_exprs(), deterministic_vals(), etc.) now return per-Fusion filtered results. Fusion::clear() uses removeStatementsOwnedBy(this) instead of ir_container()->clear().

## Summary Review fixes for PR #5961 (Per-Fusion statement tracking): - **O(n²) → O(n)**: Optimize `removeStatementsOwnedBy` with `std::erase_if` - **Per-Fusion counts**: Convert `numExprs()`/`numVals()` to return per-Fusion counts instead of global - **StatementGuard fixes**: Snapshot and compare per-Fusion counts for correct LIFO rollback in shared containers - **LIFO assertions**: Verify tail elements belong to this Fusion before popping ## Tests All tests pass: - ✅ StatementGuardTest.ExecuteAfterGuard - ✅ StatementGuardTest.LazySpecialValsNotDangling - ✅ FusionCopy_CUDA - ✅ FusionMove_CUDA

greptile-apps · 2026-03-05T00:18:19Z

Additional Comments (3)

csrc/ir/container.cpp, line 243
removeStatementsOwnedBy silently skips Val::uses_ cleanup

When removing expressions via removeStatementsOwnedBy, the inputs' uses_ sets are never updated. In contrast, removeStatementsCreatedAfter (the StatementGuard path) explicitly calls in->removeUse(e) for each input:

// removeStatementsCreatedAfter (correct):
for (Val* in : e->inputs()) {
  in->removeUse(e);
}

For the current single-Fusion case (Fusion::clear()), this is safe because all vals owned by this Fusion are destroyed in the same call — no live Val* retains the dangling uses_ entry. However, once shared containers land in a later phase, a Val owned by Fusion B could be an input to an Expr owned by Fusion A. After removeStatementsOwnedBy(A) destroys that expr, Fusion B's val retains a dangling pointer in its uses_ set, which any subsequent val->uses() traversal will dereference.

Consider adding the same cleanup here to make the invariant hold regardless of sharing:

std::erase_if(exprs_up_, [&](const std::unique_ptr<Expr>& e) {
  if (owned.count(e.get()) > 0) {
    for (Val* in : e->inputs()) {
      in->removeUse(e.get());
    }
    exprs_.erase(e.get());
    return true;
  }
  return false;
});

csrc/fusion.cpp, line 443
LIFO invariant breaks silently under shared containers with interleaved allocation

removeStatementsCreatedAfter pops from the back of the global exprs_up_ deque and then asserts the popped element is owned by this:

Expr* e = c->exprs_up_.back().get();
NVF_ERROR(
    c->per_fusion_exprs_[this].count(e) > 0,
    "removeStatementsCreatedAfter: tail expr belongs to another Fusion");

This is safe today (single-Fusion containers) but fails in the target Phase 2 multi-Fusion scenario. Consider Fusion A and B sharing a container where StatementGuard is nested or interleaved:

exprs_up_ (global tail): [..., A1, B1, A2]

When A's guard destructs, it checks exprs_up_.back() → A2 ✓. But if B's guard destructs first, it sees A2 at the back → NVF_ERROR fires. There is no safe ordering guarantee once allocations interleave.

The NVF_ERROR prevents silent corruption, which is good, but it means StatementGuard cannot be used safely from two Fusions sharing a container. The same issue applies to the vals_up_ loop. A comment calling this out explicitly (or a NVF_ERROR early in the function checking !ir_container()->hasMultipleFusions()) would help future readers understand the precondition before relaxing it.

csrc/fusion.h, line 570
numValsExcludingShortcuts misses typed shortcut vals (zeroVal(dtype), oneVal(dtype))

The function subtracts only the 5 named shortcut fields (zero_val_, one_val_, true_val_, false_val_, magic_zero_val_). Typed shortcuts created via zeroVal(DataType) and oneVal(DataType) are stored separately (e.g., in managed_data_) and are also registered in per_fusion_vals_ — but they are not excluded from this count.

If a typed shortcut val (e.g., zeroVal(DataType::Float)) is lazily created inside a StatementGuard scope, it will be appended to the back of vals_up_ and included in numValsExcludingShortcuts(). The removeStatementsCreatedAfter loop will then pop and destroy it, leaving managed_data_ holding a dangling Val* for the rest of the Fusion's lifetime. Any subsequent use of that typed constant (e.g., during lowering) would dereference freed memory.

The existing axioms() call in the StatementGuard constructor is the pattern used to pre-initialize lazy state before the guard captures the baseline. Consider extending that pattern to also pre-initialize any typed shortcut vals that callers are likely to create inside the guard, or document the precondition that typed shortcut vals must not be first-created within a StatementGuard scope.

mdavis36 · 2026-03-05T01:01:48Z

!test

mdavis36 changed the base branch from main to md/phase2-shared-ptr February 12, 2026 22:08

mdavis36 force-pushed the md/phase2-shared-ptr branch from 3a199c8 to 53e5045 Compare February 12, 2026 22:09

mdavis36 changed the title ~~[IR Container] Phase 2 Per-fusion statement tracking~~ [IR Container] Phase 2.4 Per-fusion statement tracking Feb 18, 2026

mdavis36 force-pushed the md/phase2-per-fusion branch from 33629cb to 8b162d9 Compare February 18, 2026 03:13

mdavis36 force-pushed the md/phase2-shared-ptr branch from 53e5045 to f8ff364 Compare February 18, 2026 03:13

This was referenced Feb 18, 2026

[IR Container] Phase 2.6 Concurrency & Thread Safety #5971

Open

[IR Container] Phase 2 IR Container Refactor #5975

Draft

mdavis36 marked this pull request as ready for review February 18, 2026 06:37

greptile-apps bot reviewed Feb 18, 2026

View reviewed changes

csrc/ir/container.cpp Show resolved Hide resolved

csrc/ir/container.cpp Show resolved Hide resolved

csrc/fusion.cpp Show resolved Hide resolved

mdavis36 force-pushed the md/phase2-per-fusion branch from 8b162d9 to b8d202d Compare February 26, 2026 00:29

mdavis36 mentioned this pull request Feb 26, 2026

PR #5961 Review Fixes — Per-Fusion Statement Tracking #6015

Merged

mdavis36 force-pushed the md/phase2-shared-ptr branch from c965408 to 5339bf5 Compare March 3, 2026 02:51

mdavis36 force-pushed the md/phase2-per-fusion branch from bc595c5 to 9f944b4 Compare March 3, 2026 02:51

mdavis36 force-pushed the md/phase2-shared-ptr branch from 5339bf5 to 24e4f69 Compare March 4, 2026 21:49

mdavis36 force-pushed the md/phase2-per-fusion branch from 9f944b4 to 7b3ce1f Compare March 4, 2026 22:04

mdavis36 force-pushed the md/phase2-shared-ptr branch from 24e4f69 to fab1748 Compare March 4, 2026 23:09

mdavis36 force-pushed the md/phase2-per-fusion branch from 7b3ce1f to 2e491e9 Compare March 4, 2026 23:09

mdavis36 force-pushed the md/phase2-shared-ptr branch from fab1748 to dcc99d7 Compare March 5, 2026 00:07

mdavis36 added 2 commits March 4, 2026 16:07

mdavis36 force-pushed the md/phase2-per-fusion branch from 2e491e9 to 50cb886 Compare March 5, 2026 00:07

Conversation

mdavis36 commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Relationship to Phase 2

CI Risk

Uh oh!

github-actions bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes walkthrough

PR Reviewer Guide

Uh oh!

mdavis36 commented Feb 12, 2026

Uh oh!

mdavis36 commented Feb 18, 2026

Uh oh!

greptile-apps bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot commented Feb 18, 2026

Uh oh!

mdavis36 commented Feb 26, 2026

Uh oh!

greptile-apps bot commented Mar 3, 2026

Uh oh!

greptile-apps bot commented Mar 4, 2026

Uh oh!

greptile-apps bot commented Mar 4, 2026

Uh oh!

mdavis36 commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps bot commented Mar 5, 2026

Uh oh!

mdavis36 commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mdavis36 commented Feb 12, 2026 •

edited

Loading

github-actions bot commented Feb 12, 2026 •

edited

Loading

greptile-apps bot commented Feb 18, 2026 •

edited

Loading

mdavis36 commented Mar 4, 2026 •

edited

Loading