
[IR Container] Phase 2.5 Copy-Move Semantics #5964

Open
mdavis36 wants to merge 10 commits into md/phase2-per-fusion from md/phase2-copy-move

@mdavis36
Collaborator

@mdavis36 mdavis36 commented Feb 12, 2026

Summary

Implement shared-container-aware copy, move, and swap operations, plus per-Fusion name counters that ensure cloned Vals get matching names. This PR combines the originally planned Tasks 3 and 4 — per-Fusion name counters were required to fix CI failures from the copy implementation (553 failures from duplicate TV names when name counter synchronization was missing).

Changes

Copy semantics:

  • Copy constructor & assignment op: Share container pointer via shared_ptr, register with container, delegate to Fusion::copy
  • Fusion::copy: Clear destination, create IrCloner targeting dest, clone source's deterministic_vals into shared container, clone Fusion-level state (inputs, outputs, axioms, metadata)

Move semantics:

  • Move constructor: Create empty Fusion, swap with source
  • Move assignment: Clear, swap

Swap:

  • Ownership-filtered pointer swap handling three distinct cases:
    1. Two Fusions with different containers
    2. Two Fusions sharing the same container
    3. Swap with third-party Fusions sharing a container

Per-Fusion name counters:

  • val_type_name_map_ and expr_name_counter_ added as Fusion members
  • getValName(ValType) and getExprName() methods on Fusion
  • Counter lifecycle: sync in copy, swap in swap, reset in clear

Copy Semantics in Detail

BEFORE:
  Fusion A ──→ shared_ptr<Container C> ──→ {val_0(A), val_1(A), expr_0(A)}
  Container C: sharing_fusions_ = {A}

COPY: Fusion B(A)   // copy constructor

AFTER:
  Fusion A ─┐
             ├──→ shared_ptr<Container C> ──→ {val_0(A), val_1(A), expr_0(A),
  Fusion B ─┘                                  val_0'(B), val_1'(B), expr_0'(B)}
  Container C: sharing_fusions_ = {A, B}

  // B's clones have matching names: val_0'->name() == val_0->name()
  // IR graphs are independent: modifying B's clone doesn't affect A

The copy constructor shares the container (increments shared_ptr refcount), then clones A's nodes into the same shared storage. Per-Fusion tracking ensures each Fusion's accessors still return only their own nodes.
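
The sharing behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyFusion`, `ToyContainer`, and `ToyVal` are hypothetical stand-ins, not the real nvFuser classes, and the real copy path goes through `IrCloner` rather than a plain loop.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins for the real classes; every name here is hypothetical.
struct ToyFusion;

struct ToyVal {
  std::string name;
  const ToyFusion* owner;
};

struct ToyContainer {
  // Physical storage shared by every Fusion that points at this container.
  std::vector<std::unique_ptr<ToyVal>> storage;
  // Per-Fusion view: which nodes belong to which owner.
  std::unordered_map<const ToyFusion*, std::vector<ToyVal*>> owned;

  ToyVal* create(const ToyFusion* owner, const std::string& name) {
    storage.push_back(std::make_unique<ToyVal>(ToyVal{name, owner}));
    owned[owner].push_back(storage.back().get());
    return storage.back().get();
  }
};

struct ToyFusion {
  std::shared_ptr<ToyContainer> container;

  ToyFusion() : container(std::make_shared<ToyContainer>()) {}

  // Copy: share the container, then clone the source's vals into it.
  ToyFusion(const ToyFusion& other) : container(other.container) {
    // Take a snapshot first, because create() mutates the tracking map.
    const std::vector<ToyVal*> source_vals = other.vals();
    for (const ToyVal* v : source_vals) {
      container->create(this, v->name);  // the clone keeps the source's name
    }
  }
  ToyFusion& operator=(const ToyFusion&) = delete;

  ToyVal* add(const std::string& name) { return container->create(this, name); }

  // Ownership-filtered accessor: only this Fusion's nodes are returned.
  std::vector<ToyVal*> vals() const {
    auto it = container->owned.find(this);
    return it == container->owned.end() ? std::vector<ToyVal*>{} : it->second;
  }
};

// Usage sketch: B(A) shares storage with A but sees only its own clones.
inline void demoSharedCopy() {
  ToyFusion a;
  a.add("T0");
  ToyFusion b(a);
  assert(a.container == b.container);
  assert(b.vals().size() == 1 && b.vals()[0] != a.vals()[0]);
}
```

The toy omits deregistration on destruction and all Fusion-level state (inputs, outputs, axioms, metadata); it only shows that one shared store can back two independent per-owner views.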

Swap: Three Cases

Case 1: Different containers
  BEFORE:  A ──→ C1 ──→ {val_0(A)}     B ──→ C2 ──→ {val_1(B)}
  AFTER:   A ──→ C2 ──→ {val_1(A)}     B ──→ C1 ──→ {val_0(B)}
  Statement pointers updated: val_0→B, val_1→A

Case 2: Same container
  BEFORE:  A ─┐                         B ─┐
              ├──→ C ──→ {val_0(A), val_1(B)}
  AFTER:   A ─┐                         B ─┐
              ├──→ C ──→ {val_0(B), val_1(A)}
  Container pointer swap is a no-op; ownership flips.

Case 3: Third-party sharing
  BEFORE:  A ─┐
              ├──→ C1 ──→ {val_0(A), val_2(X)}     B ──→ C2 ──→ {val_1(B)}
         X ─┘
  AFTER:   A ──→ C2 ──→ {val_1(A)}
          B ─┐
              ├──→ C1 ──→ {val_0(B), val_2(X)}
         X ─┘
  Critical: X's statements are NEVER modified.
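
A minimal sketch of the ownership-filtered swap, assuming a toy model in which `Owner` plays the role of a Fusion and `Store` the role of the container (both names are hypothetical). It covers all three cases with one code path and leaves a third party's statements alone; the `sharing_fusions_` bookkeeping from the real implementation is left out.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

struct Owner;  // hypothetical stand-in for Fusion

struct Stmt {
  int id;
  Owner* owner;  // back-pointer to the owning Fusion-like object
};

struct Store {  // hypothetical stand-in for the shared container
  std::unordered_map<Owner*, std::vector<Stmt*>> owned;  // per-owner tracking
};

struct Owner {
  std::shared_ptr<Store> store;
};

// Snapshot of the statements tracked for `o` in its current store.
inline std::vector<Stmt*> ownedBy(Owner& o) {
  auto it = o.store->owned.find(&o);
  return it == o.store->owned.end() ? std::vector<Stmt*>{} : it->second;
}

inline void swapOwners(Owner& a, Owner& b) {
  if (&a == &b) {
    return;
  }
  // Collect what each side owns BEFORE anything moves.
  std::vector<Stmt*> a_owned = ownedBy(a);
  std::vector<Stmt*> b_owned = ownedBy(b);

  std::swap(a.store, b.store);  // a no-op when both share one store

  // Only statements from the snapshots are retargeted, so a third party
  // sharing either store is never touched.
  for (Stmt* s : a_owned) s->owner = &b;
  for (Stmt* s : b_owned) s->owner = &a;

  // Re-key the tracking entries to match the new owners.
  b.store->owned.erase(&a);
  a.store->owned.erase(&b);
  if (!b_owned.empty()) a.store->owned[&a] = b_owned;
  if (!a_owned.empty()) b.store->owned[&b] = a_owned;
}
```

Erasing the stale keys before writing the new ones is what lets the same four lines serve both the same-store and the different-store case in this toy.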

Why Name Counters Were Merged Into This PR

The initial implementation of Fusion::copy replaced the old IrContainer::copy with direct IrCloner-based cloning but dropped name counter synchronization. Without per-Fusion counters, cloned Vals in a shared container received names starting past the source's last name (e.g., T10–T19 instead of T0–T9), breaking alias_memory.cpp (duplicate tv->name() assertions) and cascading into 553 CI failures across codegen, validation, and numerical checks.

The fix — per-Fusion name counters as Fusion members — is architecturally cleaner than the originally planned IrContainer-level maps, avoids indirection, and aligns with the per-Fusion state model established in earlier tasks.
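
The naming drift can be reproduced with a toy counter. This is a hedged illustration, not the real `getValName` logic: `namesFrom` is a hypothetical helper that hands out `T<n>` names from whichever counter it is given.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hand out `n` tensor names from `counter`, advancing it as a side effect.
inline std::vector<std::string> namesFrom(int& counter, int n) {
  std::vector<std::string> out;
  for (int i = 0; i < n; ++i) {
    out.push_back("T" + std::to_string(counter++));
  }
  return out;
}

inline void demoNameDrift() {
  int shared_counter = 0;
  namesFrom(shared_counter, 10);                    // source: T0..T9
  auto clones = namesFrom(shared_counter, 10);      // clones: T10..T19
  assert(clones.front() == "T10");

  int fresh_counter = 0;                            // counter owned by the copy
  assert(namesFrom(fresh_counter, 10).front() == "T0");
}
```

With one counter shared through the container, the clones continue where the source stopped; with a counter per Fusion, the destination starts from its own state and the clone names line up with the source.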

Relationship to Phase 2

Copy/move/swap are the operations that make shared containers usable. Without them, the shared_ptr and tracking infrastructure from PRs 1–2 are inert. This PR enables the core Phase 2 scenario:

SegmentedFusion::makeFusion (Phase 2 — separate containers):
  auto fusion_segment = make_unique<Fusion>();     // New container
  Fusion::copy(completeFusion(), fusion_segment);  // Clone into separate container

SegmentedFusion::makeFusion (Phase 3 — shared containers):
  auto fusion_segment = make_unique<Fusion>(*completeFusion());  // Copy ctor → shared!
  // Scalars reused, non-scalars cloned into shared container

Phase 2 establishes the copy/move/swap mechanics. Phase 3 simply changes makeFusion from default-ctor + Fusion::copy to copy-ctor (shared container), and the infrastructure from this PR handles everything correctly.

Per-Fusion name counters are critical for cross-clone name correspondence required by GreedyParams::at(tv->name()) and normalization_utils — both of which look up Vals by name as a map key across clone boundaries.
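
Why matching names matter for name-keyed lookups can be shown with a small table. The types below (`ToyTensor`, `ToyParams`, `ToyParamTable`) are hypothetical and only mimic the idea of a map keyed by `tv->name()`; they are not the actual GreedyParams interface.

```cpp
#include <cassert>
#include <stdexcept>
#include <unordered_map>

struct ToyTensor { int name; };        // hypothetical stand-in for a TensorView
struct ToyParams { int batch_size; };  // hypothetical per-tensor parameters

class ToyParamTable {
 public:
  void set(const ToyTensor& tv, ToyParams p) { by_name_[tv.name] = p; }
  // Throws std::out_of_range when no entry exists for this name.
  const ToyParams& at(const ToyTensor& tv) const { return by_name_.at(tv.name); }

 private:
  std::unordered_map<int, ToyParams> by_name_;
};

// True when a lookup keyed by the tensor's name finds an entry.
inline bool lookupSucceeds(const ToyParamTable& table, const ToyTensor& tv) {
  try {
    table.at(tv);
    return true;
  } catch (const std::out_of_range&) {
    return false;
  }
}

inline void demoLookup() {
  ToyParamTable table;
  table.set(ToyTensor{3}, ToyParams{128});
  assert(lookupSucceeds(table, ToyTensor{3}));    // clone with matching name
  assert(!lookupSucceeds(table, ToyTensor{13}));  // clone with shifted name
}
```

A clone whose name shifted during copy misses the entry recorded for the original, which is the kind of failure the per-Fusion counters are meant to prevent.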

CI Risk

Medium. Copy/move/swap are well-defined operations with clear semantics. The 553-failure CI regression from missing name counters was identified and fixed before merge.

@mdavis36
Collaborator Author

!test

@github-actions

github-actions bot commented Feb 12, 2026

Review updated until commit d145d9e

Description

  • Implement shared-container-aware copy/move/swap operations where copy constructor shares source's container pointer instead of creating new one

  • Add per-Fusion name counters (val_type_name_map_, expr_name_counter_) to fix duplicate TV names after copy when source names are non-sequential

  • Rewrite Fusion::swap to use pointer-based swap with ownership tracking for same-container, different-container, and third-party cases

  • Update registerVal/registerExpr to use Fusion-level name counters instead of container-level ones

Changes walkthrough

Relevant files
Enhancement
fusion.cpp
Implement copy-move-swap with shared containers                   

csrc/fusion.cpp

  • Copy constructor now shares source's container pointer via shared_ptr
    and registers with container
  • Fusion::copy clones from deterministic_vals() directly instead of
    delegating to IrContainer::copy
  • Sync name counters from source to dest after cloning in Fusion::copy
  • Rewrite Fusion::swap to collect owned vals/exprs before swap, handle
    same-container vs different-container cases, swap container pointers
    and all Fusion-level members
  • Move constructor creates empty Fusion then swaps; move assignment
    clears then swaps
  • Add self-assignment guards in copy/move assignment operators
  • Remove noexcept from swap and move operations (can allocate vectors)
  • Clear name counters in Fusion::clear()
  • +121/-51

fusion.h
Add per-Fusion name counter members and methods

csrc/fusion.h

  • Add val_type_name_map_ (unordered_map) and
    expr_name_counter_ as Fusion members
  • Add getValName(ValType) and getExprName() methods to Fusion
  • Remove noexcept from move constructor and move assignment declarations
  • Update swap declaration to remove noexcept
  • +18/-3   

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Complex swap logic

    The Fusion::swap function handles three distinct cases: different containers, same container, and third-party sharing a container. While the logic appears sound, the complexity increases the risk of edge case bugs. The per-Fusion tracking key swapping at lines 185-191 and ownership transfer at lines 189-190 are particularly intricate. Consider adding more comprehensive unit tests for swap operations.

    void Fusion::swap(Fusion& a, Fusion& b) {
      FUSER_PERF_SCOPE("Fusion swap");
    
      if (&a == &b) {
        return;
      }
    
      NVF_ERROR(
          a.ir_container_ != nullptr, "Fusion::swap: a has null ir_container_");
      NVF_ERROR(
          b.ir_container_ != nullptr, "Fusion::swap: b has null ir_container_");
    
      // Collect statements owned by each Fusion BEFORE swap so we can update
      // Statement::ir_container_ pointers afterward.
      std::vector<Val*> a_owned_vals, b_owned_vals;
      std::vector<Expr*> a_owned_exprs, b_owned_exprs;
    
      const auto& av = a.ir_container_->valsOwnedBy(&a);
      const auto& ae = a.ir_container_->exprsOwnedBy(&a);
      a_owned_vals.assign(av.begin(), av.end());
      a_owned_exprs.assign(ae.begin(), ae.end());
    
      const auto& bv = b.ir_container_->valsOwnedBy(&b);
      const auto& be = b.ir_container_->exprsOwnedBy(&b);
      b_owned_vals.assign(bv.begin(), bv.end());
      b_owned_exprs.assign(be.begin(), be.end());
    
      // Transfer Fusion registrations between containers before pointer swap.
      // After swap, a will own b's container and b will own a's container.
      if (a.ir_container_.get() != b.ir_container_.get()) {
        a.ir_container_->transferFusion(&a, &b);
        b.ir_container_->transferFusion(&b, &a);
      }
    
      // Swap container pointers
      std::swap(a.ir_container_, b.ir_container_);
    
      // Swap all Fusion-level members
      std::swap(a.inputs_, b.inputs_);
      std::swap(a.outputs_, b.outputs_);
      std::swap(a.io_alias_, b.io_alias_);
      std::swap(a.all_tv_uses_valid_, b.all_tv_uses_valid_);
      std::swap(a.is_during_update_uses_, b.is_during_update_uses_);
      std::swap(a.managed_data_, b.managed_data_);
      std::swap(a.managed_named_data_, b.managed_named_data_);
      std::swap(a.expected_dynamic_smem_bytes_, b.expected_dynamic_smem_bytes_);
      std::swap(a.all_tvs_ptr_, b.all_tvs_ptr_);
      std::swap(a.zero_val_, b.zero_val_);
      std::swap(a.one_val_, b.one_val_);
      std::swap(a.true_val_, b.true_val_);
      std::swap(a.false_val_, b.false_val_);
      std::swap(a.magic_zero_val_, b.magic_zero_val_);
      std::swap(a.axioms_, b.axioms_);
      std::swap(a.metadata_, b.metadata_);
      std::swap(a.val_type_name_map_, b.val_type_name_map_);
      std::swap(a.expr_name_counter_, b.expr_name_counter_);
    
      // Update Statement::ir_container_ pointers: a's old statements now belong
      // to b, and b's old statements now belong to a
      for (auto* val : a_owned_vals) {
        val->ir_container_ = &b;
      }
      for (auto* expr : a_owned_exprs) {
        expr->ir_container_ = &b;
      }
      for (auto* val : b_owned_vals) {
        val->ir_container_ = &a;
      }
      for (auto* expr : b_owned_exprs) {
        expr->ir_container_ = &a;
      }
    
      // Update per-Fusion tracking keys in containers. At this point, both
      // a and b are guaranteed to have non-null ir_container_ (verified above).
      if (a.ir_container_.get() == b.ir_container_.get()) {
        // Same container: directly swap per-Fusion tracking entries
        auto* c = a.ir_container_.get();
        std::swap(c->per_fusion_vals_[&a], c->per_fusion_vals_[&b]);
        std::swap(c->per_fusion_exprs_[&a], c->per_fusion_exprs_[&b]);
      } else {
        // Different containers: rename tracking keys to match new owners
        a.ir_container_->transferStatementOwnership(&b, &a);
        b.ir_container_->transferStatementOwnership(&a, &b);
      }
    }
    Container sharing semantics

    The copy constructor shares the source's container via ir_container_ = other.ir_container_. This is intentional per the PR goals (shared-container-aware copy), but consumers need to be aware that modifications to one Fusion can affect the shared container state. Ensure this behavior is well-documented for API users.

    Fusion::Fusion(const Fusion& other) : ir_container_(other.ir_container_) {
      FUSER_PERF_SCOPE("Fusion copy");
      ir_container_->addFusion(this);
      Fusion::copy(&other, this);
    }
    Non-noexcept move operations

    The move constructor (line 329) and move assignment operator (line 347) are deliberately not marked noexcept, and the accompanying comments explain why. This can cost performance if Fusions are ever moved into standard library containers, which fall back to copying when a move may throw. Verify that this trade-off is acceptable for the expected use cases.

    // Not marked noexcept: Fusion::swap allocates local std::vectors to collect
    // statement ownership before the swap, which can throw. Since Fusions are not
    // expected to be moved into containers, the performance trade-off is
    // acceptable.
    // NOLINTNEXTLINE(cppcoreguidelines-noexcept-move-operations)
    Fusion::Fusion(Fusion&& other) : Fusion() {
      FUSER_PERF_SCOPE("Fusion move");
      swap(*this, other);
    }
    
    // Copy Assignment -- shares the source's container
    Fusion& Fusion::operator=(const Fusion& other) {
      FUSER_PERF_SCOPE("Fusion copy assign");
      if (this != &other) {
        Fusion copy(other);
        clear();
        swap(*this, copy);
      }
      return *this;
    }
    
    // Not marked noexcept: See move constructor above.
    // NOLINTNEXTLINE(cppcoreguidelines-noexcept-move-operations)
    Fusion& Fusion::operator=(Fusion&& other) {
      FUSER_PERF_SCOPE("Fusion move assign");
      if (this != &other) {
        clear();
        swap(*this, other);
      }
      return *this;
    }

    @mdavis36
    Collaborator Author

    !test

    @mdavis36 mdavis36 changed the title from "[IR Container] Phase 2 Copy-Move Semantics" to "[IR Container] Phase 2.5 Copy-Move Semantics" Feb 18, 2026
    @mdavis36 mdavis36 force-pushed the md/phase2-copy-move branch from 192fd55 to 35b7405 Compare February 18, 2026 03:13
    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from 33629cb to 8b162d9 Compare February 18, 2026 03:13
    @mdavis36
    Collaborator Author

    !test

    @mdavis36 mdavis36 marked this pull request as ready for review February 18, 2026 06:37
    @greptile-apps
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Greptile Summary

    This PR implements shared-container-aware copy, move, and swap semantics for Fusion, along with per-Fusion name counters (val_type_name_map_ / expr_name_counter_) that ensure cloned Vals receive matching names regardless of container ownership. The three-case swap logic (different containers, same container, third-party sharing) is well-designed and correctly handles sharing_fusions_ bookkeeping via transferFusion and transferStatementOwnership.

    The one remaining concern: the copy assignment operator calls clear() explicitly before swap, which forfeits the strong exception guarantee that the standard copy-and-swap idiom provides. If swap throws (for example std::bad_alloc from its vector allocations), *this is left cleared rather than preserving its original state. Removing the explicit clear() call restores that guarantee.
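
The difference between the two assignment patterns can be sketched with a toy class. Everything here is an assumption made for illustration: `Graph` is hypothetical, and the `simulate_failure` flag stands in for an allocation failure inside the real swap.

```cpp
#include <cassert>
#include <new>
#include <utility>
#include <vector>

struct Graph {
  std::vector<int> nodes;

  // Toy swap that can be told to fail, mimicking a throwing allocation.
  static void swap(Graph& a, Graph& b, bool simulate_failure) {
    if (simulate_failure) {
      throw std::bad_alloc();
    }
    std::swap(a.nodes, b.nodes);
  }

  void clear() { nodes.clear(); }

  // Pattern flagged in the review: clear() runs before the throwing swap.
  void assignWithClear(const Graph& other, bool simulate_failure) {
    if (this == &other) return;
    Graph copy(other);
    clear();
    swap(*this, copy, simulate_failure);
  }

  // Plain copy-and-swap: *this is untouched until swap succeeds.
  void assignCopyAndSwap(const Graph& other, bool simulate_failure) {
    if (this == &other) return;
    Graph copy(other);
    swap(*this, copy, simulate_failure);
  }
};

// Returns the number of nodes left in the target after a failed assignment.
inline std::size_t nodesAfterFailedAssign(bool use_clear) {
  Graph src;
  src.nodes = {7, 8};
  Graph dst;
  dst.nodes = {1, 2, 3};
  try {
    if (use_clear) {
      dst.assignWithClear(src, true);
    } else {
      dst.assignCopyAndSwap(src, true);
    }
  } catch (const std::bad_alloc&) {
    // Swallow the simulated failure so the caller can inspect dst.
  }
  return dst.nodes.size();
}
```

In this toy the clearing variant leaves the target empty after a failed swap, while the plain idiom leaves it with its original three nodes. Whether the real Fusion can drop the explicit clear() without other side effects is for the PR author to confirm.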

    Confidence Score: 3/5

    • Safe to merge with awareness of the copy-assignment exception safety gap; all other semantics are correctly implemented.
    • The three-case swap logic, counter synchronization, and shared-container registration are all correctly implemented and well-tested (553 CI failures were caught and fixed before merge). The score is reduced from 5 because the copy assignment operator's explicit clear() before swap gives up the strong exception guarantee: if swap throws (due to std::bad_alloc in its local vector allocations), *this is left cleared rather than preserving its original state. This is a real, if low-probability, correctness risk for callers that catch exceptions.
    • csrc/fusion.cpp — specifically operator=(const Fusion&) at line 335, where the explicit clear() before swap should be removed to restore the standard idiom's exception safety.

    Sequence Diagram

    sequenceDiagram
        participant Caller
        participant FusionB as Fusion B (copy ctor)
        participant Container as shared IrContainer C
        participant IrCloner
        participant FusionA as Fusion A (source)
    
        Caller->>FusionB: Fusion B(A)  [copy constructor]
        FusionB->>Container: share A.ir_container_ (shared_ptr copy)
        FusionB->>Container: addFusion(&B)
        FusionB->>FusionB: Fusion::copy(&A, &B)
        FusionB->>FusionB: clear() → removeStatementsOwnedBy(&B) [no-op]
        FusionB->>IrCloner: IrCloner ir_cloner(B)
        loop clone vals in insertion order
            IrCloner->>Container: clone(val_i) → registerVal(&B) → getValName(vtype)
            IrCloner->>IrCloner: setName(src->name()) overrides counter
        end
        loop wire definitions/uses
            IrCloner->>Container: clone(expr_j) → registerExpr(&B) → getExprName()
            IrCloner->>FusionB: setDefinition / setUses
        end
        FusionB->>FusionB: sync val_type_name_map_ = A.val_type_name_map_
        FusionB->>FusionB: sync expr_name_counter_ = A.expr_name_counter_
        FusionB->>FusionB: remap inputs_, outputs_, io_alias_, axioms_, metadata_
    
        note over Container: C now holds A's vals (keyed &A) AND B's clones (keyed &B)
        note over FusionB: B.val_type_name_map_ matches A → new TVs start at max(name)+1
    

    Last reviewed commit: c8ffe1d

    Contributor

    @greptile-apps greptile-apps bot left a comment


    4 files reviewed, 3 comments


    @mdavis36
    Collaborator Author

    !test

    @greptile-apps
    Contributor

    greptile-apps bot commented Mar 3, 2026

    Additional Comments (1)

    csrc/fusion.cpp, line 221
    expr_name_counter_ sync placed before expr cloning

    The counter sync at lines 220–221 sets to->expr_name_counter_ = from->expr_name_counter_ before any exprs have been cloned. Expr cloning happens in the immediately following def/uses loop (lines 224–227): each ir_cloner.clone(val->definition_) call creates a cloned Expr and triggers registerExpr, which increments to->expr_name_counter_. As a result, after Fusion::copy completes, to->expr_name_counter_ ends up at from->expr_name_counter_ + N (where N is the number of exprs in from), rather than the intended from->expr_name_counter_.

    Compare with val counters: the val counter sync is correctly placed after the val-cloning loop so that registerVal calls during cloning don't pollute the final counter state. The same logic needs to apply to expr_name_counter_.

    While this doesn't cause immediate name collisions (the overcounted range is unused by any actual expr), it creates unnecessary name-space gaps and violates the invariant that a cloned Fusion's counter should match the source's counter — which is the premise relied on by GreedyParams::at(tv->name()) and similar consumers.

      // Wire up definitions and uses on cloned vals
      for (auto val : from->vals()) {
        ir_cloner.clone(val)->setDefinition(ir_cloner.clone(val->definition_));
        ir_cloner.clone(val)->setUses(ir_cloner.clone(val->uses_));
      }
    
      // Sync per-Fusion name counters from source to dest.
      // This must happen AFTER all cloning (both vals and exprs) so that the
      // temporary sequential names assigned by registerVal/registerExpr during
      // cloning do not inflate the counters past the source's values.
      to->val_type_name_map_ = from->val_type_name_map_;
      to->expr_name_counter_ = from->expr_name_counter_;
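
The ordering issue can be reduced to a few lines. This is a simplified sketch: `ToyCounters` and `registerClonedExpr` are hypothetical and model only the fact that registering a cloned expr bumps the destination's counter.

```cpp
#include <cassert>

struct ToyCounters {
  int expr_name_counter = 0;
};

// Registering a cloned expr advances the destination's counter by one.
inline void registerClonedExpr(ToyCounters& dest) {
  ++dest.expr_name_counter;
}

// Sync first, clone afterwards: the clone loop inflates the synced value.
inline int copySyncBeforeCloning(const ToyCounters& from, int num_exprs) {
  ToyCounters to;
  to.expr_name_counter = from.expr_name_counter;
  for (int i = 0; i < num_exprs; ++i) {
    registerClonedExpr(to);
  }
  return to.expr_name_counter;
}

// Clone first, sync afterwards: the temporary increments are overwritten.
inline int copySyncAfterCloning(const ToyCounters& from, int num_exprs) {
  ToyCounters to;
  for (int i = 0; i < num_exprs; ++i) {
    registerClonedExpr(to);
  }
  to.expr_name_counter = from.expr_name_counter;
  return to.expr_name_counter;
}

inline void demoSyncOrder() {
  ToyCounters from;
  from.expr_name_counter = 5;
  assert(copySyncBeforeCloning(from, 5) == 10);  // from + N
  assert(copySyncAfterCloning(from, 5) == 5);    // matches the source
}
```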
    

    @mdavis36
    Collaborator Author

    mdavis36 commented Mar 3, 2026

    !test

    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from bc595c5 to 9f944b4 Compare March 3, 2026 02:51
    @mdavis36 mdavis36 force-pushed the md/phase2-copy-move branch from a9c62ea to 46080be Compare March 3, 2026 02:51
    @greptile-apps
    Contributor

    greptile-apps bot commented Mar 3, 2026

    Additional Comments (2)

    csrc/fusion.cpp, line 212
    Non-deterministic expr clone ordering in second loop

    The second loop uses from->vals() (an unordered_set) to drive lazy expr cloning via ir_cloner.clone(val->definition_). Because unordered_set iteration order is unspecified and can differ across runs, the order in which exprs are first encountered (and thus registered into exprs_up_) is unpredictable.

    The old IrContainer::copy explicitly cloned exprs in deterministic_exprs() order:

    // Copy expressions in deterministic order
    for (auto expr : from->deterministic_exprs()) {
        to->exprs_.insert(ir_cloner.clone(expr));
    }

    With the new code, the destination fusion's deterministic_exprs() may return exprs in a different insertion-order than from->deterministic_exprs(), since exprs_up_ insertion order is driven by whichever val happens to trigger each expr's first clone. Any downstream consumer of deterministic_exprs() that expects a stable ordering (e.g. serialisation, printing, diff-based testing) may break non-deterministically.

    The first loop already uses from->deterministic_vals() for reproducibility — the second loop should match:

      for (auto val : from->deterministic_vals()) {
        ir_cloner.clone(val)->setDefinition(ir_cloner.clone(val->definition_));
        ir_cloner.clone(val)->setUses(ir_cloner.clone(val->uses_));
      }
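
A small sketch of the idea behind a deterministic accessor, assuming a hypothetical `OrderedPool` that keeps a hash set for membership and a vector for insertion order. It does not reflect how the real container stores its statements.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical pool that tracks membership and insertion order separately.
class OrderedPool {
 public:
  void insert(int v) {
    // Record the order only the first time a value is seen.
    if (members_.insert(v).second) {
      order_.push_back(v);
    }
  }
  const std::unordered_set<int>& unordered() const { return members_; }
  const std::vector<int>& deterministic() const { return order_; }

 private:
  std::unordered_set<int> members_;
  std::vector<int> order_;
};

// Cloning driven by the ordered view reproduces the source's order.
inline OrderedPool cloneInDeterministicOrder(const OrderedPool& from) {
  OrderedPool to;
  for (int v : from.deterministic()) {
    to.insert(v);
  }
  return to;
}

inline void demoDeterministicClone() {
  OrderedPool src;
  src.insert(42);
  src.insert(7);
  src.insert(19);
  OrderedPool dst = cloneInDeterministicOrder(src);
  assert(dst.deterministic() == src.deterministic());
}
```

Driving the loop from `unordered()` instead would still clone every element, but the destination's recorded order would then depend on hash iteration order.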
    

    csrc/fusion.cpp, line 324
    noexcept move constructor delegates to non-noexcept swap

    The move constructor is declared noexcept (both in fusion.h line 183 and in the definition here), but Fusion::swap was changed in this PR from noexcept to a regular (potentially-throwing) function. swap now performs multiple allocating operations (vector construction, unordered_map::operator[], unordered_set::insert) that can throw std::bad_alloc.

    Calling a potentially-throwing function from a noexcept context means any exception will immediately invoke std::terminate rather than propagate to the caller. The same issue applies to operator=(Fusion&&) on line 334 which is also noexcept and calls swap.

    The previous discussion noted that swap was problematic when it was noexcept — but now that noexcept has been removed from swap, the move operations need to be updated consistently:

    // In fusion.h:
    Fusion(Fusion&& other);                        // remove noexcept
    Fusion& operator=(Fusion&& other);             // remove noexcept
    
    // In fusion.cpp:
    Fusion::Fusion(Fusion&& other) : Fusion() {   // remove noexcept
      FUSER_PERF_SCOPE("Fusion move");
      swap(*this, other);
    }
    
    Fusion& Fusion::operator=(Fusion&& other) {   // remove noexcept
      ...
    }

    Note: removing noexcept from move operations does affect STL container optimisations (e.g. std::vector will copy instead of move if the move constructor is not noexcept). That may be an acceptable trade-off given Fusion objects are not typically stored in vectors, but worth being explicit about.
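
The container effect mentioned in the note can be observed directly. The sketch below uses two hypothetical element types that differ only in whether the move constructor is noexcept, and counts which constructor `std::vector` uses when it reallocates.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct Counts {
  int copies = 0;
  int moves = 0;
};

struct ThrowingMove {
  Counts* counts;
  explicit ThrowingMove(Counts* c) : counts(c) {}
  ThrowingMove(const ThrowingMove& o) : counts(o.counts) { ++counts->copies; }
  ThrowingMove(ThrowingMove&& o) : counts(o.counts) { ++counts->moves; }  // may throw
};

struct NoexceptMove {
  Counts* counts;
  explicit NoexceptMove(Counts* c) : counts(c) {}
  NoexceptMove(const NoexceptMove& o) : counts(o.counts) { ++counts->copies; }
  NoexceptMove(NoexceptMove&& o) noexcept : counts(o.counts) { ++counts->moves; }
};

static_assert(!std::is_nothrow_move_constructible<ThrowingMove>::value,
              "move constructor is potentially throwing");
static_assert(std::is_nothrow_move_constructible<NoexceptMove>::value,
              "move constructor is noexcept");

// Fill a vector to capacity, then add one more element to force a
// reallocation, and report how the existing elements were relocated.
template <typename T>
Counts growOnce() {
  Counts c;
  std::vector<T> v;
  v.reserve(1);
  while (v.size() < v.capacity()) {
    v.emplace_back(&c);
  }
  v.emplace_back(&c);  // exceeds capacity, so existing elements are relocated
  return c;
}
```

With a potentially throwing move constructor and an available copy constructor, reallocation copies the existing elements to keep the strong guarantee; with a noexcept move it moves them.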

    @greptile-apps
    Contributor

    greptile-apps bot commented Mar 3, 2026

    Additional Comments (3)

    csrc/fusion.cpp, line 333
    Copy assignment silently inherits shared container

    The copy assignment operator now produces a destination Fusion that shares other's IrContainer, whereas the old code left the destination with its own independent container. This is an indirect consequence of the copy constructor change: Fusion copy(other) now calls the new copy constructor (which does ir_container_(other.ir_container_)), and after the swap, *this ends up owning that shared container.

    Consider the following scenario:

    Fusion a;
    // populate a...
    Fusion b;
    b = a;
    // b now shares a's IrContainer — NOT isolated!
    a.clear();  // also removes statements from the container b depends on

    Before this PR, b = a gave b a fully independent container. After this PR, b shares a's container. The PR description explicitly documents this only for the copy constructor; the copy assignment operator is not mentioned. If this is intentional, a comment documenting the shared-container semantics of operator= would prevent future surprises.


    csrc/fusion.cpp, line 184
    Same-container swap inserts spurious empty entries via operator[]

    In the same-container path, operator[] on per_fusion_vals_ and per_fusion_exprs_ default-constructs an empty unordered_set when a key doesn't exist (e.g., one of the Fusions has never registered any vals):

    std::swap(c->per_fusion_vals_[&a], c->per_fusion_vals_[&b]);
    std::swap(c->per_fusion_exprs_[&a], c->per_fusion_exprs_[&b]);

    If &a or &b had no registered vals/exprs, a new empty-set entry is inserted for that key. All current consumers (valsOwnedBy, deterministicValsOwnedBy, etc.) handle empty sets correctly, so there's no observable correctness issue. However, the spurious entries accumulate over time and could mislead future code that inspects the keys of per_fusion_vals_ to determine which Fusions are registered.

    Consider using find first and only inserting/swapping entries when they actually exist, mirroring the defensive style in transferStatementOwnership.


    csrc/fusion.cpp, line 213
    Name override in IrBuilder::clone is skipped when src_container == dest_container (raw pointer comparison)

    In IrBuilder::clone (fusion.h line 731):

    if (src_container != dest_container) {
        dest_stmt->setName(IrBuilderPasskey(dest_container), src_stmt->name());
    }

    src_container is src_stmt->container() — the Fusion* that owns the source val — and dest_container is ir_cloner->container() — the destination Fusion*. In Fusion::copy, the IrCloner is constructed as IrCloner ir_cloner(to), so dest_container = to. Because from != to as Fusion objects (even when they share the same IrContainer), the condition is true and setName is correctly invoked, preserving name matching.

    However, this invariant is fragile: it relies on the fact that Statement::ir_container_ stores a Fusion* (not the raw IrContainer*). If any path sets val->ir_container_ to the shared IrContainer* instead of the owning Fusion*, names would silently stop matching without any assertion failure.

    A targeted NVF_CHECK or static assertion confirming that src_stmt->container() returns the owning Fusion (not the shared IrContainer) would make this assumption explicit and catch regressions.

    @mdavis36
    Collaborator Author

    mdavis36 commented Mar 3, 2026

    !test

    @greptile-apps
    Contributor

    greptile-apps bot commented Mar 3, 2026

    Additional Comments (1)

    csrc/fusion.cpp, line 334
    Missing self-assignment guard in copy assignment operator

    The copy assignment operator is missing the self-assignment check that the move assignment operator already has (if (this != &other) at line 338). Without it, f = f is destructive:

    1. Fusion copy(f) — creates a copy sharing f's container; clones all vals into the same shared container under a temporary Fusion key
    2. clear() — deletes f's original vals from the shared container (physical unique_ptr in vals_up_ is released)
    3. swap(*this, copy) — f receives the clones; original pointers are gone

    Any external Val* / Expr* raw pointers held by callers (e.g., cached scheduler state, Statement::ir_container_ back-pointers) now point to freed memory. This mirrors the protection already in the move assignment operator.

    Fusion& Fusion::operator=(const Fusion& other) {
      FUSER_PERF_SCOPE("Fusion copy assign");
      if (this == &other) {
        return *this;
      }
      Fusion copy(other);
      clear();
      swap(*this, copy);
      return *this;
    }
    

    @mdavis36
    Collaborator Author

    mdavis36 commented Mar 3, 2026

    !test

    @greptile-apps
    Contributor

    greptile-apps bot commented Mar 3, 2026

    Additional Comments (3)

    csrc/fusion.cpp, line 184
    Spurious empty-entry insertion via operator[] in same-container swap

    std::unordered_map::operator[] default-inserts a value (an empty unordered_set) when the key is absent. In the same-container swap path, if one of the Fusions was previously clear()-ed its entry will have been removed from per_fusion_vals_ by removeStatementsOwnedBy, so the operator[] call creates a new, empty set for it. After the std::swap, the formerly-cleared Fusion ends up with an explicit empty entry instead of no entry at all.

    While functionally harmless today, this inconsistency can confuse future callers of valsOwnedBy() that distinguish between "has an entry with an empty set" and "has no entry". Consider using find+insert to avoid the implicit insertion:

    auto& a_vals = c->per_fusion_vals_;
    auto& a_exprs = c->per_fusion_exprs_;
    auto it_av = a_vals.find(&a), it_bv = a_vals.find(&b);
    auto it_ae = a_exprs.find(&a), it_be = a_exprs.find(&b);
    if (it_av != a_vals.end() || it_bv != a_vals.end())
      std::swap(a_vals[&a], a_vals[&b]);
    // same for exprs
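
Note that the snippet in this comment still calls operator[] inside std::swap, so it inserts an empty entry whenever only one of the two keys is present. One possible way to avoid that is sketched below on a generic map; `Tracking` and `swapEntries` are hypothetical names, and `const void*` merely stands in for the `Fusion*` key.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>
#include <utility>

using Key = const void*;  // stands in for Fusion*
using Tracking = std::unordered_map<Key, std::unordered_set<int>>;

// Swap the entries stored under keys a and b without ever creating an
// entry for a key that had none.
inline void swapEntries(Tracking& m, Key a, Key b) {
  if (a == b) {
    return;
  }
  auto it_a = m.find(a);
  auto it_b = m.find(b);
  const bool has_a = it_a != m.end();
  const bool has_b = it_b != m.end();
  if (has_a && has_b) {
    std::swap(it_a->second, it_b->second);
  } else if (has_a) {
    // Only a exists: move its set under b and drop the old key.
    auto moved = std::move(it_a->second);
    m.erase(it_a);
    m.emplace(b, std::move(moved));
  } else if (has_b) {
    // Only b exists: move its set under a and drop the old key.
    auto moved = std::move(it_b->second);
    m.erase(it_b);
    m.emplace(a, std::move(moved));
  }
}

inline void demoSwapEntries() {
  int fa = 0, fb = 0;  // addresses used purely as distinct keys
  Tracking m;
  m[&fa] = {1, 2};
  swapEntries(m, &fa, &fb);
  assert(m.size() == 1 && m.count(&fb) == 1);
}
```

When neither key exists the function does nothing, so the map's key set keeps reflecting only owners that actually track statements.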

    csrc/fusion.cpp, line 213
    Fusion::copy no longer calls IrContainer::copy, leaving it as dead code

    Fusion::copy previously delegated to IrContainer::copy (the old first line was auto ir_cloner = IrContainer::copy(from->ir_container(), to->ir_container(), to)). This PR replaces that with an inline clone loop, which is correct — but it leaves IrContainer::copy (defined in container.cpp lines 88–114) with zero callers. Since IrContainer's copy/move constructors/operators are explicitly deleted, IrContainer::copy is now unreachable protected dead code.

    Similarly, IrContainer::swap (container.cpp lines 71–86) was previously called by the old Fusion::swap but the new Fusion::swap does not call it — leaving it as an additional dead method.

    Most importantly, both dead methods still manipulate the container-level name counters (val_type_name_map_ and expr_name_counter_ on IrContainer). Since Fusion::registerVal and Fusion::registerExpr now call Fusion::getValName/getExprName (the per-Fusion counters introduced in this PR) instead of the container-level equivalents, the container-level counters are never incremented in the new flow. They are permanently empty/zero, making IrContainer::getValName() and IrContainer::getExprName() return incorrect values if ever called in the future.

    Consider removing IrContainer::copy, IrContainer::swap, IrContainer::getValName, IrContainer::getExprName, and the fields val_type_name_map_/expr_name_counter_ from IrContainer as part of this PR to prevent future confusion and potential misuse.


    csrc/fusion.cpp, line 335
    Copy-assignment leaves this sharing other's container even when this had an independent container

    The sequence:

    Fusion copy(other);   // copy ctor → copy now shares other->ir_container_
    clear();              // clears this's statements from its original container
    swap(*this, copy);    // this ends up sharing other's container

    After this operation *this permanently shares other's IrContainer. That is by design per the PR description ("Copy semantics: share container pointer via shared_ptr"), but it creates a subtle lifetime asymmetry: this's original container is now held only by the temporary copy. When copy destructs, it calls copy.~Fusion() → removeFusion(&copy) → the original container's refcount drops to zero and it is destroyed.

    This is correct for the current ownership model, but it means that after a = b, a and b now share a container. A subsequent a.clear() will call removeStatementsOwnedBy(&a) on the shared container — and only a's statements are removed, not b's. That is the intended semantics, but it may surprise callers who expect copy-assignment to be self-contained (e.g., existing code that calls Fusion::copy then makes independent modifications to both fusions would now inadvertently share storage). A comment on the declaration in fusion.h (near the copy-assignment signature) calling out that the assignment shares the source's container would prevent confusion.

    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from 9f944b4 to 7b3ce1f Compare March 4, 2026 22:04
    @mdavis36 mdavis36 force-pushed the md/phase2-copy-move branch from e5256e7 to 058a980 Compare March 4, 2026 23:09
    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from 7b3ce1f to 2e491e9 Compare March 4, 2026 23:09
    @greptile-apps
    Contributor

    greptile-apps bot commented Mar 4, 2026

    Additional Comments (3)

    csrc/ir/container.cpp, line 243
    removeStatementsOwnedBy doesn't clean up use-def edges before destroying statements

    When a Fusion is cleared in a shared container, the destroyed Exprs are not removed from their input Vals' uses_ lists before the unique pointers are released. If an Expr owned by Fusion A takes a Val owned by Fusion B as input, destroying Fusion A's Expr leaves B's Val::uses_ with a dangling pointer to the destroyed expression.

    Compare with removeStatementsCreatedAfter, which explicitly cleans up uses before destruction:

    for (Val* in : e->inputs()) {
      in->removeUse(e);
    }

    This is not a risk for today's independent-clone graphs (Phase 2 copy semantics keep IR graphs separate), but the PR description explicitly calls out Phase 3 scalar reuse where Vals in one Fusion ARE referenced by Exprs in another Fusion. At that point, clearing a Fusion via Fusion::clear() → removeStatementsOwnedBy will corrupt the surviving Fusion's IR graph with dangling Expr* pointers in Val::uses_.

    Consider at minimum adding a comment warning about this limitation, or adding edge cleanup before the erase_if loops:

    // For each expr owned by fusion, remove it from its inputs' uses_ lists
    // to avoid dangling pointers in other Fusions' Vals
    if (exprs_it != per_fusion_exprs_.end()) {
      for (Expr* e : exprs_it->second) {
        for (Val* inp : e->inputs()) {
          inp->removeUse(e);
        }
      }
    }
    // ... then erase_if loop
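
A self-contained sketch of the same unlink-before-destroy idea, using toy types. ToyVal, ToyExpr, and removeExprsOwnedBy are hypothetical names for illustration; the real container tracks ownership through per_fusion_exprs_ rather than an owner field.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct ToyExpr;

// Hypothetical Val: records which expressions consume it.
struct ToyVal {
  std::vector<ToyExpr*> uses;
  void removeUse(ToyExpr* e) {
    uses.erase(std::remove(uses.begin(), uses.end(), e), uses.end());
  }
};

// Hypothetical Expr: owned by a fusion id, reads some input vals.
struct ToyExpr {
  int owner;
  std::vector<ToyVal*> inputs;
};

// Destroys every expr owned by `owner`, first unlinking it from its inputs
// so that surviving vals do not keep dangling ToyExpr* entries.
void removeExprsOwnedBy(std::vector<std::unique_ptr<ToyExpr>>& exprs,
                        int owner) {
  for (const auto& e : exprs) {
    if (e->owner == owner) {
      for (ToyVal* in : e->inputs) {
        in->removeUse(e.get());
      }
    }
  }
  exprs.erase(
      std::remove_if(exprs.begin(), exprs.end(),
                     [owner](const std::unique_ptr<ToyExpr>& e) {
                       return e->owner == owner;
                     }),
      exprs.end());
}
```

If a val is used by exprs from two owners, removing one owner's exprs leaves only the other owner's expr in the val's uses list.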

    csrc/fusion.cpp, line 504
    LIFO invariant is fragile in a shared-container setting

    removeStatementsCreatedAfter pops from the back of the shared exprs_up_ and vals_up_ deques under the assumption that this Fusion's most recently created statements are always at the tail ("LIFO invariant"). In a shared container where two Fusions can interleave statement creation, this invariant can silently break.

    Concrete scenario: StatementGuard is entered for FusionA (records count N), then FusionA creates expr e5, then FusionB creates expr e0 into the same shared deque. The global deque tail is now e0 (owned by FusionB). When the guard destructs and calls removeStatementsCreatedAfter:

    c->exprsOwnedBy(this) == N+1 > N  → enters while loop
    c->exprs_up_.back() == e0          → belongs to FusionB
    NVF_ERROR fires                    → crash
    

    With kPhase2DisableParallelCompile = true this is currently safe, but the assertion is structurally fragile. Any future work that allows two Fusions sharing a container to be active simultaneously (e.g., Phase 3's SegmentedFusion) will hit this. A more robust approach would iterate exprs_up_ from the back and skip (or separately track) statements belonging to other Fusions, rather than asserting on ownership of the global tail.
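
One way the "iterate from the back and skip" approach might look, as a minimal sketch. ToyStmt, countOwnedBy, and removeCreatedAfter are hypothetical; the real code operates on exprs_up_ and vals_up_ and would also need the use-def cleanup it already performs.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical statement record: which fusion owns it, plus an id.
struct ToyStmt {
  int owner;
  int id;
};

std::size_t countOwnedBy(const std::deque<ToyStmt>& stmts, int owner) {
  std::size_t n = 0;
  for (const ToyStmt& s : stmts) {
    if (s.owner == owner) {
      ++n;
    }
  }
  return n;
}

// Erases the newest statements of `owner` until only `keep` remain.
// Statements that belong to other owners are skipped, not asserted on.
void removeCreatedAfter(std::deque<ToyStmt>& stmts, int owner,
                        std::size_t keep) {
  std::size_t owned = countOwnedBy(stmts, owner);
  for (auto it = stmts.end(); it != stmts.begin() && owned > keep;) {
    --it;
    if (it->owner == owner) {
      it = stmts.erase(it);  // erase returns the element after the erased one
      --owned;
    }
  }
}
```

With the interleaving from the scenario above (owner 0 creates a statement, then owner 1 appends one to the same deque), rolling owner 0 back removes only owner 0's newest statement and leaves owner 1's tail entry in place.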


    csrc/runtime/fusion_kernel_runtime.cpp, line 30
    Global compile-time constant silently disables parallel compilation for all fusions

    kPhase2DisableParallelCompile = true is a compile-time constant, meaning parallel compilation is unconditionally disabled for every FusionKernelRuntime, regardless of whether the fusion actually uses shared containers. This is a hard-coded performance regression that affects all users of the runtime (not just Phase 2 shared-container paths) and cannot be toggled without a recompile.

    A runtime-checkable condition would be preferable, for example checking hasMultipleFusions() on the Fusion's ir_container_ptr(), or a dedicated Phase 2 flag:

    if (num_groups == 1 ||
        fusion_->ir_container_ptr()->hasMultipleFusions() ||
        isOptionDisabled(DisableOption::ParallelCompile)) {

    This would restore parallel compilation for non-shared-container fusions immediately, rather than waiting for the TODO to be addressed.

    @greptile-apps
    Contributor

    greptile-apps bot commented Mar 4, 2026

    Additional Comments (2)

    csrc/fusion.cpp, line 201
    Missing self-copy guard in Fusion::copy

    Fusion::copy is a static public method. If a caller passes the same Fusion as both from and to (i.e. Fusion::copy(f, f)), line 201 calls to->clear(), which calls ir_container_->removeStatementsOwnedBy(to) and destroys all of from's vals. The subsequent iteration over from->deterministic_vals() then sees an empty container and produces a silently empty result — data loss without any error.

    While the current internal call sites (copy constructor and copy assignment) never trigger this, the function is public and lacks a precondition assertion:

    IrCloner Fusion::copy(const Fusion* from, Fusion* to) {
      NVF_ERROR(from != to, "Fusion::copy: self-copy is not allowed");
      to->clear();
    

    csrc/fusion.cpp, line 343
    Exception safety hole: clear() before potentially-throwing swap

    The copy-assignment operator calls clear() on *this before calling swap(*this, copy). Since this PR correctly removed noexcept from swap (because swap can now throw std::bad_alloc through its vector allocations and map insertions), if swap throws after clear(), *this will be left in an empty, unusable state — neither the old content nor the new content.

    The classic copy-and-swap idiom avoids this by not explicitly clearing before the swap: the old state is destroyed by the temporary's destructor after a successful swap. However, with the new shared-container semantics in this PR, directly using a by-value parameter for copy-assignment would create unintended container sharing on the way in.

    One safe alternative is to check whether the copy completed before clearing:

    Fusion& Fusion::operator=(const Fusion& other) {
      if (this != &other) {
        Fusion copy(other);   // can throw — *this unchanged
        // Only reach swap (which can throw) after copy succeeded.
        // If swap throws here, *this is empty; consider moving swap-on-success
        // into a noexcept helper or accepting this as a medium-risk invariant.
        clear();
        swap(*this, copy);
      }
      return *this;
    }

    At minimum this should be documented as "basic exception safety only" — if swap throws, *this is valid but empty.

    @mdavis36 mdavis36 force-pushed the md/phase2-per-fusion branch from 2e491e9 to 50cb886 Compare March 5, 2026 00:07
    mdavis36 added 9 commits March 4, 2026 16:07
    Copy constructor now shares the source's container pointer instead of
    creating a new one. Fusion::copy clones directly from per-Fusion
    filtered vals rather than delegating to IrContainer::copy. Swap changed
    from content-based (IrContainer::swap) to pointer-based with per-Fusion
    ownership tracking for both same-container and different-container cases.

    Move val/expr name counters from IrContainer to Fusion so each Fusion
    independently tracks name assignment. This fixes CI failures where
    Fusion::copy left the dest counter at N (number of cloned vals) instead
    of max(name)+1 when source names were non-sequential, causing newly
    created TVs to collide with existing names.
    
    The fix adds val_type_name_map_ and expr_name_counter_ to Fusion, and
    updates registerVal/registerExpr to use the Fusion-level counters.
    Fusion::copy syncs counters from source to dest after cloning.
    Fusion::swap exchanges counters. Fusion::clear resets them.
    @mdavis36 mdavis36 force-pushed the md/phase2-copy-move branch from d145d9e to 3b9acdb Compare March 5, 2026 00:07
    @greptile-apps
    Contributor

    greptile-apps bot commented Mar 5, 2026

    Additional Comments (3)

    csrc/fusion.cpp, line 208
    Fusion::copy is not self-copy safe. Calling it with from == to (the same pointer) would silently clear the source before cloning begins, producing an empty Fusion.

    Although the copy constructor and copy assignment operator guard against this scenario, Fusion::copy is a public static method with no precondition enforcement at the call site. A defensive assertion would prevent a subtle footgun:

    IrCloner Fusion::copy(const Fusion* from, Fusion* to) {
      NVF_ERROR(from != to, "Fusion::copy: from and to must be different Fusions");
      to->clear();
    

    csrc/fusion.cpp, line 321
    Copy constructor relies on an implicit ordering constraint: ir_container_->addFusion(this) must be called before Fusion::copy(&other, this) so that this is registered in the container's sharing_fusions_ set before clear() is invoked.

    This works for the current implementation (since this is freshly constructed with no statements), but future refactoring or subclassing could break this if the order changes or this is non-empty at the time of the call. Consider adding an explicit comment documenting this invariant:

    Fusion::Fusion(const Fusion& other) : ir_container_(other.ir_container_) {
      FUSER_PERF_SCOPE("Fusion copy");
      ir_container_->addFusion(this);
      // NOTE: Fusion::copy begins with to->clear() which calls
      // ir_container_->removeStatementsOwnedBy(this). Since this was just
      // registered with addFusion above and has no statements yet, the
      // clear is a no-op. This ordering must be preserved.
      Fusion::copy(&other, this);
    }
    

    csrc/fusion.cpp, line 227
    After Fusion::copy, the source and destination hold identical name counter state (the counters are copied by value, not aliased). This is intentional: it ensures GreedyParams::at(tv->name()) and normalization_utils can perform cross-Fusion name lookups with matching names (e.g., source's T0 matches dest's T0).

    However, a side-effect is that names are unique within a single Fusion but not within the shared IrContainer. If both Fusions independently create new vals post-copy, they each receive the same next name, producing duplicate (ValType, name) pairs in the container. Consider adding an inline comment to document this invariant:

      // After this point, both source and dest Fusions share the same name counter
      // state. Names are unique within a single Fusion's val set but NOT unique
      // across all vals in the shared IrContainer. Cross-Fusion name lookups
      // (e.g. GreedyParams, normalization_utils) rely on this matching.
      to->val_type_name_map_ = from->val_type_name_map_;
      to->expr_name_counter_ = from->expr_name_counter_;
    


    @mdavis36 mdavis36 force-pushed the md/phase2-copy-move branch from 3b9acdb to c8ffe1d Compare March 5, 2026 00:47
    @greptile-apps
    Contributor

    greptile-apps bot commented Mar 5, 2026

    Additional Comments (1)

    csrc/fusion.cpp, line 342
    The explicit clear() on line 339 weakens exception safety in the copy-and-swap idiom. If swap throws (e.g., std::bad_alloc in the vector allocations at lines 125–130), *this has already been cleared and its original content is permanently lost.

    The standard idiom should be:

    Fusion& Fusion::operator=(const Fusion& other) {
      if (this != &other) {
        Fusion copy(other);
        swap(*this, copy);  // On throw: *this still has original state intact
        // copy destructs → calls clear() + removeFusion → cleans up *this's old data
      }
      return *this;
    }

    Without the explicit clear(), if swap throws before modifying *this's state, the original content remains intact. The temporary's destructor then handles cleanup of *this's old data that was swapped into it.
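
The difference between the two orderings can be shown with a toy model in which swap is made to fail on demand. This is a sketch under assumptions: ToyFusion, toySwap, and the swap_throws hook are hypothetical, and the failure is simulated to occur before swap modifies anything, which is the case the comment describes.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical Fusion stand-in; swap_throws simulates an allocation failure.
struct ToyFusion {
  std::vector<int> vals;
  bool swap_throws = false;
};

// Throws before touching either operand when the hook is set.
void toySwap(ToyFusion& a, ToyFusion& b) {
  if (a.swap_throws) {
    throw std::runtime_error("simulated allocation failure");
  }
  a.vals.swap(b.vals);
}

// Ordering questioned in the review: clear first, then swap.
void assignClearFirst(ToyFusion& dst, const ToyFusion& src) {
  ToyFusion copy{src.vals, false};
  dst.vals.clear();    // old content is gone from here on
  toySwap(dst, copy);  // a throw leaves dst empty
}

// Classic copy-and-swap: the temporary disposes of the old content.
void assignSwapOnly(ToyFusion& dst, const ToyFusion& src) {
  ToyFusion copy{src.vals, false};
  toySwap(dst, copy);  // a throw leaves dst unchanged
}
```

When the simulated failure fires, assignClearFirst leaves the destination empty while assignSwapOnly leaves its original three values intact.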

    @mdavis36
    Collaborator Author

    mdavis36 commented Mar 5, 2026

    !test

