[IR Container] Phase 2.7 Segmenter Container Sharing#5983
[IR Container] Phase 2.7 Segmenter Container Sharing#5983mdavis36 wants to merge 5 commits intomd/phase2-thread-safetyfrom
Conversation
|
!test |
|
Review updated until commit b1873c8 Description
|
| Relevant files | |||||||
|---|---|---|---|---|---|---|---|
| Enhancement |
| ||||||
| Bug fix |
| ||||||
| Tests |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Shortcut val handling change
This is a significant behavioral change - shortcut vals (zero_val_, one_val_, etc.) are now included in the count. The nullOutShortcutIfNeeded function handles nulling these pointers during removal, but this logic should be verified to ensure shortcut vals are correctly tracked in per_fusion_vals_ and vals_up_ and that removal works as expected across both fast and slow paths. |
8fb976b to
31bccb9
Compare
504bde0 to
31bccb9
Compare
Description
|
| Relevant files | |||||||
|---|---|---|---|---|---|---|---|
| Enhancement |
| ||||||
| Bug fix |
| ||||||
| Tests |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ No major issues detected |
31bccb9 to
0a32c16
Compare
7489fca to
9a5298f
Compare
|
!test |
0a32c16 to
b62d5ff
Compare
9a5298f to
8bd6641
Compare
|
!test |
b62d5ff to
6eda820
Compare
8bd6641 to
f54045f
Compare
|
!test |
f54045f to
53d4ac2
Compare
6eda820 to
23bddaf
Compare
|
!test |
23bddaf to
d4478b1
Compare
b1873c8 to
fd3cb8b
Compare
Greptile SummaryThis PR wires the segmenter into Phase 2's shared Key changes:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant SF as SegmentedFusion
participant F as Fusion (segment)
participant IC as IrContainer (shared)
participant CG as StatementGuard
SF->>F: new Fusion(completeFusion()->ir_container_ptr())
F->>IC: addFusion(this) [unique_lock]
IC-->>IC: sharing_fusions_.insert(fusion)
IC-->>IC: per_fusion_vals_[fusion] pre-insert
IC-->>IC: per_fusion_exprs_[fusion] pre-insert
SF->>F: Fusion::copy(completeFusion(), fusion_segment)
F->>F: clear() → removeStatementsOwnedBy(self)
Note over IC: pre-allocated keys erased here
loop clone each Val/Expr
F->>IC: registerVal / registerExpr [unique_lock]
IC-->>IC: per_fusion_vals_[fusion].insert(v)
end
Note over SF,F: Parallel compilation per segment
opt Error rollback path only
CG->>F: ~StatementGuard()
F->>IC: removeStatementsCreatedAfter [unique_lock]
alt sharing_fusions_.size() <= 1 (fast path)
IC-->>IC: LIFO pop_back on deques
else sharing_fusions_.size() > 1 (slow path)
IC-->>IC: std::erase_if scan — skip other Fusions' stmts
IC-->>IC: remove only self's new stmts (O(N) total)
end
end
F->>IC: removeFusion(this) [unique_lock] on ~Fusion()
Last reviewed commit: deb0a38 |
| per_fusion_vals_[fusion]; // Pre-insert key so no outer-map rehash occurs during concurrent val/expr registration | ||
| per_fusion_exprs_[fusion]; |
There was a problem hiding this comment.
Pre-allocation immediately negated by Fusion::copy -> clear()
The comment claims this pre-allocation prevents outer-map rehash during concurrent val/expr registration. However, the call chain in makeFusion immediately undoes this:
new Fusion(container) → addFusion() → per_fusion_vals_[fusion_segment] = {} // pre-allocated ✓
Fusion::copy(completeFusion(), fusion_segment.get())
→ fusion_segment->clear()
→ ir_container_->removeStatementsOwnedBy(fusion_segment)
→ per_fusion_vals_.erase(vals_it) // key REMOVED! pre-allocation lost
After removeStatementsOwnedBy erases the pre-allocated key, the very first registerVal call inside Fusion::copy executes per_fusion_vals_[fusion_segment].insert(val) with operator[] on a missing key, causing a new key insertion into the outer unordered_map. This new insertion can trigger a rehash, which is exactly what the pre-allocation was supposed to prevent.
The race window is: valsOwnedBy() acquires shared_lock, returns a const& into per_fusion_vals_[X], releases the lock, and the caller then calls std::ssize on that reference. Between the lock release and std::ssize, another thread can insert the fusion_segment key (causing rehash), invalidating the reference — UB.
A minimal fix that preserves the pre-allocation invariant is to clear the inner set in removeStatementsOwnedBy instead of erasing the outer key:
// In removeStatementsOwnedBy, instead of:
per_fusion_vals_.erase(vals_it);
// use:
vals_it->second.clear(); // keep the key; prevents re-insertion + rehashThis does require a separate cleanup in removeFusion (which already holds unique_lock) to erase the now-empty key after the fusion is fully removed from sharing_fusions_.
| } | ||
|
|
||
| FusionExecutorCache executor_cache(std::move(fusion)); |
There was a problem hiding this comment.
EXPECT_GE(..., 6) assertion may be too strong and cause a false test failure
The comment itself acknowledges uncertainty: "the segmenter may merge some compatible reductions." With 4 inputs × 3 axes, the segmenter can legally merge all axis-0 reductions (from all 4 inputs) into one segment, all axis-1 reductions into another, and all axis-2 reductions into a third — producing only 3 segments, not 6.
If the segmenter is ever improved to merge reductions across inputs (a valid and correct optimisation), this test will EXPECT_GE(3, 6) and fail, even though the shared-container infrastructure is working perfectly. The stress test's correctness goal is already validated by testValidate; the segment-count assertion adds fragility without adding safety.
Consider removing the segment-count assertion entirely, or replacing it with just EXPECT_GT(runtime->fusionSegments()->groups().size(), 1) to verify at least some segmentation occurred (which is what the test is designed to exercise):
| } | |
| FusionExecutorCache executor_cache(std::move(fusion)); | |
| EXPECT_GT(runtime->fusionSegments()->groups().size(), 1u); |
d4478b1 to
5d12b11
Compare
…rContainer Add protected Fusion constructor that accepts an existing shared_ptr<IrContainer>, allowing SegmentedFusion::makeFusion() to share the complete Fusion's container instead of creating a new one per segment. Changes: - fusion.h: Add protected Fusion(shared_ptr<IrContainer>) constructor declaration - fusion.cpp: Implement the shared-container constructor - ir/container.cpp: Pre-allocate per_fusion_vals_/per_fusion_exprs_ in addFusion() to prevent rehash races during concurrent segment compilation - fusion_segmenter.cpp: Use shared container constructor in makeFusion() This enables real cross-thread lock contention on the IrContainer mutex during parallel segment compilation, validating Phase 2's shared_ptr + mutex infrastructure under realistic multi-Fusion sharing conditions.
Two new SegmentationTest cases that exercise the shared IrContainer under higher concurrent load than existing tests (max 3 segments): - SharedContainerStress8Segments: linear chain with 7 segment_set boundaries, 8 segments compiling in parallel - SharedContainerStress12ParallelBranches: 4 inputs x 3 independent reductions each, at least 6 segments compiling in parallel Both pass with correct results, validating the shared_ptr + mutex infrastructure under real multi-thread contention.
Statements cleaned up by statement guard need to be popped from the specific fusion only, not the entire IrContainer.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
13d90fa to
deb0a38
Compare
| if (c->per_fusion_exprs_[self].count(e) == 0) { | ||
| return false; // belongs to another Fusion — keep | ||
| } | ||
| if (exprs_kept < num_exprs_before) { | ||
| ++exprs_kept; | ||
| return false; // self's old expr — keep | ||
| } | ||
| // self's new expr — remove (clean up uses and index maps first) | ||
| for (Val* out : e->outputs()) { | ||
| out->setDefinition(nullptr); | ||
| } | ||
| for (Val* in : e->inputs()) { | ||
| in->removeUse(e); | ||
| } | ||
| c->per_fusion_exprs_[self].erase(e); | ||
| c->exprs_.erase(e); | ||
| return true; | ||
| }); | ||
|
|
||
| int64_t vals_kept = 0; | ||
| std::erase_if(c->vals_up_, [&](const std::unique_ptr<Val>& v_up) { | ||
| Val* v = v_up.get(); | ||
| if (c->per_fusion_vals_[self].count(v) == 0) { | ||
| return false; // belongs to another Fusion — keep |
There was a problem hiding this comment.
operator[] on outer map can insert spurious empty entries
Both slow-path predicates access the inner per-fusion sets via operator[] on the outer unordered_map:
if (c->per_fusion_exprs_[self].count(e) == 0) { … }
// and later:
if (c->per_fusion_vals_[self].count(v) == 0) { … }operator[] on an unordered_map performs a key insertion (with a default-constructed value) when the key is absent. The key for self can legitimately be absent here because Fusion::copy() → to->clear() → removeStatementsOwnedBy(to) erases it, and it is only re-inserted by the first registerExpr/registerVal call. If a StatementGuard is destroyed in the zero-statements case (guard scope added nothing and the fusion was just cleared), the slow path will be entered — c->sharing_fusions_.size() > 1 — and the predicate will call operator[] on a missing key, inserting a spurious empty set.
The behavior is still logically correct (count returns 0, predicate returns false, nothing is erased), but it silently re-populates a key that removeStatementsOwnedBy purposely removed. Use .find() to avoid the side-effect:
auto exprs_it = c->per_fusion_exprs_.find(self);
std::erase_if(c->exprs_up_, [&](const std::unique_ptr<Expr>& e_up) {
Expr* e = e_up.get();
if (exprs_it == c->per_fusion_exprs_.end() ||
exprs_it->second.count(e) == 0) {
return false; // belongs to another Fusion — keep
}
…
});And similarly for the vals_up_ pass.
| per_fusion_vals_[fusion]; // Pre-insert key so no outer-map rehash occurs during concurrent val/expr registration | ||
| per_fusion_exprs_[fusion]; |
There was a problem hiding this comment.
Overly long inline comment
The inline comment on line 156 is very long (well past 80–100 characters), which will likely fail the project's clang-format or line-length lint checks:
per_fusion_vals_[fusion]; // Pre-insert key so no outer-map rehash occurs during concurrent val/expr registrationConsider breaking it into a preceding block comment:
| per_fusion_vals_[fusion]; // Pre-insert key so no outer-map rehash occurs during concurrent val/expr registration | |
| per_fusion_exprs_[fusion]; | |
| // Pre-insert keys so no outer-map rehash occurs during concurrent | |
| // val/expr registration by segment Fusions sharing this container. | |
| per_fusion_vals_[fusion]; | |
| per_fusion_exprs_[fusion]; |
| auto fusion_segment = | ||
| std::unique_ptr<Fusion>(new Fusion(completeFusion()->ir_container_ptr())); |
There was a problem hiding this comment.
Raw new without explanatory comment
Using new Fusion(...) directly instead of std::make_unique<Fusion>(...) is the correct approach here — make_unique cannot access protected constructors even through friendship — but nothing in the code explains this to future maintainers. A one-line comment would prevent a well-intentioned "fix" that breaks compilation:
| auto fusion_segment = | |
| std::unique_ptr<Fusion>(new Fusion(completeFusion()->ir_container_ptr())); | |
| // The shared-container constructor is protected; use raw new via friendship | |
| // rather than std::make_unique (which cannot access protected ctors). | |
| auto fusion_segment = | |
| std::unique_ptr<Fusion>(new Fusion(completeFusion()->ir_container_ptr())); |
Summary
Wire the segmenter into Phase 2's shared `IrContainer` infrastructure: segment Fusions now share the complete Fusion's container instead of each creating a new one. This is the integration step that puts real cross-thread lock contention on the `shared_ptr` + `shared_mutex` infrastructure built in Phases 2.1–2.6.
What Changed
`fusion_segmenter.cpp` — `SegmentedFusion::makeFusion()` creates segment Fusions with the shared-container constructor instead of `std::make_unique()`, so all segment Fusions are registered in `completeFusion`'s `IrContainer`.
`fusion.h` / `fusion.cpp` — Add a `protected` constructor `Fusion(shared_ptr)` that creates an empty Fusion registered with a pre-existing container. Used exclusively by `makeFusion()`.
`ir/container.cpp` — Pre-allocate `per_fusion_vals_[fusion]` and `per_fusion_exprs_[fusion]` entries inside `addFusion()` (under the write lock) to prevent unguarded `unordered_map` rehash races when multiple segment Fusions register concurrently during parallel compilation.
`ContainerMutator::removeStatementsCreatedAfter()` (bugfix) — The previous implementation assumed a LIFO pop-back invariant on the global deques that only holds when a single Fusion owns the container. Under shared containers, other Fusions' statements can be interleaved at the tail. Two paths are now implemented:
`statement_guard.cpp` — `StatementGuard` switches from `numValsExcludingShortcuts()` to `numVals()`. The old exclusion was a workaround for the LIFO pop-back assuming shortcut vals were at the front; the new ownership-filtered scan preserves pre-guard vals naturally.
`tests/cpp/test_segmentation.cpp` — Two new stress tests exercising the shared container under genuine multi-thread contention:
Design Note
The fast/slow split in `removeStatementsCreatedAfter` is intentional. The slow path's `std::erase_if` is O(N) in total container size, but it is only reached when the container is shared (i.e., during segmented execution) and only for statement rollback on error paths. The common success path does not call it.