[IR Container] Phase 2.6 Concurrency & Thread Safety#5971
[IR Container] Phase 2.6 Concurrency & Thread Safety#5971mdavis36 wants to merge 3 commits intomd/phase2-copy-movefrom
Conversation
|
!test |
|
Review updated until commit 0a32c16 Description
|
| Relevant files |
|---|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Duplicate code line
auto* c = a.ir_container_.get(); appears twice - once before and once after std::unique_lock lock(c->mutex_);. The second occurrence should be removed as it redefines the variable after the lock is acquired. |
2cac45c to
8fb976b
Compare
192fd55 to
35b7405
Compare
|
!test |
Greptile SummaryThis PR completes Phase 2 thread-safety infrastructure by adding Key changes and observations:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller as Caller Thread
participant Fusion as Fusion (public API)
participant CM as ContainerMutator (lock-free)
participant IC as IrContainer
Note over Fusion, IC: Layer 1 — Public write methods acquire unique_lock
Caller->>Fusion: registerVal(val)
Fusion->>IC: unique_lock(mutex_)
Fusion->>CM: ContainerMutator::registerVal(self, val)
CM->>IC: inContainerImpl(val) [no re-lock]
CM->>IC: vals_up_.emplace_back / vals_.insert
CM->>IC: per_fusion_vals_[self].insert
CM->>IC: getValName() [no re-lock, caller holds lock]
Fusion->>IC: unique_lock released
Note over Fusion, IC: Nested call path — removeVal → removeExpr (no deadlock)
Caller->>Fusion: removeVal(val)
Fusion->>IC: unique_lock(mutex_)
Fusion->>CM: ContainerMutator::removeVal(self, val)
CM->>CM: removeExpr(self, val->definition()) [lock-free, no re-entry]
CM->>IC: per_fusion_exprs_[self].erase / exprs_.erase
CM->>IC: per_fusion_vals_[self].erase / vals_.erase
Fusion->>IC: unique_lock released
Note over Fusion, IC: Layer 1 — Public read methods acquire shared_lock
Caller->>IC: unordered_exprs() / vals()
IC->>IC: shared_lock(mutex_)
IC-->>Caller: const ref returned (lock released)
Last reviewed commit: 55401d6
|
Additional Comments (1)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! |
|
!test |
8fb976b to
31bccb9
Compare
35b7405 to
88b2e60
Compare
…tainers Update Fusion::removeStatementsCreatedAfter to compare per-Fusion counts (from exprsOwnedBy(this) and numValsExcludingShortcuts()) instead of global deque sizes. This correctly handles shared containers where other Fusions' statements would inflate the global counts. Add NVF_ERROR assertions to verify the LIFO invariant: the tail element of the global deque must belong to this Fusion. If violated, another Fusion appended concurrently (should be prevented by PR #5971 locking). Remove now-unnecessary deque size validation checks.
a9c62ea to
46080be
Compare
31bccb9 to
0a32c16
Compare
|
!test |
Additional Comments (2)
The same problem recurs one loop below at line 209 via The fix is to access the container fields directly, bypassing the locking wrappers, as is done everywhere else inside // line 194 – replace locking accessor with direct field access
while (std::ssize(c->per_fusion_exprs_[self]) > num_exprs_before) {// line 209 – inline the shortcut-adjusted count using direct field access
int64_t numVals = std::ssize(c->per_fusion_vals_[self])
- (self->zero_val_ != nullptr) - (self->one_val_ != nullptr)
- (self->true_val_ != nullptr) - (self->false_val_ != nullptr)
- (self->magic_zero_val_ != nullptr);
while (numVals > num_vals_before) {
...
numVals--;
} |
0a32c16 to
b62d5ff
Compare
|
!test |
Additional Comments (1)
Every other static method in The fix is to bypass the public accessor and read static int64_t numValsExcludingShortcuts(const Fusion* self) noexcept {
auto* c = self->ir_container();
auto it = c->per_fusion_vals_.find(self);
int64_t count = it != c->per_fusion_vals_.end() ? std::ssize(it->second) : 0;
count -= (self->zero_val_ != nullptr) + (self->one_val_ != nullptr) +
(self->true_val_ != nullptr) + (self->false_val_ != nullptr) +
(self->magic_zero_val_ != nullptr);
return count;
}Note: |
b62d5ff to
6eda820
Compare
|
!test |
Additional Comments (1)
All other loop conditions in The fix is to read the per-fusion val count directly from the container field, matching the pattern used for the expr loop just above: static int64_t numValsExcludingShortcuts(const Fusion* self) noexcept {
auto* c = self->ir_container();
// Use direct field access (caller must hold mutex_, or container must not
// be shared). Avoids re-entering valsOwnedBy() which acquires shared_lock.
const auto it = c->per_fusion_vals_.find(self);
int64_t count = it != c->per_fusion_vals_.end()
? static_cast<int64_t>(it->second.size())
: 0;
count -= (self->zero_val_ != nullptr) + (self->one_val_ != nullptr) +
(self->true_val_ != nullptr) + (self->false_val_ != nullptr) +
(self->magic_zero_val_ != nullptr);
return count;
}The public |
6eda820 to
23bddaf
Compare
|
!test |
d145d9e to
3b9acdb
Compare
23bddaf to
d4478b1
Compare
Additional Comments (1)
This public method calls directly into In Phase 2, container sharing doesn't happen, so concurrent access to a single Fusion is unlikely. However, Phase 3 will introduce shared containers and concurrent writes, at which point this unprotected read path becomes unsafe. Suggested fix: Acquire int64_t Fusion::numValsExcludingShortcuts() const noexcept {
std::shared_lock lock(ir_container()->mutex_);
return ContainerMutator::numValsExcludingShortcuts(this);
}The static helper's lock-free property is preserved for the |
3b9acdb to
c8ffe1d
Compare
Add std::shared_mutex to IrContainer to protect shared mutable state during concurrent access from parallel compilation threads. - IrContainer public methods self-lock (shared_lock for reads, unique_lock for writes) - Fusion mutation methods (registerVal/Expr, removeVal/Expr, removeStatementsCreatedAfter) acquire unique_lock then delegate to lock-free ContainerMutator static methods, avoiding self-deadlock on nested calls (e.g., removeVal → removeExpr) - ContainerMutator is a PIMPL struct defined only in fusion.cpp, keeping lock-free impl methods out of the header - Remove kPhase2DisableParallelCompile guard, re-enabling parallel compilation now that the mutex is in place - Delete dead IrContainer::copy() and IrContainer::swap() methods
d4478b1 to
5d12b11
Compare
Additional Comments (1)
The public
This API inconsistency will produce a data race in Phase 3 when containers are shared across threads. Consider wrapping the call with int64_t Fusion::numValsExcludingShortcuts() const noexcept {
std::shared_lock lock(ir_container()->mutex_);
return ContainerMutator::numValsExcludingShortcuts(this);
} |
This is fixed in #5983 as it is removed entirely. |
Summary
Add
std::shared_mutexto IrContainer for concurrent read access during parallel compilation, remove thekPhase2DisableParallelCompileserialization guard introduced in PR 1, and validate that the full test suite passes with parallel compilation re-enabled.This is a future-proofing and defensive correctness change. Phase 2's
makeFusionpath does NOT share containers (each segment gets its own container via the default constructor), so parallel compilation is technically safe without the mutex. However, Phase 3 will changemakeFusionto use the copy constructor (shared container), at which point multiple threads will write to the sameIrContainerconcurrently. The mutex must be in place before Phase 3 can enable that.The Nested Call Problem and ContainerMutator
Five Fusion methods directly access IrContainer's internal fields because statement registration was moved from IrContainer to Fusion previously:
removeVal()callsremoveExpr(), andregisterExpr()also callsremoveExpr(). Sincestd::shared_mutexis not recursive, acquiringunique_lockin both the outer and inner methods would deadlock.The solution is a two-layer locking architecture:
ContainerMutatoris forward-declared infusion.h(2 lines) and fully defined infusion.cpp. This keeps the header clean and makes the locking architecture self-documenting: everything insideContainerMutatorassumes the lock is already held.Thread Safety Analysis
Dead Code Removal
Investigation revealed that
IrContainer::copy()andIrContainer::swap()have zero call sites — all copy/move/swap semantics are handled at the Fusion level after previous work. Removing them eliminates ~45 lines of dead code and avoids complex dual-locking patterns.Relationship to Phase 2
This PR completes the Phase 2 architectural work. With thread safety in place, the full shared scalar infrastructure is ready for Phase 3:
CI Risk
Low-medium. This is the first CI run with parallel compilation re-enabled since PR #5961 serialized it. Any latent concurrency issues would surface here. The parallel compilation path doesn't share containers in Phase 2, so the mutex is defensive — but re-enabling parallelism exercises the full concurrent codegen pipeline.