-
Notifications
You must be signed in to change notification settings - Fork 139
fix(pb): stop actors from destroying on goingaway #3501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
3 Skipped Deployments
|
Code ReviewOverviewThis PR fixes an issue where actors were incorrectly being destroyed when receiving a Changes AnalysisThe Fix:
Logic Flow:
This is semantically equivalent BUT with the critical addition that Findings✅ Logic CorrectnessThe inverted boolean logic is correct:
The key improvement is that when 🤔 Semantic Clarity ConcernThe variable naming could be clearer:
This is somewhat counterintuitive. A clearer name might be let needs_rescheduling = state.going_away || \!matches\!(
variant,
StoppedVariant::Normal { code: protocol::StopCode::Ok }
);
match (input.crash_policy, needs_rescheduling) {
(CrashPolicy::Restart, true) => { /* ... */ }
(CrashPolicy::Sleep, true) => { /* ... */ }
// ...
}However, this is a minor style preference and doesn't affect correctness. ✅ State ManagementMoving
🟡 Comment PlacementThe comment "Reset retry count on successful exit" (line 575) was moved from above the match arm to inside it. This is fine, though having it above the arm was slightly more visible. Not a significant issue. ✅ Edge CasesThe logic correctly handles:
Questions for Consideration
Verdict✅ Approve - The logic is correct and fixes the described issue. The only suggestions are minor improvements to clarity:
The fix properly ensures actors don't get destroyed when their runner is going away, allowing them to be rescheduled according to the crash policy instead. |
d65d8db to
a78fb2c
Compare
Code ReviewSummaryThis PR fixes a critical bug where actors were being incorrectly destroyed when receiving a ✅ Positive Aspects
🔍 Code AnalysisMain Logic Change (lines 660-668, 668-695)Before: let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* reschedule */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}After: let graceful_exit = !state.going_away
&& matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}Analysis:
State Flag Ordering (lines 584-586 → 711-713)Before: state.gc_timeout_ts = None;
state.going_away = false; // ❌ Cleared too early
state.stopping = false;After: state.gc_timeout_ts = None;
state.stopping = false;
// ... rescheduling logic ...
state.going_away = false; // ✅ Cleared after all logicWhy this matters: The 🎯 Semantic CorrectnessThe fix aligns with the intended actor lifecycle:
Crash Policy Semantics:
The new code correctly implements these semantics. 🧹 Code Quality Notes
|
a78fb2c to
c20b8cd
Compare
815c918 to
986289f
Compare
Code ReviewSummaryThis PR fixes a critical bug where actors were being destroyed instead of rescheduled when they stopped during a runner's graceful shutdown ("going away" state). The fix correctly distinguishes between graceful exits that should trigger destruction versus non-graceful exits that should trigger rescheduling. Changes OverviewThe main logic change is in Before:
After:
Logic AnalysisThe inverted crash policy matching is correct: match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}This means:
Positive Aspects
Potential Concerns1. State Transition Safety
|
PR Review: fix(pb): stop actors from destroying on goingawaySummaryThis PR fixes a critical bug where actors were being destroyed when they received a Code Quality & Logic Changes ✅Main Logic Fix (actor/mod.rs:660-713)
State Management Timing (actor/mod.rs:586,713)
Correctness Analysis ✅The boolean logic inversion is handled correctly: Before: let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* restart */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}After: let graceful_exit = !state.going_away && matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* restart */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}The match arms correctly flip Potential Issues & Concerns1. State Reset Ordering
|
c20b8cd to
a78fb2c
Compare
986289f to
815c918
Compare
Code ReviewSummaryThis PR fixes a critical bug where actors were being incorrectly destroyed when receiving a Key Changes Analysis1. Core Logic Fix in
|
PR Review: fix(pb): stop actors from destroying on goingawayI've reviewed this PR and have some observations and feedback: SummaryThis PR fixes a critical bug where actors were being incorrectly destroyed when receiving a Code Quality & Logic✅ Core Fix is SoundThe main logic change at Before: The After: The flag is cleared at the end (line 713), allowing the crash policy logic to correctly identify non-graceful exits and trigger rescheduling. The key improvement is in the let graceful_exit = !state.going_away
&& matches!(
variant,
StoppedVariant::Normal {
code: protocol::StopCode::Ok
}
);This correctly treats ✅ Crash Policy Logic InversionThe inversion of the crash policy matching (
This aligns with the expected behavior where actors should be rescheduled on crashes/forced stops, not destroyed. Potential Issues & Considerations
|
a78fb2c to
c20b8cd
Compare
Code ReviewSummaryThis PR fixes an important issue where actors were being incorrectly destroyed when receiving a Key Changes Analysis1. Critical Fix: Delayed
|
a78fb2c to
c20b8cd
Compare
815c918 to
986289f
Compare
Code ReviewThanks for this PR! I've reviewed the changes and have the following feedback: SummaryThis PR fixes an issue where actors were being destroyed when receiving a ✅ Code Quality & Best PracticesGood:
Observations:
🐛 Potential IssuesCritical Logic Concern: // OLD: match (input.crash_policy, failed)
// NEW: match (input.crash_policy, graceful_exit)
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}This is a significant semantic change:
The new
Question: Is this the intended behavior? Specifically:
This seems correct for the GOINGAWAY case, but I want to confirm this doesn't inadvertently change behavior for actors that exit gracefully with 🔒 Security ConsiderationsNo security concerns identified. The changes are internal state management logic. ⚡ Performance ConsiderationsNo performance impact. The changes are pure logic fixes with no algorithmic changes. 🧪 Test CoverageConcern: No test coverage added for this bug fix. Recommendation: Consider adding a test case that:
This would prevent regression and document the expected behavior. 📝 Minor Observations
❓ Questions for Author
Overall AssessmentThe fix appears to address the stated issue (actors destroying on GOINGAWAY), but the semantic inversion of the crash policy logic needs verification to ensure it doesn't introduce unintended behavior changes for other actor lifecycle scenarios. Recommendation: ✅ Approve with verification - Please confirm the crash policy logic change is fully intentional and tested. |
Code Review - PR #3501: fix(pb): stop actors from destroying on goingawaySummaryThis PR fixes a bug where actors were being destroyed when they received a ✅ Positive Aspects
🔍 Logic AnalysisBefore (buggy): let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* reschedule */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}
After (fixed): let graceful_exit = !state.going_away && matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}
🤔 Questions & Considerations1. Edge Case:
|
c20b8cd to
a78fb2c
Compare
986289f to
815c918
Compare
Code ReviewOverviewThis PR fixes a critical bug where actors were incorrectly destroyed when receiving a ✅ Main Logic Fix (actor/mod.rs:660-713)The Problem: The Solution:
// Before
let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* reschedule */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}
// After
let graceful_exit = !state.going_away
&& matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}Analysis:
✅ Critical State Management Fix (actor/mod.rs:713)Before: Why this matters: The 🎯 Correctness VerificationThe fix correctly implements these scenarios:
📝 Code Style & FormattingAll formatting changes align with CLAUDE.md conventions:
|
a78fb2c to
c20b8cd
Compare
815c918 to
986289f
Compare
Code ReviewSummaryThis PR fixes an issue where actors were being destroyed when they received a Code Quality & Logic ✅Primary Fix (actor/mod.rs:660-713) The core logic change is sound: Before: let failed = !matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, failed) {
(CrashPolicy::Restart, true) => { /* reschedule */ }
(CrashPolicy::Sleep, true) => { /* sleep */ }
_ => { /* destroy */ }
}After: let graceful_exit = !state.going_away
&& matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });
match (input.crash_policy, graceful_exit) {
(CrashPolicy::Restart, false) => { /* reschedule */ }
(CrashPolicy::Sleep, false) => { /* sleep */ }
_ => { /* destroy */ }
}This correctly identifies that:
State Management (actor/mod.rs:713) Moving
The original placement at line 586 was premature - it cleared the flag before using it in the rescheduling logic. Formatting Changes ✅The formatting changes in
These follow the codebase conventions and improve consistency. Potential Concerns1. State Transition Safety
2. Comment Relocation (actor/mod.rs:575) 3. Generation Checking Testing Recommendations 🧪Consider adding test coverage for:
Example test structure: #[test]
async fn test_going_away_respects_restart_policy() {
// Setup actor with CrashPolicy::Restart
// Send GoingAway signal
// Stop actor with StopCode::Ok
// Assert: actor was rescheduled, not destroyed
// Assert: going_away flag is cleared
}Performance Considerations ⚡No performance concerns. The change:
Security Considerations 🔒No security concerns. The change:
Documentation 📝Consider adding a code comment explaining the // Only treat as graceful exit if:
// 1. Actor stopped with StopCode::Ok, AND
// 2. Not marked as going_away (runner shutdown/eviction)
let graceful_exit = !state.going_away
&& matches!(variant, StoppedVariant::Normal { code: protocol::StopCode::Ok });Overall Assessment ✨Verdict: LGTM with minor suggestions The fix correctly addresses the issue where actors were being destroyed instead of rescheduled during Strengths:
Suggestions:
Generated with Claude Code |
Code ReviewSummaryThis PR fixes a critical bug where actors were being incorrectly destroyed when receiving a Core Logic Change (packages/pegboard/src/workflows/actor/mod.rs)The Bug:
The Fix:
Correctness:
This matches the comment on line 575: "Reset retry count on successful exit" - only truly successful exits should reset the counter. Code QualityStrengths:
Formatting Changes:
Note: Per CLAUDE.md, formatting should not be done manually - these changes appear to be from Potential Issues1. Test Coverage (Critical)
Recommendation: Add test coverage for: // Test that actors with crash_policy=Restart reschedule after GoingAway+Stop
// Test that actors with crash_policy=Sleep sleep after GoingAway+Stop
// Test that truly graceful exits (no GoingAway) are destroyed2. Edge Case: Concurrent Signals 🤔
The 3. Documentation 📝
Security & Performance✅ No security concerns identified VerdictApprove with recommendations:
Great catch on a subtle state management bug! The fix is minimal and correct. |
Merge activity
|

No description provided.