fix: daemon cursor starvation when sessions produce zero episodes #68
Open
dnnspaul wants to merge 1 commit into MAnders333:main from
When all candidate sessions in a batch had too few messages to produce episodes (e.g. automated 2-message sessions), the daemon cursor never advanced, so the daemon re-fetched the same batch of ineligible sessions every cycle. Root cause: the cursor only advanced based on lastSuccessMaxTime, which was only updated when episodes were successfully uploaded. With zero episodes, the cursor stayed at the old value indefinitely. Fix: when all sessions are examined and produce zero episodes (with no insert failures), advance the cursor past the examined batch. Also switch the polling loop from a fixed setInterval to a self-scheduling setTimeout that re-runs immediately when the cursor advances, avoiding multi-minute delays when skipping over large blocks of ineligible sessions. Added two tests covering the starvation scenario with and without the batch limit.
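The loop change can be sketched as follows. This is a minimal sketch, not the project's code. The names startPolling, RunCycle and CycleResult are hypothetical. The sketch also assumes each cycle resolves to an object carrying the cursorAdvanced flag that the new tests check.

```typescript
// Hypothetical result shape: only the cursorAdvanced flag comes from the PR.
interface CycleResult {
  cursorAdvanced: boolean;
}

type RunCycle = () => Promise<CycleResult>;

/**
 * Self-scheduling polling loop (sketch). Each cycle arms the next timer only
 * after it finishes: immediately (delay 0) if the cursor advanced, since there
 * may be more backlog, otherwise after the full intervalMs.
 * Returns a stop() function that cancels any pending cycle.
 */
function startPolling(runCycle: RunCycle, intervalMs: number): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    let delay = intervalMs;
    try {
      const result = await runCycle();
      if (result.cursorAdvanced) delay = 0; // more work may be waiting
    } catch (err) {
      // A failed cycle waits the normal interval instead of spinning.
      console.error("poll cycle failed:", err);
    }
    if (!stopped) timer = setTimeout(tick, delay);
  };

  void tick(); // first cycle runs right away
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

The next timer is armed only once the current cycle settles, so cycles can never overlap. A fixed setInterval does not guarantee that for slow async work. The immediate re-run is scheduled with setTimeout(tick, 0) rather than a direct recursive call, so it still yields to the event loop between cycles and gives SIGTERM/SIGINT handlers a chance to run.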
Problem
When all candidate sessions in a polling batch have too few messages to produce episodes (e.g. short automated 2-message sessions), the daemon cursor never advances. The cursor only moved forward based on lastSuccessMaxTime, which was updated only when episodes were successfully uploaded. With zero episodes to upload, the cursor stayed at the same value indefinitely, so the daemon re-fetched the same batch of ineligible sessions every polling cycle, forever.
This is a realistic scenario: automated or trivial sessions that fall below minSessionMessages accumulate over time and can form contiguous blocks that the daemon can never move past.
Fix
Cursor advancement for zero-episode batches: When all sessions in a batch are examined and produce zero episodes (with no insert failures), the cursor now advances past the examined batch. This is handled as a distinct case (Case 2) in the upload logic, separate from the existing path that advances based on successfully uploaded episodes.
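The Case 2 rule, including its batch-limit behavior, can be sketched as a pure function over one examined batch. When the limit is hit, the cursor stops 1 ms short of the last candidate's timestamp; otherwise it takes the maximum timestamp seen. This is an illustrative sketch under stated assumptions:

- BatchOutcome, CursorUpdate, nextCursor and their fields are hypothetical. Only lastSuccessMaxTime and cursorAdvanced come from the PR.
- Timestamps are epoch milliseconds.
- The next fetch selects sessions strictly newer than the cursor. That is what lets the 1 ms offset re-include boundary sessions.

```typescript
// Hypothetical, simplified summary of one examined batch (timestamps in epoch ms).
interface BatchOutcome {
  candidateTimes: number[];    // examined session timestamps, ascending
  episodesProduced: number;
  insertFailures: number;
  batchLimitHit: boolean;      // the fetch returned a full batch
  lastSuccessMaxTime?: number; // newest successfully uploaded episode, if any
}

interface CursorUpdate {
  cursor: number;
  cursorAdvanced: boolean;
}

function nextCursor(oldCursor: number, batch: BatchOutcome): CursorUpdate {
  let target = oldCursor;

  if (batch.episodesProduced > 0) {
    // Case 1 (existing path, simplified): follow successfully uploaded episodes.
    if (batch.lastSuccessMaxTime !== undefined) target = batch.lastSuccessMaxTime;
  } else if (batch.insertFailures === 0 && batch.candidateTimes.length > 0) {
    // Case 2: everything was examined and nothing produced an episode,
    // so move past the examined batch instead of re-fetching it forever.
    const times = batch.candidateTimes;
    target = batch.batchLimitHit
      ? times[times.length - 1] - 1 // unfetched sessions may share this timestamp
      : Math.max(...times);
  }

  // Never move backwards; if the 1 ms offset lands at or below the old
  // cursor, cursorAdvanced reports false.
  const cursor = Math.max(oldCursor, target);
  return { cursor, cursorAdvanced: cursor > oldCursor };
}
```

An insert failure keeps the cursor in place, so the failed batch is retried on the next cycle.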
When the batch limit is hit, the cursor advances to 1ms before the last candidate session's timestamp, to avoid skipping sessions that share the same timestamp as the boundary. When the batch limit is not hit, the cursor advances to the maximum timestamp seen.
Self-scheduling polling loop: Replaced the fixed setInterval with a self-scheduling setTimeout. When the cursor advances (indicating there may be more work to process), the next cycle runs immediately instead of waiting for the full polling interval. This avoids multi-minute delays when skipping over large blocks of ineligible sessions. The loop falls back to the configured intervalMs when there is nothing more to process.
Shutdown handler timing: Moved SIGTERM/SIGINT handler registration before the first runCycle() call, closing a window where a signal received during the initial upload would not be caught.
Tests
Added two test cases covering the starvation scenario:
Batch limit hit: the cursor advances to lastCandidateTime - 1ms and cursorAdvanced is true
Batch limit not hit: cursorAdvanced is true
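The two test scenarios can be approximated with a small in-memory simulation. Everything here is hypothetical: Session, fetchBatch, simulateCycle and drain are illustrative names, not the project's test code. The simulation makes these simplifying assumptions:

- It models only the zero-episode path.
- The fetch returns sessions strictly newer than the cursor, in ascending time order.
- A batch counts as having hit the limit when it returns exactly limit sessions.

```typescript
interface Session {
  id: string;
  time: number;     // epoch ms
  messages: number; // message count
}

interface CycleResult {
  cursor: number;
  cursorAdvanced: boolean;
}

// Stand-in for the session store query: newer than cursor, oldest first, capped.
function fetchBatch(store: Session[], cursor: number, limit: number): Session[] {
  return store
    .filter((s) => s.time > cursor)
    .sort((a, b) => a.time - b.time)
    .slice(0, limit);
}

// One simplified cycle. Batches that yield episodes would go through the
// existing upload path, which is not modelled here.
function simulateCycle(store: Session[], cursor: number, limit: number, minSessionMessages: number): CycleResult {
  const batch = fetchBatch(store, cursor, limit);
  const eligible = batch.filter((s) => s.messages >= minSessionMessages);
  if (batch.length === 0 || eligible.length > 0) return { cursor, cursorAdvanced: false };

  const times = batch.map((s) => s.time);
  const target = batch.length === limit ? times[times.length - 1] - 1 : Math.max(...times);
  const next = Math.max(cursor, target);
  return { cursor: next, cursorAdvanced: next > cursor };
}

// Run cycles back-to-back while the cursor advances, as the self-scheduling
// loop would, and count how many cycles advanced it.
function drain(store: Session[], cursor: number, limit: number, minSessionMessages: number): { cursor: number; advancingCycles: number } {
  let advancingCycles = 0;
  for (;;) {
    const r = simulateCycle(store, cursor, limit, minSessionMessages);
    if (!r.cursorAdvanced) return { cursor, advancingCycles };
    cursor = r.cursor;
    advancingCycles++;
  }
}
```

Consider five 2-message sessions at t = 10, 20, 30, 40, 50 with minSessionMessages = 3. A limit of 2 drains the backlog in five advancing cycles, and each limit-hit batch re-examines its boundary session once. A limit of 10 drains it in one cycle. Under the old rule, the cursor would stay at its starting value in both cases.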