fix: daemon cursor starvation when sessions produce zero episodes #68

Open
dnnspaul wants to merge 1 commit into MAnders333:main from dnnspaul:fix/daemon-cursor-starvation

dnnspaul commented Apr 1, 2026

Problem

When all candidate sessions in a polling batch have too few messages to produce episodes (e.g. short automated 2-message sessions), the daemon cursor never advances. Before this change, the cursor moved forward only based on lastSuccessMaxTime, which was updated only when episodes were successfully uploaded. With zero episodes to upload, the cursor stayed at the same value indefinitely, so the daemon re-fetched the same batch of ineligible sessions on every polling cycle.

This is a realistic scenario: automated or trivial sessions that fall below minSessionMessages accumulate over time and can form contiguous blocks that the daemon can never move past.
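
To make the failure mode concrete, here is a minimal sketch of the pre-fix rule as described above. It is not the daemon's actual code: the `Session` shape, the `preFixCycle` function, and the in-memory filtering are hypothetical stand-ins. Only `lastSuccessMaxTime` and `minSessionMessages` are names taken from this PR.

```typescript
// Hypothetical session shape; the real daemon's types are not shown in this PR.
interface Session {
  timestampMs: number; // epoch milliseconds
  messageCount: number;
}

// One polling cycle under the pre-fix rule: the cursor only follows
// lastSuccessMaxTime, which is set only when a session yields an upload.
function preFixCycle(
  sessions: Session[],
  cursor: number,
  minSessionMessages: number,
): number {
  const batch = sessions.filter((s) => s.timestampMs > cursor);
  let lastSuccessMaxTime: number | null = null;
  for (const s of batch) {
    if (s.messageCount >= minSessionMessages) {
      // Assumption: an eligible session always uploads successfully here.
      lastSuccessMaxTime = Math.max(lastSuccessMaxTime ?? 0, s.timestampMs);
    }
  }
  // With no successful upload, the old cursor is returned unchanged.
  return lastSuccessMaxTime ?? cursor;
}
```

Feeding this function a list made up only of 2-message sessions with `minSessionMessages` set to 4 returns the input cursor on every call, which is the starvation described above.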

Fix

Cursor advancement for zero-episode batches: When all sessions in a batch are examined and produce zero episodes (with no insert failures), the cursor now advances past the examined batch. This is handled as a distinct case (Case 2) in the upload logic, separate from the existing path that advances based on successfully uploaded episodes.

When the batch limit is hit, the cursor advances to 1ms before the last candidate session's timestamp to avoid skipping sessions that share the same timestamp as the boundary. When the batch limit is not hit, the cursor advances to the maximum timestamp seen.
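
The rule above can be sketched as a pure function. This is an illustration under stated assumptions, not the code in this PR: `ExaminedBatch`, `CandidateSession`, and `nextCursorForZeroEpisodeBatch` are hypothetical names, candidates are assumed to be sorted ascending by timestamp, and the final "only move forward" guard is an added safety assumption.

```typescript
// Hypothetical shapes; the real daemon's types are not shown in this PR.
interface CandidateSession {
  id: string;
  timestampMs: number; // epoch milliseconds
}

interface ExaminedBatch {
  candidates: CandidateSession[]; // assumed sorted ascending by timestampMs
  batchLimit: number;
  episodesProduced: number;
  insertFailures: number;
}

// Returns the new cursor for a zero-episode batch, or null when this case
// does not apply (empty batch, some episodes produced, or insert failures).
function nextCursorForZeroEpisodeBatch(
  batch: ExaminedBatch,
  cursor: number,
): number | null {
  const { candidates, batchLimit, episodesProduced, insertFailures } = batch;
  const last = candidates[candidates.length - 1];
  if (last === undefined || episodesProduced > 0 || insertFailures > 0) {
    return null;
  }

  const batchLimitHit = candidates.length >= batchLimit;
  const maxTime = candidates.reduce(
    (max, c) => Math.max(max, c.timestampMs),
    last.timestampMs,
  );

  // Limit hit: stop 1ms short of the boundary so that sessions sharing the
  // last candidate's timestamp are fetched again rather than skipped.
  const proposed = batchLimitHit ? last.timestampMs - 1 : maxTime;

  // Assumption: the cursor never moves backwards or stays in place.
  return proposed > cursor ? proposed : null;
}
```

With candidates at 1000, 2000, and 3000 ms and a batch limit of 3, the sketch returns 2999. With only the first two candidates and the same limit, it returns 2000.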

Self-scheduling polling loop: Replaced the fixed setInterval with a self-scheduling setTimeout. When the cursor advances (indicating there may be more work to process), the next cycle runs immediately instead of waiting for the full polling interval. This avoids multi-minute delays when skipping over large blocks of ineligible sessions. The loop falls back to the configured intervalMs when there's nothing more to process.
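
A minimal sketch of the self-scheduling pattern, assuming a `runCycle` that resolves to an object with a `cursorAdvanced` flag. `CycleResult`, `nextDelayMs`, and `startPolling` are hypothetical names, and falling back to the normal interval after a failed cycle is an assumption rather than something this PR states.

```typescript
// Hypothetical result shape; only cursorAdvanced is named in this PR.
interface CycleResult {
  cursorAdvanced: boolean;
}

// Delay before the next cycle: immediate when the cursor moved,
// otherwise the configured polling interval.
function nextDelayMs(result: CycleResult, intervalMs: number): number {
  return result.cursorAdvanced ? 0 : intervalMs;
}

// Starts the loop and returns a function that stops it.
function startPolling(
  runCycle: () => Promise<CycleResult>,
  intervalMs: number,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    if (stopped) return;
    let delay = intervalMs;
    try {
      delay = nextDelayMs(await runCycle(), intervalMs);
    } catch {
      // Assumption: after a failed cycle, wait the full interval.
    }
    // The next cycle is scheduled only after this one has finished,
    // so cycles never overlap.
    if (!stopped) timer = setTimeout(tick, delay);
  };

  timer = setTimeout(tick, 0);
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Unlike setInterval, this schedules each cycle from the end of the previous one, which is what allows the delay to vary per cycle.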

Shutdown handler timing: Moved SIGTERM/SIGINT handler registration before the first runCycle() call, closing a window where a signal during the initial upload would not be caught.
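
The ordering can be sketched as follows. `SignalSource` and `startDaemon` are hypothetical names introduced for illustration; in a Node process the signal source would be `process` or a thin wrapper around `process.on`.

```typescript
type ShutdownSignal = "SIGTERM" | "SIGINT";

// Hypothetical minimal interface over a signal emitter.
interface SignalSource {
  on(signal: ShutdownSignal, handler: () => void): unknown;
}

async function startDaemon(
  signals: SignalSource,
  runCycle: () => Promise<void>,
  onShutdown: () => void,
): Promise<void> {
  // Register the handlers first, so a signal that arrives during the
  // initial cycle is already covered.
  signals.on("SIGTERM", onShutdown);
  signals.on("SIGINT", onShutdown);

  // Only now start the first (possibly long-running) cycle.
  await runCycle();
}
```

Registering after the first `await runCycle()` would leave the whole initial upload without a handler, which is the window this change closes.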

Tests

Added two test cases covering the starvation scenario:

  • With batch limit hit: Verifies the cursor advances to lastCandidateTime - 1ms and cursorAdvanced is true
  • Without batch limit: Verifies the cursor advances to the max session timestamp and cursorAdvanced is true

When all candidate sessions in a batch had too few messages to produce
episodes (e.g. automated 2-message sessions), the daemon cursor never
advanced — re-fetching the same batch of ineligible sessions every cycle.

Root cause: the cursor only advanced based on lastSuccessMaxTime, which
was only updated when episodes were successfully uploaded. With zero
episodes, the cursor stayed at the old value indefinitely.

Fix: when all sessions are examined and produce zero episodes (with no
insert failures), advance the cursor past the examined batch. Also
switch the polling loop from fixed setInterval to self-scheduling
setTimeout that re-runs immediately when the cursor advances, avoiding
multi-minute delays when skipping over large blocks of ineligible
sessions.

Added two tests covering the starvation scenario with and without the
batch limit.