Add a new configuration option to throttle repeated stage error log messages in stream interpreters:

  pekko.stream.materializer.stage-errors-log-throttle-period = off

When set to a positive duration (e.g. '10s'), only the first stage error within each time window is logged at ERROR level. Subsequent errors in the window are counted silently, and a summary warning is emitted when the next window opens or the interpreter finishes.

Key implementation details:
- Per-interpreter throttle state (not shared across streams)
- Uses errorLogInitialized flag to ensure first error always logs regardless of System.nanoTime() origin (fix for GPT-5.4 review finding)
- Validates negative durations with require() (fix for GPT-5.4 finding)
- Flushes suppressed count in finish() for best-effort cleanup
- Default 'off' preserves existing behavior (zero behavior change)

Tests use Broadcast with 5 parallel failing stages to exercise actual throttle code paths (fix for Sonnet 4.6 + GPT-5.4 review finding that original tests only triggered single errors).

Cross-reviewed by GPT-5.4 and Sonnet 4.6.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation
When a stream stage fails rapidly and repeatedly (e.g., persistent network failure, bad data in a loop), each failure generates a log message. In high-throughput systems, this can result in thousands of log messages per second, overwhelming log aggregation systems and masking other important messages.
Modification
Added a new configuration option, `pekko.stream.materializer.stage-errors-log-throttle-period` (default `off`).
When set to a positive duration (e.g. `10s`), only the first stage error within each time window is logged at ERROR level. Subsequent errors in the window are counted silently, and a summary warning is emitted when the next window opens or the interpreter finishes.
Implementation details
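As a rough illustration of the windowing behaviour described in this PR (not the code in `GraphInterpreter.scala`), the logic can be sketched in standalone Scala. The class name `StageErrorLogThrottle`, its constructor parameters, the injectable clock, and the wording of the summary message are all hypothetical; the PR keeps equivalent state as fields on the interpreter. Modelling `off` as `None` and emitting the summary before the next logged error are also assumptions.

```scala
import scala.concurrent.duration._

// Hypothetical helper for illustration only; names and message text are assumptions.
final class StageErrorLogThrottle(
    period: Option[FiniteDuration],            // None models the default 'off'
    logError: String => Unit,
    logWarning: String => Unit,
    nanoClock: () => Long = () => System.nanoTime()) {

  // Negative durations are rejected up front.
  require(period.forall(_ >= Duration.Zero), "throttle period must not be negative")

  private val periodNanos: Long = period.fold(0L)(_.toNanos)
  private var initialized = false // plays the role of the errorLogInitialized flag
  private var windowStart = 0L
  private var suppressed = 0L

  def report(message: String): Unit =
    if (period.isEmpty) logError(message) // throttling off: log every error
    else {
      val now = nanoClock()
      // nanoTime() has an arbitrary (possibly negative) origin, so the first
      // error is let through by the flag rather than by comparing against 0.
      if (!initialized || now - windowStart >= periodNanos) {
        flush()                 // summarise what the previous window suppressed
        initialized = true
        windowStart = now
        logError(message)
      } else suppressed += 1    // counted silently within the window
    }

  // Intended to be called when the interpreter finishes (best effort).
  def flush(): Unit =
    if (suppressed > 0) {
      logWarning(s"$suppressed additional stage error(s) suppressed by throttling")
      suppressed = 0
    }
}
```

Taking the clock as a parameter is only a convenience for exercising the window boundaries deterministically; the subtraction `now - windowStart` is the usual overflow-safe way to compare `System.nanoTime()` readings.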
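- Uses an `errorLogInitialized` flag to guarantee the first error always logs, regardless of the `System.nanoTime()` origin
- Rejects negative durations with `require()`
- Flushes the suppressed count in `finish()` for best-effort reporting
- Default `off`: zero behavior change for existing users
Files changed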
- `stream/src/main/resources/reference.conf`: new config key
- `stream/.../GraphInterpreter.scala`: throttle state fields + modified `reportStageError` + `finish()` flush
- `stream-tests/.../StageErrorLogThrottleSpec.scala`: 4 tests (enabled throttle with Broadcast fan-out, single error, disabled with fan-out, disabled single error)
Result
- […] (default `off`)
- […] `reportStageError` calls per interpreter)
- `ActorGraphInterpreterSpec` and `InterpreterSpec` tests pass (no regression)
References