enhance responses parser for semantic SSE events#311
enhance responses parser for semantic SSE events#311ndycode wants to merge 14 commits intofeat/openai-parity-pr2from
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
📝 Walkthroughwalkthroughrefactored sse parsing in changes
sequence diagram(s)sequenceDiagram
participant SSE as SSE Event Stream
participant Parser as parseSseStream()
participant State as ParsedResponseState
participant Finalizer as finalizeParsedResponse()
participant Consumer as Response Consumer
SSE->>Parser: open stream
activate Parser
Parser->>State: init fresh state
activate State
loop for each sse event
SSE->>Parser: sse event chunk
Parser->>State: maybeCaptureResponseEvent(event)
Note right of State: mutate outputs, deltas, summaries, phases, ids, errors
end
SSE->>Parser: stream end or error
Parser->>Finalizer: finalizeParsedResponse(state)
deactivate State
activate Finalizer
Finalizer->>Finalizer: merge outputs and summaries<br/>synthesize missing text fields<br/>set phase/commentary/final_answer
Finalizer->>Consumer: return finalized response or raw sse on error
deactivate Finalizer
deactivate Parser
estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ❌ 3❌ Failed checks (3 warnings)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/request/response-handler.ts`:
- Around line 14-21: The handler currently calls onResponseId for every envelope
carrying data.response because ParsedResponseState is recreated per parsed line
and callbacks are fired in processSemanticEvent/passThrough parsing; fix this by
elevating a seen-response-id set to the long-lived response stream scope (e.g.,
add a Set<string> seenResponseIds on the ResponseHandler or the outer
ParsedResponseState) and only call onResponseId(responseId) the first time an id
is observed; ensure functions like processSemanticEvent / handleEnvelope / the
passthrough parser reuse the same ParsedResponseState instance rather than
recreating it per line so dedupe works across envelopes, and update relevant
tests (fetch-helpers.test.ts and response-handler.test.ts) to assert
onResponseId is called once for multi-envelope semantic events.
- Around line 53-59: mergeRecord currently treats any update.content (including
empty arrays from semantic response.output_item.added frames) as authoritative
and can wipe richer terminal output when merged back into response.output;
change mergeRecord to only replace content when update.content is non-null and
has length > 0 (i.e., treat empty arrays as non-authoritative), using
cloneContentArray(update.content) when replacing and otherwise cloning
base.content, and ensure the same logic applies to the merge call that folds
items into response.output (the code around the response.output merging at lines
~213-218) so non-text or final answer parts are preserved. Reference:
mergeRecord, cloneContentArray and the response.output merging code path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a388a9c2-129e-4f16-9eb2-2356b87f8576
📒 Files selected for processing (3)
lib/request/response-handler.tstest/fetch-helpers.test.tstest/response-handler.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/fetch-helpers.test.tstest/response-handler.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/request/response-handler.ts
|
@coderabbitai review |
|
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/request/response-handler.ts`:
- Around line 120-139: syncPhaseText currently rebuilds a phase's full text by
scanning all phaseTextSegments on every append, causing O(n²) in pathological
cases; change the update strategy so appendPhaseTextSegment updates phaseText
incrementally instead of calling syncPhaseText to re-filter the whole map: when
adding a segment to phaseTextSegments (identify segments by the `${phase}\u0000`
prefix), append the new segment value to the existing state.phaseText.get(phase)
(or initialize it) and only fall back to a full rebuild in rare cases (e.g.,
segment removal or reordering). Update functions/places that call syncPhaseText
(including appendPhaseTextSegment) to use this incremental update and keep
syncPhaseText as a fallback or rename it to rebuildPhaseText for clarity.
- Around line 519-533: Add regression tests that exercise the
response.content_part.added and response.content_part.done branches in
response-handler.ts: construct events where eventRecord.part is a mutable record
with type "output_text", a phase, and content_index, then invoke the same
event-processing pathway used by existing tests and assert that
setOutputTextValue updates state (or is called) with the expected outputIndex,
content_index and text from getPartText; also add tests for the branch where
part is missing or not "output_text" to ensure capturePhase is invoked. Target
the test file test/response-handler.test.ts and reference the handler logic
around setOutputTextValue, getPartText, and capturePhase so the new tests cover
both success and validation-failure paths for response.content_part.added and
response.content_part.done.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 84fe93be-b96f-4a5e-a64a-7ebf655dbad5
📒 Files selected for processing (3)
lib/request/response-handler.tstest/fetch-helpers.test.tstest/response-handler.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/fetch-helpers.test.tstest/response-handler.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/request/response-handler.ts
🔇 Additional comments (16)
lib/request/response-handler.ts (8)
52-60: semantic distinction betweengetStringFieldandgetDeltaFieldis correct but subtle.
getStringField(line 52-55) rejects whitespace-only strings via.trim().length > 0, whilegetDeltaField(line 57-60) preserves them. this correctly handles whitespace deltas vs canonical terminal text.covered by
test/response-handler.test.ts:165-198(whitespace delta preservation test).
62-69: index guard against sparse array construction looks good.
MAX_SYNTHESIZED_EVENT_INDEX = 255(line 11) bounds all synthesized indices. this prevents memory exhaustion from malicious payloads likeoutput_index: 1000000.covered by
test/response-handler.test.ts:489-517(oversized indices test).
76-87: content merge now preserves richer terminal output.
mergeRecord(line 76-87) only replaces base content whenupdateContent.length > 0, addressing the past review about emptycontent: []frames erasing terminal content.covered by
test/response-handler.test.ts:200-223(richer terminal output test).
227-240:setReasoningSummaryValueclears stale deltas on null text.pr summary notes asymmetric behavior (line 235-237 deletes vs append which bails on null), but this is intentional:
*.doneevents with null/empty text clear accumulated deltas, while append on null is a no-op.covered by
test/response-handler.test.ts:276-296(stale reasoning delta clearing test).
299-307: canonical terminal content overrides accumulated deltas correctly.
applyAccumulatedOutputText(line 299-307) skips writing synthesized text whenpart.textis already a string (canonical terminal content present). it also syncs phase text from the canonical value.covered by
test/response-handler.test.ts:225-248(canonical terminal content test).
401-441:finalizeParsedResponsecorrectly aborts on error.line 404 returns null when
state.encounteredErroris true, ensuring error streams don't produce synthesized JSON.covered by
test/response-handler.test.ts:451-469(error event before done test).
451-467: response id deduplication viaseenResponseIdsaddresses past review.
notifyResponseId(line 457-458) checks and adds tostate.seenResponseIdsbefore invoking callback. this ensuresonResponseIdfires once per stream, not per envelope.covered by
test/response-handler.test.ts:433-449andtest/fetch-helpers.test.ts:747-765.
684-729: streaming capture now shares state across chunks.
createResponseIdCapturingStream(line 690) creates a singleParsedResponseStateinstance reused across all transform calls, addressing the past review about per-chunk state recreation.also correctly stops processing on
encounteredError(line 693, 709).test/response-handler.test.ts (7)
69-116: comprehensive semantic synthesis test.
test/response-handler.test.ts:69-116validates end-to-end synthesis ofoutput_text,reasoning_summary_text,phase,final_answer_text, andphase_textfrom semantic SSE events.assertions cover both top-level synthesized fields and nested
output[].content[]/output[].summary[]structure.
165-198: whitespace delta preservation test addresses regression.
test/response-handler.test.ts:165-198verifies that whitespace-only deltas (e.g.," ") are accumulated when no*.doneevent overrides them.this pins the fix for the whitespace truncation bug mentioned in pr summary.
250-296: stale delta clearing tests pin clearing behavior.
test/response-handler.test.ts:250-274andtest/response-handler.test.ts:276-296verify that*.doneevents with whitespace-only text clear accumulated deltas.note: the test at line 255 uses
"text":" "(single space) whichgetStringFieldtreats as empty due to.trim().length > 0check. this is correct per implementation.
433-449: response id dedup assertion added.
test/response-handler.test.ts:448addsexpect(onResponseId).toHaveBeenCalledTimes(1)for multi-envelope streams. this pins the fix for the past review about duplicateonResponseIdcalls.
451-487: error event and partial stream tests ensure robust fallback.
test/response-handler.test.ts:451-469verifies that error events beforeresponse.donereturn raw SSE and don't invokeonResponseId.
test/response-handler.test.ts:471-487verifies that partial streams (no terminal event) also return raw SSE.these pin the "only terminal events set
state.finalResponse" fix.
489-517: oversized index guard test prevents sparse array construction.
test/response-handler.test.ts:489-517sends events withoutput_index: 1000000and verifies they are ignored, preventing memory exhaustion attacks.assertions check that
output,output_text, andreasoning_summary_textare all undefined.
594-610: streaming error-stop test added forattachResponseIdCapture.
test/response-handler.test.ts:594-610verifies thatattachResponseIdCapturestops capturing response IDs after an SSE error event.this complements the non-streaming error test and ensures consistent behavior across both paths.
test/fetch-helpers.test.ts (1)
746-765: integration test for streaming response id capture.
test/fetch-helpers.test.ts:747-765validates thathandleSuccessResponse(response, true, { onResponseId })captures response IDs without rewriting the stream body.assertions check:
- stream body still contains
"resp_stream_123"(not rewritten)onResponseIdcalled with"resp_stream_123"onResponseIdcalled exactly once (dedup working)this pins the integration behavior for the response-handler changes.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/request/response-handler.ts`:
- Around line 522-536: Add a test in test/response-handler.test.ts that
exercises the branch in response-handler.ts where eventRecord.part is missing or
has a non-"output_text" type so capturePhase(state, part?.phase) is invoked;
construct an SSE stream (used by convertSseToJson) containing response.created,
response.output_item.added (to create the output index), then a
response.content_part.done with part missing or part.type set to "annotation"
and phase "final_answer", and finally response.done, then assert the converted
JSON body contains phase === "final_answer" to verify capturePhase was called.
- Around line 52-55: getStringField currently trims and rejects whitespace-only
strings which is fine for validation fields like 'type' and 'phase' but could
unintentionally drop meaningful whitespace for other uses; change getStringField
to accept an optional parameter (e.g., trim: boolean = true) and update its
behavior to only call trim() when trim is true, then update all callers (notably
the validation usages such as where 'type' is validated) to pass trim=true and
any textual callers to pass trim=false or omit if default is desired; also add a
short JSDoc above getStringField describing the trim behavior so future callers
know the intent.
In `@test/response-handler.test.ts`:
- Around line 543-571: Add a unit test that verifies delta events missing
output_index are ignored: create a new test (e.g., "ignores delta events with
missing output_index") that constructs an SSE string containing
response.created, response.output_text.delta (without output_index),
response.reasoning_summary_text.delta (without output_index), and response.done,
then call convertSseToJson with a Response and Headers and assert the returned
body.id equals the response id and that body.output_text and
body.reasoning_summary_text are undefined; this ensures the behavior around
getNumberField/makeOutputTextKey skipping null output_index values is covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d84ec947-0d19-44be-8fab-b2e23b33b528
📒 Files selected for processing (2)
lib/request/response-handler.tstest/response-handler.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/request/response-handler.ts
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/response-handler.test.ts
🔇 Additional comments (10)
lib/request/response-handler.ts (6)
128-139: o(n²) insyncPhaseTextremains — acceptable for typical streams.
syncPhaseTextrebuilds phase text by filtering allphaseTextSegmentson every append. this is already flagged in prior review as a nitpick. for pathological cases (many segments), consider lazy sync or incremental append if profiling shows hot paths.
15-26: state design looks good with proper deduplication.
seenResponseIdsat line 24 addresses the prior concern about duplicateonResponseIdcalls. theencounteredErrorflag at line 25 cleanly stops processing on SSE errors.test/response-handler.test.ts:502now assertstoHaveBeenCalledTimes(1).
76-87: content merge logic correctly preserves richer terminal content.lines 81-84 now treat empty
updateContentas non-authoritative, falling back tobase.content. this addresses the prior concern at lines 76-86 aboutresponse.output_item.addedframes withcontent: []erasing terminal content.test/response-handler.test.ts:254-277pins this behavior.
454-470: deduplication viaseenResponseIdscorrectly prevents double callbacks.
notifyResponseIdnow checksstate.seenResponseIds.has(responseId)at line 460 before invoking the callback, and adds the id to the set at line 461. this addresses the prior concern about duplicate side effects on windows filesystem and race conditions.
687-732: state reuse increateResponseIdCapturingStreamcorrectly deduplicates across envelopes.line 693 creates a single
ParsedResponseStateshared across the entire stream. lines 696 and 712 checkstate.encounteredErrorto short-circuit processing after an error event. this addresses the prior concern about recreating state per parsed line.
363-402: the canonical-text guard is working correctly; no action needed.tests confirm the intended behavior:
applyReasoningSummariesatlib/request/response-handler.ts:388-390correctly skips overwriting whennextPart.textis already a string (canonical terminal value). test attest/response-handler.test.ts:140-163validates that synthesized deltas don't overwrite canonical nested summary parts, and test attest/response-handler.test.ts:118-138confirms top-levelreasoning_summary_textremains canonical while nestedoutput[].summary[]can be synthesized. the guard is intentional and tested.test/response-handler.test.ts (4)
69-116: lgtm, comprehensive coverage for semantic SSE synthesis.this test exercises the full accumulation flow:
response.output_item.added,response.output_text.*,response.reasoning_summary_text.*, and phase tracking. assertions at lines 102-115 pin the synthesized fields (output_text,reasoning_summary_text,phase,final_answer_text,phase_text) and nested content/summary structures.
190-217: addresses past review concern forresponse.content_part.*coverage.this test exercises
lib/request/response-handler.ts:522-536by constructingresponse.content_part.addedandresponse.content_part.doneevents withtype: "output_text"payloads. assertions confirm thatsetOutputTextValueis invoked and that the final text from the.doneevent takes precedence over the.addeddelta.
505-541: solid error and early-termination coverage.tests at lines 505-523 and 525-541 verify that
onResponseIdis never called when anerrorevent arrives or when the stream ends without a terminal event. this aligns withlib/request/response-handler.ts:477-481settingstate.encounteredErrorandlib/request/response-handler.ts:595returningnull.
247-251: this review misidentifies the test expectations.the assertions at test/response-handler.test.ts:247-251 expect spaces to be preserved, not stripped. the test checks
'Hello world'(with space), which matches the test name "preserves whitespace-only semantic deltas when no done events override them."getDeltaField at lib/request/response-handler.ts:57-60 correctly preserves whitespace: it returns any non-empty string including space-only deltas via
value.length > 0(no trim). appendOutputTextValue then concatenates all non-null deltas. the code works as intended and the test will pass.> Likely an incorrect or invalid review comment.
129e40b to
6a273fe
Compare
All review threads are resolved and later commits addressed this stale automated change request.
Summary
output_text,reasoning_summary_text,commentary_text,final_answer_text, andphase_textStack
#310