Skip to content

enhance responses parser for semantic SSE events#311

Open
ndycode wants to merge 14 commits intofeat/openai-parity-pr2from
feat/openai-parity-pr3
Open

enhance responses parser for semantic SSE events#311
ndycode wants to merge 14 commits intofeat/openai-parity-pr2from
feat/openai-parity-pr3

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 22, 2026

Summary

  • parse semantic Responses SSE events into richer synthesized JSON fields
  • surface output_text, reasoning_summary_text, commentary_text, final_answer_text, and phase_text
  • keep canonical terminal response content authoritative when semantic events disagree

Stack

  • base: #310

@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

📝 Walkthrough

walkthrough

refactored sse parsing in lib/request/response-handler.ts:1 to accumulate events in a mutable ParsedResponseState, synthesize missing fields at finalization, and defer returning until stream completion or error. tests added in test/response-handler.test.ts:1 and test/fetch-helpers.test.ts:1. missing regression tests, windows crlf handling, and concurrency risks flagged (see lib/request/response-handler.ts:1).

changes

Cohort / File(s) Summary
core sse parsing refactor
lib/request/response-handler.ts
introduced ParsedResponseState and stateful capture helpers. maybeCaptureResponseEvent now mutates shared state, parseSseStream and createResponseIdCapturingStream feed events into that state, and finalizeParsedResponse merges outputs and synthesizes output_text, reasoning_summary_text, and phase fields. logging and stream-error handling updated to reference generic terminal response events.
sse passthrough entitlement test
test/fetch-helpers.test.ts
added test that sends response.created and response.done SSE for same id; asserts original stream body is preserved and onResponseId is invoked once.
extensive semantic sse tests
test/response-handler.test.ts
added many convertSseToJson scenarios covering delta accumulation, terminal canonical overrides, stale-delta clearing, phase/commentary/final-answer mapping, whitespace-only deltas, early termination/error handling, oversized-index bounds, and response-id capture behavior.

sequence diagram(s)

sequenceDiagram
    participant SSE as SSE Event Stream
    participant Parser as parseSseStream()
    participant State as ParsedResponseState
    participant Finalizer as finalizeParsedResponse()
    participant Consumer as Response Consumer

    SSE->>Parser: open stream
    activate Parser
    Parser->>State: init fresh state
    activate State

    loop for each sse event
        SSE->>Parser: sse event chunk
        Parser->>State: maybeCaptureResponseEvent(event)
        Note right of State: mutate outputs, deltas, summaries, phases, ids, errors
    end

    SSE->>Parser: stream end or error
    Parser->>Finalizer: finalizeParsedResponse(state)
    deactivate State
    activate Finalizer

    Finalizer->>Finalizer: merge outputs and summaries<br/>synthesize missing text fields<br/>set phase/commentary/final_answer
    Finalizer->>Consumer: return finalized response or raw sse on error
    deactivate Finalizer
    deactivate Parser
Loading

estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning title follows conventional commits format with type 'enhance' (non-standard) and concise summary under 72 chars, but 'enhance' is not an approved type; should use 'feat' or 'refactor'. change title to 'refactor: enhance parser for semantic SSE events' or 'feat: add semantic SSE event parsing' to use approved conventional commit types.
Docstring Coverage ⚠️ Warning Docstring coverage is 5.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning pull request description lacks required sections from template. missing validation checklist, docs/governance items, and risk/rollback guidance. add complete validation checklist (lint, typecheck, test, build), confirm all docs sections reviewed, and document risk level with explicit rollback plan.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/openai-parity-pr3
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/openai-parity-pr3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ndycode ndycode added the needs-human-review Flagged by automated PR quality screening; maintainer review required. label Mar 22, 2026
coderabbitai[bot]
coderabbitai bot previously requested changes Mar 22, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/request/response-handler.ts`:
- Around line 14-21: The handler currently calls onResponseId for every envelope
carrying data.response because ParsedResponseState is recreated per parsed line
and callbacks are fired in processSemanticEvent/passThrough parsing; fix this by
elevating a seen-response-id set to the long-lived response stream scope (e.g.,
add a Set<string> seenResponseIds on the ResponseHandler or the outer
ParsedResponseState) and only call onResponseId(responseId) the first time an id
is observed; ensure functions like processSemanticEvent / handleEnvelope / the
passthrough parser reuse the same ParsedResponseState instance rather than
recreating it per line so dedupe works across envelopes, and update relevant
tests (fetch-helpers.test.ts and response-handler.test.ts) to assert
onResponseId is called once for multi-envelope semantic events.
- Around line 53-59: mergeRecord currently treats any update.content (including
empty arrays from semantic response.output_item.added frames) as authoritative
and can wipe richer terminal output when merged back into response.output;
change mergeRecord to only replace content when update.content is non-null and
has length > 0 (i.e., treat empty arrays as non-authoritative), using
cloneContentArray(update.content) when replacing and otherwise cloning
base.content, and ensure the same logic applies to the merge call that folds
items into response.output (the code around the response.output merging at lines
~213-218) so non-text or final answer parts are preserved. Reference:
mergeRecord, cloneContentArray and the response.output merging code path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a388a9c2-129e-4f16-9eb2-2356b87f8576

📥 Commits

Reviewing files that changed from the base of the PR and between 173d64f and 9968ba8.

📒 Files selected for processing (3)
  • lib/request/response-handler.ts
  • test/fetch-helpers.test.ts
  • test/response-handler.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/fetch-helpers.test.ts
  • test/response-handler.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/request/response-handler.ts

@ndycode
Copy link
Owner Author

ndycode commented Mar 22, 2026

@coderabbitai review
@greptile @cubic review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

@ndycode, kicking off the review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ndycode
Copy link
Owner Author

ndycode commented Mar 22, 2026

@coderabbitai review
@greptile @cubic review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

@ndycode, triggering a review of PR #311 now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai[bot]
coderabbitai bot previously requested changes Mar 22, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/request/response-handler.ts`:
- Around line 120-139: syncPhaseText currently rebuilds a phase's full text by
scanning all phaseTextSegments on every append, causing O(n²) in pathological
cases; change the update strategy so appendPhaseTextSegment updates phaseText
incrementally instead of calling syncPhaseText to re-filter the whole map: when
adding a segment to phaseTextSegments (identify segments by the `${phase}\u0000`
prefix), append the new segment value to the existing state.phaseText.get(phase)
(or initialize it) and only fall back to a full rebuild in rare cases (e.g.,
segment removal or reordering). Update functions/places that call syncPhaseText
(including appendPhaseTextSegment) to use this incremental update and keep
syncPhaseText as a fallback or rename it to rebuildPhaseText for clarity.
- Around line 519-533: Add regression tests that exercise the
response.content_part.added and response.content_part.done branches in
response-handler.ts: construct events where eventRecord.part is a mutable record
with type "output_text", a phase, and content_index, then invoke the same
event-processing pathway used by existing tests and assert that
setOutputTextValue updates state (or is called) with the expected outputIndex,
content_index and text from getPartText; also add tests for the branch where
part is missing or not "output_text" to ensure capturePhase is invoked. Target
the test file test/response-handler.test.ts and reference the handler logic
around setOutputTextValue, getPartText, and capturePhase so the new tests cover
both success and validation-failure paths for response.content_part.added and
response.content_part.done.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 84fe93be-b96f-4a5e-a64a-7ebf655dbad5

📥 Commits

Reviewing files that changed from the base of the PR and between 9968ba8 and 3241406.

📒 Files selected for processing (3)
  • lib/request/response-handler.ts
  • test/fetch-helpers.test.ts
  • test/response-handler.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/fetch-helpers.test.ts
  • test/response-handler.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/request/response-handler.ts
🔇 Additional comments (16)
lib/request/response-handler.ts (8)

52-60: semantic distinction between getStringField and getDeltaField is correct but subtle.

getStringField (line 52-55) rejects whitespace-only strings via .trim().length > 0, while getDeltaField (line 57-60) preserves them. this correctly handles whitespace deltas vs canonical terminal text.

covered by test/response-handler.test.ts:165-198 (whitespace delta preservation test).


62-69: index guard against sparse array construction looks good.

MAX_SYNTHESIZED_EVENT_INDEX = 255 (line 11) bounds all synthesized indices. this prevents memory exhaustion from malicious payloads like output_index: 1000000.

covered by test/response-handler.test.ts:489-517 (oversized indices test).


76-87: content merge now preserves richer terminal output.

mergeRecord (line 76-87) only replaces base content when updateContent.length > 0, addressing the past review about empty content: [] frames erasing terminal content.

covered by test/response-handler.test.ts:200-223 (richer terminal output test).


227-240: setReasoningSummaryValue clears stale deltas on null text.

pr summary notes asymmetric behavior (line 235-237 deletes vs append which bails on null), but this is intentional: *.done events with null/empty text clear accumulated deltas, while append on null is a no-op.

covered by test/response-handler.test.ts:276-296 (stale reasoning delta clearing test).


299-307: canonical terminal content overrides accumulated deltas correctly.

applyAccumulatedOutputText (line 299-307) skips writing synthesized text when part.text is already a string (canonical terminal content present). it also syncs phase text from the canonical value.

covered by test/response-handler.test.ts:225-248 (canonical terminal content test).


401-441: finalizeParsedResponse correctly aborts on error.

line 404 returns null when state.encounteredError is true, ensuring error streams don't produce synthesized JSON.

covered by test/response-handler.test.ts:451-469 (error event before done test).


451-467: response id deduplication via seenResponseIds addresses past review.

notifyResponseId (line 457-458) checks and adds to state.seenResponseIds before invoking callback. this ensures onResponseId fires once per stream, not per envelope.

covered by test/response-handler.test.ts:433-449 and test/fetch-helpers.test.ts:747-765.


684-729: streaming capture now shares state across chunks.

createResponseIdCapturingStream (line 690) creates a single ParsedResponseState instance reused across all transform calls, addressing the past review about per-chunk state recreation.

also correctly stops processing on encounteredError (line 693, 709).

test/response-handler.test.ts (7)

69-116: comprehensive semantic synthesis test.

test/response-handler.test.ts:69-116 validates end-to-end synthesis of output_text, reasoning_summary_text, phase, final_answer_text, and phase_text from semantic SSE events.

assertions cover both top-level synthesized fields and nested output[].content[] / output[].summary[] structure.


165-198: whitespace delta preservation test addresses regression.

test/response-handler.test.ts:165-198 verifies that whitespace-only deltas (e.g., " ") are accumulated when no *.done event overrides them.

this pins the fix for the whitespace truncation bug mentioned in pr summary.


250-296: stale delta clearing tests pin clearing behavior.

test/response-handler.test.ts:250-274 and test/response-handler.test.ts:276-296 verify that *.done events with whitespace-only text clear accumulated deltas.

note: the test at line 255 uses "text":" " (single space) which getStringField treats as empty due to .trim().length > 0 check. this is correct per implementation.


433-449: response id dedup assertion added.

test/response-handler.test.ts:448 adds expect(onResponseId).toHaveBeenCalledTimes(1) for multi-envelope streams. this pins the fix for the past review about duplicate onResponseId calls.


451-487: error event and partial stream tests ensure robust fallback.

test/response-handler.test.ts:451-469 verifies that error events before response.done return raw SSE and don't invoke onResponseId.

test/response-handler.test.ts:471-487 verifies that partial streams (no terminal event) also return raw SSE.

these pin the "only terminal events set state.finalResponse" fix.


489-517: oversized index guard test prevents sparse array construction.

test/response-handler.test.ts:489-517 sends events with output_index: 1000000 and verifies they are ignored, preventing memory exhaustion attacks.

assertions check that output, output_text, and reasoning_summary_text are all undefined.


594-610: streaming error-stop test added for attachResponseIdCapture.

test/response-handler.test.ts:594-610 verifies that attachResponseIdCapture stops capturing response IDs after an SSE error event.

this complements the non-streaming error test and ensures consistent behavior across both paths.

test/fetch-helpers.test.ts (1)

746-765: integration test for streaming response id capture.

test/fetch-helpers.test.ts:747-765 validates that handleSuccessResponse(response, true, { onResponseId }) captures response IDs without rewriting the stream body.

assertions check:

  • stream body still contains "resp_stream_123" (not rewritten)
  • onResponseId called with "resp_stream_123"
  • onResponseId called exactly once (dedup working)

this pins the integration behavior for the response-handler changes.

coderabbitai[bot]
coderabbitai bot previously requested changes Mar 22, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/request/response-handler.ts`:
- Around line 522-536: Add a test in test/response-handler.test.ts that
exercises the branch in response-handler.ts where eventRecord.part is missing or
has a non-"output_text" type so capturePhase(state, part?.phase) is invoked;
construct an SSE stream (used by convertSseToJson) containing response.created,
response.output_item.added (to create the output index), then a
response.content_part.done with part missing or part.type set to "annotation"
and phase "final_answer", and finally response.done, then assert the converted
JSON body contains phase === "final_answer" to verify capturePhase was called.
- Around line 52-55: getStringField currently trims and rejects whitespace-only
strings which is fine for validation fields like 'type' and 'phase' but could
unintentionally drop meaningful whitespace for other uses; change getStringField
to accept an optional parameter (e.g., trim: boolean = true) and update its
behavior to only call trim() when trim is true, then update all callers (notably
the validation usages such as where 'type' is validated) to pass trim=true and
any textual callers to pass trim=false or omit if default is desired; also add a
short JSDoc above getStringField describing the trim behavior so future callers
know the intent.

In `@test/response-handler.test.ts`:
- Around line 543-571: Add a unit test that verifies delta events missing
output_index are ignored: create a new test (e.g., "ignores delta events with
missing output_index") that constructs an SSE string containing
response.created, response.output_text.delta (without output_index),
response.reasoning_summary_text.delta (without output_index), and response.done,
then call convertSseToJson with a Response and Headers and assert the returned
body.id equals the response id and that body.output_text and
body.reasoning_summary_text are undefined; this ensures the behavior around
getNumberField/makeOutputTextKey skipping null output_index values is covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d84ec947-0d19-44be-8fab-b2e23b33b528

📥 Commits

Reviewing files that changed from the base of the PR and between 3241406 and 8bdcddf.

📒 Files selected for processing (2)
  • lib/request/response-handler.ts
  • test/response-handler.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/request/response-handler.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/response-handler.test.ts
🔇 Additional comments (10)
lib/request/response-handler.ts (6)

128-139: o(n²) in syncPhaseText remains — acceptable for typical streams.

syncPhaseText rebuilds phase text by filtering all phaseTextSegments on every append. this is already flagged in prior review as a nitpick. for pathological cases (many segments), consider lazy sync or incremental append if profiling shows hot paths.


15-26: state design looks good with proper deduplication.

seenResponseIds at line 24 addresses the prior concern about duplicate onResponseId calls. the encounteredError flag at line 25 cleanly stops processing on SSE errors. test/response-handler.test.ts:502 now asserts toHaveBeenCalledTimes(1).


76-87: content merge logic correctly preserves richer terminal content.

lines 81-84 now treat empty updateContent as non-authoritative, falling back to base.content. this addresses the prior concern at lines 76-86 about response.output_item.added frames with content: [] erasing terminal content. test/response-handler.test.ts:254-277 pins this behavior.


454-470: deduplication via seenResponseIds correctly prevents double callbacks.

notifyResponseId now checks state.seenResponseIds.has(responseId) at line 460 before invoking the callback, and adds the id to the set at line 461. this addresses the prior concern about duplicate side effects on windows filesystem and race conditions.


687-732: state reuse in createResponseIdCapturingStream correctly deduplicates across envelopes.

line 693 creates a single ParsedResponseState shared across the entire stream. lines 696 and 712 check state.encounteredError to short-circuit processing after an error event. this addresses the prior concern about recreating state per parsed line.


363-402: the canonical-text guard is working correctly; no action needed.

tests confirm the intended behavior: applyReasoningSummaries at lib/request/response-handler.ts:388-390 correctly skips overwriting when nextPart.text is already a string (canonical terminal value). test at test/response-handler.test.ts:140-163 validates that synthesized deltas don't overwrite canonical nested summary parts, and test at test/response-handler.test.ts:118-138 confirms top-level reasoning_summary_text remains canonical while nested output[].summary[] can be synthesized. the guard is intentional and tested.

test/response-handler.test.ts (4)

69-116: lgtm, comprehensive coverage for semantic SSE synthesis.

this test exercises the full accumulation flow: response.output_item.added, response.output_text.*, response.reasoning_summary_text.*, and phase tracking. assertions at lines 102-115 pin the synthesized fields (output_text, reasoning_summary_text, phase, final_answer_text, phase_text) and nested content/summary structures.


190-217: addresses past review concern for response.content_part.* coverage.

this test exercises lib/request/response-handler.ts:522-536 by constructing response.content_part.added and response.content_part.done events with type: "output_text" payloads. assertions confirm that setOutputTextValue is invoked and that the final text from the .done event takes precedence over the .added delta.


505-541: solid error and early-termination coverage.

tests at lines 505-523 and 525-541 verify that onResponseId is never called when an error event arrives or when the stream ends without a terminal event. this aligns with lib/request/response-handler.ts:477-481 setting state.encounteredError and lib/request/response-handler.ts:595 returning null.


247-251: this review misidentifies the test expectations.

the assertions at test/response-handler.test.ts:247-251 expect spaces to be preserved, not stripped. the test checks 'Hello world' (with space), which matches the test name "preserves whitespace-only semantic deltas when no done events override them."

getDeltaField at lib/request/response-handler.ts:57-60 correctly preserves whitespace: it returns any non-empty string including space-only deltas via value.length > 0 (no trim). appendOutputTextValue then concatenates all non-null deltas. the code works as intended and the test will pass.

			> Likely an incorrect or invalid review comment.

@ndycode ndycode force-pushed the feat/openai-parity-pr3 branch from 129e40b to 6a273fe Compare March 22, 2026 13:43
@ndycode ndycode removed the needs-human-review Flagged by automated PR quality screening; maintainer review required. label Mar 22, 2026
@ndycode ndycode dismissed stale reviews from coderabbitai[bot], coderabbitai[bot], and coderabbitai[bot] March 22, 2026 15:50

All review threads are resolved and later commits addressed this stale automated change request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants