Skip to content

fix: parse Bedrock event-stream framing for non-Claude models#252

Open
safayavatsal wants to merge 2 commits into
highflame-ai:mainfrom
safayavatsal:fix/bedrock-event-stream-parser
Open

fix: parse Bedrock event-stream framing for non-Claude models#252
safayavatsal wants to merge 2 commits into
highflame-ai:mainfrom
safayavatsal:fix/bedrock-event-stream-parser

Conversation

@safayavatsal
Copy link
Copy Markdown

Summary

  • Replaces the substring-scanning streaming parser in route_service.py with proper AWS event-stream parsing via botocore.eventstream.EventStreamBuffer. Fixes silent empty streams on Bedrock models whose chunks don't carry delta.text (Llama-3 emits generation, Cohere emits top-level text).
  • Detects wire format from Content-Type instead of payload sniffing; unifies sync/async paths; removes dead is_bedrock parameter and stray debug print()s.
  • On a missed JSONPath, logs one structured WARNING naming the path that missed and the payload keys observed — turns silent empty streams into actionable diagnostics.

Closes #146, Closes #147.

Backend dependency

The SDK parser is now correct, but each Bedrock route still needs the right ModelSpec.stream_response_path for its model family. Tracked separately in #251 — please audit existing routes before relying on this fix end-to-end.

What changed

  • highflame/services/route_service.py — new parser internals (_detect_stream_format, _iter_event_stream_payloads, _iter_sse_payloads, _stream_text_sync, _stream_text_async). Public method signatures unchanged.
  • pyproject.toml — adds botocore ^1.34.0 as a runtime dep; adds [tool.pytest.ini_options] (asyncio_mode = strict, testpaths = [\"tests\"]).
  • .flake8extend-ignore = E203 (black/flake8 compatibility; recommended by black docs).
  • README.md — new "Streaming" section with per-model stream_response_path table.
  • tests/ — new top-level directory (the repo previously lacked one). Includes a deterministic event-stream fixture builder so regression tests run without live Bedrock access.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • This change requires a documentation update (done in this PR)

Testing

  • Unit tests added — 11 tests in tests/streaming/test_event_stream_parser.py covering Llama-3, Cohere, Claude (regression), and OpenAI SSE; sync + async; missing-path warning; invalid-path resilience; sync/async parity.
  • Local run: pytest tests/ → 11 passed. black --check clean. flake8 clean.
  • Manual smoke against a live Bedrock route — needs reviewer with gateway access; fixtures validated against real botocore.eventstream so spec compliance is verified, but live behavior is not.

Impact / Risks

  1. botocore as a direct runtime dep (~5MB). Most users already have it transitively. Vendoring a ~150-line parser is the alternative if this is undesirable.
  2. Behavior change for callers relying on the lenient parser. Anyone whose stream "worked" only because the old string-scan happened to match a malformed frame may now get the structured warning instead. No silent regressions — the warning makes the cause explicit.
  3. Without live repro, "fixes meta.llama3-8b-instruct-v1:0, Interaction type: invoke-with-response-stream #147" is verified against the AWS event-stream spec, not against production Bedrock. Fixtures were generated synthetically and round-trip through real botocore.eventstream.

The previous streaming parser detected event-stream frames by
substring-scanning decoded lines for "message-type" and "bytes", then
extracted text via a fixed `delta.text` JSONPath. This silently yielded
nothing for Bedrock models whose chunks don't carry `delta.text`:
Llama-3 emits `generation`, Cohere emits top-level `text`, etc.

Replace with proper AWS event-stream parsing via
botocore.eventstream.EventStreamBuffer, fed from response.iter_bytes()
/ aiter_bytes(). Detect wire format from Content-Type
(application/vnd.amazon.eventstream vs text/event-stream) instead of
sniffing payload text. Unify sync and async paths through shared
helpers; remove the dead is_bedrock parameter and debug print()s.

On a missed JSONPath, log a single structured WARNING with the
payload keys observed so the next misconfiguration is debuggable
instead of producing a silent empty stream.

Adds botocore as a direct runtime dep, a top-level tests/ directory
(the repo previously lacked one), and a deterministic event-stream
fixture builder so regression tests run without live Bedrock access.

Closes highflame-ai#146, Closes highflame-ai#147. Backend audit tracked in highflame-ai#251.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the streaming response handling to support both AWS Bedrock event-stream and standard SSE formats, utilizing JSONPath for flexible text extraction across different model providers. It introduces robust parsing logic using botocore.eventstream, replaces legacy substring-based extraction methods, and adds comprehensive regression tests with synthetic fixtures. Review feedback identifies opportunities to improve error handling during base64 decoding and suggests refining the extraction logic to correctly handle empty string chunks, thereby avoiding false-positive warnings when a JSONPath match is found but contains no text.

Comment thread highflame/services/route_service.py Outdated
Comment thread highflame/services/route_service.py Outdated
Comment thread highflame/services/route_service.py
Comment thread highflame/services/route_service.py
Comment thread highflame/services/route_service.py
Two changes, addressing 5 inline review comments:

1. Add TypeError to the base64.b64decode catch tuple (line 241).
   b64decode raises TypeError on non-string/non-bytes inputs (e.g. None,
   numbers). Including it prevents the parser from crashing on malformed
   event-stream payloads.

2. Distinguish path-miss from empty-string-match (lines 247-272, plus four
   call sites). _extract_text now returns:
     - None       when the JSONPath misses, errors, or matches a non-string
                  value (debug-logged so misconfiguration is still visible)
     - ""         when the path matches an empty string (legitimate keep-alive
                  chunk; e.g. OpenAI streaming sometimes emits delta.content
                  with an empty string before the real tokens arrive)
     - the value  when the path matches a non-empty string

   Callers changed from `if text:` to `if text is None: warn elif text:
   yield`. Empty chunks no longer trigger the "did not match" WARNING.

   Diverged from the bot's literal suggestion (`return str(val) if val is
   not None else ""`) because str()-coercion would silently stringify dicts,
   lists, and numbers, masking misconfigured stream_response_path values.

Tests:
  - Added test_empty_content_chunk_does_not_warn: SSE stream with one
    empty `delta.content: ""` chunk yields no warning and produces only
    the non-empty chunks.
  - Added test_extract_text_distinguishes_miss_from_empty: unit-level
    coverage for the None / "" / "value" / non-string / null cases.
  - Added openai_sse_with_empty.txt fixture for the SSE test.

13 tests pass (was 11). Lint clean.
@safayavatsal
Copy link
Copy Markdown
Author

Addressed all 5 review comments in e614b32:

Comment 1 (line 241, TypeError in b64decode catch): Applied. TypeError added to the exception tuple alongside binascii.Error, json.JSONDecodeError, UnicodeDecodeError.

Comments 2-5 (empty-string vs path-miss): Applied with one deliberate divergence. _extract_text now returns:

  • None when the path misses, errors, or matches a non-string value (debug-logged)
  • "" when the path matches a literal empty string (legitimate keep-alive chunk)
  • the value when the path matches a non-empty string

The four call sites changed from if text: to if text is None: warn elif text: yield. Empty chunks no longer trigger the "did not match" warning, which is the bot's core ask.

Divergence from the literal suggestion: the bot proposed return str(val) if val is not None else "". I didn't take the str(val) coercion because it would silently stringify dicts, lists, and numbers — masking misconfigured stream_response_path values that point at non-text nodes. Instead the new code keeps strict type discipline: non-string match returns None with a debug log, so a misconfiguration still surfaces in the once-per-stream WARNING.

Tests: added test_empty_content_chunk_does_not_warn (SSE stream with delta.content: "" in the middle: no warning fires, only non-empty chunks yield) and test_extract_text_distinguishes_miss_from_empty (unit coverage for None / "" / "value" / non-string / null). 13 tests pass (was 11). Lint clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

meta.llama3-8b-instruct-v1:0, Interaction type: invoke-with-response-stream cohere.command-text-v14, Interaction type: invoke-with-response-stream

1 participant