feat(core): Phase 2 — MOT-owned streaming chunking via stream_parsed_repr (placeholder) #1013

@planetf1

Description

Status

Placeholder issue — needs elaboration through discussion in the comments below.

Raised so that PR #942 has somewhere concrete to link its thread on the broader MOT-owned chunking direction, agent-friendly authoring patterns, and related Phase 2 work. The design summary below captures what is known from epic #891 and the PR #942 discussion. Specific implementation decisions are intentionally deferred to comments.

Context

mellea/stdlib/streaming.py (landing in PR #942, closing #901) provides Phase 1 streaming validation — a call-site ChunkingStrategy with three built-in chunkers (SentenceChunker, WordChunker, ParagraphChunker) and an orchestrator (stream_with_chunking). This is a scoped, pragmatic choice. Epic #891 names the longer-term direction:

The right long-term owner of chunking is the MOT itself, since it already owns parsed_repr and has the semantic knowledge to produce meaningful chunks for its specific type. A follow-on issue will cover adding stream_parsed_repr to MOT.

This is that follow-on issue.
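For orientation, the Phase 1 call-site contract centers on ChunkingStrategy.split(accumulated_text: str) -> list[str]. A minimal sketch of a sentence chunker under that contract — the class body and boundary rule below are illustrative, not mellea's actual SentenceChunker:

```python
import re

# Minimal sketch of the Phase 1 call-site contract described in this issue:
# ChunkingStrategy.split(accumulated_text: str) -> list[str]. The boundary
# rule is deliberately naive and is NOT mellea's real implementation.
class SentenceChunker:
    def split(self, accumulated_text: str) -> list[str]:
        # Naive rule: a sentence ends at ., !, or ? followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", accumulated_text) if s]

chunker = SentenceChunker()
print(chunker.split("First sentence. Second one! Still streaming"))
# → ['First sentence.', 'Second one!', 'Still streaming']
```

A per-chunk validator then runs over each element as the accumulated text grows — which is exactly the str-only coupling Phase 2 wants to move onto the MOT.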

Consolidated design summary

Motivation

Phase 1 collapses two semantic concerns onto the call site: how to chunk the stream (a property of the output type) and how to validate chunks (a property of the requirement). These concerns are independent and belong to different owners.

  • Type semantics. What counts as a complete chunk of this kind of output? JSON value, prose sentence, code statement, audio segment, image region. Invariant across requirements.
  • Constraint semantics. What makes a particular output acceptable? Max three sentences, matches schema X, no hallucinated entities. Invariant across outputs of the same type.

Under Phase 1, both are author-written at the call site. Under Phase 2, the type semantics move onto the MOT (via stream_parsed_repr), leaving the requirement author with only the stream_validate override to write.

Motivating output types (all in scope)

Phase 1 chunkers cover prose only (sentence/word/paragraph, all operating on accumulated_text: str). Phase 2 must support at least these output types, each with genuinely different chunk semantics:

  • Prose — sentence, word, paragraph boundaries. Already covered by Phase 1 chunkers; Phase 2 should subsume them.
  • Structured text — JSON values, YAML documents, code statements/blocks. Chunk boundary is "one complete parseable unit."
  • Multi-modal streams — audio (silence-delimited segments, fixed windows, VAD-detected utterances), image (region or tile boundaries), potentially video. Chunk boundary is inherently non-string.

Multi-modal support is a first-class motivation for this work, not deferred scope. Epic #891 explicitly names the audio case ("Audio that goes wrong in the first few seconds can't be caught until the full clip is done"), and the ChunkingStrategy.split(accumulated_text: str) -> list[str] signature in Phase 1 forecloses multi-modal chunking by design — that foreclosure is what this issue exists to address.
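The foreclosure can be seen concretely by contrasting the Phase 1 contract, which hard-codes str, with a generic one that leaves the chunk type free. The Protocol names and the audio chunker below are hypothetical sketches, not proposed API:

```python
from typing import Protocol, TypeVar

ChunkT = TypeVar("ChunkT")

# Hypothetical contrast, not proposed API: Phase 1 hard-codes str chunks,
# while a generic contract admits bytes (audio), arrays (image tiles), etc.
class TextChunking(Protocol):
    def split(self, accumulated_text: str) -> list[str]: ...

class GenericChunking(Protocol[ChunkT]):
    def split(self, accumulated: ChunkT) -> list[ChunkT]: ...

class SilenceChunker:
    """A GenericChunking[bytes] example: split raw audio on a zero-byte gap."""
    def split(self, accumulated: bytes) -> list[bytes]:
        return [seg for seg in accumulated.split(b"\x00\x00") if seg]

print(SilenceChunker().split(b"utt1\x00\x00utt2"))  # → [b'utt1', b'utt2']
```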

Proposed direction

Add stream_parsed_repr as an async method/generator on ModelOutputThunk — emitting typed, complete chunks as the stream progresses, where "complete" is defined by the MOT's own parsed-repr type. Each MOT subclass (prose, JSON, audio, image, code) provides its own implementation.
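A hedged sketch of that shape, using an invented NDJSON-style MOT whose "complete chunk" is one parseable JSON value per line. "NdjsonMOT" and its internals are hypothetical; the real ModelOutputThunk base class and its stream plumbing will differ (see the open questions below):

```python
import asyncio
import json

# Invented illustration of the proposed direction: a MOT subclass that owns
# its own chunk semantics via stream_parsed_repr. Hypothetical, not mellea API.
class NdjsonMOT:
    def __init__(self, token_stream):
        self._tokens = token_stream  # async iterator of raw text pieces

    async def stream_parsed_repr(self):
        """Yield each complete JSON value as soon as its line closes."""
        buf = ""
        async for tok in self._tokens:
            buf += tok
            while "\n" in buf:
                line, buf = buf.split("\n", 1)
                if line.strip():
                    yield json.loads(line)  # complete = one parseable unit

async def fake_tokens():
    for piece in ['{"a": ', '1}\n{"b"', ': 2}\n']:
        yield piece

async def demo():
    return [c async for c in NdjsonMOT(fake_tokens()).stream_parsed_repr()]

print(asyncio.run(demo()))  # → [{'a': 1}, {'b': 2}]
```

Note how the chunk boundary ("one parseable value") lives entirely in the type, not at the call site — a prose MOT or audio MOT would implement the same method with completely different internals.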

Consequences for Phase 1 APIs:

  • stream_with_chunking() gains an alternative mode that consumes mot.stream_parsed_repr() instead of applying an external ChunkingStrategy. Call-site interface stays the same.
  • External ChunkingStrategy implementations can be deprecated once sufficient MOT types exist.
  • Requirements that currently need internal state to track accumulated output can instead read from context, since the MOT will carry the partial parsed state.
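The first bullet could be sketched as a dual-mode orchestrator. Everything here — the stand-in MOT, the bare validate callback, the simplified Phase 1 branch — is hypothetical; the real stream_with_chunking signature (requirements, events, PartialValidationResult) is richer and validates incrementally:

```python
import asyncio

# Hypothetical dual-mode orchestrator sketch, not mellea's real signature.
async def stream_with_chunking(mot, validate_chunk, chunker=None):
    if chunker is None:
        # Phase 2 mode: the MOT owns chunk boundaries.
        async for chunk in mot.stream_parsed_repr():
            yield chunk, validate_chunk(chunk)
    else:
        # Phase 1 mode: external ChunkingStrategy over accumulated text
        # (simplified here; the real Phase 1 validates as the stream grows).
        accumulated = ""
        async for tok in mot.raw_stream():
            accumulated += tok
        for chunk in chunker.split(accumulated):
            yield chunk, validate_chunk(chunk)

class StubMOT:
    """Stand-in MOT whose stream_parsed_repr yields pre-chunked sentences."""
    async def stream_parsed_repr(self):
        for chunk in ("one.", "two!", "and a longer third."):
            yield chunk

async def demo():
    short = lambda c: len(c) <= 5  # toy per-chunk validator
    return [ok async for _, ok in stream_with_chunking(StubMOT(), short)]

print(asyncio.run(demo()))  # → [True, True, False]
```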

Everything else Phase 1 delivers — stream_validate, PartialValidationResult, the event types (#902), the orchestration logic — is unaffected.

Open design questions (for comments)

  1. Signature and generator shape. Is stream_parsed_repr an async generator on the MOT? Does it share a queue with the raw astream(), or run in parallel? For multi-modal, does the signature accept bytes / frames / tensors rather than str?
  2. Chunking boundary authority. Which component decides what a "complete chunk" is — the MOT's parser, a pluggable chunk-boundary predicate, or both? Answer likely differs between text and multi-modal.
  3. Backpressure. If parsed chunks are slower to produce than raw tokens/frames, where does the buffering live?
  4. Backwards compatibility. How to migrate from external ChunkingStrategy to MOT-native chunking without breaking Phase 1 call sites? Are both modes supported in parallel during a transition period?
  5. Typed output. Does stream_parsed_repr yield values of type S (the MOT's parsed_repr type parameter), or a richer container that carries partial-parse state? Multi-modal MOTs may need the latter.
  6. Error handling. What happens if the MOT's parser fails on a partial stream — surface immediately, wait for more data, or fall back to raw chunking?
  7. Testability. Each MOT type's stream_parsed_repr needs verification against its non-streaming parsed_repr. Shape of the shared test harness? Multi-modal test fixtures are their own problem.
  8. Agent authoring. If we want new MOT types to be agent-writable (see the discussion on PR #942), what contracts are needed on the MOT base class? Clear APIs and guidelines are the substrate; a skill can sit on top where the framework supports one.
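For question 5, one possible answer-shape is a small generic container rather than bare S values. Every name below is hypothetical — just one concrete starting point for the discussion, not proposed API:

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

S = TypeVar("S")  # the MOT's parsed_repr type parameter

# Hypothetical container for open question 5: carries a completed typed
# chunk plus whatever partial-parse state remains open, which multi-modal
# MOTs may need. Not proposed API.
@dataclass
class StreamedChunk(Generic[S]):
    value: Optional[S]      # the chunk that just completed, if any
    partial: object = None  # parser state for the still-open remainder
    is_final: bool = False  # True once the underlying stream has ended

chunk = StreamedChunk(value={"a": 1}, partial='{"b":')
print(chunk.value, chunk.is_final)  # → {'a': 1} False
```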

Broader scope (to discuss)

The PR #942 thread raised a parallel observation. Both stream_validate authoring and (future) stream_parsed_repr authoring have deterministic checks against a non-streaming counterpart (validate() and parsed_repr, respectively), which makes them plausible candidates for agent-friendly extension patterns — potentially skills in frameworks that support them. Worth discussing whether this issue should cover just stream_parsed_repr, also the broader agent-authoring story, or whether the latter warrants its own issue.

Labels: epic (High level Epic)