## Problem
The query method waits for the complete response before returning anything to the caller. For long-form answers this introduces noticeable latency — the caller receives nothing until generation is fully complete. Every major AI SDK (Anthropic, OpenAI, Gemini) exposes streaming as a first-class feature.
## Proposed Behaviour
Add a `query_stream` method that yields response chunks as they arrive rather than waiting for the full response.

```python
async for chunk in client.query_stream("What is Python?"):
    print(chunk, end="", flush=True)
```
- Existing `query()` is unchanged
- `query_stream()` returns an async generator
- Citations and metadata are returned at the end of the stream
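The behaviour above could be sketched roughly as follows. This is a minimal illustration only: the `QueryStreamChunk` fields, the `_raw_stream` helper, and the final-chunk convention are all assumptions, not the actual `brainus_ai` API.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class QueryStreamChunk:
    """Hypothetical chunk model; field names are assumptions."""
    text: str = ""
    is_final: bool = False
    citations: list = field(default_factory=list)


class Client:
    """Illustrative client showing only the query_stream shape."""

    async def _raw_stream(self, prompt: str):
        # Stand-in for the real provider stream (Anthropic, OpenAI, ...).
        for piece in ["Python is ", "a programming ", "language."]:
            await asyncio.sleep(0)
            yield piece

    async def query_stream(self, prompt: str):
        # Yield text chunks as they arrive from the provider...
        async for piece in self._raw_stream(prompt):
            yield QueryStreamChunk(text=piece)
        # ...then one final chunk carrying citations and metadata,
        # matching the "returned at end of stream" behaviour above.
        yield QueryStreamChunk(is_final=True, citations=["doc-1"])


async def main():
    parts = []
    async for chunk in Client().query_stream("What is Python?"):
        if chunk.is_final:
            return "".join(parts), chunk.citations
        parts.append(chunk.text)


text, cites = asyncio.run(main())
```

Keeping `query_stream` as a plain async generator (rather than a callback API) lets callers use `async for` directly, mirroring the example above, while the final chunk gives a natural place to attach citations without changing the chunk type mid-stream.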
## Files to Modify
| File | Change |
| --- | --- |
| `src/brainus_ai/client.py` | Add `query_stream` async generator method |
| `src/brainus_ai/models.py` | Add streaming chunk model |
| `src/brainus_ai/__init__.py` | Export new types |
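For the `models.py` change, one possible shape for the streaming chunk model is sketched below; the class name and fields are assumptions, not the project's actual model definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryStreamChunk:
    """Hypothetical streaming chunk model for models.py.

    text:      incremental response text (empty on the final chunk)
    is_final:  True for the last chunk of the stream
    citations: populated only on the final chunk
    metadata:  populated only on the final chunk
    """
    text: str = ""
    is_final: bool = False
    citations: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


# In src/brainus_ai/__init__.py the new type would then be re-exported,
# e.g.:
#   from .models import QueryStreamChunk

chunk = QueryStreamChunk(text="hello")
</space>```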
## Acceptance Criteria