Summary
Add Server-Sent Events (SSE) streaming support to the Chat Bot to display LLM responses in real-time as tokens are generated, rather than waiting for the complete response. This would significantly improve user experience, especially for longer responses.
Current Limitations
User Experience Issues
Long wait times with no feedback: Users must wait for the entire response to be generated before seeing any output
Polling overhead: The frontend polls every 2 seconds, adding an average of 1 second of extra latency to response delivery
Unnecessary server load: Constant polling creates repeated HTTP requests while tasks are running
Current Architecture
The Chat Bot currently uses a polling-based approach:
User sends message → TaskProcessing schedules task → Frontend polls every 2s →
Task completes → Result saved to DB → Next poll returns complete message
The 2-second polling interval is defined at src/components/ChattyLLM/ChattyLLMInputForm.vue:753
Why Streaming is Now Feasible
Addressing Previous Concerns
In #41, streaming was marked as "technically not realistic" due to concerns about model compatibility and Nextcloud platform limitations. However, I believe these concerns can be addressed:
1. Model Compatibility
Concern: "We won't be able to use many models anymore"
Reality: Virtually all modern LLM providers support streaming:
✅ OpenAI API (GPT-3.5, GPT-4, GPT-4o) - stream: true parameter
✅ Anthropic Claude - Server-Sent Events streaming
✅ Ollama (local models) - Streaming endpoint
✅ LocalAI - OpenAI-compatible streaming API
✅ Azure OpenAI - Streaming support
✅ Google Gemini - Streaming responses
✅ Hugging Face Inference API - Streaming generators
✅ Together AI, Groq, Perplexity - All support streaming
Providers that don't support streaming can gracefully fall back to the current polling approach.
2. Nextcloud Platform Limitations
Current constraint: The TaskProcessing framework is indeed incompatible with streaming:
ISynchronousProvider::process() is a blocking method that returns complete output
Task status is binary: "running" (HTTP 417) or "complete" (HTTP 200)
No support for partial/chunked output delivery
$reportProgress callback exists but is never utilized by any provider (see the paraphrased interface sketch below)
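For context, the synchronous provider contract looks roughly like the sketch below. This is paraphrased from memory of OCP\TaskProcessing\ISynchronousProvider and the exact signature may differ between Nextcloud versions; it is shown only to make the blocking nature concrete.

```php
<?php

// Paraphrased sketch, not a verbatim copy of OCP\TaskProcessing\ISynchronousProvider.
interface ISynchronousProvider {
    /**
     * Must return the complete output array in one go. The $reportProgress
     * callback only accepts a progress value, so there is no way to push
     * partial text chunks back to the requesting client through this path.
     */
    public function process(?string $userId, array $input, callable $reportProgress): array;
}
```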
Solution: Bypass TaskProcessing for streaming-capable providers by creating a dedicated streaming endpoint that communicates directly with LLM APIs.
Proposed Implementation
Architecture Overview
Create a parallel streaming path that coexists with the current TaskProcessing approach:
User sends message → Streaming endpoint validates the session → Provider API is called with streaming enabled → Chunks are forwarded to the browser as SSE events → Complete message is saved to DB
Implementation Steps
1. Backend: New Streaming Endpoint
File: lib/Controller/ChattyLLMController.php
Add a new method:

```php
/**
 * Stream LLM response in real-time using Server-Sent Events
 *
 * @param int $sessionId
 * @param int $messageId User's message ID
 * @return Http\StreamResponse
 */
#[NoAdminRequired]
public function streamGenerate(int $sessionId, int $messageId): Response {
    // 1. Validate session ownership
    // 2. Get conversation history
    // 3. Detect provider streaming capability
    // 4. If streaming supported:
    //    - Set headers: Content-Type: text/event-stream
    //    - Call provider API with streaming enabled
    //    - Yield chunks as Server-Sent Events
    //    - Save complete message to DB when done
    // 5. If streaming not supported:
    //    - Return error, frontend falls back to polling
}
```
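To make step 4 of the pseudocode above concrete, here is a minimal, framework-agnostic sketch of the SSE emission loop using plain PHP output functions. emitSseStream and $persistMessage are illustrative names rather than the app's actual API, and the chunk source is assumed to be the StreamingService generator proposed below.

```php
<?php

/**
 * Minimal SSE emission sketch (illustrative names, not the app's actual API).
 * $chunks is any iterable of string tokens, e.g. the generator proposed for
 * StreamingService::streamChatCompletion() below.
 */
function emitSseStream(iterable $chunks, callable $persistMessage): void {
    // SSE needs these headers and unbuffered output.
    header('Content-Type: text/event-stream');
    header('Cache-Control: no-cache');
    header('X-Accel-Buffering: no'); // avoid response buffering in nginx, if present

    $fullMessage = '';
    foreach ($chunks as $chunk) {
        $fullMessage .= $chunk;
        // Each SSE event is a "data:" line followed by a blank line.
        // A real implementation should encode chunks (e.g. as JSON) so that
        // newlines inside the text cannot break the SSE framing.
        echo 'data: ' . $chunk . "\n\n";
        @ob_flush();
        flush();
    }

    // End-of-stream marker, matching the '[DONE]' check in the frontend sketch.
    echo "data: [DONE]\n\n";
    @ob_flush();
    flush();

    // Persist the complete message once streaming has finished
    // (the "Save complete message to DB when done" step above).
    $persistMessage($fullMessage);
}
```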
2. Provider Integration Layer
New file: lib/Service/StreamingService.php
```php
class StreamingService {

    /**
     * Check if the configured provider supports streaming
     */
    public function providerSupportsStreaming(): bool;

    /**
     * Stream a chat completion from the provider.
     * Yields string chunks as they arrive.
     */
    public function streamChatCompletion(array $messages): \Generator;

    /**
     * Get provider-specific streaming configuration
     */
    private function getProviderConfig(): array;
}
```
This service would:
Read provider settings (already configured in Assistant settings)
Make direct HTTP requests to provider APIs with streaming enabled
Parse the streaming response format (SSE or JSON streaming); see the sketch below
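As a rough illustration of the parsing work involved, here is what streamChatCompletion() could look like against an OpenAI-compatible /v1/chat/completions endpoint using ext-curl. Because curl delivers the response body through a write callback, this sketch forwards deltas to a callable instead of yielding from a Generator as in the skeleton above; $baseUrl, $apiKey and $model are placeholders for values read from the Assistant's provider settings, not real option names.

```php
<?php

/**
 * Sketch: stream content deltas from an OpenAI-compatible chat completions endpoint.
 * $onChunk is invoked with each delta as soon as it arrives.
 */
function streamChatCompletion(
    string $baseUrl,
    string $apiKey,
    string $model,
    array $messages,
    callable $onChunk,
): void {
    $payload = json_encode([
        'model' => $model,
        'messages' => $messages,
        'stream' => true, // ask the provider for a streamed (SSE) response
    ]);

    $pending = ''; // buffer for a partially received SSE line

    $ch = curl_init($baseUrl . '/v1/chat/completions');
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $payload,
        CURLOPT_HTTPHEADER => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey,
        ],
        // Called repeatedly as response bytes arrive: parse complete SSE lines
        // and forward each content delta immediately.
        CURLOPT_WRITEFUNCTION => function ($ch, string $data) use (&$pending, $onChunk): int {
            $pending .= $data;
            while (($pos = strpos($pending, "\n")) !== false) {
                $line = trim(substr($pending, 0, $pos));
                $pending = substr($pending, $pos + 1);
                if (!str_starts_with($line, 'data: ')) {
                    continue;
                }
                $json = substr($line, strlen('data: '));
                if ($json === '[DONE]') {
                    continue; // end-of-stream marker
                }
                // Chunks look like: {"choices":[{"delta":{"content":"Hel"}}]}
                $decoded = json_decode($json, true);
                $delta = $decoded['choices'][0]['delta']['content'] ?? '';
                if ($delta !== '') {
                    $onChunk($delta);
                }
            }
            return strlen($data);
        },
    ]);
    curl_exec($ch);
    curl_close($ch);
}
```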
3. Frontend: EventSource Integration
File: src/components/ChattyLLM/ChattyLLMInputForm.vue
Replace polling with EventSource:

```js
// generateUrl comes from '@nextcloud/router' (already imported in this component)

// New method (replaces pollGenerationTask)
async streamMessageGeneration(sessionId, messageId) {
    const url = generateUrl('/apps/assistant/api/v1/chat/stream')
    const params = new URLSearchParams({ sessionId, messageId })
    const eventSource = new EventSource(`${url}?${params}`)
    let fullMessage = ''

    eventSource.onmessage = (event) => {
        if (event.data === '[DONE]') {
            eventSource.close()
            this.loadingMessage = false
            return
        }
        // Append chunk to the displayed message
        fullMessage += event.data
        this.updateStreamingMessage(fullMessage)
    }

    eventSource.onerror = (error) => {
        eventSource.close()
        // Fall back to polling if streaming fails
        // (a TaskProcessing task has to be scheduled first to obtain taskId)
        this.pollGenerationTask(taskId, sessionId)
    }
},

// New method: update the message display in real time
updateStreamingMessage(content) {
    // Find or create the placeholder message and update its content.
    // This provides real-time visual feedback as tokens arrive.
},
```

4. Route Configuration
File: appinfo/routes.php
Add new route:
['name' => 'chattyLLM#streamGenerate', 'url' => '/api/v1/chat/stream', 'verb' => 'GET'],

5. Graceful Fallback
The implementation should detect whether the configured provider supports streaming and fall back to the existing polling flow whenever streaming is unavailable or the SSE connection fails.

Configuration
No user action is required: the app should use streaming automatically when the configured provider supports it and keep using polling otherwise. This could be exposed as an optional toggle in admin settings.
Benefits
User Experience
Immediate feedback: tokens appear as they are generated instead of after a long wait, which matters most for long responses
Technical Benefits
Reduced server load: One SSE connection vs. polling requests every 2 seconds
Lower latency: No 0-2 second polling delay
More efficient: Less HTTP overhead, fewer database queries
Modern standard: SSE is well-supported in all modern browsers
Progressive enhancement: Works alongside existing system
Competitive Parity
All major AI chat interfaces use streaming:
ChatGPT web interface
Claude web interface
Google Gemini
Microsoft Copilot
Users expect this behavior from AI assistants.
Backward Compatibility
✅ Fully backward compatible:
Existing polling mechanism remains untouched
Non-streaming providers continue to work
Gradual rollout possible (enable per-provider)
No database schema changes required
No breaking API changes
Security Considerations
Same authentication/authorization as current chat endpoints
Session ownership validation
Rate limiting (inherit from existing chat endpoints)
Input sanitization (already handled)
SSE is one-way (server→client), no additional attack surface
Open Questions for Maintainers
Architecture approval: Is bypassing TaskProcessing acceptable for this use case, or would you prefer exploring a $reportProgress-based implementation in the TaskProcessing framework?
Provider integration: Where should the streaming support be implemented?
Feature flag: Should this be enabled by default, opt-in via an admin setting, or rolled out per provider?
Scope: Should streaming also be added beyond the Chat Bot to other Assistant features?
Testing: What providers should be tested in CI/CD?
Alternative Considered: WebSockets
WebSockets would also enable streaming, but they are bidirectional and heavier to deploy than this use case needs; since the data flow here is strictly server→client (see Security Considerations above), SSE over plain HTTP with the browser-native EventSource API is the simpler fit.
Request for Feedback
I'd love to hear thoughts from @julien-nc, @marcelklehr, and the Nextcloud community on the open questions above.
Related Issues
#41 (where streaming was previously discussed and marked as technically not realistic)
References
src/components/ChattyLLM/ChattyLLMInputForm.vue:716-760
lib/Controller/ChattyLLMController.php:680-724
lib/Listener/ChattyLLMTaskListener.php:68

I'm willing to implement this feature if there's interest from the maintainers. Please let me know your thoughts!