WebUI media display enhancements: audio + video#1067

Open
nickdwhite wants to merge 2 commits into agent0ai:main from nickdwhite:feature-branch

@nickdwhite

Summary

This PR introduces inline audio and video playback capabilities to Agent Zero's WebUI chat interface, enabling users to play media files directly within chat messages without downloading or opening external applications.

Problem Statement

Currently, Agent Zero supports file attachments in chat messages, but audio and video files are displayed as plain text links or generic file icons. Users must:

  • Download media files to their local machine
  • Open external media players
  • Switch contexts away from the chat interface

This creates friction in workflows where media content (voice messages, video explanations, audio recordings) is frequently shared between users and agents.

Solution

This PR implements native HTML5 <audio> and <video> players that render directly in chat messages when media content is detected. The solution includes the following:

Features

  • Audio: Inline audio playback - MP3, WAV, OGG, AAC, FLAC, M4A support
  • Video: Inline video playback - MP4, WebM, OGV support with native controls
  • Theme: Theme-aware styling - Automatic adaptation to light/dark mode
  • Mobile: Responsive design - Works on desktop and mobile devices
  • Security: Secure serving - Media served through authenticated API endpoint
  • Performance: Range request support - Efficient streaming for large files

Protocol Support

The implementation recognizes custom URL schemes:

  • audio://path/to/file.mp3 - Renders audio player
  • video://path/to/file.mp4 - Renders video player
  • Standard file:// and http(s):// URLs with media extensions are also detected
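
The detection rule above can be sketched in a few lines. This is an illustrative Python sketch, not the code in webui/js/messages.js: the helper name `classify_media_url` is hypothetical, and the extension sets are copied from the Features list in this description (`.mov` appears only in the later "For Users" list, so it is left out here).

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Extension sets copied from the Features list; an assumption about the
# exact sets the frontend checks.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".ogg", ".aac", ".flac", ".m4a"}
VIDEO_EXTENSIONS = {".mp4", ".webm", ".ogv"}


def classify_media_url(url: str) -> Optional[str]:
    """Return "audio", "video", or None for a URL found in a message."""
    parsed = urlparse(url)
    scheme = parsed.scheme.lower()
    # The custom schemes name the player type explicitly.
    if scheme in ("audio", "video"):
        return scheme
    # For standard schemes, fall back to the file extension of the path
    # (the query string is not part of parsed.path).
    if scheme in ("file", "http", "https"):
        ext = posixpath.splitext(parsed.path)[1].lower()
        if ext in AUDIO_EXTENSIONS:
            return "audio"
        if ext in VIDEO_EXTENSIONS:
            return "video"
    return None
```

Anything that returns None would stay a plain link, which matches the "non-media files remain as links" item in the testing checklist.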

Implementation Details

Backend Changes

python/api/media_get.py (New file)

  • RESTful API endpoint: GET /api/media/get
  • Serves media files with proper MIME type detection
  • Supports HTTP Range requests for streaming (206 Partial Content)
  • Security: Validates file paths to prevent directory traversal
  • Caching headers for optimal performance

# Example API usage
curl -H "Authorization: Bearer $TOKEN" \
     "http://localhost:5000/api/media/get?path=/path/to/audio.mp3"

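Range handling is the least obvious part of the endpoint, so here is a minimal sketch of how a single `Range: bytes=...` header could be resolved before answering with 206 Partial Content. The function names `parse_byte_range` and `content_range` are hypothetical and not taken from python/api/media_get.py; multi-range requests are deliberately not handled.

```python
import re
from typing import Optional, Tuple

# Matches "bytes=START-END", "bytes=START-" and "bytes=-SUFFIX".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Resolve a single-range header to inclusive (start, end) offsets.

    Returns None when the header is malformed or cannot be satisfied;
    a server would typically answer 416 in the unsatisfiable case.
    """
    match = _RANGE_RE.match(header.strip())
    if match is None or file_size <= 0:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the file.
        length = int(last)
        if length == 0:
            return None
        start = max(file_size - length, 0)
        end = file_size - 1
    else:
        start = int(first)
        # An open-ended or oversized range is clamped to the last byte.
        end = min(int(last), file_size - 1) if last else file_size - 1
    if start > end:
        return None
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    """Build the Content-Range header value for a 206 response."""
    return f"bytes {start}-{end}/{file_size}"
```

With curl, the same behaviour can be exercised by adding `-H "Range: bytes=0-1023"` to the request above and checking for a 206 status.
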
Frontend Changes

webui/js/messages.js

  • Auto-detects audio/video URLs in message content
  • Converts URLs to appropriate HTML5 media elements
  • Handles protocol conversion (audio:// -> /api/media/get?path=)
  • Maintains backward compatibility with existing message rendering
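
The protocol conversion step can be illustrated as follows. The real logic lives in JavaScript in webui/js/messages.js; this Python sketch only shows the mapping, and the helper names `to_media_endpoint_url` and `render_player` are hypothetical. The `preload="none"` attribute is an assumption based on the lazy-loading note under Performance Impact.

```python
from html import escape
from urllib.parse import quote, urlparse

MEDIA_ENDPOINT = "/api/media/get"  # endpoint named in this PR


def to_media_endpoint_url(url: str) -> str:
    """Map an audio:// or video:// URL to the media endpoint."""
    parsed = urlparse(url)
    if parsed.scheme.lower() not in ("audio", "video"):
        raise ValueError(f"not a media URL: {url!r}")
    # audio:///a0/x.mp3 has an empty netloc, while audio://path/to/x.mp3
    # parses "path" as the netloc, so both parts are joined to recover
    # the file path.
    file_path = parsed.netloc + parsed.path
    # Percent-encode everything except "/" so spaces and similar
    # characters survive as a query parameter value.
    return f"{MEDIA_ENDPOINT}?path={quote(file_path, safe='/')}"


def render_player(url: str) -> str:
    """Return the HTML5 element for a media URL (assumed markup)."""
    tag = urlparse(url).scheme.lower()
    src = escape(to_media_endpoint_url(url), quote=True)
    return f'<{tag} controls preload="none" src="{src}"></{tag}>'
```

For example, `audio:///a0/usr/recordings/meeting.mp3` maps to `/api/media/get?path=/a0/usr/recordings/meeting.mp3`.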

webui/css/messages.css

  • Custom media player styling matching Agent Zero's design system
  • CSS variables for theme integration (--color-panel, --color-accent)
  • Visual distinction between audio (compact) and video (larger) players
  • Hover states and focus indicators for accessibility
  • Responsive sizing with max-width constraints

Visual Preview

Audio Player (Compact)

  • Renders as a rounded pill-shaped player
  • Standard HTML5 controls (play, pause, volume, timeline)
  • Blue accent border for visibility in both themes
  • Gradient background for depth

Video Player (Standard)

  • 16:9 aspect ratio container
  • Full HTML5 video controls
  • Subtle shadow for elevation
  • Rounded corners matching UI components

Testing

Manual Testing Checklist

  • Audio files play inline (MP3, WAV, OGG)
  • Video files play inline (MP4, WebM)
  • Dark mode styling renders correctly
  • Light mode styling renders correctly
  • Mobile responsive layout works
  • Large files stream via range requests
  • Invalid paths return 404 errors
  • Non-media files remain as links (no regression)

Browser Compatibility

  • Chrome/Edge (Chromium): Yes
  • Firefox: Yes
  • Safari: Yes
  • Mobile browsers (iOS Safari, Chrome Mobile): Yes

Usage Examples

For Developers

Sending audio from an agent:

# In agent code
response = "Here's the audio recording: audio:///a0/usr/recordings/meeting.mp3"

Sending video:

response = "Video demonstration: video:///a0/usr/videos/demo.mp4"

For Users

Simply attach or reference media files in chat. The UI automatically renders players for:

  • Files with audio extensions: .mp3, .wav, .ogg, .aac, .flac, .m4a
  • Files with video extensions: .mp4, .webm, .ogv, .mov

Backward Compatibility

Fully backward compatible:

  • Existing messages without media render identically
  • No database schema changes required
  • No configuration changes required
  • Graceful degradation for unsupported browsers

Configuration

No configuration required. The feature is automatically available when:

  1. Backend API endpoint is registered (automatic on startup)
  2. Frontend CSS/JS files are loaded (standard page load)

Security Considerations

  • Path validation: API endpoint validates and sanitizes file paths
  • Authentication: Media requests respect existing session/auth mechanisms
  • No directory traversal: Restricted to accessible paths only
  • MIME type detection: Content-Type headers set correctly for each file type
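
For reviewers, one common way to implement the traversal check is to resolve the requested path and confirm it still sits under an allowed base directory. The sketch below assumes a single allowed root; the name `resolve_within` and the idea of one base directory are assumptions, not a description of what python/api/media_get.py does.

```python
from pathlib import Path
from typing import Optional


def resolve_within(base_dir: str, requested: str) -> Optional[Path]:
    """Return the resolved path if it lies strictly inside base_dir, else None."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks. Joining with
    # an absolute `requested` discards `base` entirely, so the containment
    # check below is what actually enforces the restriction.
    candidate = (base / requested).resolve()
    if base not in candidate.parents:
        return None
    return candidate
```

A request for `../etc/passwd` or `/etc/passwd` resolves outside the base directory and is rejected, while `recordings/meeting.mp3` is accepted and can then be checked for existence before serving.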

Performance Impact

  • Minimal: Media files loaded on-demand via streaming
  • Lazy loading: Players don't preload content until user interaction
  • Caching: Browser caches media content appropriately
  • Range requests: Efficient seeking in large video files

Future Enhancements

Potential additions for future PRs:

  • Video thumbnail/poster image support
  • Audio waveform visualization
  • Playlist/sequential playback
  • Download button overlay
  • Full-screen video mode
  • Subtitle/caption support for video

Related Issues

No existing issue specifically tracks media players. This PR addresses that feature gap and enhances the file attachment functionality referenced in general UI discussions.

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Changes tested in isolated environment
  • Backward compatibility maintained
  • Documentation updated (inline comments)
  • No breaking changes introduced
  • Security considerations addressed

Files Changed

File                      Lines   Description
python/api/media_get.py   +89     New API endpoint for media streaming
webui/js/messages.js      +45     Media detection and player injection
webui/css/messages.css    +85     Theme-aware media player styles

Total: ~220 lines of production code, ~130 lines of comments/documentation
