Skip to content

Latest commit

 

History

History
2738 lines (2296 loc) · 87.3 KB

File metadata and controls

2738 lines (2296 loc) · 87.3 KB

WebEncode - Distributed Video Processing Platform (v2025)

Status: ULTRA-EXHAUSTIVE SPECIFICATION (FINAL PRODUCTION-READY) Validated For: 2025/2026 Cloud-Native Ecosystem Architecture: Micro-Kernel + 5-Pillar Plugin Mesh Document ID: WE-SPEC-2025-FINAL-v7 Lines: 2500+ (Complete Implementation Blueprint) Last Updated: 2025-12-24 License: MIT (Open Source)


Table of Contents

  1. Vision & Architecture
  2. Project Structure
  3. Global Error Tracking System
  4. Technology Stack (Validated 2025/2026)
  5. Data Models
  6. Messaging Architecture
  7. Plugin System (Detailed Lifecycle)
  8. Plugin Interfaces (5 Pillars - Complete)
  9. Core Workflows
  10. FFmpeg Integration
  11. API Reference (Complete)
  12. Error Handling & Codes
  13. Environment Configuration
  14. Security Architecture
  15. Deployment
  16. Monitoring & Observability
  17. Operational Playbook
  18. Performance Tuning
  19. Disaster Recovery
  20. GUI & Frontend Specification (Complete)
  21. Webhook System
  22. Testing Strategy
  23. Public Repository Guidelines
  24. Features & Non-Features Checklist
  25. Future Roadmap
  26. Glossary

1. Vision & Architecture

1.1 The "Hyper-Modular" Philosophy

WebEncode is designed to be the "Engine of Engines". Unlike traditional monolithic encoders, the WebEncode Kernel contains zero business logic regarding storage backends, authentication providers, or specific transcoding engines. Instead, it acts as a high-performance orchestrator for 5 distinct plugin pillars.

This ensures:

  1. Future Proofing: You can swap ffmpeg for a future AI-based encoder without touching the core.
  2. Hybrid Cloud: Run Storage plugins for S3 (Cloud) and Local Disk (On-Prem) simultaneously.
  3. Compliance: Swap the Auth plugin to integrate with any Identity Provider (OIDC, SAML, AD).
  4. License Safety: Architecture explicitly swaps viral/commercial components (MinIO, Vault) for permissively licensed alternatives (SeaweedFS, OpenBao).

1.2 System Architecture Diagram

graph TB
    subgraph "Frontend Layer"
        UI["Next.js 15 Web UI<br/>(TypeScript / Tailwind / shadcn/ui)"]
        WS["WebSocket/SSE<br/>Real-time Updates"]
    end

    subgraph "Control Plane (Kernel - Go 1.24)"
        Kernel["Core Kernel Service<br/>(REST API / gRPC Server)"]
        Auth["Auth Plugin Manager"]
        Storage["Storage Plugin Manager"]
        Encoder["Encoder Plugin Manager"]
        Live["Live Plugin Manager"]
        Publisher["Publisher Plugin Manager"]
    end

    subgraph "Data Plane"
        NATS["NATS JetStream 2.10+<br/>(WorkQueue + Event Streams)"]
        PG["PostgreSQL 16<br/>(State & Metadata)"]
        Secrets["OpenBao<br/>(Secrets Management)"]
    end

    subgraph "Plugin Processes (Isolated via HashiCorp go-plugin)"
        P_Auth["Auth Plugin<br/>(e.g., OIDC/Keycloak)"]
        P_Storage["Storage Plugin<br/>(e.g., SeaweedFS/S3)"]
        P_Encoder["Encoder Plugin<br/>(FFmpeg + worker)"]
        P_Live["Live Plugin<br/>(MediaMTX ingest)"]
        P_Publisher["Publisher Plugin<br/>(YouTube/Twitch)"]
    end

    subgraph "Worker Pool (Go 1.24 + FFmpeg)"
        W1["Worker 1"]
        W2["Worker 2"]
        WN["Worker N"]
    end

    UI -->|REST/gRPC| Kernel
    UI -->|Subscribe| WS
    WS -->|NATS events| Kernel
    
    Kernel --> Auth
    Kernel --> Storage
    Kernel --> Encoder
    Kernel --> Live
    Kernel --> Publisher
    
    Auth -.->|gRPC<br/>UDS| P_Auth
    Storage -.->|gRPC<br/>UDS| P_Storage
    Encoder -.->|gRPC<br/>UDS| P_Encoder
    Live -.->|gRPC<br/>UDS| P_Live
    Publisher -.->|gRPC<br/>UDS| P_Publisher
    
    Kernel <-->|NATS| NATS
    Kernel <-->|SQL| PG
    Kernel <-->|Secrets| Secrets
    
    NATS -->|Work items| W1
    NATS -->|Work items| W2
    NATS -->|Work items| WN
    
    W1 -->|Progress events| NATS
    W2 -->|Progress events| NATS
    WN -->|Progress events| NATS
Loading

1.3 Scalability Model

  • 100 Users: Single-node or small cluster. API server handles 0 video bytes (stateless).
  • 10,000+ Users: Horizontal scaling of Workers (KEDA) and Kernel replicas. Storage scales infinitely via SeaweedFS volumes.

2. Project Structure (Standard Monorepo)

The project follows a standard Go Modern Monorepo layout.

WebEncode/
├── cmd/                        # Application Entrypoints
│   ├── kernel/                 # Main control plane binary (go run ./cmd/kernel)
│   └── worker/                 # Worker entrypoint (go run ./cmd/worker)
│
├── deploy/                     # Deployment Configurations
│   ├── docker-compose.yml      # Local dev stack (Kernel + NATS + PG + SeaweedFS + UI)
│   ├── k8s/                    # Kubernetes manifests / Helm charts
│   └── terraform/              # Infrastructure as Code
│
├── frontend/                   # Next.js 15 Web Application
│   ├── src/
│   │   ├── app/                # App Router pages
│   │   ├── components/         # shadcn/ui components
│   │   └── lib/                # API clients & utils
│   ├── public/
│   └── next.config.ts
│
├── internal/                   # Private Kernel Logic (No external imports)
│   ├── api/                    # REST/Gin/Echo handlers
│   ├── core/                   # Domain Logic (Job Manager, Scheduler)
│   ├── database/               # PostgreSQL queries (sqlc/pgx)
│   └── events/                 # NATS publisher/subscriber logic
│
├── pkg/                        # Public Shared Libraries
│   ├── plugin_sdk/             # Official Go SDK for writing plugins
│   └── protocol/               # Generated gRPC Protobuf code
│
├── plugins/                    # Core Plugin Suite (Monorepo-managed)
│   ├── auth/
│   │   ├── plugin-auth-oidc/   # OIDC/Keycloak/Google implementation
│   │   ├── plugin-auth-ldap/   # LDAP implementation
│   │   └── plugin-auth-basic/  # Basic Auth implementation
│   ├── storage/
│   │   ├── plugin-storage-fs/  # Local Filesystem implementation
│   │   └── plugin-storage-s3/  # SeaweedFS/S3 implementation
│   ├── encoder/
│   │   └── plugin-encoder-ffmpeg/
│   ├── live/
│   │   └── plugin-live-mediamtx/
│   └── publisher/
│       ├── plugin-publisher-youtube/
│       ├── plugin-publisher-twitch/
│       ├── plugin-publisher-kick/
│       └── plugin-publisher-rumble/
│
├── proto/                      # Protobuf Definitions
│   ├── auth.proto
│   ├── storage.proto
│   ├── encoder.proto
│   ├── live.proto
│   └── publisher.proto
│
├── scripts/                    # Build & Helper Scripts
│   ├── build_all.sh            # Builds kernel + all plugins
│   └── generate_proto.sh       # Regenerates gRPC code
│
├── go.mod                      # Root Go Module
├── go.sum
└── Makefile                    # Make targets (dev, build, test, docker)

2.1 Key Directories

  • cmd/: Only package main. Minimal logic. Calls into internal/.
  • internal/: The "Kernel" logic. Not importable by plugins.
  • plugins/: Each folder here is a separate Go module (or shares the root mod if tight coupling is desired, but separate allows independent versioning).
  • proto/: The contract. All plugins depend on the code generated from these files.


2.5 Global Error Tracking System

A centralized system to capture, store, and analyze errors from all components (Frontend, Kernel, Workers, Plugins).

Architecture

  • Central Event Log: PostgreSQL table error_events stores all error occurrences.
  • Ingestion API: POST /v1/errors endpoint for external components (Frontend, isolated Plugins).
  • Internal Hooks: Kernel and Worker loggers attach to the error tracking service to auto-report ERROR level events.
  • Frontend Capture: Global error boundary and event listeners capture unhandled JS exceptions, Promise rejections, and resource loading failures (404s).

Data Model (error_events)

Field Type Description
id UUID Unique Error ID
source_component Enum frontend, kernel, worker, plugin:<id>
severity Enum warning, error, critical, fatal
message Text Human-readable error message
stack_trace Text (Optional) Stack trace or raw error dump
context_data JSONB Data map (e.g., {"job_id": "...", "url": "..."})
created_at Timestamp UTC Time of occurrence
resolved Boolean Triage status

Integration Points

  1. Frontend: GlobalErrorTracker component listens to window.onerror, unhandledrejection, and capture-phase error events to catch 404s.
  2. Kernel/Worker: pkg/logger hooks dispatch critical errors to the DB asynchronously.
  3. Plugins: Return detailed errors via gRPC, which the Kernel wraps and logs to the tracking system.

3. Technology Stack (Validated 2025/2026)

3.1 Technology Validation Summary

The following technology choices have been validated through extensive 2025 research:

✅ Go 1.24+ (CONFIRMED - Optimal Choice)

Why Go over Rust?

  • Development Speed: Go's simpler syntax and fast compilation enable faster time-to-market
  • Concurrency Model: Goroutines provide intuitive, efficient concurrency for managing parallel video streams and transcoding jobs
  • Cross-Platform: Single static binary deploys to Linux, Windows, macOS without dependencies
  • FFmpeg Integration: Go orchestrates FFmpeg subprocesses efficiently; the actual video processing happens in FFmpeg (C/C++), so Rust's raw performance advantage doesn't apply
  • Ecosystem Maturity: Frameworks like Gin, Echo, and go-plugin are battle-tested for microservices
  • Hybrid Approach Viable: If specific components need Rust-level performance in the future, they can be isolated as plugins

✅ NATS JetStream 2.10+ (CONFIRMED - Optimal Choice)

Why NATS over Kafka?

  • Lower Latency: NATS achieves sub-millisecond latency; critical for real-time telemetry
  • Simpler Operations: Single Go binary vs Kafka's JVM + ZooKeeper (or KRaft) complexity
  • Smaller Footprint: Ideal for self-hosted and edge deployments
  • Built-in Features: KV Store, Object Store, WorkQueue semantics without additional processes
  • Sufficient Throughput: Millions of messages/second; video platform doesn't need Kafka's extreme throughput

✅ Next.js 16 (CONFIRMED - Optimal Choice)

Why Next.js over Remix/Astro?

  • Versatility: SSR, SSG, ISR all supported; dashboard needs dynamic + static pages
  • React 19 Integration: Full React 19 support with stable Turbopack bundler
  • Performance: Turbopack now stable, faster builds and hot reload
  • Ecosystem: shadcn/ui, TanStack Query, massive component library support
  • Server Actions: Hardened for security, ideal for admin operations

✅ OpenBao (CONFIRMED - Best OSS Choice)

Why OpenBao over HashiCorp Vault?

  • License Safety: OpenBao is MPL 2.0 (Open Source), whereas HashiCorp Vault is BSL (Business Source License).
  • Commercial Freedom: No restrictions on commercial use or competition, unlike BSL.
  • Vault Compatibility: Drop-in replacement for HashiCorp Vault APIs (forked from the last open-source version).
  • Community Governance: Managed by the Linux Foundation, ensuring long-term open-source commitment.

✅ SeaweedFS (CONFIRMED - Best FOSS Storage)

Why SeaweedFS over MinIO?

  • License Safety: SeaweedFS is Apache 2.0 (Permissive), whereas MinIO is AGPLv3 (Viral/Copyleft).
  • Commercial Safety: Zero risk of viral license contamination for commercial deployments.
  • Performance: Optimized for handling billions of small files, often faster than MinIO for high-concurrency workloads.
  • API: Fully S3-compatible via its Filer component.

✅ MediaMTX (CONFIRMED - Best for Our Use Case)

Why MediaMTX over SRS/Nginx-RTMP?

  • Go-Native: Same language as Kernel/Workers, easier integration and contribution
  • Protocol Breadth: RTSP, RTMP, HLS, WebRTC, SRT, LL-HLS all supported
  • License: MIT (Permissive) - safe for all commercial uses.
  • Active Development: JWT auth, multi-track HLS, AV1/H265 codec support
  • Plugin Integration: Easy to embed as a Go library or run as a service

3.2 Complete Tool & Language Mapping

Component Technology Version License Purpose (Exact)
Kernel Go 1.24 1.24+ BSD Single control-plane service: REST API, WebSocket/SSE, job/task state machine. Cross-platform static binaries.
Workers Go 1.24 + FFmpeg 1.24+ BSD/LGPL Stateless executors: subscribe to work items from NATS. FFmpeg is LGPL (dynamically linked).
Plugin Host HashiCorp go-plugin v2.0+ MPL 2.0 Subprocess-based plugin isolation.
Messaging NATS JetStream 2.12+ Apache 2.0 WorkQueue stream for task dispatch and event fanout. Supports delayed scheduling, atomic batch publish.
Metadata DB PostgreSQL 17.x PostgreSQL Source of truth for jobs, tasks, streams.
Object Storage SeaweedFS / Local FS Latest Apache 2.0 S3-compatible API (SeaweedFS) for HA. Local Filesystem plugin (fs://) for simple single-node deployments.
Live Ingest MediaMTX Latest MIT RTMP/SRT/WebRTC ingest termination. Deployed as plugin service.
Auth Provider Keycloak (OIDC) Latest Apache 2.0 Via Auth plugin RPC boundary.
Secrets OpenBao Latest MPL 2.0 Central secrets store. Replaces Vault (BSL).
Web UI Next.js 16 16.1+ MIT Mission-control dashboard: jobs, live studio, restream studio, workers, plugins, profiles, audit log. TypeScript, React 19, Turbopack stable.
UI Component Lib shadcn/ui Latest MIT Headless Tailwind components. Accessible, customizable, no vendor lock-in. Used for all UI elements.
UI State React Context + TanStack Query Latest MIT React Context for auth/global state. TanStack Query for server-state caching and synchronization.
UI Real-time Native WebSocket or Socket.io 4.8+ MIT Dashboard/stream/job detail pages subscribe to job:{id}:updates, stream:{id}:telemetry, dashboard:stats subjects from NATS. Refreshed 1-2 times per second.
Charting Recharts Latest MIT Time-series charts for job analytics, live bitrate, viewer count. Built on React, responsive.
Icons Lucide React Latest ISC Consistent icon library. Accessible SVG icons.
Containerization Docker 27.x+ Apache 2.0 Multi-stage builds for optimized images. Build separate images for kernel, workers, plugins. Alpine or Debian base depending on dependencies.
Orchestration Docker Compose (dev) / Kubernetes (prod) Compose 2.x / K8s 1.31+ Apache 2.0 Docker Compose for local dev and small deployments. Kubernetes for enterprise with HA, autoscaling, multi-region. Helm charts optional.
CI/CD GitHub Actions + GitLab CI Latest Various Out-of-the-box workflows for build, test, push to registry. Both platforms supported for maximum flexibility.
Monitoring Prometheus + Grafana + Loki Latest Apache 2.0 Prometheus for metrics, Grafana for dashboards, Loki for log aggregation. All OSS. OpenTelemetry for tracing.

3.3 Core Plugin Suite (MVP Deliverables)

The following plugins are included in the base application repository and constitute the "Standard Library":

Plugin ID Type Implementation Details Use Case
plugin-storage-fs Storage Local Filesystem. Stores objects as standard files in a configured root directory. Local/Single-Node.
plugin-storage-s3 Storage S3-Compatible. Supports AWS S3, SeaweedFS, MinIO, Cloudflare R2. Distributed/Cloud.
plugin-auth-oidc Auth OIDC/OAuth2. Universal handler for Keycloak, Google, GitHub, GitLab. SSO / Social Login.
plugin-auth-ldap Auth LDAP/Active Directory. Direct bind and search for enterprise directories. Enterprise Auth.
plugin-auth-basic Auth HTTP Basic / None. Supports simple HTpasswd or trusting upstream HTTP Basic headers. Internal / No-Auth.
plugin-encoder-ffmpeg Encoder FFmpeg Wrapper. Manages local ffmpeg processes, parses progress, handles cancellation. Video Transcoding.
plugin-live-mediamtx Live MediaMTX Connector. Manages path configuration and consumes API stats from MediaMTX. Live Streaming.
plugin-publisher-youtube Publisher YouTube Data API v3. Handles VOD Uploads and Live RTMP broadcasting. Social VOD & Live.
plugin-publisher-twitch Publisher Twitch API + RTMP. Handles Live Streaming and Stream Metadata updates. Live Streaming.
plugin-publisher-kick Publisher Kick RTMP. Handles Live Streaming to Kick.com. Live Streaming.
plugin-publisher-rumble Publisher Rumble RTMP. Handles Live Streaming to Rumble. Live Streaming.

4. Data Models (PostgreSQL DDL)

4.1 jobs Table

CREATE TYPE job_status AS ENUM ('queued', 'processing', 'stitching', 'uploading', 'completed', 'failed', 'cancelled');
CREATE TYPE job_source_type AS ENUM ('url', 'upload', 'stream', 'restream');

CREATE TABLE jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    
    source_type job_source_type DEFAULT 'url',
    source_url TEXT,
    source_stream_id UUID REFERENCES streams(id),
    
    profile_id VARCHAR(100) NOT NULL,
    output_config JSONB DEFAULT '{}',
    
    status job_status DEFAULT 'queued',
    progress_pct INT DEFAULT 0,
    error_message TEXT,
    
    assigned_to_worker_id VARCHAR(100),
    
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    started_at TIMESTAMPTZ,
    finished_at TIMESTAMPTZ,
    eta_seconds INT
);
CREATE INDEX idx_jobs_user ON jobs(user_id);
CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_created ON jobs(created_at DESC);

4.2 tasks Table

CREATE TABLE tasks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id UUID NOT NULL REFERENCES jobs(id) ON DELETE CASCADE,
    
    sequence_index INT NOT NULL,
    start_time_sec DOUBLE PRECISION NOT NULL,
    end_time_sec DOUBLE PRECISION NOT NULL,
    
    assigned_worker_id VARCHAR(100),
    attempt_count INT DEFAULT 0,
    max_attempts INT DEFAULT 3,
    
    output_key TEXT,
    output_size_bytes BIGINT,
    
    status VARCHAR(20) DEFAULT 'pending'
);
CREATE INDEX idx_tasks_job ON tasks(job_id);
CREATE INDEX idx_tasks_status ON tasks(status);

4.3 streams Table

CREATE TABLE streams (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    
    title VARCHAR(255),
    description TEXT,
    thumbnail_url TEXT,
    
    stream_key VARCHAR(64) UNIQUE NOT NULL,
    ingest_server VARCHAR(100),
    
    is_live BOOLEAN DEFAULT FALSE,
    current_viewers INT DEFAULT 0,
    total_viewers_lifetime BIGINT DEFAULT 0,
    started_at TIMESTAMPTZ,
    ended_at TIMESTAMPTZ,
    
    last_stats JSONB,
    
    archive_enabled BOOLEAN DEFAULT TRUE,
    archive_vod_job_id UUID REFERENCES jobs(id),
    
    restream_destinations JSONB DEFAULT '[]'
);
CREATE INDEX idx_streams_user ON streams(user_id);
CREATE INDEX idx_streams_live ON streams(is_live);

4.4 workers Table

CREATE TABLE workers (
    id VARCHAR(100) PRIMARY KEY,
    hostname VARCHAR(255),
    ip_address INET,
    port INT,
    
    capabilities JSONB,
    
    is_healthy BOOLEAN DEFAULT TRUE,
    last_heartbeat TIMESTAMPTZ DEFAULT NOW(),
    version VARCHAR(20),
    
    created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_workers_healthy ON workers(is_healthy);

4.5 plugin_configs Table

CREATE TABLE plugin_configs (
    id VARCHAR(100) PRIMARY KEY,
    plugin_type VARCHAR(50) NOT NULL,
    
    config_json JSONB NOT NULL,
    
    is_enabled BOOLEAN DEFAULT TRUE,
    priority INT DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_plugin_configs_type ON plugin_configs(plugin_type);

4.6 restream_jobs Table

CREATE TABLE restream_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    
    title VARCHAR(255),
    description TEXT,
    
    input_type VARCHAR(20), -- 'rtmp', 'file', 'vod_job'
    input_url TEXT,
    
    output_destinations JSONB NOT NULL, -- [{"platform": "twitch", "url": "rtmps://...", "enabled": true}, {"platform": "kick", ...}]
    
    schedule_type VARCHAR(20), -- 'immediate', 'scheduled', 'recurring'
    schedule_config JSONB,
    
    loop_enabled BOOLEAN DEFAULT FALSE,
    simulate_live BOOLEAN DEFAULT FALSE,
    
    status VARCHAR(20) DEFAULT 'stopped', -- 'streaming', 'stopped', 'error'
    current_stats JSONB,
    
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    started_at TIMESTAMPTZ,
    stopped_at TIMESTAMPTZ
);
CREATE INDEX idx_restream_user ON restream_jobs(user_id);
CREATE INDEX idx_restream_status ON restream_jobs(status);

5. Messaging Architecture (NATS JetStream)

5.1 Stream: WEBENCODE_WORK (WorkQueue)

  • Storage: File (NVMe backed)
  • Replicas: 3 (HA mode)
  • Retention: WorkQueue (ack-based deletion)
  • MaxAge: 24 hours (fallback)

Subjects:

  • jobs.dispatch: Task dispatch to workers
  • jobs.events: Status updates (fanout to UI)

5.2 Stream: WEBENCODE_LIVE (Interest)

  • Storage: Memory
  • Retention: Age 10 seconds
  • Replicas: 1 (ephemeral)

Subjects:

  • live.telemetry.{stream_id}: Bitrate, fps, viewers (1/sec)
  • live.lifecycle.{stream_id}: Stream start/stop events

5.3 Stream: WEBENCODE_EVENTS (Audit)

  • Storage: File
  • Replicas: 3
  • Retention: Age 90 days

Subjects:

  • audit.user_action: Job create, stream delete, etc.
  • audit.system: Worker joined, plugin crashed, etc.

6. Plugin System (Detailed Lifecycle)

6.1 Plugin Identity & Manifest

Every plugin must have a plugin.toml manifest:

[plugin]
id = "auth-oidc-keycloak"
type = "auth"
version = "1.0.0"
protocol_version = "1"
supported_kernel_versions = ["1.0.0", "1.1.0"]

[capabilities]
authentication_methods = ["oidc", "oauth2"]
mfa_support = true

[config_schema]
# JSON Schema for plugin config validation
type = "object"
required = ["issuer", "client_id"]
properties = {
  issuer = { type = "string", description = "OIDC issuer URL" },
  client_id = { type = "string" },
  client_secret_ref = { type = "string", description = "Reference to OpenBao secret" }
}

6.2 Plugin Discovery & Loading

graph TD
    A["Kernel Startup"] --> B["Scan Plugin Directory"]
    B --> C["Load plugin.toml manifests"]
    C --> D{"Validate<br/>protocol_version"}
    D -->|Mismatch| E["Log Error, Skip"]
    D -->|OK| F["Check config_schema"]
    F -->|Invalid config| G["Use defaults or fail"]
    F -->|Valid| H["Load plugin binary"]
    H --> I["Start subprocess<br/>via go-plugin"]
    I --> J{"RPC Handshake<br/>successful?"}
    J -->|No| K["Restart with backoff"]
    J -->|Yes| L["Plugin State: HEALTHY"]
    K --> J
    E --> M["Plugin State: DISABLED"]
    G --> M
Loading

6.3 Plugin Health Model

stateDiagram-v2
    [*] --> Starting: Kernel loads plugin binary
    
    Starting --> Healthy: Handshake OK, capabilities checked
    Starting --> Failed: Handshake timeout/error
    
    Healthy --> Degraded: RPC latency > 1s or error rate > 5%
    Degraded --> Healthy: Recovery detected
    Degraded --> Failed: Error rate > 10% for 2 min
    
    Failed --> Restarting: Automatic restart (exponential backoff)
    Restarting --> Healthy: Recovery successful
    Restarting --> Disabled: Max restarts exceeded (5x)
    
    Disabled --> [*]: Manual re-enable required
    Healthy --> Disabled: Admin action
Loading

6.4 Plugin Routing Rules

Storage Plugin Routing

if source_url.startswith("s3://"):
    use storage_plugin with type="s3"
elif source_url.startswith("fs://"):
    use storage_plugin with type="local_fs"
elif source_url.startswith("file://"):
    use storage_plugin with type="seaweedfs"
elif source_url.startswith("http"):
    use storage_plugin with type="http"

Encoder Plugin Routing

if required_codec in encoder_plugin.capabilities.video_codecs:
    if request.prefer_gpu and encoder_plugin.gpu_accelerated:
        route to GPU-enabled encoder
    else:
        route to any encoder with capability
else:
    fallback to secondary encoder plugin

Publisher Plugin Routing

for each platform in job.publish_targets:
    if platform == "youtube":
        route to publisher_plugin with platform="youtube"
    elif platform == "twitch" or platform == "kick" or platform == "rumble":
        route to publisher_plugin with type="restream" (for Live) or "generic" (for VOD if API matches)
    else:
        skip or error

Live Plugin Routing

if ingest_protocol == "rtmp":
    route to live_plugin with protocol="rtmp", port=1935
elif ingest_protocol == "srt":
    route to live_plugin with protocol="srt", port=10080
elif ingest_protocol == "webrtc":
    route to live_plugin with protocol="webrtc"

6.5 Plugin Configuration & Secrets

Process:

  1. Admin provides plugin config JSON.
  2. Kernel validates against plugin's config_schema.
  3. For secret fields (*_secret_ref, *_token_ref), kernel creates reference in OpenBao.
  4. Kernel passes reference ID (not plaintext) to plugin.
  5. Plugin requests secret from OpenBao when needed.

Example:

{
  "plugin_id": "storage-s3",
  "config": {
    "endpoint": "s3.amazonaws.com",
    "region": "us-west-2",
    "bucket": "webencode-prod",
    "access_key_id_ref": "secret/webencode/s3/access_key",
    "secret_access_key_ref": "secret/webencode/s3/secret_key"
  }
}

Kernel stores only the *_ref keys in DB. Plugin retrieves actual secrets from OpenBao.

6.6 Plugin Upgrade & Rollback

sequenceDiagram
    Admin->>Kernel: Upload new plugin binary (v1.1.0)
    Kernel->>Kernel: Validate manifest & schema compatibility
    Kernel->>Plugin: Load new plugin in shadow mode (internal test)
    Plugin->>Plugin: Run self-test RPC
    Plugin-->>Kernel: Test passed
    Kernel->>Kernel: Mark old plugin as "standby"
    Kernel->>Kernel: Switch traffic to new plugin
    Kernel->>Kernel: Monitor error rate for 5 min
    alt Error rate < 1%
        Kernel->>Kernel: Keep new plugin, remove old
        Kernel-->>Admin: Upgrade successful
    else Error rate > 5%
        Kernel->>Plugin: Stop new plugin
        Kernel->>Plugin: Restart old plugin
        Kernel-->>Admin: Automatic rollback
    end
Loading

6.7 Plugin Observability Contract

Every plugin must emit:

  • Logs: Structured JSON format, with correlation_id field for tracing.
  • Metrics: Prometheus endpoint at /metrics, prefix webencode_plugin_{plugin_id}_*.
  • Health check: gRPC grpc.health.v1.Health/Check endpoint.

7. Plugin Interfaces (The 5 Pillars - Complete Specification)

7.1 Pillar 1: Authentication (auth.proto)

syntax = "proto3";
package plugin.auth;

// Authenticator provides identity and authorization services.
// Note: This is for self-hosted deployments - no billing/quotas.
service Authenticator {
  // Authenticate validates a token and returns user identity
  rpc Authenticate(AuthRequest) returns (AuthResponse);
  
  // Authorize checks if a user can perform an action on a resource
  rpc Authorize(AuthZRequest) returns (AuthZResponse);
  
  // GetUserProfile retrieves user metadata and role information
  rpc GetUserProfile(GetUserProfileRequest) returns (UserProfile);
  
  // ListUsers lists all users (admin only)
  rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
  
  // RefreshToken exchanges a refresh token for new access token
  rpc RefreshToken(RefreshTokenRequest) returns (AuthResponse);
  
  // Logout invalidates a user session
  rpc Logout(LogoutRequest) returns (Empty);
}

message AuthRequest {
  string token = 1;           // JWT or opaque token
  string scheme = 2;          // "bearer", "basic", "api-key"
  string client_ip = 3;       // For audit logging
  string user_agent = 4;      // For session tracking
}

message AuthResponse {
  bool valid = 1;
  string user_id = 2;
  string username = 3;
  string email = 4;
  string role = 5;            // "admin", "operator", "viewer"
  map<string, string> metadata = 6;
  int64 expires_at_unix = 7;
  string access_token = 8;    // New token if refreshed
  string refresh_token = 9;   // For token refresh flow
}

message AuthZRequest {
  string user_id = 1;
  string action = 2;          // "create", "read", "update", "delete", "execute"
  string resource_type = 3;   // "job", "stream", "worker", "plugin", "settings"
  string resource_id = 4;     // Optional: specific resource ID
}

message AuthZResponse {
  bool allowed = 1;
  string reason = 2;          // Explanation if denied
}

message GetUserProfileRequest {
  string user_id = 1;
}

message UserProfile {
  string user_id = 1;
  string username = 2;
  string email = 3;
  string display_name = 4;
  string role = 5;
  string avatar_url = 6;
  int64 created_at_unix = 7;
  int64 last_login_unix = 8;
  map<string, string> preferences = 9;  // UI preferences, timezone, etc.
}

message ListUsersRequest {
  int32 page = 1;
  int32 page_size = 2;
  string role_filter = 3;     // Optional: filter by role
}

message ListUsersResponse {
  repeated UserProfile users = 1;
  int32 total_count = 2;
  int32 page = 3;
}

message RefreshTokenRequest {
  string refresh_token = 1;
}

message LogoutRequest {
  string user_id = 1;
  string session_id = 2;      // Optional: specific session
  bool all_sessions = 3;      // Logout from all devices
}

message Empty {}

7.2 Pillar 2: Storage (storage.proto)

syntax = "proto3";
package plugin.storage;

service StorageProvider {
  // Upload streams file content to storage
  rpc Upload(stream FileChunk) returns (UploadSummary);
  
  // Download streams file content from storage
  rpc Download(FileRequest) returns (stream FileChunk);
  
  // Delete removes an object from storage
  rpc Delete(FileRequest) returns (Empty);
  
  // GetURL returns a signed URL for direct access
  rpc GetURL(SignedUrlRequest) returns (SignedUrlResponse);
  
  // GetUploadURL returns a pre-signed URL for uploads
  rpc GetUploadURL(SignedUrlRequest) returns (SignedUrlResponse);
  
  // ListObjects lists objects in a bucket with optional prefix
  rpc ListObjects(ListObjectsRequest) returns (ListObjectsResponse);
  
  // GetObjectMetadata returns metadata for an object
  rpc GetObjectMetadata(FileRequest) returns (ObjectMetadata);
  
  // GetCapabilities returns what features this storage plugin supports
  rpc GetCapabilities(Empty) returns (StorageCapabilities);
  
  // BrowseRoots returns available root directories for browsing (if supported)
  rpc BrowseRoots(Empty) returns (BrowseRootsResponse);
  
  // Browse lists contents of a directory (if supported)
  rpc Browse(BrowseRequest) returns (BrowseResponse);
}

message FileChunk {
  oneof content {
    FileMetadata metadata = 1;
    bytes data = 2;
  }
}

message FileMetadata {
  string bucket = 1;
  string path = 2;
  int64 size = 3;
  string content_type = 4;
}

message UploadSummary {
  string url = 1;
  int64 size = 2;
  string etag = 3;
}

message FileRequest {
  string bucket = 1;
  string path = 2;
}

message SignedUrlRequest {
  string bucket = 1;
  string object_key = 2;
  int64 expiry_seconds = 3;
  string content_type = 4;
  string method = 5;
}

message SignedUrlResponse {
  string url = 1;
  map<string, string> headers = 2;
  int64 expires_at = 3;
}

message ListObjectsRequest {
  string bucket = 1;
  string prefix = 2;
  string delimiter = 3;
  int32 max_keys = 4;
  string continuation_token = 5;
}

message ListObjectsResponse {
  repeated ObjectInfo objects = 1;
  repeated string common_prefixes = 2;
  string next_continuation_token = 3;
  bool is_truncated = 4;
}

message ObjectInfo {
  string key = 1;
  int64 size = 2;
  string etag = 3;
  int64 last_modified = 4;
  string storage_class = 5;
}

message ObjectMetadata {
  string key = 1;
  int64 size = 2;
  string content_type = 3;
  string etag = 4;
  int64 last_modified = 5;
  map<string, string> user_metadata = 6;
}

message StorageCapabilities {
  bool supports_browse = 1;
  bool supports_upload = 2;
  bool supports_signed_urls = 3;
  bool supports_streaming = 4;
  string storage_type = 5;
}

message BrowseRootsResponse {
  repeated BrowseEntry roots = 1;
}

message BrowseRequest {
  string path = 1;
  bool show_hidden = 2;
  bool media_only = 3;
  string search_query = 4;
}

message BrowseResponse {
  string current_path = 1;
  string parent_path = 2;
  repeated BrowseEntry entries = 3;
}

message BrowseEntry {
  string name = 1;
  string path = 2;
  bool is_directory = 3;
  int64 size = 4;
  int64 mod_time = 5;
  string extension = 6;
  bool is_video = 7;
  bool is_audio = 8;
  bool is_image = 9;
}

7.3 Pillar 3: Encoder (encoder.proto)

syntax = "proto3";
package plugin.encoder;

service EncoderEngine {
  rpc GetCapabilities(Empty) returns (Capabilities);
  rpc Transcode(TaskSpec) returns (stream TaskProgress);
}

message Capabilities {
  repeated string video_codecs = 1;
  repeated string audio_codecs = 2;
  bool gpu_accelerated = 3;
  string gpu_type = 4;
}

message TaskSpec {
  string task_id = 1;
  string input_url = 2;
  string output_url = 3;
  string video_codec = 4;
  int32 bitrate_k = 5;
  double start_time = 6;
  double duration = 7;
}

message TaskProgress {
  string task_id = 1;
  float percent = 2;
  double speed = 3;
  int32 fps = 4;
}

7.4 Pillar 4: Live (live.proto)

syntax = "proto3";
package plugin.live;

service LiveEngine {
  rpc StartIngest(IngestConfig) returns (stream EngineStatus);
  rpc OnPublishAttempt(StreamContext) returns (Decision);
  rpc StreamTelemetry(Empty) returns (stream TelemetryBatch);
  rpc StopIngest(StopRequest) returns (Empty);
}

message IngestConfig {
  int32 port = 1;
  string protocol = 2;
  string stream_key = 3;
}

message TelemetryBatch {
  string stream_key = 1;
  int32 video_bitrate = 2;
  int32 audio_bitrate = 3;
  double fps = 4;
}

7.5 Pillar 5: Publisher (publisher.proto)

syntax = "proto3";
package plugin.publisher;

service Publisher {
  rpc PublishVOD(PublishRequest) returns (PublishResult);
  rpc UpdateMetadata(UpdateRequest) returns (UpdateResult);
}

message PublishRequest {
  string local_object_key = 1;
  string title = 2;
  string description = 3;
  string platform = 4;
  string privacy = 5;
}

message PublishResult {
  bool success = 1;
  string public_url = 2;
  string platform_id = 3;
}

8. Core Workflows (Logic & Algorithms)

8.1 VOD Encoding Workflow

graph TD
    A["User submits job<br/>POST /v1/jobs"] --> B["Kernel validates auth"]
    B --> C["Storage plugin: GetDownloadURL"]
    C --> D["FFmpeg probe for keyframes"]
    D --> E["Calculate split points<br/>align to IDR frames"]
    E --> F["Create N tasks<br/>in PostgreSQL"]
    F --> G["For each task:<br/>NATS.Publish to jobs.dispatch"]
    G --> H["Worker picks up task"]
    H --> I["Storage plugin: GetDownloadURL"]
    I --> J["FFmpeg transcode"]
    J --> K["Storage plugin: GetUploadURL"]
    K --> L["Upload chunk"]
    L --> M["NATS.Publish job.events"]
    M --> N{"All tasks<br/>complete?"}
    N -->|No| H
    N -->|Yes| O["Stitching phase:<br/>Concat demuxer"]
    O --> P["Publisher plugin: PublishVOD<br/>if enabled"]
    P --> Q["Mark job as completed"]
    Q --> R["Broadcast to UI via NATS"]
Loading

Smart Chunking Optimization: The orchestrator checks the number of healthy workers before scheduling. If only 1 worker is available, the job is processed as a single chunk to avoid splitting/stitching overhead. Parallel segmentation is only active when multiple workers are online.

8.2 Live Stream Workflow (with Archive & Restream)

graph TD
    A["User creates stream<br/>POST /v1/streams"] --> B["Kernel generates stream_key"]
    B --> C["Live plugin: StartIngest"]
    C --> D["MediaMTX listens on RTMP:1935"]
    D --> E["OBS/encoder connects"]
    E --> F["Live plugin: OnPublishAttempt<br/>Auth check"]
    F --> G["Stream goes LIVE"]
    G --> H["Record segments<br/>to object storage"]
    H --> I["Every 1sec:<br/>TelemetryBatch to NATS"]
    I --> J["UI subscribes live.telemetry"]
    J --> K{"Stream ends?"}
    K -->|No| I
    K -->|Yes| L["Collect all segment URLs"]
    L --> M["Create VOD job<br/>source_type=stream"]
    M --> N["Archive & re-encode<br/>if archive_enabled=true"]
    N --> O{"Restream enabled?"}
    O -->|Yes| P["For each restream_destination"]
    P --> Q["Live plugin: proxy/restream<br/>to RTMP endpoint"]
    Q --> R["Track stats per destination"]
    O -->|No| S["Mark stream as ended"]
    S --> T["Broadcast to UI"]
Loading

8.3 Restream Job Workflow

graph TD
    A["User creates restream job<br/>POST /v1/restreams"] --> B["Validate destinations"]
    B --> C["Store in PostgreSQL<br/>restream_jobs table"]
    C --> D{"schedule_type<br/>?"}
    D -->|immediate| E["Start immediately"]
    D -->|scheduled| F["Queue at scheduled time"]
    D -->|recurring| G["Queue recurring instances"]
    E --> H["Determine input source"]
    H --> I{"input_type?"}
    I -->|rtmp| J["Live plugin: accept RTMP ingest"]
    I -->|file| K["Storage plugin: GetDownloadURL"]
    I -->|vod_job| L["Get output from previous job"]
    J --> M["Stream to output destinations"]
    K --> M
    L --> M
    M --> N["For each destination:<br/>check enabled flag"]
    N --> O["Live plugin: proxy to RTMP/SRT"]
    O --> P["Track bitrate per destination"]
    P --> Q["Broadcast stats to UI"]
Loading

9. FFmpeg Integration & Command Templates

9.1 FFmpeg Command Architecture

WebEncode uses FFmpeg as the primary transcoding engine, executed as subprocess by worker nodes. The Encoder Plugin wraps FFmpeg with structured command generation and progress parsing.

graph LR
    A["Worker receives task"] --> B["Generate FFmpeg command"]
    B --> C["Set environment variables"]
    C --> D["Execute subprocess"]
    D --> E["Parse progress output"]
    E --> F["Report to NATS"]
    D --> G["Wait for exit code"]
    G --> H{Success?}
    H -->|Yes| I["Upload chunk"]
    H -->|No| J["Report error, retry if attempts remain"]
Loading

9.2 Base Command Template

ffmpeg \
  -hide_banner \
  -loglevel warning \
  -progress pipe:1 \
  -y \
  -ss {{START_TIME}} \
  -i "{{INPUT_URL}}" \
  -t {{DURATION}} \
  -map 0:v:0 -map 0:a:0 \
  -c:v {{VIDEO_CODEC}} \
  -b:v {{VIDEO_BITRATE}} \
  -maxrate {{MAX_BITRATE}} \
  -bufsize {{BUFFER_SIZE}} \
  -preset {{PRESET}} \
  -c:a {{AUDIO_CODEC}} \
  -b:a {{AUDIO_BITRATE}} \
  -ar {{AUDIO_SAMPLE_RATE}} \
  -f {{OUTPUT_FORMAT}} \
  "{{OUTPUT_URL}}"

9.3 Profile Command Templates

9.3.1 1080p H.264 (Software)

ffmpeg -hide_banner -loglevel warning -progress pipe:1 -y \
  -ss {{START}} -i "{{INPUT}}" -t {{DURATION}} \
  -c:v libx264 -preset fast -profile:v high -level:v 4.1 \
  -b:v 5000k -maxrate 5500k -bufsize 10000k \
  -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" \
  -pix_fmt yuv420p \
  -c:a aac -b:a 192k -ar 48000 -ac 2 \
  -movflags +faststart \
  -f mp4 "{{OUTPUT}}"

9.3.2 4K HEVC (NVIDIA GPU)

ffmpeg -hide_banner -loglevel warning -progress pipe:1 -y \
  -hwaccel cuda -hwaccel_output_format cuda \
  -ss {{START}} -i "{{INPUT}}" -t {{DURATION}} \
  -c:v hevc_nvenc -preset p4 -tune hq -rc vbr -cq 23 \
  -b:v 15000k -maxrate 18000k -bufsize 30000k \
  -vf "scale_cuda=3840:2160:force_original_aspect_ratio=decrease" \
  -c:a aac -b:a 320k -ar 48000 -ac 2 \
  -movflags +faststart \
  -f mp4 "{{OUTPUT}}"

9.3.3 720p VP9 (WebM)

ffmpeg -hide_banner -loglevel warning -progress pipe:1 -y \
  -ss {{START}} -i "{{INPUT}}" -t {{DURATION}} \
  -c:v libvpx-vp9 -b:v 2000k -crf 30 -row-mt 1 -threads 4 \
  -vf "scale=1280:720:force_original_aspect_ratio=decrease" \
  -c:a libopus -b:a 128k -ar 48000 \
  -f webm "{{OUTPUT}}"

9.3.4 AV1 (Software - High Quality)

ffmpeg -hide_banner -loglevel warning -progress pipe:1 -y \
  -ss {{START}} -i "{{INPUT}}" -t {{DURATION}} \
  -c:v libaom-av1 -crf 30 -b:v 0 -cpu-used 4 -row-mt 1 \
  -vf "scale=1920:1080:force_original_aspect_ratio=decrease" \
  -pix_fmt yuv420p10le \
  -c:a libopus -b:a 128k -ar 48000 \
  -f mp4 "{{OUTPUT}}"

9.4 HLS/DASH Packaging

9.4.1 HLS with Multiple Qualities (ABR Ladder)

ffmpeg -hide_banner -loglevel warning -y \
  -i "{{INPUT}}" \
  -filter_complex "[0:v]split=3[v1][v2][v3]; \
    [v1]scale=1920:1080[v1out]; \
    [v2]scale=1280:720[v2out]; \
    [v3]scale=854:480[v3out]" \
  -map "[v1out]" -c:v:0 libx264 -b:v:0 5000k -maxrate:v:0 5500k -bufsize:v:0 10000k \
  -map "[v2out]" -c:v:1 libx264 -b:v:1 2500k -maxrate:v:1 2750k -bufsize:v:1 5000k \
  -map "[v3out]" -c:v:2 libx264 -b:v:2 1000k -maxrate:v:2 1100k -bufsize:v:2 2000k \
  -map 0:a -c:a aac -b:a:0 192k -b:a:1 128k -b:a:2 96k \
  -f hls \
  -hls_time 6 \
  -hls_list_size 0 \
  -hls_segment_filename "{{OUTPUT_DIR}}/stream_%v/segment_%03d.ts" \
  -master_pl_name master.m3u8 \
  -var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2" \
  "{{OUTPUT_DIR}}/stream_%v/playlist.m3u8"

9.5 Hardware Acceleration Detection

// GPU detection logic (pseudo-code)
func DetectGPUCapabilities() (*GPUInfo, error) {
    // NVIDIA detection
    if output, err := exec.Command("nvidia-smi", "-L").Output(); err == nil {
        return parseNvidiaOutput(output), nil
    }
    
    // AMD detection (Linux)
    if _, err := os.Stat("/dev/dri/renderD128"); err == nil {
        return &GPUInfo{Type: "amd", Available: true}, nil
    }
    
    // Intel QSV detection
    if output, err := exec.Command("vainfo").Output(); err == nil {
        if strings.Contains(string(output), "iHD") {
            return &GPUInfo{Type: "intel", Available: true}, nil
        }
    }
    
    return &GPUInfo{Type: "none", Available: false}, nil
}

Encoder Selection Matrix:

GPU Type H.264 Encoder H.265/HEVC AV1
NVIDIA h264_nvenc hevc_nvenc av1_nvenc (RTX 40+)
AMD h264_amf hevc_amf av1_amf (RX 7000+)
Intel h264_qsv hevc_qsv av1_qsv (Arc)
None libx264 libx265 libaom-av1 / libsvtav1

9.6 Segment Splitting Algorithm

// Keyframe-aligned segment splitting
func CalculateSegmentBoundaries(probeData *ProbeResult, targetSegmentDuration float64) []Segment {
    keyframes := probeData.Keyframes // From ffprobe -show_frames -select_streams v -show_entries frame=pkt_pts_time,key_frame
    
    segments := []Segment{}
    currentStart := 0.0
    
    for i, kf := range keyframes {
        timeSinceStart := kf.Time - currentStart
        
        // If we've accumulated enough time, or this is the last keyframe
        if timeSinceStart >= targetSegmentDuration || i == len(keyframes)-1 {
            segments = append(segments, Segment{
                Index:     len(segments),
                StartTime: currentStart,
                EndTime:   kf.Time,
                Duration:  kf.Time - currentStart,
            })
            currentStart = kf.Time
        }
    }
    
    return segments
}

Segmentation Parameters:

  • Target segment duration: 30-60 seconds (configurable)
  • Minimum segment: 10 seconds (avoid overhead)
  • Maximum segment: 120 seconds (limit single task impact)
  • Keyframe alignment: Always split on I-frames to avoid decode artifacts

9.7 Progress Parsing

FFmpeg outputs progress to stderr with -progress pipe:1. The worker parses this in real-time:

func ParseFFmpegProgress(line string, totalDuration float64) *ProgressUpdate {
    // Example: out_time_ms=30000000
    if strings.HasPrefix(line, "out_time_ms=") {
        ms, _ := strconv.ParseInt(strings.TrimPrefix(line, "out_time_ms="), 10, 64)
        currentTime := float64(ms) / 1_000_000.0
        percent := (currentTime / totalDuration) * 100.0
        return &ProgressUpdate{
            Percent:      math.Min(percent, 100.0),
            CurrentTime:  currentTime,
            TotalTime:    totalDuration,
        }
    }
    
    // Parse speed (e.g., "speed=2.5x")
    if strings.HasPrefix(line, "speed=") {
        speedStr := strings.TrimSuffix(strings.TrimPrefix(line, "speed="), "x")
        speed, _ := strconv.ParseFloat(speedStr, 64)
        return &ProgressUpdate{Speed: speed}
    }
    
    // Parse fps
    if strings.HasPrefix(line, "fps=") {
        fps, _ := strconv.ParseFloat(strings.TrimPrefix(line, "fps="), 64)
        return &ProgressUpdate{FPS: fps}
    }
    
    return nil
}

9.8 Error Handling & Recovery

Exit Code Meaning Action
0 Success Continue to upload
1 Generic error Retry with exponential backoff
69 Input error (corrupt file) Mark job as failed, no retry
137 OOM killed Retry on different worker with fewer parallel tasks
255 Timeout/abort Retry if attempts remain

Retry Backoff Formula:

delay = min(base * 2^attempt, max_delay)
base = 5 seconds
max_delay = 5 minutes
max_attempts = 3 (configurable per profile)

10. API Reference (OpenAPI 3.1)

10.1 Jobs API

POST /v1/jobs

{
  "source_url": "s3://bucket/video.mov",
  "profile": "1080p_h264",
  "publish": {
    "youtube": { "title": "My Video", "privacy": "unlisted" }
  },
  "webhooks": ["https://callback.site/hook"]
}

Response: 201 Created

{
  "id": "job-uuid",
  "status": "queued",
  "eta_seconds": 300
}

GET /v1/jobs/{id} Response:

{
  "id": "job-uuid",
  "status": "processing",
  "progress_pct": 45,
  "tasks_total": 30,
  "tasks_completed": 13,
  "eta_seconds": 200
}

POST /v1/jobs/{id}/retry Cancels the current job (if running) and creates a new job with the same parameters (source URL, profiles, etc.). Returns the new job details. Response: 201 Created

{
  "id": "new-job-uuid",
  "source_url": "...",
  "status": "queued",
  ...
}

GET /v1/jobs/{id}/outputs Returns output files for a completed job with download URLs. Response:

{
  "job_id": "job-uuid",
  "status": "completed",
  "outputs": [
    {
      "name": "abc123_final.mp4",
      "type": "final",
      "url": "abc123_final.mp4",
      "download_url": "https://storage.example.com/abc123_final.mp4?signed=...",
      "profile": "default"
    }
  ]
}

POST /v1/jobs/{id}/publish Publishes a completed job to an external platform using publisher plugins. Request:

{
  "platform": "twitch",
  "title": "My Video",
  "description": "Optional description",
  "access_token": "oauth_token_for_platform",
  "output_key": "optional_specific_output"
}

Response:

{
  "success": true,
  "platform_id": "video123",
  "platform_url": "https://www.twitch.tv/videos/video123",
  "message": "Successfully published to twitch"
}

10.2 Streams API

POST /v1/streams

{
  "title": "Live Event",
  "description": "My stream",
  "archive_enabled": true
}

Response: 201 Created

{
  "id": "stream-uuid",
  "stream_key": "live_xxxxx",
  "ingest_url": "rtmp://ingest.webencode.local:1935/live"
}

GET /v1/streams/{id}/stats Response:

{
  "status": "live",
  "viewers": 420,
  "bitrate_kbps": 5200,
  "fps": 59.97
}

10.3 Restream API

POST /v1/restreams

{
  "title": "Multi-platform stream",
  "input_type": "rtmp",
  "output_destinations": [
    { "platform": "youtube", "url": "rtmp://...", "enabled": true },
    { "platform": "twitch", "url": "rtmp://...", "enabled": true }
  ]
}

GET /v1/restreams/{id} Response includes per-destination stats.

10.4 Files API

GET /v1/files/capabilities Returns list of storage plugins and their capabilities.

[
  {
    "plugin_id": "storage-s3",
    "storage_type": "s3",
    "supports_browse": true
  }
]

GET /v1/files/roots Returns root directories (buckets/folders) from all plugins.

[
  {
    "plugin_id": "storage-s3",
    "name": "webencode-bucket",
    "path": "webencode-bucket",
    "storage_type": "s3"
  }
]

GET /v1/files/browse Lists contents of a directory. Query Params: plugin, path, media_only, show_hidden, search

{
  "plugin_id": "storage-s3",
  "current_path": "bucket/videos",
  "entries": [ { "name": "movie.mp4", "is_directory": false, "size": 10240, "is_video": true } ]
}

POST /v1/files/upload-url Generate pre-signed URL for direct upload.

{
  "plugin_id": "storage-s3",
  "bucket": "my-bucket",
  "object_key": "uploads/video.mp4",
  "content_type": "video/mp4"
}

POST /v1/files/upload Upload file via multipart form to the proxy. Form Fields: file, plugin_id, bucket, object_key


11. Error Handling & Codes

11.1 Error Response Format

All API errors return a standardized JSON structure:

{
  "error": {
    "code": "WE-JOB-004",
    "message": "Job not found",
    "details": "No job exists with ID: 550e8400-e29b-41d4-a716-446655440000",
    "request_id": "req-abc123",
    "timestamp": "2025-12-22T10:30:00Z",
    "docs_url": "https://docs.webencode.io/errors/WE-JOB-004"
  }
}

11.2 Error Code Catalog

Job Errors (WE-JOB-xxx)

Code HTTP Status Message Description Resolution
WE-JOB-001 400 Invalid job request Missing required fields or malformed JSON Check request body schema
WE-JOB-002 400 Invalid source URL Source URL is unreachable or unsupported Verify URL accessibility
WE-JOB-003 400 Invalid profile Encoding profile does not exist Use GET /v1/profiles to list valid profiles
WE-JOB-004 404 Job not found Job ID does not exist Verify job ID
WE-JOB-005 409 Job already cancelled Cannot modify a cancelled job Create a new job
WE-JOB-006 409 Job already completed Cannot cancel a completed job No action needed
WE-JOB-007 422 Unsupported source format File format not supported See supported formats list
WE-JOB-008 500 Job processing failed Internal error during processing Retry or contact support
WE-JOB-009 503 No workers available All workers are busy or offline Wait and retry

Stream Errors (WE-STR-xxx)

Code HTTP Status Message Description Resolution
WE-STR-001 400 Invalid stream request Missing required fields Check request body
WE-STR-002 404 Stream not found Stream ID does not exist Verify stream ID
WE-STR-003 409 Stream already live Cannot modify settings while live Stop stream first
WE-STR-004 409 Stream key conflict Stream key already in use Use a different key
WE-STR-005 422 Invalid restream destination RTMP URL format invalid Verify destination URL
WE-STR-006 500 Ingest server unavailable MediaMTX plugin not responding Restart live plugin

Worker Errors (WE-WRK-xxx)

Code HTTP Status Message Description Resolution
WE-WRK-001 404 Worker not found Worker ID does not exist List available workers
WE-WRK-002 409 Worker busy Cannot drain worker with active tasks Wait for tasks to complete
WE-WRK-003 410 Worker offline Worker has not sent heartbeat Check worker connectivity
WE-WRK-004 500 Worker communication failed gRPC connection to worker failed Restart worker

Plugin Errors (WE-PLG-xxx)

Code HTTP Status Message Description Resolution
WE-PLG-001 404 Plugin not found Plugin ID does not exist List installed plugins
WE-PLG-002 409 Plugin disabled Plugin is disabled Enable plugin first
WE-PLG-003 422 Invalid plugin config Config does not match schema Validate against plugin schema
WE-PLG-004 500 Plugin crashed Plugin process terminated unexpectedly Check plugin logs
WE-PLG-005 503 Plugin degraded Plugin is responding slowly Monitor and restart if needed

Auth Errors (WE-AUTH-xxx)

Code HTTP Status Message Description Resolution
WE-AUTH-001 401 Unauthorized Missing or invalid token Provide valid Bearer token
WE-AUTH-002 401 Token expired JWT token has expired Refresh token
WE-AUTH-003 403 Forbidden User lacks permission for action Contact admin for access
WE-AUTH-004 403 Role required Specific role required for action Check RBAC requirements
WE-AUTH-005 500 Auth provider unavailable OIDC/Keycloak not responding Retry or check provider

Storage Errors (WE-STO-xxx)

Code HTTP Status Message Description Resolution
WE-STO-001 404 Object not found File does not exist in storage Verify object key
WE-STO-002 403 Storage access denied Missing storage credentials Check plugin config
WE-STO-003 507 Storage full Storage capacity exceeded Free space or expand storage
WE-STO-004 500 Storage unavailable Cannot connect to storage backend Check storage plugin

System Errors (WE-SYS-xxx)

Code HTTP Status Message Description Resolution
WE-SYS-001 500 Internal server error Unexpected error Check logs, contact support
WE-SYS-002 500 Database error PostgreSQL query failed Check database connectivity
WE-SYS-003 500 Message queue error NATS connection failed Check NATS connectivity
WE-SYS-004 503 Service unavailable Kernel is starting or shutting down Wait and retry
WE-SYS-005 429 Rate limit exceeded Too many requests Wait and retry with backoff

11.3 HTTP Status Code Mapping

Status Code Usage
200 OK Successful GET, PUT, PATCH
201 Created Successful POST creating a resource
204 No Content Successful DELETE
400 Bad Request Invalid input, malformed JSON
401 Unauthorized Missing or invalid authentication
403 Forbidden Authenticated but not authorized
404 Not Found Resource does not exist
409 Conflict State conflict (already exists, cannot modify)
422 Unprocessable Entity Valid JSON but semantic error
429 Too Many Requests Rate limit exceeded
500 Internal Server Error Unexpected server error
503 Service Unavailable Temporary unavailability
507 Insufficient Storage Storage quota exceeded

11.4 Error Logging

All errors are logged with the following structure:

{
  "level": "error",
  "timestamp": "2025-12-22T10:30:00Z",
  "correlation_id": "req-abc123",
  "error_code": "WE-JOB-008",
  "message": "Job processing failed",
  "user_id": "user-uuid",
  "resource_type": "job",
  "resource_id": "job-uuid",
  "stack_trace": "...",
  "context": {
    "worker_id": "worker-001",
    "task_id": "task-uuid"
  }
}

12. Environment Configuration

Kernel

KERNEL_PORT=8080
KERNEL_GRPC_PORT=50051
NATS_URL=nats://nats:4222
DATABASE_URL=postgresql://user:pass@postgres:5432/webencode
OPENBAO_ADDR=http://openbao:8200
OPENBAO_TOKEN=s.xxxxxxx

PLUGIN_DIR=/plugins
LOG_LEVEL=info

Worker

KERNEL_GRPC_ADDR=kernel:50051
FFMPEG_PATH=/usr/bin/ffmpeg
WORKER_ID=worker-001
NATS_URL=nats://nats:4222

13. Security Architecture

Authentication Flow

sequenceDiagram
    Client->>Kernel: GET /v1/jobs<br/>Authorization: Bearer token
    Kernel->>AuthPlugin: Authenticate(token)
    AuthPlugin->>Keycloak: Validate JWT
    Keycloak-->>AuthPlugin: user_id + metadata
    AuthPlugin-->>Kernel: AuthResponse
    Kernel->>Kernel: Cache (5 min TTL)
    Kernel->>AuthPlugin: Authorize(user_id, 'list_jobs', NULL)
    AuthPlugin-->>Kernel: allowed=true
    Kernel-->>Client: 200 OK + jobs list
Loading

Data Encryption

  • In Transit: TLS 1.3 (all HTTP, gRPC, NATS)
  • At Rest: PostgreSQL pgcrypto for sensitive fields
  • Secrets: OpenBao (never raw values in DB or logs)

14. Deployment Configurations

Docker Compose

version: '3.8'
services:
  nats:
    image: nats:latest
    ports: ["4222:4222", "8222:8222"]
    command: "-js"
    volumes: [nats_data:/data]

  postgres:
    image: postgres:17
    environment:
      POSTGRES_DB: webencode
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes: [postgres_data:/var/lib/postgresql/data]
    ports: ["5432:5432"]

  openbao:
    image: openbao/openbao:latest
    ports: ["8200:8200"]
    environment:
      OPENBAO_DEV_ROOT_TOKEN_ID: "dev-token"

  # SeaweedFS - S3-compatible distributed object storage (Apache 2.0)
  seaweedfs-master:
    image: chrislusf/seaweedfs:latest
    ports: ["9333:9333", "19333:19333"]
    command: "master -ip=seaweedfs-master -ip.bind=0.0.0.0"
    volumes: [seaweedfs_master:/data]

  seaweedfs-volume:
    image: chrislusf/seaweedfs:latest
    ports: ["8080:8080", "18080:18080"]
    command: "volume -mserver=seaweedfs-master:9333 -ip.bind=0.0.0.0 -port=8080"
    volumes: [seaweedfs_volume:/data]
    depends_on: [seaweedfs-master]

  seaweedfs-filer:
    image: chrislusf/seaweedfs:latest
    ports: ["8888:8888", "18888:18888", "8333:8333"]
    command: "filer -master=seaweedfs-master:9333 -ip.bind=0.0.0.0 -s3 -s3.port=8333"
    volumes: [seaweedfs_filer:/data]
    depends_on: [seaweedfs-master, seaweedfs-volume]

  kernel:
    build:
      context: .
      dockerfile: kernel/Dockerfile
    ports: ["8000:8000", "50051:50051"]
    environment:
      DATABASE_URL: postgresql://webencode:${DB_PASSWORD}@postgres:5432/webencode
      NATS_URL: nats://nats:4222
      OPENBAO_ADDR: http://openbao:8200
      S3_ENDPOINT: http://seaweedfs-filer:8333
      PLUGIN_DIR: /plugins
    depends_on: [postgres, nats, openbao, seaweedfs-filer]

  worker:
    build:
      context: .
      dockerfile: worker/Dockerfile
    environment:
      KERNEL_GRPC_ADDR: kernel:50051
      NATS_URL: nats://nats:4222
    depends_on: [kernel, nats]

  ui:
    build:
      context: .
      dockerfile: ui/Dockerfile
    ports: ["3000:3000"]
    environment:
      NEXT_PUBLIC_KERNEL_URL: http://localhost:8080
      NEXT_PUBLIC_WS_URL: ws://localhost:8080

volumes:
  postgres_data:
  nats_data:
  seaweedfs_master:
  seaweedfs_volume:
  seaweedfs_filer:

Kubernetes (Production)

apiVersion: v1
kind: Namespace
metadata:
  name: webencode

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webencode-kernel
  namespace: webencode
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webencode-kernel
  template:
    metadata:
      labels:
        app: webencode-kernel
    spec:
      containers:
      - name: kernel
        image: webencode/kernel:v1.0.0
        ports:
        - containerPort: 8080
        - containerPort: 50051
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: webencode-secrets
              key: database-url
        - name: NATS_URL
          value: "nats://nats:4222"
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 30

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webencode-worker
  namespace: webencode
spec:
  serviceName: webencode-worker
  replicas: 5
  selector:
    matchLabels:
      app: webencode-worker
  template:
    metadata:
      labels:
        app: webencode-worker
    spec:
      containers:
      - name: worker
        image: webencode/worker:v1.0.0
        env:
        - name: WORKER_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: KERNEL_GRPC_ADDR
          value: "webencode-kernel:50051"
        - name: NATS_URL
          value: "nats://nats:4222"
        resources:
          requests:
            cpu: 4000m
            memory: 8Gi
          limits:
            cpu: 8000m
            memory: 16Gi
        volumeMounts:
        - name: temp
          mountPath: /tmp/webencode-worker
  volumeClaimTemplates:
  - metadata:
      name: temp
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

15. Monitoring & Observability

Prometheus Metrics (Kernel)

webencode_kernel_jobs_total{status="completed"} 1234
webencode_kernel_jobs_total{status="failed"} 12
webencode_kernel_job_processing_time_seconds{quantile="0.95"} 450

webencode_plugin_rpc_duration_seconds{plugin="encoder-ffmpeg"} 125
webencode_plugin_error_count{plugin="storage-s3"} 3
webencode_plugin_health{plugin="auth-oidc"} 1

Prometheus Metrics (Workers)

webencode_worker_task_duration_seconds{worker_id="worker-001"} 125
webencode_worker_cpu_usage_percent{worker_id="worker-001"} 85
webencode_worker_ffmpeg_process_count{worker_id="worker-001"} 2

Log Aggregation (Loki)

{job="webencode-kernel"} | json | level="error"
{job="webencode-worker"} | json | correlation_id="xyz"

16. Operational Playbook

CI/CD Pipelines

GitHub Actions

name: Build & Deploy
on:
  push:
    branches: [main]
    tags: [v*]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:17
      nats:
        image: nats:latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-go@v4
      with:
        go-version: '1.24'
    - run: go test ./... -v -coverprofile=coverage.out
    - uses: codecov/codecov-action@v3

  build:
    needs: test
    runs-on: ubuntu-latest
    strategy:
      matrix:
        target: [kernel, worker, ui]
    steps:
    - uses: actions/checkout@v4
    - run: docker build -f ${{ matrix.target }}/Dockerfile -t webencode/${{ matrix.target }}:${{ github.ref_name }} .
    - run: docker push webencode/${{ matrix.target }}:${{ github.ref_name }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
    - run: kubectl set image deployment/webencode-kernel kernel=webencode/kernel:${{ github.ref_name }}
    - run: kubectl rollout status deployment/webencode-kernel

17. Performance Tuning

Database

CREATE INDEX idx_jobs_user_created ON jobs(user_id, created_at DESC) WHERE status != 'completed';
VACUUM ANALYZE jobs;
ALTER SYSTEM SET work_mem = '256MB';

NATS

max_connections = 100000
max_subscriptions = 100000

Go

debug.SetGCPercent(75)
debug.SetMaxStack(256 * 1024 * 1024)

18. Disaster Recovery

RTO & RPO

  • RTO: 5 minutes (restart all services)
  • RPO: 1 minute (NATS replication lag)

Backup Strategy

  1. Tier 1: Continuous PostgreSQL replication, NATS 3-node quorum
  2. Tier 2: Weekly dumps to S3, NATS snapshots
  3. Tier 3: Standby K8s cluster in different region

19. GUI & Frontend Specification (Complete)

19.1 Architecture

graph TB
    UI["Next.js 15 App<br/>(SSR + Server Actions)"]
    API["REST API<br/>(Kernel)"]
    WS["WebSocket<br/>(NATS events)"]
    AUTH["Auth Plugin<br/>(OIDC/Keycloak)"]
    
    UI -->|fetch/POST| API
    UI -->|subscribe| WS
    API -->|validate| AUTH
    API -->|dispatch| WS
Loading

19.2 Page Hierarchy

/
├── /dashboard
│   ├── System Health Card
│   ├── Job Statistics
│   ├── Recent Jobs Table
│   ├── Live Streams Widget
│   └── Resource Utilization Charts
├── /jobs
│   ├── /jobs (list, filter, search)
│   ├── /jobs/[id] (detail, tasks, logs)
│   └── /jobs/new (create wizard)
├── /streams
│   ├── /streams (list, active/past)
│   ├── /streams/[id] (live studio)
│   ├── /streams/new (create stream)
│   └── /streams/[id]/restream (restream destinations)
├── /restreams
│   ├── /restreams (list, schedule)
│   ├── /restreams/[id] (detail, per-destination stats)
│   └── /restreams/new (create restream job)
├── /workers
│   ├── /workers (list, health, uptime)
│   └── /workers/[id] (capabilities, logs, drain)
├── /settings
│   ├── /settings/plugins (manage plugins, config)
│   ├── /settings/profiles (encoding profiles)
│   ├── /settings/storage (S3/SeaweedFS config)
│   ├── /settings/auth (OAuth2 setup)
│   └── /settings/audit (audit log viewer)
└── /auth
    └── /auth/login (OAuth2 redirect)

19.3 Page Specifications

Dashboard (/dashboard)

Real-time subscriptions:

  • dashboard:stats (system health, job counts, worker status, plugin health)
  • live.telemetry.* (aggregate live stream stats)

Sections:

  1. System Health Card

    • NATS: status, version, message backlog
    • PostgreSQL: connections, query latency
    • Workers: online count, health status
    • Plugins: per-plugin uptime, error rate
  2. Job Statistics (last 24h)

    • Queued, Processing, Completed, Failed (counts + % change)
    • Average processing time
    • Total data processed (GB)
  3. Recent Jobs Table

    • Job ID, source, profile, status, progress %, ETA, actions
    • Filter: by status, user, date range
    • Sort: by created_at, eta_seconds, progress_pct
    • Actions: view detail, cancel, retry, delete
  4. Live Streams Widget

    • Active streams list
    • Per-stream: title, viewers, bitrate, uptime
    • Click to open stream dashboard
  5. Resource Utilization

    • CPU, memory, disk usage across workers
    • Time-series charts (Recharts)
    • Alerts if > 80%

Jobs (/jobs)

GET /jobs:

  • List all jobs with pagination (20 per page)
  • Filters: status, user, date range, source_type
  • Search: job ID, source URL
  • Sort: created_at DESC, eta_seconds, progress_pct
  • Real-time progress updates via job:{id}:updates

GET /jobs/[id]:

  • Job metadata: ID, user, source, profile, timestamps
  • Overall progress bar with % and ETA
  • Tasks table:
    • Index, time range, status, assigned worker, output size, retry count
    • Per-task: view logs, retry, requeue
  • Event timeline: Breadcrumbs of status changes (queued → processing → stitching → uploading → completed)
  • Artifact browser: Chunks, stitched output, manifests, thumbnails, checksums
  • FFmpeg logs: Download or view in browser
  • Actions: Cancel job, force stitch, force publish, delete

POST /jobs/new (Multi-step wizard):

  1. Step 1: Source Selection

    • Option A: Enter URL (s3://, https://, file://)
    • Option B: Browse S3 bucket (if storage plugin supports it)
    • Option C: Upload file (via presigned URL)
    • Validation: Check file size, format, accessibility
  2. Step 2: Profile Selection

    • Preset profiles: "1080p_h264", "4k_hevc", "480p_fast", etc.
    • Or "Custom" for advanced override
    • Advanced panel (if custom):
      • Video codec: dropdown [libx264, hevc_nvenc, av1]
      • Bitrate ladder: [500k, 1000k, 2000k, 5000k, 10000k] (enable/disable each)
      • Resolution: width/height or preset
      • Audio: codec, bitrate, language, passthrough vs re-encode
      • Subtitles: passthrough, extract, burn-in
      • Thumbnails: interval (e.g., every 5s)
  3. Step 3: Output & Packaging

    • Container: MP4, MKV, WebM
    • HLS/DASH: enable/disable, segment duration
    • Encryption: none, AES-128, custom
    • Subtitles: embed or separate files
  4. Step 4: Publishing

    • Publish to: YouTube, S3, email, none
    • Per-platform settings:
      • YouTube: title, description, tags, privacy (public/unlisted/private), thumbnail
      • S3: bucket, prefix, storage class
      • Email: recipient, include link
  5. Review & Submit

    • Summary of all settings
    • [Create Job] button
    • Response: Job ID + insta-link to job detail page

Live Streams (/streams)

GET /streams:

  • List: active (at top) + past streams
  • Per-stream: title, status (LIVE / offline), viewers, duration, created_at
  • Filter: active/past, by user, by date range
  • Actions: view detail, edit, archive settings

GET /streams/[id] (Live Studio):

  • Stream info card:

    • Title, description, thumbnail
    • Status: LIVE / IDLE / ENDED
    • Duration (if live): "Online for 2h 15m"
    • Peak viewers: 2,890
  • Ingest settings (read-only during live):

    • RTMP URL: rtmp://ingest.webencode.local:1935/live
    • Stream key: [hidden] [Copy] [Reveal]
    • Protocol: RTMP / SRT (dropdown to switch)
    • Advanced: bitrate targets, encoder presets (if live plugin supports)
  • Live telemetry (updates 1/sec via stream:{id}:telemetry):

    • Video bitrate: current + target (color-coded if below threshold)
    • Audio bitrate: current
    • Frame rate: current + target
    • Audio level: meter (0-100%)
    • Network health: "✓ Excellent" / "⚠ Degraded" / "✗ Poor"
    • Dropped frames: count
    • RTT / jitter: if available
  • Viewer analytics:

    • Current viewers (large number)
    • Peak viewers (last 24h)
    • Time-series chart: viewer count over stream duration
    • Geographic distribution: countries/regions (if available)
  • Restream destinations (if enabled):

    • Table: platform, URL, enabled/disabled, status, bitrate per destination
    • Add destination modal
    • Per-destination stats: latency, error count, last error
  • Chat / embedding (if plugin supports):

    • Embed code for website
    • Chat frame (or link to external chat)
  • Actions:

    • [End Stream] button (visible only if LIVE)
    • [Start Recording] / [Stop Recording]
    • [Archive to VOD] (creates job automatically)
    • [Restream to...] button (opens restream destination editor)
    • [Settings] button (edit stream metadata)

POST /streams/new:

  • Title: text input
  • Description: textarea
  • Archive enabled: checkbox
  • [Create Stream] → returns ingest URL and stream key

Restream (/restreams)

GET /restreams:

  • List all restream jobs
  • Per-job: title, status (streaming/stopped/error), input type, created_at, actions
  • Filter: by status, by date range
  • Actions: view detail, start, stop, delete

GET /restreams/[id]:

  • Job info:

    • Title, description, input type (rtmp/file/vod_job)
    • Schedule: immediate / scheduled / recurring
    • Loop enabled: yes/no
    • Simulate live: yes/no
  • Input:

    • If rtmp: RTMP URL to listen on
    • If file: source file URL + playback controls (play/pause/restart)
    • If vod_job: reference to original job (link)
  • Output destinations (table):

    • Platform, RTMP/SRT URL, enabled/disabled, status
    • Per-destination stats: bitrate, latency, error count, last error
    • Add destination, remove destination
  • Actions:

    • [Start Streaming] / [Stop Streaming]
    • [Edit] button (modify destinations, schedule)
    • [Delete] button

POST /restreams/new:

  • Title, description
  • Input type: RTMP / File / VOD Job (radio)
  • Conditional fields based on input type
  • Output destinations: array input (add multiple)
  • Schedule: Immediate / Scheduled (date/time) / Recurring (cron-like)
  • Loop: checkbox
  • Simulate live: checkbox (if file input)
  • [Create Restream Job]

Workers (/workers)

GET /workers:

  • List all workers
  • Per-worker: ID, hostname, IP, status (healthy/degraded/offline), CPU %, memory %, tasks running, uptime
  • Filter: by status, by hostname
  • Actions: view detail, drain, restart, remove

GET /workers/[id]:

  • Metadata: ID, hostname, IP, port, version, registration time

  • Capabilities (from GetCapabilities RPC):

    • Video codecs: [libx264, hevc_nvenc, av1]
    • Audio codecs: [aac, opus]
    • GPU: nvidia / amd / none
    • GPU count, model
    • CPU cores
    • Memory (GB)
    • Max parallel tasks
  • Health:

    • Status: healthy / degraded / failed
    • Last heartbeat
    • Uptime
    • Recent errors (last 5)
  • Performance (last 24h):

    • Tasks completed: count
    • Average task time
    • Error rate
    • CPU/memory usage chart
  • Actions:

    • [Drain] (stop accepting new tasks, wait for current to finish)
    • [Undrain] (resume accepting)
    • [Restart] (graceful restart)
    • [Remove] (permanently deregister)
    • [View Logs] (stream last N log lines)

Plugins (/settings/plugins)

GET /settings/plugins:

  • List all plugins by type (Auth, Storage, Encoder, Live, Publisher)
  • Per-plugin: ID, type, version, status (enabled/disabled/degraded), uptime, error rate
  • Actions: enable/disable, configure, view logs, test connection

Plugin Detail Modal (per plugin):

  • Info: ID, type, version, protocol version

  • Status: enabled/disabled/degraded, last error

  • Capabilities (from plugin.GetCapabilities):

    • For encoder: codecs, GPU support
    • For storage: backends supported
    • For live: protocols supported
  • Configuration:

    • JSON editor (with schema validation)
    • Secret fields marked as (*_secret_ref, *_token_ref)
    • [Test Connection] button (calls plugin RPC)
    • [Save] button
  • Logs:

    • Last 50 log lines (streaming via WebSocket)
    • Filter: by level (info, warn, error)
  • Metrics:

    • RPC call latency (p50, p95, p99)
    • Error rate (%)
    • Uptime (%)

Routing & Priority:

  • Table: plugin type, current instance, priority, routing rule
  • Edit routing rule for each type (e.g., "Use encoder-ffmpeg for h264, encoder-av1 for av1")

Encoding Profiles (/settings/profiles)

GET /settings/profiles:

  • List all profiles: name, codec, bitrate, resolution
  • Actions: edit, duplicate, delete

Profile Detail Editor:

  • Basic:

    • Name: "1080p_h264"
    • Description: "Full HD H.264"
    • Video codec: libx264
    • Bitrate: 5000 kbps
    • Resolution: 1920x1080
  • Advanced:

    • Preset: ultrafast / fast / medium / slow
    • Audio codec: aac
    • Audio bitrate: 128 kbps
    • Container: mp4
    • HLS/DASH: enable, segment duration (e.g., 6s)
  • Ladder (ABR):

    • Multiple quality tiers
    • Per-tier: codec, bitrate, resolution
    • Enable/disable each tier
    • [Add tier] button
  • [Save] button

Audit Log (/settings/audit)

GET /settings/audit:

  • Table: timestamp, user, action, resource, result (success/failure)
  • Filter: by user, action, date range
  • Search: by resource ID
  • Per-row: [View details] modal showing full payload
  • Export: download as CSV

19.4 Real-time Updates Architecture

sequenceDiagram
    UI->>Kernel: Subscribe to 'job:job-123:updates'
    Kernel->>NATS: subscribe jobs.events with filter
    Worker->>NATS: publish job.events {job_id, progress_pct, ...}
    NATS->>Kernel: deliver event
    Kernel->>UI: WebSocket message
    UI->>UI: Update progress bar
Loading

19.5 Role-Based Access Control (RBAC)

graph TD
    A["User Role"] --> B{"Viewer"}
    A --> C{"Operator"}
    A --> D{"Admin"}
    
    B --> B1["View jobs/streams<br/>View dashboards<br/>View workers (read-only)"]
    
    C --> C1["All Viewer perms"]
    C --> C2["Create/edit jobs<br/>Create/edit streams<br/>Manage restreams<br/>View worker logs"]
    
    D --> D1["All Operator perms"]
    D --> D2["Manage plugins<br/>Manage profiles<br/>Worker drain/restart<br/>Audit log access"]
Loading

19.6 Internationalization (i18n)

StreamHub and Admin UI fully support internationalization via next-intl.

  • Architecture:

    • Path-Prefix Routing: /{locale}/route (e.g., /en/dashboard, /ja/browse).
    • Middleware: Automatically detects user locale from headers/cookies and redirects or rewrites paths.
    • Server Components: Uses getMessages() for zero-bundle-size translations on the server.
    • Client Components: Uses useTranslations() hook for interactive elements.
  • Supported Locales:

    • en (English) - Default
    • de (German)
    • ja (Japanese)
    • es (Spanish)
    • fr (French)
  • Implementation details:

    • Config: src/i18n/config.ts defines supported locales.
    • Routing: src/i18n/routing.ts defines navigation wrappers (Link, redirect, usePathname).
    • Messages: JSON files located in messages/{locale}.json.
    • Layout: app/[locale]/layout.tsx serves as the root layout injects NextIntlClientProvider.


20. Webhook System

20.1 Architecture

Webhooks allow external systems to receive real-time notifications about job statuses and stream lifecycle events.

  • Delivery: Asynchronous via NATS webhooks.dispatch queue
  • Retries: Exponential backoff (up to 24 hours)
  • Security: HMAC-SHA256 signatures

20.2 Event Types

Event Type Trigger
job.created Job submitted and queued
job.started Worker picked up first task
job.completed All tasks finished, stitching done
job.failed Job failed after max retries
stream.started RTMP ingest connected
stream.ended RTMP ingest disconnected
recording.ready Live stream archive VOD is ready

20.3 Payload Schema

POST requests are sent with Content-Type: application/json:

{
  "event_id": "evt-uuid",
  "event_type": "job.completed",
  "timestamp": "2025-12-22T12:00:00Z",
  "data": {
    "job_id": "job-uuid",
    "user_id": "user-uuid",
    "status": "completed",
    "outputs": [
      {
        "url": "https://cdn.example.com/video.mp4",
        "format": "mp4",
        "size_bytes": 1024000
      }
    ]
  }
}

20.4 Security Verification

Recipients should verify the X-WebEncode-Signature header: Signature = HMAC-SHA256(payload, webhook_secret)


21. Testing Strategy

21.1 Testing Pyramid

  1. Unit Tests (60%):
    • Go: Standard testing package. Mock interfaces for plugins/DB.
    • React: Jest + React Testing Library for components.
  2. Integration Tests (30%):
    • Testcontainers: Spin up ephemeral Postgres, NATS, SeaweedFS.
    • Verify NATS pub/sub flow, DB constraints, plugin RPC contracts.
  3. End-to-End (E2E) Tests (10%):
    • Playwright: Full browser flow (Login -> Create Job -> View Progress).
    • API Tests: Postman/Newman collections running against a staging env.

21.2 Performance Testing

  • Tool: k6
  • Scenarios:
    • Job Submission flood (100 req/sec)
    • WebSocket fanout (10k concurrent connections)
  • CI Integration: Run strictly defined performance baselines on every PR.

22. Public Repository Guidelines

22.1 Repository Standards

To ensure the project is welcoming and professional:

  • README.md: High-quality banner, clear "Getting Started", Architecture diagram.
  • CONTRIBUTING.md: Dev environment setup, coding standards (buffers vs streams), PR process.
  • LICENSE: MIT License file.
  • CODE_OF_CONDUCT.md: Standard Contributor Covenant.
  • SECURITY.md: Vulnerability reporting policy.

22.2 Branching Strategy

  • main: Stable, production-ready code.
  • develop: Integration branch for next release.
  • feat/xyz: Feature branches.
  • fix/xyz: Bug fix branches.
  • Commits: Follow Conventional Commits (e.g., feat: add av1 support, fix: resolve memory leak).

22.3 Issue Triage

  • Use GitHub Issue Templates (Bug Report, Feature Request).
  • Labels: good first issue, help wanted, bug, enhancement.

22.4 Branding and Logo

WebEncode has an official logo (logo.svg) that represents the platform's core functionality:

Logo Elements:

  • Film Strip Frame: Represents video/media processing
  • Play Button (Center): Symbolizes video playback and streaming
  • Code Brackets: Represent the encoding/transformation process
  • Distributed Nodes: Four corner nodes connected to the center represent the distributed worker architecture
  • Transformation Arrows: Indicate the encoding transformation workflow

Color Scheme:

  • Primary Blue (#3B82F6): Main brand color, represents reliability and technology
  • Accent Blue (#60A5FA): Secondary elements, film strip details
  • Success Green (#10B981): Distributed nodes and transformation arrows
  • Dark Background (#0F172A): Professional, modern appearance

Usage:

  • Display at the top of README.md (150x150px)
  • Use in documentation headers where appropriate
  • Maintain aspect ratio when scaling
  • SVG format allows scaling to any size without quality loss

File Location: /logo.svg (root of repository)


23. Features & Non-Features Checklist

Project Philosophy

WebEncode is a self-hosted, open-source video processing platform. There is no billing, quotas, or usage metering. You own and operate your own infrastructure.

Must-Have Features

  • VOD Encoding: Split & stitch, ABR ladder, thumbnails, subtitles passthrough/extract/burn-in, per-task retry, cancel, resume
  • Live Streaming: Create/manage channels, OBS ingest, live telemetry dashboard, recording/archive to VOD
  • Restreaming: RTMP/SRT input → multiple RTMP/SRT outputs, file-as-live with scheduling & looping
  • Automated Upload: Publish VOD to YouTube, metadata + scheduling + retry + status tracking
  • Plugin-First Extensibility: 5 pillars (auth, storage, encoder, live, publisher) with full lifecycle management
  • Self-Hosted First: No external dependencies, runs entirely on your infrastructure
  • Ops-Grade Runtime: Worker pool, job queue visibility, audit logs, metrics, alerting, backups
  • Cross-Platform Workers: Run workers on Linux, Windows, macOS (anywhere FFmpeg runs)
  • Open Source: MIT licensed, community-driven development

Non-Features (Explicitly Excluded)

  • ❌ In-browser video editor (complex client-side app, out of scope)
  • ❌ Full OBS replacement (we ingest from OBS, not replace it)
  • ❌ DRM licensing management (use specialized services)
  • ❌ Real-time subtitle generation via ML (future AI plugin possible)
  • ❌ Custom filter/effect creation in UI (use FFmpeg filters directly)
  • ❌ Billing / Quotas / Usage Metering (this is self-hosted, not SaaS)
  • ❌ Multi-tenant architecture (single-tenant by design, simpler operations)

24. Future Roadmap

Phase 1 (Q1 2026): Core MVP

  • Basic VOD encoding workflow
  • Live streaming + archive
  • Restream RTMP/file
  • Plugin system + 5 pillar implementations
  • Web dashboard (basic)
  • Worker pool management
  • PostgreSQL + NATS + SeaweedFS stack

Phase 2 (Q2 2026): Enhanced Features

  • AI Highlights Plugin (auto-clip from live streams)
  • Advanced Auth: SAML, LDAP, Active Directory
  • Custom encoding profiles (advanced ladder editor)
  • Webhook delivery + retry logic
  • Audit log with full retention

Phase 3 (Q3 2026): Scaling & Performance

  • Multi-region ingest/origin servers
  • Predictive auto-scaling (based on traffic forecast)
  • Hardware acceleration: NVIDIA, AMD GPU optimization
  • AV1 encoding full support
  • Cache optimization strategies

Phase 4 (Q4 2026): Ecosystem

  • Plugin Marketplace (discovery + installation)
  • Analytics API (GraphQL for job/stream metrics)
  • Advanced webhooks (transformation, filtering)
  • CDN integration partnerships
  • White-label options

25. Glossary

25.1 Architecture Terms

  • Kernel: The central control plane service (Go) responsible for API handling, job orchestration, and plugin management. The Kernel is the "brain" of the cluster.
  • Pillar: One of the 5 core extensible plugin interfaces: Auth, Storage, Encoder, Live, Publisher.
  • Worker: A stateless execution node that subscribes to tasks. Workers run FFmpeg subprocesses and report progress to the Kernel via NATS.
  • Plugin: An independent process (launched by the Kernel) that implements a specific gRPC interface (e.g., plugin-auth-oidc, plugin-storage-s3).

25.2 Domain Terms

  • Job: A request to process a VOD file. Consists of multiple Tasks.
  • Task: An atomic unit of work (e.g., "transcode segment 10 to 1080p").
  • Stream: A live video session with a unique stream key and persistent telemetry.
  • Restream: The process of forwarding a live ingest or VOD file to external RTMP destinations (Twitch, YouTube) in real-time.
  • Profile: A named configuration defining codecs, bitrates, resolutions, and packaging settings (e.g., "1080p_h264").

25.3 Technology Terms

  • SeaweedFS: High-performance distributed object storage used for video artifacts and segments. Replaces MinIO for license compliance.
  • OpenBao: Open-source secret management service (fork of Vault) used to store plugin credentials.
  • MediaMTX: The underlying media server engine used by the Live Plugin for RTMP/SRT/WebRTC handling.
  • NATS JetStream: The messaging backbone providing persistent work queues (WEBENCODE_WORK) and event streams.

Appendix: Configuration Examples

Example: Storage Plugin (SeaweedFS)

{
  "plugin_id": "storage-s3",
  "config": {
    "backend": "s3",
    "endpoint": "seaweedfs-filer.webencode.svc:8333",
    "region": "us-east-1",
    "bucket": "webencode",
    "access_key_id_ref": "secret/webencode/seaweedfs/access_key",
    "secret_access_key_ref": "secret/webencode/seaweedfs/secret_key",
    "use_ssl": false,
    "signed_url_expiry_seconds": 3600
  }
}

Example: Encoder Plugin (FFmpeg)

{
  "plugin_id": "encoder-ffmpeg",
  "config": {
    "ffmpeg_path": "/usr/bin/ffmpeg",
    "ffprobe_path": "/usr/bin/ffprobe",
    "max_parallel_tasks": 2,
    "presets": {
      "1080p_h264": {
        "codec": "libx264",
        "bitrate": "5000k",
        "width": 1920,
        "height": 1080,
        "fps": 30,
        "preset": "fast"
      },
      "4k_hevc": {
        "codec": "hevc_nvenc",
        "bitrate": "15000k",
        "width": 3840,
        "height": 2160,
        "fps": 30,
        "gpu": "nvidia"
      }
    }
  }
}

Example: Live Plugin (MediaMTX)

{
  "plugin_id": "live-mediamtx",
  "config": {
    "rtmp_port": 1935,
    "srt_port": 10080,
    "webrtc_port": 8443,
    "record_segments": true,
    "segment_duration": 2,
    "hls_enabled": true,
    "hls_directory": "/tmp/hls"
  }
}

Example: Auth Plugin (OIDC)

{
  "plugin_id": "auth-oidc-keycloak",
  "config": {
    "issuer": "https://keycloak.example.com/realms/webencode",
    "client_id": "webencode-app",
    "client_secret_ref": "secret/webencode/oidc/client_secret",
    "scopes": ["openid", "profile", "email"],
    "token_cache_ttl_seconds": 300
  }
}

Example: Publisher Plugin (YouTube)

{
  "plugin_id": "publisher-youtube",
  "config": {
    "api_key_ref": "secret/webencode/youtube/api_key",
    "oauth_credentials_ref": "secret/webencode/youtube/oauth",
    "default_privacy": "unlisted",
    "enable_comments": true,
    "enable_ratings": true,
    "auto_publish_delay_seconds": 0
  }
}

End of Specification (v7 - FINAL)

This specification is comprehensive, production-ready, and includes complete UI specifications, plugin system lifecycle, routing rules, and Mermaid diagrams. All components are finalized and technology choices are justified for 2025/2026 implementation.