WebEncode - Distributed Video Processing Platform (v2025)

Status: ULTRA-EXHAUSTIVE SPECIFICATION (FINAL PRODUCTION-READY) Validated For: 2025/2026 Cloud-Native Ecosystem Architecture: Micro-Kernel + 5-Pillar Plugin Mesh Document ID: WE-SPEC-2025-FINAL-v7 Lines: 2500+ (Complete Implementation Blueprint) Last Updated: 2025-12-24 License: MIT (Open Source)

Vision & Architecture
Project Structure
Global Error Tracking System
Technology Stack (Validated 2025/2026)
Data Models
Messaging Architecture
Plugin System (Detailed Lifecycle)
Plugin Interfaces (5 Pillars - Complete)
Core Workflows
FFmpeg Integration
API Reference (Complete)
Error Handling & Codes
Environment Configuration
Security Architecture
Deployment
Monitoring & Observability
Operational Playbook
Performance Tuning
Disaster Recovery
GUI & Frontend Specification (Complete)
Webhook System
Testing Strategy
Public Repository Guidelines
Features & Non-Features Checklist
Future Roadmap
Glossary

1. Vision & Architecture

1.1 The "Hyper-Modular" Philosophy

WebEncode is designed to be the "Engine of Engines". Unlike traditional monolithic encoders, the WebEncode Kernel contains zero business logic regarding storage backends, authentication providers, or specific transcoding engines. Instead, it acts as a high-performance orchestrator for 5 distinct plugin pillars.

This ensures:

Future Proofing: You can swap ffmpeg for a future AI-based encoder without touching the core.
Hybrid Cloud: Run Storage plugins for S3 (Cloud) and Local Disk (On-Prem) simultaneously.
Compliance: Swap the Auth plugin to integrate with any Identity Provider (OIDC, SAML, AD).
License Safety: Architecture explicitly swaps viral/commercial components (MinIO, Vault) for permissively licensed alternatives (SeaweedFS, OpenBao).

1.2 System Architecture Diagram

graph TB
    subgraph "Frontend Layer"
        UI["Next.js 15 Web UI<br/>(TypeScript / Tailwind / shadcn/ui)"]
        WS["WebSocket/SSE<br/>Real-time Updates"]
    end

    subgraph "Control Plane (Kernel - Go 1.24)"
        Kernel["Core Kernel Service<br/>(REST API / gRPC Server)"]
        Auth["Auth Plugin Manager"]
        Storage["Storage Plugin Manager"]
        Encoder["Encoder Plugin Manager"]
        Live["Live Plugin Manager"]
        Publisher["Publisher Plugin Manager"]
    end

    subgraph "Data Plane"
        NATS["NATS JetStream 2.10+<br/>(WorkQueue + Event Streams)"]
        PG["PostgreSQL 16<br/>(State & Metadata)"]
        Secrets["OpenBao<br/>(Secrets Management)"]
    end

    subgraph "Plugin Processes (Isolated via HashiCorp go-plugin)"
        P_Auth["Auth Plugin<br/>(e.g., OIDC/Keycloak)"]
        P_Storage["Storage Plugin<br/>(e.g., SeaweedFS/S3)"]
        P_Encoder["Encoder Plugin<br/>(FFmpeg + worker)"]
        P_Live["Live Plugin<br/>(MediaMTX ingest)"]
        P_Publisher["Publisher Plugin<br/>(YouTube/Twitch)"]
    end

    subgraph "Worker Pool (Go 1.24 + FFmpeg)"
        W1["Worker 1"]
        W2["Worker 2"]
        WN["Worker N"]
    end

    UI -->|REST/gRPC| Kernel
    UI -->|Subscribe| WS
    WS -->|NATS events| Kernel
    
    Kernel --> Auth
    Kernel --> Storage
    Kernel --> Encoder
    Kernel --> Live
    Kernel --> Publisher
    
    Auth -.->|gRPC<br/>UDS| P_Auth
    Storage -.->|gRPC<br/>UDS| P_Storage
    Encoder -.->|gRPC<br/>UDS| P_Encoder
    Live -.->|gRPC<br/>UDS| P_Live
    Publisher -.->|gRPC<br/>UDS| P_Publisher
    
    Kernel <-->|NATS| NATS
    Kernel <-->|SQL| PG
    Kernel <-->|Secrets| Secrets
    
    NATS -->|Work items| W1
    NATS -->|Work items| W2
    NATS -->|Work items| WN
    
    W1 -->|Progress events| NATS
    W2 -->|Progress events| NATS
    WN -->|Progress events| NATS

1.3 Scalability Model

100 Users: Single-node or small cluster. API server handles 0 video bytes (stateless).
10,000+ Users: Horizontal scaling of Workers (KEDA) and Kernel replicas. Storage scales infinitely via SeaweedFS volumes.

2. Project Structure (Standard Monorepo)

The project follows a standard Go Modern Monorepo layout.

WebEncode/
├── cmd/                        # Application Entrypoints
│   ├── kernel/                 # Main control plane binary (go run ./cmd/kernel)
│   └── worker/                 # Worker entrypoint (go run ./cmd/worker)
│
├── deploy/                     # Deployment Configurations
│   ├── docker-compose.yml      # Local dev stack (Kernel + NATS + PG + SeaweedFS + UI)
│   ├── k8s/                    # Kubernetes manifests / Helm charts
│   └── terraform/              # Infrastructure as Code
│
├── frontend/                   # Next.js 15 Web Application
│   ├── src/
│   │   ├── app/                # App Router pages
│   │   ├── components/         # shadcn/ui components
│   │   └── lib/                # API clients & utils
│   ├── public/
│   └── next.config.ts
│
├── internal/                   # Private Kernel Logic (No external imports)
│   ├── api/                    # REST/Gin/Echo handlers
│   ├── core/                   # Domain Logic (Job Manager, Scheduler)
│   ├── database/               # PostgreSQL queries (sqlc/pgx)
│   └── events/                 # NATS publisher/subscriber logic
│
├── pkg/                        # Public Shared Libraries
│   ├── plugin_sdk/             # Official Go SDK for writing plugins
│   └── protocol/               # Generated gRPC Protobuf code
│
├── plugins/                    # Core Plugin Suite (Monorepo-managed)
│   ├── auth/
│   │   ├── plugin-auth-oidc/   # OIDC/Keycloak/Google implementation
│   │   ├── plugin-auth-ldap/   # LDAP implementation
│   │   └── plugin-auth-basic/  # Basic Auth implementation
│   ├── storage/
│   │   ├── plugin-storage-fs/  # Local Filesystem implementation
│   │   └── plugin-storage-s3/  # SeaweedFS/S3 implementation
│   ├── encoder/
│   │   └── plugin-encoder-ffmpeg/
│   ├── live/
│   │   └── plugin-live-mediamtx/
│   └── publisher/
│       ├── plugin-publisher-youtube/
│       ├── plugin-publisher-twitch/
│       ├── plugin-publisher-kick/
│       └── plugin-publisher-rumble/
│
├── proto/                      # Protobuf Definitions
│   ├── auth.proto
│   ├── storage.proto
│   ├── encoder.proto
│   ├── live.proto
│   └── publisher.proto
│
├── scripts/                    # Build & Helper Scripts
│   ├── build_all.sh            # Builds kernel + all plugins
│   └── generate_proto.sh       # Regenerates gRPC code
│
├── go.mod                      # Root Go Module
├── go.sum
└── Makefile                    # Make targets (dev, build, test, docker)

2.1 Key Directories

cmd/: Only package main. Minimal logic. Calls into internal/.
internal/: The "Kernel" logic. Not importable by plugins.
plugins/: Each folder here is a separate Go module (or shares the root mod if tight coupling is desired, but separate allows independent versioning).
proto/: The contract. All plugins depend on the code generated from these files.

2.5 Global Error Tracking System

A centralized system to capture, store, and analyze errors from all components (Frontend, Kernel, Workers, Plugins).

Architecture

Central Event Log: PostgreSQL table error_events stores all error occurrences.
Ingestion API: POST /v1/errors endpoint for external components (Frontend, isolated Plugins).
Internal Hooks: Kernel and Worker loggers attach to the error tracking service to auto-report ERROR level events.
Frontend Capture: Global error boundary and event listeners capture unhandled JS exceptions, Promise rejections, and resource loading failures (404s).

Data Model (`error_events`)

Field	Type	Description
`id`	UUID	Unique Error ID
`source_component`	Enum	`frontend`, `kernel`, `worker`, `plugin:<id>`
`severity`	Enum	`warning`, `error`, `critical`, `fatal`
`message`	Text	Human-readable error message
`stack_trace`	Text	(Optional) Stack trace or raw error dump
`context_data`	JSONB	Data map (e.g., `{"job_id": "...", "url": "..."}`)
`created_at`	Timestamp	UTC Time of occurrence
`resolved`	Boolean	Triage status

Integration Points

Frontend: GlobalErrorTracker component listens to window.onerror, unhandledrejection, and capture-phase error events to catch 404s.
Kernel/Worker: pkg/logger hooks dispatch critical errors to the DB asynchronously.
Plugins: Return detailed errors via gRPC, which the Kernel wraps and logs to the tracking system.

3. Technology Stack (Validated 2025/2026)

3.1 Technology Validation Summary

The following technology choices have been validated through extensive 2025 research:

✅ Go 1.24+ (CONFIRMED - Optimal Choice)

Why Go over Rust?

Development Speed: Go's simpler syntax and fast compilation enable faster time-to-market
Concurrency Model: Goroutines provide intuitive, efficient concurrency for managing parallel video streams and transcoding jobs
Cross-Platform: Single static binary deploys to Linux, Windows, macOS without dependencies
FFmpeg Integration: Go orchestrates FFmpeg subprocesses efficiently; the actual video processing happens in FFmpeg (C/C++), so Rust's raw performance advantage doesn't apply
Ecosystem Maturity: Frameworks like Gin, Echo, and go-plugin are battle-tested for microservices
Hybrid Approach Viable: If specific components need Rust-level performance in the future, they can be isolated as plugins

✅ NATS JetStream 2.10+ (CONFIRMED - Optimal Choice)

Why NATS over Kafka?

Lower Latency: NATS achieves sub-millisecond latency; critical for real-time telemetry
Simpler Operations: Single Go binary vs Kafka's JVM + ZooKeeper (or KRaft) complexity
Smaller Footprint: Ideal for self-hosted and edge deployments
Built-in Features: KV Store, Object Store, WorkQueue semantics without additional processes
Sufficient Throughput: Millions of messages/second; video platform doesn't need Kafka's extreme throughput

✅ Next.js 16 (CONFIRMED - Optimal Choice)

Why Next.js over Remix/Astro?

Versatility: SSR, SSG, ISR all supported; dashboard needs dynamic + static pages
React 19 Integration: Full React 19 support with stable Turbopack bundler
Performance: Turbopack now stable, faster builds and hot reload
Ecosystem: shadcn/ui, TanStack Query, massive component library support
Server Actions: Hardened for security, ideal for admin operations

✅ OpenBao (CONFIRMED - Best OSS Choice)

Why OpenBao over HashiCorp Vault?

License Safety: OpenBao is MPL 2.0 (Open Source), whereas HashiCorp Vault is BSL (Business Source License).
Commercial Freedom: No restrictions on commercial use or competition, unlike BSL.
Vault Compatibility: Drop-in replacement for HashiCorp Vault APIs (forked from the last open-source version).
Community Governance: Managed by the Linux Foundation, ensuring long-term open-source commitment.

✅ SeaweedFS (CONFIRMED - Best FOSS Storage)

Why SeaweedFS over MinIO?

License Safety: SeaweedFS is Apache 2.0 (Permissive), whereas MinIO is AGPLv3 (Viral/Copyleft).
Commercial Safety: Zero risk of viral license contamination for commercial deployments.
Performance: Optimized for handling billions of small files, often faster than MinIO for high-concurrency workloads.
API: Fully S3-compatible via its Filer component.

✅ MediaMTX (CONFIRMED - Best for Our Use Case)

Why MediaMTX over SRS/Nginx-RTMP?

Go-Native: Same language as Kernel/Workers, easier integration and contribution
Protocol Breadth: RTSP, RTMP, HLS, WebRTC, SRT, LL-HLS all supported
License: MIT (Permissive) - safe for all commercial uses.
Active Development: JWT auth, multi-track HLS, AV1/H265 codec support
Plugin Integration: Easy to embed as a Go library or run as a service

3.2 Complete Tool & Language Mapping

Component	Technology	Version	License	Purpose (Exact)
Kernel	Go 1.24	1.24+	BSD	Single control-plane service: REST API, WebSocket/SSE, job/task state machine. Cross-platform static binaries.
Workers	Go 1.24 + FFmpeg	1.24+	BSD/LGPL	Stateless executors: subscribe to work items from NATS. FFmpeg is LGPL (dynamically linked).
Plugin Host	HashiCorp go-plugin	v2.0+	MPL 2.0	Subprocess-based plugin isolation.
Messaging	NATS JetStream	2.12+	Apache 2.0	WorkQueue stream for task dispatch and event fanout. Supports delayed scheduling, atomic batch publish.
Metadata DB	PostgreSQL	17.x	PostgreSQL	Source of truth for jobs, tasks, streams.
Object Storage	SeaweedFS / Local FS	Latest	Apache 2.0	S3-compatible API (SeaweedFS) for HA. Local Filesystem plugin (`fs://`) for simple single-node deployments.
Live Ingest	MediaMTX	Latest	MIT	RTMP/SRT/WebRTC ingest termination. Deployed as plugin service.
Auth Provider	Keycloak (OIDC)	Latest	Apache 2.0	Via Auth plugin RPC boundary.
Secrets	OpenBao	Latest	MPL 2.0	Central secrets store. Replaces Vault (BSL).
Web UI	Next.js 16	16.1+	MIT	Mission-control dashboard: jobs, live studio, restream studio, workers, plugins, profiles, audit log. TypeScript, React 19, Turbopack stable.
UI Component Lib	shadcn/ui	Latest	MIT	Headless Tailwind components. Accessible, customizable, no vendor lock-in. Used for all UI elements.
UI State	React Context + TanStack Query	Latest	MIT	React Context for auth/global state. TanStack Query for server-state caching and synchronization.
UI Real-time	Native WebSocket or Socket.io	4.8+	MIT	Dashboard/stream/job detail pages subscribe to `job:{id}:updates`, `stream:{id}:telemetry`, `dashboard:stats` subjects from NATS. Refreshed 1-2 times per second.
Charting	Recharts	Latest	MIT	Time-series charts for job analytics, live bitrate, viewer count. Built on React, responsive.
Icons	Lucide React	Latest	ISC	Consistent icon library. Accessible SVG icons.
Containerization	Docker	27.x+	Apache 2.0	Multi-stage builds for optimized images. Build separate images for kernel, workers, plugins. Alpine or Debian base depending on dependencies.
Orchestration	Docker Compose (dev) / Kubernetes (prod)	Compose 2.x / K8s 1.31+	Apache 2.0	Docker Compose for local dev and small deployments. Kubernetes for enterprise with HA, autoscaling, multi-region. Helm charts optional.
CI/CD	GitHub Actions + GitLab CI	Latest	Various	Out-of-the-box workflows for build, test, push to registry. Both platforms supported for maximum flexibility.
Monitoring	Prometheus + Grafana + Loki	Latest	Apache 2.0	Prometheus for metrics, Grafana for dashboards, Loki for log aggregation. All OSS. OpenTelemetry for tracing.

3.3 Core Plugin Suite (MVP Deliverables)

The following plugins are included in the base application repository and constitute the "Standard Library":

Plugin ID	Type	Implementation Details	Use Case
`plugin-storage-fs`	Storage	Local Filesystem. Stores objects as standard files in a configured root directory.	Local/Single-Node.
`plugin-storage-s3`	Storage	S3-Compatible. Supports AWS S3, SeaweedFS, MinIO, Cloudflare R2.	Distributed/Cloud.
`plugin-auth-oidc`	Auth	OIDC/OAuth2. Universal handler for Keycloak, Google, GitHub, GitLab.	SSO / Social Login.
`plugin-auth-ldap`	Auth	LDAP/Active Directory. Direct bind and search for enterprise directories.	Enterprise Auth.
`plugin-auth-basic`	Auth	HTTP Basic / None. Supports simple HTpasswd or trusting upstream HTTP Basic headers.	Internal / No-Auth.
`plugin-encoder-ffmpeg`	Encoder	FFmpeg Wrapper. Manages local `ffmpeg` processes, parses progress, handles cancellation.	Video Transcoding.
`plugin-live-mediamtx`	Live	MediaMTX Connector. Manages path configuration and consumes API stats from MediaMTX.	Live Streaming.
`plugin-publisher-youtube`	Publisher	YouTube Data API v3. Handles VOD Uploads and Live RTMP broadcasting.	Social VOD & Live.
`plugin-publisher-twitch`	Publisher	Twitch API + RTMP. Handles Live Streaming and Stream Metadata updates.	Live Streaming.
`plugin-publisher-kick`	Publisher	Kick RTMP. Handles Live Streaming to Kick.com.	Live Streaming.
`plugin-publisher-rumble`	Publisher	Rumble RTMP. Handles Live Streaming to Rumble.	Live Streaming.

4. Data Models (PostgreSQL DDL)

4.1 `jobs` Table

CREATE TYPE job_status AS ENUM ('queued', 'processing', 'stitching', 'uploading', 'completed', 'failed', 'cancelled');
CREATE TYPE job_source_type AS ENUM ('url', 'upload', 'stream', 'restream');

CREATE TABLE jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    
    source_type job_source_type DEFAULT 'url',
    source_url TEXT,
    source_stream_id UUID REFERENCES streams(id),
    
    profile_id VARCHAR(100) NOT NULL,
    output_config JSONB DEFAULT '{}',
    
    status job_status DEFAULT 'queued',
    progress_pct INT DEFAULT 0,
    error_message TEXT,
    
    assigned_to_worker_id VARCHAR(100),
    
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    started_at TIMESTAMPTZ,
    finished_at TIMESTAMPTZ,
    eta_seconds INT
);
CREATE INDEX idx_jobs_user ON jobs(user_id);
CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_created ON jobs(created_at DESC);

4.2 `tasks` Table

CREATE TABLE tasks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id UUID NOT NULL REFERENCES jobs(id) ON DELETE CASCADE,
    
    sequence_index INT NOT NULL,
    start_time_sec DOUBLE PRECISION NOT NULL,
    end_time_sec DOUBLE PRECISION NOT NULL,
    
    assigned_worker_id VARCHAR(100),
    attempt_count INT DEFAULT 0,
    max_attempts INT DEFAULT 3,
    
    output_key TEXT,
    output_size_bytes BIGINT,
    
    status VARCHAR(20) DEFAULT 'pending'
);
CREATE INDEX idx_tasks_job ON tasks(job_id);
CREATE INDEX idx_tasks_status ON tasks(status);

4.3 `streams` Table

CREATE TABLE streams (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    
    title VARCHAR(255),
    description TEXT,
    thumbnail_url TEXT,
    
    stream_key VARCHAR(64) UNIQUE NOT NULL,
    ingest_server VARCHAR(100),
    
    is_live BOOLEAN DEFAULT FALSE,
    current_viewers INT DEFAULT 0,
    total_viewers_lifetime BIGINT DEFAULT 0,
    started_at TIMESTAMPTZ,
    ended_at TIMESTAMPTZ,
    
    last_stats JSONB,
    
    archive_enabled BOOLEAN DEFAULT TRUE,
    archive_vod_job_id UUID REFERENCES jobs(id),
    
    restream_destinations JSONB DEFAULT '[]'
);
CREATE INDEX idx_streams_user ON streams(user_id);
CREATE INDEX idx_streams_live ON streams(is_live);

4.4 `workers` Table

CREATE TABLE workers (
    id VARCHAR(100) PRIMARY KEY,
    hostname VARCHAR(255),
    ip_address INET,
    port INT,
    
    capabilities JSONB,
    
    is_healthy BOOLEAN DEFAULT TRUE,
    last_heartbeat TIMESTAMPTZ DEFAULT NOW(),
    version VARCHAR(20),
    
    created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_workers_healthy ON workers(is_healthy);

4.5 `plugin_configs` Table

CREATE TABLE plugin_configs (
    id VARCHAR(100) PRIMARY KEY,
    plugin_type VARCHAR(50) NOT NULL,
    
    config_json JSONB NOT NULL,
    
    is_enabled BOOLEAN DEFAULT TRUE,
    priority INT DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_plugin_configs_type ON plugin_configs(plugin_type);

4.6 `restream_jobs` Table

CREATE TABLE restream_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    
    title VARCHAR(255),
    description TEXT,
    
    input_type VARCHAR(20), -- 'rtmp', 'file', 'vod_job'
    input_url TEXT,
    
    output_destinations JSONB NOT NULL, -- [{"platform": "twitch", "url": "rtmps://...", "enabled": true}, {"platform": "kick", ...}]
    
    schedule_type VARCHAR(20), -- 'immediate', 'scheduled', 'recurring'
    schedule_config JSONB,
    
    loop_enabled BOOLEAN DEFAULT FALSE,
    simulate_live BOOLEAN DEFAULT FALSE,
    
    status VARCHAR(20) DEFAULT 'stopped', -- 'streaming', 'stopped', 'error'
    current_stats JSONB,
    
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    started_at TIMESTAMPTZ,
    stopped_at TIMESTAMPTZ
);
CREATE INDEX idx_restream_user ON restream_jobs(user_id);
CREATE INDEX idx_restream_status ON restream_jobs(status);

5. Messaging Architecture (NATS JetStream)

5.1 Stream: `WEBENCODE_WORK` (WorkQueue)

Storage: File (NVMe backed)
Replicas: 3 (HA mode)
Retention: WorkQueue (ack-based deletion)
MaxAge: 24 hours (fallback)

Subjects:

jobs.dispatch: Task dispatch to workers
jobs.events: Status updates (fanout to UI)

5.2 Stream: `WEBENCODE_LIVE` (Interest)

Storage: Memory
Retention: Age 10 seconds
Replicas: 1 (ephemeral)

Subjects:

live.telemetry.{stream_id}: Bitrate, fps, viewers (1/sec)
live.lifecycle.{stream_id}: Stream start/stop events

5.3 Stream: `WEBENCODE_EVENTS` (Audit)

Storage: File
Replicas: 3
Retention: Age 90 days

Subjects:

audit.user_action: Job create, stream delete, etc.
audit.system: Worker joined, plugin crashed, etc.

6. Plugin System (Detailed Lifecycle)

6.1 Plugin Identity & Manifest

Every plugin must have a plugin.toml manifest:

[plugin]
id = "auth-oidc-keycloak"
type = "auth"
version = "1.0.0"
protocol_version = "1"
supported_kernel_versions = ["1.0.0", "1.1.0"]

[capabilities]
authentication_methods = ["oidc", "oauth2"]
mfa_support = true

[config_schema]
# JSON Schema for plugin config validation
type = "object"
required = ["issuer", "client_id"]
properties = {
  issuer = { type = "string", description = "OIDC issuer URL" },
  client_id = { type = "string" },
  client_secret_ref = { type = "string", description = "Reference to OpenBao secret" }
}

6.2 Plugin Discovery & Loading

graph TD
    A["Kernel Startup"] --> B["Scan Plugin Directory"]
    B --> C["Load plugin.toml manifests"]
    C --> D{"Validate<br/>protocol_version"}
    D -->|Mismatch| E["Log Error, Skip"]
    D -->|OK| F["Check config_schema"]
    F -->|Invalid config| G["Use defaults or fail"]
    F -->|Valid| H["Load plugin binary"]
    H --> I["Start subprocess<br/>via go-plugin"]
    I --> J{"RPC Handshake<br/>successful?"}
    J -->|No| K["Restart with backoff"]
    J -->|Yes| L["Plugin State: HEALTHY"]
    K --> J
    E --> M["Plugin State: DISABLED"]
    G --> M

6.3 Plugin Health Model

stateDiagram-v2
    [*] --> Starting: Kernel loads plugin binary
    
    Starting --> Healthy: Handshake OK, capabilities checked
    Starting --> Failed: Handshake timeout/error
    
    Healthy --> Degraded: RPC latency > 1s or error rate > 5%
    Degraded --> Healthy: Recovery detected
    Degraded --> Failed: Error rate > 10% for 2 min
    
    Failed --> Restarting: Automatic restart (exponential backoff)
    Restarting --> Healthy: Recovery successful
    Restarting --> Disabled: Max restarts exceeded (5x)
    
    Disabled --> [*]: Manual re-enable required
    Healthy --> Disabled: Admin action

6.4 Plugin Routing Rules

Storage Plugin Routing

if source_url.startswith("s3://"):
    use storage_plugin with type="s3"
elif source_url.startswith("fs://"):
    use storage_plugin with type="local_fs"
elif source_url.startswith("file://"):
    use storage_plugin with type="seaweedfs"
elif source_url.startswith("http"):
    use storage_plugin with type="http"

Encoder Plugin Routing

if required_codec in encoder_plugin.capabilities.video_codecs:
    if request.prefer_gpu and encoder_plugin.gpu_accelerated:
        route to GPU-enabled encoder
    else:
        route to any encoder with capability
else:
    fallback to secondary encoder plugin

Publisher Plugin Routing

for each platform in job.publish_targets:
    if platform == "youtube":
        route to publisher_plugin with platform="youtube"
    elif platform == "twitch" or platform == "kick" or platform == "rumble":
        route to publisher_plugin with type="restream" (for Live) or "generic" (for VOD if API matches)
    else:
        skip or error

Live Plugin Routing

if ingest_protocol == "rtmp":
    route to live_plugin with protocol="rtmp", port=1935
elif ingest_protocol == "srt":
    route to live_plugin with protocol="srt", port=10080
elif ingest_protocol == "webrtc":
    route to live_plugin with protocol="webrtc"

6.5 Plugin Configuration & Secrets

Process:

Admin provides plugin config JSON.
Kernel validates against plugin's config_schema.
For secret fields (*_secret_ref, *_token_ref), kernel creates reference in OpenBao.
Kernel passes reference ID (not plaintext) to plugin.
Plugin requests secret from OpenBao when needed.

Example:

{
  "plugin_id": "storage-s3",
  "config": {
    "endpoint": "s3.amazonaws.com",
    "region": "us-west-2",
    "bucket": "webencode-prod",
    "access_key_id_ref": "secret/webencode/s3/access_key",
    "secret_access_key_ref": "secret/webencode/s3/secret_key"
  }
}

Kernel stores only the *_ref keys in DB. Plugin retrieves actual secrets from OpenBao.

6.6 Plugin Upgrade & Rollback

sequenceDiagram
    Admin->>Kernel: Upload new plugin binary (v1.1.0)
    Kernel->>Kernel: Validate manifest & schema compatibility
    Kernel->>Plugin: Load new plugin in shadow mode (internal test)
    Plugin->>Plugin: Run self-test RPC
    Plugin-->>Kernel: Test passed
    Kernel->>Kernel: Mark old plugin as "standby"
    Kernel->>Kernel: Switch traffic to new plugin
    Kernel->>Kernel: Monitor error rate for 5 min
    alt Error rate < 1%
        Kernel->>Kernel: Keep new plugin, remove old
        Kernel-->>Admin: Upgrade successful
    else Error rate > 5%
        Kernel->>Plugin: Stop new plugin
        Kernel->>Plugin: Restart old plugin
        Kernel-->>Admin: Automatic rollback
    end

6.7 Plugin Observability Contract

Every plugin must emit:

Logs: Structured JSON format, with correlation_id field for tracing.
Metrics: Prometheus endpoint at /metrics, prefix webencode_plugin_{plugin_id}_*.
Health check: gRPC grpc.health.v1.Health/Check endpoint.

7. Plugin Interfaces (The 5 Pillars - Complete Specification)

7.1 Pillar 1: Authentication (`auth.proto`)

syntax = "proto3";
package plugin.auth;

// Authenticator provides identity and authorization services.
// Note: This is for self-hosted deployments - no billing/quotas.
service Authenticator {
  // Authenticate validates a token and returns user identity
  rpc Authenticate(AuthRequest) returns (AuthResponse);
  
  // Authorize checks if a user can perform an action on a resource
  rpc Authorize(AuthZRequest) returns (AuthZResponse);
  
  // GetUserProfile retrieves user metadata and role information
  rpc GetUserProfile(GetUserProfileRequest) returns (UserProfile);
  
  // ListUsers lists all users (admin only)
  rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
  
  // RefreshToken exchanges a refresh token for new access token
  rpc RefreshToken(RefreshTokenRequest) returns (AuthResponse);
  
  // Logout invalidates a user session
  rpc Logout(LogoutRequest) returns (Empty);
}

message AuthRequest {
  string token = 1;           // JWT or opaque token
  string scheme = 2;          // "bearer", "basic", "api-key"
  string client_ip = 3;       // For audit logging
  string user_agent = 4;      // For session tracking
}

message AuthResponse {
  bool valid = 1;
  string user_id = 2;
  string username = 3;
  string email = 4;
  string role = 5;            // "admin", "operator", "viewer"
  map<string, string> metadata = 6;
  int64 expires_at_unix = 7;
  string access_token = 8;    // New token if refreshed
  string refresh_token = 9;   // For token refresh flow
}

message AuthZRequest {
  string user_id = 1;
  string action = 2;          // "create", "read", "update", "delete", "execute"
  string resource_type = 3;   // "job", "stream", "worker", "plugin", "settings"
  string resource_id = 4;     // Optional: specific resource ID
}

message AuthZResponse {
  bool allowed = 1;
  string reason = 2;          // Explanation if denied
}

message GetUserProfileRequest {
  string user_id = 1;
}

message UserProfile {
  string user_id = 1;
  string username = 2;
  string email = 3;
  string display_name = 4;
  string role = 5;
  string avatar_url = 6;
  int64 created_at_unix = 7;
  int64 last_login_unix = 8;
  map<string, string> preferences = 9;  // UI preferences, timezone, etc.
}

message ListUsersRequest {
  int32 page = 1;
  int32 page_size = 2;
  string role_filter = 3;     // Optional: filter by role
}

message ListUsersResponse {
  repeated UserProfile users = 1;
  int32 total_count = 2;
  int32 page = 3;
}

message RefreshTokenRequest {
  string refresh_token = 1;
}

message LogoutRequest {
  string user_id = 1;
  string session_id = 2;      // Optional: specific session
  bool all_sessions = 3;      // Logout from all devices
}

message Empty {}

7.2 Pillar 2: Storage (`storage.proto`)

syntax = "proto3";
package plugin.storage;

service StorageProvider {
  // Upload streams file content to storage
  rpc Upload(stream FileChunk) returns (UploadSummary);
  
  // Download streams file content from storage
  rpc Download(FileRequest) returns (stream FileChunk);
  
  // Delete removes an object from storage
  rpc Delete(FileRequest) returns (Empty);
  
  // GetURL returns a signed URL for direct access
  rpc GetURL(SignedUrlRequest) returns (SignedUrlResponse);
  
  // GetUploadURL returns a pre-signed URL for uploads
  rpc GetUploadURL(SignedUrlRequest) returns (SignedUrlResponse);
  
  // ListObjects lists objects in a bucket with optional prefix
  rpc ListObjects(ListObjectsRequest) returns (ListObjectsResponse);
  
  // GetObjectMetadata returns metadata for an object
  rpc GetObjectMetadata(FileRequest) returns (ObjectMetadata);
  
  // GetCapabilities returns what features this storage plugin supports
  rpc GetCapabilities(Empty) returns (StorageCapabilities);
  
  // BrowseRoots returns available root directories for browsing (if supported)
  rpc BrowseRoots(Empty) returns (BrowseRootsResponse);
  
  // Browse lists contents of a directory (if supported)
  rpc Browse(BrowseRequest) returns (BrowseResponse);
}

message FileChunk {
  oneof content {
    FileMetadata metadata = 1;
    bytes data = 2;
  }
}

message FileMetadata {
  string bucket = 1;
  string path = 2;
  int64 size = 3;
  string content_type = 4;
}

message UploadSummary {
  string url = 1;
  int64 size = 2;
  string etag = 3;
}

message FileRequest {
  string bucket = 1;
  string path = 2;
}

message SignedUrlRequest {
  string bucket = 1;
  string object_key = 2;
  int64 expiry_seconds = 3;
  string content_type = 4;
  string method = 5;
}

message SignedUrlResponse {
  string url = 1;
  map<string, string> headers = 2;
  int64 expires_at = 3;
}

message ListObjectsRequest {
  string bucket = 1;
  string prefix = 2;
  string delimiter = 3;
  int32 max_keys = 4;
  string continuation_token = 5;
}

message ListObjectsResponse {
  repeated ObjectInfo objects = 1;
  repeated string common_prefixes = 2;
  string next_continuation_token = 3;
  bool is_truncated = 4;
}

message ObjectInfo {
  string key = 1;
  int64 size = 2;
  string etag = 3;
  int64 last_modified = 4;
  string storage_class = 5;
}

message ObjectMetadata {
  string key = 1;
  int64 size = 2;
  string content_type = 3;
  string etag = 4;
  int64 last_modified = 5;
  map<string, string> user_metadata = 6;
}

message StorageCapabilities {
  bool supports_browse = 1;
  bool supports_upload = 2;
  bool supports_signed_urls = 3;
  bool supports_streaming = 4;
  string storage_type = 5;
}

message BrowseRootsResponse {
  repeated BrowseEntry roots = 1;
}

message BrowseRequest {
  string path = 1;
  bool show_hidden = 2;
  bool media_only = 3;
  string search_query = 4;
}

message BrowseResponse {
  string current_path = 1;
  string parent_path = 2;
  repeated BrowseEntry entries = 3;
}

message BrowseEntry {
  string name = 1;
  string path = 2;
  bool is_directory = 3;
  int64 size = 4;
  int64 mod_time = 5;
  string extension = 6;
  bool is_video = 7;
  bool is_audio = 8;
  bool is_image = 9;
}

7.3 Pillar 3: Encoder (`encoder.proto`)

syntax = "proto3";
package plugin.encoder;

service EncoderEngine {
  rpc GetCapabilities(Empty) returns (Capabilities);
  rpc Transcode(TaskSpec) returns (stream TaskProgress);
}

message Capabilities {
  repeated string video_codecs = 1;
  repeated string audio_codecs = 2;
  bool gpu_accelerated = 3;
  string gpu_type = 4;
}

message TaskSpec {
  string task_id = 1;
  string input_url = 2;
  string output_url = 3;
  string video_codec = 4;
  int32 bitrate_k = 5;
  double start_time = 6;
  double duration = 7;
}

message TaskProgress {
  string task_id = 1;
  float percent = 2;
  double speed = 3;
  int32 fps = 4;
}

7.4 Pillar 4: Live (`live.proto`)

syntax = "proto3";
package plugin.live;

service LiveEngine {
  rpc StartIngest(IngestConfig) returns (stream EngineStatus);
  rpc OnPublishAttempt(StreamContext) returns (Decision);
  rpc StreamTelemetry(Empty) returns (stream TelemetryBatch);
  rpc StopIngest(StopRequest) returns (Empty);
}

message IngestConfig {
  int32 port = 1;
  string protocol = 2;
  string stream_key = 3;
}

message TelemetryBatch {
  string stream_key = 1;
  int32 video_bitrate = 2;
  int32 audio_bitrate = 3;
  double fps = 4;
}

7.5 Pillar 5: Publisher (`publisher.proto`)

syntax = "proto3";
package plugin.publisher;

service Publisher {
  rpc PublishVOD(PublishRequest) returns (PublishResult);
  rpc UpdateMetadata(UpdateRequest) returns (UpdateResult);
}

message PublishRequest {
  string local_object_key = 1;
  string title = 2;
  string description = 3;
  string platform = 4;
  string privacy = 5;
}

message PublishResult {
  bool success = 1;
  string public_url = 2;
  string platform_id = 3;
}

8. Core Workflows (Logic & Algorithms)

8.1 VOD Encoding Workflow

graph TD
    A["User submits job<br/>POST /v1/jobs"] --> B["Kernel validates auth"]
    B --> C["Storage plugin: GetDownloadURL"]
    C --> D["FFmpeg probe for keyframes"]
    D --> E["Calculate split points<br/>align to IDR frames"]
    E --> F["Create N tasks<br/>in PostgreSQL"]
    F --> G["For each task:<br/>NATS.Publish to jobs.dispatch"]
    G --> H["Worker picks up task"]
    H --> I["Storage plugin: GetDownloadURL"]
    I --> J["FFmpeg transcode"]
    J --> K["Storage plugin: GetUploadURL"]
    K --> L["Upload chunk"]
    L --> M["NATS.Publish job.events"]
    M --> N{"All tasks<br/>complete?"}
    N -->|No| H
    N -->|Yes| O["Stitching phase:<br/>Concat demuxer"]
    O --> P["Publisher plugin: PublishVOD<br/>if enabled"]
    P --> Q["Mark job as completed"]
    Q --> R["Broadcast to UI via NATS"]

Smart Chunking Optimization: The orchestrator checks the number of healthy workers before scheduling. If only 1 worker is available, the job is processed as a single chunk to avoid splitting/stitching overhead. Parallel segmentation is only active when multiple workers are online.

8.2 Live Stream Workflow (with Archive & Restream)

graph TD
    A["User creates stream<br/>POST /v1/streams"] --> B["Kernel generates stream_key"]
    B --> C["Live plugin: StartIngest"]
    C --> D["MediaMTX listens on RTMP:1935"]
    D --> E["OBS/encoder connects"]
    E --> F["Live plugin: OnPublishAttempt<br/>Auth check"]
    F --> G["Stream goes LIVE"]
    G --> H["Record segments<br/>to object storage"]
    H --> I["Every 1sec:<br/>TelemetryBatch to NATS"]
    I --> J["UI subscribes live.telemetry"]
    J --> K{"Stream ends?"}
    K -->|No| I
    K -->|Yes| L["Collect all segment URLs"]
    L --> M["Create VOD job<br/>source_type=stream"]
    M --> N["Archive & re-encode<br/>if archive_enabled=true"]
    N --> O{"Restream enabled?"}
    O -->|Yes| P["For each restream_destination"]
    P --> Q["Live plugin: proxy/restream<br/>to RTMP endpoint"]
    Q --> R["Track stats per destination"]
    O -->|No| S["Mark stream as ended"]
    S --> T["Broadcast to UI"]

8.3 Restream Job Workflow

graph TD
    A["User creates restream job<br/>POST /v1/restreams"] --> B["Validate destinations"]
    B --> C["Store in PostgreSQL<br/>restream_jobs table"]
    C --> D{"schedule_type<br/>?"}
    D -->|immediate| E["Start immediately"]
    D -->|scheduled| F["Queue at scheduled time"]
    D -->|recurring| G["Queue recurring instances"]
    E --> H["Determine input source"]
    H --> I{"input_type?"}
    I -->|rtmp| J["Live plugin: accept RTMP ingest"]
    I -->|file| K["Storage plugin: GetDownloadURL"]
    I -->|vod_job| L["Get output from previous job"]
    J --> M["Stream to output destinations"]
    K --> M
    L --> M
    M --> N["For each destination:<br/>check enabled flag"]
    N --> O["Live plugin: proxy to RTMP/SRT"]
    O --> P["Track bitrate per destination"]
    P --> Q["Broadcast stats to UI"]

9. FFmpeg Integration & Command Templates

9.1 FFmpeg Command Architecture

WebEncode uses FFmpeg as the primary transcoding engine, executed as subprocess by worker nodes. The Encoder Plugin wraps FFmpeg with structured command generation and progress parsing.

graph LR
    A["Worker receives task"] --> B["Generate FFmpeg command"]
    B --> C["Set environment variables"]
    C --> D["Execute subprocess"]
    D --> E["Parse progress output"]
    E --> F["Report to NATS"]
    D --> G["Wait for exit code"]
    G --> H{Success?}
    H -->|Yes| I["Upload chunk"]
    H -->|No| J["Report error, retry if attempts remain"]

9.2 Base Command Template

ffmpeg \
  -hide_banner \
  -loglevel warning \
  -progress pipe:1 \
  -y \
  -ss {{START_TIME}} \
  -i "{{INPUT_URL}}" \
  -t {{DURATION}} \
  -map 0:v:0 -map 0:a:0 \
  -c:v {{VIDEO_CODEC}} \
  -b:v {{VIDEO_BITRATE}} \
  -maxrate {{MAX_BITRATE}} \
  -bufsize {{BUFFER_SIZE}} \
  -preset {{PRESET}} \
  -c:a {{AUDIO_CODEC}} \
  -b:a {{AUDIO_BITRATE}} \
  -ar {{AUDIO_SAMPLE_RATE}} \
  -f {{OUTPUT_FORMAT}} \
  "{{OUTPUT_URL}}"

9.3 Profile Command Templates

9.3.1 1080p H.264 (Software)

ffmpeg -hide_banner -loglevel warning -progress pipe:1 -y \
  -ss {{START}} -i "{{INPUT}}" -t {{DURATION}} \
  -c:v libx264 -preset fast -profile:v high -level:v 4.1 \
  -b:v 5000k -maxrate 5500k -bufsize 10000k \
  -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" \
  -pix_fmt yuv420p \
  -c:a aac -b:a 192k -ar 48000 -ac 2 \
  -movflags +faststart \
  -f mp4 "{{OUTPUT}}"

9.3.2 4K HEVC (NVIDIA GPU)

ffmpeg -hide_banner -loglevel warning -progress pipe:1 -y \
  -hwaccel cuda -hwaccel_output_format cuda \
  -ss {{START}} -i "{{INPUT}}" -t {{DURATION}} \
  -c:v hevc_nvenc -preset p4 -tune hq -rc vbr -cq 23 \
  -b:v 15000k -maxrate 18000k -bufsize 30000k \
  -vf "scale_cuda=3840:2160:force_original_aspect_ratio=decrease" \
  -c:a aac -b:a 320k -ar 48000 -ac 2 \
  -movflags +faststart \
  -f mp4 "{{OUTPUT}}"

9.3.3 720p VP9 (WebM)

ffmpeg -hide_banner -loglevel warning -progress pipe:1 -y \
  -ss {{START}} -i "{{INPUT}}" -t {{DURATION}} \
  -c:v libvpx-vp9 -b:v 2000k -crf 30 -row-mt 1 -threads 4 \
  -vf "scale=1280:720:force_original_aspect_ratio=decrease" \
  -c:a libopus -b:a 128k -ar 48000 \
  -f webm "{{OUTPUT}}"

9.3.4 AV1 (Software - High Quality)

ffmpeg -hide_banner -loglevel warning -progress pipe:1 -y \
  -ss {{START}} -i "{{INPUT}}" -t {{DURATION}} \
  -c:v libaom-av1 -crf 30 -b:v 0 -cpu-used 4 -row-mt 1 \
  -vf "scale=1920:1080:force_original_aspect_ratio=decrease" \
  -pix_fmt yuv420p10le \
  -c:a libopus -b:a 128k -ar 48000 \
  -f mp4 "{{OUTPUT}}"

9.4 HLS/DASH Packaging

9.4.1 HLS with Multiple Qualities (ABR Ladder)

ffmpeg -hide_banner -loglevel warning -y \
  -i "{{INPUT}}" \
  -filter_complex "[0:v]split=3[v1][v2][v3]; \
    [v1]scale=1920:1080[v1out]; \
    [v2]scale=1280:720[v2out]; \
    [v3]scale=854:480[v3out]" \
  -map "[v1out]" -c:v:0 libx264 -b:v:0 5000k -maxrate:v:0 5500k -bufsize:v:0 10000k \
  -map "[v2out]" -c:v:1 libx264 -b:v:1 2500k -maxrate:v:1 2750k -bufsize:v:1 5000k \
  -map "[v3out]" -c:v:2 libx264 -b:v:2 1000k -maxrate:v:2 1100k -bufsize:v:2 2000k \
  -map 0:a -c:a aac -b:a:0 192k -b:a:1 128k -b:a:2 96k \
  -f hls \
  -hls_time 6 \
  -hls_list_size 0 \
  -hls_segment_filename "{{OUTPUT_DIR}}/stream_%v/segment_%03d.ts" \
  -master_pl_name master.m3u8 \
  -var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2" \
  "{{OUTPUT_DIR}}/stream_%v/playlist.m3u8"

9.5 Hardware Acceleration Detection

// GPU detection logic (pseudo-code)
func DetectGPUCapabilities() (*GPUInfo, error) {
    // NVIDIA detection
    if output, err := exec.Command("nvidia-smi", "-L").Output(); err == nil {
        return parseNvidiaOutput(output), nil
    }
    
    // AMD detection (Linux)
    if _, err := os.Stat("/dev/dri/renderD128"); err == nil {
        return &GPUInfo{Type: "amd", Available: true}, nil
    }
    
    // Intel QSV detection
    if output, err := exec.Command("vainfo").Output(); err == nil {
        if strings.Contains(string(output), "iHD") {
            return &GPUInfo{Type: "intel", Available: true}, nil
        }
    }
    
    return &GPUInfo{Type: "none", Available: false}, nil
}

Encoder Selection Matrix:

GPU Type	H.264 Encoder	H.265/HEVC	AV1
NVIDIA	`h264_nvenc`	`hevc_nvenc`	`av1_nvenc` (RTX 40+)
AMD	`h264_amf`	`hevc_amf`	`av1_amf` (RX 7000+)
Intel	`h264_qsv`	`hevc_qsv`	`av1_qsv` (Arc)
None	`libx264`	`libx265`	`libaom-av1` / `libsvtav1`

9.6 Segment Splitting Algorithm

// Keyframe-aligned segment splitting
func CalculateSegmentBoundaries(probeData *ProbeResult, targetSegmentDuration float64) []Segment {
    keyframes := probeData.Keyframes // From ffprobe -show_frames -select_streams v -show_entries frame=pkt_pts_time,key_frame
    
    segments := []Segment{}
    currentStart := 0.0
    
    for i, kf := range keyframes {
        timeSinceStart := kf.Time - currentStart
        
        // If we've accumulated enough time, or this is the last keyframe
        if timeSinceStart >= targetSegmentDuration || i == len(keyframes)-1 {
            segments = append(segments, Segment{
                Index:     len(segments),
                StartTime: currentStart,
                EndTime:   kf.Time,
                Duration:  kf.Time - currentStart,
            })
            currentStart = kf.Time
        }
    }
    
    return segments
}

Segmentation Parameters:

Target segment duration: 30-60 seconds (configurable)
Minimum segment: 10 seconds (avoid overhead)
Maximum segment: 120 seconds (limit single task impact)
Keyframe alignment: Always split on I-frames to avoid decode artifacts

9.7 Progress Parsing

FFmpeg outputs progress to stderr with -progress pipe:1. The worker parses this in real-time:

func ParseFFmpegProgress(line string, totalDuration float64) *ProgressUpdate {
    // Example: out_time_ms=30000000
    if strings.HasPrefix(line, "out_time_ms=") {
        ms, _ := strconv.ParseInt(strings.TrimPrefix(line, "out_time_ms="), 10, 64)
        currentTime := float64(ms) / 1_000_000.0
        percent := (currentTime / totalDuration) * 100.0
        return &ProgressUpdate{
            Percent:      math.Min(percent, 100.0),
            CurrentTime:  currentTime,
            TotalTime:    totalDuration,
        }
    }
    
    // Parse speed (e.g., "speed=2.5x")
    if strings.HasPrefix(line, "speed=") {
        speedStr := strings.TrimSuffix(strings.TrimPrefix(line, "speed="), "x")
        speed, _ := strconv.ParseFloat(speedStr, 64)
        return &ProgressUpdate{Speed: speed}
    }
    
    // Parse fps
    if strings.HasPrefix(line, "fps=") {
        fps, _ := strconv.ParseFloat(strings.TrimPrefix(line, "fps="), 64)
        return &ProgressUpdate{FPS: fps}
    }
    
    return nil
}

9.8 Error Handling & Recovery

Exit Code	Meaning	Action
0	Success	Continue to upload
1	Generic error	Retry with exponential backoff
69	Input error (corrupt file)	Mark job as failed, no retry
137	OOM killed	Retry on different worker with fewer parallel tasks
255	Timeout/abort	Retry if attempts remain

Retry Backoff Formula:

delay = min(base * 2^attempt, max_delay)
base = 5 seconds
max_delay = 5 minutes
max_attempts = 3 (configurable per profile)

10. API Reference (OpenAPI 3.1)

10.1 Jobs API

POST /v1/jobs

{
  "source_url": "s3://bucket/video.mov",
  "profile": "1080p_h264",
  "publish": {
    "youtube": { "title": "My Video", "privacy": "unlisted" }
  },
  "webhooks": ["https://callback.site/hook"]
}

Response: 201 Created

{
  "id": "job-uuid",
  "status": "queued",
  "eta_seconds": 300
}

GET /v1/jobs/{id} Response:

{
  "id": "job-uuid",
  "status": "processing",
  "progress_pct": 45,
  "tasks_total": 30,
  "tasks_completed": 13,
  "eta_seconds": 200
}

POST /v1/jobs/{id}/retry Cancels the current job (if running) and creates a new job with the same parameters (source URL, profiles, etc.). Returns the new job details. Response: 201 Created

{
  "id": "new-job-uuid",
  "source_url": "...",
  "status": "queued",
  ...
}

GET /v1/jobs/{id}/outputs Returns output files for a completed job with download URLs. Response:

{
  "job_id": "job-uuid",
  "status": "completed",
  "outputs": [
    {
      "name": "abc123_final.mp4",
      "type": "final",
      "url": "abc123_final.mp4",
      "download_url": "https://storage.example.com/abc123_final.mp4?signed=...",
      "profile": "default"
    }
  ]
}

POST /v1/jobs/{id}/publish Publishes a completed job to an external platform using publisher plugins. Request:

{
  "platform": "twitch",
  "title": "My Video",
  "description": "Optional description",
  "access_token": "oauth_token_for_platform",
  "output_key": "optional_specific_output"
}

Response:

{
  "success": true,
  "platform_id": "video123",
  "platform_url": "https://www.twitch.tv/videos/video123",
  "message": "Successfully published to twitch"
}

10.2 Streams API

POST /v1/streams

{
  "title": "Live Event",
  "description": "My stream",
  "archive_enabled": true
}

Response: 201 Created

{
  "id": "stream-uuid",
  "stream_key": "live_xxxxx",
  "ingest_url": "rtmp://ingest.webencode.local:1935/live"
}

GET /v1/streams/{id}/stats Response:

{
  "status": "live",
  "viewers": 420,
  "bitrate_kbps": 5200,
  "fps": 59.97
}

10.3 Restream API

POST /v1/restreams

{
  "title": "Multi-platform stream",
  "input_type": "rtmp",
  "output_destinations": [
    { "platform": "youtube", "url": "rtmp://...", "enabled": true },
    { "platform": "twitch", "url": "rtmp://...", "enabled": true }
  ]
}

GET /v1/restreams/{id} Response includes per-destination stats.

10.4 Files API

GET /v1/files/capabilities Returns list of storage plugins and their capabilities.

[
  {
    "plugin_id": "storage-s3",
    "storage_type": "s3",
    "supports_browse": true
  }
]

GET /v1/files/roots Returns root directories (buckets/folders) from all plugins.

[
  {
    "plugin_id": "storage-s3",
    "name": "webencode-bucket",
    "path": "webencode-bucket",
    "storage_type": "s3"
  }
]

GET /v1/files/browse Lists contents of a directory. Query Params: plugin, path, media_only, show_hidden, search

{
  "plugin_id": "storage-s3",
  "current_path": "bucket/videos",
  "entries": [ { "name": "movie.mp4", "is_directory": false, "size": 10240, "is_video": true } ]
}

POST /v1/files/upload-url Generate pre-signed URL for direct upload.

{
  "plugin_id": "storage-s3",
  "bucket": "my-bucket",
  "object_key": "uploads/video.mp4",
  "content_type": "video/mp4"
}

POST /v1/files/upload Upload file via multipart form to the proxy. Form Fields: file, plugin_id, bucket, object_key

11. Error Handling & Codes

11.1 Error Response Format

All API errors return a standardized JSON structure:

{
  "error": {
    "code": "WE-JOB-004",
    "message": "Job not found",
    "details": "No job exists with ID: 550e8400-e29b-41d4-a716-446655440000",
    "request_id": "req-abc123",
    "timestamp": "2025-12-22T10:30:00Z",
    "docs_url": "https://docs.webencode.io/errors/WE-JOB-004"
  }
}

11.2 Error Code Catalog

Job Errors (WE-JOB-xxx)

Code	HTTP Status	Message	Description	Resolution
WE-JOB-001	400	Invalid job request	Missing required fields or malformed JSON	Check request body schema
WE-JOB-002	400	Invalid source URL	Source URL is unreachable or unsupported	Verify URL accessibility
WE-JOB-003	400	Invalid profile	Encoding profile does not exist	Use GET /v1/profiles to list valid profiles
WE-JOB-004	404	Job not found	Job ID does not exist	Verify job ID
WE-JOB-005	409	Job already cancelled	Cannot modify a cancelled job	Create a new job
WE-JOB-006	409	Job already completed	Cannot cancel a completed job	No action needed
WE-JOB-007	422	Unsupported source format	File format not supported	See supported formats list
WE-JOB-008	500	Job processing failed	Internal error during processing	Retry or contact support
WE-JOB-009	503	No workers available	All workers are busy or offline	Wait and retry

Stream Errors (WE-STR-xxx)

Code	HTTP Status	Message	Description	Resolution
WE-STR-001	400	Invalid stream request	Missing required fields	Check request body
WE-STR-002	404	Stream not found	Stream ID does not exist	Verify stream ID
WE-STR-003	409	Stream already live	Cannot modify settings while live	Stop stream first
WE-STR-004	409	Stream key conflict	Stream key already in use	Use a different key
WE-STR-005	422	Invalid restream destination	RTMP URL format invalid	Verify destination URL
WE-STR-006	500	Ingest server unavailable	MediaMTX plugin not responding	Restart live plugin

Worker Errors (WE-WRK-xxx)

Code	HTTP Status	Message	Description	Resolution
WE-WRK-001	404	Worker not found	Worker ID does not exist	List available workers
WE-WRK-002	409	Worker busy	Cannot drain worker with active tasks	Wait for tasks to complete
WE-WRK-003	410	Worker offline	Worker has not sent heartbeat	Check worker connectivity
WE-WRK-004	500	Worker communication failed	gRPC connection to worker failed	Restart worker

Plugin Errors (WE-PLG-xxx)

Code	HTTP Status	Message	Description	Resolution
WE-PLG-001	404	Plugin not found	Plugin ID does not exist	List installed plugins
WE-PLG-002	409	Plugin disabled	Plugin is disabled	Enable plugin first
WE-PLG-003	422	Invalid plugin config	Config does not match schema	Validate against plugin schema
WE-PLG-004	500	Plugin crashed	Plugin process terminated unexpectedly	Check plugin logs
WE-PLG-005	503	Plugin degraded	Plugin is responding slowly	Monitor and restart if needed

Auth Errors (WE-AUTH-xxx)

Code	HTTP Status	Message	Description	Resolution
WE-AUTH-001	401	Unauthorized	Missing or invalid token	Provide valid Bearer token
WE-AUTH-002	401	Token expired	JWT token has expired	Refresh token
WE-AUTH-003	403	Forbidden	User lacks permission for action	Contact admin for access
WE-AUTH-004	403	Role required	Specific role required for action	Check RBAC requirements
WE-AUTH-005	500	Auth provider unavailable	OIDC/Keycloak not responding	Retry or check provider

Storage Errors (WE-STO-xxx)

Code	HTTP Status	Message	Description	Resolution
WE-STO-001	404	Object not found	File does not exist in storage	Verify object key
WE-STO-002	403	Storage access denied	Missing storage credentials	Check plugin config
WE-STO-003	507	Storage full	Storage capacity exceeded	Free space or expand storage
WE-STO-004	500	Storage unavailable	Cannot connect to storage backend	Check storage plugin

System Errors (WE-SYS-xxx)

Code	HTTP Status	Message	Description	Resolution
WE-SYS-001	500	Internal server error	Unexpected error	Check logs, contact support
WE-SYS-002	500	Database error	PostgreSQL query failed	Check database connectivity
WE-SYS-003	500	Message queue error	NATS connection failed	Check NATS connectivity
WE-SYS-004	503	Service unavailable	Kernel is starting or shutting down	Wait and retry
WE-SYS-005	429	Rate limit exceeded	Too many requests	Wait and retry with backoff

11.3 HTTP Status Code Mapping

Status Code	Usage
200 OK	Successful GET, PUT, PATCH
201 Created	Successful POST creating a resource
204 No Content	Successful DELETE
400 Bad Request	Invalid input, malformed JSON
401 Unauthorized	Missing or invalid authentication
403 Forbidden	Authenticated but not authorized
404 Not Found	Resource does not exist
409 Conflict	State conflict (already exists, cannot modify)
422 Unprocessable Entity	Valid JSON but semantic error
429 Too Many Requests	Rate limit exceeded
500 Internal Server Error	Unexpected server error
503 Service Unavailable	Temporary unavailability
507 Insufficient Storage	Storage quota exceeded

11.4 Error Logging

All errors are logged with the following structure:

{
  "level": "error",
  "timestamp": "2025-12-22T10:30:00Z",
  "correlation_id": "req-abc123",
  "error_code": "WE-JOB-008",
  "message": "Job processing failed",
  "user_id": "user-uuid",
  "resource_type": "job",
  "resource_id": "job-uuid",
  "stack_trace": "...",
  "context": {
    "worker_id": "worker-001",
    "task_id": "task-uuid"
  }
}

12. Environment Configuration

Kernel

KERNEL_PORT=8080
KERNEL_GRPC_PORT=50051
NATS_URL=nats://nats:4222
DATABASE_URL=postgresql://user:pass@postgres:5432/webencode
OPENBAO_ADDR=http://openbao:8200
OPENBAO_TOKEN=s.xxxxxxx

PLUGIN_DIR=/plugins
LOG_LEVEL=info

Worker

KERNEL_GRPC_ADDR=kernel:50051
FFMPEG_PATH=/usr/bin/ffmpeg
WORKER_ID=worker-001
NATS_URL=nats://nats:4222

13. Security Architecture

Authentication Flow

sequenceDiagram
    Client->>Kernel: GET /v1/jobs<br/>Authorization: Bearer token
    Kernel->>AuthPlugin: Authenticate(token)
    AuthPlugin->>Keycloak: Validate JWT
    Keycloak-->>AuthPlugin: user_id + metadata
    AuthPlugin-->>Kernel: AuthResponse
    Kernel->>Kernel: Cache (5 min TTL)
    Kernel->>AuthPlugin: Authorize(user_id, 'list_jobs', NULL)
    AuthPlugin-->>Kernel: allowed=true
    Kernel-->>Client: 200 OK + jobs list

Data Encryption

In Transit: TLS 1.3 (all HTTP, gRPC, NATS)
At Rest: PostgreSQL pgcrypto for sensitive fields
Secrets: OpenBao (never raw values in DB or logs)

14. Deployment Configurations

Docker Compose

version: '3.8'
services:
  nats:
    image: nats:latest
    ports: ["4222:4222", "8222:8222"]
    command: "-js"
    volumes: [nats_data:/data]

  postgres:
    image: postgres:17
    environment:
      POSTGRES_DB: webencode
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes: [postgres_data:/var/lib/postgresql/data]
    ports: ["5432:5432"]

  openbao:
    image: openbao/openbao:latest
    ports: ["8200:8200"]
    environment:
      OPENBAO_DEV_ROOT_TOKEN_ID: "dev-token"

  # SeaweedFS - S3-compatible distributed object storage (Apache 2.0)
  seaweedfs-master:
    image: chrislusf/seaweedfs:latest
    ports: ["9333:9333", "19333:19333"]
    command: "master -ip=seaweedfs-master -ip.bind=0.0.0.0"
    volumes: [seaweedfs_master:/data]

  seaweedfs-volume:
    image: chrislusf/seaweedfs:latest
    ports: ["8080:8080", "18080:18080"]
    command: "volume -mserver=seaweedfs-master:9333 -ip.bind=0.0.0.0 -port=8080"
    volumes: [seaweedfs_volume:/data]
    depends_on: [seaweedfs-master]

  seaweedfs-filer:
    image: chrislusf/seaweedfs:latest
    ports: ["8888:8888", "18888:18888", "8333:8333"]
    command: "filer -master=seaweedfs-master:9333 -ip.bind=0.0.0.0 -s3 -s3.port=8333"
    volumes: [seaweedfs_filer:/data]
    depends_on: [seaweedfs-master, seaweedfs-volume]

  kernel:
    build:
      context: .
      dockerfile: kernel/Dockerfile
    ports: ["8000:8000", "50051:50051"]
    environment:
      DATABASE_URL: postgresql://webencode:${DB_PASSWORD}@postgres:5432/webencode
      NATS_URL: nats://nats:4222
      OPENBAO_ADDR: http://openbao:8200
      S3_ENDPOINT: http://seaweedfs-filer:8333
      PLUGIN_DIR: /plugins
    depends_on: [postgres, nats, openbao, seaweedfs-filer]

  worker:
    build:
      context: .
      dockerfile: worker/Dockerfile
    environment:
      KERNEL_GRPC_ADDR: kernel:50051
      NATS_URL: nats://nats:4222
    depends_on: [kernel, nats]

  ui:
    build:
      context: .
      dockerfile: ui/Dockerfile
    ports: ["3000:3000"]
    environment:
      NEXT_PUBLIC_KERNEL_URL: http://localhost:8080
      NEXT_PUBLIC_WS_URL: ws://localhost:8080

volumes:
  postgres_data:
  nats_data:
  seaweedfs_master:
  seaweedfs_volume:
  seaweedfs_filer:

Kubernetes (Production)

apiVersion: v1
kind: Namespace
metadata:
  name: webencode

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webencode-kernel
  namespace: webencode
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webencode-kernel
  template:
    metadata:
      labels:
        app: webencode-kernel
    spec:
      containers:
      - name: kernel
        image: webencode/kernel:v1.0.0
        ports:
        - containerPort: 8080
        - containerPort: 50051
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: webencode-secrets
              key: database-url
        - name: NATS_URL
          value: "nats://nats:4222"
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 30

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webencode-worker
  namespace: webencode
spec:
  serviceName: webencode-worker
  replicas: 5
  selector:
    matchLabels:
      app: webencode-worker
  template:
    metadata:
      labels:
        app: webencode-worker
    spec:
      containers:
      - name: worker
        image: webencode/worker:v1.0.0
        env:
        - name: WORKER_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: KERNEL_GRPC_ADDR
          value: "webencode-kernel:50051"
        - name: NATS_URL
          value: "nats://nats:4222"
        resources:
          requests:
            cpu: 4000m
            memory: 8Gi
          limits:
            cpu: 8000m
            memory: 16Gi
        volumeMounts:
        - name: temp
          mountPath: /tmp/webencode-worker
  volumeClaimTemplates:
  - metadata:
      name: temp
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

15. Monitoring & Observability

Prometheus Metrics (Kernel)

webencode_kernel_jobs_total{status="completed"} 1234
webencode_kernel_jobs_total{status="failed"} 12
webencode_kernel_job_processing_time_seconds{quantile="0.95"} 450

webencode_plugin_rpc_duration_seconds{plugin="encoder-ffmpeg"} 125
webencode_plugin_error_count{plugin="storage-s3"} 3
webencode_plugin_health{plugin="auth-oidc"} 1

Prometheus Metrics (Workers)

webencode_worker_task_duration_seconds{worker_id="worker-001"} 125
webencode_worker_cpu_usage_percent{worker_id="worker-001"} 85
webencode_worker_ffmpeg_process_count{worker_id="worker-001"} 2

Log Aggregation (Loki)

{job="webencode-kernel"} | json | level="error"
{job="webencode-worker"} | json | correlation_id="xyz"

16. Operational Playbook

CI/CD Pipelines

GitHub Actions

name: Build & Deploy
on:
  push:
    branches: [main]
    tags: [v*]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:17
      nats:
        image: nats:latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-go@v4
      with:
        go-version: '1.24'
    - run: go test ./... -v -coverprofile=coverage.out
    - uses: codecov/codecov-action@v3

  build:
    needs: test
    runs-on: ubuntu-latest
    strategy:
      matrix:
        target: [kernel, worker, ui]
    steps:
    - uses: actions/checkout@v4
    - run: docker build -f ${{ matrix.target }}/Dockerfile -t webencode/${{ matrix.target }}:${{ github.ref_name }} .
    - run: docker push webencode/${{ matrix.target }}:${{ github.ref_name }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
    - run: kubectl set image deployment/webencode-kernel kernel=webencode/kernel:${{ github.ref_name }}
    - run: kubectl rollout status deployment/webencode-kernel

17. Performance Tuning

Database

CREATE INDEX idx_jobs_user_created ON jobs(user_id, created_at DESC) WHERE status != 'completed';
VACUUM ANALYZE jobs;
ALTER SYSTEM SET work_mem = '256MB';

NATS

max_connections = 100000
max_subscriptions = 100000

Go

debug.SetGCPercent(75)
debug.SetMaxStack(256 * 1024 * 1024)

18. Disaster Recovery

RTO & RPO

RTO: 5 minutes (restart all services)
RPO: 1 minute (NATS replication lag)

Backup Strategy

Tier 1: Continuous PostgreSQL replication, NATS 3-node quorum
Tier 2: Weekly dumps to S3, NATS snapshots
Tier 3: Standby K8s cluster in different region

19. GUI & Frontend Specification (Complete)

19.1 Architecture

graph TB
    UI["Next.js 15 App<br/>(SSR + Server Actions)"]
    API["REST API<br/>(Kernel)"]
    WS["WebSocket<br/>(NATS events)"]
    AUTH["Auth Plugin<br/>(OIDC/Keycloak)"]
    
    UI -->|fetch/POST| API
    UI -->|subscribe| WS
    API -->|validate| AUTH
    API -->|dispatch| WS

19.2 Page Hierarchy

/
├── /dashboard
│   ├── System Health Card
│   ├── Job Statistics
│   ├── Recent Jobs Table
│   ├── Live Streams Widget
│   └── Resource Utilization Charts
├── /jobs
│   ├── /jobs (list, filter, search)
│   ├── /jobs/[id] (detail, tasks, logs)
│   └── /jobs/new (create wizard)
├── /streams
│   ├── /streams (list, active/past)
│   ├── /streams/[id] (live studio)
│   ├── /streams/new (create stream)
│   └── /streams/[id]/restream (restream destinations)
├── /restreams
│   ├── /restreams (list, schedule)
│   ├── /restreams/[id] (detail, per-destination stats)
│   └── /restreams/new (create restream job)
├── /workers
│   ├── /workers (list, health, uptime)
│   └── /workers/[id] (capabilities, logs, drain)
├── /settings
│   ├── /settings/plugins (manage plugins, config)
│   ├── /settings/profiles (encoding profiles)
│   ├── /settings/storage (S3/SeaweedFS config)
│   ├── /settings/auth (OAuth2 setup)
│   └── /settings/audit (audit log viewer)
└── /auth
    └── /auth/login (OAuth2 redirect)

19.3 Page Specifications

Dashboard (`/dashboard`)

Real-time subscriptions:

dashboard:stats (system health, job counts, worker status, plugin health)
live.telemetry.* (aggregate live stream stats)

Sections:

System Health Card
- NATS: status, version, message backlog
- PostgreSQL: connections, query latency
- Workers: online count, health status
- Plugins: per-plugin uptime, error rate
Job Statistics (last 24h)
- Queued, Processing, Completed, Failed (counts + % change)
- Average processing time
- Total data processed (GB)
Recent Jobs Table
- Job ID, source, profile, status, progress %, ETA, actions
- Filter: by status, user, date range
- Sort: by created_at, eta_seconds, progress_pct
- Actions: view detail, cancel, retry, delete
Live Streams Widget
- Active streams list
- Per-stream: title, viewers, bitrate, uptime
- Click to open stream dashboard
Resource Utilization
- CPU, memory, disk usage across workers
- Time-series charts (Recharts)
- Alerts if > 80%

Jobs (`/jobs`)

GET /jobs:

List all jobs with pagination (20 per page)
Filters: status, user, date range, source_type
Search: job ID, source URL
Sort: created_at DESC, eta_seconds, progress_pct
Real-time progress updates via job:{id}:updates

GET /jobs/[id]:

Job metadata: ID, user, source, profile, timestamps
Overall progress bar with % and ETA
Tasks table:
- Index, time range, status, assigned worker, output size, retry count
- Per-task: view logs, retry, requeue
Event timeline: Breadcrumbs of status changes (queued → processing → stitching → uploading → completed)
Artifact browser: Chunks, stitched output, manifests, thumbnails, checksums
FFmpeg logs: Download or view in browser
Actions: Cancel job, force stitch, force publish, delete

POST /jobs/new (Multi-step wizard):

Step 1: Source Selection
- Option A: Enter URL (s3://, https://, file://)
- Option B: Browse S3 bucket (if storage plugin supports it)
- Option C: Upload file (via presigned URL)
- Validation: Check file size, format, accessibility
Step 2: Profile Selection
- Preset profiles: "1080p_h264", "4k_hevc", "480p_fast", etc.
- Or "Custom" for advanced override
- Advanced panel (if custom):
  - Video codec: dropdown [libx264, hevc_nvenc, av1]
  - Bitrate ladder: [500k, 1000k, 2000k, 5000k, 10000k] (enable/disable each)
  - Resolution: width/height or preset
  - Audio: codec, bitrate, language, passthrough vs re-encode
  - Subtitles: passthrough, extract, burn-in
  - Thumbnails: interval (e.g., every 5s)
Step 3: Output & Packaging
- Container: MP4, MKV, WebM
- HLS/DASH: enable/disable, segment duration
- Encryption: none, AES-128, custom
- Subtitles: embed or separate files
Step 4: Publishing
- Publish to: YouTube, S3, email, none
- Per-platform settings:
  - YouTube: title, description, tags, privacy (public/unlisted/private), thumbnail
  - S3: bucket, prefix, storage class
  - Email: recipient, include link
Review & Submit
- Summary of all settings
- [Create Job] button
- Response: Job ID + insta-link to job detail page

Live Streams (`/streams`)

GET /streams:

List: active (at top) + past streams
Per-stream: title, status (LIVE / offline), viewers, duration, created_at
Filter: active/past, by user, by date range
Actions: view detail, edit, archive settings

GET /streams/[id] (Live Studio):

Stream info card:
- Title, description, thumbnail
- Status: LIVE / IDLE / ENDED
- Duration (if live): "Online for 2h 15m"
- Peak viewers: 2,890
Ingest settings (read-only during live):
- RTMP URL: rtmp://ingest.webencode.local:1935/live
- Stream key: [hidden] [Copy] [Reveal]
- Protocol: RTMP / SRT (dropdown to switch)
- Advanced: bitrate targets, encoder presets (if live plugin supports)
Live telemetry (updates 1/sec via stream:{id}:telemetry):
- Video bitrate: current + target (color-coded if below threshold)
- Audio bitrate: current
- Frame rate: current + target
- Audio level: meter (0-100%)
- Network health: "✓ Excellent" / "⚠ Degraded" / "✗ Poor"
- Dropped frames: count
- RTT / jitter: if available
Viewer analytics:
- Current viewers (large number)
- Peak viewers (last 24h)
- Time-series chart: viewer count over stream duration
- Geographic distribution: countries/regions (if available)
Restream destinations (if enabled):
- Table: platform, URL, enabled/disabled, status, bitrate per destination
- Add destination modal
- Per-destination stats: latency, error count, last error
Chat / embedding (if plugin supports):
- Embed code for website
- Chat frame (or link to external chat)
Actions:
- [End Stream] button (visible only if LIVE)
- [Start Recording] / [Stop Recording]
- [Archive to VOD] (creates job automatically)
- [Restream to...] button (opens restream destination editor)
- [Settings] button (edit stream metadata)

POST /streams/new:

Title: text input
Description: textarea
Archive enabled: checkbox
[Create Stream] → returns ingest URL and stream key

Restream (`/restreams`)

GET /restreams:

List all restream jobs
Per-job: title, status (streaming/stopped/error), input type, created_at, actions
Filter: by status, by date range
Actions: view detail, start, stop, delete

GET /restreams/[id]:

Job info:
- Title, description, input type (rtmp/file/vod_job)
- Schedule: immediate / scheduled / recurring
- Loop enabled: yes/no
- Simulate live: yes/no
Input:
- If rtmp: RTMP URL to listen on
- If file: source file URL + playback controls (play/pause/restart)
- If vod_job: reference to original job (link)
Output destinations (table):
- Platform, RTMP/SRT URL, enabled/disabled, status
- Per-destination stats: bitrate, latency, error count, last error
- Add destination, remove destination
Actions:
- [Start Streaming] / [Stop Streaming]
- [Edit] button (modify destinations, schedule)
- [Delete] button

POST /restreams/new:

Title, description
Input type: RTMP / File / VOD Job (radio)
Conditional fields based on input type
Output destinations: array input (add multiple)
Schedule: Immediate / Scheduled (date/time) / Recurring (cron-like)
Loop: checkbox
Simulate live: checkbox (if file input)
[Create Restream Job]

Workers (`/workers`)

GET /workers:

List all workers
Per-worker: ID, hostname, IP, status (healthy/degraded/offline), CPU %, memory %, tasks running, uptime
Filter: by status, by hostname
Actions: view detail, drain, restart, remove

GET /workers/[id]:

Metadata: ID, hostname, IP, port, version, registration time
Capabilities (from GetCapabilities RPC):
- Video codecs: [libx264, hevc_nvenc, av1]
- Audio codecs: [aac, opus]
- GPU: nvidia / amd / none
- GPU count, model
- CPU cores
- Memory (GB)
- Max parallel tasks
Health:
- Status: healthy / degraded / failed
- Last heartbeat
- Uptime
- Recent errors (last 5)
Performance (last 24h):
- Tasks completed: count
- Average task time
- Error rate
- CPU/memory usage chart
Actions:
- [Drain] (stop accepting new tasks, wait for current to finish)
- [Undrain] (resume accepting)
- [Restart] (graceful restart)
- [Remove] (permanently deregister)
- [View Logs] (stream last N log lines)

Plugins (`/settings/plugins`)

GET /settings/plugins:

List all plugins by type (Auth, Storage, Encoder, Live, Publisher)
Per-plugin: ID, type, version, status (enabled/disabled/degraded), uptime, error rate
Actions: enable/disable, configure, view logs, test connection

Plugin Detail Modal (per plugin):

Info: ID, type, version, protocol version
Status: enabled/disabled/degraded, last error
Capabilities (from plugin.GetCapabilities):
- For encoder: codecs, GPU support
- For storage: backends supported
- For live: protocols supported
Configuration:
- JSON editor (with schema validation)
- Secret fields marked as (*_secret_ref, *_token_ref)
- [Test Connection] button (calls plugin RPC)
- [Save] button
Logs:
- Last 50 log lines (streaming via WebSocket)
- Filter: by level (info, warn, error)
Metrics:
- RPC call latency (p50, p95, p99)
- Error rate (%)
- Uptime (%)

Routing & Priority:

Table: plugin type, current instance, priority, routing rule
Edit routing rule for each type (e.g., "Use encoder-ffmpeg for h264, encoder-av1 for av1")

Encoding Profiles (`/settings/profiles`)

GET /settings/profiles:

List all profiles: name, codec, bitrate, resolution
Actions: edit, duplicate, delete

Profile Detail Editor:

Basic:
- Name: "1080p_h264"
- Description: "Full HD H.264"
- Video codec: libx264
- Bitrate: 5000 kbps
- Resolution: 1920x1080
Advanced:
- Preset: ultrafast / fast / medium / slow
- Audio codec: aac
- Audio bitrate: 128 kbps
- Container: mp4
- HLS/DASH: enable, segment duration (e.g., 6s)
Ladder (ABR):
- Multiple quality tiers
- Per-tier: codec, bitrate, resolution
- Enable/disable each tier
- [Add tier] button
[Save] button

Audit Log (`/settings/audit`)

GET /settings/audit:

Table: timestamp, user, action, resource, result (success/failure)
Filter: by user, action, date range
Search: by resource ID
Per-row: [View details] modal showing full payload
Export: download as CSV

19.4 Real-time Updates Architecture

sequenceDiagram
    UI->>Kernel: Subscribe to 'job:job-123:updates'
    Kernel->>NATS: subscribe jobs.events with filter
    Worker->>NATS: publish job.events {job_id, progress_pct, ...}
    NATS->>Kernel: deliver event
    Kernel->>UI: WebSocket message
    UI->>UI: Update progress bar

19.5 Role-Based Access Control (RBAC)

graph TD
    A["User Role"] --> B{"Viewer"}
    A --> C{"Operator"}
    A --> D{"Admin"}
    
    B --> B1["View jobs/streams<br/>View dashboards<br/>View workers (read-only)"]
    
    C --> C1["All Viewer perms"]
    C --> C2["Create/edit jobs<br/>Create/edit streams<br/>Manage restreams<br/>View worker logs"]
    
    D --> D1["All Operator perms"]
    D --> D2["Manage plugins<br/>Manage profiles<br/>Worker drain/restart<br/>Audit log access"]

19.6 Internationalization (i18n)

StreamHub and Admin UI fully support internationalization via next-intl.

Architecture:
- Path-Prefix Routing: /{locale}/route (e.g., /en/dashboard, /ja/browse).
- Middleware: Automatically detects user locale from headers/cookies and redirects or rewrites paths.
- Server Components: Uses getMessages() for zero-bundle-size translations on the server.
- Client Components: Uses useTranslations() hook for interactive elements.
Supported Locales:
- en (English) - Default
- de (German)
- ja (Japanese)
- es (Spanish)
- fr (French)
Implementation details:
- Config: src/i18n/config.ts defines supported locales.
- Routing: src/i18n/routing.ts defines navigation wrappers (Link, redirect, usePathname).
- Messages: JSON files located in messages/{locale}.json.
- Layout: app/[locale]/layout.tsx serves as the root layout injects NextIntlClientProvider.

20. Webhook System

20.1 Architecture

Webhooks allow external systems to receive real-time notifications about job statuses and stream lifecycle events.

Delivery: Asynchronous via NATS webhooks.dispatch queue
Retries: Exponential backoff (up to 24 hours)
Security: HMAC-SHA256 signatures

20.2 Event Types

Event Type	Trigger
`job.created`	Job submitted and queued
`job.started`	Worker picked up first task
`job.completed`	All tasks finished, stitching done
`job.failed`	Job failed after max retries
`stream.started`	RTMP ingest connected
`stream.ended`	RTMP ingest disconnected
`recording.ready`	Live stream archive VOD is ready

20.3 Payload Schema

POST requests are sent with Content-Type: application/json:

{
  "event_id": "evt-uuid",
  "event_type": "job.completed",
  "timestamp": "2025-12-22T12:00:00Z",
  "data": {
    "job_id": "job-uuid",
    "user_id": "user-uuid",
    "status": "completed",
    "outputs": [
      {
        "url": "https://cdn.example.com/video.mp4",
        "format": "mp4",
        "size_bytes": 1024000
      }
    ]
  }
}

20.4 Security Verification

Recipients should verify the X-WebEncode-Signature header: Signature = HMAC-SHA256(payload, webhook_secret)

21. Testing Strategy

21.1 Testing Pyramid

Unit Tests (60%):
- Go: Standard testing package. Mock interfaces for plugins/DB.
- React: Jest + React Testing Library for components.
Integration Tests (30%):
- Testcontainers: Spin up ephemeral Postgres, NATS, SeaweedFS.
- Verify NATS pub/sub flow, DB constraints, plugin RPC contracts.
End-to-End (E2E) Tests (10%):
- Playwright: Full browser flow (Login -> Create Job -> View Progress).
- API Tests: Postman/Newman collections running against a staging env.

21.2 Performance Testing

Tool: k6
Scenarios:
- Job Submission flood (100 req/sec)
- WebSocket fanout (10k concurrent connections)
CI Integration: Run strictly defined performance baselines on every PR.

22. Public Repository Guidelines

22.1 Repository Standards

To ensure the project is welcoming and professional:

README.md: High-quality banner, clear "Getting Started", Architecture diagram.
CONTRIBUTING.md: Dev environment setup, coding standards (buffers vs streams), PR process.
LICENSE: MIT License file.
CODE_OF_CONDUCT.md: Standard Contributor Covenant.
SECURITY.md: Vulnerability reporting policy.

22.2 Branching Strategy

main: Stable, production-ready code.
develop: Integration branch for next release.
feat/xyz: Feature branches.
fix/xyz: Bug fix branches.
Commits: Follow Conventional Commits (e.g., feat: add av1 support, fix: resolve memory leak).

22.3 Issue Triage

Use GitHub Issue Templates (Bug Report, Feature Request).
Labels: good first issue, help wanted, bug, enhancement.

22.4 Branding and Logo

WebEncode has an official logo (logo.svg) that represents the platform's core functionality:

Logo Elements:

Film Strip Frame: Represents video/media processing
Play Button (Center): Symbolizes video playback and streaming
Code Brackets: Represent the encoding/transformation process
Distributed Nodes: Four corner nodes connected to the center represent the distributed worker architecture
Transformation Arrows: Indicate the encoding transformation workflow

Color Scheme:

Primary Blue (#3B82F6): Main brand color, represents reliability and technology
Accent Blue (#60A5FA): Secondary elements, film strip details
Success Green (#10B981): Distributed nodes and transformation arrows
Dark Background (#0F172A): Professional, modern appearance

Usage:

Display at the top of README.md (150x150px)
Use in documentation headers where appropriate
Maintain aspect ratio when scaling
SVG format allows scaling to any size without quality loss

File Location: /logo.svg (root of repository)

23. Features & Non-Features Checklist

Project Philosophy

WebEncode is a self-hosted, open-source video processing platform. There is no billing, quotas, or usage metering. You own and operate your own infrastructure.

Must-Have Features

✅ VOD Encoding: Split & stitch, ABR ladder, thumbnails, subtitles passthrough/extract/burn-in, per-task retry, cancel, resume
✅ Live Streaming: Create/manage channels, OBS ingest, live telemetry dashboard, recording/archive to VOD
✅ Restreaming: RTMP/SRT input → multiple RTMP/SRT outputs, file-as-live with scheduling & looping
✅ Automated Upload: Publish VOD to YouTube, metadata + scheduling + retry + status tracking
✅ Plugin-First Extensibility: 5 pillars (auth, storage, encoder, live, publisher) with full lifecycle management
✅ Self-Hosted First: No external dependencies, runs entirely on your infrastructure
✅ Ops-Grade Runtime: Worker pool, job queue visibility, audit logs, metrics, alerting, backups
✅ Cross-Platform Workers: Run workers on Linux, Windows, macOS (anywhere FFmpeg runs)
✅ Open Source: MIT licensed, community-driven development

Non-Features (Explicitly Excluded)

❌ In-browser video editor (complex client-side app, out of scope)
❌ Full OBS replacement (we ingest from OBS, not replace it)
❌ DRM licensing management (use specialized services)
❌ Real-time subtitle generation via ML (future AI plugin possible)
❌ Custom filter/effect creation in UI (use FFmpeg filters directly)
❌ Billing / Quotas / Usage Metering (this is self-hosted, not SaaS)
❌ Multi-tenant architecture (single-tenant by design, simpler operations)

24. Future Roadmap

Phase 1 (Q1 2026): Core MVP

Basic VOD encoding workflow
Live streaming + archive
Restream RTMP/file
Plugin system + 5 pillar implementations
Web dashboard (basic)
Worker pool management
PostgreSQL + NATS + SeaweedFS stack

Phase 2 (Q2 2026): Enhanced Features

AI Highlights Plugin (auto-clip from live streams)
Advanced Auth: SAML, LDAP, Active Directory
Custom encoding profiles (advanced ladder editor)
Webhook delivery + retry logic
Audit log with full retention

Phase 3 (Q3 2026): Scaling & Performance

Multi-region ingest/origin servers
Predictive auto-scaling (based on traffic forecast)
Hardware acceleration: NVIDIA, AMD GPU optimization
AV1 encoding full support
Cache optimization strategies

Phase 4 (Q4 2026): Ecosystem

Plugin Marketplace (discovery + installation)
Analytics API (GraphQL for job/stream metrics)
Advanced webhooks (transformation, filtering)
CDN integration partnerships
White-label options

25. Glossary

25.1 Architecture Terms

Kernel: The central control plane service (Go) responsible for API handling, job orchestration, and plugin management. The Kernel is the "brain" of the cluster.
Pillar: One of the 5 core extensible plugin interfaces: Auth, Storage, Encoder, Live, Publisher.
Worker: A stateless execution node that subscribes to tasks. Workers run FFmpeg subprocesses and report progress to the Kernel via NATS.
Plugin: An independent process (launched by the Kernel) that implements a specific gRPC interface (e.g., plugin-auth-oidc, plugin-storage-s3).

25.2 Domain Terms

Job: A request to process a VOD file. Consists of multiple Tasks.
Task: An atomic unit of work (e.g., "transcode segment 10 to 1080p").
Stream: A live video session with a unique stream key and persistent telemetry.
Restream: The process of forwarding a live ingest or VOD file to external RTMP destinations (Twitch, YouTube) in real-time.
Profile: A named configuration defining codecs, bitrates, resolutions, and packaging settings (e.g., "1080p_h264").

25.3 Technology Terms

SeaweedFS: High-performance distributed object storage used for video artifacts and segments. Replaces MinIO for license compliance.
OpenBao: Open-source secret management service (fork of Vault) used to store plugin credentials.
MediaMTX: The underlying media server engine used by the Live Plugin for RTMP/SRT/WebRTC handling.
NATS JetStream: The messaging backbone providing persistent work queues (WEBENCODE_WORK) and event streams.

Appendix: Configuration Examples

Example: Storage Plugin (SeaweedFS)

{
  "plugin_id": "storage-s3",
  "config": {
    "backend": "s3",
    "endpoint": "seaweedfs-filer.webencode.svc:8333",
    "region": "us-east-1",
    "bucket": "webencode",
    "access_key_id_ref": "secret/webencode/seaweedfs/access_key",
    "secret_access_key_ref": "secret/webencode/seaweedfs/secret_key",
    "use_ssl": false,
    "signed_url_expiry_seconds": 3600
  }
}

Example: Encoder Plugin (FFmpeg)

{
  "plugin_id": "encoder-ffmpeg",
  "config": {
    "ffmpeg_path": "/usr/bin/ffmpeg",
    "ffprobe_path": "/usr/bin/ffprobe",
    "max_parallel_tasks": 2,
    "presets": {
      "1080p_h264": {
        "codec": "libx264",
        "bitrate": "5000k",
        "width": 1920,
        "height": 1080,
        "fps": 30,
        "preset": "fast"
      },
      "4k_hevc": {
        "codec": "hevc_nvenc",
        "bitrate": "15000k",
        "width": 3840,
        "height": 2160,
        "fps": 30,
        "gpu": "nvidia"
      }
    }
  }
}

Example: Live Plugin (MediaMTX)

{
  "plugin_id": "live-mediamtx",
  "config": {
    "rtmp_port": 1935,
    "srt_port": 10080,
    "webrtc_port": 8443,
    "record_segments": true,
    "segment_duration": 2,
    "hls_enabled": true,
    "hls_directory": "/tmp/hls"
  }
}

Example: Auth Plugin (OIDC)

{
  "plugin_id": "auth-oidc-keycloak",
  "config": {
    "issuer": "https://keycloak.example.com/realms/webencode",
    "client_id": "webencode-app",
    "client_secret_ref": "secret/webencode/oidc/client_secret",
    "scopes": ["openid", "profile", "email"],
    "token_cache_ttl_seconds": 300
  }
}

Example: Publisher Plugin (YouTube)

{
  "plugin_id": "publisher-youtube",
  "config": {
    "api_key_ref": "secret/webencode/youtube/api_key",
    "oauth_credentials_ref": "secret/webencode/youtube/oauth",
    "default_privacy": "unlisted",
    "enable_comments": true,
    "enable_ratings": true,
    "auto_publish_delay_seconds": 0
  }
}

End of Specification (v7 - FINAL)

This specification is comprehensive, production-ready, and includes complete UI specifications, plugin system lifecycle, routing rules, and Mermaid diagrams. All components are finalized and technology choices are justified for 2025/2026 implementation.

FilesExpand file tree

SPECIFICATION.md

Latest commit

History

SPECIFICATION.md

File metadata and controls

WebEncode - Distributed Video Processing Platform (v2025)

Table of Contents

1. Vision & Architecture

1.1 The "Hyper-Modular" Philosophy

1.2 System Architecture Diagram

1.3 Scalability Model

2. Project Structure (Standard Monorepo)

2.1 Key Directories

2.5 Global Error Tracking System

Architecture

Data Model (error_events)

Integration Points

3. Technology Stack (Validated 2025/2026)

3.1 Technology Validation Summary

✅ Go 1.24+ (CONFIRMED - Optimal Choice)

✅ NATS JetStream 2.10+ (CONFIRMED - Optimal Choice)

✅ Next.js 16 (CONFIRMED - Optimal Choice)

✅ OpenBao (CONFIRMED - Best OSS Choice)

✅ SeaweedFS (CONFIRMED - Best FOSS Storage)

✅ MediaMTX (CONFIRMED - Best for Our Use Case)

3.2 Complete Tool & Language Mapping

3.3 Core Plugin Suite (MVP Deliverables)

4. Data Models (PostgreSQL DDL)

4.1 jobs Table

4.2 tasks Table

4.3 streams Table

4.4 workers Table

4.5 plugin_configs Table

4.6 restream_jobs Table

5. Messaging Architecture (NATS JetStream)

5.1 Stream: WEBENCODE_WORK (WorkQueue)

5.2 Stream: WEBENCODE_LIVE (Interest)

5.3 Stream: WEBENCODE_EVENTS (Audit)

6. Plugin System (Detailed Lifecycle)

6.1 Plugin Identity & Manifest

6.2 Plugin Discovery & Loading

6.3 Plugin Health Model

6.4 Plugin Routing Rules

Storage Plugin Routing

Encoder Plugin Routing

Publisher Plugin Routing

Live Plugin Routing

6.5 Plugin Configuration & Secrets

6.6 Plugin Upgrade & Rollback

6.7 Plugin Observability Contract

7. Plugin Interfaces (The 5 Pillars - Complete Specification)

7.1 Pillar 1: Authentication (auth.proto)

7.2 Pillar 2: Storage (storage.proto)

7.3 Pillar 3: Encoder (encoder.proto)

7.4 Pillar 4: Live (live.proto)

7.5 Pillar 5: Publisher (publisher.proto)

8. Core Workflows (Logic & Algorithms)

8.1 VOD Encoding Workflow

8.2 Live Stream Workflow (with Archive & Restream)

8.3 Restream Job Workflow

9. FFmpeg Integration & Command Templates

9.1 FFmpeg Command Architecture

9.2 Base Command Template

9.3 Profile Command Templates

9.3.1 1080p H.264 (Software)

9.3.2 4K HEVC (NVIDIA GPU)

9.3.3 720p VP9 (WebM)

9.3.4 AV1 (Software - High Quality)

9.4 HLS/DASH Packaging

9.4.1 HLS with Multiple Qualities (ABR Ladder)

9.5 Hardware Acceleration Detection

9.6 Segment Splitting Algorithm

9.7 Progress Parsing

9.8 Error Handling & Recovery

10. API Reference (OpenAPI 3.1)

10.1 Jobs API

10.2 Streams API

10.3 Restream API

10.4 Files API

Data Model (`error_events`)

4.1 `jobs` Table

4.2 `tasks` Table

4.3 `streams` Table

4.4 `workers` Table

4.5 `plugin_configs` Table

4.6 `restream_jobs` Table

5.1 Stream: `WEBENCODE_WORK` (WorkQueue)

5.2 Stream: `WEBENCODE_LIVE` (Interest)

5.3 Stream: `WEBENCODE_EVENTS` (Audit)

7.1 Pillar 1: Authentication (`auth.proto`)

7.2 Pillar 2: Storage (`storage.proto`)

7.3 Pillar 3: Encoder (`encoder.proto`)

7.4 Pillar 4: Live (`live.proto`)

7.5 Pillar 5: Publisher (`publisher.proto`)

Dashboard (`/dashboard`)

Jobs (`/jobs`)

Live Streams (`/streams`)

Restream (`/restreams`)

Workers (`/workers`)

Plugins (`/settings/plugins`)

Encoding Profiles (`/settings/profiles`)

Audit Log (`/settings/audit`)