feat: add quantized video model support (GGUF, NF4, FP8) #74

Open
Machine King (taskmasterpeace) wants to merge 44 commits into Lightricks:main from taskmasterpeace:feat/quantized-models

Conversation

@taskmasterpeace

Summary

  • Add support for quantized LTX 2.3 video model formats (GGUF Q4/Q5/Q8, NF4, FP8) so Director's Desktop can run on 24GB GPUs (RTX 3090, 4070 Ti Super) instead of requiring 32GB+
  • New ModelScanner service detects model files by metadata (GGUF magic bytes, safetensors headers, NF4 config)
  • New Models tab in Settings with model dropdown, folder picker, GPU info, and distilled LoRA status
  • New Model Guide popup with GPU-based format recommendations and HuggingFace download links
  • PipelinesHandler routes to the correct pipeline class based on selected model format
  • GGUF and NF4 pipeline classes are scaffolded (NotImplementedError) — actual loading requires testing with real model files
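The detect-by-metadata idea can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual ModelScanner API: the function name and return values are made up, but the signatures it checks are from the formats themselves (GGUF files begin with the ASCII magic `GGUF`; safetensors files begin with an 8-byte little-endian length of a JSON header).

```python
import json
import struct
from pathlib import Path

def detect_model_format(path: str) -> str:
    """Best-effort model-format detection by file signature (illustrative)."""
    head = Path(path).read_bytes()[:8]
    if head[:4] == b"GGUF":          # GGUF magic bytes at offset 0
        return "gguf"
    if len(head) == 8:
        (header_len,) = struct.unpack("<Q", head)  # safetensors: u64 header length
        with open(path, "rb") as f:
            f.seek(8)
            try:
                header = json.loads(f.read(header_len))
                if isinstance(header, dict):
                    return "safetensors"
            except (ValueError, OSError):
                pass
    return "unknown"
```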

Backend changes

  • gguf>=0.10.0 dependency added
  • ModelScanner Protocol + Impl + Fake (services pattern)
  • 3 new API endpoints: GET /api/models/video/scan, POST /api/models/video/select, GET /api/models/video/guide
  • GGUFFastVideoPipeline and NF4FastVideoPipeline scaffolds
  • 25 new tests (463 total passing)

Frontend changes

  • Models tab in SettingsModal (model dropdown, folder picker, GPU info, scan button)
  • ModelGuideDialog component (format cards, download links, setup instructions)
  • New settings: customVideoModelPath, selectedVideoModel

Docs

  • README section with VRAM table and setup instructions

Test plan

  • pnpm typecheck passes (pyright + tsc)
  • pnpm backend:test passes (463 tests)
  • Manual: Open Settings → Models tab, verify GPU info displays
  • Manual: Click Model Guide, verify format cards and download links
  • Manual: Set custom model folder, scan, select a model
  • Manual: Verify default BF16 pipeline still works when no model is selected

🤖 Generated with Claude Code

…cons

Replace LTX branding throughout the app with Director's Desktop. New colorful
palette+clapperboard logo (SVG + generated PNG/ICO icons). Updated product name,
window titles, loading text, about section, and electron-builder config.

Show which AI model produced each result with a badge overlay on
VideoPlayer and ImageResult components. Tracks lastModel in generation
state and maps model IDs to display names.

Swap out fal.ai API backend for Replicate with multi-model support.
New ImageAPIClient protocol with ReplicateImageClientImpl supporting
Z-Image Turbo and Nano Banana 2. Settings updated from fal_api_key to
replicate_api_key with image_model selector. All 247 tests pass.

New VideoAPIClient protocol with ReplicateVideoClientImpl for Seedance
1.5 Pro cloud video generation. Job queue with submit/status/cancel
routes for managing generation jobs. Includes full test coverage.

Design document covering 6-phase integration between Director's Desktop
and Director's Palette (auth, gallery/library sync, generation upgrades,
power tools, advanced features, testing). Phase 1 plan with 14 TDD tasks.

- palette_api_key setting with masked responses
- PaletteSyncClient protocol + HTTP implementation + fake test double
- /api/sync/status and /api/sync/credits routes
- SyncHandler wired into AppHandler composition root
- 5 new integration tests for sync behavior

- GenerateVideoRequest.lastFramePath for first/last frame video gen
- POST /api/enhance-prompt using Gemini to enhance rough prompts
- EnhancePromptHandler with cinematic prompt expansion
- 5 new tests covering both features

…aspect labels

- FrameSlot component with paste/drop/browse support
- First Frame and Last Frame slots in Playground for video generation
- Sparkle button to enhance prompts via Gemini API
- Image variations slider (1-12) in text-to-image settings
- Social media labels on aspect ratio presets (YouTube, TikTok, Instagram)
- 4:5 Instagram Post aspect ratio option for images
Extract the last frame of a generated video and use it as the first
frame for the next generation. FastForward button in VideoPlayer
controls. Clears prompt so user describes what happens next.
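Last-frame extraction along these lines is straightforward with ffmpeg's end-relative seek. A sketch, assuming the app shells out to ffmpeg (the helper name is hypothetical; `-sseof` and `-frames:v` are real ffmpeg options):

```python
def last_frame_cmd(video_path: str, out_image: str) -> list[str]:
    """Build an ffmpeg command that grabs one frame from just before EOF."""
    return [
        "ffmpeg", "-y",
        "-sseof", "-0.1",   # seek 0.1s before end of file (input option)
        "-i", video_path,
        "-frames:v", "1",   # emit a single frame
        out_image,
    ]
```

Run it with `subprocess.run(last_frame_cmd(src, dst), check=True)`.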
Director's Palette section in API Keys tab with API key input,
connection status indicator, user email display, and live credits
balance. Polls /api/sync/status and /api/sync/credits every 60s.

Injects a mock window.electronAPI when running in a plain browser
(non-Electron). Gated by !window.electronAPI && location.protocol === 'http:'
so it never activates in production Electron builds.

…Features

Phase 2 — Gallery + Library:
- Local gallery backend: scan outputs dir, pagination, type filtering, delete
- Library backend: Characters, Styles, References CRUD with JSON persistence
- Frontend Gallery view with filter tabs, model badges, lightbox preview
- Frontend Characters/Styles/References views with add/edit/delete modals
- New organized sidebar: Create, Edit, Library, Tools sections

Phase 4 — Power Tools:
- Wildcard parser with Cartesian expansion and random mode
- Prompt Library with search, tags, usage tracking
- Frontend Wildcards view with test/expand area
- Frontend Prompt Library view with search, sort, copy-to-clipboard

Phase 5 — Advanced Features:
- Receive-job endpoint for Palette→Desktop generation dispatch
- Contact Sheet generation (9 cinematic angles from reference)
- Style Guide Grid generation (9 diverse subjects in one style)

81 new backend tests, all 339 passing. TypeScript + Pyright clean.
Previously, hasRunning checked ALL jobs in the queue, so stale or orphaned
jobs from previous sessions would permanently block the UI in a "generating"
state. Now only the actively submitted job determines isGenerating.
…d UI

- Extend QueueJob with batch_id, depends_on, auto_params, tags fields
- Add JobQueue helpers: jobs_for_batch, active_batch_ids, queued_jobs_for_slot
- Add batch API types: BatchSubmitRequest, SweepDefinition, PipelineDefinition, BatchReport
- Implement BatchHandler with list, sweep (cartesian product), and pipeline modes
- Add QueueWorker dependency checking, auto-param resolution, batch completion detection
- Wire i2v auto-prompt generation into dependent job resolution
- Add batch routes: submit-batch, batch status, cancel, retry-failed
- Include batch fields in QueueJobResponse and queue status endpoint
- Add batchSoundEnabled setting
- Add frontend batch types, API client, useBatch hook with polling
- Create BatchBuilderModal with List, Import (CSV/JSON), and Grid Sweep tabs
- Add Batch button to GenSpace prompt bar
- 37 new/updated backend tests, all 372 backend tests pass
- TypeScript and Pyright clean, Vite build succeeds

…d R2 storage

- FFN chunked feedforward reduces peak VRAM by up to 8x (setting: ffnChunkCount)
- TeaCache timestep-aware caching for 1.6-2.1x denoising speedup (setting: teaCacheThreshold)
- Aggressive VRAM deep_cleanup after every GPU job prevents post-heavy-load stalls
- R2/S3-compatible cloud storage upload for generated media (setting: autoUploadToR2)
- 382 tests passing including 10 new tests for optimizations
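The FFN chunking idea is to run the feedforward over slices of the token sequence so only one slice's intermediate activations are live at a time. A shape-agnostic sketch in plain Python (the real implementation operates on tensors; names here are illustrative):

```python
from typing import Callable, Sequence

def chunked_feedforward(tokens: Sequence[list[float]],
                        ffn: Callable[[Sequence[list[float]]], list[list[float]]],
                        chunk_count: int) -> list[list[float]]:
    """Apply `ffn` to the token sequence in `chunk_count` slices.

    Peak memory for the FFN's intermediate activations scales with the
    slice length instead of the full sequence length; outputs are
    identical because the FFN acts on each token independently.
    """
    n = len(tokens)
    size = max(1, -(-n // chunk_count))  # ceil division
    out: list[list[float]] = []
    for start in range(0, n, size):
        out.extend(ffn(tokens[start:start + size]))
    return out
```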
- Credit system: balance display, per-generation cost on buttons, auto-deduction after API jobs
- Output naming: dd_{model}_{prompt_slug}_{timestamp}.{ext} across all handlers
- Gallery parser supports both new dd_ and legacy filename formats
- Palette credits fallback: uses /credits/check when /credits returns 500
- Hardcoded pricing from live Palette API as fallback when credits endpoint unavailable
- Seedance time estimates (~60s for a 5s clip at 720p, ~120s for a 10s clip at 720p)
- Palette API spec, handoff docs, and integration plans
- README updated with credits, Seedance, Replicate API docs
LoRAs are loaded at pipeline creation time via DistilledPipeline. Pipeline
is recreated when the requested LoRA changes, and reused when it matches.
Frontend sends loraPath/loraWeight params for video jobs through the queue.

The distilled pipeline doesn't support last-frame conditioning
(frame_idx=num_frames-1), causing tensor shape mismatches. Using
frame_idx=0 instead works identically — the new video continues
from the provided frame.

New endpoint POST /api/generate/long and queue job type "long_video".
Takes an image + prompt + target duration, automatically chains:
1. Initial I2V segment from source image
2. Extract last frame, generate next segment conditioned on it
3. Repeat until target duration reached
4. Concatenate all segments with ffmpeg into single video

Also available via queue submit with type="long_video".
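The chaining loop in steps 1-4 can be sketched with the segment generators injected as callables (all names here are hypothetical, not the actual handler API):

```python
import math

def plan_segments(target_seconds: float, segment_seconds: float) -> int:
    """Number of chained I2V segments needed to cover the target duration."""
    return max(1, math.ceil(target_seconds / segment_seconds))

def chain_long_video(image_path: str, prompt: str,
                     target_seconds: float, segment_seconds: float,
                     generate_i2v, extract_last_frame, concat_videos) -> str:
    """Chain I2V segments: each one is conditioned on the previous
    segment's last frame, then all segments are concatenated."""
    segments = []
    frame = image_path
    for _ in range(plan_segments(target_seconds, segment_seconds)):
        video = generate_i2v(frame, prompt, segment_seconds)
        segments.append(video)
        frame = extract_last_frame(video)   # conditions the next segment
    return concat_videos(segments)          # e.g. ffmpeg concat demuxer
```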
Add FluxKleinImagePipeline with bitsandbytes NF4 4-bit quantization for
the transformer, reducing VRAM from ~23GB to ~16GB peak while maintaining
identical speed and full LoRA compatibility. Pipeline uses CPU offload
for the T5-XXL text encoder and fresh VAE decode on CPU to avoid the
Windows/CUDA segfault with accelerate hooks.

Includes model download spec, pipeline handler integration, image
generation handler routing for flux-klein-9b model selection, and tests.
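For context, NF4 transformer loading along these lines typically goes through diffusers' bitsandbytes integration. A config sketch only, assuming that integration; the FluxKleinImagePipeline class and model path come from this PR, not diffusers, and the path below is a placeholder:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# Sketch: quantize the transformer to NF4, keep compute in bf16.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # normal-float 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "path/to/flux-klein-9b",             # placeholder model location
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
```

The text encoder and VAE stay unquantized, matching the CPU-offload approach described above.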
Add in-app LoRA browsing with CivitAI API search, download, and local
library management. Includes backend routes for LoRA catalog CRUD and
thumbnail serving, CivitAI API key settings, LoRA selection in generation
UI with trigger phrase support (prepend/append/off modes), and frontend
LoraBrowser component with search, download progress, and library views.

NF4 test script validates bitsandbytes 4-bit quantization with LoRA
support on FLUX Klein 9B. Confirmed: 16GB peak VRAM, ~110s total,
LoRAs work perfectly with quantized transformer.

Set PYTHONNOUSERSITE=1 to prevent system Python site-packages from
leaking into the bundled runtime, which can cause import crashes.

Let updates auto-install when the user naturally quits instead of
calling quitAndInstall which disrupts active work.

HuggingFace now uses xet protocol by default — patch both http_get and
xet_get for progress callbacks. Switch from speed_mbps (int) to
speed_bytes_per_sec (float) for better granularity, add EWMA smoothing
for speed display, and use Math.round() for float progress values.
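EWMA smoothing for the speed display amounts to a one-line recurrence; a sketch (class name and default alpha are assumptions):

```python
class EwmaSpeed:
    """Exponentially weighted moving average for download-speed display."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha          # weight of the newest sample
        self.value: float | None = None

    def update(self, speed_bytes_per_sec: float) -> float:
        if self.value is None:
            self.value = speed_bytes_per_sec   # seed with first sample
        else:
            self.value = (self.alpha * speed_bytes_per_sec
                          + (1 - self.alpha) * self.value)
        return self.value
```

Smaller alpha means a steadier readout that reacts more slowly to bursts.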
Spec for supporting GGUF, NF4, and FP8 checkpoint formats to enable
LTX 2.3 video generation on 24GB GPUs (RTX 3090, 4070 Ti Super).
Includes Model Guide UI, model scanner service, and pipeline extensions.
…odel support

10-task plan covering ModelScanner service, handler/routes, Models tab,
ModelGuideDialog, PipelinesHandler format routing, GGUF/NF4 pipeline
scaffolds, and README section.