feat: add quantized video model support (GGUF, NF4, FP8) #74

Open
Machine King (taskmasterpeace) wants to merge 44 commits into Lightricks:main from taskmasterpeace:feat/quantized-models

Conversation

@taskmasterpeace

Summary

  • Add support for quantized LTX 2.3 video model formats (GGUF Q4/Q5/Q8, NF4, FP8) so Director's Desktop can run on 24GB GPUs (RTX 3090, 4070 Ti Super) instead of requiring 32GB+
  • New ModelScanner service detects model files by metadata (GGUF magic bytes, safetensors headers, NF4 config)
  • New Models tab in Settings with model dropdown, folder picker, GPU info, and distilled LoRA status
  • New Model Guide popup with GPU-based format recommendations and HuggingFace download links
  • PipelinesHandler routes to the correct pipeline class based on selected model format
  • GGUF and NF4 pipeline classes are scaffolded (NotImplementedError) — actual loading requires testing with real model files
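The detect-by-metadata idea can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual ModelScanner API: the function name and return values are made up, but the signatures it checks are from the formats themselves (GGUF files begin with the ASCII magic `GGUF`; safetensors files begin with an 8-byte little-endian length of a JSON header).

```python
import json
import struct
from pathlib import Path

def detect_model_format(path: str) -> str:
    """Best-effort model-format detection by file signature (illustrative)."""
    head = Path(path).read_bytes()[:8]
    if head[:4] == b"GGUF":          # GGUF magic bytes at offset 0
        return "gguf"
    if len(head) == 8:
        (header_len,) = struct.unpack("<Q", head)  # safetensors: u64 header length
        with open(path, "rb") as f:
            f.seek(8)
            try:
                header = json.loads(f.read(header_len))
                if isinstance(header, dict):
                    return "safetensors"
            except (ValueError, OSError):
                pass
    return "unknown"
```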

Backend changes

  • gguf>=0.10.0 dependency added
  • ModelScanner Protocol + Impl + Fake (services pattern)
  • 3 new API endpoints: GET /api/models/video/scan, POST /api/models/video/select, GET /api/models/video/guide
  • GGUFFastVideoPipeline and NF4FastVideoPipeline scaffolds
  • 25 new tests (463 total passing)

Frontend changes

  • Models tab in SettingsModal (model dropdown, folder picker, GPU info, scan button)
  • ModelGuideDialog component (format cards, download links, setup instructions)
  • New settings: customVideoModelPath, selectedVideoModel

Docs

  • README section with VRAM table and setup instructions

Test plan

  • pnpm typecheck passes (pyright + tsc)
  • pnpm backend:test passes (463 tests)
  • Manual: Open Settings → Models tab, verify GPU info displays
  • Manual: Click Model Guide, verify format cards and download links
  • Manual: Set custom model folder, scan, select a model
  • Manual: Verify default BF16 pipeline still works when no model is selected

🤖 Generated with Claude Code

…cons

Replace LTX branding throughout the app with Director's Desktop. New colorful
palette+clapperboard logo (SVG + generated PNG/ICO icons). Updated product name,
window titles, loading text, about section, and electron-builder config.

Show which AI model produced each result with a badge overlay on
VideoPlayer and ImageResult components. Tracks lastModel in generation
state and maps model IDs to display names.

Swap out fal.ai API backend for Replicate with multi-model support.
New ImageAPIClient protocol with ReplicateImageClientImpl supporting
Z-Image Turbo and Nano Banana 2. Settings updated from fal_api_key to
replicate_api_key with image_model selector. All 247 tests pass.

New VideoAPIClient protocol with ReplicateVideoClientImpl for Seedance
1.5 Pro cloud video generation. Job queue with submit/status/cancel
routes for managing generation jobs. Includes full test coverage.

Design document covering 6-phase integration between Director's Desktop
and Director's Palette (auth, gallery/library sync, generation upgrades,
power tools, advanced features, testing). Phase 1 plan with 14 TDD tasks.

- palette_api_key setting with masked responses
- PaletteSyncClient protocol + HTTP implementation + fake test double
- /api/sync/status and /api/sync/credits routes
- SyncHandler wired into AppHandler composition root
- 5 new integration tests for sync behavior

- GenerateVideoRequest.lastFramePath for first/last frame video gen
- POST /api/enhance-prompt using Gemini to enhance rough prompts
- EnhancePromptHandler with cinematic prompt expansion
- 5 new tests covering both features

…aspect labels

- FrameSlot component with paste/drop/browse support
- First Frame and Last Frame slots in Playground for video generation
- Sparkle button to enhance prompts via Gemini API
- Image variations slider (1-12) in text-to-image settings
- Social media labels on aspect ratio presets (YouTube, TikTok, Instagram)
- 4:5 Instagram Post aspect ratio option for images
Extract the last frame of a generated video and use it as the first
frame for the next generation. FastForward button in VideoPlayer
controls. Clears prompt so user describes what happens next.
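Last-frame extraction along these lines is straightforward with ffmpeg's end-relative seek. A sketch, assuming the app shells out to ffmpeg (the helper name is hypothetical; `-sseof` and `-frames:v` are real ffmpeg options):

```python
def last_frame_cmd(video_path: str, out_image: str) -> list[str]:
    """Build an ffmpeg command that grabs one frame from just before EOF."""
    return [
        "ffmpeg", "-y",
        "-sseof", "-0.1",   # seek 0.1s before end of file (input option)
        "-i", video_path,
        "-frames:v", "1",   # emit a single frame
        out_image,
    ]
```

Run it with `subprocess.run(last_frame_cmd(src, dst), check=True)`.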
Director's Palette section in API Keys tab with API key input,
connection status indicator, user email display, and live credits
balance. Polls /api/sync/status and /api/sync/credits every 60s.

Injects a mock window.electronAPI when running in a plain browser
(non-Electron). Gated by !window.electronAPI && location.protocol === 'http:'
so it never activates in production Electron builds.

…Features

Phase 2 — Gallery + Library:
- Local gallery backend: scan outputs dir, pagination, type filtering, delete
- Library backend: Characters, Styles, References CRUD with JSON persistence
- Frontend Gallery view with filter tabs, model badges, lightbox preview
- Frontend Characters/Styles/References views with add/edit/delete modals
- New organized sidebar: Create, Edit, Library, Tools sections

Phase 4 — Power Tools:
- Wildcard parser with Cartesian expansion and random mode
- Prompt Library with search, tags, usage tracking
- Frontend Wildcards view with test/expand area
- Frontend Prompt Library view with search, sort, copy-to-clipboard

Phase 5 — Advanced Features:
- Receive-job endpoint for Palette→Desktop generation dispatch
- Contact Sheet generation (9 cinematic angles from reference)
- Style Guide Grid generation (9 diverse subjects in one style)

81 new backend tests, all 339 passing. TypeScript + Pyright clean.
Previously, hasRunning checked ALL jobs in the queue, so stale or orphaned
jobs from previous sessions would permanently block the UI in a "generating"
state. Now only the actively submitted job determines isGenerating.
…d UI

- Extend QueueJob with batch_id, depends_on, auto_params, tags fields
- Add JobQueue helpers: jobs_for_batch, active_batch_ids, queued_jobs_for_slot
- Add batch API types: BatchSubmitRequest, SweepDefinition, PipelineDefinition, BatchReport
- Implement BatchHandler with list, sweep (cartesian product), and pipeline modes
- Add QueueWorker dependency checking, auto-param resolution, batch completion detection
- Wire i2v auto-prompt generation into dependent job resolution
- Add batch routes: submit-batch, batch status, cancel, retry-failed
- Include batch fields in QueueJobResponse and queue status endpoint
- Add batchSoundEnabled setting
- Add frontend batch types, API client, useBatch hook with polling
- Create BatchBuilderModal with List, Import (CSV/JSON), and Grid Sweep tabs
- Add Batch button to GenSpace prompt bar
- 37 new/updated backend tests, all 372 backend tests pass
- TypeScript and Pyright clean, Vite build succeeds

…d R2 storage

- FFN chunked feedforward reduces peak VRAM by up to 8x (setting: ffnChunkCount)
- TeaCache timestep-aware caching for 1.6-2.1x denoising speedup (setting: teaCacheThreshold)
- Aggressive VRAM deep_cleanup after every GPU job prevents post-heavy-load stalls
- R2/S3-compatible cloud storage upload for generated media (setting: autoUploadToR2)
- 382 tests passing including 10 new tests for optimizations
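The FFN chunking idea is to run the feedforward over slices of the token sequence so only one slice's intermediate activations are live at a time. A shape-agnostic sketch in plain Python (the real implementation operates on tensors; names here are illustrative):

```python
from typing import Callable, Sequence

def chunked_feedforward(tokens: Sequence[list[float]],
                        ffn: Callable[[Sequence[list[float]]], list[list[float]]],
                        chunk_count: int) -> list[list[float]]:
    """Apply `ffn` to the token sequence in `chunk_count` slices.

    Peak memory for the FFN's intermediate activations scales with the
    slice length instead of the full sequence length; outputs are
    identical because the FFN acts on each token independently.
    """
    n = len(tokens)
    size = max(1, -(-n // chunk_count))  # ceil division
    out: list[list[float]] = []
    for start in range(0, n, size):
        out.extend(ffn(tokens[start:start + size]))
    return out
```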
- Credit system: balance display, per-generation cost on buttons, auto-deduction after API jobs
- Output naming: dd_{model}_{prompt_slug}_{timestamp}.{ext} across all handlers
- Gallery parser supports both new dd_ and legacy filename formats
- Palette credits fallback: uses /credits/check when /credits returns 500
- Hardcoded pricing from live Palette API as fallback when credits endpoint unavailable
- Seedance time estimates (~60s for a 5s clip at 720p, ~120s for a 10s clip at 720p)
- Palette API spec, handoff docs, and integration plans
- README updated with credits, Seedance, Replicate API docs
LoRAs are loaded at pipeline creation time via DistilledPipeline. Pipeline
is recreated when the requested LoRA changes, and reused when it matches.
Frontend sends loraPath/loraWeight params for video jobs through the queue.

The distilled pipeline doesn't support last-frame conditioning
(frame_idx=num_frames-1), causing tensor shape mismatches. Using
frame_idx=0 instead works identically — the new video continues
from the provided frame.

New endpoint POST /api/generate/long and queue job type "long_video".
Takes an image + prompt + target duration, automatically chains:
1. Initial I2V segment from source image
2. Extract last frame, generate next segment conditioned on it
3. Repeat until target duration reached
4. Concatenate all segments with ffmpeg into single video

Also available via queue submit with type="long_video".
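The chaining loop in steps 1-4 can be sketched with the segment generators injected as callables (all names here are hypothetical, not the actual handler API):

```python
import math

def plan_segments(target_seconds: float, segment_seconds: float) -> int:
    """Number of chained I2V segments needed to cover the target duration."""
    return max(1, math.ceil(target_seconds / segment_seconds))

def chain_long_video(image_path: str, prompt: str,
                     target_seconds: float, segment_seconds: float,
                     generate_i2v, extract_last_frame, concat_videos) -> str:
    """Chain I2V segments: each one is conditioned on the previous
    segment's last frame, then all segments are concatenated."""
    segments = []
    frame = image_path
    for _ in range(plan_segments(target_seconds, segment_seconds)):
        video = generate_i2v(frame, prompt, segment_seconds)
        segments.append(video)
        frame = extract_last_frame(video)   # conditions the next segment
    return concat_videos(segments)          # e.g. ffmpeg concat demuxer
```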
Add FluxKleinImagePipeline with bitsandbytes NF4 4-bit quantization for
the transformer, reducing VRAM from ~23GB to ~16GB peak while maintaining
identical speed and full LoRA compatibility. Pipeline uses CPU offload
for the T5-XXL text encoder and fresh VAE decode on CPU to avoid the
Windows/CUDA segfault with accelerate hooks.

Includes model download spec, pipeline handler integration, image
generation handler routing for flux-klein-9b model selection, and tests.
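For context, NF4 transformer loading along these lines typically goes through diffusers' bitsandbytes integration. A config sketch only, assuming that integration; the FluxKleinImagePipeline class and model path come from this PR, not diffusers, and the path below is a placeholder:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# Sketch: quantize the transformer to NF4, keep compute in bf16.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # normal-float 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "path/to/flux-klein-9b",             # placeholder model location
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
```

The text encoder and VAE stay unquantized, matching the CPU-offload approach described above.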
Add in-app LoRA browsing with CivitAI API search, download, and local
library management. Includes backend routes for LoRA catalog CRUD and
thumbnail serving, CivitAI API key settings, LoRA selection in generation
UI with trigger phrase support (prepend/append/off modes), and frontend
LoraBrowser component with search, download progress, and library views.

NF4 test script validates bitsandbytes 4-bit quantization with LoRA
support on FLUX Klein 9B. Confirmed: 16GB peak VRAM, ~110s total,
LoRAs work perfectly with quantized transformer.

Set PYTHONNOUSERSITE=1 to prevent system Python site-packages from
leaking into the bundled runtime, which can cause import crashes.

Let updates auto-install when the user naturally quits instead of
calling quitAndInstall which disrupts active work.

HuggingFace now uses xet protocol by default — patch both http_get and
xet_get for progress callbacks. Switch from speed_mbps (int) to
speed_bytes_per_sec (float) for better granularity, add EWMA smoothing
for speed display, and use Math.round() for float progress values.
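EWMA smoothing for the speed display amounts to a one-line recurrence; a sketch (class name and default alpha are assumptions):

```python
class EwmaSpeed:
    """Exponentially weighted moving average for download-speed display."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha          # weight of the newest sample
        self.value: float | None = None

    def update(self, speed_bytes_per_sec: float) -> float:
        if self.value is None:
            self.value = speed_bytes_per_sec   # seed with first sample
        else:
            self.value = (self.alpha * speed_bytes_per_sec
                          + (1 - self.alpha) * self.value)
        return self.value
```

Smaller alpha means a steadier readout that reacts more slowly to bursts.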
Spec for supporting GGUF, NF4, and FP8 checkpoint formats to enable
LTX 2.3 video generation on 24GB GPUs (RTX 3090, 4070 Ti Super).
Includes Model Guide UI, model scanner service, and pipeline extensions.
…odel support

10-task plan covering ModelScanner service, handler/routes, Models tab,
ModelGuideDialog, PipelinesHandler format routing, GGUF/NF4 pipeline
scaffolds, and README section.