feat: add quantized video model support (GGUF, NF4, FP8) #74
Open
Machine King (taskmasterpeace) wants to merge 44 commits into Lightricks:main
…cons
Replace LTX branding throughout the app with Director's Desktop. New colorful palette+clapperboard logo (SVG + generated PNG/ICO icons). Updated product name, window titles, loading text, about section, and electron-builder config.
Show which AI model produced each result with a badge overlay on VideoPlayer and ImageResult components. Tracks lastModel in generation state and maps model IDs to display names.
Swap out fal.ai API backend for Replicate with multi-model support. New ImageAPIClient protocol with ReplicateImageClientImpl supporting Z-Image Turbo and Nano Banana 2. Settings updated from fal_api_key to replicate_api_key with image_model selector. All 247 tests pass.
New VideoAPIClient protocol with ReplicateVideoClientImpl for Seedance 1.5 Pro cloud video generation. Job queue with submit/status/cancel routes for managing generation jobs. Includes full test coverage.
Design document covering 6-phase integration between Director's Desktop and Director's Palette (auth, gallery/library sync, generation upgrades, power tools, advanced features, testing). Phase 1 plan with 14 TDD tasks.
- palette_api_key setting with masked responses
- PaletteSyncClient protocol + HTTP implementation + fake test double
- /api/sync/status and /api/sync/credits routes
- SyncHandler wired into AppHandler composition root
- 5 new integration tests for sync behavior

- GenerateVideoRequest.lastFramePath for first/last frame video gen
- POST /api/enhance-prompt using Gemini to enhance rough prompts
- EnhancePromptHandler with cinematic prompt expansion
- 5 new tests covering both features

…aspect labels
- FrameSlot component with paste/drop/browse support
- First Frame and Last Frame slots in Playground for video generation
- Sparkle button to enhance prompts via Gemini API
- Image variations slider (1-12) in text-to-image settings
- Social media labels on aspect ratio presets (YouTube, TikTok, Instagram)
- 4:5 Instagram Post aspect ratio option for images
Extract the last frame of a generated video and use it as the first frame for the next generation. FastForward button in VideoPlayer controls. Clears prompt so user describes what happens next.
Director's Palette section in API Keys tab with API key input, connection status indicator, user email display, and live credits balance. Polls /api/sync/status and /api/sync/credits every 60s.
Injects a mock window.electronAPI when running in a plain browser (non-Electron). Gated by !window.electronAPI && location.protocol === 'http:' so it never activates in production Electron builds.
…Features
Phase 2 - Gallery + Library:
- Local gallery backend: scan outputs dir, pagination, type filtering, delete
- Library backend: Characters, Styles, References CRUD with JSON persistence
- Frontend Gallery view with filter tabs, model badges, lightbox preview
- Frontend Characters/Styles/References views with add/edit/delete modals
- New organized sidebar: Create, Edit, Library, Tools sections
Phase 4 - Power Tools:
- Wildcard parser with Cartesian expansion and random mode
- Prompt Library with search, tags, usage tracking
- Frontend Wildcards view with test/expand area
- Frontend Prompt Library view with search, sort, copy-to-clipboard
Phase 5 - Advanced Features:
- Receive-job endpoint for Palette→Desktop generation dispatch
- Contact Sheet generation (9 cinematic angles from reference)
- Style Guide Grid generation (9 diverse subjects in one style)
81 new backend tests, all 339 passing. TypeScript + Pyright clean.
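The wildcard parser's two modes (Cartesian expansion and random mode) can be illustrated with a minimal sketch. The inline `{a|b|c}` slot syntax and the function names `expand_all` and `expand_random` are assumptions made for this example; the commit does not show the syntax the parser actually accepts.

```python
import itertools
import random
import re

# Hypothetical inline syntax: "{a|b|c}" marks one wildcard slot.
_SLOT = re.compile(r"\{([^{}]*)\}")


def expand_all(template: str) -> list[str]:
    """Return every combination of slot options (Cartesian expansion)."""
    options = [m.group(1).split("|") for m in _SLOT.finditer(template)]
    results = []
    for combo in itertools.product(*options):
        picks = iter(combo)
        # Replace slots left to right with the options of this combination.
        results.append(_SLOT.sub(lambda _m: next(picks), template))
    return results


def expand_random(template: str, rng: random.Random) -> str:
    """Pick one option per slot at random (random mode)."""
    return _SLOT.sub(lambda m: rng.choice(m.group(1).split("|")), template)
```

A template with two slots of two options each yields four prompts, while a template without slots is returned unchanged as a single-item list.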
Previously, hasRunning checked ALL jobs in the queue, so stale/orphaned jobs from previous sessions would permanently block the UI in "generating" state. Now only the actively-submitted job determines isGenerating.
…d UI
- Extend QueueJob with batch_id, depends_on, auto_params, tags fields
- Add JobQueue helpers: jobs_for_batch, active_batch_ids, queued_jobs_for_slot
- Add batch API types: BatchSubmitRequest, SweepDefinition, PipelineDefinition, BatchReport
- Implement BatchHandler with list, sweep (cartesian product), and pipeline modes
- Add QueueWorker dependency checking, auto-param resolution, batch completion detection
- Wire i2v auto-prompt generation into dependent job resolution
- Add batch routes: submit-batch, batch status, cancel, retry-failed
- Include batch fields in QueueJobResponse and queue status endpoint
- Add batchSoundEnabled setting
- Add frontend batch types, API client, useBatch hook with polling
- Create BatchBuilderModal with List, Import (CSV/JSON), and Grid Sweep tabs
- Add Batch button to GenSpace prompt bar
- 37 new/updated backend tests, all 372 backend tests pass
- TypeScript and Pyright clean, Vite build succeeds

…d R2 storage
- FFN chunked feedforward reduces peak VRAM by up to 8x (setting: ffnChunkCount)
- TeaCache timestep-aware caching for 1.6-2.1x denoising speedup (setting: teaCacheThreshold)
- Aggressive VRAM deep_cleanup after every GPU job prevents post-heavy-load stalls
- R2/S3-compatible cloud storage upload for generated media (setting: autoUploadToR2)
- 382 tests passing including 10 new tests for optimizations
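As a rough illustration of the sweep mode and the worker's dependency check described in this commit, the sketch below expands sweep axes into one job per combination and selects the jobs that are ready to run. Only `batch_id` and `depends_on` come from the commit text; the `Job` class, the status strings, and the helper names `expand_sweep` and `runnable` are hypothetical stand-ins, not the project's actual types.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    id: str
    params: dict
    batch_id: Optional[str] = None
    depends_on: Optional[str] = None  # id of a job that must finish first
    status: str = "queued"  # assumed states: queued, running, completed, failed


def expand_sweep(batch_id: str, base: dict, axes: dict[str, list]) -> list[Job]:
    """Create one job per point in the Cartesian product of the sweep axes."""
    names = list(axes)
    jobs = []
    for i, values in enumerate(itertools.product(*(axes[n] for n in names))):
        params = {**base, **dict(zip(names, values))}
        jobs.append(Job(id=f"{batch_id}-{i}", params=params, batch_id=batch_id))
    return jobs


def runnable(jobs: list[Job]) -> list[Job]:
    """Return queued jobs whose dependency, if any, has completed."""
    by_id = {j.id: j for j in jobs}
    ready = []
    for job in jobs:
        if job.status != "queued":
            continue
        if job.depends_on is None:
            ready.append(job)
            continue
        dep = by_id.get(job.depends_on)
        if dep is not None and dep.status == "completed":
            ready.append(job)
    return ready
```

In this sketch a job whose dependency is missing from the queue stays blocked; whether the real worker fails such a job instead is not stated in the commit.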
- Credit system: balance display, per-generation cost on buttons, auto-deduction after API jobs
- Output naming: dd_{model}_{prompt_slug}_{timestamp}.{ext} across all handlers
- Gallery parser supports both new dd_ and legacy filename formats
- Palette credits fallback: uses /credits/check when /credits returns 500
- Hardcoded pricing from live Palette API as fallback when credits endpoint unavailable
- Seedance time estimates (about 60s for a 5s clip and 120s for a 10s clip, both at 720p)
- Palette API spec, handoff docs, and integration plans
- README updated with credits, Seedance, Replicate API docs
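The output naming scheme and the gallery parser's fallback to legacy names in the list above could look roughly like the following. The timestamp layout (`YYYYMMDD_HHMMSS`), the slug rules, and the function names are assumptions for illustration; the commit only names the four fields.

```python
import re
from datetime import datetime

# Assumed timestamp layout (YYYYMMDD_HHMMSS); the commit only names the fields.
_NEW_FORMAT = re.compile(
    r"^dd_(?P<model>[a-z0-9-]+)_(?P<slug>[a-z0-9-]+)"
    r"_(?P<ts>\d{8}_\d{6})\.(?P<ext>\w+)$"
)


def slugify(text: str, max_len: int = 40) -> str:
    """Lowercase, collapse non-alphanumerics to '-', and cap the length."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def output_name(model: str, prompt: str, ext: str, now: datetime) -> str:
    """Build dd_{model}_{prompt_slug}_{timestamp}.{ext}."""
    return f"dd_{slugify(model)}_{slugify(prompt)}_{now:%Y%m%d_%H%M%S}.{ext}"


def parse_name(filename: str):
    """Return the parsed fields, or None so the caller can try legacy parsing."""
    m = _NEW_FORMAT.match(filename)
    return m.groupdict() if m else None
```

Because the slugs never contain underscores, the underscore-separated fields can be split back out unambiguously.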
LoRAs are loaded at pipeline creation time via DistilledPipeline. Pipeline is recreated when the requested LoRA changes, and reused when it matches. Frontend sends loraPath/loraWeight params for video jobs through the queue.
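The reuse-or-recreate behavior described here can be sketched as a small cache keyed on the requested LoRA. `PipelineCache` and its `build` callback are hypothetical names, and treating the weight as part of the key is an assumption; the commit only says the pipeline is recreated when the requested LoRA changes.

```python
from typing import Callable, Optional, Tuple

LoraKey = Optional[Tuple[str, float]]


class PipelineCache:
    """Keep one pipeline and rebuild it only when the LoRA request changes."""

    def __init__(self, build: Callable[[LoraKey], object]):
        self._build = build  # stand-in for the expensive pipeline constructor
        self._key: LoraKey = None
        self._pipeline: Optional[object] = None

    def get(self, lora_path: Optional[str] = None, lora_weight: float = 1.0):
        # Assumption: a different weight also counts as a changed LoRA.
        key: LoraKey = (lora_path, lora_weight) if lora_path else None
        if self._pipeline is None or key != self._key:
            self._pipeline = self._build(key)
            self._key = key
        return self._pipeline
```

A real implementation would presumably also release the previous pipeline's GPU memory before building the next one.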
The distilled pipeline doesn't support last-frame conditioning (frame_idx=num_frames-1), causing tensor shape mismatches. Using frame_idx=0 instead works identically — the new video continues from the provided frame.
New endpoint POST /api/generate/long and queue job type "long_video". Takes an image + prompt + target duration, automatically chains:
1. Initial I2V segment from source image
2. Extract last frame, generate next segment conditioned on it
3. Repeat until target duration reached
4. Concatenate all segments with ffmpeg into single video
Also available via queue submit with type="long_video".
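The chaining steps above can be sketched as a loop over injected callables. The function name `chain_long_video`, the fixed `segment_seconds` parameter, and the three callback signatures are assumptions for illustration; the real handler's interfaces are not shown in the commit.

```python
import math
from typing import Callable, List


def chain_long_video(
    source_image: str,
    prompt: str,
    target_seconds: float,
    segment_seconds: float,
    generate_i2v: Callable[[str, str], str],   # (image, prompt) -> video path
    extract_last_frame: Callable[[str], str],  # video path -> image path
    concat: Callable[[List[str]], str],        # segment paths -> final path
) -> str:
    """Generate enough segments to reach the target, then join them."""
    if target_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(target_seconds / segment_seconds)
    segments: List[str] = []
    image = source_image
    for i in range(count):
        video = generate_i2v(image, prompt)
        segments.append(video)
        if i < count - 1:
            # The last frame of this segment conditions the next one.
            image = extract_last_frame(video)
    return concat(segments)
```

Passing the generation, frame extraction, and concatenation steps in as callables keeps the loop testable without a GPU or ffmpeg.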
Add FluxKleinImagePipeline with bitsandbytes NF4 4-bit quantization for the transformer, reducing VRAM from ~23GB to ~16GB peak while maintaining identical speed and full LoRA compatibility. Pipeline uses CPU offload for the T5-XXL text encoder and fresh VAE decode on CPU to avoid the Windows/CUDA segfault with accelerate hooks. Includes model download spec, pipeline handler integration, image generation handler routing for flux-klein-9b model selection, and tests.
Add in-app LoRA browsing with CivitAI API search, download, and local library management. Includes backend routes for LoRA catalog CRUD and thumbnail serving, CivitAI API key settings, LoRA selection in generation UI with trigger phrase support (prepend/append/off modes), and frontend LoraBrowser component with search, download progress, and library views.
NF4 test script validates bitsandbytes 4-bit quantization with LoRA support on FLUX Klein 9B. Confirmed: 16GB peak VRAM, ~110s total, LoRAs work perfectly with quantized transformer.
Set PYTHONNOUSERSITE=1 to prevent the user's site-packages directory (for example, packages installed with pip install --user for a system Python) from leaking into the bundled runtime, which can cause import crashes.
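A minimal sketch of the same idea from a Python launcher: copy the environment, set `PYTHONNOUSERSITE`, and pass it to the child interpreter. The helper names `isolated_env` and `child_ignores_user_site` are hypothetical, and the app itself may set the variable from its Electron process rather than from Python.

```python
import os
import subprocess
import sys


def isolated_env(base=None):
    """Return a copy of the environment with user site-packages disabled."""
    env = dict(os.environ if base is None else base)
    env["PYTHONNOUSERSITE"] = "1"
    return env


def child_ignores_user_site(python=sys.executable):
    """Ask a child interpreter whether the no-user-site flag took effect."""
    result = subprocess.run(
        [python, "-c", "import sys; print(sys.flags.no_user_site)"],
        env=isolated_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "1"
```

The variable must be set before the interpreter starts; changing `os.environ` inside an already running process does not remove paths that were added at startup.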
Let updates auto-install when the user naturally quits instead of calling quitAndInstall which disrupts active work.
HuggingFace now uses xet protocol by default — patch both http_get and xet_get for progress callbacks. Switch from speed_mbps (int) to speed_bytes_per_sec (float) for better granularity, add EWMA smoothing for speed display, and use Math.round() for float progress values.
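The EWMA smoothing mentioned here can be sketched as follows. The class name `SpeedSmoother` and the default `alpha` of 0.3 are assumptions; the commit does not state which smoothing factor is used or where the smoothing runs.

```python
class SpeedSmoother:
    """Exponentially weighted moving average of download speed."""

    def __init__(self, alpha: float = 0.3):  # alpha is an assumed value
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, speed_bytes_per_sec: float) -> float:
        """Fold a new sample into the average and return the smoothed speed."""
        if self.value is None:
            # Seed with the first sample so the display does not start at 0.
            self.value = float(speed_bytes_per_sec)
        else:
            self.value = (
                self.alpha * speed_bytes_per_sec + (1 - self.alpha) * self.value
            )
        return self.value
```

A larger `alpha` tracks bursts more closely, while a smaller one gives a steadier speed readout.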
Spec for supporting GGUF, NF4, and FP8 checkpoint formats to enable LTX 2.3 video generation on GPUs with 24GB of VRAM or less (RTX 3090 at 24GB, RTX 4070 Ti Super at 16GB). Includes Model Guide UI, model scanner service, and pipeline extensions.
…odel support
10-task plan covering ModelScanner service, handler/routes, Models tab, ModelGuideDialog, PipelinesHandler format routing, GGUF/NF4 pipeline scaffolds, and README section.
…cept to scanner loop
Summary
Backend changes
- gguf>=0.10.0 dependency added
- ModelScanner Protocol + Impl + Fake (services pattern)
- GET /api/models/video/scan, POST /api/models/video/select, GET /api/models/video/guide
- GGUFFastVideoPipeline and NF4FastVideoPipeline scaffolds
Frontend changes
- customVideoModelPath, selectedVideoModel
Docs
Test plan
- pnpm typecheck passes (pyright + tsc)
- pnpm backend:test passes (463 tests)
🤖 Generated with Claude Code