README.md is the product overview. This document is the engineering reference: staged workflow, checkpoint contracts, model routing, agent coordination, and runtime surfaces.
ExplainFlow is an agent-coordinated production studio. It extracts a grounded signal from source material, locks a script pack before generation, streams interleaved multimodal scenes with QA gates, and keeps proof linked to every claim in the output.
The system exposes two product surfaces:
- Advanced Studio — staged workflow with checkpoint control, planner review, proof-linked scene generation, and selective scene regeneration
- Quick — single-pass artifact generation with derived Proof Reel and MP4 export
For the full data lifecycle (source → signal → script pack → scenes → proof → export), see data-pipeline.md.
```mermaid
flowchart TD
subgraph USER["User"]
QUICK["Quick Mode"]
ADVANCED["Advanced Studio"]
end
subgraph FRONTEND["Next.js Frontend — Google Cloud Run"]
UI["Studio UI\nSource Intake · Render Profile\nScene Review · Proof Viewer · Export"]
SSE_CLIENT["SSE Stream Client\nReal-Time Scene Rendering"]
end
subgraph BACKEND["FastAPI Backend — Google Cloud Run"]
ROUTES["REST API + SSE Endpoints"]
COORD["AgentCoordinator\nCP1 Signal → CP2 Artifacts → CP3 Render\n→ CP4 Script → CP5 Stream → CP6 Bundle"]
STORY["GeminiStoryAgent\nExtraction · Planning · Scene Streaming"]
CHAT["WorkflowChatAgent\nCheckpoint-Aware Co-Direction"]
QA["Planner QA + Scene QA\nSelf-Healing · Auto-Retry"]
end
subgraph GEMINI["Gemini Models — Google GenAI SDK"]
PRO["gemini-3.1-pro-preview\nSignal Extraction (structural + creative)\nScript Pack Planning · Constrained Replans\nWorkflow Chat"]
LITE["gemini-3.1-flash-lite-preview\nTranscript Normalization\nPlanner Precompute (salience, forward-pull)\nQuick Artifact Generation"]
IMG["gemini-3-pro-image-preview\nInterleaved Text + Image\nScene Generation · Scene Regen\nQuick Image Generation"]
FLASH["gemini-3-flash-preview\nAsset-Backed Source Recovery"]
end
GCS[("Google Cloud Storage\nSource Assets · Scene Media\nProof Media · Final Outputs")]
LOCAL[("Local Static Assets\nAudio · Video · Bundles")]
QUICK & ADVANCED --> UI
UI -->|"REST"| ROUTES
ROUTES --> COORD
ROUTES --> CHAT
CHAT --> COORD
CHAT --> STORY
COORD --> STORY
STORY --> QA
QA -->|"Auto-Retry on QA Fail"| STORY
STORY --> PRO
STORY --> LITE
STORY --> IMG
STORY --> FLASH
CHAT -->|"GenAI SDK"| PRO
STORY -->|"Images + Source"| GCS
STORY -->|"Audio + Video"| LOCAL
ROUTES -->|"SSE Stream"| SSE_CLIENT
SSE_CLIENT --> UI
style USER fill:#f8fafc,stroke:#94a3b8,color:#1e293b
style FRONTEND fill:#dbeafe,stroke:#3b82f6,color:#1e293b
style BACKEND fill:#fef3c7,stroke:#f59e0b,color:#1e293b
style GEMINI fill:#fce7f3,stroke:#ec4899,color:#1e293b
style GCS fill:#d1fae5,stroke:#10b981,color:#1e293b
style LOCAL fill:#e2e8f0,stroke:#64748b,color:#1e293b
```
If you are new to the repo, read this document in this order:
1. System Architecture for the high-level component map.
2. End-to-End Flow for the full staged lifecycle.
3. Workflow Checkpoints, Planning Layers, and Scene Generation and QA for the architectural choices that make the workflow inspectable and repairable.
4. Codebase Shape and API Surface last, as the implementation appendix.
This sequence describes the checkpointed Advanced Studio path. Quick follows a lighter derived flow (artifact → Proof Reel → MP4) and does not reuse every workflow checkpoint.
```mermaid
sequenceDiagram
autonumber
participant U as User
participant W as "Advanced Studio (UI)"
participant WF as AgentCoordinator
participant API as GeminiStoryAgent
participant LLM as "Gemini 3.1 Pro / 3 Pro Image"
rect rgb(240, 248, 255)
note right of U: "PHASE 1: Source Ingestion and Extraction"
U->>W: Paste source text or upload assets
opt uploaded assets provided
W->>API: POST /source-assets/upload
API-->>W: source_manifest asset records
end
W->>WF: POST /workflow/start
W->>WF: POST /workflow/{id}/extract-signal
WF->>API: extract_signal(...)
API->>LLM: Signal extraction (prompt + uploaded Gemini Files)
LLM-->>API: content_signal
API-->>WF: extraction result
WF-->>W: workflow + content_signal
note over WF: "CP1_SIGNAL_READY"
end
rect rgb(245, 255, 250)
note right of U: "PHASE 2: Artifact-Aware Planning and QA"
W->>WF: lock artifacts + render profile
note over WF: "CP2_ARTIFACTS_LOCKED and CP3_RENDER_LOCKED"
W->>WF: POST /workflow/{id}/generate-script-pack
WF->>API: build ScriptPackRequest
API->>LLM: salience pass + forward-pull pass + planner
LLM-->>API: outline
API->>API: compile script_pack + repair/replan + multimodal enrichment
API-->>W: script_pack_ready
note over WF: "CP4_SCRIPT_LOCKED"
end
rect rgb(255, 250, 240)
note right of U: "PHASE 3: Live Multimodal Streaming and QA"
W->>WF: POST /workflow/{id}/generate-stream
WF->>API: build AdvancedStreamRequest
API-->>W: scene_queue_ready
loop Scene by scene
API-->>W: scene_start
API-->>W: source_media_ready (0..N proof assets)
API->>LLM: generate scene text + image
LLM-->>API: text chunks + inline image bytes
API-->>W: story_text_delta / diagram_ready
API->>API: optional audio generation
API-->>W: audio_ready
API->>API: scene QA evaluation
API-->>W: qa_status
alt first attempt fails hard QA
API-->>W: qa_retry + scene_retry_reset
note over API,LLM: "Auto-correction loop"
API->>LLM: constrained rerender
LLM-->>API: updated text + image
API-->>W: story_text_delta / diagram_ready / audio_ready
API->>API: scene QA
API-->>W: qa_status
end
API-->>W: scene_done
end
end
API-->>W: final_bundle_ready
note over WF: "CP6_BUNDLE_FINALIZED"
```
```mermaid
flowchart LR
subgraph INTAKE["1. Source Intake"]
A["input_text / source_text"]
B["source_manifest.assets[]"]
C["normalized_source_text + source_text_origin"]
D["Text, uploads, or transcript-backed video context"]
A --> D
B -->|"uri, modality, transcript/ocr metadata"| D
C --> D
end
subgraph SIGNAL["2. Extracted Signal"]
E["content_signal"]
F["thesis"]
G["key_claims[]"]
H["evidence_snippets[]\nbbox_norm / start_ms / page_index"]
D -->|"Gemini 3.1 Pro"| E
E --> F
E --> G
G --> H
end
subgraph PACK["3. Script Pack"]
I["ScriptPackScene"]
J["claim_refs[]"]
K["evidence_refs[]"]
L["source_media[]"]
M["render_strategy"]
E -->|"Planner + QA"| I
I --> J
I --> K
I --> L
I --> M
end
subgraph STREAM["4. SSE Stream Events"]
N["SSE client"]
O["scene_start"]
P["story_text_delta"]
Q["source_media_ready"]
R["qa_status"]
I -->|"AgentCoordinator"| N
N -.-> O
N -.-> P
N -.-> Q
N -.-> R
end
style E fill:#f9f2f4,stroke:#d9534f
style I fill:#d9edf7,stroke:#5bc0de
style N fill:#dff0d8,stroke:#5cb85c
```
The extraction layer now supports three intake shapes:
- text-first input: `input_text` or `source_text`
- asset-backed input: `source_manifest.assets[]`
- transcript-backed source video input:
  - transcript in `source_text`
  - source identity in `source_manifest.assets[]`
  - normalization and provenance in `normalized_source_text` and `source_text_origin`
Each `source_manifest.assets[]` item can carry:
- `asset_id`
- `modality`
- `uri`
- optional transcript, OCR, duration, page, and metadata fields
In practice, that means ExplainFlow can begin extraction from:
- text-only
- uploaded assets only
- text plus uploaded assets
- uploaded video plus transcript/captions
- YouTube URL plus transcript/captions in Quick, represented internally as transcript-backed video context rather than a dedicated `youtube_url` request field
`content_signal` remains the planner backbone. Important fields:
- `thesis`
- `key_claims`
- `concepts`
- `visual_candidates`
- `narrative_beats`
For multimodal inputs, each `key_claim` can now carry structured `evidence_snippets[]`, which are normalized into evidence refs with:
- `evidence_id`
- `asset_id`
- `modality`
- `start_ms` / `end_ms`
- `page_index`
- `bbox_norm`
- `quote_text` / `transcript_text`
- `visual_context`
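As a rough illustration of how those fields hang together, here is a minimal sketch of snippet normalization. The dataclass shape and the `normalize_snippet` helper are illustrative assumptions, not the actual backend models in `api/app/schemas/requests.py`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvidenceRef:
    # Field names mirror the documented evidence-ref shape; the exact
    # backend model may differ.
    evidence_id: str
    asset_id: str
    modality: str                      # e.g. "audio" | "image" | "pdf"
    start_ms: Optional[int] = None     # audio/video time range
    end_ms: Optional[int] = None
    page_index: Optional[int] = None   # PDF page
    bbox_norm: Optional[tuple] = None  # normalized page region
    quote_text: Optional[str] = None
    visual_context: Optional[str] = None

def normalize_snippet(claim_idx: int, snip_idx: int, snippet: dict) -> EvidenceRef:
    """Turn one raw evidence_snippet dict into a stable, addressable ref."""
    return EvidenceRef(
        evidence_id=f"ev-{claim_idx}-{snip_idx}",
        asset_id=snippet["asset_id"],
        modality=snippet["modality"],
        start_ms=snippet.get("start_ms"),
        end_ms=snippet.get("end_ms"),
        page_index=snippet.get("page_index"),
        bbox_norm=tuple(snippet["bbox_norm"]) if snippet.get("bbox_norm") else None,
        quote_text=snippet.get("quote_text") or snippet.get("transcript_text"),
        visual_context=snippet.get("visual_context"),
    )
```

The stable `evidence_id` is what lets later layers (scenes, SSE events, the proof viewer) point back at the same snippet without re-deriving it.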
`ScriptPack` and `ScriptPackScene` are still the shared planner contract. Important scene fields now include:
- `claim_refs`
- `evidence_refs`
- `source_media[]`
- `render_strategy`
- artifact-specific fields such as `scene_mode`, `layout_template`, `focal_subject`, `modules`, `visual_hierarchy`
ExplainFlow still uses JSON contracts throughout. Multimodal traceability is represented as structured fields, not XML tags.
The coordinator uses six named checkpoints:
- `CP1_SIGNAL_READY`
- `CP2_ARTIFACTS_LOCKED`
- `CP3_RENDER_LOCKED`
- `CP4_SCRIPT_LOCKED`
- `CP5_STREAM_COMPLETE`
- `CP6_BUNDLE_FINALIZED`
Key properties of the workflow layer:
- source changes invalidate downstream checkpoints
- artifact and render changes invalidate only the necessary later checkpoints
- fidelity-only final-bundle upgrades preserve the locked script pack
- workflow chat can answer questions without forcing checkpoint regressions
AgentCoordinator now also persists `source_manifest` so script-pack generation and stream generation can reconstruct the same multimodal source context later.
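The invalidation rule can be sketched as an ordered cascade: a change at stage N clears every checkpoint at or after N. This is a deliberately simplified illustration (the real coordinator is more selective, e.g. fidelity-only upgrades preserve `CP4_SCRIPT_LOCKED`), and the function name is hypothetical:

```python
# Ordered checkpoint names from the workflow layer.
CHECKPOINTS = [
    "CP1_SIGNAL_READY",
    "CP2_ARTIFACTS_LOCKED",
    "CP3_RENDER_LOCKED",
    "CP4_SCRIPT_LOCKED",
    "CP5_STREAM_COMPLETE",
    "CP6_BUNDLE_FINALIZED",
]

def invalidate_from(reached: set, changed_stage: str) -> set:
    """Drop every reached checkpoint at or downstream of the changed stage."""
    cutoff = CHECKPOINTS.index(changed_stage)
    return {cp for cp in reached if CHECKPOINTS.index(cp) < cutoff}
```

Under this rule a source change (stage 1) wipes everything, while an artifact change (stage 2) leaves the extracted signal intact, which matches the properties listed above.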
The Script Pack checkpoint is the architectural hinge of ExplainFlow.
ExplainFlow could stream directly from:
- extracted signal
- locked artifact/render profile
But that would make the system faster only on the happy path. It would also make it much weaker operationally.
Before scene rendering starts, the Script Pack checkpoint gives the system one stable place to validate:
- scene count and pacing budget
- claim coverage across the full story
- per-scene acceptance checks
- artifact/layout intent
- proof affordances such as `claim_refs`, `evidence_refs`, and `source_media`
- planner QA and repair/replan decisions
That checkpoint improves the system in practice:
- Recovery: if the network drops or the browser refreshes, the workflow can resume from a locked plan instead of rerunning extraction.
- Control: the user can inspect or redirect the plan before paying for scene text, image, and audio generation.
- Quality: planner repair happens once at the plan level instead of trying to patch scene drift after rendering has already begun.
- Traceability: proof metadata can be attached to planned scenes before the stream, so evidence viewing is part of the generation flow rather than an afterthought.
So the Script Pack gate is not bureaucracy. It is the boundary that makes staged recovery, proof-aware streaming, and consistent interleaved generation feasible.
The planner resolves one artifact policy before scene budgeting and enrichment routing.
| Artifact Type | Planning Mode | Default Scene Count | Salience Pass | Forward-Pull Pass |
|---|---|---|---|---|
| `storyboard_grid` | sequential | 4 | FULL | FULL |
| `comparison_one_pager` | static | 1 | FULL | LITE |
| `slide_thumbnail` | static | 1 | LITE | FULL |
| `technical_infographic` | static | 1 | FULL | OFF |
| `process_diagram` | static | 1 | FULL | OFF |
Important semantic note:
- the legacy artifact key `comparison_one_pager` now behaves as a generic modular one-pager board, not a strict left-right comparison layout
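Assuming the table above is the whole policy, resolution reduces to a lookup before scene budgeting. The dict shape and `resolve_policy` name are illustrative, not the backend's actual API:

```python
ARTIFACT_POLICIES = {
    # (planning_mode, default_scene_count, salience_pass, forward_pull_pass)
    # Values copied from the artifact policy table; the dict shape is a sketch.
    "storyboard_grid":       ("sequential", 4, "FULL", "FULL"),
    "comparison_one_pager":  ("static",     1, "FULL", "LITE"),
    "slide_thumbnail":       ("static",     1, "LITE", "FULL"),
    "technical_infographic": ("static",     1, "FULL", "OFF"),
    "process_diagram":       ("static",     1, "FULL", "OFF"),
}

def resolve_policy(artifact_type: str) -> dict:
    """Resolve one artifact policy before scene budgeting and enrichment routing."""
    mode, scenes, salience, forward_pull = ARTIFACT_POLICIES[artifact_type]
    return {
        "planning_mode": mode,
        "scene_budget": scenes,
        "salience_pass": salience,
        "forward_pull_pass": forward_pull,
    }
```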
Planner execution currently includes:
- artifact policy resolution
- scene budget derivation
- optional salience pass
- optional forward-pull pass
- outline generation
- deterministic repair
- one constrained replan if hard issues survive repair
- planner QA summary emission
Signal extraction is good at recovering what the source says. Salience is there to decide what matters most for explanation.
The salience pass highlights:
- central ideas rather than side details
- stakes and consequences rather than flat facts
- surprise or contrast that deserves scene emphasis
- causal leverage: what changes the rest of the story if removed
- transformation: what moves the audience from old frame to new frame
Without that pass, the planner tends to overvalue whatever is merely explicit or repeated. With it, ExplainFlow can budget scenes around the claims that carry the most explanatory weight.
Forward-pull is the retention layer.
It uses a simple narrative lens:
- Bait: what creates initial curiosity
- Hook: what makes the viewer stay for the next beat
- Threat: what raises tension, risk, or unresolved stakes
- Reward: what payoff or clarity the viewer gets for continuing
- Payload: what durable insight should land by the end
This is not meant to turn ExplainFlow into clickbait. It exists because strong explainers are not only correct; they pull the audience forward scene by scene.
In practice, this layer helps:
- choose a stronger opener
- avoid dead middle scenes
- place proof after curiosity instead of before it
- preserve a sense of payoff across the storyboard
The planner is allowed to be ambitious, but not sloppy.
Before scene generation begins, ExplainFlow checks the plan for:
- missing claim coverage
- weak scene progression
- bad artifact/layout fit
- missing acceptance checks
- proof or traceability gaps
If the issues are mechanical, ExplainFlow repairs them deterministically. If hard issues survive, it triggers one constrained replan rather than wasting a full generation run on a weak plan.
This is cheaper and more reliable than trying to fix every problem later at the scene level.
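The repair-then-single-replan discipline can be sketched as a small control loop. `check`, `repair`, and `replan` are stand-ins for the real planner helpers, and the issue/severity shape is an assumption:

```python
def finalize_plan(outline, check, repair, replan):
    """Deterministic repair first; at most one constrained replan if hard
    issues survive. `check` returns a list of issue dicts with a
    "severity" of "hard" or "soft" (names are illustrative)."""
    issues = check(outline)
    if any(i["severity"] == "hard" for i in issues):
        outline = repair(outline, issues)   # deterministic mechanical fixes
        issues = check(outline)
    if any(i["severity"] == "hard" for i in issues):
        outline = replan(outline, issues)   # single constrained replan, no loop
    return outline
```

The key property is the cap: there is exactly one replan attempt, so a weak plan can never trigger an unbounded regeneration spiral before streaming starts.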
Quick is intentionally separate from the staged Advanced path.
Its current flow is:
- build a four-block artifact
- deterministically derive a Proof Reel from those blocks
- optionally derive an MP4 from the Proof Reel
This matters because Quick is optimized for:
- lower latency
- deterministic overrides
- fast demoability
- reuse of the same proof model without a second heavy planning pass
The derived layers currently behave like this:
- Artifact
- HTML-first explanation blocks with claim refs, evidence refs, and optional source media
- Proof Reel
- one ordered segment per Quick block
- chooses a cited source clip when available, otherwise the generated block image
- MP4
- composes the Proof Reel into a hackathon-grade video with voiceover, Ken Burns motion, and optional local proof intercuts
Quick intentionally avoids the full checkpointed workflow unless a future product version justifies that extra complexity.
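A minimal sketch of the deterministic derivation step (one ordered segment per block, preferring a cited source clip). Field names here are illustrative rather than the exact Quick schema:

```python
def derive_proof_reel(blocks: list) -> list:
    """One ordered Proof Reel segment per Quick block: prefer a cited
    source clip when available, otherwise fall back to the generated
    block image. Deterministic: no model call involved."""
    segments = []
    for idx, block in enumerate(blocks):
        source_media = block.get("source_media") or []
        has_source = bool(source_media)
        segments.append({
            "order": idx,
            "media_url": source_media[0]["url"] if has_source else block["image_url"],
            "kind": "source_clip" if has_source else "generated_image",
            "claim_refs": block.get("claim_refs", []),
        })
    return segments
```

Because the derivation is pure and deterministic, regenerating one Quick block and re-deriving the reel only changes that block's segment.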
ExplainFlow's staged workflow has six checkpoints, multiple locking stages, and conditional gates. The chat agent exists so users never have to track that state themselves.
It does two things:
Explains the workflow in context. It knows which checkpoint the workflow is at, what each stage means, and what is ready or blocked. Ask "What should I do next?" and it reads the checkpoint state and gives you the specific next action — not a generic tooltip.
Proposes safe, state-aware actions. It can propose extraction, profile lock, script-pack generation, or stream launch, and stage those actions behind explicit confirmation. If prerequisites are missing, it tells you exactly which checkpoints are blocked and why.
This means questions like:
- "What is the difference between signal and script pack?"
- "What should I do next?"
- "Why do I need to confirm signal?"
- "Can I continue from here without regenerating everything?"
get answered from live workflow state, not from a static help page.
Architecturally, the chat agent is a checkpoint-aware workflow agent backed by Gemini 3.1 Pro. It receives the full checkpoint state, chooses from a constrained action set, and returns structured responses with UI directives — which panel to open, which action to surface, whether confirmation is needed. It is wired directly to the AgentCoordinator and GeminiStoryAgent, not layered on top of them.
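The constrained-action contract can be illustrated with a small guard. The action names and response keys here are assumptions for illustration, not the agent's real schema:

```python
# Hypothetical constrained action set; the real agent's set may differ.
CHAT_ACTIONS = {"extract_signal", "lock_profile", "generate_script_pack", "launch_stream"}

def validate_chat_response(resp: dict) -> dict:
    """Reject any proposed action outside the constrained set, and force
    explicit confirmation for every state-changing action."""
    action = resp.get("proposed_action")
    if action is not None and action not in CHAT_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    if action is not None:
        resp = dict(resp, requires_confirmation=True)
    return resp
```

Keeping the model's output inside a validated action set is what makes the "propose, then confirm" flow safe to wire directly into the AgentCoordinator.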
Scene generation is still interleaved:
- narration text streams incrementally
- image output arrives via inline image bytes
- audio is generated after text when the artifact type uses voiceover
QA runs before scene finalization and scores:
- narration length by artifact/layout
- focus adherence
- `must_include` / `must_avoid`
- continuity hints
The stream now surfaces:
- `qa_status`
- `qa_retry`
- `scene_retry_reset`
- planner QA notes at script-pack time
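These events arrive over standard SSE framing (`event:` / `data:` line pairs separated by blank lines). A minimal parsing sketch, purely to show the wire shape; production clients should use a real SSE library:

```python
import json

def parse_sse(raw: str) -> list:
    """Parse a raw SSE byte-stream snapshot into (event_name, payload)
    pairs. Assumes each frame is an `event:` line plus `data:` lines,
    terminated by a blank line, with JSON payloads."""
    events = []
    name, data_lines = None, []
    for line in raw.splitlines() + [""]:   # trailing "" flushes the last frame
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and name:
            events.append((name, json.loads("\n".join(data_lines))))
            name, data_lines = None, []
    return events
```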
The new multimodal layer does not replace the planner. It enriches the planned scene graph after planning and before streaming.
Implemented now:
- `POST /source-assets/upload` for image, audio, and PDF assets
- Advanced Studio asset picker and manifest wiring
- Gemini Files upload during extraction for local uploaded assets
- audio proof clips via `start_ms` / `end_ms`
- image/PDF proof crops via `bbox_norm`
- source-manifest persistence through workflow state
- scene-level `evidence_refs`
- scene-level `source_media`
- clickable proof viewer in Advanced Studio
- new `source_media_ready` SSE event
- Quick Proof Reel derived from artifact blocks
- Quick MP4 export derived from Proof Reel segments
Not implemented yet:
- real video clip extraction/compositing
- automatic OCR/transcript segmentation beyond Gemini’s own file understanding
- source-media-first render strategies that skip generated visuals entirely
- proof-linked MP4 export
After a script pack is generated, GeminiStoryAgent deterministically enriches scenes:
- maps `claim_refs` to structured evidence
- adds `evidence_refs`
- adds `source_media` entries when the referenced asset exists in `source_manifest`
- upgrades `render_strategy` from `generated` to `hybrid` when proof media is attached
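A condensed sketch of that enrichment pass, with illustrative names; the real logic lives in `story_agent_source_media.py` and is richer than this:

```python
def enrich_scene(scene: dict, evidence_index: dict, manifest_assets: set) -> dict:
    """Deterministic post-plan enrichment: resolve claim_refs to evidence
    refs, attach source_media only for assets present in source_manifest,
    and upgrade render_strategy when proof media actually attaches."""
    refs = [ev for c in scene.get("claim_refs", [])
            for ev in evidence_index.get(c, [])]
    media = [{"asset_id": ev["asset_id"]} for ev in refs
             if ev["asset_id"] in manifest_assets]
    enriched = dict(scene, evidence_refs=refs, source_media=media)
    if media and enriched.get("render_strategy") == "generated":
        enriched["render_strategy"] = "hybrid"
    return enriched
```

Note the guard: a scene with claims but no resolvable manifest asset stays `generated`, so the upgrade to `hybrid` always implies real attached proof.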
The first slice currently favors hybrid rendering:
- generated scene visuals still exist
- proof media is surfaced alongside them
- the proof viewer lets the user inspect the exact backing evidence
- extraction uploads local source assets to Gemini Files transiently, while ExplainFlow keeps its own local copy for proof playback and exports
- `script_pack_ready`: Returns the compiled script pack and optional `planner_qa_summary`.
- `scene_queue_ready`: Returns queued scene metadata including `claim_refs`, `evidence_refs`, `render_strategy`, and `source_media_count`.
- `scene_start`: Returns the active scene, its trace payload, `claim_refs`, `evidence_refs`, `render_strategy`, and raw `source_media` refs.
- `story_text_delta`: Incremental narration text.
- `diagram_ready`: Generated scene visual URL.
- `audio_ready`: Generated voiceover URL.
- `qa_status`: QA verdict with score, reasons, attempt, and word count.
- `qa_retry`: Signals a retry after the first hard QA failure.
- `scene_retry_reset`: Client reset marker before replaying a retried scene.
- `scene_done`: Terminal per-scene event.
- `final_bundle_ready`: Run completion with bundle URL and claim/evidence traceability.
- `error`: Fatal stream error.
- `source_media_ready`: Returns a resolved proof asset for the scene: `asset_id`, `modality`, `usage`, `url`, `original_url`, `start_ms` / `end_ms`, `page_index`, `bbox_norm`, `claim_refs`, `evidence_refs`, `label`, `quote_text`, `visual_context`, `speaker`.
For image/PDF proof regions, the backend can emit a cropped proof asset instead of the full original page.
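The crop math for `bbox_norm` regions is straightforward; this sketch assumes an `(x, y, w, h)` ordering in normalized 0..1 coordinates, which should be confirmed against the actual schema before use:

```python
def bbox_to_pixels(bbox_norm, width: int, height: int):
    """Convert a normalized (x, y, w, h) proof region into integer pixel
    bounds (left, top, right, bottom) suitable for an image-library crop."""
    x, y, w, h = bbox_norm
    left, top = round(x * width), round(y * height)
    right, bottom = round((x + w) * width), round((y + h) * height)
    # Clamp so rounding never escapes the rendered page.
    return (max(0, left), max(0, top), min(width, right), min(height, bottom))
```

The resulting tuple is in the (left, top, right, bottom) order that, for example, PIL's `Image.crop` expects.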
- `web/src/app/advanced/page.tsx`: Advanced Studio composition root that wires the staged workflow, assistant surface, scene review, and export UI.
- `web/src/hooks/useAdvancedWorkflowActions.ts`: Workflow mutations for extraction, profile locking, script-pack generation, and stage recovery.
- `web/src/hooks/useAdvancedGenerationStream.ts`: Advanced SSE stream orchestration, scene updates, QA events, and final-bundle state.
- `web/src/hooks/useAdvancedWorkflowSession.ts`: Workflow snapshot recovery, checkpoint hydration, and persisted session reload.
- `web/src/components/AdvancedSourcePanel.tsx`: Source ingestion, uploaded-asset controls, and extraction CTA/footer layout.
- `web/src/components/AdvancedRenderProfilePanel.tsx`: Artifact, audience, style, and constraints controls for render-profile locking.
- `web/src/components/AdvancedContentSignalPanel.tsx`: Signal review and confirmation surface with progress summaries.
- `web/src/components/AdvancedScriptPackPanel.tsx`: Planner QA summary, script-pack review, and script-stage progress panel.
- `web/src/components/AdvancedGenerationStreamPanel.tsx`: Stream launch, live scene progress, and generation-stage reporting.
- `web/src/components/SceneCard.tsx`: Per-scene card with claim badges, regeneration controls, and source-proof entry points.
- `web/src/components/FinalBundle.tsx`: Final bundle review, Advanced MP4 export, and upscale controls.
- `web/src/app/quick/page.tsx`: Quick workflow composition root for artifact, reel, playlist, and MP4 delivery.
- `web/src/components/QuickSourceForm.tsx`: Quick source/topic intake, transcript expectations, and local-video warnings.
- `web/src/components/QuickReelView.tsx`: Proof Reel review and segment playback controls.
- `api/app/routes/generate_stream.py`: Direct generation routes for legacy SSE, Quick artifact/reel/video flows, Quick source indexing, and non-workflow Advanced entry points.
- `api/app/routes/workflow.py`: Checkpointed workflow orchestration routes, workflow-backed stream launch, scene regeneration, and workflow agent chat.
- `api/app/routes/assets.py`: Source-asset upload, final-bundle export/video/download, quick-video download, and high-fidelity image upscaling endpoints.
- `api/app/routes/sessions.py`: Final bundle retrieval by `run_id`.
- `api/app/services/gemini_story_agent.py`: Top-level orchestration shell for extraction, planning, advanced streaming, Quick entrypoints, and regeneration paths.
- `api/app/services/agent_coordinator.py`: Workflow state machine, checkpoint invalidation rules, and request reconstruction.
- `api/app/services/story_agent_source_media.py`: Source-media resolution, proof-link enrichment, and scene/source evidence attachment.
- `api/app/services/story_agent_planner.py`: Planner QA evaluation, deterministic repair, and constrained replan helpers.
- `api/app/services/story_agent_scene_generation.py`: Scene prompt assembly, continuity hints, and proof-grounded scene-generation helpers.
- `api/app/services/story_agent_quick_workflows.py`: Quick artifact, reel, MP4, and override orchestration.
- `api/app/services/story_agent_extraction_runtime.py`: Extraction runtime helpers for normalized text recovery and multimodal recovery paths.
- `api/app/services/story_agent_advanced_stream.py`: Advanced buffered scene preparation, continuity memory, and scene coordination helpers.
- `api/app/services/story_agent_advanced_qa.py`: Advanced buffered-scene QA / retry loop.
- `api/app/services/story_agent_advanced_first_scene.py`: Advanced first-scene live retry/QA wrapper.
- `api/app/services/interleaved_parser.py`: Scene QA and incremental text parsing.
- `api/app/services/image_pipeline.py`: Asset saving, thumbnail composition, proof-region cropping, and upscale helpers.
- `api/app/services/video_pipeline.py`: Quick and Advanced MP4 composition from already generated assets.
- `api/app/services/final_bundle_export.py`: Zip assembly for final bundle export.
- `api/app/services/source_ingest.py`: Local source-asset ingest and manifest construction for uploaded image/audio/PDF files.
- `api/app/services/workflow_chat_agent.py`: Checkpoint-aware chat agent for workflow explanation, safe action dispatch, and recovery guidance.
- `api/app/schemas/requests.py`: Request/response models, script-pack scene schema, planner QA summary, multimodal source models.
- `api/app/schemas/events.py`: Trace envelope, checkpoint events, scene trace updates, and SSE helpers.
| Method | Route | Purpose |
|---|---|---|
| POST | `/extract-signal` | Direct signal extraction without workflow checkpoints. |
| GET | `/generate-stream` | Legacy Quick SSE route kept for compatibility. |
| POST | `/generate-stream-advanced` | Direct Advanced stream entrypoint without workflow checkpoint state. |
| POST | `/generate-script-pack-advanced` | Direct Advanced script-pack generation without workflow checkpoint state. |
| POST | `/regenerate-scene` | Legacy standalone scene regeneration route. |
| Method | Route | Purpose |
|---|---|---|
| POST | `/quick-source-index/start` | Start transcript/source indexing for Quick source-backed runs. |
| GET | `/quick-source-index/{job_id}` | Poll Quick source-index job status. |
| POST | `/generate-quick-artifact` | Generate Quick artifact. |
| POST | `/hydrate-quick-artifact-visuals` | Fill hero and block visuals for an existing Quick artifact. |
| POST | `/generate-quick-reel` | Derive Proof Reel from Quick artifact blocks. |
| POST | `/generate-quick-video` | Render Quick MP4 from Proof Reel / artifact state. |
| POST | `/regenerate-quick-block` | Regenerate one Quick block in place. |
| POST | `/regenerate-quick-artifact` | Regenerate a wider Quick artifact slice or whole artifact. |
| Method | Route | Purpose |
|---|---|---|
| POST | `/workflow/start` | Create workflow state and initial checkpoint envelope. |
| POST | `/workflow/{workflow_id}/extract-signal` | Run signal extraction into workflow state and mark CP1. |
| GET | `/workflow/{workflow_id}` | Retrieve workflow snapshot and checkpoint state. |
| GET | `/workflow/{workflow_id}/content-signal` | Retrieve the locked/extracted content signal. |
| GET | `/workflow/{workflow_id}/script-pack` | Retrieve the locked script pack. |
| POST | `/workflow/{workflow_id}/lock-artifacts` | Lock artifact scope into workflow state. |
| POST | `/workflow/{workflow_id}/lock-render` | Lock render profile into workflow state. |
| POST | `/workflow/{workflow_id}/apply-profile` | Lock artifacts and render profile together in one request. |
| POST | `/workflow/{workflow_id}/generate-script-pack` | Build script pack from locked workflow state and mark CP4. |
| POST | `/workflow/{workflow_id}/generate-stream` | Build Advanced stream request from locked workflow state and emit SSE events. |
| POST | `/workflow/{workflow_id}/regenerate-scene` | Regenerate one Advanced scene using locked workflow context. |
| POST | `/workflow/agent/chat` | Chat with the workflow-aware assistant agent. |
| Method | Route | Purpose |
|---|---|---|
| POST | `/source-assets/upload` | Upload source files and build `source_manifest` asset entries. |
| GET | `/final-bundle/{run_id}` | Retrieve final-bundle status by run ID. |
| POST | `/final-bundle/export` | Export current scenes as a ZIP bundle. |
| POST | `/final-bundle/video` | Render Advanced MP4 from current final-bundle scenes. |
| GET | `/final-bundle/video/download` | Attachment-style download for rendered Advanced MP4. |
| GET | `/quick-video/download` | Attachment-style download for rendered Quick MP4. |
| POST | `/final-bundle/upscale` | Upscale existing final-bundle scene images without regeneration. |
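Taken together, the happy-path Advanced call order looks roughly like this. The payload keys and the `workflow_id` response field are assumptions; `client` can be any requests-style HTTP client:

```python
def run_advanced_workflow(client, base: str, source_text: str) -> str:
    """Happy-path call order for the checkpointed Advanced flow.
    `client` is anything exposing .post(url, json=...) that returns an
    object with .json(); request/response keys are illustrative."""
    wf_id = client.post(f"{base}/workflow/start", json={}).json()["workflow_id"]
    client.post(f"{base}/workflow/{wf_id}/extract-signal",
                json={"input_text": source_text})              # -> CP1_SIGNAL_READY
    client.post(f"{base}/workflow/{wf_id}/apply-profile",
                json={"artifact_type": "storyboard_grid"})     # -> CP2 + CP3
    client.post(f"{base}/workflow/{wf_id}/generate-script-pack",
                json={})                                       # -> CP4_SCRIPT_LOCKED
    # POST /workflow/{wf_id}/generate-stream then emits SSE events
    # through CP5_STREAM_COMPLETE and CP6_BUNDLE_FINALIZED.
    return wf_id
```

Injecting the client keeps the sequence testable without a running backend, and mirrors how the UI's workflow hooks stage one checkpoint per request.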
The Advanced Studio scene card now supports:
- generated scene visuals
- generated audio
- proof-linked claim badges
- explicit `View Source Proof` action
- proof dialog for:
- audio playback with seek fragment
- image/PDF proof crops
- evidence metadata display
This is the first visible multimodal UX wedge: claims can now open exact backing proof instead of behaving as passive labels.
ExplainFlow is strongest when it behaves like a director-controlled artifact engine rather than a notebook.
The architecture supports that through:
- locked signal before planning
- artifact-aware prompt routing
- deterministic planner repair
- planner QA surfaced to the user
- scene-level regeneration
- checkpoint-preserving fidelity upgrades
- multimodal proof linking without abandoning the existing JSON planner/runtime spine
The next incremental multimodal steps should be:
- stronger PDF page-level evidence extraction and rasterization
- stronger audio timestamp extraction and speaker-aware evidence refs
- source-media-first scene rendering for selected artifacts
- video proof clips after the evidence pipeline is stable
If there is time for final doc cleanup, the highest-value follow-ups are:
- convert API Surface into a compact markdown table with route, method, and purpose
- do one selective pass to wrap true schema keys and event names in backticks more consistently