Replies: 1 comment
It is possible to provide images, but only via model options, which are provider-specific. I like this idea, although it may be complicated to translate those inputs into every format that's expected. There are now multiple types of images that can be provided: references, elements, start frame, end frame. This is worth exploring though.
Problem
generateImage() and generateVideo() are currently centered around text-prompt inputs, but several providers and models support image-conditioned generation workflows.
Examples include:
Today there is no obvious provider-agnostic way to pass image inputs into generateImage() and generateVideo().
TanStack AI already has a clean multimodal abstraction for content parts (ImagePart with source.type: 'data' | 'url'). It would be great if media generation APIs reused that same shape instead of introducing provider-specific one-offs for image-conditioned generation.
Why this matters
Modern image and video models are increasingly multimodal. Generation is no longer only text-to-image or text-to-video.
A unified way to pass image inputs would make it much easier for adapters to support workflows like:
Proposal
Add an optional inputs field to both generateImage() and generateVideo() that accepts reusable multimodal content parts, ideally existing ImagePart values.
This would provide a consistent, provider-agnostic way to pass image-conditioned inputs into media generation APIs.
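A minimal sketch of the proposed shape, written as standalone hypothetical types rather than the actual TanStack AI definitions. The names ImagePartSketch, MediaGenerationOptionsSketch, and describeInputs are invented for illustration, and the value field on source is an assumption; only source.type: 'data' | 'url' and the inputs field name come from the text above.

```typescript
// Hypothetical stand-in for ImagePart. Only `source.type` is taken from the
// proposal; the `value` field name is an assumption.
interface ImagePartSketch {
  type: 'image';
  source: { type: 'data' | 'url'; value: string };
}

// Hypothetical options shared by image and video generation.
// `inputs` is optional, so existing text-only calls keep working.
interface MediaGenerationOptionsSketch {
  prompt: string;
  inputs?: ImagePartSketch[];
}

// Illustrative helper: summarizes each input so an adapter could decide
// how to forward it (by URL, or as inline data).
function describeInputs(options: MediaGenerationOptionsSketch): string[] {
  return (options.inputs ?? []).map((part) =>
    part.source.type === 'url'
      ? `url:${part.source.value}`
      : `data:${part.source.value.length} chars`,
  );
}

const textOnly = describeInputs({ prompt: 'a red bicycle' });
const withImage = describeInputs({
  prompt: 'the same bicycle at night',
  inputs: [
    { type: 'image', source: { type: 'url', value: 'https://example.com/bike.png' } },
  ],
});
```

Keeping inputs optional means text-only calls are unaffected, which is one reason an additive field may be easier to adopt than a new function signature.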
Example API
generateImage()
generateVideo()
Multiple reference images
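A minimal sketch of how calls for the three cases above might look, assuming the inputs field from the proposal. Everything here is hypothetical: generateImageSketch and generateVideoSketch are local stand-ins and not the real generateImage() and generateVideo(), the provider payload shape is invented, and the value field on source is an assumption.

```typescript
// Hypothetical stand-in for ImagePart (the `value` field name is an assumption).
interface ImagePartSketch {
  type: 'image';
  source: { type: 'data' | 'url'; value: string };
}

// Hypothetical request shape with the proposed optional `inputs` field.
interface MediaRequestSketch {
  prompt: string;
  inputs?: ImagePartSketch[];
}

// Invented provider-specific payload that an adapter might build.
interface ProviderPayloadSketch {
  kind: 'image' | 'video';
  prompt: string;
  imageUrls: string[];
  imageData: string[];
}

// Illustrative adapter step: split the provider-agnostic inputs by source type.
function toProviderPayload(
  kind: 'image' | 'video',
  request: MediaRequestSketch,
): ProviderPayloadSketch {
  const inputs = request.inputs ?? [];
  return {
    kind,
    prompt: request.prompt,
    imageUrls: inputs.filter((p) => p.source.type === 'url').map((p) => p.source.value),
    imageData: inputs.filter((p) => p.source.type === 'data').map((p) => p.source.value),
  };
}

// Local stand-ins that only build the payload; no network call is made.
const generateImageSketch = (request: MediaRequestSketch) => toProviderPayload('image', request);
const generateVideoSketch = (request: MediaRequestSketch) => toProviderPayload('video', request);

const product: ImagePartSketch = {
  type: 'image',
  source: { type: 'url', value: 'https://example.com/product.png' },
};
const style: ImagePartSketch = {
  type: 'image',
  source: { type: 'data', value: 'iVBORw0KGgo=' },
};

// Same field name for image and video generation.
const imageRequest = generateImageSketch({ prompt: 'product on a beach', inputs: [product] });
const videoRequest = generateVideoSketch({ prompt: 'slow pan around the product', inputs: [product] });

// Multiple reference images in one call.
const multiRequest = generateImageSketch({
  prompt: 'product in the style of the second image',
  inputs: [product, style],
});
```

The call sites stay identical across image and video generation, while the translation into each provider's format is confined to the adapter step.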
Expected behavior
generateImage() and generateVideo() should both accept image-conditioned inputs through the same field name.
Inputs should be expressed with existing content-part types such as ImagePart.
Open design questions
Should the field be named inputs, references, or something else?
Should it accept only ImagePart[], or broader content parts for future extensibility?
Should generateVideo() support multiple input images as well, or only one initially?
Summary
Request: add a unified, provider-agnostic way to pass image-conditioned inputs into both generateImage() and generateVideo(), ideally by reusing existing multimodal content-part types such as ImagePart.