
Generation inference with interleaved input #35

Open
ys-zong opened this issue Sep 29, 2024 · 2 comments

ys-zong commented Sep 29, 2024

Hi, thanks for the nice work! I wonder whether Show-o supports inference with interleaved multimodal inputs, e.g., [text 1] [image 1] [text 2] [image 2] [text 3] -> generate a new image. If so, could you provide a code snippet for this? As far as I can tell, the current inference code only accepts a single image or an image-text pair. Many thanks!
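For reference, here is a minimal sketch of what flattening such an interleaved prompt into one token sequence might look like. Everything here is hypothetical: the boundary-token names and the tokenize/encode helpers are stand-ins, not Show-o's actual API.

```python
# Hypothetical sketch: flatten an interleaved prompt
# ([text 1] [image 1] [text 2] [image 2] [text 3]) into one token list.
# Special-token names and the helpers below are placeholders, NOT Show-o's API.

BOI, EOI = "<|soi|>", "<|eoi|>"  # assumed image-boundary tokens


def fake_text_tokenize(text: str) -> list[str]:
    """Stand-in for a real text tokenizer: just splits on whitespace."""
    return text.split()


def fake_image_encode(image_id: str, n_tokens: int = 4) -> list[str]:
    """Stand-in for a VQ image tokenizer: emits n_tokens discrete codes."""
    return [f"<img:{image_id}:{i}>" for i in range(n_tokens)]


def build_interleaved_sequence(segments: list[tuple[str, str]]) -> list[str]:
    """Flatten (kind, payload) segments into one token list.

    kind is "text" or "image"; each image span is wrapped in boundary
    tokens so the model can tell the two modalities apart.
    """
    tokens: list[str] = []
    for kind, payload in segments:
        if kind == "text":
            tokens.extend(fake_text_tokenize(payload))
        elif kind == "image":
            tokens.extend([BOI, *fake_image_encode(payload), EOI])
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return tokens


seq = build_interleaved_sequence([
    ("text", "a cat"),
    ("image", "img1"),
    ("text", "now in the style of"),
    ("image", "img2"),
    ("text", "generate:"),
])
```

The resulting sequence could then be fed to a discrete-token generator; whether Show-o's checkpoints handle such inputs is exactly the question above.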


KebinWu commented Sep 30, 2024

I'm not sure whether the code supports this, but I wouldn't expect the model to perform well on such tasks either way, since interleaved samples were not used during training.

@Sierkinhane
Collaborator

Hi, mixed-modality generation will be released in the future, but the timeline is still undetermined.
