
Generation inference with interleaved input #35

Open
ys-zong opened this issue Sep 29, 2024 · 2 comments

ys-zong commented Sep 29, 2024

Hi, thanks for the nice work! I wonder whether Show-o supports inference with interleaved multimodal inputs, e.g., [text 1] [image 1] [text 2] [image 2] [text 3] -> generate a new image. If so, could you provide a code snippet for this? As far as I can tell, the current inference code only accepts a single image or an image-text pair. Many thanks!
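For reference, here is a minimal sketch of what flattening such an interleaved prompt into one token sequence might look like. Everything here is hypothetical: the boundary-token names and the tokenize/encode helpers are stand-ins, not Show-o's actual API.

```python
# Hypothetical sketch: flatten an interleaved prompt
# ([text 1] [image 1] [text 2] [image 2] [text 3]) into one token list.
# Special-token names and the helpers below are placeholders, NOT Show-o's API.

BOI, EOI = "<|soi|>", "<|eoi|>"  # assumed image-boundary tokens


def fake_text_tokenize(text: str) -> list[str]:
    """Stand-in for a real text tokenizer: just splits on whitespace."""
    return text.split()


def fake_image_encode(image_id: str, n_tokens: int = 4) -> list[str]:
    """Stand-in for a VQ image tokenizer: emits n_tokens discrete codes."""
    return [f"<img:{image_id}:{i}>" for i in range(n_tokens)]


def build_interleaved_sequence(segments: list[tuple[str, str]]) -> list[str]:
    """Flatten (kind, payload) segments into one token list.

    kind is "text" or "image"; each image span is wrapped in boundary
    tokens so the model can tell the two modalities apart.
    """
    tokens: list[str] = []
    for kind, payload in segments:
        if kind == "text":
            tokens.extend(fake_text_tokenize(payload))
        elif kind == "image":
            tokens.extend([BOI, *fake_image_encode(payload), EOI])
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return tokens


seq = build_interleaved_sequence([
    ("text", "a cat"),
    ("image", "img1"),
    ("text", "now in the style of"),
    ("image", "img2"),
    ("text", "generate:"),
])
```

The resulting sequence could then be fed to a discrete-token generator; whether Show-o's checkpoints handle such inputs is exactly the question above.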


KebinWu commented Sep 30, 2024

I'm not sure whether the code supports this, but I wouldn't expect the model to perform well on such tasks either way, since interleaved samples were not used during training.

@Sierkinhane
Collaborator

Hi, mixed-modality generation will be released in the future, but the timeline is still undetermined.
