
Does show-o support multimodal-in multimodal-out? #27

Open

URRealHero opened this issue Sep 10, 2024 · 6 comments

Comments

@URRealHero

As the title says, does Show-o support multimodal-in, multimodal-out (i.e., with multiple images in both the input and the output)?

@URRealHero
Author

I've noticed that mixed-modal generation is listed as a pending/requested feature. Can I write a script to do it myself, or does the model not have that capability yet?

@Sierkinhane
Collaborator

Sierkinhane commented Sep 11, 2024

Hi, we have explored mixed-modality generation. However, we have not yet uploaded such pre-trained weights for this version. We will consider it for the next update, but we are not sure about the timeline.

@URRealHero
Author

Thanks very much! So is this part of the instruction tuning stage, or do I have to pretrain it again?

@Sierkinhane
Collaborator

Hey, it is included in the instruction tuning stage.

@URRealHero
Author

Thanks a lot! I'll try to do that~

@URRealHero
Author

Hi there, I don't understand how to use a new dataset to finetune the model.
1. Do I need to pretrain from scratch to get the checkpoint for stage 2? If not, where can I get the pretrained parameters?
2. For a new dataset, do I have to create an instruction-tuning yaml similar to yours for LLaVA tuning? Something like the rough sketch below is what I have in mind.
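
To be clear, the field names here are just my guesses to illustrate the question, not the actual Show-o config schema:

```yaml
# Hypothetical instruction-tuning config for a custom dataset.
# All field names below are illustrative guesses, not Show-o's real schema.
experiment:
  name: showo-instruction-tuning-mydata
  # assumed: start from a released checkpoint instead of pretraining from scratch
  pretrained_checkpoint: path/to/pretrained_showo.bin
dataset:
  type: my_custom_instruction_dataset   # would replace the LLaVA tuning data
  data_path: data/my_dataset.json
  image_folder: data/images
training:
  batch_size: 8
  learning_rate: 1.0e-5
  max_steps: 10000
```

If that is roughly the right shape, I mainly need to know which keys are required and where the pretrained checkpoint comes from.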
