fine-tuning the mmprojector #39

Closed

lxr-1204 opened this issue Sep 24, 2024 · 3 comments

Comments

@lxr-1204

Thank you for your outstanding work, which allowed me to get started with fine-tuning quickly. I have two questions:

  1. In LoRA fine-tuning of the LLaVA series, the usual recipe applies LoRA to the LLM while fully fine-tuning the mmprojector (roughly the setup sketched right after this list). However, in your work I couldn't find a parameter that controls how the mmprojector is trained.

  2. I have previously tried several fine-tuning codebases, including LLaVA-NeXT, on 4 L20 (48G) GPUs. For those 7B models my LoRA settings were r=128, alpha=256, maxlength=8096. In your project, however, when fine-tuning llava-interleave-qwen-7b-hf I can only set r=8, alpha=8, maxlength=1024; anything larger results in an OOM error.
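
To be concrete, the configuration I have in mind looks roughly like the sketch below (using HF transformers + PEFT; the module names such as `language_model` and `multi_modal_projector` are taken from the HF LLaVA classes and are my assumption about how they map to this repo, not its actual flags):

```python
# Sketch of the common LLaVA-style recipe: LoRA on the language model's
# projection layers, while the multimodal projector stays fully trainable.
# Module names follow the HF LLaVA classes and may differ across
# transformers versions and from what this repo actually exposes.
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-interleave-qwen-7b-hf"
)

lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    # Regex so LoRA is only attached inside the language model,
    # not the vision tower (which also has q_proj/k_proj/v_proj modules).
    target_modules=r".*language_model.*\.(q_proj|k_proj|v_proj|o_proj)",
    # Train (and save) the projector in full, outside of LoRA.
    modules_to_save=["multi_modal_projector"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```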

@zjysteven
Owner

zjysteven commented Sep 25, 2024

  1. Currently we only support full fine-tuning of the mmprojector, while the LLM can be either fully fine-tuned or LoRA fine-tuned.
  2. This repo is based on the Hugging Face (HF) implementations of all the models. One caveat of the HF implementations is that they did not count vision tokens toward maxlength, unlike the original LLaVA implementation. For example, for LLaVA-1.5, maxlength=100 in the HF implementation is effectively equivalent to maxlength=100 + 576 in the original implementation (the arithmetic is sketched right after this list). This is the first thing that could cause OOM if you were using a very large maxlength. More recent releases of the transformers library do count vision tokens toward maxlength, but I haven't had a chance to reflect that update in this repo yet.
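
To make the numbers concrete, here is a rough sketch of the effective sequence length (assuming LLaVA-1.5's 336px image size and 14px patches, i.e. 24 × 24 = 576 vision tokens per image; other models and multi-image interleave settings will differ):

```python
# Rough check of the point above: with the older HF LLaVA behavior, maxlength
# only covered text tokens, so the sequence the LLM actually processes is
# longer by the number of vision tokens per image. Numbers assume LLaVA-1.5's
# CLIP-ViT-L/14 @ 336px; adjust for other vision towers.
def effective_sequence_length(max_length: int,
                              num_images: int = 1,
                              image_size: int = 336,
                              patch_size: int = 14) -> int:
    vision_tokens_per_image = (image_size // patch_size) ** 2  # 576 for LLaVA-1.5
    return max_length + num_images * vision_tokens_per_image

print(effective_sequence_length(100))      # 676, i.e. the "100 + 576" example above
print(effective_sequence_length(1024, 8))  # 5632: interleaved multi-image grows quickly
```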

When I'm available (hopefully in the next 2-3 weeks) I will make a major refactor regarding problem 2.

@lxr-1204
Author

🎉 Thank you for addressing my questions so promptly. I look forward to seeing even more outstanding results from you.

@zjysteven
Owner

zjysteven commented Oct 18, 2024

Regarding point 2, I have made the updates. There are some caveats to be careful about, though, which are discussed thoroughly in #43.

Closing now. Feel free to reopen if needed.
