Qwen2.5 LoRA Extraction not working in vLLM & Aphrodite Engine #459
It doesn't like the input and output embeddings in the LoRA adapter. They are valid to have in a LoRA, but it is a bit weird that it lists them both twice?! Can you try commenting out these two blocks:

```python
if module == pretrained_model.get_input_embeddings():
    # if isinstance(module, torch.nn.Embedding):
    pass  # module_details.append(("embedding", name, module.weight.size()))
elif module == pretrained_model.get_output_embeddings():
    # if isinstance(module, torch.nn.Embedding):
    pass  # module_details.append(("output", name, module.weight.size()))
```

and see if the LoRA it creates works OK? Also, can you tell me what the peak VRAM use is with these commented out, to help with your other problem of high VRAM use? If it is just these causing a problem, then I can easily add a command line option to skip the input/output embeddings, but if it still uses a lot of VRAM, it must be something in the SVD function that upcasts some stuff to […]. The "double listing" in the exception makes me think it could also be something to do with having tied input/output tensors, but I think only the very tiny […]. You can tell if you look in the […] for `"tie_word_embeddings": false`, or in the […].
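Whether a model ties its embeddings can be read straight from its `config.json`. A minimal sketch of that check (the helper name and the example path are just for illustration):

```python
import json

def embeddings_tied(config_path: str) -> bool:
    """Return True if the model ties its input/output embedding weights."""
    with open(config_path) as f:
        cfg = json.load(f)
    # transformers treats a missing "tie_word_embeddings" key as True
    return cfg.get("tie_word_embeddings", True)

# e.g. embeddings_tied("Qwen2.5-7B/config.json")
```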
Will try this and get back to you. Thanks!
Usually you can use LoRA extraction in mergekit and then run the LoRAs in vLLM or Aphrodite Engine just fine. This works for Llama and Mistral models so far, but it seems like this isn't working for Qwen2.5 models?
If I use a LoRA created by training with Axolotl, vLLM and Aphrodite Engine run Qwen LoRAs just fine.
The extraction itself also seems to complete without issues; the resulting adapter just cannot be loaded.
Error traceback from Aphrodite Engine trying to run the Qwen2.5-7B LoRA:
Full traceback: