Add resize_token_embeddings feature #1965
May I ask what the use case is? We currently resize to the tokenizer's length (or round it up to a multiple of 32 if that option is enabled).
It can be useful when there is a mismatch between the vocab_size and a tokenizer/custom tokenizer.
@ccdv-ai, just to clarify: would you want a config that lets you specify the new tokenizer vocab size, or just the ability to resize? Axolotl already does the latter* under the hood when you add new tokens (see axolotl/src/axolotl/utils/models.py, lines 1039 to 1053 at 8c3a727).
*If you enable the option mentioned above, the new size is rounded up to a multiple of 32 instead.
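For anyone following along, here is a minimal sketch of what "resize to the tokenizer's length, optionally rounded up to a multiple of 32" amounts to. The model/tokenizer choice and the `round_to_32` flag are illustrative only, not axolotl's actual code or config key:

```python
import math

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")        # illustrative small model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.add_tokens(["<custom_tok_1>", "<custom_tok_2>"])  # vocab now exceeds the embedding matrix

round_to_32 = True  # stand-in for the multiple-of-32 option mentioned above
new_size = len(tokenizer)
if round_to_32:
    new_size = math.ceil(new_size / 32) * 32  # round up to the next multiple of 32

if model.get_input_embeddings().num_embeddings < new_size:
    model.resize_token_embeddings(new_size)

print(model.get_input_embeddings().num_embeddings)  # 50272 for gpt2 + 2 added tokens, rounded to 32
```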
@NanoCode012 Only the option to choose and resize the token_embeddings to an arbitrary value, something like:

```python
if self.cfg.resize_token_embeddings_to < len(self.tokenizer):
    # Warning or stop
    self.model.resize_token_embeddings(self.cfg.resize_token_embeddings_to, **resize_kwargs)
```
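To make the "# Warning or stop" branch above concrete, here is one possible shape for the guard, offered only as a sketch: `resize_token_embeddings_to` is the hypothetical config key from the snippet, and whether to warn or hard-fail is exactly the open design question.

```python
import logging

LOG = logging.getLogger(__name__)


def maybe_resize_embeddings(model, tokenizer, cfg):
    """Hypothetical helper: resize to cfg.resize_token_embeddings_to when it is set."""
    target = getattr(cfg, "resize_token_embeddings_to", None)
    if target is None:
        return
    if target < len(tokenizer):
        # Shrinking below the tokenizer length would leave token ids with no embedding row,
        # so either warn loudly or refuse outright.
        LOG.warning(
            "resize_token_embeddings_to=%d is smaller than len(tokenizer)=%d",
            target,
            len(tokenizer),
        )
        raise ValueError("resize_token_embeddings_to must be >= len(tokenizer)")
    model.resize_token_embeddings(target)
```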
@ccdv-ai thanks for clarifying. To add on to your point, we already resize to the tokenizer's length:

```python
# above code but summarized
embeddings_len = len(self.tokenizer)
if (
    self.model.get_input_embeddings().num_embeddings < embeddings_len
):
    self.model.resize_token_embeddings(embeddings_len)
```

For resizing to another value (
🔖 Feature description
Add the option to resize the token embeddings: `PreTrainedModel` has this method.

✔️ Solution
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements