This repository has been archived by the owner on Sep 24, 2024. It is now read-only.

Added model_max_length as a tok parameter #113

Merged
aittalam merged 3 commits into main from davide/summarizer_truncation on Aug 6, 2024

Conversation

@aittalam (Member) commented on Aug 5, 2024

Since 4.40.0, transformers no longer gets model_max_length from the model family's defaults dict but explicitly looks for the parameter in tokenizer_config.json. Not all models provide it, so one possibility is to explicitly allow users to set this parameter in the config. This PR implements exactly that.

Tested with facebook/bart-large-cnn + pytest unit/integration tests.
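
As a minimal sketch of the idea (not the actual lm_buddy implementation; the `load_tokenizer` helper and the config plumbing shown here are hypothetical), an explicit `model_max_length` from the config can be forwarded to the tokenizer like this:

```python
from transformers import AutoTokenizer

def load_tokenizer(model_name: str, model_max_length: int | None = None):
    # Only pass model_max_length when the user set it in the config; otherwise
    # fall back to whatever tokenizer_config.json provides (which some models
    # omit, breaking truncation on transformers >= 4.40.0).
    kwargs = {}
    if model_max_length is not None:
        kwargs["model_max_length"] = model_max_length
    return AutoTokenizer.from_pretrained(model_name, **kwargs)

# Example: cap facebook/bart-large-cnn at 1024 tokens and truncate long inputs.
tokenizer = load_tokenizer("facebook/bart-large-cnn", model_max_length=1024)
encoded = tokenizer("a very long document ...", truncation=True)
```

`model_max_length` is a keyword accepted by `from_pretrained`, so models whose tokenizer_config.json lacks the value still truncate correctly once the config supplies it.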

@binaryaaron (Contributor) left a comment

Approving given the internal constraints; double-check a few things before landing, though. I'd suggest adding an extra unit test to check for it.

Review thread on src/lm_buddy/jobs/model_clients.py (resolved)
@aittalam (Member, Author) commented on Aug 6, 2024

> Approving given the internal constraints; double-check a few things before landing, though. I'd suggest adding an extra unit test to check for it.

Thanks! I have two more PRs in queue:

  • bump of all libs (this will allow us, among other things, to run more recent models)
  • adding text generation pipeline + prompts

I will write unit tests for both and add one for this change too (explicitly passing a longer text as input and verifying it does not break with the new setup).
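
For reference, a rough sketch of such a test, reusing the hypothetical `load_tokenizer` helper from the snippet above (the real test will live in the lm_buddy suite and may look different):

```python
def test_long_input_does_not_exceed_model_max_length():
    # Cap the tokenizer at 1024 tokens and feed it input far beyond that limit.
    tokenizer = load_tokenizer("facebook/bart-large-cnn", model_max_length=1024)
    long_text = "word " * 50_000
    encoded = tokenizer(long_text, truncation=True)
    assert len(encoded["input_ids"]) <= 1024
```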

@aittalam aittalam merged commit 79c8e98 into main Aug 6, 2024
3 of 4 checks passed
@aittalam aittalam deleted the davide/summarizer_truncation branch August 6, 2024 13:15