Add parametrization for the detokenization/decoding #1246
  // If user requested add_special_tokens mode different from the current one,
  // need to set state variable.
  // If requested mode matches the stored state set, then don't touch states.
- if (add_special_tokens == m_add_special_tokens) {
+ if (add_special_tokens_flag == m_add_special_tokens && skip_special_tokens_flag == m_skip_special_tokens) {
      return;
  }
  if (m_older_than_24_5) {
Does m_skip_special_tokens work with IRs produced by 2024.4 and older?
If this PR depends on openvinotoolkit/openvino_tokenizers#325, then do we need to add a condition for 2025.0 for skip_special_tokens?
No, it will not work with IRs from 2024.4 and older.
As for the 2025.0 condition, we need to discuss it with @apaniukov.
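To make the thread above concrete, here is a minimal Python sketch of the check under discussion. It is a toy model, not the actual `ov::genai::Tokenizer` code: the class name, the default flag values, and the behaviour of the version guard (skipping state updates entirely for old IRs) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TokenizerStateSketch:
    """Toy model of the cached-flag check; not the real ov::genai::Tokenizer."""

    older_than_24_5: bool = False    # assumed meaning: IR predates runtime-switchable flags
    add_special_tokens: bool = True  # mirrors m_add_special_tokens (default is an assumption)
    skip_special_tokens: bool = True  # mirrors m_skip_special_tokens (default is an assumption)
    state_writes: list = field(default_factory=list)  # records simulated state updates

    def set_state_if_necessary(self, add_special_tokens_flag: bool,
                               skip_special_tokens_flag: bool) -> None:
        # Requested mode matches the stored one: leave the states untouched.
        if (add_special_tokens_flag == self.add_special_tokens
                and skip_special_tokens_flag == self.skip_special_tokens):
            return
        # Assumption: old IRs expose no state variables, so nothing can be set.
        if self.older_than_24_5:
            return
        # Write only the flags that actually changed.
        if add_special_tokens_flag != self.add_special_tokens:
            self.state_writes.append(("add_special_tokens", add_special_tokens_flag))
            self.add_special_tokens = add_special_tokens_flag
        if skip_special_tokens_flag != self.skip_special_tokens:
            self.state_writes.append(("skip_special_tokens", skip_special_tokens_flag))
            self.skip_special_tokens = skip_special_tokens_flag
```

In this sketch an old IR silently keeps whatever behaviour it was converted with, which is exactly the case the question about 2024.4 and older IRs is probing.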
@@ -217,3 +217,25 @@ def test_add_special_tokens(add_special_tokens, prompt):
    res_genai = genai_tokenzier.encode(prompt, add_special_tokens).input_ids.data
    res_hf = hf_tokenizer(prompt, return_tensors="np", add_special_tokens=add_special_tokens)["input_ids"]
    assert np.all(res_genai == res_hf)


@pytest.mark.precommit
@pytest.mark.xfail(reason="Need to turn them back on when openvino_tokenizers will be updated.")
@@ -87,23 +87,59 @@ class OPENVINO_GENAI_EXPORTS Tokenizer {
    /**
    * @brief decode sequence of tokens
    * @param tokens vector storing tokens
    * @param tokenization_params AnyMap with detokenization parameters, e.g. {"skip_special_tokens", false}
Suggested change:
- * @param tokenization_params AnyMap with detokenization parameters, e.g. {"skip_special_tokens", false}
+ * @param detokenization_params AnyMap with detokenization parameters, e.g. ov::genai::skip_special_tokens(false)
    /**
    * @brief decode tokens.
    * @param tokens ov::Tensor with tokens with shape [batch_size, seq_len]
    * @param tokenization_params AnyMap with detokenization parameters, e.g. {"skip_special_tokens", false}
Please correct the docs in all places.
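For readers unfamiliar with the parameter being documented, the following toy example illustrates what `skip_special_tokens` is generally understood to control during detokenization. The vocabulary, the special-token IDs, and the `toy_decode` helper are all hypothetical; this is not how openvino_tokenizers implements decoding.

```python
# Hypothetical four-entry vocabulary; IDs 0 and 1 play the role of special tokens.
VOCAB = {0: "<s>", 1: "</s>", 2: "Hello", 3: " world"}
SPECIAL_IDS = {0, 1}


def toy_decode(tokens, vocab, special_ids, skip_special_tokens=True):
    """Join token strings, optionally dropping special tokens first."""
    pieces = []
    for token_id in tokens:
        if skip_special_tokens and token_id in special_ids:
            continue  # special token filtered out of the decoded text
        pieces.append(vocab[token_id])
    return "".join(pieces)
```

Calling `toy_decode([0, 2, 3, 1], VOCAB, SPECIAL_IDS)` drops the two markers, while passing `skip_special_tokens=False` keeps them in the output string.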
  },
- py::arg("tokens"),
+ py::arg("tokens"), py::arg("skip_special_tokens") = false,
This is also incorrect, because the actual default depends on how the detokenizer was converted.
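One possible way to address this concern, sketched in plain Python: treat the argument as optional and forward it only when the caller sets it explicitly, so that the converted detokenizer's own default applies otherwise. The helper name `build_detokenization_params` is hypothetical, and this is not necessarily what the PR ended up doing.

```python
from typing import Optional


def build_detokenization_params(skip_special_tokens: Optional[bool] = None) -> dict:
    """Collect only the detokenization parameters the caller explicitly provided."""
    params = {}
    if skip_special_tokens is not None:
        # Explicit request: override whatever the converted detokenizer defaults to.
        params["skip_special_tokens"] = skip_special_tokens
    # An empty dict means "do not touch the state; use the IR's built-in default".
    return params
```

With this shape the binding's signature no longer promises `false` as the default; `None` simply means that no override was requested.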
Tokenizers IRs should be converted after openvinotoolkit/openvino_tokenizers#325 is merged
Ticket CVS-154151