RobertaTokenizerFast object has no attribute 'split_special_tokens' #501
Comments
Hi, the fix for this should have been released in 1.13.1 with #490, but for some reason it went out with the fix commented out. Bear with me as I try to rectify the issue.
I've created a PR to fix this (#502). If you need a fix now, you can install directly from that PR (a sketch follows below).
PS: That's not entirely the same state as the …
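For readers who want to try the fix before a release lands on PyPI: pip can usually install straight from a pull request's head ref on GitHub. A minimal sketch, assuming the repository is github.com/CogStack/MedCAT (the thread itself does not spell out the URL or ref, so adjust as needed):

```bash
# Sketch only: installs the code from PR #502's head ref on GitHub.
pip install "git+https://github.com/CogStack/MedCAT.git@refs/pull/502/head"
```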
Thank you, your fix works beautifully and I very much appreciate the speedy response. Do you know when PyPI will be updated with an official release? I need to deploy this in a secure environment, and I can't access code via GitHub, only via PyPI and after review.
I hope to do a patch release later today to incorporate the fix. But I can't fully guarantee that since it also relies on other people reviewing the aforementioned PR before I can merge it in and push a release.
Hi @swburge, I've now been able to push a patch release and 1.13.2 is now available on PyPI as well. Let me know if you experience any further issues.
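For reference, with the patch release published the fix is a plain PyPI upgrade; a minimal sketch using the version number mentioned above:

```bash
# Upgrade to the patched release mentioned in this thread.
pip install --upgrade "medcat>=1.13.2"
```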
Thanks so much - it works perfectly. Closing this issue now.
Hi there,
I'm having trouble running MedCAT for deidentification after some system upgrades. I have Python 3.11, transformers 4.46.2, tokenizers 0.20.3 and medcat 1.13.1, and I'm using a model pack that works very well on medcat 1.7.2.
I see that the deid code has changed slightly, and now using:
```python
from medcat.utils.ner import DeIdModel
from medcat.cat import CAT

deid = DeIdModel.create("./modelpack.zip")
anon_text = deid.deid_text(foo)
```
results in
AttributeError: 'RobertaTokenizerFast' object has no attribute 'split_special_tokens'. Did you mean: 'all_special_tokens'?
I think this is an issue with the transformers or tokenizers libraries, but I'm not sure I understand what's going on. The datasets and models work perfectly with previous versions of medcat, transformers (4.21.3) and tokenizers (0.12.1).
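A quick, hedged way to confirm which library versions are actually in play (per the maintainer's comments above, this is a compatibility issue between older medcat releases and newer transformers/tokenizers, resolved in medcat 1.13.2):

```python
# Print the installed versions of the libraries discussed in this thread.
# importlib.metadata is used so this works regardless of whether the
# packages expose a __version__ attribute themselves.
from importlib.metadata import version

for pkg in ("medcat", "transformers", "tokenizers"):
    print(f"{pkg}: {version(pkg)}")
```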