RobertaTokenizerFast object has no attribute 'split_special_tokens' #501
Comments
Hi, the fix for this should have been released in 1.13.1 with #490, but for some reason it went out with the fix commented out. Bear with me as I try to rectify the issue.
I've created a PR to fix this (#502). If you need a fix now, you can install directly from that PR (a sketch follows below).
PS: That's not entirely the same state as the …
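For readers who want to try the fix before a release lands on PyPI: pip can usually install straight from a pull request's head ref on GitHub. A minimal sketch, assuming the repository is github.com/CogStack/MedCAT (the thread itself does not spell out the URL or ref, so adjust as needed):

```bash
# Sketch only: installs the code from PR #502's head ref on GitHub.
pip install "git+https://github.com/CogStack/MedCAT.git@refs/pull/502/head"
```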
Thank you, your fix works beautifully and I very much appreciate the speedy response. Do you know when PyPI will be updated with an official release? I need to deploy this in a secure environment, and I can't access code via GitHub, only via PyPI and after review.
I hope to do a patch release later today to incorporate the fix. But I can't fully guarantee that since it also relies on other people reviewing the aforementioned PR before I can merge it in and push a release.
Hi @swburge, I've now been able to push a patch release and 1.13.2 is now available on PyPI as well. Let me know if you experience any further issues.
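For reference, with the patch release published the fix is a plain PyPI upgrade; a minimal sketch using the version number mentioned above:

```bash
# Upgrade to the patched release mentioned in this thread.
pip install --upgrade "medcat>=1.13.2"
```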
Thanks so much - it works perfectly. Closing this issue now.
Hi there,
I'm having trouble running MedCAT for deidentification after some system upgrades. I have Python 3.11, transformers 4.46.2, tokenizers 0.20.3 and medcat 1.13.1, and I'm using a model pack that works very well on medcat 1.7.2.
I see that the deid code has changed slightly, and now using:
```python
from medcat.utils.ner import DeIdModel
from medcat.cat import CAT

deid = DeIdModel.create("./modelpack.zip")
anon_text = deid.deid_text(foo)
```
results in
AttributeError: 'RobertaTokenizerFast' object has no attribute 'split_special_tokens'. Did you mean: 'all_special_tokens'?
I think this is an issue with the transformers or tokenizers libraries, but I'm not sure I understand what's going on. The datasets and models work perfectly with previous versions of medcat, transformers (4.21.3) and tokenizers (0.12.1).
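A quick, hedged way to confirm which library versions are actually in play (per the maintainer's comments above, this is a compatibility issue between older medcat releases and newer transformers/tokenizers, resolved in medcat 1.13.2):

```python
# Print the installed versions of the libraries discussed in this thread.
# importlib.metadata is used so this works regardless of whether the
# packages expose a __version__ attribute themselves.
from importlib.metadata import version

for pkg in ("medcat", "transformers", "tokenizers"):
    print(f"{pkg}: {version(pkg)}")
```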