How to change character set? #12
Comments
For Chinese characters, I changed VOCA_SIZE to the length of chn_cls_list, a character list that circulates in many Chinese text-spotting repositories, and set BATEXT SIZE, CLS to that vocabulary size and chn_cls. I will try other languages too and let you know if it works. Hope it helps.
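For a language without a ready-made list such as chn_cls_list, the same idea can be sketched as follows: collect the distinct characters from the training transcripts, take the length of that list as the vocabulary size, and map each character to a class index. This is only a minimal sketch under assumptions. The helper names `build_charset` and `encode`, the sample transcripts, and the choice to reserve one extra index for padding are all hypothetical, and the on-disk format the model expects for its character list is not shown here.

```python
from typing import Dict, Iterable, List


def build_charset(transcripts: Iterable[str]) -> List[str]:
    """Collect every distinct character in the transcripts, sorted by code point."""
    return sorted({ch for text in transcripts for ch in text})


def encode(text: str, char_to_idx: Dict[str, int], max_len: int, pad_idx: int) -> List[int]:
    """Map characters to class indices, truncating or padding to max_len."""
    ids = [char_to_idx[ch] for ch in text[:max_len]]
    return ids + [pad_idx] * (max_len - len(ids))


# Hypothetical transcripts; in practice these come from your annotation files.
transcripts = ["한글", "글자", "中文"]

charset = build_charset(transcripts)
char_to_idx = {ch: i for i, ch in enumerate(charset)}

voc_size = len(charset)  # the value a vocabulary-size setting would take
pad_idx = voc_size       # assumption: one extra index reserved for padding
```

Sorting the characters keeps the index assignment stable between runs, so a checkpoint trained with one list stays compatible as long as the list itself does not change.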
Thank you very much for sharing. Was the pre-trained model the polygon model trained on SynText data (English only)? I'm trying to train on Korean. Currently the control-point loss is very high, at 40-50. Is there any tip to lower it?
I trained the model from scratch on the Chinese dataset. If training runs long enough, it may fit Korean too, but I am not sure.
@jeong-tae Thanks for sharing your experience. Can the model achieve good results when switching to a Chinese dataset with a large number of categories? Could you share your results, if possible? Thanks in advance.
@milely On a Chinese character set of about 5,700 characters (I'm not sure of the exact size), it works very well. I submitted the results to ICDAR ReCTS and it was ranked... maybe 7th? Or 10th? Anyway, it works well.
@jeong-tae Thank you very much for sharing, I will also try other languages.
You're a god. Thank you so much. I'm also experimenting with Chinese, but I keep getting errors. I'll analyze the code on my own. Please
@ninoogo2 Sorry, I can't share my code, but you can easily modify your code to make it work. Just set your character set.
Thanks to you all for your interest. @jeong-tae's approach is correct. I'd like to add that you may refer to AdelaiDet (which contains the ABCNet and ABCNet v2 implementations) for training on non-Latin datasets, e.g. Chinese. Link: https://github.com/aim-uofa/AdelaiDet/blob/master/configs/BAText/ReCTS/v2_chn_attn_R_50.yaml#L17-L18 Pretraining on a larger dataset can be leveraged to enhance performance. You may use a mix of the ChnSyn, ReCTS, and LSVT datasets as in ABCNet (https://github.com/aim-uofa/AdelaiDet/blob/master/configs/BAText/Pretrain/Base-Chn-Pretrain.yaml) and finetune on ReCTS. Since the annotations provided by ABCNet are Bezier curves, they are compatible with the Bezier variant of our model if you don't want to convert annotations.
I am trying to train with a Korean dataset and I can't find the issue you mentioned. Anyway, the loss is so large that I failed to train well on the Korean set.
@jeong-tae |
@learningsteady0J0 Are you trying to reproduce the ICDAR15 result? If you trained on a Korean set and then evaluated on ICDAR15, ... I don't know. The two may have different distributions, so you can't tune precisely. I found that my Korean set had an error and fixed it. It seems it will work well now.
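The thread does not say what the dataset error was. One kind of mismatch that is cheap to rule out when switching languages is a transcript containing characters that are missing from the configured character set. The check below is a hedged sketch of that idea only; `find_unknown_chars` and the sample data are hypothetical and not part of this repository.

```python
from collections import Counter
from typing import Iterable, Set


def find_unknown_chars(transcripts: Iterable[str], charset: Set[str]) -> Counter:
    """Count characters that occur in the transcripts but not in the character set."""
    unknown: Counter = Counter()
    for text in transcripts:
        unknown.update(ch for ch in text if ch not in charset)
    return unknown


# Hypothetical character set and transcripts for illustration.
charset = set("가나다라")
transcripts = ["가나", "다라마", "가 나"]

unknown = find_unknown_chars(transcripts, charset)
for ch, count in unknown.most_common():
    # repr() makes whitespace and other invisible characters visible.
    print(f"{ch!r} (U+{ord(ch):04X}) appears {count} time(s) but is not in the character set")
```

Characters reported here would either need to be added to the character set or cleaned out of the annotations before training.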
@Zalways It's been a while since I did this. Hmm... I think that's all you need. If you set the path correctly for training, it will work.
Hi, I'd like to train with different language datasets, such as Chinese, Korean, and Japanese, so I have to change the character set rather than the default setting.
Some Detectron-based models provide a character set configuration, but I can't find one here.
Can you guide me on how to change the character set?