How to change character set? #12

jeong-tae · 2022-09-21T05:41:02Z

Hi, I'd like to train on datasets in other languages, such as Chinese, Korean, and Japanese, so I need to replace the default character set.

Some Detectron-based models expose a character set configuration, but I can't find one here.
Can you guide me on how to change the character set?

jeong-tae · 2022-10-18T07:05:07Z

For Chinese characters, I changed VOCA_SIZE to the length of chn_cls_list, a list that circulates among many Chinese text spotting repositories, and set BATEXT SIZE and CLS to voca_size and chn_cls, respectively.
It works.

I will try other languages too and let you know if it works. Hope it helps.
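
For anyone who does not have a ready-made list like chn_cls_list, here is a minimal sketch of how a character list and its size could be derived from a dataset's transcriptions. The helper name `build_charset` and the toy labels are hypothetical and not part of this repository. Whether the configured size needs extra slots (for example, for an unknown or padding class) depends on the recognizer, so check that before training.

```python
# Hypothetical helper: not part of the repository, shown only to illustrate
# how a character list and vocabulary size relate to each other.
def build_charset(transcriptions):
    """Return the unique characters in the transcriptions, sorted for a stable index order."""
    return sorted({ch for text in transcriptions for ch in text})


# Toy transcriptions standing in for the labels of a real dataset.
transcriptions = ["你好", "世界", "你们"]

chn_cls_list = build_charset(transcriptions)
voc_size = len(chn_cls_list)  # the value the vocabulary-size setting would be based on

print(voc_size, chn_cls_list)
```

The sorted order matters only in that it must be identical at training, inference, and evaluation time, since each class index is tied to a position in this list.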

learningsteady0J0 · 2022-10-19T06:45:36Z

@jeong-tae

Thank you very much for sharing.

Did you use the pre-trained polygonal model, the one trained on syntext data (English only)?

I'm trying to train on Korean. Currently the ctrl point loss is very high, at 40-50. Is there any tip for lowering it?

jeong-tae · 2022-10-19T12:32:58Z

I trained the model from scratch on the Chinese dataset.
I think a model pre-trained on syntext_data (English only) may not train well on a Korean dataset,
because the recognizers for the English and Korean datasets do not share the same labels.
(For example, 'A', 'B', 'C' may be classes 1, 2, 3 in the English set but 10, 11, 12 in the Korean set.)

If training runs long enough, it may still fit to Korean, but I am not sure.
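
The label mismatch described above can be sketched in a few lines. The two character lists here are made up for illustration and are not the actual dictionaries used by any model.

```python
# Illustrative character lists only; real dictionaries are much larger.
english_chars = list("ABC")
korean_chars = list("가나다") + list("ABC")  # Latin letters placed after Hangul


def char_to_index(chars):
    """Map each character to its class index (its position in the list)."""
    return {ch: i for i, ch in enumerate(chars)}


en = char_to_index(english_chars)
ko = char_to_index(korean_chars)

# Characters whose class index differs between the two label spaces.
mismatched = [ch for ch in en if ko.get(ch) != en[ch]]
print(mismatched)
```

Every shared character ends up in `mismatched` here, which is why recognition weights pre-trained on one label space do not transfer directly to the other unless the shared characters keep the same indices.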

milely · 2022-11-10T08:01:00Z

@jeong-tae Thanks for sharing your experience. Can the model achieve good results when switching to a Chinese dataset with a large number of categories? Could you share your results if possible? Thanks in advance.

jeong-tae · 2022-11-10T09:08:35Z

@milely With a Chinese character set of about 5700 characters (I'm not sure of the exact size), it works very well. I submitted the results to ICDAR ReCTS and it ranked maybe 7th or 10th. Anyway, it works well.

milely · 2022-11-10T09:38:39Z

@jeong-tae Thank you very much for sharing, I will also try other languages.

ninoogo2 · 2022-11-21T06:53:45Z

@jeong-tae

You're a god. Thank you so much

I'm also experimenting with Chinese, but I keep getting errors.
Could you share the code you used?

I'll analyze the code on my own. Please
May there be infinite glory in your future

jeong-tae · 2022-11-27T05:13:16Z

@ninoogo2 Sorry, I can't share my code, but you can easily modify your own code to make it work. Just set your character set.

zx1239856 · 2022-12-01T22:22:13Z

Thanks for your interest, everyone. @jeong-tae's approach is correct.

I'd like to add that you may refer to AdelaiDet (which contains ABCNet and ABCNet v2 implementations) for training on non-Latin datasets, e.g. Chinese.

Link: https://github.com/aim-uofa/AdelaiDet/blob/master/configs/BAText/ReCTS/v2_chn_attn_R_50.yaml#L17-L18

A larger VOC_SIZE (5000+) is used with a custom dictionary for inference and evaluation.

Pretraining can be leveraged to enhance performance. You may use a mix of the ChnSyn, ReCTS, and LSVT datasets as in ABCNet (https://github.com/aim-uofa/AdelaiDet/blob/master/configs/BAText/Pretrain/Base-Chn-Pretrain.yaml) and finetune on ReCTS. Since the annotations provided by ABCNet are Bezier curves, they are compatible with the Bezier variant of our model if you don't want to convert annotations.
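
As a rough illustration of why the custom dictionary is needed at inference and evaluation time, the sketch below maps predicted class indices back to characters. The function name `decode`, the tiny dictionary, and the rule of skipping out-of-range indices (treated here as padding) are assumptions for illustration, not the actual decoding code of this repository or AdelaiDet.

```python
# Hypothetical decoding step: names and the padding rule are assumptions.
def decode(indices, charset):
    """Turn predicted class indices into a string.

    Indices outside the character set are skipped; here they are assumed
    to stand for padding or an unknown class.
    """
    return "".join(charset[i] for i in indices if 0 <= i < len(charset))


# Tiny stand-in for a 5000+ entry custom dictionary.
custom_dict = ["世", "你", "好", "界"]

text = decode([1, 2, 4], custom_dict)  # index 4 is out of range and is skipped
print(text)
```

If the dictionary used for decoding differs from the one used during training, in content or in order, the same indices map to different characters and the evaluation scores become meaningless.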

jeong-tae · 2022-12-06T11:50:03Z

@learningsteady0J0

I am trying to train on a Korean dataset and I can't reproduce the issue you mentioned.
All the training losses are higher than for other languages; I think this is because I trained from scratch without pretraining.
The ctr point loss is a little high, but not 40~50...

Anyway, the loss is so large that I failed to train well on the Korean set.

learningsteady0J0 · 2022-12-08T13:20:19Z

@jeong-tae
I really appreciate your attention!
I think I improved the performance a little by continuing the experiment.
However, there is still a problem: the detection precision is relatively low. Is there a strategy for raising it?

jeong-tae · 2022-12-09T01:44:50Z

@learningsteady0J0 Are you trying to reproduce the ICDAR15 result?
If you follow the experiment setup closely, you should get a good result.

If you trained on a Korean set and then evaluated on ICDAR15, ...I don't know. The two may have different distributions, so you can't tune precisely.

I found an error in my Korean set and fixed it. It looks like it will work well now.

Zalways · 2023-05-30T01:50:08Z

"For Chinese characters, I changed VOCA_SIZE to the length of chn_cls_list, a list that circulates among many Chinese text spotting repositories, and set BATEXT SIZE and CLS to voca_size and chn_cls, respectively. It works.

I will try other languages too and let you know if it works. Hope it helps."

You mentioned setting BATEXT SIZE and CLS to voca_size and chn_cls, but I can't find these parameters in the default config files. Could you help me with a Chinese config file?

I just added these settings to the config file. Is there anything else I need to add?
Looking forward to your reply! Thanks!

jeong-tae · 2023-05-30T12:37:37Z

@Zalways It's been a while since I did this. Hmm... I think that's all you need. If you set the paths correctly for training, it will work.

zx1239856 mentioned this issue Dec 1, 2022

Did you use other scripts beside Latin in MLT2017 for pre-training? #13

Closed
