[Bug] Fix prepare icdar2013 dataset with lmdb flag #1822
base: dev-1.x
…g prepare_dataset.py on test set option 1 when lmdb flag
Codecov Report
Patch and project coverage have no change.

Additional details and impacted files

@@           Coverage Diff            @@
##           dev-1.x    #1822   +/-   ##
========================================
  Coverage    89.19%   89.19%
========================================
  Files          193      193
  Lines        11311    11311
  Branches      1607     1607
========================================
  Hits         10089    10089
  Misses         902      902
  Partials       320      320

Flags with carried forward coverage won't be shown.
In IC13, there are some wrong labels in the original annotation and they are manually fixed by researchers. Therefore, these arguments are intentionally left empty, as we just use the corrected labels (L27-L33) instead of the generated ones.
That is to say, you should be aware of a few wrong labels in your lmdb now, and this PR needs to be revised to support such a case.
Thanks for the reminder. Are there any known issues in datasets other than ICDAR 2013, so that I can exclude them from training?
@hugotong6425 Currently not. But it's also fine to mix lmdb and json datasets for training & testing, so you can probably just work with IC13 json annotations. Back to this PR - do you still want to go ahead and enhance the lmdb flag so that it could handle this case - essentially just doing a conversion from
I can help handle this case, but I am not very familiar with the framework. Should I create a new Parser class to replace

# original code
gatherer=dict(type='MonoGatherer', ann_name='train.txt'),
parser=dict(
    type='ICDARTxtTextRecogAnnParser', separator=', ',
    format='img, text'),  # noqa
packer=dict(type='TextRecogPacker'),
dumper=dict(type='JsonDumper'),
Thanks. Actually, it can be done in a simpler way. Dumping textrecog datasets to lmdb is currently handled in mmocr/tools/dataset_converters/prepare_dataset.py (lines 83 to 101 at bb591d2).
However, that approach does not work when we don't want the annotations to be converted, because the dumper is then never executed. In this case, we need to perform a post-conversion of the annotations & images into lmdb format (right after
textrecog_{train/test}.lmdb. We also have a function recog2lmdb for reference, but note that its input annotation format is slightly different from MMOCR's.
Also, if you are interested, you can join the MMSIG group (WeChat: OpenMMLabwx) and earn some points for your contributions :)
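For readers following the thread, the post-conversion described in the comment above could look roughly like the sketch below. It is not the code from this PR or from MMOCR. The function names (iter_samples, build_lmdb_cache, write_lmdb) are hypothetical, and the json layout (a "data_list" of items with "img_path" and "instances"[0]["text"]) and the 1-based image-%09d / label-%09d / num-samples keys are assumptions based on the common text recognition lmdb convention. Check them against recog2lmdb and the lmdb dataset loader before relying on them.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, Tuple


def iter_samples(ann_file: str, img_root: str) -> Iterator[Tuple[Path, str]]:
    """Yield (image path, text) pairs from a recognition json annotation.

    Assumed layout (not verified against MMOCR):
    {"data_list": [{"img_path": ..., "instances": [{"text": ...}]}]}
    """
    with open(ann_file, encoding='utf-8') as f:
        data_list = json.load(f)['data_list']
    for item in data_list:
        # Assumption: img_path is relative to img_root.
        yield Path(img_root) / item['img_path'], item['instances'][0]['text']


def build_lmdb_cache(ann_file: str,
                     img_root: str,
                     encoding: str = 'utf-8') -> Dict[bytes, bytes]:
    """Collect the key/value pairs that would be written to the lmdb file."""
    cache: Dict[bytes, bytes] = {}
    count = 0
    # Keys are 1-based here, following the common convention (assumption).
    for count, (img_path, text) in enumerate(
            iter_samples(ann_file, img_root), start=1):
        cache[f'image-{count:09d}'.encode(encoding)] = img_path.read_bytes()
        cache[f'label-{count:09d}'.encode(encoding)] = text.encode(encoding)
    cache[b'num-samples'] = str(count).encode(encoding)
    return cache


def write_lmdb(cache: Dict[bytes, bytes],
               output: str,
               map_size: int = 1 << 30) -> None:
    """Write the collected pairs to an lmdb environment at `output`."""
    import lmdb  # third-party package, imported lazily

    env = lmdb.open(output, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in cache.items():
            txn.put(key, value)
    env.close()
```

The sketch keeps every image in memory before writing, which is fine for a dataset the size of IC13 but would need batching for larger ones. Because it reads the existing json annotation, the manually corrected IC13 labels would be carried over unchanged.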
…n running prepare_dataset.py on test set option 1 when lmdb flag" This reverts commit 0fcac7e.
cfg.dataset_name = dataset
if args.lmdb:
    cfg = force_lmdb(cfg)
preparer = DatasetPreparer.from_file(cfg)
# convert annotations & images into lmdb format
img_root = get_img_root(cfg)
label_path = get_label_path(cfg)
output = get_output_dir(cfg)
func_similar_to_recog2lmdb(img_root, label_path, output)
preparer.run(args.splits)

I am not sure if I understood you correctly. But if we perform post-conversion on the annotations & images into lmdb format right after
Please let me know if I have misunderstood. Thanks!
My mistake. The conversion should be done after L149 (
Hi @hugotong6425, would you still be available to work on this PR?
Yes I can, but I am currently on vacation so I will come back to it next week. If others want to handle this issue, please feel free to close this PR. Happy Easter holiday btw!
Gotcha. I just wanted to know if you are still interested, so we can keep this PR open then. Happy Easter too!
Motivation
When I run
with
I got an error
Modification
Fix the above error.
Checklist
Before PR:
After PR: