hOCR files converted from PAGE contain every TextEquiv of a segment instead of a single variant, and, once fontshape has been run, also the style determined by fontshape.
I start with an empty workspace, add an image to it, and run
ocrd process "tesserocr-recognize -P segmentation_level region -P textequiv_level word -P find_tables true -P model pol -I images -O OCR-D-OCR"
then I annotate it with
ocrd-tesserocr-fontshape -I OCR-D-OCR -O OCR-D-OCR-FONTSHAPE -P model pol
and finally convert it to hOCR with
ocrd-fileformat-transform -I OCR-D-OCR-FONTSHAPE -O hocr -P from-to "page hocr"
In the resulting file every word/segment appears twice, and three times when fontshape is used.
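To narrow down where the extra copies come from, it may help to check how many TextEquiv children each Word carries in the PAGE file before conversion. The following is only a minimal diagnostic sketch, not part of any OCR-D tool: the function names `local_name` and `textequiv_counts` are made up for illustration, and it assumes the PAGE file is well-formed XML with `Word` elements that carry an `id` attribute. Namespaces are ignored so that it does not depend on a particular PAGE schema version.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def textequiv_counts(page_xml):
    """Map each Word id to the number of its direct TextEquiv children.

    page_xml is the content of a PAGE file (str or bytes).
    Hypothetical helper for diagnosis only.
    """
    root = ET.fromstring(page_xml)
    counts = {}
    for elem in root.iter():
        if local_name(elem.tag) == "Word":
            counts[elem.get("id")] = sum(
                1 for child in elem if local_name(child.tag) == "TextEquiv"
            )
    return counts


def words_with_several_textequivs(page_xml):
    """Return only the words that carry more than one TextEquiv."""
    return {wid: n for wid, n in textequiv_counts(page_xml).items() if n > 1}
```

If `words_with_several_textequivs` returns entries for a file from OCR-D-OCR or OCR-D-OCR-FONTSHAPE, the PAGE input already holds several TextEquiv variants per word, which would match the doubling seen after conversion. An empty result would instead suggest the duplication is introduced elsewhere.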