
Simplify using uzn files with tesseract for OCR


scourtois/tesseract-uzn


tesseract-uzn: Easy uzn files with tesseract

Simplify the process of feeding zone files (.uzn files) to tesseract for region-based OCR. It creates one text file for each image you process.

Usage

tesseract-uzn uznfile.uzn imagefile.png

Use on multiple files

We now support wildcards! Hooray.

tesseract-uzn uznfile.uzn *.png
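The shell, not the script, expands `*.png` into a list of filenames before tesseract-uzn ever runs. A wrapper like this therefore only needs to treat its first argument as the zone file and loop over the rest. Below is a minimal dry-run sketch of that argument handling. `plan_ocr` is a hypothetical name, and the copy-then-run steps and the `--psm 4` flag are assumptions about how such a wrapper could work, not the repository's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch of how "tesseract-uzn zones.uzn *.png" could be
# handled. It only prints the commands it would run; it is NOT the
# repository's actual script.
set -euo pipefail

plan_ocr() {
  if [ "$#" -lt 2 ]; then
    echo "usage: plan_ocr ZONEFILE.uzn IMAGE..." >&2
    return 1
  fi
  local uzn="$1"
  shift  # the remaining arguments are the images, already expanded by the shell

  local img base
  for img in "$@"; do
    base="${img%.*}"  # strip the extension: scans/page1.png -> scans/page1
    echo "cp $uzn $base.uzn"
    echo "tesseract $img $base --psm 4"
  done
}

plan_ocr zones.uzn page1.png page2.png
```

Note that if `*.png` matches nothing, bash passes the literal string `*.png` through unchanged (unless the `nullglob` option is set), so a real script may want to check that each image exists.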

Installation

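Download the repository as a zip file using the button at the top right, then unzip it. After that, cd into the unzipped folder and move the tesseract-uzn script to /usr/local/bin with the following command (you may need sudo if that directory is not writable by your user):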

mv tesseract-uzn /usr/local/bin

That way, tesseract-uzn is available no matter which directory you are in.
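To confirm the install worked, you can check that both tesseract-uzn and tesseract itself are found on your PATH. The sketch below uses the shell's `command -v` builtin. `check_cmd` is a hypothetical helper name, not part of this repository. If tesseract-uzn is found but fails with "Permission denied", running `chmod +x /usr/local/bin/tesseract-uzn` should make it executable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command can be found on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1 -> $(command -v "$1")"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# tesseract-uzn is a wrapper, so tesseract itself must be installed too.
for cmd in tesseract-uzn tesseract; do
  check_cmd "$cmd" || true  # keep checking the remaining commands
done
```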

Testing

Run the test script from the repository folder. It is a little chatty.

./test.sh

Backstory

If you want tesseract to pay attention to only certain areas of your image, you use a uzn file. They look like this:

  395   368  1633    78 Text/Latin
 2030   368  1634    78 Text/Greek
  388   478  1633  2275 Text/Greek
 2031   478  1634  2275 Text/Latin
  396  2852  1633  1002 Text/Greek
 2018  2852  1634  1002 Text/Latin
  471  3960  1565    75 Text/Latin
 1639  4141   685    62 AppCrit
  394  4293  3249  1482 AppCrit
 4078   462     5   606 AppCrit
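Each line of a uzn file describes one rectangular zone in pixels: the left edge, the top edge, the width, and the height, followed by a zone type such as Text/Latin or AppCrit. This is the UNLV zone layout that tesseract reads, and the sketch below assumes that column order. A quick way to sanity-check a zone file is to print each zone's corners with awk. `show_zones` is a hypothetical helper, not part of tesseract-uzn.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each zone in a uzn file as a rectangle,
# assuming the column order: left, top, width, height, zone type.
# Returns non-zero if any non-blank line is malformed.
show_zones() {
  awk '
    NF == 0 { next }   # skip blank lines
    {
      # the first four fields must be non-negative integers
      ok = (NF >= 4)
      for (i = 1; i <= 4 && ok; i++) if ($i !~ /^[0-9]+$/) ok = 0
      if (!ok) {
        printf "line %d: malformed: %s\n", NR, $0 > "/dev/stderr"
        bad = 1
        next
      }
      # right edge = left + width, bottom edge = top + height
      printf "zone %d: (%d,%d)-(%d,%d) %s\n", ++n, $1, $2, $1 + $3, $2 + $4, (NF >= 5 ? $5 : "?")
    }
    END { exit bad }
  ' "$1"
}
```

Run against the example above, `show_zones zones.uzn` would start with `zone 1: (395,368)-(2028,446) Text/Latin`.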

When you use a zone file, though, the .uzn file needs to have the same name as the image file (for example, page1.uzn for page1.png). That gets tedious, so I made this tool to make things a little simpler.
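Done by hand for one image, the same-name rule means copying the zone file next to the image under the image's base name, then calling tesseract. Below is a minimal sketch under stated assumptions. `ocr_with_zones` is a hypothetical name, and tesseract 4 or later is assumed for the `--psm` flag. Page segmentation modes 1 to 3 (3 is the default) run tesseract's own layout analysis and ignore the .uzn file, so the sketch passes `--psm 4`. That flag is an assumption and is not necessarily what tesseract-uzn itself uses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: OCR one image with a shared zone file by copying the
# zone file to the name tesseract looks for (image base name + .uzn).
# Assumes tesseract 4 or later (for the --psm flag) is on PATH.
set -euo pipefail

ocr_with_zones() {
  local uzn="$1" img="$2"
  local base="${img%.*}"   # scans/page1.png -> scans/page1

  # tesseract looks for <base>.uzn next to the image; skip the copy if the
  # zone file already has that name.
  if [ "$uzn" != "$base.uzn" ]; then
    cp "$uzn" "$base.uzn"
  fi

  # --psm 4 skips automatic layout analysis, so the .uzn zones are used.
  # tesseract writes the recognized text to <base>.txt.
  tesseract "$img" "$base" --psm 4
}
```

For example, `ocr_with_zones zones.uzn scans/page1.png` would create scans/page1.uzn and then scans/page1.txt.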

Installation

Put this script on your PATH if you'd like, and run it as shown above! You'll need to have tesseract installed first.

You can use Kull to make UZN files if you'd like.

TODO

  • non-stdout input
  • Windows support
  • Accept wildcards
  • Tests


Languages

  • Shell 100.0%