Welcome to our repository, where we have compiled a diverse range of Layout Analysis (LA) and Optical Character Recognition (OCR) models. This collection aims to give researchers, developers, and hobbyists easy access to a variety of LA and OCR models.
Optical Character Recognition (OCR) is a field of study concerned with converting images of typed, handwritten, or printed text into machine-encoded text. OCR technology is used to digitize printed texts so that they can be electronically edited, searched, stored more compactly, and used in machine processes such as machine translation, text-to-speech, and data mining.
We provide LA and OCR models for different OCR engines (Kraken and Tesseract).
The repository is structured as follows:
├── LICENSE.md
└── data
    └── OCR-Model (as Git submodule)
Here is our catalogue of LA and OCR models:
Model | OCR-Engine | Type of model | Description | Default model |
---|---|---|---|---|
German print | Kraken | Text recognition | Kraken model for German prints, trained on several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
German print | Tesseract | Text recognition | Tesseract OCR model for German prints, trained on several datasets. Best model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
German print | Tesseract | Text recognition | Tesseract OCR model for German prints, trained on several datasets. Fast model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
German newspapers | Kraken | Text recognition | Kraken model with kraken topology for German newspapers, trained on several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
German newspapers | Kraken | Text recognition | Kraken model with sgd topology for German newspapers, trained on several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
German newspapers | Kraken | Text recognition | Kraken model with htr+ topology for German newspapers, trained on several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
German newspapers | Kraken | Text recognition | Kraken model with htru topology for German newspapers, trained on several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
German newspapers | Kraken | Text recognition | Kraken model with gpt topology for German newspapers, trained on several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
German newspapers | Kraken | Text recognition | Kraken (default) model for German newspapers, trained on several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
German newspapers | Tesseract | Text recognition | Tesseract OCR model for German newspapers, trained on several datasets. Best model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-german-newspapers | Download |
German newspapers | Tesseract | Text recognition | Tesseract OCR model for German newspapers, trained on several datasets. Fast model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-german-newspapers | Download |
UBMA Segmentation | Kraken | Layout analysis | Kraken segmentation model for a wide range of materials. | Download |
Historical Reports 2col | Kraken | Layout analysis | Kraken segmentation model for two-column layouts. | Download |
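
As a rough sketch of how a downloaded model might be used, the Python helpers below assemble command lines for the Tesseract and Kraken CLIs. The file names (`german_print.traineddata`, `rec.mlmodel`, `seg.mlmodel`) are hypothetical placeholders, and the flags follow the Tesseract (`--tessdata-dir`, `-l`) and Kraken (`segment -bl`, `segment -i`, `ocr -m`) command-line interfaces as we understand them; please check them against the versions you have installed.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def tesseract_cmd(image: str, out_base: str, traineddata: str) -> List[str]:
    """Build a Tesseract call for a downloaded .traineddata file.

    Tesseract selects a model by name (-l) and looks for <name>.traineddata
    in the directory passed via --tessdata-dir.
    """
    model = Path(traineddata)  # hypothetical path to a downloaded model
    return [
        "tesseract", image, out_base,
        "--tessdata-dir", str(model.parent),
        "-l", model.stem,
    ]


def kraken_cmd(image: str, out_txt: str, rec_model: str,
               seg_model: Optional[str] = None) -> List[str]:
    """Build a Kraken call: baseline segmentation, then text recognition."""
    cmd = ["kraken", "-i", image, out_txt, "segment", "-bl"]
    if seg_model is not None:
        # Use a layout-analysis model (e.g. a segmentation model from the
        # catalogue) instead of Kraken's built-in default segmenter.
        cmd += ["-i", seg_model]
    cmd += ["ocr", "-m", rec_model]
    return cmd


if __name__ == "__main__":
    # Only prints the commands; pass the lists to subprocess.run(cmd, check=True)
    # to actually execute them.
    print(shlex.join(tesseract_cmd("page.png", "page", "models/german_print.traineddata")))
    print(shlex.join(kraken_cmd("page.png", "page.txt", "rec.mlmodel", "seg.mlmodel")))
```

Building argument lists rather than shell strings avoids quoting problems with file names that contain spaces.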
See the LICENSE.md file in this repository for licensing details.
For any queries or suggestions, feel free to open an issue in this repository, or contact us at OCR-Helpdesk. Thank you for exploring our OCR Models Collection. We hope this repository aids you in your text recognition projects and research!