OCR transformer model

A text-line recognition model, implemented in PyTorch and specialised for recognising multi-script, multilingual lines that contain polytonic Greek alongside other scripts and languages.

This custom model was trained on ~6.2M artificially generated lines and 350k real-world lines. It reaches a character-level accuracy of 98.2% on lines containing mixed Latin and Greek alphabets (an improvement of 8% over our Tesseract baseline).
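For readers unfamiliar with the metric, the sketch below shows one common way to compute character-level accuracy: one minus the total edit distance divided by the total number of reference characters. This README does not state how the reported figure was computed, so the formula, the NFC normalisation step, and the function names `levenshtein` and `character_accuracy` are all assumptions made for illustration.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(references, predictions) -> float:
    """Corpus-level accuracy: 1 - (total edit distance / total reference characters).

    Assumption: both sides are NFC-normalised first, so that a polytonic Greek
    letter written as base letter + combining accent compares equal to its
    precomposed form.
    """
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, predictions):
        ref = unicodedata.normalize("NFC", ref)
        hyp = unicodedata.normalize("NFC", hyp)
        total_errors += levenshtein(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return 1.0 - total_errors / total_chars
```

Calling `character_accuracy(ground_truth_lines, ocr_output_lines)` on two parallel lists of strings returns a value that is at most 1.0. It can drop below 0 when the predictions contain many spurious insertions.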

The model will be released in early 2025, together with data and documentation.

Acknowledgements

The code and data in this repository were produced in the context of the Ajax Multi-Commentary project, funded by the Swiss National Science Foundation under Ambizione grant PZ00P1_186033.

AjaxMultiCommentary/OCR-transformer-model

Folders and files

Latest commit

History

Repository files navigation

OCR transformer model

Acknowledgements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages