[Automatic] Update README and METADATA
github-actions[bot] committed Feb 8, 2024
1 parent 3fd5deb commit af9a4ac
Showing 19 changed files with 1,116 additions and 5 deletions.
18 changes: 18 additions & 0 deletions README.md
@@ -27,7 +27,25 @@ The structure of the repo is the following:

Here's our OCR Model Catalogue:


<!-- Models !-->
## 📂 Models
Model|OCR-Engine|Type of model|Description|Default model
---|---|---|---|---
[German print](data/german-print/data/kraken/text/german_print)|Kraken|Text recognition|Kraken model for german prints trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print|<a href="https://github.com/JKamlah/german-print-ocr-model/tree/main/data/kraken/text/german_print/german_print.mlmodel" download>Download</a>
[German print](data/german-print/data/tesseract/best/german_print)|Tesseract|Text recognition|OCR model for german prints trained from several datasets. Best model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print|<a href="https://github.com/JKamlah/german-print-ocr-model/tree/main/data/tesseract/best/german_print/german_print_20.traineddata" download>Download</a>
[German print](data/german-print/data/tesseract/fast/german_print)|Tesseract|Text recognition|OCR model for german prints trained from several datasets. Fast model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print|<a href="https://github.com/JKamlah/german-print-ocr-model/tree/main/data/tesseract/fast/german_print/german_print_20.traineddata" download>Download</a>
[German newspapers](data/german-newspapers/data/kraken/text/german_newspapers_topologies/kraken)|Kraken|Text recognition|Kraken model with kraken topology for german newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print|<a href="https://github.com/JKamlah/german-newspapers-ocr-model/tree/main/data/kraken/text/german_newspapers/german_newspapers_kraken.mlmodel" download>Download</a>
[German newspapers](data/german-newspapers/data/kraken/text/german_newspapers_topologies/sgd)|Kraken|Text recognition|Kraken model with sgd topology for german newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print|<a href="https://github.com/JKamlah/german-newspapers-ocr-model/tree/main/data/kraken/text/german_newspapers/german_newspapers_sgd.mlmodel" download>Download</a>
[German newspapers](data/german-newspapers/data/kraken/text/german_newspapers_topologies/htr+)|Kraken|Text recognition|Kraken model with htr+ topology for german newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print|<a href="https://github.com/JKamlah/german-newspapers-ocr-model/tree/main/data/kraken/text/german_newspapers/german_newspapers_htr.mlmodel" download>Download</a>
[German newspapers](data/german-newspapers/data/kraken/text/german_newspapers_topologies/htru)|Kraken|Text recognition|Kraken model with htru topology for german newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print|<a href="https://github.com/JKamlah/german-newspapers-ocr-model/tree/main/data/kraken/text/german_newspapers/german_newspapers_htru.mlmodel" download>Download</a>
[German newspapers](data/german-newspapers/data/kraken/text/german_newspapers_topologies/gpt)|Kraken|Text recognition|Kraken model with gpt topology for german newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print|<a href="https://github.com/JKamlah/german-newspapers-ocr-model/tree/main/data/kraken/text/german_newspapers/german_newspapers_gpt.mlmodel" download>Download</a>
[German newspapers](data/german-newspapers/data/kraken/text/german_newspapers)|Kraken|Text recognition|Kraken (default) model for german newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print|<a href="https://github.com/JKamlah/german-newspapers-ocr-model/tree/main/data/kraken/text/german_newspapers/german_newspapers.mlmodel" download>Download</a>
[German newspapers](data/german-newspapers/data/tesseract/best/german_newspapers)|Tesseract|Text recognition|OCR model for german newspapers trained from several datasets. Best model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-german-newspapers|<a href="https://github.com/JKamlah/german-newspapers-ocr-model/tree/main/data/tesseract/best/german_newspapers/german_newspapers_2023.traineddata" download>Download</a>
[German newspapers](data/german-newspapers/data/tesseract/fast/german_newspapers)|Tesseract|Text recognition|OCR model for german newspapers trained from several datasets. Fast model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-german-newspapers|<a href="https://github.com/JKamlah/german-newspapers-ocr-model/tree/main/data/tesseract/fast/german_newspapers/german_newspapers_2023.traineddata" download>Download</a>
[UBMA Segmentation](data/ubma-segmentation/data/kraken/layout/ubma_segmentation)|Kraken|Layout analysis|Kraken segmentation model for a wide range of materials.|<a href="https://github.com/JKamlah/ubma-segmentation-ocr-model/blob/main/data/kraken/layout/ubma_segmentation/ubma_segmentation.mlmodel" download>Download</a>
[Historical Reports 2col](data/historical-reports-2col/data/kraken/layout/historical_reports_2col)|Kraken|Layout analysis|A Kraken segmentation model for 2 column layout.|<a href="https://github.com/JKamlah/historical-reports-2col-ocr-model/blob/main/data/kraken/layout/historical_reports_2col/historical_reports_2col.mlmodel" download>Download</a>

<!-- /Models !-->
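
The catalogue between the `<!-- Models !-->` and `<!-- /Models !-->` markers is a plain pipe-separated Markdown table, so it can be read back into records. A minimal sketch, assuming the README text is already loaded as a string and that no cell contains a literal `|`; `parse_catalogue` is a hypothetical helper name, not part of this repository:

```python
# Markers copied from the README diff above.
START = "<!-- Models !-->"
END = "<!-- /Models !-->"


def parse_catalogue(readme: str) -> list[dict[str, str]]:
    """Return one dict per table row, keyed by the header cells."""
    # Keep only the text between the two HTML comment markers.
    block = readme.split(START, 1)[1].split(END, 1)[0]
    # Table lines are the ones containing a pipe; the heading line has none.
    lines = [ln for ln in block.splitlines() if "|" in ln]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the ---|--- separator
        cells = [cell.strip() for cell in ln.split("|")]
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows
```

Filtering the result on the `OCR-Engine` or `Type of model` key then gives, for example, all Kraken layout-analysis entries.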


1 change: 1 addition & 0 deletions docs/_config.yml
@@ -0,0 +1 @@
theme: jekyll-theme-dinky
@@ -0,0 +1,28 @@
<link rel="stylesheet" href="../../../../../../table_hide.css"/>
<div>
<h1 id="title">German newspapers</h1>
<p id="paragraph">Kraken (default) model for german newspapers trained from several datasets.
See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Kraken</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.mlmodel</dd>
<dt id="Topology">Topology:</dt>
<dd>[1,120,0,1 Cr{C_0}3,13,32 Do{Do_1}0.1,2 Mp{Mp_2}2,2 Cr{C_3}3,13,32 Do{Do_4}0.1,2 Mp{Mp_5}2,2 Cr{C_6}3,9,64 Do{Do_7}0.1,2 Mp{Mp_8}2,2 Cr{C_9}3,9,64 Do{Do_10}0.1,2 S{S_11}1(1x0)1,3 Lbx{L_12}200 Do{Do_13}0.1,2 Lbx{L_14}200 Do.{Do_15}1,2 Lbx{L_16}200 Do{Do_17} O{O_18}1c264]</dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>39</dd>
</dl>
</div>
@@ -0,0 +1,28 @@
<link rel="stylesheet" href="../../../../../../../table_hide.css"/>
<div>
<h1 id="title">German newspapers</h1>
<p id="paragraph">Kraken model with gpt topology for german newspapers trained from several datasets.
See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Kraken</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.mlmodel</dd>
<dt id="Topology">Topology:</dt>
<dd>[1,120,0,1 Cr{C_0}3,3,32,1,1 Gn{Gn_1}32 Mp{Mp_2}2,2 Cr{C_3}3,3,64,1,1 Gn{Gn_4}64 Mp{Mp_5}2,2,2,2 Cr{C_6}3,3,128,1,1 Gn{Gn_7}128 Mp{Mp_8}2,2,2,2 Cr{C_9}3,3,256,1,1 Gn{Gn_10}256 Mp{Mp_11}2,2,2,2 S{S_12}1(1x0)1,3 Lbx{L_13}256 Do{Do_14}0.2 Lbx{L_15}256 Do{Do_16}0.2 Lbx{L_17}256 Do{Do_18}0.2 O{O_19}1c264]</dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>36</dd>
</dl>
</div>
@@ -0,0 +1,28 @@
<link rel="stylesheet" href="../../../../../../../table_hide.css"/>
<div>
<h1 id="title">German newspapers</h1>
<p id="paragraph">Kraken model with htr+ topology for german newspapers trained from several datasets.
See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Kraken</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.mlmodel</dd>
<dt id="Topology">Topology:</dt>
<dd>[1,128,0,1 Cr{C_0}4,2,8,4,2 Cr{C_1}4,2,32,1,1 Mp{Mp_2}4,2,4,2 Cr{C_3}3,3,64,1,1 Mp{Mp_4}1,2,1,2 S{S_5}1(1x0)1,3 Lbx{L_6}256 Do{Do_7}0.5 Lbx{L_8}256 Do{Do_9}0.5 Lbx{L_10}256 Do{Do_11}0.5 O{O_12}1c264]</dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>36</dd>
</dl>
</div>
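
The Topology fields in these METADATA files are VGSL-style network specifications as used by Kraken: a bracketed string whose first token gives the input dimensions (batch, height, width, channels) and whose remaining tokens each describe one layer, optionally labelled with a `{name}`. The following is a rough sketch that only splits such a string into its parts. It does not validate layer arguments, and `split_vgsl` is a hypothetical helper, not a Kraken API.

```python
import re

# Layer token: letters for the layer type, an optional {name}, then the arguments.
LAYER_RE = re.compile(r"^([A-Za-z]+)(?:\{([^}]*)\})?(.*)$")


def split_vgsl(spec: str):
    """Split a VGSL spec into (input_dims, [(layer_type, name, args), ...])."""
    tokens = spec.strip().strip("[]").split()
    # First token: batch, height, width, channels.
    input_dims = tuple(int(x) for x in tokens[0].split(","))
    layers = []
    for tok in tokens[1:]:
        m = LAYER_RE.match(tok)
        if m is None:
            raise ValueError(f"unrecognised layer token: {tok!r}")
        # name is None when the token carries no {name} label.
        layers.append((m.group(1), m.group(2), m.group(3)))
    return input_dims, layers


# Topology string copied from the htr+ model metadata above.
HTR_PLUS = (
    "[1,128,0,1 Cr{C_0}4,2,8,4,2 Cr{C_1}4,2,32,1,1 Mp{Mp_2}4,2,4,2 "
    "Cr{C_3}3,3,64,1,1 Mp{Mp_4}1,2,1,2 S{S_5}1(1x0)1,3 Lbx{L_6}256 "
    "Do{Do_7}0.5 Lbx{L_8}256 Do{Do_9}0.5 Lbx{L_10}256 Do{Do_11}0.5 O{O_12}1c264]"
)
input_dims, layers = split_vgsl(HTR_PLUS)
```

Anything that follows the layer type and optional name is kept verbatim as the argument string, so unusual tokens are passed through rather than silently corrected.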
@@ -0,0 +1,28 @@
<link rel="stylesheet" href="../../../../../../../table_hide.css"/>
<div>
<h1 id="title">German newspapers</h1>
<p id="paragraph">Kraken model with htru topology for german newspapers trained from several datasets.
See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Kraken</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.mlmodel</dd>
<dt id="Topology">Topology:</dt>
<dd>[1,120,0,1 Cr{C_0}4,2,32,4,2 Gn{Gn_1}32 Cr{C_2}4,2,64,1,1 Gn{Gn_3}32 Mp{Mp_4}4,2,4,2 Cr{C_5}3,3,128,1,1 Gn{Gn_6}32 Mp{Mp_7}1,2,1,2 S{S_8}1(1x0)1,3 Lbx{L_9}256 Do{Do_10}0.5 Lbx{L_11}256 Do{Do_12}0.5 Lbx{L_13}256 Do{Do_14}0.5 O{O_15}1c264]</dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>36</dd>
</dl>
</div>
@@ -0,0 +1,28 @@
<link rel="stylesheet" href="../../../../../../../table_hide.css"/>
<div>
<h1 id="title">German newspapers</h1>
<p id="paragraph">Kraken model with kraken topology for german newspapers trained from several datasets.
See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Kraken</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.mlmodel</dd>
<dt id="Topology">Topology:</dt>
<dd>[1,120,0,1 Cr{C_0}3,13,32 Do{Do_1}0.1,2 Mp{Mp_2}2,2 Cr{C_3}3,13,32 Do{Do_4}0.1,2 Mp{Mp_5}2,2 Cr{C_6}3,9,64 Do{Do_7}0.1,2 Mp{Mp_8}2,2 Cr{C_9}3,9,64 Do{Do_10}0.1,2 S{S_11}1(1x0)1,3 Lbx{L_12}200 Do{Do_13}0.1,2 Lbx{L_14}200 Do.{Do_15}1,2 Lbx{L_16}200 Do{Do_17} O{O_18}1c264]</dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>39</dd>
</dl>
</div>
@@ -0,0 +1,28 @@
<link rel="stylesheet" href="../../../../../../../table_hide.css"/>
<div>
<h1 id="title">German newspapers</h1>
<p id="paragraph">Kraken model with sgd topology for german newspapers trained from several datasets.
See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Kraken</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.mlmodel</dd>
<dt id="Topology">Topology:</dt>
<dd>[1,144,0,1 Cr4,2,16,1,1 Mp4,2 Cr2,2,48,1,1, Gn24 Mp2,2 Cr2,2,72,1,1 Gn36 Mp2,2 S1(1x0)1,3 Lbx288 Do0.2,2 Lbx288 Do0.2,2 Lbx288]</dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>30</dd>
</dl>
</div>
@@ -0,0 +1,29 @@
<link rel="stylesheet" href="../../../../../../table_hide.css"/>
<div>
<h1 id="title">German newspapers</h1>
<p id="paragraph">OCR model for german newspapers trained from several datasets.
Best model variant for Tesseract.
See https://github.com/UB-Mannheim/kraken/wiki/Training-german-newspapers</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Tesseract</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.traineddata</dd>
<dt id="Topology">Topology:</dt>
<dd></dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>20</dd>
</dl>
</div>
@@ -0,0 +1,29 @@
<link rel="stylesheet" href="../../../../../../table_hide.css"/>
<div>
<h1 id="title">German newspapers</h1>
<p id="paragraph">OCR model for german newspapers trained from several datasets.
Fast model variant for Tesseract.
See https://github.com/UB-Mannheim/kraken/wiki/Training-german-newspapers</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Tesseract</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.traineddata</dd>
<dt id="Topology">Topology:</dt>
<dd></dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>20</dd>
</dl>
</div>
28 changes: 28 additions & 0 deletions docs/data/german-print/data/kraken/text/german_print/METADATA.md
@@ -0,0 +1,28 @@
<link rel="stylesheet" href="../../../../../../table_hide.css"/>
<div>
<h1 id="title">German print</h1>
<p id="paragraph">Kraken model for german prints trained from several datasets.
See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Kraken</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.mlmodel</dd>
<dt id="Topology">Topology:</dt>
<dd></dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>17</dd>
</dl>
</div>
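
Each generated METADATA.md stores its fields as `<dt>`/`<dd>` pairs inside `<dl class="grid">` lists. A minimal sketch of reading those pairs back into a dictionary with Python's standard `html.parser` module; the class and function names here (`DlParser`, `parse_metadata`) are hypothetical and not part of the repository's tooling:

```python
from html.parser import HTMLParser


class DlParser(HTMLParser):
    """Collect <dt>label:</dt><dd>value</dd> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None   # "dt" or "dd" while inside one of them
        self._key = None   # label of the most recent <dt>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._buf).strip()
        if tag == "dt":
            self._key = text.rstrip(":")  # drop the trailing colon of the label
        elif self._key is not None:
            self.fields[self._key] = text  # empty <dd></dd> becomes ""
            self._key = None
        self._tag = None


def parse_metadata(html: str) -> dict:
    parser = DlParser()
    parser.feed(html)
    return parser.fields
```

Empty fields such as Topology and Creation in the file above come back as empty strings, which makes it easy to list which metadata is still missing.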
@@ -0,0 +1,29 @@
<link rel="stylesheet" href="../../../../../../table_hide.css"/>
<div>
<h1 id="title">German print</h1>
<p id="paragraph">OCR model for german prints trained from several datasets.
Best model variant for Tesseract.
See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Tesseract</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.traineddata</dd>
<dt id="Topology">Topology:</dt>
<dd></dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>20</dd>
</dl>
</div>
@@ -0,0 +1,29 @@
<link rel="stylesheet" href="../../../../../../table_hide.css"/>
<div>
<h1 id="title">German print</h1>
<p id="paragraph">OCR model for german prints trained from several datasets.
Fast model variant for Tesseract.
See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print</p>
<h2>Metadata</h2>
<dl class="grid">
<dt id="Language">OCR engine / software:</dt>
<dd>Tesseract</dd>
<dt id="Type">Model type:</dt>
<dd>Text recognition</dd>
<dt id="Format">Format:</dt>
<dd>.traineddata</dd>
<dt id="Topology">Topology:</dt>
<dd></dd>
<dt id="Creation">Creation:</dt>
<dd></dd>
<dt id="License">License:</dt>
<dd>PublicDomainMark 1.0 (see: https://creativecommons.org/publicdomain/mark/1.0/)</dd>
</dl>
<h2>Training</h2>
<dl class="grid">
<dt id="Training-type">Type of training:</dt>
<dd>From scratch</dd>
<dt id="Epochs">Epochs:</dt>
<dd>20</dd>
</dl>
</div>