Test Calamari 1.0.x #8
Comments
PyPI is still at version 0.3.5 – I'll wait until it has 1.0.0. |
https://pypi.org/project/calamari-ocr/ is now at 1.0.1 |
I'm going to have to re-train my GT4HistOCR model for 1.0.x for this update to be useful. |
I'm currently training a new model for 1.0.x, so this is coming. |
This would also avoid wrestling the damn "tensorflow vs tensorflow-gpu problem" |
Unfortunately, it's not enough to just switch to Calamari 1.x. With this change I get hundreds of these messages:
However it seems to produce good results using my model for 1.0 (left: GT text, right: updated ocrd_calamari OCR text): |
Waiting for Calamari-OCR/calamari#180 to review the API usage here. |
AFAIK the warning message about retracing in newer TensorFlow 2 versions doesn't show an error per se but only hints that the prediction can be implemented more efficiently in TensorFlow 2. So the results themselves shouldn't be influenced by it. |
Unfortunate that the logger is not further namespaced, so we cannot selectively disable these log messages, should they indeed turn out to be unproblematic-to-ignore log spam. |
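For what it's worth, even without a dedicated namespace the messages could be dropped by content with a standard `logging.Filter`. This is only a minimal sketch: it assumes the warnings go through the Python logger named `tensorflow` and that their text contains the word "retracing". The names `DropRetracingWarnings` and `silence_retracing` are made up for illustration and are not part of Calamari or ocrd_calamari.

```python
import logging


class DropRetracingWarnings(logging.Filter):
    """Drop log records whose message mentions retracing (assumed wording)."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; everything else passes through.
        return "retracing" not in record.getMessage().lower()


def silence_retracing(logger_name: str = "tensorflow") -> logging.Filter:
    """Attach the filter to the named logger and return it for later removal."""
    log_filter = DropRetracingWarnings()
    logging.getLogger(logger_name).addFilter(log_filter)
    return log_filter
```

Calling `silence_retracing()` once at start-up would be enough, and `logging.getLogger("tensorflow").removeFilter(...)` undoes it. This hides the symptom only and should wait until the warnings are confirmed to be harmless.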
Using Calamari 1.0/TF 2.2 my tests are around 5 times slower and I suspect that the retracing is the issue. I'll have a look if fixing #20 solves the warning problem too, as we're doing the most inefficient prediction - line by line - anyway. |
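To make the "line by line" point concrete, here is a rough sketch of grouping lines by region so that the predictor is called once per region rather than once per line. Everything in it is hypothetical: `predict_batch` stands in for whatever callable actually runs the model, and the `(region_id, image)` pairs are an assumed input shape, not the ocrd_calamari data model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical input shape: each line is a (region_id, image) pair.
Line = Tuple[str, object]


def predict_per_region(
    lines: Sequence[Line],
    predict_batch: Callable[[List[object]], List[str]],
) -> List[str]:
    """Call predict_batch once per region; results keep the input order."""
    # Collect the positions of all lines belonging to each region.
    by_region: Dict[str, List[int]] = {}
    for index, (region_id, _image) in enumerate(lines):
        by_region.setdefault(region_id, []).append(index)

    results: List[str] = [""] * len(lines)
    for indices in by_region.values():
        # One call per region instead of one call per line.
        texts = predict_batch([lines[i][1] for i in indices])
        for i, text in zip(indices, texts):
            results[i] = text
    return results
```

Fewer, larger calls should mean fewer opportunities for retracing, although whether that removes the warnings depends on how the backend handles varying input shapes.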
Alright, I did some testing using 93190fa, which puts all lines of a region into Calamari at once. Using Python 3.7 (on 3.8 it's not possible to install TF 1.15):
master (Calamari 0.3.5)
feat/update-calamari1
It's still slower and I still get (a lot fewer) retracing warnings using a test document:
|
Judging from https://github.com/Calamari-OCR/calamari/blob/master/calamari_ocr/ocr/backends/model_interface.py#L61 we seem to be using the API correctly, i.e. passing a @maxnth any thoughts? |
Hi, I had the same errors recently and I believe I got it fixed for my setup (tensorflow 2.1 and tensorflow 2.3) with https://github.com/Calamari-OCR/calamari/compare/fix-162. The problem has to be somewhere in tensorflow_model.py. The lines around 224 might also work as
    dataset = tf.data.Dataset.from_generator(gen, output_signature=(
        tf.TensorSpec((None, line_height, self.input_channels), tf.float32),
        tf.TensorSpec((None,), tf.int32),
        tf.TensorSpec((1,), tf.int32),
        tf.TensorSpec((1,), tf.int32),
        tf.TensorSpec((1,), tf.string)))
but I'm not sure about that, it has been a while. I'm also not sure what the change is going to do in connection with other tensorflow versions. Unfortunately, I don't have a testing setup at the moment and had no success at quickly installing the current tensorflow without running into CUDA/NVIDIA problems again... We really need some proper testing and fixed versions for tensorflow, otherwise this is just going to produce problems again and again. |
@andbue I did some tests using calamari-predict alone and using TensorFlow 2.3rc2 reduces the amount of warnings a lot. I updated the requirements for https://github.com/OCR-D/ocrd_calamari/tree/feat/update-calamari1 and will test the performance again on Monday (tested it on a different PC). |
With Calamari 1.0.x, TF 2.3rc2, and not doing the recognition line by line, I get comparable or better performance than with Calamari 0.3.5 and line-by-line recognition:
There are still a lot of warning messages, so this is not resolved 100% yet. |
I am going to hold off on releasing until either TF 2.3.0 or a new Calamari release with this issue fixed is out. |
TF 2.3.0 is out |
Still have some issues with this I have to investigate. |
Hi Mike, if you run into any problems with Calamari and TF 2.3, make sure to try the current master! I made some changes in Calamari-OCR/calamari#184. The package on PyPI is outdated at the moment. |
I've measured CER rates using the Calamari 1 branch and they're on par with the Calamari 0.3 branch (=master). Macro Median CER:
Macro Mean CER:
Dataset:
Models:
→ I'll be merging https://github.com/OCR-D/ocrd_calamari/tree/feat/update-calamari1, and open another issue as I'm still seeing some runtime performance increase. |
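For readers unfamiliar with the terms, "macro" mean/median CER can be read as: compute one character error rate per document, then take the mean or median over documents. The sketch below illustrates that reading only. It is not the evaluation tool used for the numbers above, and normalising by ground-truth length is an assumption (other tools normalise differently).

```python
from statistics import mean, median
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(gt: str, ocr: str) -> float:
    """Character error rate, normalised by ground-truth length (assumption)."""
    if not gt:
        # Convention chosen here: empty GT is perfect only if OCR is empty too.
        return 0.0 if not ocr else 1.0
    return levenshtein(gt, ocr) / len(gt)


def macro_cer(pairs: Sequence[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (macro mean, macro median) over per-document CERs."""
    rates: List[float] = [cer(gt, ocr) for gt, ocr in pairs]
    return mean(rates), median(rates)
```

The median is less sensitive than the mean to a few badly recognised documents, which is presumably why both figures are reported.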