I realize this may be thoroughly outside the intended scope of this project, but it would be wonderful if it could process not just PDF files but also a variety of image files (tiff and jpg come to mind). Perhaps by passing them directly to tesseract-ocr and outputting the results as text files?
Thanks for the fantastic Unraid docker container, and for your consideration!
ocrmypdf-auto only needs to allow the extension to be processed. I added jpg to the extension list, but ocrmypdf failed on a picture I took with my phone because of invalid DPI values. I did not try it any further.
The way to go would probably be to use img2pdf for images and then feed them to ocrmypdf.
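To make the suggestion above concrete, here is a rough sketch of what the two-step flow could look like. It is not how ocrmypdf-auto works today; the function names, the extension list, and the output file naming are all made up for illustration. It only assumes the `img2pdf` and `ocrmypdf` command-line tools are on the PATH, and it uses ocrmypdf's `--sidecar` option to also write the recognized text to a `.txt` file, as asked for in the original request. Whether this avoids the DPI failure mentioned above has not been tested here.

```python
import subprocess
from pathlib import Path

# Hypothetical list of image extensions to accept; not taken from ocrmypdf-auto.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff"}


def build_commands(image: Path, out_dir: Path) -> list[list[str]]:
    """Return the two command lines: wrap the image in a PDF, then OCR it."""
    if image.suffix.lower() not in IMAGE_EXTENSIONS:
        raise ValueError(f"unsupported image type: {image.suffix}")

    wrapped = out_dir / f"{image.stem}.img2pdf.pdf"  # intermediate, image-only PDF
    final = out_dir / f"{image.stem}.pdf"            # searchable PDF
    sidecar = out_dir / f"{image.stem}.txt"          # plain-text OCR result

    return [
        # Step 1: img2pdf embeds the image in a PDF container.
        ["img2pdf", str(image), "-o", str(wrapped)],
        # Step 2: ocrmypdf adds the text layer and writes the sidecar text file.
        ["ocrmypdf", "--sidecar", str(sidecar), str(wrapped), str(final)],
    ]


def ocr_image(image: Path, out_dir: Path) -> Path:
    """Run both steps; raises CalledProcessError if either tool fails."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(image, out_dir):
        subprocess.run(cmd, check=True)
    return out_dir / f"{image.stem}.pdf"
```

Calling `ocr_image(Path("scan.jpg"), Path("output"))` would then leave `output/scan.pdf` and `output/scan.txt` behind, plus the intermediate `output/scan.img2pdf.pdf`, which a real implementation would probably want to clean up.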
An initial step toward this support is now available with @jo-me's latest updates. The image now supports the .jpg extension by passing a jpg file directly to ocrmypdf, though from @jo-me's experiments, it sounds like this is not sufficient for proper OCR in all cases.
I will keep this issue open to track the feature request and see whether it is reasonable to add img2pdf preprocessing in the container in a future update.