Releases: scribeocr/scribe.js
v0.5.0
What's Changed
- Added `config` argument to `recognize`, which allows for passing arguments to Tesseract.js (#22)
- Added support for parsing PDF text at various orientations (90/180/270 degrees).
- Minor improvements to OCR quality.
- Various improvements to imports of HOCR and native PDF text.
- Added `saveAs` utility function for saving files.
- Added `opt.kerning` option that can be used to enable or disable kerning.
Full Changelog: v0.4.1...v0.5.0
v0.4.1
What's Changed
- Implemented parallel processing by default for Node.js version
  - To restore the previous behavior (1 worker), set `scribe.opt.workerN = 1` before calling any functions.
- Non-default behavior for extracting text from PDF files is now handled by setting the properties of `scribe.opt.usePDFText`.
- Added Nimbus Mono font (similar to Courier)
- Improvements to text extraction from PDF files.
- Improvements to text positioning.
Full Changelog: v0.3.1...v0.4.1
Note: This post combines changes for 0.4.0 and 0.4.1 since the former was only the most recent version for a few hours.
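The single-worker setting from the v0.4.1 notes depends on ordering: the option has to be assigned before the first library call. The sketch below illustrates that ordering with a stand-in `scribe` object, not the real library. The `_init` helper, the `runStub` method, the default of 4 workers, and the assumption that the pool is sized once on first use are all hypothetical; only `scribe.opt.workerN = 1` comes from the release notes.

```javascript
// Stand-in object for illustration only. This is NOT scribe.js itself.
const scribe = {
  opt: { workerN: null }, // null = let the library choose (parallel by default)
  _workers: null,

  // Hypothetical helper: sizes the worker pool once, on first use.
  _init() {
    if (this._workers === null) {
      const n = this.opt.workerN ?? 4; // 4 is an arbitrary stand-in default
      this._workers = Array.from({ length: n }, (_, i) => `worker-${i}`);
    }
    return this._workers;
  },

  // Hypothetical stand-in for any library call that triggers initialization.
  runStub(files) {
    const workers = this._init();
    return { files, workerCount: workers.length };
  },
};

// Set the option first, then call into the library.
scribe.opt.workerN = 1;
const result = scribe.runStub(['scan.png']);
console.log(result.workerCount); // prints 1
```

In this sketch, assigning `workerN` after the first call would have no effect because the pool already exists, which is one plausible reason for the "before calling any functions" wording.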
v0.3.1
v0.3.0
What's Changed
- Improvements to parsing existing text from PDF files
- Various improvements to OCR text and bounding box quality
- Fixed memory leak
- Various minor changes
Full Changelog: v0.2.8...v0.3.0
v0.2.8
- Improved performance of "Quality" recognition mode.
  - Many documents should run up to 10-15% faster in quality mode.
- Updated Scribe Tesseract build to improve recognition accuracy.
  - Accuracy for data tables and other complex layouts has been noticeably improved.
  - See benchmark repo for examples and accuracy metrics.
- Improved image pre-processing.
- Updated Vanilla Tesseract build to support debugging features and image upscaling.
- Other minor changes
Full Changelog: v0.2.7...v0.2.8
v0.2.7
- Fixed bug preventing existing text in some PDFs from being detected (025456a)
- Increased resolution at which PDFs are rendered (0dd8801)
- Added `calcSuppFontInfo` option that calculates font metrics for the fonts in text-native PDFs (4b2b43e)
  - This is useful for niche applications that require highly accurate visual coordinates from text-native PDFs.
- Various other minor updates
Full Changelog: v0.2.6...v0.2.7
v0.2.6
- Restored compatibility with Webpack
Full Changelog: v0.2.5...v0.2.6
v0.2.5
- Improved performance, especially for single-page documents.
- Improved accuracy for "Quality" recognition mode (the default).
- Fixed various minor bugs
Full Changelog: v0.2.4...v0.2.5
v0.2.4
- Improved support for build tools such as Webpack
- Fixed bug where PDF resources were being loaded when not necessary (dd99124)
- Fixed Tesseract bug causing incorrect metrics for single-word recognition (`Recognize Word`) in Scribe OCR UI (f6be561)
Full Changelog: v0.2.3...v0.2.4
v0.2.3
- Added `extractPDFTextImage` option to `importFiles`
  - When `extractPDFTextNative`, `extractPDFTextOCR`, and `extractPDFTextImage` are all set to `true`, text will always be extracted from the input PDF and set as the "active" version, even if there is no text.
Full Changelog: v0.2.2...v0.2.3