Releases: scribeocr/scribe.js

v0.5.0

25 Nov 09:08

What's Changed

  • Added config argument to recognize, which allows for passing arguments to Tesseract.js (#22)
  • Added support for parsing PDF text at various orientations (90/180/270 degrees).
  • Minor improvements to OCR quality.
  • Various improvements to imports of HOCR and native PDF text.
  • Added saveAs utility function for saving files.
  • Added opt.kerning option that can be used to enable or disable kerning.

Full Changelog: v0.4.1...v0.5.0

v0.4.1

10 Nov 19:24

What's Changed

  • Implemented parallel processing by default for the Node.js version
    • To restore the previous behavior (1 worker), set scribe.opt.workerN = 1 before calling any functions.
  • Non-default behavior for extracting text from PDF files is now handled by setting the properties of scribe.opt.usePDFText.
  • Added Nimbus Mono font (similar to Courier)
  • Improvements to text extraction from PDF files.
  • Improvements to text positioning.
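
As a minimal sketch of the single-worker setting mentioned above: the `scribe` object below is a stand-in so the snippet runs without the package installed, and `useSingleWorker` is a hypothetical helper, not part of the library. Only the `scribe.opt.workerN = 1` assignment comes from these notes.

```javascript
// Stand-in for the real library object so this sketch runs on its own.
// With the package installed, you would use the real `scribe` object instead.
const scribe = {
  opt: { workerN: null },
};

// Hypothetical helper: restore the pre-0.4 behavior (1 worker).
// Per the note above, this should run before any other scribe function is called.
function useSingleWorker(scribeLike) {
  scribeLike.opt.workerN = 1;
  return scribeLike;
}

useSingleWorker(scribe);
console.log(scribe.opt.workerN); // prints 1
```

Setting the option once at startup, before any other call, keeps the worker count from changing after workers may already have been created.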

Full Changelog: v0.3.1...v0.4.1

Note: This post combines the changes for 0.4.0 and 0.4.1, because 0.4.0 was the latest version for only a few hours.

v0.3.1

31 Oct 08:38

What's Changed

  • Fixed memory leaks

Full Changelog: v0.3.0...v0.3.1

v0.3.0

31 Oct 03:59

What's Changed

  • Improvements to parsing existing text from PDF files
  • Various improvements to OCR text and bounding box quality
  • Fixed memory leak
  • Various minor changes

Full Changelog: v0.2.8...v0.3.0

v0.2.8

30 Sep 07:30
  • Improved performance of "Quality" recognition mode.
    • Many documents should run up to 10-15% faster in quality mode.
  • Updated Scribe Tesseract build to improve recognition accuracy.
    • Accuracy for data tables and other complex layouts has been noticeably improved.
  • Improved image pre-processing.
  • Updated Vanilla Tesseract build to support debugging features and image upscaling.
  • Other minor changes

Full Changelog: v0.2.7...v0.2.8

v0.2.7

25 Sep 05:21
  • Fixed bug preventing existing text in some PDFs from being detected (025456a)
  • Increased resolution at which PDFs are rendered (0dd8801)
  • Added calcSuppFontInfo option that calculates font metrics for the fonts in text-native PDFs (4b2b43e)
    • This is useful for niche applications that require highly-accurate visual coordinates from text-native PDFs.
  • Various other minor updates

Full Changelog: v0.2.6...v0.2.7

v0.2.6

06 Sep 08:00

v0.2.5

06 Sep 07:40
  • Improved performance, especially for single-page documents.
  • Improved accuracy for "Quality" recognition mode (the default).
  • Fixed various minor bugs

Full Changelog: v0.2.4...v0.2.5

v0.2.4

29 Aug 07:56
  • Improved support for build tools such as Webpack
  • Fixed bug where PDF resources were being loaded when not necessary (dd99124)
  • Fixed Tesseract bug causing incorrect metrics for single-word recognition (Recognize Word) in Scribe OCR UI (f6be561)

Full Changelog: v0.2.3...v0.2.4

v0.2.3

22 Aug 00:54
  • Added extractPDFTextImage option to importFiles
    • When extractPDFTextNative, extractPDFTextOCR, and extractPDFTextImage are all set to true, text will always be extracted from the input PDF and set as the "active" version, even if there is no text.
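
The combination of flags described above could be sketched roughly as follows. The three option names come from these notes. The helper `forceActiveTextOptions` is hypothetical, and the exact way the object is passed to importFiles is an assumption, so that call is left commented out. Note that later releases (v0.4.1 above) moved this behavior to scribe.opt.usePDFText.

```javascript
// Hypothetical helper that builds the option set described above:
// with all three flags true, text is always extracted from the input PDF
// and set as the "active" version.
function forceActiveTextOptions() {
  return {
    extractPDFTextNative: true,
    extractPDFTextOCR: true,
    extractPDFTextImage: true,
  };
}

const importOptions = forceActiveTextOptions();

// Assumed call shape, not confirmed by these notes:
// await scribe.importFiles(files, importOptions);

console.log(Object.values(importOptions).every(Boolean)); // prints true
```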

Full Changelog: v0.2.2...v0.2.3