v0.6.0

@jsvine released this 21 Dec 14:06
· 297 commits to stable since this release

See CHANGELOG.md for a full list of additions, changes, and fixes. In some (hopefully) rare cases, this version may introduce breaking changes, which is why we're bumping to v0.6.0. Highlights from the changelog include:

  • Upgrade pdfminer.six from 20200517 to 20211012; see that library's changelog for details, but a key difference is an improvement in how it assigns line, rect, and curve objects. (Diagonal two-point lines, for instance, are now line objects instead of curve objects.) (#515)
  • Add .extract_text(layout=True), an experimental feature which attempts to mimic the structural layout of the text on the page. (#10)
  • Remove Decimal-ization of parsed object attributes, which are now represented with as much precision as is returned by pdfminer.six. (#346 + #520)
  • .extract_text(...) returns "" instead of None when the character list is empty. (#482 + cb9900b) [h/t @tungph]
  • Add --precision argument to CLI. (#520)
  • Add snap_x_tolerance and snap_y_tolerance to table extraction settings. (#51 + #475) [h/t @dustindall]
  • Add join_x_tolerance and join_y_tolerance to table extraction settings. (cbb34ce)
  • .extract_words(...) now includes doctop among the attributes it returns for each word. (66fef89)
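
As a rough illustration of how several of these changes surface in user code, here is a minimal sketch. The helper names (is_empty_text, summarize_first_page), the TABLE_SETTINGS values, and any file path passed in are assumptions made for this example, not part of pdfplumber; the tolerance values are illustrative rather than recommended defaults.

```python
# Illustrative values only; these tolerances are not recommended defaults.
TABLE_SETTINGS = {
    "snap_x_tolerance": 3,  # added in v0.6.0
    "snap_y_tolerance": 3,  # added in v0.6.0
    "join_x_tolerance": 3,  # added in v0.6.0
    "join_y_tolerance": 3,  # added in v0.6.0
}


def is_empty_text(text):
    """Hypothetical helper: True for both None (older releases) and "" (v0.6.0)."""
    return not text


def summarize_first_page(path):
    """Hypothetical helper: try some v0.6.0 changes on the first page of `path`."""
    # Imported lazily so the helpers above work without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[0]
        # Returns "" rather than None when the page has no characters.
        plain = page.extract_text()
        # Experimental mode that tries to mimic the page's structural layout.
        laid_out = page.extract_text(layout=True)
        # Each word dict now carries a "doctop" attribute.
        words = page.extract_words()
        # Pass the new snap/join tolerances through the table settings.
        table = page.extract_table(table_settings=TABLE_SETTINGS)

    return {
        "empty": is_empty_text(plain),
        "layout_text": laid_out,
        "doctops": [word["doctop"] for word in words],
        "table": table,
    }
```

Calling summarize_first_page("example.pdf"), where "example.pdf" is a placeholder path, would return a small dictionary of results. Code that previously checked `text is None` may be easier to keep working across versions with a truthiness check like the one in is_empty_text.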

And many thanks to @samkit-jain for his feedback and review of contributions to this release. 🎉