Releases: jsvine/pdfplumber
v0.6.0
See CHANGELOG.md for a full list of additions, changes, and fixes. In some (hopefully) rare cases, this version may introduce breaking changes, which is why we're bumping to v0.6.0. Highlights from the changelog include:
- Upgrade pdfminer.six from 20200517 to 20211012; see that library's changelog for details, but a key difference is an improvement in how it assigns line, rect, and curve objects. (Diagonal two-point lines, for instance, are now line objects instead of curve objects.) (#515)
- Add .extract_text(layout=True), an experimental feature which attempts to mimic the structural layout of the text on the page. (#10)
- Remove Decimal-ization of parsed object attributes, which are now represented with as much precision as is returned by pdfminer.six (#346 + #520)
- .extract_text(...) returns "" instead of None when character list is empty. (#482 + cb9900b) [h/t @tungph]
- Add --precision argument to CLI (#520)
- Add snap_x_tolerance and snap_y_tolerance to table extraction settings. (#51 + #475) [h/t @dustindall]
- Add join_x_tolerance and join_y_tolerance to table extraction settings. (cbb34ce)
- .extract_words(...) now includes doctop among the attributes it returns for each word. (66fef89)
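
A minimal sketch of how several of these highlights might be used together, assuming a pdfplumber Page object. The helper names (summarize_page, summarize_pdf), the TABLE_SETTINGS constant, and the tolerance values are hypothetical and chosen only for illustration; they are not recommended defaults.

```python
from typing import Any, Dict, List

# Table settings named in the v0.6.0 notes. The values are arbitrary
# placeholders for illustration, not recommended defaults.
TABLE_SETTINGS: Dict[str, float] = {
    "snap_x_tolerance": 3,
    "snap_y_tolerance": 3,
    "join_x_tolerance": 3,
    "join_y_tolerance": 3,
}


def summarize_page(page: Any) -> Dict[str, Any]:
    """Collect layout text, word positions, and tables from one page."""
    # Experimental layout mode. Per the notes above, an empty character
    # list yields "" rather than None.
    text = page.extract_text(layout=True)
    # Each word dict includes "doctop" among its attributes.
    word_tops = [(w["text"], w["doctop"]) for w in page.extract_words()]
    tables = page.extract_tables(table_settings=TABLE_SETTINGS)
    return {
        "text": text,
        "is_blank": text == "",
        "word_tops": word_tops,
        "tables": tables,
    }


def summarize_pdf(path: str) -> List[Dict[str, Any]]:
    """Open a PDF with pdfplumber and summarize every page."""
    # Imported lazily so summarize_page can be tried with stand-in objects.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return [summarize_page(page) for page in pdf.pages]
```

Because .extract_text(...) now returns "" for an empty page, the is_blank check can compare against the empty string without a separate None test.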
And many thanks to @samkit-jain for his feedback and review of contributions to this release. 🎉
v0.5.28
From CHANGELOG.md:
Added
- Add --laparams flag to CLI. (#407)
Changed
- Change .convert_csv(...) to order objects first by page number, rather than object type. (#407)
- Change .convert_csv(...), .convert_json(...), and CLI so that, by default, they return all available object types, rather than those in a predefined default list. (#407)
Fixed
- Fix .extract_text(...) so that it can accept generator objects as its main parameter. (#385) [h/t @alexreg]
- Fix page-parsing so that LTAnno objects (which have no bounding-box coordinates) are not extracted. (Was only an issue when setting laparams.) (#388)
- Fix Page.extract_table(...) so that it honors text tolerance settings (#415) [h/t @trifling]
v0.5.27
From CHANGELOG.md:
Fixed
- Fix regression (introduced in 0.5.26/b1849f4) in closing files opened by PDF.open
- Reinstate access to higher-level layout objects (such as textboxhorizontal) when laparams is passed to pdfplumber.open(...). Had been removed in 0.5.24 via 1f87898. (#359 + #364)
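
As a rough sketch of the reinstated behavior, the snippet below reads textboxhorizontal entries from a page's objects mapping after opening a file with laparams. The helper names are hypothetical, and it assumes each layout object dict carries a "text" key, so the lookup falls back to an empty string if it does not.

```python
from typing import Any, Dict, List, Optional


def layout_box_texts(page: Any) -> List[str]:
    """Return the text of each textboxhorizontal object on a page."""
    # page.objects maps object-type names to lists of object dicts.
    # The key is absent when no layout analysis was requested.
    boxes = page.objects.get("textboxhorizontal", [])
    # Assumption: layout objects expose their text under "text".
    return [box.get("text", "") for box in boxes]


def first_page_box_texts(
    path: str, laparams: Optional[Dict[str, Any]] = None
) -> List[str]:
    """Open a PDF with layout analysis enabled and inspect page one."""
    # Imported lazily so layout_box_texts can be tried with stand-ins.
    import pdfplumber

    # Passing a dict (even an empty one) as laparams requests layout
    # analysis from pdfminer.six; None leaves it disabled.
    params = {} if laparams is None else laparams
    with pdfplumber.open(path, laparams=params) as pdf:
        return layout_box_texts(pdf.pages[0])
```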
Development Changes
- Add a python setup.py build sdist test to main GitHub action. (#365)
v0.5.26
v0.5.25
v0.5.24
v0.5.23
v0.5.22
[0.5.22] — 2020-07-18
Changed
- Upgraded pdfminer.six requirement to ==20200517 (cddbff7) [h/t @youngquan]
Added
- Add support for non_stroking_color attribute on char objects (0254da3) [h/t @idan-david]
v0.5.15
v0.6.0-alpha
This release is a preview/alpha for pdfplumber v0.6.0. Among the more notable changes:
- Revamps the table-extraction methods, to simplify them and make them more flexible.
- Adds font size and font name to results of Page/utils.extract_words(...), based on @jsfenfen's suggestions in #28. (Thanks!)
Goals before v0.6.0-beta:
- Add Page.find_text_gutters feature, bringing back that table-finding strategy from earlier versions of pdfplumber.
- Attempt to fix/address as many extant GitHub issues as possible.
- Update the example notebooks, so that they work.
Goals before v0.6.0 full release:
- Reach full test coverage.
- Add more robust documentation.
- Add more/better docstrings.