This repository has been archived by the owner on Mar 8, 2021. It is now read-only.

pdftotext does not support -htmlmeta anymore (chapter 8 metadata extraction) #24

Open
flothesof opened this issue Jan 17, 2018 · 0 comments

Hi,

This issue concerns chapter 8 of your tutorial.
It seems that pdftotext no longer supports the -htmlmeta flag, so the metadata extraction described in the tutorial is broken with the latest xpdf version, 4.0.0 (see the current documentation for pdftotext).
According to my tests, you can still access that information, but you have to use the pdfinfo command instead.
Another point regarding that notebook: the Python library reference recommends subprocess.run for all use cases it can handle. Is there a specific reason why the tutorial uses subprocess.call instead?
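
To illustrate both points, here is a rough sketch of what I have in mind. It is not tested against the tutorial's notebook. It assumes that pdfinfo is on the PATH and that it prints one "Key: value" pair per line. The helper names parse_pdfinfo and pdf_metadata are made up for this example.

```python
import subprocess


def parse_pdfinfo(text):
    """Turn pdfinfo-style 'Key:   value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only, so values that contain
        # colons (for example timestamps) stay in one piece.
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon at all
            info[key.strip()] = value.strip()
    return info


def pdf_metadata(path):
    """Run pdfinfo on `path` and return its output as a dict.

    Assumes the pdfinfo executable is installed and on the PATH.
    """
    # subprocess.run (Python 3.5+) returns a CompletedProcess object;
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(
        ["pdfinfo", path],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str
        check=True,
    )
    return parse_pdfinfo(result.stdout)
```

Calling pdf_metadata("some.pdf") (a placeholder file name) should then give a dict of whatever fields pdfinfo reports for that file. Compared with subprocess.call, which only returns the exit code, subprocess.run gives access to the captured output and the return code in one object.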

Thanks for these great tutorials.
Best regards,

Florian
