pdftotext does not support -htmlmeta anymore (chapter 8 metadata extraction) #24

flothesof · 2018-01-17T12:53:44Z

Hi,

This issue concerns chapter 8 of your tutorial.
It seems that pdftotext no longer supports the -htmlmeta flag, so the metadata extraction described in the tutorial breaks with the latest xpdf release (4.0.0); see the current pdftotext documentation.
According to my tests, the same information is still available, but you have to use the pdfinfo command to get it.
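A minimal sketch of the pdfinfo route, assuming pdfinfo (from xpdf or poppler) is on PATH and prints one `Key: value` pair per line. The helper names `parse_pdfinfo` and `pdf_metadata` are hypothetical, not part of the tutorial:

```python
import subprocess
from typing import Dict


def parse_pdfinfo(output: str) -> Dict[str, str]:
    """Turn pdfinfo's 'Key:   value' lines into a dict."""
    meta = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as dates keep their own colons
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            meta[key.strip()] = value.strip()
    return meta


def pdf_metadata(path: str) -> Dict[str, str]:
    """Run pdfinfo on a PDF and return its metadata (assumes pdfinfo is on PATH)."""
    result = subprocess.run(
        ["pdfinfo", path],
        capture_output=True,  # requires Python 3.7+
        text=True,            # decode stdout/stderr to str
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return parse_pdfinfo(result.stdout)
```

Usage would be something like `pdf_metadata("paper.pdf")["Title"]` (hypothetical file name). Which fields appear depends on the PDF itself and on the pdfinfo version.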
Another point regarding that notebook: the Python library reference recommends subprocess.run for all use cases it can handle. Is there a specific reason why the tutorial uses subprocess.call instead?
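For comparison: subprocess.call returns only the exit status, while subprocess.run returns a CompletedProcess object that also carries the captured output. A small illustration, using the current Python interpreter as a stand-in command (an assumption, so the snippet runs anywhere):

```python
import subprocess
import sys

# Stand-in command: run the current interpreter and print a line
cmd = [sys.executable, "-c", "print('hello')"]

# subprocess.call: returns only the exit status; output goes straight to the terminal
status = subprocess.call(cmd)

# subprocess.run: returns a CompletedProcess with returncode and, optionally, captured output
result = subprocess.run(cmd, capture_output=True, text=True)  # Python 3.7+

print(status, result.returncode, repr(result.stdout))
```

Passing `check=True` to subprocess.run additionally raises CalledProcessError on a non-zero exit status, which call does not do.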

Thanks for these great tutorials.
Best regards,

Florian


proycon assigned fbkarsdorp Aug 31, 2018

