Wrapper around the command-line tool pdftohtml, which converts PDF to HTML, go figure.
This gem was inspired by the MiniMagick gem – which does the same thing for ImageMagick (thanks Corey).
Requirements: just pdftohtml and Ruby (1.8.6+ as far as I know).
On Mac:
brew install pdftohtml
On Ubuntu:
pdftohtml is part of the ‘poppler-utils’ package, which should be installed by default.
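If you are not sure whether pdftohtml is actually available, a quick PATH lookup can tell you before you try the gem. The sketch below is only an illustration: `find_executable` is a hypothetical helper written for this README, not something pdftohtmlr provides.

```ruby
# Hypothetical helper (not part of pdftohtmlr): returns the full path of the
# first executable called `name` found in `search_path`, or nil if none exists.
def find_executable(name, search_path = ENV['PATH'].to_s)
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    # Only accept regular files that the current user may execute.
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

path = find_executable('pdftohtml')
puts(path ? "pdftohtml found at #{path}" : 'pdftohtml not found on PATH, install it first')
```

The second argument exists only so the lookup can be pointed at a specific directory list, for example when testing.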
http://gemcutter.org/gems/pdftohtmlr
gem install pdftohtmlr
require 'pdftohtmlr'
require 'nokogiri'
include PDFToHTMLR
file = PdfFilePath.new([Path to Source PDF])
string = file.convert                # the converted HTML as a string
doc = file.convert_to_document()     # the converted HTML parsed with Nokogiri
See included test cases for more usage examples, including passwords and URL fetching.
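For a rough idea of what wrapping pdftohtml with passwords involves, here is a minimal sketch that builds a command line without running it. It is not the gem's actual implementation: `build_pdftohtml_command` is a hypothetical name, and the particular combination of flags is an assumption. The flags themselves (`-stdout`, `-i`, `-noframes`, `-upw`, `-opw`) are standard pdftohtml options.

```ruby
require 'shellwords'

# Hypothetical helper (not part of pdftohtmlr): builds, but does not run,
# a shell-safe pdftohtml command line for the given PDF.
def build_pdftohtml_command(pdf_path, user_pwd = nil, owner_pwd = nil)
  # -stdout: write HTML to standard output, -i: ignore images,
  # -noframes: produce a single document instead of a frameset.
  args = ['pdftohtml', '-stdout', '-i', '-noframes']
  args += ['-upw', user_pwd] if user_pwd     # user password, if any
  args += ['-opw', owner_pwd] if owner_pwd   # owner password, if any
  args << pdf_path
  # shelljoin escapes each argument, so spaces in paths are handled.
  args.shelljoin
end

puts build_pdftohtml_command('My Report.pdf', 'secret')
```

Escaping every argument matters here because file paths and passwords often contain spaces or shell metacharacters.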
License: MIT (see included MIT-LICENSE)