It scrapes highlights from the kindle.amazon.com website (https://kindle.amazon.com/your_highlights).
- Mechanize
- Nokogiri
The default task is "rake update:recent".
rake convert
calls convert:all
rake convert:all
loads a local file and converts it into both XML and HTML formats
rake convert:html
loads a local file and converts it into HTML format
rake convert:xml
loads a local file and converts it into XML format
rake print
loads a local file and prints the highlight data
rake update
calls update:recent
rake update:all
retrieves all data from the Amazon server and stores it in a local file
rake update:recent
retrieves the most recent month of data from the Amazon server and stores it in a local file
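The alias tasks above ("rake update" calling update:recent, and the default task) can be expressed in Rake as tasks that have no body and only a prerequisite. The following is a minimal sketch, not this project's actual Rakefile: the task bodies are placeholders that only print their own name.

```ruby
require 'rake'

# Hypothetical sketch: the bodies only print; the real tasks scrape and store data.
namespace :update do
  desc "retrieve the most recent month of data"
  task :recent do
    puts "update:recent"
  end

  desc "retrieve all data"
  task :all do
    puts "update:all"
  end
end

# `rake update` has no body of its own; it only depends on update:recent.
desc "call update:recent"
task :update => "update:recent"

# `rake` with no arguments runs the default task.
task :default => "update:recent"
```

A task and a namespace may share the same name in Rake, which is what lets "rake update" and "rake update:all" coexist.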
require 'kindle-your-highlights'
# to create a new KindleYourHighlights object, pass your Amazon email address and password
kindle = KindleYourHighlights.new("you@example.com", "password")
kindle.highlights.each do |highlight|
  highlight.annotation_id # => a unique value for each highlight, generated by Amazon
  highlight.content # => the actual highlight text
  highlight.asin # => the Amazon ASIN for the highlight's product
  highlight.author # => author of the book from which the highlight is taken
  highlight.title # => title of the book from which the highlight is taken
  highlight.location # => highlight location in the book
  highlight.note # => the user's note attached to the highlight
end
kindle.books.each do |book|
  book.asin # => the Amazon ASIN for the book
  book.author # => author of the book
  book.title # => title of the book
  book.last_update # => date the book's highlights were last updated (last annotated on)
end
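As an illustration of working with these attributes, the sketch below groups highlights by book title and prints them. It runs without an Amazon account because it uses a hypothetical `Highlight` Struct as a stand-in for the objects yielded by `kindle.highlights`; only the attribute names are taken from the listing above, and the sample values are invented.

```ruby
# Hypothetical stand-in for the objects yielded by kindle.highlights;
# only the attribute names come from the README.
Highlight = Struct.new(:annotation_id, :content, :asin, :author, :title, :location, :note)

highlights = [
  Highlight.new("A1", "First passage",  "B000000001", "Author One", "Book One", 120, nil),
  Highlight.new("A2", "Second passage", "B000000001", "Author One", "Book One", 340, "revisit"),
  Highlight.new("A3", "Third passage",  "B000000002", "Author Two", "Book Two", 15,  nil)
]

# Group the flat list of highlights by the title of the book they belong to.
by_title = highlights.group_by(&:title)

by_title.each do |title, items|
  puts "#{title} (#{items.size} highlights)"
  items.each do |h|
    line = "  [#{h.location}] #{h.content}"
    line += " (note: #{h.note})" if h.note
    puts line
  end
end
```

With a real `kindle` object, replacing the sample array with `kindle.highlights` should be the only change needed, assuming it returns an enumerable of such objects.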
require 'kindle-your-highlights'
# to create a new KindleYourHighlights object, pass your Amazon email address and password
kindle = KindleYourHighlights.new("you@example.com", "password", { :page_limit => 100, :day_limit => 31, :wait_time => 2 }) do |h|
  puts "loading... [#{h.books.last.title}] - #{h.books.last.last_update}"
end
# XML output (the ./xml folder must be created in advance)
KindleYourHighlights::XML.new(:list => kindle.list, :file_name => "xml/out.xml").output
# HTML output (the ./html folder must be created in advance)
KindleYourHighlights::HTML.new(:list => kindle.list, :file_name => "html/out.html").output
require 'kindle-your-highlights'
# to create a new KindleYourHighlights object, pass your Amazon email address and password
kindle = KindleYourHighlights.new("you@example.com", "password", { :page_limit => 100, :wait_time => 2 }) do |h|
  puts "loading... [#{h.books.last.title}]"
end
# load the previous dump file, merge it with the new data, and dump the result again
if File.exist?("out.dump")
  list = KindleYourHighlights::List.load("out.dump")
  kindle.merge!(list)
end
KindleYourHighlights::HTML.new(:list => kindle.list, :file_name => "out.html").output
kindle.list.dump("out.dump")
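The load, merge, and dump cycle above can be pictured with the following self-contained sketch. It is not the library's implementation: the `Entry` Struct, the `merge_entries` helper, the rule that newer entries win on a duplicate `annotation_id`, and the use of `Marshal` as the on-disk format are all assumptions made for illustration.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a stored highlight.
Entry = Struct.new(:annotation_id, :content)

# Merge two lists, keeping one entry per annotation_id. Entries from
# `fresh` replace entries from `previous` that carry the same id (assumed rule).
def merge_entries(previous, fresh)
  (previous + fresh).each_with_object({}) { |e, acc| acc[e.annotation_id] = e }.values
end

previous = [Entry.new("A1", "old text"), Entry.new("A2", "kept")]
fresh    = [Entry.new("A1", "new text"), Entry.new("A3", "added")]

merged = merge_entries(previous, fresh)

# Round-trip through a dump file, using Marshal as an assumed on-disk format.
Dir.mktmpdir do |dir|
  path = File.join(dir, "out.dump")
  File.binwrite(path, Marshal.dump(merged))
  restored = Marshal.load(File.binread(path))
  puts restored.map { |e| "#{e.annotation_id}: #{e.content}" }
end
```

Keying on `annotation_id` is a natural choice here because the README describes it as unique per highlight.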
- page_limit : maximum number of pages (books) to load
- day_limit : maximum age, in days, of the books to retrieve, measured from each book's "Last annotated on" date to today
- wait_time : wait time between page loads, in seconds (default is 5 seconds)
- block : callback invoked each time a page load completes
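To make the day_limit option concrete, the sketch below shows one way such a cutoff could be evaluated. The `BookEntry` Struct and the `within_day_limit?` helper are hypothetical, and whether the library treats the boundary as inclusive is an assumption.

```ruby
require 'date'

# Hypothetical stand-in for a book with its "Last annotated on" date.
BookEntry = Struct.new(:title, :last_update)

# True when last_update falls within day_limit days of today (inclusive, assumed).
def within_day_limit?(last_update, day_limit, today = Date.today)
  (today - last_update).to_i <= day_limit
end

# A fixed "today" keeps the example reproducible.
today = Date.new(2024, 3, 31)
books = [
  BookEntry.new("Recent Book", Date.new(2024, 3, 20)),
  BookEntry.new("Older Book",  Date.new(2024, 1, 5))
]

recent = books.select { |b| within_day_limit?(b.last_update, 31, today) }
puts recent.map(&:title)
```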
XML output example
<?xml version="1.0"?>
<books>
  <book>
    <asin>ASIN</asin>
    <title>TITLE</title>
    <author>AUTHOR</author>
    <highlights>
      <annotation_id>ANNOTATION_ID1</annotation_id>
      <content>CONTENT1</content>
    </highlights>
    <highlights>
      <annotation_id>ANNOTATION_ID2</annotation_id>
      <content>CONTENT2</content>
    </highlights>
  </book>
</books>
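A file in this shape can be read back with any XML parser. The sketch below uses REXML rather than Nokogiri only so that it runs without extra gems on most Ruby installs; the element names come from the example above, and the hash layout it builds is an arbitrary choice for illustration.

```ruby
require 'rexml/document'

xml = <<~XML
  <?xml version="1.0"?>
  <books>
    <book>
      <asin>ASIN</asin>
      <title>TITLE</title>
      <author>AUTHOR</author>
      <highlights>
        <annotation_id>ANNOTATION_ID1</annotation_id>
        <content>CONTENT1</content>
      </highlights>
      <highlights>
        <annotation_id>ANNOTATION_ID2</annotation_id>
        <content>CONTENT2</content>
      </highlights>
    </book>
  </books>
XML

doc = REXML::Document.new(xml)

# Each <book> holds its metadata plus one <highlights> element per highlight.
books = doc.root.get_elements("book").map do |book|
  {
    :asin   => book.elements["asin"].text,
    :title  => book.elements["title"].text,
    :author => book.elements["author"].text,
    :highlights => book.get_elements("highlights").map do |h|
      { :annotation_id => h.elements["annotation_id"].text,
        :content       => h.elements["content"].text }
    end
  }
end

books.each do |b|
  puts "#{b[:title]} by #{b[:author]}: #{b[:highlights].size} highlights"
end
```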
- Initial upload (0.1.0)
This library is derived from "https://github.com/speric/kindle-highlights". It was split into a separate project in order to add features and change the code formatting.