Merge pull request #1 from theMashUp/dev
v0.1
jonathanlarochelle authored Sep 10, 2022
2 parents 5b53d17 + a4759fe commit c97f8c4
Showing 6 changed files with 246 additions and 125 deletions.
48 changes: 41 additions & 7 deletions README.md
@@ -1,10 +1,44 @@
# FindMyBooks
From a list of books, parse desired online library catalogues to figure out where a book is available.
Note: This tool is still in development. The README will be updated when the tool is complete.
[example code]
# FindMyBooks v0.1
Ever found yourself in front of a Goodreads to-read list of 100+ books, without knowing which ones are available from your local library?
FindMyBooks lets you find out where all of those books are available.

## Getting started
To use the tool, first clone this repository.
Then, download your Goodreads library as a .csv file ([link](https://www.goodreads.com/review/import)).
```Python
python find_my_books.py MY_GOODREADS_LIBRARY.csv
```
You will then find a MY_GOODREADS_LIBRARY_output.csv file with one column per target library. Each column contains the search URL if the book was found in that library, and is empty otherwise.
A detailed list of supported command line arguments can be found below in the Configuration section.
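
As a rough illustration of how the output file could be consumed afterwards, the sketch below lists, for each book, the library columns that contain a URL. It assumes the output has `title` and `author` columns followed by one column per library, as in the `test_output.csv` of this commit. The helper name `libraries_per_book` and the sample rows are hypothetical and not part of the tool.

```python
import csv
import io


def libraries_per_book(csvfile):
    """Map each book title to the list of library columns holding a URL."""
    reader = csv.DictReader(csvfile)
    result = {}
    for row in reader:
        # Every column other than title/author is assumed to be a library.
        result[row["title"]] = [name for name, value in row.items()
                                if name not in ("title", "author") and value]
    return result


# Small in-memory sample standing in for MY_GOODREADS_LIBRARY_output.csv
sample = io.StringIO(
    "title,author,Library A,Library B\n"
    "Greenwood,Michael Christie,https://example.org/a,https://example.org/b\n"
    "Inventing the Internet,Janet Abbate,,\n"
)
availability = libraries_per_book(sample)
for title, libraries in availability.items():
    print(title, "->", ", ".join(libraries) or "not found")
```

To run it against a real output file, the `io.StringIO` sample could be replaced by `open("MY_GOODREADS_LIBRARY_output.csv", newline="")`.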

## Features
- Import a list of books from Goodreads, ...
- Find out which book is available where
- Import a list of books from Goodreads
- Find out in which library your books are available

## Configuration
### Goodreads library file
Mandatory. Path to the Goodreads library file.
Example:
```Python
python find_my_books.py MY_GOODREADS_LIBRARY.csv
```

### Output file
*-o, --output OUTPUT_FILE.csv*

Optional. Path of the output file to create.
Default: the Goodreads library file path with an "_output" suffix added before the ".csv" extension.
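
The script in this commit derives the default with a plain string replacement of ".csv" by "_output.csv". As a possible alternative, here is a minimal sketch that only touches the final extension, using `pathlib`. The helper name `default_output_path` is hypothetical and does not exist in the tool.

```python
from pathlib import Path


def default_output_path(library_path: str) -> str:
    """Insert "_output" before the file extension of library_path."""
    path = Path(library_path)
    # stem is the file name without its last suffix; suffix is e.g. ".csv"
    return str(path.with_name(f"{path.stem}_output{path.suffix}"))


print(default_output_path("MY_GOODREADS_LIBRARY.csv"))
```

Unlike a global string replacement, this leaves any earlier ".csv" occurrences in the file name untouched.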

### Debug
*-d, --debug*

Activate debug mode, which displays more information in the console.

## Contributing
### New libraries
Everyone is encouraged to add libraries via the libraries.json file. Please create one PR with all the libraries you wish to add.
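
Before opening a PR, a new entry could be sanity-checked with something like the sketch below. It assumes each entry carries the three keys used in this commit's libraries.json (`name`, `url`, `response_length_threshold`) and that the URL contains both the `{TITLE}` and `{AUTHOR}` placeholders, as all current entries do. The function `check_library_entry` and the example entry are hypothetical. The URL and threshold are placeholders, not a real catalogue.

```python
import json

REQUIRED_KEYS = {"name", "url", "response_length_threshold"}


def check_library_entry(entry: dict) -> list:
    """Return a list of problems found in one libraries.json entry."""
    problems = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    url = entry.get("url", "")
    for placeholder in ("{TITLE}", "{AUTHOR}"):
        if placeholder not in url:
            problems.append(f"url lacks {placeholder}")
    if not isinstance(entry.get("response_length_threshold"), int):
        problems.append("response_length_threshold should be an integer")
    return problems


# Hypothetical entry, for illustration only.
new_entry = json.loads("""
{"name": "Example Library",
 "url": "https://example.org/search?title={TITLE}&author={AUTHOR}",
 "response_length_threshold": 10000}
""")
print(check_library_entry(new_entry))
```

An empty list means no problem was detected. Choosing a sensible threshold still requires comparing the response sizes of a search with results and one without.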
### New features
New features are always welcome. If the feature is substantial, please create an issue first, so that it can be discussed. If the feature is minor, you can directly create a PR and we'll look at it.

...
## Licensing
The code in this project is licensed under the MIT license.
138 changes: 138 additions & 0 deletions find_my_books.py
@@ -0,0 +1,138 @@
# -*- coding: utf-8 -*-

# import built-in modules
import re
import json
import logging
import argparse
import csv

# import third-party modules
import requests


APP_DESCRIPTION = "Parse a Goodreads books list to find in which library they are available."
SUPPORTED_LIBRARIES_FILE = "libraries.json"
# TODO: Figure out what is the minimal header required.
REQUEST_HEADERS = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0',
                   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
                   'Accept-Language': 'en-US,en;q=0.5',
                   'Accept-Encoding': 'gzip, deflate, br',
                   'Upgrade-Insecure-Requests': '1',
                   'Sec-Fetch-Dest': 'document',
                   'Sec-Fetch-Mode': 'navigate',
                   'Sec-Fetch-Site': 'none',
                   'Sec-Fetch-User': '?1',
                   'Connection': 'keep-alive'}
VERSION = "0.1"


def parse_argv() -> dict:
    """
    Parse command-line arguments into a dict.
    """
    arg_parser = argparse.ArgumentParser(description=APP_DESCRIPTION)
    arg_parser.add_argument("goodreads_library", type=str,
                            help="Goodreads library .csv file")
    arg_parser.add_argument("-o", "--output", type=str, required=False,
                            dest="output",
                            help="output .csv file")
    arg_parser.add_argument("-d", "--debug", action="store_true", required=False, default=False,
                            help="display debug logging lines")
    args = arg_parser.parse_args()
    args_dict = vars(args)
    return args_dict


def get_book_search_url(book_title: str, book_author: str, url_template: str) -> str:
    """
    Replace {TITLE} and {AUTHOR} in url_template by book_title and book_author respectively.
    """
    # Remove parenthesized text, " - ..." suffixes and anything after a colon
    # in the book title to facilitate search.
    # Raw strings avoid invalid escape sequence warnings for \( and \s.
    re_remove_parenthesis = r"\(.*\)|\s-\s.*"
    book_title = re.sub(re_remove_parenthesis, "", book_title)
    re_remove_colon = r":.+"
    book_title = re.sub(re_remove_colon, "", book_title)

    url = url_template.replace("{AUTHOR}", book_author.replace(" ", "+"))
    url = url.replace("{TITLE}", book_title.replace(" ", "+"))
    return url


# Script starts here
if __name__ == '__main__':
    # Parse command line arguments
    args = parse_argv()
    goodreads_library_file_path = args["goodreads_library"]
    if args["output"]:
        output_file_path = args["output"]
    else:
        output_file_path = goodreads_library_file_path.replace(".csv", "_output.csv")

    # Set-up logging
    if args["debug"]:
        logging_level = logging.DEBUG
    else:
        logging_level = logging.INFO

    logging.basicConfig(level=logging_level)

    print(f"FindMyBooks v{VERSION}")
    print(f"\tGoodreads library file path: {goodreads_library_file_path}")
    print(f"\tOutput file path: {output_file_path}")

    # Load Goodreads library file
    books_to_check = list()
    with open(goodreads_library_file_path, newline="") as csvfile:
        reader = csv.DictReader(csvfile)

        # Consider only "to-read" books
        for book in reader:
            if book["Exclusive Shelf"] == "to-read":
                books_to_check.append({"title": book["Title"],
                                       "author": book["Author"]})
    print(f"\tFound {len(books_to_check)} books in to-read shelf.")

    # Load supported libraries from json config file
    with open(SUPPORTED_LIBRARIES_FILE) as f:
        supported_libraries = json.load(f)["libraries"]
    print(f"\tFound {len(supported_libraries)} supported libraries.")

    print("\tBeginning library search:")
    for id, book in enumerate(books_to_check):
        print(f"\t\tBook {id+1}/{len(books_to_check)}: {book['author']}, \"{book['title']}\"")

        for lib in supported_libraries:
            books_to_check[id][lib["name"]] = ""
            url = get_book_search_url(book["title"], book["author"], lib["url"])

            logging.debug(f"Library: {lib['name']} ({lib['response_length_threshold']} bytes response threshold)")
            logging.debug(f"URL: {url}")

            # Using a GET is a waste of download. However, not all websites provide Content-Length with HEAD,
            # and some do not answer to HEAD at all.
            # TODO: Figure out way to check libraries with smaller footprint. Idea: pretend to be mobile,
            # very small screen, etc.
            # REQUEST_HEADERS is used because some websites require a "realistic" header to answer the request.
            response = requests.get(url, headers=REQUEST_HEADERS)

            if response:
                response_content_length = len(response.content)
                logging.debug(f"Content size: {response_content_length} bytes")
                if response_content_length > lib["response_length_threshold"]:
                    books_to_check[id][lib["name"]] = url
            else:
                logging.warning(f"Request not successful. Response status code: {response.status_code}")

    print("\tLibrary search completed.")

    # Create output file
    with open(output_file_path, "w", newline="") as csvfile:
        fieldnames = books_to_check[0].keys()
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for book in books_to_check:
            writer.writerow(book)
    print(f"\tResults written to output file {output_file_path}")

    print("End of script.")
13 changes: 13 additions & 0 deletions libraries.json
@@ -0,0 +1,13 @@
{
"libraries": [
{"name": "BANQ (Overdrive)",
"url": "https://banq.overdrive.com/search/title?query={TITLE}&creator={AUTHOR}&mediaType=ebook&sortBy=newlyadded",
"response_length_threshold": 95000},
{"name": "Ville de Québec (Prêt Numérique)",
"url": "https://quebec.pretnumerique.ca/resources?utf8=%E2%9C%93&keywords={TITLE}&author={AUTHOR}&narrator=&publisher=&collection_title=&issued_on_range=&language=&audience=&category_standard=thema&category=&nature=ebook&medium=",
"response_length_threshold": 8000},
{"name": "Kobo Plus",
"url": "https://www.kobo.com/ca/en/search?query=query&fcmedia=Book~BookSubscription&nd=true&ac=1&ac.author={AUTHOR}&ac.title={TITLE}&sort=PublicationDateDesc&sortchange=1",
"response_length_threshold": 250000}
]
}
30 changes: 30 additions & 0 deletions test_library.csv
@@ -0,0 +1,30 @@
Book Id,Title,Author,Author l-f,Additional Authors,ISBN,ISBN13,My Rating,Average Rating,Publisher,Binding,Number of Pages,Year Published,Original Publication Year,Date Read,Date Added,Bookshelves,Bookshelves with positions,Exclusive Shelf,My Review,Spoiler,Private Notes,Read Count,Recommended For,Recommended By,Owned Copies,Original Purchase Date,Original Purchase Location,Condition,Condition Description,BCID
,The Little Grave,Carolyn Arnold,,,,,,,,,,,,,,to-read,,to-read,,,,,,,,,,,,
35022387,Décroissance versus développement durable: Débats pour la suite du monde,Yves-Marie Abraham,"Abraham, Yves-Marie","Louis Marion, Hervé Philippe",,,0,3.7,Écosociété,Kindle Edition,301,2012,,,2022/03/29,to-read,to-read (#186),to-read,,,,0,,,0,,,,,
14781491,"The Time of Contempt (The Witcher, #2)",Andrzej Sapkowski,"Sapkowski, Andrzej",David French,0316219134,9780316219136,3,4.17,Orbit,Paperback,331,2013,1995,2022/06/24,2022/05/08,,,read,,,,1,,,0,,,,,
39739146,L'Intelligence des plantes,Stefano Mancuso,"Mancuso, Stefano","Alessandra Viola, Renaud Temperini",,,0,3.89,Albin Michel,Kindle Edition,240,2018,2013,,2022/06/10,to-read,to-read (#218),to-read,,,,0,,,0,,,,,
34324534,Against the Grain: A Deep History of the Earliest States,James C. Scott,"Scott, James C.",,0300182910,9780300182910,0,4.13,Yale University Press,Hardcover,312,2017,2017,,2022/06/20,to-read,to-read (#229),to-read,,,,0,,,0,,,,,
27889241,Future Trends in Microelectronics: Journey Into the Unknown,Serge Luryi,"Luryi, Serge","Jimmy Xu, Alexander Zaslavsky",1119069114,9781119069119,0,0,Wiley,Hardcover,384,2016,,,2022/06/19,to-read,to-read (#228),to-read,,,,0,,,0,,,,,
23209924,The Water Knife,Paolo Bacigalupi,"Bacigalupi, Paolo",,0385352875,9780385352871,0,3.84,Knopf,Hardcover,371,2015,2015,,2022/06/17,to-read,to-read (#227),to-read,,,,0,,,0,,,,,
44882,Code: The Hidden Language of Computer Hardware and Software,Charles Petzold,"Petzold, Charles",,0735611319,9780735611313,0,4.39,Microsoft Press,Paperback,400,2000,1999,,2022/06/16,to-read,to-read (#226),to-read,,,,0,,,0,,,,,
8701960,"The Information: A History, a Theory, a Flood",James Gleick,"Gleick, James",,0375423729,9780375423727,0,4.02,Knopf Doubleday Publishing Group,Hardcover,527,2011,2011,,2022/06/13,to-read,to-read (#225),to-read,,,,0,,,0,,,,,
40175096,Thus Spoke the Plant: A Remarkable Journey of Groundbreaking Scientific Discoveries and Personal Encounters with Plants,Monica Gagliano,"Gagliano, Monica",,,9781623172435,0,3.77,North Atlantic Books,Paperback,176,2018,2018,,2022/06/10,to-read,to-read (#217),to-read,,,,0,,,0,,,,,
804069,A Brief History of the Future: The Origins of the Internet,John Naughton,"Naughton, John",,075381093X,9780753810934,0,3.81,Orion Publishing Group,Paperback,334,2006,1999,,2022/06/05,"to-read, history-of-technology","to-read (#204), history-of-technology (#6)",to-read,,,,0,,,0,,,,,
8201080,The Master Switch: The Rise and Fall of Information Empires,Tim Wu,"Wu, Tim",,0307269930,9780307269935,0,3.87,Knopf,Hardcover,384,2010,2010,,2022/06/05,"to-read, history-of-technology","to-read (#203), history-of-technology (#5)",to-read,,,,0,,,0,,,,,
753865,Inventing the Internet,Janet Abbate,"Abbate, Janet",,0262511150,9780262511155,0,3.87,MIT Press,Paperback,268,2000,1999,,2022/06/05,"to-read, history-of-technology","to-read (#202), history-of-technology (#4)",to-read,,,,0,,,0,,,,,
509866,"The Rise of the Network Society: The Information Age: Economy, Society and Culture, Volume I",Manuel Castells,"Castells, Manuel",,0631221409,9780631221401,0,3.98,Wiley-Blackwell,Paperback,624,2000,1996,,2022/06/05,"to-read, history-of-technology","to-read (#201), history-of-technology (#3)",to-read,,,,0,,,0,,,,,
35068671,The Perfectionists: How Precision Engineers Created the Modern World,Simon Winchester,"Winchester, Simon",,0062652575,9780062652577,0,4.13,Harper,ebook,416,2018,2018,,2022/06/05,"to-read, history-of-technology","to-read (#200), history-of-technology (#2)",to-read,,,,0,,,0,,,,,
51619298,Technology and the Environment in History,Sara B. Pritchard,"Pritchard, Sara B.",Carl A. Zimring,1421438992,9781421438993,0,3,Johns Hopkins University Press,Paperback,264,2020,,,2022/06/05,"to-read, history-of-technology","to-read (#211), history-of-technology (#1)",to-read,,,,0,,,0,,,,,
13330922,"The Black Count: Glory, Revolution, Betrayal, and the Real Count of Monte Cristo",Tom Reiss,"Reiss, Tom",Gabriel Stoian,030738246X,9780307382467,0,3.97,Crown,Hardcover,414,2012,2012,,2022/06/03,to-read,to-read (#199),to-read,,,,0,,,0,,,,,
59469433,The Temporary European: Lessons and Confessions of a Professional Traveler,Cameron Hewitt,"Hewitt, Cameron",,,,0,4.55,,Kindle Edition,,2022,,,2022/06/01,to-read,to-read (#198),to-read,,,,0,,,0,,,,,
32783223,Problems of Life: An Evaluation of Modern Biological Thought,Ludwig Von Bertalanffy,"Bertalanffy, Ludwig Von",,161427701X,9781614277019,0,4,Martino Fine Books,Paperback,226,2014,,,2022/05/11,to-read,to-read (#196),to-read,,,,0,,,0,,,,,
6043781,"Blood of Elves (The Witcher, #1)",Andrzej Sapkowski,"Sapkowski, Andrzej",Danusia Stok,031602919X,9780316029193,4,4.1,Hachette Book Group,Mass Market Paperback,398,2009,1994,2022/05/08,2022/05/05,,,read,,,,1,,,0,,,,,
25454056,"Sword of Destiny (The Witcher, #0.7)",Andrzej Sapkowski,"Sapkowski, Andrzej",David A French,,,4,4.28,Orbit,Kindle Edition,384,2015,1992,2022/05/03,2021/11/22,,,read,,,,1,,,0,,,,,
39328584,Greenwood,Michael Christie,"Christie, Michael",,1984822004,9781984822000,0,4.34,Hogarth,Hardcover,528,2020,2019,,2022/05/01,to-read,to-read (#194),to-read,,,,0,,,0,,,,,
56404444,Bewilderment,Richard Powers,"Powers, Richard",,0393881148,9780393881141,0,3.97,W. W. Norton Company,Hardcover,278,2021,2021,,2022/04/30,to-read,to-read (#193),to-read,,,,0,,,0,,,,,
40180098,The Overstory,Richard Powers,"Powers, Richard",,039335668X,9780393356687,5,4.12,W.W. Norton & Company,Paperback,502,2019,2018,2022/04/30,2022/04/04,,,read,,,,1,,,0,,,,,
25848636,The Time Traveller's Wife,Audrey Niffenegger,"Niffenegger, Audrey",,,,0,3.99,,Hardcover,518,2003,2003,,2022/04/28,to-read,to-read (#192),to-read,,,,0,,,0,,,,,
50403493,Wagnerism: Art and Politics in the Shadow of Music,Alex Ross,"Ross, Alex",,0374285934,9780374285937,0,4.2,"Farrar, Straus and Giroux",Hardcover,784,2020,2020,,2022/04/18,to-read,to-read (#191),to-read,,,,0,,,0,,,,,
11543839,Did Jesus Exist?: The Historical Argument for Jesus of Nazareth,Bart D. Ehrman,"Ehrman, Bart D.",,0062089943,9780062089946,0,3.84,HarperCollins Publishers,ebook,304,2012,2012,,2022/04/18,to-read,to-read (#190),to-read,,,,0,,,0,,,,,
13587193,"Permanent Present Tense: The Unforgettable Life of the Amnesic Patient, H. M.",Suzanne Corkin,"Corkin, Suzanne",,0465031595,9780465031597,3,3.7,Basic Books,Hardcover,386,2013,2012,,2013/10/23,,,read,,,,1,,,0,,,,,
53328332,Less is More: How Degrowth Will Save the World,Jason Hickel,"Hickel, Jason",,1786091216,9781786091215,5,4.53,Windmill Books,Paperback,320,2021,2020,,2022/03/29,,,read,,,,1,,,0,,,,,
24 changes: 24 additions & 0 deletions test_output.csv
@@ -0,0 +1,24 @@
title,author,BANQ (Overdrive),Ville de Québec (Prêt Numérique),Kobo Plus
The Little Grave,Carolyn Arnold,,,
Décroissance versus développement durable: Débats pour la suite du monde,Yves-Marie Abraham,,https://quebec.pretnumerique.ca/resources?utf8=%E2%9C%93&keywords=Décroissance+versus+développement+durable&author=Yves-Marie+Abraham&narrator=&publisher=&collection_title=&issued_on_range=&language=&audience=&category_standard=thema&category=&nature=ebook&medium=,
L'Intelligence des plantes,Stefano Mancuso,,https://quebec.pretnumerique.ca/resources?utf8=%E2%9C%93&keywords=L'Intelligence+des+plantes&author=Stefano+Mancuso&narrator=&publisher=&collection_title=&issued_on_range=&language=&audience=&category_standard=thema&category=&nature=ebook&medium=,
Against the Grain: A Deep History of the Earliest States,James C. Scott,,,
Future Trends in Microelectronics: Journey Into the Unknown,Serge Luryi,,,
The Water Knife,Paolo Bacigalupi,https://banq.overdrive.com/search/title?query=The+Water+Knife&creator=Paolo+Bacigalupi&mediaType=ebook&sortBy=newlyadded,,
Code: The Hidden Language of Computer Hardware and Software,Charles Petzold,,,
"The Information: A History, a Theory, a Flood",James Gleick,https://banq.overdrive.com/search/title?query=The+Information&creator=James+Gleick&mediaType=ebook&sortBy=newlyadded,,
Thus Spoke the Plant: A Remarkable Journey of Groundbreaking Scientific Discoveries and Personal Encounters with Plants,Monica Gagliano,,,
A Brief History of the Future: The Origins of the Internet,John Naughton,,,
The Master Switch: The Rise and Fall of Information Empires,Tim Wu,https://banq.overdrive.com/search/title?query=The+Master+Switch&creator=Tim+Wu&mediaType=ebook&sortBy=newlyadded,,
Inventing the Internet,Janet Abbate,,,
"The Rise of the Network Society: The Information Age: Economy, Society and Culture, Volume I",Manuel Castells,,,
The Perfectionists: How Precision Engineers Created the Modern World,Simon Winchester,,,
Technology and the Environment in History,Sara B. Pritchard,,,
"The Black Count: Glory, Revolution, Betrayal, and the Real Count of Monte Cristo",Tom Reiss,https://banq.overdrive.com/search/title?query=The+Black+Count&creator=Tom+Reiss&mediaType=ebook&sortBy=newlyadded,,
The Temporary European: Lessons and Confessions of a Professional Traveler,Cameron Hewitt,,,https://www.kobo.com/ca/en/search?query=query&fcmedia=Book~BookSubscription&nd=true&ac=1&ac.author=Cameron+Hewitt&ac.title=The+Temporary+European&sort=PublicationDateDesc&sortchange=1
Problems of Life: An Evaluation of Modern Biological Thought,Ludwig Von Bertalanffy,,,
Greenwood,Michael Christie,https://banq.overdrive.com/search/title?query=Greenwood&creator=Michael+Christie&mediaType=ebook&sortBy=newlyadded,https://quebec.pretnumerique.ca/resources?utf8=%E2%9C%93&keywords=Greenwood&author=Michael+Christie&narrator=&publisher=&collection_title=&issued_on_range=&language=&audience=&category_standard=thema&category=&nature=ebook&medium=,
Bewilderment,Richard Powers,https://banq.overdrive.com/search/title?query=Bewilderment&creator=Richard+Powers&mediaType=ebook&sortBy=newlyadded,,
The Time Traveller's Wife,Audrey Niffenegger,,,
Wagnerism: Art and Politics in the Shadow of Music,Alex Ross,https://banq.overdrive.com/search/title?query=Wagnerism&creator=Alex++Ross&mediaType=ebook&sortBy=newlyadded,,
Did Jesus Exist?: The Historical Argument for Jesus of Nazareth,Bart D. Ehrman,,,