A simple Python script to crawl BibleGateway (https://www.biblegateway.com/). It currently works only for Bible versions that supply a direct mapping between verse and verse number (e.g. it does not work for the MSG translation).
Tested on a MacBook Pro running macOS Mojave 10.14.4.
Environment information:
- Python 3.6.5
To install dependencies, run:
pip install -r requirements.txt
To run the crawler:
scrapy runspider spider.py -o [FILENAME].json
Replace [FILENAME] with the name of the file in which the JSON output should be stored. Change the start link in the script to Genesis 1 in your desired version.
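As a rough illustration of what a start link might look like, the sketch below builds a Genesis 1 passage URL for a given version code. The passage/?search=...&version=... query layout and the helper name build_start_url are assumptions made for illustration and are not taken from spider.py; check the URL your browser shows for your chosen version before pasting it into the script.

```python
from urllib.parse import urlencode

# Assumed base URL for passage pages; verify against the live site.
BASE_URL = "https://www.biblegateway.com/passage/"


def build_start_url(version, book="Genesis", chapter=1):
    """Return a passage URL for the given version code (e.g. "NIV")."""
    # urlencode escapes the space in "Genesis 1" as "+".
    query = urlencode({"search": f"{book} {chapter}", "version": version})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_start_url("NIV"))
```

The printed link can then be used as the start link in the script.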
Also provided is a bundler.py script that bundles the crawler output into a single JSON file with the following structure:
{
Book1
{
Chapter1 : Verses {}
Chapter2 : Verses {}
...
}
Book2
{
Chapter1 : Verses {}
Chapter2 : Verses {}
...
}
...
}
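A minimal sketch of how flat crawler records could be folded into this nested shape. The record keys (book, chapter, verse, text) and the function name bundle are hypothetical; the fields actually emitted by spider.py and the logic in bundler.py may differ.

```python
def bundle(records):
    """Group flat verse records into {book: {chapter: {verse: text}}}."""
    bundled = {}
    for rec in records:
        # setdefault creates the book and chapter levels on first sight.
        chapters = bundled.setdefault(rec["book"], {})
        # Keys are stored as strings because JSON object keys are strings.
        verses = chapters.setdefault(str(rec["chapter"]), {})
        verses[str(rec["verse"])] = rec["text"]
    return bundled


if __name__ == "__main__":
    # Placeholder records, not real crawler output.
    sample = [
        {"book": "Genesis", "chapter": 1, "verse": 1, "text": "first verse"},
        {"book": "Genesis", "chapter": 1, "verse": 2, "text": "second verse"},
        {"book": "Exodus", "chapter": 1, "verse": 1, "text": "another verse"},
    ]
    print(bundle(sample))
```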
bundler.py expects the .json input files (generated by the crawler) to be in the bundler_input directory. It creates a bundler_output directory, if one doesn't already exist, to store the bundled output.
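The directory handling described above might look roughly like the following. Only the directory names bundler_input and bundler_output come from this README; the function name bundle_directory, the output file name bundled.json, and the assumption that each input file holds a JSON list are illustrative. The sketch only concatenates the inputs and does not reproduce the nested book/chapter structure.

```python
import json
import os


def bundle_directory(input_dir="bundler_input", output_dir="bundler_output",
                     output_name="bundled.json"):
    """Merge every .json file in input_dir into one file in output_dir."""
    # Create the output directory only if it is missing.
    os.makedirs(output_dir, exist_ok=True)

    merged = []
    for name in sorted(os.listdir(input_dir)):
        if not name.endswith(".json"):
            continue  # skip anything that is not crawler output
        with open(os.path.join(input_dir, name), encoding="utf-8") as fh:
            merged.extend(json.load(fh))  # assumes each file is a JSON list

    out_path = os.path.join(output_dir, output_name)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(merged, fh, ensure_ascii=False, indent=2)
    return out_path
```

Passing exist_ok=True to os.makedirs keeps repeated runs from failing when the output directory is already present.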