Scripts for downloading oral history data

Download all oral history metadata as JSON and CSV files, along with the audio and image files, directly from oralhistory.nypl.org

python get_metadata_and_assets.py -out "path/to/output/dir/"

This script creates the following in the output directory:

neighborhoods.json and neighborhoods.csv
interviews.json and interviews.csv
an individual .json file for each interview, containing additional metadata and annotations
audio and image files, written to the ./audio and ./images folders
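After running get_metadata_and_assets.py, one way to confirm the download completed is to check that the files listed above exist. The Python below is a minimal sketch of that check. It assumes the ./audio and ./images folders sit inside the output directory; check_metadata_output and count_assets are hypothetical helper names, not part of these scripts. The per-interview .json file names are not documented here, so the sketch does not look for them.

```python
from pathlib import Path

# Top-level files this README says get_metadata_and_assets.py writes.
EXPECTED_FILES = [
    "neighborhoods.json",
    "neighborhoods.csv",
    "interviews.json",
    "interviews.csv",
]
# Asset folders. ASSUMPTION: "./audio" and "./images" are relative to the
# output directory passed with -out.
EXPECTED_DIRS = ["audio", "images"]


def check_metadata_output(out_dir):
    """Return the expected paths missing from out_dir (an empty list if complete)."""
    root = Path(out_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


def count_assets(out_dir):
    """Count the files in each asset folder, reporting 0 for a missing folder."""
    root = Path(out_dir)
    counts = {}
    for name in EXPECTED_DIRS:
        folder = root / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts
```

For example, check_metadata_output("path/to/output/dir/") returns an empty list once every expected file and folder is in place.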

Download all oral history transcripts as JSON, plain text, and WebVTT files directly from transcribe.oralhistory.nypl.org

python get_transcripts.py -out "path/to/output/dir/"

This script creates the following in the output directory:

A manifest file, transcripts.json, with links to each interview's transcripts
An individual folder for each interview, containing its transcript in three formats (.json, .txt, .vtt)
The .json files contain all of the edits, while the .txt and .vtt files contain the "best guess" transcription for each line
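The .vtt files can be read with any WebVTT parser. As a minimal, dependency-free sketch, assuming the files follow the standard WebVTT layout (a WEBVTT header, then cues separated by blank lines, each with a "start --> end" timing line followed by the cue text), the Python below extracts (start, end, text) tuples and gathers each interview folder's transcript files by extension. parse_vtt and collect_transcripts are hypothetical helpers. Because this README does not name the files inside each interview folder, the sketch assumes one file per format and keys them by extension.

```python
import re
from pathlib import Path

# Matches a WebVTT timing line such as "00:01:02.500 --> 00:01:05.000".
# Hours are optional in WebVTT, so "mm:ss.ttt" and "hh:mm:ss.ttt" both match.
# Any cue settings after the end time are ignored.
TIMING = re.compile(
    r"^((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def parse_vtt(text):
    """Return a list of (start, end, text) tuples, one per cue."""
    cues = []
    # Blocks (header, cues) are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                # Lines before the timing line (e.g. a cue id) are skipped;
                # everything after it is the cue text.
                cue_text = "\n".join(lines[i + 1:]).strip()
                cues.append((match.group(1), match.group(2), cue_text))
                break
    return cues


def collect_transcripts(out_dir):
    """Map each interview folder name to {extension: path} for its transcripts.

    ASSUMPTION: one .json, .txt, and .vtt file per folder; if a folder holds
    several files with the same extension, only the last one seen is kept.
    """
    result = {}
    for folder in sorted(p for p in Path(out_dir).iterdir() if p.is_dir()):
        files = {
            p.suffix: p
            for p in folder.iterdir()
            if p.is_file() and p.suffix in (".json", ".txt", ".vtt")
        }
        if files:
            result[folder.name] = files
    return result
```

Top-level files such as the transcripts.json manifest are skipped by collect_transcripts, since it only descends into folders. A regular expression keeps the sketch free of third-party dependencies, at the cost of ignoring cue settings and styling.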