Scripts for downloading oral history data

  1. Download all oral history audio, images, and metadata as JSON and CSV files directly from oralhistory.nypl.org
python get_metadata_and_assets.py -out "path/to/output/dir/"

This script creates the following in the output directory:

  • neighborhoods.json and neighborhoods.csv
  • interviews.json and interviews.csv
  • individual .json files for each interview which contain more metadata and annotations
  • audio and image files, written to the ./audio and ./images folders
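
As a rough sketch of how this output might be consumed, the snippet below reads interviews.csv and interviews.json from the output directory. The helper name `load_interviews` is hypothetical and not part of this repository. It assumes the files are UTF-8 encoded and makes no assumption about column names or about the internal structure of the JSON.

```python
import csv
import json
from pathlib import Path


def load_interviews(out_dir):
    """Hypothetical helper: read interviews.csv and interviews.json from out_dir.

    Returns (rows, data), where rows is a list of dicts keyed by the CSV header
    and data is whatever JSON value interviews.json contains.
    """
    out_dir = Path(out_dir)
    # Assumption: files are UTF-8 encoded; adjust if the script writes otherwise.
    with open(out_dir / "interviews.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(out_dir / "interviews.json", encoding="utf-8") as f:
        data = json.load(f)
    return rows, data
```

For example, `rows, data = load_interviews("path/to/output/dir/")` would load both files from the directory passed to `-out`.
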
  2. Download all oral history transcripts as JSON, plain text, and WebVTT files directly from transcribe.oralhistory.nypl.org
python get_transcripts.py -out "path/to/output/dir/"

This script creates the following in the output directory:

  • A manifest file, transcripts.json, with links to each interview's transcripts
  • An individual folder for each interview, containing the transcript in three formats (.json, .txt, .vtt)
  • The .json files contain all of the edits, while the .txt and .vtt files contain the "best guess" transcription for each line
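
The folder layout described above could be indexed with something like the following sketch. The helper name `index_transcripts` is hypothetical, and it assumes that every immediate subfolder of the output directory is one interview folder; the actual folder and file names produced by the script are not assumed.

```python
from pathlib import Path

# The three transcript formats named in this README.
TRANSCRIPT_SUFFIXES = (".json", ".txt", ".vtt")


def index_transcripts(out_dir):
    """Hypothetical helper: map each interview folder name to its transcript files.

    Returns {folder_name: {suffix: [paths]}}.
    Assumption: each immediate subfolder of out_dir is one interview folder.
    """
    index = {}
    for folder in sorted(Path(out_dir).iterdir()):
        if not folder.is_dir():
            continue  # skips top-level files such as the transcripts.json manifest
        files = {}
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.suffix in TRANSCRIPT_SUFFIXES:
                files.setdefault(path.suffix, []).append(path)
        index[folder.name] = files
    return index
```

Calling `index_transcripts("path/to/output/dir/")` returns a dictionary that can be used, for instance, to pick the .txt file for every interview when only the "best guess" text is needed.
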