The Old Exams Repository is maintained by the University of Toronto Libraries. It contains the 3 most recent years of exams.
Clone or download the scripts to your local repository. Ensure you have the prerequisite software installed before running the scripts.
You must run step1.py before running step2.py. More details about usage and workflow are given below.
-
python step1.py /directory_path_to_pdf_exams/ campus[A, B or C]
-
python step2.py /directory_path_to_pdf_exams/
- Exams are scanned into PDF files whose names follow a set convention.
- Each PDF file name must contain the course code, month and year.
- DSpace Dublin Core metadata are generated based on each PDF's filename.
Example: for Campus C, use "au" for August and "ap" for April to properly distinguish these two months.
detailed exam file naming convention found here
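As a rough illustration of how a file name can carry the course code, month and year, the sketch below parses a hypothetical pattern of the form `<course>_<month>_<year>.pdf`. The pattern, the `parse_exam_filename` helper and the `MONTHS` table are assumptions made for this example only. The actual convention is the one in the naming document referenced above, and step1.py may parse names differently.

```python
import re

# Hypothetical pattern: <course code>_<two-letter month>_<four-digit year>.pdf
# This is an assumption for illustration, not the repository's real convention.
FILENAME_RE = re.compile(
    r"^(?P<course>[A-Za-z0-9]+)_(?P<month>[a-z]{2})_(?P<year>\d{4})\.pdf$"
)

# Two-letter month codes. "ap" and "au" keep April and August distinct.
# Only the two codes named in this README are listed here.
MONTHS = {"ap": "April", "au": "August"}


def parse_exam_filename(name):
    """Return the course code, month name and year encoded in a file name."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized exam file name: {name}")
    month_code = match.group("month")
    if month_code not in MONTHS:
        raise ValueError(f"unknown month code {month_code!r} in {name}")
    return {
        "course": match.group("course").upper(),
        "month": MONTHS[month_code],
        "year": int(match.group("year")),
    }


if __name__ == "__main__":
    print(parse_exam_filename("ABC123_au_2019.pdf"))
```

Rejecting unknown month codes with an error, rather than guessing, means an ambiguous abbreviation is caught before any metadata is generated.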
- Once exams are received in PDF format from campuses A, B or C, file metadata is generated
- Dublin Core metadata is generated from the file names using Beautiful Soup
- The script also uses a CSV file of departmental codes per campus for mapping
sample generated metadata file found here
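The metadata step can be pictured with the minimal sketch below, which builds a DSpace-style `dublin_core.xml` document from already parsed file name fields and a department code mapping. It uses the standard library's `xml.etree.ElementTree` instead of Beautiful Soup so that it runs without extra dependencies. The CSV column names (`code`, `name`), the choice of Dublin Core fields, and the rule that the first three characters of the course code identify the department are all assumptions for illustration, not a description of what step1.py does.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_department_codes(csv_text):
    """Map department code to department name.

    Assumes hypothetical CSV columns named 'code' and 'name'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["code"]: row["name"] for row in reader}


def build_dublin_core(course, month, year, departments):
    """Return a dublin_core.xml document as a string."""
    root = ET.Element("dublin_core")

    def add(element, qualifier, text):
        # Each metadata field is a <dcvalue> with element and qualifier attributes.
        node = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier}
        )
        node.text = text

    # Assumption: the first three characters of the course code are the department.
    dept_code = course[:3]
    add("title", "none", f"{course} {month} {year}")
    add("date", "issued", str(year))
    add("subject", "none", departments.get(dept_code, dept_code))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    departments = load_department_codes("code,name\nABC,Example Department\n")
    print(build_dublin_core("ABC123", "August", 2019, departments))
```

If a department code is missing from the mapping, this sketch falls back to the raw code so the item still gets a subject value.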
- step2.py script is used to package the PDFs and metadata into DSpace simple archives for ingest
- DSpace simple archives are imported into their respective collections via batch import
- Collections older than 3 years are removed
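For orientation, the packaging step can be sketched as follows. In the DSpace Simple Archive Format, each item is a directory holding a `dublin_core.xml` file, a `contents` file that lists the bitstream file names one per line, and the files themselves. The `package_item` helper and the `item_000` directory naming are hypothetical and are not taken from step2.py.

```python
import shutil
from pathlib import Path


def package_item(pdf_path, dublin_core_xml, archive_root, index):
    """Write one Simple Archive Format item directory and return its path."""
    pdf_path = Path(pdf_path)
    # Hypothetical naming scheme: item_000, item_001, ...
    item_dir = Path(archive_root) / f"item_{index:03d}"
    item_dir.mkdir(parents=True, exist_ok=True)

    # Copy the exam PDF into the item directory.
    shutil.copy2(pdf_path, item_dir / pdf_path.name)
    # Item metadata for the importer.
    (item_dir / "dublin_core.xml").write_text(dublin_core_xml, encoding="utf-8")
    # The contents file lists each bitstream file name on its own line.
    (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    return item_dir
```

A call such as `package_item("exams/ABC123_au_2019.pdf", xml_text, "archive", 0)` would produce `archive/item_000/`, and the `archive` directory could then be handed to the DSpace batch import.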
DSpace Simple Archives Importer is licensed under Apache License 2.0.