A file manifest should be created for each born-digital accession at the point of accessioning. After an accession is copied into the Libraries' storage systems, use technical-appraisal-logs.py to generate an initial file manifest of all the files and their checksum information.
The current workflow also generates fixity information using bagit.py. This creates a "bag" for each accession containing an MD5 manifest text file, which is quick and easy to validate with the bagit.py command line utility.
Below are instructions for creating and validating bags, as well as instructions for validating fixity information from checksum manifests created by other utilities.
This workflow uses bagit.py, a Python module and command line utility with installation instructions on the Library of Congress's bagit-python GitHub page.
bagit.py turns a directory into a "bag" that includes a file (manifest-md5.txt) listing the MD5 checksum and path of every file. To validate a bag, bagit.py recalculates each file's MD5, compares it with the value recorded in the manifest, and reports the bag as invalid if any file has changed or is missing.
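To make the comparison concrete, here is a minimal sketch of a manifest check written with only the Python standard library. It is not bagit.py's own implementation, and bagit.py's validation covers more than this (for example, the bag's tag files). The function names `md5_of_file` and `check_md5_manifest` are hypothetical, and the sketch assumes each manifest line holds an MD5 value followed by whitespace and a path relative to the bag folder.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5_manifest(bag_dir):
    """Compare each entry in manifest-md5.txt with the file on disk.

    Returns a list of (relative_path, problem) tuples. An empty list
    means every listed file exists and matches its recorded MD5.
    """
    bag_dir = Path(bag_dir)
    problems = []
    with open(bag_dir / "manifest-md5.txt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            # Assumed line layout: "<md5> <relative path>".
            # Split on the first run of whitespace only, so paths
            # that contain spaces stay intact.
            recorded, rel_path = line.split(None, 1)
            target = bag_dir / rel_path
            if not target.is_file():
                problems.append((rel_path, "missing"))
            elif md5_of_file(target) != recorded.lower():
                problems.append((rel_path, "checksum mismatch"))
    return problems
```

Calling `check_md5_manifest("/path/to/accession/folder_bag")` returns an empty list when nothing has changed. This sketch only checks files named in the manifest, so it would not notice files added to the bag afterward.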
All the files for an accession should be put in a single bag named with the accession number.
Bagit Instructions:
Create the bag. This converts the accession folder into a bag in place:
$ bagit.py --md5 /path/to/accession/folder
Record this step in the preservation log.
Append "_bag" to the end of the accession folder name (e.g., harg-ms-281_2017-01-er_bag).
Validate the renamed bag:
$ bagit.py --validate /path/to/accession/folder_bag
Record the results of this step in the preservation log.
Save a copy of the bag manifest (manifest-md5.txt) outside the bag with the other preservation documentation.
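The two file-handling steps above (renaming the folder and saving a copy of the manifest) can also be scripted. The following is a rough sketch using only the Python standard library, not part of the existing workflow scripts. The function names, the `docs_dir` argument, and the naming of the copied manifest are all hypothetical and should be adapted to local conventions.

```python
import shutil
from pathlib import Path


def append_bag_suffix(accession_dir):
    """Rename an accession folder so its name ends in "_bag".

    Returns the new path. Raises if the target name is already taken.
    """
    accession_dir = Path(accession_dir)
    bag_dir = accession_dir.with_name(accession_dir.name + "_bag")
    if bag_dir.exists():
        raise FileExistsError(f"{bag_dir} already exists")
    accession_dir.rename(bag_dir)
    return bag_dir


def copy_manifest(bag_dir, docs_dir):
    """Copy manifest-md5.txt out of the bag into a documentation folder.

    Returns the path of the copy.
    """
    bag_dir = Path(bag_dir)
    docs_dir = Path(docs_dir)
    manifest = bag_dir / "manifest-md5.txt"
    if not manifest.is_file():
        raise FileNotFoundError(f"No manifest-md5.txt found in {bag_dir}")
    docs_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming choice: prefix the copy with the bag name so
    # manifests from different accessions do not overwrite each other.
    copy_path = docs_dir / f"{bag_dir.name}_manifest-md5.txt"
    shutil.copy2(manifest, copy_path)
    return copy_path
```

To follow the order of the instructions, call `append_bag_suffix` after bagging, run the bagit.py validation command, and then call `copy_manifest`. Logging each step in the preservation log remains a manual task in this sketch.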
To validate the fixity of a bag at any later point, run a single command:
$ bagit.py --validate /path/to/accession/folder_bag
Several older accessions have checksum manifests that were generated using the file cataloging utility Karen's Directory Printer. The UGA Libraries GitHub has a set of Python scripts that can parse and validate fixity from the MD5 checksums in these manifests.
- verify-md5-KDPmanifests - UGA Libraries GitHub repo