Created by Emmeline Kaser, Digital Archivist, April 2023
This workflow outlines how and when formats are analyzed, when formats should be converted, and lists the characteristics of digital files that make them a candidate for reformatting. It also outlines how and when to establish a migration pathway for a file format.
A file is a candidate for reformatting if it is:
- an archive (contains multiple files) OR
- at imminent risk of obsolescence and inaccessibility
- complex to preserve and can easily be migrated to a simpler/equally usable format OR
- significantly beneficial to access (uncommon)
At-risk formats are especially labor-intensive if they require new research or a tedious, high-touch conversion process. It is the digital archivist’s responsibility to estimate this labor and communicate it clearly to the collecting archivists, who can factor it in when making appraisal decisions.
Reformatting labor must be factored into a collection's appraisal and its assigned processing tier. It should also be included in the processing plan.
Format sustainability must be factored into the processing plan for new accessions, starting with a format report at the point of accessioning.
Use the format report generated by format-analysis.py to identify the file formats in an accession and determine their NARA risk level. This information can inform both appraisal and processing decisions by:
- indicating the potential preservation needs of the collection
- helping the processing archivist(s) estimate the amount of labor the collection will require
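As a rough illustration of how a format report can feed a labor estimate, the sketch below tallies files by NARA risk level. It assumes the report is a CSV with one row per file and two hypothetical columns, Format_Name and NARA_Risk_Level; the actual output of format-analysis.py may be structured differently, and the sample rows are invented for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; check them against the actual format report.
FORMAT_COLUMN = "Format_Name"
RISK_COLUMN = "NARA_Risk_Level"


def summarize_risk(report_file):
    """Count files per risk level and collect the formats seen at each level."""
    counts = Counter()
    formats_by_risk = {}
    for row in csv.DictReader(report_file):
        # Rows with no risk level assigned are grouped under "Unknown".
        risk = row[RISK_COLUMN].strip() or "Unknown"
        counts[risk] += 1
        formats_by_risk.setdefault(risk, set()).add(row[FORMAT_COLUMN].strip())
    return counts, formats_by_risk


# Invented sample rows, for illustration only.
sample_report = io.StringIO(
    "Format_Name,NARA_Risk_Level\n"
    "Portable Document Format,Low Risk\n"
    "WordPerfect Document,High Risk\n"
    "WordPerfect Document,High Risk\n"
    "ZIP Format,Moderate Risk\n"
)

counts, formats_by_risk = summarize_risk(sample_report)
for risk, total in counts.most_common():
    print(f"{risk}: {total} file(s), formats: {sorted(formats_by_risk[risk])}")
```

A summary like this could accompany the report when the digital archivist communicates estimated reformatting labor to the collecting archivists.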
Arrange the collection as needed.
Generate Version 1 AIPs before reformatting anything. There are a few exceptions to this rule:
- Archive (zipped or compressed) files: Because they don't require any actual format conversion of the compressed files, archives should have their contents extracted before the Version 1 AIP is created.
- Web formats: Donors occasionally save web content in ways that are not conducive to access or preservation, e.g. saving a web page from a browser so that it downloads as a series of separate files that are more complex to preserve and reconstruct for access. Because web archiving methods are not common knowledge, we can deprioritize the donor’s choice of format and prioritize the donor's intent to capture a particular page. Improperly captured web content can be remediated before the Version 1 AIP is created.
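The archive exception can be sketched with Python's standard zipfile module. This is a minimal example, not the established local procedure: the function name and the "_extracted" folder suffix are assumptions, it handles only .zip files, and it leaves the original archive in place so that the decision to keep or remove it stays with the archivist.

```python
import zipfile
from pathlib import Path


def extract_archives(folder):
    """Extract each .zip file under folder into a sibling "<name>_extracted" directory."""
    extracted = []
    # sorted() builds the full list first, so directories created during
    # extraction are not picked up by the search. The pattern is lowercase
    # only; on case-sensitive file systems, ".ZIP" files need a second pass.
    for zip_path in sorted(Path(folder).rglob("*.zip")):
        if not zipfile.is_zipfile(zip_path):
            continue  # Misnamed or corrupt file: leave it for manual review.
        target = zip_path.with_name(zip_path.stem + "_extracted")
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target)
        extracted.append(target)
    return extracted
```

Archives nested inside other archives are not unpacked by this sketch and would need a second pass or manual handling.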
Reformat as needed and document all decisions in a reformatting log. Save the log with the preservation documentation for the collection. See the log template for required fields.
- Note: This log can include decisions not to reformat high-risk files. Document any rationale that will be helpful for future archivists.
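If the reformatting log is kept as a CSV file, appending decisions could look something like the sketch below. The field names are hypothetical placeholders; the log template defines the required fields and takes precedence. The function name and file layout are likewise assumptions made for illustration.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical field names; the log template defines the required fields.
LOG_FIELDS = ["date", "aip_id", "file_path", "path_id",
              "action", "new_format", "rationale"]


def log_decision(log_path, **entry):
    """Append one reformatting decision to the log, adding a header to a new file."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    # Default to today's date when the caller does not supply one.
    entry.setdefault("date", date.today().isoformat())
    with log_path.open("a", newline="", encoding="utf-8") as log_file:
        # Missing fields are written as empty strings; unknown field names
        # raise ValueError, which catches typos in keyword arguments.
        writer = csv.DictWriter(log_file, fieldnames=LOG_FIELDS, restval="")
        if is_new:
            writer.writeheader()
        writer.writerow(entry)
```

A decision not to reformat can be recorded the same way, for example by leaving new_format empty and using the rationale field to explain why the file was left as-is.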
Make a new version of each AIP that contains reformatted files.
Ingest the new AIPs into ARCHive with the new version number included in the metadata.
Reformatting events must be clearly documented in the description of the affected files.
- Each reformatted file should have this noted in the appropriate AIP inventory
- Each AIP with reformatted files should have this noted in the ArchivesSpace description
Workflows for addressing formats in SIP and AIP storage are currently in development.
Pathways are created on an ad hoc basis as formats are identified and their risks assessed. They should be added to the Format Migration Pathways spreadsheet, which can be found in the Digital Stewardship Teams folder.
- Assign a path ID that can be used in reformatting logs (a three-digit number in sequential order with the existing IDs on the spreadsheet).
- Determine which software can open the format.
- If the pathway is based on risk, consider NARA’s recommended actions.
- Consider the ubiquity/accessibility of compatible software.
- Establish a recommended action.
- If format conversion is recommended, determine a recommended new format.
- Determine which software/processes can convert the old format to the recommended format.
- Establish a process for using that conversion tool (e.g. which settings need to be consistent).
- Document where the pathway aligns with or diverges from NARA recommendations.
- Add a “last reviewed” date for the pathway.
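The path ID rule in the first step (a three-digit number, sequential with the existing IDs) can be expressed as a small helper. This is only a sketch: it assumes the existing IDs have been read from the spreadsheet into a list of strings, and the function name is made up for the example.

```python
def next_path_id(existing_ids):
    """Return the next three-digit path ID after the highest existing one."""
    # An empty spreadsheet starts the sequence at 001.
    highest = max((int(path_id) for path_id in existing_ids), default=0)
    if highest >= 999:
        raise ValueError("three-digit path IDs are exhausted")
    return f"{highest + 1:03d}"


print(next_path_id(["001", "002", "003"]))  # prints 004
```

Taking the highest existing ID rather than the row count keeps the sequence correct even if the spreadsheet has been sorted or a row has been removed.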
Older pathways must be periodically reviewed and updated as needed. This can also be done on an ad hoc basis.
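One way to support periodic review is to flag pathways whose "last reviewed" date is older than a chosen interval. The sketch below assumes the dates are stored in ISO format (YYYY-MM-DD) and uses a two-year interval purely as a placeholder; this workflow does not set a review interval, so both are assumptions.

```python
from datetime import date, timedelta

# Placeholder interval; the workflow does not specify how often to review.
REVIEW_INTERVAL = timedelta(days=365 * 2)


def pathways_due_for_review(pathways, today=None):
    """Return path IDs whose last review is older than REVIEW_INTERVAL.

    pathways maps a path ID to its "last reviewed" date as an ISO string.
    """
    today = today or date.today()
    return sorted(
        path_id
        for path_id, last_reviewed in pathways.items()
        if today - date.fromisoformat(last_reviewed) > REVIEW_INTERVAL
    )
```

Calling pathways_due_for_review({"001": "2020-01-15", "002": "2023-03-01"}, today=date(2023, 4, 1)) would flag only pathway 001, since it was last reviewed more than two years before the given date.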
Not all reformatted files should be saved to ARCHive. Reformatted files should only be ingested as new versions if they were reformatted for preservation reasons. The most common reason is to avoid obsolescence and imminent data loss. In some cases, files may also be reformatted to simplify future preservation efforts, e.g. saving a web page as a single file when it was originally donated as components in multiple formats. In both cases, reformatting is considered a preservation measure and the new versions become the preservation copies of the files.
The digital archivist may, on rare occasions, choose to reformat purely for access reasons, particularly if a collection is frequently used. Reformatted access copies can be helpful if the formats are low risk but difficult for researchers to use. The existence of access copies must be clearly described in the finding aid so researchers can request the original formats if they wish. These copies should not be ingested into ARCHive.
Example: A collection contains multiple proprietary data files that were created by a piece of genealogy software. The archivist reasonably assumes that most researchers do not have the software required to view the data in its original format. They find a workaround that allows them to view and save the data to a spreadsheet, which will be easier for researchers to use. The digital archivist opts to create access copies in a spreadsheet format rather than providing complex access instructions for researchers.
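The distinction between preservation copies and access copies can be summarized as a simple decision rule. The sketch below restates the guidance above in code; the names Reason and disposition are illustrative only and do not correspond to any existing tool.

```python
from enum import Enum


class Reason(Enum):
    """Why a file was reformatted (illustrative categories from this workflow)."""
    OBSOLESCENCE_RISK = "avoid obsolescence and imminent data loss"
    SIMPLIFY_PRESERVATION = "simplify future preservation efforts"
    ACCESS_ONLY = "make the content easier for researchers to use"


# Only preservation-driven reformatting produces a new preservation copy.
PRESERVATION_REASONS = {Reason.OBSOLESCENCE_RISK, Reason.SIMPLIFY_PRESERVATION}


def disposition(reason):
    """Return what should happen to a reformatted file, given the reason."""
    if reason in PRESERVATION_REASONS:
        return "Ingest into ARCHive as a new AIP version (preservation copy)."
    return "Do not ingest; keep as an access copy and describe it in the finding aid."


for reason in Reason:
    print(f"{reason.name}: {disposition(reason)}")
```

In the genealogy example, the spreadsheet copies fall under ACCESS_ONLY, so they stay out of ARCHive and are noted in the finding aid.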
Version 1.0 preliminary criteria list created by Emmeline Kaser, June 2022