Merge pull request #108 from NYPL/ryanmc
Update qc-workflow.md
bturkus authored Apr 9, 2024
2 parents b84591e + e456ac5 commit 71453b5
Showing 3 changed files with 48 additions and 72 deletions.
18 changes: 0 additions & 18 deletions docs/pages/cli-resources.md
@@ -52,24 +52,6 @@ Nearly all macOS software required for AMI Preservation can be installed with th
```
for file in *wav ; do ffmpeg -i "$file" -c:a aac -b:a 320k -dither_method rectangular -ar 44100 "${file%.*}.mp4" ; done
```
### Generating a QC list
- ```cd /path/to/Audio``` (or the Video directory)
- ```ls Audio/Video > path/to/log/folder/batchID_assetlist.csv```
- [Import csv into google sheets qc log created from template:]
- The QC template has built-in formulas in the “list” tab.
- Navigate to the “asset list” .csv file for the drive you are QCing.
- Navigate to the “list” tab in your QC log
- **Select the proper cell for either Audio or Video media that you are importing data for (i.e. if you’re importing an AUDIO asset list, select the audio cell as instructed in the log).** _The two cells have different formulas applied for the different quotas we are meeting for audio vs. video data (5% audio vs. 10% video)_
- Once you’ve selected the cell, import _assetlist.csv_ into A1 of the “list” tab of the existing Google Sheets QC log (File>Import>Upload a file)
- Select **“Replace contents starting with selected cell”** and **“comma separated”** when the dialog window appears (because you are importing a .csv)
- The line items will then be imported into the sheet, and the column next to it will generate a filtered list. This is your spot checking list.
- Copy/Paste Special>”Paste values only” the list of filtered Primary IDs into the “QClog” tab, in the Primary ID column. Do this for one media type at a time (i.e. if there are both audio and video assets on a single drive, first copy the audio items list, then copy the video items list below it).
- Proceed with spot checking.
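
The list-generation and sampling steps above could also be sketched in Python. This is a minimal illustration, not the formula built into the QC template: the function names are hypothetical, and the random selection is an assumption standing in for whatever filter the “list” tab actually applies. Only the 5% audio / 10% video quotas come from the steps above.

```python
import csv
import math
import random
from pathlib import Path

# Spot-check quotas stated in the steps above, as whole percentages.
QUOTA_PERCENT = {"audio": 5, "video": 10}


def list_assets(media_dir):
    """Return sorted entry names in media_dir, skipping hidden files (as plain `ls` does)."""
    return sorted(p.name for p in Path(media_dir).iterdir() if not p.name.startswith("."))


def write_asset_list(names, csv_path):
    """Write one name per row, mirroring `ls > batchID_assetlist.csv`."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for name in names:
            writer.writerow([name])


def pick_spot_checks(names, media_type, seed=None):
    """Randomly choose enough items to meet the quota (assumed: at least one if any exist)."""
    if not names:
        return []
    count = max(1, math.ceil(len(names) * QUOTA_PERCENT[media_type] / 100))
    return sorted(random.Random(seed).sample(names, count))
```

For a drive with 40 audio bags this selects 2 items; for 40 video bags it selects 4. Passing a `seed` makes the selection repeatable if the list needs to be regenerated.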
### Content Inspection
#### General Overview
- Software requirements:
18 changes: 4 additions & 14 deletions docs/pages/mps/qc-workflow.md
@@ -49,8 +49,8 @@ The following handbook will provide step-by-step instructions for carrying out o
# Content Inspection

* Software requirements:
* Text editor
* VLC
* Text editor (Atom / Notepad / Text Edit etc.) to open and inspect JSON files.
* VLC to open and inspect media files.

* Content inspection can be completed either on ICC or on the drive.
* **On ICC**: make sure your machine is not going to create DS_Store files or Thumbs.db files inside bags.
@@ -101,8 +101,6 @@ The following handbook will provide step-by-step instructions for carrying out o
* Urgent / Systematic errors
* If you notice that there is something consistently and terribly wrong with many files in a row, please notify MPA / Asst. Mgr immediately so we can notify the vendor and avoid replicating the error in future deliverables ASAP. (e.g. the ‘barcode’ field in the JSON files is consistently “000000000”, or the ‘duration’ values are all wrong, or every value for ‘filename’ is the same across an entire batch.)
[QC complete! - if there are failures, all failures in an entire shipment will be combined and sent as a single email; report them to the MPA]
# Bag Validation
* Use ```validate_ami_bags.py``` in ami-tools to check bag Oxums, bag completeness, bag hashes, directory structure, filenames, and metadata.
@@ -153,13 +151,6 @@ flac --decode --keep-foreign-metadata --preserve-modtime --verify input.flac
/path/to/ami-preservation/ami-scripts/rawcooked_check_mkv.py -d /Volumes/DRIVE-ID -p 20
```
# Perform Manual QC
* Perform manual QC using Google Sheet list of Bags to check (in Trello card) (1min @ beginning, middle, end of each file)
* Note any errors / observations in the Google Sheet log. Use the categories/menus provided as much as possible.
* Use [this](https://github.com/NYPL/ami-preservation/wiki/Resources#logging-qc-failures--flags) list of definitions to review and mark-off the items listed in the QC log.
* Content Inspection of In-House deliverables can be completed either on ICC or on the drive by following the steps outlined [here](https://github.com/NYPL/ami-preservation/wiki/Resources#content-inspection).
* Use a text-editor (Atom / Notepad / Text Edit etc.) to [open and inspect](https://github.com/NYPL/ami-preservation/wiki/Resources#spot-checking-content--json) JSON files.
# Wrap Up...
* **IF APPROVED**:
@@ -184,15 +175,14 @@ flac --decode --keep-foreign-metadata --preserve-modtime --verify input.flac
**Vendor**
* Once QC is complete and approved, notify Digital Preservation and make arrangements to hand off hard drive(s) for ingest.
o prepare media for ingest.
* Once QC is complete and approved, notify Digital Preservation and make arrangements to hand off hard drives so media files can be uploaded to EAVie.
**In-House**
[complete]
* Generating a QC list
Use Terminal to generate a QC list for each drive you are QCing by following the steps outlined [here](https://github.com/NYPL/ami-preservation/wiki/Resources#generating-a-qc-list).
84 changes: 44 additions & 40 deletions docs/pages/preservationServices/qualityControl/qc-workflow.md
@@ -3,8 +3,6 @@ title: QC Workflow
layout: default
nav_order: 1
parent: Quality Control
grand_parent: Preservation Services

---

# Quality Control Workflow
@@ -17,7 +15,7 @@ Internal workflow for carrying out QC on digital assets.
1. TOC
{:toc}

# Quality Control Overview
## Quality Control Overview
Quality control (QC) is conducted to ensure that deliverables generated for preservation and access meet our technical specifications and metadata requirements, and adhere to best practices for the handling and digitization of NYPL’s audiovisual collections.

Our QC workflow is currently comprised of the following processes:
@@ -29,30 +27,34 @@ Our QC workflow is currently comprised of the following processes:

The following handbook will provide step-by-step instructions for carrying out our QC processes on Vendor and In-House projects. Our QC workflows vary slightly between Vendor and In-House deliverables, so steps applicable to Vendor projects only are marked **(vendor only)**. Vendor QC is primarily performed directly on hard drives.

# Shipment Intake
## Shipment Intake
**(vendor only)**
* Enter all drives and associated invoice IDs ("shipments" / "work orders") received into the [Vendor Project Tracking sheet](https://docs.google.com/spreadsheets/d/1ZeF6vGE1TqLnKaNjZFSIvjyKhYBt38nBcZDHyD_saPo/edit#gid=1973090513). Complete all fields (some are formulas - highlighted gray if so).

* Copy the "work order ID" that is automatically generated in the Vendor Project Tracking sheet (column A).

# Cards & Logs
## Cards & Logs

## Trello Card
### Trello Card
* Use the appropriate template (In-House/Vendor) to create a Trello card on the [MPS Quality Control board](https://trello.com/b/CBLrQvG1/mps-quality-control) for each project directory (In-House) or hard drive (Vendor)

## ICA Log
### ICA Log
* Create an ICA Log directory in ica.repo.nypl.org/pami named with the work order ID

## QC Log
### QC Log
* Create a QC log in the Team Drive QC Folder for each hard drive:
* Make a copy of the [QC Log Template](https://docs.google.com/spreadsheets/d/1OKlFNGR27H6Ey9v2EyAjqe6MzOsPrVl_5X4PDV-elsU/edit?usp=sharing) & rename the copy using the same work order ID (follow the QC log template naming convention).
* Attach QC log to the associated Trello card (using the Attachments button in the card, drop in the URL of the QC log).

# Content Inspection
**Each QC log should be easily found linked in Google Drive as an** _attachment in the Trello Card for the batch you are inspecting._ _Tip: you can search for the drive ID / work order ID in the Trello search box._

* Use the list of files that appears in the Google Sheet QC log (in the QClog tab) as your list of files to check.
* Drop down menus are available for noting specific identifiable errors, and there is a free-text field for general notes.
## Content Inspection

* Software requirements:
* Text editor
* VLC
* Text editor (Atom / Notepad / Text Edit etc.) to open and inspect JSON files.
* VLC to open and inspect media files.

* Content inspection can be completed either on ICC or on the drive.
* **On ICC**: make sure your machine is not going to create DS_Store files or Thumbs.db files inside bags.
@@ -84,8 +86,26 @@ The following handbook will provide step-by-step instructions for carrying out o
```
diskutil mount readOnly <device name as listed in Disk Utility>
```
# Bag Validation
## Logging QC Failures & Flags
* Use the Definitions below to review and mark-off the items listed in the QC log.
* **Be as concise as possible when noting questions and errors, so MPA does not have to double-check or clarify with you before compiling notes for Vendors.**
* Feel free to add rows for additional assets if you encounter more errors when troubleshooting. Rows are ‘per bag’.
* When QC is complete, send an email to notify MPA / Asst Mgr. that there are some items to review. They will compile all notes for a shipment into a single email and communicate to the vendor. Note: Try to troubleshoot errors to make sure you’re not missing something about the nature of the tape that would impact the quality or structure of the file or metadata, e.g. if it was a very poor quality tape and they baked it twice and cleaned it and tried it on multiple machines.
* Definitions
* *Question*: A question that will help determine whether an item should be reworked. Example:
* Freeze frame at the head of the Preservation Master, not noted in the JSON signalNotes. Is this freeze frame recorded on the tape?
* *Flag*: A moderate or minor error that is concerning but that DOES NOT require rework, but does require. Examples:
* An audio Edit Master was not levelled-out. The volume level is the same as the Preservation Master, which is lower than the ideal listening volume.
* The audio in a video Service Copy was not mixed down from the single channel audible in the Preservation Master, so the Service Copy has only one channel of audible content.
* *Fail*: A severe, systematic, or critical error that you think will most likely require retransfer, updating of metadata, and/or rebagging. Examples:
* The metadata for a video asset describes audible content, but the Preservation Master and Service Copy do not have audio.
* An audio asset sounds entirely backwards (reversed content on a single face, -f01-, was not split out into a separate face, -f02-).
* *Pass*: No errors, or any errors listed in the notes are inconsequential, inherent to the tape, or only included as supplemental information for future cataloger inquiries.
* Urgent / Systematic errors
* If you notice that there is something consistently and terribly wrong with many files in a row, please notify MPA / Asst. Mgr immediately so we can notify the vendor and avoid replicating the error in future deliverables ASAP. (e.g. the ‘barcode’ field in the JSON files is consistently “000000000”, or the ‘duration’ values are all wrong, or every value for ‘filename’ is the same across an entire batch.)
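
A quick scan for the kind of systematic error described above (the same ‘barcode’ or ‘filename’ value repeated across a batch) might look like the following sketch. It assumes nothing about where those fields sit inside the JSON, so it searches every nesting level; the function names and the 50% threshold are illustrative choices, not part of ami-tools or ami-scripts.

```python
import json
from collections import Counter


def find_values(node, key):
    """Yield every value stored under `key` anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def load_documents(json_paths):
    """Parse each JSON file and return the list of documents."""
    documents = []
    for path in json_paths:
        with open(path, encoding="utf-8") as handle:
            documents.append(json.load(handle))
    return documents


def repeated_values(documents, key, threshold=0.5):
    """Return {value: file count} for scalar values of `key` shared by more than `threshold` of the files."""
    documents = list(documents)
    counts = Counter()
    for document in documents:
        # Count each distinct scalar value once per file.
        scalars = {v for v in find_values(document, key) if not isinstance(v, (dict, list))}
        counts.update(scalars)
    total = len(documents)
    return {v: n for v, n in counts.items() if total > 1 and n / total > threshold}
```

A non-empty result for ‘barcode’ or ‘filename’ is worth reporting immediately. For fields where repeated values can be legitimate, treat the result as a prompt to look closer rather than as a failure.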
## Bag Validation
* Use ```validate_ami_bags.py``` in ami-tools to check bag Oxums, bag completeness, bag hashes, directory structure, filenames, and metadata.
* Due to the time required to validate a directory of Vendor bags, it's best to let validate_ami_bags.py run overnight.
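
For a sense of what the Oxum check involves: in the BagIt format, `Payload-Oxum` in `bag-info.txt` records the total byte count and file count of everything under `data/`, written as `octets.streams`. The sketch below recomputes that value for a single bag. It is an illustration of the idea only, with hypothetical function names, and is not how ```validate_ami_bags.py``` is implemented.

```python
from pathlib import Path


def payload_oxum(bag_dir):
    """Compute 'octets.streams' for every file under the bag's data/ directory."""
    files = [p for p in (Path(bag_dir) / "data").rglob("*") if p.is_file()]
    return f"{sum(p.stat().st_size for p in files)}.{len(files)}"


def declared_oxum(bag_dir):
    """Read the Payload-Oxum value from bag-info.txt, or None if it is absent."""
    with open(Path(bag_dir) / "bag-info.txt", encoding="utf-8") as handle:
        for line in handle:
            label, _, value = line.partition(":")
            if label.strip().lower() == "payload-oxum":
                return value.strip()
    return None


def oxum_matches(bag_dir):
    """True when the declared Payload-Oxum equals the recomputed one."""
    return declared_oxum(bag_dir) == payload_oxum(bag_dir)
```

Because it only counts bytes and files, an Oxum comparison is fast but cannot detect altered content of the same size; the hash checks cover that case, which is part of why a full validation run takes so long.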
@@ -97,7 +117,7 @@ python3 /path/to/ami-tools/bin/validate_ami_bags.py -d /Volumes/driveID/ --metad
or...just validate JSON using one of two options:
# JSON Validation
## JSON Validation
* Use ```json_validator.py``` in ami-scripts to confirm JSON files comply with [NYPL metadata specifications](https://nypl.github.io/ami-preservation/pages/ami-metadata.html).
@@ -110,39 +130,32 @@ or
ajv validate --all-errors --multiple-of-precision=2 --verbose -s /path/to/ami-metadata/versions/2.0/schema/digitized.json -r "/path/to/ami-metadata/versions/2.0/schema/*.json" -d "/Volumes/DRIVE-ID/*/*/data/*/*.json"
```
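
Before running schema validation, it can help to confirm that every JSON file on the drive parses at all. The sketch below is a hypothetical pre-check, not a replacement for ```json_validator.py``` or `ajv`. Its default glob pattern mirrors the `-d` path in the command above, relative to the drive root, and the function name is an assumption.

```python
import json
from pathlib import Path


def unparseable_json(root, pattern="*/*/data/*/*.json"):
    """Return (path, error message) pairs for JSON files under root that fail to parse."""
    problems = []
    for path in sorted(Path(root).glob(pattern)):
        try:
            with open(path, encoding="utf-8") as handle:
                json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError) as error:
            problems.append((str(path), str(error)))
    return problems
```

An empty list means every matched file is well-formed JSON; it says nothing about whether the files comply with the NYPL metadata schema.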
# Digital Asset Conformance
## Digital Asset Conformance
* Use ```mediaconch_checker.py``` in ami-scripts to confirm media files comply with [NYPL digital asset specifications](https://nypl.github.io/ami-preservation/pages/ami-specifications.html).
The ami-preservation repo contains a directory, [qc_utilities](https://github.com/NYPL/ami-preservation/tree/master/qc_utilities). Within this are various scripts and tools, including the mediaconch scripts listed below which will generate 'pass/fail' logs in your home directory when run against a directory of media files.
```
python3 /path/to/ami-preservation/ami-scripts/mediaconch_checker.py -p /path/to/ami-preservation/qc_utilities/MediaconchPolicies -d /Volumes/DRIVE-ID
```
# Additional Checks
## Additional Checks
## BEXT Check
### BEXT Check
* **AUDIO ONLY**: Check a selection of FLAC files for embedded metadata.
* Copy 5 of the delivered .flac files to the Desktop and decode the copies back to WAV.
```
flac --decode --keep-foreign-metadata --preserve-modtime --verify input.flac
```
* Check BEXT in newly decoded .wavs using BWF MetaEdit. **Discard .wavs and .flac copies after use.**
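
The copy-and-decode steps above could be scripted along these lines. This is a sketch under stated assumptions: the function names and working directory are hypothetical, the random choice of files is one possible way to pick "a selection", and the `flac` options are exactly those in the command above. Checking BEXT in BWF MetaEdit and discarding the copies remain manual steps.

```python
import random
import shutil
import subprocess
from pathlib import Path


def choose_flacs(drive, count=5, seed=None):
    """Pick up to `count` .flac files found anywhere under `drive`, ignoring hidden files."""
    flacs = sorted(p for p in Path(drive).rglob("*.flac") if not p.name.startswith("."))
    return random.Random(seed).sample(flacs, min(count, len(flacs)))


def copy_and_decode(flacs, workdir):
    """Copy each file into workdir, then decode the copy (never the delivered original)."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    for source in flacs:
        copy = Path(shutil.copy2(source, workdir))
        subprocess.run(
            ["flac", "--decode", "--keep-foreign-metadata",
             "--preserve-modtime", "--verify", str(copy)],
            check=True,  # stop if flac reports a decoding error
        )
```

For example, `copy_and_decode(choose_flacs("/Volumes/DRIVE-ID"), Path.home() / "Desktop" / "bext_check")` would leave the copied .flac files and decoded .wav files in one folder that is easy to delete afterwards.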
## RAWCooked Check
### RAWCooked Check
* **FILM PMs ONLY**:
* Check a selection of PMs for RAWCooked reversibility:
```
/path/to/ami-preservation/ami-scripts/rawcooked_check_mkv.py -d /Volumes/DRIVE-ID -p 20
```
# Perform Manual QC
* Perform manual QC using Google Sheet list of Bags to check (in Trello card) (1min @ beginning, middle, end of each file)
* Note any errors / observations in the Google Sheet log. Use the categories/menus provided as much as possible.
* Use [this](https://github.com/NYPL/ami-preservation/wiki/Resources#logging-qc-failures--flags) list of definitions to review and mark-off the items listed in the QC log.
* Content Inspection of In-House deliverables can be completed either on ICC or on the drive by following the steps outlined [here](https://github.com/NYPL/ami-preservation/wiki/Resources#content-inspection).
* Use a text-editor (Atom / Notepad / Text Edit etc.) to [open and inspect](https://github.com/NYPL/ami-preservation/wiki/Resources#spot-checking-content--json) JSON files.
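
The "1min @ beginning, middle, end" rule above translates into three playback windows per file. The helper below is a small illustration with a hypothetical name. It assumes the middle window is centered on the file's midpoint and that files too short for three separate windows are reviewed in full; neither assumption is stated in the workflow.

```python
def spot_check_windows(duration_s, window_s=60.0):
    """Return (start, end) pairs in seconds for the beginning, middle, and end of a file."""
    if duration_s <= 3 * window_s:
        # Too short for three separate windows: review the whole file.
        return [(0.0, float(duration_s))]
    middle_start = (duration_s - window_s) / 2
    return [
        (0.0, window_s),
        (middle_start, middle_start + window_s),
        (duration_s - window_s, float(duration_s)),
    ]
```

For a one-hour file (3600 seconds) the windows are 0 to 60, 1770 to 1830, and 3540 to 3600 seconds, which can be used as seek points in VLC.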
# Wrap Up...
## Wrap Up...
* **IF APPROVED**:
* Move the Trello Card to the proper list (passed / failed etc.)
@@ -164,27 +177,18 @@ flac --decode --keep-foreign-metadata --preserve-modtime --verify input.flac
## Media Ingest Preparation
**Vendor**
### Vendor
* Once QC is complete and approved, notify Digital Preservation and make arrangements to hand off hard drives so media files can be uploaded to EAVie.
* Once QC is complete and approved, notify Digital Preservation and make arrangements to hand off hard drive(s) for ingest.
o prepare media for ingest.
**In-House**
[complete]
### In-House
[Need to complete]
* Generating a QC list
Use Terminal to generate a QC list for each drive you are QCing by following the steps outlined [here](https://github.com/NYPL/ami-preservation/wiki/Resources#generating-a-qc-list).
* Locate & Open QC log
**Each QC log should be easily found linked in Google Drive as an** _attachment in the Trello Card for the batch you are inspecting._ **If not, check with MPA.** _Tip: you can search for the drive ID / work order ID in the Trello search box._
* Use the list of files that appears in the Google Sheet QC log (in the QClog tab) as your list of files to check.
* Drop down menus are available for noting specific identifiable errors, and there is a free-text field for general notes.
# Tools
## Tools
See our [Command Line Resources](https://nypl.github.io/ami-preservation/pages/resources.html) for descriptions, usage, and installation instructions for the various tools we use in this workflow.
