From 295ed152100c78fd01235a95dc0b1ecc81ff6788 Mon Sep 17 00:00:00 2001
From: Nick Krabbenhoeft
Date: Wed, 5 Oct 2022 12:59:12 -0400
Subject: [PATCH] update on-site-file-transfer for RAID usage

---
 sitevisits/site-visit-file-transfers.md | 196 ++++++++++++++++--------
 1 file changed, 131 insertions(+), 65 deletions(-)

diff --git a/sitevisits/site-visit-file-transfers.md b/sitevisits/site-visit-file-transfers.md
index f171295d..f3d5ac50 100644
--- a/sitevisits/site-visit-file-transfers.md
+++ b/sitevisits/site-visit-file-transfers.md
@@ -10,109 +10,175 @@ parent: Site Visits
 ## Table of contents
 {: .no_toc .text-delta }
+---
+title: On-Site Transfers
+layout: default
+nav_order: 2
+parent: Site Visits
+---

-1. TOC
-{:toc}
+## On-Site File Transfers

-# Introduction
+{: .no_toc }
+
+When the Library acquires a copy of digital files, rather than the digital carriers themselves, Digital Archives prefers to make the copy on-site instead of bringing the carrier to the lab, copying it, and returning it.
+The file transfer workflows are detailed in this document.
+The workflow may need to be adapted based on the media types and file types encountered.
+Transferring material during a site visit will differ from transferring material in the lab.

- Born digital collection material can be acquired through file transfer or forensic imaging. Most material will be transferred using a Bagit script through command line.
+## Preparing the Transfer

- The file transfer workflows are detailed in this document. The workflows may vary based on media types and file types encountered. Transferring material during a site visit will differ from transferring material in the lab. At the time of a site visit no collection number will be assigned to the material and no media log inventory will exist for the material.
+* Use acquisition dossiers to understand the scale of the transfer.
+* Contact the donor or organization for further details about:
+  * the types of carriers (external drives, network storage, etc.)
+  * the available connections to the carriers (USB3, Thunderbolt, Ethernet, etc.)
+  * space available for a 2-3 week period
+  * best times to complete the transfer
+* Coordinate with LSC Collection Management about transport to and from the site.

-**Be sure to pad your estimates of transfer time. Transfer always takes longer than estimated.**
-1. Name the Transfer
+## Equipment

-2. Build SIPs
+* Check that you have the appropriate equipment for the transfer.
+* Test that all equipment works before the site visit.

-3. Transfer files from media object
+### Storage

-## Name the Transfer
-At the time of a site visit no collection number will be assigned to the material.
-* Use a collection name such as the name of the institution you are acquiring from or the personal name of the creator.
-  * The Foundry Theatre ```foundryTheatre```
-  * Lou Reed papers ```louReed```
-* Add ```additions``` to the transfer name when you are acquiring additions.
-  * Bill T. Jones additions ```billTJonesadditions```
+* [ ] 8-16 TB USB External Hard Drive
+Or
+* [ ] 126 TB RAID
+* [ ] RAID power cord
+* [ ] Thunderbolt 3 cord
+* [ ] Thunderbolt 3 to Thunderbolt 2 adapter
+* [ ] RAID Pelican case

-## Build SIPs
-These instructions show you how to create SIPs using a one-line command to create directories.
+### Laptop
+* [ ] Laptop with Thunderbolt 2/3 ports and/or USB3-A/C ports
+* [ ] Laptop power supply

-On Windows:
-* Start Cygwin from the desktop. A terminal like screen should appear.
+### Potential Additional Supplies

-On Mac:
-* Open Terminal.
+* [ ] Pelican rollaway case
+* [ ] USB A 10-receptacle hub
+* [ ] USB hub power supply
+* [ ] USB 3.1 A receptacle to USB-C plug adapter
+* [ ] USB 3.1 B - USB 3.1 A cables
+* [ ] USB 3.1 Micro B - USB 3.1 A cable
+* [ ] Power strips
+* [ ] USB3 write-blocker

-On all operating systems:
+## Transfer

-* ``mkdir`` command can be used to create SIPs. This works when SIPs aren't consecutively numbered. 0001 to 0009 require a different line from 0010 on.
-* Change to fileTransfers directory.
-```$ cd filetransfers```
-* Enter ```mkdir``` command.
-```mkdir -p CollID/Media-000{1..9}/{metadata/submissionDocumentation,objects}```
-```mkdir -p CollID/Media-00{10..99}/{metadata/submissionDocumentation,objects}```
-```mkdir -p CollID/Media-000{1,5,7,9}/{metadata/submissionDocumentation,objects}```
+* When you arrive, discuss the following with the site contact:
+  * contact information and hours of site access
+  * any documentation that might be useful for provenance
+  * expectations for repeated on-site access (e.g. adding sets of hard drives)
+  * whether any files have been updated in the past 30 days and will require additional quarantine
+* Establish an expected completion date with LSC Collection Management and update as needed.
+### Equipment Setup
+Make sure all cables are run safely, such as around the back of a table.

-### SIP structure
+* Find an uninterrupted power source for the extension cord.
+* Connect the laptop, transfer storage, and any additional equipment to the extension cord.
+* Connect the transfer storage to the laptop.
+* Connect any additional equipment to the laptop.
+* Boot the laptop and check that all equipment works.
+* Do not connect to a network unless required for the transfer.

-* /M0021
+### Name the Transfers

-  * /metadata
+At the time of a site visit no collection number will be assigned to the material.

-    * /submissionDocumentation
+* For the collection ID, use the name of the institution you are acquiring from or the personal name of the creator.
+  * The Foundry Theatre `foundryTheatre`
+  * Lou Reed papers `louReed`
+* Add `additions` to the transfer name when you are acquiring additions.
+  * Bill T. Jones additions `billTJonesadditions`
+* Create a text file for the collection in the root directory of the transfer drive named `CollID.csv`.
+* For each piece of source media, assign a sequential ID number, such as `M0001`.
+* In the text file, record the name or label of the drive and the ID number, separated by a comma.

-  * /objects
+### Preparing to Transfer from Hard Drives (if necessary)

-## Transfer files from media object
+Repeat this process until no additional ports are available.

-Files that have been updated by the donor within the past 30
- days should be quarantined for 30 days to ensure that
- all virus definitions are up to date.
+* On the laptop, open the system disk manager or utility to check the status of drives.
+* Connect the source drive to the laptop or hub and turn on the power.
+* Remount the source drive as read-only. Device identifiers (for example, `disk2s1`) are listed in the disk manager.

-* Use a write-blocker to connect the drive to the computer.
-  * [Ultrakit](../tools/ultrakit){:target="_blank"}, [Portable Forensic Bridges](../using/using-lab-equipment#portable-forensic-bridges){:target="_blank"}
+  ```bash
+  diskutil umount /dev/diskXsY
+  diskutil mount readOnly /dev/diskXsY
+  ```

-### Bagit Script
+* Make the transfer directories.

-* Run [ft.sh ](../software#ftsh){:target="_blank"} to create a transfer package.
+  ```bash
+  mkdir -p /path/to/transfer/CollID/MediaID/{metadata,objects}
+  ```

-On Windows:
-* Start Cygwin from the desktop. A terminal like screen should appear.
+* Generate a file name and file size manifest of the source drive and save it to the transfer drive.

-On Mac:
-* Open Terminal.
+  ```bash
+  find /path/to/source -type f -print0 | xargs -0r stat -f '%N, %z' | sort > /path/to/transfer/CollID/MediaID/metadata/sourcedrive.csv
+  ```

-On all operating systems:
-  * Enter the alias ```FT``` and hit return.
+* In a text editor, create the command to transfer the source drive to its own folder on the transfer drive.

-Or
+  ```bash
+  rsync -rtP --exclude-from /path/to/transfer/exclude-list.txt --log-file=/path/to/transfer/CollID/MediaID/metadata/rsync_log.csv --log-file-format=", %f, %l, %C" /path/to/source /path/to/transfer/CollID/MediaID/objects/
+  ```
+
+### Preparing to Transfer from Network Locations (if necessary)
+
+To be written{: .label .label-yellow }
+
+### Starting Transfers
+
+* Chain together the transfer commands with semicolons to run them sequentially.
+It is typically faster to copy from source drives sequentially, instead of simultaneously.
+
+  ```bash
+  rsync -rtP ... ; rsync -rtP ... ; rsync -rtP ... ; ...
+  ```
+
+* Check transfer speed and determine when an additional visit will be necessary to add new drives or to complete the transfer.
+For the number of expected days: `[total amount on drives in MB] / [transfer speed in MB/s] / 3600 / 24`
+
+### Disconnecting Hard Drive Transfers
+
+* Compare the size of the source drive to the folder on the transfer drive.

-* Enter ```/usr/local/bin/ft.sh``` and hit return if the alias is not set.
+  ```bash
+  du -sh /path/to/source/
+  du -sh /path/to/transfer/CollID/MediaID/objects/
+  ```

-* Drag the SIP folder from the media object to the window and hit return as prompted.
+* Generate a file name and file size manifest of the folder on the transfer drive and save it to the transfer drive.

-* Enter the MediaID [EX: ```M0021```] for the file transfer and hit return.
+  ```bash
+  find /path/to/transfer/CollID/MediaID/objects -type f -print0 | xargs -0r stat -f '%N, %z' | sort > /path/to/transfer/CollID/MediaID/metadata/sourcedrive_transferred.csv
+  ```

-* The terminal prompt will display below when the process is complete.
+* Investigate any discrepancies between the manifests and retransfer if necessary.

-### Rsync
-Bagit may fail when attempting to copy hidden or system files. Use rsync when you determine it is the better tool for a transfer. It might be possible to use rsync in the event Bagit fails. Make sure you have enough time to start a new transfer. When you don't have enough time the transfer will need to take place another time.
+  ```bash
+  comm -2 -3 /path/to/transfer/CollID/MediaID/metadata/sourcedrive.csv /path/to/transfer/CollID/MediaID/metadata/sourcedrive_transferred.csv
+  ```

-On Windows:
-* Start Cygwin from the desktop. A terminal like screen should appear.
+* Unmount the source drive.
+* Disconnect the completed drive from the hub.

-On Mac:
-* Open Terminal.
+### Equipment Tear Down

-On all operating systems:
-* Enter ```rsync -arP targetpath destinationpath```
-* A trailing slash on the destination path copies contents of a folder not the folder itself.
-* The selected options represented in the command are ```--archive --recursive --progress --partial```.
-* Exclude files using ```--exclude=.DS_Store``` or ```--exclude-from 'exclude-list.txt'```
+* Unmount the transfer drive and any additional equipment.
+* Power down the transfer drive.
+* Power down the laptop.
+* Disconnect all power.
+* Place equipment back into Pelican cases.
+* Return to LSC with equipment using Library-provided transportation.
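
Reviewer sketch, not part of the patch: the expected-days estimate given under "Starting Transfers" (total size divided by transfer speed, then by 3600 and 24) might be scripted roughly as below. The function name `expected_days` and the example figures are hypothetical, and the speed is assumed to be a measured rate in MB per second.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many days a transfer will take.
#   $1 = total amount on the source drives, in MB
#   $2 = measured transfer speed, in MB per second (assumed unit)
expected_days() {
  total_mb="$1"
  speed_mb_per_s="$2"
  # seconds = MB / (MB/s); days = seconds / 3600 / 24
  awk -v total="$total_mb" -v speed="$speed_mb_per_s" \
    'BEGIN { printf "%.1f\n", total / speed / 3600 / 24 }'
}

# Illustrative figures only: 16 TB (16,000,000 MB) at 100 MB/s
expected_days 16000000 100
```

Measuring the speed from a running `rsync -P` progress line and re-running the estimate as drives are added would keep the completion date shared with LSC Collection Management current.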