update on-site-file-transfer for RAID usage #44

Open
wants to merge 1 commit into
base: gh-pages
196 changes: 131 additions & 65 deletions sitevisits/site-visit-file-transfers.md
@@ -10,109 +10,175 @@ parent: Site Visits

## Table of contents
{: .no_toc .text-delta }
---
title: On-Site Transfers
layout: default
nav_order: 2
parent: Site Visits
---

1. TOC
{:toc}
## On-Site File Transfers

# Introduction
{: .no_toc }

When the Library acquires a copy of digital files, rather than the digital carriers themselves, Digital Archives prefers to make the copy on-site instead of bringing the carrier to the lab, copying, and returning.

The file transfer workflows are detailed in this document.
The workflow may need to be adapted based on the media types and file types encountered.
Transferring material during a site visit will differ from transferring material in the lab.

Born digital collection material can be acquired through file transfer or forensic imaging. Most material will be transferred using a Bagit script through command line.
## Preparing the Transfer

The file transfer workflows are detailed in this document. The workflows may vary based on media types and file types encountered. Transferring material during a site visit will differ from transferring material in the lab. At the time of a site visit no collection number will be assigned to the material and no media log inventory will exist for the material.
* Use acquisition dossiers to understand the scale of the transfer.
* Contact the donor or organization for further details about:
* the types of carriers (external drives, network storage, etc)
* the available connections to the carriers (USB3, Thunderbolt, Ethernet, etc)
* space available for a 2-3 week period
* best times to complete the transfer
* Coordinate with LSC Collection Management about transport to and from the site.

**Be sure to pad your estimates of transfer time. Transfer always takes longer than estimated.**
1. Name the Transfer
## Equipment

2. Build SIPs
* Check that you have the appropriate equipment for the transfer.
* Test that all equipment works before the site visit.

3. Transfer files from media object
### Storage

## Name the Transfer
At the time of a site visit no collection number will be assigned to the material.
* Use a collection name such as the name of the institution you are acquiring from or the personal name of the creator.
* The Foundry Theatre ```foundryTheatre```
* Lou Reed papers ```louReed```
* Add ```additions``` to the transfer name when you are acquiring additions.
* Bill T. Jones additions ```billTJonesadditions```
* [ ] 8-16 TB USB External Hard Drive

Or

* [ ] 126TB RAID
* [ ] RAID power cord
* [ ] Thunderbolt 3 cord
* [ ] Thunderbolt 3 to Thunderbolt 2 adapter
* [ ] RAID Pelican case

## Build SIPs
These instructions show you how to create SIPs using a one-line command to create directories.
### Laptop

* [ ] Laptop with Thunderbolt 2/3 ports and/or USB3-A/C ports
* [ ] Laptop power supply

On Windows:
* Start Cygwin from the desktop. A terminal like screen should appear.
### Potential Additional Supplies

On Mac:
* Open Terminal.
* [ ] Pelican rollaway case
* [ ] USB A 10-receptacle hub
* [ ] USB hub power supply
* [ ] USB 3.1 A receptacle to USB-C plug adapter
* [ ] USB 3.1 B - USB 3.1 A cables
* [ ] USB 3.1 Micro B - USB 3.1 A cable
* [ ] Power strips
* [ ] USB3 write-blocker

On all operating systems:
## Transfer

* ``mkdir`` command can be used to create SIPs. This works when SIPs aren't consecutively numbered. 0001 to 0009 require a different line from 0010 on.
* Change to fileTransfers directory.
```$ cd filetransfers```
* Enter ```mkdir``` command.
```mkdir -p CollID/Media-000{1..9}/{metadata/submissionDocumentation,objects}```
```mkdir -p CollID/Media-00{10..99}/{metadata/submissionDocumentation,objects}```
```mkdir -p CollID/Media-000{1,5,7,9}/{metadata/submissionDocumentation,objects}```
* When you arrive, discuss the following with the site contact:
* contact information and hours of site access
* any documentation that might be useful for provenance
* expectations for repeated on-site access (e.g. adding sets of hard drives)
* whether any files have been updated in the past 30 days and will require additional quarantine
* Establish an expected completion date with LSC Collection Management and update as needed.

### Equipment Setup

Make sure all cables are run safely, such as around the back of a table.

### SIP structure
* Find an uninterrupted power source for the extension cord.
* Connect the laptop, transfer storage, and any additional equipment to the extension cord.
* Connect the transfer storage to the laptop.
* Connect any additional equipment to the laptop.
* Boot the laptop and check that all equipment works.
* Do not connect to network unless required for the transfer.

* /M0021
### Name the Transfers

* /metadata
At the time of a site visit no collection number will be assigned to the material.

* /submissionDocumentation
* For the collection ID, use the name of the institution you are acquiring from or the personal name of the creator.
* The Foundry Theatre `foundryTheatre`
* Lou Reed papers `louReed`
* Add `additions` to the transfer name when you are acquiring additions.
* Bill T. Jones additions `billTJonesadditions`
* Create a text file for the collection in the root directory of the transfer drive named `CollID.csv`.
* For each piece of source media, assign a sequential ID number `M0001`.
* In the text file, record the name or label of the drive and the ID number, separated by a comma.
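The naming steps above could be scripted roughly as follows. This is a minimal sketch, not part of the documented workflow: `log_media` is a hypothetical helper, and it assumes the inventory holds exactly one row per piece of media with no header row and that drive labels contain no commas.

```bash
# Hypothetical helper: append one "label,ID" row to CollID.csv in the
# root of the transfer drive and print the ID that was assigned.
# Usage: log_media /path/to/transfer CollID "drive label"
log_media() {
  local root="$1" coll_id="$2" label="$3"
  local csv="$root/$coll_id.csv"
  local count=0
  # Count existing rows so the next ID continues the sequence.
  if [ -f "$csv" ]; then
    count=$(wc -l < "$csv")
  fi
  local media_id
  media_id=$(printf 'M%04d' $((count + 1)))
  printf '%s,%s\n' "$label" "$media_id" >> "$csv"
  echo "$media_id"
}
```

Calling `log_media /path/to/transfer louReed "Studio backup drive"` on an empty inventory would record the drive as `M0001`; the next call would assign `M0002`.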

* /objects
### Preparing to Transfer from Hard Drives (if necessary)

## Transfer files from media object
Repeat this process until no additional ports are available.

Files that have been updated by the donor within the past 30
days should be quarantined for 30 days to ensure that
all virus definitions are up to date.
* On the laptop, open the system disk manager or utility to check the status of drives.
* Connect the source drive to laptop or hub and turn on the power.
* Remount the source drive as read-only. Device numbers are available in the disk manager.

* Use a write-blocker to connect the drive to the computer.
* [Ultrakit](../tools/ultrakit){:target="_blank"}, [Portable Forensic Bridges](../using/using-lab-equipment#portable-forensic-bridges){:target="_blank"}
``` bash
diskutil umount /dev/sda[]
diskutil mount readOnly /dev/sda[]
```

### Bagit Script
* Make the transfer directories.

* Run [ft.sh ](../software#ftsh){:target="_blank"} to create a transfer package.
```bash
mkdir -p /path/to/transfer/CollID/MediaID/{metadata,objects}
```

On Windows:
* Start Cygwin from the desktop. A terminal like screen should appear.
* Generate a file name and file size manifest of drive and save to the transfer drive

On Mac:
* Open Terminal.
``` bash
find /path/to/drive -type f -print0 | xargs -0r stat -f '%N, %z' | sort > /path/to/transfer/CollID/MediaID/metadata/sourcedrive.csv
```

On all operating systems:
* Enter the alias ```FT``` and hit return.
* In a text editor, create the command to transfer the source drive to its own folder on the transfer drive

Or
``` bash
rsync -rtP --exclude-from /path/to/transfer/exclude-list.txt --log-file=/path/to/transfer/CollID/MediaID/metadata/rsync_log.csv --log-file-format=", %f, %l, %C" /path/to/source /path/to/transfer/CollID/MediaID/objects/
```

### Preparing to Transfer from Network Locations (if necessary)

To be written{: .label .label-yellow }

### Starting Transfers

* Chain together the transfer commands with semi-colons to run them sequentially.
It is typically faster to copy from source drives sequentially, instead of simultaneously.

``` bash
rsync -rtP ... ; rsync -rtP ... ; rsync -rtP ... ; ...
```
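One way to assemble the chained command is to generate it from a list of source paths and media IDs and then paste the result into a text editor for review. The sketch below is illustrative only: `build_chain` is a hypothetical helper, it omits the `--exclude-from` and `--log-file-format` options for brevity, and it assumes paths contain no double quotes or `$` characters.

```bash
# Hypothetical helper: print one rsync command per "source:MediaID" pair,
# joined with " ; " so the transfers run one after another.
# Usage: build_chain /path/to/transfer CollID /path/to/source1:M0001 ...
build_chain() {
  local transfer_root="$1" coll_id="$2"
  shift 2
  local chain="" pair src media_id dest cmd
  for pair in "$@"; do
    src="${pair%:*}"        # everything before the last colon
    media_id="${pair##*:}"  # everything after the last colon
    dest="$transfer_root/$coll_id/$media_id"
    cmd="rsync -rtP --log-file=\"$dest/metadata/rsync_log.csv\" \"$src\" \"$dest/objects/\""
    chain="${chain:+$chain ; }$cmd"
  done
  printf '%s\n' "$chain"
}
```

The function only prints the command; nothing is copied until the printed line is run in the terminal.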

* Check transfer speed and determine when an additional visit will be necessary to add new drives or to complete the transfer.
For the number of expected days: `[total amount on drives in MB] / [transfer speed in MB/s] / 3600 / 24`
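As a worked example of the estimate, the calculation can be run in the terminal. This is a sketch under assumed numbers: `estimate_days` is a hypothetical helper, and the 8 TB total and 100 MB/s sustained speed are illustrative values, not measurements.

```bash
# Hypothetical helper: estimate transfer time in days from the total
# size in MB and the observed transfer speed in MB per second.
estimate_days() {
  local total_mb="$1" speed_mb_per_s="$2"
  awk -v total="$total_mb" -v speed="$speed_mb_per_s" \
    'BEGIN { printf "%.2f\n", total / speed / 3600 / 24 }'
}

# Example: 8 TB (8,000,000 MB) at a sustained 100 MB/s.
estimate_days 8000000 100   # prints 0.93
```

Because transfers tend to run slower than the first speed reading suggests, it may be worth rerunning the estimate with the speed observed after the transfer has been going for a while.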

### Disconnecting Hard Drive Transfers

* Compare the size of the source drive to the folder on the transfer drive.

* Enter ```/usr/local/bin/ft.sh``` and hit return if the alias is not set.
``` bash
du -sh /path/to/source/
du -sh /path/to/transfer/CollID/MediaID/objects/
```

* Drag the SIP folder from the media object to the window and hit return as prompted.
* Generate a file name and file size manifest of folder on the transfer drive and save it to the transfer drive.

* Enter the MediaID [EX: ```M0021```] for the file transfer and hit return.
``` bash
find /path/to/transfer/storage/CollID/MediaID/objects/ -type f -print0 | xargs -0r stat -f '%N, %z' | sort > /path/to/transfer/storage/CollID/MediaID/metadata/sourcedrive_transferred.csv
```

* The terminal prompt will display below when the process is complete.
* Investigate any discrepancies between the manifests and retransfer if necessary.

### Rsync
Bagit may fail when attempting to copy hidden or system files. Use rsync when you determine it is the better tool for a transfer. It might be possible to use rsync in the event Bagit fails. Make sure you have enough time to start a new transfer. When you don't have enough time the transfer will need to take place another time.
``` bash
comm -2 -3 /path/to/transfer/storage/CollID/MediaID/metadata/sourcedrive.csv /path/to/transfer/storage/CollID/MediaID/metadata/sourcedrive_transferred.csv
```

On Windows:
* Start Cygwin from the desktop. A terminal like screen should appear.
* Unmount the source drive.
* Disconnect completed drive from hub

On Mac:
* Open Terminal.
### Equipment Tear Down

On all operating systems:
* Enter ```rsync -arP targetpath destinationpath```
* A trailing slash on the source path copies the contents of a folder, not the folder itself.
* The selected options represented in the command are ```--archive --recursive --progress --partial```.
* Exclude files using ```--exclude=.DS_Store``` or ```--exclude-from 'exclude-list.txt'```
* Unmount the transfer drive and any additional equipment.
* Power down the transfer drive.
* Power down the laptop.
* Disconnect all power.
* Place equipment back into Pelican cases.
* Return to LSC with equipment using Library provided transportation.