Skip to content

Commit

Permalink
Merge pull request #7 from UtrechtUniversity/develop
Browse files Browse the repository at this point in the history
Develop
  • Loading branch information
chStaiger authored Mar 30, 2023
2 parents 8f4f6e6 + df149bc commit dec1f79
Show file tree
Hide file tree
Showing 6 changed files with 744 additions and 1 deletion.
62 changes: 61 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,62 @@
# iBridges-SteppingStone
Transferring data from Yoda to a destination server through a stepping stone server in the middle.
The scripts in this repository are developed for transferring data from Yoda/iRODS to a destination server through a stepping stone server in the middle.

## Use case
This method can be used when an iRODS instance is protected by a firewall (e.g. in the case of sensitive data) and the firewall is configured in such a way that data cannot be transferred directly between the iRODS server and the compute site (destination server).

When a stepping stone server is available, data tranfer can be organized in two steps:
1. `irsync` for data transfer between iRODS and the stepping stone server
2. `rsync` for data transfer between the stepping stone server and the final destination, e.g. the compute server or VM (see below).

![Stepping stone transfer](img/Stepping_stone.png)

## Requirements
1. This method requires a stepping stone server to be setup at the periphery of the network (not covered in this repository). This server should be able to connect to iRODS and also to the compute facility.
2. Both servers, stepping stone and destination server, are *linux* servers.
3. The user that transfers data between the stepping stone server and the destination server authenticates for the destination server through an ssh-key pair.
4. The iRODS `icommands` are installed on the stepping stone server ([instructions](https://www.uu.nl/en/research/yoda/guide-to-yoda/i-am-using-yoda/using-icommands-for-large-datasets)).
5. Python and the python-irodsclient are installed on the stepping stone server:
- Python 3.X
- python-irodsclient version 1.X

**The scripts are tested on Ubuntu with python 3.6.9.**

## Installation & configuration

The tools in this repo are installed and used on the stepping stone server.

1. Clone this GitHub repository

`git clone https://github.com/UtrechtUniversity/iBridges-SteppingStone.git`

2. Create an iRODS configuration file:
`touch ~/.irods/irods_environment.json`

3. Add all relevant info to the configuration file.
`nano ~/.irods/irods_environment.json`
For Yoda users @UtrechtUniversity, find the relevant info [here](https://www.uu.nl/en/research/yoda/guide-to-yoda/i-am-using-yoda/using-icommands-for-large-datasets).

4. Install Python dependencies
`pip3 install python-irodsclient==1.1.6`

5. Create a client configuration file. The client needs to be given the information about the destination server to copy data to and which user to use for the actions.
`touch ~/.irods/transfer.config`

6. Add the relevant info to the client configuration file.
`nano ~/.irods/transfer.config`
```sh
[remote]
serverip: IP address of destination server or FQDN
datauser: user
sudo: False

[local_cache]
limit = number of GB of free space on stepping stone server (e.g. limit = 10)
```

## Usage
```
Usage: python3 transfer_workflow.py -i, --input=csv-file-path
Example: python3 transfer_workflow.py -i /home/user/transfer.csv
```

Binary file added img/Stepping_stone.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
191 changes: 191 additions & 0 deletions src/irods_functions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,191 @@
import subprocess
import json
import os
from datetime import datetime
from typing import Union
import irods.session
from irods.exception import CATALOG_ALREADY_HAS_ITEM_BY_THAT_NAME, CAT_NO_ACCESS_PERMISSION
from src.utils import print_error, print_warning, print_message


def read_irods_env(irods_env_file: str) -> dict:
"""
Expects a json file in ~/.irods/irods_environment.json
Returns the whole file as dictionary
"""
with open(irods_env_file) as file:
ienv = json.load(file)
return ienv


def init_irods_connection(irods_env_file: str) -> Union[tuple, bool]:
"""
Tests whether a connection to an irods server can be established.
Expects an irods_environment.json file and a valid scrambled password .iRODS in ~/.irods.
Creates a dictionary from the environment file.
Returns: irods.session, dictionary
"""
ienv = read_irods_env(irods_env_file=irods_env_file)
res = subprocess.run(["ils"], input="bogus".encode(),
stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=False)
if res.returncode == 0:
print_message(f"Connected to: {ienv.get('irods_host')}")
print_message(res.stdout.decode())
session = irods.session.iRODSSession(irods_env_file=irods_env_file)
return (session, ienv)

print_error("ERROR: Cannot connect to iRODS server")
print_message("Please do an iinit")
return False


def irsync_local_to_irods(session: irods.session.iRODSSession, localpath: str,
irodspath: str):
"""
Transfers data from a local filesystem to iRODS. Checks checksums and registers them in iRODS.
Returns: True upon success; False otherwise.
"""
print_message(f"iRODS irsync: {localpath} --> {irodspath}")
if not session.collections.exists(irodspath):
print_error(f"ERROR: Destination {irodspath} does not exist")
return False

localname = os.path.basename(localpath)
if os.path.isdir(localpath):
res = subprocess.run(["irsync", "-Kr", f"{localpath}", f"i:{irodspath}/{localname}"],
stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=False)
elif os.path.isfile(localpath):
res = subprocess.run(["irsync", "-K", f"{localpath}", f"i:{irodspath}/{localname}"],
stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=False)
else:
print_error(f"ERROR: Transferring {localpath} --> {irodspath} failed")
print_message("Local path not known.")
return False

if res.returncode == 0:
return True
else:
print_error(f"ERROR: Transferring {localpath} --> {irodspath} failed")
print_message(res)
return False


def irsync_irods_to_local(session: irods.session.iRODSSession, irodspath: str,
localpath: str) -> bool:
"""
Given an iRODS path and a localpath, transfers data from iRODS to a local filesystem.
During the transport checksums are checked on the fly and, if not present, registered in iRODS.
Running time can be reduced by firsuring that checksums are already registered in iRODS
(running "ichksum irodspath" on commandline).
Returns: True upon success; False otherwise
"""
print_message(f"iRODS irsync: {irodspath} --> {localpath}")
if not os.path.isdir(localpath):
print_error(f"ERROR: Destination {localpath} does not exist")
return False

itemname = os.path.basename(irodspath)
if session.collections.exists(irodspath) or session.data_objects.exists(irodspath):
res = subprocess.run(["irsync", "-Kr", f"i:{irodspath}", f"{localpath}/{itemname}"],
stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=False)
if res.returncode == 0:
return True

print_error(f"ERROR: Transferring {irodspath} --> {localpath} failed")
print_message(res)
return False

print_error(f"ERROR: Transferring {irodspath} --> {localpath} failed")
print_message("iRODS path not known.")
return False


def get_irods_size(session: irods.session, path_names: list) -> int:
"""
Calculates the cumulative file size of a list of iRODS paths. Paths can
point toiRODS data objects or iRODS collections.
Input: irods.session object, list of iRODS path names
Output: cumulative sum of all file sizes
"""
irods_sizes = []
for path_name in path_names:
if session.data_objects.exists(path_name):
obj = session.data_objects.get(path_name)
irods_sizes.append(obj.size)
elif session.collections.exists(path_name):
coll = session.collections.get(path_name)
irods_sizes.append(sum((sum((obj.size for obj in objs)) for _, _, objs in coll.walk())))
return sum(irods_sizes)


def map_collitems_to_folder(session: irods.session, collpath: str, folder: str,
localpath_to_irods=False) -> list:
"""
Mapping all members of a collection to their absolute path in a folder on a linux filesystem.
Params:
session: iRODS session
collpath: iRODS collection path
folder: linux or windows path
localpath_to_irods: direction of output
"""
coll = session.collections.get(collpath)
destination = f"{folder}/{os.path.basename(coll.path)}"
objs = [obj for _, _, objs in coll.walk() for obj in objs]

obj_to_file = []

for obj in objs:
if localpath_to_irods:
obj_to_file.append((destination+obj.path.split(coll.path)[1], obj.path))
else:
obj_to_file.append((obj.path, destination+obj.path.split(coll.path)[1]))
return obj_to_file


def annotate_data(session: irods.session, irodspath: str,
localpath: str, serverip: str):
"""
Annnotates all data objects on the irodspath with metadata triple:
"data_copy_on_server", serverip:localpath, timestamp
Input: irods.session object, full irods path (coll or obj),
full local path, server ip or fully qualified domain name
Output: True when metadata is added or already present, False otherwise
"""
if session.collections.exists(irodspath):
coll = session.collections.get(irodspath)
annotate_objs = [obj for _, _, objs in coll.walk() for obj in objs]
elif session.data_objects.exists(irodspath):
annotate_objs = [session.data_objects.get(irodspath)]
else:
print_error(f"ERROR: Annotating {irodspath} failed")
print_message("Path does not exist.")
return False

timestamp = datetime.now()
print_message(annotate_objs)
for obj in annotate_objs:
try:
obj.metadata.add("data_copy_on_server", serverip+":"+localpath,
timestamp.strftime("%Y-%m-%d"))
except CATALOG_ALREADY_HAS_ITEM_BY_THAT_NAME:
print_warning("INFO: Metadata already exists {irodspath}")
except CAT_NO_ACCESS_PERMISSION:
print_error(f"ERROR: No permission to add metadata {irodspath}")
except Exception:
print_error(f"ERROR: Metadata could not be added {irodspath}")


def ensure_coll(session: irods.session, irodspath: str):
try:
if session.collections.exists(irodspath):
return True
else:
session.collections.create(irodspath)
return True
except irods.exception.CAT_NO_ACCESS_PERMISSION:
print_error(f'ERROR: Could not create {irodspath}')
return False
Loading

0 comments on commit dec1f79

Please sign in to comment.