
Organize the storage on the cluster #77

Open
gkaf89 opened this issue Jul 25, 2024 · 1 comment
Assignees: gkaf89
Labels: enhancement (New feature or request)


gkaf89 commented Jul 25, 2024

Storage tiers

The storage is organized across multiple tiers. The distinguishing characteristics for the tiers are:

  • speed (throughput and latency),
  • size,
  • accessibility (temporal and locational persistency), and
  • robustness (redundancy and back-ups).

Usually

  • speed is inversely proportional to size, robustness, and accessibility, and
  • size, robustness, and accessibility are proportional to each other.

Only low-speed storage (i.e. the Isilon NFS mount) will be accessible to all clusters in the future. Isilon will thus become crucial for maintaining uniform data access across all clusters.

File systems accessible through the HPC Infiniband network

The HPC file systems are meant to store working data, not long-term storage:

  • the scratch file system stores large temporary input/output files,
  • the home directory is meant for working storage,
  • the local file systems accessible through /tmp (local persistent memory) and /dev/shm (virtual memory) are fast, available in jobs, and wiped when the job finishes (see the sketch after this list), and
  • the project directories are meant to store finalized input and output files.
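For illustration, a minimal job sketch of this usage pattern; the project name, file names, and program are placeholders, not actual cluster paths:

# stage input into fast node-local storage; <project name>, input.dat, and my_program are placeholders
cp /work/projects/<project name>/input.dat /dev/shm/
# compute against the fast local copy
./my_program /dev/shm/input.dat /dev/shm/output.dat
# copy results back to persistent storage before the job ends, since /dev/shm is wiped afterwards
cp /dev/shm/output.dat /work/projects/<project name>/results/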

However, there are file systems that are accessible through slower network connections and offer different kinds of features.

File systems not accessible through Infiniband

The central university storage is slower, but it is snapshotted and backed up much more regularly. Therefore, users should transfer their data to the central systems for long term storage.
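For instance, a minimal sketch of an occasional transfer to the NFS-mounted Isilon share; the source path and the directory layout under /mnt/isilon are placeholders:

# copy finalized results from the cluster to the central storage mount; paths are placeholders
rsync -avh --progress /scratch/users/<username>/results/ /mnt/isilon/<project directory>/archive/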

However, there are multiple options for accessing the central university storage: the Atlas, Ebenezer, Isilon-DMZi, and Isilon-DMZe systems.

  • What is the difference between Atlas, Ebenezer, and Isilon?
  • What is the difference between Isilon-DMZi and Isilon-DMZe?
  • How are user quotas managed in the central storage systems, and how can users see their usage limits?

The Isilon file system

Isilon is actually the name of the technical solution: https://www.dell.com/fr-fr/dt/storage/isilon/isilon-h5600-hybrid-nas-storage.htm#scroll=off

To Hyacithe's knowledge, there are two central storage servers operated by the SIU: "isilon-prod" and "isilon-drs" (an off-site replica of "isilon-prod", for use in case of a disaster on "isilon-prod").

The isilon-prod server is split into (at least) two zones:

  • the SIU zone, which is accessed using SMB via atlas.uni.lux, and
  • the HPC zone, which is mounted on the clusters with NFS and can be accessed at /mnt/isilon (see the check below).
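A quick way to confirm the mount on a login node (a sketch; the exact output depends on the system):

# show how /mnt/isilon is mounted (source, file system type, options)
findmnt /mnt/isilon
# show the size and current usage of the mount
df -h /mnt/isilon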

On the HPC side, we are only interested in the NFS-mounted file system. Documentation about Isilon: https://hpc-git.uni.lu/ulhpc/sysadmins/-/wikis/storage/isilon

  • The processes for the HPC zone are not well defined or documented. We can set up quotas per project directory, but there is currently no way to show this information to the users. We are working on providing users with access to this information and on setting up a policy for assigning quotas (an interim usage check is sketched after this list).

  • We share the Isilon system with the SIU. There is a "fair use agreement" in place which allocates 2 PB to the HPC zone, currently at 88% of full capacity. Maintaining access to the Isilon system is important moving forward, as the Isilon file system will be the only system unifying data access across our future clusters. We should participate in any future calls and coordinate with the SIU.

  • Performance is abysmal with small random I/O, for instance small files, metadata operations, etc. The Isilon NFS mount works well for administrative needs, like archiving and occasional data transfers, and even for big-file I/O. But do not try to perform any compute-driven operation on the NFS-mounted Isilon, such as compiling software on it, or anything similar.
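Until quota reporting is exposed to users, usage on the mount can be approximated with standard tools; this is only a sketch, the directory layout under /mnt/isilon is a placeholder, and it can be slow on large trees:

# summarize the disk usage of a project directory on the Isilon mount; the path is a placeholder
du -sh /mnt/isilon/<project directory>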

The Atlas file system

The SMB protocol allows for easy mounting of file systems on personal computers, including Windows machines.
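For illustration only, mounting an SMB share on a personal Linux machine might look like the following; the share name, mount point, and domain are assumptions, not the actual Atlas layout:

# create a local mount point and mount an SMB share from atlas.uni.lux; share name and domain are hypothetical
sudo mkdir -p /mnt/atlas
sudo mount -t cifs //atlas.uni.lux/<share name> /mnt/atlas -o username=<AD username>,domain=<AD domain>,vers=3.0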

The HPC team does not manage the file system exported through SMB from Atlas (atlas.uni.lux). However, the HPC team maintains the smb-storage script (under active development), which allows mounting SMB shares on the login nodes of our clusters.

Fun fact: you can access the HPC zone via Samba on your workstation using your Active Directory credentials. This works via a fragile script that maps Windows/POSIX permissions and user accounts from the HPC-IPA to the SIU Active Directory. It was requested by the LCSB Bio-core in 2014. The system still works, but it is no longer supported. Honestly, if you are using Linux you can get the performance of SMB with SSHFS: https://blog.ja-ke.tech/2019/08/27/nas-performance-sshfs-nfs-smb.html
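A minimal SSHFS sketch, assuming a hypothetical login host name and local mount point:

# mount the Isilon path exposed on a login node onto a local directory; the host name is a placeholder
mkdir -p ~/isilon
sshfs <username>@<cluster login host>:/mnt/isilon ~/isilon -o reconnect,ServerAliveInterval=15
# unmount when done
fusermount -u ~/isilon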

Add some instruction on how to fix errors in access permissions

The discussion of data management is a bit disorganized. We should probably reorganize the sections and add some information on how users can fix their projects when errors occur.

To fix access permissions in a project directory,

  1. change the group ownership,
chown -R :<project name> /work/projects/<project name>
  2. and then change the access rights on the directories (a more defensive variant is sketched below):
find /work/projects/<project name> -type d | xargs chmod g=rxs
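A minimal, more defensive sketch of the same fix, with the same placeholder project name; -print0/-0 handles file names containing spaces or newlines:

# change the group of everything in the project tree; <project name> is a placeholder
chown -R :<project name> /work/projects/<project name>
# set group read/execute and the setgid bit on directories only, safely handling unusual file names
find /work/projects/<project name> -type d -print0 | xargs -0 chmod g=rxs
# optionally, make regular files readable by the project group as well
find /work/projects/<project name> -type f -print0 | xargs -0 chmod g+r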

Also, add a link with more resources: https://www.redhat.com/sysadmin/suid-sgid-sticky-bit

@gkaf89 gkaf89 added the enhancement New feature or request label Jul 25, 2024
@gkaf89 gkaf89 self-assigned this Jul 25, 2024
@gkaf89 gkaf89 changed the title from "Add some instruction on how to fix errors in access permissions" to "organize the storage on the cluster" on Aug 11, 2024

gkaf89 commented Aug 11, 2024

Excerpt from ticket:

You also need to make sure that all files created in your directories will have the correct permissions. Try this command:

find /work/projects/covalux/scratch_lschramm -type d | xargs chmod g=rxs

This command sets the setgid bit (https://www.redhat.com/sysadmin/suid-sgid-sticky-bit) on your directories. You will see in the ls -la output that directories change from drwxr-xr-x to drwxr-sr-x.

Setting the setgid bit on a directory ensures that all files created in the directory will inherit the group of the directory. See: https://www.redhat.com/sysadmin/suid-sgid-sticky-bit
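A short check, using the directory from the command above, that the setgid bit is in place and that new files inherit the project group:

# the 's' in the group permission triple (drwxr-sr-x) indicates the setgid bit
ls -ld /work/projects/covalux/scratch_lschramm
# create a test file and confirm that it inherits the covalux group
touch /work/projects/covalux/scratch_lschramm/test_file
ls -l /work/projects/covalux/scratch_lschramm/test_file
rm /work/projects/covalux/scratch_lschramm/test_file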

In the future, make sure that files and directories you create in projects have the correct permissions. Remember, in project directories the quotas are computed per project group (covalux in your case). Cluster users (clusterusers) have 0 quota in the project directory, so any complaints about insufficient storage may also be caused by incorrect file groups.
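A sketch for spotting files that count against the wrong group, and for fixing them; <project name> is a placeholder:

# list files in the project tree that do not belong to the project group
find /work/projects/<project name> ! -group <project name>
# fix the group of any offending files
find /work/projects/<project name> ! -group <project name> -print0 | xargs -0 chgrp <project name>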

@gkaf89 gkaf89 changed the title from "organize the storage on the cluster" to "Organize the storage on the cluster" on Aug 11, 2024