Simplify the parallelization and execution of R code on the Saint Louis University High Performance Cluster

sluhpc

Status: under development.

Overview

The goal of sluhpc is to simplify the parallelization and execution of R code on the Saint Louis University (SLU) High Performance Cluster (HPC).


Installation

You can install sluhpc from GitHub with:

# install.packages("remotes")  # run first if remotes is not already installed
remotes::install_github("Saint-Louis-University/sluhpc")

Notes

R Version

At the time of writing, the cluster runs Microsoft R Open 3.3.2. To avoid many version-mismatch errors, develop and test your code locally with R 3.3.2 or a close 3.3.x release.
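
A quick way to check whether your local R is close to the cluster's is to compare the major and minor version numbers. The helper below, `matches_cluster_r()`, is hypothetical (it is not part of sluhpc), and its default target of "3.3.2" is taken from the note above, which may change as the cluster is updated.

```r
# Hypothetical helper (not part of sluhpc): does a local R version share the
# major.minor release of the cluster's R (3.3.2 according to the note above)?
version_parts <- function(v) {
  # "4.3.1" -> c(4L, 3L, 1L); works for strings and for getRversion()
  as.integer(strsplit(as.character(v), ".", fixed = TRUE)[[1]])
}

matches_cluster_r <- function(local = getRversion(), cluster = "3.3.2") {
  identical(version_parts(local)[1:2], version_parts(cluster)[1:2])
}

if (!matches_cluster_r()) {
  message("Local R ", getRversion(), " differs from the cluster's R 3.3.2; ",
          "code that works locally may fail on the cluster.")
}
```

Comparing only major and minor versions is a deliberate simplification: patch releases within 3.3.x rarely break code, while jumps between minor or major releases are where most incompatibilities appear.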

IP Restriction

The cluster only accepts connections from IP addresses registered to SLU. If you are working off campus, first connect to the SLU VPN using your SLU Net ID and password.

Credentials

By default, credentials to connect to the HPC are read from the environment variables APEX.SLU.EDU_USER and APEX.SLU.EDU_PASS using base::Sys.getenv(). These variables are commonly set via a .Renviron file.
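
As a sketch of what reading credentials from environment variables looks like in practice, the hypothetical helper below (not part of sluhpc) fetches both variables with `Sys.getenv()` and stops early with a readable message if either one is unset. The variable names are the defaults given above; everything else is illustrative.

```r
# Hypothetical helper (not part of sluhpc): read the APEX credentials from the
# environment variables named in the README and fail early if either is unset.
get_apex_credentials <- function(user_var = "APEX.SLU.EDU_USER",
                                 pass_var = "APEX.SLU.EDU_PASS") {
  # With more than one name, Sys.getenv() returns a named vector; unset -> ""
  values <- Sys.getenv(c(user_var, pass_var))
  missing_vars <- names(values)[values == ""]
  if (length(missing_vars) > 0) {
    stop("Set ", paste(missing_vars, collapse = " and "),
         " (for example in your .Renviron file) and restart R.", call. = FALSE)
  }
  list(user = unname(values[1]), pass = unname(values[2]))
}
```

Because `.Renviron` is only read at startup, edits to that file take effect after restarting R; `Sys.setenv()` can set the same variables for the current session only.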


Example

The following three-step example shows how to execute R code in parallel on the cluster.

Step 1

We define a function and construct a corresponding parameter set. Then slurm_apply() creates a local folder containing all the files necessary to run the code in parallel on the cluster.

library(sluhpc)

my_function <- function(parameter_mu, parameter_sd) {
  sample <- rnorm(10^6, parameter_mu, parameter_sd)
  c(sample_mu = mean(sample), sample_sd = sd(sample))
}

my_parameters <- data.frame(parameter_mu = 1:10,
                            parameter_sd = seq(0.1, 1, length.out = 10))

slurm_job <- slurm_apply(my_function, 
                         my_parameters, 
                         "my_apply")
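
Before uploading anything, it can pay to run the function on a few parameter rows locally. The sketch below assumes that slurm_apply() calls the function once per row of the parameter data frame, matching column names to argument names (a common convention for this kind of interface; check the package documentation to confirm). `my_function` and `my_parameters` are repeated from above so the block runs on its own; `test_rows` and `local_check` are hypothetical names.

```r
# Repeated from Step 1 so this block is self-contained.
my_function <- function(parameter_mu, parameter_sd) {
  sample <- rnorm(10^6, parameter_mu, parameter_sd)
  c(sample_mu = mean(sample), sample_sd = sd(sample))
}
my_parameters <- data.frame(parameter_mu = 1:10,
                            parameter_sd = seq(0.1, 1, length.out = 10))

# Local dry run (an assumption about how rows map to calls, not sluhpc code):
# call my_function once per row, passing each column as a named argument.
set.seed(1)
test_rows <- my_parameters[1:3, ]
local_check <- do.call(rbind, lapply(seq_len(nrow(test_rows)), function(i) {
  do.call(my_function, as.list(test_rows[i, , drop = FALSE]))
}))
local_check
```

If this fails locally, it will almost certainly fail on the cluster too, and the error is far quicker to diagnose here than in a Slurm log.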

Step 2

We open a secure shell (SSH) connection to the cluster using credentials stored in your .Renviron file, upload the previously created local folder, and submit the job to the Slurm Workload Manager.

session <- apex_connect()
slurm_upload(session, slurm_job)
slurm_submit(session, slurm_job)

Step 3

The slurm_download() function blocks until the job has finished running on the cluster and then downloads the results via SCP. We then bind the results from each node into a single data frame and close the SSH session.

slurm_download(session, slurm_job)
results <- slurm_output_dfr(slurm_job)
apex_disconnect(session)
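
To illustrate what binding the per-node results into one data frame involves, the base-R sketch below stacks a list of named vectors (the shape my_function returns) row-wise. It is not the slurm_output_dfr() implementation: `node_results` holds toy values standing in for downloaded output, and `bind_results()` is a hypothetical helper.

```r
# Toy stand-in for downloaded results: one named vector per task, shaped like
# the value my_function returns (values are illustrative, not real output).
node_results <- list(
  c(sample_mu = 1.0, sample_sd = 0.1),
  c(sample_mu = 2.0, sample_sd = 0.2),
  c(sample_mu = 3.0, sample_sd = 0.3)
)

# Hypothetical helper (not the sluhpc implementation): rbind the vectors into a
# matrix whose columns come from the vector names, then convert to a data frame.
bind_results <- function(results) {
  as.data.frame(do.call(rbind, results))
}

results_df <- bind_results(node_results)
results_df
```

Since each row corresponds to one parameter set, the result can be joined back to the inputs with `cbind(my_parameters, results)` when the row order is preserved.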

About

Saint Louis University

Founded in 1818, Saint Louis University is one of the nation’s oldest and most prestigious Catholic institutions. Rooted in Jesuit values and its pioneering history as the first university west of the Mississippi River, SLU offers nearly 13,000 students a rigorous, transformative education of the whole person. At the core of the University’s diverse community of scholars is SLU’s service-focused mission, which challenges and prepares students to make the world a better, more just place.
