Added degree day calculation and SLURM job handling #4
base: main
Conversation
```r
ftas <- fileMapping[["tas"]]
frsds <- if (bait) fileMapping[["rsds"]] else NULL
fsfc <- if (bait) fileMapping[["sfcwind"]] else NULL
fhuss <- if (bait) fileMapping[["huss"]] else NULL
```
This looks as if it should be one object
Suggested change:
```r
fileNames <- c(ftas = fileMapping[["tas"]])
if (bait) {
  fileNames <- c(
    fileNames,
    frsds = fileMapping[["rsds"]],
    fsfc = fileMapping[["sfcwind"]],
    fhuss = fileMapping[["huss"]]
  )
}
```
```r
compStackHDDCDD(ftas = gsub(".nc", paste0("_", i, ".nc"), ftas),
                frsds = if (bait) gsub(".nc", paste0("_", i, ".nc"), frsds) else NULL,
                fsfc = if (bait) gsub(".nc", paste0("_", i, ".nc"), fsfc) else NULL,
                fhuss = if (bait) gsub(".nc", paste0("_", i, ".nc"), fhuss) else NULL,
```
Of course, you would have to adapt the function to take one argument and recognise missing elements if bait = FALSE.
...
Suggested change:
```r
compStackHDDCDD(fileNames = sub(".nc", paste0("_", i, ".nc"), fileNames),
```
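A minimal sketch of what that adaptation could look like inside compStackHDDCDD (signature and body are assumptions; only the argument handling is shown):

```r
# Sketch only: compStackHDDCDD taking one named vector and tolerating missing elements.
compStackHDDCDD <- function(fileNames, ...) {
  ftas <- fileNames[["ftas"]]

  # optional BAIT inputs: NULL when not present (bait = FALSE case)
  frsds <- if ("frsds" %in% names(fileNames)) fileNames[["frsds"]] else NULL
  fsfc  <- if ("fsfc"  %in% names(fileNames)) fileNames[["fsfc"]]  else NULL
  fhuss <- if ("fhuss" %in% names(fileNames)) fileNames[["fhuss"]] else NULL

  # ... existing processing of ftas / frsds / fsfc / fhuss ...
}
```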
Thanks @hagento! Have you checked whether there are already SLURM or R parallelisation tools that do the work of starting and collecting calculations for you? This seems like a very common task in parallel computing. Sorry if I have asked this multiple times before.
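For reference, one existing option would be the clustermq package, which submits each function call as a SLURM job and collects the results; the scheduler options, template file, and the use of initCalculation as the per-row worker are assumptions, not part of this PR:

```r
# Sketch only: dispatch one calculation per fileMapping row via clustermq.
library(clustermq)
options(clustermq.scheduler = "slurm",
        clustermq.template  = "slurm.tmpl")  # assumed template file

results <- Q(
  fun    = function(row) initCalculation(row),              # assumed per-row worker
  row    = split(fileMapping, seq_len(nrow(fileMapping))),  # one list element per row
  n_jobs = nrow(fileMapping)
)
```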
```r
yStart <- fileMapping[["start"]] %>% as.numeric()
yEnd <- fileMapping[["end"]] %>% as.numeric()

# extract RCP scenario + model
rcp <- fileMapping[["rcp"]] %>% unique()
model <- fileMapping[["gcm"]] %>% unique()
```
I'm wondering if csv is the right format for the mapping if you need so many unique() statements here. What if someone puts in different values and you end up with multiple unique values? Maybe use yaml format?
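For illustration, a yaml mapping would hold each field exactly once, so no unique() would be needed; the file name and field values below are made up:

```r
# Sketch only: read a single-record yaml mapping instead of de-duplicating csv columns.
# Hypothetical contents of mapping.yml:
#   gcm: GFDL-ESM4
#   rcp: ssp126
#   start: 2015
#   end: 2100
library(yaml)
fileMapping <- read_yaml("mapping.yml")  # assumed file name
rcp   <- fileMapping$rcp
model <- fileMapping$gcm
```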
This should not happen since initCalculation always only receives one row of the initial file mapping from getDegreeDays.
```r
h <- hum - cfac(tasData, type = "h", params = c(params[["aHUSS"]], params[["bHUSS"]]))
s <- solar - cfac(tasData, type = "s", params = params)
w <- wind - cfac(tasData, type = "w", params = params)
h <- hum - cfac(tasData, type = "h", params = params)
t <- tasData - cfac(tasData, type = "t", params = NULL)
```
Doesn't seem efficient to read the same file for each call of cfac. Might make sense to read the file outside of the function and pass it as an argument. But I guess this is really minor compared to other impacts on the runtime...
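A minimal sketch of that idea, assuming cfac currently reads the parameter mapping internally and would instead accept it as a new paramsMap argument (file location and argument name are assumptions):

```r
# Sketch only: read the mapping once, then reuse it for every cfac() call.
paramsMap <- read.csv("extdata/mappings/paramsMap.csv")  # assumed file name and location

s <- solar   - cfac(tasData, type = "s", params = params, paramsMap = paramsMap)
w <- wind    - cfac(tasData, type = "w", params = params, paramsMap = paramsMap)
h <- hum     - cfac(tasData, type = "h", params = params, paramsMap = paramsMap)
t <- tasData - cfac(tasData, type = "t", params = NULL,   paramsMap = paramsMap)
```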
Are you talking about paramsMap? You are right, this can be checked in advance.
```r
# check if mapping file contains correct columns
mappingCols <- c("gcm", "rcp", "start", "end", "tas", "rsds", "sfc", "huss")
if (!any(mappingCols %in% colnames(fileMapping))) {
  stop("Please provide file mapping with correct columns.\n Missing columns: ")
```
You don't give the missing cols.
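A minimal sketch of how the check could name them (using all coverage via setdiff, so a single matching column is not enough to pass); variable names follow the snippet above:

```r
# Sketch only: report exactly which required columns are absent.
mappingCols <- c("gcm", "rcp", "start", "end", "tas", "rsds", "sfc", "huss")
missingCols <- setdiff(mappingCols, colnames(fileMapping))
if (length(missingCols) > 0) {
  stop("Please provide a file mapping with the correct columns.\n Missing columns: ",
       paste(missingCols, collapse = ", "))
}
```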
```r
wBAIT <- setNames(as.list(wBAIT$value), wBAIT$variable)

# population data
popMapping <- setNames(as.list(popMapping$file), popMapping$scenario)

# scenario matrix
scenMatrix <- setNames(lapply(strsplit(scenMatrix$rcp, ","), trimws), scenMatrix$ssp)
```
Can't you use dplyr::pull here?
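For illustration, dplyr::pull extracts a column as a plain vector, so the lists above could be built like this (a sketch, not part of the PR):

```r
# Sketch only: dplyr::pull() instead of $ extraction.
wBAIT      <- setNames(as.list(dplyr::pull(wBAIT, value)), dplyr::pull(wBAIT, variable))
popMapping <- setNames(as.list(dplyr::pull(popMapping, file)), dplyr::pull(popMapping, scenario))
```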
```r
return(list(jobName = jobName,
            jobScript = jobScript,
            outputFile = outputFile,
            slurmCommand = slurmCommand,
            jobId = jobId,
            batch_tag = batch_tag))
```
Make the return value invisible so that it doesn't flood the console if the user does not assign it.
Suggested change:
```r
return(invisible(list(jobName = jobName,
                      jobScript = jobScript,
                      outputFile = outputFile,
                      slurmCommand = slurmCommand,
                      jobId = jobId,
                      batch_tag = batch_tag)))
```
```r
wBAIT = wBAIT,
outDir = outDir)

allJobs[[length(allJobs) + 1]] <- job
```
Doesn't that work?
Suggested change:
```r
allJobs <- c(allJobs, job)
```
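One side note on that suggestion: since job is itself a list (see the return value of createSlurm above), c() would splice its elements into allJobs, so wrapping it keeps one entry per job:

```r
# Sketch only: append the whole job object as a single element.
allJobs <- c(allJobs, list(job))
```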
```r
# Check timeout
if (difftime(Sys.time(), startTime, units = "secs") > maxWaitTime) {
  stop("Maximum wait time exceeded")
}
```
Suggested change: remove this timeout check (see the loop-condition suggestion below).
```r
jobIds <- as.character(jobIds)
jobSet <- unique(jobIds) # Remove duplicates

while (TRUE) {
```
Suggested change:
```r
while (difftime(Sys.time(), startTime, units = "secs") < maxWaitTime) {
```
```r
  }

  Sys.sleep(checkInterval)
}
```
Suggested change:
```r
}
}
if (difftime(Sys.time(), startTime, units = "secs") > maxWaitTime) {
  stop("Maximum wait time exceeded")
}
```
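Taken together with the loop-condition suggestion above, the wait logic could look roughly like this (a sketch; checkJobStatus is a hypothetical stand-in for however waitForSlurm queries SLURM):

```r
# Sketch only: bounded polling loop with the timeout reported after the loop.
startTime <- Sys.time()
while (difftime(Sys.time(), startTime, units = "secs") < maxWaitTime) {
  if (checkJobStatus(jobSet)) {  # hypothetical helper: TRUE once all jobs have finished
    break
  }
  Sys.sleep(checkInterval)
}
if (difftime(Sys.time(), startTime, units = "secs") > maxWaitTime) {
  stop("Maximum wait time exceeded")
}
```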
```r
#' @export

processArgs <- function(tLim, std, ssp) {
  #### Process tLim ####
```
Please try to stick to standard code section syntax to allow proper code folding and outline generation in RStudio.
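For reference, RStudio treats any comment ending in at least four trailing dashes, equal signs, or hashes as a foldable code section; a minimal sketch of that convention applied to the snippet above (section labels are illustrative):

```r
# Sketch only: standard RStudio section comments for code folding and outline generation.
processArgs <- function(tLim, std, ssp) {

  # Process tLim ----

  # Process std ----

}
```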
This PR introduces explicit degree day calculation together with SLURM job management to facilitate the process.

Key Changes

- getDegreeDays: Orchestrates the degree day calculation. It accepts a fileMapping input that defines the datasets to be processed and submits one SLURM job per entry of fileMapping. In the next PR, this function will be extended to gather data from completed jobs and handle post-processing.
- processArgs: Ensures the input arguments for getDegreeDays are properly formatted.
- initCalculation: Reads a specific row from fileMapping and splits the data year-wise to enhance processing stability.
- SLURM utilities (slurmUtils.R):
  - createSlurm: Generates SLURM scripts for individual jobs, temporarily saves (and later removes) the necessary data, and submits the jobs. It also creates the required directory structure within an output folder.
  - waitForSlurm: Monitors a list of jobs defined by jobIds, ensuring they complete successfully. If a job fails, it returns the corresponding job ID(s). Logs are written to output/logs, while job outputs are saved in output/hddcdd.
- Generalized directory path handling: All functions now support flexible path handling.
- Parameter and mapping management: External parameters and most mappings have been moved to individual .csv files located in /extdata/mappings.

Current Limitations

Currently, the SLURM script created by createSlurm is hardcoded. Future iterations will make this function more flexible, allowing users to define and pass custom scripts. For now, the existing implementation serves its purpose effectively.
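For orientation, the columns such a fileMapping needs (per the column check reviewed above) could be illustrated with a hypothetical one-row example; the GCM, scenario, and file names below are made up:

```r
# Sketch only: a minimal one-row fileMapping with the columns checked by getDegreeDays.
fileMapping <- data.frame(
  gcm   = "GFDL-ESM4",                        # example GCM name
  rcp   = "ssp126",                           # example scenario
  start = 2015,
  end   = 2100,
  tas   = "tas_global_daily_2015_2100.nc",    # example NetCDF file names
  rsds  = "rsds_global_daily_2015_2100.nc",
  sfc   = "sfcwind_global_daily_2015_2100.nc",
  huss  = "huss_global_daily_2015_2100.nc"
)
```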