
Added degree day calculation and SLURM job handling #4

Open · wants to merge 9 commits into base: main
Conversation

@hagento hagento commented Dec 7, 2024

This PR integrates explicit degree day calculation and adds SLURM job management to run the calculations.


Key Changes

  1. New Functionality:
    • getDegreeDays: Orchestrates the degree day calculation process. It accepts a fileMapping input that defines the datasets to be processed. This function:

      • Loads the required input data and parameters.
      • Initiates calculations row-wise, outsourcing them into individual SLURM jobs corresponding to each row of fileMapping.
      • Waits for all SLURM jobs to complete successfully.
        In the next PR, this function will be extended to gather data from completed jobs and handle post-processing.
    • processArgs: Ensures the input arguments for getDegreeDays are properly formatted.

    • initCalculation: Reads a specific row from fileMapping and splits the data year-wise to enhance processing stability.

    • SLURM Utilities (slurmUtils.R):

      • createSlurm: Generates SLURM scripts for individual jobs, temporarily saves (and later removes) necessary data, and submits the jobs. It also creates the required directory structure within an output folder.
      • waitForSlurm: Monitors a list of jobs defined by jobIds, ensuring they complete successfully. If a job fails, it returns the corresponding job ID(s).
      • Log files are stored in output/logs, while job outputs are saved in output/hddcdd.
    • Generalized Directory Path Handling: All functions now support flexible path handling.

    • Parameter and Mapping Management: External parameters and most mappings have been moved to individual .csv files located in /extdata/mappings.
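The submit-and-wait workflow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: it assumes `sacct` is on the PATH, and the function and argument names mirror those described for `waitForSlurm`.

```r
# Minimal sketch of a SLURM polling loop (illustrative only).
# jobIds, checkInterval and maxWaitTime correspond to the arguments
# described for waitForSlurm; actual implementation may differ.
waitForSlurmSketch <- function(jobIds, checkInterval = 60, maxWaitTime = 86400) {
  startTime <- Sys.time()
  while (difftime(Sys.time(), startTime, units = "secs") < maxWaitTime) {
    # query the current state of each job via sacct
    states <- vapply(jobIds, function(id) {
      system2("sacct", c("-j", id, "-X", "-n", "-o", "State"), stdout = TRUE)[1]
    }, character(1))
    if (!any(grepl("PENDING|RUNNING", states))) {
      # all jobs finished: return IDs of any that did not complete cleanly
      return(jobIds[!grepl("COMPLETED", states)])
    }
    Sys.sleep(checkInterval)
  }
  stop("Maximum wait time exceeded")
}
```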


Current Limitations

  • Hardcoded SLURM Scripts:
    Currently, the SLURM script created by createSlurm is hardcoded. Future iterations will make this function more flexible, allowing users to define and pass custom scripts. For now, the existing implementation serves its purpose effectively.

@hagento hagento requested a review from robinhasse December 7, 2024 13:12
Comment on lines +39 to +42
ftas <- fileMapping[["tas"]]
frsds <- if (bait) fileMapping[["rsds"]] else NULL
fsfc <- if (bait) fileMapping[["sfcwind"]] else NULL
fhuss <- if (bait) fileMapping[["huss"]] else NULL


This looks as if it should be one object

Suggested change
ftas <- fileMapping[["tas"]]
frsds <- if (bait) fileMapping[["rsds"]] else NULL
fsfc <- if (bait) fileMapping[["sfcwind"]] else NULL
fhuss <- if (bait) fileMapping[["huss"]] else NULL
fileNames <- c(ftas = fileMapping[["tas"]])
if (bait) {
  fileNames <- c(
    fileNames,
    frsds = fileMapping[["rsds"]],
    fsfc = fileMapping[["sfcwind"]],
    fhuss = fileMapping[["huss"]]
  )
}

Comment on lines +68 to +71
compStackHDDCDD(ftas = gsub(".nc", paste0("_", i, ".nc"), ftas),
                frsds = if (bait) gsub(".nc", paste0("_", i, ".nc"), frsds) else NULL,
                fsfc = if (bait) gsub(".nc", paste0("_", i, ".nc"), fsfc) else NULL,
                fhuss = if (bait) gsub(".nc", paste0("_", i, ".nc"), fhuss) else NULL,

@robinhasse robinhasse Dec 13, 2024


Of course, you would have to adapt the function to take one argument and recognise missing elements if bait = FALSE...

Suggested change
compStackHDDCDD(ftas = gsub(".nc", paste0("_", i, ".nc"), ftas),
                frsds = if (bait) gsub(".nc", paste0("_", i, ".nc"), frsds) else NULL,
                fsfc = if (bait) gsub(".nc", paste0("_", i, ".nc"), fsfc) else NULL,
                fhuss = if (bait) gsub(".nc", paste0("_", i, ".nc"), fhuss) else NULL,
compStackHDDCDD(fileNames = sub(".nc", paste0("_", i, ".nc"), fileNames),


@robinhasse robinhasse left a comment


Thanks @hagento! Have you checked if there are already SLURM or some R parallelisation tools that do the work of starting and collecting calculations for you? This seems like a very common task in parallel computing. Sorry, if I have asked this multiple times before.
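One family of tools the reviewer may have in mind is batchtools together with future.batchtools, which submit R calls as SLURM jobs and collect results automatically. The snippet below is a sketch only, not a drop-in replacement for this PR: `slurm.tmpl` is a hypothetical SLURM template file, and the call structure assumes `fileMapping` and `initCalculation` as described above.

```r
# Illustrative sketch: future.batchtools can submit each row of
# fileMapping as its own SLURM job and gather results, replacing
# hand-rolled createSlurm/waitForSlurm logic. "slurm.tmpl" is a
# hypothetical template file that defines the sbatch script.
library(future.batchtools)
plan(batchtools_slurm, template = "slurm.tmpl")

results <- future.apply::future_lapply(
  seq_len(nrow(fileMapping)),
  function(i) initCalculation(fileMapping[i, ])  # one job per mapping row
)
```

This would also remove the need for a hardcoded SLURM script, since resources and the script body live in the template file.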

Comment on lines +45 to +50
yStart <- fileMapping[["start"]] %>% as.numeric()
yEnd <- fileMapping[["end"]] %>% as.numeric()

# extract RCP scenario + model
rcp <- fileMapping[["rcp"]] %>% unique()
model <- fileMapping[["gcm"]] %>% unique()


I'm wondering if csv is the right format for the mapping if you need so many unique() statements here. What if one puts different values and you have multiple unique values? Maybe use yaml format?
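The reviewer's YAML suggestion could look like the sketch below (field values are hypothetical, using the yaml package). Scalar fields such as `gcm` and `rcp` are single values by construction, so no `unique()` calls are needed.

```r
# Hypothetical YAML-based mapping (sketch only; values are invented).
library(yaml)
cfg <- yaml.load("
gcm: GFDL-ESM4
rcp: ssp126
start: 2015
end: 2020
tas: tas_file.nc
")
cfg$gcm    # a single scalar, no unique() required
```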

@hagento hagento Dec 13, 2024


This should not happen since initCalculation always only receives one row of the initial file mapping from getDegreeDays.

h <- hum - cfac(tasData, type = "h", params = c(params[["aHUSS"]], params[["bHUSS"]]))
s <- solar - cfac(tasData, type = "s", params = params)
w <- wind - cfac(tasData, type = "w", params = params)
h <- hum - cfac(tasData, type = "h", params = params)
t <- tasData - cfac(tasData, type = "t", params = NULL)


Doesn't seem efficient to read the same file for each call of cfac. Might make sense to read the file outside of the function and pass it as an argument. But I guess this is really minor compared to other impacts on the runtime...

@hagento (Collaborator, Author) replied:

Are you talking about paramsMap? You are right, this can be checked in advance.

# check if mapping file contains correct columns
mappingCols <- c("gcm", "rcp", "start", "end", "tas", "rsds", "sfc", "huss")
if (!any(mappingCols %in% colnames(fileMapping))) {
stop("Please provide file mapping with correct columns.\n Missing columns: ")


You don't give the missing cols.
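A possible fix is sketched below (not the PR's code). Note also that the original condition `!any(mappingCols %in% colnames(fileMapping))` only fails when *none* of the expected columns is present; `setdiff()` checks them all and yields the missing names for the error message.

```r
# Possible fix (sketch): check all required columns and report the
# ones that are absent, rather than only failing when none match.
mappingCols <- c("gcm", "rcp", "start", "end", "tas", "rsds", "sfc", "huss")
missingCols <- setdiff(mappingCols, colnames(fileMapping))
if (length(missingCols) > 0) {
  stop("Please provide file mapping with correct columns.\n",
       "  Missing columns: ", paste(missingCols, collapse = ", "))
}
```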

Comment on lines +128 to +134
wBAIT <- setNames(as.list(wBAIT$value), wBAIT$variable)

# population data
popMapping <- setNames(as.list(popMapping$file), popMapping$scenario)

# scenario matrix
scenMatrix <- setNames(lapply(strsplit(scenMatrix$rcp, ","), trimws), scenMatrix$ssp)


Can't you use dplyr::pull here?
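What the reviewer suggests might look like the sketch below: `dplyr::pull()` extracts a column as a vector and its `name` argument names the result by another column in one step, replacing the `setNames(as.list(...), ...)` pattern.

```r
# Sketch of the reviewer's suggestion (assumes dplyr is available):
# pull(data, value_col, name = name_col) returns a named vector.
wBAIT      <- as.list(dplyr::pull(wBAIT, value, name = variable))
popMapping <- as.list(dplyr::pull(popMapping, file, name = scenario))
```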

Comment on lines +152 to +157
return(list(jobName = jobName,
            jobScript = jobScript,
            outputFile = outputFile,
            slurmCommand = slurmCommand,
            jobId = jobId,
            batch_tag = batch_tag))


Make the input invisible such that it doesn't flush the console if the user did no assignment.

Suggested change
return(list(jobName = jobName,
            jobScript = jobScript,
            outputFile = outputFile,
            slurmCommand = slurmCommand,
            jobId = jobId,
            batch_tag = batch_tag))
return(invisible(list(jobName = jobName,
                      jobScript = jobScript,
                      outputFile = outputFile,
                      slurmCommand = slurmCommand,
                      jobId = jobId,
                      batch_tag = batch_tag)))

wBAIT = wBAIT,
outDir = outDir)

allJobs[[length(allJobs) + 1]] <- job


Doesn't that work?

Suggested change
allJobs[[length(allJobs) + 1]] <- job
allJobs <- c(allJobs, job)
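One caveat with the suggestion above: in R, `c()` on two lists splices their elements together, so if `job` is itself a list, `c(allJobs, job)` would flatten its elements into `allJobs`. Wrapping `job` in `list()` appends it as a single element, matching the behaviour of `allJobs[[length(allJobs) + 1]] <- job`:

```r
# c() splices list elements; list() preserves job as one element.
allJobs <- list()
job <- list(jobName = "j1", jobId = "123")
allJobs <- c(allJobs, list(job))
length(allJobs)  # 1: the whole job object as a single element
```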

Comment on lines +231 to +235

# Check timeout
if (difftime(Sys.time(), startTime, units = "secs") > maxWaitTime) {
stop("Maximum wait time exceeded")
}


Suggested change
# Check timeout
if (difftime(Sys.time(), startTime, units = "secs") > maxWaitTime) {
stop("Maximum wait time exceeded")
}

jobIds <- as.character(jobIds)
jobSet <- unique(jobIds) # Remove duplicates

while (TRUE) {


Suggested change
while (TRUE) {
while (difftime(Sys.time(), startTime, units = "secs") < maxWaitTime) {

}

Sys.sleep(checkInterval)
}


Suggested change
}
}
if (difftime(Sys.time(), startTime, units = "secs") > maxWaitTime) {
stop("Maximum wait time exceeded")
}
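Putting this restructuring together, the polling loop might read as follows (a sketch; the per-iteration job-state check is elided):

```r
# Sketch: bound the loop by the deadline instead of while (TRUE),
# then report a timeout only if the deadline was actually hit.
startTime <- Sys.time()
while (difftime(Sys.time(), startTime, units = "secs") < maxWaitTime) {
  # ... poll job states; return or break once all jobs have finished ...
  Sys.sleep(checkInterval)
}
if (difftime(Sys.time(), startTime, units = "secs") >= maxWaitTime) {
  stop("Maximum wait time exceeded")
}
```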

#' @export

processArgs <- function(tLim, std, ssp) {
#### Process tLim ####


Please try to stick to standard code section syntax to allow proper code folding and outline generation in RStudio.
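For reference, RStudio treats comment lines ending in four or more trailing dashes (or equals signs) as foldable sections that appear in the document outline:

```r
# Process tLim ------------------------------------------------------------

# Process std -------------------------------------------------------------

# Process ssp -------------------------------------------------------------
```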
