Added degree day calculation and SLURM job handling #4
base: main
Conversation
```r
ftas <- fileMapping[["tas"]]
frsds <- if (bait) fileMapping[["rsds"]] else NULL
fsfc <- if (bait) fileMapping[["sfcwind"]] else NULL
fhuss <- if (bait) fileMapping[["huss"]] else NULL
```
This looks as if it should be one object
Suggested change:
```r
fileNames <- c(ftas = fileMapping[["tas"]])
if (bait) {
  fileNames <- c(
    fileNames,
    frsds = fileMapping[["rsds"]],
    fsfc = fileMapping[["sfcwind"]],
    fhuss = fileMapping[["huss"]]
  )
}
```
```r
compStackHDDCDD(ftas = gsub(".nc", paste0("_", i, ".nc"), ftas),
                frsds = if (bait) gsub(".nc", paste0("_", i, ".nc"), frsds) else NULL,
                fsfc = if (bait) gsub(".nc", paste0("_", i, ".nc"), fsfc) else NULL,
                fhuss = if (bait) gsub(".nc", paste0("_", i, ".nc"), fhuss) else NULL,
```
Of course, you would have to adapt the function to take one argument and recognise missing elements if bait = FALSE.
...
Suggested change:
```r
compStackHDDCDD(fileNames = sub(".nc", paste0("_", i, ".nc"), fileNames),
```
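A minimal sketch of what that adaptation could look like inside compStackHDDCDD (signature and body are assumptions; only the argument handling is shown):

```r
# Sketch only: compStackHDDCDD taking one named vector and tolerating missing elements.
compStackHDDCDD <- function(fileNames, ...) {
  ftas <- fileNames[["ftas"]]

  # optional BAIT inputs: NULL when not present (bait = FALSE case)
  frsds <- if ("frsds" %in% names(fileNames)) fileNames[["frsds"]] else NULL
  fsfc  <- if ("fsfc"  %in% names(fileNames)) fileNames[["fsfc"]]  else NULL
  fhuss <- if ("fhuss" %in% names(fileNames)) fileNames[["fhuss"]] else NULL

  # ... existing processing of ftas / frsds / fsfc / fhuss ...
}
```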
Thanks @hagento! Have you checked whether there are already SLURM or R parallelisation tools that do the work of starting and collecting calculations for you? This seems like a very common task in parallel computing. Sorry if I have asked this multiple times before.
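For reference, one existing option would be the clustermq package, which submits each function call as a SLURM job and collects the results; the scheduler options, template file, and the use of initCalculation as the per-row worker are assumptions, not part of this PR:

```r
# Sketch only: dispatch one calculation per fileMapping row via clustermq.
library(clustermq)
options(clustermq.scheduler = "slurm",
        clustermq.template  = "slurm.tmpl")  # assumed template file

results <- Q(
  fun    = function(row) initCalculation(row),              # assumed per-row worker
  row    = split(fileMapping, seq_len(nrow(fileMapping))),  # one list element per row
  n_jobs = nrow(fileMapping)
)
```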
```r
yStart <- fileMapping[["start"]] %>% as.numeric()
yEnd <- fileMapping[["end"]] %>% as.numeric()

# extract RCP scenario + model
rcp <- fileMapping[["rcp"]] %>% unique()
model <- fileMapping[["gcm"]] %>% unique()
```
I'm wondering if csv is the right format for the mapping if you need so many unique() statements here. What if someone puts in different values and you end up with multiple unique values? Maybe use yaml format?
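For illustration, a yaml mapping would hold each field exactly once, so no unique() would be needed; the file name and field values below are made up:

```r
# Sketch only: read a single-record yaml mapping instead of de-duplicating csv columns.
# Hypothetical contents of mapping.yml:
#   gcm: GFDL-ESM4
#   rcp: ssp126
#   start: 2015
#   end: 2100
library(yaml)
fileMapping <- read_yaml("mapping.yml")  # assumed file name
rcp   <- fileMapping$rcp
model <- fileMapping$gcm
```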
This should not happen since initCalculation always only receives one row of the initial file mapping from getDegreeDays.
```r
h <- hum - cfac(tasData, type = "h", params = c(params[["aHUSS"]], params[["bHUSS"]]))
s <- solar - cfac(tasData, type = "s", params = params)
w <- wind - cfac(tasData, type = "w", params = params)
h <- hum - cfac(tasData, type = "h", params = params)
t <- tasData - cfac(tasData, type = "t", params = NULL)
```
Doesn't seem efficient to read the same file for each call of cfac. Might make sense to read the file outside of the function and pass it as an argument. But I guess this is really minor compared to other impacts on the runtime...
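A minimal sketch of that idea, assuming cfac currently reads the parameter mapping internally and would instead accept it as a new paramsMap argument (file location and argument name are assumptions):

```r
# Sketch only: read the mapping once, then reuse it for every cfac() call.
paramsMap <- read.csv("extdata/mappings/paramsMap.csv")  # assumed file name and location

s <- solar   - cfac(tasData, type = "s", params = params, paramsMap = paramsMap)
w <- wind    - cfac(tasData, type = "w", params = params, paramsMap = paramsMap)
h <- hum     - cfac(tasData, type = "h", params = params, paramsMap = paramsMap)
t <- tasData - cfac(tasData, type = "t", params = NULL,   paramsMap = paramsMap)
```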
Are you talking about paramsMap? You are right, this can be checked in advance.
```r
# check if mapping file contains correct columns
mappingCols <- c("gcm", "rcp", "start", "end", "tas", "rsds", "sfc", "huss")
if (!any(mappingCols %in% colnames(fileMapping))) {
  stop("Please provide file mapping with correct columns.\n Missing columns: ")
```
You don't give the missing cols.
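A minimal sketch of how the check could name them (using all coverage via setdiff, so a single matching column is not enough to pass); variable names follow the snippet above:

```r
# Sketch only: report exactly which required columns are absent.
mappingCols <- c("gcm", "rcp", "start", "end", "tas", "rsds", "sfc", "huss")
missingCols <- setdiff(mappingCols, colnames(fileMapping))
if (length(missingCols) > 0) {
  stop("Please provide a file mapping with the correct columns.\n Missing columns: ",
       paste(missingCols, collapse = ", "))
}
```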
```r
wBAIT <- setNames(as.list(wBAIT$value), wBAIT$variable)

# population data
popMapping <- setNames(as.list(popMapping$file), popMapping$scenario)

# scenario matrix
scenMatrix <- setNames(lapply(strsplit(scenMatrix$rcp, ","), trimws), scenMatrix$ssp)
```
Can't you use dplyr::pull here?
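For illustration, dplyr::pull extracts a column as a plain vector, so the lists above could be built like this (a sketch, not part of the PR):

```r
# Sketch only: dplyr::pull() instead of $ extraction.
wBAIT      <- setNames(as.list(dplyr::pull(wBAIT, value)), dplyr::pull(wBAIT, variable))
popMapping <- setNames(as.list(dplyr::pull(popMapping, file)), dplyr::pull(popMapping, scenario))
```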
```r
return(list(jobName = jobName,
            jobScript = jobScript,
            outputFile = outputFile,
            slurmCommand = slurmCommand,
            jobId = jobId,
            batch_tag = batch_tag))
```
Make the return value invisible so that it doesn't flood the console if the user does not assign it.
Suggested change:
```r
return(invisible(list(jobName = jobName,
                      jobScript = jobScript,
                      outputFile = outputFile,
                      slurmCommand = slurmCommand,
                      jobId = jobId,
                      batch_tag = batch_tag)))
```
```r
wBAIT = wBAIT,
outDir = outDir)

allJobs[[length(allJobs) + 1]] <- job
```
Doesn't that work?
Suggested change:
```r
allJobs <- c(allJobs, job)
```
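One side note on that suggestion: since job is itself a list (see the return value of createSlurm above), c() would splice its elements into allJobs, so wrapping it keeps one entry per job:

```r
# Sketch only: append the whole job object as a single element.
allJobs <- c(allJobs, list(job))
```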
```r
# Check timeout
if (difftime(Sys.time(), startTime, units = "secs") > maxWaitTime) {
  stop("Maximum wait time exceeded")
}
```
Suggested change: remove this timeout check (see the loop-condition suggestion below).
```r
jobIds <- as.character(jobIds)
jobSet <- unique(jobIds) # Remove duplicates

while (TRUE) {
```
Suggested change:
```r
while (difftime(Sys.time(), startTime, units = "secs") < maxWaitTime) {
```
```r
  }

  Sys.sleep(checkInterval)
}
```
Suggested change:
```r
}
}
if (difftime(Sys.time(), startTime, units = "secs") > maxWaitTime) {
  stop("Maximum wait time exceeded")
}
```
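Taken together with the loop-condition suggestion above, the wait logic could look roughly like this (a sketch; checkJobStatus is a hypothetical stand-in for however waitForSlurm queries SLURM):

```r
# Sketch only: bounded polling loop with the timeout reported after the loop.
startTime <- Sys.time()
while (difftime(Sys.time(), startTime, units = "secs") < maxWaitTime) {
  if (checkJobStatus(jobSet)) {  # hypothetical helper: TRUE once all jobs have finished
    break
  }
  Sys.sleep(checkInterval)
}
if (difftime(Sys.time(), startTime, units = "secs") > maxWaitTime) {
  stop("Maximum wait time exceeded")
}
```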
```r
#' @export

processArgs <- function(tLim, std, ssp) {
  #### Process tLim ####
```
Please try to stick to standard code section syntax to allow proper code folding and outline generation in RStudio.
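For reference, RStudio treats any comment ending in at least four trailing dashes, equal signs, or hashes as a foldable code section; a minimal sketch of that convention applied to the snippet above (section labels are illustrative):

```r
# Sketch only: standard RStudio section comments for code folding and outline generation.
processArgs <- function(tLim, std, ssp) {

  # Process tLim ----

  # Process std ----

}
```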
This PR introduces explicit degree day calculation together with SLURM job management to facilitate the process.

Key Changes

- getDegreeDays: Orchestrates the degree day calculation. It accepts a fileMapping input that defines the datasets to be processed and submits one SLURM job per entry of fileMapping. In the next PR, this function will be extended to gather data from completed jobs and handle post-processing.
- processArgs: Ensures the input arguments for getDegreeDays are properly formatted.
- initCalculation: Reads a specific row from fileMapping and splits the data year-wise to enhance processing stability.
- SLURM utilities (slurmUtils.R):
  - createSlurm: Generates SLURM scripts for individual jobs, temporarily saves (and later removes) the necessary data, and submits the jobs. It also creates the required directory structure within an output folder.
  - waitForSlurm: Monitors a list of jobs defined by jobIds, ensuring they complete successfully. If a job fails, it returns the corresponding job ID(s). Logs are written to output/logs, while job outputs are saved in output/hddcdd.
- Generalized directory path handling: All functions now support flexible path handling.
- Parameter and mapping management: External parameters and most mappings have been moved to individual .csv files located in /extdata/mappings.

Current Limitations

Currently, the SLURM script created by createSlurm is hardcoded. Future iterations will make this function more flexible, allowing users to define and pass custom scripts. For now, the existing implementation serves its purpose effectively.
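For orientation, the columns such a fileMapping needs (per the column check reviewed above) could be illustrated with a hypothetical one-row example; the GCM, scenario, and file names below are made up:

```r
# Sketch only: a minimal one-row fileMapping with the columns checked by getDegreeDays.
fileMapping <- data.frame(
  gcm   = "GFDL-ESM4",                        # example GCM name
  rcp   = "ssp126",                           # example scenario
  start = 2015,
  end   = 2100,
  tas   = "tas_global_daily_2015_2100.nc",    # example NetCDF file names
  rsds  = "rsds_global_daily_2015_2100.nc",
  sfc   = "sfcwind_global_daily_2015_2100.nc",
  huss  = "huss_global_daily_2015_2100.nc"
)
```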