Access to previous task used resources for resources optimization #5378

luisas · 2024-10-07T08:02:09Z

New feature

I need to assign a specific amount of resources on retry depending on the amount of resources used in the process run that failed previously.
So for instance, if a process has used 99% of memory and 10% of the requested time, I would increase memory and not time/cpus.
I would like to access the resources of the previous attempt (for example via task.previous) to make that decision.

Usage scenario

When running a big analysis with input files of different sizes, instead of blindly increasing all of cpus, time and memory, one could leave the memory unchanged and increase only time and cpus if only the time was insufficient, and vice versa. This would save resources, especially when one does not know the software well, its requirements are difficult to predict, and the input file sizes can vary a lot.

Suggest implementation

As suggested by @bentsherman, we could provide something like task.previous, which refers to the previous task attempt. Then you could use task.previous.workDir to get the previous task directory and inspect the .command.trace file to see what the cpu/mem usage was.
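
To make the idea concrete, here is a minimal sketch of reading a task's .command.trace file. It assumes the file holds one key=value pair per line (metrics such as realtime or peak_rss); the helper name readTrace is hypothetical, and task.previous does not exist yet.

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper: parse a trace file into a map of raw string values.
// Assumes one `key=value` pair per line; lines without `=` are skipped.
Map<String, String> readTrace(Path traceFile) {
    Files.readAllLines(traceFile)
        .findAll { it.contains('=') }
        .collectEntries { line ->
            def parts = line.split('=', 2)
            [(parts[0].trim()): parts[1].trim()]
        }
}
```

If something like task.previous were available, this could be called as readTrace(task.previous.workDir.resolve('.command.trace')) and the returned values compared against the requested resources.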

pditommaso (Maintainer) · 2024-10-07T13:19:00Z

You are saying you would like to access the previous task's "effective" usage, not the declared one, right?

luisas (Author) · 2024-10-07T13:20:40Z

Yes exactly.

I would also be happy with the assigned resources only, if those were easier to access, but in the best-case scenario I'd need the used ones.

luisas (Author) · 2024-10-07T13:24:50Z

It has also been suggested to me to do this through a plugin. Would this be a good approach?

pditommaso (Maintainer) · 2024-10-07T13:30:52Z

Possibly, though I'm not 100% sure it will be possible to access it during the task retry.

pditommaso (Maintainer) · 2024-10-07T13:32:07Z

However, I find this approach convoluted. Would it not be enough to check the exit status for an OOM error and decide whether to increase the memory or not?

luisas (Author) · 2024-10-07T13:50:38Z

I am not sure how to make that work, maybe I am missing something. The thing is that if I get error 137 for the first 3 attempts and only increase memory each time, and then at attempt 4 I get 140 and want to increase time by 1 step only (because I supposedly did not increase it in the steps before), I would need to "remember" that I did not increase it the last 3 times, and I am unsure how to get that information.

luisas (Author) · 2024-10-07T13:55:10Z

Right now I am only able to increase time and memory by task.attempt (I do not know any other way; if there is one, maybe it is all I need :D), but that is blind to whether the previous attempts failed because of that specific resource (OOM for memory or OOT for cpus/time), and would increase it as if all the previous attempts had failed for that specific resource.
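
A small worked example of the difference, using the scenario described above (three failures with exit status 137, then one with 140) and a base time of 1 unit. The closure names are illustrative only, and the arithmetic assumes a doubling rule of base * 2 ** (task.attempt - 1).

```groovy
// Exit statuses of attempts 1..4 (137 = out of memory, 140 = out of time)
def exitCodes = [137, 137, 137, 140]

// Attempt-based scaling: after n failed attempts the next attempt is n + 1,
// so the time request is base * 2 ** n, whatever caused the earlier failures.
def timeByAttempt = { int failedAttempts -> 1 * (2 ** failedAttempts) }

// History-aware scaling: double the time only after a time-related failure.
def timeByHistory = { List<Integer> codes ->
    codes.inject(1) { time, code -> code == 140 ? time * 2 : time }
}

println timeByAttempt(exitCodes.size())   // 16 units requested for attempt 5
println timeByHistory(exitCodes)          // 2 units requested for attempt 5
```

The attempt-based rule asks for 16 units of time on the fifth attempt, although time was the problem only once; remembering the history would ask for 2.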

luisas (Author) · 2024-10-07T14:00:38Z

  memory = { task.exitStatus == 137 ? check_max( 1.MB * (2 ** (task.attempt - 1)), 'memory' ) : 1.MB }
  time   = { task.exitStatus == 140 ? check_max( 1.m  * (2 ** (task.attempt - 1)), 'time'   ) : 1.m }
  cpus   = { task.exitStatus == 140 ? check_max( 1    * (2 ** (task.attempt - 1)), 'cpus'   ) : 1 }

I was playing with a configuration similar to this one to test this behaviour, and this is an example of the resources I got:

pditommaso (Maintainer) · 2024-10-07T14:32:08Z

Indeed, the memory request should be a function taking the attempt and the exit code and returning the desired amount of memory.

Note that the logic can be defined in a plain function and used in the memory directive, for example:

workflow {
    foo()    
}

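// Request more memory only when the previous attempt was killed for
// running out of memory (exit status 137); otherwise keep the base amount
def my_mem_request(task) {
    def result = ( task.exitStatus == 137 )
        ? 1.GB * (2 ** task.attempt)
        : 1.GB
    return result
}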

process foo {
  debug true
  memory { my_mem_request(task) }
  errorStrategy 'retry'
  resourceLimits memory: 10.GB
  """
  echo mem: $task.memory
  exit 137
  """
}

Note the use of resourceLimits to cap the max amount of memory instead of check_max
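
For reference, the same cap can also be set once for all processes in the configuration file. This is a sketch that assumes a Nextflow version recent enough to support resourceLimits; the values are placeholders, not recommendations.

```groovy
// nextflow.config (illustrative limits, adjust to your environment)
process {
    resourceLimits = [ cpus: 8, memory: 10.GB, time: 2.h ]
}
```

With such a setting, a request above a limit is reduced to the limit, so the retry functions do not need their own check_max-style guard.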

pditommaso (Maintainer) · 2024-10-07T14:41:53Z

I've realised it's quite straightforward to implement the solution by tracking the state of the previous task, as you are suggesting.

def state = [:]

def my_mem_request(task, Map state) {
    // look up the memory stored for the previous execution (defaults to 1 GB)
    def previous = state["${task.process}-${task.index-1}"] ?: (1.GB)
    // compute the new one 
    def result = ( task.exitStatus == 137 )
        ? previous *2   // add here your increasing logic
        : previous      // same as before
    // store for the next one
    state["${task.process}-${task.index}"] = result
    return result
} 

process foo {
  debug true
  memory { my_mem_request(task,state) }
  errorStrategy 'retry'
  resourceLimits memory: 10.GB
  """
  echo mem: $task.memory
  exit 137
  """
}

Update: I've replaced the use of key.previous(), which was buggy, with "${task.process}-${task.index-1}".

Reply from luisas (Author) · Oct 8, 2024

In the end, I used ${task.attempt} instead of ${task.index}; for some reason task.index was always 1.

With task attempt it works as I wanted it to, fantastic!
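
For anyone adapting this, here is a hedged sketch of the function with the attempt in the key. It also keeps task.index in the key, on the assumption that task.index identifies the task within its process while task.attempt counts its retries, so that several tasks of the same process do not share state. The name nextMemory and the base parameter are illustrative.

```groovy
// Hypothetical variant of my_mem_request: the state is keyed by
// process, task index and attempt number.
def nextMemory(task, Map state, base) {
    // key under which the previous attempt of this same task stored its request
    def previousKey = "${task.process}-${task.index}-${task.attempt - 1}".toString()
    def currentKey  = "${task.process}-${task.index}-${task.attempt}".toString()
    // first attempt: nothing stored yet, start from the base amount
    def previous = state[previousKey] ?: base
    // double only after an out-of-memory failure (exit status 137)
    def result = (task.exitStatus == 137) ? previous * 2 : previous
    // store under the current attempt's key for the next retry
    state[currentKey] = result
    return result
}
```

In a process this would be used as memory { nextMemory(task, state, 1.GB) }. Because the result is written under the current attempt's key only, evaluating the closure more than once for the same attempt returns the same value.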

luisas (Author) · 2024-10-07T14:45:37Z

Amazing!! Paolo's magic!

Thanks a ton :)

pditommaso (Maintainer) · 2024-10-07T14:51:27Z

Paologpt

bentsherman (Maintainer) · 2024-10-07T15:00:44Z

@pditommaso what do you think about storing a reference to the previous task in the task object, e.g. task.previous pointing to the task of the previous attempt?

This way the previous attempt could be used in config settings, whereas your custom function won't be accessible in the config.

pditommaso (Maintainer) · 2024-10-07T15:03:16Z

I'm not so keen to implement it as a core feature: tracking all tasks in memory could require a lot of memory for very large runs.

pditommaso (Maintainer) · 2024-10-07T15:29:25Z

@luisas look at the updates

Reply from luisas (Author) · Oct 8, 2024

Thank you!

luisas (Author) · 2024-10-08T12:13:59Z

@pditommaso would there be any way to access the actual used resources?

I tried to store them with the afterScript, but I was not able to access any of the task.rss or task.vmem variables.

If not accessible, do you think I could make that work with a plugin?

Reply from pditommaso (Maintainer) · Oct 8, 2024

You can only access directives declared in the task context. Right now it's not possible to access the "trace" values. A plugin should allow you to access them, but it may not be straightforward.

I was thinking we could add access to trace.x metainfo, but I'm not sure how much work it would take.

Reply from luisas (Author) · Oct 8, 2024

I was also thinking of using some trace functions in the afterScript, or more generally accessing the trace values the way Nextflow does.

Or is it too sketchy?

Reply from luisas (Author) · Oct 8, 2024

Or I could just do it properly with the plugin - do you think it would be generally useful? Cedric is quite excited and wanted me to try some things, so maybe the plugin is the right playground.

Or do you think that once the trace.x metainfo is available it would make the plugin useless? I am unsure whether to go down this road now.

Reply from pditommaso (Maintainer) · Oct 9, 2024

I'd suggest giving the plugin approach a try. The co2footprint plugin could be a good starting point because it accesses the trace info.
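
As a rough idea of what such a plugin could contain, here is a sketch of a trace observer that records the peak memory of each completed task. It is not taken from co2footprint. It assumes the TraceObserver plugin interface with its onProcessComplete callback and the peak_rss trace field; the class name UsageObserver is hypothetical, and the plugin registration code is omitted.

```groovy
import java.util.concurrent.ConcurrentHashMap
import nextflow.processor.TaskHandler
import nextflow.trace.TraceObserver
import nextflow.trace.TraceRecord

// Hypothetical observer: remembers the peak RSS reported for each task.
class UsageObserver implements TraceObserver {

    // task name -> peak resident memory taken from the trace record
    final Map<String, Object> peakRss = new ConcurrentHashMap<>()

    @Override
    void onProcessComplete(TaskHandler handler, TraceRecord trace) {
        def value = trace.get('peak_rss')
        // the metric may be missing, e.g. when the task produced no trace
        if( value != null )
            peakRss[handler.task.name] = value
    }
}
```

Whether the stored values can then be read from a directive closure at retry time is the open question raised earlier in this thread, so this covers only the collection side.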

Reply from luisas (Author) · Oct 9, 2024

Ok, I think I will give it a try then :) Thanks a lot!
