Held tasks need multiple triggers to run? #6398

ColemanTom · 2024-10-08T00:47:42Z

I am finding that if I have a held workflow (not paused), if I trigger a task it appears to do nothing. But, if I trigger it twice, it will then run the task. See the attached gif.

The scheduler log shows this

2024-10-08T00:42:22Z INFO - Command "force_trigger_tasks" received. ID=8b832ccb-2fba-46c7-a389-602fb984d60c
    force_trigger_tasks(flow=['all'], flow_wait=False, tasks=['20241006T1800Z/glu_ops_process_background_iasi'])
2024-10-08T00:42:22Z INFO - Command "force_trigger_tasks" actioned. ID=8b832ccb-2fba-46c7-a389-602fb984d60c
2024-10-08T00:42:22Z INFO - [20241006T1800Z/glu_ops_process_background_iasi:waiting(held)] => waiting(held)(queued)
2024-10-08T00:42:29Z INFO - Command "force_trigger_tasks" received. ID=883ccef9-4b47-4cac-9258-26056328462b
    force_trigger_tasks(flow=['all'], flow_wait=False, tasks=['20241006T1800Z/glu_ops_process_background_iasi'])
2024-10-08T00:42:29Z INFO - Command "force_trigger_tasks" actioned. ID=883ccef9-4b47-4cac-9258-26056328462b

If I read that correctly, the trigger pushes it to the queued state, and then a second trigger pushes to submit the job. I had expected that triggering would shift it twice.

I'm unsure if this is a Cylc-flow question or a UI question, so I've put it here as that is how I'm looking at it.

The text was updated successfully, but these errors were encountered:

hjoliver · 2024-10-11T00:17:15Z

This is intended.

triggering an un-queued task queues it to run
- (we don't automatically assume that you want to violate the queue limit)
triggering it again (i.e., triggering a queued task) causes it to submit
- (regardless of the queue limit)

The held attribute should not make any difference here, because we can run held tasks (they should stay held to prevent any automatic retries unless released).

Are you happy with this explanation, @ColemanTom ?

ColemanTom · 2024-10-11T00:25:40Z

Ah, ok. So because this task is in a queue, it takes two triggers. I just tested on a task that does not belong to a queue and it does only take one. I think that there is no visible change in the WUI when I do the first trigger is part of the confusion. In Cylc7, when triggering a task which was in a full queue, it changed to (I think) a yellow colour so show the state had changed. Here, there was no update to the display.

hjoliver · 2024-10-11T00:30:10Z

In Cylc 8 "queued" isn't a task state, it's a particular attribute of the waiting state.

Task attributes are shown as badges on the task icon. I think we only display one badge at a time, and we prioritize held over queued, so that's why you didn't see a change in the GUI.

@cylc/core - maybe this suggests we need to stack multiple badges on task icons?

ColemanTom · 2024-10-11T00:50:52Z

It would be helpful to see some indication that an action has been registered and done something. I wasn't sure if it was severe lag or what was happening.

hjoliver · 2024-10-11T02:00:27Z

(We can close this after adding or tweaking a cylc-ui issue to cover the last comment)

oliver-sanders · 2024-10-11T08:30:10Z

[edit]: Dammit, this is wrong, see later comment

Aah right.

Under normal circumstances, triggering a held task DOES cause it to be run immediately, however, in this case the queue that the task was added to was full so it sat there waiting.

(We can close this after adding or tweaking a cylc-ui issue to cover the last comment)

If the queued state always beats the held state (which I think it does), then we can resolve this by changing the "task modifier" icon order from:

is_held
is_queued
is_runahead

to:

is_queued
is_held
is_runahead

Note, this applies both to the GUI and Tui.

Additionally, when you select a task in either, you should get a task state summary (e.g. waiting (held)). We should ensure that queued beats held here too.

And (although I agree that held isn't really a special case here) we should still give it a passing mention in the docs:

$ cylc trigger --help | grep held

TODO:

Tui:
- Ensure queued beets held in the task modifier icon.
- Ensure queued beets held in the task state summary (select task and press enter).
GUI:
- Ensure queued beets held in the task modifier icon.
- Ensure queued beets held in the task state summary (select task to open context menu).
Add passing mention to held in cylc trigger CLI help?
Add passing mention to held in cylc trigger GraphQL schema?

oliver-sanders · 2024-10-11T08:39:04Z

As an aside, see this snippet from the Cylc UI design document where we were exploring context information for task state modifiers:

And the [now?] erroneous statement underneath about held taking priority over queue (I guess that made sense at the time?).

Where the "view" button would have taken you to this cylc/cylc-ui#462

oliver-sanders · 2024-10-11T14:05:51Z

@hjoliver

Sorry, on reflection, I have realised that this was not the intended behaviour and is actually inconsistent. The current behaviour is an unintended result of a bugfix (mia-culpa) for a completely different issue.

To explain, here's the stated intention:

Triggering an unqueued waiting task queues it, regardless of prerequisites.

Triggering a queued task submits it, regardless of queue limiting.

Triggering an active task has no effect (it already triggered).

cylc trigger --help

Under this logic, if you trigger a held task it will be queued. If there is space in the queue, it will run instantly.

However, at a later date, the queues because "is_held" aware in order to allow users to hold queued tasks (without us having to yank the tasks out of the queue in order to do so):

cylc-flow/cylc/flow/task_queues/independent.py

Lines 57 to 58 in 59ce660

    
           if itask.state.is_held: 
        
               held.append(itask)

So a held task won't queue (naturally) and a queued held task won't run (naturally).

This changed the behaviour. Now when we trigger a held task, it gets added to the queue, but the queue will never release it (irrespective of whether there is space in the queue or not).

So the behaviour changed from "it will run when there is space in the queue" to "it will never run under any circumstances" which was not intended. Intentions aside, the "it will never run" bit is also totally illogical.

In my above comment, I made this suggestion:

If the queued state always beats the held state (which I think it does), then we can resolve this by changing the "task modifier" icon order from:

is_held

is_queued

is_runahead

to:

is_queued

is_held

is_runahead

However, that doesn't make sense. Under the current implementation:

Held beats queued, because a held task cannot (naturally) become queued.
Queued beats held, because a queued held task cannot (naturally) run.

So there is no order of precedence we can give the task modifier icon that will work in all cases, in other words, the logic is inconsistent.

To resolve this we need to either:

Differentiate between a held task that became queued (i.e. was manually triggered) and a queued task that became held (i.e. was manually held).
Remove tasks from the queue when they are manually held rather than leaving them in the queue and filtering out all held tasks.

Option (2) is nicer, and technically correct (if you hold a queued task, you are suppressing it from being submitted, so it doesn't make sense for the task to remain queued). I think it's relatively easy to do these days as we now have other functionality that needs to remove tasks from the queue (e.g. cylc remove).

hjoliver · 2024-10-13T20:47:57Z

I ran into a hold bug when testing this on Friday, which I'll now try to reproduce (from memory ... it was on my office box, and I'm WFH today). However, I thought it was a different issue, because @ColemanTom's description above says:

If I read that correctly, the trigger pushes it to the queued state, and then a second trigger pushes to submit the job. I had expected that triggering would shift it twice.

That's just the intended double-trigger behaviour when queue-limiting is active, which doesn't seem to gel with what you're saying @oliver-sanders ... maybe it depends on how Tom's tasks got held in the first place (e.g. were they already queued or not?).

hjoliver · 2024-10-13T21:58:23Z

(Side-issue spawned after taking hold and queue out of the mix: #6406 )

ColemanTom · 2024-10-13T22:05:35Z

Tom's tasks got held in the first place (e.g. were they already queued or not?).

I don't really remember the setup I had. If I pay attention and see it again I'll try to comment back here.

hjoliver · 2024-10-13T23:11:56Z

Talking a step back from attribute priority, to confirm triggering and queues work as intended without hold involved:

[scheduling]
    [[queues]]
        [[[default]]]
            limit = 1
    [[graph]]
        R1 = "foo:start => a => b"
[runtime]
    [[foo]]
        pre-script = "sleep 360"
    [[a, b]]
        script = "sleep 10; false"
        execution retry delays = PT5S

foo:start spawns a as queued; b remains a future task
job 01:
- triggering a (queued) once runs it
- triggering b once queues it; twice runs it
job 02:
- a queues again after retry time is done; triggering once runs it
- b returns to waiting on a:succeeded after retry timer is done; triggering once queues it, twice runs it
All good ✔️ (but see Triggered future tasks, and prerequisite status? #6406)

hjoliver · 2024-10-13T23:29:19Z

Now see how hold affects the active (n=0) waiting task a:

hold a:waiting(queued) => a:waiting(held)(queued)
trigger unqueues it to run (still held)
job fails => a:waiting(held) (not queued)
release => waiting(queued)
(now trigger once to unqueue and run job 02)

This kinda seems right from a manual triggering perspective, IF we think that manually triggering a held task should make it run but remain held - which seems reasonable because we can hold a running task, which just means it won't automatically retry.

But maybe not from an automatic triggering perspective, because a held task cannot submit a job automatically, so arguably it should not be queued? 🤔

hjoliver · 2024-10-13T23:38:32Z

Now see how hold affects the future task b:

hold adds it to the future hold list
trigger => waiting(queued) - (the hold list is not used?)
trigger again runs it
job 01 fails; after the retry timer it is waiting, not held and not queued

Alternatively:

hold adds it to the future hold list
cylc set -p 1/a causes it to spawn naturally => waiting(held) - or trigger a and allow it to succeed - (the hold list is used)
triggering it now => b:waiting(held)(queued)

(Then same as for the a case above)

hjoliver · 2024-12-04T01:33:14Z

trigger => waiting(queued) - (the hold list is not used?)

Spinning off a small bug issue for this: #6513

I'm not sure this issue has revealed any other actual bugs. From a quick play, I can't seem to reproduce your claimed bug from above @oliver-sanders (did you test this, or were you speculating based on how the code looks?):

Now when we trigger a held task, it gets added to the queue, but the queue will never release it (irrespective of whether there is space in the queue or not)

Test case:

[scheduling]
    [[queues]]
        [[[default]]]
            limit = 1
    [[graph]]
        R1 = "a & long:start => b"
[runtime]
    [[a, b]]
        script = "false"
    [[long]]
        script = "sleep 300"

Procedure:

cylc vip
- wait for a:failed, and long:started, and b:waiting
cylc hold //1/b
cylc trigger //1/b
- result b:waiting (held, queued)
cylc trigger //1/b
- result: b runs and ends up as b:failed (held)

oliver-sanders · 2024-12-04T16:53:20Z

I'm not sure this issue has revealed any other actual bugs

Here's a simple workflow to demonstrate the bug of tasks requiring multiple triggers to run:

[scheduling]
    [[graph]]
        R1 = a

[runtime]
    [[a]]
        script = """
            cylc__job__poll_grep_workflow_log -E '1/a.*running'
            cylc kill "${CYLC_WORKFLOW_ID}//1/a"
        """
        [[[events]]]
            failed handlers = cylc trigger "%(workflow)s//1/a"

Queue limiting is not a factor here (default queue has 100 spaces and there are no other active tasks) so the task should run straight away, but it doesn't.

Edited workflow log:

INFO - [1/a:waiting] => preparing
INFO - [1/a/01:submitted] => running
# kill issued
INFO - Command "kill_tasks" received. ID=b0e6c400-d7a7-41df-a230-0a8563387521
    kill_tasks(tasks=['1/a'])
INFO - [1/a/01:running] => running(held)
INFO - Command "kill_tasks" actioned. ID=b0e6c400-d7a7-41df-a230-0a8563387521
INFO - [1/a/01:running(held)] => failed(held)
# first trigger issued
INFO - Command "force_trigger_tasks" received. ID=99ef9eda-bd4c-48c2-99e0-f9c561c933fb
    force_trigger_tasks(flow=[], flow_wait=False, tasks=['1/a'])
INFO - [1/a/01:failed(held)] => waiting(held)
INFO - [1/a:waiting(held)] => waiting(held)(queued)
# nothing happens -> second trigger issued
INFO - Command "force_trigger_tasks" received. ID=0b9539c1-e5e0-421e-841d-ccecb95f08ef
    force_trigger_tasks(flow=[], flow_wait=False, tasks=['1/a'])
# job submits
INFO - [1/a:waiting(held)(queued)] => waiting(held)
INFO - [1/a:waiting(held)] => preparing(held)

oliver-sanders · 2024-12-04T16:55:12Z

I can't seem to reproduce your claimed bug from above @oliver-sanders (did you test this, or were you speculating based on how the code looks?):

This has been reported and observed. I have not attempted to reproduce yet (too much to do, too little time).

test cylc#6398

hjoliver · 2024-12-05T18:27:47Z

By sheer luck, this one is getting closed by #6499

Summary: @ColemanTom's initial report is expected trigger-twice-if-the-queue-is-limiting behaviour:

the trigger pushes it to the queued state, and then a second trigger pushes to submit the job. I had expected that triggering would shift it twice.

But @oliver-sanders revealed another bug that IS associated with manually triggering of held tasks (see his example just above).

oliver-sanders transferred this issue from cylc/cylc-ui Oct 9, 2024

oliver-sanders added this to the 8.3.6 milestone Oct 9, 2024

oliver-sanders added bug Something is wrong :( small labels Oct 9, 2024

hjoliver removed the bug Something is wrong :( label Oct 11, 2024

oliver-sanders added the doc Documentation label Oct 11, 2024

oliver-sanders added bug? Not sure if this is a bug or not bug Something is wrong :( and removed bug? Not sure if this is a bug or not doc Documentation labels Oct 11, 2024

oliver-sanders modified the milestones: 8.3.6, 8.3.7 Oct 18, 2024

oliver-sanders linked a pull request Dec 3, 2024 that will close this issue

Trigger now if workflow paused #6499

Open

8 tasks

oliver-sanders modified the milestones: 8.3.7, 8.4.0 Dec 5, 2024

oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Dec 5, 2024

test cylc#6398

47ce8fa

hjoliver added a commit to hjoliver/cylc-flow that referenced this issue Dec 5, 2024

Merge pull request #61 from oliver-sanders/test-6398

f1462eb

test cylc#6398

hjoliver self-assigned this Dec 5, 2024

hjoliver pushed a commit to hjoliver/cylc-flow that referenced this issue Dec 10, 2024

test cylc#6398

da04357

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Held tasks need multiple triggers to run? #6398

Held tasks need multiple triggers to run? #6398

ColemanTom commented Oct 8, 2024

hjoliver commented Oct 11, 2024 •

edited

Loading

ColemanTom commented Oct 11, 2024

hjoliver commented Oct 11, 2024

ColemanTom commented Oct 11, 2024

hjoliver commented Oct 11, 2024

oliver-sanders commented Oct 11, 2024 •

edited

Loading

oliver-sanders commented Oct 11, 2024 •

edited

Loading

oliver-sanders commented Oct 11, 2024 •

edited

Loading

hjoliver commented Oct 13, 2024 •

edited

Loading

hjoliver commented Oct 13, 2024

ColemanTom commented Oct 13, 2024

hjoliver commented Oct 13, 2024 •

edited

Loading

hjoliver commented Oct 13, 2024 •

edited

Loading

hjoliver commented Oct 13, 2024 •

edited

Loading

hjoliver commented Dec 4, 2024 •

edited

Loading

oliver-sanders commented Dec 4, 2024 •

edited

Loading

oliver-sanders commented Dec 4, 2024 •

edited

Loading

hjoliver commented Dec 5, 2024

Held tasks need multiple triggers to run? #6398

Held tasks need multiple triggers to run? #6398

Comments

ColemanTom commented Oct 8, 2024

hjoliver commented Oct 11, 2024 • edited Loading

ColemanTom commented Oct 11, 2024

hjoliver commented Oct 11, 2024

ColemanTom commented Oct 11, 2024

hjoliver commented Oct 11, 2024

oliver-sanders commented Oct 11, 2024 • edited Loading

oliver-sanders commented Oct 11, 2024 • edited Loading

oliver-sanders commented Oct 11, 2024 • edited Loading

hjoliver commented Oct 13, 2024 • edited Loading

hjoliver commented Oct 13, 2024

ColemanTom commented Oct 13, 2024

hjoliver commented Oct 13, 2024 • edited Loading

hjoliver commented Oct 13, 2024 • edited Loading

hjoliver commented Oct 13, 2024 • edited Loading

hjoliver commented Dec 4, 2024 • edited Loading

oliver-sanders commented Dec 4, 2024 • edited Loading

oliver-sanders commented Dec 4, 2024 • edited Loading

hjoliver commented Dec 5, 2024

hjoliver commented Oct 11, 2024 •

edited

Loading

oliver-sanders commented Oct 11, 2024 •

edited

Loading

oliver-sanders commented Oct 11, 2024 •

edited

Loading

oliver-sanders commented Oct 11, 2024 •

edited

Loading

hjoliver commented Oct 13, 2024 •

edited

Loading

hjoliver commented Oct 13, 2024 •

edited

Loading

hjoliver commented Oct 13, 2024 •

edited

Loading

hjoliver commented Oct 13, 2024 •

edited

Loading

hjoliver commented Dec 4, 2024 •

edited

Loading

oliver-sanders commented Dec 4, 2024 •

edited

Loading

oliver-sanders commented Dec 4, 2024 •

edited

Loading