
dttools: priority queue data structure #3915

Closed

Conversation

Member

@JinZhou5042 commented Aug 12, 2024

Proposed Changes

The idea came from the discussion in #3851.

TaskVine currently uses a Breadth First scheduling algorithm, implemented with a FIFO queue, meaning tasks are considered in the order they arrive. Alternatively, Depth First scheduling prioritizes tasks deeper in the compute graph, allowing them to be considered earlier when tasks have different priorities.

Both algorithms perform similarly without storage optimization techniques. However, in a large-scale compute graph, promptly pruning files helps workers reduce peak disk load, and Depth First scheduling amplifies this benefit: because tasks are interdependent, files can be consumed soon after they are created, so they can be pruned earlier.
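As a rough illustration (not the dttools or TaskVine implementation), Depth First scheduling can be modeled as a priority queue keyed by each task's depth in the compute graph, so deeper tasks are popped first; the task names and depths below are made up:

import heapq

# Toy sketch: a max-heap of ready tasks keyed by graph depth.
# heapq is a min-heap, so the depth is negated to pop the deepest task first.
ready = []

def push_task(task_name, depth):
    heapq.heappush(ready, (-depth, task_name))

def pop_task():
    neg_depth, task_name = heapq.heappop(ready)
    return task_name

push_task("preprocess", 0)
push_task("reduce", 2)     # deepest task: consumes intermediate files
push_task("analyze", 1)

print(pop_task())  # "reduce": its input files can be pruned as soon as it finishes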

With a small HEP application consisting of nearly 1000 tasks, pruning with Breadth First scheduling (the current scheduling algorithm) reduces the peak disk usage by 50.45%, while pruning with Depth First scheduling achieves a reduction of 92.23%.

[Figure: bfs_dfs_comparison — peak disk usage under Breadth First vs. Depth First scheduling]

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update Update the manual to reflect user-visible changes.
  • Type Labels Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM Mark your PR as ready to merge.

@JinZhou5042 self-assigned this Aug 12, 2024
Member

@dthain left a comment


I like this implementation, it's nice and straightforward.

A few nits about the API documentation versus comments in the implementation below.

Now, the real question is, can it be worked into our various scheduling methods without breaking everything? Make sure you talk to @colinthomas-z80 about how/why we iterate over only part of the list at one time.

dttools/src/priority_queue.h — 3 review threads (resolved)
@colinthomas-z80
Contributor

I was thinking this could possibly replace the list rotation and the way we only visit a subset of the tasks each scheduling attempt.

As long as the priority includes whether the task is ready to run or not (i.e., its data dependencies are present), we should be able to find a good task without looking too deep in the list.
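A rough sketch of that idea (with made-up task fields, not the TaskVine internals): the ordering key could combine readiness with the user-assigned priority, so runnable tasks sort ahead of blocked ones:

import heapq

# Toy sketch: order ready-to-run tasks before blocked ones, then by
# user priority (higher first). heapq is a min-heap, so both keys are negated.
def priority_key(task):
    return (-int(task["inputs_ready"]), -task["priority"])

tasks = [
    {"name": "t1", "inputs_ready": False, "priority": 9},
    {"name": "t2", "inputs_ready": True,  "priority": 3},
    {"name": "t3", "inputs_ready": True,  "priority": 7},
]

heap = [(priority_key(t), t["name"]) for t in tasks]
heapq.heapify(heap)
print(heapq.heappop(heap)[1])  # "t3": ready to run, and highest priority among ready tasks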

@JinZhou5042
Member Author

JinZhou5042 commented Aug 12, 2024

When all tasks have the same priority, the queue might look like this (priority, data):

[(1, "a"), (1, "b"), (1, "c"), (1, "d"), (1, "e")]

The first time we pop, we get (1, "a"). If we decide this isn't the element we need right now and want to consider it later, we push it back into the queue, resulting in:

[(1, "e"), (1, "b"), (1, "c"), (1, "d"), (1, "a")]

This is roughly like rotating the list.

In contrast, when tasks have different priorities, the element with the highest priority stays at the top of the queue if we repeatedly pop and push it. For example, with this priority queue (the storage order is shown here):

[(5, "e"), (4, "d"), (2, "b"), (1, "a"), (3, "c")]

Popping the first element and then deciding to reconsider it later would place it back at the top of the queue, since its priority remains unchanged.

However, in real-world scenarios, if an element is popped and set aside for later consideration, it probably shouldn't retain its high priority. Instead, we might decrease its priority by 1 or set it to the lowest priority, allowing other elements deeper in the queue a fair chance to be considered.

For example, popping (5, "e") gives:

[(4, "d"), (2, "b"), (1, "a"), (3, "c")]

and pushing it back with its priority lowered to 4 gives:

[(4, "d"), (4, "e"), (2, "b"), (1, "a"), (3, "c")]

@JinZhou5042
Member Author

JinZhou5042 commented Aug 12, 2024

What I'm uncertain about is that using a priority queue instead of a normal list to store ready tasks would eliminate the option of scheduling tasks in a breadth-first manner (implementing FIFO in a priority queue is quite expensive, with a complexity of O(n^2)).

We could let users specify which scheduling algorithm they want to use, but that would require two different data structures to back q->ready_list.

@JinZhou5042
Member Author

I did a quick experiment with the script provided by Colin:

#! /usr/bin/env python

import time
import ndcctools.taskvine as tv

if __name__ == '__main__':
    q = tv.Manager(port=9123)

    # initial task count
    t_count = 5000

    print(f"Submitting {t_count} tasks")
    t_start = time.time()

    for i in range(t_count):
        t = tv.Task(":")   # trivial no-op task
        q.submit(t)

    t_submitted = time.time()
    print(f"Time to submit {t_submitted-t_start}s : {t_count/(t_submitted-t_start)} tasks per second.")

    while not q.empty():
        t = q.wait(5)

    t_end = time.time()

    complete = t_end - t_start

    print(f"Tasks completed in {complete}s : {t_count/(complete)} tasks per second.")

As mentioned by @colinthomas-z80, the reason for introducing list rotation is that, on the manager end, it first scans the task ready list to see if there are any tasks available to be scheduled, and then collects results that are ready on the workers. However, even if no workers are available, it still scans the whole ready list to see if it can schedule another task.

The solution is to only consider a small number of tasks per attempt, say 100: the scan breaks out once it has examined 100 tasks. However, the consequence of looking at only 100 tasks is that tasks deeper in the ready list do not get a fair chance to be scheduled. So list rotation is introduced, wrapping around the entire list as scheduling proceeds.
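A rough sketch of this bounded-scan-plus-rotation idea (not the actual TaskVine code; task_fits_available_worker is a made-up placeholder check):

from collections import deque

MAX_SCAN = 100

def schedule_one(ready_list, task_fits_available_worker):
    # Examine at most MAX_SCAN tasks per attempt; rotate skipped tasks to the
    # back so the next attempt starts where this one left off.
    for _ in range(min(MAX_SCAN, len(ready_list))):
        task = ready_list[0]
        if task_fits_available_worker(task):
            return ready_list.popleft()
        ready_list.rotate(-1)
    return None  # nothing schedulable within this scan window

ready = deque(range(1000))
print(schedule_one(ready, lambda t: t == 42))  # scans 43 tasks, returns 42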

@JinZhou5042
Member Author

I ran on 1 worker with 16 cores under three configurations: list without rotation, list with rotation, and priority queue (every task has the same priority). The priority queue seems to have pretty much the same performance as the rotating list.
[Figure: runtime comparison of the three configurations]

@colinthomas-z80
Contributor

I’m not sure why there is so much variability in the results. Can you see if things change if you run your worker with a finite disk value, such as --disk 100? You may be experiencing the ongoing disk issue.

@JinZhou5042
Member Author

Ah, I set finite values for disk and memory, and the variability issue immediately disappears.

@JinZhou5042
Member Author

The updated results:

[Figure: updated benchmark results with finite disk and memory]

@JinZhou5042
Member Author

Do we want to fully replace the list with a priority queue, or use a heuristic?

@dthain
Member

dthain commented Aug 14, 2024

Jin, your question is too vague to be answered.

If you are proposing to replace the list with a priority queue, then you need to show us the changes needed in taskvine to use the new data structure. And we have to consider carefully whether it meets all of our needs.

I don't know what you mean by "a heuristic". Do you have a specific heuristic in mind?

@JinZhou5042
Member Author

Well, I can quickly propose a temporary PR to show what changes are needed to replace the list with a priority queue after this one is merged.

I don't know if it counts as a heuristic approach, but we could support both data structures and let users decide which one they want to use. By default, they would use the linked list, but if they want to assign different priorities to some tasks, they could tune a parameter like m.tune('priority_task_mode', 1) to switch it on manually.
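For example, usage might look something like this ('priority_task_mode' is the proposed, not-yet-existing knob from this discussion, while Task.set_priority is the existing TaskVine API for per-task priorities):

import ndcctools.taskvine as tv

m = tv.Manager(port=9123)

# Proposed, not-yet-existing tuning knob: switch the ready list from the
# default linked list to a priority queue.
m.tune('priority_task_mode', 1)

for priority, command in [(0, "echo shallow"), (3, "echo deep")]:
    t = tv.Task(command)
    t.set_priority(priority)  # higher-priority tasks are considered first
    m.submit(t)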

@JinZhou5042 marked this pull request as draft August 19, 2024 14:00