Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vine: assume file created on declaration until specified as output #3975

Conversation

colinthomas-z80
Copy link
Contributor

@colinthomas-z80 colinthomas-z80 commented Nov 7, 2024

Proposed Changes

Assuming a declared file is created until it is specified as an output allows us to do two things:

  • Fail a task if its input file is missing and no source for the input file has been given
  • Use regular files as intermediate data without failing in the same way, since we specify the file as an input and revert it to the pending state

Merge Checklist

The following items must be completed before PRs can be merge.
Check these off to verify you have completed all steps.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update Update the manual to reflect user-visible changes.
  • Type Labels Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM Mark your PR as ready to merge.

@colinthomas-z80 colinthomas-z80 added bug For modifications that fix a flaw in the code. TaskVine labels Nov 7, 2024
@colinthomas-z80
Copy link
Contributor Author

@dthain I am not certain regular files ever worked as intermediate data, and whether or not our implementation would allow it.

My stat check works as anticipated, so it will only send tasks where the regular file inputs are present at the manager. But now we fail one of the tests which is supposed to return "input missing", since the manager does not give up on the task when it finds the file does not exist yet.

If we want to use regular files as intermediate data we need to tolerate these kinds of tasks. But I think we want to warn the user if their inputs are missing rather than hang waiting for more tasks to show up.

@dthain
Copy link
Member

dthain commented Nov 8, 2024

Ah, this is interesting, please stop by today and let's talk about it.
I think there is a distinction to be made between the existence of the file in the filesystem and the state of the file in the DAG.

@colinthomas-z80
Copy link
Contributor Author

Discussion notes:

  • On file declaration assume state "created"
  • when a declared file is added as an output to a task, change to state "pending"

This way we will fail (properly) if we run into a task where the input file is "created" but missing. And we will also hold off on failing if we run into a pending file that will be the output of another task.

@colinthomas-z80 colinthomas-z80 changed the title vine: check created state for regular files before scheduling vine: assume file created on declaration until specified as output Nov 8, 2024
@dthain
Copy link
Member

dthain commented Nov 8, 2024

Looks nice and straightforward. Anything else needed?

@colinthomas-z80
Copy link
Contributor Author

I think it’s good to go

@dthain dthain merged commit e347043 into cooperative-computing-lab:master Nov 8, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug For modifications that fix a flaw in the code. TaskVine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants