How to optimize submission time of workflow(s) #6681

Open
Light1110 opened this issue Jan 2, 2025 · 2 comments
Labels
type/feature request status undecided

Comments

@Light1110

It seems that I cannot reopen the closed issue #6677, so I am opening a new one here. The method suggested in #6677 works; the new profiling result is below. Thanks to @unkcpz for the kind help.

[Image: profiling]

But 3 seconds per submission is still a little slower than I expected. I wonder if there is any way to optimize it further.

@Light1110 added the type/feature request status undecided label Jan 2, 2025
@unkcpz
Member

unkcpz commented Jan 2, 2025

> It seems that I cannot reopen the closed issue #6677

Sorry. Let's continue the discussion here. I changed the title of the original issue #6677 to "Slow workflow submission when using FolderData as input".

After the change, it is clear that the bottleneck is database IO. Sadly, I think there is currently no simple way to optimize it if all the inputs are things you want to store for later provenance.
By design, AiiDA stores most of the information related to a calculation in the storage to ensure provenance and data retrievability for future research. A somewhat slower submission is the trade-off for that.

There is an ongoing effort by @rabbull to bring concurrency to the DB operations (@rabbull @GeigerJ2 correct me if I am wrong), but when multiple workflows are submitted from one script, the processes are created in the DB synchronously, so they have to be created one after another.

Two things you can try:

  • If the workflows are related and their results are used together in a follow-up calculation, you can wrap them in a larger workflow (a WorkChain in AiiDA terms) and run everything as one large workchain that needs only one submit. (One example is scanning the input parameters of the same calculation logic and collecting the results for further data processing.)
  • If you are sure some inputs are not important enough to be kept in the DB for your future data analysis, you can exclude them from being stored by adding non_db to the input port. By default, AiiDA tries to store all input data for full provenance. At the other extreme, you can skip storing input data altogether; you then get a workflow engine without provenance, which is still fine in some cases.

@unkcpz changed the title from "How to optimize submissions of workflow" to "How to optimize submission time of workflow" Jan 2, 2025
@unkcpz changed the title from "How to optimize submission time of workflow" to "How to optimize submission time of workflow(s)" Jan 2, 2025
@Light1110
Author

I see. Since we already use a WorkChain in our workflow, I will try non_db for some unimportant inputs. Thanks for your reply.
