Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling failures in elastic flow #861

Merged
merged 3 commits into from
May 29, 2024
Merged

Conversation

gpetretto
Copy link
Contributor

Following what already mentioned in #817, I have implemented a few changes in the BaseElasticFlow.

  • It is now possible to specify a max_failed_deformations in the Flow, that will not complete the fit job if the number of failed deformation jobs exceeds that number. The default is no limit for backward compatibility.
  • The uuids of the failed perturbations are included in the ElasticDocument.
  • A list of warnings has also been added to the ElasticDocument. If the list of failed uuids is not empty, a warning is added there.

@@ -54,3 +57,137 @@ def _get_strains(structure, sym_reduce):
deformations = response[job.uuid][1].output

return [d.green_lagrange_strain for d in deformations]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to add a test for the whole flow or at least nake sure that the parameter won't be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a test for the whole workflow is a bit problematic because it is difficult to consistently trigger the failure of one of the pertrubations. I can manually generate a test by explicitly killing vasp for one of the perturbations, but if the tests will need to be regenerated in the future it may be annoying.
It would be easier if I could modify the INCAR of only one of the perturbations, but it seems to not be possible because update_kwargs is not applied to dynamically generated jobs (see materialsproject/jobflow#588).
I thought these tests will be enough, since all the logic stays in the fit job, but if you think it is needed I can try to generate the workflow test by killing the job. Or maybe there is a better way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. This sounds like too much work.

I think the fitting part is well tested. I was rather thinking about testing that a parameter change really arrives in the fit function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was part of the output schema, it would be easy. 😅

Copy link
Member

@JaGeo JaGeo May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only missing point. As this is a rather small but still important pull request, I would in this case merge it.
Does it make sense to add the max_* to the output schema and just test if it arrives there if you set it in the wf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I had partially misunderstood your previous comment. I could add max_failed_deformations to the output document, but I am not sure if it is really worth to have that information just to make the test. I will try if I can quickly make a test by explicitly killing one of the deformation jobs, if it gets too complicated I will add max_failed_deformations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to add a test with the flow failure triggered by the max_failed_deformations being set to 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Thank you!

@JaGeo
Copy link
Member

JaGeo commented May 28, 2024

@gpetretto Thank you!

I left a comment on the overall flow. (See Update below)

@JaGeo
Copy link
Member

JaGeo commented May 28, 2024

My initial review comment was unclear (sorry about this, i should not write on my phone): what I meant to say was that I would also recommend a test at the flow level. Of course, a part of the functionality is also already tested with the default values but not everything.

@JaGeo
Copy link
Member

JaGeo commented May 29, 2024

@gpetretto Thank you! I will merge.

@utf I am merging this now as this is a rather small change!

@JaGeo JaGeo merged commit a5e502f into materialsproject:main May 29, 2024
6 checks passed
hrushikesh-s pushed a commit to hrushikesh-s/atomate2 that referenced this pull request Jun 28, 2024
* limit maximum failures in elastic flow

* add property to Elastic flow subclasses

* Add flow test for failiure in elastic flow
BryantLi-BLI pushed a commit to BryantLi-BLI/atomate2 that referenced this pull request Jul 29, 2024
* limit maximum failures in elastic flow

* add property to Elastic flow subclasses

* Add flow test for failiure in elastic flow
@utf utf added the enhancement Improvements to existing features label Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvements to existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants