SiestaBaseWorkChain, handle "too many nodes" error #86

Open
pfebrer opened this issue Nov 20, 2020 · 3 comments

@pfebrer
Collaborator

pfebrer commented Nov 20, 2020

This one is easy to solve, so I guess it could be incorporated into SiestaBaseWorkChain's error handling.

The exact error in SIESTA is:

Sparse pattern is oversubscribed with nodes, please reduce number of nodes.
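
Since the message asks for fewer nodes, the recovery step could be sketched roughly as below. This is only an illustration, not the plugin's implementation: `reduce_resources` is a hypothetical helper, halving is an arbitrary choice, and the sketch assumes the job resources are given as an AiiDA-style dictionary with `num_machines` and `num_mpiprocs_per_machine` keys.

```python
def reduce_resources(resources):
    """Return a copy of the resources with fewer MPI processes.

    Hypothetical helper: halves the processes per machine first, then the
    number of machines. Returns None when nothing can be reduced further.
    """
    new = dict(resources)  # do not modify the caller's dictionary
    machines = new.get("num_machines", 1)
    procs = new.get("num_mpiprocs_per_machine", 1)

    if procs > 1:
        new["num_mpiprocs_per_machine"] = procs // 2
    elif machines > 1:
        new["num_machines"] = machines // 2
    else:
        # Already at one process on one machine: no smaller setup exists.
        return None
    return new
```

A handler in the workchain could call something like this on the failed calculation's resources and resubmit, giving up when `None` is returned.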

I would have done it myself, but after looking at the code I'm not sure how to add a new error to SiestaCalculation. How does a CalcJob know which error has happened?

@pfebrer
Collaborator Author

pfebrer commented Nov 20, 2020

OK, I have now found that this is done in the parser, which looks at the MESSAGES file. Unfortunately, this error does not write anything to MESSAGES.

@bosonie
Member

bosonie commented Nov 21, 2020

It's OK to use the output file (see for instance here). Usually something is written in MESSAGES, but even if nothing is there, the absence of "INFO: Job completed" signals that an error occurred. Anyway, I can implement the logic if you want, but you need to describe exactly what happens, and we should also check all versions of Siesta supported by the plugin (Siesta 4.0, 4.1, MaX). Do they all behave the same way for this error?
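
The detection logic described above could look roughly like the following. This is a minimal sketch, not the plugin's parser: `classify_run` and its return labels are hypothetical, and it assumes that the oversubscription message appears in the main output file and that the "INFO: Job completed" marker is looked for in MESSAGES. Whether all supported Siesta versions print the same text still has to be checked.

```python
# Text taken from the error quoted in this issue.
OVERSUBSCRIBED_MSG = "Sparse pattern is oversubscribed with nodes"
# Marker whose absence signals a failed run (assumed to live in MESSAGES).
JOB_COMPLETED_MSG = "INFO: Job completed"


def classify_run(output_text, messages_text=""):
    """Return a hypothetical status label for a finished Siesta run."""
    # The specific error is only visible in the output file.
    if OVERSUBSCRIBED_MSG in output_text:
        return "too_many_nodes"
    # No completion marker: something failed, but we cannot say what.
    if JOB_COMPLETED_MSG not in messages_text:
        return "unknown_failure"
    return "ok"
```

Checking the specific message first means a run that stops with this error is not reported as a generic failure just because MESSAGES is empty.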

@pfebrer
Collaborator Author

pfebrer commented Nov 21, 2020

I'm not sure if all versions show the error in the same way.

It is very easy to reproduce the error, so you can check it for yourself: submit a calculation of a single atom on a cluster with a "normal" number of processors. I don't know the limit, but if you submit it on hpcq-farm5 with 24 cores you will see it :)

I don't have compilations of the three versions. If you have them, maybe you can use the Iterator 😅
