ARC's troubleshooting: Orca's mdci error #769

Open
NellyMitnik opened this issue Nov 9, 2024 · 1 comment

@NellyMitnik
Contributor

Describe the bug
For documentation purposes:
While running an opt job for the OH species in Orca, as part of the project to find the frequency scaling factor, I encountered the following Orca error:


[file orca_mdci/mdci_state.cpp, line 1165, Process 2]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 3]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 4]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 5]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 6]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 7]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 8]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 9]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 10]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 11]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 12]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 13]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 14]: . . . aborting the run

[file orca_mdci/mdci_state.cpp, line 1165, Process 15]: . . . aborting the run

Error (ORCA_MDCI): Number of processes (16) in parallel calculation exceeds number of pairs (13)
[file orca_mdci/mdci_state.cpp, line 1165, Process 0]: . . . aborting the run


ORCA finished by error termination in MDCI
Calling Command: mpirun -np 16  /usr/local/orca-6.0.0/orca_mdci_mpi input.mdciinp.tmp input 
Check for MDCI-logfiles
[file orca_main/main_driver_opt1.cpp, line 1805]: ORCA finished with an error in the energy calculation

ARC's troubleshooting tries to rerun the job with the number of CPU cores set to the number of "pairs" reported by Orca, in this case 13.

ARC Troubleshooting Orca MDCI error: ncpus

The job with 13 CPU cores failed with the same error, so ARC enters an endless loop of failed jobs.

I manually changed the number of CPU cores to 10 in both the input file and the submit script. This worked, and Orca converged successfully.

Suggestion: ARC's troubleshooting could use a slightly lower number of CPU cores than the number of pairs reported in Orca's error.
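
A minimal sketch of what this suggestion could look like, not ARC's actual implementation. The function names and the `margin` parameter are hypothetical. The default margin of 3 is an assumption, chosen only because it reproduces the 10 cores that worked here; the only data points in this report are that 13 failed and 10 succeeded.

```python
import re
from typing import Optional, Tuple

# Matches the Orca MDCI message quoted in this issue.
MDCI_PAIRS_PATTERN = re.compile(
    r"Number of processes \((\d+)\) in parallel calculation "
    r"exceeds number of pairs \((\d+)\)"
)


def parse_mdci_pairs_error(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (n_processes, n_pairs) if the MDCI pairs error is present, else None."""
    match = MDCI_PAIRS_PATTERN.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def suggest_cpu_cores(n_pairs: int, margin: int = 3) -> int:
    """Hypothetical heuristic: stay `margin` cores below the reported pair count.

    The margin is an assumption, not a value documented by Orca.
    The result is never lower than 1.
    """
    return max(1, n_pairs - margin)
```

For the error above, `parse_mdci_pairs_error` would give 16 processes and 13 pairs, and `suggest_cpu_cores(13)` would give 10 instead of 13.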

@kfir4444
Collaborator

Quick follow-up: an edge case occurs when running [H]:

"Error (ORCA_MDCI): Number of processes (20) in parallel calculation exceeds number of pairs (0)".
Troubleshooting sp job in orca for H using 0 cpu cores.

which is obviously nonsense.
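
One possible guard against both the 0-core request and the endless retry loop, sketched with a hypothetical helper (`next_cpu_cores` is not part of ARC). It assumes that falling back to a single core is acceptable when Orca reports 0 pairs, which has not been verified here, and that troubleshooting should stop once no untried core count remains.

```python
from typing import Iterable, Optional


def next_cpu_cores(n_pairs: int, tried: Iterable[int]) -> Optional[int]:
    """Pick a CPU core count for the next attempt, or None to stop retrying.

    Hypothetical helper, not ARC's actual API.
    """
    tried = set(tried)
    # Start strictly below the reported pair count, but never below 1 core.
    candidate = max(1, n_pairs - 1)
    # Step down past any core count that has already failed.
    while candidate in tried and candidate > 1:
        candidate -= 1
    # If even a single core was already tried, give up instead of looping.
    return None if candidate in tried else candidate
```

With the [H] case above, `next_cpu_cores(0, [20])` requests 1 core rather than 0, and a second failure at 1 core ends the troubleshooting instead of resubmitting the same job.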
