
AssertionError thermostat.shape #405

Open
rsdmse opened this issue Jul 12, 2024 · 1 comment

rsdmse commented Jul 12, 2024

I'm reaching out on behalf of a user on our cluster. Halfway through an OTF training run with SGP_Wrapper, the job terminates with this error:

Traceback (most recent call last):
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/20220623_1.3.0/bin/flare-otf", line 8, in <module>
    sys.exit(main())
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/20220623_1.3.0/lib/python3.8/site-packages/flare/scripts/otf_train.py", line 372, in main
    fresh_start_otf(config)
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/20220623_1.3.0/lib/python3.8/site-packages/flare/scripts/otf_train.py", line 339, in fresh_start_otf
    otf.run()
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/20220623_1.3.0/lib/python3.8/site-packages/flare/learners/otf.py", line 433, in run
    self.md_step()  # update positions by Verlet
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/20220623_1.3.0/lib/python3.8/site-packages/flare/learners/otf.py", line 532, in md_step
    self.md.step(tol, self.number_of_steps)
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/20220623_1.3.0/lib/python3.8/site-packages/flare/md/lammps.py", line 289, in step
    self.backup(trj)
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/20220623_1.3.0/lib/python3.8/site-packages/flare/md/lammps.py", line 315, in backup
    assert thermostat.shape[0] == 2 * len(curr_trj) - 2 * n_iters
AssertionError
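The failing check compares the number of rows in `thermostat` with `2 * len(curr_trj) - 2 * n_iters`. A bare `assert` reports no values, so a useful first debugging step could be to see all three quantities at the moment of failure. Below is a minimal sketch of a hypothetical helper (`thermostat_mismatch` is not part of FLARE) that restates only the arithmetic visible in the traceback. It assumes nothing about what `thermostat`, `curr_trj`, or `n_iters` represent internally.

```python
from typing import Optional


def thermostat_mismatch(n_thermostat_rows: int, n_frames: int, n_iters: int) -> Optional[str]:
    """Hypothetical helper restating the check at flare/md/lammps.py, line 315.

    n_thermostat_rows stands for thermostat.shape[0], n_frames for
    len(curr_trj), and n_iters for n_iters. Returns None when the relation
    holds, otherwise a message containing the actual numbers.
    """
    expected = 2 * n_frames - 2 * n_iters
    if n_thermostat_rows == expected:
        return None
    return (
        f"thermostat has {n_thermostat_rows} rows, but 2 * {n_frames} - 2 * {n_iters} "
        f"= {expected} were expected (difference {n_thermostat_rows - expected})"
    )


# 10 frames and 2 iterations require 16 rows.
print(thermostat_mismatch(16, 10, 2))  # None
print(thermostat_mismatch(15, 10, 2))
```

Seeing the actual difference could show whether the mismatch is a small off-by-one (for example, one missing or extra frame) or something larger, which would narrow down where to look.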

The tmp/log_<DATE> file looks normal: it ends with

if '$(c_MaxUnc) > 0.05' then quit
quit
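For context, the echoed `quit` after the `if` line appears to mean that the condition fired: the MD segment stopped because `c_MaxUnc` exceeded 0.05, which is presumably the normal hand-off point to DFT in the OTF loop. A minimal Python sketch of that stopping rule, assuming `c_MaxUnc` holds the maximum uncertainty at each checked step (the function name is hypothetical):

```python
from typing import Optional, Sequence


def first_step_over_threshold(max_unc: Sequence[float], threshold: float = 0.05) -> Optional[int]:
    """Return the index of the first checked step whose maximum uncertainty
    exceeds `threshold` (where `if ... then quit` would fire), or None if
    the run would finish without triggering it."""
    for step, unc in enumerate(max_unc):
        if unc > threshold:  # strict '>' as in '$(c_MaxUnc) > 0.05'
            return step
    return None


print(first_step_over_threshold([0.01, 0.03, 0.07, 0.02]))  # 2
print(first_step_over_threshold([0.01, 0.02]))  # None
```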

What could be causing this issue, and what should we be looking out for? (If you need to see the input files, I'll have to ask the user for permission.)

Also, I have a general question about OTF's alternating MD (LAMMPS) / DFT (VASP) workflow under Slurm. Because the DFT step is the most intensive, the user has to request a large amount of resources, which is excessive for the MD step. For instance, the job we're having problems with contains 100 atoms and is submitted to run on a few hundred cores. Based on what I've read (e.g., in this issue the developer recommended 40 cores for 62k atoms), having too many cores could be problematic. While we are not experiencing hanging, the performance is very poor (17 timesteps/s) for such a small system. Do you have any suggestions for improving the performance of the MD step and the overall efficiency of the OTF workflow?
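To compare the two configurations directly, the atoms-per-core ratio can be computed. The figures of 62,000 atoms on 40 cores come from the recommendation quoted above. The value of 200 cores stands in for "a few hundred" and is an assumption for illustration only.

```python
def atoms_per_core(n_atoms: int, n_cores: int) -> float:
    """Average number of atoms per MPI rank if atoms were split evenly."""
    return n_atoms / n_cores


recommended = atoms_per_core(62_000, 40)  # 1550.0 atoms per core
ours = atoms_per_core(100, 200)           # 0.5 atoms per core (200 cores assumed)
print(recommended, ours, recommended / ours)  # 1550.0 0.5 3100.0
```

With fewer than one atom per core on average, many ranks would own no atoms at all. In that case, communication overhead plausibly dominates the MD step, which would be consistent with the low throughput, although this calculation alone does not prove that it is the cause.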


rsdmse commented Jul 12, 2024

I forgot to mention that we're using Flare 1.3.0 and LAMMPS 23Jun2022. Should we upgrade to the latest versions of Flare and LAMMPS?
