Restart leads to Missing steps in output.log error in diagnostics.py #90

Open
jakobcasa opened this issue Dec 4, 2023 · 10 comments

@jakobcasa

Dear all

I have run an MD simulation (dt = 0.5 fs, 400 steps). After about one third of the run (at roughly 66 fs), the time limit of the cluster was reached, so I had to restart the MD. After the 400 steps were completed, I ran diagnostics.py, and for every trajectory it gave me the same output: the missing steps are at the time when I restarted the MD. I would like to send you the data produced, but the files are too big. Is there another way to send you the data?

Is there a way to avoid rerunning the complete trajectory? And can diagnostics.py be run while the trajectory is still running, to avoid losing time?

Thank you

Best Jakob

The output from diagnostics.py of the traj, which I would like to send you, is:


    Output files:     .lis .. .log .. .dat .. .xyz .. OK
    Restart files:    ctrl .. traj .. restart/ ..     OK
    Progress:         [=========================]     200.0 of 200.0 fs
    Status:                                           FINISHED
    Data extractor...                                 OK
    Energy:                                           OK
    Population:                                       OK
    Intruder states:  Missing steps in output.log     at 67.50 fs
@maisebastian
Collaborator

Dear Jakob,
when restarting a trajectory, you have to give either of the two keywords:
restart_rerun_last_qm_step or restart_goto_new_qm_step
Which one did you use? The correct choice for restarting an interrupted trajectory is restart_rerun_last_qm_step.

The error about missing steps mentions the output.log file. You should inspect (or just send) this file to see what happened.

Best,
Sebastian

@jakobcasa
Author

Dear Sebastian

I did use the correct keyword (restart_rerun_last_qm_step). I've attached the output.log file for you to look over.

Thank you

Best Jakob

output.log

@maisebastian
Collaborator

Dear Jakob,
it seems that diagnostics.py is confused by the fact that the time step at 67.5 fs appears twice in the log file. Can you please check whether the output.lis and output.dat files also contain this time step twice?

If these two files have this time step only once, in principle everything is ok and the only problem is a bug in diagnostics.py. You will not need to relaunch the trajectories. As a quick solution, you could remove/modify the lines in output.log to avoid confusing diagnostics.py.
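
The check for a repeated time step can be scripted. The following is only a rough Python sketch, not part of SHARC: it assumes a column-based text file (such as output.lis) in which comment lines start with `#` and the time in fs is in the second whitespace-separated column. Both assumptions should be checked against the header of your own file, and the function names are made up for this example.

```python
from collections import Counter


def repeated_times(lines, time_column=1, comment_prefix="#"):
    """Return the time values that occur more than once, in order of first appearance.

    Assumption: each data line is whitespace-separated and holds the time
    in column `time_column` (0-based); lines starting with `comment_prefix`
    are skipped. Adjust both to the actual file layout.
    """
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_prefix):
            continue
        fields = stripped.split()
        if len(fields) <= time_column:
            continue
        try:
            # Round to avoid spurious mismatches from float formatting.
            time = round(float(fields[time_column]), 6)
        except ValueError:
            continue  # not a data line
        counts[time] += 1
    return [t for t, n in counts.items() if n > 1]


def repeated_times_in_file(path, **kwargs):
    """Convenience wrapper that reads the lines from a file on disk."""
    with open(path) as handle:
        return repeated_times(handle, **kwargs)
```

Calling `repeated_times_in_file("output.lis")` in a trajectory folder would then return an empty list if every time step occurs only once.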

Yes, diagnostics.py can be run while a trajectory is currently propagating. Instead of saying "FINISHED" it should report "RUNNING" (or maybe "STUCK" if it thinks that the most recent time step takes unexpectedly long).

Best,
Sebastian

@lukhman9020

Dear Jakob,
Could you please explain how you restarted the trajectory and where you put the keyword "restart_rerun_last_qm_step"?

Thank you
Lukhmanul hakeem k

@jakobcasa
Author

Dear Lukhmanul

Enclosed is the input file I used in those runs; all trajectories have a similar input.

If you need something else, please let me know.

Best Jakob

input.txt

@lukhman9020

Thank you sir.

@maisebastian
Collaborator

Dear Lukhmanul,
the restart keywords are described right at the beginning of Section 4.1.3 of the SHARC manual (https://sharc-md.org/?page_id=50#tth_sEc4.1.3). These keywords go into the input file. Make sure that the restart.* files and the restart/ folder are there.
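
As a rough illustration of these preconditions, here is a small Python sketch that checks a trajectory folder before a restart. It is not a SHARC tool. The function names, the default input file name `input`, and the assumption that each keyword stands as the first word on its own line are all made up for this example; only the two keyword names, the `restart.*` pattern, and the `restart/` folder come from this thread.

```python
from pathlib import Path

# The two restart-mode keywords named earlier in this thread.
RESTART_MODES = ("restart_rerun_last_qm_step", "restart_goto_new_qm_step")


def restart_modes_in_input(text):
    """Return the restart-mode keywords found as the first word of any line."""
    found = []
    for line in text.splitlines():
        words = line.split()
        if words and words[0] in RESTART_MODES:
            found.append(words[0])
    return found


def check_restart_setup(traj_dir, input_name="input"):
    """Return a list of problems; an empty list means all checks passed.

    Assumption: `input_name` is the name of the input file inside
    `traj_dir` (hypothetical default, adjust to your setup).
    """
    traj_dir = Path(traj_dir)
    problems = []
    if not any(traj_dir.glob("restart.*")):
        problems.append("no restart.* files found")
    if not (traj_dir / "restart").is_dir():
        problems.append("restart/ folder is missing")
    input_file = traj_dir / input_name
    if not input_file.is_file():
        problems.append(f"input file {input_name!r} not found")
    else:
        modes = restart_modes_in_input(input_file.read_text())
        if len(modes) != 1:
            problems.append(f"expected exactly one restart keyword, found {modes}")
    return problems
```

Running `check_restart_setup(".")` inside a trajectory folder prints nothing by itself; it returns the list of problems, which can be inspected or printed before resubmitting the job.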
Best,
Sebastian

@lukhman9020

Thank you sir.

@jakobcasa
Author

Dear Sebastian, dear Lukhmanul

Is there an ordering of keywords with which this error does not occur? If so, could you let me know where I should put the restart and "restart_rerun_last_qm_step" keywords instead?

Thank you

Best Jakob

@maisebastian
Collaborator

Dear Jakob,
the input file in SHARC is completely agnostic of the order of the keywords. Internally, SHARC has its own order of the keywords and searches the entire file for whatever keyword it wants to know next.
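
To illustrate why the order cannot matter with this kind of lookup, here is a minimal Python sketch. It is not SHARC's actual parser; the function name `lookup` and the placeholder keywords `keyword_a` and `keyword_b` are invented for the example. The program asks for one keyword at a time and scans the whole file for it, so two inputs with the same lines in a different order give the same answers.

```python
def lookup(input_text, keyword):
    """Scan every line and return the arguments of the first line whose
    first word equals `keyword`, or None if the keyword is absent."""
    for line in input_text.splitlines():
        words = line.split()
        if words and words[0] == keyword:
            return words[1:]
    return None


# Two hypothetical input files with the same lines in a different order.
input_1 = "keyword_a 1\nrestart_rerun_last_qm_step\nkeyword_b x y\n"
input_2 = "restart_rerun_last_qm_step\nkeyword_b x y\nkeyword_a 1\n"

# The program decides the order in which it asks, not the file.
for key in ("keyword_b", "restart_rerun_last_qm_step", "keyword_a"):
    assert lookup(input_1, key) == lookup(input_2, key)
```

A keyword without arguments, such as the restart keyword here, yields an empty list, while a keyword that does not occur in the file yields `None`.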

I have now found the time to fix your problem. It was caused by a small bug in diagnostics.py, which was mixing up the handling of the two restart cases. Please check diagnostics.py in the main branch (https://github.com/sharc-md/sharc/blob/main/bin/diagnostics.py). Note that the bug fix is not included in the latest release. Please let me know whether this fixes your problem.

Best,
Sebastian
