[WIP] Using the robust solver for pyMBAR - avoiding convergence Failu… #735
Conversation
…res.

## Description

In this PR, we propose switching the default solver when pyMBAR > 4.0.0 is installed, so that we get an improved convergence rate at the cost of minimally more time, i.e. fewer errors thrown.

## Todos

- [ ] Implement feature / fix bug
- [ ] Add [tests](https://github.com/choderalab/openmmtools/tree/master/openmmtools/tests)
- [ ] Update [documentation](https://github.com/choderalab/openmmtools/tree/master/docs) as needed
- [ ] Update [changelog](https://github.com/choderalab/openmmtools/blob/master/docs/releasehistory.rst) to summarize changes in behavior, enhancements, and bugfixes implemented in this PR

## Status

- [ ] Ready to go

## Changelog message

```
```
We should use this to compare versions: https://packaging.pypa.io/en/latest/version.html#packaging.version.Version
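A minimal sketch of that suggestion; the helper name and the exact version boundary are assumptions for illustration, not code from this PR:

```python
from packaging.version import Version


def pymbar_supports_robust_solver(version_string: str) -> bool:
    """Return True if this pymbar version can use the 'robust' solver protocol.

    A Version-aware comparison avoids the pitfalls of comparing raw strings
    (e.g. "4.10.0" < "4.9.0" lexicographically, which is wrong).
    """
    # The inclusive >= 4.0.0 boundary is an assumption; adjust it if the
    # project settles on a strict comparison instead.
    return Version(version_string) >= Version("4.0.0")
```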
There are a few options here, from our meeting:
@mikemhenry @RiesBen please add any comments to this thread if I'm missing something. Thanks!
I'm not fully clued in on the discussion, but my suggestion would be, if possible, to keep the kwargs equal between the realtime analysis and the final analysis, i.e. the failures are more likely to happen at low sample counts.
I think we decided against this since, as the simulation goes on, the analysis takes longer, so it would be better to use a fast method for the realtime analysis.
Assuming you're trying to avoid regressing: the "slow" method is the same method as pymbar 3, so you're not actually going "slower"; indeed, the "slow" method isn't actually clearly slower. Going for the "fast" method probably increases your chances of getting errored values on fractional datasets.
The real time analysis currently instantiates its own analyzer. I agree that we are probably jumping the gun, and maybe the "robust" solver is fast enough (maybe even faster than the pymbar 3 implementation we were using). This is something that we should probably double check to keep our sanity. Other than that, I'd suggest that we probably want to adapt the test script in choderalab/pymbar#419 (comment) to make it a test for our test suite.
Maybe to add to this a little bit, what I'm advocating for is closer to:
I'm happy to meet with people to discuss what "in the future" means. For now, "Robust" should be the best option, and should fail in relatively few cases. There are some interesting options for creating synthetic data to improve convergence, but that adds bias.
Force-pushed the …default-to-robust-pyMBAR-solver branch from fa92d11 to 0d733d2.
Just uploading this file here (from choderalab/pymbar#419 (comment)) so I can download the file for a test.
@IAlibay @ijpulidos ready for review!
Couple of comments, otherwise LGTM.
```diff
@@ -2612,6 +2615,42 @@ def test_resume_velocities_from_legacy_storage(self):
         state.velocities.value_in_unit_system(unit.md_unit_system) != 0
     ), "At least some velocity in sampler state from new checkpoint is expected to different from zero."

+@pytest.fixture
+def download_nc_file(tmpdir):
```
Would using something like pooch be better for this kind of thing?
I thought about that but it felt like a lot to add for a single test, if we have more to download then we can add it.
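For reference, a download-and-extract helper along these lines needs only the standard library; the function name and archive layout below are assumptions for illustration, not the PR's actual fixture:

```python
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url: str, dest: Path) -> Path:
    """Download a zip archive from `url` and extract it into `dest`.

    A pytest fixture can wrap this and return the extracted .nc path,
    e.g. `dest / "ala-thr.nc"` (filename assumed here).
    """
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "archive.zip"
    # urlretrieve also accepts file:// URLs, which is handy for offline tests.
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```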
```python
reporter_file = download_nc_file
reporter = MultiStateReporter(reporter_file)
analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=n_iterations)
f_ij, df_ij = analyzer.get_free_energy()
```
I would encourage doing a number regression check here rather than just a pure smoke test.
I thought about that, but we already do that in other tests; happy to add it. What threshold do we want to use to check that it is close?
This shouldn't be sampling anything, so ideally we should always be converging to a very similar value. If it changes, something changed enough to affect all our free energies. I would suggest something like 1e-4 or 1e-5 precision.
I'll be honest, I forgot we were not sampling anything, so I didn't even consider that we will get the same result each time we run (or really close to the same result).
We probably need a docstring here explaining what the test is doing and how the nc file was generated, for future reference.
openmmtools/tests/test_sampling.py (Outdated)
```python
# free energy
assert abs(f_ij[0, -1] - -52.00083148433459) < 1e-5
# error
assert abs(df_ij[0, -1] - 0.21365627649558516) < 1e-5
```
The fun way to do this is pytest.approx if you've not tried it before!
I've used `numpy.isclose` but never `pytest.approx` before.
It's the single-value approximate comparison: super fast, with good error messages. I would encourage you to have a go!
I can't approve @mikemhenry 😅, only comment, but LGTM.
Sounds good, I will wait for @ijpulidos to review it before we merge.
Looking good to me. Thanks! I would only advise adding a docstring to the test so we can know what the test is doing and the NC file provenance.
```diff
@@ -2612,6 +2615,46 @@ def test_resume_velocities_from_legacy_storage(self):
         state.velocities.value_in_unit_system(unit.md_unit_system) != 0
     ), "At least some velocity in sampler state from new checkpoint is expected to different from zero."

+@pytest.fixture
+def download_nc_file(tmpdir):
+    FILE_URL = "https://github.com/user-attachments/files/17156868/ala-thr.zip"
```
I believe we cannot rely on this long term, since as far as I know the structure of these URLs can change without notice. Is there another way we can have a long-lived URL? Not a big issue but probably something to keep in mind.
```
See https://github.com/choderalab/pymbar/issues/419#issuecomment-1718386779 for more
information on how the file was generated.
```
Thank you for adding the comments, this is now much clearer. Great work.
Oh, I forgot to suggest a changelog entry. This is what I am thinking:

New Behavior
I am going to merge this in, but feel free to comment on the changelog entry!