-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: modernise HBA to use AG as objects internally instead of selection strings - code only PR (Issue #3933) #4533
base: develop
Are you sure you want to change the base?
Conversation
Hello @lunamorrow! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-04-26 01:03:36 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS
as part of this PR.
Linter Bot Results:Hi @lunamorrow! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The |
Just leaving some comments here for whoever is assigned to check this PR:
|
Update: That big block of code that I commented out was necessary. Turns out that its removal was causing 2 failures. I am adding it back in now, and attempting to tackle those 2 assertion errors. Question: I have added a commit to my branch, and I suspect I can then push updated code into this PR (i.e., as I fix things that were erroring and adjust/add tests)? Is this correct? |
@yuxuanzhuang can you please look after this PR or assign to someone else, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the in-line review that will fix some of your failed tests. Additionally, please fix other failed tests inside the test_hydrogenbonds_analysis.py
, mostly related to the switching output of guess_hydrogens
# self.acceptors_ag = self.guess_acceptors() #NOTE: do I need this? | ||
# self.hydrogens_ag = self.guess_hydrogens() | ||
# self.donors_ag = self.guess_donors() | ||
|
||
# Set atom selections if they have not been provided | ||
if self.acceptors_sel is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to make sure acceptors_ag, hydrogens_ag, etc. always exist.
else:
self.acceptors_ag = self.u.select_atoms(self.acceptors_sel)
|
||
# Select atom groups | ||
self._acceptors = self.u.select_atoms(self.acceptors_sel, | ||
updating=self.update_selections) | ||
self._acceptors = self.guess_acceptors() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already have self.acceptors_ag
above so you don't need to guess it again. Simply parse self._acceptors = self.acceptors_ag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops I missed this one, thank you for picking it up :)
I also don't have experience handling depreciation before, so I might need help from @hmacdope :) |
Yes, that's correct! After you've added a commit to your branch, you can push the updated code to the same branch that is associated with this PR. Any new commits pushed to the branch will automatically appear in the PR (as you might already saw). |
Thanks @yuxuanzhuang I am working through your comments now and will aim to push the updated branch shortly! Quick question about this comment:
I need to avoid using Also, do I need to worry about the "This Universe does not contain charge information" errors? I do not think they have anything to do with my changes? I am worried they could be "hiding" other possible errors further down that aren't being reached before this error is thrown though... |
Regarding depreciation, current provided code examples will break e.g. u = MDAnalysis.Universe(PSF, DCD)
hbonds = HBA(universe=u)
hbonds.hydrogens_sel = hbonds.guess_hydrogens("protein") # this will not be selection string.
hbonds.acceptors_sel = hbonds.guess_acceptors("protein")
hbonds.run() Should we consider accepting |
The logic behind |
Of course, all good! If you can just help me get this PR sorted and ready to go I'll be so grateful! The depreciation can be sorted later :)
Oh ok yes, I understand what you mean now! Yeah I have been a bit unsure on how to tackle this in the "best" way. The
|
Yes I agree. I am thinking a good work-around would be for users to define |
@yuxuanzhuang I have made some changes, as I proposed above, which I will push shortly. I have pretty much eliminated the errors/failures bar these two.
I suspect the charge one is due to your comment? Is there anything I can do to fix this?
I cannot find a call to |
I think there may be differences in the results when using import MDAnalysis
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA
u = MDAnalysis.Universe(PSF, DCD)
hbonds = HBA(universe=u)
hbonds.run()
print(hbonds.results.hbonds.shape)
> (10858, 6) # old results
> (4237, 6) # new results
hbonds._donors.n_atoms
> 375 # old results
> 74 # new results will end up with a different result from the current implementation. To facilitate testing, you can copy old hbond_analysis.py as hbond_analysis_old.py and I think this is currently not captured by the tests. Apologies that I don't really have enough time during the Easter to make thorough suggestions on what should be done about it right now. However, I would suggest to make sure you
|
Thanks @yuxuanzhuang, and I completely understand. I appreciate you taking time out of your Easter weekend to give me a hand. Unfortunately I am really struggling with this and I think I might need to take it back to the start and re-implement changes again from scratch. I have spent a good few hours reading over and tweaking code (including adjusting for your suggested changes 1 and 2), and it's still not working nicely with the tests. I have a number of tests that are getting empty AtomGroups, when the selection provided should return something, which is causing a plethora of problems. I'm getting a bit worried about finalising this fix before GSoC applications close, but I will keep this PR updated with any more questions as I go. |
Don't stress about the timeline, there is no requirement that your PR is merged before the application deadline, only that we have had a chance to work with you and go back and forth a bit. :) |
We can move this to a seperate PR when the time comes. :) |
Ok great, that is such a relief to hear @hmacdope. I have been panicking a bit! I'll focus on my written application and keep chugging away at this fix while I wait to hear if I get accepted :)
Perfect, I'll touch base with you before starting that to get some more clear direction! |
Much preferred to setting them by hand. |
Side note while testing: slicing UpdatingAtomGroup will result in a static AtomGroup. -> need a workaround to return an u.select_atoms('protein', updating=True).__class__
> MDAnalysis.core.groups.UpdatingAtomGroup
u.select_atoms('protein', updating=True)[2:].__class__
> MDAnalysis.core.groups.AtomGroup |
If you write this as as selection instead of slicing, can you maintain the updating group donors_ag.select_atoms(f"prop charge < {max_charge}", updating=True) |
This is great to know, thanks @orbeckst and @yuxuanzhuang! I'll make that change to Just one question: will the use of |
Hi @yuxuanzhuang @orbeckst @hmacdope. I'm jumping back on this PR as I have just finished off a bunch of university assessment. A couple weeks away from it was really beneficial for me to gain a fresh perspective and make fixes/changes relatively quickly. I have just committed updated versions of |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4533 +/- ##
===========================================
- Coverage 93.64% 93.23% -0.42%
===========================================
Files 168 12 -156
Lines 21254 1079 -20175
Branches 3918 0 -3918
===========================================
- Hits 19904 1006 -18898
+ Misses 892 73 -819
+ Partials 458 0 -458 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need a bit more time to test on the new implementation, but I want to highlight an issue with the code examples in the current documentation. Currently, one can use:
hbonds = HydrogenBondAnalysis(universe=u)
hbonds.hydrogens_sel = hbonds.guess_hydrogens("resname ARG HIS LYS")
hbonds.acceptors_sel = hbonds.guess_acceptors("resname ARG HIS LYS")
However, this will fail with the current implementation because hbonds.acceptors_sel
requires a string, whereas guess_acceptors
returns an AtomGroup.
While it's acceptable to have a breaking change in the API, it would be beneficial to ensure that most of the existing code remains functional.
@@ -720,6 +741,8 @@ def _single_frame(self): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it somewhat inelegant that _get_dh_pairs
, a frame-wise operation, performs atom selection to create new donors and hydrogens objects. Ideally, given that updating=self.update_selections
applies to both hydrogens and donors, this should not be necessary.
Thanks @yuxuanzhuang and @orbeckst.
I understand what you mean. Just to make sure we are on the same page, I believe once these changes are implemented that I should be doing a new PR per @hmacdope's guidance that will issue a depreciation warning for some of these features and how the Class is currently used. If I ensure that the API/docstring examples are updated appropriately will that be ok? I cannot see a clean way to implement this change without making all the "guess" methods return AtomGroups instead of selection string. There are some other examples which will lose functionality. Like being able to guess hydrogens or acceptors from a certain group, like a protein (see below). Since my implementation only uses import MDAnalysis
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA
u = MDAnalysis.Universe(psf, trajectory)
hbonds = HBA(universe=u)
hbonds.hydrogens_sel = hbonds.guess_hydrogens("protein")
hbonds.acceptors_sel = hbonds.guess_acceptors("protein")
hbonds.run() I have gone through the rest of your comments and change requests, and made those changes to the code. I will clean up the comments and docstrings now, then push a new version for your review. Thank you :) |
Just a heads up, it's often necessary to rebase your code and resolve any conflicts with the latest develop branch. Here’s a helpful guide on how to do that: Rebasing Your Code. Last time, I performed the rebase directly on GitHub, which might have caused some issues for you (apologies!) You could |
Thanks @yuxuanzhuang. I realised I had forgotten to rebase before pushing the previous code, when I saw you had done the rebase for me. I made sure to do so for the most recent one. That's ok, I was a bit crazy on my end for the most recent push but I think I have it all sorted now! I'll just need to push a new branch to this PR somehow for further changes. |
Ok finally rebased and fixed up now. Everything should be sorted :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the inline comment and the code below that gets different results after the fix.
import MDAnalysis
from MDAnalysisTests.datafiles import PSF, DCD
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis_old import HydrogenBondAnalysis as HBA_old
u = MDAnalysis.Universe(PSF, DCD)
hbonds_old = HBA_old(universe=u,
hydrogens_sel='around 10 resid 5 and name H*',
acceptors_sel='resid 5 and name O',
update_selections=True)
hbonds_old.run()
print(f'old results: {hbonds_old.results.hbonds.shape}')
hbonds = HBA(universe=u,
hydrogens_sel='around 10 resid 5 and name H*',
acceptors_sel='resid 5 and name O',
update_selections=True)
hbonds.run()
print(f'new results: {hbonds.results.hbonds.shape}')
> old results: (21, 6)
> new results: (6, 6)
Given all of your tests pass, it might be worth adding it to your test as well and making sure it also passes.
I am actually not a daily user of HBA (for a reason!) and I don't really know how people normally use it, so feel free to pitch in! @MDAnalysis/coredevs
It's worth noting that it runs so so so much faster!!!
hbonds = HBA(universe=u)
%timeit -r 2 -n 5 hbonds.run()
> old 2.85 s ± 8.15 ms per loop (mean ± std. dev. of 2 runs, 5 loops each)
> new 168 ms ± 6.62 ms per loop (mean ± std. dev. of 2 runs, 5 loops each)
self._donors, self._hydrogens = self._get_dh_pairs() | ||
|
||
def _single_frame(self): | ||
|
||
box = self._ts.dimensions | ||
|
||
# Update donor-hydrogen pairs if necessary | ||
if self.update_selections: | ||
self._donors, self._hydrogens = self._get_dh_pairs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still needed for _single_frame
because self._donors
will be updated every frame with the change of self._hydrogens
. You need to separate the part that creates self._hydrogens
from it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lunamorrow The reason you get different results is that in the old code, self._donors
is updated every frame based on self._hydrogens
. However, if you don't execute _get_dh_pairs
every frame, self._donors
will become a non-updating AtomGroup based on the first frame.
Thanks for your quick feedback @yuxuanzhuang! :)
That is really weird as far as the results differing between the old and updated HBA. I will have a dig around on my end this week and fix it up.
Will do!
Wow that is huge, and will have a really big impact for users performing HBA on large systems! That is so amazing to see the impact that this one change has had. I'll wait to hear back from the core devs as far as the loss of some functionality. |
You're asking for other @MDAnalysis/coredevs to jump in and comment "as far as the loss of some functionality.". Can you please summarize the changes so we don't have to read everything in the PR?
Thanks. |
I'm seeking insights on typical uses of HBA prior to and following the changes introduced by this PR. Below are some user scenarios categorized into two main aspects: 1) selections made for analysis and 2) information available in the Universe (omitted for now) Selection
I think it's an important user case and needs workaround, e.g. allow
Currently, this results in divergent results .
sidenote: they are non-updating atom groups Information contained in the Universe (omitted for now)
Let me know if there are additional scenarios or details that might be relevant. |
Just a very brief 2 cent here - the current functionality cannot change or stop working until the 3.0 release. The approach we have usually taken is to temporarily support both input types (and make sure they give the same answer). |
I agree. The plan I made with @hmacdope is to roll these changes out with 3.0 and create a second PR after this one is done to issue a depreciation warning. Would the support of both types be to integrate now, and then remove this support with 3.0? I can make this happen if this is the route we want to take @orbeckst @yuxuanzhuang @IAlibay.
As @yuxuanzhuang said, the main issue is that @MDAnalysis/coredevs would you prefer this work around to include enabling |
I'm considering to allow |
Ok easy, I can get started on that then. I have a crazy few days coming up, but I'll aim to push a revision with this functionality added over the weekend :) |
just chiming in that accepting either AG or string as suggested here: #4533 (comment) is a good idea; strings are a nice way to start, but if you've got a really odd system or want to do quite specific analysis then being able to just hand the analysis an AG to work on directly is heaps easier than finding the magic string that recreates this AG |
Fixes #3933
Changes made in this Pull Request:
guess_acceptors
,guess_hydrogens
,guess_donators
,group_categories
andget_dh_pairs
to remove reliance of HydrogenBondAnalysis on selection strings being passed internallyPR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4533.org.readthedocs.build/en/4533/