bond environments: bond-atom neighbouring #240
base: system-with-data
Thank you for the PR! These changes are quite big, so could you make one PR with only the first commit, so we can discuss and improve on it, and then later add the other commits in separate PRs? I see a lot of changes to the C API, which I would prefer to keep minimal, since any additional function there imposes requirements on new system implementations. Could you explain what the |
sure! I'm not sure what alternatives there would be to using this object explicitly. If I were to create a new system with "ghost atoms" to serve as bond centers, I could represent the old system's triplets with the new system's pairs? But a lot of the pairs would end up being bond/bond or atom/atom, and would go to waste. Unless I deliberately break I guess one way to bypass changing the system trait would be to have |
Inside
    // first pass: the pairs within bond_cutoff are the bonds
    system.compute_neighbors(bond_cutoff);
    let bonds = system.pairs();
    // atoms_cutoff needs to be a bit bigger than the one in the current
    // implementation to be sure we get the same set of neighbors.
    system.compute_neighbors(atoms_cutoff);
    let mut ba_triplets = vec![];
    for bond in bonds {
        let mid_point = positions[bond.first] + bond.vector / 2.0;
        // neighbors of the first atom of the bond
        for pair in system.pairs_containing(bond.first) {
            if distance_from_mid_point_is_below_cutoff {
                ba_triplets.push((bond, other_atom_in_the_pair));
            }
        }
        // neighbors of the second atom of the bond
        for pair in system.pairs_containing(bond.second) {
            if distance_from_mid_point_is_below_cutoff {
                ba_triplets.push((bond, other_atom_in_the_pair));
            }
        }
    }
This way, no changes to the system are required, and you can still get the data you need.
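A minimal, self-contained sketch of the two-pair-list idea above, for illustration only. The `Pair` struct and the `bond_atom_triplets` function are hypothetical names, not part of the rascaline API. The sketch assumes each pair stores the vector from `first` to `second`, so the distance to the bond midpoint can be computed from pair vectors alone, and it removes duplicates when the same atom is reached from both ends of a bond.

```rust
/// Hypothetical stand-in for a neighbor-list pair: `vector` points from
/// `first` to `second` (assumption, not the actual rascaline type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Pair {
    pub first: usize,
    pub second: usize,
    pub vector: [f64; 3],
}

/// Build (bond index, third atom) triplets from a list of bonds and a list
/// of atom pairs computed with a larger cutoff. An atom is kept when its
/// distance to the bond midpoint is strictly below `cutoff`.
pub fn bond_atom_triplets(
    bonds: &[Pair],
    atom_pairs: &[Pair],
    cutoff: f64,
) -> Vec<(usize, usize)> {
    let mut triplets = Vec::new();
    for (bond_index, bond) in bonds.iter().enumerate() {
        let half = [
            bond.vector[0] / 2.0,
            bond.vector[1] / 2.0,
            bond.vector[2] / 2.0,
        ];
        for pair in atom_pairs {
            // Find the third atom, and the signs needed to express its
            // position relative to the bond midpoint:
            //   from bond.first:  +/- pair.vector - half
            //   from bond.second: +/- pair.vector + half
            let (other, pair_sign, half_sign) = if pair.first == bond.first {
                (pair.second, 1.0, -1.0)
            } else if pair.second == bond.first {
                (pair.first, -1.0, -1.0)
            } else if pair.first == bond.second {
                (pair.second, 1.0, 1.0)
            } else if pair.second == bond.second {
                (pair.first, -1.0, 1.0)
            } else {
                continue;
            };
            // Skip the bond itself.
            if other == bond.first || other == bond.second {
                continue;
            }
            let distance2: f64 = (0..3)
                .map(|k| {
                    let d = pair_sign * pair.vector[k] + half_sign * half[k];
                    d * d
                })
                .sum();
            if distance2 < cutoff * cutoff {
                triplets.push((bond_index, other));
            }
        }
    }
    // The same atom can be found from both ends of the bond.
    triplets.sort_unstable();
    triplets.dedup();
    triplets
}
```

This sketch scans the whole pair list for every bond; a real implementation would more likely use something like the `pairs_containing` lookup from the comment above. It also ignores the case where the third atom is a periodic image of one of the bond atoms.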
This is not possible, because the or even a Python implementation, where This is also why I am reluctant to add new functions to the System API, since that means implementing this function everywhere, and while there is a neighbor list that can be adapted relatively easily in ASE/LAMMPS/…, nothing similar exists for bond-atom triplets. Computing the bond-atom triplets from two pair lists would solve this. EDIT: sorry, clicked the wrong button. |
oh, I see, thanks!
If it is true that none of those ideas would cut it, how do you think I should proceed? Thanks, and sorry for the late reply. |
Another solution would be to allow the calculator to store data in the system, but I don't love the implications of this either. I'll think about this over the weekend, and then maybe we can have a chat to find the best way forward! |
sure! |
A thing I thought about for this would be the possibility to add other methods to BaseCalculator that would be called with a not exactly great either (and it feels way jankier than just allowing systems to carry a cache around), but probably less disruptive than changing the System trait. |
Yes, this is kinda what I landed on over the weekend. Replace the input of
    struct SystemWithData {
        system: Box<dyn System>,
        data: BTreeMap<String, Box<dyn Any>>,
    }
I think this is the cleanest solution here |
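To make the wrapper idea concrete, here is a hedged sketch of what such a type could look like. The `System` trait below is a one-method stand-in, not the real rascaline trait, and the `new`, `store`, `get`, and `system` methods are hypothetical names chosen for illustration. Values are stored as `Box<dyn Any>`, which is an assumption: a bare `Any` cannot be used as a map value, since it is an unsized trait object.

```rust
use std::any::Any;
use std::collections::BTreeMap;

/// Stand-in for the real `System` trait (assumption: reduced to one
/// method so the example compiles on its own).
pub trait System {
    fn size(&self) -> usize;
}

/// A system together with arbitrary data that calculators can cache.
pub struct SystemWithData {
    system: Box<dyn System>,
    data: BTreeMap<String, Box<dyn Any>>,
}

impl SystemWithData {
    pub fn new(system: Box<dyn System>) -> Self {
        SystemWithData { system, data: BTreeMap::new() }
    }

    /// Access the wrapped system.
    pub fn system(&self) -> &dyn System {
        &*self.system
    }

    /// Store a value under `key`, replacing any previous value.
    pub fn store<T: Any>(&mut self, key: &str, value: T) {
        self.data.insert(key.to_string(), Box::new(value));
    }

    /// Get the value stored under `key`, if it exists and has type `T`.
    pub fn get<T: Any>(&self, key: &str) -> Option<&T> {
        self.data
            .get(key)
            .and_then(|boxed| (**boxed).downcast_ref::<T>())
    }
}
```

With this shape, a calculator could store a bond-atom triplet list once and retrieve it later by key, without any new method on the system trait itself. A lookup with the wrong type returns `None` instead of panicking.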
that does sound way better than what I suggested. |
I've implemented this in the For now, I allowed the data to be |
ah, should have replied earlier. |
6582113 to 30d6f47
and that should be it |
I'll be using a PR within m-stack-org to make sure it passes CI tests. (which catches more errors than cargo test and cargo bench) |
I only gave a quick look from a very high level, checking code organization and documentation.
I'm not sure I understand the separation between SphericalExpansionForBondType and SphericalExpansionForBond. Do you plan on using both separately? I could not find any registration of the calculator, so I'm not sure which one you need to use for the actual ML work.
I would only include the ones you directly need.
In the current code, NeighborsList, SphericalExpansionByPair and SphericalExpansion are all used directly in various ML pipelines of the lab.
rascaline/src/calculators/soap/spherical_expansion_bondcentered.rs
and most of your remarks should be taken into account (notable exception: the ones that were about "TODO"s left in the code) |
7971a1a to e841798
question: EDIT: let me merge to take the clebsch-gordan restructure into account |
Sorry for the delay, I was deep in another part of development which is now reaching its end, so I should have more time for rascaline work. There is not much you need to do for now, I have to do another full round of review on this and finish + merge #260. One point I wanted to discuss with you was how we could expose the fact that this is a very new, still experimental representation that might be changed if we realise it needs to do some things differently. My idea here was to have the code in the main library and make sure it continues working and passing tests, but force the user to opt in and acknowledge this is experimental code if they want to use it. An environment variable would be a good way to do this:
    export RASCALINE_EXPERIMENTAL_BOND_ATOM_SPX=1
    python script.py
or
What do you think? Another point about this is when the code will move out of this experimental state and always be available. I'm thinking of having one published paper with it + the main author (you in this case, but we will use a similar mechanism for other new representations later) agreeing that the code is ready for general use. |
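One possible shape for the opt-in check on the Rust side, as a sketch only. The variable name comes from the comment above; the function names, the error message, and the rule that only the exact value `1` counts as opting in are assumptions made for illustration.

```rust
use std::env;

/// Name of the opt-in variable, as proposed in the discussion.
pub const EXPERIMENTAL_VAR: &str = "RASCALINE_EXPERIMENTAL_BOND_ATOM_SPX";

/// Decide whether a raw variable value counts as an opt-in.
/// Assumption: only the exact string "1" enables the feature.
pub fn is_opted_in(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Return an error unless the user has explicitly opted in to the
/// experimental bond-atom spherical expansion (hypothetical helper).
pub fn check_experimental_opt_in() -> Result<(), String> {
    // `env::var` fails when the variable is unset or not valid unicode;
    // both cases are treated as "not opted in".
    let value = env::var(EXPERIMENTAL_VAR).ok();
    if is_opted_in(value.as_deref()) {
        Ok(())
    } else {
        Err(format!(
            "this calculator is experimental, set {}=1 to enable it",
            EXPERIMENTAL_VAR
        ))
    }
}
```

Keeping the decision in `is_opted_in` separate from the environment lookup makes the rule testable without touching the process environment.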
@@ -32,6 +33,15 @@ def __init__(self):
.. _chemfiles.Frame: http://chemfiles.org/chemfiles.py/latest/reference/frame.html
"""

if HAVE_PYSCF:
Could you make the PYSCF system into a separate PR?
certainly!
for now, I'm bisecting a failing test on debian12 (that was there on debian11): valgrind is annoyed at entries 5 and 9 of cargo test -p rascaline-c-api --test run-cxx-tests
(a problem that pre-dates my changes)
...bisecting only found the commit where equistore was renamed to metatensor, breaking the build on all older rascaline commits, due to a failed hash check on the zip archive containing metatensor at a given commit
valgrind is annoyed at entries 5 and 9 of cargo test -p rascaline-c-api --test run-cxx-tests
(a problem that pre-dates my changes)
Hmm, can you open a separate issue for this? Or just share the setup with me on Slack? We run the valgrind tests on CI, so this is strange.
well, never mind: the error disappears for a fresh clone of master
I'll need to look more into this, but when it does/doesn't happen seems to be nonsense for now.
the 'bug' was in the Cargo.lock file, it seems. Do you want me to send it to you, see if a dependency needs to be updated or something?
OK! To be honest, I was also busy, with my candidacy exam.
Sure!
That seems like a good rule of thumb for this sort of thing. Thank you for thinking about this! |
oh no |
Yes, it should be. I mainly need to document and merge it! EDIT: PR breakage is fine, we can fix it =) |
ebff16d to b60fbc5
[restructured contribution, commit 1]
[restructured contribution, commit 2]
bd8fe3f to 53c4591
[restructured contribution, commit 3]
53c4591 to e3cd082
aaand rebased. |
except I probably need to rebase this further because of the name change, which seems to break some tests |
the changes I talked about, split into three commits that can be reviewed separately
There's just one point where I know feedback is needed, it's for the new API for the systems.
(specifically, the huge incoherence between pairs_per_species and triplets_per_species, which can be seen in the comments of the two methods)
📚 Documentation preview 📚: https://rascaline--240.org.readthedocs.build/en/240/
⚙️ Download Python wheels for this pull-request (you can install these with pip)