Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and improve Snapshot API #819

Closed
bdice opened this issue Oct 20, 2020 · 18 comments · Fixed by #1079
Closed

Document and improve Snapshot API #819

bdice opened this issue Oct 20, 2020 · 18 comments · Fixed by #1079
Assignees
Labels
breaking Changes that will break API. complex A particularly complex or large project that involves significant amount of effort. enhancement New feature or request

Comments

@bdice
Copy link
Member

bdice commented Oct 20, 2020

Description

I had a hard time figuring out how to calculate the number density of a HOOMD v3 simulation running in MPI. sim.state.snapshot.particles.N and sim.state.snapshot.configuration.box are only set on the root rank, but the snapshot has to be created on all ranks (it requires collective access). It would not be possible to use the computed density for further simulation changes on any rank but the root. Luckily there are state properties like sim.state.N_particles and sim.state.box that can be used, but this wouldn't work if the properties of interest were arrays like particle positions.

Proposed solution

@joaander @b-butler I read through the hoomd.State docstring and the snapshot description again after our conversation today. It describes the same things you told me today, but I didn't understand it when trying to work through it on my own. Maybe there are ways to make this clearer -- I would have benefitted most from a code snippet demonstrating the access pattern, as I suggest below.

We could document that MPI snapshot access patterns should look like this (if the information isn't needed on all ranks):

snap = sim.state.snapshot  # Uses all ranks
if snap.exists:  # Only triggers on root rank
    print(snap.particles.N)

We should also provide a simple mpi4py example for the case where the information is needed on all ranks:

snap = sim.state.snapshot  # Uses all ranks
if snap.exists:  # Only triggers on root rank
    first_position = snap.particles.position[0]
# Do some mpi4py broadcasting here so all ranks have the data from first_position

Also for what it's worth, this code snippet deadlocks because snapshot gathering is collective (this is expected and sort-of documented).

if sim.device.communicator.rank == 0:
    snap = sim.state.snapshot

Additional context

@joaander suggested we might be able to provide a simple wrapper for broadcasting to all ranks, but that might be just as complicated to use as mpi4py, so it's not clear whether it would result in a net improvement for usability / user intuition.

@bdice bdice added the enhancement New feature or request label Oct 20, 2020
@vyasr
Copy link
Contributor

vyasr commented Oct 20, 2020

For context, is this any different than in HOOMD 2.x? I agree that an example would be great, but I just want to make sure that this isn't a regression. The same issues arise with having to broadcast snapshots in the 2.x API, right?

@joaander
Copy link
Member

We deleted much of the v2.x snapshot docs. The rewritten docs do need improvement and these are good suggestions. The need to access global snapshots on rank 0 only is the same in v2 and v3.

@vyasr
Copy link
Contributor

vyasr commented Oct 20, 2020

Yes, I think starting from scratch is probably best to avoid having any misleading examples in there 👍

@joaander joaander added this to the v3.0.0-beta.2 milestone Nov 9, 2020
@joaander
Copy link
Member

joaander commented Nov 9, 2020

In addition to this, Snapshot needs documentation in general.

@joaander joaander modified the milestones: v3.0.0-beta.2, v3.0.0-beta.3 Nov 30, 2020
@joaander joaander changed the title Improve documentation of MPI behavior with snapshots Document and improve Snapshot API Dec 17, 2020
@cbkerr
Copy link
Member

cbkerr commented Dec 18, 2020

Perhaps the following struggles can help inform the composing of Snapshot documentation. @joaander suggested I post them in this discussion.

I was trying to read a unit cell from gsd and then start a simulation with many unit cells.
I read the documentation.

Semi-intuitively I wanted to do:

# load unit cell
sim.create_state_from_gsd(filename=job.fn("sm2-unit.gsd"))
sim.state.snapshot.replicate(10,10,10)

but still only one unit cell is there.

Next I try:

# load unit cell
sim.create_state_from_gsd(filename=job.fn("sm2-unit.gsd"))
sim.state.snapshot = sim.state.snapshot.replicate(10,10,10)

but the problem is that replicate returns None

Try again:

# load unit cell
sim.create_state_from_gsd(filename=job.fn("sm2-unit.gsd"))
snap = sim.state.snapshot
big_snap = snap.replicate(10,10,10)
sim.state.snapshot = big_snap

Doesn't fix it because replicate still returns None.

I tried re-initializing the snapshot,

sim.create_state_from_snapshot(snap)

naturally getting "cannot initialize more than once".

Some confusion points

  1. Create a snapshot directly from gsd feels like it's a really common operation so the code felt like it should be ~1 line. I remember dealing with other python code where you do something like new_thing = foo.bar.create(args).bar() (like freud)
  2. The docs say hoomd.State "Provides access (read/write) to a hoomd.Simulation object’s particle, bond, angle, etc. data." so I thought this meant I could assign to it. In the code, I see that hoomd.State.replicate() is not implemented yet
  3. I'm used to python returning references to things....could the behavior of replicate be changed at this point?

By trial and error, I settled on this:

device = hoomd.device.auto_select()
junk = hoomd.Simulation(device=device) # type hoomd.simulation.Simulation
# load unit cell
junk.create_state_from_gsd(filename=job.fn("sm2-unit.gsd"))
snap = junk.state.snapshot
# set up large box
# big_snap = snap.replicate(10,10,10)
snap.replicate(10,10,10)
sim = hoomd.Simulation(device=device)
sim.create_state_from_snapshot(snap)

This works because the snap is replicated in place and assigned to the new simulation.

@b-butler
Copy link
Member

b-butler commented Dec 18, 2020

@cbkerr
So the state.snapshot property returns a hoomd.Snapshot object. By the nature of the Snapshot class, changes to it cannot directly change the hoomd.State (since we copy the data and gather across MPI ranks). That is why it must be set directly through sim.state.snapshot = snapshot. The local snapshot API does allow for direct access through a context manager which allows inplace editing although the particle, bond, angle, numbers cannot change.

Taking your points in inverse order

  1. We can make Snapshot.replicate return itself if that is more intuitive.

2 . I think this may be more a documentation thing? hoomd.Snapshot cannot be made to automatically update the hoomd.State without a huge performance hit, and the local snapshots exist to all for direct modification. One other option that could also be done is to make snapshots only accessible through a getter and force the user to use a setter to set the state to a new snapshot. This would likely prevent any confusion about the direct setting of a snapshot.

  1. I am not sure how common that is without wanting to go through the state first. Are you suggesting your want something like hoomd.Snapshot.from_gsd_file? PR Create hoomd.Snapshot from gsd.hoomd.snapshot and allow Simulation.create_state_from_snapshot to accept gsd snapshotss #893 is working on a method to take a gsd.hoomd.Snapshot and turn it into a hoomd.Snapshot. We were going to make that private, would you want it to be public?

@vyasr
Copy link
Contributor

vyasr commented Dec 21, 2020

Just to clarify, the expected way to achieve what @cbkerr wants in the current API is the following, right?

sim.create_state_from_gsd(filename=job.fn("sm2-unit.gsd"))
snap = sim.state.snapshot
snap.replicate(10,10,10)
sim.state.snapshot = snap

@cbkerr I'm trying to get a sense for how things could be improved. Did you try this alternative? If not, was the problem that when replicate didn't return something you didn't consider that replicate must be happening in place? Is the issue what @b-butler mentioned in point 2 above, that it wasn't clear that you could set the snapshot initially, and then when replicate didn't behave as expected you didn't try the right combination of replicate in place followed by setting?

@b-butler is sim.state actually writeable? I wonder if stating that it's read/write might be adding to the confusion there.

@cbkerr
Copy link
Member

cbkerr commented Dec 22, 2020

1 - @b-butler, I would have used the gsd to snapshot function of #893 if it were implemented and documented (ie, removing this warning). I think it would be very common in cases where you store a base state in a .gsd file (like a unit cell; or a grid of particles without assigned identities) and make some modifications before running.

@cbkerr
Copy link
Member

cbkerr commented Dec 22, 2020

3 - returning the modified self of Snapshot.replicate would be intuitive at first glance. Are there any technical downsides to returning the Snapshot's modified self? Would it conflict with the standard for other areas of the API? (I'm still new to it)

I may have identified a root cause of the confusion, so read on.

@vyasr, yes, the code you provided works BUT the middle two lines seem redundant to me. And yes, at first I assumed that replicate would modify in place, but I didn't consider that it could do so without returning anything.

~~~interlude~~~
Thus far, I think either sim.state.snapshot.replicate() should either:
(1A) Modify the hoomd.State in place, which is technically infeasible so ruled out, or
(1B) return the replicated snapshot (which is what I tried next)
~~~~~~~~~~~~

While writing the code, I did not understand what @b-butler raised in his point 2. I assumed that I could modify the simulation state directly. I know the following quote from Brandon relates to making it harder for people to make my mistake:

make snapshots only accessible through a getter and force the user to use a setter to set the state to a new snapshot

I am inclined to agree, but I can't quite visualize what this would look like. Also, by "direct setting of snapshot", do you mean "setting snapshot while it's still 'attached' to the simulation State?" In other words: "setting snapshot in the same manner by which you would access the simulation snapshot?"

The fundamental confusion
Saying replicate "in place" has a different intuitive meaning (to someone 1 year into python and who didn't write the API) depending on whether you are doing:
(2A) sim.state.snapshot.replicate() --> "replicate the simulation state's snapshot in place and assign it to the state"
(2B) snap.replicate() --> "replicate the snapshot in place"

To illustrate this confusion of syntax, note that @vyasr rephrases point 2 as follows:

it wasn't clear that you could set the snapshot initially, and then when replicate didn't behave as expected you didn't try the right combination of replicate in place followed by setting?

Note that @Vyas is using the definition of "replicate in place" that means "replicate the snapshot in place" even though we are both talking about code that follows the syntax of case (2A).
Actually, I can't "set the snapshot" of the simulation state!

Is this a sensible way of explaining the confusion?
Does @b-butler's point about getters and setters resolve this?

@vyasr
Copy link
Contributor

vyasr commented Dec 22, 2020

I've responded to a few of your points inline below, but let me summarize what I think the two core issues are:

  1. The v3 API is inconsistent (generally for good reason) on when something returns a copy and when something happens in place. For example, in this instance sim.state.snapshot gives you a new snapshot object, but replicate operates on this snapshot in place, and then you have to reassign it to the state.
  2. The v3 API makes extensive use of properties, and many objects are deeply nested, so when nested attributes have to be set it's often unclear. To your point about what "in place" means, the fact that sim.state.snapshot is a copy that is then modified in place by replicate means that you have to be aware of which properties are copies and which aren't in order to understand how to set things.

Does that sound right to you? If so, would you be less confused if everything worked "in place"? If so, would it help if we identified the dominant pattern (in place vs copy) throughout HOOMD and documented the exceptional cases where things are done the other way? I don't think there is any reasonable way to avoid some level of inconsistency in this fashion without either 1) incurring significant performance penalties or 2) walking back some of v3's commitment to property based access to everything. However, if you think this accurately summarizes the confusion then maybe it would be possible to make one pattern much more common than the other, making it easier to clearly document such behavior.

3 - returning the modified self of Snapshot.replicate would be intuitive at first glance. Are there any technical downsides to returning the Snapshot's modified self? Would it conflict with the standard for other areas of the API? (I'm still new to it)

I don't see any technical downsides with just having replicate end with a return self. My usual argument against doing something like this is that some users may assume from such a pattern that the returned value is a copy, rather than a reference to the same object that has been modified in place.

I may have identified a root cause of the confusion, so read on.

@vyasr, yes, the code you provided works BUT the middle two lines seem redundant to me. And yes, at first I assumed that replicate would modify in place, but I didn't consider that it could do so without returning anything.

Yes, that makes sense.

~~~interlude~~~
Thus far, I think either sim.state.snapshot.replicate() should either:
(1A) Modify the hoomd.State in place, which is technically infeasible so ruled out, or
(1B) return the replicated snapshot (which is what I tried next)
~~~~~~~~~~~~

While writing the code, I did not understand what @b-butler raised in his point 2. I assumed that I could modify the simulation state directly. I know the following quote from Brandon relates to making it harder for people to make my mistake:

make snapshots only accessible through a getter and force the user to use a setter to set the state to a new snapshot

I am inclined to agree, but I can't quite visualize what this would look like. Also, by "direct setting of snapshot", do you mean "setting snapshot while it's still 'attached' to the simulation State?" In other words: "setting snapshot in the same manner by which you would access the simulation snapshot?"

Yes, I think you are correctly interpreting his comment. I think what @b-butler means here is to change the code so that my snippet would no longer work, and you would instead have to do the following:

sim.create_state_from_gsd(filename=job.fn("sm2-unit.gsd"))
snap = sim.state.snapshot
snap.replicate(10,10,10)
sim.state.set_snapshot(snap)

In other words, preventing direct assignment to sim.state.snapshot.

The fundamental confusion
Saying replicate "in place" has a different intuitive meaning (to someone 1 year into python and who didn't write the API) depending on whether you are doing:
(2A) sim.state.snapshot.replicate() --> "replicate the simulation state's snapshot in place and assign it to the state"
(2B) snap.replicate() --> "replicate the snapshot in place"

Yes, this concern is very valid; even to someone who knows how Python works very well you would have to read the appropriate documentation to know which of these are copies and when things can be modified in place, etc.

To illustrate this confusion of syntax, note that @vyasr rephrases point 2 as follows:

it wasn't clear that you could set the snapshot initially, and then when replicate didn't behave as expected you didn't try the right combination of replicate in place followed by setting?

Note that @Vyas is using the definition of "replicate in place" that means "replicate the snapshot in place" even though we are both talking about code that follows the syntax of case (2A).
Actually, I can't "set the snapshot" of the simulation state!

Is this a sensible way of explaining the confusion?
Does @b-butler's point about getters and setters resolve this?

@cbkerr
Copy link
Member

cbkerr commented Dec 22, 2020

@vyasr I agree with your 2-point summary.

Knowing that I shouldn't expect consistency goes a long way toward resolving the confusion, so I think a high-level documentation page explaining the underlying reasoning and rules about when to expect copy vs. in-place could resolve this without having to make one design pattern more prevelant.

I don't know if this is made more or less confusing by {knowing python really well, knowing hoomd 2 really well, just starting with hoomd 3, ...}, because I'm fairly new to all of them. Maybe we need to informally survey other new users?

some users may assume from such a pattern that the returned value is a copy, rather than a reference to the same object that has been modified in place

Funny, from my experience with python, I've been frustrated so often with getting back references instead of copies that I've learned to expect references instead of copies!


About switching to setters and getters suggested by @b-butler:
Personally, it feels natural to use property notation to access properties, so I don't think the lack of getters is a problem. (Aside: does hoomd 3 currently have any properties that require getters currently? If not, then not having them is definitely not a problem.)

I can see how using the = notation instead of making a setter method looks more elegant, but maybe having an explicit setter for hoomd.State would emphasize that this is an expensive operation?
Could the rule for documentation referred to above be related to whether or not the attribute has a setter? For example, perhaps the presence of the setter sim.state.set_snapshot(snap) means that sim.state.snapshot.replicate() modifies in-place a copy of sim.state.snapshot, not the actual state, because modifying the original is an expensive computation. I could see this generating other inconsistencies in the API; do you see any?

@joaander
Copy link
Member

To be clear State.snapshot returns a copy by design, not just for performance reasons. It is intended for use in asynchronous analysis, hybrid MD/MC, umbrella sampling, and other cases where you need to capture a snapshot (as in a picture) of state without modifying the system and may potentially need to reset the system state later (i.e. a rejected move). This is also not a change in behavior from v2. The only difference is that we wrapped the take/restore methods from v2 in a property.

Methods of State access or modify the state itself. For example, a State.replicate method (if we implemented one) would replicate the system in place as @cbkerr intended.

With respect to this case, I think we just need to make the docs clearer for Snapshot. We could also replace the property with explicit get/set methods take_snapshot / restore_snapshot to make this intention more clear.

With regards to the larger conversation on reference/copies for nested data structures - HOOMD currently returns copies for most parameters and type parameters. #776 implements proxies that make it appear that the user can modify nested data in place, even though the implementation is performing a read/modify/write operation. The main exception to this are the lists of Operations, Forces, etc... which store references and Triggers/ParticleFilters which are returned by reference. Triggers and ParticleFilters are immutable objects so reference vs copy is less of an issue there.

@b-butler
Copy link
Member

Okay, I think the conclusions thus far sare we should more clearly document the nature of the returned snapshot object, and make Snapshot.replicate return itself. Also, what does everyone think of making the snapshot property unsettable and creating a set_snapshot method? While it would work, I would prefer to use = if the documentation could be clear enough to everyone.

@vyasr
Copy link
Contributor

vyasr commented Dec 24, 2020

To be clear State.snapshot returns a copy by design, not just for performance reasons. It is intended for use in asynchronous analysis, hybrid MD/MC, umbrella sampling, and other cases where you need to capture a snapshot (as in a picture) of state without modifying the system and may potentially need to reset the system state later (i.e. a rejected move). This is also not a change in behavior from v2. The only difference is that we wrapped the take/restore methods from v2 in a property.

Methods of State access or modify the state itself. For example, a State.replicate method (if we implemented one) would replicate the system in place as @cbkerr intended.

With respect to this case, I think we just need to make the docs clearer for Snapshot. We could also replace the property with explicit get/set methods take_snapshot / restore_snapshot to make this intention more clear.

With regards to the larger conversation on reference/copies for nested data structures - HOOMD currently returns copies for most parameters and type parameters. #776 implements proxies that make it appear that the user can modify nested data in place, even though the implementation is performing a read/modify/write operation. The main exception to this are the lists of Operations, Forces, etc... which store references and Triggers/ParticleFilters which are returned by reference. Triggers and ParticleFilters are immutable objects so reference vs copy is less of an issue there.

Okay, I think the conclusions thus far sare we should more clearly document the nature of the returned snapshot object, and make Snapshot.replicate return itself. Also, what does everyone think of making the snapshot property unsettable and creating a set_snapshot method? While it would work, I would prefer to use = if the documentation could be clear enough to everyone.

The logic for most of the statements above is sound and in general I think that the right choices have been made on copies vs references. I also agree that improving the documentation for Snapshot would help a lot. The one area where I would like to highlight room for discussion is where the API encourages property-based access of deeply nested objects. I think that this introduces confusingly fluid representations. For example, sim.state.snapshot.particles.positions[:] *= 2 results in the creation of a new snapshot object that copies the system configuration, but then allows in-place modification. This problem is representative of every case where a property returns a copy of an object that itself support in-place modification. As @joaander points out, while #776 addresses this confusion for typeparams etc by giving every object a notion of being owned by a parent object, snapshots of a state avoid this by design.

I'm not sure that any of the core HOOMD devs are good people to determine how confusing this is, someone like @cbkerr is probably more representative of our user base. Perhaps we should ask for a few additional opinions. However, using getter/setter methods rather than properties specifically in cases like state.snapshot above might help resolve some of this confusion. After #776 is complete I'm not sure there will be many examples of such cases anyway, so this would only require a small set of changes.

From a performance perspective, we had a long discussion about the use of properties vs methods in coxeter that might be instructive. My argument was that properties should be cheap to evaluate, since otherwise it hides an expensive operation behind an apparent attribute access. This is (to my knowledge) the generally accepted consensus on property usage, for instance see here and here. On the other hand, in coxeter a large part of its value is based on making things highly mutable. We ended up deciding that performance concerns in coxeter are pretty negligible anyway since even "slow" operations in that case are fast in absolute terms, and by my own statement of purpose for the project usability is a higher priority than performance. However, @b-butler also made the point that in HOOMD v3 choosing to use properties as much as possible even at the expense of hiding potentially expensive operations is an intentional design choice. Assuming that we are fine with that as a conscious design choice, I would say that we should reserve the getter/setter pattern above for cases where there is a good design reason for nested structure to have different value vs reference semantics, and otherwise use properties everywhere and just tell users up front that they should profile their code if it's slow.

@joaander
Copy link
Member

joaander commented Jan 4, 2021

For example, sim.state.snapshot.particles.positions[:] *= 2 results in the creation of a new snapshot object that copies the system configuration, but then allows in-place modification. This problem is representative of every case where a property returns a copy of an object that itself support in-place modification

Unfortunately, Python does not support concepts of const or return by value that C++ does. This strong typing makes it clear to the developer what they are getting, though they still need to refer to the documentation (or at least the function definition) to see whether a particular get* method returns a copy or a reference.

I think the best we can do here is remove the snapshot property which appears to be causing mass confusion and replace it with appropriately named getter and setter methods to make the intent clear and implement #776 so that parameter objects support nested modification to provide the appearance of return by reference semantics that Python programmers expect.

Going forward we should adopt a consistent API and carefully use properties only for quantities that are conceptually owned by the parent object. Snapshot's do not fit this test as they are an independent copy of the system state.

@joaander joaander modified the milestones: v3.0.0-beta.3, v3.0.0-beta.4 Jan 11, 2021
@joaander joaander modified the milestones: v3.0.0-beta.4, v3.0.0-beta.5 Feb 15, 2021
@joaander joaander modified the milestones: v3.0.0-beta.5, v3.0.0-beta.6 Mar 12, 2021
@joaander joaander modified the milestones: v3.0.0-beta.6, v3.0.0-beta.7 May 17, 2021
joaander added a commit that referenced this issue May 20, 2021
@joaander joaander modified the milestones: v3.0.0-beta.7, future Jun 16, 2021
@joaander joaander removed this from the future milestone Jul 21, 2021
@joaander joaander added breaking Changes that will break API. complex A particularly complex or large project that involves significant amount of effort. labels Jul 21, 2021
@joaander joaander self-assigned this Jul 27, 2021
@joaander
Copy link
Member

Should we also implement a setter method for State.box? This is another example where a property returns a copy instead of a reference. For example, state.box.scale(2) does not scale the state's box. It scales the unnamed temporary copy.

@bdice
Copy link
Member Author

bdice commented Jul 27, 2021

Should we also implement a setter method for State.box? This is another example where a property returns a copy instead of a reference. For example, state.box.scale(2) does not scale the state's box. It scales the unnamed temporary copy.

Here is a related issue in freud: glotzerlab/freud#680

joaander added a commit that referenced this issue Jul 28, 2021
Some developers discussed in #819 that they expect methods that make
in-place modifications to return the object. This enables chaining
of modifications.
joaander added a commit that referenced this issue Jul 28, 2021
Per the discussion in #819, users find it confusing that the
snapshot property access returns copies and does not allow direct
modification. Make these explicit methods so that the intent
of the design is clear as is the performance implications.

Also, prevent the type names from changing when restoring
snapshots. Allowing the names to change would lead to parameters
for one type applying to another.
@joaander
Copy link
Member

To all who participated in this discussion, #1079 implements the changes we discussed and adds some additional documentation. If you have additional comments, please review the pull request and make them there.

For now, the properties are still there, but deprecated. I will remove prior to the final 3.0.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Changes that will break API. complex A particularly complex or large project that involves significant amount of effort. enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants