
Add a generic evaluator function #306

Merged 17 commits into main on Oct 12, 2023
Conversation

@jdolence (Member) commented Oct 9, 2023

PR Summary

As discussed with @Yurlungur...

This is a tiny PR that adds a fair bit more flexibility to how downstream codes interact with singularity-eos. The new functionality is a new Evaluate template function for both EosBase and Variant that can take in a downstream-defined functor and evaluate it with the underlying eos object. This allows one to, for example, express a parallel loop that calls several EOS functions at once in the downstream code, and then have singularity execute that parallel loop with one of the basic EOSs (as opposed to having to visit the variant in every call).

The current vector interface is quite constraining. For example, if you're running on GPU, it assumes you're calling from the host and that you want to launch a new kernel to make the eos call. In our downstream code, we use hierarchical parallelism where what we really want is to have an inner loop inside our kernel that calls the EOSs. This new feature allows us to write our inner loop as we do all our other inner loops, pass it in to singularity, and have it execute as we expressed but using the CRTP stuff to avoid many calls to mpark::visit.

I think the python interface requires something like the existing vector interface, otherwise I would argue that this is much more flexible and may be preferred in general for interfacing with downstream codes. The only downside is that it requires the downstream code to write a functor (or some number of functors to capture all the EOS calling patterns).

PR Checklist

  • Adds a test for any bugs fixed. Adds tests for new features.
  • Format your changes by using the make format command after configuring with cmake.
  • Document any new features, update documentation for changes made.
  • Make sure the copyright notice on any files you modified is up to date.
  • After creating a pull request, note it in the CHANGELOG.md file
  • If preparing for a new release, update the version in cmake.

@jdolence jdolence requested review from Yurlungur and jhp-lanl October 9, 2023 20:13
@jdolence jdolence requested a review from dholladay00 October 9, 2023 20:59
@jhp-lanl (Collaborator) left a comment

I approve, but I have some questions and a comment:

If I'm understanding correctly, the main advantage here is that you can string together an arbitrary number of EOS lookup functions under a single mpark::visit(), right? This begs the question, is the mpark::visit() function that impactful for performance? It doesn't get optimized away? Have you profiled this change?

As a tangent in defense of the current vector interface, it was never intended to be called on device. The predominant wisdom at the time it was written was that the only real way that the EOS would be called from device would be via the scalar interface and that the host could chain together whatever calls it needed. This duplicates the visit() calls, yes, but otherwise allows the flexible parallelism you mention in your PR description.

EDIT - I may have come off too critical. I think this is a very useful addition and could be powerful in a number of different scenarios.

@jhp-lanl (Collaborator) commented Oct 9, 2023

As an aside, does this functionality eliminate the need for FillEOS() since you can basically inject it now?

EDIT - Or does it make sense to provide pre-made injectables rather than defining these within the EOS or base class? For example, we could use this to eliminate DensityEnergyFromPressureTemperature() and instead define functors that could take different approaches to finding the density and energy given that there may not be a unique solution. This would also allow the downstream code to make their own decisions if they prefer.

@jhp-lanl (Collaborator) commented Oct 9, 2023

> I think the python interface requires something like the existing vector interface, otherwise I would argue that this is much more flexible and may be preferred in general for interfacing with downstream codes. The only downside is that it requires the downstream code to write a functor (or some number of functors to capture all the EOS calling patterns).

I know this isn't what you were intending here, but if you extend the existing vector interface to have a vector version of Evaluate(), you could also think about using the python interface to inject arbitrary functions as well.

The other downside of this approach is that a fortran-based host code would have to provide its own C++ interface, which isn't something we're currently doing.

@dholladay00 (Collaborator) commented

> I approve, but I have some questions and a comment:
>
> If I'm understanding correctly, the main advantage here is that you can string together an arbitrary number of EOS lookup functions under a single mpark::visit(), right? This begs the question, is the mpark::visit() function that impactful for performance? It doesn't get optimized away? Have you profiled this change?
>
> As a tangent in defense of the current vector interface, it was never intended to be called on device. The predominant wisdom at the time it was written was that the only real way that the EOS would be called from device would be via the scalar interface and that the host could chain together whatever calls it needed. This duplicates the visit() calls, yes, but otherwise allows the flexible parallelism you mention in your PR description.

My understanding is that the visit removes the possibility of vectorization. This lifts the visit higher up and thus in a lower level loop vectorization is possible. This basically allows downstream codes to write functors that can get at the underlying eos type and use that directly similar to how the vector interface accomplishes this. @jdolence has found this to have noticeable speedup. I am not clear on the main contributor to the speedup, that would be helpful to know though.

As for the vector interface, it is still necessary and helpful as it provides concrete functions that can be called from fortran.

@Yurlungur (Collaborator) commented

I'm not going to review the code, since @jdolence and I came up with it over a whiteboard conversation. I might take a stab at adding a unit test, though, which I want before this gets merged. A few comments regarding the above discussion:

> If I'm understanding correctly, the main advantage here is that you can string together an arbitrary number of EOS lookup functions under a single mpark::visit(), right? This begs the question, is the mpark::visit() function that impactful for performance? It doesn't get optimized away? Have you profiled this change?

There are, I think, a few benefits:

  1. As @dholladay00 mentioned---the visit call precludes vectorization, because there's a branch internal to the visit (unavoidably). This lifts the branch above the innermost loop.
  2. This allows the vector calls to be made on device. I agree with you @jhp-lanl that was never the intent of the vector calls. That said, it seems like a nice win.
  3. More generally, it allows a downstream C++ code to completely generalize what the vector call looks like and completely control both the loop body and the loop pattern. This allows one to fuse multiple EOS calls into a single loop body (which was the original purpose of FillEos), and it allows the downstream code to pick whatever loop they want. They're not bound to what's provided in ports-of-call, which I think is a big win because we don't want to be in the business of providing loop patterns through ports-of-call. In fact, I recall @clellsolomon requesting a feature like this when we discussed the vectorized code some time ago. So maybe it's of some use to EAP. (And CJ if you happen to see this, I'm curious what you think.)

> This basically allows downstream codes to write functors that can get at the underlying eos type and use that directly similar to how the vector interface accomplishes this. @jdolence has found this to have noticeable speedup. I am not clear on the main contributor to the speedup, that would be helpful to know though.

It is worth noting that @jdolence only tested for ideal gas. It's possible our other EOS models are too complex to vectorize anyway. That said, we use ideal gas sometimes and being able to vectorize for it is probably a win.

> As for the vector interface, it is still necessary and helpful as it provides concrete functions that can be called from fortran.

I don't think we can eliminate the current vector interface at this time. I thought about rolling it into this API, i.e., redefining the vector calls to just call Evaluate with different functors, but I think that actually ends up being more code than what we have. And I'm not sure how it works with the modifiers.

> As an aside, does this functionality eliminate the need for FillEOS() since you can basically inject it now?

No---because FillEos is intended to know the proper order to fill variables, i.e., it should get you what you request in the order that has the fewest internal EOS calls possible. For thermodynamically complete EOS's, like Helmholtz or Vinet, this is probably a significant savings, because the underlying potential need only be computed once.

> EDIT - Or does it make sense to provide pre-made injectables rather than defining these within the EOS or base class? For example, we could use this to eliminate DensityEnergyFromPressureTemperature() and instead define functors that could take different approaches to finding the density and energy given that there may not be a unique solution. This would also allow the downstream code to make their own decisions if they prefer.

This I am a fan of! Would love to have an injectable instead of using PTofRE.

> I know this isn't what you were intending here, but if you extend the existing vector interface to have a vector version of Evaluate(), you could also think about using the python interface to inject arbitrary functions as well.

This would be cool! I don't know if this kind of thing is possible with pybind. Maybe @rbberger knows?

@Yurlungur Yurlungur self-assigned this Oct 9, 2023
@Yurlungur Yurlungur added the `enhancement` label Oct 9, 2023
@jdolence (Member, Author) commented Oct 9, 2023

> I approve, but I have some questions and a comment:
>
> If I'm understanding correctly, the main advantage here is that you can string together an arbitrary number of EOS lookup functions under a single mpark::visit(), right? This begs the question, is the mpark::visit() function that impactful for performance? It doesn't get optimized away? Have you profiled this change?
>
> As a tangent in defense of the current vector interface, it was never intended to be called on device. The predominant wisdom at the time it was written was that the only real way that the EOS would be called from device would be via the scalar interface and that the host could chain together whatever calls it needed. This duplicates the visit() calls, yes, but otherwise allows the flexible parallelism you mention in your PR description.
>
> EDIT - I may have come off too critical. I think this is a very useful addition and could be powerful in a number of different scenarios.

Sorry -- reread my PR description and I think it may have come off more critical than I intended of the current vector interface. There's nothing wrong with that interface, it just does a very specific thing that happens not to align with what I'm trying to do.

As @dholladay00 points out, I'm not sure if it's the cost of the visit, per se, or instead the fact that loops can vectorize that couldn't before. Whether or not a loop actually vectorizes ultimately depends on the implementation of each EOS, and I'm not sure which do and which don't. IdealGas clearly should, at least, and it's possible others do or could be transformed in a way that allows it. In my tests downstream, our whole code sped up by ~10% (maybe a bit more) when I just explicitly pulled out an IdealGas before looping over cells and making EOS calls, for example.

Another thing this PR enables are other kinds of operations that may or may not have anything to do with EOS in the same loop. For example, say you wanted to actually compute sound speed. You could write a functor that calls the EOS for bulk modulus and uses that result to compute the adiabatic sound speed. Really you can write whatever you want to execute in the functor just as you would anything else in your downstream code.

And another thing this enables is for downstream codes to use their native data structures, if they desire. There may still be advantages to copying state out into a contiguous vector and then copying back results, but it's probably not always advantageous to do so.

It could be that many of the EOSs won't vectorize anyway, so the only effective difference from just calling the EOS functions on the variant is that you call visit less. It's not clear if this matters performance-wise (and I suppose it could be different on CPU and GPU). A key selling point of this PR is that it only requires 9 lines of new code, and (at least some) downstream codes would write things almost identically to how they would otherwise have been written, so there's not much downside.

@Yurlungur (Collaborator) left a comment

Needs tests. But otherwise LGTM. Pleased how this came out. :D

@jhp-lanl (Collaborator) commented Oct 9, 2023

Thanks for answering my questions @dholladay00 @Yurlungur and @jdolence ! I think I understand this better now. I think this is ready to be merged.

I'd like to also decrease the number of overloads we have on the vector interface in the future and this might be a tool to do so. I'll have to give it more thought though.

@dholladay00 (Collaborator) commented

I'd be interested to see if this improves performance in places where consecutive eos calls are present such as in get_sg_eos and in the PTE solvers.

@jhp-lanl (Collaborator) commented Oct 9, 2023

> Needs tests. But otherwise LGTM. Pleased how this came out. :D

Thanks for the reminder! A really simple test should cover it, and then it can be merged.

@Yurlungur (Collaborator) commented

> I'd be interested to see if this improves performance in places where consecutive eos calls are present such as in get_sg_eos and in the PTE solvers.

Can't be used in the PTE solvers yet. We'd need a vector solver.

@jhp-lanl (Collaborator) commented Oct 9, 2023

> I'd be interested to see if this improves performance in places where consecutive eos calls are present such as in get_sg_eos and in the PTE solvers.
>
> Can't be used in the PTE solvers yet. We'd need a vector solver.

But it can be used where we request multiple things from an EOS in quick succession (like within a PTE solver)

@dholladay00 (Collaborator) commented

> I'd be interested to see if this improves performance in places where consecutive eos calls are present such as in get_sg_eos and in the PTE solvers.
>
> Can't be used in the PTE solvers yet. We'd need a vector solver.

I am thinking specifically of material loops: there are several places where multiple EOS calls are made per loop iteration, and Evaluate could be used there to require only one visit per iteration. This would only be helpful if the visits themselves are expensive, which is possible if, for example, every visit consumes a lot of registers.

@Yurlungur (Collaborator) commented

Ah I see. Yes that's true. That might buy us something.

@Yurlungur Yurlungur changed the title from "WIP: Add a generic evaluator function" to "Add a generic evaluator function" Oct 10, 2023
@Yurlungur (Collaborator) left a comment

Took a stab at documentation and tests. Tests triggered on Darwin.

@jhp-lanl (Collaborator) left a comment

Thanks for the documentation and test @Yurlungur ! This looks good to me other than a minor change to the docs

@Yurlungur (Collaborator) commented

Tests now pass on darwin. As soon as tests pass on github we can merge.

@Yurlungur Yurlungur merged commit 4fb2409 into main Oct 12, 2023
4 checks passed
@Yurlungur Yurlungur deleted the jdolence/generic_evaluate branch October 12, 2023 02:02