
Add a generic evaluator function #306

Merged 17 commits into main on Oct 12, 2023
Conversation

@jdolence (Member) commented Oct 9, 2023

PR Summary

As discussed with @Yurlungur...

This is a tiny PR that adds a fair bit more flexibility to how downstream codes interact with singularity-eos. The new functionality is a new Evaluate template function for both EosBase and Variant that can take in a downstream-defined functor and evaluate it with the underlying eos object. This allows one to, for example, express a parallel loop that calls several EOS functions at once in the downstream code, and then have singularity execute that parallel loop with one of the basic EOSs (as opposed to having to visit the variant in every call).

The current vector interface is quite constraining. For example, if you're running on GPU, it assumes you're calling from the host and that you want to launch a new kernel to make the eos call. In our downstream code, we use hierarchical parallelism where what we really want is to have an inner loop inside our kernel that calls the EOSs. This new feature allows us to write our inner loop as we do all our other inner loops, pass it in to singularity, and have it execute as we expressed but using the CRTP stuff to avoid many calls to mpark::visit.

I think the python interface requires something like the existing vector interface, otherwise I would argue that this is much more flexible and may be preferred in general for interfacing with downstream codes. The only downside is that it requires the downstream code to write a functor (or some number of functors to capture all the EOS calling patterns).

PR Checklist

  • Adds a test for any bugs fixed. Adds tests for new features.
  • Format your changes by using the make format command after configuring with cmake.
  • Document any new features, update documentation for changes made.
  • Make sure the copyright notice on any files you modified is up to date.
  • After creating a pull request, note it in the CHANGELOG.md file
  • If preparing for a new release, update the version in cmake.

@jdolence jdolence requested review from Yurlungur and jhp-lanl October 9, 2023 20:13
@jdolence jdolence requested a review from dholladay00 October 9, 2023 20:59
@jhp-lanl (Collaborator) left a comment

I approve, but I have some questions and a comment:

If I'm understanding correctly, the main advantage here is that you can string together an arbitrary number of EOS lookup functions under a single mpark::visit(), right? This begs the question, is the mpark::visit() function that impactful for performance? It doesn't get optimized away? Have you profiled this change?

As a tangent in defense of the current vector interface, it was never intended to be called on device. The predominant wisdom at the time it was written was that the only real way that the EOS would be called from device would be via the scalar interface and that the host could chain together whatever calls it needed. This duplicates the visit() calls, yes, but otherwise allows the flexible parallelism you mention in your PR description.

EDIT - I may have come off too critical. I think this is a very useful addition and could be powerful in a number of different scenarios.

@jhp-lanl (Collaborator) commented Oct 9, 2023

As an aside, does this functionality eliminate the need for FillEOS() since you can basically inject it now?

EDIT - Or does it make sense to provide pre-made injectables rather than defining these within the EOS or base class? For example, we could use this to eliminate DensityEnergyFromPressureTemperature() and instead define functors that could take different approaches to finding the density and energy given that there may not be a unique solution. This would also allow the downstream code to make their own decisions if they prefer.

@jhp-lanl (Collaborator) commented Oct 9, 2023

> I think the python interface requires something like the existing vector interface, otherwise I would argue that this is much more flexible and may be preferred in general for interfacing with downstream codes. The only downside is that it requires the downstream code to write a functor (or some number of functors to capture all the EOS calling patterns).

I know this isn't what you were intending here, but if you extend the existing vector interface to have a vector version of Evaluate(), you could also think about using the python interface to inject arbitrary functions as well.

The other downside of this approach is that a fortran-based host code would have to provide its own C++ interface, which isn't something we're currently doing.

@dholladay00 (Collaborator) commented

> I approve, but I have some questions and a comment:
>
> If I'm understanding correctly, the main advantage here is that you can string together an arbitrary number of EOS lookup functions under a single mpark::visit(), right? This begs the question, is the mpark::visit() function that impactful for performance? It doesn't get optimized away? Have you profiled this change?
>
> As a tangent in defense of the current vector interface, it was never intended to be called on device. The predominant wisdom at the time it was written was that the only real way that the EOS would be called from device would be via the scalar interface and that the host could chain together whatever calls it needed. This duplicates the visit() calls, yes, but otherwise allows the flexible parallelism you mention in your PR description.

My understanding is that the visit removes the possibility of vectorization. This lifts the visit higher up and thus in a lower level loop vectorization is possible. This basically allows downstream codes to write functors that can get at the underlying eos type and use that directly similar to how the vector interface accomplishes this. @jdolence has found this to have noticeable speedup. I am not clear on the main contributor to the speedup, that would be helpful to know though.

As for the vector interface, it is still necessary and helpful as it provides concrete functions that can be called from fortran.

@Yurlungur (Collaborator) commented

I'm not going to review the code, since @jdolence and I came up with it over a whiteboard conversation. I might take a stab at adding a unit test, though, which I want before this gets merged. A few comments regarding the above discussion:

> If I'm understanding correctly, the main advantage here is that you can string together an arbitrary number of EOS lookup functions under a single mpark::visit(), right? This begs the question, is the mpark::visit() function that impactful for performance? It doesn't get optimized away? Have you profiled this change?

There are, I think, a few benefits:

  1. As @dholladay00 mentioned---the visit call precludes vectorization, because there's a branch internal to the visit (unavoidably). This lifts the branch above the innermost loop.
  2. This allows the vector calls to be made on device. I agree with you @jhp-lanl that was never the intent of the vector calls. That said, it seems like a nice win.
  3. More generally, it allows a downstream C++ code to completely generalize what the vector call looks like and completely control both the loop body and the loop pattern. This allows one to fuse multiple EOS calls into a single loop body (which was the original purpose of FillEos), and it allows the downstream code to pick whatever loop they want. They're not bound to what's provided in ports-of-call, which I think is a big win because we don't want to be in the business of providing loop patterns through ports-of-call. In fact, I recall @clellsolomon requesting a feature like this when we discussed the vectorized code some time ago. So maybe it's of some use to EAP. (And CJ if you happen to see this, I'm curious what you think.)

> This basically allows downstream codes to write functors that can get at the underlying eos type and use that directly similar to how the vector interface accomplishes this. @jdolence has found this to have noticeable speedup. I am not clear on the main contributor to the speedup, that would be helpful to know though.

It is worth noting that @jdolence only tested for ideal gas. It's possible our other EOS models are too complex to vectorize anyway. That said, we use ideal gas sometimes and being able to vectorize for it is probably a win.

> As for the vector interface, it is still necessary and helpful as it provides concrete functions that can be called from fortran.

I don't think we can eliminate the current vector interface at this time. I thought about rolling it into this API, i.e., redefining the vector calls to just call Evaluate with different functors, but I think that actually ends up being more code than what we have. And I'm not sure how it works with the modifiers.

> As an aside, does this functionality eliminate the need for FillEOS() since you can basically inject it now?

No---because FillEos is intended to know the proper order to fill variables, i.e., it should get you what you request in the order that has the fewest internal EOS calls possible. For thermodynamically complete EOS's, like Helmholtz or Vinet, this is probably a significant savings, because the underlying potential need only be computed once.

> EDIT - Or does it make sense to provide pre-made injectables rather than defining these within the EOS or base class? For example, we could use this to eliminate DensityEnergyFromPressureTemperature() and instead define functors that could take different approaches to finding the density and energy given that there may not be a unique solution. This would also allow the downstream code to make their own decisions if they prefer.

This I am a fan of! Would love to have an injectable instead of using PTofRE.

> I know this isn't what you were intending here, but if you extend the existing vector interface to have a vector version of Evaluate(), you could also think about using the python interface to inject arbitrary functions as well.

This would be cool! I don't know if this kind of thing is possible with pybind. Maybe @rbberger knows?

@Yurlungur Yurlungur self-assigned this Oct 9, 2023
@Yurlungur Yurlungur added the `enhancement` label Oct 9, 2023
@jdolence (Member, Author) commented Oct 9, 2023

> I approve, but I have some questions and a comment:
>
> If I'm understanding correctly, the main advantage here is that you can string together an arbitrary number of EOS lookup functions under a single mpark::visit(), right? This begs the question, is the mpark::visit() function that impactful for performance? It doesn't get optimized away? Have you profiled this change?
>
> As a tangent in defense of the current vector interface, it was never intended to be called on device. The predominant wisdom at the time it was written was that the only real way that the EOS would be called from device would be via the scalar interface and that the host could chain together whatever calls it needed. This duplicates the visit() calls, yes, but otherwise allows the flexible parallelism you mention in your PR description.
>
> EDIT - I may have come off too critical. I think this is a very useful addition and could be powerful in a number of different scenarios.

Sorry -- reread my PR description and I think it may have come off more critical than I intended of the current vector interface. There's nothing wrong with that interface, it just does a very specific thing that happens not to align with what I'm trying to do.

As @dholladay00 points out, I'm not sure if it's the cost of the visit, per se, or instead the fact that loops can vectorize that couldn't before. Whether or not a loop actually vectorizes ultimately depends on the implementation of each EOS, and I'm not sure which do and which don't. IdealGas clearly should, at least, and it's possible others do or could be transformed in a way that allows it. In my tests downstream, our whole code sped up by ~10% (maybe a bit more) when I just explicitly pulled out an IdealGas before looping over cells and making EOS calls, for example.

Another thing this PR enables are other kinds of operations that may or may not have anything to do with EOS in the same loop. For example, say you wanted to actually compute sound speed. You could write a functor that calls the EOS for bulk modulus and uses that result to compute the adiabatic sound speed. Really you can write whatever you want to execute in the functor just as you would anything else in your downstream code.

And another thing this enables is for downstream codes to use their native data structures, if they desire. There may still be advantages to copying state out into a contiguous vector and then copying back results, but it's probably not always advantageous to do so.

It could be that many of the EOSs won't vectorize anyway, so the only effective difference from just calling the EOS functions on the variant is that you call visit less. It's not clear if this matters performance-wise (and I suppose it could be different on CPU and GPU). A key selling point of this PR is that it only requires 9 lines of new code, and (at least some) downstream codes would write things almost identically to how they would otherwise have been written, so there's not much downside.

@Yurlungur (Collaborator) left a comment

Needs tests. But otherwise LGTM. Pleased how this came out. :D

@jhp-lanl (Collaborator) commented Oct 9, 2023

Thanks for answering my questions @dholladay00 @Yurlungur and @jdolence ! I think I understand this better now. I think this is ready to be merged.

I'd like to also decrease the number of overloads we have on the vector interface in the future and this might be a tool to do so. I'll have to give it more thought though.

@dholladay00 (Collaborator) commented

I'd be interested to see if this improves performance in places where consecutive eos calls are present such as in get_sg_eos and in the PTE solvers.

@jhp-lanl (Collaborator) commented Oct 9, 2023

> Needs tests. But otherwise LGTM. Pleased how this came out. :D

Thanks for the reminder! A really simple test should cover it, and then it can be merged.

@Yurlungur (Collaborator) commented

> I'd be interested to see if this improves performance in places where consecutive eos calls are present such as in get_sg_eos and in the PTE solvers.

Can't be used in the PTE solvers yet. We'd need a vector solver.

@jhp-lanl (Collaborator) commented Oct 9, 2023

> I'd be interested to see if this improves performance in places where consecutive eos calls are present such as in get_sg_eos and in the PTE solvers.
>
> Can't be used in the PTE solvers yet. We'd need a vector solver.

But it can be used where we request multiple things from an EOS in quick succession (like within a PTE solver)

@dholladay00 (Collaborator) commented

> I'd be interested to see if this improves performance in places where consecutive eos calls are present such as in get_sg_eos and in the PTE solvers.
>
> Can't be used in the PTE solvers yet. We'd need a vector solver.

I am thinking specifically of material loops: there are several places where multiple EOS calls are made per loop iteration, and Evaluate could be used there to require only one visit per iteration. This would only be helpful if the visits themselves are expensive, which is possible if, for example, every visit consumes a lot of registers.

@Yurlungur (Collaborator) commented

Ah I see. Yes that's true. That might buy us something.

@Yurlungur Yurlungur changed the title from "WIP: Add a generic evaluator function" to "Add a generic evaluator function" Oct 10, 2023
@Yurlungur (Collaborator) left a comment

Took a stab at documentation and tests. Tests triggered on Darwin.

@jhp-lanl (Collaborator) left a comment

Thanks for the documentation and test @Yurlungur ! This looks good to me other than a minor change to the docs

@Yurlungur (Collaborator) commented

Tests now pass on darwin. As soon as tests pass on github we can merge.

@Yurlungur Yurlungur merged commit 4fb2409 into main Oct 12, 2023
4 checks passed
@Yurlungur Yurlungur deleted the jdolence/generic_evaluate branch October 12, 2023 02:02