
Arbor SIMD Library Refactoring #1772

Open
anstaf opened this issue Nov 19, 2021 · 5 comments

Comments

@anstaf

anstaf commented Nov 19, 2021

Motivation

Eventually factor Arbor SIMD out into a separate project and make it useful for users outside of Arbor.

Current state

Arbor SIMD provides an API that is a variation of the std::experimental::simd API. It is neither a subset nor a superset of std::experimental::simd.

Distinctive features of Arbor SIMD:

  • gather/scatter support.
  • SVE backend.

Observations

  • According to N4808, std::experimental::simd provides explicit conversions from/to the underlying implementation-defined type:
        explicit operator implementation-defined () const;
        explicit simd(const implementation-defined &);   
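  • SVE fundamentally doesn't fit the std::experimental::simd backend model: SVE vector types are sizeless, with a width fixed only at run time, whereas std::experimental::simd assumes a width known at compile time.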
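A minimal sketch of the conversion pattern described above, assuming a hypothetical fixed-width type `simd_f64x4` and a hypothetical native type `native_f64x4` (a plain array standing in for a register type such as `__m256d`). It mirrors the shape of the N4808 conversions rather than using std::experimental::simd itself, and shows how a backend routine written against the native type can be exposed on SIMD values by unwrapping and rewrapping with `static_cast`:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a native register type such as __m256d.
struct native_f64x4 {
    std::array<double, 4> lanes;
};

// Hypothetical SIMD value type mirroring the N4808 conversion pattern.
class simd_f64x4 {
public:
    simd_f64x4() = default;

    // Explicit wrap: build a SIMD value from the native type.
    explicit simd_f64x4(const native_f64x4& n): data_(n) {}

    // Explicit unwrap: recover the native type.
    explicit operator native_f64x4() const { return data_; }

    double operator[](std::size_t i) const { return data_.lanes[i]; }

private:
    native_f64x4 data_{};
};

// A backend-specific routine written against the native type (a scalar
// loop here, where a real backend would call an intrinsic)...
native_f64x4 native_add(const native_f64x4& a, const native_f64x4& b) {
    native_f64x4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
    return r;
}

// ...exposed on SIMD values by unwrapping and rewrapping with explicit casts.
simd_f64x4 add(const simd_f64x4& a, const simd_f64x4& b) {
    return simd_f64x4(native_add(static_cast<native_f64x4>(a),
                                 static_cast<native_f64x4>(b)));
}
```

Because both conversions are explicit, wrapping and unwrapping stay visible at each call site, which is what lets a separate library add operations without modifying the SIMD type itself.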

Proposal

Let us split Arbor SIMD into two libraries:

  • arbor-simd-indirect. It will depend on std::experimental::simd and will provide a gather/scatter API in the form of free functions that accept std::experimental::simd parameters. Based on the compilation target, scalar type, and width, it will dispatch to the proper intrinsic, using static_casts to do SIMD wrapping/unwrapping. The library will not support SVE.

  • arbor-simd-sve. It will depend on std::experimental::simd and arbor-simd-indirect and will provide a SIMD API adapted to SVE. This API will consist of makers that return vectors (like arbsve::broadcast(42) or arbsve::copy_from(ptr)) and functions that accept vectors. If the compilation target has SVE intrinsics, the implementation will forward directly to them; otherwise it will fall back to std::experimental::simd + arbor-simd-indirect.
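To make the proposed free-function style more concrete, here is a rough sketch using a hypothetical namespace `arbsve_sketch` with a portable fallback only. The names `vec_f64`, `vec_i32`, `add`, `gather` and `scatter` are illustrative assumptions; only `broadcast` and `copy_from` come from the proposal, and their signatures here are guesses. An SVE build would forward each function to the corresponding ACLE intrinsic instead (not shown):

```cpp
#include <array>
#include <cstddef>

namespace arbsve_sketch {  // hypothetical namespace, not an existing library

// Portable fallback: fixed-width vectors. An SVE build would instead use
// sizeless ACLE types (e.g. svfloat64_t) whose width is known only at run time.
constexpr std::size_t width = 4;
struct vec_f64 { std::array<double, width> v; };
struct vec_i32 { std::array<int, width> v; };

// Makers: free functions that return vectors, used instead of constructors.
inline vec_f64 broadcast(double x) {
    vec_f64 r{};
    r.v.fill(x);
    return r;
}

inline vec_f64 copy_from(const double* p) {
    vec_f64 r{};
    for (std::size_t i = 0; i < width; ++i) r.v[i] = p[i];
    return r;
}

inline vec_i32 copy_from(const int* p) {
    vec_i32 r{};
    for (std::size_t i = 0; i < width; ++i) r.v[i] = p[i];
    return r;
}

// Functions that accept vectors.
inline vec_f64 add(const vec_f64& a, const vec_f64& b) {
    vec_f64 r{};
    for (std::size_t i = 0; i < width; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Gather: lane i receives p[idx[i]].
inline vec_f64 gather(const double* p, const vec_i32& idx) {
    vec_f64 r{};
    for (std::size_t i = 0; i < width; ++i) r.v[i] = p[idx.v[i]];
    return r;
}

// Scatter: lane i is written to p[idx[i]].
inline void scatter(double* p, const vec_i32& idx, const vec_f64& x) {
    for (std::size_t i = 0; i < width; ++i) p[idx.v[i]] = x.v[i];
}

}  // namespace arbsve_sketch
```

Since callers only ever obtain vectors from makers and pass them to functions, they never need to know the vector's size, which is what makes this shape compatible with sizeless SVE types.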

@bcumming
Member

bcumming commented Nov 29, 2021

Thanks for the proposal @anstaf.

I feel that refactoring to use std::experimental::simd is impractical while it is not part of the standard:

  • it is available only in GCC 11 and later, while the minimum GCC version required by Arbor is GCC 8 (and Arbor must also build with Clang)
  • we need to understand the performance tradeoffs and check support for features such as AVX-512 in the std::experimental implementation. For this we would have to conduct performance benchmarks.

As a rule in Arbor, we implement future standard library features internally and use them until they can be replaced by mature implementations in our minimum supported compiler versions. Given this, I think it is too early to refactor the SIMD library around std::experimental::simd.

@halfflat
Contributor

halfflat commented Dec 14, 2021

I certainly like the idea of splitting out the SVE side; it's really incompatible with the rest of the API.

Regarding std::experimental::simd:

  • We could still factor our SIMD library into a component that conforms to the std::experimental::simd interface and an additional component that supports the gather/scatter/constraint semantics, with a view to swapping over to the standard implementation in the future.
  • N4808, §9.7.7 provides <cmath> overloads for SIMD values; we can provide our own implementations with consistent numerics across back-ends under e.g. arb::math, for both SIMD and scalar values. Optimized implementations of these functions, though, use low-level intrinsics rather than just the arithmetic operations provided by std::experimental::simd.
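The "constraint" part of the gather/scatter semantics can be sketched as follows. This is a hypothetical illustration, not Arbor's code: the `index_constraint` enum and `indirect_add` function are assumptions, and scalar loops stand in for the vector instructions a real backend would emit. The point is that what is known about the index vector decides which update strategy is safe:

```cpp
#include <array>
#include <cstddef>

// Hypothetical tags describing what is known about an index vector.
enum class index_constraint {
    none,         // indices may repeat
    independent,  // indices are pairwise distinct
    contiguous,   // idx[i] == idx[0] + i
    constant      // all indices are equal
};

constexpr std::size_t width = 4;
using vec_f64 = std::array<double, width>;
using vec_i32 = std::array<int, width>;

// Add lane i of x into p[idx[i]], picking a strategy from the constraint.
void indirect_add(double* p, const vec_i32& idx, const vec_f64& x,
                  index_constraint c) {
    switch (c) {
    case index_constraint::contiguous: {
        // Plain vector load/add/store starting at p + idx[0].
        double* base = p + idx[0];
        for (std::size_t i = 0; i < width; ++i) base[i] += x[i];
        break;
    }
    case index_constraint::constant: {
        // Reduce the lanes first, then perform a single update.
        double sum = 0.0;
        for (double v : x) sum += v;
        p[idx[0]] += sum;
        break;
    }
    case index_constraint::independent: {
        // Gather, add, scatter: safe because no two lanes share a target.
        vec_f64 g{};
        for (std::size_t i = 0; i < width; ++i) g[i] = p[idx[i]] + x[i];
        for (std::size_t i = 0; i < width; ++i) p[idx[i]] = g[i];
        break;
    }
    case index_constraint::none:
        // Repeated indices are possible, so accumulate one lane at a time.
        for (std::size_t i = 0; i < width; ++i) p[idx[i]] += x[i];
        break;
    }
}
```

The `none` case cannot reuse the gather/add/scatter path: with repeated indices such as `{1, 1, 2, 2}`, the scatter would write each lane's partial sum to the same location and all but one of the updates would be lost.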

For our implementations of functions such as expm1 and exprelr, which rely on decomposing values into mantissa and exponent, we could either implement a set of architecture-specific low-level operations that are then used within our generic implementations, or stick to writing them in terms of standard decomposition functions and arithmetic. The former would allow us to (mostly) maintain performance; the latter could well be slower, but might give us an implementation that is more easily made robust (proper support for subnormal numbers, etc.).
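As an illustration of the second option (standard decomposition functions and arithmetic only), here is a sketch of a scalar `exp`, assuming a Cody-Waite range reduction plus a Taylor polynomial is acceptable; `exp_generic` is a hypothetical name and the polynomial degree is not tuned for speed. A SIMD version would apply the same steps lane-wise:

```cpp
#include <cmath>

// Hypothetical generic exp built only from standard functions and arithmetic.
double exp_generic(double x) {
    if (std::isnan(x)) return x;
    // Outside [-746, 710] the result is 0 or +inf anyway; clamping keeps n small.
    x = std::fmin(std::fmax(x, -746.0), 710.0);

    // ln 2 split into a high part with trailing zero bits plus a correction
    // (Cody-Waite), so n * ln2_hi is exact for the |n| values reached here.
    constexpr double ln2_hi = 6.93147180369123816490e-01;
    constexpr double ln2_lo = 1.90821492927058770002e-10;
    constexpr double log2e  = 1.44269504088896338700e+00;

    // Range reduction: x = n * ln2 + r, with |r| at most about ln2 / 2.
    const double n = std::nearbyint(x * log2e);
    const double r = (x - n * ln2_hi) - n * ln2_lo;

    // exp(r) via a degree-13 Taylor polynomial, evaluated in Horner form.
    double p = 1.0;
    for (int k = 13; k >= 1; --k) p = 1.0 + p * r / k;

    // Reassemble exp(x) = 2^n * exp(r); ldexp handles subnormal and overflow results.
    return std::ldexp(p, static_cast<int>(n));
}
```

Delegating the final scaling to `std::ldexp` is what gives correct subnormal and overflow results without bit manipulation; an intrinsic-based version would typically build 2^n directly in the exponent bits, which is faster but needs extra care when n falls outside the normal exponent range.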

@jan-wassenberg
Copy link

Hi, just happened across this issue while searching. Have you seen https://github.com/google/highway ? It's a C++ wrapper over intrinsics that supports SVE, RISC-V, AVX-512 and others. Would be happy to discuss if you're interested.

@thorstenhater
Contributor

thorstenhater commented Nov 3, 2022

Hi @jan-wassenberg,

thanks for the suggestion. Highway looks pretty interesting, but it's unlikely we'll change our SIMD backend soon without a pressing need (RISC-V might pose such a need in the future).
Just out of curiosity, how does Highway compare to VC2 (https://github.com/vectorclass/version2)?

Just to note our requirements (mostly in terms of performant operations, since performance is the motivator), which apply not only to Highway but to any other choice as well:

  • scatter store/gather load
  • fast approximate mathematical functions: exp, pow, sqrt, log
  • to a lesser degree: sin, cos, ...

@jan-wassenberg

Hi @thorstenhater , got it. Yes, RISC-V looks to be gathering momentum.

how does highway compare to VC2

I very much respect Agner's work, but he has been clear that no instruction sets other than x86 will be supported.

Just to note our requirements

Good to know. We have all of those except pow, and can help add that or other math functions if required. (For pow, it really depends on how much accuracy you want; a simple version can already be built from log and exp.)
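A minimal sketch of the "simple version" of pow built from log and exp, assuming positive bases only (`pow_simple` is a hypothetical name; std::pow's special cases for zero, negative and NaN bases are deliberately not reproduced):

```cpp
#include <cmath>
#include <limits>

// Hypothetical simple pow for positive bases: x^y = exp(y * log(x)).
double pow_simple(double x, double y) {
    // Sketch only: zero, negative and NaN bases are not handled like std::pow.
    if (!(x > 0.0)) return std::numeric_limits<double>::quiet_NaN();
    return std::exp(y * std::log(x));
}
```

The accuracy trade-off is visible in the formula: the rounding error in `y * std::log(x)` is an absolute error of roughly |y·log(x)| × 2^-53, and `exp` turns that into a relative error of about the same size, so results lose precision as |y·log(x)| grows.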


5 participants