Polynomial Evaluation #415
JonTidswell
started this conversation in
Ideas
Replies: 2 comments 9 replies
-
This is still an active research topic, see for example Polynomial Evaluation on Superscalar Architecture, Applied to the Elementary Function ex (https://dl.acm.org/doi/abs/10.1145/3408893). |
Beta Was this translation helpful? Give feedback.
2 replies
-
Hello, Oh, are you one of the authors of this book? |
Beta Was this translation helpful? Give feedback.
7 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Sleef currently uses a full Estrin method.
Estrin often has lower latency than Horner evaluation, but it sacrifices accuracy and throughput.
Unfortunately, the gains in latency are effectively limited by the ability to exploit the hardware pipeline parallelism, and expanding the polynomial evaluation tree completely can be slower than a partial expansion.
The pipeline is very processor specific, so it would be a large amount of work to cover all the possibilities correctly, so presuming that a full Estrin is justified is a reasonable default position.
However, when there are multiple polynomials that can be evaluated concurrently, then the default position is sub-optimal. Three obvious cases where this occurs are:
When unrolling loops, it is better to fill the pipeline with separate Horner evaluations, and not use Estrin.
When evaluating rational polynomials, the pipeline needs to be shared between the polynomials.
Note: Division exacerbates numerical errors, so instead of using a formula like
p1(x)/p2(x)
its better to use a formula like
p0(x) + p1(x)/p2(x)
wherep0(x)
is an accurate enough approximation that the errors in the division are below rounding threshold (typically this meansp0(x)
is evaluated in extended precision).When the input domain is partitioned it may be faster to evaluate multiple partitions in parallel and then to use conditional assignment, rather than use conditional branches.
I have had a first attempt (evalpoly branch) at extending estrin.h to support Horner in order to support a discussion, but I don't assume my approach is suitable.
I also suspect it could be extended to support extended precision in the first couple of terms to avoid the common pattern of a POLY(...) call followed by some instrincs to complete the polynomial in extended precision.
Beta Was this translation helpful? Give feedback.
All reactions