-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
README improvement #125
README improvement #125
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks great!
|
||
function axpy_polyester!(y, a, x) | ||
@batch for i in eachindex(y,x) | ||
y[i] = a * x[i] + y[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why no muladd
?
Because Julia takes a strong stance against unexpectedly giving people better performance and accuracy, I'm pro-normalizing-muladd
.
Zen CPUs probably don't muladd
much faster than separate mul
and add
, because they have 2 adders and 2 multiply units that can also fma.
For the muladd
version on Zen, the adders are idle, so throughput is the same w/ respect to the backend (but fewer instructions and uops with muladd
is friendlier to the frontend).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not strongly opposed to using muladd
, I just don't think it matters here. The purpose of this example is to demonstrate how to use @batch
. Optimizing the axpy kernel itself is an orthogonal issue IMO and only a distraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. FWIW, even on my intel system, muladd
wasn't faster. axpy
isn't compute bound.
|
||
## Important Notes | ||
|
||
* `Polyester.@batch` moves arrays to threads by turning them into [StrideArraysCore.PtrArray](https://github.com/JuliaSIMD/StrideArraysCore.jl)s. This means that under an `@batch` slices will create `view`s by default(!). You may want to start Julia with `--check-bounds=yes` while debugging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No automatic @inbounds
as of StrideArraysCore.jl
0.5
but using --check-bounds=yes
is good advice, anyway.
Especially because @batch
adds @inbounds
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand.
- Will arrays this be turned into
PtrArray
? - Will slices still default to
view
?
I'll add a note about @batch
automatically adding @inbounds
.
(I have to say I'm not a fan of adding automatic @inbounds
and defaulting to views
inside of @batch
. For me, the latter is about low-overhead threading. The former two are optimizations that a user can add themself if they want to. But it's your package.)
What do you have in mind? |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #125 +/- ##
=======================================
Coverage 91.13% 91.13%
=======================================
Files 3 3
Lines 451 451
=======================================
Hits 411 411
Misses 40 40 ☔ View full report in Codecov by Sentry. |
Apart from what I said above - I'd personally drop auto |
Thanks! I don't really want to change |
See #124.
I've tried to clarify the README and have updated/rerun the examples on Julia 1.9 on a linux machine with 8 threads.
The sections about
threadlocal
and disabling polyester threads have not gotten much attention (but benchmarks in the latter have been rerun as well).(I think the package needs an overhaul but this is beyond this PR which just tries to communicate the status quo a bit better.)