Store receiver in any_operation_state to avoid dynamic allocation for receiver #1354

Open · 1 commit to merge into base: main

@msimberg msimberg commented Nov 25, 2024

Complement to #1281. Fixes #845.

Adds another level of indirection so that the concrete receiver type no longer has to be type-erased (and dynamically allocated). Instead, the concrete receiver is stored directly in the operation state type, and a receiver holding only a reference/pointer to a base class is passed to connect. This does not avoid the indirection/virtual function dispatch, but it means that only a pointer-sized receiver has to be passed to connect.
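To illustrate the idea, here is a minimal, self-contained sketch. It is not the code in this PR: all names (`receiver_base`, `receiver_ref`, `sender_base`, `just_int`, `erased_operation_state`, `store_result`) are hypothetical, the only completion channel modelled is a `set_value(int)`, and the inner operation state is still heap-allocated for simplicity. The point it shows is that the outer operation state owns the concrete receiver and the type-erased `connect` only ever sees a pointer-sized handle.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical interface that the type-erased side uses to complete.
struct receiver_base {
    virtual void set_value(int v) noexcept = 0;
protected:
    ~receiver_base() = default;  // never deleted through the base
};

// Pointer-sized receiver passed to connect; forwards via virtual dispatch.
struct receiver_ref {
    receiver_base* base;
    void set_value(int v) noexcept { base->set_value(v); }
};
static_assert(sizeof(receiver_ref) == sizeof(void*), "expected to be pointer-sized");

struct operation_base {
    virtual ~operation_base() = default;
    virtual void start() noexcept = 0;
};

// Type-erased sender: connect is virtual, so the receiver type must be fixed.
struct sender_base {
    virtual ~sender_base() = default;
    virtual std::unique_ptr<operation_base> connect(receiver_ref r) = 0;
};

// Toy sender that completes immediately with a stored value.
struct just_int final : sender_base {
    int value;
    explicit just_int(int v) : value(v) {}

    struct operation final : operation_base {
        receiver_ref rcv;
        int value;
        operation(receiver_ref r, int v) : rcv(r), value(v) {}
        void start() noexcept override { rcv.set_value(value); }
    };

    std::unique_ptr<operation_base> connect(receiver_ref r) override {
        return std::make_unique<operation>(r, value);
    }
};

// Outer operation state: stores the concrete receiver inline (no allocation
// for the receiver) and hands a reference to itself to the erased sender.
template <typename Receiver>
class erased_operation_state final : receiver_base {
    Receiver receiver_;
    std::unique_ptr<operation_base> inner_;

    void set_value(int v) noexcept override { receiver_.set_value(v); }

public:
    erased_operation_state(sender_base& s, Receiver r)
      : receiver_(std::move(r)), inner_(s.connect(receiver_ref{this})) {}

    // The inner operation holds a pointer to *this, so the address must be stable.
    erased_operation_state(erased_operation_state&&) = delete;

    void start() noexcept {
        assert(inner_ != nullptr);
        inner_->start();
    }
};

// Example concrete receiver: writes the result through a pointer.
struct store_result {
    int* out;
    void set_value(int v) noexcept { *out = v; }
};

int run_example() {
    int result = 0;
    just_int s{42};
    erased_operation_state<store_result> op{s, store_result{&result}};
    op.start();
    return result;
}
```

Calling `run_example()` connects the toy sender to `store_result` and returns the value delivered through the two levels of indirection (`receiver_ref` to `receiver_base`, then to the concrete receiver). The virtual call is still there; what goes away is the need to allocate storage for a type-erased copy of the receiver.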

@msimberg msimberg self-assigned this Nov 25, 2024

codacy-production bot commented Nov 25, 2024

Coverage summary from Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| +0.02% (target: -1.00%) | 86.76% (target: 90.00%) |

Coverage variation details
| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (015302c) | 18217 | 13754 | 75.50% |
| Head commit (5b2d02b) | 18213 (-4) | 13755 (+1) | 75.52% (+0.02%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#1354) | 68 | 59 | 86.76% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

@msimberg msimberg force-pushed the any-receiver-no-alloc branch 4 times, most recently from a47c601 to 191c235 Compare November 25, 2024 11:11
@msimberg

I ran some benchmarks with DLA-Future on todi, and this seems like a win. In most cases the difference is negligible or within noise: the GPU runs show practically no difference, CPU runs with big (512) block sizes barely show one, and most algorithms are unaffected. However, in the very best case this gives up to a 10% speedup. Below is the tridiagonal solver with a block size of 128 and a 40k input matrix:
[Plot: trid_evp_strong_time_40960_128]
The 128 blocksize case is still significantly slower than the 512 blocksize, but it's a small step in the right direction. Note: In the above plot I've also benchmarked main...any-sender-sbo, which is this PR and #1281 combined. In this case that seems to improve things a bit more.

@msimberg

@albestro, @rasolca, @RMeli I'm not expecting a review from you, but requested one just in case you're interested in looking. I feel like I should be able to remove one further level of indirection in the receiver, but have not been able to figure out how (if it's possible). I'm happy with the state of this already, but in case you spot something that looks suspicious or could be improved, do let me know.

@msimberg msimberg marked this pull request as ready for review November 26, 2024 16:47
Projects
Status: In Progress

Successfully merging this pull request may close these issues.

Update any_receiver to not require heap allocation