Store receiver in `any_operation_state` to avoid dynamic allocation for receiver #1354

msimberg · 2024-11-25T10:06:52Z

Complement to #1281. Fixes #845.

Adds a level of indirection so that the concrete receiver type no longer has to be type-erased. The receiver is instead stored directly in the operation state type, and a reference/pointer to a base class is passed to `connect`. This does not remove the indirection or the virtual function dispatch, but `connect` now only needs to be passed a pointer-sized receiver.
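To illustrate the idea, here is a minimal sketch, not the code from this PR. All names (`receiver_base`, `receiver_ref`, `erased_sender`, `erased_operation`, `just_int`, `any_operation_state_sketch`, `store_result`, `demo`) are hypothetical, and the completion signature is reduced to a single `set_value(int)` for brevity. The inner operation is still heap-allocated here; that is incidental to the sketch and is not what this PR is about.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical abstract completion interface implemented by the operation state.
struct receiver_base {
    virtual void set_value(int value) = 0;

protected:
    ~receiver_base() = default;  // never destroyed through a base pointer
};

// The only receiver the type-erased sender ever sees: a single pointer.
struct receiver_ref {
    receiver_base* target;
    void set_value(int value) const { target->set_value(value); }
};
static_assert(sizeof(receiver_ref) == sizeof(void*), "receiver_ref is pointer-sized");

// Toy stand-ins for a type-erased sender and its operation state.
struct erased_operation {
    virtual void start() = 0;
    virtual ~erased_operation() = default;
};

struct erased_sender {
    // connect can be virtual because the receiver type is fixed.
    virtual std::unique_ptr<erased_operation> connect(receiver_ref r) = 0;
    virtual ~erased_sender() = default;
};

// A concrete sender that completes with a stored int.
struct just_int final : erased_sender {
    explicit just_int(int v) : value(v) {}

    std::unique_ptr<erased_operation> connect(receiver_ref r) override {
        struct operation final : erased_operation {
            operation(int v, receiver_ref ref) : value(v), receiver(ref) {}
            void start() override { receiver.set_value(value); }
            int value;
            receiver_ref receiver;
        };
        return std::make_unique<operation>(value, r);
    }

    int value;
};

// The concrete receiver lives inside the operation state, so it needs neither
// type erasure nor its own allocation. Completion still goes through one
// virtual call (receiver_base::set_value).
template <typename Receiver>
class any_operation_state_sketch final : private receiver_base {
public:
    any_operation_state_sketch(erased_sender& sender, Receiver receiver)
      : receiver_(std::move(receiver)), op_(sender.connect(receiver_ref{this})) {}

    // The inner operation holds a pointer to *this, so the state must not move.
    any_operation_state_sketch(any_operation_state_sketch&&) = delete;

    void start() { op_->start(); }

private:
    void set_value(int value) override { std::move(receiver_).set_value(value); }

    Receiver receiver_;
    std::unique_ptr<erased_operation> op_;
};

// Example receiver that writes the result through a pointer.
struct store_result {
    int* out;
    void set_value(int value) && { *out = value; }
};

int demo() {
    int result = 0;
    just_int sender{42};
    any_operation_state_sketch<store_result> op{sender, store_result{&result}};
    assert(result == 0);  // nothing has run before start()
    op.start();
    return result;
}
```

In this sketch the operation state is deliberately non-movable, because the inner operation keeps a pointer back into it; the trade-off is one extra virtual call on completion in exchange for not allocating storage for the receiver.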

codacy-production · 2024-11-25T10:08:45Z

Coverage summary from Codacy


| Coverage variation | Diff coverage |
| --- | --- |
| ✅ +0.02% (target: -1.00%) | ❌ 86.76% (target: 90.00%) |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (`015302c`) | 18217 | 13754 | 75.50% |
| Head commit (`5b2d02b`) | 18213 (-4) | 13755 (+1) | 75.52% (+0.02%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#1354) | 68 | 59 | 86.76% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


Codacy stopped sending the deprecated coverage status on June 5th, 2024.


msimberg · 2024-11-26T16:45:42Z

I ran some benchmarks with DLA-Future on todi, and this seems like a win. In most cases the difference is negligible or within noise. In particular, the GPU runs show little difference, and CPU runs with big (512) block sizes also barely show a difference. Most algorithms are likewise unaffected. However, in the very best case this gives up to a 10% speedup. Below is the tridiagonal solver with a block size of 128 and a 40k input matrix:

The 128 block size case is still significantly slower than the 512 block size case, but it's a small step in the right direction. Note: in the above plot I've also benchmarked `main...any-sender-sbo`, which is this PR and #1281 combined. In this case that seems to improve things a bit more.

msimberg · 2024-11-26T16:47:47Z

@albestro, @rasolca, @RMeli I'm not expecting a review from you, but requested one just in case you're interested in looking. I feel like I should be able to remove one further level of indirection in the receiver, but have not been able to figure out how (if it's possible). I'm happy with the state of this already, but in case you spot something that looks suspicious or could be improved, do let me know.

msimberg self-assigned this Nov 25, 2024

msimberg force-pushed the any-receiver-no-alloc branch 4 times, most recently from a47c601 to 191c235 on November 25, 2024 11:11

Store receiver in any_operation_state to avoid dynamic allocation for…

5b2d02b

… receiver

msimberg force-pushed the any-receiver-no-alloc branch from 191c235 to 5b2d02b on November 25, 2024 12:24

msimberg requested review from rasolca, albestro and RMeli November 26, 2024 16:45

msimberg marked this pull request as ready for review November 26, 2024 16:47

msimberg requested review from aurianer and biddisco as code owners November 26, 2024 16:47
