Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/PROTO: Consider RNDV_PERF_DIFF #10292

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ivankochin
Copy link
Contributor

@ivankochin ivankochin commented Nov 12, 2024

What?

Applies UCX_RNDV_PERF_DIFF setting effect for protov2.

Why?

This control effect was removed during introducing perf-factors logic. This PR brings it back.

That how it looks like with that patch:
image

@ivankochin
Copy link
Contributor Author

@yosefe @brminich currently patch improves all RNDV protocols, but maybe it worth to think about applying this change to RNDV-get/put only since there can be cases when RMA is unsupported for some reason and on big sizes there would be comparison between Eager and RNDV through AM and BW of latter would be slightly higher due to RNDV_PERF_DIFF. WDYT?

@ivankochin ivankochin self-assigned this Nov 12, 2024
@brminich
Copy link
Contributor

@yosefe @brminich currently patch improves all RNDV protocols, but maybe it worth to think about applying this change to RNDV-get/put only since there can be cases when RMA is unsupported for some reason and on big sizes there would be comparison between Eager and RNDV through AM and BW of latter would be slightly higher due to RNDV_PERF_DIFF. WDYT?

I’d leave it for all protocols for simplicity. Additionally, we have PPLN protocols, and even AM-based RNDV may be preferable to eager due to its lower memory consumption.

@ivankochin
Copy link
Contributor Author

@yosefe @brminich currently patch improves all RNDV protocols, but maybe it worth to think about applying this change to RNDV-get/put only since there can be cases when RMA is unsupported for some reason and on big sizes there would be comparison between Eager and RNDV through AM and BW of latter would be slightly higher due to RNDV_PERF_DIFF. WDYT?

I’d leave it for all protocols for simplicity. Additionally, we have PPLN protocols, and even AM-based RNDV may be preferable to eager due to its lower memory consumption.

I thought AM-based RNDV and PPLN protocols consume same amount of memory as eager, aren't they? Do AM-based protocols send message directly to user-provided buffer?

@brminich
Copy link
Contributor

@yosefe @brminich currently patch improves all RNDV protocols, but maybe it worth to think about applying this change to RNDV-get/put only since there can be cases when RMA is unsupported for some reason and on big sizes there would be comparison between Eager and RNDV through AM and BW of latter would be slightly higher due to RNDV_PERF_DIFF. WDYT?

I’d leave it for all protocols for simplicity. Additionally, we have PPLN protocols, and even AM-based RNDV may be preferable to eager due to its lower memory consumption.

I thought AM-based RNDV and PPLN protocols consume same amount of memory as eager, aren't they? Do AM-based protocols send message directly to user-provided buffer?

In case of tag matching API eager messages are queued on the receiver until receive operation is invoked. Such eager fragments may consume significant amount of memory on the receiver. With RNDV data transfer is started only when receiver is ready

@@ -430,6 +430,27 @@ ucs_status_t ucp_proto_perf_aggregate2(const char *name,
return ucp_proto_perf_aggregate(name, perf_elems, 2, perf_p);
}

void ucp_proto_perf_apply_bias(ucp_proto_perf_t *perf, double bias) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code style

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ucp_proto_perf_factor_id_t fid;
ucp_proto_perf_segment_t *seg;

if (bias == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compare with some epsilon value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +443 to +451
ucp_proto_perf_segment_foreach(seg, perf) {
for (fid = 0; fid < UCP_PROTO_PERF_FACTOR_LAST; ++fid) {
seg->perf_factors[fid] = ucs_linear_func_compose(
bias_func, seg->perf_factors[fid]);
ucp_proto_perf_node_update_factors(seg->node, seg->perf_factors);
}
bias_node = ucp_proto_perf_node_new_data("bias", "%.2f %%", bias);
ucp_proto_perf_node_own_child(seg->node, &bias_node);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we implement this using ucp_proto_perf_add_funcs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think that this would be more laconic/readable since ucp_proto_perf_add_funcs is designed to sum up one factors with another ones on some range, so:

  1. We need to sumehow replace sum logic by multiplication
  2. We still need to do that for each segment since doing it from 0 to SIZE_MAX can expand perf layout

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants