UCP/PROTO: Consider RNDV_PERF_DIFF #10292

ivankochin · 2024-11-12T06:55:41Z

What?

Applies the effect of the UCX_RNDV_PERF_DIFF setting to protov2.

Why?

The effect of this setting was lost when the perf-factors logic was introduced. This PR brings it back.

This is how it looks with the patch:

ivankochin · 2024-11-12T06:58:19Z

@yosefe @brminich currently the patch improves the estimated performance of all RNDV protocols, but maybe it is worth applying this change to RNDV get/put only. There can be cases where RMA is unsupported for some reason, and for large sizes the comparison would then be between eager and AM-based RNDV, where the estimated BW of the latter would be slightly higher due to RNDV_PERF_DIFF. WDYT?

brminich · 2024-11-12T07:58:27Z

I’d leave it for all protocols for simplicity. Additionally, we have PPLN protocols, and even AM-based RNDV may be preferable to eager due to its lower memory consumption.

ivankochin · 2024-11-12T08:47:10Z

I thought AM-based RNDV and PPLN protocols consume the same amount of memory as eager, don't they? Do AM-based protocols deliver the message directly into the user-provided buffer?

brminich · 2024-11-12T08:54:09Z

With the tag-matching API, eager messages are queued on the receiver until the receive operation is invoked. Such eager fragments may consume a significant amount of memory on the receiver. With RNDV, data transfer starts only when the receiver is ready.
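A toy model of the point above, assuming a number of unexpected messages arrive before any receive is posted. The function names and the idea of a fixed-size RNDV request header are illustrative assumptions, not UCX internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Eager: the full payload of every unexpected message is buffered on the
 * receiver until a matching receive is posted. */
static size_t eager_unexpected_bytes(size_t num_msgs, size_t msg_size)
{
    /* Guard against overflow in the multiplication below. */
    assert((msg_size == 0) || (num_msgs <= SIZE_MAX / msg_size));
    return num_msgs * msg_size;
}

/* RNDV: only a small request header is kept per unexpected message; the
 * payload moves once the receiver is ready. header_size is a hypothetical
 * parameter, not a UCX constant. */
static size_t rndv_unexpected_bytes(size_t num_msgs, size_t header_size)
{
    assert((header_size == 0) || (num_msgs <= SIZE_MAX / header_size));
    return num_msgs * header_size;
}
```

In this model the eager footprint grows with the payload size, while the RNDV footprint depends only on the number of pending messages.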

yosefe · 2024-11-18T11:53:19Z

src/ucp/proto/proto_perf.c

@@ -430,6 +430,27 @@ ucs_status_t ucp_proto_perf_aggregate2(const char *name,
    return ucp_proto_perf_aggregate(name, perf_elems, 2, perf_p);
 }

+void ucp_proto_perf_apply_bias(ucp_proto_perf_t *perf, double bias) {


yosefe · 2024-11-18T11:53:32Z

src/ucp/proto/proto_perf.c

+    ucp_proto_perf_factor_id_t fid;
+    ucp_proto_perf_segment_t *seg;
+
+    if (bias == 0) {


compare with some epsilon value?
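A minimal sketch of the epsilon comparison suggested here, in standalone C. The helper name `bias_is_negligible` and the threshold `BIAS_EPSILON` are hypothetical, chosen for illustration; they are not taken from the patch or from UCX.

```c
#include <assert.h>

/* Hypothetical threshold: biases smaller than this are treated as zero. */
#define BIAS_EPSILON 1e-9

/* Returns nonzero when the bias is too small to have a meaningful effect.
 * Comparing against an epsilon avoids relying on exact floating-point
 * equality, which can fail for values produced by arithmetic such as
 * percent / 100.0. */
static int bias_is_negligible(double bias)
{
    assert(bias == bias); /* NaN compares unequal to itself */
    return (bias < BIAS_EPSILON) && (bias > -BIAS_EPSILON);
}
```

A caller could then return early with `if (bias_is_negligible(bias)) { return; }` instead of testing `bias == 0`.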

yosefe · 2024-11-18T11:55:06Z

src/ucp/proto/proto_perf.c

+    ucp_proto_perf_segment_foreach(seg, perf) {
+        for (fid = 0; fid < UCP_PROTO_PERF_FACTOR_LAST; ++fid) {
+            seg->perf_factors[fid] = ucs_linear_func_compose(
+                    bias_func, seg->perf_factors[fid]);
+            ucp_proto_perf_node_update_factors(seg->node, seg->perf_factors);
+        }
+        bias_node = ucp_proto_perf_node_new_data("bias", "%.2f %%", bias);
+        ucp_proto_perf_node_own_child(seg->node, &bias_node);
+    }


can we implement this using ucp_proto_perf_add_funcs?

ivankochin · 2024-11-19

I don't think this would be more concise or readable, since ucp_proto_perf_add_funcs is designed to sum one set of factors with another over some range, so:

We would need to somehow replace the sum logic with multiplication.

We would still need to do that for each segment, since applying it from 0 to SIZE_MAX could expand the perf layout.
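The multiplication argument can be illustrated with a small standalone sketch. The `linear_func_t` type, `linear_func_compose`, `apply_bias`, and the choice of `(1 - bias)` as the scaling factor are assumptions made for illustration; they mimic the idea behind the composition in the diff above, but they are not the UCX helpers or the code from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linear function f(x) = c + m * x, for example estimated
 * time as a function of message size. */
typedef struct {
    double c; /* constant term */
    double m; /* slope */
} linear_func_t;

/* Returns outer(inner(x)) expressed as a single linear function. */
static linear_func_t linear_func_compose(linear_func_t outer,
                                         linear_func_t inner)
{
    linear_func_t result;

    result.c = outer.c + (outer.m * inner.c);
    result.m = outer.m * inner.m;
    return result;
}

/* Scales every factor in place by (1 - bias). Assumption: bias is a
 * fraction, and a positive bias makes the protocol look faster by
 * shrinking its estimated time. */
static void apply_bias(linear_func_t *factors, size_t num_factors,
                       double bias)
{
    linear_func_t bias_func = {0.0, 1.0 - bias};
    size_t i;

    assert(bias < 1.0); /* 100% or more would zero or negate the time */
    for (i = 0; i < num_factors; ++i) {
        factors[i] = linear_func_compose(bias_func, factors[i]);
    }
}
```

Because the bias rescales each existing function rather than adding a new function over a range, running this once per segment leaves the segment boundaries unchanged.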

UCP/PROTO: Consider RNDV_PERF_DIFF

7092087

ivankochin self-assigned this Nov 12, 2024

ivankochin closed this Nov 12, 2024

ivankochin reopened this Nov 12, 2024

ivankochin requested a review from yosefe November 12, 2024 09:22

yosefe reviewed Nov 18, 2024

UCP/PROTO: Fix codestyle + compare bias with epsilon

f24c29d
