
Repair responses always saturate the MTU limit #25818

Closed
CherryWorm opened this issue Jun 7, 2022 · 11 comments
Labels
stale [bot only] Added to stale content; results in auto-close after a week.

Comments

@CherryWorm
Contributor

CherryWorm commented Jun 7, 2022

Problem

When repair requests are serviced, shreds are pulled straight from the blockstore, a nonce is appended, and the result is sent back to the requester. However, the data returned from the blockstore is padded with a bunch of 0-bytes.

Repair is a non-trivial part of Solana network traffic. Handling this more efficiently and not sending the 0-byte padding should reduce overall network overhead by quite a bit (about 3-4x in my small experiments).

Proposed Solution

Don't send the 0-padding in repair responses.

@steviez
Contributor

steviez commented Jun 7, 2022

Good catch; the relevant snippet where we form repair packets is:

pub fn repair_response_packet(
    blockstore: &Blockstore,
    slot: Slot,
    shred_index: u64,
    dest: &SocketAddr,
    nonce: Nonce,
) -> Option<Packet> {
    let shred = blockstore
        .get_data_shred(slot, shred_index)
        .expect("Blockstore could not get data shred");
    shred
        .map(|shred| repair_response_packet_from_bytes(shred, dest, nonce))
        .unwrap_or(None)
}

As you called out, the shred bytestreams are 0-padded when pulled from the blockstore. When we create the packet, we use the length of the bytestream, not the length of the shred payload.

pub fn repair_response_packet_from_bytes(
    bytes: Vec<u8>,
    dest: &SocketAddr,
    nonce: Nonce,
) -> Option<Packet> {
    let mut packet = Packet::default();
    let size = bytes.len() + SIZE_OF_NONCE;
    if size > packet.buffer_mut().len() {
        return None;
    }
    packet.meta.size = size;

So, at first glance, simply adding a blockstore retrieval function that returns unpadded bytestreams seems like it would do the trick.
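
For illustration only, such a retrieval helper might look like the sketch below; get_data_shred_unpadded and data_size_from_header are hypothetical names, with the latter standing in for whatever logic reads the true payload length out of the shred header.

// Hypothetical sketch, not an existing API: truncate the stored
// bytestream to the real payload length before it hits the wire.
pub fn get_data_shred_unpadded(
    blockstore: &Blockstore,
    slot: Slot,
    shred_index: u64,
) -> Option<Vec<u8>> {
    let mut bytes = blockstore
        .get_data_shred(slot, shred_index)
        .expect("Blockstore could not get data shred")?;
    // data_size_from_header is a stand-in for parsing the shred header.
    let data_size = data_size_from_header(&bytes)?;
    bytes.truncate(data_size);
    Some(bytes)
}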

@behzadnouri
Contributor

The zero padding is part of the signed message, so the receiving node has to:

  • resize the payload to MTU and move the nonce to the end of the payload.
  • zero out the bytes where the nonce was previously located.
  • verify the signature on everything except the nonce at the end of the payload.

That being said, current shreds are labeled as legacy, and we are in the process of moving to merkle variants: #25237, which have a different layout and signed message. So I wouldn't suggest putting effort into the above until merkle variants are fully rolled out.

It is, though, a good point to consider when designing the layout for merkle variants, so that the above becomes easier.
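
For concreteness, here is a minimal sketch of those three receiver-side steps, using illustrative sizes rather than the exact codebase constants:

const SHRED_PAYLOAD_SIZE: usize = 1228; // assumed full (padded) shred size
const SIZE_OF_NONCE: usize = 4;

// Sketch only: rebuild the zero padding from an unpadded repair response
// of the form [ unpadded shred | nonce ] so the signature can be checked.
fn restore_padding(mut packet: Vec<u8>) -> Option<Vec<u8>> {
    if packet.len() < SIZE_OF_NONCE || packet.len() > SHRED_PAYLOAD_SIZE + SIZE_OF_NONCE {
        return None;
    }
    let nonce_start = packet.len() - SIZE_OF_NONCE;
    let mut nonce = [0u8; SIZE_OF_NONCE];
    nonce.copy_from_slice(&packet[nonce_start..]);
    // Zero out the bytes where the nonce was located...
    packet[nonce_start..].fill(0);
    // ...resize the payload back to the full (MTU-sized) shred...
    packet.resize(SHRED_PAYLOAD_SIZE, 0);
    // ...and move the nonce to the end of the payload. The signature is
    // then verified over packet[..SHRED_PAYLOAD_SIZE], i.e. everything
    // except the trailing nonce (verification itself elided here).
    packet.extend_from_slice(&nonce);
    Some(packet)
}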

@t-nelson
Contributor

t-nelson commented Jul 6, 2022

can someone make a call as to whether we're going to fix this or let it ride 'til the new shred format is live?

@behzadnouri
Contributor

can someone make a call as to whether we're going to fix this or let it ride 'til the new shred format is live?

I lean towards not making shred processing any more convoluted until we have fully migrated to merkle variants.
Their binary layout is different, so this would add extra if branches based on byte introspection.

Let's keep the issue open though, because longer term we do want to address this if it has significant impact.

@steviez
Contributor

steviez commented Nov 7, 2022

Just ran solana-ledger-tool analyze-storage on some recent blocks:

{
  "column": "ShredData",
  "entries": 237506364,
  "key_stats": {
    "max": 16,
    "total_bytes": 3800101824
  },
  "row_stats": {
    "max": 1156,
    "min": 104,
    "p50": 1156,
    "p90": 1156,
    "p99": 1156,
    "p999": 1156,
    "stddev": 296,
    "total_bytes": 242635907054
  },
  "val_stats": {
    "max": 1140,
    "min": 88,
    "p50": 1140,
    "p90": 1140,
    "p99": 1140,
    "p999": 1140,
    "stddev": 296,
    "total_bytes": 238835805230
  }
}

Row refers to the kv pair whereas val refers to just v; k is 16 bytes (a u64 for slot and a u64 for shred_id). So

mean bytes_per_shred = total_bytes_from_val_stats / entries
238835805230 / 237506364 = ~1006 bytes / shred

And then

1006 bytes_per_shred / 1228 bytes_per_packet (accounting for nonce) ~= 82%, i.e. an 18% reduction

Need to double-check this, but it seems like such a significant drop in network bandwidth could warrant the additional overhead of the process that @behzadnouri mentioned above. Not a deep dive for now, just a quick comment to get this back on the radar.
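
As a quick sanity check of the arithmetic (values copied from the analyze-storage output above; the 1228-byte packet figure is taken from the comment as given, not re-derived):

fn main() {
    let total_val_bytes = 238_835_805_230_u64 as f64;
    let entries = 237_506_364_u64 as f64;
    let bytes_per_shred = total_val_bytes / entries; // ~1005.6
    let bytes_per_packet = 1228.0; // per the comment, accounting for nonce
    let ratio = bytes_per_shred / bytes_per_packet; // ~0.82
    println!(
        "{:.0} bytes/shred, {:.0}% of packet, {:.0}% reduction",
        bytes_per_shred,
        100.0 * ratio,
        100.0 * (1.0 - ratio)
    );
}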

@behzadnouri
Contributor

An alternative way to address this issue is to optimize the coalescing/buffering of entries here:
https://github.com/solana-labs/solana/blob/628ac0d4e/core/src/broadcast_stage/broadcast_utils.rs#L34-L98
before [Entry] is converted to shreds.
The larger [Entry] is, the smaller the zero-padded buffer per shred (i.e. the padding is amortized).
The trade-off is obviously the added latency between when entries are generated and when shreds are generated.
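
A toy sketch of that buffering idea, with made-up types and thresholds (the real code linked above works on [Entry] batches; here serialized entries are just byte vectors):

use std::time::{Duration, Instant};

// Illustrative only: batch entries until a size threshold or deadline so
// the per-shred zero padding is amortized over a larger serialized batch.
struct EntryCoalescer {
    buffer: Vec<Vec<u8>>, // stand-in for serialized Entry values
    buffered_bytes: usize,
    started: Option<Instant>,
    max_bytes: usize,   // flush once the batch is this large ...
    max_wait: Duration, // ... or this old (the latency trade-off)
}

impl EntryCoalescer {
    fn push(&mut self, entry: Vec<u8>) -> Option<Vec<Vec<u8>>> {
        self.started.get_or_insert_with(Instant::now);
        self.buffered_bytes += entry.len();
        self.buffer.push(entry);
        let expired = self.started.map_or(false, |t| t.elapsed() >= self.max_wait);
        if self.buffered_bytes >= self.max_bytes || expired {
            self.buffered_bytes = 0;
            self.started = None;
            Some(std::mem::take(&mut self.buffer)) // this batch is shredded together
        } else {
            None
        }
    }
}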

@CherryWorm
Contributor Author

Friendly bump, did this get implemented yet?

@behzadnouri
Contributor

Friendly bump, did this get implemented yet?

No; once merkle shreds are rolled out, we can look into the impact of removing the zero padding. It will require a slice shift at the receiving end, though, because Reed-Solomon erasure coding needs a contiguous buffer, which forces us to put the Merkle proof after the zero padding rather than before the data buffer.
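
Roughly, that slice shift might look like the sketch below, assuming a [ data | zero padding | merkle proof ] layout and illustrative rather than real sizes:

// Sketch only: if the zero padding is not sent, the receiver gets
// [ data | proof ] and must shift the proof right to rebuild the
// contiguous [ data | padding | proof ] buffer Reed-Solomon expects.
fn shift_proof_back(
    mut shred: Vec<u8>,
    data_len: usize,  // bytes of real data actually received
    full_len: usize,  // data-plus-padding length expected by erasure coding
    proof_len: usize, // trailing merkle proof size
) -> Option<Vec<u8>> {
    if shred.len() != data_len + proof_len || full_len < data_len {
        return None;
    }
    shred.resize(full_len + proof_len, 0);
    // Move the proof from right after the data to right after the padding;
    // copy_within handles the potentially overlapping ranges.
    shred.copy_within(data_len..data_len + proof_len, full_len);
    // Re-zero any old proof bytes that now fall inside the padding region.
    shred[data_len..full_len.min(data_len + proof_len)].fill(0);
    Some(shred)
}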

@github-actions github-actions bot added the stale label Jun 10, 2024
@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 17, 2024
@nyetwurk

See #25818 (comment); this should not have been closed.

@steviez
Contributor

steviez commented Jul 1, 2024

I'll reopen for now, will let @behzadnouri chime in as he is more up to date on the latest shred happenings.

EDIT: Lol, nevermind; github-actions said no

@steviez steviez reopened this Jul 1, 2024

github-actions bot commented Jul 1, 2024

This repository is no longer in use. Please re-open this issue in the agave repo: https://github.com/anza-xyz/agave

@github-actions github-actions bot closed this as completed Jul 1, 2024