-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(p2p): ensure time synchronization in the network #2255
Conversation
b675759
to
afe2212
Compare
Signed-off-by: onur-ozkan <[email protected]>
afe2212
to
4343c0e
Compare
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
f9dd830
to
5acfe8d
Compare
Signed-off-by: onur-ozkan <[email protected]>
da8335c
to
82ae6d2
Compare
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
/// TODO: This should be called `PeerInfoRequest` instead. However, renaming it | ||
/// will introduce a breaking change in the network and is not worth it. Do this | ||
/// renaming when there is already a breaking change in the release. | ||
NetworkInfo(network_info::NetworkInfoRequest), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR already leads to breaking change. It's perfect time to rename this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to avoid breaking changes is to have a timeout on the request for peer timestamp and allow them to connect if there is no response from the peer. We later remove this when most peers update. Another way is to get the mm2 version and do the timestamp request depending on the version. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to avoid breaking changes is to have a timeout on the request for peer timestamp and allow them to connect if there is no response from the peer. We later remove this when most peers update.
This sounds like a good plan!
Another way is to get the mm2 version and do the timestamp request depending on the version. What do you think?
I think the first idea was better, but then I need to revert the peer-info renaming part. What about bundling these changes in some branch, and then releasing multiple breaking changes at once (like removing mm2 binaries and some other stuff) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about bundling these changes in some branch, and then releasing multiple breaking changes at once (like removing mm2 binaries and some other stuff) ?
I am fine with this or the first approach.
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
5ba1ac5
to
e86a166
Compare
Signed-off-by: onur-ozkan <[email protected]>
e86a166
to
c1a6eaf
Compare
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
… into time-synced-network
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I have a few suggestion in addition to these 2 comments #2255 (comment), #2255 (comment)
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
/// Determines if a dial attempt to the remote should be made. | ||
/// | ||
/// Returns `false` if a dial attempt to the given address has already been made, | ||
/// in which case the caller must skip the dial attempt. | ||
fn pre_dial_check(recently_dialed_peers: &mut MutexGuard<TimedMap<StdClock, Multiaddr, ()>>, addr: &Multiaddr) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither the doc comment nor the func name signify how this function changes the timedmap (only visible from the mutability of the passed map). Let's change this to something like check_and_mark_dialed
or something.
Also, no need for the MutexGuard
annotation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither the doc comment nor the func name signify how this function changes the timedmap (only visible from the mutability of the passed map)
"only visible from the mutability of the passed map" that's visible everywhere, both in the function signature and on the caller side!
I struggle to see how check_and_mark_dialed
describes the function better than pre_dial_check
but whatever, I am okay with it.
Also, no need for the MutexGuard annotation here.
It gives/documents the background context how this map is utilized from caller.
Signed-off-by: onur-ozkan <[email protected]>
… into time-synced-network
Do we have any blocker for this PR? @shamardy |
Signed-off-by: onur-ozkan <[email protected]>
Signed-off-by: onur-ozkan <[email protected]>
Doesn't cause the breakage anymore. |
… into time-synced-network
Resolved conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking comments inline (just to not block this further).
LGTM
debug!( | ||
"Peer '{peer}' is within the acceptable time gap ({MAX_TIME_GAP_FOR_CONNECTED_PEER} seconds); time difference is {diff} seconds." | ||
); | ||
response_tx.send(None).await.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this send in redundant. what about not sending anything in here, as the other end of the end of the channel doesn't really use the None
for anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't send anything, it will block the request/process until timeout. The channel plays a request-response style communication here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please take a second look at this. the response_tx
is for the timestamp checker and not a response to some node or something (i didn't really get what u mean).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thought was awaiting on result of this channel
error!("Unexpected response `{other:?}` from peer `{peer}`"); | ||
// TODO: Ideally, we should send `Some(peer)` to end the connection, | ||
// but we don't want to cause a breaking change yet. | ||
response_tx.send(None).await.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for this None
@@ -733,11 +818,27 @@ fn start_gossipsub( | |||
} | |||
} | |||
|
|||
while let Poll::Ready(Some(Some(peer_id))) = timestamp_rx.poll_next_unpin(cx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we can make this channel transmit PeerID
s instead of Option<PeerID>
s.
BTW I tried to run tests in the mm2_p2p module with command |
That's weird lol. The crate path can't be resolved when targeting |
cargo test -p mm2_p2p --features "application" is needed |
I am aware, it's fixed in #2302 already |
Implementation Details
Once a connection is established with a peer, validation check is immediately performed to confirm the peer's clock. If the peer's time exceeds the maximum allowed cap, the connection is terminated. This approach ensures that all peers performing operations in the network are synchronized in terms of time. Current time cap is set to 30 seconds, which can be changed if it doesn't fit the requirements.
Each peer is temporarily added to the
RECENTLY_DIALED_PEERS
map for 5 minutes before any connection attempt. This prevents repeated connection attempts to peers that are unavailable or out-of-sync, which reduces unnecessary reconnection overheads.Breaking Changes1. A new request-response payload (GetPeerUtcTimestamp
) has been added. This request is sent at each connection attempt, which makes it mandatory for all seed nodes and GUI applications in the network to be updated to this version. Meaning, this version is incompatible with any older version.2. TheNetworkInfoRequest
has been renamed toPeerInfoRequest
. This likely affects the encoded payload ofGetMm2Version
(not yet verified but highly likely).Resolves #1115 and #1683
Blocked by #2256