-
Notifications
You must be signed in to change notification settings - Fork 261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
migrates to the new contact-info #823
Conversation
ad56c18
to
d03af0b
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #823 +/- ##
=======================================
Coverage 81.9% 81.9%
=======================================
Files 853 853
Lines 231955 231919 -36
=======================================
- Hits 189984 189970 -14
+ Misses 41971 41949 -22 |
5072b36
to
592b730
Compare
592b730
to
d6d95f9
Compare
#[test] | ||
#[ignore] | ||
#[serial] | ||
fn test_invalid_forks_persisted_on_restart() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AshwinSekar this test fails with this commit.
It passes if I make below change to push contact-infos out more frequently.
diff --git a/gossip/src/cluster_info.rs b/gossip/src/cluster_info.rs
index 2707a3ea0b..02aa95c1ea 100644
--- a/gossip/src/cluster_info.rs
+++ b/gossip/src/cluster_info.rs
@@ -1875,7 +1875,7 @@ impl ClusterInfo {
entrypoints_processed = entrypoints_processed || self.process_entrypoints();
//TODO: possibly tune this parameter
//we saw a deadlock passing an self.read().unwrap().timeout into sleep
- if start - last_push > CRDS_GOSSIP_PULL_CRDS_TIMEOUT_MS / 2 {
+ {
self.push_self();
self.refresh_push_active_set(
&recycler,
Another way to make it pass is to use the new contact-infos in pull-requests as below, which would also help to propagate contact-infos (but we cannot do this yet because of compatibility issues).
diff --git a/gossip/src/cluster_info.rs b/gossip/src/cluster_info.rs
index 2707a3ea0b..c59a8713cb 100644
--- a/gossip/src/cluster_info.rs
+++ b/gossip/src/cluster_info.rs
@@ -1564,11 +1564,7 @@ impl ClusterInfo {
.map(|(_, filters)| filters.len())
.sum::<usize>() as u64;
self.stats.new_pull_requests_count.add_relaxed(num_requests);
- // TODO: Use new ContactInfo once the cluster has upgraded to:
- // https://github.com/anza-xyz/agave/pull/803
- let self_info = LegacyContactInfo::try_from(&self.my_contact_info())
- .map(CrdsData::LegacyContactInfo)
- .expect("Operator must spin up node with valid contact-info");
+ let self_info = CrdsData::ContactInfo(self.my_contact_info());
let self_info = CrdsValue::new_signed(self_info, &self.keypair());
let pulls = pulls
.into_iter()
So I am guessing the test relies on contact-infos being propagated by a certain time or something. I have ignored it for now not to block this change because that is technically not a problem with this code. Can you please take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, thanks for looking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few questions/comments. Also, do we still want to boost LegacyContactInfo
when generating Pull Responses? https://github.com/behzadnouri/solana/blob/d6d95f92fbd311efabf51514062427d4dcb479fc/gossip/src/cluster_info.rs#L2124
@@ -155,7 +155,7 @@ impl CrdsData { | |||
// TODO: Assign ranges to each arm proportional to their frequency in | |||
// the mainnet crds table. | |||
match kind { | |||
0 => CrdsData::LegacyContactInfo(LegacyContactInfo::new_rand(rng, pubkey)), | |||
0 => CrdsData::ContactInfo(ContactInfo::new_rand(rng, pubkey)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we haven't removed LegacyContactInfo
completely from the network, should we make ContactInfo
an additional field rather than in place of LegacyContactInfo
? will help ensure backwards compatability until LegacyContactInfo
is fully gone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be fine.
This commit is reducing the uses of LegacyContactInfo to pull-requests.
We are not going to backport this code, so by the time this code is about to hit mainnet, the cluster has already upgraded to #803 and we can remove legacy contact-info from pull-requests as well.
@@ -52,7 +52,7 @@ impl_crds_entry!(CrdsValue, |entry| Some(&entry?.value)); | |||
impl_crds_entry!(VersionedCrdsValue, |entry| entry); | |||
|
|||
// Lookup by Pubkey. | |||
impl_crds_entry!(LegacyContactInfo, CrdsData::LegacyContactInfo(node), node); | |||
impl_crds_entry!(ContactInfo, CrdsData::ContactInfo(node), node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we going to run into an issue where we cannot lookup LegacyContactInfo
from the crds.table
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there were any instances of that, we would get compile-time error.
// TODO: Use new ContactInfo once the cluster has upgraded to: | ||
// https://github.com/anza-xyz/agave/pull/803 | ||
let self_info = LegacyContactInfo::try_from(&self.my_contact_info()) | ||
.map(CrdsData::LegacyContactInfo) | ||
.expect("Operator must spin up node with valid contact-info"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i do not understand why this is incompatable. looks like we're able to handle the ContactInfo on the receive side of a PullRequest
in handle_batch_pull_requests()
: https://github.com/behzadnouri/solana/blob/d6d95f92fbd311efabf51514062427d4dcb479fc/gossip/src/cluster_info.rs#L1967
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but that was only done in #803 which is not backported to v1.17 yet. So mainnet node still expect legacy contact-info in the pull request and otherwise drop the pull request.
To keep master branch compatible with mainnet, we cannot yet switch to the new contact-info here until mainnet has upgraded to #803 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh good to know we need to keep master compatible with mainnet. sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the same reason we keep adding LegacyContactInfo
to PullResponses with higher probability: https://github.com/behzadnouri/solana/blob/d6d95f92fbd311efabf51514062427d4dcb479fc/gossip/src/cluster_info.rs#L2124?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, since the cluster is still relying on LegayContactInfo
we need to keep them prioritized for now.
@@ -5706,6 +5703,7 @@ fn test_randomly_mixed_block_verification_methods_between_bootstrap_and_not() { | |||
|
|||
/// Forks previous marked invalid should be marked as such in fork choice on restart | |||
#[test] | |||
#[ignore] | |||
#[serial] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scares me a little that this test is failing just from switching to ContactInfo
. Makes me think it's not getting handled properly somewhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, but looks like if I make either of patches mentioned in https://github.com/anza-xyz/agave/pull/823/files#r1575382916 it passes. so looks like it relies on contact-infos propagation by a certain time which is not guaranteed anyways.
Nonetheless, we are not going to backport this patch, and as soon as mainnet has upgraded to #803 we can use the new contact-info in the pull requests which makes the test pass.
But agreed that we should investigate this test and figure out what assumptions are failing. Ashwin has opened #982 to keep track of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya good point. the frequency fix is interesting. Wondering why that affects the ContactInfo but not LegacyContactInfo? I'd think it wouldn't have an affect...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the frequency fix is interesting. Wondering why that affects the ContactInfo but not LegacyContactInfo?
Previously, in addition to push and pull-responses, one more way that contact-infos were propagated across cluster was through the "caller" in pull-requests:
https://github.com/anza-xyz/agave/blob/e7452b768/gossip/src/cluster_info.rs#L2076-L2081
https://github.com/anza-xyz/agave/blob/e7452b768/gossip/src/crds_gossip_pull.rs#L296-L307
process_pull_requests
is terribly named but basically what it does it upserts CRDS table with the caller's contact-info attached in pull-requests.
With this commit we are changing what struct counts as "contact-info" by switching from LegacyContactInfo
to ContactInfo
. So that caller's (legacy) contact-info attached to pull-requests is no longer relevant (because it is not a ContactInfo
, but a Legacy...
).
So basically with this commit we are killing that extra propagation path for contact-infos I mentioned earlier above. Increasing the frequency of "push_self" seems to compensate for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhhh yes got it. that makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. just need to keep an eye on #982
Problem
move on to new contact-info
Summary of Changes
replaced uses of legacy contact-info with the new one.