-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(lightpush): peer management for protocols #2003
Conversation
size-limit report 📦
|
3b96c37
to
b947aa9
Compare
@@ -315,7 +321,8 @@ export class StoreSDK extends BaseProtocolSDK implements IStoreSDK { | |||
} | |||
|
|||
export function wakuStore( | |||
connectionManager: ConnectionManager, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's include it into init
and maybe make it of a type ProtocolCreateOptions & ProtocolServices
or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea. This currently becomes non-trivial considering there is no type/interfaces for ConnectionManager
in @waku/interfaces
-- this becomes a nice followup for #1969
@@ -43,15 +50,19 @@ class LightPushSDK extends BaseProtocolSDK implements ILightPushSDK { | |||
}; | |||
} | |||
|
|||
const peers = await this.protocol.getPeers(); | |||
if (!peers.length) { | |||
const peersFound = await this.hasPeers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to me in this case we should get
peers as fast as possible and only in background support validity and quality of the pool
because if we make it part of send
or subscribe
(or for Store) - then it increases delay for the application built on top
instead I think we should be lazy in this approach and prevent / make consumer wait only if service doesn't work (error being thrown)
edge case is if the only peer is in the pool and this peer is bad - I believe it is a rare case for which we should optimize only after seeing it happens often for end users.
So we should measure it now: add a log line for it for later use in telemetry etc, also a point for dogfood app we are preparing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your approach in theory and that is how I implemented it.
However, this hasPeers()
check makes itself useful considering we have a recurring interval that maintains peers and thus we need to ensure that maintainPeers()
was called at least once.
Looking at the function:
protected hasPeers = async (): Promise<boolean> => {
let success = await this.maintainPeers();
let attempts = 0;
while (!success || this.peers.length === 0) {
attempts++;
success = await this.maintainPeers();
if (attempts > 3) {
if (this.peers.length === 0) {
this.log.error("Failed to find peers to send message to");
return false;
} else {
this.log.warn(
`Found only ${this.peers.length} peers, expected ${this.numPeersToUse}`
);
return true;
}
}
await new Promise((resolve) => setTimeout(resolve, 1000));
}
return true;
};
It will return if peers.length > 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and thus we need to ensure that maintainPeers() was called at least once
it seems it is initiated from theconstructor
so there is no need to make sure of it anymore, consideringctor
was initiated successfully
let's not have blocking routines for ad-hoc operations, if we cannot send a message now - we should throw (maybe later we can implement improvement of re-sending messages but this is out of scope)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your POV and agree to it.
This is non-trivial because if we don't provide this reliability abstraction part of the SDK send()
functionality, it will have to be implemented as a wrapper by library consumers.
For example: if the interval set is 30s
(default), users will have to wait 30 seconds after node initialisation to be able to register new connected peers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an idea:
Instead of relying solely on an interval, let's trigger maintainPeers()
on a new peer:connect
event.
This will significantly improve efficiency of the routine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that definitely will improve the time it takes for maintenance to update.
Tested it out with tests.
However, because of the nature of it being non-blocking, send()
will fail way more often when users try to send a lightpush request right after starting their node.
This is most reproducible in our tests, where we do a lightpush.send()
right after we start a node and connect to peers (as will also happen in most user applications)
IMO: it is a good default/reliability abstraction to automatically try to wait for peers IF there are no peers available.
If peers are available, well and good and we don't wait.
Since this is especially a SDK level feature, I think this is a nice to have.
We can wrap this as an argument autoRetry
from user with default true
@weboko
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented some improvements to your comment here: 8e89b24
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of autoRetry
and we can enrich it within #2032 (feel free to take if you want and when free and let's discuss exact design, we can change it from what I described there)
and I think it is a good way to use peer:identify
or some other event (maybe connected
is good for that purpose too)
but I would still not make consumer wait for some other operation during .send
unless it is explicitly configured
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private async findAdditionalPeers(numPeers: number): Promise<Peer[]> { | ||
this.log.info(`Finding ${numPeers} additional peers`); | ||
try { | ||
let newPeers = await this.core.getPeers({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general - getPeers
from BaseProtocolCore
will get peer only those we are connected to
is it sufficient? it seems that in some cases we need to initiate new connections, if existing are not enough or not good
or we rely on connection manager to automatically add one after it being dropped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConnectionManager
keeps on trying to connect to as many peers, as we keep discovering this.
The upper limit on max connections is 100: https://github.com/libp2p/js-libp2p/blob/169c9d85e7c9cd65be964b5d08bd618d950f70ee/doc/LIMITS.md?plain=1#L39
This is more than enough. We can assume that when a connection is dropped, we can find new peers (if they were discovered), or we will connect to them as soon as we discover them.
There does not seem to be an apparent action js-waku can take to connect to new peers, other than what's already happening through discoveries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it seems to be 300
I think we can document it somewhere, at least, in comment section - mentioning that assumption is to have connection manager to populate connected peers.
e94ac30
to
ff79151
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! We derived follow-ups and before merging, please, make sure it works e2e within some of our exampels.
ff79151
to
f97c6a9
Compare
- passes `ConnectionManager` to ProtocolSDK to allow disconnecting from within protocol - maintains `numPeersToUse` for each protocol within BaseProtocolSDK
…ests fixes parallelisation of lightpush.send() requests
8122de7
to
910d5d3
Compare
Problem
For Filter and LightPush, every time a new LP request/new subscription is created, getPeers() is called -- this is called every time which gives us a new set of peers to use for each request/subscription.
Further, we wish to discard a peer if it fails, and renew it with another peer. This is not trivial to do if we rely on getting a new set of peers each time.
Solution
This PR tackles it for LightPush:
numPeersToUse
Notes
Contribution checklist:
!
in title if breaks public API;