Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Add simple collator election mechanism #2960

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

georgepisaltu
Copy link
Contributor

Closes paritytech/polkadot-sdk#106

This PR adds the ability to bid for collator slots even after the max number of collators have already registered. This eliminates the first come, first served mechanism that was in place before.

Key changes:

  • added increase_bond/decrease_bond extrinsics to allow registered candidates to adjust their bonds in order to dynamically control their bids
  • added take_candidate_slot extrinsic to try to replace an already existing candidate by bidding more than them
  • candidates are now lazily sorted through the bags-list pallet, where the top DesiredCandidates out of MaxCandidates candidates in the list will be selected by the session pallet as collators

I'm not sure whether we should allow decreasing the candidate bids. The mechanic is partially in place already through leave_intent, but it's not quite the same thing. I'm also open to any rename suggestions.

I made adjustments to the parachain template runtime to test things out, but will go ahead and fix the rest of them once we reach consensus on the overall approach.

@georgepisaltu georgepisaltu added A0-please_review Pull request needs code review. B1-note_worthy Changes should be noted in the release notes C3-medium PR touches the given topic and has a medium impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit labels Aug 1, 2023
@georgepisaltu georgepisaltu self-assigned this Aug 1, 2023
@georgepisaltu georgepisaltu added T5-parachains This PR/Issue is related to Parachains. T1-runtime This PR/Issue is related to the topic “runtime”. and removed T5-parachains This PR/Issue is related to Parachains. labels Aug 1, 2023
Copy link
Contributor

@joepetrowski joepetrowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you also need to impl SortedListProvider, unless I'm missing it somewhere? But I don't see a default impl in Substrate.

pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
pallets/collator-selection/src/mock.rs Outdated Show resolved Hide resolved
@@ -486,6 +504,7 @@ construct_runtime!(
Session: pallet_session = 22,
Aura: pallet_aura = 23,
AuraExt: cumulus_pallet_aura_ext = 24,
CandidateList: pallet_bags_list::<Instance1> = 25,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implied

Suggested change
CandidateList: pallet_bags_list::<Instance1> = 25,
CandidateList: pallet_bags_list = 25,


Self::deposit_event(Event::CandidateRemoved { account_id: target });
Self::deposit_event(Event::CandidateAdded { account_id: who, deposit });
Ok(Some(T::WeightInfo::take_candidate_slot(length as u32)).into())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should do the same for the other functions: refund weight by providing the actual length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I forgot to do this for increase_bond/decrease_bond.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if candidate_info.deposit < <CandidacyBond<T>>::get() {
return Err(Error::<T>::OnRemoveInsufficientFunds.into())
}
T::Currency::unreserve(&who, bond);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to migrate to new fungible trait from Currency
may be in different PR
here about new traits - paritytech/substrate#12951

@@ -289,6 +300,16 @@ pub mod pallet {
NoAssociatedValidatorId,
/// Validator ID is not yet registered.
ValidatorNotRegistered,
/// Some doc.
OnInsert,
Copy link
Contributor

@muharem muharem Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more idiomatic way within our libs to name the errors, is not describing on what occasion an error has happened, but what has happened, FailedToUpdateCandidateList for example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the names are just placeholders right now. Same with doc comments. I'll write some proper ones once the other comments are addressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

});
Ok(())
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be easier to review with the docs telling what this meant for
who is target, who is origin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller of this extrinsic wants to register themselves in the candidate list by taking the spot of another candidate target. The caller's deposit must be greater than the deposit of the candidate it's trying to replace for this to work.

I'll get some proper documentation on these functions in a future iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

collators
}

/// Kicks out candidates that did not produce a block in the kick threshold and refunds
/// their deposits.
pub fn kick_stale_candidates(
candidates: BoundedVec<CandidateInfo<T::AccountId, BalanceOf<T>>, T::MaxCandidates>,
) -> BoundedVec<T::AccountId, T::MaxCandidates> {
) -> usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to document what is returned value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -142,6 +143,8 @@ pub mod pallet {
/// Maximum number of invulnerables.
type MaxInvulnerables: Get<u32>;

type CandidateList: SortedListProvider<Self::AccountId, Score = BalanceOf<Self>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why we using the bags-list?
the number of Candidates is not more than desired_candidates (can be more if desired_candidates changed to the less value), so the Candidates list is quite small.
Candidates list stored as a value, the whole list is being read usually from the storage.
is this more efficient than maintaining a sorted list of Candidates within the pallet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be at most MaxCandidates candidates, out of which the top N (where N is DesiredCandidates) are selected to be collators when sessions begin/end. The list could be small, it depends on the config, but the idea behind using bags-list was to have it lazily sorted, as in people take a slot in the list and then they use bags-list extrinsics to place their entry higher on the list, thus sorting it themselves and paying for it. This moves the sorting logic off-chain as the candidates registering have to watch the list and do the calls if/when they see fit. It would also give an incentive to only the top N candidates to be in a sorted order, since if you're below the N threshold, it doesn't matter if your entry is out of order or not since you're not gonna get selected for collation. This is also true for all the entries above the threshold, since it doesn't matter what spot of the top N you got, you're still going to be selected. This reduces the sorting burden on the list and also makes the candidates responsible for their place in the list.

Copy link
Contributor

@muharem muharem Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bags-list will perform additional reads and inserts on insert/decrease/increase etc. On decrease and increase it will run the same rebag as it does with the extrinsic you mentioning. So I am not sure what exactly you optimizing with it.
what the BagThresholds you planning to set?
I think for this case, when MaxCandidates wont be more than 50 in near future, its not necessary.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the bags list is counter productive in terms of performance and resources since the number of candidates is potentially small, I'd suggest implementing a simpler SortedListProvider that allows parachain devs to decide whether to use that implementation or an instance of bags-lists.

In any case, it would be interesting to compare benchmarks with both implementations and see what's the threshold in terms of number of candidates (and corresponding thresholds) where the bags list approach starts to make sense (if it's not always better -- I'm unsure).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bags-list will perform additional reads and inserts on insert/decrease/increase etc. On decrease and increase it will run the same rebag as it does with the extrinsic you mentioning. So I am not sure what exactly you optimizing with it.
what the BagThresholds you planning to set?

I was planning to have a single universal bag - so BagThresholds = [] - and on_increase and on_decrease would be noops for that configuration since rebag would do nothing, but they would be needed if the user chose a different SortedListProvider implementation. The on_insert operation for bags-list is very simple, while on_remove is a bit costly indeed, since it would need to traverse the list.

I think for this case, when MaxCandidates wont be more than 50 in near future, its not necessary.

I thought it was supposed to be about 100 in the near future with the potential to grow even more, but I'm not sure, I might be misremembering.

In any case, it would be interesting to compare benchmarks with both implementations and see what's the threshold in terms of number of candidates (and corresponding thresholds) where the bags list approach starts to make sense (if it's not always better -- I'm unsure).

I'm in favor of implementing a simpler SortedListProvider and comparing the two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented both the simple list as a separate pallet as well as integrated within collator-selection. I ran the benchmarks on my machine and these are the results in terms of weight:
register_as_candidate: bags-list outperforms for MaxCandidates > 55
increase_bond: bags-list outperforms for MaxCandidates > 10
decrease_bond: bags-list outperforms for MaxCandidates > 6
take_candidate_slot: bags-list outperforms for MaxCandidates > 160
leave_intent always: more expensive on bags-list by about 50%
new_session always: more expensive on bags-list by about 30%

There is no major difference between the simple sorted list pallet vs list kept in the storage of collator-selection, aside from increase_bond which starts to be better for the integrated list for MaxCandidates > 13.

The bags-list benchmarks do not include the put_in_front_of call costs because those are separate and are supposed to be called less often than the other extrinsics presented here, as in not every time a bond changes.

I didn't get to run the bags-list pallet benchmarks locally to get an idea of how much those would cost, but each put_in_front_of scales linearly with MaxCandidates. Given the small MaxCandidates values we're working with, bags-list is not ideal. That said, the simple list as a separate pallet seems to be the way to go for now, since it allows for customization down the line and the weights are very similar with the integrated version.

I'll link the branches used for the benchmarks below:

Copy link
Contributor

@muharem muharem Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@georgepisaltu why you need two lists (Candidates and CandidateList) for the simple list within collator-selection solution?
here you even do an additional read for very candidate - https://github.com/georgepisaltu/cumulus/blob/b0ca1b615336d83ea210a26950efc4833ab5367e/pallets/collator-selection/src/lib.rs#L823

you can have only one Candidate<T: Config> = StorageValue<_, BoundedVec<(AccountId, Deposit), T::MaxCandidates> for that solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with that we wont event need those new three dispatchables. we need only one more or register_as_candidate can both. we can only rename it (keeping the same index) to imply that its not only register but also reset bond.

Copy link
Contributor Author

@georgepisaltu georgepisaltu Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why you need two lists (Candidates and CandidateList) for the simple list within collator-selection solution?

The purpose of the Candidates map is to allow fast access to the deposits. Even in the solution where the list was within the collator-selection pallet storage, there are still places where we look for an account's deposit and we have to iterate the list if we don't have the map. However, with these small numbers it's probably counterproductive to load a 10-sized map into memory just for that. As a bonus, it works well with the SortedListProvider/ScoreProvider abstraction we have in place. I'm not too worried about the "extra" map since it uses the Identity hasher, so it should be cheaper than a standard one.

with that we wont event need those new three dispatchables. we need only one more or register_as_candidate can both. we can only rename it (keeping the same index) to imply that its not only register but also reset bond.

The plan right now is to merge increase_bond and decrease_bond into a update_bond. A function like update_bond is necessary to allow an auction to take place and I like the idea of these functions being as intuitive and simple as possible, so I'm against having bid update mechanics in the function used to register.

Anyway, I implemented a simpler version without the map, as you suggested, and I got the benchmarks comparing the simple list in a separate pallet with the mapless implementation (I'll refer to it as "simplest" from this point on):
register_as_candidate: always more expensive with the pallet version by about 40%
increase_bond: simplest version outperforms for MaxCandidates > 10
decrease_bond: same base weight, increases by about 1/30 of the base weight for each extra candidate for the pallet version
take_candidate_slot: always more expensive with the pallet version by about 35%
leave_intent: always more expensive with the pallet version by about 20%
new_session: always more expensive with the pallet version by about 25%
Also, the simplest version also uses about half the reads/writes on the database compared to the simple pallet version.

Also, I managed to get the benchmarks for the bags-list put_in_front_of calls, which would be additional weight incurred by users in the bags-list version. However, the catch with that is that if we use an empty BagThresholds, there is only a single universal bag, so all rebags are basically noops (they still have to fetch the thresholds from storage but the cost is reflected in the previous benchmarks I posted anyway). This means that the optimal way to use the bags-list version would be to register as a candidate, observe the list off-chain and do calls to update_bond as necessary to have a top-DesiredCandidates bond in place, so before the new session is about to begin and the collators are being selected (after the dust is settled and the bidding is done), if you are a top-DesiredCandidates candidate, call put_in_front_of so that you can claim your collator spot. This means that the amount of swaps done on the list becomes DesiredCandidates, a fraction of MaxCandidates by definition, and it's easy to see how it would scale if the numbers were greater, but as I understand, MaxCandidates will probably be a small number. put_in_front_of is about as expensive as a leave_intent, which means it's inexpensive for a candidate to call this to claim their spot.

That being said, I think for the problem we're trying to solve right now we should go with the simple list version, but in a separate pallet while keeping the SortedListProvider abstraction. The reasons for this are:

  • there isn't a significant difference between the separate pallet version and the list inside collator-selection pallet storage (see benches above)
  • easy to upgrade in the future
  • works for other users of the collator-selection pallet who may want to have lots of collators, even if we plan on having small lists of candidates

Main downside of this approach is the extra complexity in the code because of the new simple-list pallet which needs to be tested/maintained.

I assumed that other users may want to use the collator-selection pallet with MaxCandidates values greater than 50-100, which seems to be the cut-off point for bags-list. If I'm wrong about that, and, AFAIK, we don't really plan to raise that over 20 any time soon, then it's probably better to go with the simplest possible version, a list kept inside the collator-selection pallet storage.

@muharem @joepetrowski @gpestana let me know what you think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go for the simpler solution since we don't expect needing more than 100 candidates at any time. Having a list and a map also adds overhead of needing to maintain some invariants like have one-to-one correspondence of elements and keys.

About the solution of removing the map, BoundedVecs should generally be at their limit in usage, while general storage should use maps to facilitate O(1) operations. Perhaps a good solution is to store candidates in a map, but only pass a (full) BoundedVec to Session for the next set of validators? I'm not sure the costs of computing that on session change vs. just maintaining candidates in a vec though.

Agree with collapsing increase/decrease into update.

I don't think we should optimize for benchmarks. We don't expect these functions to be called very often. The one we do need to keep an eye on is the new_session call since that is called in on_initialize and needs to stay within block building bounds.

@@ -142,6 +143,8 @@ pub mod pallet {
/// Maximum number of invulnerables.
type MaxInvulnerables: Get<u32>;

type CandidateList: SortedListProvider<Self::AccountId, Score = BalanceOf<Self>>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be at most MaxCandidates candidates, out of which the top N (where N is DesiredCandidates) are selected to be collators when sessions begin/end. The list could be small, it depends on the config, but the idea behind using bags-list was to have it lazily sorted, as in people take a slot in the list and then they use bags-list extrinsics to place their entry higher on the list, thus sorting it themselves and paying for it. This moves the sorting logic off-chain as the candidates registering have to watch the list and do the calls if/when they see fit. It would also give an incentive to only the top N candidates to be in a sorted order, since if you're below the N threshold, it doesn't matter if your entry is out of order or not since you're not gonna get selected for collation. This is also true for all the entries above the threshold, since it doesn't matter what spot of the top N you got, you're still going to be selected. This reduces the sorting burden on the list and also makes the candidates responsible for their place in the list.

pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
@@ -289,6 +300,16 @@ pub mod pallet {
NoAssociatedValidatorId,
/// Validator ID is not yet registered.
ValidatorNotRegistered,
/// Some doc.
OnInsert,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the names are just placeholders right now. Same with doc comments. I'll write some proper ones once the other comments are addressed.

});
Ok(())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller of this extrinsic wants to register themselves in the candidate list by taking the spot of another candidate target. The caller's deposit must be greater than the deposit of the candidate it's trying to replace for this to work.

I'll get some proper documentation on these functions in a future iteration.

collators
}

/// Kicks out candidates that did not produce a block in the kick threshold and refunds
/// their deposits.
pub fn kick_stale_candidates(
candidates: BoundedVec<CandidateInfo<T::AccountId, BalanceOf<T>>, T::MaxCandidates>,
) -> BoundedVec<T::AccountId, T::MaxCandidates> {
) -> usize {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

pallets/collator-selection/src/tests.rs Outdated Show resolved Hide resolved
pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
pallets/collator-selection/src/lib.rs Outdated Show resolved Hide resolved
@@ -142,6 +143,8 @@ pub mod pallet {
/// Maximum number of invulnerables.
type MaxInvulnerables: Get<u32>;

type CandidateList: SortedListProvider<Self::AccountId, Score = BalanceOf<Self>>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the bags list is counter productive in terms of performance and resources since the number of candidates is potentially small, I'd suggest implementing a simpler SortedListProvider that allows parachain devs to decide whether to use that implementation or an instance of bags-lists.

In any case, it would be interesting to compare benchmarks with both implementations and see what's the threshold in terms of number of candidates (and corresponding thresholds) where the bags list approach starts to make sense (if it's not always better -- I'm unsure).

let who = ensure_signed(origin)?;

let new_bond =
<Candidates<T>>::try_mutate(|candidates| -> Result<BalanceOf<T>, DispatchError> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a map keyed by account Id would suit better the candidates use case to avoid this loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

///
/// Todo
#[pallet::call_index(7)]
#[pallet::weight(T::WeightInfo::increase_bond(T::MaxCandidates::get()))]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep the candidates in a vec, perhaps we should do weight refund here using the DispatchWeightWithPostInfo, as the worst case estimate may be overestimating and fees will be too high. See more here and maybe some helpful pointers here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

///
/// Todo
#[pallet::call_index(8)]
#[pallet::weight(T::WeightInfo::decrease_bond(T::MaxCandidates::get()))]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment regarding weights as in increase_bond.

let who = ensure_signed(origin)?;

let new_bond =
<Candidates<T>>::try_mutate(|candidates| -> Result<BalanceOf<T>, DispatchError> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suggestion for you here: the implementation of the increase and decrease bond extrinsics are very similar. I'd suggest refactoring out most it it's mutation logic into a Self::update_bond(accountID, imbalance), where the imbalance is an enum that can take positive and negative imbalance (to add or remove bond).

The pallet balances implements a similar logic and, perhaps more relevant, I've implemented something similar in this PR, which also calls into a SortedListProvoder::on_increase/decrease paritytech/substrate#14620

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea.

Error::<T>::ValidatorNotRegistered
);

ensure!(deposit >= Self::candidacy_bond(), Error::<T>::InsufficientBond);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this check before fetching the validator key, in the spirit of verifying first and read/write later. The storage transactional ensures that all the writes are rolled back if the extrinsic fails, but the node may spend unnecessary cycles reading/writing something that will be eventually rolled back due to this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


ensure!(deposit >= Self::candidacy_bond(), Error::<T>::InsufficientBond);

<Candidates<T>>::try_mutate(|candidates| -> Result<(), DispatchError> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I start to think that Candidates should be refactored as a storage map instead of a vec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable
Logs: https://gitlab.parity.io/parity/mirrors/cumulus/-/jobs/3352264

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B1-note_worthy Changes should be noted in the release notes C3-medium PR touches the given topic and has a medium impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit T1-runtime This PR/Issue is related to the topic “runtime”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Simple Election for Permissionless Collators
5 participants