Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nonce: support reading a new Claimed state. #136

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mfycheng
Copy link
Contributor

Add preliminary (backwards compatible) support for a Claimed nonce state. This state will be used in the future to indicate that a specific server has 'claimed' a nonce for future use.

In this future system, servers will 'claim' a batch of Available nonces for use in future transactions. This enables servers to cache nonces to help reduce the overhead on acquiring a nonce per-intent. The defense mechanism to abandoned claim is that claims will expire. Healthy servers will refresh their claimed nonces periodically in the background.

Add preliminary (backwards compatible) support for a Claimed nonce
state. This state will be used in the future to indicate that a specific
server has 'claimed' a nonce for future use.

In this future system, servers will 'claim' a batch of Available nonces
for use in future transactions. This enables servers to cache nonces to
help reduce the overhead on acquiring a nonce per-intent. The defense
mechanism to abandoned claim is that claims will expire. Healthy servers
will refresh their claimed nonces periodically in the background.
@mfycheng mfycheng requested a review from jeffyanta as a code owner June 10, 2024 04:19
Comment on lines +28 to +29
ClaimNodeId string `db:"claim_node_id"`
ClaimExpiresAtMs int64 `db:"claim_expires_at"`
Copy link
Contributor Author

@mfycheng mfycheng Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the docs[1], as long as we allow these two fields to be NULL (and have it as the default), then there won't need to be any rewrite of the underlying tables. This means the ALTER TABLE command will place an exclusive access on the table, but the duration of the command should be quick.

[1] https://www.postgresql.org/docs/current/sql-altertable.html

Comment on lines +174 to +224
// We query a random nonce by first selecting any available candidate from the
// total set, applying an upper limit of 100, and _then_ randomly shuffling the
// results and selecting the first. By bounding the size before ORDER BY random(),
// we avoid having to shuffle large sets of results.
//
// Previously, we would use OFFSET FLOOR(RANDOM() * 100). However, if the pool
// (post filter) size was less than 100, any selection > pool size would result
// in the OFFSET being set to zero. This meant random() disappeared for a subset
// of values. In practice, this would result in a bias, and increased contention.
//
// For example, 50 Available nonce's, 25 Claimed (expired), 25 Reserved. With Offset:
//
// 1. 50% of the time would be a random Available.
// 2. 25% of the time would be a random expired Claimed.
// 3. 25% of the time would be _the first_ Available.
//
// This meant that 25% of the time would not be random. As we pull from the pool,
// this % only increases, further causing contention.
//
// Performance wise, this approach is slightly worse, but the vast majority of the
// time is spent on the scan and filter. Below are two example query plans (from a
// small dataset in an online editor).
//
// QUERY PLAN (OFFSET):
//
// Limit (cost=17.80..35.60 rows=1 width=140) (actual time=0.019..0.019 rows=0 loops=1)
// -> Seq Scan on codewallet__core_nonce (cost=0.00..17.80 rows=1 width=140) (actual time=0.016..0.017 rows=0 loops=1)
// Filter: ((signature IS NOT NULL) AND (purpose = 1) AND ((state = 0) OR ((state = 2) AND (claim_expires_at < 200))))
// Rows Removed by Filter: 100
//
// Planning Time: 0.046 ms
// Execution Time: 0.031 ms
//
// QUERY PLAN (ORDER BY):
//
// Limit (cost=17.82..17.83 rows=1 width=148) (actual time=0.018..0.019 rows=0 loops=1)
// -> Sort (cost=17.82..17.83 rows=1 width=148) (actual time=0.018..0.018 rows=0 loops=1)
// Sort Key: (random())
// Sort Method: quicksort Memory: 25kB
// -> Subquery Scan on sub (cost=0.00..17.81 rows=1 width=148) (actual time=0.015..0.016 rows=0 loops=1)
// -> Limit (cost=0.00..17.80 rows=1 width=140) (actual time=0.015..0.015 rows=0 loops=1)
// -> Seq Scan on codewallet__core_nonce (cost=0.00..17.80 rows=1 width=140) (actual time=0.015..0.015 rows=0 loops=1)
// Filter: ((signature IS NOT NULL) AND (purpose = 1) AND ((state = 0) OR ((state = 2) AND (claim_expires_at < 200))))
// Rows Removed by Filter: 100
//
// Planning Time: 0.068 ms
// Execution Time: 0.037 ms
//
// Overall, the Seq Scan and Filter is the bulk of the work, with the ORDER BY RANDOM()
// adding a small (fixed) amount of overhead. The trade-off is negligible time complexity
// for more reliable semantics.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a thing to really double check.

  1. I discovered this oddity when testing the distributions (which was also highlighted in previous comments). I think this change is likely worth it given the minuscule (theoretical) performance difference.
  2. The previous approach had a fallback query, which I think was to address the OFFSET selecting an offset out of range, but in testing the OFFSET was just set to 0. The new approach removes this issue (but if it was for another issue, that's important to know!)

ClaimExpiresAt: time.Now().Add(time.Hour),
},
} {
require.Error(t, invalid.Validate())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this wasn't really tested or used much before? Will be using more of it in the upcoming functions I imagine

} {
for i := 0; i < 500; i++ {
for i := 0; i < 50; i++ {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shrunk this in size to fit Available, Claimed, Claimed(Expired) within the LIMIT window.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant