-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nonce: support reading a new Claimed state. #136
base: main
Are you sure you want to change the base?
Conversation
Add preliminary (backwards compatible) support for a Claimed nonce state. This state will be used in the future to indicate that a specific server has 'claimed' a nonce for future use. In this future system, servers will 'claim' a batch of Available nonces for use in future transactions. This enables servers to cache nonces to help reduce the overhead on acquiring a nonce per-intent. The defense mechanism to abandoned claim is that claims will expire. Healthy servers will refresh their claimed nonces periodically in the background.
ClaimNodeId string `db:"claim_node_id"` | ||
ClaimExpiresAtMs int64 `db:"claim_expires_at"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the docs[1], as long as we allow these two fields to be NULL
(and have it as the default), then there won't need to be any rewrite of the underlying tables. This means the ALTER TABLE
command will place an exclusive access on the table, but the duration of the command should be quick.
[1] https://www.postgresql.org/docs/current/sql-altertable.html
// We query a random nonce by first selecting any available candidate from the | ||
// total set, applying an upper limit of 100, and _then_ randomly shuffling the | ||
// results and selecting the first. By bounding the size before ORDER BY random(), | ||
// we avoid having to shuffle large sets of results. | ||
// | ||
// Previously, we would use OFFSET FLOOR(RANDOM() * 100). However, if the pool | ||
// (post filter) size was less than 100, any selection > pool size would result | ||
// in the OFFSET being set to zero. This meant random() disappeared for a subset | ||
// of values. In practice, this would result in a bias, and increased contention. | ||
// | ||
// For example, 50 Available nonce's, 25 Claimed (expired), 25 Reserved. With Offset: | ||
// | ||
// 1. 50% of the time would be a random Available. | ||
// 2. 25% of the time would be a random expired Claimed. | ||
// 3. 25% of the time would be _the first_ Available. | ||
// | ||
// This meant that 25% of the time would not be random. As we pull from the pool, | ||
// this % only increases, further causing contention. | ||
// | ||
// Performance wise, this approach is slightly worse, but the vast majority of the | ||
// time is spent on the scan and filter. Below are two example query plans (from a | ||
// small dataset in an online editor). | ||
// | ||
// QUERY PLAN (OFFSET): | ||
// | ||
// Limit (cost=17.80..35.60 rows=1 width=140) (actual time=0.019..0.019 rows=0 loops=1) | ||
// -> Seq Scan on codewallet__core_nonce (cost=0.00..17.80 rows=1 width=140) (actual time=0.016..0.017 rows=0 loops=1) | ||
// Filter: ((signature IS NOT NULL) AND (purpose = 1) AND ((state = 0) OR ((state = 2) AND (claim_expires_at < 200)))) | ||
// Rows Removed by Filter: 100 | ||
// | ||
// Planning Time: 0.046 ms | ||
// Execution Time: 0.031 ms | ||
// | ||
// QUERY PLAN (ORDER BY): | ||
// | ||
// Limit (cost=17.82..17.83 rows=1 width=148) (actual time=0.018..0.019 rows=0 loops=1) | ||
// -> Sort (cost=17.82..17.83 rows=1 width=148) (actual time=0.018..0.018 rows=0 loops=1) | ||
// Sort Key: (random()) | ||
// Sort Method: quicksort Memory: 25kB | ||
// -> Subquery Scan on sub (cost=0.00..17.81 rows=1 width=148) (actual time=0.015..0.016 rows=0 loops=1) | ||
// -> Limit (cost=0.00..17.80 rows=1 width=140) (actual time=0.015..0.015 rows=0 loops=1) | ||
// -> Seq Scan on codewallet__core_nonce (cost=0.00..17.80 rows=1 width=140) (actual time=0.015..0.015 rows=0 loops=1) | ||
// Filter: ((signature IS NOT NULL) AND (purpose = 1) AND ((state = 0) OR ((state = 2) AND (claim_expires_at < 200)))) | ||
// Rows Removed by Filter: 100 | ||
// | ||
// Planning Time: 0.068 ms | ||
// Execution Time: 0.037 ms | ||
// | ||
// Overall, the Seq Scan and Filter is the bulk of the work, with the ORDER BY RANDOM() | ||
// adding a small (fixed) amount of overhead. The trade-off is negligible time complexity | ||
// for more reliable semantics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a thing to really double check.
- I discovered this oddity when testing the distributions (which was also highlighted in previous comments). I think this change is likely worth it given the minuscule (theoretical) performance difference.
- The previous approach had a fallback query, which I think was to address the
OFFSET
selecting an offset out of range, but in testing theOFFSET
was just set to 0. The new approach removes this issue (but if it was for another issue, that's important to know!)
ClaimExpiresAt: time.Now().Add(time.Hour), | ||
}, | ||
} { | ||
require.Error(t, invalid.Validate()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this wasn't really tested or used much before? Will be using more of it in the upcoming functions I imagine
} { | ||
for i := 0; i < 500; i++ { | ||
for i := 0; i < 50; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shrunk this in size to fit Available, Claimed, Claimed(Expired)
within the LIMIT
window.
Add preliminary (backwards compatible) support for a Claimed nonce state. This state will be used in the future to indicate that a specific server has 'claimed' a nonce for future use.
In this future system, servers will 'claim' a batch of Available nonces for use in future transactions. This enables servers to cache nonces to help reduce the overhead on acquiring a nonce per-intent. The defense mechanism to abandoned claim is that claims will expire. Healthy servers will refresh their claimed nonces periodically in the background.