change how we store SKF for performance gains #256
Conversation
SKFs are stored with the timestamp of the last time they were used, to make it more likely that a packet hits its session key early. An ets `ordered_set` table was used to achieve this, but it follows total ordering, which means the timestamp needs to be the first element of the tuple if we want it to take precedence.

Removing or updating an SKF involved `ets:select_delete/2`, which requires a full table scan because we only had the SessionKey at the time of deletion. When removes or updates came in from the config service, each update was spawned into its own process, and they all contended for the SKF table at the same time. For sufficiently large ets tables (10k+ entries) and sufficiently large batches of updates (1k), all schedulers would struggle to do anything while the table scans were running. This stalling meant the grpc connection could not talk back to the config-service to negotiate a slower rate of updates; updates would arrive out of order, grpcbox would kill the connection, and the whole cycle would start again.

The structure of the table has been flattened, so we can tell ets the key position is the SessionKey in the second slot. The table still respects total ordering with the timestamp in the first position, but we can now do deletes in constant time with only the SessionKey. 💃
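A minimal sketch of the flattened layout described above (module and field names here are illustrative, not the actual code): the row puts the negated timestamp first and the SessionKey second, and the table is created with `{keypos, 2}` so ets keys rows on the SessionKey, making removal a direct `ets:delete/2` rather than an `ets:select_delete/2` scan.

```erlang
-module(skf_sketch).
-export([demo/0]).

demo() ->
    %% keypos 2 => ets keys rows on the SessionKey in the second slot
    SKFETS = ets:new(skf, [ordered_set, {keypos, 2}]),
    %% Negated so the newest timestamp sorts first
    Now = erlang:system_time(millisecond) * -1,
    SessionKey = <<"session-key-1">>,
    true = ets:insert(SKFETS, {Now, SessionKey, devaddr, _MaxCopies = 1}),
    %% Constant-time delete with only the SessionKey,
    %% no select_delete full-table scan needed.
    true = ets:delete(SKFETS, SessionKey),
    [] = ets:tab2list(SKFETS),
    ok.
```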
The test data I was using misled me. It also sorts on keypos. :sigh:
0 -> lager:debug(MD, "inserted SKF");
_ -> lager:debug([{deleted, Deleted} | MD], "updated SKF")
end;
true = do_remove_skf(SKFETS, SKF),
Can we make them return `ok` for both?
ordered_set,
{read_concurrency, true},
{write_concurrency, true},
{keypos, 2},
idk if this will work as the `ordered_set` will be on the key. Can you 2x check?
one key, one object, ordered in Erlang term order
Maybe it's fine
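The reviewer's concern can be checked with a throwaway table (a sketch, not the actual module): for an `ordered_set`, ets orders by the key, i.e. the element at `keypos`, not by the tuple's first element.

```erlang
-module(keypos_check).
-export([demo/0]).

demo() ->
    T = ets:new(check, [ordered_set, {keypos, 2}]),
    true = ets:insert(T, [{-300, b}, {-100, a}, {-200, c}]),
    %% Traversal follows Erlang term order of the keys in slot 2
    %% (a, b, c), not the negated timestamps in slot 1.
    [{-100, a}, {-300, b}, {-200, c}] = ets:tab2list(T),
    ok.
```

This matches the later comment in the thread that the table "also sorts on keypos": with `{keypos, 2}` the timestamp in slot 1 no longer drives the iteration order.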
@@ -572,7 +570,10 @@ do_insert_skf(SKFETS, SKF) ->
%% This is negative to make newest time on top
Now = erlang:system_time(millisecond) * -1,
true = ets:insert(SKFETS, {
Maybe create a record for this?
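One way the record suggestion could look (field names are guesses, not the PR's). Note that a record's tag atom occupies slot 1 of the underlying tuple, so the key position should come from `#skf.session_key` rather than a hard-coded 2.

```erlang
-module(skf_record_sketch).
-export([demo/0]).

%% Illustrative record; actual field names/types may differ.
-record(skf, {
    time :: integer(),            %% negated ms timestamp, newest first
    session_key :: binary(),
    devaddr :: term(),
    max_copies :: pos_integer()
}).

demo() ->
    %% #skf.session_key resolves to the field's tuple index at compile time,
    %% so the keypos stays correct if fields are reordered.
    T = ets:new(skf_ets, [ordered_set, {keypos, #skf.session_key}]),
    true = ets:insert(T, #skf{time = -1, session_key = <<"k">>,
                              devaddr = 1, max_copies = 1}),
    [#skf{session_key = <<"k">>}] = ets:lookup(T, <<"k">>),
    ok.
```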