-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce support for tablets #137
Introduce support for tablets #137
Conversation
1d791a9
to
d0d5b66
Compare
Please add an experimental setting and require the user to enable it. Tablets are still experimental in ScyllaDB and this allows us to change the tablets table format, but if drivers start to rely on it, then we cannot. |
d8bf274
to
b217867
Compare
Added an experimental setting and corrected |
Unfortunately we'll need to add such a flag to scylla-bench and plumb it down, but that's the price of compatibility. |
4fa94d0
to
9bb1f52
Compare
host_source.go
Outdated
func findTabletForToken(tablets []*TabletInfo, token token, l int) *TabletInfo { | ||
r := l + tablets[l].TabletCount() - 1 | ||
for l < r { | ||
m := (l + r) / 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this l+r
overflow is l
, r
are big?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could happen, I changed it a bit to prevent that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't m := l + (r-l)/2
still incorrect? For:
l, r - signed 64-bit ints
r = 2^63 - 5
l = -(2^63-5)
r - l = 2^63 - 5 - (-(2^63-5))
r - l = 2^63 - 5 + 2^63 - 5
r - l = 2^64 - 10 > range of 64-bit signed int
(some other context: https://stackoverflow.com/questions/19106350/explanation-of-the-safe-average-of-two-numbers)
It's interesting that the tests didn't catch the problem, so either they are not sufficient enough to catch the bug or this code is actually correct for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it is incorrect it only solves the problem for two positive or two negative variables, I think that the tests are not sufficient enough to catch this bug
34b5c75
to
89c9190
Compare
89c9190
to
3b66037
Compare
} | ||
} | ||
|
||
assertEqual(t, "queriedHosts", 1, queriedHosts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a (temporary?) print here of all hostAddresses
(before strings.Contains
filter) so you can debug the failing CI.
a575552
to
ceb8294
Compare
cd4ff97
to
898567d
Compare
49062b9
to
d2f016f
Compare
641af7f
to
8339f5a
Compare
I opened #155 as an umbrella issue |
@sylwiaszunejko when do you plan to get back to this PR? |
@bhalevy It is ready for review, but @avelanarius is on PTO, and it has to wait until he is back. |
8339f5a
to
1bdb234
Compare
@avelanarius I adapted the code to recent changes in the core. Now all tablet info are sent under one key in |
With the negotiation thing, this is no longer needed. Also, the cover letter is outdated. |
Do I understand correctly that I should delete experimental setting and replace it with checking if ScyllaDB supports |
Yes; you must check for it anyway, otherwise the server won't send routing info. |
In order for Scylla to send the tablet info, the driver must tell the database during connection handshake that it is able to interpret it. It was implemented similarly to the already-present LWT optimisation extension negotiation.
Add mechanism to parse system.tablets periodically. In TokenAwareHostPolicy check if keyspace uses tablets if so try to use them to find replicas. Make shard awareness work when using tablets.
tablet_integration_test.go: Integration tests checking if TokenAwareHostPolicy and shard awareness works correctly when using tablets, and if adding new table changes tablets table. tablet_test.go: Unit tests checking if searching in tablets list works correctly.
1bdb234
to
a98a393
Compare
experimental setting removed |
idx := p.shardOf(mmt) | ||
idx := -1 | ||
|
||
p.conns[0].mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sylwiaszunejko This code seems to assume that p.conns[0]
is never nil
. I don't think this is always true. AFAIK p.conns
is indexed by the shard of the connection. It may temporarily happen that there is no connection to shard 0 for some time (e.g. the first connection to the node landed on a different shard than 0).
I think this might be the cause of scylladb/scylla-bench#134. You can try reproducing it by writing with a high concurrency and shutting down a node or starting one. The node should have more than one shard set up (the more, the easier to reproduce).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are probably right about the root cause of this issue, in this part of the code I needed to take any connection just to check if tabletsRoutingV1
is true, but taking p.conns[0]
was definitely not the right approach to do that. I will try to confirm that it is true and fix it. Thank you for your feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR introduces changes to the driver that are necessary for shard-awareness and token-awareness to work effectively with the tablets.
Now if driver send the request to the wrong node/shard it will get the correct tablet information from Scylla in
custom_payload
. It uses this information to obtain target replicas and shard numbers for tables managed by tablet replication.I also added parsing
TABLETS_ROUTING_V1
extension. In order for Scylla to send the tablet info, the driver must tell the database during connection handshake that it is able to interpret it. It was implemented similarly to the already-present LWT optimization extension negotiation.The metadata for tablets are provided to the user.
To test this change I created both integration and unit tests.
Fixes: #155