[WIP] feat: add YDB as a new database engine #31141
base: master
4389a74
to
7ab18c0
Compare
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #31141 +/- ##
===========================================
+ Coverage 60.48% 76.88% +16.39%
===========================================
Files 1931 537 -1394
Lines 76236 38976 -37260
Branches 8568 0 -8568
===========================================
- Hits 46114 29966 -16148
+ Misses 28017 9010 -19007
+ Partials 2105 0 -2105
Flags with carried forward coverage won't be shown.
7ab18c0
to
8c5ba7f
Compare
8c5ba7f
to
129462a
Compare
Default is `grpc`.

##### Authenticaions
Typo here
Actually it ought to say "authentication methods" or something similar really since authentication itself can't be pluralised.
fixed
superset/db_engine_specs/ydb.py
Outdated
from typing import (
    Any,
    TYPE_CHECKING,
)
Did you install and run the pre-commit hook? I think that'd stick these on one line.
fixed
supports_file_upload = False

_time_grain_expressions = {
Did you find this was necessary to make it work? I would have thought the base spec's empty dict would be ok here if you're not configuring any custom time grain expressions, and several other specs aren't setting it. But I do see this `None: "{col}"` entry in other specs which do set them, so maybe I'm missing something which needs this.
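For readers unfamiliar with the `None: "{col}"` entry mentioned above, here is a minimal sketch of what such an identity entry does. `SketchEngineSpec` and `timestamp_expr` are hypothetical stand-ins for illustration, not Superset's actual classes or methods; the only part taken from this thread is the `None: "{col}"` mapping itself.

```python
class SketchEngineSpec:
    """Hypothetical stand-in for an engine spec (not Superset's real class)."""

    # The None key means "no time grain selected"; the "{col}" template
    # returns the column expression unchanged.
    _time_grain_expressions = {None: "{col}"}

    @classmethod
    def timestamp_expr(cls, col, grain=None):
        # Hypothetical helper: look up the template and substitute the column.
        template = cls._time_grain_expressions[grain]
        return template.format(col=col)


print(SketchEngineSpec.timestamp_expr("date"))
```

With only this entry, the dict is non-empty while every column passes through unmodified.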
It is needed for one test to pass (it checks that the length of this dict is greater than 0).
I had an issue with the generated query after filling in this dict the correct way for YDB: https://apache-superset.slack.com/archives/C014LS99C1K/p1732544528541239
TL;DR: we can't work with the generated queries because YDB expects something like
SELECT DateTime::StartOf(date, Interval('P1D')) AS date, count(client_id) AS `COUNT(client_id)`
FROM (SELECT * from transactions_data
) AS virtual_table GROUP BY date ORDER BY `COUNT(client_id)` DESC
LIMIT CAST(1000 AS UInt64);
instead of
SELECT DateTime::StartOf(date, Interval('P1D')) AS date, count(client_id) AS `COUNT(client_id)`
FROM (SELECT * from transactions_data
) AS virtual_table GROUP BY DateTime::StartOf(date, Interval('P1D')) AS date ORDER BY `COUNT(client_id)` DESC
LIMIT CAST(1000 AS UInt64);
Let me know if I can override something to make group by use aliases as well as order by.
nevermind :)
superset/db_engine_specs/ydb.py
Outdated
credentials_info = encrypted_extra.pop("credentials", {})
credentials = None
if "username" in credentials_info:
    from ydb import StaticCredentials
A bit simpler just to collect these imports at the top of the method rather than place them individually in various if/else branches; you can't put them at the top of the file because of the driver being an optional dependency, but I don't think you need to go this extreme.
ServiceAccountCredentials needs an additional dependency (I've updated the docs), so I can't just import it for every YDB usage. It feels weird to have two imports moved up but one not, so I left every import under its `if` branch.
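One way to picture the trade-off discussed above is a helper that decides which optional module to import only after the branch is known. This is a sketch under assumptions, not the code in this PR: `load_credentials_class` is a hypothetical name, and the error message is illustrative. The module and class names (`ydb.StaticCredentials`, `ydb.iam.ServiceAccountCredentials`) are the ones quoted in the diff.

```python
import importlib


def load_credentials_class(credentials_info):
    """Hypothetical helper: import the credentials class only when needed."""
    if "username" in credentials_info:
        module_name, attr = "ydb", "StaticCredentials"
    elif "service_account_json" in credentials_info:
        # Per the thread, this import needs an additional optional dependency.
        module_name, attr = "ydb.iam", "ServiceAccountCredentials"
    else:
        # No credentials configured, so nothing has to be imported.
        return None

    try:
        module = importlib.import_module(module_name)
    except ImportError as ex:
        raise RuntimeError(
            f"Could not import {module_name}; is the optional driver installed?"
        ) from ex
    return getattr(module, attr)


print(load_credentials_class({}))
```

Because the import happens after the branch is chosen, a deployment that never uses service account credentials never touches `ydb.iam`.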
superset/db_engine_specs/ydb.py
Outdated
connect_args = params.setdefault("connect_args", {})

if "protocol" in encrypted_extra:
    connect_args["protocol"] = encrypted_extra.pop("protocol", "grpc")
I'd suggest avoiding mutating `encrypted_extra` in case the mutated data gets saved back to the database accidentally at some point. Just use `.get()` instead.
Also, that `"grpc"` string will never be used since you've checked the existence of the key first; you probably just meant to do `connect_args["protocol"] = encrypted_extra.pop("protocol", "grpc")` in all cases.
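The two points in this comment can be sketched as follows. `build_connect_args` is a hypothetical helper written for illustration, not the method in the PR, and `"grpcs"` is only an example value; the sketch reads the key with `.get()` so the default applies in all cases and the input dict is left untouched.

```python
def build_connect_args(encrypted_extra):
    """Hypothetical helper: derive connect_args without mutating the input."""
    connect_args = {}
    # .get() returns the default when the key is absent and never removes
    # the key, unlike .pop().
    connect_args["protocol"] = encrypted_extra.get("protocol", "grpc")
    return connect_args


stored = {"protocol": "grpcs"}
print(build_connect_args(stored))  # uses the stored value
print(build_connect_args({}))      # falls back to the default
print(stored)                      # the input dict is unchanged
```

No `if "protocol" in encrypted_extra` guard is needed, which is what makes the default reachable.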
fixed
superset/db_engine_specs/ydb.py
Outdated
    )
elif "service_account_json" in credentials_info:
    from ydb.iam import ServiceAccountCredentials
    sa_json = credentials_info.pop("service_account_json", {})
Default args would never be used here or on L81 for same reason as above; you can just index it since you've checked.
fixed
Also the logic in your spec should have some unit tests.
@giftig No ephemeral environment action detected. Please use '/testenv up' or '/testenv down'. View workflow run.
/testenv up
Can you also provide some test instructions for how to verify this is working / describe what you tested, please? I'm assuming it's mostly just install the ydb driver, connect a ydb database, and try running some queries and building some dashboards, but maybe others will catch some edge case testing which may be useful with a new engine spec.
@giftig Ephemeral environment spinning up at http://18.237.77.159:8080. Credentials are
4b28fb7
to
a414bb7
Compare
a414bb7
to
0dcc907
Compare
SUMMARY
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION