Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] feat: add YDB as a new database engine #31141

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vgvoleg
Copy link

@vgvoleg vgvoleg commented Nov 25, 2024

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added the doc Namespace | Anything related to documentation label Nov 25, 2024
@dosubot dosubot bot added data:connect Namespace | Anything related to db connections / integrations enhancement:db Suggest new DB connections labels Nov 25, 2024
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

Copy link

codecov bot commented Nov 25, 2024

Codecov Report

Attention: Patch coverage is 47.91667% with 25 lines in your changes missing coverage. Please review.

Project coverage is 76.88%. Comparing base (76d897e) to head (a414bb7).
Report is 1086 commits behind head on master.

Files with missing lines Patch % Lines
superset/db_engine_specs/ydb.py 47.91% 25 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #31141       +/-   ##
===========================================
+ Coverage   60.48%   76.88%   +16.39%     
===========================================
  Files        1931      537     -1394     
  Lines       76236    38976    -37260     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    29966    -16148     
+ Misses      28017     9010    -19007     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive ?
javascript ?
mysql 76.59% <47.91%> (?)
postgres 76.68% <47.91%> (?)
presto ?
python 76.88% <47.91%> (+13.39%) ⬆️
sqlite 76.14% <47.91%> (?)
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vgvoleg vgvoleg changed the title Add YDB as a new database engine feat: add YDB as a new database engine Nov 25, 2024
Default is `grpc`.


##### Authenticaions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it ought to say "authentication methods" or something similar really since authentication itself can't be pluralised.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

from typing import (
Any,
TYPE_CHECKING,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you install and run the pre-commit hook? I think that'd stick these on one line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


supports_file_upload = False

_time_grain_expressions = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you find this was necessary to make it work? I would have thought the base spec's empty dict would be ok here if you're not configuring any custom time grain expressions, and several other specs aren't setting it. But I do see this None: "{col}" entry in other specs which do set them so maybe I'm missing something which needs this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for one test to pass (it checks that len of this dict is greater than 0)
I had an issue with generated query after filling this dict in correct for YDB way https://apache-superset.slack.com/archives/C014LS99C1K/p1732544528541239

TLDR we can't work with generated queries because YDB expecting something like

SELECT DateTime::StartOf(date, Interval('P1D')) AS date, count(client_id) AS `COUNT(client_id)` 
FROM (SELECT * from transactions_data
) AS virtual_table GROUP BY date ORDER BY `COUNT(client_id)` DESC
 LIMIT CAST(1000 AS UInt64);

instead of

SELECT DateTime::StartOf(date, Interval('P1D')) AS date, count(client_id) AS `COUNT(client_id)` 
FROM (SELECT * from transactions_data
) AS virtual_table GROUP BY DateTime::StartOf(date, Interval('P1D')) AS date ORDER BY `COUNT(client_id)` DESC
 LIMIT CAST(1000 AS UInt64);

Let me know if I can override something to make group by use aliases as well as order by.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind :)

credentials_info = encrypted_extra.pop("credentials", {})
credentials = None
if "username" in credentials_info:
from ydb import StaticCredentials
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit simpler just to collect these imports at the top of the method rather than place them individually in various if/else branches; you can't put them at the top of the file because of the driver being an optional dependency, but I don't think you need to go this extreme.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServiceAccountCredentials will need additional dependency (I've updated docs), so I can't just import it for every YDB usage. It feels weird to have 2 imports moved up, but 1 not, so I left every import under if's

connect_args = params.setdefault("connect_args", {})

if "protocol" in encrypted_extra:
connect_args["protocol"] = encrypted_extra.pop("protocol", "grpc")
Copy link
Contributor

@giftig giftig Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest avoiding mutating encrypted_extra in case the mutated data gets saved back to the database accidentally at some point. Just use .get() instead.

Also that "grpc" string will never be used since you've checked the existence of the key first, you probably just meant to do

connect_args["protocol"] = encrypted_extra.pop("protocol", "grpc")

in all cases

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

)
elif "service_account_json" in credentials_info:
from ydb.iam import ServiceAccountCredentials
sa_json = credentials_info.pop("service_account_json", {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default args would never be used here or on L81 for same reason as above; you can just index it since you've checked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@giftig
Copy link
Contributor

giftig commented Nov 25, 2024

Also the logic in your spec should have some unit tests.

Copy link
Contributor

@giftig No ephemeral environment action detected. Please use '/testenv up' or '/testenv down'. View workflow run.

@giftig
Copy link
Contributor

giftig commented Nov 25, 2024

/testenv up

Copy link
Contributor

@giftig Processing your ephemeral environment request here.

@giftig
Copy link
Contributor

giftig commented Nov 25, 2024

Can you also provide some test instructions for how to verify this is working / describe what you tested, please? I'm assuming it's mostly just install the ydb driver, connect a ydb database, and try running some queries and building some dashboards, but maybe others will catch some edge case testing which may be useful with a new engine spec.

Copy link
Contributor

@giftig Ephemeral environment spinning up at http://18.237.77.159:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@vgvoleg vgvoleg force-pushed the support-ydb-database branch 3 times, most recently from 4b28fb7 to a414bb7 Compare November 26, 2024 16:14
@vgvoleg vgvoleg changed the title feat: add YDB as a new database engine [WIP] feat: add YDB as a new database engine Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data:connect Namespace | Anything related to db connections / integrations doc Namespace | Anything related to documentation enhancement:db Suggest new DB connections size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants