Changes to work with Databricks SDK v0.38.0 #350

Merged
merged 7 commits into main from fix/handle-sdk-release-0380 on Nov 19, 2024

Conversation

@JCZuurmond (Member) commented on Nov 19, 2024

Changes to work with Databricks SDK v0.38.0

Resolves #349
Resolves #348
Resolves #347
Resolves #346
Resolves #345
Resolves #344
Resolves #343
Resolves #342
Resolves #341
Resolves #340
Resolves #339
Resolves #338
Resolves #337
Resolves #336
Resolves #335
Resolves #334
Resolves #333
Resolves #332

@JCZuurmond added the `upstream` label (these issues are caused by upstream dependencies) on Nov 19, 2024
@JCZuurmond requested a review from asnare on November 19, 2024 10:52
@JCZuurmond self-assigned this on Nov 19, 2024

github-actions bot commented Nov 19, 2024

✅ 36/36 passed, 4 skipped, 4m37s total

Running from acceptance #462

pyproject.toml (review thread resolved)
Comment on lines 42 to 45
results = set()
see = StatementExecutionExt(ws, warehouse_id=env_or_skip("TEST_DEFAULT_WAREHOUSE_ID"))
for pickup_zip, dropoff_zip in see.fetch_all(
"SELECT pickup_zip, dropoff_zip FROM nyctaxi.trips LIMIT 10", catalog="samples"
):
results.append((pickup_zip, dropoff_zip))
assert results == [
(10282, 10171),
(10110, 10110),
(10103, 10023),
(10022, 10017),
(10110, 10282),
(10009, 10065),
(10153, 10199),
(10112, 10069),
(10023, 10153),
(10012, 10003),
]


def test_sql_execution_partial(ws, env_or_skip):
results = []
for pickup_zip, dropoff_zip, *_ in see.fetch_all(NYC_TAXI_LIMITED_TRIPS, catalog="samples"):
results.add((pickup_zip, dropoff_zip))
@asnare (Contributor) commented:

To check the result-set precisely, results needs to remain a list.

Because the query orders the rows, order-sensitive equality checking is fine in this case.

It's not needed here, but in general for order-insensitive comparison the pattern is:

results = […]
expected = {…}
assert len(results) == len(expected)
assert set(results) == expected
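
For illustration, here is that pattern filled in as a minimal, self-contained sketch, using the first three expected rows from the snippet above:

```python
# Three rows taken from the expected results in the diff above.
results = [(10282, 10171), (10110, 10110), (10103, 10023)]
expected = {(10282, 10171), (10110, 10110), (10103, 10023)}

# The length check catches dropped or repeated rows; the set comparison
# then checks membership without regard to order.
assert len(results) == len(expected)
assert set(results) == expected
```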

@asnare (Contributor) commented:

I should clarify: right now this is performing neither an order-sensitive nor an order-insensitive check of the results, but rather something in-between.

In this case I think the comparison is supposed to be order-sensitive (because the query is), so please use lists for both results and the check against the expected set of results.
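
A minimal sketch of the order-sensitive version being asked for, reusing the names and expected rows from the snippet above (the import path is the usual one for this repo; the fixture wiring is otherwise unchanged):

```python
from databricks.labs.lsql.core import StatementExecutionExt


def test_sql_execution(ws, env_or_skip):
    see = StatementExecutionExt(ws, warehouse_id=env_or_skip("TEST_DEFAULT_WAREHOUSE_ID"))
    results = []  # a list preserves row order
    for pickup_zip, dropoff_zip in see.fetch_all(
        "SELECT pickup_zip, dropoff_zip FROM nyctaxi.trips LIMIT 10", catalog="samples"
    ):
        results.append((pickup_zip, dropoff_zip))
    # List-to-list comparison: order-sensitive, matching the query's intent.
    assert results == [
        (10282, 10171),
        (10110, 10110),
        (10103, 10023),
        (10022, 10017),
        (10110, 10282),
        (10009, 10065),
        (10153, 10199),
        (10112, 10069),
        (10023, 10153),
        (10012, 10003),
    ]
```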

@JCZuurmond (Member, Author) commented:

The example you provided does not work: `assert len(results) == len(expected)` fails because there are duplicates in `results`.
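
As an aside (an editorial sketch, not something proposed in the thread): when duplicates are expected and order is not, a multiset comparison handles both concerns:

```python
from collections import Counter

# Duplicates are legitimate here: the same zip pair can occur twice.
results = [(10110, 10110), (10110, 10110), (10282, 10171)]
expected = [(10282, 10171), (10110, 10110), (10110, 10110)]

# Counter compares element multiplicities, so duplicates are counted
# while ordering is ignored.
assert Counter(results) == Counter(expected)
```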

@JCZuurmond (Member, Author) commented:

I think the integration test is just there to verify that we get some output back.

@asnare (Contributor) left a review:

Mostly fine, with a quibble in the way query results are being checked in a test.

@JCZuurmond requested a review from asnare on November 19, 2024 12:57
@asnare (Contributor) left a review:

lgtm

@JCZuurmond merged commit 348cb43 into main on Nov 19, 2024
8 checks passed
@JCZuurmond deleted the fix/handle-sdk-release-0380 branch on November 19, 2024 13:21
JCZuurmond added a commit that referenced this pull request Nov 19, 2024
* Changes to work with Databricks SDK `v0.38.0` ([#350](#350)). In this release, the open-source library has been updated to be compatible with Databricks SDK version 0.38.0, addressing issues [#349](#349) to [#332](#332). The changes include modifying the `create_dashboard` function, which now directly passes the `SDKDashboard` instance to the `ws.lakeview.create` and `ws.lakeview.update` methods, eliminating the need for dictionary conversions. A new SQL query, `NYC_TAXI_TRIPS_LIMITED`, has been introduced, and the `test_sql_execution` method has been updated to use this query. The `test_sql_execution_partial` and `test_sql_execution_as_iterator` methods have been removed, and the `test_fetch_one_works` method now includes an assertion to verify the returned row's `pickup_zip` value. These updates improve the library's compatibility with the latest Databricks SDK, refactor SQL query-based tests, and enhance test reliability.
* Specify the minimum required version of `databricks-sdk` as 0.37.0 ([#331](#331)). In this release, we have updated the minimum required version of the `databricks-sdk` package to 0.37.0, as specified in the project's `pyproject.toml` file. This update is necessary due to the modifications made in pull request [#320](#320), which constrains the `databricks-sdk` version to 0.37.x for compatible updates within the same minor version. This change also resolves issue [#330](#330). It is important to note that no new methods have been added or existing functionality changed in the codebase as part of this commit. Therefore, the impact on the existing functionality should be minimal and confined to the interaction with the `databricks-sdk` package.
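
A hedged sketch of the `create_dashboard` change described above; the pre-0.38 call shape and the `dashboard` variable are assumptions, but the release notes state that the `SDKDashboard` instance is now passed straight through:

```python
# Before (SDK <= 0.37, assumed shape): the dashboard was first converted
# to a dict before being handed to the SDK.
sdk_dashboard = ws.lakeview.create(dashboard.as_dict())

# After (SDK 0.38.0): the SDKDashboard instance is passed directly,
# per the release notes above (exact signatures are the SDK's).
sdk_dashboard = ws.lakeview.create(dashboard)
```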
@JCZuurmond mentioned this pull request on Nov 19, 2024
JCZuurmond added a commit that referenced this pull request Nov 19, 2024
* Changes to work with Databricks SDK `v0.38.0` ([#350](#350)). In this release, we have upgraded the Databricks SDK to version 0.38.0 from version 0.37.0 to ensure compatibility with the latest SDK and address several issues. The update includes changes to make the code compatible with the new SDK version, removing the need for `.as_dict()` method calls when creating or updating dashboards and utilizing a `sdk_dashboard` variable for interacting with the Databricks workspace. We also updated the dependencies to "databricks-labs-blueprint[yaml]" package version greater than or equal to 0.4.2 and `sqlglot` package version greater than or equal to 22.3.1. The `test_core.py` file has been updated to address multiple issues ([#349](#349) to [#332](#332)) related to the Databricks SDK and the `test_dashboards.py` file has been revised to work with the new SDK version. These changes improve integration with Databricks' lakeview dashboards, simplify the code, and ensure compatibility with the latest SDK version, resolving issues [#349](#349) to [#332](#332).
* Specify the minimum required version of `databricks-sdk` as 0.37.0 ([#331](#331)). In this release, we have updated the minimum required version of the `databricks-sdk` package to 0.37.0 from 0.29.0 in the `pyproject.toml` file to ensure compatibility with the latest version. This change was made necessary due to updates made in issue [#320](#320). To accommodate any patch release of `databricks-sdk` with a major and minor version of 0.37, we have updated the dependency constraint to use the `~=` operator, resolving issue [#330](#330). These changes are intended to enhance the compatibility and stability of our software.
@JCZuurmond mentioned this pull request on Nov 19, 2024
JCZuurmond added a commit that referenced this pull request Nov 19, 2024
gueniai added a commit that referenced this pull request Nov 19, 2024
* Changes to work with Databricks SDK `v0.38.0` ([#350](#350)). In this release, we have updated the Databricks SDK from version 0.37 to 0.38 to ensure compatibility with the latest SDK version. This update includes modifications to the `test_dashboards.py` file, where the `create_dashboard` function has been updated to use the new `sdk_dashboard` object directly, eliminating the need for dictionary conversion using the `as_dict()` method. This change has been implemented in both the `ws.lakeview.create` and `ws.lakeview.update` methods. These updates address several issues, as listed in the commit message, which were related to compatibility and functionality with the previous SDK version. Additionally, new methods such as `test_sql_execution`, `test_sql_execution_as_iterator`, and an updated `test_fetch_one_works` method have been introduced to improve compatibility with the Databricks SDK version 0.38.0. These changes are essential for adopters of the project who wish to ensure compatibility with the latest version of the Databricks SDK.
* Release v0.14.1 ([#352](#352)). 0.14.1 release brings updates for compatibility with Databricks SDK v0.38.0, addressing several issues and improving code compatibility with the new SDK version. It removes the need for `.as_dict()` method calls when creating or updating dashboards, introduces a `sdk_dashboard` variable for interacting with the Databricks workspace, and updates dependencies to include the "databricks-labs-blueprint[yaml]" package version 0.4.2 or greater and `sqlglot` package version 22.3.1 or greater. Test files have been revised to address multiple issues related to the Databricks SDK, and the minimum required version of `databricks-sdk` has been updated to 0.37.0 from 0.29.0. These changes improve integration with Databricks' lakeview dashboards, simplify the code, and enhance compatibility and stability of the software.
* Specify the minimum required version of `databricks-sdk` as 0.37.0 ([#331](#331)). In this release, we have updated the minimum required version of the `databricks-sdk` library to a version compatible with 0.37.0. This update is necessary due to changes introduced in pull request [#320](#320). We now specify the `databricks-sdk` version using the "~=" operator, indicating that any version with a major version number of 0 and minor version number of 37 is acceptable. This allows for future updates while ensuring compatibility with the required features. This change enhances the reliability and compatibility of the project by ensuring that it utilizes a recent and supported version of the `databricks-sdk`.
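
For reference, a sketch of the `~=` constraint from [#331](#331) as it would appear in `pyproject.toml` (the exact surrounding lines in the repo may differ):

```toml
[project]
dependencies = [
    # ~=0.37.0 accepts any 0.37.x patch release but excludes 0.38 and later.
    "databricks-sdk~=0.37.0",
]
```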