normalize databricks paths as part of resolving them #157
Conversation
✅ 40/40 passed, 2 skipped, 2m9s total. Running from acceptance #225
```diff
-        return absolute
+        return absolute.normalize()
+
+    def normalize(self: P) -> P:
```
A quick initial impression while I have a closer look at this: we should not be introducing new public methods like this; they're not part of the `Path()` API being emulated here.
Good point. This method is a good candidate for being private.
I don't disagree, but we're already doing this: see `is_notebook()`.
...
done
```python
for index, part in enumerate(self._path_parts):
    # we can't normalize '/../stuff' so let's ignore such scenarios
    if index == 0 or part != "..":
        continue
    segments = list(self._path_parts)
    segments.pop(index)
    segments.pop(index - 1)
    return self.with_segments(self.anchor, *segments).normalize()
return self
```
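For readers following along, the recursive strategy above can be reproduced outside the class. Here is a minimal sketch over plain tuples (the `normalize_parts` helper is hypothetical, not part of the library): each pass removes the first `segment/".."` pair and recurses until no removable `".."` remains.

```python
def normalize_parts(parts: tuple[str, ...]) -> tuple[str, ...]:
    """Recursively cancel 'segment/..' pairs (illustrative helper)."""
    for index, part in enumerate(parts):
        # a leading '..' (as in '/../stuff') cannot be normalized, so skip it
        if index == 0 or part != "..":
            continue
        segments = list(parts)
        segments.pop(index)      # drop the '..'
        segments.pop(index - 1)  # drop the segment it cancels
        return normalize_parts(tuple(segments))
    return parts


print(normalize_parts(("a", "b", "..", "c")))  # ('a', 'c')
```

Note that a leading `".."` survives untouched, matching the comment in the code under review.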
```diff
-        for index, part in enumerate(self._path_parts):
-            # we can't normalize '/../stuff' so let's ignore such scenarios
-            if index == 0 or part != "..":
-                continue
-            segments = list(self._path_parts)
-            segments.pop(index)
-            segments.pop(index - 1)
-            return self.with_segments(self.anchor, *segments).normalize()
-        return self
+        stack = []
+        for part in self._path_parts:
+            if part == "..":
+                if stack:
+                    stack.pop()
+            elif part == "." or not part:
+                continue
+            else:
+                stack.append(part)
+        return self.with_segments(self.anchor, *stack)
```
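The suggested replacement can be exercised on its own. Below is a self-contained sketch of the same stack-based, single-pass normalization (the `normalize_iter` name and the plain-tuple interface are illustrative, not part of the library):

```python
def normalize_iter(parts: tuple[str, ...]) -> tuple[str, ...]:
    """Collapse '..', '.', and empty segments in one pass, without recursion."""
    stack: list[str] = []
    for part in parts:
        if part == "..":
            if stack:
                stack.pop()  # '..' cancels the previous segment, if any
        elif part == "." or not part:
            continue  # '.' and empty segments are no-ops
        else:
            stack.append(part)
    return tuple(stack)


print(normalize_iter(("a", "b", "..", "c")))  # ('a', 'c')
```

Unlike the recursive version, a leading `".."` with an empty stack is silently dropped here, which matches POSIX semantics for absolute paths (`/..` is `/`).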
can we perhaps get a non-recursive option?
done
```diff
@@ -520,7 +520,18 @@ def resolve(self: P, strict: bool = False) -> P:
         if strict and not absolute.exists():
```
Won't `self.absolute()` fail, because `self.cwd()` is not available for `_DatabricksPath`?

```python
def absolute(self: P) -> P:
    if self.is_absolute():
        return self
    return self.with_segments(self.cwd(), self)
```
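The reviewer's concern can be made concrete with a small sketch: a path flavour whose `cwd()` is unavailable only breaks `absolute()` for relative inputs, since an already-absolute path never consults `cwd()`. Everything below (`NoCwdPath`, the `NotImplementedError`) is illustrative and not the actual `_DatabricksPath` implementation:

```python
from pathlib import PurePosixPath


class NoCwdPath(PurePosixPath):
    """Hypothetical path flavour with no notion of a working directory."""

    @classmethod
    def cwd(cls):
        raise NotImplementedError("this path type has no current working directory")

    def absolute(self):
        if self.is_absolute():
            return self  # already anchored: cwd() is never consulted
        return type(self)(self.cwd(), self)


assert NoCwdPath("/Workspace/a").absolute() == NoCwdPath("/Workspace/a")
try:
    NoCwdPath("relative/part").absolute()
except NotImplementedError:
    print("fails only for relative paths")
```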
It will, but that's not the problem being addressed by this PR.
```python
elif part is None or part == '.':
    continue
```
I don't think this is needed: by the time we get here, a part can be neither `None` nor `.`?
```python
else:
    segments.append(part)
# pylint: disable=protected-access
return self.with_segments(self.anchor, *segments)._normalize()
```
It's a bit unclear to me why we need to also call `._normalize()` on the new instance?
lgtm
* Bump actions/checkout from 4.1.7 to 4.2.0 ([#149](#149)). In this pull request, the `actions/checkout` dependency is upgraded from version 4.1.7 to 4.2.0 in the `acceptance.yml` and `downstreams.yml` workflow files. The new version provides additional Ref and Commit outputs, as well as updated dependencies, which aim to improve the functionality and security of the checkout process. The `Ref` output is a string representing the reference that was checked out, and the `Commit` output is the SHA-1 hash of the checked-out commit. Dependency updates include bumping the `braces` package from 3.0.2 to 3.0.3 and updating the minor-npm-dependencies group across one directory with four updates. These changes contribute to a more reliable and efficient checkout process and enhance the overall functionality and maintainability of the Action. Software engineers are recommended to review the changes and ensure they do not introduce conflicts with their current setup before adopting the new version.
* Bump actions/checkout from 4.2.0 to 4.2.1 ([#152](#152)). In this update, the version of the `actions/checkout` GitHub Action is bumped from 4.2.0 to 4.2.1 in a project's GitHub workflow files. This new version includes a modification to check out other `refs/*` by commit if provided, falling back to the ref. This change enhances the flexibility of the `checkout` action in handling different types of references, which could be useful for users working with multiple branches or references in their workflows. The update also adds a workflow file for publishing releases to an immutable action package. This release was contributed by the new project collaborator, @orhantoy, who made the change in pull request [1924](https://redirect.github.com/actions/checkout/pull/1924).
* Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 ([#155](#155)). In this update, the dependency for `databrickslabs/sandbox` has been bumped from version `acceptance/v0.3.0` to `0.3.1`. This change includes bug fixes, upgrades to go-git libraries, and dependency updates. The `golang.org/x/crypto` library was specifically bumped from version `0.16.0` to `0.17.0` in both `/go-libs` and `/runtime-packages`. Additionally, the `cac167b` commit expanded acceptance test logs and introduced experimental OIDC refresh token rotation. The acceptance test job in the workflow was also updated to use the new version of `databrickslabs/sandbox`. Ignore conditions were added for previous versions of `databrickslabs/sandbox` in this release. The README was also modified, and install instructions were added to the changelog.
* Catch all errors when checking Databricks path, notably BadRequest ones ([#156](#156)). This commit introduces improvements to the error handling of the `exists` method in the `paths.py` file when checking Databricks paths. Previously, only `NotFound` errors were caught, but now `BadRequest` errors are also handled, addressing issue [#2882](https://github.com/databrickslabs/blueprint/issues/2882). The `exists` method has been updated to catch and manage `DatabricksError` exceptions, which now encompass `BadRequest` errors, ensuring comprehensive error handling for Databricks path-related operations. Additionally, the `_cached_file_info` and `_cached_object_info` attributes are now initialized when a `DatabricksError` exception occurs, returning `False` accordingly. This enhancement maintains consistency and accuracy in the `exists` method while broadening the range of errors captured, resulting in a more robust and reliable codebase with enhanced error reporting for users.
* Normalize databricks paths as part of resolving them ([#157](#157)). In this release, the `resolve` method in the `paths.py` file of the databricks/labs/blueprint project has been enhanced to handle parent directory references ("..") consistently with Python's built-in `Path` object. Previously, `Path("/a/b/../c").resolve()` would return `Path("/a/c")`, while Databricks paths were not behaving consistently. This modification introduces a new `_normalize()` method, which processes the path parts and ensures that ".." segments are handled correctly. The commit also includes a new test function, `test_resolve_is_consistent`, which checks the consistent resolution of Databricks paths with various input formats, such as relative paths, ".." or "." components, and absolute paths. This change ensures that the resolved path will be normalized according to the expected behavior, regardless of the input format, contributing to the resolution of issue [#2882](https://github.com/databrickslabs/blueprint/issues/2882). By normalizing Databricks paths in the same fashion as Python's built-in `Path` object, the code should become more robust and predictable, providing a more reliable experience for software engineers utilizing the project.
* Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 ([#153](#153)). In this pull request, the `databrickslabs/sandbox` package requirement in the downstreams GitHub Actions workflow is updated to version 0.3.0, which is the latest version available. This package provides a sandbox environment for development and testing, and the new version includes bug fixes and dependency updates that may enhance its reliability and performance. Dependabot has been used to ensure a smooth update process, with any conflicts being resolved automatically. However, it is recommended to review the changelog and test the updated version before merging this pull request to ensure compatibility and functionality in your specific use case. Additionally, Dependabot commands are available to manage ignore conditions for this dependency.

Dependency updates:

* Bump actions/checkout from 4.1.7 to 4.2.0 ([#149](#149)).
* Bump actions/checkout from 4.2.0 to 4.2.1 ([#152](#152)).
* Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 ([#153](#153)).
* Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 ([#155](#155)).
## Changes

Workspace API does not support relative subpaths such as "/a/b/../c". This PR fixes the issue by resolving workspace paths before calling the API.

### Linked issues

Resolves #2882
Requires databrickslabs/blueprint#156
Requires databrickslabs/blueprint#157

### Functionality

None

### Tests

- [x] added integration tests

---------

Co-authored-by: Eric Vergnaud <[email protected]>
Co-authored-by: Serge Smertin <[email protected]>
…databricks-labs-blueprint` (#2950)

## Changes

In #2920 we introduced a change to ensure that notebook paths are normalised, resolving #2882. This change depended on an upstream fix (databrickslabs/blueprint#157) included in the 0.9.1 release of the dependency. This PR ensures that we run against that release or later.
`Path("/a/b/../c").resolve()` returns `Path("/a/c")`. Databricks paths should behave the same, but currently don't. This PR fixes the issue, which participates in databrickslabs/ucx#2882.

Progresses databrickslabs/ucx#2882
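To make the desired equivalence concrete, the lexical part of this behaviour can be demonstrated with the standard library. `posixpath.normpath` is used below as the purely lexical analogue of `Path.resolve()` (which additionally consults the filesystem for symlinks):

```python
import posixpath
from pathlib import PurePosixPath

# Pure path objects keep '..' verbatim; only resolve()/normpath collapse it.
assert str(PurePosixPath("/a/b/../c")) == "/a/b/../c"
assert posixpath.normpath("/a/b/../c") == "/a/c"
assert posixpath.normpath("/../stuff") == "/stuff"  # '..' at the root is a no-op
```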