fix(ingestion/prefect-plugin): fixed the unit tests #10643
Conversation
Important: Review skipped. Auto reviews are disabled on this repository; check the settings in the CodeRabbit UI.

Walkthrough

The updates introduce a GitHub Actions workflow for the Prefect plugin, enhance artifact handling, and integrate Prefect with DataHub. Concurrently, Gradle and TypeScript changes support the new Prefect capabilities by including Prefect in the metadata ingestion modules, ensuring proper build processes, and updating documentation to feature the Prefect integration and corresponding examples.
Sequence Diagrams

```mermaid
sequenceDiagram
    participant Developer
    participant GitHub Actions
    participant DataHub
    Developer ->> GitHub Actions: Pushes to master/PR/release
    GitHub Actions ->> GitHub Actions: Trigger Prefect Plugin workflow
    GitHub Actions ->> Python: Set up environment and dependencies
    GitHub Actions ->> Gradle: Build metadata ingestion modules
    GitHub Actions ->> Python: Run Prefect flow
    Python ->> DataHub: Emit metadata events
    GitHub Actions -->> Developer: Upload artifacts and report coverage
```
Actionable comments posted: 36
Outside diff range and nitpick comments (13)
.github/workflows/test-results.yml (1)

Line range hint 19-19: Fix potential issues with `read` and quoting.

The `read` command should use the `-r` option to prevent backslash mangling. Additionally, the variables should be quoted to prevent globbing and word splitting.

```diff
- read name url <<< "$artifact"
+ IFS=$'\t' read -r name url <<< "$artifact"
- gh api $url > "$name.zip"
+ gh api "$url" > "$name.zip"
```

metadata-ingestion/developing.md (11)
Line range hint 67-67: Add a comma after the introductory clause.

A comma is needed after "pip install".

```diff
- If you've already run the pip install, but running `datahub` in your command line doesn't work, then there is likely an issue with your PATH setup and Python.
+ If you've already run the pip install, but running `datahub` in your command line doesn't work, then there is likely an issue with your PATH setup and Python.
```
Line range hint 112-112: Avoid wordiness.

Consider replacing "a variety of" with a more concise phrase.

```diff
- The sources pull metadata from a variety of data systems, while the sinks are primarily for moving this metadata into DataHub.
+ The sources pull metadata from various data systems, while the sinks are primarily for moving this metadata into DataHub.
```
Line range hint 116-116: Avoid redundancy: "CLI interface".

The phrase "CLI interface" is redundant. Use "CLI".

```diff
- The CLI interface is defined in [entrypoints.py](./src/datahub/entrypoints.py) and in the [cli](./src/datahub/cli) directory.
+ The CLI is defined in [entrypoints.py](./src/datahub/entrypoints.py) and in the [cli](./src/datahub/cli) directory.
```
Line range hint 117-117: Use a hyphen for compound adjectives.

Use a hyphen for "high level".

```diff
- The high level interfaces are defined in the [API directory](./src/datahub/ingestion/api).
+ The high-level interfaces are defined in the [API directory](./src/datahub/ingestion/api).
```
Line range hint 147-147: Avoid repetition: "probably".

The word "probably" is repeated in nearby sentences. Consider replacing it for variety.

```diff
- You probably should not be defining a `__hash__` method yourself. Using `@dataclass(frozen=True)` is a good way to get a hashable class.
+ You should not be defining a `__hash__` method yourself. Using `@dataclass(frozen=True)` is a good way to get a hashable class.
```
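As a quick editor's illustration of the `@dataclass(frozen=True)` hint above, here is a minimal standalone sketch; the `SourceKey` class is invented for illustration and is not code from this PR:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceKey:  # hypothetical class, not part of this PR
    platform: str
    name: str

# frozen=True makes instances immutable, so dataclass generates a __hash__
# consistent with the generated __eq__; no hand-written __hash__ is needed.
keys = {SourceKey("snowflake", "db.table"), SourceKey("snowflake", "db.table")}
print(len(keys))  # → 1: equal instances hash equally, so the set deduplicates
```

Without `frozen=True`, a dataclass that defines `__eq__` sets `__hash__` to `None`, which is exactly the situation that tempts people to hand-write `__hash__`.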
Line range hint 154-154: Add a comma after "package".

A comma is needed after "package".

```diff
- are not required by the "core" package but instead can be optionally installed using Python "extras".
+ are not required by the "core" package, but instead can be optionally installed using Python "extras".
```
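To make the "extras" mechanism mentioned here concrete, a minimal sketch of how optional dependencies are split out in a `setup.py` (package and dependency names are invented, not the actual DataHub setup):

```python
# Hypothetical sketch: core vs. optional dependencies via Python "extras".
base_requirements = {"requests"}        # installed by `pip install example-pkg`
sql_requirements = {"sqlalchemy>=1.4"}  # installed by `pip install 'example-pkg[sql]'`

setup_kwargs = dict(
    name="example-pkg",  # invented name
    install_requires=sorted(base_requirements),
    extras_require={"sql": sorted(sql_requirements)},
)
print(setup_kwargs["extras_require"]["sql"])  # → ['sqlalchemy>=1.4']
```

Passing these keyword arguments to `setuptools.setup` makes the `sql` group installable on demand while keeping the core install small.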
Line range hint 162-162: Avoid wordiness.

Consider a shorter alternative to avoid wordiness.

```diff
- In order to ensure that the configs are consistent and easy to use, we have a few guidelines:
+ To ensure that the configs are consistent and easy to use, we have a few guidelines:
```
Line range hint 168-168: Add a comma after "For example".

A comma is needed after "For example".

```diff
- For example `client_id` or `tenant_id` over a bare `id` and `access_secret` over a bare `secret`.
+ For example, `client_id` or `tenant_id` over a bare `id` and `access_secret` over a bare `secret`.
```
Line range hint 38-38: Add blank lines around heading.

Headings should be surrounded by blank lines.

```diff
- ### (Optional) Set up your Python environment for developing on Prefect Plugin
+
+ ### (Optional) Set up your Python environment for developing on Prefect Plugin
+
```
Line range hint 37-37: Add blank lines around fenced code blocks.

Fenced code blocks should be surrounded by blank lines.

````diff
- ```shell
+
+ ```shell
````
Line range hint 109-109: Add alt text for images.

Images should have alternate text.

```diff
- <img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/datahub-metadata-ingestion-framework.png"/>
+ <img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/datahub-metadata-ingestion-framework.png" alt="DataHub Metadata Ingestion Framework"/>
```

docs-website/generateDocsDir.ts (1)
Line range hint 352-367: Remove unnecessary else clause.

The else clause can be omitted because previous branches break early.

````diff
- } else {
-   // Find first h1 header and use it as the title.
-   const headers = contents.content.match(/^# (.+)$/gm);
-
-   if (!headers) {
-     throw new Error(
-       `${filepath} must have at least one h1 header for setting the title`
-     );
-   }
-
-   if (headers.length > 1 && contents.content.indexOf("```") < 0) {
-     throw new Error(`too many h1 headers in ${filepath}`);
-   }
-   title = headers[0].slice(2).trim();
- }
+ // Find first h1 header and use it as the title.
+ const headers = contents.content.match(/^# (.+)$/gm);
+
+ if (!headers) {
+   throw new Error(
+     `${filepath} must have at least one h1 header for setting the title`
+   );
+ }
+
+ if (headers.length > 1 && contents.content.indexOf("```") < 0) {
+   throw new Error(`too many h1 headers in ${filepath}`);
+ }
+ title = headers[0].slice(2).trim();
````
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (24)
- .github/workflows/build-and-test.yml (2 hunks)
- .github/workflows/prefect-plugin.yml (1 hunks)
- .github/workflows/test-results.yml (1 hunks)
- docs-website/build.gradle (1 hunks)
- docs-website/generateDocsDir.ts (1 hunks)
- docs-website/sidebars.js (2 hunks)
- docs/lineage/prefect.md (1 hunks)
- metadata-ingestion-modules/prefect-plugin/.gitignore (1 hunks)
- metadata-ingestion-modules/prefect-plugin/README.md (1 hunks)
- metadata-ingestion-modules/prefect-plugin/build.gradle (1 hunks)
- metadata-ingestion-modules/prefect-plugin/pyproject.toml (1 hunks)
- metadata-ingestion-modules/prefect-plugin/scripts/release.sh (1 hunks)
- metadata-ingestion-modules/prefect-plugin/setup.cfg (1 hunks)
- metadata-ingestion-modules/prefect-plugin/setup.py (1 hunks)
- metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/`__init__.py` (1 hunks)
- metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/datahub_emitter.py (1 hunks)
- metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/entities.py (1 hunks)
- metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/example/flow.py (1 hunks)
- metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/example/save_block.py (1 hunks)
- metadata-ingestion-modules/prefect-plugin/tests/integration/integration_test_dummy.py (1 hunks)
- metadata-ingestion-modules/prefect-plugin/tests/unit/test_block_standards.py (1 hunks)
- metadata-ingestion-modules/prefect-plugin/tests/unit/test_datahub_emitter.py (1 hunks)
- metadata-ingestion/developing.md (2 hunks)
- settings.gradle (1 hunks)
Files not reviewed due to errors (1)
- docs-website/sidebars.js (no review received)
Files skipped from review due to trivial changes (5)
- metadata-ingestion-modules/prefect-plugin/pyproject.toml
- metadata-ingestion-modules/prefect-plugin/setup.cfg
- metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/`__init__.py`
- metadata-ingestion-modules/prefect-plugin/tests/integration/integration_test_dummy.py
- settings.gradle
Additional context used

GitHub Check: actionlint (reported by reviewdog)

.github/workflows/test-results.yml

- [failure] 19:9 — shellcheck SC2162 (info, 5:91): read without -r will mangle backslashes
- [failure] 19:9 — shellcheck SC2162 (info, 7:13): read without -r will mangle backslashes
- [failure] 19:9 — shellcheck SC2086 (info, 8:10): Double quote to prevent globbing and word splitting
LanguageTool

docs/lineage/prefect.md

- [misspelling] ~15: "self hosted" is normally spelled with a hyphen (EN_COMPOUNDS_SELF_HOSTED)
- [misspelling] ~17: "self hosted" is normally spelled with a hyphen (EN_COMPOUNDS_SELF_HOSTED)
- [uncategorized] ~40: A comma is probably missing after "While saving" (MISSING_COMMA_AFTER_INTRODUCTORY_PHRASE)
- [uncategorized] ~45: Possible missing comma after "possible values" (AI_HYDRA_LEO_MISSING_COMMA)
- [typographical] ~46: Consider adding a comma before "please refer" (PLEASE_COMMA)
- [uncategorized] ~100: A comma may be missing after the conjunctive/linking adverb "Otherwise" (SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
metadata-ingestion-modules/prefect-plugin/README.md

- [style] ~25: "In order to" may be wordy; consider a shorter alternative (IN_ORDER_TO_PREMIUM)
- [uncategorized] ~30: Possible missing preposition: "will lead creation" (AI_HYDRA_LEO_MISSING_TO)
- [uncategorized] ~36: A comma is probably missing after "While saving" (MISSING_COMMA_AFTER_INTRODUCTORY_PHRASE)
- [uncategorized] ~36: Possibly missing the article "the" before "below configurations" (AI_EN_LECTOR_MISSING_DETERMINER_THE)
- [uncategorized] ~36: A determiner appears to be missing before "Default value" (AI_EN_LECTOR_MISSING_DETERMINER)
- [uncategorized] ~41: Possible missing comma after "possible values" (AI_HYDRA_LEO_MISSING_COMMA)
- [typographical] ~42: Consider adding a comma before "please refer" (PLEASE_COMMA)
- [uncategorized] ~81: Verb form may be incorrect: "as show below" (AI_EN_LECTOR_REPLACEMENT_VERB_FORM)
- [uncategorized] ~124: A comma may be missing after "Otherwise" (SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
- [style] ~149: "Feel free to" is used quite frequently; consider a less frequent alternative (FEEL_FREE_TO_STYLE_ME)
- [style] ~150: Many exclamation marks might seem excessive (8 in a 3263-character text) (EN_EXCESSIVE_EXCLAMATION)
- [style] ~154: Consider a more formal verb than "fix" (FIX_RESOLVE)
metadata-ingestion/developing.md

- [grammar] ~67: Possibly a comma is missing after an introductory clause, or the verb inflection is incorrect (IF_DT_NN_VBZ)
- [style] ~112: "a variety of" may be wordy (A_VARIETY_OF)
- [style] ~116: "CLI interface" is redundant ("I" stands for "interface"); use simply "CLI" (ACRONYM_TAUTOLOGY)
- [uncategorized] ~117: Compound adjective "high level" should be hyphenated before a noun (EN_COMPOUND_ADJECTIVE_INTERNAL)
- [style] ~147: "probably" is repeated in nearby sentences (REP_PROBABLY)
- [uncategorized] ~154: Possible missing comma before "but instead" (AI_HYDRA_LEO_MISSING_COMMA)
- [style] ~162: "In order to" may be wordy (IN_ORDER_TO_PREMIUM)
- [typographical] ~168: A comma is usually used after "for example" (COMMA_FOR_EXAMPLE)
Markdownlint

docs/lineage/prefect.md

- 58-58: Trailing spaces; Expected: 0 or 2, Actual: 1 (MD009, no-trailing-spaces)
- 114-114: Multiple consecutive blank lines; Expected: 1, Actual: 2 (MD012, no-multiple-blanks)
- 124-124: Headings should be surrounded by blank lines; Expected: 1, Actual: 0 above (MD022, blanks-around-headings)
- 134-134: Headings should be surrounded by blank lines; Expected: 1, Actual: 0 below (MD022, blanks-around-headings)
- 19-19: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)
- 21-21: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)
- 123-123: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)
- 18-18: Lists should be surrounded by blank lines (MD032, blanks-around-lists)
- 22-22: Lists should be surrounded by blank lines (MD032, blanks-around-lists)
- 44-44: Bare URL used (MD034, no-bare-urls)
- 120-120: Fenced code blocks should have a language specified (MD040, fenced-code-language)
- 66-66: Code block style; Expected: fenced, Actual: indented (MD046, code-block-style)

metadata-ingestion-modules/prefect-plugin/README.md

- 26-26: Trailing spaces; Expected: 0 or 2, Actual: 1 (MD009, no-trailing-spaces)
- 63-63: Trailing spaces; Expected: 0 or 2, Actual: 1 (MD009, no-trailing-spaces)
- 21-21: Multiple consecutive blank lines; Expected: 1, Actual: 2 (MD012, no-multiple-blanks)
- 34-34: Multiple consecutive blank lines; Expected: 1, Actual: 2 (MD012, no-multiple-blanks)
- 30-30: Bare URL used (MD034, no-bare-urls)
- 40-40: Bare URL used (MD034, no-bare-urls)
- 7-7: Images should have alternate text (MD045, no-alt-text)
- 9-9: Images should have alternate text (MD045, no-alt-text)
- 11-11: Images should have alternate text (MD045, no-alt-text)
- 14-14: Images should have alternate text (MD045, no-alt-text)
- 71-71: Code block style; Expected: fenced, Actual: indented (MD046, code-block-style)

metadata-ingestion/developing.md

- 165-165: Heading levels should only increment by one level at a time; Expected: h3, Actual: h4 (MD001, heading-increment)
- 38-38: Headings should be surrounded by blank lines; Expected: 1, Actual: 0 above (MD022, blanks-around-headings)
- 37-37: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)
- 109-109: Images should have alternate text (MD045, no-alt-text)
- 256-256: Files should end with a single newline character (MD047, single-trailing-newline)
Biome

docs-website/generateDocsDir.ts

- [error] 352-367: This else clause can be omitted because previous branches break early. Unsafe fix: omit the else clause. (lint/style/noUselessElse)
- [error] 356-367: This else clause can be omitted because previous branches break early. (lint/style/noUselessElse)
- [error] 365-367: This else clause can be omitted because previous branches break early. (lint/style/noUselessElse)
Ruff

metadata-ingestion-modules/prefect-plugin/tests/unit/test_datahub_emitter.py

- 550-550: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
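The SIM118 rewrite is behavior-preserving; a small editor's sketch with invented data:

```python
# SIM118: `key in some_dict` already tests membership against the keys,
# so calling `.keys()` first is redundant and builds a view object for nothing.
aspects = {"status": 1, "ownership": 2}  # invented example data

assert ("status" in aspects.keys()) == ("status" in aspects)  # identical result
assert "lineage" not in aspects  # preferred spelling, no `.keys()`
```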
Additional comments not posted (40)
metadata-ingestion-modules/prefect-plugin/scripts/release.sh (1)

4-8: Ensure proper handling of build and install commands.

The script conditionally runs `gradlew build` and `gradlew install`. Ensure that these commands are correctly handling failures and that the conditions are mutually exclusive.

Verification successful. The `gradlew` file is accessible at `../../jailuser/git/gradlew`. The conditions for `RELEASE_SKIP_TEST` and `RELEASE_SKIP_INSTALL` are mutually exclusive, and the script uses `set -euxo pipefail` for robust error handling.

- Update the script to use the correct path: `../../jailuser/git/gradlew`.

Scripts executed:

```shell
#!/bin/bash
# Description: Verify that the build and install commands are correctly handling
# failures and that the conditions are mutually exclusive.
# Test: Check the exit status of the commands.
../../gradlew build || { echo "Build failed"; exit 1; }
../../gradlew install || { echo "Install failed"; exit 1; }
```

Length of output: 154

```shell
#!/bin/bash
# Locate the `gradlew` file to ensure it exists in the expected directory.
fd gradlew
# Check the conditions and error handling in the release.sh script.
cat metadata-ingestion-modules/prefect-plugin/scripts/release.sh
```

Length of output: 1225

```shell
#!/bin/bash
# Verify the correct relative path to `gradlew` from the release.sh script location.
fd gradlew --exec realpath --relative-to=metadata-ingestion-modules/prefect-plugin/scripts/
# Check the mutual exclusivity of the conditions in the release.sh script.
ast-grep --lang bash --pattern $'if [[ ! ${RELEASE_SKIP_TEST:-} ]]; then\n\t$_\nelif [[ ! ${RELEASE_SKIP_INSTALL:-} ]]; then\n\t$_\nfi' metadata-ingestion-modules/prefect-plugin/scripts/release.sh
```

Length of output: 634

```shell
#!/bin/bash
# Locate the `gradlew` file and determine its relative path from the script's location.
find ../../ -name gradlew -exec realpath --relative-to=metadata-ingestion-modules/prefect-plugin/scripts/ {} +
# Verify the mutual exclusivity of conditions in the release.sh script.
grep -E 'if \[\[ ! \${RELEASE_SKIP_TEST:-} \]\]; then|elif \[\[ ! \${RELEASE_SKIP_INSTALL:-} \]\]; then' metadata-ingestion-modules/prefect-plugin/scripts/release.sh
```

Length of output: 411

```shell
#!/bin/bash
# Locate the `gradlew` file and display the paths.
find ../../ -name gradlew
```

Length of output: 235

```shell
#!/bin/bash
# Verify if the `gradlew` file is accessible at the intended path.
if [ -f ../../jailuser/git/gradlew ]; then
  echo "gradlew is accessible at ../../jailuser/git/gradlew"
else
  echo "gradlew is not accessible at ../../jailuser/git/gradlew"
fi
```

Length of output: 152
metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/entities.py (3)

9-13: LGTM! The `_Entity` class is well-defined and follows best practices for abstract base classes.

16-30: LGTM! The `Dataset` class is well-implemented and makes good use of `attr.s` for attribute management.

33-46: LGTM! The `Urn` class is well-implemented and makes good use of `attr.s` for attribute management and custom validation.

metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/example/flow.py (2)
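An editor's aside on the `attr.s` pattern praised in the entities.py comments above: a minimal, hypothetical sketch of declarative attributes plus a custom validator (invented class, not the PR's actual `Urn` code; assumes the third-party `attrs` package is installed):

```python
import attr  # the attrs package, which entities.py relies on

@attr.s(auto_attribs=True, frozen=True)
class ExampleUrn:  # invented class for illustration only
    urn: str = attr.ib()

    @urn.validator
    def _check_prefix(self, attribute, value):
        # Reject anything that does not look like a DataHub URN.
        if not value.startswith("urn:li:"):
            raise ValueError(f"invalid urn: {value}")

ExampleUrn("urn:li:dataset:abc")   # passes validation
# ExampleUrn("not-a-urn")          # would raise ValueError in __init__
```

The validator runs during `__init__`, so invalid instances can never be constructed, and `frozen=True` keeps them immutable afterwards.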
17-20: LGTM! The `extract` function is simple and well-implemented.

49-55: LGTM! The `etl` function is well-implemented and follows best practices for flow functions.

metadata-ingestion-modules/prefect-plugin/.gitignore (1)
1-143: LGTM! The `.gitignore` file is well-structured and includes common patterns for Python projects.

.github/workflows/prefect-plugin.yml (1)
1-86: LGTM! The workflow file is well-structured and follows best practices for GitHub Actions workflows.
metadata-ingestion-modules/prefect-plugin/setup.py (7)
1-2: Optimize imports.

Combine the `os` and `pathlib` imports for better readability.

```python
import os
import pathlib
```

16-24: Clarify and document dependency groups.

Add comments to explain the purpose of each dependency group.

```python
# Common dependencies for all environments.
rest_common = {"requests", "requests_file"}

# Base dependencies required for the package.
base_requirements = {
    # For python 3.7 and importlib-metadata>=5.0.0, build failed with attribute error
    "importlib-metadata>=4.4.0,<5.0.0; python_version < '3.8'",
    # Actual dependencies.
    "prefect >= 2.0.0",
    *rest_common,
    f"acryl-datahub == {package_metadata['__version__']}",
}
```

46-69: Update `dev_requirements` for clarity.

Group related dependencies and add comments for better readability.

```python
# Development dependencies.
dev_requirements = {
    *base_requirements,
    *mypy_stubs,
    # Code formatting and linting tools.
    "black==22.12.0",
    "coverage>=5.1",
    "flake8>=3.8.3",
    "flake8-tidy-imports>=4.3.0",
    "isort>=5.7.0",
    "mypy>=1.4.0",
    # Pydantic version compatibility with mypy.
    "pydantic>=1.10",
    # Testing tools.
    "pytest>=6.2.2",
    "pytest-asyncio>=0.16.0",
    "pytest-cov>=2.8.1",
    "tox",
    # Miscellaneous tools.
    "deepdiff",
    "requests-mock",
    "freezegun",
    "jsonpickle",
    "build",
    "twine",
    "packaging",
}
```

76-121: Add `keywords` metadata for better discoverability.

Include relevant keywords in the `setuptools.setup` call to improve package discoverability.

```python
setuptools.setup(
    # Package metadata.
    name=package_metadata["__package_name__"],
    version=package_metadata["__version__"],
    url="https://datahubproject.io/",
    project_urls={
        "Documentation": "https://datahubproject.io/docs/",
        "Source": "https://github.com/datahub-project/datahub",
        "Changelog": "https://github.com/datahub-project/datahub/releases",
    },
    license="Apache License 2.0",
    description="Datahub prefect block to capture executions and send to Datahub",
    long_description=get_long_description(),
    long_description_content_type="text/markdown",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Intended Audience :: Developers",
        "Intended Audience :: Information Technology",
        "Intended Audience :: System Administrators",
        "License :: OSI Approved",
        "License :: OSI Approved :: Apache Software License",
        "Operating System :: Unix",
        "Operating System :: POSIX :: Linux",
        "Environment :: Console",
        "Environment :: MacOS X",
        "Topic :: Software Development",
    ],
    # Package info.
    zip_safe=False,
    python_requires=">=3.7",
    package_dir={"": "src"},
    packages=setuptools.find_namespace_packages(where="./src"),
    entry_points=entry_points,
    # Dependencies.
    install_requires=list(base_requirements),
    extras_require={
        "dev": list(dev_requirements),
    },
    keywords="datahub prefect metadata integration",
)
```

71-73: Check entry points for correctness.

Ensure the entry point definition correctly maps to the `DatahubEmitter` class.

Verification successful. The `DatahubEmitter` class is correctly defined in the `prefect_datahub` module as specified in the entry point.

- File: `metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/datahub_emitter.py`
- Class definition: `class DatahubEmitter(Block)`

Scripts executed:

```shell
#!/bin/bash
# Description: Verify that the entry point definition correctly maps to the `DatahubEmitter` class.
# Test: Search for `DatahubEmitter` class definition. Expect: Definition in the specified module.
rg --type python --pattern "class DatahubEmitter" ./src/prefect_datahub/prefect_datahub.py
```

Length of output: 504

```shell
#!/bin/bash
# Description: Verify that the entry point definition correctly maps to the `DatahubEmitter` class.
# Test: Search for `DatahubEmitter` class definition. Expect: Definition in the specified module.
rg "class DatahubEmitter" ./src/prefect_datahub/prefect_datahub.py
```

Length of output: 149

```shell
#!/bin/bash
# Description: Verify that the entry point definition correctly maps to the `DatahubEmitter` class.
# Test: Search for `DatahubEmitter` class definition across the entire repository.
rg "class DatahubEmitter"
```

Length of output: 278

28-44: Verify dependency versions for `mypy_stubs`.

Ensure that the specified versions for `mypy_stubs` are compatible with the rest of the dependencies.

Verification successful. The specified versions for `mypy_stubs` appear to be compatible with the rest of the dependencies listed in `base_requirements` and `framework_common`; no conflicts were found based on the available version constraints.

- `base_requirements` and `framework_common` contain dependencies with specific versions that do not conflict with the `mypy_stubs` versions.

Scripts executed:

```shell
#!/bin/bash
# Description: Verify that the specified versions for `mypy_stubs` are compatible
# with the rest of the dependencies.
# Test: Check for dependency conflicts. Expect: No conflicts.
pip install types-dataclasses sqlalchemy-stubs types-pkg_resources types-six types-python-dateutil types-requests types-toml types-PyYAML types-freezegun types-cachetools types-click==0.1.12 types-tabulate types-pytz
```

Length of output: 289

```shell
#!/bin/bash
# Description: List all dependencies and their versions in the setup.py file to
# manually verify compatibility.
# Locate the setup.py file in the metadata-ingestion-modules/prefect-plugin directory.
fd setup.py -a -t f -x cat {} | grep -E "install_requires|mypy_stubs"
```

Length of output: 404

```shell
#!/bin/bash
# Description: Search for the definitions of base_requirements and framework_common
# to list all dependencies and their versions.
fd setup.py -a -t f -x cat '{}' | grep -A 10 -E 'base_requirements|framework_common'
```

Length of output: 5934

6-8: Ensure `__init__.py` contains necessary metadata.

Verify that `./src/prefect_datahub/__init__.py` contains the `__package_name__` and `__version__` keys used later in the setup.

Verification successful. The `metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/__init__.py` file contains the `__package_name__` and `__version__` keys, confirming that the necessary metadata is present.

- `__package_name__ = "prefect-datahub"`
- `__version__ = "1!0.0.0.dev0"`

Scripts executed:

```shell
#!/bin/bash
# Description: Verify that `__package_name__` and `__version__` are defined in `__init__.py`.
# Test: Search for the definitions. Expect: Definitions of `__package_name__` and `__version__`.
rg --type python --pattern "__package_name__|__version__" ./src/prefect_datahub/__init__.py
```

Length of output: 505

```shell
#!/bin/bash
# Description: Verify that `__package_name__` and `__version__` are defined in `__init__.py`.
rg '__package_name__|__version__' ./src/prefect_datahub/__init__.py
```

Length of output: 143

```shell
#!/bin/bash
# Description: Locate `__init__.py` and verify that `__package_name__` and `__version__` are defined.
# Step 1: Locate `__init__.py` in the repository.
fd --type f --name '__init__.py'
# Step 2: Once located, check for the definitions of `__package_name__` and `__version__`.
fd --type f --name '__init__.py' --exec rg '__package_name__|__version__' {}
```

Length of output: 613

```shell
#!/bin/bash
# Description: Locate `__init__.py` and verify the definitions.
# Step 1: Locate `__init__.py` in the repository.
fd '__init__.py'
# Step 2: Check for the definitions of `__package_name__` and `__version__`.
fd '__init__.py' --exec rg '__package_name__|__version__' {}
```

Length of output: 10052
docs-website/build.gradle (1)
89-89: LGTM! Ensure the build task is necessary.

The addition of the `prefect-plugin:buildWheel` task looks good. Verify that this task is necessary for the documentation build process.

Verification successful. The `buildWheel` task for the `prefect-plugin` is necessary: it is consistently referenced across multiple Gradle files, indicating its necessity in the build process.

- metadata-ingestion-modules/prefect-plugin/build.gradle

Scripts executed:

```shell
#!/bin/bash
# Description: Verify that the `prefect-plugin:buildWheel` task is necessary for
# the documentation build process.
# Test: Search for references to `buildWheel` tasks. Expect: References in relevant build scripts.
rg --type gradle --pattern "buildWheel"
```

Length of output: 451

```shell
#!/bin/bash
# Description: Verify that the `prefect-plugin:buildWheel` task is necessary for
# the documentation build process.
rg "buildWheel" --type gradle
```

Length of output: 813
.github/workflows/build-and-test.yml (1)

89-90: Ensure exclusions are necessary.

The exclusion of the `prefect-plugin:build` and `prefect-plugin:check` tasks looks good. Verify that these exclusions are necessary for the build process.

Verification successful

Exclusions are necessary. The exclusions for `prefect-plugin:build` and `prefect-plugin:check` tasks are correctly specified and necessary for the build process.

- The `metadata-ingestion-modules/prefect-plugin/build.gradle` file includes references to both `build` and `check` tasks, confirming their presence in the build process.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the exclusions for `prefect-plugin:build` and `prefect-plugin:check` tasks are necessary for the build process.
# Test: Search for references to `build` and `check` tasks. Expect: References in relevant build scripts.
rg --type gradle --pattern "build|check"

Length of output: 454

Script:

#!/bin/bash
# Description: Verify that the exclusions for `prefect-plugin:build` and `prefect-plugin:check` tasks are necessary for the build process.
# Test: Search for references to `build` and `check` tasks. Expect: References in relevant build scripts.
rg "build|check" --type gradle

Length of output: 19935
metadata-ingestion-modules/prefect-plugin/build.gradle (13)

1-3: LGTM! Ensure the base plugin is necessary.

The inclusion of the `base` plugin looks good. Verify that the `base` plugin is necessary for the build process.

Verification successful

The `base` plugin is necessary for the build process. The `base` plugin is referenced in multiple `build.gradle` files across different modules, indicating its necessity for the build configuration.

- metadata-ingestion-modules/prefect-plugin/build.gradle
- metadata-ingestion/build.gradle
- metadata-ingestion-modules/airflow-plugin/build.gradle
- metadata-ingestion-modules/dagster-plugin/build.gradle
- entity-registry/custom-test-model/build.gradle

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the `base` plugin is necessary for the build process.
# Test: Search for references to the `base` plugin. Expect: References in relevant build scripts.
rg --type gradle --pattern "id 'base'"

Length of output: 458

Script:

#!/bin/bash
# Description: Verify that the `base` plugin is necessary for the build process.
# Test: Search for references to the `base` plugin. Expect: References in relevant build scripts.
rg "id 'base'" --type gradle

Length of output: 348
30-41: Ensure package installation task is necessary.

The task to install the package looks good. Verify that this task is necessary for the build process.

Verification successful

The `installPackage` task is necessary for the build process. The `installPackage` task is referenced in multiple build scripts, confirming its necessity.

- metadata-ingestion/build.gradle
- metadata-ingestion-modules/dagster-plugin/build.gradle
- metadata-ingestion-modules/prefect-plugin/build.gradle
- metadata-ingestion-modules/airflow-plugin/build.gradle

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the package installation task is necessary for the build process.
# Test: Search for references to the package installation task. Expect: References in relevant build scripts.
rg --type gradle --pattern "installPackage"

Length of output: 455

Script:

#!/bin/bash
# Description: Verify that the package installation task is necessary for the build process.
# Test: Search for references to the package installation task. Expect: References in relevant build scripts.
rg "installPackage" --type gradle

Length of output: 1360
14-15: LGTM! Ensure the `pip_install_command` definition is correct.

The definition for the `pip_install_command` looks good. Verify that this definition is correct for the environment.

Verification successful

LGTM! The definition for the `pip_install_command` looks good. It is referenced multiple times in the `metadata-ingestion-modules/prefect-plugin/build.gradle` file, confirming its usage.

- metadata-ingestion-modules/prefect-plugin/build.gradle
- metadata-ingestion-modules/dagster-plugin/build.gradle
- metadata-ingestion-modules/airflow-plugin/build.gradle

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the definition for the `pip_install_command` is correct for the environment.
# Test: Search for references to `pip_install_command`. Expect: References in relevant build scripts.
rg --type gradle --pattern "pip_install_command"

Length of output: 460

Script:

#!/bin/bash
# Description: Verify that the definition for the `pip_install_command` is correct for the environment.
# Test: Search for references to `pip_install_command`. Expect: References in relevant build scripts.
rg "pip_install_command" --type gradle

Length of output: 1467
10-12: LGTM! Ensure the `extra_pip_requirements` property is necessary.

The definition for the `extra_pip_requirements` property looks good. Verify that this property is necessary for the build process.

Verification successful

The `extra_pip_requirements` property is necessary for the build process. The property is referenced across multiple build scripts and is used in pip installation commands, indicating its importance in managing pip dependencies.

- metadata-ingestion/build.gradle
- metadata-ingestion-modules/airflow-plugin/build.gradle
- metadata-ingestion-modules/prefect-plugin/build.gradle
- metadata-ingestion-modules/dagster-plugin/build.gradle
- datahub-web-react/src/app/ingest/source/builder/NameSourceStep.tsx

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the `extra_pip_requirements` property is necessary for the build process.
# Test: Search for references to `extra_pip_requirements`. Expect: References in relevant build scripts.
rg --type gradle --pattern "extra_pip_requirements"

Length of output: 463

Script:

#!/bin/bash
# Description: Verify that the `extra_pip_requirements` property is necessary for the build process.
# Test: Search for references to `extra_pip_requirements` across all files.
rg "extra_pip_requirements"

Length of output: 2217
20-27: Ensure environment setup task is necessary.

The task to set up the environment looks good. Verify that this task is necessary for the build process.

Verification successful

Environment setup task is necessary. The `environmentSetup` task is a prerequisite for multiple tasks across various modules, confirming its necessity for the build process.

- metadata-ingestion/build.gradle
- metadata-ingestion-modules/dagster-plugin/build.gradle
- metadata-ingestion-modules/airflow-plugin/build.gradle
- metadata-ingestion-modules/prefect-plugin/build.gradle

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the environment setup task is necessary for the build process.
# Test: Search for references to the environment setup task. Expect: References in relevant build scripts.
rg --type gradle --pattern "environmentSetup"

Length of output: 457

Script:

#!/bin/bash
# Description: Verify that the environment setup task is necessary for the build process.
# Test: Search for references to the environment setup task. Expect: References in relevant build scripts.
rg "environmentSetup" --type gradle

Length of output: 1702
46-54: Ensure development installation task is necessary.

The task to install development dependencies looks good. Verify that this task is necessary for the build process.

Verification successful

The `installDev` task is necessary for the build process. The `installDev` task is referenced in multiple `build.gradle` files and is a dependency for various other tasks across the project.

- smoke-test/build.gradle
- metadata-ingestion-modules/prefect-plugin/build.gradle
- metadata-ingestion/build.gradle
- metadata-ingestion-modules/airflow-plugin/build.gradle
- metadata-ingestion-modules/dagster-plugin/build.gradle

These references confirm its necessity within the build process.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the development installation task is necessary for the build process.
# Test: Search for references to the development installation task. Expect: References in relevant build scripts.
rg --type gradle --pattern "installDev"

Length of output: 451

Script:

#!/bin/bash
# Description: Verify that the development installation task is necessary for the build process.
# Test: Search for references to the development installation task. Expect: References in relevant build scripts.
rg "installDev" --type gradle

Length of output: 4020
5-8: LGTM! Ensure the Python executable and virtual environment name are correct.

The definitions for `python_executable` and `venv_name` look good. Verify that these definitions are correct for the environment.

Verification successful

Verification successful for `python_executable` and `venv_name`. The definitions for `python_executable` and `venv_name` are consistent and correctly used across the relevant build scripts.

- metadata-ingestion-modules/prefect-plugin/build.gradle
- smoke-test/build.gradle
- metadata-ingestion/build.gradle
- metadata-ingestion-modules/dagster-plugin/build.gradle
- metadata-ingestion-modules/airflow-plugin/build.gradle

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the definitions for `python_executable` and `venv_name` are correct for the environment.
# Test: Search for references to `python_executable` and `venv_name`. Expect: References in relevant build scripts.
rg --type gradle --pattern "python_executable|venv_name"

Length of output: 470

Script:

#!/bin/bash
# Description: Verify that the definitions for `python_executable` and `venv_name` are correct for the environment.
# Test: Search for references to `python_executable` and `venv_name`. Expect: References in relevant build scripts.
rg 'python_executable|venv_name' --type gradle

Length of output: 12093
16-18: Ensure Python version check is necessary.

The task to check the Python version looks good. Verify that this task is necessary for the build process.

Verification successful

Python version check task is necessary for the build process. The `checkPythonVersion` task is referenced as a dependency for the `environmentSetup` task in multiple build.gradle files, indicating its necessity for the build process.

- metadata-ingestion-modules/dagster-plugin/build.gradle
- metadata-ingestion/build.gradle
- metadata-ingestion-modules/prefect-plugin/build.gradle
- metadata-ingestion-modules/airflow-plugin/build.gradle

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the Python version check task is necessary for the build process.
# Test: Search for references to the Python version check task. Expect: References in relevant build scripts.
rg --type gradle --pattern "checkPythonVersion"

Length of output: 459

Script:

#!/bin/bash
# Description: Verify that the Python version check task is necessary for the build process.
# Test: Search for references to the Python version check task. Expect: References in relevant build scripts.
rg "checkPythonVersion" --type gradle

Length of output: 851
73-80: Ensure development test installation task is necessary.

The task to install development test dependencies looks good. Verify that this task is necessary for the build process.

Verification successful

Verified: The `installDevTest` task is necessary for the build process. The task is referenced by multiple other tasks, confirming its necessity.

- metadata-ingestion-modules/prefect-plugin/build.gradle
- metadata-ingestion/build.gradle
- metadata-ingestion-modules/dagster-plugin/build.gradle

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the development test installation task is necessary for the build process.
# Test: Search for references to the development test installation task. Expect: References in relevant build scripts.
rg --type gradle --pattern "installDevTest"

Length of output: 455

Script:

#!/bin/bash
# Description: Verify that the development test installation task is necessary for the build process.
# Test: Search for references to the development test installation task in Gradle files.
rg "installDevTest" --type gradle

Length of output: 1697
56-63: Ensure linting task is necessary.

The task to lint the code looks good. Verify that this task is necessary for the build process.

Verification successful

The linting task is necessary for the build process. The `lint` task is referenced as a dependency of the `check` task in the `metadata-ingestion-modules/prefect-plugin/build.gradle` file, confirming its necessity.

- metadata-ingestion-modules/prefect-plugin/build.gradle: `check.dependsOn lint`

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the linting task is necessary for the build process.
# Test: Search for references to the linting task. Expect: References in relevant build scripts.
rg --type gradle --pattern "lint"

Length of output: 445

Script:

#!/bin/bash
# Description: Verify that the linting task is necessary for the build process.
# Test: Search for references to the linting task. Expect: References in relevant build scripts.
rg "lint" --type gradle

Length of output: 1714
96-103: Ensure quick test task is necessary.

The task to run quick tests looks good. Verify that this task is necessary for the build process.

Verification successful

The `testQuick` task is necessary for the build process. The `testQuick` task is referenced in multiple build scripts, indicating its importance in the build process. Specifically, it is a dependency for the `check` task and the `testFull` task.

- Files and references:
  - metadata-ingestion-modules/dagster-plugin/build.gradle
    - Line: `task testQuick(type: Exec, dependsOn: installDevTest)`
    - Line: `task testFull(type: Exec, dependsOn: [testQuick, installDevTest])`
    - Line: `check.dependsOn testQuick`
  - metadata-ingestion-modules/prefect-plugin/build.gradle
    - Line: `task testQuick(type: Exec, dependsOn: installDevTest)`
    - Line: `task testFull(type: Exec, dependsOn: [testQuick, installDevTest])`
    - Line: `check.dependsOn testQuick`
  - metadata-ingestion/build.gradle
    - Line: `task testQuick(type: Exec, dependsOn: [installDev, ':metadata-models:generateJsonSchema'])`
    - Line: `check.dependsOn testQuick`
  - metadata-ingestion-modules/airflow-plugin/build.gradle
    - Line: `task testQuick(type: Exec, dependsOn: installTest)`
    - Line: `check.dependsOn testQuick`

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the quick test task is necessary for the build process.
# Test: Search for references to the quick test task. Expect: References in relevant build scripts.
rg --type gradle --pattern "testQuick"

Length of output: 450

Script:

#!/bin/bash
# Description: Verify that the quick test task is necessary for the build process.
# Test: Search for references to the quick test task. Expect: References in relevant build scripts.
rg "testQuick" --type gradle

Length of output: 1030
82-94: Ensure single test task is necessary.

The task to run a single test looks good. Verify that this task is necessary for the build process.

Verification successful

The task `testSingle` is indeed referenced in the documentation (`metadata-ingestion/developing.md`), indicating that it is used for running specific test files. This confirms its necessity for the build process.

- metadata-ingestion/developing.md: References to `testSingle` for running specific test files.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the single test task is necessary for the build process.
# Test: Search for references to the single test task. Expect: References in relevant build scripts.
rg --type gradle --pattern "testSingle"

Length of output: 451

Script:

#!/bin/bash
# Description: Verify that the single test task is necessary for the build process.
# Test: Search for references to the single test task. Expect: References in relevant build scripts.
rg "testSingle" --type gradle

Length of output: 312

Script:

#!/bin/bash
# Description: Verify if the single test task is invoked or referenced elsewhere in the codebase.
# Test: Search for invocations or references to the single test task.
rg "testSingle"

Length of output: 705
64-71: Ensure lint fixing task is necessary.

The task to fix linting issues looks good. Verify that this task is necessary for the build process.

Verification successful

Lint fixing task is necessary. The `lintFix` task is referenced in multiple build scripts and is part of the dependency chain for running tests (`installTest` and `testQuick`). This confirms its necessity in the build process.

- metadata-ingestion-modules/airflow-plugin/build.gradle: `installTest` should run after `lintFix`.
- metadata-ingestion-modules/airflow-plugin/build.gradle: `testQuick` depends on `installTest`.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the lint fixing task is necessary for the build process.
# Test: Search for references to the lint fixing task. Expect: References in relevant build scripts.
rg --type gradle --pattern "lintFix"

Length of output: 448

Script:

#!/bin/bash
# Description: Verify that the lint fixing task is necessary for the build process.
# Test: Search for references to the lint fixing task. Expect: References in relevant build scripts.
rg "lintFix" --type gradle

Length of output: 513

Script:

#!/bin/bash
# Description: Verify the role of the `installTest` task in the build process.
# Test: Search for references to the `installTest` task. Expect: References in relevant build scripts.
rg "installTest" --type gradle

Length of output: 426
metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/datahub_emitter.py (10)

104-114: LGTM! The method correctly converts a list of `_Entity` objects to a list of `DatasetUrn` objects.

116-145: LGTM! The method fetches the workspace name and handles errors gracefully.

174-189: LGTM! The method correctly emits the browse path.

190-254: LGTM! The method correctly generates a `DataJob` entity and handles various scenarios.

256-324: LGTM! The method correctly generates a `DataFlow` entity and handles various scenarios.

326-394: LGTM! The method correctly emits task metadata and handles upstream dependencies.

395-458: LGTM! The method correctly emits flow run metadata and handles various scenarios.

459-552: LGTM! The method correctly emits task run metadata and handles various scenarios.

554-612: LGTM! The method correctly stores the current running task metadata.

613-659: LGTM! The method correctly emits the current running flow metadata.
        f"(prefect,{flow_run_ctx.flow.name},PROD),{task_run_ctx.task.task_key})"
    )

    assert expected_datajob_urn in datahub_emitter._datajobs_to_emit.keys()
Optimize dictionary key check.
Use `key in dict` instead of `key in dict.keys()`.
- assert expected_datajob_urn in datahub_emitter._datajobs_to_emit.keys()
+ assert expected_datajob_urn in datahub_emitter._datajobs_to_emit
Tools

Ruff

550-550: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()`. (SIM118)
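To make the SIM118 note concrete, here is a stand-alone sketch; the dictionary below is a hypothetical stand-in for the emitter's `_datajobs_to_emit` mapping. Membership on a dict already tests its keys, so the bare `in` check is equivalent and skips building a keys view.

```python
# Hypothetical stand-in for the emitter's `_datajobs_to_emit` mapping.
datajobs_to_emit = {"urn:li:dataJob:example": "datajob-entity"}

# Both forms test key membership; the bare `in` avoids the extra `.keys()` view.
assert ("urn:li:dataJob:example" in datajobs_to_emit) == (
    "urn:li:dataJob:example" in datajobs_to_emit.keys()
)
assert "urn:li:dataJob:missing" not in datajobs_to_emit
```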
import asyncio

from prefect_datahub.datahub_emitter import DatahubEmitter
Consider making the configuration values configurable.

Hardcoding values like `datahub_rest_url`, `env`, `platform_instance`, and `token` can reduce flexibility. Consider accepting these as parameters or using environment variables.
import os
datahub_rest_url = os.getenv("DATAHUB_REST_URL", "http://localhost:8080")
env = os.getenv("ENV", "DEV")
platform_instance = os.getenv("PLATFORM_INSTANCE", "local_prefect")
token = os.getenv("TOKEN", None) # generate auth token in the datahub and provide here if gms endpoint is secure
datahub_emitter = DatahubEmitter(
datahub_rest_url=datahub_rest_url,
env=env,
platform_instance=platform_instance,
token=token,
)
async def save_datahub_emitter():
    datahub_emitter = DatahubEmitter(
        datahub_rest_url="http://localhost:8080",
        env="DEV",
        platform_instance="local_prefect",
        token=None,  # generate auth token in the datahub and provide here if gms endpoint is secure
    )

    await datahub_emitter.save("BLOCK-ID", overwrite=True)
Add error handling for the asynchronous operation.

The function `save_datahub_emitter` lacks error handling for the `await datahub_emitter.save` call. Consider adding a try-except block to handle potential exceptions.
async def save_datahub_emitter():
try:
datahub_emitter = DatahubEmitter(
datahub_rest_url="http://localhost:8080",
env="DEV",
platform_instance="local_prefect",
token=None, # generate auth token in the datahub and provide here if gms endpoint is secure
)
await datahub_emitter.save("BLOCK-ID", overwrite=True)
except Exception as e:
print(f"Failed to save DatahubEmitter: {e}")
async def load_datahub_emitter():
    datahub_emitter = DatahubEmitter()
    emitter = datahub_emitter.load("BLOCK-ID")
    print(emitter)
    return emitter
Consider making the block ID a parameter.
The function is well-implemented, but making the block ID a parameter would improve flexibility.
- async def load_datahub_emitter():
-     emitter = datahub_emitter.load("BLOCK-ID")
+ async def load_datahub_emitter(block_id: str):
+     emitter = datahub_emitter.load(block_id)
Committable suggestion was skipped due to low confidence.
@task(name="Transform", description="Transform the data")
def transform(
    data: str, datahub_emitter: DatahubEmitter
) -> Tuple[List[str], DatahubEmitter]:
    data_list_str = data.split(" ")
    datahub_emitter.add_task(
        inputs=[
            Dataset(
                platform="snowflake",
                name="mydb.schema.tableA",
                env=datahub_emitter.env,
                platform_instance=datahub_emitter.platform_instance,
            )
        ],
        outputs=[
            Dataset(
                platform="snowflake",
                name="mydb.schema.tableB",
                env=datahub_emitter.env,
                platform_instance=datahub_emitter.platform_instance,
            )
        ],
    )
    return data_list_str, datahub_emitter
Consider making the task details parameters.
The function is well-implemented, but making the task details parameters would improve flexibility.
- datahub_emitter.add_task(
- inputs=[
- Dataset(
- platform="snowflake",
- name="mydb.schema.tableA",
- env=datahub_emitter.env,
- platform_instance=datahub_emitter.platform_instance,
- )
- ],
- outputs=[
- Dataset(
- platform="snowflake",
- name="mydb.schema.tableB",
- env=datahub_emitter.env,
- platform_instance=datahub_emitter.platform_instance,
- )
- ],
- )
+ datahub_emitter.add_task(inputs=inputs, outputs=outputs)  # with `inputs: List[Dataset]` and `outputs: List[Dataset]` added as parameters of `transform`
Committable suggestion was skipped due to low confidence.
Config | Type | Default | Description
--- | --- | --- | ---
datahub_rest_url | `str` | *http://localhost:8080* | DataHub GMS REST URL
env | `str` | *PROD* | The environment that all assets produced by this orchestrator belong to. For more detail and possible values refer [here](https://datahubproject.io/docs/graphql/enums/#fabrictype).
Add a comma after "For more detail".
A comma is needed after "For more detail".
- For more detail and possible values refer [here]
+ For more detail and possible values, refer [here]
Tools
LanguageTool
[uncategorized] ~41-~41: Possible missing comma found.
Context: ...belong to. For more detail and possible values refer [here](https://datahubproject.io/...(AI_HYDRA_LEO_MISSING_COMMA)
### Load the saved block in prefect workflows

After installing `prefect-datahub` and [saving the configution](#saving-configurations-to-a-block), you can easily use it within your prefect workflows to help you emit metadata event as show below!
Correct verb form: "shown"
The correct verb form is "shown".
- as show below!
+ as shown below!
Tools
LanguageTool
[uncategorized] ~81-~81: This verb may not be in the correct form. Consider using a different form for this context.
Context: ...lows to help you emit metadata event as show below! ```python import asyncio from ...(AI_EN_LECTOR_REPLACEMENT_VERB_FORM)
def get_long_description():
    root = os.path.dirname(__file__)
    return pathlib.Path(os.path.join(root, "README.md")).read_text()
Use `pathlib` consistently.

For consistency, use `pathlib` for path operations instead of `os.path`.
def get_long_description():
root = pathlib.Path(__file__).parent
return (root / "README.md").read_text()
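As a quick sanity check on the `pathlib` rewrite, this sketch (using a made-up project root, not a path from this repository) shows the `/` join producing the same string as the `os.path.join` form on POSIX paths:

```python
import os
import pathlib

# Made-up project root, for illustration only.
root = pathlib.Path("/tmp/example-project")

# pathlib's `/` operator and os.path.join build the same POSIX path here.
assert str(root / "README.md") == os.path.join("/tmp/example-project", "README.md")
```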
@@ -241,4 +253,4 @@ tox -- --update-golden-files

# Update golden files for a specific environment.
tox -e py310-airflow26 -- --update-golden-files
```
Add a single newline character at the end of the file.
Files should end with a single newline character.
-
+
Committable suggestion was skipped due to low confidence.
Tools
Markdownlint
256-256: null
Files should end with a single newline character (MD047, single-trailing-newline)
task testFull(type: Exec, dependsOn: [testQuick, installDevTest]) {
    commandLine 'bash', '-x', '-c',
        "source ${venv_name}/bin/activate && pytest -m 'not slow_integration' -vv --continue-on-collection-errors --junit-xml=junit.full.xml"
}
Tip: Codebase Verification

Ensure full test task is necessary.

The `testFull` task is referenced in the `metadata-ingestion/developing.md` file but is not used in the build process itself.

- metadata-ingestion/developing.md: `../gradlew :metadata-ingestion:testFull`

This indicates that the `testFull` task is used during the development process but not required for the build process.
Analysis chain
Ensure full test task is necessary.
The task to run full tests looks good.
Verify that this task is necessary for the build process.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the full test task is necessary for the build process.
# Test: Search for references to the full test task. Expect: References in relevant build scripts.
rg --type gradle --pattern "testFull"
Length of output: 449
Script:
#!/bin/bash
# Description: Verify that the full test task is necessary for the build process.
# Test: Search for references to the full test task. Expect: References in relevant build scripts.
rg "testFull" --type gradle
Length of output: 362
Script:
#!/bin/bash
# Description: Verify that the full test task is necessary for the build process.
# Test: Search for references to the full test task in all files within the repository.
rg "testFull"
Length of output: 421
Fix build
Checklist
Summary by CodeRabbit
New Features
Documentation
Chores