Feature/pred2bq bulk update #230

cfezequiel · 2023-04-01T02:35:57Z

Fixes #78

Includes changes in PR #225.

This pull request contains bulk updates to the Predictions to BigQuery component.

Changes:

- Adds integration test (executor-only, local run and Vertex AI run) and test data.
- Adds component readme.
- Updates component code to pass the tests.

Checks:

- [x] Tests pass
- [x] Appropriate changes to README are included in PR

github-actions · 2023-04-01T02:36:15Z

Thanks for the PR! 🚀

Instructions: Approve using /lgtm and mark for automatic merge by using /merge.

cfezequiel · 2023-04-01T02:36:40Z

@michaelwsherman fyi

tfx_addons/version.py

tfx_addons/predictions_to_bigquery/README.md

casassg · 2023-04-17T20:58:03Z

tfx_addons/predictions_to_bigquery/README.md

+    schema=schema_gen.outputs['schema']
+    transform_graph=transform.outputs['transform_graph'],
+    bq_table_name='my_bigquery_table',
+    gcs_temp_dir='gs://bucket/temp-dir',


Should this just use the temp_dir from beam_pipeline_args instead?

cfezequiel · 2023-05-15

Yes, I just realized that the WriteToBigQuery transform used by the Beam pipeline defaults to temp_location if custom_gcs_temp_location is not specified, and temp_location should be an argument supported by beam_pipeline_args. I'm not sure, though, whether temp_location needs to be set explicitly or whether Beam will create one by default.
I think this would be a more involved change, as the Vertex integration test needs a custom container with the pred2bq component in Artifact Registry to run. Perhaps we could do it as a separate PR instead?
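A minimal sketch of the fallback discussed in this thread, assuming the component keeps an optional `gcs_temp_dir` argument and otherwise reads `--temp_location` from `beam_pipeline_args`. The helper `resolve_gcs_temp_dir` is hypothetical and is not part of the component; it only mimics the precedence order with the standard library and does not call Beam.

```python
import argparse
from typing import List, Optional


def resolve_gcs_temp_dir(beam_pipeline_args: List[str],
                         gcs_temp_dir: Optional[str] = None) -> Optional[str]:
  """Returns the explicit gcs_temp_dir, else --temp_location, else None.

  Hypothetical helper for illustration only (not the component's API).
  """
  if gcs_temp_dir:
    return gcs_temp_dir
  parser = argparse.ArgumentParser(add_help=False)
  parser.add_argument('--temp_location', default=None)
  # parse_known_args skips unrelated flags such as --runner or --project.
  known, _ = parser.parse_known_args(beam_pipeline_args)
  return known.temp_location


args = ['--runner=DirectRunner', '--temp_location=gs://bucket/temp-dir']
print(resolve_gcs_temp_dir(args))
print(resolve_gcs_temp_dir(args, gcs_temp_dir='gs://other-bucket/tmp'))
print(resolve_gcs_temp_dir(['--runner=DirectRunner']))
```

Returning `None` when nothing is set leaves the open question from the reply (whether Beam supplies a default) to the caller rather than guessing at it.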

Changes:

- Refactors the component integration test and adds a test to run the component on Vertex AI Pipelines.
- Adds a Dockerfile to package the component code into a Docker image based on the OSS TFX image. The image can then be used as the base image when running a pipeline in Vertex AI.
- Updates the `bigquery_export` output of the predictions-to-bigquery component to store the generated BigQuery table name. This helps with checking the component's output during testing, and also allows any downstream component to receive this component's output.

Replaces the create_tempfile and create_tempdir calls from abseil's absltest.TestCase and parameterized.TestCase with equivalent functions from the tempfile package. The abseil methods require the FLAGS variable to be parsed, which may not happen if absltest.main() is not invoked. This can occur when tests are filtered, e.g.

```
python -m unittest path.to.test
```
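As an illustration of the replacement described in this commit message, here is a minimal sketch of a test that creates its temporary directory and file with the standard `tempfile` module, so it needs no flag parsing. The class and test names are hypothetical and do not come from the PR's test suite.

```python
import os
import shutil
import tempfile
import unittest


class TempFileTest(unittest.TestCase):
  """Hypothetical test case; names are illustrative only."""

  def setUp(self):
    super().setUp()
    # tempfile.mkdtemp works without any absl flag parsing.
    self.temp_dir = tempfile.mkdtemp()
    # Remove the directory after each test, even when the test fails.
    self.addCleanup(shutil.rmtree, self.temp_dir, ignore_errors=True)

  def test_write_and_read(self):
    fd, path = tempfile.mkstemp(dir=self.temp_dir, suffix='.txt')
    with os.fdopen(fd, 'w') as f:
      f.write('prediction')
    with open(path) as f:
      self.assertEqual(f.read(), 'prediction')


# Run the case programmatically so the sketch works without absltest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TempFileTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because nothing here depends on absl flags, the same test case should also run under a filtered invocation such as `python -m unittest path.to.test`.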

cfezequiel · 2023-05-15T17:34:38Z

@casassg thanks for the comments! Made some changes and replied to your comments, ptal.

casassg

Looks good. Nits are optional (fine to address later as well); undo the change in version.py as instructed before merging. Also fixing CI so you can merge.

README.md

tfx_addons/predictions_to_bigquery/Dockerfile

tfx_addons/predictions_to_bigquery/README.md

tfx_addons/version.py

casassg · 2023-05-15T17:50:54Z

May need to rebase to pick up #244 so that all of CI can run end to end.

michaelwsherman

Overall LGTM. Carlos, I trust you to fix anything I flagged that's worth fixing so I'm approving this PR.

I didn't review or run the tests, and I didn't make sure that everything in utils is used. If you want the tests run/reviewed, let me know and I'll delegate it. If you've run them I'm good.

Overall this is great. I appreciate that you've written out a full example in the integration test, documented everything well, and included an example.

michaelwsherman · 2023-05-12T21:24:45Z

tfx_addons/version.py

@@ -16,7 +16,7 @@

 # We follow Semantic Versioning (https://semver.org/)
 _MAJOR_VERSION = "0"
-_MINOR_VERSION = "6"
+_MINOR_VERSION = "7"


Heads up to other reviewers that this matches where we're currently at with the releases, not sure why github still shows it as a diff.

I think it may be because the branch has not been rebased onto master.

michaelwsherman · 2023-05-12T21:28:59Z

tfx_addons/predictions_to_bigquery/Dockerfile

Remove this file.

@michaelwsherman why do we need to remove the Dockerfile? It's currently used to define the tfx-addons container that's needed by the integration test.

tfx_addons/predictions_to_bigquery/component.py

tfx_addons/predictions_to_bigquery/executor.py

tfx_addons/predictions_to_bigquery/utils.py

cfezequiel

Thanks for the additional comments @casassg and @michaelwsherman.
Made some fixes.

README.md

tfx_addons/predictions_to_bigquery/Dockerfile

tfx_addons/predictions_to_bigquery/README.md

tfx_addons/predictions_to_bigquery/component.py

cfezequiel · 2023-05-25T19:37:08Z

tfx_addons/predictions_to_bigquery/utils.py



-def _get_compress_type(file_path):
+def _get_compress_type(file_path: str) -> Optional[str]:


This is from the original code by Hannes, and I decided to reuse it. I saw a Python library that may provide similar functionality: https://pypi.org/project/filetype
I could create an issue for it, and it might be a good first issue for a new contributor to take on.
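The body of `_get_compress_type` is not shown in this thread, so the following is only a sketch of how a function with that signature could detect compression by inspecting a file's leading "magic" bytes, which is also the general approach of file-type detection libraries. The function name `get_compress_type`, the gzip-only check, and the `'GZIP'` return value are assumptions for illustration, not the PR's implementation.

```python
import gzip
import os
import tempfile
from typing import Optional

_GZIP_MAGIC = b'\x1f\x8b'  # First two bytes of a gzip stream (RFC 1952).


def get_compress_type(file_path: str) -> Optional[str]:
  """Returns 'GZIP' if the file starts with the gzip magic bytes, else None.

  Hypothetical stand-in; the real helper may check other formats.
  """
  with open(file_path, 'rb') as f:
    header = f.read(len(_GZIP_MAGIC))
  return 'GZIP' if header == _GZIP_MAGIC else None


with tempfile.TemporaryDirectory() as tmp:
  gz_path = os.path.join(tmp, 'data.gz')
  with gzip.open(gz_path, 'wb') as f:
    f.write(b'example')
  raw_path = os.path.join(tmp, 'data.bin')
  with open(raw_path, 'wb') as f:
    f.write(b'example')
  gz_result = get_compress_type(gz_path)
  raw_result = get_compress_type(raw_path)

print(gz_result, raw_result)
```

Checking content rather than the file extension keeps the result correct for files that were renamed or written without a suffix.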

tfx_addons/predictions_to_bigquery/utils.py

tfx_addons/version.py

cfezequiel requested review from hanneshapke and casassg as code owners April 1, 2023 02:35

github-actions bot added the needs-lgtm label Apr 1, 2023

cfezequiel mentioned this pull request Apr 1, 2023

pred2bq: Update schema parsing from prediction results. #225

Closed

casassg requested changes Apr 17, 2023

cfezequiel added 21 commits May 15, 2023 13:28

pred2bq: Update schema parsing from prediction results.

c5d3250

pred2bq: Add integration test.

1ed693e

pred2bq: Refactor executor.py.

9708b46

- Adds unit tests
- Also adds credits to original code author

pred2bq: Remove symlink to data folder - not needed.

7f25633

pred2bq: Refactor executor.py.

f4ca221

pred2bq: Add integration test - executor to BQ

6938e58

Adds a test that runs the executor module's Beam pipeline using a DirectRunner and exports prediction data to an actual BigQuery table.

pred2bq: Update component spec.

cd17b95

pred2bq: Update utils.py.

60faa2d

pred2bq: Add component integration test.

e4d78ce

pred2bq: Add deps to version.py; update pkg version.

c68667b

pred2bq: Add integration test with transform.

8f48517

Adds a test that integrates the transform component into the pipeline. Test is implemented for local runner only.

pred2bq: Add integration test with schema.

f8a53d2

pred2bq: Add Transform component in Vertex AI test.

4fca0a9

Adds a container component stub to represent the TFX Transform component for integration testing on Vertex AI.

pred2bq: Code cleanup and documentation.

b57d02a

pred2bq: Add readme file.

a33b419

Add tests to expand code coverage.

685cd60

Add project team to readme.

f68f0d7

Update top-level readme.

6035c17

Mentions the predictions-to-bigquery component in top-level readme.

Update code based on reviewer comments.

d091922

- Fix issues in pred2bq readme
- Reverted version change in setup.py
- Add absl-py test prerequisite in setup.py

cfezequiel force-pushed the feature/pred2bq-bulk-update branch from 20f4068 to d091922 Compare May 15, 2023 17:32

casassg approved these changes May 15, 2023

github-actions bot added lgtm needs-merge and removed needs-lgtm labels May 15, 2023

michaelwsherman approved these changes May 17, 2023

pred2bq: Update code based on code reviews.

a5de9d2

cfezequiel commented May 26, 2023

cfezequiel mentioned this pull request Jul 13, 2023

Add a top-level Dockerfile to create a custom tfx image that includes tfx-addons #254

Open

Merge branch 'main' into feature/pred2bq-bulk-update

91feb1c
