Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pred2bq: Update schema parsing from prediction results. #225

Conversation

cfezequiel
Copy link
Contributor

Related to #78

Update code to enable schema parsing from BulkInferrer prediction results data directly.
This adds a third option to the existing ways to parse the input data schema, which include:

  1. Parsing from schema.pbtxt file
  2. Parsing from TFTransform output path (generated by the TFX Transform component).

Additional changes:

  • Revert input of BigQuery table name to use fully qualified name, i.e. 'project:dataset.table_name', instead of specifying GCP project and BigQuery dataset separately.

  • Label ID to name mapping is optional - depends on presence of relevant vocabulary file in Transform output

  • Typehint and variable name changes

  • [ X ] Tests pass

@cfezequiel cfezequiel requested a review from hanneshapke as a code owner March 15, 2023 17:20
@github-actions
Copy link
Contributor

Thanks for the PR! 🚀

Instructions: Approve using /lgtm and mark for automatic merge by using /merge.

@cfezequiel
Copy link
Contributor Author

Hi @hanneshapke, I was wondering if you'll be able to review this PR. Sorry for the long-ish change, but it's a bit of a challenge now to break it up into smaller chunks due to time constraints, and subsequent changes also build on top of this PR.

With that said, an alternative would be for me to submit one large PR containing all the subsequent changes for the component.
Pro: One PR to review; total LOC change may be lower than the sum of subsequent PRs due to merging of overlapping changes.
Con: PR would be larger than the previous ones.

Let me know what you think.

@cfezequiel
Copy link
Contributor Author

This can be deprecated in favor of #230 , which also contains the changes here.

@hanneshapke
Copy link
Contributor

Closing it by request from @michaelwsherman

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants