Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for specifying column size options in taps (S3 CSV to Vertica VARCHAR(N)) #720

Open
jordan8037310 opened this issue Jun 16, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@jordan8037310
Copy link

Is your feature request related to a problem? Please describe.
When we set up a new fastsync replication from S3 CSV to Target-Vertica, Vertica requires a column length for varchar to be defined. Unfortunately this is hardcoded and there is no config parameters that would allow us to set up the length of the VARCHAR when the S3 Tap is configured.

Describe the solution you'd like
We want to implement changes to pipelinewise-tap-s3-csv, pipelinewise, and to our own pipelinewise-target-vertica: full360/pipelinewise-target-vertica#7

Describe alternatives you've considered
How do other people solve this?

APPROACH
Here is the approach we have outlined and want to start working on:

Approach 1

If the user is replicating multiple tables and wants to set different varchar character values for different tables then we can go with this approach.
Note: In this approach, we have to modify the pipelinewise-tap-s3-csv as well.

Possible modifications that are required:-

Have to modify pipelinewise-tap-s3-csv and add an extra field in CONFIG_CONTRACT.

Modifications to s3_csv_to_vertica.py
At line 33:
Minimal modification is required that is adding another parameter and implementation of the same.

At line 48:
Creating a lambda function of “tap_type_to_target_type” and passing the varchar value to the new parameter and then passing the lambda function to FastSyncTapS3Csv

At line 49:
We need to append varchar value to args.target which will be required in fastsync target_vertica.py file

Modifications to target_vertica.py
At line 86:
We just need to add the character length.

Now to pass the value of varchar to the singer engine of target_vertica that requires some modification in the cli piplinewise.py file to pass varchar length as an argument to the singer. To accept the argument we need to modify the following:

Modifications to init.py
At line 311
Modify the main function to take an additional argument and then store it in the config dictionary.

Modification to utils.py
At line 102:
Minimal modification is required.

Modification to db_sync.py
Need to pass varchar length from config to functions column_type and column_clause

Request for Input

  • Does anyone recommend a different approach for the column width specification?
  • Would our changes be likely accepted into core pipelinewise if we opened a PR?
@jordan8037310 jordan8037310 added the enhancement New feature or request label Jun 16, 2021
@Samira-El
Copy link
Contributor

Hi @jordan8037310, apologies for the late reply, this slipped through the cracks!

I'm unfamiliar with Vertica so please excuse the noob question, why not make target vertica create varchar columns with the maximum allowed length regardless of what a tap sends? this is how it's done in target-snowflake for example.

Also, I think the logic of figuring out the right length needs to live in target-vertica so that it can work with any singer tap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants