Redshift: dbWriteTable() with larger DFs and CSVs #429
Thanks. This is a hard problem, also tracked in r-dbi/DBI#252. Loading large data works best if the data is near the server, and it helps to disable or drop things like indexes and constraints. Loading large data efficiently will take a little more than a single function call, and procedures will vary vastly across databases. For small data these things don't matter much, and reliability is what counts. The current Redshift implementation creates one huge SQL query that inserts all rows; as you noticed, this collides with Redshift's limit on the query size. To work around this, we need a better version of …
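In the meantime, appending in batches keeps each generated statement under Redshift's 16 MB statement-size limit. A minimal sketch, assuming the target table already exists; `append_in_chunks` and the chunk size are my own, not part of RPostgres:

```r
library(DBI)

# Hypothetical helper (not part of RPostgres): append a data frame in
# batches so no single generated INSERT exceeds Redshift's 16 MB
# statement-size limit. The chunk size is an assumption to tune.
append_in_chunks <- function(con, name, df, chunk_size = 10000) {
  groups <- (seq_len(nrow(df)) - 1) %/% chunk_size
  for (chunk in split(df, groups)) {
    DBI::dbAppendTable(con, name, chunk)
  }
  invisible(nrow(df))
}

# Usage: create the (empty) table once, then append in pieces.
# DBI::dbCreateTable(con, "big_table", df)
# append_in_chunks(con, "big_table", df)
```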
I think I said this in a way that doesn't indicate how much I appreciate all the work you guys have put into making RPostgres. Agree w/ your assessment of the problem & the most reasonable solution! Do you think a different function, or even a graceful fallback inside `dbWriteTable()`, would be the way to go?
Missed the question, sorry. I think the upload to S3 plus `COPY` is the most promising route.
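Roughly, that route could look like this. A sketch, assuming the `aws.s3` package for the upload and an IAM role attached to the cluster; the bucket, schema/table, and role ARN are placeholders:

```r
library(DBI)
library(aws.s3)

# Stage the data frame as a CSV in S3 (bucket and key are placeholders).
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)
aws.s3::put_object(file = tmp, object = "staging/df.csv", bucket = "my-bucket")

# Bulk-load with Redshift's COPY command (role ARN is a placeholder);
# COPY runs server-side, so it sidesteps the statement-size limit entirely.
DBI::dbExecute(con, "
  COPY myschema.mytable
  FROM 's3://my-bucket/staging/df.csv'
  IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
  FORMAT AS CSV
  IGNOREHEADER 1;
")
```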
Hey there,
I'd run into some issues trying to upload data to Redshift using RPostgres and non-Redshift-specific drivers a couple of years ago, and developed a workaround then that relied on pushing data to S3 first and then copying it into Redshift (sketched below).
That solution utilizes the somehow-still-working RedshiftTools package, so when I was doing some refactoring I was eager to see whether DBI::dbWriteTable had made any progress on this in the couple of years since, and figured I'd toss a reprex your way with any bugs I saw.
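For reference, that workaround looks roughly like this. A sketch only; the argument names follow my reading of RedshiftTools, and the table and bucket names are placeholders:

```r
library(RedshiftTools)

# S3-staging workaround: RedshiftTools splits the data frame, uploads the
# pieces to S3, and runs a COPY on the Redshift side. AWS credentials are
# assumed to come from the environment.
rs_replace_table(
  df,
  dbcon = con,                  # an open Redshift connection
  table_name = "mytable",       # placeholder target table
  bucket = "my-staging-bucket"  # placeholder S3 bucket
)
```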
It is "hard" to get data into Redshift in ways that probably shouldn't be hard. Both `dplyr::copy_to()` and `DBI::dbWriteTable()` don't work as expected for me, and I'm forced to rely on a hackier workaround than I'd like.

If one outcome of this bug report is to say "you should use `dbWriteTable()` to upload a CSV instead of a DF," that's totally fine w/ me, but the CSV path should probably be fixed to work w/ schema naming and `DBI::Id`, just like the "lite" version with the small DF does (see the sketch below).

Created on 2023-03-30 with reprex v2.0.2
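For comparison, the schema-qualified addressing that already works on the small-DF path. A sketch; the schema/table names, `con`, and `small_df` are placeholders:

```r
library(DBI)

# Schema-qualified target via DBI::Id(); this already works when writing
# a small data frame, so the CSV path ought to accept the same addressing.
tbl <- DBI::Id(schema = "analytics", table = "mytable")
DBI::dbWriteTable(con, tbl, small_df, overwrite = TRUE)
```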