I was using your code `mssql_to_s3_operator.py` as a reference point for a similar plugin that I am building, and I noticed something strange on lines 198-203. To get the minimum and maximum values of the primary key, you split `query_filter` and keep only the first part.
For example, starting with:
query_filter = "WHERE created_at >= x AND created_at < y AND"
(the last AND added on line 186), you would get:
count_sql_min = "SELECT min(id) FROM table WHERE created_at >= x"
count_sql_max = "SELECT max(id) FROM table WHERE created_at >= x"
This does not make much sense: because the upper bound on created_at is dropped, the resulting `count` value is much bigger than it should be, so the loop runs many more iterations than needed. In other words, it keeps looping over ranges where the full filter no longer loads any data from the DB (and I also don't see any break for when this happens).
Shouldn't we keep the full query_filter so that:
count_sql_max = "SELECT max(id) FROM table WHERE created_at >= x AND created_at < y"
Maybe the code was only tested with the `start` argument and without passing `end`. In that case the split only removes the trailing "AND" added on line 186, so the code works under those conditions.
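To make the difference concrete, here is a minimal sketch of both behaviours. It is not the operator's actual code: `truncated_filter` is only my guess at what the split on lines 198-203 amounts to, and `strip_trailing_and` and `build_bounds_sql` are hypothetical helpers showing one way to drop the dangling "AND" while keeping every condition.

```python
def truncated_filter(query_filter: str) -> str:
    # Assumed reproduction of the current behaviour: keep only the text
    # before the first " AND", which drops every later condition.
    return query_filter.split(" AND")[0]


def strip_trailing_and(query_filter: str) -> str:
    # Hypothetical fix: remove only a dangling trailing "AND".
    cleaned = query_filter.rstrip()
    if cleaned.upper().endswith(" AND"):
        cleaned = cleaned[: -len(" AND")].rstrip()
    return cleaned


def build_bounds_sql(table: str, primary_key: str, query_filter: str):
    # Build the min/max queries using the full filter.
    where = strip_trailing_and(query_filter)
    sql_min = f"SELECT min({primary_key}) FROM {table} {where}".strip()
    sql_max = f"SELECT max({primary_key}) FROM {table} {where}".strip()
    return sql_min, sql_max


query_filter = "WHERE created_at >= x AND created_at < y AND"
count_sql_min, count_sql_max = build_bounds_sql("table", "id", query_filter)
```

With only `start` passed (`"WHERE created_at >= x AND"`), both helpers return the same filter, which would explain why the problem does not show up in that case.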
Let me know if this makes sense or if I am missing anything; maybe there is some assumption that I am not considering.