Non-deterministic use of `row_number()` #77
Interestingly, the former has been solved upstream:
I definitely agree we should choose a consistent and deterministic way of doing this, and I like `ROW_NUMBER() OVER ()`... are you sure SQL Server doesn't support it? My Googling says it does, but I've never actually used SQL Server.
Sorry, I made my point badly: you're absolutely right that it does support `ROW_NUMBER()`, but it does not support an empty `OVER ()` (see https://stackoverflow.com/questions/44105691/row-number-without-order-by).
Ahh OK! That makes more sense 🙃 and I agree, we should choose some ordering key for consistency even if there are dupes in a table (which will happen in OMOP).
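A minimal sketch of why an ordering key with a tie-breaker is enough even when a table has dupes. It uses Python's standard-library `sqlite3` (SQLite 3.25+ supports window functions) as a stand-in for DuckDB/Postgres, which is an assumption. The `visits` table and its columns are hypothetical. Ordering by `visit_date` alone would leave the two 2020-01-05 rows tied. Adding `person_id` breaks that tie. The exact duplicates still tie, but they are indistinguishable, so which copy gets which number cannot change the result.

```python
import sqlite3

# Hypothetical visit rows: two share a visit_date, and one row is an exact duplicate.
ROWS = [
    (2, "2020-01-05"),
    (1, "2020-01-05"),
    (3, "2020-01-01"),
    (3, "2020-01-01"),
]


def numbered(rows):
    """Load rows in the given order and number them with a tie-broken ORDER BY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visits (person_id INTEGER, visit_date TEXT)")
    con.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    # visit_date alone ties on 2020-01-05; person_id breaks that tie.
    # The two identical rows still tie, but swapping them changes nothing visible.
    query = """
        SELECT row_number() OVER (ORDER BY visit_date, person_id) AS rn,
               person_id, visit_date
        FROM visits
        ORDER BY rn
    """
    out = con.execute(query).fetchall()
    con.close()
    return out


# Same numbering regardless of the order the rows were loaded in.
assert numbered(ROWS) == numbered(list(reversed(ROWS)))
print(numbered(ROWS))
```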
`row_number` is sprinkled throughout the repository, but it is used in several different ways that are likely to give unexpected, non-deterministic behaviour between runs. We usually have a logical means of ordering the rows: if not a date, then an ID of some sort. The calls tend to take one of these forms:

- `row_number() over ()`
- `row_number() over (order by (select null))`

The variation is likely to come from a mixture of copy-pasted sources (e.g. T-SQL doesn't allow `...OVER ()`, but Postgres/DuckDB do). The two main offenders I can find so far:

- `dbt-synthea/models/omop/location.sql`, line 2 in ae79114
- `dbt-synthea/models/omop/provider.sql`, line 2 in 75a1e12
Although this probably has low impact downstream, it may be causing unexpected behaviour, e.g. #47.
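A minimal sketch of the proposed fix. It uses Python's standard-library `sqlite3` (SQLite 3.25+ for window functions) rather than the project's actual engine, which is an assumption. The `src` table and its columns are hypothetical, not the real `location.sql` schema. It loads the same rows in two different orders and checks that each row receives the same ID once `over ()` is replaced with an explicit `order by` on logical columns.

```python
import sqlite3

# Hypothetical rows standing in for a location source; the column names
# are illustrative, not the real dbt-synthea schema.
ROWS = [
    ("Boston", "MA", "02118"),
    ("Springfield", "MA", "01101"),
    ("Worcester", "MA", "01608"),
]


def number_locations(rows):
    """Load rows in the given order and assign IDs with an explicit ORDER BY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE src (city TEXT, state TEXT, zip TEXT)")
    con.executemany("INSERT INTO src VALUES (?, ?, ?)", rows)
    # Ordering on logical columns (instead of `over ()`) makes the numbering
    # independent of the order in which the engine happens to scan the table.
    query = """
        SELECT row_number() OVER (ORDER BY state, city, zip) AS location_id,
               city, state, zip
        FROM src
    """
    # Map each (city, state, zip) tuple to the ID it was assigned.
    result = {row[1:]: row[0] for row in con.execute(query)}
    con.close()
    return result


forward = number_locations(ROWS)
backward = number_locations(list(reversed(ROWS)))
assert forward == backward
print(forward)
```

The same idea applies to `provider.sql`: any column, or combination of columns, that uniquely identifies a row works as the `order by` key.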