Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored remaining core tables #166

Merged
merged 8 commits into from
Jan 26, 2024
Merged

Conversation

dogversioning
Copy link
Contributor

This PR makes the following changes:

  • Updates to core templates for handling date fields
  • Moves all base core tables to builders
  • Moves core-related templates from base templates to core_templates
  • moves non-template reading functions from template_sql/templates to template_sql/utils

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added
  • Run pylint if you're making changes beyond adding studies
  • Update template repo if there are changes to study configuration

tests/test_data/core/core__count_patient.txt Show resolved Hide resolved
tests/test_data/core/core__encounter.txt Outdated Show resolved Hide resolved
tests/conftest.py Show resolved Hide resolved
cumulus_library/studies/core/builder_medication.py Outdated Show resolved Hide resolved
cumulus_library/studies/core/manifest.toml Show resolved Hide resolved
@dogversioning dogversioning marked this pull request as ready for review January 24, 2024 13:58
Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done yet, but I'm going to break for lunch and wanted to dump these on ya first 😄

e.priority_code_system,
e.reasoncode_code,
e.reasoncode_code_system,
date_diff('year', date(p.birthdate), e.period_start_day) AS age_at_visit,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probs fine since the ETL always gives you year-only birthDates. But if we ever want to handle cases where the day might actually fall anywhere in the year, this is the logic I used in quality (works in both athena and duckdb, unlike date_diff):

                year(targetDateTime)
                - year(birthDate)
                - (
                    CASE WHEN
                        extract(doy FROM birthDate) > extract(doy FROM targetDateTime)
                    THEN 1
                    ELSE 0
                    END
                )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frankly i just copied this over verbatim from the raw sql without thinking about it.

but - can we talk about this? i assume there's a reason, which is why i didn't touch this, but that feels like an odd behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

date_diff works like you want I believe, in Athena. (i.e. it takes day of year into account, is my memory)

But not duckdb - in duckdb, it is naive and just does year-minus-year.

So I used the logic above to get both to work the same. I'm guessing the author of that sql just didn't have duckdb in their brains because why would they. But you might hid oddities during unit testing if you ever feed this a non -01-01 date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the bit I actually wanted to talk about was truncating the birthdates at the ETL - just wanted to understand more of why that's happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that was because of reducing PHI vibes - we keep most datetimes in place, for clinical reasons. But I guess they (Andy & Jamie) felt like birthdates were more sensitive and we should keep truncating them.

For reference, that happens at the MS-deid-tool layer and we could change the config to not do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I created #169 - we're using that now in a way that's slightly inaccurate and i want to, at some point, just discuss the implications once more.

Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping another group of comments, before I dig into the tests

Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK finished looking at this 😮‍💨 and looks good! Thanks for all this refactoring.

Comments can be resolved as you see fit, but nothing deal breaking.

@dogversioning dogversioning merged commit 178295f into main Jan 26, 2024
3 checks passed
@dogversioning dogversioning deleted the mg/core_refactor_part_2 branch January 26, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants