-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FHIR field naming conformance #201
Conversation
421629d
to
732a1be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file was an older reference sql file that just never got cleaned up.
docs/core-study-details.md
Outdated
Unless otherwise noted, column names correspond to a FHIR path relative to the resource | ||
table in question, with resources delineated by underscores. So, 'category_code' in the | ||
Condition table would correspond to the code element inside of the category element of | ||
the Condition resource. | ||
|
||
Common exceptions to this rule: | ||
|
||
- `cnt` is a Cumulus specific notation for a column that contains counts of resources. | ||
- `age_at_visit` is calculated from a Patient's birthDate and the period.start.day of an Encounter | ||
- `postalcode_3` is calculated from a Patient's address.postalCode | ||
- The Encounter resource includes several elements from the referenced Patient (`gender`, `postalcode_3`, | ||
and the US Core race and ethnicity extensions) that are commonly used in informatics analysis | ||
|
||
The core tables include all FHIR required/must support fields noted in the | ||
[FHIR resource profiles(http://hl7.org/fhir/us/core/STU4/artifacts.html#structures-resource-profiles). | ||
Additionally, there are fields that are useful to informatics analysis that are commonly available | ||
from EHRs, but are not guaranteed to be populated, so consult with your research partners if | ||
you are authoring a study using some of these data elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@comorbidity drawing your attention to this text - not quite in db metadata, but would this work for external consumers, i.e. a DPH?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dogversioning Which part are you referring to? If you mean ZipCode with 3 digits, that's a HIPAA constraint.
There has been an outstanding task to look at "census track data" but that has not yet been completed. ZipCode with 5 digits to my knowledge we do not have PHI clearance for, playing it safe here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@comorbidity this was more directed at answering the question that source metadata was trying to answer, i.e. 'i see this field name in the dashboard, where does it come from'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@comorbidity this might be the easiest place to note the changes this PR makes to the column names from the perspective of a study author.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
age_at_visit is a calculated view very, very commonly used in population health studies that involve patients over any timeperiod (or even patients at a single point in time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment was more aimed at the general cutover to FHIR-like column names, not age at visit specifically.
@@ -151,6 +159,52 @@ def update_nested_obj(id_path, obj, i): | |||
f.write(json.dumps(row, default=str) + "\n") | |||
|
|||
|
|||
# Debugging aids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I threw these in here out of convenience, but could be convinced to put them into a test_utils if we feel strongly about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine!
("core__medication"), | ||
("core__medicationrequest"), | ||
("core__observation"), | ||
("core__observation_lab"), | ||
("core__patient"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has been missing for awhile! 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know the reason that core__patient
has been removed?
FHIR Patient object exists. I think the reason was because of patient AGE or DOB, which we just truncated down to YYYY. Otherwise, I think this table "should" exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DPH active Cumulus studies have all been "over time" and thus relied more on core__encounter
, but I can easily imagine a user (not me) being confused by the lack of a core__patient
table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed, or not added, only from our unit tests, as an oversight. I have not done the work to trace this back
except Exception as e: | ||
conftest.debug_diff_tables(cols, table_rows, ref_table, pos=0) | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm leaving this in permanently for error states because it is incredibly useful for troubleshooting data problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fail Fast, love it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't as agressive with some of the renaming in this file, for a few reasons:
- we already have a ticket to do some cleanup work (core__study_period defines "ed_note" but this should not be present #198 ) that i'd like to keep seperate from the many, many moving parts here
- this straddles the border between the FHIR standard and the study interfaces currently, and i'm not quite ready to break that until i at least look at cutting the covid study over.
- I sort of think we might want some infrastructure around this that studies could use.
LEFT JOIN core__encounter_dn_reasonCode AS edr ON e.id = edr.id | ||
LEFT JOIN core__encounter_dn_dischargeDisposition AS edd ON e.id = edd.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, mixed case tables are just as fine as mixed case fields (casing is ignored), but it does hit me different - I'm less used to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't either - this might be a ctrl+f artifact, lemme touch this one.
Condition table would correspond to the code element inside of the category element of | ||
the Condition resource. | ||
|
||
Common exceptions to this rule: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other things you could mention, but you don't need to, just thoughts:
- DateTime and Period fields are usually chopped up.
- Often References are represented with a toplevel
xxx_ref
field, and sometimes that field lives alongside a simple ID version of same.
@@ -42,7 +42,7 @@ def test_discovery(tmp_path): | |||
) | |||
db = databases.DuckDatabaseBackend(f"{tmp_path}/duck.db") | |||
cursor = db.cursor() | |||
table_rows = conftest.get_sorted_table_data(cursor, "discovery__code_sources") | |||
table_rows, cols = conftest.get_sorted_table_data(cursor, "discovery__code_sources") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cols
isn't used anywhere, yeah? Does ruff not complain about it? That feels like a missing check to my brain. Usually I'd prefix it with a sunder, like _cols
to avoid warnings / signal to other devs that they can ignore it.
(Do you use the result anywhere? Is it just for manual debugging?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in the except statement on line 68.
tests/test_core.py
Outdated
assert row in ref_table | ||
except Exception as e: | ||
conftest.debug_diff_tables(cols, table_rows, ref_table, pos=0) | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You can simply say raise
and it will re-raise the current exception. (which is nice, because then you don't even have to give the exception a name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh! i had no idea.
@@ -151,6 +159,52 @@ def update_nested_obj(id_path, obj, i): | |||
f.write(json.dumps(row, default=str) + "\n") | |||
|
|||
|
|||
# Debugging aids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine!
This PR makes the following changes:
core__encounter_type
table removed, as it is now 100% redundant with theencounter
tablecore__count_encounter_type
table renamed tocore__count_encounter_all_types
. Same for the month striated version.Checklist
docs/
) needs to be updated