Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FHIR field naming conformance #201

Merged
merged 3 commits into from
Apr 1, 2024
Merged

FHIR field naming conformance #201

merged 3 commits into from
Apr 1, 2024

Conversation

dogversioning
Copy link
Contributor

This PR makes the following changes:

  • Column names now adhere more closely to FHIR naming conventions
  • core__encounter_type table removed, as it is now 100% redundant with the encounter table
  • core__count_encounter_type table renamed to core__count_encounter_all_types. Same for the month striated version.
  • Added information on exceptions to FHIR naming to the core study documentation.

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added
  • Update template repo if there are changes to study configuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file was an older reference sql file that just never got cleaned up.

Comment on lines 30 to 47
Unless otherwise noted, column names correspond to a FHIR path relative to the resource
table in question, with resources delineated by underscores. So, 'category_code' in the
Condition table would correspond to the code element inside of the category element of
the Condition resource.

Common exceptions to this rule:

- `cnt` is a Cumulus specific notation for a column that contains counts of resources.
- `age_at_visit` is calculated from a Patient's birthDate and the period.start.day of an Encounter
- `postalcode_3` is calculated from a Patient's address.postalCode
- The Encounter resource includes several elements from the referenced Patient (`gender`, `postalcode_3`,
and the US Core race and ethnicity extensions) that are commonly used in informatics analysis

The core tables include all FHIR required/must support fields noted in the
[FHIR resource profiles(http://hl7.org/fhir/us/core/STU4/artifacts.html#structures-resource-profiles).
Additionally, there are fields that are useful to informatics analysis that are commonly available
from EHRs, but are not guaranteed to be populated, so consult with your research partners if
you are authoring a study using some of these data elements.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comorbidity drawing your attention to this text - not quite in db metadata, but would this work for external consumers, i.e. a DPH?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dogversioning Which part are you referring to? If you mean ZipCode with 3 digits, that's a HIPAA constraint.
There has been an outstanding task to look at "census track data" but that has not yet been completed. ZipCode with 5 digits to my knowledge we do not have PHI clearance for, playing it safe here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comorbidity this was more directed at answering the question that source metadata was trying to answer, i.e. 'i see this field name in the dashboard, where does it come from'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comorbidity this might be the easiest place to note the changes this PR makes to the column names from the perspective of a study author.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

age_at_visit is a calculated view very, very commonly used in population health studies that involve patients over any timeperiod (or even patients at a single point in time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment was more aimed at the general cutover to FHIR-like column names, not age at visit specifically.

@@ -151,6 +159,52 @@ def update_nested_obj(id_path, obj, i):
f.write(json.dumps(row, default=str) + "\n")


# Debugging aids
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I threw these in here out of convenience, but could be convinced to put them into a test_utils if we feel strongly about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine!

("core__medication"),
("core__medicationrequest"),
("core__observation"),
("core__observation_lab"),
("core__patient"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been missing for awhile! 😬

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know the reason that core__patient has been removed?
FHIR Patient object exists. I think the reason was because of patient AGE or DOB, which we just truncated down to YYYY. Otherwise, I think this table "should" exist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DPH active Cumulus studies have all been "over time" and thus relied more on core__encounter, but I can easily imagine a user (not me) being confused by the lack of a core__patient table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed, or not added, only from our unit tests, as an oversight. I have not done the work to trace this back

Comment on lines +54 to +56
except Exception as e:
conftest.debug_diff_tables(cols, table_rows, ref_table, pos=0)
raise e
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm leaving this in permanently for error states because it is incredibly useful for troubleshooting data problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fail Fast, love it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't as agressive with some of the renaming in this file, for a few reasons:

  • we already have a ticket to do some cleanup work (core__study_period defines "ed_note" but this should not be present #198 ) that i'd like to keep seperate from the many, many moving parts here
  • this straddles the border between the FHIR standard and the study interfaces currently, and i'm not quite ready to break that until i at least look at cutting the covid study over.
  • I sort of think we might want some infrastructure around this that studies could use.

Comment on lines 88 to 89
LEFT JOIN core__encounter_dn_reasonCode AS edr ON e.id = edr.id
LEFT JOIN core__encounter_dn_dischargeDisposition AS edd ON e.id = edd.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, mixed case tables are just as fine as mixed case fields (casing is ignored), but it does hit me different - I'm less used to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't either - this might be a ctrl+f artifact, lemme touch this one.

Condition table would correspond to the code element inside of the category element of
the Condition resource.

Common exceptions to this rule:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other things you could mention, but you don't need to, just thoughts:

  • DateTime and Period fields are usually chopped up.
  • Often References are represented with a toplevel xxx_ref field, and sometimes that field lives alongside a simple ID version of same.

@@ -42,7 +42,7 @@ def test_discovery(tmp_path):
)
db = databases.DuckDatabaseBackend(f"{tmp_path}/duck.db")
cursor = db.cursor()
table_rows = conftest.get_sorted_table_data(cursor, "discovery__code_sources")
table_rows, cols = conftest.get_sorted_table_data(cursor, "discovery__code_sources")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cols isn't used anywhere, yeah? Does ruff not complain about it? That feels like a missing check to my brain. Usually I'd prefix it with a sunder, like _cols to avoid warnings / signal to other devs that they can ignore it.

(Do you use the result anywhere? Is it just for manual debugging?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the except statement on line 68.

assert row in ref_table
except Exception as e:
conftest.debug_diff_tables(cols, table_rows, ref_table, pos=0)
raise e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can simply say raise and it will re-raise the current exception. (which is nice, because then you don't even have to give the exception a name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh! i had no idea.

@@ -151,6 +159,52 @@ def update_nested_obj(id_path, obj, i):
f.write(json.dumps(row, default=str) + "\n")


# Debugging aids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine!

@dogversioning dogversioning merged commit bf37eec into main Apr 1, 2024
3 checks passed
@dogversioning dogversioning deleted the mg/FHIR_naming branch April 1, 2024 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants