Use national data for nchs-mortality signals #1912

Merged (11 commits) on Jan 11, 2024

@rzats rzats commented Nov 22, 2023

Description

Currently, only state-level data is used for signals with source=nchs-mortality. Because the Covidcast dashboard shows the US as a whole by default, "N/A" values show up in plots and numeric text for these signals.

This PR makes use of the previously discarded national data for all signals in the NCHS family, pulling it from the dataset rather than throwing it away.

Changelog

  • Alters the available geo resolutions for NCHS data from state to ['state', 'nation'].
    • Excludes the percent_of_expected_deaths metric: despite its name, it contains proportions of expected deaths (e.g. 1.1 instead of 110%), which are nontrivial to aggregate.
  • Adds logic to process nation data from the dataset.
  • Adds tests for whether data with the new resolutions is correctly created and exported to CSVs.
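To illustrate why a ratio such as percent_of_expected_deaths is nontrivial to aggregate, here is a minimal sketch. The state names and all numbers are invented, and it assumes that the observed and expected counts behind each ratio are available, which may not hold for the real dataset. It shows that averaging the state ratios does not reproduce the ratio computed from pooled counts.

```python
# Hypothetical observed and expected deaths for two states (not real NCHS data).
states = {
    "A": {"observed": 110.0, "expected": 100.0},   # state ratio 1.1
    "B": {"observed": 900.0, "expected": 1000.0},  # state ratio 0.9
}


def ratio(observed, expected):
    """Proportion of expected deaths, e.g. 1.1 rather than 110%."""
    return observed / expected


# Naive aggregation: unweighted mean of the state-level ratios.
naive_mean = sum(ratio(**s) for s in states.values()) / len(states)

# Pooled aggregation: divide total observed by total expected.
national = ratio(
    sum(s["observed"] for s in states.values()),
    sum(s["expected"] for s in states.values()),
)

print(round(naive_mean, 4), round(national, 4))
```

The two results differ because the larger state dominates the pooled ratio, so a correct aggregate needs the expected counts as weights.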

Fixes

@rzats rzats requested a review from melange396 November 22, 2023 15:09
@melange396

It turns out that this dataset already has national-level data in it; we have just been throwing it away!

df = df[df["state"] != "United States"]

You can also see samples of this by browsing the dataset at:
https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Week-Ending-D/r8kw-7aab/data_preview

It is better to use this as-is from the source data than to rebuild it as an aggregation.
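A minimal sketch of keeping the national rows instead of dropping them. It uses plain Python records rather than a pandas DataFrame to stay dependency-free. The `split_by_geo` helper, the `deaths` field, and the sample counts are hypothetical; only the `state` column and the "United States" label come from the filter quoted above.

```python
NATIONAL_LABEL = "United States"  # value of the "state" column on national rows


def split_by_geo(rows):
    """Partition rows into state-level and nation-level records."""
    by_geo = {"state": [], "nation": []}
    for row in rows:
        geo = "nation" if row["state"] == NATIONAL_LABEL else "state"
        by_geo[geo].append(row)
    return by_geo


# Hypothetical sample rows, not real NCHS values.
rows = [
    {"state": "United States", "deaths": 120},
    {"state": "Alaska", "deaths": 15},
    {"state": "Ohio", "deaths": 40},
]

by_geo = split_by_geo(rows)
print(len(by_geo["state"]), len(by_geo["nation"]))
```

Partitioning rather than filtering means both geo resolutions come from a single pass over the source data.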

@rzats rzats changed the title Synthesize national data for nchs-mortality signals Use national data for nchs-mortality signals Jan 10, 2024

rzats commented Jan 10, 2024

@melange396 I've reworked the PR to use the data within the dataset, rather than synthesize new data. Of note:

  • Removed one of the new tests, since the state counts no longer sum to the national count. Some small cells are suppressed in the data, as noted in the test dataset:
    "One or more data cells have counts between 1–9 and have been suppressed in accordance with NCHS confidentiality standards."
  • Re-included the percent_of_expected_deaths metric, as the data for it no longer needs to be nontrivially synthesized.
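The effect of suppression on the removed test can be sketched as follows. All counts are invented, and the sketch assumes that the published national total still includes the deaths hidden in suppressed state cells.

```python
# Hypothetical counts; None marks a cell suppressed because it fell between 1 and 9.
state_counts = {"A": 250, "B": None, "C": 31, "D": None}
national_count = 293  # assumed to include the suppressed cells

# Sum only the cells that are actually published.
visible_sum = sum(c for c in state_counts.values() if c is not None)
n_suppressed = sum(c is None for c in state_counts.values())
gap = national_count - visible_sum

# Each suppressed cell hides between 1 and 9 deaths, which bounds the gap.
gap_is_plausible = n_suppressed <= gap <= 9 * n_suppressed

print(visible_sum, gap, gap_is_plausible)
```

A strict equality test between the state sum and the national count fails under these conditions; a bounds check like the one above is the most that could be asserted.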

@melange396 melange396 left a comment

nice! we should do a little cleanup on the inner loops of run_module(), and then i think it will be good to go.

nchs_mortality/delphi_nchs_mortality/run.py (outdated, resolved review comments)
        stats.append((max(dates), len(dates)))
    else:
        for sensor in SENSORS:
            for geo in ["state", "nation"]:

you should be able to pull this up two levels (outside of for metric in METRICS:) to reduce repetition

you could even do the filtering on geo_id ==/!= "us" there too, if you want to make another [sub]copy of df_pull
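One way the suggested restructuring could look, as a sketch only and not the code that was merged. The contents of `METRICS` and `SENSORS`, the `plan_exports` helper, and the sample rows are placeholders. The names `METRICS`, `SENSORS`, `geo_id`, and the "us" value are taken from the review thread.

```python
METRICS = ["metric_a", "metric_b"]  # placeholder metric names
SENSORS = ["num", "prop"]           # assumed sensor names
GEOS = ["state", "nation"]


def plan_exports(rows):
    """List (geo, metric, sensor, row_count) tuples, filtering once per geo."""
    exports = []
    for geo in GEOS:
        # The geo filter runs here, outside the metric and sensor loops,
        # so each subset is built once instead of once per metric.
        if geo == "nation":
            subset = [r for r in rows if r["geo_id"] == "us"]
        else:
            subset = [r for r in rows if r["geo_id"] != "us"]
        for metric in METRICS:
            for sensor in SENSORS:
                exports.append((geo, metric, sensor, len(subset)))
    return exports


# Hypothetical rows standing in for df_pull.
rows = [{"geo_id": "us"}, {"geo_id": "ak"}, {"geo_id": "oh"}]
exports = plan_exports(rows)
print(len(exports))
```

Hoisting the geo loop keeps the per-metric body free of geo-specific branching.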

@melange396 melange396 left a comment

looks good!

@melange396 melange396 merged commit 67adb8d into main Jan 11, 2024
15 checks passed
@melange396 melange396 deleted the rzatserkovnyi/national-data branch January 11, 2024 20:31
@melange396

nchs-mortality national-level data went in last thursday for the first time...

Did this work for you in your local testing? Since these values are coming out as "NULL", it makes me think the denominator here is 0 or null, which then makes me think that the national population is not getting set properly here.
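The suspected failure mode can be sketched like this. The `incidence_prop` helper, the per-100,000 scaling, and the population lookup are assumptions for illustration and are not taken from the indicator's code. The sketch shows how a missing national population would turn a valid count into a null proportion.

```python
def incidence_prop(count, population, per=100_000):
    """Return count per `per` people, or None when the denominator is unusable."""
    if count is None or not population:
        # A missing or zero population cannot yield a meaningful rate.
        return None
    return count / population * per


# Hypothetical lookup that has state populations but no "us" entry.
populations = {"oh": 11_800_000}

state_value = incidence_prop(40, populations.get("oh"))
national_value = incidence_prop(120, populations.get("us"))  # population missing

print(state_value is not None, national_value)
```

If the national population is never set, every national "prop" value comes out null while the state values remain populated, which would match the symptom described.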

Also, to make this message go away, we need to mark the nchs-mortality signals as having national-level data in the spreadsheet and then get it transferred to the csv. We can take care of that after the signals are properly acquired.

Successfully merging this pull request may close these issues.

Synthesize "national" data for nchs-mortality:deaths_covid_incidence_prop