
Release covidcast-indicators 0.3.50 #1932

Merged: 29 commits, Jan 16, 2024
Commits
1e1b143
rough revision script and init nwss_wastewater
dsweber2 Oct 17, 2023
8667e6b
wastewater first draft
dsweber2 Dec 6, 2023
9368c76
fix: actually working via `python -m delphi_NWSS`
dsweber2 Dec 7, 2023
8f202e7
style: happy linter
dsweber2 Dec 7, 2023
20aaebe
include setup.py...
dsweber2 Dec 14, 2023
22bb268
need a template
dsweber2 Dec 14, 2023
ba53086
format
dsweber2 Dec 14, 2023
95b4ef2
matching log location elsewhere, add to python ci
dsweber2 Dec 14, 2023
d118aa4
get the name right, include notes update
dsweber2 Dec 14, 2023
414eb69
sum, but if all nan, return nan
dsweber2 Dec 14, 2023
8b62909
addressing review comments
dsweber2 Dec 15, 2023
871a227
nwss: enough tests to pass CI
dsweber2 Dec 15, 2023
8e49de1
nwss: happy linter
dsweber2 Dec 15, 2023
bf6b487
add_default_nancodes, pass lint & test
dsweber2 Dec 16, 2023
1a347c6
lint: no megalines
dsweber2 Dec 16, 2023
cf7fbba
nwss: adding tests for pull functions
dsweber2 Dec 19, 2023
af15a42
Add param template
minhkhul Jan 3, 2024
e9cab35
remove archive
minhkhul Jan 3, 2024
56bddfb
add daily_receiving dir
minhkhul Jan 4, 2024
8d06adf
add gitignore csv files
minhkhul Jan 4, 2024
8465082
Update nwss_wastewater-params-prod.json.j2
dsweber2 Jan 4, 2024
b2c67f2
Delete nwss_wastewater/daily_receiving/.gitignore
minhkhul Jan 4, 2024
6ae4d2d
Merge pull request #1920 from cmu-delphi/nwss-stage
minhkhul Jan 4, 2024
56d2d25
Merge pull request #1923 from cmu-delphi/bot/sync-prod-main
melange396 Jan 8, 2024
833e818
Merge pull request #1913 from cmu-delphi/nwss
dsweber2 Jan 8, 2024
67adb8d
Use national data for nchs-mortality signals (#1912)
rzats Jan 11, 2024
b79525b
chore: bump delphi_utils to 0.3.22
Jan 16, 2024
fb47683
chore: bump covidcast-indicators to 0.3.50
Jan 16, 2024
624b73b
[create-pull-request] automated change
melange396 Jan 16, 2024
2 changes: 1 addition & 1 deletion .bumpversion.cfg
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.3.49
+current_version = 0.3.50
 commit = True
 message = chore: bump covidcast-indicators to {new_version}
 tag = False
56 changes: 34 additions & 22 deletions .github/workflows/python-ci.yml
@@ -5,37 +5,49 @@ name: Python package
 
 on:
   push:
-    branches: [ main, prod ]
+    branches: [main, prod]
   pull_request:
-    types: [ opened, synchronize, reopened, ready_for_review ]
-    branches: [ main, prod ]
+    types: [opened, synchronize, reopened, ready_for_review]
+    branches: [main, prod]
 
 jobs:
   build:
     runs-on: ubuntu-20.04
     if: github.event.pull_request.draft == false
     strategy:
       matrix:
-        packages: [_delphi_utils_python, changehc, claims_hosp, doctor_visits, google_symptoms, hhs_hosp, nchs_mortality, quidel_covidtest, sir_complainsalot]
+        packages:
+          [
+            _delphi_utils_python,
+            changehc,
+            claims_hosp,
+            doctor_visits,
+            google_symptoms,
+            hhs_hosp,
+            nchs_mortality,
+            nwss_wastewater,
+            quidel_covidtest,
+            sir_complainsalot,
+          ]
     defaults:
       run:
         working-directory: ${{ matrix.packages }}
     steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python 3.8
-      uses: actions/setup-python@v2
-      with:
-        python-version: 3.8
-    - name: Install testing dependencies
-      run: |
-        python -m pip install --upgrade pip
-        pip install pylint pytest pydocstyle wheel
-    - name: Install
-      run: |
-        make install-ci
-    - name: Lint
-      run: |
-        make lint
-    - name: Test
-      run: |
-        make test
+      - uses: actions/checkout@v2
+      - name: Set up Python 3.8
+        uses: actions/setup-python@v2
+        with:
+          python-version: 3.8
+      - name: Install testing dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install pylint pytest pydocstyle wheel
+      - name: Install
+        run: |
+          make install-ci
+      - name: Lint
+        run: |
+          make lint
+      - name: Test
+        run: |
+          make test
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -10,7 +10,7 @@
 - TODO: #527 Get this list automatically from python-ci.yml at runtime.
 */
 
-def indicator_list = ["backfill_corrections", "changehc", "claims_hosp", "google_symptoms", "hhs_hosp", "nchs_mortality", "quidel_covidtest", "sir_complainsalot", "doctor_visits"]
+def indicator_list = ["backfill_corrections", "changehc", "claims_hosp", "google_symptoms", "hhs_hosp", "nchs_mortality", "quidel_covidtest", "sir_complainsalot", "doctor_visits", "nwss_wastewater"]
 def build_package_main = [:]
 def build_package_prod = [:]
 def deploy_staging = [:]
2 changes: 1 addition & 1 deletion _delphi_utils_python/.bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.3.21
+current_version = 0.3.22
 commit = True
 message = chore: bump delphi_utils to {new_version}
 tag = False
2 changes: 1 addition & 1 deletion _delphi_utils_python/delphi_utils/__init__.py
@@ -15,4 +15,4 @@
 from .nancodes import Nans
 from .weekday import Weekday
 
-__version__ = "0.3.21"
+__version__ = "0.3.22"
30 changes: 29 additions & 1 deletion _delphi_utils_python/delphi_utils/nancodes.py
@@ -1,13 +1,41 @@
 """Unified not-a-number codes for CMU Delphi codebase."""
 
 from enum import IntEnum
+import pandas as pd
 
 
 class Nans(IntEnum):
-    """An enum of not-a-number codes for the indicators."""
+    """An enum of not-a-number codes for the indicators.
+
+    See the descriptions here: https://cmu-delphi.github.io/delphi-epidata/api/missing_codes.html
+    """
 
     NOT_MISSING = 0
     NOT_APPLICABLE = 1
     REGION_EXCEPTION = 2
     CENSORED = 3
     DELETED = 4
     OTHER = 5
+
+
+def add_default_nancodes(df: pd.DataFrame):
+    """Add some default nancodes to the dataframe.
+
+    This method sets the `"missing_val"` column to NOT_MISSING whenever the
+    `"val"` column has `isnull()` as `False`; if `isnull()` is `True`, then it
+    sets `"missing_val"` to `OTHER`. It also sets both the `"missing_se"` and
+    `"missing_sample_size"` columns to `NOT_APPLICABLE`.
+
+    Returns
+    -------
+    pd.DataFrame
+    """
+    # Default missingness codes
+    df["missing_val"] = Nans.NOT_MISSING
+    df["missing_se"] = Nans.NOT_APPLICABLE
+    df["missing_sample_size"] = Nans.NOT_APPLICABLE
+
+    # Mark any remaining nans with unknown
+    remaining_nans_mask = df["val"].isnull()
+    df.loc[remaining_nans_mask, "missing_val"] = Nans.OTHER
+    return df
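As a sanity check on the new helper's behavior, the sketch below re-implements the enum and function from the diff above on a toy frame (re-implemented locally for illustration rather than imported from delphi_utils):

```python
from enum import IntEnum

import pandas as pd


class Nans(IntEnum):
    """Not-a-number codes, copied from the diff above."""

    NOT_MISSING = 0
    NOT_APPLICABLE = 1
    REGION_EXCEPTION = 2
    CENSORED = 3
    DELETED = 4
    OTHER = 5


def add_default_nancodes(df: pd.DataFrame) -> pd.DataFrame:
    """Mirror of the helper added in nancodes.py."""
    # Default missingness codes
    df["missing_val"] = Nans.NOT_MISSING
    df["missing_se"] = Nans.NOT_APPLICABLE
    df["missing_sample_size"] = Nans.NOT_APPLICABLE
    # Mark any remaining nans with unknown
    df.loc[df["val"].isnull(), "missing_val"] = Nans.OTHER
    return df


df = add_default_nancodes(pd.DataFrame({"val": [1.2, None, 3.4]}))
print(df["missing_val"].tolist())  # [0, 5, 0]
```

The one NaN row gets `missing_val = OTHER` (5), while every row gets `NOT_APPLICABLE` (1) for the se and sample-size codes.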
2 changes: 1 addition & 1 deletion _delphi_utils_python/setup.py
@@ -27,7 +27,7 @@
 
 setup(
     name="delphi_utils",
-    version="0.3.21",
+    version="0.3.22",
     description="Shared Utility Functions for Indicators",
     long_description=long_description,
     long_description_content_type="text/markdown",
13 changes: 13 additions & 0 deletions ansible/templates/nwss_wastewater-params-prod.json.j2
@@ -0,0 +1,13 @@
+{
+  "common": {
+    "export_dir": "./receiving",
+    "log_filename": "./nwss_wastewater.log",
+    "log_exceptions": false
+  },
+  "indicator": {
+    "wip_signal": true,
+    "export_start_date": "2020-02-01",
+    "static_file_dir": "./static",
+    "token": ""
+  }
+}
2 changes: 1 addition & 1 deletion changehc/version.cfg
@@ -1 +1 @@
-current_version = 0.3.49
+current_version = 0.3.50
2 changes: 1 addition & 1 deletion claims_hosp/version.cfg
@@ -1 +1 @@
-current_version = 0.3.49
+current_version = 0.3.50
2 changes: 1 addition & 1 deletion doctor_visits/version.cfg
@@ -1 +1 @@
-current_version = 0.3.49
+current_version = 0.3.50
2 changes: 1 addition & 1 deletion google_symptoms/version.cfg
@@ -1 +1 @@
-current_version = 0.3.49
+current_version = 0.3.50
2 changes: 1 addition & 1 deletion hhs_hosp/version.cfg
@@ -1 +1 @@
-current_version = 0.3.49
+current_version = 0.3.50
2 changes: 2 additions & 0 deletions nchs_mortality/.pylintrc
@@ -4,6 +4,8 @@
 disable=logging-format-interpolation,
         too-many-locals,
         too-many-arguments,
+        too-many-branches,
+        too-many-statements,
         # Allow pytest functions to be part of a class.
         no-self-use,
         # Allow pytest classes to have one test.
4 changes: 2 additions & 2 deletions nchs_mortality/README.md
@@ -8,9 +8,9 @@ the state-level data as-is. For detailed information see the files
 `MyAppToken` is required when fetching data from SODA Consumer API
 (https://dev.socrata.com/foundry/data.cdc.gov/r8kw-7aab). Follow the
 steps below to create a MyAppToken.
-- Click the `Sign up for an app toekn` buttom in the linked website
+- Click the `Sign up for an app token` button in the linked website
 - Sign In or Sign Up with Socrata ID
-- Clck the `Create New App Token` button
+- Click the `Create New App Token` button
 - Fill in `Application Name` and `Description` (You can just use NCHS_Mortality
   for both) and click `Save`
 - Copy the `App Token`
1 change: 0 additions & 1 deletion nchs_mortality/delphi_nchs_mortality/constants.py
@@ -25,7 +25,6 @@
     "prop"
 ]
 INCIDENCE_BASE = 100000
-GEO_RES = "state"
 
 # this is necessary as a delimiter in the f-string expressions we use to
 # construct detailed error reports
11 changes: 7 additions & 4 deletions nchs_mortality/delphi_nchs_mortality/pull.py
@@ -96,8 +96,6 @@ def pull_nchs_mortality_data(token: str, test_file: Optional[str]=None):
         {NEWLINE.join(df.columns)}
         """) from exc
 
-    # Drop rows for locations outside US
-    df = df[df["state"] != "United States"]
     df = df[keep_columns + ["timestamp", "state"]].set_index("timestamp")
 
     # NCHS considers NYC as an individual state, however, we want it included
@@ -124,6 +122,11 @@
     # Add population info
     keep_columns.extend(["timestamp", "geo_id", "population"])
     gmpr = GeoMapper()
-    df = gmpr.add_population_column(df, "state_name", geocode_col="state")
-    df = gmpr.add_geocode(df, "state_name", "state_id", from_col="state", new_col="geo_id")
+    # Map state to geo_id, but set dropna=False as we also have national data
+    df = gmpr.add_population_column(df, "state_name",
+                                    geocode_col="state", dropna=False)
+    df = gmpr.add_geocode(df, "state_name", "state_id",
+                          from_col="state", new_col="geo_id", dropna=False)
+    # Manually set geo_id for national data
+    df.loc[df["state"] == "United States", "geo_id"] = "us"
     return df[keep_columns]
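The reason for `dropna=False` in the change above: "United States" has no state geocode, so a dropping merge would silently lose the national rows before they could be tagged. A minimal pandas-only sketch of the pattern, with toy data and a hypothetical stand-in mapping in place of delphi_utils.GeoMapper:

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["Pennsylvania", "United States"],
    "covid_deaths": [10, 500],
})

# Hypothetical stand-in for GeoMapper's state-name -> state-id lookup;
# "United States" is absent from it, so it maps to NaN instead of being dropped.
df["geo_id"] = df["state"].map({"Pennsylvania": "pa"})

# Manually set geo_id for national data, as in the diff above.
df.loc[df["state"] == "United States", "geo_id"] = "us"
print(df["geo_id"].tolist())  # ['pa', 'us']
```

With the old `dropna` default, the second row would have been discarded during the geocode join and no national signal could be produced downstream.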
71 changes: 37 additions & 34 deletions nchs_mortality/delphi_nchs_mortality/run.py
Expand Up @@ -13,7 +13,7 @@

from .archive_diffs import arch_diffs
from .constants import (METRICS, SENSOR_NAME_MAP,
SENSORS, INCIDENCE_BASE, GEO_RES)
SENSORS, INCIDENCE_BASE)
from .pull import pull_nchs_mortality_data


@@ -72,51 +72,54 @@ def run_module(params: Dict[str, Any]):
     stats = []
     df_pull = pull_nchs_mortality_data(token, test_file)
     for metric in METRICS:
-        if metric == 'percent_of_expected_deaths':
-            logger.info("Generating signal and exporting to CSV",
-                        metric = metric)
-            df = df_pull.copy()
-            df["val"] = df[metric]
-            df["se"] = np.nan
-            df["sample_size"] = np.nan
-            df = add_nancodes(df)
-            # df = df[~df["val"].isnull()]
-            sensor_name = "_".join([SENSOR_NAME_MAP[metric]])
-            dates = create_export_csv(
-                df,
-                geo_res=GEO_RES,
-                export_dir=daily_export_dir,
-                start_date=datetime.strptime(export_start_date, "%Y-%m-%d"),
-                sensor=sensor_name,
-                weekly_dates=True
-            )
-            if len(dates) > 0:
-                stats.append((max(dates), len(dates)))
-        else:
-            for sensor in SENSORS:
+        for geo in ["state", "nation"]:
+            if metric == 'percent_of_expected_deaths':
                 logger.info("Generating signal and exporting to CSV",
-                            metric = metric,
-                            sensor = sensor)
+                            metric=metric, geo_level=geo)
                 df = df_pull.copy()
-                if sensor == "num":
-                    df["val"] = df[metric]
+                if geo == "nation":
+                    df = df[df["geo_id"] == "us"]
                 else:
-                    df["val"] = df[metric] / df["population"] * INCIDENCE_BASE
+                    df = df[df["geo_id"] != "us"]
+                df["val"] = df[metric]
                 df["se"] = np.nan
                 df["sample_size"] = np.nan
                 df = add_nancodes(df)
-                # df = df[~df["val"].isnull()]
-                sensor_name = "_".join([SENSOR_NAME_MAP[metric], sensor])
                 dates = create_export_csv(
                     df,
-                    geo_res=GEO_RES,
+                    geo_res=geo,
                     export_dir=daily_export_dir,
                     start_date=datetime.strptime(export_start_date, "%Y-%m-%d"),
-                    sensor=sensor_name,
+                    sensor=SENSOR_NAME_MAP[metric],
                     weekly_dates=True
                 )
                 if len(dates) > 0:
                     stats.append((max(dates), len(dates)))
+            else:
+                for sensor in SENSORS:
+                    logger.info("Generating signal and exporting to CSV",
+                                metric=metric, sensor=sensor, geo_level=geo)
+                    df = df_pull.copy()
+                    if geo == "nation":
+                        df = df[df["geo_id"] == "us"]
+                    else:
+                        df = df[df["geo_id"] != "us"]
+                    if sensor == "num":
+                        df["val"] = df[metric]
+                    else:
+                        df["val"] = df[metric] / df["population"] * INCIDENCE_BASE
+                    df["se"] = np.nan
+                    df["sample_size"] = np.nan
+                    df = add_nancodes(df)
+                    sensor_name = "_".join([SENSOR_NAME_MAP[metric], sensor])
+                    dates = create_export_csv(
+                        df,
+                        geo_res=geo,
+                        export_dir=daily_export_dir,
+                        start_date=datetime.strptime(export_start_date, "%Y-%m-%d"),
+                        sensor=sensor_name,
+                        weekly_dates=True
+                    )
+                    if len(dates) > 0:
+                        stats.append((max(dates), len(dates)))
 
     # Weekly run of archive utility on Monday
     # - Does not upload to S3, that is handled by daily run of archive utility
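The restructured run loop can be sketched end to end with toy data: each metric is now exported once per geo level, national rows are selected by the `geo_id == "us"` tag set in pull.py, and prop signals are scaled to a per-100k rate via INCIDENCE_BASE. The populations and death counts below are made up for illustration:

```python
import pandas as pd

INCIDENCE_BASE = 100000  # per-100k scaling, as in constants.py

df_pull = pd.DataFrame({
    "geo_id": ["pa", "ny", "us"],  # "us" rows were tagged in pull.py
    "covid_deaths": [20, 30, 500],
    "population": [13_000_000, 19_500_000, 330_000_000],  # toy values
})

exports = {}
for geo in ["state", "nation"]:
    df = df_pull.copy()
    # nation keeps only the national row; state drops it
    df = df[df["geo_id"] == "us"] if geo == "nation" else df[df["geo_id"] != "us"]
    for sensor in ["num", "prop"]:
        out = df.copy()
        if sensor == "num":
            out["val"] = out["covid_deaths"]
        else:
            out["val"] = out["covid_deaths"] / out["population"] * INCIDENCE_BASE
        exports[(geo, sensor)] = out["val"].round(3).tolist()

print(exports[("nation", "num")])   # [500]
print(exports[("state", "prop")])   # [0.154, 0.154]
```

Because state and nation rows are split before export, the same metric never mixes geographies in one CSV, which is what the new `weekly_*_nation_*` test files below check for.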
38 changes: 20 additions & 18 deletions nchs_mortality/tests/test_run.py
@@ -19,6 +19,7 @@ def test_output_files_exist(self, run_as_module, date):
         for output_folder in folders:
             csv_files = listdir(output_folder)
 
+        geos = ["nation", "state"]
         dates = [
             "202030",
             "202031",
@@ -38,15 +39,14 @@
         sensors = ["num", "prop"]
 
         expected_files = []
-        for d in dates:
-            for metric in metrics:
-                if metric == "deaths_percent_of_expected":
-                    expected_files += ["weekly_" + d + "_state_" \
-                        + metric + ".csv"]
-                else:
-                    for sensor in sensors:
-                        expected_files += ["weekly_" + d + "_state_" \
-                            + metric + "_" + sensor + ".csv"]
+        for geo in geos:
+            for d in dates:
+                for metric in metrics:
+                    if metric == "deaths_percent_of_expected":
+                        expected_files += [f"weekly_{d}_{geo}_{metric}.csv"]
+                    else:
+                        for sensor in sensors:
+                            expected_files += [f"weekly_{d}_{geo}_{metric}_{sensor}.csv"]
         assert set(expected_files).issubset(set(csv_files))
 
     # the 14th was a Monday
@@ -58,12 +58,14 @@ def test_output_file_format(self, run_as_module, date):
         if is_mon_or_thurs:
             folders.append("receiving")
 
-        for output_folder in folders:
-            df = pd.read_csv(
-                join(output_folder, "weekly_202026_state_deaths_covid_incidence_prop.csv")
-            )
-            expected_columns = [
-                "geo_id", "val", "se", "sample_size",
-                "missing_val", "missing_se", "missing_sample_size"
-            ]
-            assert (df.columns.values == expected_columns).all()
+        geos = ["nation", "state"]
+        for geo in geos:
+            for output_folder in folders:
+                df = pd.read_csv(
+                    join(output_folder, f"weekly_202026_{geo}_deaths_covid_incidence_prop.csv")
+                )
+                expected_columns = [
+                    "geo_id", "val", "se", "sample_size",
+                    "missing_val", "missing_se", "missing_sample_size"
+                ]
+                assert (df.columns.values == expected_columns).all()
2 changes: 1 addition & 1 deletion nchs_mortality/version.cfg
@@ -1 +1 @@
-current_version = 0.3.49
+current_version = 0.3.50