Add rvdss indicator/data source #1542

cchuong · 2024-09-13T04:39:07Z

Summary:

Code to create a new data source for the epidata API. The data is respiratory virus detections in Canada, reported by the Public Health Agency of Canada (PHAC).

Prerequisites:

Unless it is a documentation hotfix, it should be merged against the dev branch
Branch is up-to-date with the branch to be merged with, i.e. dev
Build is successful
Code is cleaned up and formatted

nmdefries · 2024-09-23T21:26:58Z

We also want to have some unit tests here. This can be complicated for functions that normally interact with external components (websites, databases, etc), but we want to at least test functions that don't require that. (External components can be faked with mocking packages, but it's not essential.)
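
As a minimal sketch of both kinds of test, using only the standard library: `normalize_geo` and `fetch_report_html` below are hypothetical stand-ins, not functions from this PR. The first test covers a pure helper; the second fakes the network call with `unittest.mock` so no real website is contacted.

```python
import unittest
from unittest import mock
from urllib import request


def normalize_geo(name):
    """Lowercase a region name and strip periods and surrounding whitespace (hypothetical helper)."""
    return name.lower().replace(".", "").strip()


def fetch_report_html(url):
    """Download a page and return its body as text (hypothetical helper)."""
    with request.urlopen(url) as response:
        return response.read().decode("utf-8")


class TestPureHelper(unittest.TestCase):
    def test_normalize_geo(self):
        # No external components involved, so no mocking is needed.
        self.assertEqual(normalize_geo(" B.C. "), "bc")


class TestFetchWithMock(unittest.TestCase):
    def test_fetch_report_html(self):
        fake_response = mock.MagicMock()
        fake_response.read.return_value = b"<html>ok</html>"
        # urlopen is used as a context manager, so __enter__ must return the fake too.
        fake_response.__enter__.return_value = fake_response
        with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
            html = fetch_report_html("https://example.invalid/report")
        fake_urlopen.assert_called_once_with("https://example.invalid/report")
        self.assertEqual(html, "<html>ok</html>")
```

Saved in a test module, these can be run with `python -m unittest`.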

nmdefries

I renamed some things, moved some variables to constants.py, and added some comments. I will be looking at this more, but wanted to get you started with some feedback.

Overall, I'm having trouble understanding the flow of the code. I think this could be improved by adding more docstrings and comments in general, introducing mid-level functions, and naming functions clearly.

Docstrings: the shortest form of this is one line describing what the function does, the core goal. We don't necessarily need to describe all the args, like

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)

although it can be helpful for more complex and user-facing fns.
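
To illustrate the two forms, here is a small sketch. `count_positive_tests` is a hypothetical function, not one from this PR; `complex_number` reuses the keyword-argument layout quoted above.

```python
def count_positive_tests(tests, percent_positive):
    """Estimate the number of positive tests from a test count and a percent positivity."""
    return tests * percent_positive / 100


def complex_number(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    return complex(real, imag)
```

The one-line form is usually enough for internal helpers; the longer form earns its space when callers need to know defaults or units.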

Comments: Comments give context to the code. They can say why we're doing this. They can describe broadly what we're doing in a chunk of code (not details of implementation). They can discuss cons or gotchas of the current approach. They can present alternatives to the current approach (generally when the alternatives were considered but were too complex for the current need).

Take this fn, for example. Why are we processing table captions? What is a table caption? What do the "table_identifiers" represent? What is the sum([all... chunk doing (broadly)?

You don't need to answer all these questions in the code, but it's good to insert comments where you think future readers will be confused.

Mid-level functions (optional but something to think about): The helper functions you've defined so far are mostly low level (get_report_date), and the higher level functions in this package are pretty long. It makes me think that they can/should be broken down into more functions that call several of the low-level fns at once.

Function naming (recommended but optional): Try to make fn names very clearly say what they are doing. For example, get_report_date sounds to me like it is fetching the report date from the website ("get"). However, it's actually interpreting a given start_year-week combo. A better name might be "parse_report_date". Actually, I'm confused by the term "report date", because that makes me think of the date the entire report was posted, but I think this is handling time values (weeks within a given report). So maybe even "parse_time_value"? Or (this is too long, but brainstorming here...) "wrap_early_weeknums_to_next_year".
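
A rough sketch of how those names could split the work, under stated assumptions: the cutoff week of 35 and the use of ISO week numbering are illustrative guesses, not taken from the PR, and both function bodies are hypothetical.

```python
from datetime import date

# Assumption: a season starts late in one calendar year, so week numbers below
# this cutoff belong to the following year. The value 35 is illustrative only.
SEASON_START_WEEK = 35


def wrap_early_weeknums_to_next_year(week, start_year, season_start_week=SEASON_START_WEEK):
    """Return the calendar year that a season week number falls in."""
    return start_year + 1 if week < season_start_week else start_year


def parse_time_value(week, start_year):
    """Convert a (week number, season start year) pair to the Monday of that week.

    Assumption: ISO week numbering is used here as a stand-in for whatever
    week convention the reports actually follow.
    """
    year = wrap_early_weeknums_to_next_year(week, start_year)
    return date.fromisocalendar(year, week, 1)
```

With names like these, a reader can tell that nothing is fetched and that the only logic is deciding which year a week number belongs to.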

nmdefries · 2024-09-23T21:43:38Z

src/acquisition/rvdss/constants.py

+# Construct dashboard and data report URLS.
+DASHBOARD_BASE_URL_2023 = "https://health-infobase.canada.ca/src/data/respiratory-virus-detections/archive/{date}/"
+DASHBOARD_BASE_URLS_2023 = (


todo: Please describe a bit more about what these URLs are and if we will need to update them (add new dates in) ever.

issue: also, are these actually for 2023? All the dates are in 2024
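
One possible shape, sketched here for discussion: keep the URL template and the list of archive dates as separate constants, so it is obvious which part would need updating. The dates below are placeholders, their format is an assumption, and `build_archive_urls` is a hypothetical helper.

```python
# URL template taken from the diff; {date} is filled in per archived dashboard.
DASHBOARD_BASE_URL = (
    "https://health-infobase.canada.ca/src/data/"
    "respiratory-virus-detections/archive/{date}/"
)

# Placeholder dates for illustration; not the real list of archived dashboards.
EXAMPLE_ARCHIVE_DATES = ("2024-01-04", "2024-01-11")


def build_archive_urls(dates, template=DASHBOARD_BASE_URL):
    """Fill the {date} placeholder in the template once per archive date."""
    return tuple(template.format(date=d) for d in dates)
```

A comment next to the date list can then state whether new dates ever need to be added.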

nmdefries · 2024-09-23T23:28:44Z

src/acquisition/rvdss/utils.py

+    return(geo_type)
+
+def check_date_format(date_string):
+    if not re.search("[0-9]{4}-[0-9]{2}-[0-9]{2}",date_string):


suggestion (optional): Since datetime.strptime raises an error if the desired format doesn't match the actual format of the date, we could write this block as a try/except instead.

suggestion (optional): or we could look for a function that auto-detects common date formats, like R's as.Date, so we don't need to manually do all this checking. dateutil might be a good place to start.
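
A minimal sketch of the try/except variant, using only the standard library. `is_iso_date` is a hypothetical name; the real `check_date_format` may need to do more than return a boolean.

```python
from datetime import datetime


def is_iso_date(date_string):
    """Return True if date_string parses as a YYYY-MM-DD calendar date."""
    try:
        datetime.strptime(date_string, "%Y-%m-%d")
    except ValueError:
        # Raised both for a wrong layout and for impossible dates such as Feb 30.
        return False
    return True
```

Unlike the regex in the diff, this also rejects strings that have the right shape but are not real dates.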

nmdefries · 2024-09-24T13:53:11Z

src/acquisition/rvdss/rvdss_historic.py

+
+def check_duplicate_rows(table):
+    if table['week'].duplicated().any():
+       table.columns = [re.sub("canada","can",t) for t in table.columns]


issue: Is this a permanent replacement of the "canada" name? I thought we were doing this elsewhere. If not, we should be doing it elsewhere so the logic flows better (someone reading the code won't think that this kind of geo replacement is happening in this particular function).

nmdefries · 2024-09-24T19:48:10Z

src/acquisition/rvdss/rvdss_historic.py

+       for name, group in grouped:
+           duplicates_drop.append(group['can tests'].idxmin())


todo: please add a comment here adding more detail about why we're doing this (what we've seen in existing data that necessitates this) and what our specific deduplication approach is.
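
For reference, the approach in the diff appears to be: group rows by week and, within each duplicated week, drop the row with the smallest 'can tests' value. The sketch below restates that on plain lists of dicts so it runs without pandas; the function name is hypothetical, and the rationale in the comment is an assumption that the requested code comment should confirm or correct.

```python
from collections import defaultdict


def drop_lowest_count_duplicates(rows, key="week", count="can tests"):
    """For each duplicated key value, drop the single row with the smallest count."""
    indices_by_key = defaultdict(list)
    for i, row in enumerate(rows):
        indices_by_key[row[key]].append(i)

    to_drop = set()
    for indices in indices_by_key.values():
        if len(indices) > 1:
            # Assumption: the row with fewer tests is a stale or partial copy
            # of the same week, so the fuller row is the one to keep.
            to_drop.add(min(indices, key=lambda i: rows[i][count]))

    return [row for i, row in enumerate(rows) if i not in to_drop]
```

Note that, like one `idxmin` call per group, this removes only one row per duplicated week, so a week that appears three times would still have two rows afterwards.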

nmdefries · 2024-09-24T19:50:42Z

src/acquisition/rvdss/rvdss_historic.py

+    pat8= r"^ah1n1pdm09"
+    combined_pat3 = '|'.join((pat5, pat6,pat7,pat8))
+
+    table.columns=[re.sub(combined_pat, "positive_tests",col) for col in table.columns] #making naming consistent


thought (optional): we do a LOT of changing names here. I think this chunk would be better as a separate function that is called here.

actually, it looks like this function mostly does renaming, so maybe we should change the name of the function to better reflect that, but leave the contents as-is.

Ditto for the other create....table functions

nmdefries · 2024-09-24T20:02:09Z

src/acquisition/rvdss/rvdss_historic.py

+            # Rename columns
+            table.columns = [re.sub("\xa0"," ", col) for col in table.columns] # \xa0 to space


thought: hmm, a lot of renaming happens here too. If this happens for all tables, maybe factor this out to a new rename... fn, and call it within each create...table fn.

More generally, this fn is pretty long. It would be more readable broken up into more separate functions.
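
A sketch of what a shared rename helper could look like, assuming the renames can be expressed as an ordered list of regex substitutions. `rename_columns` and `COLUMN_RENAME_RULES` are hypothetical names, and only the two rules visible in the diff excerpts are included.

```python
import re

# Each pair is (regex pattern, replacement). Both rules come from the diff
# excerpts in this review; the real list in the PR is longer.
COLUMN_RENAME_RULES = (
    ("\xa0", " "),      # non-breaking space to regular space
    ("canada", "can"),  # shorten the national geo name
)


def rename_columns(columns, rules=COLUMN_RENAME_RULES):
    """Apply every rename rule, in order, to every column name."""
    renamed = []
    for col in columns:
        for pattern, replacement in rules:
            col = re.sub(pattern, replacement, col)
        renamed.append(col)
    return renamed
```

Each create...table fn could then call something like `table.columns = rename_columns(table.columns)` and keep only its table-specific logic.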

nmdefries · 2024-09-25T20:17:10Z

Plan to do code walkthrough next week!

sonarqubecloud · 2024-10-13T01:30:54Z

Quality Gate passed

Issues
6 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

…asily fetched (#1551)

* add basic sql tables -- needs update with real col names
* rename files
* add main fn with CLI; remove date range params in package frontend fn stubs
* start filling out historical fn stubs
* rest of new fn layout. adds CLI
* dashboard results can be stored directly in list in fetch_historical_dashboard_data
* Add in archived dashboards, and calculate start year from data
* address todos and fix historical fetching
* Change misspelled CB to BC
* Update imports

---------

Co-authored-by: cchuong <[email protected]>

sonarqubecloud · 2024-11-22T18:36:02Z

Quality Gate passed

Issues
8 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
5.4% Duplication on New Code

See analysis details on SonarQube Cloud

cchuong added 2 commits September 12, 2024 21:21

Create rvdss_historic.py

62b9070

Create rvdss_update.py

073aac9

nmdefries self-requested a review September 13, 2024 19:44

nmdefries reviewed Sep 13, 2024

src/acquisition/rvdss/rvdss_historic.py

create utils.py for common functions

01af95f

nmdefries reviewed Sep 16, 2024

src/acquisition/rvdss/utils.py

cchuong and others added 13 commits September 16, 2024 11:39

create constants.py and update utils

6a002e0

Update rvdss_historic.py

714455c

Update rvdss_update.py

6ee8bb7

fix typo and add missing abbreviation to constants

8814554

fix typo

d7905c8

add missing geo

08f908a

Update constants.py

07ed998

Revert "Update constants.py"

fd5bf15

This reverts commit 07ed998.

Revert "add missing geo"

678b468

This reverts commit 08f908a.

fix geo and virus abbreviation

4bfc933

remove "province of" from geo_values

e8957c3

construct urls automatically

7720a24

comment constants

59f79bf

nmdefries and others added 10 commits September 23, 2024 17:33

note historic urls don't need to be updated

e70b0e9

be stricter about importing local fns

72d1906

move dashboard file names to constants

bf51bd3

move run-the-whole-pipeline code into main()

ee3cadf

add code to calculate number of positive tests back in

180e67f

update abbreviate_geo to remove periods and other spelling

6bd6e24

fix lab name missing province

a7666b8

comment historic script

503165e

move output file names to constants

256e697

replace boolean comparisons with pythonic "not"

cd83087

nmdefries reviewed Sep 24, 2024

nmdefries and others added 6 commits September 25, 2024 16:47

actually put csv names in constants

969295b

break more helper functions and add doctsrings

00f3f9a

add more comments

ecca542

calculate update dates in a new function

31ec961

combine different spellings of labs

0be5f08

change slash to underscore in constants and move more processing code…

5696636

… into seperate function
