refactor hospital admission to use delphi_utils create_export_csv #2032

question: is this necessary here? `create_export_csv` will drop it anyway.

broader question: tracing the code above, I actually don't know what columns are in `output_df` at this step. In the previous code, we at least knew what columns we were dealing with.

suggestion: I suppose that depends on what `res = ClaimsHospIndicator.fit(sub_data, self.burnindate, geo_id)` outputs in `update_indicator`, but I haven't tracked that down. What do you think about adding an assert to `update_indicator` at the end that makes sure `output_df` has all the right columns we expect?
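
A minimal sketch of what that assert could look like (the exact column set here is an assumption, not taken from the PR):

```python
# Hypothetical sketch of a column check at the end of update_indicator.
# EXPECTED_COLUMNS is an assumed set, not taken from this PR.
EXPECTED_COLUMNS = {"geo_id", "val", "se", "sample_size", "incl"}

def assert_output_columns(output_df):
    """Fail fast if output_df is missing or carrying unexpected columns."""
    actual = set(output_df.columns)
    assert EXPECTED_COLUMNS <= actual, \
        f"output_df missing columns: {EXPECTED_COLUMNS - actual}"
    assert actual <= EXPECTED_COLUMNS, \
        f"output_df has unexpected columns: {actual - EXPECTED_COLUMNS}"
```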

Could you take a look at the comment above again? I updated it.

That's a good point! I would actually have missed `sample_size` if it weren't for your comment. Hopefully this fixes the issue.

Yes, we do need `incl`, at least until `preprocess_output`, which filters on the `incl` column being true.

Glad that helped!

I meant: is it necessary to even drop it on line 230, since `create_export_csv` will ignore it when writing the CSV? But it's a minor thing, not a big deal.
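
As a hedged illustration of that point (column names and call signature reflect my reading of delphi_utils and should be treated as assumptions):

```python
import pandas as pd
from delphi_utils import create_export_csv

# Sketch only: column names and signature reflect my reading of
# delphi_utils, not verified against this PR's version of the library.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-08-31", "2024-08-31"]),
    "geo_id": ["ca", "ny"],
    "val": [1.2, 3.4],
    "se": [0.1, 0.2],
    "sample_size": [100.0, 200.0],
    "incl": [True, True],  # extra column; not written to the CSVs
})

# The exported files contain only the geo_id/val/se/sample_size columns,
# so an explicit drop of `incl` before this call is arguably redundant.
# "receiving" must exist; "example_signal" is a placeholder sensor name.
create_export_csv(df, export_dir="receiving", geo_res="state",
                  sensor="example_signal")
```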
praise: nice test, thanks!
question: how big is the dataset we're comparing here? do you think it's representative and gets a lot of code coverage?
suggestion: this seems like another migration test we can remove once this PR is ready for merge.
suggestion: if we want to be especially careful, we could run this same kind of test but compare staging and prod output CSVs.

A1: I need to double-check, but I believe I got an actual file from a one-off run, and it should be about a gig. Do you think I should add another file that's more recent?

Response to S1: that's the idea.

Response to S2: that seems like a good idea; I would need to poke around staging and see what happens.

I'm not familiar with the source files for hospital admission, but the answer here really depends on whether the source file is one of many signals, one of many geos, etc. If this single drop contains every signal as a column and it's the source geo that we aggregate up, then that's good coverage. But if not, then doing a prod/staging comparison will get that coverage instead.

side-note: it's very important that we squash-merge this PR, so the gig-sized file doesn't make it into the commit history.

I think it would be worthwhile, so let's do that at some point. I also think your experience with prod/staging comparisons will help us streamline this process in the future and build something that does branch comparisons at the press of a button.

In staging: I ran the older version and saved the output in `/common/text_hosptial_admission_test_export_20240903`. After scp-ing that to local and comparing it against the new version's output (sample script below), the comparison passed.
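
A minimal sketch of that kind of old-vs-new CSV comparison (paths are placeholders; the actual script may differ):

```python
from pathlib import Path
import pandas as pd

# Placeholder paths; the actual script may differ.
old_dir = Path("old_export")  # scp'd staging output from the older version
new_dir = Path("receiving")   # output from the refactored version

old_files = sorted(p.name for p in old_dir.glob("*.csv"))
new_files = sorted(p.name for p in new_dir.glob("*.csv"))
assert old_files == new_files, "export file sets differ"

for name in old_files:
    old_df = pd.read_csv(old_dir / name)
    new_df = pd.read_csv(new_dir / name)
    # check_like=True ignores row/column ordering differences
    pd.testing.assert_frame_equal(old_df, new_df, check_like=True)

print(f"compared {len(old_files)} files: all equal")
```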

How many export CSVs are produced by the staging run in `/common/text_hosptial_admission_test_export_20240903`?

20,076 or so. hospital-admission creates all geos starting from 2020-02-01 through 2024-08-31 (there's some lag).
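
That figure roughly checks out against one CSV per day per geo/signal pair (the 6-geo × 2-signal breakdown below is my assumption, not confirmed in the thread):

```python
from datetime import date

# Back-of-the-envelope check; 6 geos x 2 signals is an assumption.
n_days = (date(2024, 8, 31) - date(2020, 2, 1)).days + 1  # 1674
print(n_days * 6 * 2)  # 20088 -- within a day's worth of files of the
                       # reported ~20076, consistent with the lag mentioned
```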