Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor hospital admission to use delphi_utils create_export_csv #2032
refactor hospital admission to use delphi_utils create_export_csv #2032
Changes from 19 commits
003fa54
7a2c25a
1d9b165
c9b600d
ed9031a
ad3c14f
9e9e086
8e4b4f9
0aaad70
10f1812
8822240
5481f0c
ed4ff19
4954f9a
47d8c39
b78c63a
30f5a29
12b62a3
ec1a003
38893fd
29a5153
55958b0
95e6708
f3259de
6234e64
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rereading this bit, I'm thinking:
df_list
seems unnecessary. These are all just asserts that terminate the pipeline if any of the values don't pass validation, so we should just run the asserts, but don't rebuild the df. The only difference betweenfiltered_df
andoutput_df
are thegroup["se"] = np.NaN
andgroup["sample_size"] = np.NaN
transformations, but those are independent of group, so can be handled outside the for-loop. It might even make sense to handleand adding NAs at the end of
update_indicator
, call this functionvalidate_dataframe
, and don't have it return anything.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, I'd just try removing the df_list and output_df though and do something like this after the for loop with all the assert statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious (a) if that works, (b) how much that speeds things up. I'd guess that this for loop is really expensive because it runs over all counties (update_indicator does too, but at least we parallelize that). There may be a way to avoid the for loop using DataFrame methods, but we can get there later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the create_export_csv is what's slowing down the runs. the preprocessing takes about half the time as the previous write_to_csv. It's the delphi utils create_export_csv that's slower than the previous versions.
Still moved the nan columns outside of the loop though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I see what you mean. Really surprised create_export_csv is that much slower, that's a bit unfortunate.