Panel Data Vignette (Issue 99) #115

Merged 54 commits, Apr 27, 2024
Changes from 52 commits
Commits
027296e
wip panel data
mgyliu Aug 2, 2022
6673d6c
update epi_keys_mold and tests to handle additional keys
mgyliu Aug 3, 2022
a4b0dd7
put back files from rebase
mgyliu Aug 12, 2022
bd47fa8
fix data format
mgyliu Aug 12, 2022
0b797b0
wip vignette
mgyliu Aug 12, 2022
1683240
updates to vignette
mgyliu Aug 16, 2022
3864d7d
use a better dataset
mgyliu Aug 17, 2022
c781d33
use better data pt 2
mgyliu Aug 17, 2022
34ecc96
fix doc formatting
mgyliu Aug 17, 2022
2f1631d
wording changes
mgyliu Aug 17, 2022
325f8a2
wip panel data
mgyliu Aug 2, 2022
29d567c
update epi_keys_mold and tests to handle additional keys
mgyliu Aug 3, 2022
b30b838
put back files from rebase
mgyliu Aug 12, 2022
340fe12
fix data format
mgyliu Aug 12, 2022
25674a5
wip vignette
mgyliu Aug 12, 2022
27f1387
updates to vignette
mgyliu Aug 16, 2022
2a2c903
use a better dataset
mgyliu Aug 17, 2022
408e7c6
use better data pt 2
mgyliu Aug 17, 2022
0098adc
fix doc formatting
mgyliu Aug 17, 2022
e498edb
wording changes
mgyliu Aug 17, 2022
44afa40
add some math
mgyliu Aug 23, 2022
75a846e
change notation
mgyliu Aug 23, 2022
0747d14
add some todos
mgyliu Aug 23, 2022
acd5a5a
WIP
mgyliu Sep 14, 2022
bbc44d3
make the vignette render finally
mgyliu Sep 14, 2022
c48548c
edits, plots
mgyliu Sep 14, 2022
82b01cc
title change
mgyliu Sep 14, 2022
bf2d171
fix linting, add model, various edits
mgyliu Sep 15, 2022
9822f8d
fix data.R
mgyliu Sep 15, 2022
8d8d528
remove raw data generation eda lines
mgyliu Sep 15, 2022
e9f8751
use extracted fit in plot
mgyliu Sep 15, 2022
039abf5
add a hyperlink
mgyliu Sep 15, 2022
d04e2b4
fix the build
mgyliu Sep 18, 2022
2037716
fix conflict
dajmcdon Dec 27, 2022
d24e931
Merge branch 'main' into ml-99-panel-data-vignette
dajmcdon Dec 27, 2022
738bb68
rebuild documentation
dajmcdon Dec 27, 2022
fd4798d
merge main, fix conflicts
dajmcdon Dec 27, 2022
ce40c50
Merge branch 'dev' into ml-99-panel-data-vignette
dajmcdon Feb 3, 2024
ebb60ad
ignore vignette caches
dajmcdon Feb 3, 2024
009abf5
merge dev and minor revisions
dajmcdon Feb 3, 2024
4871589
bug: blocked by #291
dajmcdon Feb 3, 2024
8552e1a
done. forecast_date/target_date processing is bolixed by #291
dajmcdon Feb 6, 2024
a010db0
Merge branch 'dev' into ml-99-panel-data-vignette
dajmcdon Mar 8, 2024
44a145f
some simplifications
dajmcdon Mar 9, 2024
943e180
Merge branch 'dev' into ml-99-panel-data-vignette
dajmcdon Apr 9, 2024
4140ee1
move all vignette data to here vignettes/
dajmcdon Apr 9, 2024
12745fb
export grad_employ_subset, redocument
dajmcdon Apr 9, 2024
4296c16
fix vignette to match
dajmcdon Apr 9, 2024
4dd2b1c
checks pass
dajmcdon Apr 9, 2024
ec775a0
style and fix pkgdown
dajmcdon Apr 9, 2024
e3dc215
Minor fixes
rachlobay Apr 24, 2024
1ee300b
styler
rachlobay Apr 24, 2024
cc988cb
address @rachellobay review, adjust canned printing to handle `other_…
dajmcdon Apr 26, 2024
55e8166
add a conclusion
dajmcdon Apr 26, 2024
29 changes: 29 additions & 0 deletions R/data.R
@@ -56,3 +56,32 @@
#' \url{https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-puerto-rico-municipios.html},
#' and \url{https://www.census.gov/data/tables/2010/dec/2010-island-areas.html}
"state_census"

#' Subset of Statistics Canada median employment income for postsecondary graduates
#'
#' @format An [epiprocess::epi_df] with 10193 rows and 8 variables:
#' \describe{
#' \item{geo_value}{The province in Canada associated with each
#' row of measurements.}
#' \item{time_value}{The time value, a year integer in YYYY format}
#' \item{edu_qual}{The education qualification}
#' \item{fos}{The field of study}
#' \item{age_group}{The age group; either 15 to 34 or 35 to 64}
#' \item{num_graduates}{The number of graduates for the given row of characteristics}
#' \item{med_income_2y}{The median employment income two years after graduation}
#' \item{med_income_5y}{The median employment income five years after graduation}
#' }
#' @source This object contains modified data from the following Statistics Canada
#' data table: \href{https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3710011501}{
#' Characteristics and median employment income of longitudinal cohorts of postsecondary
#' graduates two and five years after graduation, by educational qualification and
#' field of study (primary groupings)
#' }
#'
#' Modifications:
#' * Only provincial-level geo_values are kept
#' * Only age group, field of study, and educational qualification are kept as
#' covariates. For the remaining covariates, we keep aggregated values and
#' drop the level-specific rows.
#' * No modifications were made to the time range of the data
"grad_employ_subset"
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -97,6 +97,7 @@ reference:
contents:
- case_death_rate_subset
- state_census
- grad_employ_subset



106 changes: 106 additions & 0 deletions data-raw/grad_employ_subset.R
@@ -0,0 +1,106 @@
library(epipredict)
library(epiprocess)
library(cansim)
library(dplyr)
library(stringr)
library(tidyr)

# https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3710011501
statcan_grad_employ <- get_cansim("37-10-0115-01")

gemploy <- statcan_grad_employ %>%
  select(c(
    "REF_DATE",
    "GEO",
    # "DGUID",
    # "UOM",
    # "UOM_ID",
    # "SCALAR_FACTOR",
    # "SCALAR_ID",
    # "VECTOR",
    # "COORDINATE",
    "VALUE",
    "STATUS",
    # "SYMBOL",
    # "TERMINATED",
    # "DECIMALS",
    # "GeoUID",
    # "Hierarchy for GEO",
    # "Classification Code for Educational qualification",
    # "Hierarchy for Educational qualification",
    # "Classification Code for Field of study",
    # "Hierarchy for Field of study",
    # "Classification Code for Gender",
    # "Hierarchy for Gender",
    # "Classification Code for Age group",
    # "Hierarchy for Age group",
    # "Classification Code for Status of student in Canada",
    # "Hierarchy for Status of student in Canada",
    # "Classification Code for Characteristics after graduation",
    # "Hierarchy for Characteristics after graduation",
    # "Classification Code for Graduate statistics",
    # "Hierarchy for Graduate statistics",
    # "val_norm",
    # "Date",
    "Educational qualification",
    "Field of study",
    "Gender",
    "Age group",
    "Status of student in Canada",
    "Characteristics after graduation",
    "Graduate statistics"
  )) %>%
  rename(
    "geo_value" = "GEO",
    "time_value" = "REF_DATE",
    "value" = "VALUE",
    "status" = "STATUS",
    "edu_qual" = "Educational qualification",
    "fos" = "Field of study",
    "gender" = "Gender",
    "age_group" = "Age group",
    "student_status" = "Status of student in Canada",
    "grad_charac" = "Characteristics after graduation",
    "grad_stat" = "Graduate statistics"
  ) %>%
  mutate(
    grad_stat = recode_factor(
      grad_stat,
      `Number of graduates` = "num_graduates",
      `Median employment income two years after graduation` = "med_income_2y",
      `Median employment income five years after graduation` = "med_income_5y"
    ),
    time_value = as.integer(time_value)
  ) %>%
  pivot_wider(names_from = grad_stat, values_from = value) %>%
  filter(
    # Drop the aggregate level for columns we keep as keys
    geo_value != "Canada" &
      age_group != "15 to 64 years" &
      edu_qual != "Total, educational qualification" &
      # Keep only the aggregate level for columns we drop below
      fos == "Total, field of study" &
      gender == "Total, gender" &
      student_status == "Canadian and international students" &
      # Since we're looking at 2y and 5y employment income, the only
      # characteristics remaining are:
      # - Graduates reporting employment income
      # - Graduates reporting wages, salaries, and commissions only
      # For simplicity, keep the first one only
      grad_charac == "Graduates reporting employment income" &
      # Only keep "good" data (rows with no status flag)
      is.na(status) &
      # Drop rows where any of the three statistics is NA
      !is.na(num_graduates) & !is.na(med_income_2y) & !is.na(med_income_5y)
  ) %>%
  select(-c(status, gender, student_status, grad_charac, fos))

nrow(gemploy)
ncol(gemploy)
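
The reshaping step in the pipeline above (recode the statistic labels, pivot them into columns, then drop aggregate and incomplete rows) can be sketched on a few rows. This is only an illustration: the rows and values below are made up, only two of the three statistics are included, and nothing here is part of the PR itself.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format rows mimicking the StatCan layout (values are made up).
long <- tibble::tribble(
  ~geo_value, ~time_value, ~age_group,       ~grad_stat,                                            ~value,
  "Ontario",  "2010",      "15 to 34 years", "Number of graduates",                                 100,
  "Ontario",  "2010",      "15 to 34 years", "Median employment income two years after graduation", 40000,
  "Ontario",  "2010",      "15 to 64 years", "Number of graduates",                                 150,
  "Ontario",  "2010",      "15 to 64 years", "Median employment income two years after graduation", 45000,
  "Quebec",   "2010",      "15 to 34 years", "Number of graduates",                                 80
)

wide <- long %>%
  mutate(
    # Shorten the statistic labels so they make usable column names
    grad_stat = recode_factor(
      grad_stat,
      `Number of graduates` = "num_graduates",
      `Median employment income two years after graduation` = "med_income_2y"
    ),
    time_value = as.integer(time_value)
  ) %>%
  # One column per statistic; missing combinations become NA
  pivot_wider(names_from = grad_stat, values_from = value) %>%
  filter(
    # Drop the aggregate age group
    age_group != "15 to 64 years",
    # Drop rows with a missing statistic (the Quebec row has no median income)
    !is.na(num_graduates), !is.na(med_income_2y)
  )

print(wide)
```

Only the Ontario "15 to 34 years" row survives: the aggregate age group is filtered out, and the Quebec row is dropped because pivoting leaves its `med_income_2y` as `NA`.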

grad_employ_subset <- gemploy %>%
  as_epi_df(
    as_of = "2022-07-19",
    additional_metadata = list(other_keys = c("age_group", "edu_qual"))
  )
usethis::use_data(grad_employ_subset, overwrite = TRUE)
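
The script registers `age_group` and `edu_qual` as `other_keys`, presumably because `geo_value` and `time_value` alone do not identify a row in this panel. A minimal sketch of that idea, using only dplyr on made-up rows (the `panel` tibble, its values, and the variable names below are hypothetical and not taken from the PR):

```r
library(dplyr)

# Hypothetical toy panel; values are made up for illustration.
panel <- tibble::tribble(
  ~geo_value, ~time_value, ~age_group,       ~edu_qual,           ~med_income_2y,
  "Ontario",  2010L,       "15 to 34 years", "Bachelor's degree", 40000,
  "Ontario",  2011L,       "15 to 34 years", "Bachelor's degree", 41000,
  "Ontario",  2010L,       "35 to 64 years", "Bachelor's degree", 52000,
  "Quebec",   2010L,       "15 to 34 years", "Bachelor's degree", 38000
)

# With geo_value and time_value only, Ontario/2010 appears twice.
dup_without_other_keys <- panel %>%
  count(geo_value, time_value, name = "n_rows") %>%
  filter(n_rows > 1)

# Adding the extra keys makes every row unique.
key_cols <- c("geo_value", "time_value", "age_group", "edu_qual")
dup_with_other_keys <- panel %>%
  count(across(all_of(key_cols)), name = "n_rows") %>%
  filter(n_rows > 1)

# Each (geo_value, age_group, edu_qual) combination is its own time series.
n_series <- panel %>%
  distinct(geo_value, age_group, edu_qual) %>%
  nrow()

print(dup_without_other_keys)
print(dup_with_other_keys)
print(n_series)
```

A check like this is a cheap sanity test to run on `gemploy` before converting it, since duplicated keys would mean two observations claim the same series and time point.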
Binary file added data/grad_employ_subset.rda
Binary file not shown.
44 changes: 44 additions & 0 deletions man/grad_employ_subset.Rd

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion vignettes/.gitignore
@@ -1,3 +1,3 @@
*.html
*.R
*_cache/
*.R
5 changes: 1 addition & 4 deletions vignettes/articles/sliding.Rmd
@@ -61,10 +61,7 @@ versions for the less up-to-date input archive.
```{r grab-epi-data}
theme_set(theme_bw())

-y <- readRDS(system.file(
-  "extdata", "all_states_covidcast_signals.rds",
-  package = "epipredict", mustWork = TRUE
-))
+y <- readRDS("all_states_covidcast_signals.rds")

y <- purrr::map(y, ~ select(.x, geo_value, time_value, version = issue, value))
