V0.5.6 #200

Merged: 9 commits, Oct 5, 2023
6 changes: 2 additions & 4 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: cancensus
Type: Package
Title: Access, Retrieve, and Work with Canadian Census Data and Geography
-Version: 0.5.5
+Version: 0.5.6
Authors@R: c(
person("Jens", "von Bergmann", email = "[email protected]", role = c("aut"), comment = "API creator and maintainer"),
person("Dmitry", "Shkolnik", email = "[email protected]", role = c("aut", "cre"), comment = "Package maintainer, responsible for correspondence"),
@@ -24,15 +24,13 @@ Imports: digest (>= 0.1),
httr (>= 1.0.0),
jsonlite (>= 1.0),
rlang
-RoxygenNote: 7.2.1
+RoxygenNote: 7.2.3
Suggests: knitr,
ggplot2,
leaflet,
mapdeck,
rmarkdown,
readr,
-rgdal,
-rgeos,
scales,
sp,
sf,
7 changes: 7 additions & 0 deletions NEWS.md
@@ -1,3 +1,10 @@
+# cancensus 0.5.6
+
+- fix issue when using named vectors to query data for non-existent geographies, return NULL in this case instead of throwing error
+- fix problem with population centre geographic data download
+- support newly released Forward Sortation Area geography for statcan geography and WDS functionality
+- replace instances of new native R pipe |> with dplyr pipe %>% to preserve compatibility with older R versions
+
 # cancensus 0.5.5
 
 - add functionality for direct access to StatCan census WDS for 2021
5 changes: 4 additions & 1 deletion R/cancensus.R
@@ -268,7 +268,10 @@ get_census <- function (dataset, regions, level=NA, vectors=c(), geo_format = NA
to_rename <- setNames(names(result),gsub(":.*","",names(result)))
to_rename <- to_rename[names(to_rename)!=as.character(to_rename)]
if (length(to_rename)>0) result <- result %>% dplyr::rename(!!!to_rename)
-if (!is.null(names(vectors))) result <- result %>% dplyr::rename(!!! vectors)
+if (!is.null(names(vectors))) {
+to_rename <- vectors[as.character(vectors) %in% names(result)]
+if (length(to_rename)>0) result <- result %>% dplyr::rename(!!! to_rename)
+}
}
}
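
The guard introduced in this hunk can be tried outside the package. What follows is a minimal base R sketch, not the package code: the data frame and vector codes are made-up stand-ins, and a plain `names<-` assignment takes the place of `dplyr::rename(!!! to_rename)` so the example runs without dplyr.

```r
# Hypothetical stand-in for a result that lacks one of the requested vectors
result <- data.frame(GeoUID = "59933", v_CA16_408 = 100)
vectors <- c(dwellings = "v_CA16_408", households = "v_CA16_409")

# Keep only the named vectors whose codes actually appear as columns
to_rename <- vectors[as.character(vectors) %in% names(result)]

# Base R counterpart of rename(new_name = old_name) for the surviving entries
if (length(to_rename) > 0) {
  idx <- match(as.character(to_rename), names(result))
  names(result)[idx] <- names(to_rename)
}

names(result)
```

Without the filtering step, the rename would be asked to rename a column (`v_CA16_409` here) that does not exist, which is the error the NEWS entry describes.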

10 changes: 6 additions & 4 deletions R/census_regions.R
@@ -34,7 +34,9 @@
#' @export
#'
#' @examples
+#' \dontrun{
#' list_census_regions('CA16')
+#' }
list_census_regions <- function(dataset, use_cache = TRUE, quiet = FALSE) {
dataset <- translate_dataset(dataset)
cache_file <- file.path(tempdir(),paste0(dataset, "_regions.rda"))
@@ -200,16 +202,16 @@ add_unique_names_to_region_list <- function(region_list) {
dplyr::group_by(.data$name) %>%
dplyr::mutate(count=dplyr::n()) %>%
dplyr::mutate(Name=dplyr::case_when(.data$count==1 ~ name,
-TRUE ~ paste0(.data$name," (",.data$municipal_status,")"))) |>
+TRUE ~ paste0(.data$name," (",.data$municipal_status,")"))) %>%
dplyr::group_by(.data$Name) %>%
dplyr::mutate(count=dplyr::n()) %>%
dplyr::mutate(Name=dplyr::case_when(.data$count==1 ~ Name,
-TRUE ~ paste0(.data$Name," (",.data$region,")"))) |>
-dplyr::select(-.data$count) |>
+TRUE ~ paste0(.data$Name," (",.data$region,")"))) %>%
+dplyr::select(-.data$count) %>%
dplyr::ungroup()

if (length(gs)>1) {
-r <- r |>
+r <- r %>%
dplyr::group_by(dplyr::across(dplyr::all_of(gs)))
}
r
14 changes: 11 additions & 3 deletions R/geographies.R
@@ -5,7 +5,7 @@
#'
#' @param census_year census year to get the data for, right now only 2021 is supported
#' @param level geographic level to return the data for, valid choices are
-#' "PR","CD","CMACA","CSD","CT","ADA","DA","ER","FED","DPL","POPCNTR"
+#' "PR","CD","CMACA","CSD","CT","ADA","DA","ER","FED","DPL","POPCNTR", "FSA"
#' @param type type of geographic data, valid choices are "cartographic" or "digital"
#' @param cache_path optional path to cache the data. If the cancensus cache path is set the geographic data gets
#' cached in the "geographies" subdirectory of the cancensus cache path.
@@ -24,7 +24,7 @@ get_statcan_geographies <- function(census_year,level,type="cartographic",
cache_path = NULL,timeout=1000,
refresh=FALSE,quiet=FALSE) {
valid_census_years <- c("2021")
-valid_levels <- c("PR","CD","CMACA","CMA","CA","CSD","CT","ADA","DA","ER","FED","DPL","POPCNTR")
+valid_levels <- c("PR","CD","CMACA","CMA","CA","CSD","CT","ADA","DA","ER","FED","DPL","POPCNTR","POPCTR","FSA")
valid_types <- c("cartographic","digital")
if (!(census_year %in% valid_census_years)) {
stop(paste0("Census year must be one of ",paste0(valid_census_years,collapse = ", "),"."))
@@ -35,7 +35,7 @@ get_statcan_geographies <- function(census_year,level,type="cartographic",
if (!(level %in% valid_levels)) {
stop(paste0("Level must be one of ",paste0(valid_levels,collapse = ", "),"."))
}
-level_map <- c("CMACA"="CMA","CA"="CMA","POPCNTR","PC")
+level_map <- c("CMACA"="CMA","CA"="CMA","POPCNTR"="PC","POPCTR"="PC")
if (level %in% names(level_map)) level <-level_map[[level]]
geo_base_path <- cache_path("geographies")
if (!dir.exists(geo_base_path)) dir.create(geo_base_path)
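
The `level_map` change in this hunk is easy to misread, so here is a small base R sketch of why the old vector never translated `POPCNTR`. The helper name `resolve_level` is invented for illustration; only the two `level_map` definitions and the lookup expression are taken from the diff.

```r
# Old mapping: "POPCNTR" and "PC" are two unnamed values, not a name = value pair
old_map <- c("CMACA" = "CMA", "CA" = "CMA", "POPCNTR", "PC")
# New mapping: both spellings of the population centre level map to "PC"
new_map <- c("CMACA" = "CMA", "CA" = "CMA", "POPCNTR" = "PC", "POPCTR" = "PC")

# Illustrative helper wrapping the lookup used in get_statcan_geographies
resolve_level <- function(level, level_map) {
  if (level %in% names(level_map)) level <- level_map[[level]]
  level
}

names(old_map)                     # the last two names are empty strings
resolve_level("POPCNTR", old_map)  # unchanged: "POPCNTR"
resolve_level("POPCNTR", new_map)  # translated: "PC"
```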
@@ -56,6 +56,14 @@ get_statcan_geographies <- function(census_year,level,type="cartographic",
utils::download.file(url,tmp,mode="wb",quiet=quiet)
options(timeout = old_timeout)
utils::unzip(tmp,exdir = exdir)
+fs <- dir(exdir,full.names = TRUE)
+if (length(fs)==1 && dir.exists(fs)) {
+tmp_dir <- file.path(geo_base_path,"XXXX")
+file.rename(exdir,tmp_dir)
+fs <- dir(tmp_dir,full.names = TRUE)
+file.rename(fs,exdir)
+unlink(tmp_dir)
+}
} else {
if (!quiet) message("Reading geographic data from local cache.")
}
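
The added block handles archives that unzip into a single nested folder by promoting that folder to the expected location. Below is a self-contained sketch of the same idea on a throwaway directory under `tempdir()`; all paths and file names are invented for the demonstration. It differs from the diff in one respect, which is an assumption about intent rather than the PR's behaviour: it passes `recursive = TRUE` to `unlink()`, because `unlink()` does not delete directories otherwise.

```r
# Build a throwaway layout resembling an archive that unzipped into one nested folder
base <- file.path(tempdir(), "geo_flatten_demo")
unlink(base, recursive = TRUE)
exdir <- file.path(base, "extracted")
dir.create(file.path(exdir, "nested"), recursive = TRUE)
writeLines("placeholder", file.path(exdir, "nested", "shape.shp"))

fs <- dir(exdir, full.names = TRUE)
if (length(fs) == 1 && dir.exists(fs)) {
  tmp_dir <- file.path(base, "XXXX")
  file.rename(exdir, tmp_dir)            # move the wrapper directory out of the way
  fs <- dir(tmp_dir, full.names = TRUE)  # the single nested directory
  file.rename(fs, exdir)                 # promote it to the expected location
  unlink(tmp_dir, recursive = TRUE)      # recursive = TRUE is required to remove a directory
}

dir(exdir)
```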
5 changes: 4 additions & 1 deletion R/helpers.R
@@ -22,7 +22,10 @@ cache_path <- function(...) {
if (nchar(cache_dir)==0) {
if (!is.null(getOption("cancensus.cache_path"))) {
cache_dir <- getOption("cancensus.cache_path")
-} else cache_dir <- tempdir()
+} else {
+cache_dir <- tempdir()
+message(cm_no_cache_path_message)
+}
}
if (!is.character(cache_dir)) {
stop("Corrupt 'CM_CACHE_PATH' environment variable or 'cancensus.cache_path' option. Must be a path.",
17 changes: 9 additions & 8 deletions R/user_settings.R
@@ -43,7 +43,8 @@ set_cancensus_api_key <- function(key, overwrite = FALSE, install = FALSE){
keyconcat <- paste0("CM_API_KEY='", key, "'")
# Append API key to .Renviron file
write(keyconcat, renv, sep = "\n", append = TRUE)
-message('Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CM_API_KEY"). \nTo use now, restart R or run `readRenviron("~/.Renviron")`')
+Sys.setenv(CM_API_KEY = key)
+message('Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CM_API_KEY").')
} else {
message("API key set for duration of session. To install your API key for use across sessions, run this function with `install = TRUE`.")
Sys.setenv(CM_API_KEY = key)
@@ -65,7 +66,7 @@ set_cancensus_api_key <- function(key, overwrite = FALSE, install = FALSE){
#'\dontrun{
#' set_cancensus_cache_path("~/cancensus_cache")
#'
-#' # This will set the cache path permanently until ovewritten again
+#' # This will set the cache path permanently until overwritten again
#' set_cancensus_cache_path("~/cancensus_cache", install = TRUE)
#' }
set_cancensus_cache_path <- function(cache_path, overwrite = FALSE, install = FALSE){
@@ -93,7 +94,8 @@ set_cancensus_cache_path <- function(cache_path, overwrite = FALSE, install = FA
keyconcat <- paste0("CM_CACHE_PATH='", cache_path, "'")
# Append cache path .Renviron file
write(keyconcat, renv, sep = "\n", append = TRUE)
-message('Your cache path has been stored in your .Renviron and can be accessed by Sys.getenv("CM_CACHE_PATH"). \nTo use now, restart R or run readRenviron("~/.Renviron").')
+message('Your cache path has been stored in your .Renviron and can be accessed by Sys.getenv("CM_CACHE_PATH").')
+Sys.setenv('CM_CACHE_PATH' = cache_path)
} else {
message("Cache set for duration of session. To permanently add your cache path for use across sessions, run this function with install = TRUE.")
Sys.setenv('CM_CACHE_PATH' = cache_path)
@@ -138,9 +140,8 @@ show_cancensus_cache_path <- function() {
cm_no_cache_path_message <- paste(
"Census data is currently stored temporarily.\n\n",
"In order to speed up performance, reduce API quota usage, and reduce",
-"unnecessary network calls, please set up a persistent cache directory by",
-"setting the environment variable CM_CACHE_PATH= '<path to cancensus cache directory>' or ",
-"setting options(cancensus.cache_path = '<path to cancensus cache directory>')\n\n",
-"You may add this environment varianble to your .Renviron",
-"or add this option, together with your API key, to your .Rprofile.\n\n"
+"unnecessary network calls, please set up a persistent cache directory via",
+"`set_cancensus_cache_path('<local cache path>', install = TRUE)`.\n",
+"This will add your cache directory as environment varianble to your .Renviron to be",
+"used across sessions and projects.\n\n"
)
46 changes: 23 additions & 23 deletions R/wds.R
@@ -6,7 +6,7 @@
#'
#' @param census_year census year to get the data for, right now only 2021 is supported
#' @param level geographic level to return the data for, valid choices are
-#' "PR","CD","CMACA","CSD","CT","ADA","DA","ER","FED","DPL","POPCNTR"
+#' "PR","CD","CMACA","CSD","CT","ADA","DA","ER","FED","DPL","POPCNTR", "FSA"
#' @param refresh default is `FALSE` will refresh the temporary cache if `TRUE`
#' @return tibble with the metadata
#'
@@ -18,7 +18,7 @@
#' @export
get_statcan_wds_metadata <- function(census_year,level,refresh=FALSE){
valid_census_years <- c("2021")
-valid_levels <- c("PR","CD","CMACA","CSD","CT","ADA","DA","ER","FED","DPL","POPCNTR")
+valid_levels <- c("PR","CD","CMACA","CSD","CT","ADA","DA","ER","FED","DPL","POPCNTR","FSA")
if (!(census_year %in% valid_census_years)) {
stop(paste0("Census year must be one of ",paste0(valid_census_years,collapse = ", "),"."))
}
@@ -34,24 +34,24 @@ get_statcan_wds_metadata <- function(census_year,level,refresh=FALSE){
code_lists <- xml2::xml_find_all(d,"//structure:Codelist")

meta_data <- lapply(code_lists, \(cl){
-codelist_id <- cl |> xml2::xml_attr("id")
-agencyID <- cl |> xml2::xml_attr("agencyID")
-codelist_en <- cl |> xml2::xml_find_all("common:Name[@xml:lang='en']") |> xml2::xml_text()
-codelist_fr <- cl |> xml2::xml_find_all("common:Name[@xml:lang='fr']") |> xml2::xml_text()
-description_en <- cl |> xml2::xml_find_all("common:Name[@xml:lang='en']") |> xml2::xml_text()
-description_fr <- cl |> xml2::xml_find_all("common:Name[@xml:lang='fr']") |> xml2::xml_text()
-codes <- cl |> xml2::xml_find_all("structure:Code")
+codelist_id <- cl %>% xml2::xml_attr("id")
+agencyID <- cl %>% xml2::xml_attr("agencyID")
+codelist_en <- cl %>% xml2::xml_find_all("common:Name[@xml:lang='en']") %>% xml2::xml_text()
+codelist_fr <- cl %>% xml2::xml_find_all("common:Name[@xml:lang='fr']") %>% xml2::xml_text()
+description_en <- cl %>% xml2::xml_find_all("common:Name[@xml:lang='en']") %>% xml2::xml_text()
+description_fr <- cl %>% xml2::xml_find_all("common:Name[@xml:lang='fr']") %>% xml2::xml_text()
+codes <- cl %>% xml2::xml_find_all("structure:Code")
dplyr::tibble(`Agency ID`=agencyID,
`Codelist ID`=codelist_id,
`Codelist en`=codelist_en,
`Codelist fr`=codelist_fr,
-ID=codes |> xml2::xml_attr("id"),
-en=codes |> xml2::xml_find_all("common:Name[@xml:lang='en']") |> xml2::xml_text(),
-fr=codes |> xml2::xml_find_all("common:Name[@xml:lang='fr']") |> xml2::xml_text(),
-`Parent ID`=codes |> xml2::xml_find_all("structure:Parent/Ref",flatten=FALSE) |>
-lapply(\(d)ifelse(is.null(d),NA,xml2::xml_attr(d,"id"))) |> unlist()
+ID=codes %>% xml2::xml_attr("id"),
+en=codes %>% xml2::xml_find_all("common:Name[@xml:lang='en']") %>% xml2::xml_text(),
+fr=codes %>% xml2::xml_find_all("common:Name[@xml:lang='fr']") %>% xml2::xml_text(),
+`Parent ID`=codes %>% xml2::xml_find_all("structure:Parent/Ref",flatten=FALSE) %>%
+lapply(\(d)ifelse(is.null(d),NA,xml2::xml_attr(d,"id"))) %>% unlist()
)
-}) |>
+}) %>%
dplyr::bind_rows()
meta_data
}
@@ -116,22 +116,22 @@ get_statcan_wds_data <- function(DGUIDs,
census_year <- "2021"
meta_data <- get_statcan_wds_metadata(census_year,level,refresh = refresh)

-levels <- meta_data |>
+levels <- meta_data %>%
dplyr::filter(.data$`Codelist ID`=="CL_GEO_LEVEL")

-meta_geos <- meta_data |>
+meta_geos <- meta_data %>%
dplyr::filter(.data$`Codelist ID`==paste0("CL_GEO_",level))
-meta_characteristics <- meta_data |>
+meta_characteristics <- meta_data %>%
dplyr::filter(.data$`Codelist ID`=="CL_CHARACTERISTIC")

name_field <- language #paste0(language,"_description")

-data <- readr::read_csv(wds_data_tempfile,col_types = readr::cols(.default="c")) |>
-dplyr::mutate(dplyr::across(dplyr::matches("OBS_VALUE|TNR_CI_"),as.numeric)) |>
-dplyr::left_join(meta_geos |>
+data <- readr::read_csv(wds_data_tempfile,col_types = readr::cols(.default="c")) %>%
+dplyr::mutate(dplyr::across(dplyr::matches("OBS_VALUE|TNR_CI_"),as.numeric)) %>%
+dplyr::left_join(meta_geos %>%
dplyr::select(GEO_DESC=.data$ID,GEO_NAME=!!as.name(name_field)),
-by="GEO_DESC") |>
-dplyr::left_join(meta_characteristics |>
+by="GEO_DESC") %>%
+dplyr::left_join(meta_characteristics %>%
dplyr::select(CHARACTERISTIC=.data$ID,CHARACTERISTIC_NAME=!!as.name(name_field)),
by="CHARACTERISTIC")

12 changes: 6 additions & 6 deletions README.md
@@ -29,13 +29,13 @@ library(cancensus)

Alternatively, the latest development version can be installed from Github.
```
-devtools::install_github("mountainmath/cancensus")
+remotes::install_github("mountainmath/cancensus")
library(cancensus)
```

### API key

-**cancensus** requires a valid CensusMapper API key to use. You can obtain a free API key by [signing up](https://censusmapper.ca/users/sign_up) for a CensusMapper account. To check your API key, just go to "Edit Profile" (in the top-right of the CensusMapper menu bar). Once you have your key, you can store it in your system environment so it is automatically used in API calls. To do so just enter `set_cancensus_api_key(<your_api_key>', install = TRUE)`.
+**cancensus** requires a valid CensusMapper API key to use. You can obtain a free API key by [signing up](https://censusmapper.ca/users/sign_up) for a CensusMapper account. To check your API key, just go to "Edit Profile" (in the top-right of the CensusMapper menu bar). Once you have your key, you can store it in your system environment so it is automatically used in API calls. To do so just enter `set_cancensus_api_key('<your_api_key>', install = TRUE)`.
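
As a rough illustration of what storing a key "in your system environment" involves, the sketch below appends a `NAME='value'` line to a file and also sets the variable for the running session, which is the behaviour this PR adds to `set_cancensus_api_key()` when `install = TRUE`. Everything here is a stand-in: `store_key` is a hypothetical helper, a temp file replaces `~/.Renviron`, and the variable name `CM_API_KEY_DEMO` is used so a real `CM_API_KEY` is not touched.

```r
# Hypothetical helper; a temp file stands in for ~/.Renviron
store_key <- function(key, var = "CM_API_KEY_DEMO", renviron = tempfile("Renviron_")) {
  # Persist for future sessions by appending VAR='value' to the file
  write(paste0(var, "='", key, "'"), renviron, sep = "\n", append = TRUE)
  # Make the value available in the current session as well
  args <- list(key)
  names(args) <- var
  do.call(Sys.setenv, args)
  invisible(renviron)
}

path <- store_key("not-a-real-key")
Sys.getenv("CM_API_KEY_DEMO")
readLines(path)
```

Setting the variable in the session at the same time is what removes the need to restart R or call `readRenviron()` after installing the key.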

CensusMapper API keys are free and public API quotas are generous; however, due to incremental costs of serving large quantities of data, there are some limits to API usage in place. For most use cases, these API limits should not be an issue. Production uses with large extracts of detailed geographies may run into API quota limits.

@@ -45,13 +45,13 @@ For larger quotas, please get in touch with Jens [directly](mailto:jens@censusma

### Local Cache

-For performance reasons, and to avoid unnecessarily drawing down API quotas, **cancensus** caches data queries under the hood. By default, **cancensus** caches in R's temporary directory, but this cache is not persistent across sessions. In order to speed up performance, reduce quota usage, and reduce the need for unnecessary network calls, we recommend assigning a persistent local cache using `set_cancensus_cache_path(<local cache path>, install = TRUE)`, this enables more efficient loading and reuse of downloaded data. Users will be prompted with a suggestion to change their default cache location when making API calls if one has not been set yet.
+For performance reasons, and to avoid unnecessarily drawing down API quotas, **cancensus** caches data queries under the hood. By default, **cancensus** caches in R's temporary directory, but this cache is not persistent across sessions. In order to speed up performance, reduce quota usage, and reduce the need for unnecessary network calls, we recommend assigning a persistent local cache using `set_cancensus_cache_path('<local cache path>', install = TRUE)`, this enables more efficient loading and reuse of downloaded data. Users will be prompted with a suggestion to change their default cache location when making API calls if one has not been set yet.
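
The order in which a cache location is chosen follows from the `R/helpers.R` hunk in this PR: the `CM_CACHE_PATH` environment variable first, then the `cancensus.cache_path` option, then R's temporary directory with a message. A minimal sketch of that lookup is shown here; `resolve_cache_dir` and its arguments are hypothetical and exist only so the three cases can be tried without changing real settings.

```r
# Illustrative helper (not part of cancensus) mirroring the lookup order
resolve_cache_dir <- function(env = Sys.getenv("CM_CACHE_PATH"),
                              opt = getOption("cancensus.cache_path")) {
  cache_dir <- env
  if (nchar(cache_dir) == 0) {
    if (!is.null(opt)) {
      cache_dir <- opt
    } else {
      # Nothing configured: fall back to the session-only temporary directory
      cache_dir <- tempdir()
      message("No persistent cache path set; using tempdir() for this session.")
    }
  }
  cache_dir
}

resolve_cache_dir(env = "/data/cache", opt = "~/cancensus_cache")  # environment variable wins
resolve_cache_dir(env = "", opt = "~/cancensus_cache")             # option is used next
resolve_cache_dir(env = "", opt = NULL)                            # falls back to tempdir()
```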

Starting with version 0.5.2 **cancensus** will automatically check for data that has been recalled by Statistics Canada and is stored in the local cache via the new data recall API implemented in [CensusMapper](https://censusmapper.ca). Statistics Canada occasionally detects and corrects errors in their census data releases, and **cancensus** will download a list of recalled data at the first invocation of `get_census()` in each session and emit a warning if it detects locally cached data that has been recalled. Removal of the cached recalled data has to be done explicitly by the user via the `remove_recalled_cached_data()` function. If data was cached with **cancensus** versions prior to version 0.5.0, there is insufficient metadata to determine all instances of recalled cached data, but the package checks every time cached data is loaded, so it can identify recalled data at that point at the latest and issue a warning.

### Currently available datasets

-**cancensus** can access Statistics Canada Census data for Census years 1996, 2001, 2006, 2011, 2016, and 2021. You can run `list_census_datasets` to check what datasets are currently available for access through the CensusMapper API. Additional data for the 2021 Census will be included in Censusmapper within a day or two after public release by Statistics Canada. Statistics Canada maintains a release schedule for the Census 2021 Program which can be viewed on their [website](https://www12.statcan.gc.ca/census-recensement/2021/ref/prodserv/release-diffusion-eng.cfm).
+**cancensus** can access Statistics Canada Census data for Census years 1996, 2001, 2006, 2011, 2016, and 2021. You can run `list_census_datasets` to check what datasets are currently available for access through the CensusMapper API. Additional data for the 2021 Census will be included in CensusMapper within a day or two after public release by Statistics Canada. Statistics Canada maintains a release schedule for the Census 2021 Program which can be viewed on their [website](https://www12.statcan.gc.ca/census-recensement/2021/ref/prodserv/release-diffusion-eng.cfm).

Thanks to contributions by the Canada Mortgage and Housing Corporation (CMHC), **cancensus** now includes additional Census-linked datasets as open-data releases. These include annual taxfiler data at the census tract level for tax years 2000 through 2018, which includes data on incomes and demographics, as well as specialized crosstabs for Structural type of dwelling by Document type, which details occupancy status for residences. These crosstabs are available for the 2001, 2006, 2011, 2016, and 2021 Census years at all levels starting with census tract.

@@ -139,7 +139,7 @@ There are several other jurisdiction where census data is available via R packag
If you wish to cite cancensus:

von Bergmann, J., Aaron Jacobs, Dmitry Shkolnik (2022). cancensus: R package to
-access, retrieve, and work with Canadian Census data and geography. v0.5.5.
+access, retrieve, and work with Canadian Census data and geography. v0.5.6.


A BibTeX entry for LaTeX users is
@@ -148,7 +148,7 @@ A BibTeX entry for LaTeX users is
author = {Jens {von Bergmann} and Dmitry Shkolnik and Aaron Jacobs},
title = {cancensus: R package to access, retrieve, and work with Canadian Census data and geography},
year = {2022},
-note = {R package version 0.5.5},
+note = {R package version 0.5.6},
url = {https://mountainmath.github.io/cancensus/}
}
```