diff --git a/NEWS.md b/NEWS.md
index 67f491c..cc98bfc 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,9 @@
+# tidycensus 1.4
+
+* `get_decennial()` has been updated to accommodate the new Demographic and Housing Characteristics and Demographic Profile summary files. Use `sumfile = "dhc"` for the DHC file, and `sumfile = "dp"` for the DP file.
+* The default year in `get_decennial()` is now 2020. This may be a breaking change for some legacy code that omits the year, so be sure to update scripts to hard-code the year for years earlier than 2020.
+* `sumfile = "pl"` is the default for `get_decennial()` and will remain so to avoid existing code breakages. Please note that variable IDs are replicated across the PL and DHC files, but may represent different topics.
+
 # tidycensus 1.3
 
 * Given that the Census API allows for 500 queries per day without an API key, the API key requirement in the package has been removed to support reproducibility. Users without a key are now warned of potential performance limitations.
diff --git a/docs/404.html b/docs/404.html
index b9abb72..6f38b51 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -39,7 +39,7 @@
 tidycensus
- 1.3.3
+ 1.4
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index 5076964..feb0cb0 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -17,7 +17,7 @@
 tidycensus
- 1.3.3
+ 1.4
diff --git a/docs/articles/basic-usage.html b/docs/articles/basic-usage.html
index f6b6356..cacbc08 100644
--- a/docs/articles/basic-usage.html
+++ b/docs/articles/basic-usage.html
@@ -40,7 +40,7 @@
 tidycensus
- 1.3.3
+ 1.4
@@ -126,22 +126,25 @@

Basic usage of tidycensus

access to the 2000, 2010, and 2020 decennial US Census APIs, and get_acs(), which grants access to the 1-year and 5-year American Community Survey APIs.

-

In this basic example, let’s look at median age by state in 2010:

+

In this basic example, let’s look at median age by state in 2020, +with data drawn from the Demographic and Housing Characteristics summary +file:

-age10 <- get_decennial(geography = "state", 
-                       variables = "P013001", 
-                       year = 2010)
+age20 <- get_decennial(geography = "state", 
+                       variables = "P13_001N", 
+                       year = 2020,
+                       sumfile = "dhc")
 
-head(age10)
+head(age20)
## # A tibble: 6 × 4
-##   GEOID NAME       variable value
-##   <chr> <chr>      <chr>    <dbl>
-## 1 01    Alabama    P013001   37.9
-## 2 02    Alaska     P013001   33.8
-## 3 04    Arizona    P013001   35.9
-## 4 05    Arkansas   P013001   37.4
-## 5 06    California P013001   35.2
-## 6 22    Louisiana  P013001   35.8
+##   GEOID NAME                 variable value
+##   <chr> <chr>                <chr>    <dbl>
+## 1 09    Connecticut          P13_001N  41.1
+## 2 10    Delaware             P13_001N  41.1
+## 3 11    District of Columbia P13_001N  33.9
+## 4 12    Florida              P13_001N  43
+## 5 13    Georgia              P13_001N  37.5
+## 6 15    Hawaii               P13_001N  40.8

The function returns a tibble with four columns by default: GEOID, which is an identifier for the geographical unit associated with the row; NAME, which is a descriptive name @@ -155,7 +158,7 @@

Basic usage of tidycensus

As the function has returned a tidy object, we can visualize it quickly with ggplot2:

-age10 %>%
+age20 %>%
   ggplot(aes(x = value, y = reorder(NAME, value))) + 
   geom_point()
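A minimal sketch of what `reorder(NAME, value)` does in the plotting call above, using the six rows printed by `head(age20)`. The data frame `age_head` is typed in by hand for illustration and is not fetched from the Census API.

```r
# Rows copied by hand from the head(age20) output (illustrative stand-in)
age_head <- data.frame(
  NAME = c("Connecticut", "Delaware", "District of Columbia",
           "Florida", "Georgia", "Hawaii"),
  value = c(41.1, 41.1, 33.9, 43, 37.5, 40.8)
)

# reorder() returns a factor whose levels are sorted by `value`,
# so the geography with the lowest median age becomes the first level
name_by_age <- reorder(age_head$NAME, age_head$value)
levels(name_by_age)
```

ggplot2 draws factor levels in order along a discrete axis, which is why the points in the chart appear sorted by median age rather than alphabetically.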

@@ -431,18 +434,20 @@

Searching for variables"pl" for the -redistricting files (currently the only choice for 2020), -"sf1" or "sf2" (2000 and 2010) and -"sf3" or "sf4" (2000 only) for the various -summary files. Special island area summary files are available with -"as", "mp", "gu", or -"vi". For the ACS, use either "acs1" or -"acs5" for the ACS detailed tables, and append -/profile for the Data Profile and /subject for -the Subject Tables. To browse these variables, assign the result of this -function to a variable and use the View function in -RStudio. An optional argument cache = TRUE will cache the -dataset on your computer for future use.

+redistricting files; "dhc" for the Demographic and Housing +Characteristics file and "dp" for the Demographic Profile +(2020 only), and "sf1" or "sf2" (2000 and +2010) and "sf3" or "sf4" (2000 only) for the +various summary files. Special island area summary files are available +with "as", "mp", "gu", or +"vi".

+

For the ACS, use either "acs1" or "acs5" +for the ACS detailed tables, and append /profile for the +Data Profile and /subject for the Subject Tables. To browse +these variables, assign the result of this function to a variable and +use the View function in RStudio. An optional argument +cache = TRUE will cache the dataset on your computer for +future use.

 v17 <- load_variables(2017, "acs5", cache = TRUE)
 
@@ -472,32 +477,32 @@ 

Working with ACS dataget_acs(). In turn, when requesting ACS data with tidycensus, it is not necessary to specify the "E" or "M" suffix for a variable name. Let’s -fetch median household income data from the 2014-2018 ACS for counties +fetch median household income data from the 2017-2021 ACS for counties in Vermont.

 vt <- get_acs(geography = "county", 
               variables = c(medincome = "B19013_001"), 
               state = "VT", 
-              year = 2018)
+              year = 2021)
 
 vt
## # A tibble: 14 × 5
 ##    GEOID NAME                       variable  estimate   moe
 ##    <chr> <chr>                      <chr>        <dbl> <dbl>
-##  1 50001 Addison County, Vermont    medincome    65093  2424
-##  2 50003 Bennington County, Vermont medincome    53040  2307
-##  3 50005 Caledonia County, Vermont  medincome    49348  1842
-##  4 50007 Chittenden County, Vermont medincome    69896  2132
-##  5 50009 Essex County, Vermont      medincome    41045  2661
-##  6 50011 Franklin County, Vermont   medincome    64258  1568
-##  7 50013 Grand Isle County, Vermont medincome    69583  5812
-##  8 50015 Lamoille County, Vermont   medincome    60365  3915
-##  9 50017 Orange County, Vermont     medincome    60159  2361
-## 10 50019 Orleans County, Vermont    medincome    47915  2193
-## 11 50021 Rutland County, Vermont    medincome    54973  1754
-## 12 50023 Washington County, Vermont medincome    62108  2065
-## 13 50025 Windham County, Vermont    medincome    52659  1706
-## 14 50027 Windsor County, Vermont    medincome    58303  1576
+##  1 50001 Addison County, Vermont    medincome    77978  3393
+##  2 50003 Bennington County, Vermont medincome    63448  3413
+##  3 50005 Caledonia County, Vermont  medincome    55159  3974
+##  4 50007 Chittenden County, Vermont medincome    81957  2521
+##  5 50009 Essex County, Vermont      medincome    48194  3577
+##  6 50011 Franklin County, Vermont   medincome    68476  3297
+##  7 50013 Grand Isle County, Vermont medincome    85154  7894
+##  8 50015 Lamoille County, Vermont   medincome    66016  4777
+##  9 50017 Orange County, Vermont     medincome    67906  2710
+## 10 50019 Orleans County, Vermont    medincome    58037  3153
+## 11 50021 Rutland County, Vermont    medincome    59751  2133
+## 12 50023 Washington County, Vermont medincome    70128  3014
+## 13 50025 Windham County, Vermont    medincome    59195  2060
+## 14 50027 Windsor County, Vermont    medincome    63787  2209

The output is similar to a call to get_decennial(), but instead of a value column, get_acs returns estimate and moe columns for the ACS estimate @@ -514,7 +519,7 @@

Working with ACS datageom_errorbarh(aes(xmin = estimate - moe, xmax = estimate + moe)) + geom_point(color = "red", size = 3) + labs(title = "Household income by county in Vermont", - subtitle = "2014-2018 American Community Survey", + subtitle = "2017-2021 American Community Survey", y = "", x = "ACS estimate (bars represent margin of error)")
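A small sketch of the interval that `geom_errorbarh()` draws, computed by hand for the first three counties in the 2017-2021 output. The data frame `vt_head` is a hand-typed, hypothetical stand-in for `vt`; the estimates and margins of error are the values printed in the vignette output.

```r
# First three rows of the Vermont output, typed in for illustration
vt_head <- data.frame(
  NAME = c("Addison County, Vermont",
           "Bennington County, Vermont",
           "Caledonia County, Vermont"),
  estimate = c(77978, 63448, 55159),
  moe = c(3393, 3413, 3974)
)

# The error bars span estimate - moe to estimate + moe
vt_head$lower <- vt_head$estimate - vt_head$moe
vt_head$upper <- vt_head$estimate + vt_head$moe
vt_head
```

Where two counties' intervals overlap, the difference between their estimates may not be distinguishable from sampling error.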

diff --git a/docs/articles/basic-usage_files/figure-html/unnamed-chunk-4-1.png b/docs/articles/basic-usage_files/figure-html/unnamed-chunk-4-1.png index 934c9ca..1e2b0e8 100644 Binary files a/docs/articles/basic-usage_files/figure-html/unnamed-chunk-4-1.png and b/docs/articles/basic-usage_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/docs/articles/basic-usage_files/figure-html/unnamed-chunk-7-1.png b/docs/articles/basic-usage_files/figure-html/unnamed-chunk-7-1.png index bb2a986..70d9334 100644 Binary files a/docs/articles/basic-usage_files/figure-html/unnamed-chunk-7-1.png and b/docs/articles/basic-usage_files/figure-html/unnamed-chunk-7-1.png differ diff --git a/docs/articles/index.html b/docs/articles/index.html index 25b04a5..7824b61 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/articles/margins-of-error.html b/docs/articles/margins-of-error.html index 65fe5ca..f670aa0 100644 --- a/docs/articles/margins-of-error.html +++ b/docs/articles/margins-of-error.html @@ -40,7 +40,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/articles/other-datasets.html b/docs/articles/other-datasets.html index 6ef9b32..5da59b0 100644 --- a/docs/articles/other-datasets.html +++ b/docs/articles/other-datasets.html @@ -40,7 +40,7 @@ tidycensus - 1.3.3 + 1.4 @@ -172,7 +172,7 @@

Components of change populati ## 8 New Mexico 35 BIRTHS 23125 ## 9 New York 36 BIRTHS 222924 ## 10 North Carolina 37 BIRTHS 119203 -## # … with 614 more rows +## # ℹ 614 more rows

The variables included in the components of change product consist of both estimates of counts and rates. Rates are preceded by an R in the variable name and are calculated per 1000 @@ -223,7 +223,7 @@

Components of change populati ## 8 17175 Stark County, Illinois RNETMIG -10.6 (((500559 424779.4, 51023… ## 9 29169 Pulaski County, Missouri RNETMIG 4.42 (((312851.7 46166.36, 312… ## 10 19151 Pocahontas County, Iowa RNETMIG -12.2 (((88185.95 606331.9, 126… -## # … with 3,132 more rows +## # ℹ 3,132 more rows

We’ll next use tidyverse tools to generate a groups column that bins the net migration rates into comprehensible categories, and plot the result using geom_sf() and ggplot2.
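One way to sketch the binning step is with base R's `cut()`, shown here in place of the tidyverse tools the vignette uses and applied to the three `RNETMIG` values visible in the output above. The break points and labels are assumptions chosen for illustration and are not necessarily the ones the vignette uses.

```r
# Net migration rates copied from the printed rows (illustrative input)
rnetmig <- c(-10.6, 4.42, -12.2)

# Hypothetical bins; intervals are closed on the right, e.g. (-15, -5]
groups <- cut(
  rnetmig,
  breaks = c(-Inf, -15, -5, 5, 15, Inf),
  labels = c("-15 and below", "-15 to -5", "-5 to +5", "+5 to +15", "+15 and up")
)
as.character(groups)
```

Because `cut()` returns a factor with the labels in break order, the resulting column can be passed to a fill aesthetic and the legend keeps the bins in a sensible sequence.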

@@ -293,11 +293,11 @@

Estimates of population charact ## 4 06037 Los Angeles County, California 579856 Both sexes Age 0 to 4 ye… Both… ## 5 06037 Los Angeles County, California 236459 Both sexes Age 0 to 4 ye… Non-… ## 6 06037 Los Angeles County, California 343397 Both sexes Age 0 to 4 ye… Hisp… -## 7 06037 Los Angeles County, California 600191 Both sexes Age 5 to 9 ye… Both… -## 8 06037 Los Angeles County, California 229438 Both sexes Age 5 to 9 ye… Non-… -## 9 06037 Los Angeles County, California 370753 Both sexes Age 5 to 9 ye… Hisp… -## 10 06037 Los Angeles County, California 373670 Both sexes Age 15 to 19 … Hisp… -## # … with 200 more rows +## 7 06037 Los Angeles County, California 378447 Both sexes Age 10 to 14 … Hisp… +## 8 06037 Los Angeles County, California 600191 Both sexes Age 5 to 9 ye… Both… +## 9 06037 Los Angeles County, California 229438 Both sexes Age 5 to 9 ye… Non-… +## 10 06037 Los Angeles County, California 370753 Both sexes Age 5 to 9 ye… Hisp… +## # ℹ 200 more rows

With some additional data wrangling, the returned format facilitates analysis and visualization. For example, we can compare population pyramids for Hispanic and non-Hispanic populations in Los Angeles @@ -372,15 +372,14 @@

Using get_flows()filter(!is.na(GEOID2)) %>% head()
## # A tibble: 6 × 7
-##   GEOID1 GEOID2 FULL1_NAME                   FULL2_NAME    varia…¹ estim…²   moe
-##   <chr>  <chr>  <chr>                        <chr>         <chr>     <dbl> <dbl>
-## 1 36119  01089  Westchester County, New York Madison Coun… MOVEDIN       0    28
-## 2 36119  01089  Westchester County, New York Madison Coun… MOVEDO…      26    41
-## 3 36119  01089  Westchester County, New York Madison Coun… MOVEDN…     -26    41
-## 4 36119  01095  Westchester County, New York Marshall Cou… MOVEDIN       0    28
-## 5 36119  01095  Westchester County, New York Marshall Cou… MOVEDO…      35    55
-## 6 36119  01095  Westchester County, New York Marshall Cou… MOVEDN…     -35    55
-## # … with abbreviated variable names ¹​variable, ²​estimate
+##   GEOID1 GEOID2 FULL1_NAME                   FULL2_NAME  variable estimate   moe
+##   <chr>  <chr>  <chr>                        <chr>       <chr>       <dbl> <dbl>
+## 1 36119  01089  Westchester County, New York Madison Co… MOVEDIN         0    28
+## 2 36119  01089  Westchester County, New York Madison Co… MOVEDOUT       26    41
+## 3 36119  01089  Westchester County, New York Madison Co… MOVEDNET      -26    41
+## 4 36119  01095  Westchester County, New York Marshall C… MOVEDIN         0    28
+## 5 36119  01095  Westchester County, New York Marshall C… MOVEDOUT       35    55
+## 6 36119  01095  Westchester County, New York Marshall C… MOVEDNET      -35    55
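In the rows printed above, `MOVEDNET` equals `MOVEDIN` minus `MOVEDOUT` for each pair of places. A quick sketch that checks this on hand-typed copies of those six rows; the data frame `flows_head` is illustrative, not an API result, and base R's `reshape()` stands in for whatever reshaping tool you prefer.

```r
# Six rows copied from the get_flows() output (illustrative stand-in)
flows_head <- data.frame(
  GEOID2 = rep(c("01089", "01095"), each = 3),
  variable = rep(c("MOVEDIN", "MOVEDOUT", "MOVEDNET"), times = 2),
  estimate = c(0, 26, -26, 0, 35, -35)
)

# Spread the long rows into one row per GEOID2, one column per variable
flows_wide <- reshape(flows_head, idvar = "GEOID2",
                      timevar = "variable", direction = "wide")

# Net flow should equal in-flow minus out-flow in every row
net_check <- flows_wide$estimate.MOVEDIN - flows_wide$estimate.MOVEDOUT
all(net_check == flows_wide$estimate.MOVEDNET)
```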

With the default setting of get_flows(), data is returned in a “tidy” or long format. Notice that for each pair of places, there are three rows returned with one row for each variable @@ -399,15 +398,14 @@

Using get_flows()arrange(desc(estimate)) %>% head()
## # A tibble: 6 × 7
-##   GEOID1 GEOID2 FULL1_NAME                   FULL2_NAME    varia…¹ estim…²   moe
-##   <chr>  <chr>  <chr>                        <chr>         <chr>     <dbl> <dbl>
-## 1 36119  09001  Westchester County, New York Fairfield Co… MOVEDO…    3916   778
-## 2 36119  36061  Westchester County, New York New York Cou… MOVEDO…    3328   596
-## 3 36119  36005  Westchester County, New York Bronx County… MOVEDO…    2063   418
-## 4 36119  36027  Westchester County, New York Dutchess Cou… MOVEDO…    1870   454
-## 5 36119  36079  Westchester County, New York Putnam Count… MOVEDO…    1318   324
-## 6 36119  36081  Westchester County, New York Queens Count… MOVEDO…    1082   240
-## # … with abbreviated variable names ¹​variable, ²​estimate
+##   GEOID1 GEOID2 FULL1_NAME                   FULL2_NAME  variable estimate   moe
+##   <chr>  <chr>  <chr>                        <chr>       <chr>       <dbl> <dbl>
+## 1 36119  09001  Westchester County, New York Fairfield … MOVEDOUT     3916   778
+## 2 36119  36061  Westchester County, New York New York C… MOVEDOUT     3328   596
+## 3 36119  36005  Westchester County, New York Bronx Coun… MOVEDOUT     2063   418
+## 4 36119  36027  Westchester County, New York Dutchess C… MOVEDOUT     1870   454
+## 5 36119  36079  Westchester County, New York Putnam Cou… MOVEDOUT     1318   324
+## 6 36119  36081  Westchester County, New York Queens Cou… MOVEDOUT     1082   240

The MOVEDOUT variable only estimates the number of people that moved out of Westchester County and doesn’t account for the number of people that moved in to Westchester from each county. If you @@ -419,15 +417,14 @@

Using get_flows()arrange(estimate) %>% head()
## # A tibble: 6 × 7
-##   GEOID1 GEOID2 FULL1_NAME                   FULL2_NAME    varia…¹ estim…²   moe
-##   <chr>  <chr>  <chr>                        <chr>         <chr>     <dbl> <dbl>
-## 1 36119  09001  Westchester County, New York Fairfield Co… MOVEDN…   -1768   958
-## 2 36119  36027  Westchester County, New York Dutchess Cou… MOVEDN…   -1119   497
-## 3 36119  06037  Westchester County, New York Los Angeles … MOVEDN…    -486   339
-## 4 36119  12099  Westchester County, New York Palm Beach C… MOVEDN…    -450   182
-## 5 36119  25021  Westchester County, New York Norfolk Coun… MOVEDN…    -358   351
-## 6 36119  36079  Westchester County, New York Putnam Count… MOVEDN…    -340   407
-## # … with abbreviated variable names ¹​variable, ²​estimate
+##   GEOID1 GEOID2 FULL1_NAME                   FULL2_NAME  variable estimate   moe
+##   <chr>  <chr>  <chr>                        <chr>       <chr>       <dbl> <dbl>
+## 1 36119  09001  Westchester County, New York Fairfield … MOVEDNET    -1768   958
+## 2 36119  36027  Westchester County, New York Dutchess C… MOVEDNET    -1119   497
+## 3 36119  06037  Westchester County, New York Los Angele… MOVEDNET     -486   339
+## 4 36119  12099  Westchester County, New York Palm Beach… MOVEDNET     -450   182
+## 5 36119  25021  Westchester County, New York Norfolk Co… MOVEDNET     -358   351
+## 6 36119  36079  Westchester County, New York Putnam Cou… MOVEDNET     -340   407

You may have noticed that there are some destination geographies that are not other counties. For people that moved in to Westchester from outside the United States, the Migration Flows data reports the region @@ -471,15 +468,13 @@

Demographic characteristicsla_flows %>% filter(str_detect(FULL2_NAME, "San Fran"), variable == "MOVEDNET")
## # A tibble: 5 × 9
-##   GEOID1 GEOID2 FULL1_NAME           FULL2…¹ RACE  RACE_…² varia…³ estim…⁴   moe
-##   <chr>  <chr>  <chr>                <chr>   <chr> <chr>   <chr>     <dbl> <dbl>
-## 1 31080  41860  Los Angeles-Long Be… San Fr… 00    All ra… MOVEDN…   -2433  1585
-## 2 31080  41860  Los Angeles-Long Be… San Fr… 01    White … MOVEDN…   -1077  1096
-## 3 31080  41860  Los Angeles-Long Be… San Fr… 02    Black … MOVEDN…      98   378
-## 4 31080  41860  Los Angeles-Long Be… San Fr… 03    Asian … MOVEDN…    -580   778
-## 5 31080  41860  Los Angeles-Long Be… San Fr… 04    Other … MOVEDN…    -874   549
-## # … with abbreviated variable names ¹​FULL2_NAME, ²​RACE_label, ³​variable,
-## #   ⁴​estimate
+##   GEOID1 GEOID2 FULL1_NAME   FULL2_NAME RACE  RACE_label variable estimate   moe
+##   <chr>  <chr>  <chr>        <chr>      <chr> <chr>      <chr>       <dbl> <dbl>
+## 1 31080  41860  Los Angeles… San Franc… 00    All races  MOVEDNET    -2433  1585
+## 2 31080  41860  Los Angeles… San Franc… 01    White alo… MOVEDNET    -1077  1096
+## 3 31080  41860  Los Angeles… San Franc… 02    Black or … MOVEDNET       98   378
+## 4 31080  41860  Los Angeles… San Franc… 03    Asian alo… MOVEDNET     -580   778
+## 5 31080  41860  Los Angeles… San Franc… 04    Other rac… MOVEDNET     -874   549

Note that the demographic characteristics must be specified in the breakdown argument of get_flows() (not the variable argument). For each dataset there are three or @@ -519,16 +514,15 @@

Mapping migration flows## Bounding box: xmin: -112.0705 ymin: 33.18571 xmax: -112.0705 ymax: 33.18571 ## Geodetic CRS: NAD83 ## # A tibble: 6 × 9 -## GEOID1 GEOID2 FULL1_…¹ FULL2…² varia…³ estim…⁴ moe centroid1 -## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <POINT [°]> -## 1 38060 NA Phoenix… Outsid… MOVEDIN 21602 1464 (-112.0705 33.18571) -## 2 38060 NA Phoenix… Outsid… MOVEDO… 21192 1559 (-112.0705 33.18571) -## 3 38060 NA Phoenix… Outsid… MOVEDN… 410 2186 (-112.0705 33.18571) -## 4 38060 NA Phoenix… Africa MOVEDIN 1078 385 (-112.0705 33.18571) -## 5 38060 NA Phoenix… Africa MOVEDO… NA NA (-112.0705 33.18571) -## 6 38060 NA Phoenix… Africa MOVEDN… NA NA (-112.0705 33.18571) -## # … with 1 more variable: centroid2 <POINT [°]>, and abbreviated variable names -## # ¹​FULL1_NAME, ²​FULL2_NAME, ³​variable, ⁴​estimate +## GEOID1 GEOID2 FULL1_NAME FULL2_NAME variable estimate moe +## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> +## 1 38060 NA Phoenix-Mesa-Scottsdale, AZ … Outside M… MOVEDIN 21602 1464 +## 2 38060 NA Phoenix-Mesa-Scottsdale, AZ … Outside M… MOVEDOUT 21192 1559 +## 3 38060 NA Phoenix-Mesa-Scottsdale, AZ … Outside M… MOVEDNET 410 2186 +## 4 38060 NA Phoenix-Mesa-Scottsdale, AZ … Africa MOVEDIN 1078 385 +## 5 38060 NA Phoenix-Mesa-Scottsdale, AZ … Africa MOVEDOUT NA NA +## 6 38060 NA Phoenix-Mesa-Scottsdale, AZ … Africa MOVEDNET NA NA +## # ℹ 2 more variables: centroid1 <POINT [°]>, centroid2 <POINT [°]>

With the centroids attached to each pair of places, it is straightforward to map the migration flows. Here, we look at the most common origin MSAs for people moving to Phoenix-Mesa-Scottsdale, AZ. To diff --git a/docs/articles/other-datasets_files/figure-html/unnamed-chunk-4-1.png b/docs/articles/other-datasets_files/figure-html/unnamed-chunk-4-1.png index bfc643e..0922a3b 100644 Binary files a/docs/articles/other-datasets_files/figure-html/unnamed-chunk-4-1.png and b/docs/articles/other-datasets_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/docs/articles/pums-data.html b/docs/articles/pums-data.html index 061e250..8a6d201 100644 --- a/docs/articles/pums-data.html +++ b/docs/articles/pums-data.html @@ -40,7 +40,7 @@ tidycensus - 1.3.3 + 1.4 @@ -185,14 +185,14 @@

PUMS data dictionariespums_vars_2018 %>% distinct(var_code, var_label, data_type, level)
## # A tibble: 513 × 4
-##   var_code var_label                                                                        data_…¹ level
-##   <chr>    <chr>                                                                            <chr>   <chr>
-## 1 SERIALNO Housing unit/GQ person serial number                                             chr     NA   
-## 2 DIVISION Division code based on 2010 Census definitions                                   chr     NA   
-## 3 PUMA     Public use microdata area code (PUMA) based on 2010 Census definition (areas wi… chr     NA   
-## 4 REGION   Region code based on 2010 Census definitions                                     chr     NA   
-## 5 ST       State Code based on 2010 Census definitions                                      chr     NA   
-## # … with 508 more rows, and abbreviated variable name ¹​data_type
+##   var_code var_label                                                                      data_type level
+##   <chr>    <chr>                                                                          <chr>     <chr>
+## 1 SERIALNO Housing unit/GQ person serial number                                           chr       NA
+## 2 DIVISION Division code based on 2010 Census definitions                                 chr       NA
+## 3 PUMA     Public use microdata area code (PUMA) based on 2010 Census definition (areas … chr       NA
+## 4 REGION   Region code based on 2010 Census definitions                                   chr       NA
+## 5 ST       State Code based on 2010 Census definitions                                    chr       NA
+## # ℹ 508 more rows

If you’re new to PUMS data, this is a good dataset to browse to get a feel for what variables are available.

@@ -223,7 +223,7 @@

Person vs. housing unit## 3 AGEP Age num person ## 4 CIT Citizenship status chr person ## 5 CITWP Year of naturalization write-in num person -## # … with 274 more rows +## # ℹ 274 more rows

It is important to be mindful of whether the variables you choose to analyze are person- or household-level variables.

@@ -249,12 +249,12 @@

Using get_pums() to d
## # A tibble: 6,436 × 9
 ##   SERIALNO      SPORDER  WGTP PWGTP  AGEP PUMA  ST    SCHL  SEX  
 ##   <chr>           <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
-## 1 2018GQ0001230       1     0     3    26 00300 50    20    1    
-## 2 2018GQ0002135       1     0    64    18 00100 50    19    2    
-## 3 2018GQ0002999       1     0    62    20 00400 50    19    2    
-## 4 2018GQ0004077       1     0    16    94 00200 50    21    2    
-## 5 2018GQ0006486       1     0    26    20 00400 50    19    1    
-## # … with 6,431 more rows
+## 1 2018GQ0000859       1     0    61    19 00200 50    19    1
+## 2 2018GQ0001119       1     0    67    80 00200 50    11    2
+## 3 2018GQ0001888       1     0   177    82 00400 50    16    2
+## 4 2018GQ0002438       1     0    17    17 00100 50    16    2
+## 5 2018GQ0003293       1     0    68    20 00400 50    19    2
+## # ℹ 6,431 more rows

We get 6436 rows and 9 columns. In addition to the variables we specified, get_pums() also always returns SERIALNO, SPORDER, WGTP, @@ -279,14 +279,14 @@

Using get_pums() to d
 vt_pums_recoded
## # A tibble: 6,436 × 12
-##   SERIALNO      SPORDER  WGTP PWGTP  AGEP PUMA  ST    SCHL  SEX   ST_label   SCHL_label           SEX_l…¹
-##   <chr>           <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <ord>      <ord>                <ord>  
-## 1 2018GQ0001230       1     0     3    26 00300 50    20    1     Vermont/VT Associate's degree   Male   
-## 2 2018GQ0002135       1     0    64    18 00100 50    19    2     Vermont/VT 1 or more years of … Female 
-## 3 2018GQ0002999       1     0    62    20 00400 50    19    2     Vermont/VT 1 or more years of … Female 
-## 4 2018GQ0004077       1     0    16    94 00200 50    21    2     Vermont/VT Bachelor's degree    Female 
-## 5 2018GQ0006486       1     0    26    20 00400 50    19    1     Vermont/VT 1 or more years of … Male   
-## # … with 6,431 more rows, and abbreviated variable name ¹​SEX_label
+##   SERIALNO      SPORDER  WGTP PWGTP  AGEP PUMA  ST    SCHL  SEX   ST_label   SCHL_label         SEX_label
+##   <chr>           <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <ord>      <ord>              <ord>
+## 1 2018GQ0001230       1     0     3    26 00300 50    20    1     Vermont/VT Associate's degree Male
+## 2 2018GQ0002135       1     0    64    18 00100 50    19    2     Vermont/VT 1 or more years o… Female
+## 3 2018GQ0002999       1     0    62    20 00400 50    19    2     Vermont/VT 1 or more years o… Female
+## 4 2018GQ0004077       1     0    16    94 00200 50    21    2     Vermont/VT Bachelor's degree  Female
+## 5 2018GQ0006486       1     0    26    20 00400 50    19    1     Vermont/VT 1 or more years o… Male
+## # ℹ 6,431 more rows

Analyzing PUMS data @@ -446,18 +446,17 @@

Calculating standard errors)

## # A tibble: 8 × 11
 ## # Groups:   PUMA [4]
-##   PUMA  SEX_label age_25_up age_25_up_low age_25_up_upp ba_abov…¹ ba_ab…² ba_ab…³ ba_ab…⁴ ba_ab…⁵ ba_ab…⁶
-##   <chr> <ord>         <dbl>         <dbl>         <dbl>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
-## 1 00100 Male          72680        70216.        75144.     34113  29913.  38313.   0.469   0.416   0.523
-## 2 00100 Female        77966        75671.        80261.     36873  32202.  41544.   0.473   0.417   0.528
-## 3 00200 Male          52278        50826.        53730.     15831  13327.  18335.   0.303   0.255   0.351
-## 4 00200 Female        55162        53643.        56681.     20248  17679.  22817.   0.367   0.321   0.413
-## 5 00300 Male          45634        44743.        46525.     14869  12638.  17100.   0.326   0.276   0.375
-## 6 00300 Female        49546        48576.        50516.     21527  19010.  24044.   0.434   0.384   0.485
-## 7 00400 Male          45960        45067.        46853.     12788  10699.  14877.   0.278   0.232   0.324
-## 8 00400 Female        48601        47783.        49419.     18980  16540.  21420.   0.391   0.340   0.441
-## # … with abbreviated variable names ¹​ba_above_n, ²​ba_above_n_low, ³​ba_above_n_upp, ⁴​ba_above_pct,
-## #   ⁵​ba_above_pct_low, ⁶​ba_above_pct_upp
+##   PUMA  SEX_label age_25_up age_25_up_low age_25_up_upp ba_above_n ba_above_n_low ba_above_n_upp
+##   <chr> <ord>         <dbl>         <dbl>         <dbl>      <dbl>          <dbl>          <dbl>
+## 1 00100 Male          72680        70216.        75144.      34113         29913.         38313.
+## 2 00100 Female        77966        75671.        80261.      36873         32202.         41544.
+## 3 00200 Male          52278        50826.        53730.      15831         13327.         18335.
+## 4 00200 Female        55162        53643.        56681.      20248         17679.         22817.
+## 5 00300 Male          45634        44743.        46525.      14869         12638.         17100.
+## 6 00300 Female        49546        48576.        50516.      21527         19010.         24044.
+## 7 00400 Male          45960        45067.        46853.      12788         10699.         14877.
+## 8 00400 Female        48601        47783.        49419.      18980         16540.         21420.
+## # ℹ 3 more variables: ba_above_pct <dbl>, ba_above_pct_low <dbl>, ba_above_pct_upp <dbl>

Modeling with PUMS data @@ -507,10 +506,10 @@

Modeling with PUMS data= survey_median(JWMNP) )

## # A tibble: 1 × 10
-##        n  n_se mean_wage mean_wage_se median_wage median_wage_se mean_commute mean_comm…¹ media…² media…³
-##    <dbl> <dbl>     <dbl>        <dbl>       <dbl>          <dbl>        <dbl>       <dbl>   <dbl>   <dbl>
-## 1 282733 1933.    44601.         437.       35000           251.         23.3       0.233      20    1.26
-## # … with abbreviated variable names ¹​mean_commute_se, ²​median_commute, ³​median_commute_se
+##        n  n_se mean_wage mean_wage_se median_wage median_wage_se mean_commute mean_commute_se
+##    <dbl> <dbl>     <dbl>        <dbl>       <dbl>          <dbl>        <dbl>           <dbl>
+## 1 282733 1933.    44601.         437.       35000           251.         23.3           0.233
+## # ℹ 2 more variables: median_commute <dbl>, median_commute_se <dbl>
 vt_model_sd %>% 
   survey_count(emp_type)
diff --git a/docs/articles/spatial-data.html b/docs/articles/spatial-data.html index 0ead330..0d8e1f2 100644 --- a/docs/articles/spatial-data.html +++ b/docs/articles/spatial-data.html @@ -40,7 +40,7 @@ tidycensus - 1.3.3 + 1.4 @@ -192,7 +192,8 @@

Faceted mapping +Hispanics by Census tract for the 2020 Census, using the PL-94171 +summary file.

@@ -216,15 +218,14 @@

Faceted mapping## Bounding box: xmin: -95.46502 ymin: 29.53424 xmax: -95.09005 ymax: 29.96492 ## Geodetic CRS: NAD83 ## # A tibble: 6 × 6 -## GEOID NAME varia…¹ value summa…² geometry -## <chr> <chr> <chr> <dbl> <dbl> <MULTIPOLYGON [°]> -## 1 48201341203 Census Tract 3412… White 1503 2355 (((-95.10641 29.54594, -… -## 2 48201341203 Census Tract 3412… Black 177 2355 (((-95.10641 29.54594, -… -## 3 48201341203 Census Tract 3412… Asian 54 2355 (((-95.10641 29.54594, -… -## 4 48201341203 Census Tract 3412… Hispan… 492 2355 (((-95.10641 29.54594, -… -## 5 48201550601 Census Tract 5506… White 265 6673 (((-95.46502 29.96456, -… -## 6 48201550601 Census Tract 5506… Black 2156 6673 (((-95.46502 29.96456, -… -## # … with abbreviated variable names ¹​variable, ²​summary_value +## GEOID NAME variable value summary_value geometry +## <chr> <chr> <chr> <dbl> <dbl> <MULTIPOLYGON [°]> +## 1 48201341203 Census Tra… White 1503 2355 (((-95.10641 29.54594, -… +## 2 48201341203 Census Tra… Black 177 2355 (((-95.10641 29.54594, -… +## 3 48201341203 Census Tra… Asian 54 2355 (((-95.10641 29.54594, -… +## 4 48201341203 Census Tra… Hispanic 492 2355 (((-95.10641 29.54594, -… +## 5 48201550601 Census Tra… White 265 6673 (((-95.46502 29.96456, -… +## 6 48201550601 Census Tra… Black 2156 6673 (((-95.46502 29.96456, -…

We notice that there are four entries for each Census tract, with each entry representing one of our requested variables. The summary_value column represents the value of the summary diff --git a/docs/articles/spatial-data_files/figure-html/unnamed-chunk-2-1.png b/docs/articles/spatial-data_files/figure-html/unnamed-chunk-2-1.png index b665d79..8efe14c 100644 Binary files a/docs/articles/spatial-data_files/figure-html/unnamed-chunk-2-1.png and b/docs/articles/spatial-data_files/figure-html/unnamed-chunk-2-1.png differ diff --git a/docs/authors.html b/docs/authors.html index d363d7e..207023d 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 @@ -99,13 +99,13 @@

Citation

Walker K, Herman M (2023). tidycensus: Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames. -R package version 1.3.3, https://walker-data.com/tidycensus/. +R package version 1.4, https://walker-data.com/tidycensus/.

@Manual{,
   title = {tidycensus: Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames},
   author = {Kyle Walker and Matt Herman},
   year = {2023},
-  note = {R package version 1.3.3},
+  note = {R package version 1.4},
   url = {https://walker-data.com/tidycensus/},
 }
diff --git a/docs/index.html b/docs/index.html index 67e0e35..25e289e 100644 --- a/docs/index.html +++ b/docs/index.html @@ -44,7 +44,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/news/index.html b/docs/news/index.html index 5819276..936cb6a 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 @@ -75,6 +75,14 @@

Changelog

Source: NEWS.md +
+ +
diff --git a/docs/reference/as_dot_density.html b/docs/reference/as_dot_density.html index 0282183..407bbf3 100644 --- a/docs/reference/as_dot_density.html +++ b/docs/reference/as_dot_density.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/census_api_key.html b/docs/reference/census_api_key.html index 68dfdda..577ba73 100644 --- a/docs/reference/census_api_key.html +++ b/docs/reference/census_api_key.html @@ -21,7 +21,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/county_laea.html b/docs/reference/county_laea.html index 4e5b82e..c562652 100644 --- a/docs/reference/county_laea.html +++ b/docs/reference/county_laea.html @@ -18,7 +18,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/fips_codes.html b/docs/reference/fips_codes.html index 4a3df39..6feb131 100644 --- a/docs/reference/fips_codes.html +++ b/docs/reference/fips_codes.html @@ -26,7 +26,7 @@ tidycensus - 1.3.3 + 1.4 @@ -101,7 +101,7 @@

Dataset with FIPS codes for US states and counties

Format

-

An object of class data.frame with 3247 rows and 5 columns.

+

An object of class data.frame with 3256 rows and 5 columns.

Details

diff --git a/docs/reference/get_acs.html b/docs/reference/get_acs.html index 8f7b7ac..ec68892 100644 --- a/docs/reference/get_acs.html +++ b/docs/reference/get_acs.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4
diff --git a/docs/reference/get_decennial.html b/docs/reference/get_decennial.html index 86de4a4..9e39871 100644 --- a/docs/reference/get_decennial.html +++ b/docs/reference/get_decennial.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 @@ -86,10 +86,10 @@

Obtain data and feature geometry for the decennial US Census

   variables = NULL,
   table = NULL,
   cache_table = FALSE,
-  year = 2010,
-  sumfile = c("sf1", "sf2", "sf3", "sf4", "sf2profile", "sf3profile", "sf4profile", "pl",
-    "plnat", "aian", "aianprofile", "as", "gu", "vi", "mp", "responserate", "pes",
-    "dpvi", "dpgu", "dpas", "dpmp", "sldh"),
+  year = 2020,
+  sumfile = c("pl", "dhc", "dp", "sf1", "sf2", "sf3", "sf4", "sf2profile", "sf3profile",
+    "sf4profile", "pl", "plnat", "aian", "aianprofile", "as", "gu", "vi", "mp",
+    "responserate", "pes", "dpvi", "dpgu", "dpas", "dpmp", "sldh"),
   state = NULL,
   county = NULL,
   geometry = FALSE,
@@ -129,12 +129,12 @@

Arguments

year
-

The year for which you are requesting data. Defaults to 2010; 2000, +

The year for which you are requesting data. Defaults to 2020; 2000, 2010, and 2020 are available.
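Because the default year moves from 2010 to 2020 in this change, legacy scripts that omitted `year` will silently request different data. One way to guard against that is sketched below. The wrapper name `get_decennial_2010` is hypothetical and not part of tidycensus; it simply pins the pre-1.4 defaults (`year = 2010`, `sumfile = "sf1"`) shown in the removed lines of this diff.

```r
# Hypothetical wrapper (not part of tidycensus): pins the defaults that
# get_decennial() used before version 1.4, so old scripts keep their meaning.
get_decennial_2010 <- function(..., year = 2010, sumfile = "sf1") {
  # All other arguments (geography, variables, state, ...) pass through as-is.
  tidycensus::get_decennial(..., year = year, sumfile = sumfile)
}

# Usage (requires the tidycensus package and access to the Census API):
# age10 <- get_decennial_2010(geography = "state", variables = "P013001")
```

Writing `year = 2010` directly in each call achieves the same thing; the wrapper is only a convenience when many legacy calls need the old defaults.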

sumfile
-

The Census summary file; defaults to "sf1" but will switch to "pl" if the year supplied is 2020. Not all summary files are available for each decennial Census year.

+

The Census summary file; defaults to "pl". Not all summary files are available for each decennial Census year. Make sure you are using the correct summary file for your requested variables, as variable IDs may be repeated across summary files and represent different topics.
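Since not every summary file exists for every Census year, a script can check the combination before making a request. The following is a minimal sketch, assuming the year coverage stated in the basic-usage vignette in this same diff (`"dhc"` and `"dp"` for 2020 only, `"sf1"` and `"sf2"` for 2000 and 2010, `"sf3"` and `"sf4"` for 2000 only). The names `sumfile_years` and `sumfile_available` are hypothetical, and files whose years are not listed there (such as `"pl"` and the island-area files) are left unchecked rather than guessed.

```r
# Year coverage as described in the vignette; files not listed there are
# deliberately omitted rather than guessed.
sumfile_years <- list(
  dhc = 2020,
  dp  = 2020,
  sf1 = c(2000, 2010),
  sf2 = c(2000, 2010),
  sf3 = 2000,
  sf4 = 2000
)

# Hypothetical helper (not part of tidycensus): returns TRUE or FALSE when the
# summary file is in the table above, and NA when it is not listed.
sumfile_available <- function(sumfile, year) {
  years <- sumfile_years[[sumfile]]
  if (is.null(years)) {
    return(NA)
  }
  year %in% years
}

sumfile_available("dhc", 2020)  # TRUE
sumfile_available("sf3", 2010)  # FALSE
sumfile_available("pl", 2020)   # NA, because "pl" is not in the table
```

This only checks years; it does not address the separate caveat that the same variable ID can mean different things in the PL and DHC files.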

state
@@ -152,8 +152,7 @@

Arguments

geometry

if FALSE (the default), return a regular tibble of ACS data. if TRUE, uses the tigris package to return an sf tibble -with simple feature geometry in the `geometry` column. state, county, tract, and block group are -supported for 2000 through 2020; block and ZCTA geometry are supported for 2000 and 2010.

+with simple feature geometry in the `geometry` column.

output
diff --git a/docs/reference/get_estimates.html b/docs/reference/get_estimates.html index 90a3651..bf0599a 100644 --- a/docs/reference/get_estimates.html +++ b/docs/reference/get_estimates.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/get_flows.html b/docs/reference/get_flows.html index 4169a0c..d1cac9d 100644 --- a/docs/reference/get_flows.html +++ b/docs/reference/get_flows.html @@ -20,7 +20,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/get_pums.html b/docs/reference/get_pums.html index 49ccc2b..af23855 100644 --- a/docs/reference/get_pums.html +++ b/docs/reference/get_pums.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/index.html b/docs/reference/index.html index d064c60..6ea1568 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/interpolate_pw.html b/docs/reference/interpolate_pw.html index 72149b4..ee6c3e1 100644 --- a/docs/reference/interpolate_pw.html +++ b/docs/reference/interpolate_pw.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/load_variables.html b/docs/reference/load_variables.html index 454ab16..2d6f6a9 100644 --- a/docs/reference/load_variables.html +++ b/docs/reference/load_variables.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 @@ -83,10 +83,10 @@

Load variables from a decennial Census or American Community Survey dataset
load_variables(
   year,
-  dataset = c("sf1", "sf2", "sf3", "sf4", "pl", "as", "gu", "mp", "vi", "acsse", "dpas",
-    "dpgu", "dpmp", "dpvi", "acs1", "acs3", "acs5", "acs1/profile", "acs3/profile",
-    "acs5/profile", "acs1/subject", "acs3/subject", "acs5/subject", "acs1/cprofile",
-    "acs5/cprofile"),
+  dataset = c("sf1", "sf2", "sf3", "sf4", "pl", "dhc", "dp", "as", "gu", "mp", "vi",
+    "acsse", "dpas", "dpgu", "dpmp", "dpvi", "acs1", "acs3", "acs5", "acs1/profile",
+    "acs3/profile", "acs5/profile", "acs1/subject", "acs3/subject", "acs5/subject",
+    "acs1/cprofile", "acs5/cprofile"),
   cache = FALSE
 )
@@ -101,7 +101,7 @@

Arguments

dataset
-

One of "sf1", "sf2", "sf3", "sf4", "pl", +

One of "sf1", "sf2", "sf3", "sf4", "pl", "dhc", "dp", "as", "gu", "mp", "vi", "acsse", "acs1", "acs3", "acs5", "acs1/profile", "acs3/profile", "acs5/profile", "acs1/subject", "acs3/subject", "acs5/subject", "acs1/cprofile", or "acs5/cprofile".

diff --git a/docs/reference/mig_recodes.html b/docs/reference/mig_recodes.html index 36e1ff6..f49b249 100644 --- a/docs/reference/mig_recodes.html +++ b/docs/reference/mig_recodes.html @@ -24,7 +24,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/moe_product.html b/docs/reference/moe_product.html index 66f737a..16e2a57 100644 --- a/docs/reference/moe_product.html +++ b/docs/reference/moe_product.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/moe_prop.html b/docs/reference/moe_prop.html index 12ac1f0..91dd196 100644 --- a/docs/reference/moe_prop.html +++ b/docs/reference/moe_prop.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/moe_ratio.html b/docs/reference/moe_ratio.html index 0e41560..26c11ba 100644 --- a/docs/reference/moe_ratio.html +++ b/docs/reference/moe_ratio.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/moe_sum.html b/docs/reference/moe_sum.html index 315b96c..0160b43 100644 --- a/docs/reference/moe_sum.html +++ b/docs/reference/moe_sum.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/pums_variables.html b/docs/reference/pums_variables.html index 80aac7d..158e4f4 100644 --- a/docs/reference/pums_variables.html +++ b/docs/reference/pums_variables.html @@ -33,7 +33,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/significance.html b/docs/reference/significance.html index 15142f6..23daad8 100644 --- a/docs/reference/significance.html +++ b/docs/reference/significance.html @@ -17,7 +17,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/state_laea.html b/docs/reference/state_laea.html index 9a5dd0a..279d7dc 100644 --- a/docs/reference/state_laea.html +++ b/docs/reference/state_laea.html @@ -18,7 +18,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/tidycensus.html b/docs/reference/tidycensus.html index 372ddde..3b94a74 100644 --- a/docs/reference/tidycensus.html +++ b/docs/reference/tidycensus.html @@ -17,7 +17,7 @@ 
tidycensus - 1.3.3 + 1.4 diff --git a/docs/reference/to_survey.html b/docs/reference/to_survey.html index 0a8f701..f3ea1a5 100644 --- a/docs/reference/to_survey.html +++ b/docs/reference/to_survey.html @@ -22,7 +22,7 @@ tidycensus - 1.3.3 + 1.4 diff --git a/vignettes/basic-usage.Rmd b/vignettes/basic-usage.Rmd index 9486960..4428b4b 100644 --- a/vignettes/basic-usage.Rmd +++ b/vignettes/basic-usage.Rmd @@ -23,14 +23,15 @@ census_api_key("YOUR API KEY GOES HERE") There are two major functions implemented in __tidycensus__: `get_decennial()`, which grants access to the 2000, 2010, and 2020 decennial US Census APIs, and `get_acs()`, which grants access to the 1-year and 5-year American Community Survey APIs. -In this basic example, let's look at median age by state in 2010: +In this basic example, let's look at median age by state in 2020, with data drawn from the Demographic and Housing Characteristics summary file: ```{r} -age10 <- get_decennial(geography = "state", - variables = "P013001", - year = 2010) +age20 <- get_decennial(geography = "state", + variables = "P13_001N", + year = 2020, + sumfile = "dhc") -head(age10) +head(age20) ``` The function returns a tibble with four columns by default: `GEOID`, which is an identifier for the geographical unit associated with the row; `NAME`, which is a descriptive name of the geographical unit; `variable`, which is the Census variable represented in the row; and `value`, which is the value of the variable for that unit. By default, __tidycensus__ functions return tidy data frames in which rows represent unit-variable combinations; for a wide data frame with Census variable names in the columns, set `output = "wide"` in the function call. 
@@ -38,7 +39,7 @@ The function returns a tibble with four columns by default: `GEOID`, which is an As the function has returned a tidy object, we can visualize it quickly with __ggplot2__: ```{r, fig.height = 8} -age10 %>% +age20 %>% ggplot(aes(x = value, y = reorder(NAME, value))) + geom_point() ``` @@ -83,7 +84,9 @@ If __state__ or __county__ is in bold face in "Available by", you are required t ## Searching for variables -Getting variables from the Census or ACS requires knowing the variable ID - and there are thousands of these IDs across the different Census files. To rapidly search for variables, use the `load_variables()` function. The function takes two required arguments: the year of the Census or endyear of the ACS sample, and the dataset name, which varies in availability by year. For the decennial Census, possible dataset choices include `"pl"` for the redistricting files (currently the only choice for 2020), `"sf1"` or `"sf2"` (2000 and 2010) and `"sf3"` or `"sf4"` (2000 only) for the various summary files. Special island area summary files are available with `"as"`, `"mp"`, `"gu"`, or `"vi"`. For the ACS, use either `"acs1"` or `"acs5"` for the ACS detailed tables, and append `/profile` for the Data Profile and `/subject` for the Subject Tables. To browse these variables, assign the result of this function to a variable and use the `View` function in RStudio. An optional argument `cache = TRUE` will cache the dataset on your computer for future use. +Getting variables from the Census or ACS requires knowing the variable ID - and there are thousands of these IDs across the different Census files. To rapidly search for variables, use the `load_variables()` function. The function takes two required arguments: the year of the Census or endyear of the ACS sample, and the dataset name, which varies in availability by year. 
For the decennial Census, possible dataset choices include `"pl"` for the redistricting files; `"dhc"` for the Demographic and Housing Characteristics file and `"dp"` for the Demographic Profile (2020 only), and `"sf1"` or `"sf2"` (2000 and 2010) and `"sf3"` or `"sf4"` (2000 only) for the various summary files. Special island area summary files are available with `"as"`, `"mp"`, `"gu"`, or `"vi"`. + +For the ACS, use either `"acs1"` or `"acs5"` for the ACS detailed tables, and append `/profile` for the Data Profile and `/subject` for the Subject Tables. To browse these variables, assign the result of this function to a variable and use the `View` function in RStudio. An optional argument `cache = TRUE` will cache the dataset on your computer for future use. ```{r, eval = FALSE} v17 <- load_variables(2017, "acs5", cache = TRUE) @@ -99,13 +102,13 @@ By filtering for "median age" variable IDs corresponding to that query can be br American Community Survey (ACS) data are available from the 1-year ACS since 2005 for geographies of population 65,000 and greater, and from the 5-year ACS for all geographies down to the block group level starting with the 2005-2009 dataset. `get_acs()` defaults to the 5-year ACS with the argument `survey = "acs5"`, but 1-year ACS data are available using `survey = "acs1"`. -ACS data differ from decennial Census data as they are based on an annual sample of approximately 3 million households, rather than a more complete enumeration of the US population. In turn, ACS data points are __estimates__ characterized by a __margin of error__. __tidycensus__ will always return the estimate and margin of error together for any requested variables when using `get_acs()`. In turn, when requesting ACS data with __tidycensus__, it is not necessary to specify the `"E"` or `"M"` suffix for a variable name. Let's fetch median household income data from the 2014-2018 ACS for counties in Vermont. 
+ACS data differ from decennial Census data as they are based on an annual sample of approximately 3 million households, rather than a more complete enumeration of the US population. In turn, ACS data points are __estimates__ characterized by a __margin of error__. __tidycensus__ will always return the estimate and margin of error together for any requested variables when using `get_acs()`. In turn, when requesting ACS data with __tidycensus__, it is not necessary to specify the `"E"` or `"M"` suffix for a variable name. Let's fetch median household income data from the 2017-2021 ACS for counties in Vermont. ```{r} vt <- get_acs(geography = "county", variables = c(medincome = "B19013_001"), state = "VT", - year = 2018) + year = 2021) vt ``` @@ -121,7 +124,7 @@ vt %>% geom_errorbarh(aes(xmin = estimate - moe, xmax = estimate + moe)) + geom_point(color = "red", size = 3) + labs(title = "Household income by county in Vermont", - subtitle = "2014-2018 American Community Survey", + subtitle = "2017-2021 American Community Survey", y = "", x = "ACS estimate (bars represent margin of error)") ``` diff --git a/vignettes/spatial-data.Rmd b/vignettes/spatial-data.Rmd index 0c881f7..5b8f0ed 100644 --- a/vignettes/spatial-data.Rmd +++ b/vignettes/spatial-data.Rmd @@ -45,7 +45,7 @@ Please note that the UTM Zone 11N coordinate system (`26911`) is appropriate for ## Faceted mapping -One of the most powerful features of __ggplot2__ is its support for small multiples, which works very well with the tidy data format returned by __tidycensus__. Many Census and ACS variables return _counts_, however, which are generally inappropriate for choropleth mapping. In turn, `get_decennial` and `get_acs` have an optional argument, `summary_var`, that can work as a multi-group denominator when appropriate. Let's use the following example of the racial geography of Harris County, Texas. 
First, we'll request data for non-Hispanic whites, non-Hispanic blacks, non-Hispanic Asians, and Hispanics by Census tract for the 2020 Census. +One of the most powerful features of __ggplot2__ is its support for small multiples, which works very well with the tidy data format returned by __tidycensus__. Many Census and ACS variables return _counts_, however, which are generally inappropriate for choropleth mapping. In turn, `get_decennial` and `get_acs` have an optional argument, `summary_var`, that can work as a multi-group denominator when appropriate. Let's use the following example of the racial geography of Harris County, Texas. First, we'll request data for non-Hispanic whites, non-Hispanic blacks, non-Hispanic Asians, and Hispanics by Census tract for the 2020 Census, using the PL-94171 summary file. ```{r} racevars <- c(White = "P2_005N", @@ -60,7 +60,8 @@ harris <- get_decennial( county = "Harris County", geometry = TRUE, summary_var = "P2_001N", - year = 2020 + year = 2020, + sumfile = "pl" ) head(harris)