Exploring Rural County Census Data Using the ‘TidyCensus’ Package

Jon Fain
Dec 21, 2021

I currently work as a Senior Data Analyst for a non-profit that specializes in the provision and expansion of educational opportunities for children in rural America. A major function of this role is working with secondary census data to better understand the demographic, socio-economic and behavioral health contexts of the places in which we work. However, as folks who work with the census know, the data can be a real challenge. So I was excited to stumble upon Kyle Walker’s TidyCensus R package, which has vastly simplified the process of working with Census data.

While the package has several functions for pulling select Census datasets, the main functions I use are get_decennial() and get_acs(). The former pulls data from the Decennial Census, while the latter pulls data from the American Community Survey (ACS). Once you obtain an API key from the Census Bureau, you can set your key in R and begin pulling data. TidyCensus also fits neatly into a tidyverse workflow: by default it returns tidy tibbles (one row per geography and variable). That's great news for folks who regularly work with tidyverse tools, as I do.
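Here is a minimal sketch of the key setup. It assumes you have already requested a key from the Census Bureau; the string below is a placeholder, not a real key. With install = FALSE (the default), census_api_key() stores the key in the CENSUS_API_KEY environment variable for the current session only. get_decennial() and get_acs() read that variable when no key is passed to them.

```r
library(tidycensus)

# Placeholder value: substitute the key the Census Bureau sends you.
# With install = FALSE (the default) the key is set only for this R session,
# via the CENSUS_API_KEY environment variable.
census_api_key("YOUR_CENSUS_API_KEY")

# To keep the key across sessions, write it to ~/.Renviron instead:
# census_api_key("YOUR_CENSUS_API_KEY", install = TRUE)

# get_decennial() and get_acs() fall back to this variable when no key is passed
Sys.getenv("CENSUS_API_KEY")
```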

One thing that is not yet in the TidyCensus package (Kyle…if you’re reading this…) is the designation of a county as either ‘urban’ or ‘rural’. I understand the difficulty of adding this functionality, though. I imagine it lies less with technical development and more with the actual definition of urban and rural counties. I won’t bore you here with the details and nuances of the definitional differences. For the sake of moving forward, I’ll adopt the Office of Management and Budget’s (OMB) definition, which designates urban areas as metropolitan and rural areas as nonmetropolitan. (Okay, I can’t help myself: check out the Census definition of rural here, and the ERS definition here.) You can download the OMB designations for each county here. Note that we won’t use the rural-urban continuum codes themselves, just the metro/nonmetro designation for each county. Once you have the data, a few quick transformations join the designations to your TidyCensus data frame:

- pull the first word (“Metro” or “Nonmetro”) out of the file’s Description column,
- convert the two-letter state abbreviations to full state names, and
- join on county and state name.

library(tidyverse)
library(tidycensus)
library(readxl)  # read_xlsx() comes from readxl (openxlsx's reader is read.xlsx())
# load in search tables
# 2019 ACS detailed tables
v19 <- load_variables(2019, "acs5", cache = TRUE)
# 2019 ACS subject tables
v19_sub <- load_variables(2019, dataset = "acs5/subject")
# 2019 ACS data profiles
v19_prof <- load_variables(2019, dataset = "acs5/profile")
# 2020 Decennial Census redistricting (PL) data
v20 <- load_variables(2020, "pl")
# pull 2019 5-year ACS data on poverty by county
counties <- get_acs(geography = "county",
                    variables = c('poverty' = 'DP03_0128P'),
                    year = 2019,
                    geometry = FALSE) %>%
  # NAME looks like "Autauga County, Alabama"; splitting on ", " (not on
  # periods) keeps names such as "St. Louis County" intact
  separate(NAME, c("County_Name", "State"), sep = ", ") %>%
  mutate(State = trimws(State))
# load in and wrangle rural designations
rural_des <- read_xlsx("ruralurbancodes2013.xlsx") %>%
  # Description begins with "Metro" or "Nonmetro"; keep only that first word
  separate(Description, c("Designation", NA), extra = "drop") %>%
  select(State, County_Name, Designation)
# convert state abbreviations to full names to match the ACS data
rural_des$State <- state.name[match(rural_des$State, state.abb)]
# join ACS data with rural designation data
rural_counties <- left_join(counties, rural_des, by = c("County_Name", "State"))
# view data
head(rural_counties)
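The join matches on county and state names, so any county whose name or state fails to line up silently ends up with an NA designation. dplyr’s anti_join() offers a quick check: it returns the rows of the first table that have no match in the second. The sketch below uses made-up toy rows (hypothetical county names and estimates, not real Census or ERS values). It shows one case the abbreviation lookup above misses: base R’s state.abb covers only the 50 states, so the District of Columbia (“DC”) maps to NA.

```r
library(dplyr)

# Toy stand-in for the ACS `counties` table (hypothetical names, made-up estimates)
acs_toy <- tibble(
  County_Name = c("Alpha County", "Beta County", "District of Columbia"),
  State       = c("West Virginia", "West Virginia", "District of Columbia"),
  estimate    = c(18.2, 21.5, 15.5)
)

# Toy stand-in for the designation table, still using two-letter state codes
des_toy <- tibble(
  State       = c("WV", "WV", "DC"),
  County_Name = c("Alpha County", "Beta County", "District of Columbia"),
  Designation = c("Nonmetro", "Metro", "Metro")
)

# Same lookup as above: state.abb has only the 50 states, so "DC" becomes NA
des_lookup <- des_toy %>%
  mutate(State = state.name[match(State, state.abb)])

# ACS rows that found no designation (here: the District of Columbia row)
unmatched <- anti_join(acs_toy, des_lookup, by = c("County_Name", "State"))
unmatched

# One possible fix: map "DC" explicitly, then re-check (expect zero rows)
des_fixed <- des_toy %>%
  mutate(State = if_else(State == "DC", "District of Columbia",
                         state.name[match(State, state.abb)]))
anti_join(acs_toy, des_fixed, by = c("County_Name", "State"))
```

Any rows that turn up in this check are worth inspecting by hand before filtering on Designation.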

From here, I’ll filter for all non-metropolitan counties to explore ACS data exclusively in rural America. As an example, let’s look at poverty in rural counties in West Virginia.

wv_rural_pov <- rural_counties %>%
  filter(State == 'West Virginia',
         Designation == 'Nonmetro') %>%
  arrange(estimate)
head(wv_rural_pov)
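To put the West Virginia numbers in context, you might compare nonmetro and metro counties within the same state. Here is a minimal sketch, assuming a data frame shaped like rural_counties above (State, Designation and estimate columns). summarise_by_designation() is a hypothetical helper, not part of tidycensus. It reports an unweighted median across counties, not a population-weighted statewide poverty rate.

```r
library(dplyr)

# Hypothetical helper: count counties and take the median ACS estimate
# for each metro/nonmetro designation within one state.
# Counties with no designation (NA) are dropped.
summarise_by_designation <- function(df, state) {
  df %>%
    filter(State == state, !is.na(Designation)) %>%
    group_by(Designation) %>%
    summarise(n_counties = n(),
              median_estimate = median(estimate, na.rm = TRUE),
              .groups = "drop")
}

# Usage with the joined data from above:
# summarise_by_designation(rural_counties, "West Virginia")
```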

We can also visualize poverty rates by county.

wv_rural_pov %>%
  ggplot(aes(x = estimate, y = reorder(County_Name, estimate))) +
  geom_col(fill = "dodgerblue") +
  labs(title = "Percent of People in Poverty by County in WV",
       subtitle = "2015-2019 American Community Survey",
       y = "",
       x = "") +
  geom_text(aes(label = estimate), position = position_dodge(width = 0.9), vjust = 0.3, hjust = .03)
A quick visualization to make data exploration a bit easier.

If you set geometry = TRUE in get_acs(), the function returns an sf object that includes each county’s boundary polygon. That lets you create a choropleth poverty map as well. I used the ‘mapview’ package for this, but there are many packages you can use for geospatial visualizations.

library(mapview)
wv_img <- get_acs(geography = "county",
                  state = "West Virginia",
                  variables = c('poverty' = 'DP03_0128P'),
                  year = 2019,
                  geometry = TRUE) %>%
  # split "Name County, State" on ", " so periods in county names survive
  separate(NAME, c("County_Name", "State"), sep = ", ") %>%
  mutate(State = trimws(State)) %>%
  left_join(rural_des, by = c("County_Name", "State")) %>%
  filter(Designation == 'Nonmetro')
mapview(wv_img, zcol = "estimate")
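If you would rather have a static map (for a report or slide, say), ggplot2’s geom_sf() can draw the same sf object. This is a minimal sketch: plot_poverty_map() is a hypothetical helper, and the viridis color scale and empty theme are just one choice among many.

```r
library(ggplot2)

# Hypothetical helper: static choropleth of an sf data frame such as wv_img,
# filling each county polygon by its ACS estimate. geom_sf() needs the sf
# package, which tidycensus installs as a dependency.
plot_poverty_map <- function(sf_df, title) {
  ggplot(sf_df) +
    geom_sf(aes(fill = estimate), color = "white") +
    scale_fill_viridis_c(name = "% in poverty") +
    labs(title = title, subtitle = "2015-2019 American Community Survey") +
    theme_void()
}

# Usage with the data pulled above:
# plot_poverty_map(wv_img, "Poverty in Nonmetro West Virginia Counties")
```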

The TidyCensus package has saved me a ton of time and energy, and a quick join to county rural designation data has allowed me to explore Census data within rural America. This package will be an invaluable tool to have when the Census Bureau releases its 2020 Decennial data in March of 2022. Follow Kyle Walker on Twitter for up-to-date information on the TidyCensus package.

You can check out the complete code for this example on my Github page.

I currently work as a Data Analyst in the non-profit sector. Stoked on data science, woodworking and outdoor adventures.