Add Census Socioeconomic Variables to Health Data
Source: R/sus_socio_add_census.R
Enriches health data with socioeconomic indicators from the Brazilian Census using the censobr package. Aggregates microdata to the municipality level. Supports multiple census datasets (population, households, families, mortality, emigration), with local caching and Arrow/Parquet-backed reads for better performance on large datasets.
Usage
sus_socio_add_census(
  df,
  dataset = "population",
  year = 2010,
  census_vars = NULL,
  aggregation_fun = "sum",
  join_muni_col = NULL,
  use_cache = TRUE,
  cache_dir = "~/.climasus4r_cache/census",
  lang = "pt",
  translate_columns = TRUE,
  standardize_values = TRUE,
  verbose = TRUE
)
Arguments
- df
A data.frame or sf object containing health data. Typically the output from the sus_join_spatial() function.
- dataset
Character string specifying the census dataset to use. Options:
"population" - Population microdata (default)
"households" - Household microdata
"families" - Family microdata
"mortality" - Mortality microdata
"emigration" - Emigration microdata
"tracts" - Census tract aggregate data (coming soon)
- year
Integer specifying the census year. Options: 2010 (default) or 2000. Note: Dataset availability varies by year.
- census_vars
Character vector specifying the census variables to add. Use sus_census_explore() to select available variables. If NULL, returns all available variables (not recommended for large datasets).
- aggregation_fun
Character. Method used to aggregate microdata to the municipality level:
"sum" - Sums the selected variables by municipality (e.g., for total population). This is the default.
"mean" - Averages the selected variables (e.g., for income).
Also accepted: "median", "min", "max", "sd", "q25", "q75", "q95", and "q99".
- join_muni_col
Character string specifying the column in df containing the 6- or 7-digit IBGE municipality code. If NULL, detects common SUS patterns (e.g., code_muni).
- use_cache
Logical. If TRUE (default), uses censobr's caching system to store downloaded data locally for faster subsequent access.
- cache_dir
Character string specifying the cache directory path. Defaults to "~/.climasus4r_cache/census". The function automatically calls censobr::set_censobr_cache_dir(cache_dir) to ensure consistency across the ecosystem.
- lang
Character string specifying the language for messages. Options: "pt" (Portuguese, default), "en" (English), "es" (Spanish).
- translate_columns
Logical. If TRUE, translates column names. Default is TRUE.
- standardize_values
Logical. If TRUE, standardizes categorical values. Default is TRUE.
- verbose
Logical. If TRUE (default), prints progress messages and a download progress bar.
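The aggregation_fun options listed above can be pictured with a small base R sketch. This is not the package's internal code: the lookup table agg_funs, the helper aggregate_to_muni(), and the toy microdata are hypothetical, and reading "q25" as the 25th percentile is an assumption. The sketch also ignores sample weights, which real census microdata would normally require.

```r
# Hypothetical lookup from aggregation_fun names to R functions.
# Assumption: "q25" denotes the 25th percentile (likewise q75, q95, q99).
agg_funs <- list(
  sum    = function(x) sum(x, na.rm = TRUE),
  mean   = function(x) mean(x, na.rm = TRUE),
  median = function(x) stats::median(x, na.rm = TRUE),
  q25    = function(x) unname(stats::quantile(x, 0.25, na.rm = TRUE))
)

# Collapse person-level rows to one row per municipality.
aggregate_to_muni <- function(microdata, vars, aggregation_fun = "sum") {
  fun <- agg_funs[[aggregation_fun]]
  if (is.null(fun)) stop("Unsupported aggregation_fun: ", aggregation_fun)
  stats::aggregate(microdata[vars], by = microdata["code_muni"], FUN = fun)
}

# Toy microdata: three people in one municipality, two in another.
micro <- data.frame(
  code_muni = c(3550308L, 3550308L, 3550308L, 3304557L, 3304557L),
  income    = c(1000, 2000, 3000, 500, 1500)
)

muni_sum  <- aggregate_to_muni(micro, "income", "sum")
muni_mean <- aggregate_to_muni(micro, "income", "mean")
```

Each call returns one row per code_muni, which is the shape needed before joining back onto municipality-level health data.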
Value
The input data.frame or sf object with the requested census columns added. If the input is an sf object, its geometry and CRS are preserved, because the census columns are attached through a join.
Details
Integration with censobr package:
This function is a wrapper around censobr's dataset-specific functions
(read_population(), read_households(), etc.), providing seamless integration
with the climasus4r ecosystem.
Geographic Columns: The function automatically inherits and matches geographic columns following the geobr/censobr standard:
code_muni - 7-digit municipality code
code_state - 2-digit state code
abbrev_state - State abbreviation (e.g., "AM")
name_state - State name
code_region - Region code
name_region - Region name
code_weighting - Weighting area code
Automatic Column Detection:
If join_muni_col = NULL, the function automatically detects the municipality code column based on common SUS patterns:
Municipality: residence_municipality_code, municipality_code, codigo_municipio, CODMUNRES, etc.
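A minimal sketch of how this kind of detection could work is shown here. It is an illustration only: the helper names detect_muni_col() and to_six_digits() are hypothetical, the priority order of the candidate names is an assumption, and so is the choice to compare 6- and 7-digit codes by dropping the trailing check digit of the 7-digit IBGE code.

```r
# Candidate names taken from the patterns listed above.
# Assumption: earlier entries take priority over later ones.
muni_col_candidates <- c(
  "code_muni", "residence_municipality_code", "municipality_code",
  "codigo_municipio", "CODMUNRES"
)

# Return the first candidate present in df, or fail with a clear message.
detect_muni_col <- function(df, candidates = muni_col_candidates) {
  hit <- candidates[candidates %in% names(df)]
  if (length(hit) == 0) {
    stop("No municipality code column found; set join_muni_col explicitly.")
  }
  hit[1]
}

# Reduce 7-digit IBGE codes to 6 digits by dropping the last (check) digit,
# so that 6- and 7-digit codes can be matched on the same key.
to_six_digits <- function(code) {
  substr(as.character(code), 1, 6)
}

# Toy health table using a 6-digit SUS-style code column.
health <- data.frame(CODMUNRES = c("355030", "330455"), deaths = c(10, 4))

col <- detect_muni_col(health)
key <- to_six_digits(c(3550308L, 3304557L))
```

Passing join_muni_col explicitly avoids any ambiguity when a table carries more than one of these columns.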
Performance Optimization: The function uses the Arrow/Parquet format to handle larger-than-memory datasets efficiently:
Downloads data only once (cached locally)
Filters municipalities BEFORE loading into RAM
Uses dplyr::collect() only after filtering
Uses the sfarrow package for spatial filtering
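The filter-before-collect pattern described above can be sketched with the arrow and dplyr packages. The Parquet file, the column names, and the values are made up for illustration; the function's actual internals may differ.

```r
library(arrow)
library(dplyr)

# Write a toy table to a temporary Parquet file (stand-in for cached census data).
path <- tempfile(fileext = ".parquet")
write_parquet(
  data.frame(
    code_muni = c(3550308L, 3550308L, 3304557L, 1302603L),
    value     = c(1.5, 2.5, 3.0, 4.0)
  ),
  path
)

# Municipalities present in the health data (hypothetical).
wanted <- c(3550308L, 3304557L)

# The filter and the aggregation are evaluated by Arrow;
# rows are only materialised in R memory at collect().
result <- open_dataset(path) %>%
  filter(code_muni %in% wanted) %>%
  group_by(code_muni) %>%
  summarise(total = sum(value)) %>%
  collect() %>%
  arrange(code_muni)
```

Only the two requested municipalities reach R, already reduced to one row each.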
Spatial Data Support:
If df is an sf object from sus_join_spatial(), geometries are preserved in the output.
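To illustrate why a join keeps the spatial information, here is a small sketch using sf and dplyr. The point coordinates, counts, and population figures are toy values, and whether the function uses dplyr::left_join() internally is an assumption.

```r
library(sf)
library(dplyr)

# Toy sf object standing in for the output of sus_join_spatial().
health_sf <- st_sf(
  code_muni = c(3550308L, 3304557L),
  deaths    = c(10, 4),
  geometry  = st_sfc(
    st_point(c(-46.63, -23.55)),
    st_point(c(-43.17, -22.91)),
    crs = 4674  # SIRGAS 2000
  )
)

# Toy municipality-level census table (made-up values).
census_muni <- data.frame(
  code_muni  = c(3550308L, 3304557L),
  population = c(100, 50)
)

# A left join with the sf object on the left keeps its class, geometry, and CRS.
enriched <- left_join(health_sf, census_muni, by = "code_muni")
```

The result is still an sf object, so it can be passed directly to mapping or spatial analysis functions.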
References
Pereira, Rafael H. M.; Barbosa, Rogerio J. (2023) censobr: Download Data from Brazil’s Population Census. R package version v0.4.0, https://CRAN.R-project.org/package=censobr. DOI: 10.32614/CRAN.package.censobr.
Examples
if (FALSE) { # \dontrun{
library(climasus4r)

# Prepare spatial health data
sf_sim <- sus_data_import(uf = "SP", year = 2023, system = "SIM-DO") %>%
  sus_data_standardize(lang = "pt") %>%
  sus_join_spatial(level = "munic", lang = "pt")

# Add census population data
sf_enriched <- sus_socio_add_census(
  df = sf_sim,
  dataset = "population",
  census_vars = c("V0001", "V0002"),
  year = 2010,
  lang = "pt"
)
} # }