
Enriches health data with socioeconomic indicators from the Brazilian Census using the censobr package. Aggregates census microdata to the municipality level. Supports multiple census datasets (population, households, families, mortality, emigration), with local caching and Arrow/Parquet-based handling of large datasets.

Usage

sus_socio_add_census(
  df,
  dataset = "population",
  year = 2010,
  census_vars = NULL,
  aggregation_fun = "sum",
  join_muni_col = NULL,
  use_cache = TRUE,
  cache_dir = "~/.climasus4r_cache/census",
  lang = "pt",
  translate_columns = TRUE,
  standardize_values = TRUE,
  verbose = TRUE
)

Arguments

df

A data.frame or sf object containing health data. Typically the output from sus_join_spatial() function.

dataset

Character string specifying the census dataset to use. Options:

  • "population" - Population microdata (default)

  • "households" - Household microdata

  • "families" - Family microdata

  • "mortality" - Mortality microdata

  • "emigration" - Emigration microdata

  • "tracts" - Census tract aggregate data (coming soon...)

year

Integer specifying census year. Options: 2010 (default) or 2000. Note: Dataset availability varies by year.

census_vars

Character vector specifying census variables to add. Use sus_census_explore() to select available variables. If NULL, returns all available variables (not recommended for large datasets).

aggregation_fun

Character. Method to aggregate microdata to municipality level:

  • "sum" - Sums the selected variables by municipality (e.g., for total population).

  • "mean" - Averages the selected variables (e.g., for income).

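  • "median", "min", "max", "sd", "q25", "q75", "q95", "q99" - Median, minimum, maximum, standard deviation, and the 25th, 75th, 95th, and 99th percentiles of the selected variables.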
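As a rough illustration of what aggregating to the municipality level means, the sketch below collapses toy person-level records with base R. The data frame, its column names, and the helper aggregate_by_muni() are hypothetical and not part of climasus4r; the mapping of labels such as "q25" to quantiles is an assumption about how those options behave.

```r
# Toy microdata: one row per person (hypothetical values and column names)
micro <- data.frame(
  code_muni = c(3550308, 3550308, 3550308, 3304557, 3304557),
  income    = c(1000, 2000, 3000, 1500, 2500)
)

# Hypothetical helper: translate an aggregation label into a summary function
# and apply it per municipality
aggregate_by_muni <- function(data, var, fun = "sum") {
  f <- switch(fun,
    sum    = sum,
    mean   = mean,
    median = stats::median,
    min    = min,
    max    = max,
    sd     = stats::sd,
    q25    = function(x) unname(stats::quantile(x, 0.25)),
    q75    = function(x) unname(stats::quantile(x, 0.75)),
    q95    = function(x) unname(stats::quantile(x, 0.95)),
    q99    = function(x) unname(stats::quantile(x, 0.99)),
    stop("Unsupported aggregation: ", fun)
  )
  out <- stats::aggregate(
    data[[var]],
    by  = list(code_muni = data$code_muni),
    FUN = f
  )
  # Name the result column after the variable and the aggregation used
  names(out)[2] <- paste(var, fun, sep = "_")
  out
}

income_sum  <- aggregate_by_muni(micro, "income", "sum")
income_mean <- aggregate_by_muni(micro, "income", "mean")
```

Sums suit count-like variables, while means and quantiles suit per-person or per-household measures such as income.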

join_muni_col

Character string specifying the column in df containing the 6 or 7-digit IBGE municipality code. If NULL, detects common SUS patterns (e.g., code_muni).
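Because health records often carry the 6-digit code while geobr/censobr data use the 7-digit code, the two need a common key before joining. The sketch below shows one way to do that, relying on the fact that the seventh digit of the IBGE code is a check digit. The helper to_muni6() and the toy data frames are hypothetical and do not show how sus_socio_add_census() handles this internally.

```r
# Hypothetical helper: reduce IBGE municipality codes to their 6-digit form
# by dropping the trailing check digit of 7-digit codes
to_muni6 <- function(code) {
  if (is.numeric(code)) {
    # Avoid scientific notation when converting numbers to text
    code <- format(code, scientific = FALSE, trim = TRUE)
  }
  code <- as.character(code)
  stopifnot(all(nchar(code) %in% c(6, 7)))
  substr(code, 1, 6)
}

# Toy inputs (hypothetical values)
health <- data.frame(CODMUNRES = c("355030", "330455"))
census <- data.frame(code_muni = c(3550308, 3304557), pop = c(100, 200))

health$muni6 <- to_muni6(health$CODMUNRES)
census$muni6 <- to_muni6(census$code_muni)

# Left join: keep every health record, attach census values where available
merged <- merge(health, census[, c("muni6", "pop")], by = "muni6", all.x = TRUE)
```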

use_cache

Logical. If TRUE (default), uses censobr's caching system to store downloaded data locally for faster subsequent access.

cache_dir

Character string specifying the cache directory path. Defaults to "~/.climasus4r_cache/census". The function automatically calls censobr::set_censobr_cache_dir(cache_dir) to ensure consistency across the ecosystem.

lang

Character string specifying language for messages. Options: "pt" (Portuguese, default), "en" (English), "es" (Spanish).

translate_columns

Logical. If TRUE, translates column names. Default is TRUE.

standardize_values

Logical. If TRUE, standardizes categorical values. Default is TRUE.

verbose

Logical. If TRUE (default), prints progress messages and download progress bar.

Value

Returns the input data.frame or sf object with the census columns added. If the input is an sf object, its geometry and CRS are preserved by the join.

Details

Integration with censobr package: This function is a wrapper around censobr's dataset-specific functions (read_population(), read_households(), etc.), providing seamless integration with the climasus4r ecosystem.

Geographic Columns: The function automatically inherits and matches geographic columns following the geobr/censobr standard:

  • code_muni - 7-digit municipality code

  • code_state - 2-digit state code

  • abbrev_state - State abbreviation (e.g., "AM")

  • name_state - State name

  • code_region - Region code

  • name_region - Region name

  • code_weighting - Weighting area code

Automatic Column Detection: If join_muni_col = NULL, the function automatically detects the municipality code column based on common SUS naming patterns:

  • Municipality: residence_municipality_code, municipality_code, codigo_municipio, CODMUNRES, etc.
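A minimal sketch of this kind of detection, assuming it amounts to picking the first known name present in the data. The function detect_muni_col(), the candidate vector, and its priority order are hypothetical; the package's actual lookup may cover more names or use different rules.

```r
# Candidate column names drawn from the patterns listed above
# (the order is an assumed priority, not documented behaviour)
muni_candidates <- c(
  "code_muni",
  "residence_municipality_code",
  "municipality_code",
  "codigo_municipio",
  "CODMUNRES"
)

# Hypothetical helper: return the first candidate found in the data frame
detect_muni_col <- function(df, candidates = muni_candidates) {
  hit <- candidates[candidates %in% names(df)]
  if (length(hit) == 0) {
    stop("No municipality code column found; set join_muni_col explicitly.")
  }
  hit[1]
}

sim <- data.frame(CODMUNRES = "355030", age = 42)
detected <- detect_muni_col(sim)
```

When detection could be ambiguous, passing join_muni_col explicitly avoids relying on any priority order.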

Performance Optimization: The function uses Arrow/Parquet format for efficient larger-than-memory dataset handling:

  • Downloads data only once (cached locally)

  • Filters municipalities BEFORE loading to RAM

  • Uses dplyr::collect() only after filtering

  • Uses the sfarrow package for spatial filtering
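The filter-before-collect pattern described above can be sketched with the arrow and dplyr packages on a toy Parquet file. The file, its columns, and the municipality list are hypothetical stand-ins for cached census data; this illustrates the general technique, not the function's internal code.

```r
library(arrow)
library(dplyr)

# Toy stand-in for a cached census Parquet file (hypothetical values)
toy <- data.frame(
  code_muni = c(3550308, 3550308, 3304557, 1302603),
  weight    = c(1.25, 0.75, 1.0, 1.5)
)
path <- tempfile(fileext = ".parquet")
write_parquet(toy, path)

# Municipalities present in the health data
wanted <- c(3550308, 3304557)

# open_dataset() only scans metadata; filter and summarise are evaluated
# lazily by Arrow, and rows reach R memory only at collect()
result <- open_dataset(path) %>%
  filter(code_muni %in% wanted) %>%
  group_by(code_muni) %>%
  summarise(total = sum(weight)) %>%
  collect()
```

Only the aggregated rows for the requested municipalities are materialised, which is what keeps memory use low on national microdata files.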

Spatial Data Support: If df is an sf object from sus_join_spatial(), geometries are preserved in the output.

References

Pereira, Rafael H. M.; Barbosa, Rogerio J. (2023) censobr: Download Data from Brazil’s Population Census. R package version v0.4.0, https://CRAN.R-project.org/package=censobr. DOI: 10.32614/CRAN.package.censobr.

Examples

if (FALSE) { # \dontrun{
library(climasus4r)
library(dplyr) # provides the %>% pipe used below

# Prepare spatial health data
sf_sim <- sus_data_import(uf = "SP", year = 2023, system = "SIM-DO") %>%
  sus_data_standardize(lang = "pt") %>%
  sus_join_spatial(level = "munic", lang = "pt")

# Add census population data
sf_enriched <- sus_socio_add_census(
  df = sf_sim,
  dataset = "population",
  census_vars = c("V0001", "V0002"),
  year = 2010,
  lang = "pt"
)
} # }