Standardize SUS data column names and values
Source: R/sus_data_standardize.R
This function standardizes column names and categorical values in SUS datasets, ensuring consistency across different years and versions. It supports three languages: English (en), Portuguese (pt), and Spanish (es).
Usage
sus_data_standardize(
  df,
  lang = "pt",
  translate_columns = TRUE,
  standardize_values = TRUE,
  keep_original = FALSE,
  backend = "arrow",
  verbose = TRUE
)
Arguments
- df
A data.frame or tibble to be standardized (typically output from sus_data_import()).
- lang
Character. Output language for column names and values. Options: "en" (English), "pt" (Portuguese, default), "es" (Spanish).
- translate_columns
Logical. If TRUE, translates column names. Default is TRUE.
- standardize_values
Logical. If TRUE, standardizes categorical values. Default is TRUE.
- keep_original
Logical. If TRUE, keeps original columns alongside standardized ones. Default is FALSE.
- backend
Character string specifying the data processing backend. Use "arrow" for out-of-memory, lazy processing (recommended for large datasets), or "tibble" for in-memory processing (recommended for small to medium datasets).
"arrow": operations are performed lazily using the Apache Arrow engine, avoiding loading the full dataset into memory. Ideal for large files (e.g., Parquet, Feather) and high-performance workflows.
"tibble": data is fully loaded into memory as a tibble and processed eagerly using dplyr. Simpler and more predictable, but may be slow or fail for large datasets.
If not specified, the function may automatically choose the backend based on the input data type.
- verbose
Logical. If TRUE, prints a report of standardization actions. Default is TRUE.
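The difference between the two backend values described above can be illustrated with a minimal sketch. It assumes the arrow and dplyr packages are installed; the data frame and its columns (uf, age) are hypothetical, and the code shows only the general lazy-versus-eager behavior of Arrow and dplyr, not how sus_data_standardize() is implemented internally.

```r
library(arrow)
library(dplyr)

# Hypothetical toy data; real SUS data has many more columns
df <- tibble(
  uf  = c("SP", "RJ", "SP"),
  age = c(34, 67, 72)
)

# Eager ("tibble"-style): the filter runs immediately on in-memory data
eager <- df |> filter(age >= 60)

# Lazy ("arrow"-style): the filter is only recorded as a query plan
lazy <- arrow_table(df) |> filter(age >= 60)

# Nothing is computed until collect() materializes the result as a tibble
result <- collect(lazy)
```

With a lazy query, several steps can be chained before any data is pulled into memory, which is why it tends to suit large Parquet or Feather inputs.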
Details
The function builds upon the preprocessing done by microdatasus, adding an
additional layer of standardization specifically designed for climate-health
research workflows.
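As a rough illustration of what standardizing column names and categorical values involves, the following base R sketch applies two lookup tables to a toy data frame. Everything in it is an assumption made for illustration: the column names, the dictionaries, and the helper standardize_sketch() are hypothetical and do not reflect the package's actual dictionaries or implementation.

```r
# Hypothetical raw data; column names and values are illustrative only
df_raw <- data.frame(
  SEXO = c("Masculino", "Feminino", "Masculino"),
  IDADEanos = c(34, 67, 72),
  stringsAsFactors = FALSE
)

# Hypothetical dictionaries (not the package's real ones)
column_dict <- c(SEXO = "sex", IDADEanos = "age_years")
value_dict <- list(sex = c(Masculino = "Male", Feminino = "Female"))

standardize_sketch <- function(df, column_dict, value_dict) {
  # Rename only the columns that appear in the column dictionary
  hits <- names(df) %in% names(column_dict)
  names(df)[hits] <- unname(column_dict[names(df)[hits]])

  # Recode categorical values, leaving unmapped values untouched
  for (col in intersect(names(value_dict), names(df))) {
    map <- value_dict[[col]]
    values <- as.character(df[[col]])
    known <- values %in% names(map)
    values[known] <- unname(map[values[known]])
    df[[col]] <- values
  }
  df
}

df_std <- standardize_sketch(df_raw, column_dict, value_dict)
```

Keeping the dictionaries separate from the renaming logic is one way to support several output languages: a different dictionary can be swapped in for each value of lang.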
References
Brazilian Ministry of Health. DATASUS. http://datasus.saude.gov.br
SALDANHA, Raphael de Freitas; BASTOS, Ronaldo Rocha; BARCELLOS, Christovam. Microdatasus: pacote para download e pré-processamento de microdados do Departamento de Informática do SUS (DATASUS). Cad. Saúde Pública, Rio de Janeiro, v. 35, n. 9, e00032419, 2019. Available from https://doi.org/10.1590/0102-311x00032419.
Examples
if (FALSE) { # \dontrun{
# Standardize to English
df_en <- sus_data_standardize(df_raw, lang = "en")
# Standardize to Portuguese (default)
df_pt <- sus_data_standardize(df_raw, lang = "pt")
# Standardize to Spanish
df_es <- sus_data_standardize(df_raw, lang = "es")
# Keep original columns for comparison
df_both <- sus_data_standardize(
  df_raw,
  lang = "pt",
  keep_original = TRUE
)
# Only translate column names (not values)
df_cols_only <- sus_data_standardize(
  df_raw,
  lang = "en",
  translate_columns = TRUE,
  standardize_values = FALSE
)
# Complete pipeline
df_analysis_ready <- sus_data_import(uf = "SP", year = 2023, system = "SIM-DO") |>
  sus_data_clean_encoding() |>
  sus_data_standardize(lang = "pt")
} # }