Standardize SUS data column names and values
Source: R/sus_data_standardize.R
This function standardizes column names and categorical values in SUS datasets, ensuring consistency across different years and versions. It supports three languages: English (en), Portuguese (pt), and Spanish (es).
Usage
sus_data_standardize(
  df,
  lang = "pt",
  translate_columns = TRUE,
  standardize_values = TRUE,
  keep_original = FALSE,
  verbose = TRUE
)
Arguments
- df
A data.frame or tibble to be standardized (typically the output of sus_data_import()).
- lang
Character. Output language for column names and values. Options: "en" (English), "pt" (Portuguese, the default), or "es" (Spanish).
- translate_columns
Logical. If TRUE, translates column names into the language set by lang. Default is TRUE.
- standardize_values
Logical. If TRUE, standardizes categorical values into the language set by lang. Default is TRUE.
- keep_original
Logical. If TRUE, keeps original columns alongside standardized ones. Default is FALSE.
- verbose
Logical. If TRUE, prints a report of standardization actions. Default is TRUE.
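To make the interaction of translate_columns, standardize_values and keep_original concrete, here is a minimal base-R sketch. It is not the package's implementation: toy_col_dict, toy_val_dict and toy_standardize() are hypothetical, cover only two columns (SEXO, IDADE), and the target names, labels and the "_original" suffix are illustrative assumptions rather than what sus_data_standardize() actually produces.

```r
# HYPOTHETICAL dictionaries for illustration only; the real package mappings
# and target names may differ.
toy_col_dict <- list(
  pt = c(SEXO = "sexo", IDADE = "idade"),
  en = c(SEXO = "sex", IDADE = "age"),
  es = c(SEXO = "sexo", IDADE = "edad")
)
toy_val_dict <- list(
  pt = c(Masculino = "Masculino", Feminino = "Feminino"),
  en = c(Masculino = "Male", Feminino = "Female"),
  es = c(Masculino = "Masculino", Feminino = "Femenino")
)

# Hypothetical stand-in that mimics the three flags of sus_data_standardize()
toy_standardize <- function(df, lang = "pt", translate_columns = TRUE,
                            standardize_values = TRUE, keep_original = FALSE) {
  lang <- match.arg(lang, c("pt", "en", "es"))
  out <- df
  if (standardize_values && "SEXO" %in% names(out)) {
    old <- as.character(out$SEXO)
    new <- unname(toy_val_dict[[lang]][old])
    # Values missing from the dictionary are kept unchanged
    out$SEXO <- ifelse(is.na(new), old, new)
  }
  if (translate_columns) {
    # Rename only the columns that have a dictionary entry
    map <- toy_col_dict[[lang]]
    hit <- names(out) %in% names(map)
    names(out)[hit] <- unname(map[names(out)[hit]])
  }
  if (keep_original) {
    # Append untouched copies of the input columns (suffix is an assumption)
    names(df) <- paste0(names(df), "_original")
    out <- cbind(out, df)
  }
  out
}

raw <- data.frame(SEXO = c("Masculino", "Feminino"), IDADE = c(34L, 71L),
                  stringsAsFactors = FALSE)
en <- toy_standardize(raw, lang = "en")
both <- toy_standardize(raw, lang = "en", keep_original = TRUE)
cols_only <- toy_standardize(raw, lang = "en", standardize_values = FALSE)
```

In this sketch, standardize_values = FALSE leaves the labels untouched while still renaming the columns, mirroring the "Only translate column names" example further down this page.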
Details
The function builds on the preprocessing performed by the microdatasus package,
adding a further layer of standardization designed specifically for
climate-health research workflows.
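One goal of that extra layer is making extracts from different years line up. The base-R sketch below illustrates the idea by mapping several source spellings of a column to one target name so that yearly data frames can be stacked with rbind(). It is hypothetical: harmonize_names() and toy_aliases are invented for illustration, and the *_PAC spellings are not documented DATASUS variants.

```r
# HYPOTHETICAL alias table: several source spellings map to one target name.
# The *_PAC spellings are invented for this example.
toy_aliases <- c(SEXO = "sexo", SEXO_PAC = "sexo",
                 IDADE = "idade", IDADE_PAC = "idade")

harmonize_names <- function(df, aliases = toy_aliases) {
  hit <- names(df) %in% names(aliases)
  names(df)[hit] <- unname(aliases[names(df)[hit]])
  df
}

# Two toy yearly extracts whose column names disagree
y2019 <- data.frame(SEXO = "Feminino", IDADE = 50L, stringsAsFactors = FALSE)
y2023 <- data.frame(SEXO_PAC = "Masculino", IDADE_PAC = 42L,
                    stringsAsFactors = FALSE)

# After harmonization, rbind() can match the columns by name
stacked <- rbind(harmonize_names(y2019), harmonize_names(y2023))
```

A fuller implementation would also need to handle a file that contains two aliases of the same column; this sketch does not.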
References
Brazilian Ministry of Health. DATASUS. http://datasus.saude.gov.br
SALDANHA, Raphael de Freitas; BASTOS, Ronaldo Rocha; BARCELLOS, Christovam. Microdatasus: pacote para download e pre-processamento de microdados do Departamento de Informatica do SUS (DATASUS). Cad. Saude Publica, Rio de Janeiro, v. 35, n. 9, e00032419, 2019. Available from https://doi.org/10.1590/0102-311x00032419.
Examples
if (FALSE) { # \dontrun{
# Standardize to English
df_en <- sus_data_standardize(df_raw, lang = "en")
# Standardize to Portuguese (the default)
df_pt <- sus_data_standardize(df_raw, lang = "pt")
# Standardize to Spanish
df_es <- sus_data_standardize(df_raw, lang = "es")
# Keep original columns for comparison
df_both <- sus_data_standardize(
  df_raw,
  lang = "pt",
  keep_original = TRUE
)
# Only translate column names (not values)
df_cols_only <- sus_data_standardize(
  df_raw,
  lang = "en",
  translate_columns = TRUE,
  standardize_values = FALSE
)
# Complete pipeline
df_analysis_ready <- sus_data_import(uf = "SP", year = 2023, system = "SIM-DO") |>
  sus_data_clean_encoding() |>
  sus_data_standardize(lang = "pt")
} # }