
This function standardizes column names and categorical values in SUS datasets, ensuring consistency across different years and versions. It supports three languages: English (en), Portuguese (pt), and Spanish (es).

Usage

sus_data_standardize(
  df,
  lang = "pt",
  translate_columns = TRUE,
  standardize_values = TRUE,
  keep_original = FALSE,
  verbose = TRUE
)

Arguments

df

A data.frame or tibble to be standardized (typically output from sus_data_import()).

lang

Character. Output language for column names and values. Options: "en" (English), "pt" (Portuguese, the default), "es" (Spanish).

translate_columns

Logical. If TRUE, translates column names. Default is TRUE.

standardize_values

Logical. If TRUE, standardizes categorical values. Default is TRUE.

keep_original

Logical. If TRUE, keeps original columns alongside standardized ones. Default is FALSE.

verbose

Logical. If TRUE, prints a report of standardization actions. Default is TRUE.

Value

A data.frame with standardized column names and values in the specified language.

Details

The function builds on the preprocessing done by microdatasus, adding a further layer of standardization designed for climate-health research workflows.
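To make the idea of dictionary-based standardization concrete, here is a minimal base R sketch. It is not the package's implementation: the function `standardize_sketch()`, the lookup tables `col_dict` and `val_dict`, the translated names, and the `_original` suffix are all hypothetical and chosen only for illustration.

```r
# Hypothetical lookup tables (assumptions for illustration only;
# the package's real dictionaries are not documented on this page).
col_dict <- c(SEXO = "sex", IDADEanos = "age_years")
val_dict <- list(SEXO = c(Masculino = "Male", Feminino = "Female"))

# Illustrative helper, not part of the package.
standardize_sketch <- function(df, col_dict, val_dict, keep_original = FALSE) {
  out <- df
  # Recode categorical values column by column; values without a
  # dictionary entry are left unchanged.
  for (col in intersect(names(val_dict), names(out))) {
    old <- as.character(out[[col]])
    new <- unname(val_dict[[col]][old])
    if (keep_original) {
      # Hypothetical suffix for the preserved copy of the column.
      out[[paste0(col, "_original")]] <- out[[col]]
    }
    out[[col]] <- ifelse(is.na(new), old, new)
  }
  # Rename only the columns that have a dictionary entry.
  hit <- names(out) %in% names(col_dict)
  names(out)[hit] <- unname(col_dict[names(out)[hit]])
  out
}

df_raw <- data.frame(
  SEXO = c("Masculino", "Feminino", "Ignorado"),
  IDADEanos = c(34, 71, 5)
)
df_std <- standardize_sketch(df_raw, col_dict, val_dict)
```

In this sketch, values with no dictionary entry (here "Ignorado") pass through untranslated rather than becoming NA. That is a design choice of the example and may differ from how the package handles unmapped categories.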

References

Brazilian Ministry of Health. DATASUS. http://datasus.saude.gov.br

SALDANHA, Raphael de Freitas; BASTOS, Ronaldo Rocha; BARCELLOS, Christovam. Microdatasus: pacote para download e pre-processamento de microdados do Departamento de Informatica do SUS (DATASUS). Cad. Saude Publica, Rio de Janeiro, v. 35, n. 9, e00032419, 2019. Available from https://doi.org/10.1590/0102-311x00032419.

Examples

if (FALSE) { # \dontrun{
# Standardize to English
df_en <- sus_data_standardize(df_raw, lang = "en")

# Standardize to Portuguese (the default language)
df_pt <- sus_data_standardize(df_raw, lang = "pt")

# Standardize to Spanish
df_es <- sus_data_standardize(df_raw, lang = "es")

# Keep original columns for comparison
df_both <- sus_data_standardize(
  df_raw,
  lang = "pt",
  keep_original = TRUE
)

# Only translate column names (not values)
df_cols_only <- sus_data_standardize(
  df_raw,
  lang = "en",
  translate_columns = TRUE,
  standardize_values = FALSE
)

# Complete pipeline
df_analysis_ready <- sus_data_import(uf = "SP", year = 2023, system = "SIM-DO") |>
  sus_data_clean_encoding() |>
  sus_data_standardize(lang = "pt")
} # }