This function acts as a wrapper for microdatasus::fetch_datasus, simplifying the download and reading of data from Brazilian public health information systems (SIM, SINAN, SIH, SIA, CNES, SINASC). It includes parallel processing, caching, and user-friendly CLI feedback.

Usage

sus_data_import(
  uf = NULL,
  region = NULL,
  year,
  month = NULL,
  system,
  use_cache = TRUE,
  cache_dir = "~/.climasus4r_cache/data",
  force_redownload = FALSE,
  parallel = FALSE,
  workers = 4,
  lang = "pt",
  verbose = TRUE
)

Arguments

uf

A string or vector of strings with state abbreviations (e.g., "AM", c("SP", "RJ")). Ignored if region is provided. Valid UF codes: AC, AL, AP, AM, BA, CE, DF, ES, GO, MA, MT, MS, MG, PA, PB, PR, PE, PI, RJ, RN, RS, RO, RR, SC, SP, SE, TO.
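As an illustration of the constraint above, the following is a minimal sketch of how a uf value could be checked against the 27 valid codes before any download is attempted. The helper name check_uf and the case-insensitive matching are assumptions made for this example; they are not part of the package's documented API.

```r
# Hypothetical pre-flight check; not the package's internal validation.
valid_ufs <- c(
  "AC", "AL", "AP", "AM", "BA", "CE", "DF", "ES", "GO", "MA", "MT", "MS",
  "MG", "PA", "PB", "PR", "PE", "PI", "RJ", "RN", "RS", "RO", "RR", "SC",
  "SP", "SE", "TO"
)

check_uf <- function(uf) {
  # Assumption: input is normalised to upper case before matching
  uf <- toupper(trimws(uf))
  bad <- setdiff(uf, valid_ufs)
  if (length(bad) > 0) {
    stop("Invalid UF code(s): ", paste(bad, collapse = ", "))
  }
  uf
}

check_uf(c("sp", "RJ"))
```

Running a check like this first surfaces a typo immediately rather than after a partial download.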

region

A string indicating a predefined group of states (supports multilingual names PT, EN, ES). Available regions:

IBGE Macro-regions:

  • "norte": c("AC", "AP", "AM", "PA", "RO", "RR", "TO")

  • "nordeste": c("AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE")

  • "centro_oeste": c("DF", "GO", "MT", "MS")

  • "sudeste": c("ES", "MG", "RJ", "SP")

  • "sul": c("PR", "RS", "SC")

Biomes (Ecological Borders):

  • "amazonia_legal": c("AC", "AP", "AM", "PA", "RO", "RR", "MT", "MA", "TO")

  • "mata_atlantica": c("AL", "BA", "CE", "ES", "GO", "MA", "MG", "MS", "PB", "PE", "PI", "PR", "RJ", "RN", "RS", "SC", "SE", "SP")

  • "caatinga": c("AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE", "MG")

  • "cerrado": c("BA", "DF", "GO", "MA", "MG", "MS", "MT", "PA", "PI", "PR", "RO", "SP", "TO")

  • "pantanal": c("MT", "MS")

  • "pampa": c("RS")

Hydrography & Climate:

  • "bacia_amazonia": c("AC", "AM", "AP", "MT", "PA", "RO", "RR")

  • "bacia_sao_francisco": c("AL", "BA", "DF", "GO", "MG", "PE", "SE")

  • "bacia_parana": c("GO", "MG", "MS", "PR", "SP")

  • "bacia_tocantins": c("GO", "MA", "PA", "TO")

  • "semi_arido": c("AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE", "MG")

Health, Agriculture & Geopolitics:

  • "matopiba": c("MA", "TO", "PI", "BA")

  • "arco_desmatamento": c("RO", "AC", "AM", "PA", "MT", "MA")

  • "dengue_hyperendemic": c("GO", "MS", "MT", "PR", "RJ", "SP")

  • "sudene": c("AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE", "MG", "ES")

  • "fronteira_brasil": c("AC", "AM", "AP", "MT", "MS", "PA", "PR", "RO", "RR", "RS", "SC")
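The precedence rule between uf and region can be sketched as a simple lookup. The code below is only an illustration under stated assumptions: region_lookup holds three of the regions listed above, and resolve_states is a hypothetical helper, not a function exported by the package.

```r
# Hypothetical lookup table with a subset of the documented regions
region_lookup <- list(
  sul      = c("PR", "RS", "SC"),
  pantanal = c("MT", "MS"),
  matopiba = c("MA", "TO", "PI", "BA")
)

# Hypothetical helper: 'region' takes precedence over 'uf' when both are given
resolve_states <- function(uf = NULL, region = NULL) {
  if (!is.null(region)) {
    key <- tolower(region)
    if (!key %in% names(region_lookup)) {
      stop("Unknown region: ", region)
    }
    return(region_lookup[[key]])
  }
  toupper(uf)
}

resolve_states(uf = "AM", region = "sul")  # 'uf' is ignored here
resolve_states(uf = "am")
```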

year

An integer or vector of integers with the desired years (4 digits).

month

An integer or vector of integers with the desired months (1-12). This argument is only used with monthly-based health information systems: SIH, CNES, and SIA. For annual systems (SIM, SINAN, SINASC), this parameter is ignored.
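To make the monthly versus annual distinction concrete, here is a rough sketch of how a request could expand into individual download tasks. The helper build_requests is hypothetical, and the choice to expand a missing month to all twelve months is an assumption of this example, not documented behaviour.

```r
# Families documented above as monthly-based
monthly_families <- c("SIH", "CNES", "SIA")

# Hypothetical helper: one row per file that would need to be fetched
build_requests <- function(uf, year, system, month = NULL) {
  family <- sub("-.*$", "", system)  # "SIH-RD" -> "SIH"
  if (family %in% monthly_families) {
    # Assumption: a missing 'month' means the whole year
    if (is.null(month)) month <- 1:12
    expand.grid(uf = uf, year = year, month = month,
                stringsAsFactors = FALSE)
  } else {
    # Annual systems: 'month' is ignored
    expand.grid(uf = uf, year = year, month = NA_integer_,
                stringsAsFactors = FALSE)
  }
}

nrow(build_requests("SP", 2024, "SIH-RD", month = 1:6))
nrow(build_requests(c("SP", "MG"), 2023, "SINAN-DENGUE", month = 1:6))
```

Under these assumptions the first call yields one task per month, while the second yields one task per state because month is dropped for SINAN.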

system

A string indicating the information system. Available systems:

Mortality Systems (SIM - Mortality Information System):

  • "SIM-DO": Death certificates (Declaracoes de Obito) - Complete dataset

  • "SIM-DOFET": Fetal deaths (Obitos Fetais)

  • "SIM-DOEXT": External causes deaths (Obitos por Causas Externas)

  • "SIM-DOINF": Infant deaths (Obitos Infantis)

  • "SIM-DOMAT": Maternal deaths (Obitos Maternos)

Hospitalization Systems (SIH - Hospital Information System):

  • "SIH-RD": Reduced Hospital Admission Authorizations (AIH Reduzida)

  • "SIH-RJ": Rejected Hospital Admission Authorizations (AIH Rejeitadas)

  • "SIH-SP": Professional Services (Servicos Profissionais)

  • "SIH-ER": Rejected Hospital Admission Authorizations with error codes (AIH Rejeitadas com codigo de erro)

Notifiable Diseases (SINAN - Notifiable Diseases Information System):

  • "SINAN-DENGUE": Dengue fever cases

  • "SINAN-CHIKUNGUNYA": Chikungunya cases

  • "SINAN-ZIKA": Zika virus cases

  • "SINAN-MALARIA": Malaria cases

  • "SINAN-CHAGAS": Chagas disease cases

  • "SINAN-LEISHMANIOSE-VISCERAL": Visceral leishmaniasis cases

  • "SINAN-LEISHMANIOSE-TEGUMENTAR": Cutaneous leishmaniasis cases

  • "SINAN-LEPTOSPIROSE": Leptospirosis cases

Outpatient Systems (SIA - Outpatient Information System):

  • "SIA-AB": APAC for Bariatric Surgery (APAC de Cirurgia Bariatrica)

  • "SIA-ABO": APAC for Post-Bariatric Surgery Follow-up (APAC de Acompanhamento Pos Cirurgia Bariatrica)

  • "SIA-ACF": APAC for Fistula Creation (APAC de Confeccao de Fistula)

  • "SIA-AD": APAC for Miscellaneous Reports (APAC de Laudos Diversos)

  • "SIA-AN": APAC for Nephrology (APAC de Nefrologia)

  • "SIA-AM": APAC for Medications (APAC de Medicamentos)

  • "SIA-AQ": APAC for Chemotherapy (APAC de Quimioterapia)

  • "SIA-AR": APAC for Radiotherapy (APAC de Radioterapia)

  • "SIA-ATD": APAC for Dialysis Treatment (APAC de Tratamento Dialitico)

  • "SIA-PA": Outpatient Production (Producao Ambulatorial)

  • "SIA-PS": Psychosocial Care Records (RAAS Psicossocial)

  • "SIA-SAD": Home Care Records (RAAS de Atencao Domiciliar)

Health Establishments (CNES - National Health Establishment Registry):

  • "CNES-LT": Beds (Leitos)

  • "CNES-ST": Establishments (Estabelecimentos)

  • "CNES-DC": Complementary Data (Dados Complementares)

  • "CNES-EQ": Equipment (Equipamentos)

  • "CNES-SR": Specialized Services (Servicos Especializados)

  • "CNES-HB": Accreditations (Habilitacao)

  • "CNES-PF": Health Professionals (Profissional)

  • "CNES-EP": Health Teams (Equipes)

  • "CNES-RC": Contractual Rules (Regra Contratual)

  • "CNES-IN": Incentives (Incentivos)

  • "CNES-EE": Teaching Establishments (Estabelecimento de Ensino)

  • "CNES-EF": Philanthropic Establishments (Estabelecimento Filantropico)

  • "CNES-GM": Management and Goals (Gestao e Metas)

Live Births (SINASC - Live Birth Information System):

  • "SINASC": Live Birth Declarations (Declaracoes de Nascidos Vivos)

use_cache

Logical. If TRUE (default), will use cached data to avoid re-downloads. Cache is based on UF, year, month, and system parameters.

cache_dir

Character. Directory to store cached files. Default is "~/.climasus4r_cache/data".

force_redownload

Logical. If TRUE, ignores cache and re-downloads everything. Useful when you suspect cached data is corrupted or outdated.

parallel

Logical. If TRUE, uses parallel processing for multiple UF/year combinations, which can significantly speed up bulk downloads. Default is FALSE.

workers

Integer. Number of parallel workers to use. Default is 4. Set to 1 to disable parallel processing.

lang

Character string specifying the language for variable labels and messages. Options: "en" (English), "pt" (Portuguese, default), "es" (Spanish).

verbose

Logical. If TRUE (default), prints detailed progress information including cache status, download progress, and time estimates.

Value

A tibble (or data.frame) with the requested data, combining multiple UFs/years when requested. The output includes:

  • All original variables from the DATASUS system

  • Additional metadata columns: source_system, download_timestamp

  • Standardized date formats (Date objects instead of strings)

  • UTF-8 encoded character variables

Note: Large datasets (especially SIA and SIH) may require significant memory (1GB+ for national annual data).

Details

Data Sources

All data is sourced from the Brazilian Ministry of Health's DATASUS portal (http://datasus.saude.gov.br).

Caching System

The cache uses SHA-256 hashing of parameters to create unique cache keys. Cached files are stored as compressed RDS files and include metadata about the download date and parameter combination. Cache is automatically invalidated after 30 days for dynamic systems (CNES, SIA, SIH) and 365 days for static systems (SIM, SINAN, SINASC).
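The expiry rule described above can be approximated in base R as follows. This is a sketch under stated assumptions, not the package's implementation: cache_ttl_days and cache_is_fresh are hypothetical helpers, freshness is judged from the file's modification time rather than from stored metadata, and the SHA-256 key derivation is not shown.

```r
# Hypothetical helper: time-to-live in days, by system family
cache_ttl_days <- function(system) {
  family <- sub("-.*$", "", system)  # "SIH-RD" -> "SIH"
  if (family %in% c("CNES", "SIA", "SIH")) 30 else 365
}

# Hypothetical helper: TRUE if the cached file exists and has not expired
cache_is_fresh <- function(path, system, now = Sys.time()) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  # Assumption: file modification time stands in for the download date
  age_days <- as.numeric(difftime(now, file.mtime(path), units = "days"))
  age_days <= cache_ttl_days(system)
}

# Write a compressed RDS file to a temporary location and test it
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), path, compress = TRUE)

cache_is_fresh(path, "SIH-RD")
cache_is_fresh(path, "SIH-RD", now = Sys.time() + 31 * 24 * 3600)
cache_is_fresh(path, "SIM-DO", now = Sys.time() + 31 * 24 * 3600)
```

With the clock moved forward 31 days, the SIH entry counts as expired while the SIM entry remains valid, which mirrors the 30-day and 365-day windows.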

Parallel Processing

When downloading data for multiple states or years, parallel processing can reduce download time by up to 70%. The function uses future.apply internally. For large downloads (>100 files), consider increasing workers up to 8 (if your system has sufficient cores and memory).
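The general pattern of fanning out one task per UF/year combination with future.apply might look like the sketch below. It is illustrative only: fetch_one is a stand-in that returns a one-row data frame instead of downloading anything, and the sequential fallback is an assumption of this example rather than documented behaviour of sus_data_import.

```r
# One task per UF/year combination
tasks <- expand.grid(uf = c("SP", "MG"), year = 2021:2022,
                     stringsAsFactors = FALSE)

# Stand-in for a real download; returns a tiny data frame per task
fetch_one <- function(i, tasks) {
  data.frame(uf = tasks$uf[i], year = tasks$year[i])
}

if (requireNamespace("future.apply", quietly = TRUE)) {
  # Run tasks on two background R sessions
  future::plan(future::multisession, workers = 2)
  parts <- future.apply::future_lapply(seq_len(nrow(tasks)), fetch_one,
                                       tasks = tasks)
  future::plan(future::sequential)  # restore the default plan
} else {
  # Assumption: fall back to sequential processing if the package is missing
  parts <- lapply(seq_len(nrow(tasks)), fetch_one, tasks = tasks)
}

# Combine the per-task results into one data frame
combined <- do.call(rbind, parts)
nrow(combined)
```

Each worker holds its own copy of the data it downloads, so raising the worker count also raises peak memory use.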

References

Brazilian Ministry of Health. DATASUS. http://datasus.saude.gov.br

SALDANHA, Raphael de Freitas; BASTOS, Ronaldo Rocha; BARCELLOS, Christovam. Microdatasus: pacote para download e pre-processamento de microdados do Departamento de Informatica do SUS (DATASUS). Cad. Saude Publica, Rio de Janeiro, v. 35, n. 9, e00032419, 2019. Available from https://doi.org/10.1590/0102-311x00032419.

Examples

if (FALSE) { # \dontrun{
# Basic example: Mortality data for Rio de Janeiro in 2022
df_sim <- sus_data_import(
  uf = "RJ", 
  year = 2022, 
  system = "SIM-DO",
  use_cache = TRUE
)

# Dengue cases for two states with parallel processing
df_dengue <- sus_data_import(
  uf = c("SP", "MG"), 
  year = 2023, 
  system = "SINAN-DENGUE",
  parallel = TRUE,
  workers = 3
)

# Hospitalizations with monthly specification
df_hospital <- sus_data_import(
  uf = "SP",
  year = 2024,
  month = 1:6,  # January to June
  system = "SIH-RD",
  verbose = TRUE
)

# Force re-download ignoring cache
df_births <- sus_data_import(
  uf = "BA",
  year = 2020:2022,
  system = "SINASC",
  use_cache = TRUE,
  force_redownload = TRUE  # Refresh cached data
)
} # }