
Integrated Spatiotemporal Analyses of Health, Climate, and Environment in Brazil
climasus4r is an integrated R toolkit designed to streamline the analysis of health, climate, and environmental data in Brazil. Developed within the INCT Conexão – Amazônia project, it automates and standardizes critical steps in epidemiological and environmental research workflows, promoting reproducibility, efficiency, and scalability.
Built on the microdatasus package, climasus4r extends its functionality with specialized routines for climate and health studies, reducing the effort required for data acquisition, cleaning, integration, and preparation.
Installation
climasus4r is under active development. The latest version can be installed directly from GitHub, which gives access to the most up-to-date features. Installation requires the remotes package, which installs R packages hosted on GitHub.
```r
# Install remotes if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install climasus4r from GitHub
remotes::install_github("ByMaxAnjos/climasus4r", dependencies = TRUE, upgrade = "never")
```

📦 Function Overview
| Category | Function | Description |
|---|---|---|
| 📥 Import & Export | `sus_data_import()` | Imports and pre-processes DATASUS data with intelligent caching. |
| | `sus_data_read()` | Optimized reading of processed data with parallel support. |
| | `sus_data_export()` | Exports processed data while preserving metadata. |
| 🧹 Cleaning & Standardization | `sus_data_clean_encoding()` | Detects and corrects character encoding issues. |
| | `sus_data_standardize()` | Standardizes SUS data column names and values. |
| | `sus_create_variables()` | Creates derived variables for epidemiological analysis. |
| 🔍 Filters & Selection | `sus_data_filter_cid()` | Filters by ICD-10 codes or disease groups (multilingual). |
| | `sus_data_filter_demographics()` | Filters data by demographic variables (age, sex, race). |
| 🗺️ Spatial & Census | `sus_join_spatial()` | Links SUS data to Brazilian geographic boundaries. |
| | `sus_socio_add_census()` | Enriches health data with socioeconomic variables from the Census. |
| | `sus_data_aggregate()` | Aggregates health data into time series. |
| 📊 Quality & Metadata | `sus_data_quality_report()` | Generates detailed reports on data quality. |
| | `list_disease_groups()` | Lists available disease groups for filtering. |
| | `sus_census_explore()` | Interactive explorer for Census variables. |
| ⚡ Cache | `clear_climasus_cache()` | Manages and clears locally cached files. |
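The table does not say which metrics `sus_data_quality_report()` computes. As a rough, hypothetical illustration of one check such a report can include, the base-R sketch below computes per-column completeness (the share of non-missing values). The function `completeness()` and the toy columns are made up for this example and are not part of climasus4r.

```r
# Hypothetical sketch: share of non-missing values per column, one of the
# simplest data-quality indicators (not the output of sus_data_quality_report())
completeness <- function(df) {
  # is.na() on a data frame returns a logical matrix;
  # colMeans() of its negation gives the fraction of non-missing values per column
  round(colMeans(!is.na(df)), 3)
}

# Toy records with made-up column names (not real DATASUS fields)
toy <- data.frame(
  date = as.Date(c("2023-01-05", "2023-01-20", NA, "2023-02-11")),
  sex  = c("F", NA, "M", "F"),
  age  = c(72, 65, NA, NA)
)

completeness(toy)
# date: 0.75, sex: 0.75, age: 0.50
```

Low completeness in a key field (such as age here) is a signal to check the raw records before filtering or aggregating on that field.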
Supported Systems
climasus4r provides simplified and standardized access to major DATASUS information systems through integration with the microdatasus package. This integration automates the collection of raw data from various databases of the Brazilian health system, covering epidemiology, mortality, hospital admissions, and the healthcare network. From this data, climasus4r organizes, cleans, and structures the information, transforming complex DATASUS databases into datasets ready for statistical analysis and spatiotemporal studies.
1. SIM (Mortality Information System)
- "SIM-DO": Death Certificates (Complete Dataset)
- "SIM-DOFET": Fetal Deaths
- "SIM-DOEXT": Deaths from External Causes
- "SIM-DOINF": Infant Deaths
- "SIM-DOMAT": Maternal Deaths
2. SIH (Hospital Information System)
- "SIH-RD": AIH (Hospital Admission Authorizations), reduced file (main admissions dataset)
- "SIH-RJ": Rejected AIH
- "SIH-SP": AIH Professional Services
- "SIH-ER": Rejected AIH with Error Codes
3. SINAN (Notifiable Diseases Information System)
- "SINAN-DENGUE": Dengue cases
- "SINAN-CHIKUNGUNYA": Chikungunya cases
- "SINAN-ZIKA": Zika virus cases
- "SINAN-MALARIA": Malaria cases
- "SINAN-CHAGAS": Chagas disease cases
- "SINAN-LEISHMANIOSE-VISCERAL": Visceral Leishmaniasis
- "SINAN-LEISHMANIOSE-TEGUMENTAR": Tegumentary Leishmaniasis
- "SINAN-LEPTOSPIROSE": Leptospirosis cases
4. SIA (Ambulatory Information System)
(APAC: Authorization for High-Complexity Procedures; RAAS: Record of Outpatient Health Actions)
- "SIA-AB": APAC, Bariatric Surgery
- "SIA-ABO": APAC, Post-Bariatric Surgery Follow-up
- "SIA-ACF": APAC, Fistula Creation
- "SIA-AD": APAC, Miscellaneous Reports
- "SIA-AN": APAC, Nephrology
- "SIA-AM": APAC, Medications
- "SIA-AQ": APAC, Chemotherapy
- "SIA-AR": APAC, Radiotherapy
- "SIA-ATD": APAC, Dialysis Treatment
- "SIA-PA": Outpatient Production
- "SIA-PS": RAAS, Psychosocial Care
- "SIA-SAD": RAAS, Home Care
5. CNES (National Register of Health Establishments)
- "CNES-LT": Hospital Beds
- "CNES-ST": Establishments
- "CNES-DC": Complementary Data
- "CNES-EQ": Equipment
- "CNES-SR": Specialized Services
- "CNES-HB": Accreditations (Habilitações)
- "CNES-PF": Health Professionals
- "CNES-EP": Health Teams
- "CNES-RC": Contractual Rules
- "CNES-IN": Incentives
- "CNES-EE": Teaching Establishments
- "CNES-EF": Philanthropic Establishments
- "CNES-GM": Management and Goals
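Because `sus_data_import()` takes one of these codes through its `system` argument, a typo is easy to make. The base-R sketch below collects the codes listed above and checks a code before any download starts. `supported_systems` and `check_system()` are hypothetical helpers written for this example; climasus4r may already validate codes internally.

```r
# Codes listed above, grouped by information system (hypothetical helper data)
supported_systems <- list(
  SIM   = c("SIM-DO", "SIM-DOFET", "SIM-DOEXT", "SIM-DOINF", "SIM-DOMAT"),
  SIH   = c("SIH-RD", "SIH-RJ", "SIH-SP", "SIH-ER"),
  SINAN = c("SINAN-DENGUE", "SINAN-CHIKUNGUNYA", "SINAN-ZIKA", "SINAN-MALARIA",
            "SINAN-CHAGAS", "SINAN-LEISHMANIOSE-VISCERAL",
            "SINAN-LEISHMANIOSE-TEGUMENTAR", "SINAN-LEPTOSPIROSE"),
  SIA   = c("SIA-AB", "SIA-ABO", "SIA-ACF", "SIA-AD", "SIA-AN", "SIA-AM",
            "SIA-AQ", "SIA-AR", "SIA-ATD", "SIA-PA", "SIA-PS", "SIA-SAD"),
  CNES  = c("CNES-LT", "CNES-ST", "CNES-DC", "CNES-EQ", "CNES-SR", "CNES-HB",
            "CNES-PF", "CNES-EP", "CNES-RC", "CNES-IN", "CNES-EE", "CNES-EF",
            "CNES-GM")
)

# Hypothetical helper: stop early on an unknown code,
# otherwise return the information system it belongs to
check_system <- function(code) {
  stopifnot(is.character(code), length(code) == 1)
  all_codes <- unlist(supported_systems, use.names = FALSE)
  if (!code %in% all_codes) {
    stop("Unknown system code: ", code, call. = FALSE)
  }
  # The information system is the part before the first hyphen
  sub("-.*$", "", code)
}

check_system("SINAN-DENGUE")  # "SINAN"
```

Once the check passes, the same code can be passed on, for example as `sus_data_import(uf = "SP", year = 2023, system = "SINAN-DENGUE")`. Failing before the call avoids starting a long download with a misspelled code.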
Quick Start
```r
library(climasus4r)
library(dplyr)

# Complete pipeline: from raw DATASUS records to analysis-ready data
# (the native pipe |> requires R >= 4.1)
df_analysis <- sus_data_import(
  uf = "SP",          # state (federative unit)
  year = 2023,
  system = "SIM-DO"   # death certificates
) |>
  sus_data_clean_encoding(lang = "en") |>
  sus_data_standardize(lang = "en") |>
  sus_data_filter_cid(disease_group = "respiratory", lang = "en") |>
  sus_create_variables(create_age_groups = TRUE, lang = "en")
```

Data Infrastructure
The infrastructure phase of climasus4r provides an end-to-end pipeline for health data preparation, from raw acquisition to analysis-ready data. Its nine core functions take DATASUS records through encoding correction, standardization, filtering, quality checks, and aggregation, producing standardized time series ready for modeling.
```text
RAW DATA (DATASUS)
    ↓
[1] sus_data_import()               → Parallel acquisition
    ↓
[2] sus_data_clean_encoding()       → Encoding correction
    ↓
[3] sus_data_standardize()          → Multilingual standardization
    ↓
[4] sus_data_filter_cid()           → Filtering by disease
    ↓
[5] sus_create_variables()          → Variable creation
    ↓
[6] sus_data_filter_demographics()  → Demographic filtering
    ↓
[7] sus_data_quality_report()       → Quality verification
    ↓
[8] sus_data_aggregate()            → Temporal aggregation
    ↓
[9] sus_data_export()               → Export with metadata
    ↓
DATA READY FOR ANALYSIS
```

For more information, see the Tutorials and Complete Documentation.
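To make steps [4], [5], and [8] more concrete, the base-R sketch below reproduces their logic on a handful of made-up death records. It assumes that the "respiratory" group corresponds to ICD-10 Chapter X (codes J00-J99) and uses arbitrary age bands. The column names and age bands are hypothetical, and the actual rules applied by `sus_data_filter_cid()`, `sus_create_variables()`, and `sus_data_aggregate()` may differ.

```r
# Toy death records (made-up values; column names are hypothetical)
deaths <- data.frame(
  date  = as.Date(c("2023-01-03", "2023-01-17", "2023-01-28",
                    "2023-02-02", "2023-02-14", "2023-03-09")),
  icd10 = c("J189", "I219", "J441", "J189", "C349", "J960"),
  age   = c(81, 67, 74, 3, 59, 45)
)

# [4] Disease filter: keep ICD-10 Chapter X codes (J00-J99), i.e. codes starting with "J"
resp <- deaths[startsWith(deaths$icd10, "J"), ]

# [5] Derived variable: age groups as left-closed intervals [0,5), [5,60), [60,Inf)
resp$age_group <- cut(resp$age, breaks = c(0, 5, 60, Inf),
                      right = FALSE, labels = c("0-4", "5-59", "60+"))

# [8] Temporal aggregation: number of deaths per calendar month
resp$month <- format(resp$date, "%Y-%m")
monthly <- aggregate(icd10 ~ month, data = resp, FUN = length)
names(monthly)[names(monthly) == "icd10"] <- "deaths"

print(monthly)
# 2023-01: 2 deaths, 2023-02: 1, 2023-03: 1
```

Monthly counts are used here only because they are easy to check by hand; a daily or weekly series follows the same pattern with a different `format()` string or grouping key.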