Aggregates individual-level health data into time series counts by specified time units and grouping variables. The output is ready for time series analysis, distributed lag non-linear models (DLNM), and other temporal epidemiological methods.
Usage
sus_data_aggregate(
df,
time_unit = "day",
fun = "count",
group_by = NULL,
value_col = NULL,
complete_dates = FALSE,
date_col = NULL,
lang = "pt",
verbose = TRUE
)
Arguments
- df
A data frame containing health data (output from sus_data_standardize() or sus_data_filter*()).
- time_unit
Character string specifying the temporal aggregation unit. Standard units: "day", "week", "month", "quarter", "year". Multi-day/week/month units: "2 days", "5 days" (pentads), "14 days" (fortnightly), "3 months" (trimester), "6 months" (semester). Special: "season" (Brazilian seasons: DJF, MAM, JJA, SON). Default is "day".
- fun
Character string, or named list of function names, specifying the aggregation function(s). Options: "count" (default), "sum", "mean", "median", "min", "max", "sd", "q25" (25th percentile), "q75", "q95", and "q99" (75th, 95th, and 99th percentiles). Can also be a named list for multiple aggregations, e.g., list(mean_temp = "mean", max_temp = "max").
- group_by
Character vector with the names of the columns to group by (e.g., c("sex", "age_group", "race")). If NULL (default), aggregates across "municipality_code" records.
- value_col
Character string with the name of the column to aggregate when fun is not "count". Required for "sum", "mean", etc. For example, "temperature", "precipitation", or "pm25".
- complete_dates
Logical. If TRUE, fills in missing time periods with zero counts so that the time series has no gaps. Default is FALSE.
- date_col
Character string with the name of the date column to use for aggregation. If NULL (default), the function attempts to auto-detect the date column based on common patterns.
- lang
Character string specifying the language for messages. Options: "en" (English), "pt" (Portuguese, default), "es" (Spanish).
- verbose
Logical. If TRUE (default), prints progress messages.
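How fun = "count", group_by, and complete_dates interact can be illustrated with a small base-R sketch. It is not the package's internal code: count_daily() is a hypothetical helper, and it assumes that gap filling covers every calendar day between the first and last record, crossed with every observed group level.

```r
# Hypothetical helper illustrating fun = "count" with one grouping variable.
# complete_dates = TRUE keeps zero-count days; FALSE keeps only observed cells.
count_daily <- function(dates, group, complete_dates = TRUE) {
  # Use every calendar day in the observed range as a factor level, so that
  # days without records still appear in the cross-tabulation
  all_days <- as.character(seq(min(dates), max(dates), by = "day"))
  tab <- table(date = factor(as.character(dates), levels = all_days),
               group = group)
  out <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
  out$date <- as.Date(out$date)
  # Without completion, drop the date/group cells that had no records
  if (!complete_dates) out <- out[out$n > 0, ]
  rownames(out) <- NULL
  out
}

# Hypothetical individual-level records: three deaths on two distinct days
death_dates <- as.Date(c("2023-01-01", "2023-01-01", "2023-01-04"))
death_sex <- c("F", "M", "F")

filled <- count_daily(death_dates, death_sex)                          # 4 days x 2 groups
sparse <- count_daily(death_dates, death_sex, complete_dates = FALSE)  # observed cells only
```

The zero rows for 2 and 3 January are what make the filled series usable by models that expect one observation per period.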
Value
A tibble with aggregated data containing:
- date: the aggregated date (start of the period);
- grouping columns (if group_by was specified);
- aggregated value column(s), named automatically according to the health system and the aggregation function.
Details
New Features:
Multiple aggregation functions: Beyond counting, you can now calculate mean, sum, median, percentiles, etc., useful for climate and environmental data.
Smart column naming: The aggregated column is automatically named based on the health system (e.g., n_deaths for SIM, n_hospitalizations for SIH-RD, n_births for SINASC, n_cases for SINAN, n_procedures for SIA, and n_establishments for CNES).
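The named-list form of fun pairs each output column name with one summary statistic. The base-R sketch below mimics that pattern for a numeric column summarised per date. The translation of "q25", "q75", "q95", and "q99" into quantile() with R's default (type 7) definition is an assumption, since the package's percentile method is not documented here, and summarise_by_date() and stat_funs are hypothetical names.

```r
# Assumed mapping from the documented fun names to base-R summaries.
# The q* entries use quantile() with R's default (type 7) definition;
# the package may compute percentiles differently.
stat_funs <- list(
  sum    = function(v) sum(v, na.rm = TRUE),
  mean   = function(v) mean(v, na.rm = TRUE),
  median = function(v) median(v, na.rm = TRUE),
  min    = function(v) min(v, na.rm = TRUE),
  max    = function(v) max(v, na.rm = TRUE),
  sd     = function(v) sd(v, na.rm = TRUE),
  q25    = function(v) unname(quantile(v, 0.25, na.rm = TRUE)),
  q75    = function(v) unname(quantile(v, 0.75, na.rm = TRUE)),
  q95    = function(v) unname(quantile(v, 0.95, na.rm = TRUE)),
  q99    = function(v) unname(quantile(v, 0.99, na.rm = TRUE))
)

# Hypothetical helper: one output column per entry of the named list `fun`
summarise_by_date <- function(dates, values, fun) {
  by_day <- split(values, dates)  # one numeric vector per date
  out <- data.frame(date = as.Date(names(by_day)))
  for (col in names(fun)) {
    f <- stat_funs[[fun[[col]]]]
    out[[col]] <- vapply(by_day, f, numeric(1), USE.NAMES = FALSE)
  }
  out
}

# Hypothetical temperature readings: two on 1 January, one on 2 January
obs_dates <- as.Date(c("2023-01-01", "2023-01-01", "2023-01-02"))
temperature <- c(20, 30, 25)

daily_temp <- summarise_by_date(
  obs_dates, temperature,
  fun = list(mean_temp = "mean", max_temp = "max", q25_temp = "q25")
)
print(daily_temp)
```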
Epidemiological Use Cases:
Daily/Weekly: Standard time series analysis, DLNM for short-term effects
Pentads (5 days): Heat wave analysis, smoothing daily noise
Fortnightly (14 days): Diseases with longer incubation periods
Monthly: Seasonal patterns, long-term trends
Quarterly: SUS management reports, policy evaluation
Seasonal: dengue, influenza, and respiratory diseases, aligned with the Brazilian climate
Yearly: Long-term trend analysis, climate change impacts
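The multi-day units listed above follow the same "<n> <unit>" pattern that base R's cut() method for Date objects accepts ("5 days", "14 days", "3 months"). The sketch below uses that base-R function, not climasus4r, to show how individual dates collapse into pentads labelled by the start of each period. For day-based breaks, cut() anchors the first period at the earliest date in the data; whether the package anchors its periods the same way is an assumption.

```r
# Hypothetical event dates spread over two weeks
event_dates <- as.Date(c("2023-01-01", "2023-01-02", "2023-01-06",
                         "2023-01-11", "2023-01-14"))

# Assign each date to a 5-day period (pentad). For day-based breaks,
# cut() starts the first period at the earliest date and labels each
# period by its start date.
pentad <- cut(event_dates, breaks = "5 days")

# Count records per pentad, the analogue of fun = "count"
pentad_counts <- as.data.frame(table(date = pentad), responseName = "n")
pentad_counts$date <- as.Date(as.character(pentad_counts$date))
print(pentad_counts)
```

With breaks = "week", cut() starts periods on Mondays by default (start.on.monday = TRUE), which may or may not match the package's week definition.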
Brazilian Seasons (when time_unit = "season"):
Summer (Verão): December-January-February (DJF)
Autumn (Outono): March-April-May (MAM)
Winter (Inverno): June-July-August (JJA)
Spring (Primavera): September-October-November (SON)
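Because the summer season (DJF) crosses the calendar-year boundary, December records must be grouped with the following January and February. The base-R sketch below shows one way to do this. br_season() is a hypothetical helper; labelling each season by its first day follows the "start of period" convention described under Value, but the package's own season labels may differ.

```r
# Hypothetical helper: assign each date to a Brazilian season and to the
# first day of that season (DJF, MAM, JJA, SON)
br_season <- function(dates) {
  month <- as.integer(format(dates, "%m"))
  year <- as.integer(format(dates, "%Y"))
  season_code <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
                   "JJA", "JJA", "SON", "SON", "SON", "DJF")
  start_month <- c(12L, 12L, 3L, 3L, 3L, 6L, 6L, 6L, 9L, 9L, 9L, 12L)
  # January and February belong to the summer that began the previous December
  start_year <- year - as.integer(month %in% c(1L, 2L))
  data.frame(
    date = dates,
    season = season_code[month],
    season_start = as.Date(sprintf("%d-%02d-01", start_year, start_month[month])),
    stringsAsFactors = FALSE
  )
}

s <- br_season(as.Date(c("2022-12-15", "2023-01-10", "2023-04-01",
                         "2023-07-20", "2023-10-05", "2023-12-31")))
# Count records per season, labelled by the season's first day
print(table(s$season_start))
```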
Examples
if (FALSE) { # \dontrun{
library(climasus4r)
# Import, standardize, and filter the data once; the examples below reuse df
df <- sus_data_import(uf = "SP", year = 2023, system = "SIM-DO") %>%
sus_data_standardize() %>%
sus_data_filter_cid(disease_group = "respiratory")
# Basic daily aggregation
df_daily <- sus_data_aggregate(df, time_unit = "day")
# Pentad aggregation (5-day periods) for heat wave analysis
df_pentad <- sus_data_aggregate(df, time_unit = "5 days")
# Fortnightly aggregation for diseases with longer incubation
df_fortnightly <- sus_data_aggregate(df, time_unit = "14 days")
# Monthly aggregation by race and sex
df_monthly <- sus_data_aggregate(
df,
time_unit = "month",
group_by = c("race", "sex"),
lang = "pt"
)
# Quarterly aggregation for SUS reports
df_quarterly <- sus_data_aggregate(df, time_unit = "quarter")
# Seasonal aggregation for dengue analysis (Brazilian seasons)
df_seasonal <- sus_data_aggregate(
df,
time_unit = "season"
)
# Weekly aggregation by age group and sex
df_weekly <- sus_data_aggregate(
df,
time_unit = "week",
group_by = c("age_group", "sex") #age_group comes from `sus_create_variables()`
)
} # }