Aggregates individual-level health data into time series counts by specified time units and grouping variables. This function prepares data for time series analysis, distributed lag non-linear models (DLNM), and other temporal epidemiological methods.

Usage

sus_data_aggregate(
  df,
  time_unit = "day",
  fun = "count",
  group_by = NULL,
  value_col = NULL,
  complete_dates = FALSE,
  date_col = NULL,
  lang = "pt",
  verbose = TRUE
)

Arguments

df

A data frame containing health data (output from sus_data_standardize(), or sus_data_filter*()).

time_unit

Character string specifying the temporal aggregation unit. Standard units: "day", "week", "month", "quarter", "year". Multi-day/week/month units: "2 days", "5 days" (pentads), "14 days" (fortnightly), "3 months" (trimester), "6 months" (semester). Special: "season" (Brazilian seasons: DJF, MAM, JJA, SON). Default is "day".
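
As a rough illustration of what a multi-day unit such as "5 days" means, the base R sketch below bins daily dates into consecutive 5-day periods and counts records per period. It does not call sus_data_aggregate(); the variable names are hypothetical, and anchoring the periods at the earliest date is an assumption, so the package may align periods differently.

```r
# Hypothetical event dates: one record per day for 12 days
event_dates <- as.Date("2023-01-01") + 0:11

# Assumption: periods are anchored at the earliest date in the data
origin <- min(event_dates)
days_since_origin <- as.integer(event_dates - origin)

# Start date of the 5-day period each record falls into
period_start <- origin + (days_since_origin %/% 5L) * 5L

# Count records per period, labelled by period start date
pentad_counts <- table(format(period_start))
print(pentad_counts)
```

The last period is shorter than 5 days here because the data end before the period does; partial periods like this are worth checking before modelling.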

fun

Character string or list of functions specifying the aggregation function(s). Options: "count" (default), "sum", "mean", "median", "min", "max", "sd", "q25" (25th percentile), "q75", "q95", and "q99". Can also be a named list for multiple aggregations, e.g., list(mean_temp = "mean", max_temp = "max").
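
To make the summary options concrete, the sketch below computes a few of them in base R on a hypothetical vector of PM2.5 values. It is only an approximation of what the named options might compute: the use of quantile() with its default interpolation and the removal of missing values are assumptions, not confirmed package behaviour.

```r
# Hypothetical PM2.5 readings falling in one aggregation period
pm25 <- c(8, 12, 15, 20, 35, 50, 9, 11)

# Assumed base R equivalents of some of the documented options
aggregations <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  max  = function(x) max(x, na.rm = TRUE),
  sd   = function(x) sd(x, na.rm = TRUE),
  q25  = function(x) quantile(x, probs = 0.25, na.rm = TRUE, names = FALSE),
  q95  = function(x) quantile(x, probs = 0.95, na.rm = TRUE, names = FALSE)
)

# Apply every summary to the same vector; the result is a named numeric vector
period_summary <- vapply(aggregations, function(f) f(pm25), numeric(1))
print(period_summary)
```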

group_by

Character vector with names of columns to group by (e.g., c("sex", "age_group", "race")). If NULL (default), aggregates across "municipality_code" records.

value_col

Character string with the name of the column to aggregate when using functions other than "count". Required for "sum", "mean", etc. For example, "temperature", "precipitation", "pm25".

complete_dates

Logical. If TRUE, fills in missing time periods with zero counts to create a complete time series without gaps. Default is FALSE, as shown in Usage.
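
The effect of filling gaps can be sketched in base R as follows. This is a minimal illustration on a hypothetical daily count table, not the package's implementation; the column name n_deaths follows the naming convention described under Details.

```r
# Hypothetical daily counts with no records on 2023-01-03 and 2023-01-04
daily_counts <- data.frame(
  date = as.Date(c("2023-01-01", "2023-01-02", "2023-01-05")),
  n_deaths = c(3L, 1L, 2L)
)

# Full sequence of days between the first and last observed date
all_days <- data.frame(
  date = seq(min(daily_counts$date), max(daily_counts$date), by = "day")
)

# Left join onto the full sequence, then replace missing counts with zero
completed <- merge(all_days, daily_counts, by = "date", all.x = TRUE)
completed$n_deaths[is.na(completed$n_deaths)] <- 0L
print(completed)
```

Explicit zeros matter for count models: a day with no events is an observation of zero, not a missing value.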

date_col

Character string with the name of the date column to use for aggregation. If NULL (default), the function will attempt to auto-detect the date column based on common patterns.

lang

Character string specifying the language for messages. Options: "en" (English), "pt" (Portuguese, default), "es" (Spanish).

verbose

Logical. If TRUE (default), prints progress messages.

Value

A tibble with aggregated data containing:

  • date: The aggregated date (start of period)

  • Grouping columns (if group_by was specified)

  • Aggregated value column(s) with smart names based on system and function

Details

New Features:

  • Multiple aggregation functions: Beyond counting, you can now calculate mean, sum, median, percentiles, etc., useful for climate and environmental data.

  • Smart column naming: The aggregated column is automatically named based on the health system (e.g., n_deaths for SIM, n_hospitalizations for SIH-RD, n_births for SINASC, n_cases for SINAN, n_procedures for SIA, and n_establishments for CNES).

Epidemiological Use Cases:

  • Daily/Weekly: Standard time series analysis, DLNM for short-term effects

  • Pentads (5 days): Heat wave analysis, smoothing daily noise

  • Fortnightly (14 days): Diseases with longer incubation periods

  • Monthly: Seasonal patterns, long-term trends

  • Quarterly: SUS management reports, policy evaluation

  • Seasonal: Dengue, influenza, and other respiratory diseases aligned with Brazilian climate

  • Yearly: Long-term trend analysis, climate change impacts

Brazilian Seasons (when time_unit = "season"):

  • Summer (Verão): December-January-February (DJF)

  • Autumn (Outono): March-April-May (MAM)

  • Winter (Inverno): June-July-August (JJA)

  • Spring (Primavera): September-October-November (SON)
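
The month-to-season mapping listed above can be sketched in base R as a simple lookup. The helper name month_to_season is hypothetical and not part of the package. Note also that DJF spans two calendar years; how sus_data_aggregate() assigns a December record to a season year is not stated here, so the sketch returns the season label only.

```r
# Hypothetical helper: map dates to Southern Hemisphere season codes
month_to_season <- function(dates) {
  month_num <- as.integer(format(dates, "%m"))
  # Lookup indexed by month number (1 = January, ..., 12 = December)
  season_labels <- c(
    "DJF", "DJF",        # January, February
    "MAM", "MAM", "MAM", # March to May
    "JJA", "JJA", "JJA", # June to August
    "SON", "SON", "SON", # September to November
    "DJF"                # December
  )
  season_labels[month_num]
}

example_dates <- as.Date(c(
  "2023-01-15", "2023-04-10", "2023-07-01", "2023-10-31", "2023-12-25"
))
seasons <- month_to_season(example_dates)
print(seasons)
```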

Examples

if (FALSE) { # \dontrun{
library(climasus4r)

# Basic daily aggregation
df_daily <- sus_data_import(uf = "SP", year = 2023, system = "SIM-DO") %>%
  sus_data_standardize() %>%
  sus_data_filter_cid(disease_group = "respiratory") %>%
  sus_data_aggregate(time_unit = "day")

# Pentad aggregation (5-day periods) for heat wave analysis
df_pentad <- sus_data_aggregate(df, time_unit = "5 days")

# Fortnightly aggregation for diseases with longer incubation
df_fortnightly <- sus_data_aggregate(df, time_unit = "14 days")

# Monthly aggregation by race and sex
df_monthly <- sus_data_aggregate(
  df,
  time_unit = "month",
  group_by = c("race", "sex"),
  lang = "pt"
)

# Quarterly aggregation for SUS reports
df_quarterly <- sus_data_aggregate(df, time_unit = "quarter")

# Seasonal aggregation for dengue analysis (Brazilian seasons)
df_seasonal <- sus_data_aggregate(
  df,
  time_unit = "season"
)

# Weekly aggregation by age group and sex
df_weekly <- sus_data_aggregate(
  df,
  time_unit = "week",
  group_by = c("age_group", "sex") # age_group comes from `sus_create_variables()`
)
} # }