
Exports processed health data to a file with optional metadata documentation to ensure reproducibility. Supports multiple file formats optimized for different use cases, including GeoArrow/GeoParquet for spatial data.

Usage

sus_data_export(
  df,
  file_path,
  format = NULL,
  include_metadata = TRUE,
  metadata = NULL,
  compress = TRUE,
  compression_level = 6,
  overwrite = FALSE,
  lang = "pt",
  verbose = TRUE
)

Arguments

df

A data frame or sf object containing the processed health data to export.

file_path

Character string specifying the output file path. The file extension determines the format if format is not explicitly specified.

format

Character string specifying the output format. Options: "rds" (default, R binary format), "arrow" (Apache Arrow/Parquet), "geoparquet" (GeoArrow/GeoParquet for spatial data), "shapefile" (ESRI Shapefile), "GeoPackage", and "csv" (comma-separated values). If NULL, the format is inferred from the file_path extension and the data type (sf objects are detected automatically).

include_metadata

Logical. If TRUE (default), saves a companion metadata file (.txt or .json) with processing information.

metadata

Named list containing custom metadata to save. Common fields:

  • source_system: Health system (e.g., "SIM-DO")

  • states: Vector of state codes

  • years: Vector of years

  • filters_applied: Description of filters

  • disease_groups: Disease groups included

  • processing_date: Date of processing

  • package_version: climasus4r version

  • author: Analyst name

  • notes: Additional notes

If NULL, generates basic metadata automatically.
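To illustrate the kind of named list that could be passed to metadata, the sketch below builds one from some of the fields listed above and writes it to a plain-text companion file using base R only. The helper write_metadata_txt(), the "key: value" layout, and the field values are assumptions made for illustration; they are not the format that sus_data_export() itself writes.

```r
# Hypothetical helper: writes a named list as "key: value" lines.
# This is NOT the package's implementation, only a base-R sketch.
write_metadata_txt <- function(metadata, path) {
  stopifnot(is.list(metadata), !is.null(names(metadata)))
  # Collapse vector fields (e.g. several states) into one comma-separated string
  values <- vapply(
    metadata,
    function(x) paste(as.character(x), collapse = ", "),
    character(1)
  )
  writeLines(paste0(names(metadata), ": ", values), path)
  invisible(path)
}

# Illustrative values only
meta <- list(
  source_system   = "SIM-DO",
  states          = c("SP", "RJ"),
  years           = 2021:2023,
  filters_applied = "respiratory causes only",
  processing_date = format(Sys.Date()),
  notes           = "illustrative values"
)

meta_file  <- write_metadata_txt(meta, tempfile(fileext = ".txt"))
meta_lines <- readLines(meta_file)
```

Keeping vector fields such as states and years as real vectors in the list, and flattening them only when writing, leaves the list usable for further processing in R.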

compress

Logical. If TRUE (default for RDS and Arrow), compresses the output file. Compression level can be specified for some formats.

compression_level

Integer specifying the compression level (1-9). Higher values produce smaller files but take longer to write. Default is 6. Only applies to formats that support compression.
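The size and speed trade-off can be seen with base R alone. The sketch below saves the same synthetic data frame as RDS through a gzip connection at levels 1 and 9 and compares the file sizes. Using gzip via gzfile() is an assumption for the sake of the example; the page does not say which compressor sus_data_export() uses, and save_rds_level() is a hypothetical helper.

```r
# Synthetic data: columns with repeated values compress well
set.seed(1)
df <- data.frame(
  state = sample(c("SP", "RJ", "MG"), 50000, replace = TRUE),
  year  = sample(2015:2023, 50000, replace = TRUE),
  count = rpois(50000, lambda = 3)
)

# Hypothetical helper: save an object as RDS at a given gzip level (1-9)
save_rds_level <- function(x, level) {
  path <- tempfile(fileext = ".rds")
  con <- gzfile(path, open = "wb", compression = level)
  on.exit(close(con))  # closing the connection flushes the file to disk
  saveRDS(x, con)
  path
}

size_fast  <- file.size(save_rds_level(df, 1))  # fastest, least compression
size_small <- file.size(save_rds_level(df, 9))  # slowest, most compression
```

Comparing size_fast and size_small on a sample of your own data is a quick way to judge whether a higher level is worth the extra write time.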

overwrite

Logical. If TRUE, overwrites existing files. If FALSE (default), stops with an error if file exists.
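The documented behaviour corresponds to a simple guard before writing. The following is a minimal base-R sketch of such a guard; check_overwrite() and its error message are hypothetical and only mirror the description above, not the package's actual code.

```r
# Hypothetical guard mirroring the documented overwrite behaviour
check_overwrite <- function(file_path, overwrite = FALSE) {
  if (file.exists(file_path) && !overwrite) {
    stop("File already exists: ", file_path,
         ". Set overwrite = TRUE to replace it.")
  }
  invisible(TRUE)
}

# Create a file so that the guard has something to find
existing <- tempfile(fileext = ".rds")
saveRDS(mtcars, existing)

# With overwrite = FALSE the guard raises an error; tryCatch captures it
blocked <- tryCatch(
  {
    check_overwrite(existing)
    FALSE
  },
  error = function(e) TRUE
)

# With overwrite = TRUE the guard lets the write proceed
allowed <- check_overwrite(existing, overwrite = TRUE)
```

In a batch script, wrapping the export call in tryCatch() in the same way lets a run continue past files that already exist.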

lang

Character string specifying the language for messages. Options: "en" (English), "pt" (Portuguese, default), "es" (Spanish).

verbose

Logical. If TRUE (default), prints export summary.

Value

Invisibly returns the file path of the exported data. If metadata was saved, also returns the metadata file path as an attribute.

Details

File Formats:

  • RDS (.rds): Native R format. Fast, compressed, preserves all R object attributes. Best for R-only workflows.

  • Parquet (.parquet): Columnar format. Excellent compression, fast reading, language-agnostic. Best for large datasets and interoperability with Python, Spark, etc.

  • GeoParquet (.geoparquet, .parquet for sf objects): Optimized columnar format for spatial data. Combines benefits of Parquet with efficient geometry storage. 50-90% smaller than shapefiles, 10-100x faster.

  • Shapefile (.shp): Traditional GIS format. Widely supported but inefficient for large datasets. Multiple files generated (.shp, .shx, .dbf, etc.). A .gpkg extension corresponds to GeoPackage, a separate single-file GIS format.

  • CSV (.csv): Universal text format. Human-readable, compatible with all software. Best for sharing with non-R users. Larger file size. Note: Geometries are exported as WKT (Well-Known Text) for spatial data.
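The difference between a compressed binary format and plain text can be checked with base R. The sketch below writes the same synthetic data frame as RDS and as CSV and compares file sizes; the column names and values are made up for illustration, and the calls are base R functions rather than sus_data_export() itself.

```r
# Synthetic data frame with repeated categorical values (illustrative only)
set.seed(42)
df <- data.frame(
  municipality = sample(sprintf("%06d", 350000:350100), 20000, replace = TRUE),
  year         = sample(2018:2023, 20000, replace = TRUE),
  deaths       = rpois(20000, lambda = 2)
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

saveRDS(df, rds_path)                       # binary, compressed by default
write.csv(df, csv_path, row.names = FALSE)  # plain text, uncompressed

# File sizes in bytes for the two formats
sizes <- c(rds = file.size(rds_path), csv = file.size(csv_path))

# Reading the RDS file back returns an identical R object
roundtrip <- readRDS(rds_path)
```

Note that the RDS round trip preserves column types exactly, whereas a CSV reader has to guess them again (for example, zero-padded codes may be read back as numbers unless column types are specified).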

Automatic Format Detection: If format = NULL, the function automatically detects the best format:

  • If input is an sf object and extension is .parquet → "geoparquet"

  • If input is an sf object and extension is .shp → "shapefile"

  • Otherwise, infers from file extension
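The detection rules above can be sketched in a few lines of base R. detect_format() below is a hypothetical re-implementation written only to make the rules concrete; the extension-to-format mapping in the fallback branch is an assumption based on the format names listed under Arguments, and the real function may differ.

```r
# Hypothetical re-implementation of the documented rules; not the package's code
detect_format <- function(df, file_path) {
  ext <- tolower(tools::file_ext(file_path))
  is_spatial <- inherits(df, "sf")

  # Rules for sf objects take priority over the plain extension mapping
  if (is_spatial && ext == "parquet") return("geoparquet")
  if (is_spatial && ext == "shp") return("shapefile")

  # Otherwise map the extension to a format name (mapping is an assumption)
  switch(ext,
    rds        = "rds",
    parquet    = "arrow",
    geoparquet = "geoparquet",
    shp        = "shapefile",
    gpkg       = "GeoPackage",
    csv        = "csv",
    stop("Cannot infer format from extension: .", ext)
  )
}

plain <- data.frame(x = 1)
# Stand-in for an sf object: only the class is set, no geometry is attached
fake_sf <- structure(data.frame(x = 1), class = c("sf", "data.frame"))

fmt_plain   <- detect_format(plain, "output/data.parquet")
fmt_spatial <- detect_format(fake_sf, "output/data.parquet")
fmt_shp     <- detect_format(fake_sf, "output/data.shp")
```

The same .parquet extension therefore resolves to a different format depending on whether the input carries the sf class, which is why passing format explicitly is the safer choice in scripts.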

Spatial Data Export: When exporting sf objects (spatial data from sus_join_spatial()):

  • Recommended: Use GeoParquet format for optimal performance

  • GeoParquet preserves CRS, geometry types, and all attributes

  • Compatible with QGIS, Python (geopandas), and other GIS software

  • Significantly faster and smaller than shapefiles

Examples

if (FALSE) { # \dontrun{
library(climasus4r)

# Export regular data frame to RDS
sus_data_export(df_final, "output/data.rds")

# Export spatial data to GeoParquet (RECOMMENDED)
sf_result <- sus_join_spatial(df, level = "munic")
sus_data_export(
  sf_result,
  file_path = "output/spatial_data.geoparquet",
  format = "geoparquet"  # Auto-detected if extension is .parquet
)

# Export spatial data to Shapefile (traditional)
sus_data_export(
  sf_result,
  file_path = "output/spatial_data.shp",
  format = "shapefile"
)

# Export to Parquet (Apache Arrow) with custom metadata
sus_data_export(
  df_final,
  file_path = "output/respiratory_sp_2023.parquet",
  format = "parquet",
  metadata = list(
    source_system = "SIM-DO",
    states = "SP",
    years = 2023,
    disease_groups = "respiratory",
    author = "Max Anjos"
  )
)
} # }