Exports processed health data to a file with optional metadata documentation to ensure reproducibility. Supports multiple file formats optimized for different use cases, including GeoArrow/GeoParquet for spatial data.
Usage
sus_data_export(
  df,
  file_path,
  format = NULL,
  include_metadata = TRUE,
  metadata = NULL,
  compress = TRUE,
  compression_level = 6,
  overwrite = FALSE,
  lang = "pt",
  verbose = TRUE
)
Arguments
- df
A data frame or sf object containing the processed health data to export.
- file_path
Character string specifying the output file path. The file extension determines the format if format is not explicitly specified.
- format
Character string specifying the output format. Options: "rds" (default, R binary format), "arrow" (Apache Arrow/Parquet), "geoparquet" (GeoArrow/GeoParquet for spatial data), "shapefile" (ESRI Shapefile), "GeoPackage", and "csv" (comma-separated values). If NULL, infers the format from the file_path extension and the data type (auto-detects sf objects).
- include_metadata
Logical. If TRUE (default), saves a companion metadata file (.txt or .json) with processing information.
- metadata
Named list containing custom metadata to save. Common fields:
  - source_system: Health system (e.g., "SIM-DO")
  - states: Vector of state codes
  - years: Vector of years
  - filters_applied: Description of filters
  - disease_groups: Disease groups included
  - processing_date: Date of processing
  - package_version: climasus4r version
  - author: Analyst name
  - notes: Additional notes
If NULL, generates basic metadata automatically.
- compress
Logical. If TRUE (default for RDS and Arrow), compresses the output file. Compression level can be specified for some formats.
- compression_level
Integer specifying the compression level (1-9). Higher values produce smaller files but take longer to write. Default is 6. Only applies to formats that support compression.
- overwrite
Logical. If TRUE, overwrites existing files. If FALSE (default), stops with an error if the file exists.
- lang
Character string specifying the language for messages. Options: "pt" (Portuguese, default), "en" (English), "es" (Spanish).
- verbose
Logical. If TRUE (default), prints export summary.
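As a rough illustration of what a companion metadata file might contain, the sketch below builds a list with some of the common fields listed for metadata and writes it as key: value lines next to the data file. The helper name write_metadata_sketch() and the _metadata.txt suffix are assumptions made for this example; they are not the package's actual implementation or naming scheme.

```r
# Hypothetical helper: illustrates a plain-text companion file; not climasus4r code.
write_metadata_sketch <- function(metadata, file_path) {
  # Assumed naming: "<data file without extension>_metadata.txt"
  meta_path <- paste0(tools::file_path_sans_ext(file_path), "_metadata.txt")
  # Collapse vector-valued fields (e.g. several years) into one line each
  lines <- vapply(
    names(metadata),
    function(key) paste0(key, ": ", paste(metadata[[key]], collapse = ", ")),
    character(1)
  )
  writeLines(lines, meta_path)
  invisible(meta_path)
}

# Example values only; field names follow the list of common fields above
metadata <- list(
  source_system = "SIM-DO",
  states = c("SP", "RJ"),
  years = 2022:2023,
  processing_date = format(Sys.Date()),
  notes = "Example values only"
)

meta_path <- write_metadata_sketch(metadata, file.path(tempdir(), "data.rds"))
readLines(meta_path)
```

Keeping vector fields on a single line makes the file easy to read by eye and to parse back by splitting on the first colon.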
Value
Invisibly returns the file path of the exported data. If metadata was saved, also returns the metadata file path as an attribute.
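The return pattern described here, an invisible path carrying an extra attribute, can be sketched in base R. The function export_sketch() and the attribute name metadata_path are hypothetical: this page does not state the attribute's actual name, so calling attributes() on the real return value is the safer way to find it.

```r
# Hypothetical stand-in for an export function; only base R is used.
export_sketch <- function(df, file_path, overwrite = FALSE) {
  # Mirror the documented overwrite behaviour: error if the file already exists
  if (file.exists(file_path) && !overwrite) {
    stop("File already exists: ", file_path)
  }
  saveRDS(df, file_path)
  meta_path <- paste0(file_path, ".txt")  # assumed companion file name
  writeLines(paste("rows:", nrow(df)), meta_path)
  # Return the data path invisibly, with the metadata path attached as an attribute
  result <- file_path
  attr(result, "metadata_path") <- meta_path  # attribute name is an assumption
  invisible(result)
}

out <- export_sketch(data.frame(x = 1:3), tempfile(fileext = ".rds"))
attributes(out)  # lists whatever attributes are attached to the returned path
```

Because the value is returned invisibly, nothing is printed unless the result is assigned or wrapped in parentheses.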
Details
File Formats:
- RDS (.rds): Native R format. Fast, compressed, preserves all R object attributes. Best for R-only workflows.
- Parquet (.parquet): Columnar format. Excellent compression, fast reading, language-agnostic. Best for large datasets and interoperability with Python, Spark, etc.
- GeoParquet (.geoparquet, or .parquet for sf objects): Optimized columnar format for spatial data. Combines the benefits of Parquet with efficient geometry storage. 50-90% smaller than shapefiles, 10-100x faster.
- Shapefile (.shp) or GeoPackage (.gpkg): Traditional GIS formats. Widely supported but inefficient for large datasets. A Shapefile export generates multiple files (.shp, .shx, .dbf, etc.).
- CSV (.csv): Universal text format. Human-readable, compatible with all software. Best for sharing with non-R users. Larger file size. Note: Geometries are exported as WKT (Well-Known Text) for spatial data.
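To make the size trade-off between a compressed binary format and plain text concrete, the following base R sketch writes the same data frame as RDS at two gzip levels and as CSV, then compares file sizes. It does not call sus_data_export(); how the package applies compression_level internally is not documented here, so passing the level through gzfile() is an assumption made for illustration, and the data are invented.

```r
# Invented example data: repetitive values compress well
df <- data.frame(
  state = rep(c("SP", "RJ", "MG"), length.out = 10000),
  year = rep(2020:2023, length.out = 10000),
  deaths = seq_len(10000) %% 50
)

out_dir <- tempfile("export_sizes")
dir.create(out_dir)

# Write an RDS file through a gzip connection with an explicit level (1-9)
save_rds_level <- function(x, path, level) {
  con <- gzfile(path, open = "wb", compression = level)
  on.exit(close(con))
  saveRDS(x, con)
  invisible(path)
}

rds_fast <- save_rds_level(df, file.path(out_dir, "level1.rds"), 1)
rds_small <- save_rds_level(df, file.path(out_dir, "level9.rds"), 9)
csv_path <- file.path(out_dir, "data.csv")
write.csv(df, csv_path, row.names = FALSE)

# File sizes in bytes, for comparison
file.size(c(rds_fast, rds_small, csv_path))
```

Actual ratios depend heavily on the data, so it is worth running a comparison like this on a representative sample before settling on a format.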
Automatic Format Detection:
If format = NULL, the function automatically detects the best format:
- If input is an sf object and extension is .parquet → "geoparquet"
- If input is an sf object and extension is .shp → "shapefile"
- Otherwise, infers from file extension
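The detection rules above can be sketched as a small base R function. This is an illustration under stated assumptions, not the package's code: infer_format_sketch() is a hypothetical name, the extension-to-format table is inferred from the format options listed under Arguments, and raising an error for unknown extensions is a choice made for this example.

```r
# Hypothetical helper: mirrors the documented detection rules; not climasus4r code.
infer_format_sketch <- function(file_path, is_sf = FALSE) {
  ext <- tolower(tools::file_ext(file_path))

  # Rules for sf objects are checked first
  if (is_sf && ext == "parquet") return("geoparquet")
  if (is_sf && ext == "shp") return("shapefile")

  # Otherwise fall back to the file extension (mapping is an assumption)
  by_extension <- c(
    rds = "rds",
    parquet = "arrow",
    geoparquet = "geoparquet",
    shp = "shapefile",
    gpkg = "GeoPackage",
    csv = "csv"
  )
  if (!ext %in% names(by_extension)) {
    stop("Cannot infer format from extension: '", ext, "'")
  }
  unname(by_extension[ext])
}

infer_format_sketch("output/data.parquet", is_sf = TRUE)
infer_format_sketch("output/data.parquet", is_sf = FALSE)
infer_format_sketch("output/data.csv")
```

In a real call, is_sf would come from inherits(df, "sf"), which is why the same .parquet path can resolve to two different formats.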
Spatial Data Export:
When exporting sf objects (spatial data from sus_join_spatial()):
- Recommended: Use GeoParquet format for optimal performance
- GeoParquet preserves CRS, geometry types, and all attributes
- Compatible with QGIS, Python (geopandas), and other GIS software
- Significantly faster and smaller than shapefiles
Examples
if (FALSE) { # \dontrun{
library(climasus4r)

# Export regular data frame to RDS
sus_data_export(df_final, "output/data.rds")

# Export spatial data to GeoParquet (RECOMMENDED)
sf_result <- sus_join_spatial(df, level = "munic")
sus_data_export(
  sf_result,
  file_path = "output/spatial_data.geoparquet",
  format = "geoparquet" # Auto-detected if extension is .parquet
)

# Export spatial data to Shapefile (traditional)
sus_data_export(
  sf_result,
  file_path = "output/spatial_data.shp",
  format = "shapefile"
)

# Export to Arrow with custom metadata
sus_data_export(
  df_final,
  file_path = "output/respiratory_sp_2023.parquet",
  format = "parquet",
  metadata = list(
    source_system = "SIM-DO",
    states = "SP",
    years = 2023,
    disease_groups = "respiratory",
    author = "Max Anjos"
  )
)
} # }