Read Processed Health Data with Batch and Parallel Support
Source: R/sus_data_read.R
Smartly reads one or more health data files exported by sus_data_export().
Supports automatic format detection, batch processing, parallel execution, spatial data,
metadata loading, and data validation.
Usage
sus_data_read(
path,
format = NULL,
parallel = FALSE,
workers = 4,
read_metadata = FALSE,
lang = "pt",
verbose = TRUE
)
Arguments
- path
Character vector of file paths, or a single directory path. If a directory is provided, all matching files will be read.
- format
Character string specifying the input format. Options: "dbf", "dbc", "rds", "parquet", "geoparquet", "shapefile", "gpkg", "geojson", "csv". If NULL (default), the format is detected automatically from the file extension.
- parallel
Logical. If TRUE, uses parallel processing for multiple files. Requires the future and future.apply packages. Default: FALSE.
- workers
Integer. Number of parallel workers when parallel = TRUE. Default: 4.
- read_metadata
Logical. If TRUE, loads companion metadata files and attaches them as attributes. Default: FALSE.
- lang
Character string specifying the language for messages. Options: "en" (English), "pt" (Portuguese, default), "es" (Spanish).
- verbose
Logical. If TRUE (default), prints progress messages and a summary.
Value
A data frame or sf object (for spatial data) containing the loaded data.
For batch reads, all files are combined with dplyr::bind_rows().
Metadata is attached as attributes:
- Single file: attr(df, "metadata")
- Batch: attr(df, "batch_metadata") (list of metadata from each file)
- Batch: attr(df, "n_files_combined") (number of files)
Details
Batch Processing: Pass a vector of file paths or a directory path to read multiple files at once. All files are automatically combined into a single object.
Parallel Processing:
When parallel = TRUE, files are read simultaneously using future.apply.
This significantly speeds up batch reads of large files.
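The mechanics of such a parallel batch read can be sketched with future and future.apply directly. This is an illustration of the approach, not the package's internal code; read_one stands in for the format-specific reader:

```r
library(future)
library(future.apply)
library(dplyr)

# Hypothetical per-file reader standing in for the format-specific logic
read_one <- function(f) arrow::read_parquet(f)

files <- list.files("output/", pattern = "\\.parquet$", full.names = TRUE)

# Start one R session per worker, read the files concurrently,
# then combine the pieces into a single data frame
plan(multisession, workers = 4)
parts <- future_lapply(files, read_one)
plan(sequential)  # release the workers

df <- bind_rows(parts)
```

Because each file is read in a separate R session, the speedup is largest when individual files take seconds or more to parse; for many tiny files the session overhead can dominate.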
Format Detection:
Automatically detects the format from the file extension. For .parquet files,
it additionally determines whether the file is GeoParquet (spatial) or regular Parquet.
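Extension-based detection of this kind can be sketched as follows. This is illustrative, not the package's actual code; the GeoParquet check assumes the convention that GeoParquet files carry a "geo" key in their file-level metadata, inspected here via the arrow package:

```r
# Map a file extension to one of the supported format names (sketch)
detect_format <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    dbf     = "dbf",
    dbc     = "dbc",
    rds     = "rds",
    parquet = "parquet",   # may still be GeoParquet; see below
    shp     = "shapefile",
    gpkg    = "gpkg",
    geojson = "geojson",
    csv     = "csv",
    stop("Unrecognised extension: .", ext)
  )
}

# GeoParquet files store spatial metadata under the "geo" key of the
# Parquet schema metadata (an assumption about the check performed)
is_geoparquet <- function(path) {
  schema <- arrow::open_dataset(path)$schema
  "geo" %in% names(schema$metadata)
}
```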
Memory Efficiency: For very large datasets (>50 GB), consider using chunked processing or reading files individually instead of batch mode.
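When combined output would not fit in memory, one pattern is to read each file on its own, reduce it, and keep only the aggregate before moving to the next file. A minimal sketch, assuming the files contain a year column (hypothetical here):

```r
library(dplyr)

files <- list.files("output/", pattern = "\\.parquet$", full.names = TRUE)

# Read one file at a time and summarise it immediately, so at most one
# file's worth of raw rows is held in memory at any point
per_file <- lapply(files, function(f) {
  df <- sus_data_read(f, verbose = FALSE)
  count(df, year, name = "n_records")   # assumes a `year` column
})

# Combine only the small per-file summaries
totals <- bind_rows(per_file) |>
  group_by(year) |>
  summarise(n_records = sum(n_records))
```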
Examples
if (FALSE) { # \dontrun{
library(climasus4r)
# Single file
df <- sus_data_read("output/data.parquet")
# Multiple files (vector)
df <- sus_data_read(c("output/2020.parquet", "output/2021.parquet"))
# Directory (all Parquet files)
df <- sus_data_read("output/", format = "parquet")
# Parallel batch read
df <- sus_data_read("output/", format = "dbf",
parallel = TRUE, workers = 6)
# Access batch metadata
batch_meta <- attr(df, "batch_metadata")
n_files <- attr(df, "n_files_combined")
} # }