Manage climasus_df S3 Class Metadata and Storage Backends
Source: R/climasus_meta.R, R/utils-S3.R
Unified interface for getting, setting, and managing metadata in
climasus_df objects, with native support for three storage backends:
in-memory tibble (default), columnar Parquet files via the
arrow package, and analytical DuckDB databases via the
duckdb package.
The same function reads and writes metadata on climasus_df objects,
Arrow Tables, and DuckDB results. It replaces the previous family of
ten accessor functions, which minimises namespace pollution.
Usage
sus_meta(
x = NULL,
field = NULL,
system = NULL,
stage = NULL,
type = NULL,
backend = NULL,
add_history = NULL,
print_history = FALSE,
valid_values = NULL,
...
)
Arguments
- x
A climasus_df object, Arrow Table, DuckDB result, or NULL when using valid_values.
- field
Character. When provided as the only extra argument, returns the value of that metadata field ("system", "stage", "type", "spatial", "temporal", "backend", "created", "modified", "history", "user").
- system
Character. DATASUS system identifier (e.g. "SIM").
- stage
Character. Pipeline stage (e.g. "filter_cid").
- type
Character. Data type (e.g. "raw").
- backend
Character. Storage backend: "tibble" (default), "parquet", or "duckdb".
- add_history
Character. A single string appended, with a timestamp, to the processing history.
- print_history
Logical. If TRUE, prints the history and returns invisible(NULL).
- valid_values
Character. When provided, returns the allowed values for that vocabulary ("system", "stage", "type", "backend").
- ...
Additional named metadata fields to update, or one of the backend operation arguments listed under Backend Operations.
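The add_history argument appends a timestamped string to the processing history. The base-R sketch below illustrates that idea on a plain metadata list. The helper name append_history_sketch(), the character-vector representation of the history, and the "[YYYY-mm-dd HH:MM:SS]" prefix are assumptions made for illustration, not the package's internal format.

```r
# Hypothetical helper, NOT part of the climasus API.
# Assumes metadata is a named list whose "history" field is a character vector.
append_history_sketch <- function(meta, entry, time = Sys.time()) {
  stopifnot(is.character(entry), length(entry) == 1L)
  # Prefix the entry with a timestamp (the format is an assumption)
  stamped <- paste0("[", format(time, "%Y-%m-%d %H:%M:%S"), "] ", entry)
  meta$history <- c(meta$history, stamped)
  # Record when the metadata was last changed
  meta$modified <- time
  meta
}

meta <- list(system = "SIM", stage = "filter_cid", history = character(0))
meta <- append_history_sketch(
  meta, "Removed missing values",
  time = as.POSIXct("2024-01-01 12:00:00", tz = "UTC")
)
meta$history
# [1] "[2024-01-01 12:00:00] Removed missing values"
```

Passing time explicitly keeps the example deterministic; in normal use the default Sys.time() would be used.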
Value
Depends on the operation:
- Get all metadata: named list with all metadata fields.
- Get specific field: value of the requested field.
- Update metadata: updated climasus_df object.
- Add history: updated climasus_df with the new history entry.
- Print history: invisible(NULL) after printing.
- Valid values: character vector of allowed values.
- to_parquet: updated climasus_df (backend = "parquet").
- from_parquet: climasus_df reconstructed from the file.
- to_duckdb: updated climasus_df (backend = "duckdb").
- from_duckdb: climasus_df reconstructed from DuckDB.
Details
Operation Dispatch Order:
Operations are resolved in this order:
1. valid_values: vocabulary query (no x required)
2. from_parquet: read a Parquet file and reconstruct
3. from_duckdb: read a DuckDB table and reconstruct
4. print_history: print the processing history
5. add_history: append a timestamped history entry
6. to_parquet: write to a Parquet file
7. to_duckdb: register in DuckDB
8. field getter: return a single metadata field
9. ... updates: update metadata fields
10. Default: return all metadata as a list
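The precedence list above can be pictured as a chain of if statements in which the first matching argument wins. The base-R sketch below only reports which branch would be taken. Its name, dispatch_sketch(), and its simplification of routing system, stage, type, backend, and any other named argument through ... as an update are assumptions; it does not reproduce the real function's validation.

```r
# Hypothetical illustration of the documented dispatch order. It returns the
# name of the operation that would run instead of performing it.
dispatch_sketch <- function(x = NULL, field = NULL, add_history = NULL,
                            print_history = FALSE, valid_values = NULL, ...) {
  dots <- list(...)
  # TRUE when a named backend argument was supplied via ...
  has <- function(name) !is.null(dots[[name]])
  if (!is.null(valid_values)) return("valid_values")
  if (has("from_parquet"))    return("from_parquet")
  if (has("from_duckdb"))     return("from_duckdb")
  if (isTRUE(print_history))  return("print_history")
  if (!is.null(add_history))  return("add_history")
  if (has("to_parquet"))      return("to_parquet")
  if (has("to_duckdb"))       return("to_duckdb")
  if (!is.null(field))        return("field")
  if (length(dots) > 0L)      return("update")
  "get_all"
}

dispatch_sketch(valid_values = "backend")                            # "valid_values"
dispatch_sketch(list(), add_history = "x", to_parquet = "a.parquet") # "add_history"
dispatch_sketch(list(), stage = "clean")                             # "update"
```

One consequence of the order is visible in the second call: when add_history and to_parquet are both supplied, only the history entry is handled.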
Metadata Persistence Across Backends:
When writing to Parquet, the sus_meta attribute is serialised as JSON and
stored in the Arrow schema metadata under the key "climasus_meta", so it
can be fully recovered after the file is read back.
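A minimal sketch of that round trip with the arrow and jsonlite packages follows. Only the "climasus_meta" key comes from the description above; the helper names, the use of jsonlite::toJSON() for serialisation, and the metadata layout are assumptions. The block does nothing unless both packages are installed.

```r
# Sketch only: requires the arrow and jsonlite packages.
# write_meta_parquet_sketch() and read_meta_parquet_sketch() are hypothetical.
write_meta_parquet_sketch <- function(df, meta, path) {
  tbl <- arrow::arrow_table(df)
  md <- tbl$metadata
  # Store the whole metadata list as one JSON string under "climasus_meta"
  md$climasus_meta <- as.character(jsonlite::toJSON(meta, auto_unbox = TRUE))
  tbl$metadata <- md
  arrow::write_parquet(tbl, path)
  invisible(path)
}

read_meta_parquet_sketch <- function(path) {
  # Read as an Arrow Table so the schema metadata is directly accessible
  tbl <- arrow::read_parquet(path, as_data_frame = FALSE)
  jsonlite::fromJSON(tbl$metadata$climasus_meta)
}

has_pkgs <- requireNamespace("arrow", quietly = TRUE) &&
  requireNamespace("jsonlite", quietly = TRUE)

if (has_pkgs) {
  path <- tempfile(fileext = ".parquet")
  write_meta_parquet_sketch(
    data.frame(sexo = c("M", "F")),
    list(system = "SIM", stage = "filter_cid", backend = "parquet"),
    path
  )
  restored <- read_meta_parquet_sketch(path)
  print(restored$stage)
}
```

Keeping the metadata in the schema, rather than in a sidecar file, means the provenance travels with the Parquet file itself.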
When writing to DuckDB, sus_meta is stored in a companion table named
<duckdb_view>__meta within the same connection, enabling SQL-level
introspection of pipeline provenance.
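The companion-table idea can be sketched with DBI and duckdb as below. The key/value column layout, the " | " collapsing of multi-valued fields such as history, and the helper names are assumptions for illustration; the package may store the metadata in a different shape. The DuckDB part runs only when duckdb is installed.

```r
# Hypothetical helpers: flatten a metadata list into a two-column table and
# derive the "<duckdb_view>__meta" name described above.
meta_to_kv_sketch <- function(meta) {
  data.frame(
    key = names(meta),
    # Collapse multi-valued fields (e.g. history) into a single string
    value = unname(vapply(meta, function(v) paste(as.character(v), collapse = " | "),
                          character(1))),
    stringsAsFactors = FALSE
  )
}

companion_name_sketch <- function(view) paste0(view, "__meta")

meta <- list(system = "SIM", stage = "filter_cid",
             history = c("step 1", "step 2"))
kv <- meta_to_kv_sketch(meta)

if (requireNamespace("duckdb", quietly = TRUE)) {
  con <- DBI::dbConnect(duckdb::duckdb())
  DBI::dbWriteTable(con, "sim_respiratory", data.frame(sexo = c("M", "F")))
  DBI::dbWriteTable(con, companion_name_sketch("sim_respiratory"), kv)
  # Provenance is now queryable with plain SQL
  print(DBI::dbGetQuery(con, 'SELECT "key", "value" FROM sim_respiratory__meta'))
  DBI::dbDisconnect(con)
}
```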
Supports multiple backends:
- climasus_df: metadata stored as the sus_meta attribute
- Arrow Table: metadata embedded in the schema under the "climasus_meta" key
- DuckDB result: metadata extracted from the companion <table>__meta table
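For the in-memory case, keeping metadata as an attribute on an S3-classed data frame can be sketched in base R. The constructor and accessor names below are hypothetical, and the real climasus_df class may add fields, validation, or tibble-specific behaviour not shown here.

```r
# Hypothetical constructor: tag a data frame with the climasus_df class and
# attach the metadata list as the "sus_meta" attribute.
new_climasus_df_sketch <- function(df, meta = list()) {
  stopifnot(is.data.frame(df))
  attr(df, "sus_meta") <- meta
  class(df) <- unique(c("climasus_df", class(df)))
  df
}

# Hypothetical accessor: whole list, or a single field by name
get_meta_sketch <- function(x, field = NULL) {
  meta <- attr(x, "sus_meta", exact = TRUE)
  if (is.null(field)) meta else meta[[field]]
}

cdf <- new_climasus_df_sketch(
  data.frame(sexo = c("M", "F")),
  list(system = "SIM", stage = "filter_cid", backend = "tibble")
)
inherits(cdf, "climasus_df")   # TRUE
get_meta_sketch(cdf, "stage")  # "filter_cid"
```

Prepending the class keeps data.frame in the class vector, so ordinary data frame methods still apply.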
Backend Operations
The following named arguments trigger backend-specific operations when
passed via ...:
- to_parquet = "<path>"
Converts the climasus_df to an Arrow Table, embeds sus_meta as JSON in the Parquet schema, and writes it to <path>. Returns the updated climasus_df with backend = "parquet".
- from_parquet = "<path>"
Reads a Parquet file written by to_parquet and reconstructs a fully featured climasus_df with all original metadata. Returns the reconstructed object; x is ignored.
- to_duckdb = <DBI connection>
Registers the climasus_df as a DuckDB table (default name: "climasus_data") and stores sus_meta in a companion table <duckdb_view>__meta. Returns the updated climasus_df with backend = "duckdb".
- from_duckdb = <DBI connection>
Reads a table from a DuckDB connection and reconstructs a climasus_df. Combine with duckdb_view and duckdb_query for fine-grained control.
- duckdb_view = "<name>"
Name of the DuckDB table or view to read or write (default: "climasus_data").
- duckdb_query = "<SQL>"
Optional SQL WHERE clause or full SELECT statement applied when reading from DuckDB.
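Because duckdb_query may be either a WHERE fragment or a complete SELECT, a reader has to decide how to combine it with duckdb_view. The base-R sketch below shows one plausible rule; the function name and the "starts with WHERE or SELECT" test are assumptions, not the package's confirmed logic.

```r
# Hypothetical helper: build the read query from a view name and an optional
# duckdb_query that is either a WHERE clause or a full SELECT statement.
build_read_sql_sketch <- function(view = "climasus_data", query = NULL) {
  if (is.null(query) || !nzchar(trimws(query))) {
    return(sprintf("SELECT * FROM %s", view))
  }
  q <- trimws(query)
  if (grepl("^select\\b", q, ignore.case = TRUE, perl = TRUE)) {
    q                                          # full SELECT: use as-is
  } else if (grepl("^where\\b", q, ignore.case = TRUE, perl = TRUE)) {
    sprintf("SELECT * FROM %s %s", view, q)    # WHERE fragment: append
  } else {
    stop("duckdb_query must start with WHERE or SELECT")
  }
}

build_read_sql_sketch("sim_respiratory", "WHERE ano_obito = 2020")
# "SELECT * FROM sim_respiratory WHERE ano_obito = 2020"
```

The view name is interpolated unquoted here for brevity; production code would quote it with DBI::dbQuoteIdentifier() and treat user-supplied SQL with care.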
Examples
if (FALSE) { # \dontrun{
# ── Basic metadata operations ──────────────────────────────────────────────
# Get all metadata
meta <- sus_meta(df)
# Get specific field
sus_meta(df, "stage") # "filter_cid"
sus_meta(df, "backend") # "tibble"
# Update metadata
df <- sus_meta(df, stage = "clean", type = "clean")
# Add to processing history
df <- sus_meta(df, add_history = "Removed missing values")
# Print history
sus_meta(df, print_history = TRUE)
# Query controlled vocabulary
sus_meta(valid_values = "backend") # "tibble" "parquet" "duckdb"
sus_meta(valid_values = "system") # "SIM" "SIH" ...
# ── Arrow / Parquet backend ────────────────────────────────────────────────
# Write to Parquet (metadata embedded in schema)
df <- sus_meta(df, to_parquet = "data/sim_respiratory.parquet")
sus_meta(df, "backend") # "parquet"
# Read back as climasus_df (metadata fully restored)
df2 <- sus_meta(from_parquet = "data/sim_respiratory.parquet")
sus_meta(df2, "stage") # "clean" (set earlier with stage = "clean")
# ── DuckDB backend ─────────────────────────────────────────────────────────
library(duckdb)
con <- DBI::dbConnect(duckdb::duckdb())
# Register as DuckDB view
df <- sus_meta(df, to_duckdb = con, duckdb_view = "sim_respiratory")
# SQL query directly on the view
DBI::dbGetQuery(con, "SELECT sexo, COUNT(*) AS n FROM sim_respiratory GROUP BY sexo")
# Read back as climasus_df (with optional SQL filter)
df3 <- sus_meta(from_duckdb = con,
duckdb_view = "sim_respiratory",
duckdb_query = "WHERE ano_obito = 2020")
sus_meta(df3, "stage") # "clean"
DBI::dbDisconnect(con)
} # }
if (FALSE) { # \dontrun{
# With climasus_df
sus_meta(df)
sus_meta(df, "stage")
df <- sus_meta(df, stage = "filter_cid", type = "filter_cid")
# With Arrow Table (auto-extracts from schema metadata)
arrow_tbl <- as_arrow_climasus(df)
sus_meta(arrow_tbl) # Extracts from Arrow schema
sus_meta(arrow_tbl, "system")
# With DuckDB result (if metadata table exists)
con <- DBI::dbConnect(duckdb::duckdb())
as_duckdb_climasus(df, con, "my_data")
result <- DBI::dbGetQuery(con, "SELECT * FROM my_data")
sus_meta(result) # Extracts from companion __meta table
DBI::dbDisconnect(con)
# Query valid values (no object needed)
sus_meta(valid_values = "backend")
} # }