Unified interface for getting, setting, and managing metadata in climasus_df objects, with native support for three storage backends: in-memory tibble (default), columnar Parquet files via the arrow package, and analytical DuckDB databases via the duckdb package.

sus_meta() is a single interface for reading and writing metadata on climasus_df objects, Arrow Tables, and DuckDB results. It replaces the previous family of ten accessor functions, which reduces namespace pollution.

Usage

sus_meta(
  x = NULL,
  field = NULL,
  system = NULL,
  stage = NULL,
  type = NULL,
  backend = NULL,
  add_history = NULL,
  print_history = FALSE,
  valid_values = NULL,
  ...
)

Arguments

x

A climasus_df object, Arrow Table, DuckDB result, or NULL when using valid_values.

field

Character. Name of a metadata field to retrieve. When field is the only argument supplied besides x, the function returns the value of that field. One of "system", "stage", "type", "spatial", "temporal", "backend", "created", "modified", "history", or "user".

system

Character. DATASUS system identifier (e.g. "SIM").

stage

Character. Pipeline stage (e.g. "filter_cid").

type

Character. Data type (e.g. "raw").

backend

Character. Storage backend: "tibble" (default), "parquet", or "duckdb".

add_history

Character. A single string appended (with timestamp) to the processing history.

print_history

Logical. If TRUE, prints the history and returns invisible(NULL).

valid_values

Character. Name of a controlled vocabulary: "system", "stage", "type", or "backend". When provided, the function returns the allowed values for that vocabulary; x is not required.

...

Additional named metadata fields to update.

Value

Depends on the operation:

  • Get all metadata: Named list with all metadata fields.

  • Get specific field: Value of the requested field.

  • Update metadata: Updated climasus_df object.

  • Add history: Updated climasus_df with new history entry.

  • Print history: invisible(NULL) after printing.

  • Valid values: Character vector of allowed values.

  • to_parquet: Updated climasus_df (backend = "parquet").

  • from_parquet: Reconstructed climasus_df from file.

  • to_duckdb: Updated climasus_df (backend = "duckdb").

  • from_duckdb: Reconstructed climasus_df from DuckDB.

In summary: operations that modify or export the data return the updated climasus_df object, query operations return the requested value, and print_history = TRUE returns invisible(NULL).

Details

Operation Dispatch Order:

  1. valid_values — vocabulary query (no x required)

  2. from_parquet — read Parquet file and reconstruct

  3. from_duckdb — read DuckDB table and reconstruct

  4. print_history — print processing history

  5. add_history — append timestamped history entry

  6. to_parquet — write to Parquet file

  7. to_duckdb — register in DuckDB

  8. field getter — return single metadata field

  9. ... updates — update metadata fields

  10. Default — return all metadata as list
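The precedence above can be illustrated with a small base R sketch. The helper resolve_operation() is hypothetical and not part of the package; it only reports which operation a given combination of arguments would resolve to if the documented order is followed, and it assumes that any non-NULL system, stage, type, or backend counts as an update.

```r
# Hypothetical helper (not part of the package): returns the name of the
# operation a call would resolve to under the documented dispatch order.
resolve_operation <- function(field = NULL, system = NULL, stage = NULL,
                              type = NULL, backend = NULL,
                              add_history = NULL, print_history = FALSE,
                              valid_values = NULL, ...) {
  dots <- list(...)
  # Arguments in ... that select or configure a backend operation
  backend_args <- c("to_parquet", "from_parquet", "to_duckdb",
                    "from_duckdb", "duckdb_view", "duckdb_query")
  # Assumption: everything else counts as a metadata update
  named <- Filter(Negate(is.null),
                  list(system = system, stage = stage,
                       type = type, backend = backend))
  updates <- c(named, dots[setdiff(names(dots), backend_args)])

  if (!is.null(valid_values))            return("valid_values")
  if (!is.null(dots[["from_parquet"]]))  return("from_parquet")
  if (!is.null(dots[["from_duckdb"]]))   return("from_duckdb")
  if (isTRUE(print_history))             return("print_history")
  if (!is.null(add_history))             return("add_history")
  if (!is.null(dots[["to_parquet"]]))    return("to_parquet")
  if (!is.null(dots[["to_duckdb"]]))     return("to_duckdb")
  if (!is.null(field))                   return("field getter")
  if (length(updates) > 0)               return("update")
  "get all"
}

# Earlier steps win when several arguments are supplied together
resolve_operation(field = "stage", add_history = "note")
resolve_operation(field = "stage", to_parquet = "out.parquet")
resolve_operation(stage = "clean")
```

One practical consequence of this ordering is that combining, for example, add_history with a field getter in one call would perform only the history update, so separate calls are the safer pattern.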

Metadata Persistence Across Backends:

When writing to Parquet, sus_meta is serialised as JSON and stored in the Arrow schema metadata under the key "climasus_meta", making it fully recoverable after reading the file back.
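The mechanism described here can be reproduced directly with the arrow and jsonlite packages. The following is a minimal sketch, not the package's implementation: the meta list and the example data frame are made up for illustration, and only the "climasus_meta" key comes from the text above.

```r
library(arrow)
library(jsonlite)

# Illustrative metadata standing in for sus_meta (values are assumptions)
meta <- list(system = "SIM", stage = "filter_cid", type = "raw")

df  <- data.frame(sexo = c("M", "F"), n = c(10L, 12L))
tbl <- arrow::arrow_table(df)

# Embed the metadata as a JSON string under the "climasus_meta" key
tbl$metadata$climasus_meta <- as.character(
  jsonlite::toJSON(meta, auto_unbox = TRUE)
)

path <- tempfile(fileext = ".parquet")
arrow::write_parquet(tbl, path)

# Read back as an Arrow Table so the schema metadata stays accessible
tbl2     <- arrow::read_parquet(path, as_data_frame = FALSE)
restored <- jsonlite::fromJSON(tbl2$metadata$climasus_meta)
restored$stage
```

Because the metadata travels inside the file's schema, any Arrow-aware reader can see the JSON string, even one that knows nothing about climasus_df.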

When writing to DuckDB, sus_meta is stored in a companion table named <duckdb_view>__meta within the same connection, enabling SQL-level introspection of pipeline provenance.
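A companion table of this kind can be sketched with DBI and duckdb alone. The layout of the real companion table is not documented here, so the key/value columns (meta_key, meta_value) below are an assumption chosen for illustration, as are the example data.

```r
library(DBI)

con <- DBI::dbConnect(duckdb::duckdb())

df <- data.frame(sexo = c("M", "F", "M"),
                 ano_obito = c(2020L, 2020L, 2021L))
view_name <- "sim_respiratory"

# Assumed layout: one row per metadata field, stored as text
meta_tbl <- data.frame(
  meta_key   = c("system", "stage", "type"),
  meta_value = c("SIM", "filter_cid", "raw"),
  stringsAsFactors = FALSE
)

# Data table plus its companion "<name>__meta" table on the same connection
DBI::dbWriteTable(con, view_name, df)
DBI::dbWriteTable(con, paste0(view_name, "__meta"), meta_tbl)

# Provenance can now be inspected with plain SQL
stage <- DBI::dbGetQuery(
  con,
  "SELECT meta_value FROM sim_respiratory__meta WHERE meta_key = 'stage'"
)$meta_value

DBI::dbDisconnect(con)
```

Keeping the metadata in a regular table means it is visible to any SQL client attached to the same database, at the cost of relying on a naming convention to link the two tables.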

Where the metadata is read from depends on the class of x:

  • climasus_df: metadata stored as sus_meta attribute

  • Arrow Table: metadata embedded in schema under "climasus_meta" key

  • DuckDB result: metadata extracted from the companion <duckdb_view>__meta table
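For the in-memory case, attribute-based storage can be shown in base R. This sketch uses a plain data frame as a stand-in for a climasus_df; the helper append_history() and the timestamp format are assumptions made for illustration and may differ from what the package does.

```r
# Plain data frame standing in for a climasus_df, with metadata kept in
# a "sus_meta" attribute as a named list
df <- data.frame(sexo = c("M", "F"))
attr(df, "sus_meta") <- list(
  system  = "SIM",
  stage   = "filter_cid",
  history = character(0)
)

# Hypothetical helper: append a timestamped entry to the history
# (the "[YYYY-MM-DD HH:MM:SS] message" format is an assumption)
append_history <- function(x, msg) {
  meta  <- attr(x, "sus_meta")
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  meta$history <- c(meta$history, paste0("[", stamp, "] ", msg))
  attr(x, "sus_meta") <- meta
  x
}

df <- append_history(df, "Removed missing values")

attr(df, "sus_meta")$stage    # single field
attr(df, "sus_meta")$history  # processing history
```

Attributes on a data frame can be dropped by operations that rebuild the object, which is one reason to go through a dedicated accessor rather than calling attr() directly.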

Backend Operations

The following named arguments, passed via ..., trigger or configure backend-specific operations:

to_parquet = "<path>"

Converts the climasus_df to an Arrow Table, embeds sus_meta as JSON in the Parquet schema, and writes to <path>. Returns the updated climasus_df with backend = "parquet".

from_parquet = "<path>"

Reads a Parquet file written by to_parquet and reconstructs a fully-featured climasus_df with all original metadata. Returns the reconstructed object. x is ignored.

to_duckdb = <DBI connection>

Registers the climasus_df as a DuckDB table (default name: "climasus_data") and stores sus_meta in a companion table <duckdb_view>__meta. Returns the updated climasus_df with backend = "duckdb".

from_duckdb = <DBI connection>

Reads a table from a DuckDB connection and reconstructs a climasus_df. Combine with duckdb_view and duckdb_query for fine-grained control.

duckdb_view = "<name>"

Name of the DuckDB table/view to read or write (default: "climasus_data").

duckdb_query = "<SQL>"

Optional SQL WHERE clause or full SELECT statement applied when reading from DuckDB.
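Since duckdb_query accepts either a WHERE clause or a full SELECT statement, the two forms have to be told apart when the SQL is assembled. One plausible way to do that is sketched below; build_query() is a hypothetical helper, and the rule it applies (a string starting with SELECT is used as is, anything else is appended to SELECT * FROM <duckdb_view>) is an assumption rather than documented behaviour.

```r
# Hypothetical helper: combine duckdb_view and duckdb_query into one
# SQL string. Assumes a query starting with SELECT is a full statement.
build_query <- function(view = "climasus_data", query = NULL) {
  base <- paste0("SELECT * FROM ", view)
  if (is.null(query)) {
    return(base)
  }
  query <- trimws(query)
  if (grepl("^select[[:space:]]", query, ignore.case = TRUE)) {
    return(query)          # full SELECT statement: use unchanged
  }
  paste(base, query)       # clause such as "WHERE ...": append to base
}

build_query("sim_respiratory")
build_query("sim_respiratory", "WHERE ano_obito = 2020")
build_query("sim_respiratory",
            "SELECT sexo, COUNT(*) AS n FROM sim_respiratory GROUP BY sexo")
```

The sketch pastes the view name into the SQL unquoted for brevity; with names that are not trusted, quoting them with DBI::dbQuoteIdentifier() would be the safer choice.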

Examples

if (FALSE) { # \dontrun{
# ── Basic metadata operations ──────────────────────────────────────────────

# Get all metadata
meta <- sus_meta(df)

# Get specific field
sus_meta(df, "stage")     # "filter_cid"
sus_meta(df, "backend")   # "tibble"

# Update metadata
df <- sus_meta(df, stage = "clean", type = "clean")

# Add to processing history
df <- sus_meta(df, add_history = "Removed missing values")

# Print history
sus_meta(df, print_history = TRUE)

# Query controlled vocabulary
sus_meta(valid_values = "backend")   # "tibble" "parquet" "duckdb"
sus_meta(valid_values = "system")    # "SIM" "SIH" ...

# ── Arrow / Parquet backend ────────────────────────────────────────────────

# Write to Parquet (metadata embedded in schema)
df <- sus_meta(df, to_parquet = "data/sim_respiratory.parquet")
sus_meta(df, "backend")  # "parquet"

# Read back as climasus_df (metadata fully restored)
df2 <- sus_meta(from_parquet = "data/sim_respiratory.parquet")
sus_meta(df2, "stage")   # "filter_cid"

# ── DuckDB backend ─────────────────────────────────────────────────────────

library(duckdb)
con <- duckdb::dbConnect(duckdb::duckdb())

# Register as DuckDB view
df <- sus_meta(df, to_duckdb = con, duckdb_view = "sim_respiratory")

# SQL query directly on the view
DBI::dbGetQuery(con, "SELECT sexo, COUNT(*) AS n FROM sim_respiratory GROUP BY sexo")

# Read back as climasus_df (with optional SQL filter)
df3 <- sus_meta(from_duckdb = con,
                duckdb_view  = "sim_respiratory",
                duckdb_query = "WHERE ano_obito = 2020")
sus_meta(df3, "stage")   # "filter_cid"

duckdb::dbDisconnect(con)
} # }

if (FALSE) { # \dontrun{
# With climasus_df
sus_meta(df)
sus_meta(df, "stage")
df <- sus_meta(df, stage = "filter_cid", type = "filter_cid")

# With Arrow Table (auto-extracts from schema metadata)
arrow_tbl <- as_arrow_climasus(df)
sus_meta(arrow_tbl)  # Extracts from Arrow schema
sus_meta(arrow_tbl, "system")

# With DuckDB result (if metadata table exists)
con <- duckdb::dbConnect(duckdb::duckdb())
as_duckdb_climasus(df, con, "my_data")
result <- DBI::dbGetQuery(con, "SELECT * FROM my_data")
sus_meta(result)  # Extracts from companion __meta table

# Query valid values (no object needed)
sus_meta(valid_values = "backend")
} # }