Generate Data Quality Report for Health Data
Source: R/sus_data_quality_report.R
Generates a comprehensive data quality report for health data, including summaries of missing values, data distributions, date validations, and ICD code frequencies. This function helps identify potential data quality issues before analysis.
Usage
sus_data_quality_report(
  df,
  output_format = "console",
  output_file = NULL,
  check_dates = TRUE,
  check_icd = TRUE,
  top_n = 10,
  lang = "pt"
)
Arguments
- df
A data frame containing health data.
- output_format
Character string specifying the output format. Options: "console" (default, prints to console), "html" (saves an HTML report), "markdown" (saves a Markdown report).
- output_file
Character string with the path where the report file is saved. Used only when output_format is not "console". If NULL, a default filename based on the timestamp is used.
- check_dates
Logical. If TRUE (default), performs date validation checks (e.g., future dates, dates before birth).
- check_icd
Logical. If TRUE (default), summarizes ICD code distributions.
- top_n
Integer. Number of top categories to show in frequency tables. Default is 10.
- lang
Character string specifying the language for the report. Options: "en" (English), "pt" (Portuguese, default), "es" (Spanish).
Value
Invisibly returns a list containing the quality metrics. If
output_format = "console", prints the report to the console. Otherwise,
saves the report to a file.
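Because the return value is invisible, nothing is auto-printed unless the result is assigned (or wrapped in parentheses). The sketch below illustrates that R pattern with a hypothetical stand-in, toy_report(), which is not part of the package. The element names of the list actually returned by sus_data_quality_report() are not documented on this page, so the names used here (n_rows, n_cols, n_missing) are assumptions; inspect the real result with names() or str().

```r
# Hypothetical stand-in for a function that prints a report and
# invisibly returns its metrics (not the package's implementation).
toy_report <- function(df) {
  metrics <- list(
    n_rows    = nrow(df),
    n_cols    = ncol(df),
    n_missing = colSums(is.na(df))
  )
  cat("Rows:", metrics$n_rows, "| Columns:", metrics$n_cols, "\n")
  invisible(metrics)
}

# Made-up data for illustration only.
toy_df <- data.frame(sex = c("F", "M", NA), age = c(34, NA, 71))

toy_report(toy_df)          # prints the report line; the list is not echoed
res <- toy_report(toy_df)   # assign the result to keep the metrics
names(res)
str(res)
```

The same assignment pattern should apply to sus_data_quality_report(), for example when the metrics are needed for further checks after the report is printed or saved.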
Details
The data quality report includes:
- Dataset Overview: Dimensions, column types
- Missing Values: Count and percentage of NAs by column
- Demographic Variables: Frequency tables for sex, race, age groups
- Date Validation: Checks for invalid dates (future, before 1900, etc.)
- ICD Codes: Most frequent diagnosis codes (top 10 by default; see top_n)
- Geographic Distribution: Top municipalities
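To make the checks listed above more concrete, here is a minimal base R sketch of three of them (missing values, date validation, and top-n frequencies) on a made-up data frame. The column names (dt_event, icd_code), the toy values, and the helper top_counts() are assumptions for illustration only; the package's own implementation and column detection may differ.

```r
# Toy data; column names and values are invented for illustration.
toy <- data.frame(
  sex      = c("F", "M", "F", NA, "M"),
  dt_event = as.Date(c("2020-03-01", "1899-12-31", "2021-07-15",
                       NA, "2999-01-01")),
  icd_code = c("J18", "I21", "J18", "A90", "J18"),
  stringsAsFactors = FALSE
)

# Missing values: count and percentage of NAs by column.
na_count <- colSums(is.na(toy))
missing <- data.frame(
  column    = names(na_count),
  n_na      = as.integer(na_count),
  pct_na    = round(100 * as.numeric(na_count) / nrow(toy), 1),
  row.names = NULL
)

# Date validation: count future dates and dates before 1900.
date_flags <- c(
  future      = sum(toy$dt_event > Sys.Date(), na.rm = TRUE),
  before_1900 = sum(toy$dt_event < as.Date("1900-01-01"), na.rm = TRUE)
)

# Top-n frequency table (hypothetical helper; NAs are dropped by table()).
top_counts <- function(x, n = 10) {
  head(sort(table(x), decreasing = TRUE), n)
}
top_icd <- top_counts(toy$icd_code, n = 2)

print(missing)
print(date_flags)
print(top_icd)
```

Running checks like these by hand on a small sample can be a useful cross-check against the figures shown in the generated report.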
Examples
if (FALSE) { # \dontrun{
library(climasus4r)

# `df` is assumed to be an existing data frame of health data

# Print report to console
sus_data_quality_report(df, lang = "pt")

# Save HTML report
sus_data_quality_report(
  df,
  output_format = "html",
  output_file = "reports/dq_report.html",
  lang = "en"
)

# Save Markdown report
sus_data_quality_report(
  df,
  output_format = "markdown",
  output_file = "reports/dq_report.md"
)
} # }