Filter SUS health data by ICD-10 codes or disease groups with multilingual support
Source: R/sus_data_filter_cid.R
Filters Brazilian Unified Health System (SUS) data by ICD-10 (International Classification of Diseases, 10th Revision) codes or by predefined epidemiological disease groups. The function supports specific codes, code ranges, chapters, and 50+ disease groups relevant to epidemiological research in Brazil. It also handles SUS-specific coding practices and provides a multilingual interface (English, Portuguese, Spanish).
Usage
sus_data_filter_cid(
  df,
  icd_codes = NULL,
  disease_group = NULL,
  icd_column = NULL,
  match_type = "starts_with",
  lang = "pt",
  verbose = TRUE
)
Arguments
- df
A data.frame or tibble containing SUS health data with ICD-10 codes, typically obtained from DATASUS systems (SIM, SIH, SINAN). The data should contain at least one column with ICD-10 codes in standard format (e.g., "A00.0", "I10", "C50.9").
- icd_codes
A character vector of ICD-10 codes, ranges, or categories. Multiple syntaxes are supported:
Basic filtering:
Single code: "J18.9" (Pneumonia, unspecified)
Multiple codes: c("I10", "I11.0", "I11.9") (Hypertensive diseases)
Range filtering:
Complete range: "J00-J99" (All diseases of the respiratory system)
Partial range: "J09-J18" (Influenza and pneumonia only)
Chapter filtering:
Full chapter: "I" (All diseases of the circulatory system, codes I00-I99)
Chapter group: "C00-D48" (All neoplasms)
Special SUS categories:
"causas_externas" or "external_causes": V01-Y98 (External causes)
"causas_maternas" or "maternal_causes": O00-O99 (Pregnancy/childbirth)
"causas_infantis" or "infant_causes": P00-P96 (Perinatal conditions)
"doencas_infecciosas" or "infectious_diseases": A00-B99 (Infectious)
"doencas_respiratorias" or "respiratory_diseases": J00-J99 (Respiratory)
"doencas_cardiovasculares" or "cardiovascular_diseases": I00-I99 (Cardio)
"neoplasias" or "neoplasms": C00-D48 (Neoplasms)
Brazilian epidemiological priorities:
"dengue_like": A90-A91 (Dengue) + A92.0-A92.9 (Other viral fevers)
"zika_chik": A92.8 (Zika) + A92.0 (Chikungunya)
"tb_respiratoria": A15-A16 (Respiratory tuberculosis)
"covid19": U07.1 (COVID-19) + U07.2 (Suspected COVID-19)
"violencia": X85-Y09 (Assault) + Y35-Y36 (Legal intervention)
Note: Provide either icd_codes or disease_group, not both.
- disease_group
Character. Name of a predefined disease group (e.g., "dengue", "cardiovascular", "respiratory"). Use list_disease_groups() to see all available groups. Mutually exclusive with icd_codes.
- icd_column
Character. Name of the column containing ICD-10 codes. If NULL (default), the function attempts auto-detection from common SUS column names in this priority order:
"CAUSABAS" - Underlying cause (primary cause of death in SIM)
"DIAG_PRINC" - Main diagnosis (SIH hospitalizations)
"DIAG_SECUN" - Secondary diagnosis
"CAUSAOBITO" - Cause of death (alternative SIM field)
"DIAGNOSTIC" - Diagnosis (SINAN notifiable diseases)
"linha_a" through "linha_f" - Multiple cause lines (SIM)
- match_type
Character. Type of matching algorithm:
"exact": Exact code match (e.g., "I10" matches only "I10")
"starts_with": Match codes starting with the pattern (default; e.g., "I10" matches "I10", "I10.0", "I10.9")
"range": Match codes within the specified range (e.g., "I10-I15" matches I10 through I15.9)
"chapter": Match an entire ICD-10 chapter (e.g., "I" matches I00-I99.9)
"fuzzy": Allow for common SUS coding variations (e.g., "I10" matches "I10", "I10 ", "I10X")
- lang
Character. Language for user interface messages, warnings, and documentation. Options:
"en": English
"pt": Portuguese (default, recommended for Brazilian users)
"es": Spanish
Affects all console output and documentation of matched codes.
- verbose
Logical. If TRUE (default), prints detailed filtering information including: records processed, match statistics, common coding issues detected, and time elapsed.
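The match_type options above can be approximated in base R. The sketch below is illustrative only: match_icd(), icd_key(), and icd_clean() are hypothetical helpers, not functions exported by the package, and the real implementation may treat edge cases differently. The "chapter" type is omitted, and "fuzzy" is assumed to mean equality after trimming whitespace, dropping dots, and removing trailing "X" filler characters.

```r
# Hypothetical helpers for illustration; not the package's implementation.

# Numeric key for the three-character category, e.g. "J18.9" -> 1018.
icd_key <- function(x) {
  x <- toupper(trimws(x))
  match(substr(x, 1, 1), LETTERS) * 100L +
    suppressWarnings(as.integer(substr(x, 2, 3)))
}

# Normalise common variations: whitespace, case, dots, trailing "X" filler.
icd_clean <- function(x) {
  sub("X+$", "", gsub(".", "", toupper(trimws(x)), fixed = TRUE))
}

match_icd <- function(codes, pattern,
                      match_type = c("starts_with", "exact", "range", "fuzzy")) {
  match_type <- match.arg(match_type)
  switch(match_type,
    exact = codes == pattern,
    starts_with = startsWith(codes, pattern),
    range = {
      # Pattern such as "I10-I15": compare on the category key only,
      # so "I15.9" still counts as inside the range.
      bounds <- icd_key(strsplit(pattern, "-", fixed = TRUE)[[1]])
      key <- icd_key(codes)
      !is.na(key) & key >= bounds[1] & key <= bounds[2]
    },
    fuzzy = icd_clean(codes) == icd_clean(pattern)
  )
}

codes <- c("I10", "I10.0", "I10 ", "I10X", "I11.9", "J18.9")
match_icd(codes, "I10", "exact")
match_icd(codes, "I10", "starts_with")
match_icd(codes, "I10-I15", "range")
match_icd(codes, "I10", "fuzzy")
```

Comparing ranges on a numeric key rather than on raw strings avoids any dependence on locale-specific string collation.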
Value
A filtered data.frame or tibble containing only the records that match the specified ICD-10 codes or disease group. The output preserves all original columns.
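As a rough picture of the return value, the following base R sketch filters a toy data frame by code prefix and checks that every column survives. The data and the column names other than CAUSABAS are made up for illustration, and plain subsetting stands in for the package function.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  CAUSABAS = c("I10", "J18.9", "I21.0"),
  IDADE = c(70, 5, 64),
  SEXO = c("F", "M", "M"),
  stringsAsFactors = FALSE
)

# Keep rows whose code starts with "I"; drop = FALSE keeps a data.frame.
kept <- toy[startsWith(toy$CAUSABAS, "I"), , drop = FALSE]

identical(names(kept), names(toy))  # columns are unchanged
nrow(kept)                          # only matching rows remain
```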
Details
Automatic ICD Column Detection
The function automatically identifies the appropriate ICD column based on the health system.
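The priority order listed under icd_column suggests a first-match lookup. A minimal sketch of that idea, assuming exact, case-sensitive column names, is shown here; detect_icd_column() is a hypothetical helper, not the package's internal routine.

```r
# Priority order copied from the icd_column argument description.
candidate_columns <- c(
  "CAUSABAS", "DIAG_PRINC", "DIAG_SECUN", "CAUSAOBITO", "DIAGNOSTIC",
  paste0("linha_", letters[1:6])  # "linha_a" through "linha_f"
)

# Hypothetical helper: return the highest-priority candidate present in df.
detect_icd_column <- function(df, candidates = candidate_columns) {
  # intersect() keeps the order of its first argument, i.e. the priority order.
  found <- intersect(candidates, names(df))
  if (length(found) == 0) {
    stop("No known ICD-10 column found; pass `icd_column` explicitly.")
  }
  found[1]
}

# Toy SIH-like data: DIAG_PRINC wins even though DIAG_SECUN comes first.
sih_like <- data.frame(DIAG_SECUN = "J18.9", DIAG_PRINC = "I10", IDADE = 70)
detect_icd_column(sih_like)
```

Passing icd_column explicitly remains the safer choice when a dataset contains several of these columns.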
Disease Groups
The function includes 50+ predefined epidemiological groups organized by:
ICD Chapters: All major disease categories (A00-Y98)
Climate-Sensitive Diseases: Vector-borne, waterborne, heat-related, etc.
Specific Conditions: Dengue, malaria, cardiovascular, respiratory, etc.
Syndromic Groups: Fever, respiratory, diarrheal syndromes
Age-Specific Groups: Pediatric, elderly populations
Each group includes:
ICD code ranges
Multilingual labels and descriptions
Climate sensitivity flag
Associated climate factors
Use list_disease_groups() to see all available groups and their details.
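One way to picture the per-group metadata described above is a named list with one entry per group. The structure, field names, and values below are assumptions made for illustration (only the dengue range A90-A91 comes from this page); they do not reflect the package's internal storage or the output of list_disease_groups().

```r
# Hypothetical registry layout; field names and values are placeholders.
groups <- list(
  dengue = list(
    codes = "A90-A91",
    label = c(en = "Dengue", pt = "Dengue", es = "Dengue"),
    climate_sensitive = TRUE,
    climate_factors = c("temperature", "precipitation")  # placeholder values
  ),
  example_group = list(
    codes = "Z00-Z99",
    label = c(en = "Example group", pt = "Grupo de exemplo", es = "Grupo de ejemplo"),
    climate_sensitive = FALSE,
    climate_factors = character(0)
  )
)

# Names of groups flagged as climate-sensitive.
climate_groups <- names(Filter(function(g) g$climate_sensitive, groups))
climate_groups

# Label of one group in a chosen language.
groups$dengue$label[["pt"]]
```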
References
World Health Organization. (2016). ICD-10 International Statistical Classification of Diseases and Related Health Problems. 10th Revision.
Brazilian Ministry of Health. (2023). Classificacao Estatistica Internacional de Doencas e Problemas Relacionados a Saude - CID-10. DATASUS. http://datasus.saude.gov.br/cid10
Examples
if (FALSE) { # \dontrun{
  # Example 1: Filter by explicit ICD codes
  df_cardio <- sus_data_filter_cid(
    sim_data,
    icd_codes = "I00-I99",
    lang = "en"
  )

  # Example 2: Filter by disease group (easier!)
  df_dengue <- sus_data_filter_cid(
    sinan_data,
    disease_group = "dengue",
    lang = "pt"
  )

  # Example 3: Climate-sensitive diseases
  df_climate <- sus_data_filter_cid(
    sim_data,
    disease_group = "climate_sensitive_all",
    lang = "en"
  )

  # Example 4: Multiple specific codes
  df_ami_stroke <- sus_data_filter_cid(
    sih_data,
    icd_codes = c("I21", "I22", "I63", "I64"),
    lang = "es"
  )

  # Example 5: Respiratory diseases in children
  df_pediatric <- sus_data_filter_cid(
    sih_data,
    disease_group = "pediatric_respiratory",
    lang = "pt"
  )

  # List all available disease groups
  list_disease_groups(lang = "pt")

  # List only climate-sensitive groups
  list_disease_groups(climate_sensitive_only = TRUE, lang = "en")

  # Get details about a specific group
  get_disease_group_details("dengue", lang = "pt")
} # }