
Filters Brazilian Unified Health System (SUS) data by ICD-10 (International Classification of Diseases, 10th Revision) codes or by predefined epidemiological disease groups. The function supports specific codes, code ranges, chapters, and 50+ disease groups relevant to epidemiological research in Brazil. It also handles SUS-specific coding practices and offers a multilingual interface (English, Portuguese, Spanish).

Usage

sus_data_filter_cid(
  df,
  icd_codes = NULL,
  disease_group = NULL,
  icd_column = NULL,
  match_type = "starts_with",
  lang = "pt",
  verbose = TRUE
)

Arguments

df

A data.frame or tibble containing SUS health data with ICD-10 codes. Typically obtained from DATASUS systems (SIM, SIH, SINAN). The data should contain at least one column with ICD-10 codes in standard format (e.g., "A00.0", "I10", "C50.9").

icd_codes

A character vector of ICD-10 codes, ranges, or categories. Multiple syntaxes are supported:

Basic filtering:

  • Single code: "J18.9" (Pneumonia, unspecified)

  • Multiple codes: c("I10", "I11.0", "I11.9") (Hypertensive diseases)

Range filtering:

  • Complete range: "J00-J99" (All diseases of respiratory system)

  • Partial range: "J09-J18" (Influenza and pneumonia only)

Chapter filtering:

  • Full chapter: "I" (All diseases of circulatory system, codes I00-I99)

  • Chapter group: "C00-D48" (All neoplasms)

Special SUS categories:

  • "causas_externas" or "external_causes": V01-Y98 (External causes)

  • "causas_maternas" or "maternal_causes": O00-O99 (Pregnancy/childbirth)

  • "causas_infantis" or "infant_causes": P00-P96 (Perinatal conditions)

  • "doencas_infecciosas" or "infectious_diseases": A00-B99 (Infectious)

  • "doencas_respiratorias" or "respiratory_diseases": J00-J99 (Respiratory)

  • "doencas_cardiovasculares" or "cardiovascular_diseases": I00-I99 (Cardio)

  • "neoplasias" or "neoplasms": C00-D48 (Neoplasms)

Brazilian epidemiological priorities:

  • "dengue_like": A90-A91 (Dengue) + A92.0-A92.9 (Other viral fevers)

  • "zika_chik": A92.8 (Zika) + A92.0 (Chikungunya)

  • "tb_respiratoria": A15-A16 (Respiratory tuberculosis)

  • "covid19": U07.1 (COVID-19) + U07.2 (Suspected COVID-19)

  • "violencia": X85-Y09 (Assault) + Y35-Y36 (Legal intervention and operations of war)

Note: Either icd_codes OR disease_group must be provided, not both.
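The range syntax can be read as a comparison on the three-character category of each code. The following is a minimal base R sketch under that assumption, not the package's implementation; icd_key() and in_icd_range() are hypothetical helpers introduced only for illustration, and they assume well-formed codes (one letter followed by two digits).

```r
# Hypothetical helpers (not part of the package).
# Map an ICD-10 code to a sortable integer built from its 3-character
# category: letter position * 100 + two-digit number ("J18.9" -> 1018).
icd_key <- function(x) {
  x <- gsub("[^A-Z0-9]", "", toupper(x))  # drop dots, spaces, etc.
  match(substr(x, 1, 1), LETTERS) * 100 + as.integer(substr(x, 2, 3))
}

# TRUE for codes whose category lies inside a range such as "J09-J18".
in_icd_range <- function(codes, range) {
  bounds <- strsplit(range, "-", fixed = TRUE)[[1]]
  stopifnot(length(bounds) == 2)
  key <- icd_key(codes)
  key >= icd_key(bounds[1]) & key <= icd_key(bounds[2])
}

codes <- c("J09", "J18.9", "J20.0", "I10", "j12.1")
in_range <- in_icd_range(codes, "J09-J18")
codes[in_range]
```

Comparing integer keys rather than raw strings keeps the result independent of the session's locale collation, and lets a range such as "C00-D48" span two letters.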

disease_group

Character. Name of predefined disease group (e.g., "dengue", "cardiovascular", "respiratory"). Use list_disease_groups() to see all available groups. Mutually exclusive with icd_codes.

icd_column

Character. Name of the column containing ICD-10 codes. If NULL (default), the function attempts auto-detection from common SUS column names in this priority order:

  1. "CAUSABAS" - Underlying cause (primary cause of death in SIM)

  2. "DIAG_PRINC" - Main diagnosis (SIH hospitalizations)

  3. "DIAG_SECUN" - Secondary diagnosis

  4. "CAUSAOBITO" - Cause of death (alternative SIM field)

  5. "DIAGNOSTIC" - Diagnosis (SINAN notifiable diseases)

  6. "linha_a" through "linha_f" - Multiple cause lines (SIM)
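The priority order above amounts to picking the first candidate name that exists in the data. A minimal sketch of that idea follows; detect_icd_column() is a hypothetical helper, and exact, case-sensitive name matching is an assumption rather than documented behaviour.

```r
# Hypothetical helper (not part of the package): return the first
# candidate column, in the documented priority order, present in `df`.
detect_icd_column <- function(df) {
  candidates <- c(
    "CAUSABAS", "DIAG_PRINC", "DIAG_SECUN", "CAUSAOBITO", "DIAGNOSTIC",
    paste0("linha_", letters[1:6])  # "linha_a" through "linha_f"
  )
  found <- candidates[candidates %in% names(df)]
  if (length(found) == 0) {
    stop("No known ICD-10 column found; set `icd_column` explicitly.")
  }
  found[1]
}

# Column order in the data does not matter; priority order does.
sih_like <- data.frame(DIAG_SECUN = "J18.9", DIAG_PRINC = "I21.0")
detected <- detect_icd_column(sih_like)
detected
```

When a dataset carries several of these columns and the analysis targets one that is not first in the list (for example secondary diagnoses), passing icd_column explicitly avoids relying on the detection order.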

match_type

Character. Type of matching algorithm:

  • "exact": Exact code match (e.g., "I10" matches only "I10")

  • "starts_with": Match codes starting with pattern (default, e.g., "I10" matches "I10", "I10.0", "I10.9")

  • "range": Match codes within specified ranges (e.g., "I10-I15" matches I10-I15.9)

  • "chapter": Match entire ICD-10 chapter (e.g., "I" matches I00-I99.9)

  • "fuzzy": Allow for common SUS coding variations (e.g., "I10" matches "I10", "I10 ", "I10X")
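To make the difference between match types concrete, the sketch below reproduces the "exact", "starts_with", and "fuzzy" examples with base R on a small vector. It is an illustration of the documented behaviour, not the package's code; in particular, treating "fuzzy" as whitespace trimming plus removal of a trailing "X" is an assumption based on the examples given.

```r
codes   <- c("I10", "I10.0", "I10 ", "I10X", "I11.9", "J18.9")
pattern <- "I10"

# "exact": the code must equal the pattern character for character
exact <- codes == pattern

# "starts_with": the code must begin with the pattern
starts <- startsWith(codes, pattern)

# "fuzzy" (assumed behaviour): normalise common variations first,
# i.e. upper-case, trim whitespace, drop a trailing "X" placeholder
normalised <- sub("X$", "", trimws(toupper(codes)))
fuzzy <- normalised == pattern

data.frame(codes, exact, starts_with = starts, fuzzy)
```

Note that a plain prefix test also accepts "I10 " and "I10X", so "starts_with" is the most permissive of the three here, while "fuzzy" accepts the coding variations but rejects the subcategory "I10.0".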

lang

Character. Language for user interface messages, warnings, and documentation. Options:

  • "en": English

  • "pt": Portuguese (default, recommended for Brazilian users)

  • "es": Spanish

Affects all console output and documentation of matched codes.

verbose

Logical. If TRUE (default), prints detailed filtering information including: records processed, match statistics, common coding issues detected, and time elapsed.

Value

A filtered data.frame or tibble containing only records matching the specified ICD-10 codes or disease group. The output preserves all original columns.

Details

Automatic ICD Column Detection

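When icd_column is NULL, the function identifies the ICD column from the column names conventionally used by each health system (SIM, SIH, SINAN), following the priority order listed under the icd_column argument.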

Disease Groups

The function includes 50+ predefined epidemiological groups organized by:

  • ICD Chapters: All major disease categories (A00-Y98)

  • Climate-Sensitive Diseases: Vector-borne, waterborne, heat-related, etc.

  • Specific Conditions: Dengue, malaria, cardiovascular, respiratory, etc.

  • Syndromic Groups: Fever, respiratory, diarrheal syndromes

  • Age-Specific Groups: Pediatric, elderly populations

Each group includes:

  • ICD code ranges

  • Multilingual labels and descriptions

  • Climate sensitivity flag

  • Associated climate factors

Use list_disease_groups() to see all available groups and their details.
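One way to picture a group entry with the four attributes listed above is a named list. The sketch below is purely illustrative: the field names, the registry layout, and the climate values are placeholders, not data or structures taken from the package. Only the code ranges (A90-A91 for dengue, C00-D48 for neoplasms) come from this page.

```r
# Hypothetical registry with two entries; field names and climate values
# are placeholders for illustration, not the package's internal data.
groups <- list(
  dengue = list(
    codes = "A90-A91",
    label = c(en = "Dengue", pt = "Dengue", es = "Dengue"),
    climate_sensitive = TRUE,
    climate_factors = c("temperature", "precipitation")
  ),
  neoplasms = list(
    codes = "C00-D48",
    label = c(en = "Neoplasms", pt = "Neoplasias", es = "Neoplasias"),
    climate_sensitive = FALSE,
    climate_factors = character(0)
  )
)

# Names of the groups flagged as climate-sensitive
climate_groups <- names(Filter(function(g) g$climate_sensitive, groups))

# Label of one group in a given language
label_pt <- groups$neoplasms$label[["pt"]]
```

For the actual group names and attributes, list_disease_groups() remains the authoritative source.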

References

  1. World Health Organization. (2016). ICD-10 International Statistical Classification of Diseases and Related Health Problems. 10th Revision.

  2. Brazilian Ministry of Health. (2023). Classificacao Estatistica Internacional de Doencas e Problemas Relacionados a Saude - CID-10. DATASUS. http://datasus.saude.gov.br/cid10

Examples

if (FALSE) { # \dontrun{
# Example 1: Filter by explicit ICD codes
df_cardio <- sus_data_filter_cid(
  sim_data,
  icd_codes = "I00-I99",
  lang = "en"
)

# Example 2: Filter by disease group (easier!)
df_dengue <- sus_data_filter_cid(
  sinan_data,
  disease_group = "dengue",
  lang = "pt"
)

# Example 3: Climate-sensitive diseases
df_climate <- sus_data_filter_cid(
  sim_data,
  disease_group = "climate_sensitive_all",
  lang = "en"
)

# Example 4: Multiple specific codes
df_ami_stroke <- sus_data_filter_cid(
  sih_data,
  icd_codes = c("I21", "I22", "I63", "I64"),
  lang = "es"
)

# Example 5: Respiratory diseases in children
df_pediatric <- sus_data_filter_cid(
  sih_data,
  disease_group = "pediatric_respiratory",
  lang = "pt"
)

# List all available disease groups
list_disease_groups(lang = "pt")

# List only climate-sensitive groups
list_disease_groups(climate_sensitive_only = TRUE, lang = "en")

# Get details about a specific group
get_disease_group_details("dengue", lang = "pt")
} # }