(C) PLOS One. This story was originally published by PLOS One and is unaltered.

Characterizing clinical pediatric obesity subtypes using electronic health record data [1]

Elizabeth A. Campbell (Department of Information Science, College of Computing & Informatics, Drexel University, Philadelphia, Pennsylvania, United States of America; Department of Biomedical & Health Informatics)

Date: 2022-08

In this work, we present a study of electronic health record (EHR) data that aims to identify pediatric obesity clinical subtypes. Specifically, we examine whether certain temporal condition patterns associated with childhood obesity incidence tend to cluster together to characterize subtypes of clinically similar patients. In a previous study, the sequence mining algorithm SPADE was applied to EHR data from a large retrospective cohort (n = 49 594 patients) to identify common condition trajectories surrounding pediatric obesity incidence. In this study, we used Latent Class Analysis (LCA) to identify potential subtypes formed by these temporal condition patterns. The demographic characteristics of patients in each subtype are also examined.

An LCA model with 8 classes was developed that identified clinically similar patient subtypes. Patients in Class 1 had a high prevalence of respiratory and sleep disorders, patients in Class 2 had high rates of inflammatory skin conditions, patients in Class 3 had a high prevalence of seizure disorders, and patients in Class 4 had a high prevalence of asthma. Patients in Class 5 lacked a clear characteristic morbidity pattern, and patients in Classes 6, 7, and 8 had a high prevalence of gastrointestinal issues, neurodevelopmental disorders, and physical symptoms, respectively. Subjects generally had a high membership probability for a single class (>70%), suggesting shared clinical characterization within the individual groups.
Using a Latent Class Analysis approach, we identified patient subtypes with temporal condition patterns that are significantly more common among obese pediatric patients. Our findings may be used to characterize the prevalence of common conditions among newly obese pediatric patients and to identify pediatric obesity subtypes. The identified subtypes align with prior knowledge on comorbidities associated with childhood obesity, including gastrointestinal, dermatologic, developmental, and sleep disorders, as well as asthma.

Childhood obesity is a significant public health challenge in the United States. Despite its prevalence, it remains uncertain whether pediatric obesity represents a single condition or is composed of different subtypes with possibly different underlying causes. Electronic Health Records (EHRs) are an important source of data that may be analyzed to yield clinical and epidemiological insights to aid in obesity treatment and prevention. In this paper, we present a study of EHR data that aimed to identify clinically similar subtypes among a population of newly obese pediatric patients. Specifically, we examine whether certain temporal condition patterns associated with childhood obesity incidence tend to cluster together to characterize subgroups of clinically similar patients. We identified eight potential subtypes, differentiated by the prevalence of various diagnoses including respiratory and sleep disorders, inflammatory skin conditions, asthma, and seizure disorders. This work may be used as a foundation for future investigations into pediatric obesity subtypes, as well as to inform methodological and clinical research that mines EHR data for insights to improve patient health outcomes.

Funding: This work was supported by a grant from the Commonwealth Universal Research Enhancement (C.U.R.E.) program funded by the Pennsylvania Department of Health (2015 Formula award, SAP #4100072543), received by CBF.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Campbell et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Approximately one third of children in the United States are overweight (age- and sex-specific body mass index (BMI) greater than or equal to the 85th percentile per Centers for Disease Control and Prevention (CDC) growth charts) or obese (age- and sex-specific BMI greater than or equal to the 95th percentile per CDC growth charts).[1,2] Obesity is linked with an increased risk of developing multiple comorbidities including asthma, diabetes, hypertension, and psychological conditions among pediatric patients during childhood and later in life.[3,4] Pediatric obesity is a socially significant health issue that disproportionately impacts American Indian, African American, and Latino children, compared to non-Hispanic whites.[5,6] Obesity prevalence is also higher among low-income, rural, or less-educated population subtypes.[5,7]

Despite its prevalence and social importance, it remains uncertain if childhood obesity represents a single condition or is composed of unique phenotypes with possibly different underlying causes. Grouping all types of overweight and obesity into one clinical condition may conceal associations between risk factors and specific subtypes of obesity, which has implications for improving prevention, recognition, and treatment of pediatric obesity.[8] Analyzing large healthcare datasets may potentially uncover previously unknown relationships concerning diagnoses, event patterns, and outcomes in healthcare, such as the presence of childhood obesity subtypes.

In recent years, use of such datasets in healthcare has increased.[9] Data sources include electronic health records (EHRs), medical imaging, wearable devices, genome sequencing, and payer records among others. Data mining methods for pattern discovery and extraction form a core set of methods to facilitate knowledge discovery from large healthcare datasets.[10,11] Data mining in healthcare may be used for numerous purposes including diagnostic outcomes evaluation, to uncover comorbidity and clinical event patterns, and to detect fraud and abuse.[12,13]

We present an investigation of EHR data to identify clinically similar subtypes among a population of newly obese pediatric patients. We examine whether certain temporal condition patterns associated with childhood obesity incidence tend to cluster together to characterize subtypes of clinically similar patients, and to describe the demographic characteristics of these patient subtypes. Specifically, we address the following:

Materials and methods

Data from this study were derived from the Pediatric Big Data (PBD) resource at the Children’s Hospital of Philadelphia (CHOP) (a pediatric tertiary academic medical center). The PBD resource includes clinical data collected from CHOP, the CHOP Care Network (a primary care network of over 30 sites), and CHOP Specialty Care and Surgical Centers.
Both clinical and non-clinical observations (as defined by Observational Health Data Sciences and Informatics (OHDSI) condition domain standards) from a patient’s EHR are included in the PBD database.[14] The PBD resource contains health-related information, including demographic, encounter, medication, procedure, and measurement (e.g. vital signs, laboratory results) elements for a large, unselected population of children. Non-study personnel extracted all data from the EHR and removed protected health information (PHI) identifiers, with the exception of dates, prior to transfer to the study database. Date information was removed from the analysis dataset as described below. The CHOP Institutional Review Board approved this study and waived the requirement for consent.

Temporal condition patterns

In a previous study [15], we applied a sequential pattern mining algorithm to a large retrospective cohort of patients (n = 49 694) from CHOP to identify common condition trajectories surrounding pediatric obesity incidence. This analysis used the CDC definition of childhood obesity (BMI z-score at or above the 95th percentile for age and sex).[16,17] Patients had at least one obesity measurement during a CHOP primary care visit and at least one visit prior to the first obesity measurement at which an obese BMI was not recorded. The BMI z-scores were centrally calculated in this analysis. The same definition of obesity was used across study sites for the entire study period. Campbell et al. include a full study diagram detailing how the inclusion criteria were applied to obtain the study population.[15] EHR data from patients’ records for the healthcare visit in which an obese BMI was first recorded (the index visit), as well as for the visits immediately before (pre-index visit) and after (post-index visit), were compiled for analysis.
The presence of a pre-index visit was required for study inclusion to ensure that patients new to the CHOP healthcare system were not already obese. However, the presence of a post-index visit was not required for inclusion. Approximately two thirds of patients (67.6%) had a post-index visit.

The SPADE algorithm [18] was used to discover frequent temporal patterns among pre-index, index, and post-index visits in the study cohort. SPADE is a sequential pattern mining algorithm that finds frequent subsequence patterns from a larger sequence through an Apriori-based candidate generation method.[19] SPADE identified 163 condition patterns that were present in at least 1% of case patients. An example pattern is “1-ALL04, 2-EAR01” (a diagnosis of asthma in the pre-index visit, followed by a diagnosis of otitis media in the index visit). A control population of patients with a healthy BMI, matched on age, prior healthcare visits, and sex, was obtained and analyzed. We then examined the prevalence in the control population of the common patterns identified among case patients. McNemar’s test results indicated that 80 of the 163 patterns were significantly more common among case patients (p<0.05).[15]

Latent class analysis

The current study builds on the results from Campbell et al. [15] and uses the same study population and the temporal diagnoses that were previously identified. In this study, latent class analysis (LCA) [20] was used to identify potential subtypes formed by the diagnoses in temporal condition patterns that were significantly more common among obese pediatric patients. Including both super-sequences and their frequent subsequences would violate the conditional independence assumption that underlies LCA, because each super-sequence is the intersection of its frequent subsequences. Therefore, only the frequent subsequences (i.e. individual temporal diagnoses) were included in our LCA modelling efforts.
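The restriction to individual temporal diagnoses can be illustrated with a short sketch. It assumes patterns are stored as comma-separated strings in the style of the example above (“1-ALL04, 2-EAR01”); the function names and the toy input are hypothetical and are not taken from the study, which used R rather than Python.

```python
def split_pattern(pattern):
    """Split a pattern string such as '1-ALL04, 2-EAR01' into its items."""
    return [item.strip() for item in pattern.split(",") if item.strip()]


def single_diagnosis_patterns(patterns):
    """Keep only patterns made of one temporal diagnosis (drop super-sequences)."""
    kept = []
    for pattern in patterns:
        items = split_pattern(pattern)
        if len(items) == 1:
            kept.append(items[0])
    return kept


def encode_patient(patient_items, features):
    """Return a 0/1 indicator per feature for one patient's temporal diagnoses."""
    present = set(patient_items)
    return [1 if feature in present else 0 for feature in features]


# Hypothetical input: two single diagnoses plus the super-sequence combining them.
patterns = ["1-ALL04", "2-EAR01", "1-ALL04, 2-EAR01"]
features = single_diagnosis_patterns(patterns)

# A hypothetical patient with asthma recorded at the pre-index visit only.
row = encode_patient(["1-ALL04"], features)
```

Dropping the super-sequence means no feature is a deterministic function of two others, which is the dependence the conditional independence assumption rules out.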
A total of 37 temporal diagnoses were evaluated to create patient subtypes in the LCA; these are listed in Table 1 in the Supplemental section. Each subsequence was treated as an individual feature in the dataset used for the LCA, with a binary value (1 or 0) indicating whether or not a patient had this temporal diagnosis.

Table 1. Latent Class Model Development Comparison. https://doi.org/10.1371/journal.pdig.0000073.t001

LCA requires specification of the number of classes as a user-selected parameter. Prior research on chronic diseases, including asthma [21], diabetes [22], and adult obesity [23], has indicated that there are typically 4–5 subtypes for these diseases. Input from clinician collaborators on this study suggested 8 classes as the maximum number that would be manageable and useful in care provision. Therefore, to obtain a clinically meaningful and interpretable number of patient subtypes, we elected to constrain our LCA evaluation to models with 3–8 classes. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) [24] were used to evaluate goodness-of-fit for each of the models tested. R Version 3.6.1 was used for all data analysis in this study [25], and the poLCA package [26] was used for the latent class modelling.

Demographic subtype analysis

The LCA model assigns each individual a probability of membership in each subtype (class). To facilitate analysis, each patient was assigned, using the final clustering model, to the group for which they had the highest probability of membership. The high-prevalence diagnoses within each LCA-identified subtype, defined as those with ≥ 10% prevalence among patients, were used to clinically describe and name the subtypes. Finally, demographic information from patients’ EHR was incorporated to describe the patient subtypes.
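The model comparison and subtype description steps above can be sketched in a few lines. This is a minimal illustration in Python, not the R/poLCA workflow the study used. The parameter count assumes every indicator is binary, and all function names, posterior probabilities, and feature rows below are hypothetical toy values rather than study data.

```python
import math


def n_free_parameters(n_classes, n_items):
    """Free parameters of an LCA with binary items: one item probability
    per class and item, plus (n_classes - 1) class proportions."""
    return n_classes * n_items + (n_classes - 1)


def aic(log_lik, k):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n_obs):
    """Bayesian information criterion; penalizes parameters by log(n_obs)."""
    return k * math.log(n_obs) - 2 * log_lik


def modal_class(posteriors):
    """Assign a patient to the class with the highest membership probability
    (1-based class labels; ties go to the lowest-numbered class)."""
    best = max(range(len(posteriors)), key=lambda c: posteriors[c])
    return best + 1


def high_prevalence(rows, assignments, target_class, features, threshold=0.10):
    """List features whose prevalence within one class meets the threshold."""
    members = [r for r, a in zip(rows, assignments) if a == target_class]
    if not members:
        return []
    selected = []
    for j, name in enumerate(features):
        prevalence = sum(r[j] for r in members) / len(members)
        if prevalence >= threshold:
            selected.append(name)
    return selected


# Hypothetical toy data: three patients, two classes, two binary features.
features = ["1-ALL04", "2-EAR01"]
rows = [[1, 0], [0, 1], [1, 0]]
posterior = [[0.8, 0.2], [0.3, 0.7], [0.9, 0.1]]

assignments = [modal_class(p) for p in posterior]
class_1_diagnoses = high_prevalence(rows, assignments, 1, features)
```

Under this counting, a model with 8 classes and 37 binary indicators has `n_free_parameters(8, 37)` free parameters, which is the `k` that would enter `aic` and `bic` when comparing the 3 to 8 class models.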
The demographic variables considered were sex, race, Medicaid enrollment (a proxy for socioeconomic status at the time of obesity incidence) [27,28], age at index visit (with age evaluated as both a continuous and categorical variable), and Philadelphia residence. Patients were classified as Hispanic if their self-identified ethnicity was specified as Hispanic or Latino; otherwise they were categorized by the value of their self-identified race in the EHR. Patients with missing race and ethnicity information were classified as unknown.

If patients used multiple insurance types during their index visit, they were classified as being enrolled in Medicaid if one of those insurance plans was Medicaid or the Children’s Health Insurance Program (CHIP), Pennsylvania’s state program to provide health insurance to uninsured children and teens who are ineligible for or not enrolled in Medicaid.[29] If a patient did not have insurance information recorded for the index visit, all insurance information for that patient’s visits within a year of the index visit was obtained from the PBD database and analyzed. If patients had a record of Medicaid/CHIP enrollment within a year of their index visit, they were classified in the Medicaid/CHIP enrollment category (Medicaid/CHIP eligibility is assessed annually).[30] One hundred patients did not have any insurance information for a visit within a year of the index visit and were dropped, leaving a total study population of 49,594 patients.

The frequency of categorical demographic variables (sex, race, Medicaid enrollment, age at index visit, and Philadelphia residence) and the mean and standard deviation (SD) of continuous variables (age at index visit) were provided overall and for each subtype.

[1] URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000073
Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.