(C) PLOS One
This story was originally published by PLOS One and is unaltered.
Physiologic signatures within six hours of hospitalization identify acute illness phenotypes [1]
Yuanfang Ren (Intelligent Critical Care Center, University of Florida, Gainesville, Florida, United States of America; Department of Medicine, Division of Nephrology, Hypertension, and Renal Transplantation)
Date: 2022-12
Abstract
During the early stages of hospital admission, clinicians use limited information to make decisions as patient acuity evolves. We hypothesized that clustering analysis of vital signs measured within six hours of hospital admission would reveal distinct patient phenotypes with unique pathophysiological signatures and clinical outcomes. We created a longitudinal electronic health record dataset for 75,762 adult patient admissions to a tertiary care center in 2014–2016 lasting six hours or longer. Physiotypes were derived via unsupervised machine learning in a training cohort of 41,502 patients by applying consensus k-means clustering to six vital signs measured within six hours of admission. Reproducibility and correlation with clinical biomarkers and outcomes were assessed in a validation cohort of 17,415 patients and a testing cohort of 16,845 patients. Training, validation, and testing cohorts had similar age (54–55 years) and sex (55% female) distributions. There were four distinct clusters. Physiotype A had physiologic signals consistent with early vasoplegia, hypothermia, and low-grade inflammation and favorable short- and long-term clinical outcomes despite early, severe illness. Physiotype B exhibited early tachycardia, tachypnea, and hypoxemia followed by the highest incidence of prolonged respiratory insufficiency, sepsis, acute kidney injury, and short- and long-term mortality. Physiotype C had minimal early physiological derangement and favorable clinical outcomes.
Physiotype D had the greatest prevalence of chronic cardiovascular and kidney disease, presented with severely elevated blood pressure, and had good short-term outcomes but suffered increased 3-year mortality. Comparing sequential organ failure assessment (SOFA) scores across physiotypes demonstrated that clustering did not simply recapitulate previously established acuity assessments. In a heterogeneous cohort of hospitalized patients, unsupervised machine learning techniques applied to routine, early vital sign data identified physiotypes with unique disease categories and distinct clinical outcomes. This approach has the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures.
Author summary
In this paper, we present a machine learning approach, consensus clustering, that groups hospitalized patients based on six routinely collected vital signs measured within six hours of hospital admission. The resulting groups are previously undescribed subsets, or acute illness phenotypes, that may have different risks for a poor outcome or different treatment responses. We identified four acute illness phenotypes associated with distinct clinical characteristics, biomarker patterns, and clinical outcomes. We validated the reproducibility of the phenotypes using a different dataset and a different clustering approach. The early-identified phenotypes, which have unique disease states and mortality risks, have the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures and to augment clinical decision-support systems under time constraints.
Citation: Ren Y, Loftus TJ, Li Y, Guan Z, Ruppert MM, Datta S, et al. (2022) Physiologic signatures within six hours of hospitalization identify acute illness phenotypes. PLOS Digit Health 1(10): e0000110.
https://doi.org/10.1371/journal.pdig.0000110
Editor: Jessica Keim-Malpass, University of Virginia, UNITED STATES
Received: January 3, 2022; Accepted: August 23, 2022; Published: October 13, 2022
Copyright: © 2022 Ren et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data contain date and time stamps for the vital signs used in the analysis. To avoid compromising patient privacy through identifiers included in the data, the data cannot be shared in a public repository. Data can be shared upon reasonable request from the University of Florida Intelligent Critical Care Center at ic3-center@ufl.edu and the University of Florida Integrated Data Repository at IRBDataRequest@ahc.ufl.edu.
Funding: This work was supported by the National Institute of General Medical Sciences (R01 GM110240 to AB, TOB, and PR), the National Institute of Biomedical Imaging and Bioengineering (R01 EB029699 to AB, TOB, and PR; R21 EB027344 to AB and PR), the National Institute of Neurological Disorders and Stroke (R01 NS120924 to AB, TOB, and PR), and by the National Institute of Diabetes and Digestive and Kidney Diseases (R01 DK121730 to AB, TOB, and PR; K01 DK120784 and R01 DK123078 to TOB). TOB was further supported by University of Florida Research (AGR DTD 12-02-20) and the National Center for Advancing Translational Sciences of the National Institutes of Health (UL1TR001427). PR was supported by a National Science Foundation CAREER award (1750192). TJL was supported by the National Institute of General Medical Sciences of the National Institutes of Health (K23 GM140268). This work was also supported in part by the NIH/NCATS Clinical and Translational Sciences Award to the University of Florida (UL1 TR000064).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Each year in the United States alone there are more than 36 million hospital admissions and seven hundred thousand in-hospital mortalities, nearly one quarter of which may be preventable [1–4]. Early in each hospital admission, clinicians formulate decisions regarding diagnostic tests, treatments, and triage destinations using information with low signal-to-noise ratios [5–7]. These arduous clinical decision-making tasks are supported by analyzing vital signs representing essential physiological processes [8–12]. Identifying early vital sign trajectories may have utility for discovering unique physiological signatures that are associated with distinct patient phenotypes and clinical outcomes. Unsupervised machine learning (ML) clustering analyses of clinical variables have identified meaningful subtypes of sepsis and the acute respiratory distress syndrome, but this approach has not been reported among broad, heterogeneous cohorts incorporating all hospitalized patients [13–15]. Using electronic health record data spanning 75,762 adult hospital admissions, we test the hypothesis that unsupervised ML analysis of vital signs recorded within six hours of hospital admission reveals discrete and reproducible physiologic signatures of acute illness phenotypes (physiotypes) that are associated with distinct disease categories and clinical outcomes.
Methods
Data source and participants
We generated a longitudinal dataset of electronic health records (EHR) for 75,762 hospital admissions of 43,598 patients representing all adults (age ≥18 years) admitted to the University of Florida Health 1000-bed academic hospital between June 1, 2014 and April 1, 2016 with length of stay greater than or equal to six hours, including emergency department admission if applicable. Patients completely missing at least two of the six vital sign measurements (systolic and diastolic blood pressure, heart rate, respiratory rate, temperature, and oxygen saturation) within six hours of admission were excluded (S1 Fig). A detailed description of our methods is available in S1 Text. This project was approved by the University of Florida Institutional Review Board.
Study design
We followed Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) recommendations under the Type 2b analysis category [16] to chronologically split the dataset into training (admissions between June 1, 2014 and May 31, 2015, n = 41,502), validation (admissions between June 1, 2015 and October 31, 2015, n = 17,415), and testing (admissions between November 1, 2015 and April 1, 2016, n = 16,845) cohorts to mitigate potentially adverse effects of dataset drift due to changes in clinical practice or patient populations. To identify acute illness phenotypes (physiotypes) using early physiologic signatures, we applied unsupervised ML clustering to temporal measurements of six vital signs recorded within six hours of hospital admission in the training cohort. We assessed physiotype reproducibility by applying alternative clustering methods in the training dataset, assessing physiotype frequency distributions and clinical outcomes in the validation cohort, and predicting physiotypes in the testing cohort (S2 Fig).
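As an illustration of the chronological split described above, the following minimal Python sketch assigns admissions to cohorts by admission date. The cohort date boundaries are those stated in the study design; the function names (`assign_cohort`, `split_admissions`) and the `(encounter_id, admit_date)` input layout are hypothetical and are not taken from the authors' code.

```python
from datetime import date

# Cohort boundaries as stated in the study design (both ends inclusive).
COHORT_WINDOWS = {
    "training": (date(2014, 6, 1), date(2015, 5, 31)),
    "validation": (date(2015, 6, 1), date(2015, 10, 31)),
    "testing": (date(2015, 11, 1), date(2016, 4, 1)),
}


def assign_cohort(admit_date):
    """Return the cohort name for an admission date, or None if out of range."""
    for name, (start, end) in COHORT_WINDOWS.items():
        if start <= admit_date <= end:
            return name
    return None


def split_admissions(admissions):
    """Group hypothetical (encounter_id, admit_date) pairs by cohort."""
    cohorts = {name: [] for name in COHORT_WINDOWS}
    for encounter_id, admit_date in admissions:
        name = assign_cohort(admit_date)
        if name is not None:  # admissions outside the study period are dropped
            cohorts[name].append(encounter_id)
    return cohorts
```

Splitting on admission date rather than at random keeps later admissions out of the training cohort, which is what allows the validation and testing cohorts to expose drift in practice patterns or patient mix.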
Identifying acute illness physiotypes using early physiologic signatures
To derive physiotypes with reproducible early physiologic signatures, we applied consensus k-means clustering [17] to 36 features derived from time series of six vital signs measured within six hours of hospital admission for each encounter in the training cohort. Based on consensus matrix plots and cumulative distribution function curves, the optimal number of physiologic clusters was four (S3 Fig) [15]. We processed raw time series to remove outliers and assess distributions, missingness, and correlation (S1 Table and S4 Fig). Raw time series were resampled to an hourly frequency, using mean values when multiple measurements were recorded during the same one-hour window. Missing values were imputed by propagating temporally adjacent values forward and backward [18]. For records with no measurements within six hours of hospitalization, we imputed median values from the training cohort. Each admission was represented by six hourly values for six vital signs, yielding 36 clustering features. Vital sign patterns were visualized using line plots with 95% confidence intervals, t-distributed stochastic neighbor embedding (t-SNE) plots, ranked plots for mean standardized difference between physiotype pairs, and vital sign mosaic plots (see S1 Text for a comprehensive description).
Clinical characteristics, biological correlates, and clinical outcomes
For each admission we extracted demographics, 19 clinical biomarkers routinely measured at hospital admission (S2 Table), Sequential Organ Failure Assessment (SOFA) and Modified Early Warning Score (MEWS) acuity scores, and patient outcomes [19,20]. Details on data processing are described in S1 Text. Primary outcomes were thirty-day and three-year mortality. Median follow-up duration was 4.3 years by the reverse Kaplan–Meier method.
Other outcomes were acute kidney injury (AKI), venous thromboembolism, sepsis, intensive care unit (ICU) admission, mechanical ventilation (MV), and renal replacement therapy (RRT).
Statistical methods
We assessed physiotype reproducibility by comparing phenotype derivation with Gaussian mixture modeling (GMM) [21] in the training dataset and by assessing frequency distributions in the validation and testing cohorts (S2 Fig). We assessed the robustness of derived physiotypes using sensitivity analyses excluding variables with high missingness, excluding both highly missing and highly correlated variables, and using a 12-hour vital sign window. We validated derived physiotypes in two steps. In the validation cohort, we rederived clusters using consensus k-means and compared them with training cohort clusters. In the testing cohort, we predicted physiotypes based on the clinical characteristics of training cohort clusters. Each patient was assigned to the physiotype whose centroid was at the minimum Euclidean distance from that patient (S1 Text). Clinical variables across clusters were compared using line plots, t-distributed stochastic neighbor embedding plots, and ranked plots. Physiotypes were compared using the χ2 test for categorical variables and analysis of variance and the Kruskal–Wallis test for continuous variables. Overall survival was illustrated using Kaplan–Meier curves and compared using the log-rank test. Adjusted hazard ratios (HR) for each physiotype were compared using Cox proportional-hazards regression while adjusting for age, sex, comorbidities, and SOFA score on admission. We adjusted p values for the family-wise error rate due to multiple comparisons using the Bonferroni correction. To ensure that physiotypes did not recapitulate existing acuity scores, we compared physiotypes with SOFA scores within 24 hours of admission using alluvial plots and chord diagrams. Analyses were performed with Python version 3.7 and R version 3.5.1.
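The feature construction and testing-cohort assignment steps described in the Methods can be sketched as follows. This is a minimal, standard-library Python illustration under stated assumptions, not the authors' implementation: measurements are assumed to arrive as `(minutes_since_admission, value)` pairs, and the vital sign keys, function names, and the `training_medians` and `centroids` inputs are hypothetical placeholders. Outlier removal and any feature scaling applied before clustering are omitted because their details are given only in S1 Text.

```python
from math import sqrt
from statistics import mean

N_HOURS = 6  # hourly bins covering the first six hours after admission

# The six vital signs named in the Methods; the key names are hypothetical.
VITALS = ["sbp", "dbp", "heart_rate", "resp_rate", "temperature", "spo2"]


def hourly_series(measurements, fallback):
    """Resample (minutes_since_admission, value) pairs to six hourly means.

    Gaps are filled forward, then backward. If nothing was recorded in the
    window, every hour takes `fallback` (e.g. a training-cohort median).
    """
    bins = [[] for _ in range(N_HOURS)]
    for minutes, value in measurements:
        hour = int(minutes // 60)
        if 0 <= hour < N_HOURS:
            bins[hour].append(value)
    series = [mean(b) if b else None for b in bins]

    last = None  # forward fill: carry the most recent value into later gaps
    for i, value in enumerate(series):
        if value is None:
            series[i] = last
        else:
            last = value

    nxt = None  # backward fill: cover gaps before the first observed value
    for i in range(N_HOURS - 1, -1, -1):
        if series[i] is None:
            series[i] = nxt
        else:
            nxt = series[i]

    return [fallback if value is None else value for value in series]


def build_features(encounter, training_medians):
    """Concatenate six hourly values for each of six vitals (36 features)."""
    features = []
    for vital in VITALS:
        features.extend(
            hourly_series(encounter.get(vital, []), training_medians[vital])
        )
    return features


def nearest_physiotype(features, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    def distance(label):
        return sqrt(sum((f - c) ** 2 for f, c in zip(features, centroids[label])))
    return min(centroids, key=distance)
```

In this sketch, `centroids` would map each physiotype label to the 36-value mean feature vector of the corresponding training-cohort cluster, so a testing-cohort admission is labeled with `nearest_physiotype(build_features(encounter, training_medians), centroids)`.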
Discussion
Using six vital signs measured within six hours of hospital admission, consensus clustering identified four distinct, clinically relevant patient phenotypes with unique pathophysiological signatures, disease categories, and clinical outcomes. Blood pressure values and trends contributed substantially to cluster assignments: one hypertensive, one normotensive, and two hypotensive clusters. Of the two hypotensive clusters, one was inflammatory and the other non-inflammatory according to C-reactive protein and erythrocyte sedimentation rate values. Beyond these fundamental distinctions, clusters were also differentiated by disease categories, producing the final physiotype labels. Physiotype A, hypotensive non-inflammatory surgical shock, had physiologic signals suggesting early vasoplegia and hypothermia but low-grade inflammation relative to Physiotype B. Physiotype B, hypotensive inflammatory pulmonary dysfunction, was associated with early tachycardia, tachypnea, and hypoxemia followed by the greatest burdens of prolonged respiratory insufficiency, sepsis, acute kidney injury, and short- and long-term mortality. Physiotype C, a normotensive, rapid normalization physiotype, had minimal early physiological derangement and favorable clinical outcomes. Physiotype D, hypertensive chronic disease exacerbation, had the greatest prevalence of chronic cardiovascular and kidney disease, presented with severely elevated blood pressure, and had favorable short-term outcomes but suffered 20% three-year mortality. Each physiotype contained substantial patient proportions across the full ranges of SOFA scores and component subscores, suggesting that clustering did not simply recapitulate SOFA acuity assessments. Finally, physiotype characteristics were reproduced with fidelity in validation and testing cohorts.
Beyond the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures, physiotypes could be adapted to augment clinical decision-making under time constraints and uncertainty. Early identification of hypotensive inflammatory pulmonary dysfunction could theoretically facilitate early ICU admission and high suspicion for sepsis, with attention to resuscitation strategies that maintain adequate renal perfusion without inducing volume overload and hydrostatic pulmonary edema, primarily by providing the optimal balance of intravenous fluid resuscitation and vasopressors [5–8,22]. Early identification of normotensive rapid recovery could facilitate early hospital discharge or triage to low-intensity care settings (i.e., hospital floors), avoiding excessive monitoring and testing that confers lower value of care and may impart harm from unnecessary treatments [9,10]. Early identification of hypertensive chronic disease exacerbation could suggest low value for critical care resources compared with careful post-discharge follow-up for mitigating long-term mortality, and could be built into a decision-support system that facilitates hospital ward admission and outpatient clinic visits to address modifiable risk factors and optimize medication regimens for treating the underlying chronic disease.
Several statistical and machine learning methods can accurately predict risk for death, but these approaches do not elucidate pathophysiologic states or disease categories [23,24]. Conversely, clustering can identify patient phenotypes that have unique disease states and mortality risk, representing a potentially useful adjunct to clinical decision-support systems, particularly among heterogeneous patient cohorts with diverse disease etiologies.
We are unaware of previous studies using cluster analyses of early vital sign measurements to identify phenotypes in heterogeneous cohorts of patients hospitalized for any reason. Others have used clustering for identifying patients with unique disease subtypes with unique treatment responses; sepsis and diastolic heart failure are prominent examples. Seymour et al. [15] performed clustering analyses on a multi-center cohort of sepsis patients with the rationale that sepsis pathophysiology is heterogeneous and identifying distinct sepsis phenotypes may facilitate targeted therapy. Clustering was performed on both clinical and host immune response biomarker variables, identifying four distinct clusters. In a series of simulations, varying proportions of each cluster were applied to previously reported randomized controlled trials. Treatment effects varied significantly across simulations, suggesting unique treatment responses. Shah et al. [25] performed clustering analyses on a single-center cohort of patients with heart failure and preserved ejection fraction, another heterogeneous syndrome refractory to one-size-fits-all management. Clustering was performed on electrocardiogram and echocardiogram data as well as clinical variables, identifying three distinct phenotypes with unique risk-adjusted clinical outcomes. While Seymour et al. [15] and Shah et al. [25] both identified subgroups of patients within larger patient groups that share an established diagnosis, we instead apply clustering methods to any hospitalized patient, identifying broad, generalized patterns of pathophysiology rather than targeted treatment responses. This difference precludes further comparison of our results with others. We also acknowledge several limitations. Our study used data from a single institution, limiting the generalizability of our findings, and external validation in databases from different centers is needed. 
Yet, it seems unlikely that selection bias significantly affected results, as all adult patients admitted for longer than six hours were included. Input variables were limited to the first six hours following hospital admission so that phenotypes could be identified early enough to support clinical decision-making under time constraints and uncertainty. It is possible that the same advantages for early decision-support could be achieved while incorporating historical patient data from previous encounters in the electronic health record; further research is necessary to determine whether this strategy is advantageous. Waveform data, though not universally available in EHRs, has the potential to improve the precision of phenotype clustering. Our clustering approach does not ensure temporal ordering of vital signs, which could influence cluster assignments. Finally, the potential of early clustering to augment clinical decision-making remains theoretical until evaluated in a prospective trial.
Conclusions
Using six vital signs measured within six hours of hospital admission, clustering analyses identified four distinct patient phenotypes that had unique disease categories and clinical outcomes and did not recapitulate previously established acuity assessments. Beyond elucidating pathophysiology by distilling thousands of disease states into a few physiological signatures, identifying patient phenotypes during the early stages of hospital admission may have important implications for clinical decision-making under time constraints.
Acknowledgments
The content is solely the responsibility of the authors. AB and TOB had full access to all of the data. The authors thank members of the Intelligent Critical Care Center and Integrated Data Repository at the University of Florida Health for supporting this work.
[1] Url: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000110
Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.