(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org. Licensed under Creative Commons Attribution (CC BY) license. url:https://journals.plos.org/plosone/s/licenses-and-copyright
------------
SWIFT: A deep learning approach to prediction of hypoxemic events in critically-Ill patients using SpO2 waveform prediction
['Akshaya V. Annapragada', 'Johns Hopkins University School Of Medicine', 'Baltimore', 'Maryland', 'United States Of America', 'Joseph L. Greenstein', 'Institute For Computational Medicine', 'The Johns Hopkins University', 'Sanjukta N. Bose', 'Department Of Electrical']
Date: 2022-01
Abstract
Hypoxemia is a significant driver of mortality and poor clinical outcomes in conditions such as brain injury and cardiac arrest in critically ill patients, including COVID-19 patients. Given the host of negative clinical outcomes attributed to hypoxemia, identifying patients likely to experience hypoxemia would offer valuable opportunities for early and thus more effective intervention. We present SWIFT (SpO2 Waveform ICU Forecasting Technique), a deep learning model that predicts blood oxygen saturation (SpO2) waveforms 5 and 30 minutes in the future using only prior SpO2 values as inputs. When tested on novel data, SWIFT predicts more than 80% and 60% of hypoxemic events in critically ill and COVID-19 patients, respectively. SWIFT also predicts SpO2 waveforms with average MSE below 0.0007. SWIFT predicts both occurrence and magnitude of potential hypoxemic events 30 minutes in the future, allowing it to be used to inform clinical interventions, patient triaging, and optimal resource allocation. SWIFT may be used in clinical decision support systems to inform the management of critically ill patients during the COVID-19 pandemic and beyond.
Author summary
Hypoxemia, or loss of blood oxygen saturation, is a dangerous condition that drives morbidity and mortality in critically ill patients, including COVID-19 patients and patients with brain injury or cardiac arrest. The ability to identify hypoxemia before it occurs would expand the possibilities for effective clinical interventions. To this end, we present SWIFT (SpO2 Waveform ICU Forecasting Technique), a deep learning model that can predict blood oxygen saturation 5 and 30 minutes in the future in critically ill patients. In testing, SWIFT identified more than 80% and 60% of hypoxemic events in critically ill and COVID-19 patients, respectively. SWIFT can predict both the occurrence and magnitude of hypoxemic events, which provides clinical information that can help prevent hypoxemia in critically ill patients. SWIFT can be used in clinical decision support systems to improve the management of patients at risk for hypoxemia during the COVID-19 pandemic and beyond.
Citation: Annapragada AV, Greenstein JL, Bose SN, Winters BD, Sarma SV, Winslow RL (2021) SWIFT: A deep learning approach to prediction of hypoxemic events in critically-Ill patients using SpO2 waveform prediction. PLoS Comput Biol 17(12): e1009712. https://doi.org/10.1371/journal.pcbi.1009712
Editor: Benjamin Althouse, University of Washington, UNITED STATES
Received: March 17, 2021; Accepted: December 2, 2021; Published: December 21, 2021
Copyright: © 2021 Annapragada et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: One data set used in this publication is part of JH-CROWN: The COVID PMAP Registry which is based on the contribution of many patients and clinicians: Covid-19 precision medicine analytics platform registry (JH-CROWN).
https://ictr.johnshopkins.edu/coronavirus/jh-crown/ The JH-CROWN data is not publicly available due to IRB restrictions. JH-CROWN states “Please contact Diana Gumas at dgumas1@jhmi.edu if you have questions”. This study was approved by The Johns Hopkins School of Medicine IRB 00251922: RAPID Ventilator Modeling for Management of Covid 19 (SARS CoV-2) induced ARDS. The other data set used is the eICU Collaborative Research Database: The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG and Badawi O. Scientific Data (2018). DOI: 10.1038/sdata.2018.178. Available at: https://www.nature.com/articles/sdata2018178 Instructions for requesting access are here: https://eicu-crd.mit.edu/gettingstarted/access/ All code was written in Python, using the scipy, scikit-learn, numpy, keras and tensorflow libraries. The code used to train and evaluate the models is available here: https://github.com/JHU-Winslow-Lab/hypoxemia-pred.
Funding: This work was supported by NSF RAPID2031195 (RLW and SVS). AVA acknowledges support from the NIH Medical Scientist Training Program 1T32GM136577 and the Joseph and Helen Pardoll Scholarship for MSTP Students (AVA). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Hypoxemia, or a decrease in blood oxygen saturation, is common in critically ill patients: a multinational, multicenter study found a prevalence greater than 50% in ICU patients and identified hypoxemia as a significant risk factor for mortality [1]. Severe hypoxemia can cause permanent brain injury, end-organ shock and cardiac arrest, and even mild or moderate hypoxemia contributes to increased mortality risk by impairing resistance to infection and wound healing [2].
Severe cases of COVID-19 are also characterized by hypoxemia and dyspnea (difficulty breathing) which can rapidly progress to respiratory failure [3]. These patients often require advanced life support measures including invasive mechanical ventilation, hospitalization in ICUs and even extra-corporeal membrane oxygenation (ECMO). During the COVID-19 pandemic, ventilators and ICU beds have become scarce resources with insufficient capacity in the hardest hit regions [4]. As the COVID-19 pandemic continues to exact a heavy mortality toll with over half a million deaths directly attributed to the disease in the United States alone and herd immunity by vaccination remains elusive, it is important to find ways to manage these scarce resources and identify patients unable to maintain oxygen saturation without intervention. Clinically, an important decision point in the management of COVID-19 patients is determining whether the patient requires endotracheal intubation, a form of invasive ventilation [3]. Triage systems using monitoring of blood oxygenation to inform life support measures are tremendously useful for directing resource allocation and have been demonstrated to reduce mortality [5].
Given the host of negative clinical outcomes attributed to hypoxemia, identifying patients likely to experience acute hypoxemia in the near future would offer valuable opportunities for rapid intervention. Life support interventions ranging from supplemental oxygen to invasive ventilation prior to the onset of hypoxemia can mitigate or prevent the morbidity and mortality associated with hypoxemia [2]. Moreover, identifying patients not at imminent risk of hypoxemia represents an opportunity to conserve ventilators and ICU beds in the context of resource shortages arising from a global pandemic.
To this end, we present SWIFT (SpO2 Waveform ICU Forecasting Technique), a neural network that predicts the blood oxygen saturation (SpO2) waveform for critically ill patients, 5 and 30 minutes in the future. SWIFT is unique for several reasons.
First, SWIFT predicts both the occurrence and magnitude of hypoxemic events, and its prediction time horizon provides enough time for potential clinical interventions prior to acute desaturation events. Prior studies have made predictions on short time horizons (20 seconds to 5 minutes), leaving little room for potential clinical interventions [6–9]. Moreover, most other attempts at hypoxemia prediction predict only a class value (hypoxemia vs. no hypoxemia, or mild hypoxemia vs. severe hypoxemia vs. no hypoxemia) rather than an actual SpO2 value [6–8]. Clinically, there is a large difference between a transient dip in SpO2 to 91% versus an acute desaturation to 75% SpO2, though both would be considered hypoxemia. SWIFT recognizes this difference, hence providing important clinical information.
Second, SWIFT employs a Long Short-Term Memory (LSTM) architecture with only prior SpO2 values as inputs, hence allowing SWIFT to make predictions with limited, routinely acquired and readily available data. LSTM models are a type of recurrent neural network well-suited to modeling of time-series data that have shown promise in clinical applications [10–12]. One prior study did use LSTM architectures with prior SpO2 values as inputs, but this model was limited to classification of timepoints as either hypoxemic or not with a 5 minute time horizon, and the total ROC-AUC was less than 0.75 [7]. In contrast, other SpO2 prediction models have used complex, multifactorial data requiring extensive monitoring of patient vitals, demographic data, or ventilator settings [8,9]. This limits their utility to only those patients for whom all of this data is readily available.
Third, SWIFT predicts more than 80% of all hypoxemic events (sensitivity) with positive predictive value (PPV) above 94% in two test-sets of ventilated and non-ventilated critically ill patients, and more than 60% of all hypoxemic events with PPV above 98% in a test-set of COVID-19 patients, across all timepoints for both the 5 minute and 30 minute time horizons. SWIFT also provides waveform predictions with an average mean squared error less than 0.0007 across all patient-stays. These results represent a marked improvement over recently published prediction algorithms. Auto-regressive models with PPV >90% have been limited to prediction time horizons less than 60 seconds [6], and ensemble-based machine learning models to classify hypoxemic events 5 minutes in the future were estimated to capture only 30% of hypoxemic events [9]. To our knowledge, no other study has demonstrated waveform prediction.
Finally, SWIFT is highly generalizable across hospital systems, timeframes, and patient conditions. Though trained only on patients without COVID-19, it performs comparably on patients who received mechanical ventilation during their ICU stay and those who did not, and patients with and without a COVID-19 diagnosis. Other studies have been limited to specific groups such as pediatric patients on mechanical ventilation [8], orthopedic postoperative adult patients [6], or patients undergoing surgery in the operating room [7–9].
Discussion
SWIFT is a Long Short-Term Memory neural network model capable of predicting the magnitude and occurrence of hypoxemic events 5 and 30 minutes in the future, using only prior SpO2 values. We tested SWIFT on three different test sets of ICU patient-stays, including patients both requiring and not requiring mechanical ventilation during their ICU stay, and patients with and without COVID-19.
Across all time points in these test-sets, SWIFT predicts more than 80% of all hypoxemic events (sensitivity) with PPV above 94% in test-sets of critically ill patients, and more than 60% of all hypoxemic events with PPV above 98% in test-sets of COVID-19 patients, for both the 5 minute and 30 minute time horizons. Additionally, SWIFT-5 and SWIFT-30 accurately predicted SpO2 waveforms for each patient-stay with an average MSE below 0.0007 and an average Pearson’s correlation coefficient greater than 0.95.
SWIFT may be especially useful in the context of the COVID-19 pandemic or future similar pandemics with high numbers of patients experiencing hypoxemia and limited supplies of ventilators and ICU beds. Strategies to reduce the demand for mechanical ventilation have been identified as a priority for resource management during the pandemic [4]. To this end, SWIFT can help distinguish patients likely to experience imminent hypoxemic events from patients likely to remain stable, and can offer insight into the magnitude of a potential hypoxemic event. This can allow more patients to be managed off ventilators and, if needed, offer another data point to be used in the triaging of patients for therapy.
Beyond the COVID-19 pandemic, SWIFT could be easily deployed in real time, in low-resource settings without access to complex clinical informatics or large amounts of memory storage. Since SWIFT’s only model inputs are two previous values of SpO2, the barriers to use are minimal. SpO2 can be assessed using simple, non-invasive pulse oximeters. Pulse oximetry is nearly ubiquitous in hospitals and critical care units in the developed world, and substantial effort has been dedicated to increasing the use of pulse oximetry in low resource settings [17].
Given the existing need for hypoxemia monitoring in low and middle income countries and challenges in access to oxygen therapy, SWIFT’s predictive capabilities can play a crucial role in identifying patients likely to experience hypoxemic events and in informing resource allocation decisions [18]. Specifically, since SWIFT-30 uses data sampled at 30 minute intervals (rather than 5 minute intervals), it is especially suitable for scenarios in which high frequency monitoring is not available.
Moreover, SWIFT provides benefits by waveform prediction of SpO2 rather than only binary classification of events as provided by other existing models [7–9]. Studies have demonstrated that pulse oximetry has high levels of false alarms, often for clinically insignificant reasons such as patient movement or skin condition, which can contribute to alarm fatigue [19,20]. Alarm fatigue may lead to slower or absent responses to truly dangerous events [21]. Since SWIFT provides a prediction of SpO2 magnitude as much as 30 minutes in the future, minor anticipated hypoxemic events can be distinguished from more severe ones, with sufficient time horizon to allow for decision making on this basis. While SWIFT cannot correct for errors in SpO2 readings caused by the pulse oximetry device, it can anticipate transient dips, thereby preventing unnecessary responses to SpO2 dips that occur for clinically insignificant reasons. This may help to control the phenomenon of alarm fatigue.
Importantly, SWIFT generalizes well across patient groups. We did not observe substantial differences in model performance between SWIFT-5 and SWIFT-30, nor between predictions made on ventilated vs. non-ventilated patients and COVID-19 vs. generally critically ill patients. The one exception was sensitivity aggregated across all timepoints: in the eICU test sets, more than 80% of hypoxemic events were detected, compared with more than 60% in the JH-CROWN test sets.
This is unsurprising given that SWIFT was trained exclusively on non-COVID-19 patients, and the JH-CROWN database consists only of COVID-19 patients. Notably, the lung damage from SARS CoV-2 infection appears more severe than that from Acute Respiratory Distress Syndrome (ARDS) secondary to most other etiologies, and we are still in the early stages of understanding COVID-19 disease mechanistically. This difference in degree of lung damage may contribute to the performance differences. Moreover, these predictions appear to be generalizable across hospitals and dates (the eICU database comprises patient-stays from 208 ICUs in 2014 and 2015, whereas the JH-CROWN database consists of patients from one medical center in 2020). Our test-sets contained male and female patients in roughly equal proportions, a range of admissions diagnoses, and substantial numbers of patients with non-white ethnicities (Table 1).
However, one limitation of SWIFT is that it was trained and tested primarily on older, critically ill patients. The median age of patients in each test-set was between 60 and 65 years old, and all data came from critically ill patients. Hypoxemia is a consideration in much younger patients as well, and future work will be needed to evaluate SWIFT-5 and SWIFT-30 on younger patients, or to train new models with additional data. A second limitation is that we did not train race-specific models. Recent work has shown that occult hypoxemia (low arterial oxygen saturation despite a pulse oximetry measurement between 92% and 96%) occurs far more frequently in Black patients than White patients [22]. For this reason, there is racial bias in interpretation of SpO2 values, which may not be well captured by our models (though our test sets are racially diverse; the JH-CROWN test sets have ~75% non-White patients).
Regardless, SWIFT currently demonstrates high potential utility for simple, real-time prediction of hypoxemic events (occurrence and magnitude) 5 and 30 minutes in the future without the use of complex clinical informatics. As part of a clinical decision support system, SWIFT has the potential to inform the management of critically ill patients at risk for hypoxemia, including COVID-19 patients.
Methods
Data selection
First, we selected all patient ICU stays with mechanical ventilation at some point during the ICU stay from the eICU database (n = 1326) [13]. The eICU database consists of critically ill patients treated in 208 intensive care units across the United States in 2014 and 2015. We defined ICU stays with mechanical ventilation as distinct patientUnitStayID identifiers for which a respiratory chart entry included phrases similar to ET TUBE, ETT, Endotracheal, Trach, or Tracheostomy. Then, we randomly selected 1326 patientUnitStayID identifiers from those without indication of mechanical ventilation. We partitioned the first 1000 patientUnitStayID identifiers from the mechanical ventilation and no mechanical ventilation groups into a training set (n = 2000), and the last 326 from each group into two eICU test sets (eICU mechanical ventilation n = 326, eICU no mechanical ventilation n = 326). Then, we queried the vital signs time-series for each of these ICU stays, and excluded any ICU stays without corresponding SpO2 data recorded. This left 1933 stays in the training set, 326 stays in the mechanical ventilation test set, and 311 stays in the no mechanical ventilation test set. Since it is possible for a patient in the eICU database to have multiple ICU stays, we took the additional step of removing all ICU stays from the test sets for which that patient had a different ICU stay in the training set. This ensured that no patient appeared in both the training and test sets, even though every ICU stay was already unique.
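The patient-level de-duplication described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `(stay_id, patient_id)` tuple layout and the function name `drop_overlapping_stays` are assumptions made for the example, and all identifiers are made up.

```python
def drop_overlapping_stays(train_stays, test_stays):
    """Remove test stays whose patient also has a stay in the training set.

    Each stay is a (stay_id, patient_id) tuple. This layout is hypothetical
    and chosen only for illustration.
    """
    # Collect every patient that contributes at least one training stay.
    train_patients = {patient_id for _, patient_id in train_stays}
    # Keep only test stays from patients never seen in training.
    return [
        (stay_id, patient_id)
        for stay_id, patient_id in test_stays
        if patient_id not in train_patients
    ]


# Made-up identifiers: patient "p2" has one stay in each set.
train = [(101, "p1"), (102, "p2")]
test = [(201, "p2"), (202, "p3"), (203, "p3")]
kept = drop_overlapping_stays(train, test)
```

Filtering on the patient rather than on the stay means that several stays of one patient may remain together in a test set, while none of them shares a patient with the training set.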
This left 1933 stays (corresponding to 1859 patients) in the training set, 317 stays (corresponding to 285 patients) in the mechanical ventilation test set, and 311 stays (corresponding to 306 patients) in the no mechanical ventilation test set.
Second, all patient stays from the JH-CROWN database up to December 15, 2020 were selected (n = 301). The JH-CROWN database consists of COVID-19 patients seen in any Johns Hopkins Medical Institution facility with confirmed or suspected COVID-19. Each patient-stay in the JH-CROWN database corresponds to a unique patient (n = 301). Data extraction was performed using PostgreSQL, and the Python libraries psycopg2 and pandas [23]. Patients with entirely blank values for SpO2 were then excluded.
The eICU database contains vital signs recorded at 5 minute intervals, whereas the JH-CROWN database records vital signs at variable frequency. Therefore, the data in the JH-CROWN database was interpolated to 5 minute intervals by replacing blank values of SpO2 with the last valid observation. In the patients selected from the JH-CROWN database, the median time between observations was 25 minutes. If the first SpO2 value was missing, it was backfilled with the first available SpO2 value. Finally, those time series with fewer than 61 datapoints (5 hours) were excluded. This left 1837 patient-stays for model training, and 310, 288, and 298 patient-stays in the eICU Mechanical Ventilation, eICU No Mechanical Ventilation and JH-CROWN test sets respectively (S3 and S4 Figs).
Data preparation
All SpO2 values were transformed using the following equation:
[equation missing from this copy of the article]
This transformed value was chosen to magnify differences between SpO2 values close to 100%. Next, a causal moving average filter with a window of 5 was applied to each patient’s transformed SpO2 waveform (i.e., the SpO2 values at the previous 4 timepoints and the current timepoint were averaged together.
The first 4 available timepoints necessarily did not have smoothing applied). We chose this data smoothing technique because it is causal, meaning that it can be applied in real time, and because it reduces transient noise, providing a signal more suitable for clinical decision making. Other studies of hypoxemia prediction have also applied averaging filters to time-series data prior to prediction [6,9]. The time series for each patient was then down-sampled to 30 minute frequency for use with the SWIFT-30 model, which predicts SpO2 30 minutes in the future. For SWIFT-5, which predicts SpO2 5 minutes in the future, no changes were made. Next, the smoothed time series data for each patient was rearranged into an input vector X and an output vector Y, where p_n is SpO2 at timestep n:
[equation missing from this copy of the article]
Finally, in the training set, all X vectors were concatenated and all Y vectors were concatenated to create one training set input vector and one training set output vector to be used in model training. In the three test sets, the patient-level input and output vectors were maintained to be used for model testing. Data preparation was performed in Python using standard data science libraries [23–25].
Model training
A 3-fold cross validation procedure was used for hyperparameter optimization on the training data to evaluate 2 different LSTM model architectures (a deep architecture with 5 LSTM hidden layers and a shallow architecture with 2 LSTM hidden layers; both models had a Batch Normalization input layer and a Dense 1 neuron output layer and contained Dropout layers to prevent overfitting) and 3 different learning rates (ADAM optimizer with learning rates 0.001, 0.01 and 0.1). For both SWIFT-5 and SWIFT-30, the shallow architecture with learning rate 0.001 demonstrated the lowest average MSE across folds. For this selected architecture, the dropout ratio was 0.1. The first LSTM hidden layer had 256 nodes and the second had 16 nodes.
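The hyperparameter search described above (two architectures, three learning rates, 3-fold cross validation, lowest average MSE wins) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: `k_fold_indices`, `select_config`, the configuration labels and the contiguous fold layout are hypothetical, and `dummy_score` is a placeholder that does not train any model.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k=3):
    """Yield (train_idx, val_idx) for k contiguous folds (assumed layout)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def select_config(n_samples, architectures, learning_rates, fit_and_score, k=3):
    """Return the (architecture, learning_rate) pair with the lowest mean MSE.

    fit_and_score(arch, lr, train_idx, val_idx) must return a validation MSE;
    in practice it would train a model on train_idx and evaluate on val_idx.
    """
    scores = {}
    for arch, lr in product(architectures, learning_rates):
        fold_mse = [
            fit_and_score(arch, lr, train_idx, val_idx)
            for train_idx, val_idx in k_fold_indices(n_samples, k)
        ]
        scores[(arch, lr)] = mean(fold_mse)
    best = min(scores, key=scores.get)
    return best, scores


# Grid taken from the text; the string labels are hypothetical.
ARCHITECTURES = ("deep_5_lstm", "shallow_2_lstm")
LEARNING_RATES = (0.001, 0.01, 0.1)


def dummy_score(arch, lr, train_idx, val_idx):
    # Placeholder only: an arbitrary number standing in for a validation MSE.
    return lr * len(val_idx)


best, scores = select_config(12, ARCHITECTURES, LEARNING_RATES, dummy_score)
```

Passing the scoring step in as a callable keeps the search loop independent of the modelling library, so the same loop could wrap a Keras training routine.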
This architecture was then used to re-train the final models on the full training set. The models were trained for 100 epochs with a random 10% validation set at each epoch. To prevent overfitting, the model weights at the epoch with lowest validation loss were used for the final SWIFT-5 and SWIFT-30 models. All model training was performed using the TensorFlow and Keras libraries in Python [26,27].
Model testing
SWIFT-5 and SWIFT-30 were used to predict the transformed SpO2 waveform for each individual patient-stay in each of the three test sets. Then, the mean-squared-error and Pearson’s correlation coefficient were calculated between the true and predicted waveform for each patient-stay. Pearson’s correlation coefficient could not be calculated for 1 patient-stay in the JH-CROWN test set since the time series was constant and the correlation coefficient was undefined. Next, each time point was classified as hypoxemic or not based on an SpO2 threshold of 92% (transformed SpO2 0.55067). Each prediction was also checked against the same threshold, and the sensitivity, specificity, accuracy, and PPV were calculated for each patient-stay time series. Because these values are undefined in the following cases, sensitivity was not calculated for patient-stays with no hypoxemic timepoints, specificity was not calculated for patient-stays in which every timepoint was hypoxemic, and PPV was not calculated for patient-stays for which no predictions were positive for hypoxemia. Finally, all timepoints in each test set were aggregated, and the false positive, true positive, false negative, and true negative rates were calculated for each test set.
Acknowledgments
Stephen Granite assisted with accessing the data sets.
[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009712
Licensed under Creative Commons Attribution (CC BY 4.0) URL: https://creativecommons.org/licenses/by/4.0/