(C) PLOS One. This story was originally published by PLOS One and is unaltered.

Open-source dataset reveals relationship between walking bout duration and fall risk classification performance in persons with multiple sclerosis [1]

Brett M. Meyer; Department of Electrical and Biomedical Engineering, University of Vermont, Burlington, Vermont, United States of America; Department of Biomedical Engineering, University of Massachusetts Lowell, Lowell

Date: 2022-12

Abstract

Falls are frequent and associated with morbidity in persons with multiple sclerosis (PwMS). Symptoms of MS fluctuate, and standard biannual clinical visits cannot capture these fluctuations. Remote monitoring techniques that leverage wearable sensors have recently emerged as an approach sensitive to disease variability. Previous research has shown that fall risk can be identified from walking data collected by wearable sensors in controlled laboratory conditions; however, these data may not be generalizable to variable home environments. To investigate fall risk and daily activity performance from remote data, we introduce a new open-source dataset featuring data collected from 38 PwMS, 21 of whom are identified as fallers and 17 as non-fallers based on their six-month fall history. This dataset contains inertial-measurement-unit data from eleven body locations collected in the laboratory, patient-reported surveys and neurological assessments, and two days of free-living sensor data from the chest and right thigh. Six-month (n = 28) and one-year (n = 15) repeat assessment data are also available for some patients. To demonstrate the utility of these data, we explore the use of free-living walking bouts for characterizing fall risk in PwMS, compare these data to those collected in controlled environments, and examine the impact of bout duration on gait parameters and fall risk estimates.
Both gait parameters and fall risk classification performance were found to change with bout duration. Deep learning models outperformed feature-based models using home data; when evaluating performance on individual bouts, the best performance was observed with all bouts for deep learning models and with short bouts for feature-based models. Overall, short duration free-living walking bouts were found to be the least similar to laboratory walking, longer duration free-living walking bouts provided more significant differences between fallers and non-fallers, and an aggregation of all free-living walking bouts yielded the best performance in fall risk classification.

Author summary

Falls are both highly prevalent and injurious in persons with multiple sclerosis (PwMS); thus, we are interested in finding methods to understand the fall risk of PwMS. To examine the differences between PwMS in a clinic environment and at home, we collected and made publicly available a dataset in which PwMS performed daily life activities in the clinic and then wore wearable sensors at home for two days. We found that people walk very differently at home versus in-clinic. However, the longer they walk, the more closely their walking attributes resemble how they walk in-clinic. Additionally, in examining multiple approaches, we found that both the full-length and the short bouts of at-home walking can identify the fall risk of PwMS, each providing varying levels of performance. Crucially, we find that methods and assessments developed for in-clinic use may need to be adjusted to function properly at home and that, when performing walking analysis at home, analyzing differing durations of walking will impact the results.

Citation: Meyer BM, Tulipani LJ, Gurchiek RD, Allen DA, Solomon AJ, Cheney N, et al. (2022) Open-source dataset reveals relationship between walking bout duration and fall risk classification performance in persons with multiple sclerosis. PLOS Digit Health 1(10): e0000120.
https://doi.org/10.1371/journal.pdig.0000120
Editor: Yuan Lai, Tsinghua University, CHINA
Received: March 24, 2022; Accepted: September 2, 2022; Published: October 18, 2022
Copyright: © 2022 Meyer et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data are available on SimTK at the following URL: https://simtk.org/projects/msense_ms_adls.
Funding: This work was funded by National Institutes of Health grant EB027852 (RSM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: AJS discloses consulting for EMD Serono, Biogen, and Alexion, and research funding from Biogen and NIH. RSM discloses stock ownership in Impellia, Epicore Biosystems, Allostatech, and PanicMechanic; consulting for Impellia, HX Innovations, Pfizer, Solid Biosciences, and Happy Health; and research funding from MC10, Medidata, Epicore Biosystems, NIH, and NSF.

Introduction

Multiple sclerosis is characterized by progressive demyelination and axonal damage throughout the central nervous system [1,2]. As a result, persons with multiple sclerosis (PwMS) experience symptoms including debilitating fatigue and impaired coordination, muscle strength, and sensation, leading to difficulty with postural control in dynamic activities which, in turn, leads to falls [3]. Over 50% of falls result in injury and 66% of first-time falls require a visit to the emergency department, reducing quality of life and yielding an estimated annual healthcare cost of $80 billion in the United States alone [4]. Of the 2.3 million PwMS globally, over half will experience a fall in any three-month period [5].
As MS is a chronic condition, injurious falls pose a substantial and long-term burden to patient quality of life and the healthcare system [6]. Given these impacts, effective fall prevention is critical. Fall risk in PwMS is difficult to assess as it is known to vary both within and across days. Fall risk may be elevated in the absence of an assistive device (e.g., walking sticks) [7] or during balance-challenging tasks, such as walking, position transfers, and changes of direction [8]. However, current clinical assessments often only occur once every six months; an observation frequency incapable of capturing the true time-varying nature of symptoms in MS, limiting the ability to prescribe preventative interventions [9]. There is a clear need for novel assessments that are sensitive to this inherent variability and that can capture the relationship between symptom fluctuations and fall risk. One approach is for assessments to incorporate continuous monitoring in free-living conditions, which provide far more than a twice-per-year snapshot of symptoms, and advanced machine learning techniques that can effectively capture the complex relationship between these movement data and fall risk. With the growing availability of wearable sensor data, it may now be possible to leverage machine learning, and particularly deep learning models, to learn high-level outcomes like fall risk directly from raw sensor data without manual feature engineering [10,11]. Studies employing deep learning for time series classification tasks, such as our prior work classifying fall risk in PwMS from in-lab measurements [12] and work from others to detect falls and classify fall risk in non-MS populations with balance and mobility impairment [13–21], have found superior results when compared to machine learning techniques that rely on manually-constructed features. Notably, these results are achieved despite the significant amounts of data needed for training deep learning models. 
It is possible that given larger available datasets, performance of these models could improve further, but the accumulation of these large datasets remains a barrier to entry for many into the use of deep learning models for characterizing fall risk. Remote gait monitoring in PwMS may enable continuous fall risk assessment and the deployment of personalized fall prevention interventions. In this approach, data from individual walking bouts could inform fall risk status instantaneously. This vision has motivated the development of fall risk classification models that require only wearable sensor data from a single gait bout as model inputs [12,22,23]. However, deploying these models remotely comes with additional challenges that may impact model performance. For example, it is well established in PwMS [24–26] and other populations [27–29] that gait observed in the clinic differs from gait observed remotely (especially for gait speed-dependent variables). Similarly, studies in older adults [30] and PwMS [24] have also discovered that gait parameters change with walking bout duration. However, it is currently unclear how walking bout duration relates to fall risk in PwMS [7,30], and this has not been evaluated in previous development of fall risk classification models [12,22,23]. The primary objective of this work is to share a new, open-source dataset that can help other research groups develop digital biomarkers of impairment and fall risk in PwMS. In service to this objective, we present a framework for remote gait analysis on this dataset and use it to examine how gait parameters and fall risk classification performance, based on feature-based machine learning and stride acceleration based deep learning methods, change in relation to walking bout duration in PwMS. 
Materials and methods

Dataset: Subjects and protocol

A sample of 38 PwMS (21:17 fallers:non-fallers; 12:27 Male:Female, mean ± standard deviation age 51 ± 12 y/o), recruited from the Multiple Sclerosis Center at University of Vermont Medical Center, participated in this study (eligibility criteria: no major health conditions other than MS, no acute exacerbations within the previous three months, ambulatory without the use of assistive devices). PwMS who self-reported having fallen within the previous six months were characterized as fallers based on the criterion “consider a fall as an event where you unintentionally came to rest on the ground or a lower level.” All participants were asked to return for two additional identical study visits six months and one year following their initial visit. Of the original cohort of 38, 28 returned for a six-month follow-up (15:13 fallers:non-fallers; 8:20 Male:Female), and 15 returned for a one-year follow-up (6:9 fallers:non-fallers; 6:9 Male:Female). Patients completed a self-reported six-month fall history at each visit, allowing their fall status to change at subsequent visits. The high attrition rate observed in this study was largely due to the COVID-19 pandemic, as 3 six-month and 11 one-year follow-ups were cancelled for this reason.

On the day of testing, subjects provided written informed consent to participate in the study. A neurologist with subspecialty expertise in MS completed the Expanded Disability Status Scale (EDSS) for each subject [31]. Subjects were asked to complete a fall history survey, Activities-specific Balance Confidence Scale (ABC) [32], Modified Fatigue Impact Scale (MFIS) [33], Neurological Sleep Index (NSI) [34], and Twelve Item MS Walking Scale (MSWS) [35]. Two missing NSI entries in the clinical survey data were filled using k-nearest-neighbors (n = 3) [36]. Table 1 reports demographics of the sample.

Table 1. Subject demographics.
https://doi.org/10.1371/journal.pdig.0000120.t001

Subjects performed several activities in the lab, completed in the following order: right and left tibialis anterior maximum voluntary contraction; timed-up-and-go (TUG) [1]; timed 25-foot walk test [37]; 30-second chair stand test [38]; lying to standing transition; three separate two-minute standing tests (tandem standing, feet shoulder-width apart with eyes open, and feet shoulder-width apart with eyes closed); one-minute hallway walk at a self-selected pace including one turn; 30-second normal standing; 30-second upright sitting; 30-second slouch sitting; and 30 seconds each lying on back, left side, right side, and prone.

During the lab visit, subjects were instrumented with MC10 BioStamp sensors. Accelerometer (31.25 Hz, ±16G) and electromyography (1000 Hz) data were collected from the right and left tibialis anterior. Accelerometer (250 Hz, ±16G) and angular rate gyroscope data (250 Hz, ±2000°/s) were collected from the chest and lower back as well as bilaterally from the anterior thighs, proximal lateral shank, and dorsal aspect of the feet. Electromyography was collected to allow the investigation of foot drop, a common cause of falls in PwMS [39]. Detailed placement information can be found in Table 2.

At the conclusion of the lab visit, the participants were sent home for 48 hours wearing two MC10 BioStamp sensors, located on the medial chest and right anterior thigh, measuring acceleration (31.25 Hz, ±16G) and placed in accordance with Table 2. Data from these sensors were recorded throughout the subject’s daily life. These deidentified data are available at https://simtk.org/projects/msense_ms_adls. This protocol was approved by the University of Vermont’s Institutional Review Board (CHRMS 18–0285). Portions of this dataset have been used previously to support the development of approaches for characterizing fall risk from lab-based gait and from in-lab and remotely tracked thirty-second chair-stand tests [12,40,41].
In these studies, deep learning models applied to raw gait data collected in the lab were able to adequately classify fall risk, and chair-stand tests conducted remotely and in the lab provided similar levels of fall risk classification performance.

Table 2. Sensor Placement.
https://doi.org/10.1371/journal.pdig.0000120.t002

Remote gait analysis

An overview of the remote gait analysis pipeline is presented in Fig 1. The depicted framework begins with acceleration gathered from the BioStamp sensors located on the thigh and chest, followed by activity classification (e.g., identifying walking), event detection within walking bouts, feature extraction, and finally analysis. Each aspect of this pipeline (gait bout identification, stride detection, parameter extraction, and analysis) is discussed in more detail below. In terms of analysis, we examine the impact of context and bout duration on discriminating fallers from non-fallers, and on the performance of feature-based and deep learning methods for classifying fall risk. These analyses are only performed on the data from the initial study visit (n = 38).

Fig 1. Pipeline for free-living gait analysis from BioStamp nPoint wearable sensor data. Activity classification is performed via deep neural network (BiLSTM architecture) on windows of accelerometer data sampled from the chest and thigh. Walking bouts are extracted from the resulting activity timeseries and gait events are identified using previously validated approaches to detect strides. Gait parameters are extracted from each walking bout and used for further analysis.
https://doi.org/10.1371/journal.pdig.0000120.g001

Activity classification

Activity classification was carried out with wearable sensor data from the chest and thigh.
Gait bouts were identified using a deep learning approach that leverages a Long Short-Term Memory (LSTM) architecture, a type of recurrent neural network for analyzing time series data, adapted from [42]. Specifically, the network is composed of a single BiLSTM layer with 215 hidden units [43] and a 40% dropout layer [44], and was trained with Adam optimization [45]. This classifier was developed using data drawn 58% from PwMS, 26% from healthy adults, and 16% from persons with Parkinson’s disease to provide a wide variety of example gait and non-gait data for training. Data labeled as gait were sampled from prescribed slow, comfortable, and fast walking trials completed overground, as well as on a treadmill for healthy adults. Data labeled as non-gait were sampled from standing, sitting, lying, running, and stair ascent and descent. Ten-fold cross validation was conducted on the training set consisting of 20,000 4-second observations (50:50 gait:non-gait), yielding a validation accuracy of 98.5%. Accuracy on a held-out test set consisting of 3,000 observations (50:50 gait:non-gait) was 98.4%, providing evidence that the classifier is well positioned to be used on new datasets. This network was then leveraged to identify all walking bouts completed by all subjects during the 48-hour free-living wear period. Walking bouts were identified by classifying 4-second segments of data, where consecutive walking segments were concatenated into a single bout.

Stride detection

Following walking bout identification, strides were extracted using the method described and validated in [46,47]. At a high level, this stride extraction method estimates step and stride frequency from the power spectral density of the thigh accelerometer signal. A filter bank based on these frequencies then provides the signals used to identify foot-off and foot-contact events from specific signal features.
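To make the walking bout identification step concrete, the following is a minimal sketch of how per-window classifier outputs could be merged into bouts. It assumes non-overlapping 4-second windows (the text does not state whether windows overlap), and the function name `merge_windows_into_bouts` and the example labels are hypothetical, not taken from the authors' code.

```python
from typing import List, Tuple

WINDOW_S = 4.0  # window length in seconds, as stated in the text


def merge_windows_into_bouts(is_walking: List[bool],
                             window_s: float = WINDOW_S) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each run of consecutive walking windows.

    Assumption: windows are contiguous and non-overlapping.
    """
    bouts = []
    run_start = None  # index of the first window in the current walking run
    for i, walking in enumerate(is_walking):
        if walking and run_start is None:
            run_start = i
        elif not walking and run_start is not None:
            bouts.append((run_start * window_s, i * window_s))
            run_start = None
    if run_start is not None:  # recording ended during a walking run
        bouts.append((run_start * window_s, len(is_walking) * window_s))
    return bouts


# Hypothetical classifier output for nine consecutive windows.
labels = [False, True, True, False, True, True, True, False, True]
bouts = merge_windows_into_bouts(labels)
durations = [end - start for start, end in bouts]
```

Under this assumption every bout duration is a multiple of 4 seconds, and a single non-walking window is enough to separate two bouts.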
The stride detection algorithm has been validated on a wide range of walking speeds, 0.56–1.78 m/s [47], which covers the expected range of walking speeds for PwMS [48]. Bouts with fewer than two extracted strides were removed automatically before proceeding with the analysis that follows.

Gait parameter extraction

Following walking bout and stride identification, the following features were calculated for each stride and averaged for each bout: stance time, swing time, stride time, coefficient of variation of stride time (stride time CV), duty factor, and coefficient of variation of duty factor (duty factor CV) [46]. The remaining features were calculated on the entire bout: root mean square of the anterior-posterior acceleration from the chest sensor (RMS AP) [49], medial-lateral frequency dispersion of the chest sensor (Freqd ML) [49], and the entropy ratio between the thigh and chest [50]. The Lyapunov exponents of the medial-lateral (Ly ML) and anterior-posterior (Ly AP) chest acceleration were calculated for gait bouts longer than 60 seconds [49].

The features mentioned above were selected based on previous literature that demonstrates their association with MS-induced gait impairment and fall risk. Stance time, swing time, and stride time have been shown to be significantly correlated with patient-reported walking impairment in PwMS [51]. Stride time, duty factor [52], RMS AP, and Freqd ML have been shown to identify differences in walking impairment between PwMS and healthy controls [49]. Stride time CV has been shown to be strongly associated with fall risk in PwMS [53]. The non-linear measures, entropy ratio [50] and Lyapunov exponent in the ML and AP directions of chest acceleration [49], have been shown to capture gait stability in PwMS.
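The per-stride temporal parameters listed in this section can be illustrated with a short sketch. It assumes the commonly used definitions (stride time between successive foot contacts of the same leg, stance time from foot contact to foot-off, duty factor as stance time divided by stride time, and CV as sample standard deviation divided by the mean); the exact definitions used in [46] may differ, and the function name and event times below are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def bout_gait_parameters(foot_contacts: List[float],
                         foot_offs: List[float]) -> Dict[str, float]:
    """Average per-stride temporal parameters over one walking bout.

    foot_contacts: times (s) of successive foot contacts of one leg.
    foot_offs: time (s) of the foot-off inside each stride, so that
               foot_contacts[i] < foot_offs[i] < foot_contacts[i + 1].
    """
    # Two complete strides are the minimum needed for a standard deviation.
    if len(foot_contacts) < 3 or len(foot_offs) != len(foot_contacts) - 1:
        raise ValueError("need at least two complete strides")
    stride, stance, swing, duty = [], [], [], []
    for i, fo in enumerate(foot_offs):
        fc, next_fc = foot_contacts[i], foot_contacts[i + 1]
        stride.append(next_fc - fc)               # contact to next contact
        stance.append(fo - fc)                    # foot on the ground
        swing.append(next_fc - fo)                # foot in the air
        duty.append((fo - fc) / (next_fc - fc))   # stance fraction of stride
    return {
        "stride_time": mean(stride),
        "stance_time": mean(stance),
        "swing_time": mean(swing),
        "stride_time_cv": stdev(stride) / mean(stride),
        "duty_factor": mean(duty),
        "duty_factor_cv": stdev(duty) / mean(duty),
    }


# Hypothetical gait events for a three-stride bout.
params = bout_gait_parameters([0.0, 1.0, 2.2, 3.2], [0.6, 1.7, 2.8])
```

The bout-level features (RMS AP, Freqd ML, entropy ratio, Lyapunov exponents) require the raw acceleration signals and are not covered by this sketch.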
Walking context and bout duration analysis

Gait parameter data were grouped into one of three categories based on the duration of the walking bout from which they were extracted: short (8 seconds or shorter), medium (12–28 seconds), or long (32 seconds or longer). These durations were based on results reported in other examinations of free-living gait [54]. Comparisons to gait parameters derived from lab-collected hallway-walking data and to the combined home data, grouped as all, were also made. Bouts in which strides could not be identified, or that yielded physiologically impossible values, were removed (496 in total). Gait parameters for the walking bouts in each duration category were summarized for each subject using mean, median, max, min, standard deviation, 5th percentile, and 95th percentile. Group differences in each of the gait parameters were identified using Wilcoxon rank-sum tests between bout durations, between fallers and non-fallers at each bout duration, and between in-lab and free-living contexts. A significance threshold of α = 0.05 was used for all statistical testing.

Feature-based fall risk classification

Statistical models that require extracted features for discriminating between individuals at high and low risk for falls were trained and tested on five different feature-sets: gait parameters calculated on short, medium, and long gait bouts, all free-living gait bouts, and in-lab gait data. These feature-sets contained one entry per identified valid walking bout. Classifier performance was established using leave-one-subject-out cross validation (LOSO-CV). In this approach, data from all but one participant (N = 37) were partitioned into a training dataset while data from the remaining subject were used for testing. This process was repeated until data from each subject had been included in the test set.
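The duration grouping and the leave-one-subject-out partitioning described above can be sketched as follows. The thresholds come from the text; the record layout (dictionaries with `subject` and `duration_s` keys), the function names, and the choice to return `None` for durations that fall between the stated ranges are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def duration_category(duration_s: float) -> Optional[str]:
    """Map a bout duration in seconds to the short/medium/long groups."""
    if duration_s <= 8:
        return "short"
    if 12 <= duration_s <= 28:
        return "medium"
    if duration_s >= 32:
        return "long"
    return None  # between the stated ranges; not expected with 4-second windows


def loso_splits(bouts: List[Dict]) -> Iterator[Tuple[str, List[Dict], List[Dict]]]:
    """Yield (held_out_subject, train_bouts, test_bouts), one fold per subject."""
    subjects = sorted({b["subject"] for b in bouts})
    for held_out in subjects:
        train = [b for b in bouts if b["subject"] != held_out]
        test = [b for b in bouts if b["subject"] == held_out]
        yield held_out, train, test


# Hypothetical bout records (one entry per valid walking bout).
bouts = [
    {"subject": "S01", "duration_s": 8},
    {"subject": "S01", "duration_s": 36},
    {"subject": "S02", "duration_s": 12},
    {"subject": "S03", "duration_s": 4},
]
categories = [duration_category(b["duration_s"]) for b in bouts]
folds = list(loso_splits(bouts))
```

Because every bout of the held-out subject goes to the test set, no subject contributes to both sides of a fold, which is the property the cross-validation scheme relies on.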
The LOSO-CV approach ensures the model was tested on subjects it had not previously seen, which provides a realistic estimate of how the model would perform during real-world use. The normalized posterior probabilities, known as the decision scores, assigned to the held-out subject were combined to calculate an overall model performance by considering the area under the receiver operating characteristic curve (AUC). AUC was chosen as the main performance metric because it provides a comprehensive measure of how well a classifier is able to discriminate between groups and allows the results to be compared to other studies. Features were normalized using z-scores then reduced using principal components analysis (PCA) within each iteration of the LOSO-CV. Prior to feature reduction, short, medium, and all-bouts have 8 features per input, long bouts have 9 features per input, and lab bouts have 11 features per input. To explain the discrepancy in the number of features, note that Entropy Ratio is computed for the long bouts and Entropy Ratio, Lyapunov Exponent AP-direction, and Lyapunov Exponent ML-direction are computed for lab walking. The principal components that explained 95% of the variance of these reduced feature sets were extracted, resulting in approximately 6 principal components for each home walking duration and 7 principal components for lab data. The reduced feature sets were then used to train Logistic Regression (LR) [55], Support Vector Machine (SVM) [56], Decision Tree [57], K-Nearest Neighbors (KNN) [58], and Ensemble of Trees (ENS) [57] binary statistical classification models to discriminate between subjects at high and low fall risk. A variety of model types were used to capture different relationships in the feature space, as each model excels with different shaped feature spaces [59]. Similar modeling approaches have been used previously to assess fall risk, as the fall risk of non-fallers is considered low and fallers high [12,23]. 
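As a worked illustration of the AUC metric used in this section, the sketch below computes AUC from pooled held-out decision scores using its rank-based interpretation: the probability that a randomly chosen faller receives a higher score than a randomly chosen non-faller, with ties counted as one half. The labels and scores are made up for the example, and this is not the authors' MATLAB implementation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for faller (positive class), 0 for non-faller.
    scores: decision scores, higher meaning more likely to be a faller.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical decision scores pooled across held-out subjects.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In this example eight of the nine faller/non-faller pairs are ranked correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.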
Model hyperparameters were optimized with MATLAB’s Optimize Hyperparameters feature, with no access to test data, for each input feature set to provide the highest classification performance in terms of AUC.

Deep learning fall risk classification

Based on previous literature [12], we also developed deep learning models for classifying fall risk from walking data. As used previously, we leveraged Long Short-Term Memory (LSTM) networks for this analysis. In our prior work, we demonstrated that the best classification performance was achieved considering four strides of data per input to the model, and showed that model performance changed with the number of strides considered [12]. For our analysis, we first optimized our networks to provide the best performance using four strides per input. This was done by extracting every walking bout with four or more strides and concatenating every consecutive four strides into a model input. These inputs contain three channels of raw acceleration from both the thigh and chest sensor from sequential strides. These data were arranged as a 6×N cell array, where the six represents the number of acceleration channels from both sensors and N represents the summed length of the strides. In the example case of a four-stride input, each input consisted of the thigh and chest acceleration from extracted stride 1 concatenated with the data from stride 2, then 3 and 4. Model outputs were a decision score for each input representing the posterior probability that the input belonged to a given class. Models were trained using LOSO-CV, where n = 36 for training, n = 1 for validation, and n = 1 for testing for each training iteration (n = 35). A modified LOSO-CV procedure was used for the deep learning methods to include an additional validation set to investigate the impacts of adjusting the number of training epochs; note that this method ensures that all data from a given subject are included in only one of the training, validation, or test sets.
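The construction of multi-stride model inputs can be sketched as below. Each stride is represented as six channels (three thigh, three chest) of possibly different length, and consecutive strides are concatenated along time to form a 6×N input. Whether consecutive inputs overlap is not stated in the text, so the `step` parameter (defaulting to non-overlapping groups) is an assumption, as are the function names and the synthetic strides.

```python
from typing import List

Stride = List[List[float]]  # 6 channels x samples for one stride


def concat_strides(strides: List[Stride]) -> Stride:
    """Concatenate strides along the time axis, channel by channel."""
    n_channels = len(strides[0])
    return [[x for s in strides for x in s[ch]] for ch in range(n_channels)]


def make_inputs(bout_strides: List[Stride],
                n_strides: int = 4,
                step: int = 4) -> List[Stride]:
    """Group consecutive strides from one bout into 6 x N model inputs.

    step = n_strides gives non-overlapping groups (assumed here);
    step = 1 would give a sliding window over strides.
    Bouts with fewer than n_strides strides produce no inputs.
    """
    inputs = []
    for start in range(0, len(bout_strides) - n_strides + 1, step):
        inputs.append(concat_strides(bout_strides[start:start + n_strides]))
    return inputs


def fake_stride(k: int) -> Stride:
    """Synthetic stride whose samples all equal its index (illustration only)."""
    n_samples = 3 + k % 2  # stride lengths vary, as real strides do
    return [[float(k)] * n_samples for _ in range(6)]


bout = [fake_stride(k) for k in range(9)]
inputs = make_inputs(bout)
```

With nine strides and non-overlapping groups of four, two inputs are produced and the ninth stride is left unused; a sliding window would instead yield six inputs from the same bout.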
Using four-stride inputs, we optimized our model over the number of LSTM or Bidirectional LSTM (BiLSTM) layers, training epochs, and number of hidden units based on the validation performance. The best two models were then selected and trained on inputs containing one through twenty-two strides. The model referred to as LSTM 2 consisted of the following layers: an LSTM layer with 290 hidden units, 30% dropout, a BiLSTM layer with 10 hidden units, 40% dropout, a fully connected layer, and softmax. The model referred to as LSTM 3 consisted of the following layers: an LSTM layer with 85 hidden units, 55% dropout, an LSTM layer with 85 hidden units, 55% dropout, an LSTM layer with 235 hidden units, 45% dropout, a fully connected layer, and softmax. The models were trained for 55 and 125 epochs, respectively, and both utilized Adam optimization. Models denoted as ABC included the subject’s ABC score among the model inputs. Performance was assessed using the area under the receiver operating characteristic curve (AUC) from the held-out test set, both for individual input predictions and for an aggregated model performance using the median classification from each subject.

Discussion

In this paper we present a novel wearable sensor dataset collected from PwMS. This dataset includes data from a supervised laboratory visit, neurologist assessments, patient-reported measures, and an unsupervised monitoring period for each PwMS. Previous analyses of the in-lab period of this study have found walking and 30-second chair stand tests to be indicative of fall risk [12,40]. Analysis of free-living 30-second chair stand tests and posture transitions has also revealed relationships with fall risk and impairment [41]. Herein, we presented a preliminary analysis of walking in the free-living environment as it relates to fall risk and differing lengths of walking bouts. The main finding from this study is that both gait bout length and environment influence wearables-based fall classification in PwMS.
Specifically, the best performance overall was observed for classifiers that use lab data or long, steady walking bouts that are similar to the lab (Fig 2 and S1 Table). The best performing feature-based model on free-living data was trained on short walking bouts, suggesting that short free-living bouts may be worth further exploration with a more nuanced feature-set. Our best un-aggregated deep learning model was trained on 3-stride inputs from all bouts. We hypothesize this performed best because deep learning models require a large amount of data to train and considering all bouts allows the model access to far more data than just the short bouts. Compared to other fall risk classification studies, the performance of our remote fall risk classifier is on par with many lab-based studies, but still lags behind the best approaches. In-lab studies have achieved AUCs between 0.73 and 0.79 in older adults [60]. In PwMS an in-lab study using the dynamic gait index achieved an AUC of 0.80 [61] and our prior work, where a deep learning model was used on walking data, achieved an AUC of 0.88 [12]. The difference between our previous lab-based fall risk performance of 0.88 and the performances presented herein highlights a key challenge in using deep learning methods on remote data. Namely, that the model must be able to reconcile the additional variability in gait observed under free living conditions. Performance was observed to increase with increasing dataset size in Fig 4, indicating that deep learning approaches may be able to learn appropriate representations of the data to account for this variability, but the dataset considered here is likely not large enough. By open-sourcing these data, we aim to allow future researchers to realize the promise of deep learning for fall risk classification in PwMS. 
Our finding that bout length and environment influence discrimination of fallers from non-fallers is in agreement with similar gait-based classification applications in patients with neurological disorders. For example, one study found that the features that best discriminate between PwMS and healthy controls were different when using lab data and home data [62]. Similarly, other studies demonstrate that shorter walking bouts provide better discriminative power when trying to distinguish a person with Parkinson’s disease from healthy controls [54], and that pace is different in free-living walking compared to in-lab walking for PwMS [24]. The influence of bout length and environment on fall classification is likely related to the observed differences in the various gait descriptors used as features in the classification models (Tables 3 and 4). This finding contributes more generally to the growing body of evidence that controlled in-lab observations of gait are not representative of free-living conditions. In the current study, this discrepancy was more pronounced for short and medium walking bouts than for long bouts, a finding which is likely due to the fact that the in-lab walking bout was, by our definition, a long walking bout (one minute long).

Differences observed between gait parameters calculated at differing bout lengths (see Table 3) show that stride, stance, and swing time decrease as bout duration increases. This likely means that PwMS increase their cadence for longer walking bouts. The observed decrease in ML frequency dispersion with increasing bout length also suggests that PwMS walk more steadily, with less lateral motion, during long duration walking bouts. These results are consistent with Storm et al., who found that gait pace significantly increased and variability significantly decreased with increasing bout length [24]. Karle et al. found little correlation between an in-lab 2-minute walk test and free-living walking [25]. In older adults, Najafi et al. observed significantly different walking strategies between short and long walks [30]. The reason for this change in gait is unknown; however, it can be speculated that shorter walking bouts may elicit more goal-directed actions towards activities other than walking, while longer bouts are more purposeful [54]. Further expanding on the involuntary nature of shorter walking bouts, subjects may be more likely to be dual-task walking (in other words, focused on more than just walking) and may be more impacted by the start-up and stopping strides [63]. This conjecture aligns with research on dual-task walking in PwMS showing that dual-task walking is more discriminative of impairment than single-task walking [64].

The distribution of bout length in free-living gait from the current sample (61% short, 32% medium, 7% long) is comparable to what has been observed in Parkinson’s disease [54]. Preliminarily, this consistency across populations may suggest a phenomenon that is representative of free-living gait more generally. This raises important questions concerning remote gait analysis more broadly to be investigated in future research. For example, does bout length explain the free-living vs. in-lab discrepancy in various gait descriptors consistently observed across multiple populations? If the observed distribution of bout lengths does generalize, then free-living gait is generally short-bout and less purposeful, while long, purposeful walking is rare. Further, given that in-lab investigations of gait are controlled and supervised by a clinician or researcher, they may naturally elicit more purposeful walking from the subject (even over short distances) and be less prone to the impacts of fatigue inherent in daily life.
Thus, differences between free-living and in-lab gait may be explained by the fact that aggregated metrics of free-living data (e.g., average gait speed in a 24-hour period) are dominated by those characteristic of short-duration gait bouts (> 50%) and are influenced to a far lesser extent by metrics characteristic of long-duration, purposeful gait bouts (< 10%).

There are several limitations to our study. First, our relatively small sample with moderate to low impairment may not generalize to a larger population of PwMS, particularly PwMS with EDSS greater than six, who were not represented in this study. Second, other studies utilize different sensing modalities that provide gait speed, which was not available with our data collection setup. Additionally, our analysis methods require a four-second window to be classified as non-walking to denote separate bouts. This definition of a separate bout may impact certain gait quantity metrics; however, our study uses gait quality metrics, which have been shown to be independent of temporal gait bout definitions [65]. Lastly, symptoms in PwMS are known to fluctuate over differing time scales, and thus 48 hours may not have been a long enough collection time to provide an accurate depiction of each participant’s overall mobility status [9]. Future work will be needed to determine how gait parameters vary in PwMS on longer time scales.

With the presented dataset, we hope to alleviate one of the most challenging issues related to human subject research with wearables: not having enough data. Publicly available datasets gathered from PwMS are largely related to medical imaging [66–68] and medication [69]. One dataset tackles a related issue, remote fall detection in PwMS [70]; however, it lacks data from PwMS who have yet to become recurrent fallers, preventing the investigation of gait as it relates to distinguishing fallers from non-fallers and, potentially, fall-risk prediction.
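The bout definition noted among the limitations (a four-second non-walking window separates bouts) can be sketched as an interval merge. This is a minimal illustration, not the study's processing pipeline: it assumes walking has already been detected as (start, end) intervals in seconds, and the function names and the `min_gap_s` parameter are hypothetical.

```python
def merge_walking_intervals(intervals, min_gap_s=4.0):
    """Group detected walking intervals into bouts.

    Two walking intervals belong to the same bout when the non-walking
    gap between them is shorter than `min_gap_s`; a gap of `min_gap_s`
    or longer starts a new bout. Intervals are (start_s, end_s) tuples.
    """
    bouts = []
    for start, end in sorted(intervals):
        if bouts and start - bouts[-1][1] < min_gap_s:
            # Gap too short to count as non-walking: extend the current bout.
            bouts[-1][1] = max(bouts[-1][1], end)
        else:
            bouts.append([start, end])
    return [tuple(b) for b in bouts]


def bout_durations(bouts):
    """Duration in seconds of each (start_s, end_s) bout."""
    return [end - start for start, end in bouts]


if __name__ == "__main__":
    # Hypothetical detections: a 2 s pause, an 8 s pause, and a 4 s pause.
    detected = [(0, 5), (7, 12), (20, 26), (30, 31)]
    bouts = merge_walking_intervals(detected)
    print(bouts)
    print(bout_durations(bouts))
```

Changing `min_gap_s` alters both how many bouts are counted and how long each one is, which is why the bout definition matters for gait quantity metrics.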
Utilizing the presented data, potentially with other collected or open-source data, researchers may be able to leverage deep learning to enhance the performance of their digital biomarkers and phenotypes, particularly for detecting fall risk in PwMS in both lab and free-living environments. With that said, the vision of real-time fall risk monitoring comes with challenges, such as when and how to alert the user to an elevated fall risk, how or if to integrate with their comprehensive care, and how to protect these data. These challenges will need to be addressed and researched in the future as we move towards a preventative care paradigm for falls in PwMS and other populations with balance and mobility impairment.

Conclusion

Herein, we introduce a new open-source dataset featuring activities of daily living and functional assessments from a lab environment as well as two days of free-living data in PwMS. This dataset features data from PwMS with lower impairment, including approximately half who do not yet have recurrent fall histories. As an example use case, we present a study of gait in the free-living environment. In this study, we explored differences in gait parameters calculated on short, medium, and long duration walking bouts. Specifically, we investigated differences between home walking at each duration and in-lab walking, as well as fall classification performance using features calculated from differing walking durations. Several significant differences were found between the gait parameters at differing durations. We also demonstrated that gait-based fall risk classification performance changes with walking bout duration. Short walking bouts, 8 seconds or less, were found to be the most discriminative, providing significant differences between fallers and non-fallers and providing the best free-living fall risk classification performance in the feature-based models.
Additionally, we demonstrated that in-lab gait parameters are significantly different from free-living gait parameters at all bout durations, and that fall risk models used on remote data should be trained with remote data. While future studies are required to assess the reliability of these findings over a longer time period, these results suggest that remote gait analysis may benefit from focusing on short walking bouts.

Supporting information

S1 Table. Performance of deep learning models by number of strides and data considered. LSTM: Long Short-Term Memory Neural Network; LSTM 2: Model with one LSTM layer and one BiLSTM layer; LSTM 3: Model with LSTM Layers; AGG: Aggregation technique (none or median of all remote stride observations); AUC: Area Under the Receiver Operating Characteristic Curve; ABC: Activities-specific Balance Confidence added as input feature; N/A: Not enough data available to extract specified number of strides from each subject. https://doi.org/10.1371/journal.pdig.0000120.s001 (DOCX)

[1] Url: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000120 Published and (C) by PLOS One. Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.