A pilot study of the Earable device to measure facial muscle and eye movement tasks among healthy volunteers

Matthew F. Wipperman, Galen Pogoncheff, et al. (Precision Medicine and Early Clinical Development & Experimental Sciences, Regeneron Pharmaceuticals Inc, Tarrytown, New York, United States of America; Earable Inc.)

Abstract

The Earable device is a behind-the-ear wearable originally developed to measure cognitive function. Since Earable measures electroencephalography (EEG), electromyography (EMG), and electrooculography (EOG), it may also have the potential to objectively quantify facial muscle and eye movement activities relevant in the assessment of neuromuscular disorders. As an initial step to developing a digital assessment in neuromuscular disorders, a pilot study was conducted to determine whether the Earable device could be utilized to objectively measure facial muscle and eye movements intended to be representative of Performance Outcome Assessments (PerfOs), with tasks designed to model clinical PerfOs, referred to as mock-PerfO activities. The specific aims of this study were: (1) to determine whether the Earable raw EMG, EOG, and EEG signals could be processed to extract features describing these waveforms; (2) to determine Earable feature data quality, test-retest reliability, and statistical properties; (3) to determine whether features derived from Earable could be used to differentiate between various facial muscle and eye movement activities; and (4) to determine what features and feature types are important for mock-PerfO activity-level classification. A total of N = 10 healthy volunteers participated in the study. Each study participant performed 16 mock-PerfO activities, including talking, chewing, swallowing, eye closure, gazing in different directions, puffing cheeks, chewing an apple, and making various facial expressions. Each activity was repeated four times in the morning and four times at night. A total of 161 summary features were extracted from the EEG, EMG, and EOG bio-sensor data. Feature vectors were used as input to machine learning models to classify the mock-PerfO activities, and model performance was evaluated on a held-out test set. Additionally, a convolutional neural network (CNN) was used to classify low-level representations of the raw bio-sensor data for each task, and model performance was correspondingly evaluated and compared directly to feature classification performance. Model prediction accuracy was used to quantitatively assess the Earable device's classification ability. Study results indicate that Earable can potentially quantify different aspects of facial and eye movements and may be used to differentiate mock-PerfO activities. Specifically, Earable was found to differentiate talking, chewing, and swallowing tasks from other tasks with observed F1 scores >0.9. While EMG features contribute to classification accuracy for all tasks, EOG features are important for classifying gaze tasks. Finally, we found that analysis with summary features outperformed a CNN for activity classification. We believe Earable may be used to measure cranial muscle activity relevant for neuromuscular disorder assessment.
Classification performance of mock-PerfO activities with summary features enables a strategy for detecting disease-specific signals relative to controls, as well as the monitoring of intra-subject treatment responses. Further testing is needed to evaluate the Earable device in clinical populations and clinical development settings.

Author summary

Many neuromuscular disorders impair function of cranial nerve-innervated muscles. Clinical assessment of cranial muscle function has several limitations. Clinician rating of symptoms suffers from inter-rater variation, qualitative or semi-quantitative scoring, and limited ability to capture infrequent or fluctuating symptoms. Patient-reported outcomes are limited by recall bias and poor precision. Current tools to measure orofacial and oculomotor function are cumbersome, difficult to implement, and non-portable. Here, we show how Earable, a wearable device, can discriminate certain cranial muscle activities such as chewing, talking, and swallowing. We demonstrate, using data from a pilot study, how Earable can be used to measure features from EMG, EEG, and EOG waveforms from subjects wearing the device while performing mock Performance Outcome Assessments (PerfOs), utilized widely in clinical research. Our analysis pipeline provides a framework for how to computationally process and statistically rank features from the Earable device. Our results, from a pilot study of healthy participants, enable a more comprehensive strategy for the design, development, and analysis of wearable sensor data for investigating clinical populations. Understanding how to derive clinically meaningful quantitative metrics from wearable sensor devices is required for the development of novel digital endpoints, a hallmark goal of clinical research.

Citation: Wipperman MF, Pogoncheff G, Mateo KF, Wu X, Chen Y, Levy O, et al. (2022) A pilot study of the Earable device to measure facial muscle and eye movement tasks among healthy volunteers. PLOS Digit Health 1(6): e0000061. https://doi.org/10.1371/journal.pdig.0000061

Editor: Nicole Yee-Key Li-Jessen, McGill University, CANADA

Received: February 9, 2022; Accepted: May 9, 2022; Published: June 30, 2022

Copyright: © 2022 Wipperman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data and computer code are publicly available here: https://github.com/Earable-ML/Pilot-Study-facial-muscle-and-eye-movement-classification/tree/main/data.

Funding: The sole funder of this study was Regeneron Pharmaceuticals, Inc. The study was designed by employees of Regeneron Pharmaceuticals, Inc, and data from this study were analyzed by employees of both Regeneron Pharmaceuticals, Inc and Earable, Inc.

Competing interests: The authors have read the journal's policy and the authors of this manuscript have the following competing interests: MFW, KFM, XW, YC, OL, AA, SCH, RA, and OH are current or former employees and shareholders of Regeneron Pharmaceuticals, Inc. GP, RRD, and TV are employees of Earable, Inc.

Introduction

Facial/cranial and eye movement dysfunction is an important feature of several neurological disorders that affect multiple levels of the neuraxis. [1]
Examples include outright facial weakness due to facial nerve palsy or stroke; diplopia, ptosis, and dysphagia caused by neuromuscular disorders such as myasthenia gravis; dystonia; and complex extraocular movement deficits, hypomimia, and dysphagia caused by parkinsonian (and other neurodegenerative) conditions. [2,3] Clinical assessment of these symptoms remains a challenge in medicine and clinical research. [4,5] Existing clinical assessments, such as clinician-reported outcomes (ClinROs) or patient-reported outcomes (PROs), may require the patient to make frequent site visits, rely primarily on subjective measures, and may not necessarily reflect a patient's condition(s) in the real world. Importantly, patient symptoms can be intermittent and vary throughout the day, making reliable assessment difficult. Finally, symptoms can vary from patient to patient depending on individual adaptations to increasing muscle weakness.

While there are tools that exist to perform quantitative analysis of cranial muscle function, these tools have significant limitations. For example, facial movements can be measured with video-based technologies using either static images or video capture. [6] Surface electromyography (EMG), which records the electrical activity of facial muscles, can also be used either alone or in combination with video-based methods. [7] Small studies have suggested that electrooculography (EOG), which measures the electrical potential from the front to the back of the eye, can detect differences between parkinsonian patients and controls. [8,9] Screen-based trackers and wearable glasses have been used to monitor extraocular movements and upper cranial activity (e.g., blinking). [10] In their current applications, these approaches can be cumbersome, difficult to implement, and, most importantly, capture facial movements only for brief periods of time in an artificial setting. As such, there is an opportunity to identify and/or develop novel non-invasive approaches to measure cranial symptoms of neuromuscular and neurodegenerative disorders in key patient populations. If properly studied and validated, these tools may in turn support not only diagnostic and disease progression assessment by clinicians, but also outcome assessment in clinical research. If such approaches can leverage wearable sensing technology, they may be able to address the limitations of existing clinical sensors, which are restricted to highly controlled settings rather than more naturalistic environments (e.g., at home).

One such device, Earable, is a behind-the-ear wearable developed to measure neural and physiological processes. [11] Electrophysiological signals are acquired at 250 Hz via four re-usable electrodes fabricated from a conductive silicon material. Electrodes of the device are positioned at scalp locations directly above the left and right ears and on the left and right mastoid processes, yielding raw bio-signal data analogous to that which could be acquired at electroencephalography (EEG) reference locations T3, T4, M1, and M2 of the 10–20 electrode placement system [12], EEG being a measurement of surface brain wave activity. This electrode configuration also enables high-fidelity acquisition of EMG activity from activation of the temporalis and surrounding muscle groups, and of EOG signals yielded by eye deflections.
Whereas traditional clinical assessment using biophysiological data may be invasive, expensive, and time consuming, Earable is designed to offer high-fidelity data acquisition and processing to the general population. The EMG, EEG, and EOG signals monitored with Earable have been used for the detection and evaluation of a wide variety of physiological phenomena, such as sleep monitoring [13], microsleep detection [14], and acute postoperative pain quantification. [15] Earable has the potential to support outcome assessment for neuromuscular disorders by objectively quantifying facial muscle and eye movement tasks through capturing and analyzing bio-signal data. A significant challenge is that unprocessed bio-signal data are inherently noisy: participants move during clinical assessments, electrode-skin contact may be perturbed, and cardiac activity introduces artifacts into the acquired signals. Additionally, EEG, EMG, and EOG signals overlap in typical frequency ranges, making direct separation and analysis of the waveform data non-trivial.

Thus, as an initial step to develop a digital assessment for neuromuscular disorders, a pilot study was conducted to determine whether the Earable device could measure facial muscle and eye movements. The specific aims of the study were: (1) to determine how the Earable EMG/EOG/EEG signals may be processed to extract features; (2) to determine Earable feature data quality, test-retest reliability, and statistical properties; (3) to determine whether parameters derived from the Earable device can quantify various facial and ocular muscle activities; and (4) to determine what features are important for activity-level classification, in comparison to raw bio-signal data classification approaches. In this pilot study, we developed 16 mock Performance Outcome Assessments (mock-PerfOs) designed to assess facial and eye movements with the Earable device on N = 10 control volunteer participants. [16] We present a fit-for-purpose feature engineering pipeline, in which we derive features from the EMG, EOG, and EEG waveforms, evaluate feature relationships to each other, and quantitatively assess how features classify different mock-PerfO activities. The steps taken in this study reflect the analytical validation steps of the V3 framework for the development of digital assessments. [17] Taken together, the results from our study highlight the utility of the Earable device, and of similar devices that collect bio-signal data, as potential measurement tools in a clinical trial setting for evaluating facial and eye movement tasks, and enable further clinical development with this and similar devices.

Discussion

Improving pipelines for the development and analysis of wearable sensor data, and frameworks for how to think about these data in clinical settings, is critical for improving accurate patient diagnosis and monitoring treatment responses in all stages of clinical drug development. [17,29] Despite the progress made to date, challenges remain both in the development of the wearable devices themselves and in the ways in which wearable data are processed and analyzed in clinical settings. [17,30] In this work, we demonstrate a proof-of-concept repurposing of a wearable device, Earable (a sleep aid wearable that measures EMG, EEG, and EOG), to assess facial and ocular muscle movements in a pilot study of healthy controls.
We highlight the utility of a feature engineering approach to classify activities intended to be representative of true PerfOs. Further, we present our approach for analyzing and ranking the utility of features generated from Earable and discuss how these data may be used to classify activities performed in this study setting. Finally, we highlight limitations of time series analysis on bio-signal data collected over short periods of time compared to a feature-based analytical approach, which we feel is an important consideration for wearable data analysis pipelines.

We demonstrated that data generated by Earable can be used to classify certain types of cranial muscle and ocular movements. The data generated in this pilot study suggest that, while further work is needed to refine the types of activities to be employed as PerfOs in a clinical setting, Earable could potentially be used to objectively monitor certain cranial movements, such as eye blinking rate, which is increased in some neuromuscular disorders such as ocular myasthenia gravis and reduced in parkinsonian disorders. [31,32] Additionally, there may be unrealized advantages to measuring multiple types of waveforms simultaneously from a single device, given the demonstrated utility of these waveforms for measuring disease in clinical settings. Interestingly, data from this study were consistent with our expectations of which activities would relate to which types of waveforms. For example, feature importance analyses with the full RF model indicated that EOG features contributed substantially to the classification of gaze or eye movement activities (up, left, and right) (Table 2). These types of activities would be expected to have EOG as a significant contributing factor, and while we saw that the EMG signal is overall most important for activity classification, the other components do play an important role. In a limited number of cases, the presence of signal artifacts was observed to obfuscate waveform contribution analysis. For instance, this was notable in the Chewing activity, where residual EMG activity that overlapped with typical EEG frequencies persisted in the EEG signal after signal separation, resulting in an overestimate of the EEG waveform contribution. While further research and additional clinical validation data are necessary, we feel that there are numerous neuromuscular and/or neurodegenerative conditions that may benefit from improved use of wearable sensor technology like Earable.

The main goal of a feature extraction pipeline from specific waveforms, as described in this work, is to support the development of novel digital endpoints for use in clinical trial settings. These features, either alone or in combination with other features or other types of data generated in a trial, may form the basis for future clinical endpoints after further evaluation of how they measure disease progression or treatment response.

Challenges in this pilot study included several feature engineering and evaluation considerations. We chose to directly compare classification performance (F1 scores) of models built from processed sensor features as well as from raw bio-signal data. Interestingly, we found that regardless of the data augmentation, regularization, and other techniques used to counter overfitting (see Methods), the training dataset was too small to train a generalizable CNN model.
However, in clinical settings, the amount of data collected in this pilot study may in fact be representative of data collected in a clinical laboratory setting. As such, understanding the most appropriate analysis method for a particular clinical question is of great utility and importance. We note that different analytical approaches may be more suitable for certain types of questions, and there is no "one size fits all" datatype or model that can address all questions. Our findings suggest that there may be limitations of time series models applied to the bio-signal collection data common in clinical research, especially brief (seconds to minutes) PerfO activities. [16]

Limitations of this study include the small sample size, especially with respect to more generalizable claims about device usability (in a real-world setting). Additionally, this study was run in healthy control participants, so results may be difficult to extrapolate to disease populations. Future assessments for verification and analytical validation [17] will include: 1) testing of the device for usability in relevant patient populations, and 2) use of the device with true PerfO activities for reference dataset creation in disease populations. Despite the above caveats, data from this study suggest that Earable, as well as similar wearable devices, may be promising tools for further development in clinical research settings, opening the door to more objective quantitation of cranial and eye muscle movements. Future clinical validation [17] work in this space will focus on the clinical utility of the Earable analysis pipeline to: 1) test the utility of the Earable device in disease populations, 2) more accurately measure disease progression within participants, 3) test how Earable features or data relate to existing PROs, and finally 4) more accurately measure treatment effects within disease populations, hallmark goals in early clinical development. The use of Earable in longitudinal studies where disease progression may be measured, for example ongoing natural history studies, may help elucidate which features are most important for quantifying disease effects. Finally, the exploratory use of these devices in clinical trials as part of a wearable clinical development strategy may enable more sensitive detection of treatment responses within disease populations. These clinical validation steps may additionally support a strategy to use devices like Earable for passive monitoring purposes.

Methods

Ethical statement and study approval

All participants provided written informed consent prior to the study.

Study participants

A total of 10 healthy volunteers were recruited for this pilot study. All participants were screened according to the inclusion and exclusion criteria listed below.

Inclusion and exclusion criteria

Candidates had to satisfy the following to be enrolled in the study:
- The candidate is age 18 or older
- The candidate demonstrates the ability to understand the Informed Consent Form (ICF) and the willingness to follow all study instructions
- The candidate has read and signed the ICF

The presence of any of the following eliminated the candidate from enrollment in the study:
- The candidate is pregnant
- The candidate has been diagnosed with any of the following conditions: muscular dystrophy, myasthenia gravis, amyotrophic lateral sclerosis, multiple sclerosis, spinal muscular atrophy

Study tasks

All participants were asked to complete two 45-minute sessions.
During each session, each participant was asked to complete a series of tasks listed in Table 3 below. Participants were asked to take a one-minute break between each task.

Table 3. Tasks and duration of each task study session. https://doi.org/10.1371/journal.pdig.0000061.t003

These tasks were chosen to represent tasks that patients with craniofacial neuromuscular disorders may have difficulty completing. Study tasks were based on activities performed during routine neurological examination of cranial nerves, as well as grading systems for facial weakness. [33,34] These activities are commonly used to diagnose and assess the severity of several diseases that affect cranial muscles, including facial nerve injury, stroke, and neuromuscular disorders such as myasthenia gravis and amyotrophic lateral sclerosis.

Study procedure

Each study participant engaged in two study sessions, one in the morning and one at night. Testing sessions were conducted one-on-one by a study moderator. In the morning session, the study moderator reviewed the informed consent form (ICF) with the participant and ensured that he/she understood the form and agreed to participate. The participants had time to ask questions before signing the ICF. The study moderator read a study script, which provided a study overview and description of various study activities. The study moderator then collected participants' baseline (background) information. The study moderator then had participants perform the following at each study session, with a one-minute break between consecutive tasks:

1. Smile broadly and show teeth as hard as possible
2. Wrinkle forehead as tightly as possible
3. Close eyes as tightly as possible
4. Puff out cheeks as much as possible
5. Suck in cheeks as much as possible
6. Chewing for 30 seconds
7. Swallowing
8. Close eyes normally for 5 seconds
9. Talking for 30 seconds
10. Upward gaze for 45 seconds
11. Lateral gaze left for 45 seconds
12. Lateral gaze right for 45 seconds
13. Open and close jaw as much as possible
14. Facial expression: surprise
15. Facial expression: sad
16. Facial expression: angry

Participants' de-identification, confidentiality and data protection

Participants in this study were de-identified. The study moderator assigned a unique code number to each participant as a means of referencing participants, such that during data analysis, study team members could not associate participants' names or any other unique personal identifiers with the study data. The study moderator took all appropriate measures to ensure that the anonymity of each study participant would be maintained. Participants were identified by their initials and a participant identification number only.

Data transfer, storage and consolidation

All study data were transferred to a shared data storage platform within 2 business days of data collection. Only approved study team members had access to this platform. All task data were labelled with the annotations below (Table 4).

Table 4. Task label annotation. https://doi.org/10.1371/journal.pdig.0000061.t004

Earable raw sensor data processing and feature engineering

Raw Earable data were continuously collected during each activity of the pilot study.
To ensure reliable ground-truth data annotations, data from each activity were manually labeled by an expert technician. For each activity, the onset and offset endpoints of each performed activity were annotated accordingly. A time-synchronized video recording of the participant was utilized as a reference source in this annotation procedure. Using these activity annotations, signals were then segmented according to the noted onset and offset timestamps. After completion of each activity, the resulting signals from each channel were scaled to counteract the effects of amplification performed in the device hardware for the purpose of noise suppression, and filtered offline using a second-order infinite impulse response (IIR) notch filter to remove 60 Hz power line noise. At this stage, each signal contained a mixture of EEG, EMG, and EOG data. A signal separation algorithm was applied to better isolate each of these components, yielding a total of six channels (two each for EEG, EMG, and EOG). Following signal scaling, filtering, and separation, the signals of each of the six separated channels were segmented based on the presence or absence of facial movement activity (Fig 1A).

A comprehensive approach to feature extraction was taken for further downstream analysis. We chose to compute mostly general features to summarize each waveform, apart from a subset of features specific to EMG, EOG, or EEG activity. Features that would clearly identify mock-PerfO activities performed within the data collection process, but would not generalize to performance of the activity outside of laboratory contexts, were omitted (for instance, the duration of an activity that each participant was instructed to perform for a specified period). Statistical measures from each separated signal segment were computed to summarize signal behavior in the time domain. Such measures capture information such as time-varying amplitude behavior, amplitude distributions, and signal trends observable in the raw waveforms. As the frequency and time-frequency domains also contain vast amounts of information in bio-signal data, digital signal processing (DSP) analyses were performed to decompose each separated signal segment into frequency components and evaluate patterns in this alternative representation (Fig 1). Furthermore, handcrafted features relevant to theoretical EMG, EOG, and EEG behavior during specific mock-PerfO activities were computed to better represent such activities in the summary feature vectors. Together, this yielded a 161-dimension feature vector representation of each mock-PerfO activity performed. The features and their high-level categories are described in S1 Table. To remove features potentially irrelevant to activity-based classification, we implemented feature reduction with the Boruta package [27], yielding lower-dimensional feature vector representations of each mock-PerfO activity. In this process, 60 features that were estimated as "unimportant" were removed from each feature vector, resulting in 101-dimension feature vectors. A Python implementation of the Boruta package (BorutaPy, version 0.3) was used to perform feature reduction.
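As an illustration of the notch filtering and Boruta feature-reduction steps described above, the following is a minimal Python sketch, assuming scipy and the BorutaPy package named in the text; the notch quality factor and random forest settings are illustrative assumptions, not the authors' exact parameters.

import numpy as np
from scipy.signal import iirnotch, filtfilt
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

FS = 250  # Earable sampling rate, Hz

def remove_powerline(signal, fs=FS, f0=60.0, quality=30.0):
    """Second-order IIR notch filter to suppress 60 Hz power line noise.
    The quality factor is an illustrative assumption."""
    b, a = iirnotch(f0, quality, fs=fs)
    return filtfilt(b, a, signal)

def reduce_features(X, y):
    """Boruta feature reduction on a (n_segments, 161) summary feature
    matrix X with activity labels y. Boruta iteratively compares real
    features against shuffled 'shadow' copies and drops features that
    never outperform the shadows."""
    rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
    selector = BorutaPy(rf, n_estimators='auto', random_state=42)
    selector.fit(X, y)
    return X[:, selector.support_]  # keep only features deemed important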
Statistical analyses

Correlation of Earable parameters and differences in parameters between activities. Spearman correlations between all parameters and all activities were computed. We used the silhouette method to determine the optimal number of clusters, using the fviz_nbclust function of the factoextra package in R with 100 bootstrapped samples. For each of the 16 activities, for all of the 161 computed Earable parameters, we report the number of tasks analyzed (n), the minimum value (min), maximum value (max), median value (median), mean value (mean), standard deviation (sd), and standard error of the mean (se) (S3 Table).

Relationships between Earable parameters and activity or demographic information. For data from the pilot study, the intraclass correlation coefficient (ICC1), with participant as the grouping factor, was computed using linear mixed-effects modeling with the lme4 package in R (function lmer), with the following formula: ~ (1|participant). ICC1 was computed separately for each of the 16 activities for each of the 161 Earable parameters (S3 Table). Coefficients of variation were also computed within each activity (S3 Table). We additionally computed the within- and between-trial variability due to repeated measures, time of day, and participants, as well as the variance not explained by these three factors (S3 Table). We used a nested linear mixed-effects model to derive the variation explained by each component: ~ 1 + (1|time) + (1|participant) + (1|repeat/time), where the time component indicates time of day (morning or evening), the participant component indicates the subject, and the repeat component indicates the repeat of the same activity nested within the same time. The percent contribution of each of these variance components is reported in S3 Table.

Earable data visualizations. Dimensionality reduction of Earable parameters was performed in Python with umap-learn, with an effective minimum distance between embedded points of one and otherwise default parameters. UMAP coordinates were plotted with ggplot2 in R. Heatmaps of Earable parameters are displayed with individual activities (trials) as columns and Earable parameters as rows. All heatmaps of Earable data display z-scored parameter rows, computed across all activities. Heatmaps were constructed with the ComplexHeatmap package in R.

Quantifying pilot study activities and participant-level predictions. To investigate how Earable features could be used to classify each of the 16 activities, we implemented multi-class classification models using the Python sklearn module. We used a random forest classifier (the sklearn RandomForestClassifier class) with 500 decision trees for model building. In each classification setting, model training and validation were performed using 80% of the dataset, while the remaining 20% was withheld for testing. Data samples were assigned to one of the two subsets at random to reduce bias in evaluation results. The F1 score was calculated to evaluate model performance on the test set. The F1 score is the harmonic mean of precision and recall, summarizing prediction accuracy while balancing false negatives and false positives.
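The classification setup just described can be summarized in a short Python sketch. The synthetic feature matrix below is only a placeholder standing in for the real 161-dimension feature vectors (10 participants x 16 tasks x 8 repeats = 1,280 segments); the split, forest size, and F1 evaluation mirror the description above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Placeholder data standing in for the real summary feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1280, 161))
y = np.tile(np.arange(16), 80)  # activity labels 0..15

# Random 80/20 train/test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# Random forest with 500 decision trees.
clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)

# Per-activity F1 scores on the held-out test set show which
# mock-PerfO tasks are well separated.
print(f1_score(y_test, clf.predict(X_test), average=None))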
CNN model of activity level prediction. In recent years, Deep Learning models have been used to achieve high performance in many tasks relevant to the classification of bio-signal data. [35] Among the many popular Deep Learning architectures leveraged in such tasks, convolutional neural networks (CNNs) are widely used for their ability to learn patterns in structured, multidimensional data (e.g., time-frequency signal representations). However, it is not always clear whether deep learning approaches like CNNs are preferable over more traditional feature-based approaches, especially for short-duration bio-signal data. Researchers have demonstrated accurate inference of movement and gesture activities using CNNs trained on electrophysiological signals acquired from small participant groups. For example, Alias et al. deployed CNN architectures to classify gait activity of six subjects using surface EMG (sEMG) signals. [36] Each participant of that study performed only six trials of walking according to five different gait modes (walking at slow, normal, and fast speeds on a level surface and walking up and down an incline). Ultimately, their proposed CNN model achieved a gait classification accuracy of 77.95%, a 10.15% increase in accuracy compared to their traditional, feature-based Support Vector Machine model. Briouza et al. similarly proposed a shallow CNN architecture for highly accurate classification of hand movements when trained on sEMG data from the Ninapro-DB2 dataset (a popular hand-motion classification database consisting of sEMG signals of 49 hand motions from 40 participants). [37,38] A hybrid CNN-LSTM (long short-term memory) network consistently outperformed Random Forest Regression and Support Vector Regression models in intra-session and inter-session evaluations of wrist kinematic estimation for data from six healthy participants. [39]

In applying such methodologies to the task of mock-PerfO activity-level classification, 16-class CNN classification models were developed and analyzed. These CNN models were constructed to map 2-dimensional spectrogram representations of the mock-PerfO activity signal segments to a probability distribution over the 16 classes. To maximize the generalizability of the latent features learned by the CNN, data augmentation was employed to increase the diversity of the training set. Each time a signal segment was read into the training dataset, multiple random croppings of the segment were also added to the training set. To an extent, this allowed us to increase the size of our training dataset without collecting additional samples, helping to counter overfitting. To maintain constant-length input signals among the mock-PerfO activities that varied in duration, activity segments shorter than the fixed input data duration (30 seconds) were repeated after shifting the segment according to the randomized cropping scheme, while longer segments were truncated to the fixed input data duration via randomized cropping. Data augmentation was not performed for the testing set, as it would bias the resulting model performance estimate. Additional techniques applied to reduce model variance included the use of L2 kernel regularization [40] in the convolutional and fully connected model layers and the inclusion of Dropout layers [41] throughout the network. Ultimately, following development and evaluation on training and validation datasets, a shallow CNN, depicted in S1 Fig, was trained and employed for testing.
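To make the cropping and regularization choices concrete, the following is a minimal sketch of a shallow spectrogram-classification CNN of the kind described, assuming TensorFlow/Keras; the input shape, filter counts, and regularization strengths are illustrative assumptions, not the exact architecture shown in S1 Fig.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers

N_CLASSES = 16
INPUT_SHAPE = (129, 57, 1)  # (freq bins, time frames, channels); illustrative

def random_crop_or_tile(segment, target_len):
    """Fix segment length for training: tile short segments after a random
    shift, randomly crop long ones (mirrors the augmentation described above)."""
    if len(segment) < target_len:
        segment = np.roll(segment, np.random.randint(len(segment)))
        reps = int(np.ceil(target_len / len(segment)))
        return np.tile(segment, reps)[:target_len]
    start = np.random.randint(len(segment) - target_len + 1)
    return segment[start:start + target_len]

# Shallow CNN: spectrogram in, probability distribution over 16 tasks out,
# with L2 kernel regularization and dropout to reduce model variance.
model = tf.keras.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu',
                  kernel_regularizer=regularizers.l2(1e-3),
                  input_shape=INPUT_SHAPE),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.3),
    layers.Conv2D(32, (3, 3), activation='relu',
                  kernel_regularizer=regularizers.l2(1e-3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-3)),
    layers.Dropout(0.5),
    layers.Dense(N_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])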
Supporting information

S1 Table. Features computed from Earable waveforms. Table shows Feature Name, Unit, Group, and Domain. Each feature was computed for the separated EMG, EOG, and EEG signals separately, unless specifically noted. https://doi.org/10.1371/journal.pdig.0000061.s001 (XLSX)

S2 Table. Variance components design. Variance components were calculated in this pilot study and recorded in S3 Table for trial repeats, participants, and time of day (time), for each of the 16 mock-PerfO tasks. https://doi.org/10.1371/journal.pdig.0000061.s002 (XLSX)

S3 Table. Spearman correlations and variance components. A: ICC values for each of the 16 activities for all Earable parameters. B: %CV values for each of the 16 activities for all Earable parameters. C: Number of tasks analyzed (n), minimum value (min), maximum value (max), median value (median), mean value (mean), standard deviation (sd), and standard error of the mean (se). D: Variance components for each Earable feature for each of the 16 activities, as shown in the design in S2 Table. https://doi.org/10.1371/journal.pdig.0000061.s003 (XLSX)

S1 Fig. Mock-PerfO activity-level CNN classifier architecture. Architecture diagram of the final CNN implemented for activity classification. A single-channel spectrogram computed from the segmented waveform is input to the model at classification time. A probability distribution over each of the 16 activities is output. The activity associated with the highest output likelihood estimate is inferred. https://doi.org/10.1371/journal.pdig.0000061.s004 (DOCX)

Acknowledgments

The authors would like to acknowledge Cynthia Portal-Celhay for her contributions to the protocol design, as well as Tong Shen for helpful discussions and contributions to the interpretation of the results of this study.