Original Paper
Abstract
Background: The COVID-19 pandemic has had a broad negative impact on the physical and mental health of people with chronic neurological disorders such as multiple sclerosis (MS).
Objective: We present a machine learning approach that leverages passive sensor data from the smartphones and fitness trackers of people with MS to predict their health outcomes in a natural experiment during a state-mandated stay-at-home period due to a global pandemic.
Methods: First, we extracted features that capture behavior changes due to the stay-at-home order. Then, we adapted and applied an existing algorithm to these behavior-change features to predict the presence of depression, high global MS symptom burden, severe fatigue, and poor sleep quality during the stay-at-home period.
Results: Using data collected between November 2019 and May 2020, the algorithm detected depression with an accuracy of 82.5% (65% improvement over baseline; F1-score: 0.84), high global MS symptom burden with an accuracy of 90% (39% improvement over baseline; F1-score: 0.93), severe fatigue with an accuracy of 75.5% (22% improvement over baseline; F1-score: 0.80), and poor sleep quality with an accuracy of 84% (28% improvement over baseline; F1-score: 0.84).
Conclusions: Our approach could help clinicians better triage patients with MS and potentially other chronic neurological disorders for interventions and aid patient self-monitoring in their own environment, particularly during extraordinarily stressful circumstances, such as pandemics, that cause drastic behavior changes.
doi:10.2196/38495
Introduction
The COVID-19 pandemic and the ensuing response (eg, lockdown and social distancing) have had broad negative impacts on physical and mental health worldwide [
- ]. The effect is more pronounced for people with chronic neurological diseases such as multiple sclerosis (MS) [ - ]. People with MS have a significantly higher burden of mental health comorbidities than the general population. Moreover, people with MS have a 50% lifetime prevalence of depression, 2-3 times higher than that of the general population [ - ]. Given its association with higher disability and mortality, depression is a major comorbidity that lowers quality of life [ , - ]. Further, people with MS have greater COVID-19 risk due to certain immune disease-modifying therapies as well as their physical disability, and many have experienced drastic changes in their neurological care due to the pandemic [ ]. Concerns about COVID-19, coupled with decreased social support and health care access during the pandemic, have contributed to even higher stress and depression in people with MS [ , - ].

During the pandemic, digital technologies have become invaluable for supporting social interaction, health care access, and health monitoring. Digital health tools can also measure an individual’s mental health profile based on passive (noninvasive) tracking. Given the complexity and heterogeneity of real-world behaviors, models that leverage different aspects of an individual’s daily behaviors are necessary to accurately predict mental health status. Relevant to depression in people with MS, clinicians could use this digital passive sensing approach to potentially identify patients who require urgent health interventions.
Past research has leveraged passively generated data from personal digital devices (eg, smartphones and fitness trackers) to capture human behavior and predict health outcomes. This moment-by-moment, in situ quantification of the individual-level human phenotype using data from personal digital devices is known as digital phenotyping [
]. Previous works using passively sensed smartphone and wearable data to predict physical disability and fatigue in people with MS have been exploratory in assessing the feasibility of data collection and the preliminary association between sensed behaviors and outcomes [ - ]. However, the clinical applicability of digital phenotyping to inform clinical outcomes in people with MS in the real world has not yet been established.

Here, we present a machine learning approach leveraging data from the smartphones and fitness trackers of people with MS to predict their health outcomes during a mandatory stay-at-home period of the pandemic. Building on an existing analytical pipeline [
], we quantified behavior changes during the stay-at-home period when compared to the preceding period and used the changes to predict the presence of patient-reported outcomes of depression, neurological disability, fatigue, and poor sleep quality during the stay-at-home period. This study differs from prior studies in that it examines the clinical utility of digital phenotyping with passive sensors for predicting health outcomes during the early wave of the COVID-19 pandemic in a unique natural experiment. The study has relevance for predicting the health outcomes of patients with chronic and complex conditions beyond MS during major stressful scenarios (eg, pandemics and natural disasters) that could considerably alter behaviors.

Methods
Overview
This study was part of a larger study that aimed to examine the clinical utility of passive sensors on smartphones and fitness trackers in predicting clinically relevant outcomes in people with MS. Data collection from participants in this larger study occurred between November 2019 and January 2021. Because data collection for 56 participants spanned the locally mandated stay-at-home period in response to the COVID-19 pandemic, we used this unique natural experiment to test the hypothesis that machine learning models leveraging passive sensor data can predict the health outcomes of people with a chronic neurological disorder (ie, people with MS) during major stressful scenarios.
To briefly summarize our approach, we used data from 3 sensors in the participants’ smartphones (calls, location, and screen activity) and 3 sensors in the participants’ fitness trackers (heart rate, sleep, and steps) to predict patient-reported outcomes of depression, global MS symptom burden, fatigue, and sleep quality during the COVID-19 stay-at-home period. We computed behavioral features from these 6 sensors before and during the stay-at-home period and took the difference as a measure of behavior change resulting from the stay-at-home mandate. We then used changes in behavioral features to predict the outcomes.
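As a minimal illustration of this behavior-change computation (a sketch, not the study's actual code; the data frame layout, example step counts, and stay-at-home start date are assumptions), the during-minus-pre difference for a single feature could be computed as follows:

```python
import pandas as pd

# Minimal sketch (assumed layout): one row per participant-day with a daily
# feature value (eg, total step count). Dates and values are illustrative only.
daily = pd.DataFrame({
    "participant": ["p1"] * 6,
    "date": pd.to_datetime(
        ["2020-03-10", "2020-03-11", "2020-03-12",   # pre-stay-at-home days
         "2020-03-25", "2020-03-26", "2020-03-27"]), # stay-at-home days
    "steps": [8200, 7600, 9100, 4100, 3800, 4500],
})
STAY_AT_HOME_START = pd.Timestamp("2020-03-23")  # assumed local mandate date
daily["period"] = daily["date"].ge(STAY_AT_HOME_START).map({False: "pre", True: "during"})

# Average the daily values within each period, then subtract to obtain the
# behavior-change feature fed to the prediction models.
period_means = daily.groupby(["participant", "period"])["steps"].mean().unstack()
behavior_change = period_means["during"] - period_means["pre"]
print(behavior_change)  # negative values indicate fewer daily steps during stay-at-home
```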
All methods were performed in accordance with institutional review board guidelines and institutional regulations.
Participants
The study included adults 18 years or older with a neurologist-confirmed MS diagnosis who owned a smartphone (Android or iOS) and enrolled in the Prospective Investigation of Multiple Sclerosis in the Three Rivers Region study, a clinic-based natural history study at the University of Pittsburgh Medical Center [
, - ].
Ethical Considerations
The institutional review boards of University of Pittsburgh (STUDY19080007) and Carnegie Mellon University (STUDY2019_00000037) approved the study. All participants provided written informed consent.
Study Design
The participants downloaded a mobile app to capture sensor data from their own smartphones and additionally received a Fitbit Inspire HR (Fitbit Inc) to track steps, heart rate, and sleep. Data were continuously collected from smartphone and Fitbit sensors of 56 participants during the study period (16 November 2019 to 15 May 2020, including the local stay-at-home period).
All 56 (100%) participants completed data collection for a predefined period of 12 weeks, while 39 (70%) agreed to extend data collection for an additional 12 weeks (for a total of 24 weeks). Of these participants, 6 (11%) who did not have sufficient data during the period before the stay-at-home mandate were excluded from the machine learning analysis.
Survey Response and Patient-Reported Outcomes
All participants completed a baseline questionnaire, which queried their demographics and baseline health outcomes, on the Saturday following enrollment. During the study, the participants completed additional questionnaires, as described below, at intervals specific to each questionnaire. All questionnaires for the overall study, including during the stay-at-home period, were administered via the web using the secure, web-based Research Electronic Data Capture system [
, ].
Depression
We used the Patient Health Questionnaire-9 (PHQ-9) to measure the severity of depression symptoms once every 2 weeks [
]. The PHQ-9 contained 9 questions, each scored on a scale of 0 to 3. Higher scores indicated more severe depressive symptoms.
Global MS Symptom Burden
We used the Multiple Sclerosis Rating Scale—Revised (MSRS-R) to measure global MS symptom burden and neurological disability once every 4 weeks [
]. The MSRS-R assessed 8 neurological domains (walking, upper limb function, vision, speech, swallowing, cognition, sensory function, and bladder and bowel function); each domain was scored from 0 to 4, with 0 indicating the absence of symptoms and 4 indicating the highest symptom burden and most severe disability.
Fatigue
We used the 5-item version of the Modified Fatigue Impact Scale (MFIS-5) to measure the impact of fatigue on cognitive, physical, and psychosocial function once every 4 weeks [
]. Each item in the MFIS-5 was scored on a 5-point Likert scale from 0 (never) to 4 (almost always). Higher scores indicated more severe fatigue.
Sleep Quality
We used the Pittsburgh Sleep Quality Index (PSQI) to measure sleep disturbances once every 4 weeks [
]. The PSQI comprised 19 individual items, with 7 component scores (each on a 0-3 scale) and 1 composite score (0 to 21, where higher scores indicate poorer sleep quality).

For each outcome, we averaged the measures collected during the stay-at-home period and then dichotomized the resulting outcomes using thresholds. The binary outcomes would likely have better clinical utility, as they are more easily understood by patients (for self-monitoring), volunteers with limited mental health training, or even clinicians. For “Depression,” PHQ-9 scores were dichotomized as “≥5: presence of depression” and “<5: absence of depression.” For “Global MS symptom burden,” MSRS-R scores were dichotomized as “≥6.4: higher burden” and “<6.4: lower burden.” For “Fatigue,” MFIS-5 scores were dichotomized as “≥8: high fatigue” and “<8: low fatigue.” For “Sleep quality,” PSQI scores were dichotomized as “≥9: poorer sleep quality” and “<9: better sleep quality.” The thresholds for depression and sleep quality were based on previous works [
, ]. Given the lack of consensus in the literature, we calculated the median scores of global MS symptom burden and fatigue in a larger data set of 104 people with MS, of which the 56 (53.8%) people with MS in this paper represented a subgroup (with data collection encompassing the stay-at-home period), and used these median scores as the thresholds.
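As a rough illustration of this averaging-and-thresholding step (a sketch, not the study's code; the data layout and example scores are assumptions), the dichotomization could be expressed as:

```python
import pandas as pd

# Thresholds follow the cut points described above; the data layout and scores
# below are illustrative assumptions.
THRESHOLDS = {"PHQ9": 5, "MSRSR": 6.4, "MFIS5": 8, "PSQI": 9}

scores = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p2"],  # repeated surveys per participant
    "PHQ9": [4, 7, 2, 3],
    "MSRSR": [8.0, 7.0, 3.0, 4.0],
    "MFIS5": [9, 10, 5, 6],
    "PSQI": [12, 10, 6, 7],
})

# Average each outcome over the stay-at-home period, then dichotomize: a mean
# at or above the threshold is labeled 1 (eg, 1 = presence of depression).
period_means = scores.groupby("participant").mean()
labels = (period_means >= pd.Series(THRESHOLDS)).astype(int)
print(labels)
```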
Sensor Data Collection
Each participant installed a mobile app based on the AWARE framework [
], which provided the backend and network infrastructure that unobtrusively collected from smartphones the location, screen usage (ie, when the screen status changed to on or off and locked or unlocked), and call logs (for incoming, outgoing, and missed calls). Further, participants wore a Fitbit Inspire HR, which captured the number of steps, sleep status (asleep, awake, restless, or unknown), and heart rate. Calls and screen use were event-based sensor streams, whereas location, steps, sleep, and heart rate were time series sensor streams. We sampled location coordinates at 1 sample per 10 minutes, and steps, sleep, and heart rate at 1 sample per minute.

Data from AWARE were deidentified and automatically transferred over WiFi to a study server at regular intervals. Data from the Fitbit were retrieved using the Fitbit application programming interface at the end of the data collection. Participants were asked to keep their devices charged, to carry their phones at all times, and to wear the Fitbit at all times.
To protect confidentiality, we removed identifiable information (eg, names and contact information) from survey and sensor data prior to analysis. We followed standard practices for sensor data security.
Mediation Analysis
Mediation analysis was performed using the nondichotomized outcomes (ie, the average of the patient-reported outcomes collected during the stay-at-home period). The PROCESS macro in SPSS (IBM Corp) was used for the mediation analysis [
].
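The study used the SPSS PROCESS macro; purely as a hedged, open-source approximation, a parallel mediation model with bootstrapped confidence intervals could be sketched in Python with the pingouin package (all values below are simulated, and the column names are assumptions mirroring the model examined in the Results):

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Simulated stand-in data (not study data): predictor, two parallel mediators,
# and outcome, mirroring the MSRS-R -> (MFIS-5, PSQI) -> PHQ-9 model.
rng = np.random.default_rng(0)
n = 50
msrsr = rng.normal(7, 3, n)
mfis5 = 0.6 * msrsr + rng.normal(0, 2, n)
psqi = 0.4 * msrsr + rng.normal(0, 2, n)
phq9 = 0.5 * mfis5 + 0.3 * psqi + rng.normal(0, 2, n)
df = pd.DataFrame({"MSRSR": msrsr, "MFIS5": mfis5, "PSQI": psqi, "PHQ9": phq9})

# Parallel mediation with bootstrapped confidence intervals; the output table
# includes direct, per-mediator indirect, and total effects.
results = pg.mediation_analysis(data=df, x="MSRSR", m=["MFIS5", "PSQI"],
                                y="PHQ9", n_boot=5000, seed=42)
print(results)
```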
Data Processing and Machine Learning
The data processing and analysis pipeline (
) was built on our prior work [ ] and involved several steps:
- Feature extraction from sensors over time slices to identify behavior changes.
- Handling missing features.
- Machine learning to predict patient-reported health outcomes during the stay-at-home period:
- Using 1-sensor models (ie, models containing features from 1 sensor).
- Combining 1-sensor models to obtain the best model for each outcome.
Feature Extraction
We computed features from the 6 sensors of calls, heart rate, location, screen, sleep, and steps, given their potential to inform depressive symptoms [
, - ], as well as fatigue [ ], MS symptom burden such as decreased mobility [ ], and sleep quality [ , ].

Location features captured mobility patterns. Steps and heart rate captured the extent of physical activity. Calls features captured communication patterns. Screen features might inform the ability to concentrate [
, ] and the extent of sedentary behavior [ ], despite potential caveats for people with MS and other chronic neurological disorders. Sleep features captured sleep duration and patterns, which could indicate sleep disturbances (eg, insomnia or hypersomnia) associated with depression [ ]. Please see (section A.1 [ , , , - ]) for details of the features extracted from each sensor.

Features from the 6 sensors were extracted over a range of temporal slices (
B) preceding and during the stay-at-home period. For each period, we computed the daily feature values and averaged them to obtain daily averages. To derive the behavior-change features used in the machine learning models, we subtracted the daily averages of the baseline (pre–stay-at-home) period from those of the stay-at-home period.
Temporal Slicing
The temporal slicing approach extracted sensor features from different time segments (
B). Past work showed that this approach can better define the relationship between a feature and depression. For example, Chow et al [ ] found no relationship between depression and the time spent at home during 4-hour time windows, but they found that people with more severe depression tended to spend more time at home between 10 AM and 6 PM. Similarly, Saeb et al [ ] found that the same behavioral feature calculated over weekdays and weekends could have a very different association with depression. Here, we obtained all available data (spanning multiple days of the study) from a specific epoch or time segment of the day (all day, night [ie, 12 AM-6 AM], morning [ie, 6 AM-12 PM], afternoon [ie, 12 PM-6 PM], and evening [ie, 6 PM-12 AM]) and for specific days of the week (all days of the week, weekdays only [ie, Monday-Friday], and weekends only [ie, Saturday-Sunday]), yielding 15 data streams or temporal slices. To extract features from each of the 15 temporal slices, we first computed daily features, averaged the daily features from the pre–stay-at-home period, and averaged the daily features from the stay-at-home period. We then subtracted the pre–stay-at-home feature matrix from the stay-at-home feature matrix to obtain the behavior-change features. Finally, we concatenated the resulting 15 temporal slices of behavior-change features to derive the final feature matrix.
Feature Matrix
After feature extraction, each of the 6 sensors had a feature matrix, with each sample containing a participant’s feature vector comprising behavior change features from 15 different temporal slices.
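A minimal sketch of how one sensor's behavior-change features across the 15 temporal slices (5 epochs × 3 day-of-week groups) could be assembled follows; the raw data layout, the example heart rate feature, and the sample values are assumptions rather than the study's actual implementation.

```python
import pandas as pd

# Assumed layout: one row per raw sensor sample (eg, minute-level heart rate)
# with a timestamp and a pre/during flag; values are illustrative only.
samples = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-03-10 08:30", "2020-03-10 20:15",
                                 "2020-03-28 09:00", "2020-03-29 22:45"]),
    "heart_rate": [72, 80, 68, 75],
    "period": ["pre", "pre", "during", "during"],
})

EPOCHS = {"all_day": range(0, 24), "night": range(0, 6), "morning": range(6, 12),
          "afternoon": range(12, 18), "evening": range(18, 24)}
DAY_GROUPS = {"all_days": range(0, 7), "weekdays": range(0, 5), "weekends": range(5, 7)}

features = {}
for epoch, hours in EPOCHS.items():
    for day_group, days in DAY_GROUPS.items():
        mask = (samples["timestamp"].dt.hour.isin(hours)
                & samples["timestamp"].dt.dayofweek.isin(days))
        sliced = samples[mask]
        if sliced.empty:
            # Slices with no data yield missing features (handled downstream).
            features[f"hr_mean_{epoch}_{day_group}"] = float("nan")
            continue
        # Daily mean within this slice, averaged per period, then differenced
        # (during minus pre) to give the behavior-change feature.
        daily = (sliced.groupby([sliced["timestamp"].dt.date, "period"])["heart_rate"]
                 .mean())
        per_period = daily.groupby(level="period").mean()
        features[f"hr_mean_{epoch}_{day_group}"] = (
            per_period.get("during", float("nan")) - per_period.get("pre", float("nan")))

feature_vector = pd.Series(features)  # 15 behavior-change features for one sensor
print(feature_vector)
```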
Handling Missing Data
Missing sensor data can occur for several reasons. Our approach for handling missing data is described in
(section A.2).
Machine Learning Using Nested Feature Selection
We built machine learning models to predict dichotomized outcomes using the data set, building on a published approach [
], and validated our models using leave-5-participants-out cross-validation to minimize overfitting; a minimal sketch of this participant-grouped split follows the list below. The model generation process followed these steps:
- Stable feature selection using randomized logistic regression, leveraging temporal slices.
- Training and validating 1-sensor models for each of the 6 feature sets of calls, heart rate, location, screen, sleep, and steps.
- Obtaining predictions from combinations of sensors by combining detection probabilities from 1-sensor models to identify the best performing model.
- Classifying different outcomes by running the pipeline for each outcome.
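A minimal sketch of the leave-5-participants-out validation, assuming one behavior-change feature vector and one binary label per participant (simulated data, not the study's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

# Simulated data: one behavior-change feature vector and one binary label per
# participant (shapes are assumptions, not study data).
rng = np.random.default_rng(0)
n_participants, n_features = 50, 20
X = rng.normal(size=(n_participants, n_features))
y = rng.integers(0, 2, size=n_participants)

# With one sample per participant, leaving 5 participants out per fold is a
# 10-fold split; the held-out participants never influence training.
cv = KFold(n_splits=n_participants // 5, shuffle=True, random_state=0)
predictions = np.empty_like(y)
for train_idx, test_idx in cv.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    predictions[test_idx] = model.predict(X[test_idx])

print(f"Cross-validated F1-score: {f1_score(y, predictions):.2f}")
```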
Stable Feature Selection
To enable stable feature selection from a vast number of behavioral features, Chikersal et al [
] proposed an approach called “nested randomized logistic regression,” which we deployed in this study. This method decomposed the feature space for each sensor by grouping features from the same time slices and performed randomized logistic regression on each of these groups. The selected features from all groups (ie, all time slices) were then concatenated to give a new and much smaller set of features. Next, we performed randomized logistic regression on this new set of features to extract the final selected features for the sensor. We performed the nested feature selection for each of the six 1-sensor models, thereby nesting the process. This method was performed in a leave-5-participants-out manner such that the model used to detect an outcome for a participant did not include that person during the feature selection process. More details about this method can be found in (section A.3).
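Scikit-learn's original RandomizedLogisticRegression has been removed from recent releases, so the sketch below approximates the stability selection step with repeated subsample fits of an L1-penalized logistic regression; the resampling fraction, selection threshold, and per-slice grouping interface are assumptions rather than the authors' exact implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def stability_select(X, y, n_resamples=100, sample_frac=0.75, threshold=0.5, seed=0):
    """Approximate randomized logistic regression: fit an L1-penalized model on
    many random subsamples and keep features that receive a nonzero coefficient
    in at least `threshold` of the fits."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_resamples):
        idx = rng.choice(n_samples, size=int(sample_frac * n_samples), replace=False)
        model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
        model.fit(X[idx], y[idx])
        counts += (model.coef_.ravel() != 0)
    return np.where(counts / n_resamples >= threshold)[0]

def nested_select(slice_feature_groups, y):
    """Nested use: select within each temporal-slice feature group, concatenate
    the survivors, then select again over the concatenated set."""
    survivors = [X_slice[:, stability_select(X_slice, y)]
                 for X_slice in slice_feature_groups]  # one matrix per slice
    X_concat = np.hstack(survivors)
    return X_concat[:, stability_select(X_concat, y)]
```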
Training and Validating 1-Sensor Models
For each sensor, we built a model of the selected features from that sensor to detect an outcome. We used leave-5-participants-out cross-validation to choose the parameters for that model. We trained models using the following 2 machine learning algorithms: logistic regression and gradient boosting classifier [
]. We chose the model with the best F1-score for a given outcome, which provided the detection probabilities for that outcome. This process was independent of the other outcomes.
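A hedged sketch of how one 1-sensor model might be chosen (the helper function, fold construction, and hyperparameter defaults are assumptions; the study's exact parameter grids are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold, cross_val_predict

def best_single_sensor_model(X_sensor, y, n_splits=10, seed=0):
    """Fit logistic regression and gradient boosting on one sensor's selected
    features and keep whichever achieves the higher cross-validated F1-score;
    return out-of-fold positive-class probabilities for the combination step."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores = {name: f1_score(y, cross_val_predict(model, X_sensor, y, cv=cv))
              for name, model in candidates.items()}
    best_name = max(scores, key=scores.get)
    probabilities = cross_val_predict(candidates[best_name], X_sensor, y, cv=cv,
                                      method="predict_proba")[:, 1]
    return best_name, probabilities, scores[best_name]
```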
Obtaining Predictions From Combinations of Sensors
The detection probabilities from all six 1-sensor models were concatenated into a single feature vector and given as input to an ensemble classifier (ie, AdaBoost with a gradient boosting classifier as the base estimator), which then output the final label for the outcome. For all outcomes, only the detection probabilities of the positive label “1” were concatenated. The positive label was the “presence of depression” for “depression,” “high burden” for “global MS symptom burden,” “severe fatigue” for “fatigue,” and “poor sleep quality” for “sleep quality.” The “n_estimators” parameter (the maximum number of estimators at which boosting is terminated) was tuned during leave-5-participants-out cross-validation to achieve the best-performing combined model.
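The sensor-combination step might be sketched as follows (the n_estimators grid and probability matrix layout are assumptions; note that the AdaBoostClassifier argument is named base_estimator in older scikit-learn versions):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold, cross_val_predict

def combine_sensor_probabilities(sensor_probas, y, n_estimators_grid=(10, 50, 100), seed=0):
    """Stack per-sensor positive-class probabilities column-wise and feed them
    to an AdaBoost ensemble with a gradient boosting base estimator, tuning
    n_estimators by cross-validated F1-score."""
    X_meta = np.column_stack(sensor_probas)  # one column per 1-sensor model
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    best_score, best_model = -1.0, None
    for n_estimators in n_estimators_grid:
        model = AdaBoostClassifier(estimator=GradientBoostingClassifier(random_state=seed),
                                   n_estimators=n_estimators, random_state=seed)
        score = f1_score(y, cross_val_predict(model, X_meta, y, cv=cv))
        if score > best_score:
            best_score, best_model = score, model.fit(X_meta, y)
    return best_model, best_score
```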
To analyze the usefulness of each sensor, we implemented a feature ablation analysis by generating detection results for all possible combinations of 1-sensor models. For six 1-sensor models, there were 57 combinations of feature sets, as the total combinations = combinations with 2 sensors + combinations with 3 sensors + ... + combinations with 6 sensors = C(6,2) + C(6,3) + C(6,4) + C(6,5) + C(6,6) = 15 + 20 + 15 + 6 + 1 = 57.
Classifying Different Outcomes
This pipeline of training and validating six 1-sensor models and 57 combined models was run independently for each of the 4 outcomes. For each outcome, we reported the performance based on the best combination of sensors. We also reported the performance of baseline models (ie, a simple majority classifier whereby every sample is assigned the majority class in the training set) as well as models containing all 6 sensors.
Results
Participant Characteristics
The characteristics of the 56 participants were typical of MS study populations (median age 43.5 years; n=48, 86% women).
shows the detailed participant characteristics.
Variable | Value | |
Sex, n (%) | ||
Female | 48 (86) | |
Male | 8 (14) | |
Race, n (%) | ||
White | 51 (91) | |
African or African American | 5 (9) | |
Ethnicity, n (%) | ||
Non-Hispanic or Latino | 55 (98) | |
Hispanic or Latino | 1 (2) | |
Age (years), median (IQR) | 43.5 (37-52) | |
Time elapsed (years) from age of first neurological symptom onset to study participation, median (IQR) | 13.0 (6.7-17.4) | |
PDDSa score at start of study, median (IQR) | 1 (0-3) | |
Disease-modifying treatment, n (%) | ||
Higher efficacy | 38 (68) | |
Standard efficacy | 12 (21) | |
Depression diagnosis, n (%) | ||
Not diagnosed with clinical depression before study enrollment | 39 (70) | |
Diagnosed with clinical depression before study enrollment | 17 (30) | |
Pharmacotherapy for depression, n (%) | ||
Not taking medication for depression before study enrollment | 39 (70) | |
Taking medication for depression before study enrollment | 17 (30) | |
Nonpharmacotherapy for depression, n (%) | ||
Not receiving nonmedication therapy for depression before study enrollment | 52 (93) | |
Receiving nonmedication therapy for depression before study enrollment | 4 (7) | |
Study outcomes: average measures during the stay-at-home period, median (IQR) | ||
PHQ-9b (depression) | 3.7 (0.0-7.4) | |
MSRS-Rc (global MSd symptom burden) | 7.5 (3.4-10.3) | |
MFIS-5e (fatigue) | 8.0 (4.6-11.0) | |
PSQIf (sleep quality) | 11.0 (7.8-14.3) |
aPDDS: Patient Determined Disease Steps.
bPHQ-9: Patient Health Questionnaire-9.
cMSRS-R: Multiple Sclerosis Rating Scale—Revised.
dMS: multiple sclerosis.
eMFIS-5: Modified Fatigue Impact Scale-5.
fPSQI: Pittsburgh Sleep Quality Index.
Interrelated Outcomes
The main study outcomes were patient-reported depression as well as the associated neurological symptom burden, fatigue, and sleep quality. We measured the Pearson correlations among the average values of the 4 outcomes during the stay-at-home period for the participants. Depression severity (PHQ-9) correlated with the global MS symptom burden (MSRS-R), fatigue severity (MFIS-5), and sleep quality (PSQI;
).

To dissect the complex relationship among these outcomes to inform better patient monitoring and guide potentially more precise interventions, we performed mediation analysis (
). When MFIS-5 and PSQI were both included as mediators in the model (path c’), the association between MSRS-R and PHQ-9 was no longer significant (effect size 0.13; bias-corrected bootstrap CI –0.14 to 0.40). However, the association between MSRS-R and PHQ-9 through MFIS-5 (path a1b1) remained significant (effect size 0.34; bias-corrected bootstrap CI 0.13 to 0.52). The association between MSRS-R and PHQ-9 through PSQI (path a2b2) also remained significant (effect size 0.13; bias-corrected bootstrap CI 0.02 to 0.27). Hence, the relationship between global MS symptom burden and depression might be mediated by both fatigue and sleep quality.
Predicting Outcomes During the Stay-at-home Period
shows the performance of the machine learning pipeline for predicting each of the 4 outcomes using the best sensor combinations (ie, the set of sensors that had the best performance for each outcome). Accuracy is the percentage of patients for whom the outcome label was correctly predicted. F1-score is a metric of model performance that measures the harmonic mean of precision and recall. Precision is the positive predictive value, or the number of true positive labels divided by the number of all positive labels (true positive + false positive). Recall is sensitivity, or the number of true positive labels divided by the number of all patients who should have the positive labels (true positive + false negative). In this study, “positive” label refers to the outcome of interest (eg, presence of depression is the positive label for depression). Figures S1 to S4 in report the performance of individual sensors and when all 6 sensors were included. Tables S1 to S4 in list the features selected by the best models for each outcome, and their corresponding coefficients.
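For reference, these metrics follow directly from the predicted and true labels; a minimal sketch with illustrative labels only (not the study's predictions):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative labels only (1 = outcome present, eg, presence of depression).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.3f}")   # fraction of correct labels
print(f"Precision: {precision_score(y_true, y_pred):.3f}")  # TP / (TP + FP)
print(f"Recall:    {recall_score(y_true, y_pred):.3f}")     # TP / (TP + FN)
print(f"F1-score:  {f1_score(y_true, y_pred):.3f}")         # harmonic mean of precision and recall
```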
Depression
The baseline model (simple majority classifier) had an accuracy of 50.0% in predicting the presence of depression during the stay-at-home period. The model containing all sensors had an accuracy of 70% (40% improvement over the baseline). The model with the best combination of sensors (calls, heart rate, and location) had an accuracy of 82.5% (65% improvement over the baseline).
Global MS Symptom Burden
The baseline model had an accuracy of 64.7% in predicting high global MS symptom burden (versus “low burden”) during the stay-at-home period. The model containing all sensors had an accuracy of 76.7% (18.5% improvement over the baseline). The model with the best combination of sensors (calls, heart rate, location, and screen) had an accuracy of 90% (39% improvement over the baseline).
Fatigue
The baseline model had an accuracy of 61.8% in predicting severe fatigue (versus “mild fatigue”) during the stay-at-home period. The model containing all sensors had an accuracy of 71.7% (16% improvement over the baseline). The model with the best combination of sensors (calls, heart rate, and location) had an accuracy of 75.5% (22% improvement over the baseline).
Sleep Quality
The baseline model had an accuracy of 65.7% in predicting poor sleep quality (ie, “poor sleep quality” versus “better sleep quality”) during the stay-at-home period. The model containing all sensors had an accuracy of 70.2% (7% improvement over the baseline). The model with the best combination of sensors (location and screen) had an accuracy of 84% (28% improvement over the baseline).
Discussion
Principal Findings
In this unique natural experiment conducted during the early wave of the COVID-19 pandemic, we reported the clinical utility of digital phenotyping for predicting clinically relevant outcomes for people with MS. Using only passively sensed data, our machine-learning models predicted the presence of depression, high global MS symptom burden, severe fatigue, and poor sleep quality during the stay-at-home period with potentially clinically actionable performance.
The best models outperformed not only the baseline models (simple majority classifier) but also the models containing all sensors. The best sensor combinations for predicting depression and fatigue were the same (ie, calls, heart rate, and location), and these sensors were also included in the best sensor combination for predicting global MS symptom burden (ie, calls, heart rate, location, and screen). In comparison, the best sensor combination for sleep quality (ie, location and screen) had the smallest overlap with the sensor combinations for the other three outcomes. This observation was consistent with the finding that depression, fatigue, and global MS symptom burden were more strongly correlated among themselves than with sleep quality (
). We also examined the coefficients of the features selected by the best models ( , section B.2). Examples of the behavior-change features selected by the best model for predicting depression (ie, features with the highest absolute coefficients) included an increase in the number of incoming calls during evenings on weekdays, a decrease in average heart rate when the person was at rest or had low activity (outside exercise heart rate zones) during evenings on weekends, and an increase in the regularity of movement patterns in 24-hour periods with respect to nights on weekends.

Our findings built on a small body of prior work that explored the feasibility of passive sensing in people with MS and preliminary correlations between passively sensed behaviors and MS outcomes. For example, Newland et al [
] explored real-time depth sensors at home to identify gait disturbance and falls in 21 patients with MS. Other studies reported correlations between passively sensed physical activity and disability worsening in people with MS [ , , ]. Chitnis et al [ ] examined the gait, mobility, and sleep of 25 people with MS over 8 weeks using sensors mounted on the wrist, ankle, and sternum, and reported correlations among gait-related features (eg, turn angle and maximum angular velocity), sleep and activity, and disability outcomes.

Previous work on predicting health outcomes for people with MS using passively sensed behaviors is scarce. Tong et al [
] used passively sensed sleep and activity data collected from 198 people with MS over 6 months to predict fatigue severity and overall health scores, achieving good performance in line with acceptable instrument errors. To our knowledge, our study is the first to use passively sensed behavior changes to predict multiple interrelated, clinically relevant health outcomes in MS, including depression, disability, fatigue, and sleep quality. While several studies used passively sensed data from the general population to report behavior changes during the COVID-19 pandemic [ - ], our study provides the first real-world evidence of the potential clinical utility of passively sensed behavior changes for predicting health outcomes during the unique stay-at-home period in a population with a chronic neurological disorder and complex health needs. From a methodological standpoint, the application of behavioral features computed over temporal slices to predict depression and other health outcomes in people with MS is novel. Our approach of using the change in features between the pre–stay-at-home and stay-at-home periods to predict outcomes during the stay-at-home period is also novel. Finally, we included new heart rate features that can be computed using data from the Fitbit application programming interface.

Our approach has potential clinical utility, particularly during major stressful events (beyond COVID-19) that worsen health outcomes and limit health care access. For instance, predictive models built using our approach could help patients self-monitor their health when access to in-person clinical care becomes suddenly limited and could encourage patients (or their caregivers) to actively seek medical attention sooner when the models predict adverse outcomes. Further, our models could help clinicians better monitor at-risk patients and make triage decisions for patients who require prioritization for interventions (eg, medication and counseling), particularly in the setting of suddenly limited health care access and scarce resources.
Limitations
Our study has 2 limitations. First, the COVID-19 pandemic started during our data collection for an ongoing larger study of people with MS. While it provided a unique opportunity to conduct a natural experiment to assess the utility of digital phenotyping for predicting health outcomes in people with MS during the highly unusual stay-at-home period, we had a modest sample size of participants who happened to have sufficient sensor data collected both just before the sudden issuance of the stay-at-home order and during the stay-at-home period. We also had limited ability to seek external replication of the drastic behavior changes during the early stage of the pandemic, since the stay-at-home order was lifted and has not been reinstated. To reduce the chance of overfitting and improve the validity of the findings, we used leave-5-participants-out cross-validation, such that in each fold, the participants used for training and testing were different. Our approach performed well not only for 1 outcome but for all 4 clinically relevant outcomes pertaining to mental health and neurological disability in people with MS. We have reasonable confidence because of the consistently good model performance across all folds and the consistently robust predictions for all 4 outcomes. We are not aware of other published studies with data from before and during the stay-at-home orders, particularly involving patient populations with chronic neurological disorders such as MS, who are at heightened risk of adverse health outcomes resulting from social isolation, reduced support, and limited health care access. Given the uniqueness of the data set, we believe the findings are clinically relevant despite the relatively modest sample size. Second, the study used patient-reported health outcomes. Given the restriction of in-person clinical visits during the stay-at-home period, rater-performed examination was not feasible. Importantly, these patient-reported outcomes are all validated for people with MS, highly correlated with rater-determined measures, interrelated among themselves, and clinically relevant.
In summary, we reported the potential clinical utility of digital phenotyping in predicting subsequent health outcomes in people with MS during a COVID-19 stay-at-home period. Specifically, we predicted the presence of depression, high global MS symptom burden, severe fatigue, and poor sleep quality in people with MS during the stay-at-home period using passively sensed behavior changes measured by smartphones and wearable fitness trackers. The predictive models achieved potentially clinically actionable performance for all 4 outcomes. This study paves the way for future replication studies during major stressful events and has implications for future patient self-monitoring and clinician screening for urgent interventions in MS and other complex chronic diseases.
Acknowledgments
We would like to thank our undergraduate research assistants: Man Jun (John) Han, Dong Yun Lee, Kasey Park, Phoebe Soong, and Christine Wu, for helping us monitor participant compliance throughout the data collection process. We would also like to thank Yiyi Ren for helping develop the app used for data collection. We would also like to thank the research participants and their treating clinicians. The study is funded by the Department of Defense (CDMRP MS190178).
Authors' Contributions
PC designed and conceptualized the study; analyzed and interpreted data; and drafted and revised the manuscript for intellectual content. SV, KM, EW, and DQ played a major role in the data acquisition. AD and MG designed and conceptualized the study; interpreted the data; and drafted and revised the manuscript for intellectual content. ZX designed and conceptualized the study and had a major role in the data acquisition, data interpretation, drafting, and revision of the manuscript for intellectual content.
Conflicts of Interest
None declared.
Supplementary material.
DOCX File, 18995 KB
References
- Czeisler M, Lane RI, Petrosky E, Wiley JF, Christensen A, Njai R, et al. Mental health, substance use, and suicidal ideation during the COVID-19 pandemic - United States, June 24-30, 2020. MMWR Morb Mortal Wkly Rep 2020 Aug 14;69(32):1049-1057 [FREE Full text] [CrossRef] [Medline]
- Panchal N, Kamal R, Cox C, Garfield R. The implications of covid-19 for mental health and substance use. Kaiser family foundation. 2020. URL: http://medfam.facmed.unam.mx/wp-content/uploads/2021/05/implicaciones-de-COVID-EN-LA-SALUD-MENTAL.pdf [accessed 2022-07-27]
- Ettman CK, Abdalla SM, Cohen GH, Sampson L, Vivier PM, Galea S. Prevalence of depression symptoms in US adults before and during the COVID-19 pandemic. JAMA Netw Open 2020 Sep 01;3(9):e2019686 [FREE Full text] [CrossRef] [Medline]
- Twenge JM, Joiner TE. U.S. Census Bureau-assessed prevalence of anxiety and depressive symptoms in 2019 and during the 2020 COVID-19 pandemic. Depress Anxiety 2020 Oct;37(10):954-956 [FREE Full text] [CrossRef] [Medline]
- Kujawa A, Green H, Compas BE, Dickey L, Pegg S. Exposure to COVID-19 pandemic stress: Associations with depression and anxiety in emerging adults in the United States. Depress Anxiety 2020 Dec 10;37(12):1280-1288. [CrossRef] [Medline]
- O'Connor RC, Wetherall K, Cleare S, McClelland H, Melson AJ, Niedzwiedz CL, et al. Mental health and well-being during the COVID-19 pandemic: longitudinal analyses of adults in the UK COVID-19 mental health & wellbeing study. Br J Psychiatry 2021 Jun;218(6):326-333 [FREE Full text] [CrossRef] [Medline]
- Lebel C, MacKinnon A, Bagshawe M, Tomfohr-Madsen L, Giesbrecht G. Elevated depression and anxiety symptoms among pregnant individuals during the COVID-19 pandemic. J Affect Disord 2020 Dec 01;277:5-13 [FREE Full text] [CrossRef] [Medline]
- Motolese F, Rossi M, Albergo G, Stelitano D, Villanova M, Di Lazzaro V, et al. The psychological impact of COVID-19 pandemic on people with multiple sclerosis. Front Neurol 2020 Oct 30;11:580507 [FREE Full text] [CrossRef] [Medline]
- Zanghì A, D'Amico E, Luca M, Ciaorella M, Basile L, Patti F. Mental health status of relapsing-remitting multiple sclerosis Italian patients returning to work soon after the easing of lockdown during COVID-19 pandemic: A monocentric experience. Mult Scler Relat Disord 2020 Nov;46:102561 [FREE Full text] [CrossRef] [Medline]
- Broche-Pérez Y, Jiménez-Morales RM, Monasterio-Ramos LO, Vázquez-Gómez LA, Fernández-Fleites Z. Fear of COVID-19, problems accessing medical appointments, and subjective experience of disease progression, predict anxiety and depression reactions in patients with Multiple Sclerosis. Mult Scler Relat Disord 2021 Aug;53:103070 [FREE Full text] [CrossRef] [Medline]
- Patten SB, Marrie RA, Carta MG. Depression in multiple sclerosis. Int Rev Psychiatry 2017 Oct 06;29(5):463-472. [CrossRef] [Medline]
- Chan CK, Tian F, Pimentel Maldonado D, Mowry EM, Fitzgerald KC. Depression in multiple sclerosis across the adult lifespan. Mult Scler 2021 Oct 14;27(11):1771-1780 [FREE Full text] [CrossRef] [Medline]
- Solaro C, Gamberini G, Masuccio FG. Depression in multiple sclerosis: epidemiology, aetiology, diagnosis and treatment. CNS Drugs 2018 Feb 7;32(2):117-133. [CrossRef] [Medline]
- Siegert RJ, Abernethy DA. Depression in multiple sclerosis: a review. J Neurol Neurosurg Psychiatry 2005 Apr 01;76(4):469-475 [FREE Full text] [CrossRef] [Medline]
- Feinstein A, Magalhaes S, Richard J, Audet B, Moore C. The link between multiple sclerosis and depression. Nat Rev Neurol 2014 Sep 12;10(9):507-517. [CrossRef] [Medline]
- Zhang Y, Taylor BV, Simpson S, Blizzard L, Campbell JA, Palmer AJ, et al. Feelings of depression, pain and walking difficulties have the largest impact on the quality of life of people with multiple sclerosis, irrespective of clinical phenotype. Mult Scler 2021 Jul 14;27(8):1262-1275. [CrossRef] [Medline]
- Diamond BJ, Johnson SK, Kaufman M, Graves L. Relationships between information processing, depression, fatigue and cognition in multiple sclerosis. Arch Clin Neuropsychol 2008 Mar;23(2):189-199. [CrossRef] [Medline]
- Ford H, Trigwell P, Johnson M. The nature of fatigue in multiple sclerosis. Journal of Psychosomatic Research 1998 Jul;45(1):33-38. [CrossRef]
- Bakshi R, Shaikh ZA, Miletich RS, Czarnecki D, Dmochowski J, Henschel K, et al. Fatigue in multiple sclerosis and its relationship to depression and neurologic disability. Mult Scler 2000 Jun 02;6(3):181-185. [CrossRef] [Medline]
- Strober LB, Arnett PA. An examination of four models predicting fatigue in multiple sclerosis. Arch Clin Neuropsychol 2005 Jul;20(5):631-646. [CrossRef] [Medline]
- Levin SN, Venkatesh S, Nelson KE, Li Y, Aguerre I, Zhu W, Multiple Sclerosis Resilience to COVID-19 (MSReCOV) Collaborative. Manifestations and impact of the COVID-19 pandemic in neuroinflammatory diseases. Ann Clin Transl Neurol 2021 Apr 22;8(4):918-928 [FREE Full text] [CrossRef] [Medline]
- Vogel AC, Schmidt H, Loud S, McBurney R, Mateen FJ. Impact of the COVID-19 pandemic on the health care of >1,000 People living with multiple sclerosis: A cross-sectional study. Mult Scler Relat Disord 2020 Nov;46:102512 [FREE Full text] [CrossRef] [Medline]
- Manacorda T, Bandiera P, Terzuoli F, Ponzio M, Brichetto G, Zaratin P, et al. Impact of the COVID-19 pandemic on persons with multiple sclerosis: Early findings from a survey on disruptions in care and self-reported outcomes. J Health Serv Res Policy 2021 Jul 18;26(3):189-197 [FREE Full text] [CrossRef] [Medline]
- Levit E, Cohen I, Dahl M, Edwards K, Weinstock-Guttman B, Ishikawa T, Multiple Sclerosis Resilience to COVID-19 (MSReCOV) Collaborative. Worsening physical functioning in patients with neuroinflammatory disease during the COVID-19 pandemic. Mult Scler Relat Disord 2022 Feb;58:103482 [FREE Full text] [CrossRef] [Medline]
- Huckvale K, Venkatesh S, Christensen H. Toward clinical digital phenotyping: a timely opportunity to consider purpose, quality, and safety. NPJ Digit Med 2019 Sep 6;2(1):88 [FREE Full text] [CrossRef] [Medline]
- Newland P, Wagner JM, Salter A, Thomas FP, Skubic M, Rantz M. Exploring the feasibility and acceptability of sensor monitoring of gait and falls in the homes of persons with multiple sclerosis. Gait Posture 2016 Sep;49:277-282. [CrossRef] [Medline]
- Shammas L, Zentek T, von Haaren B, Schlesinger S, Hey S, Rashid A. Home-based system for physical activity monitoring in patients with multiple sclerosis (Pilot study). Biomed Eng Online 2014 Feb 06;13:10 [FREE Full text] [CrossRef] [Medline]
- Chitnis T, Glanz BI, Gonzalez C, Healy BC, Saraceno TJ, Sattarnezhad N, et al. Quantifying neurologic disease using biosensor measurements in-clinic and in free-living settings in multiple sclerosis. NPJ Digit Med 2019 Dec 11;2(1):123 [FREE Full text] [CrossRef] [Medline]
- Chikersal P, Doryab A, Tumminia M, Villalba DK, Dutcher JM, Liu X, et al. Detecting depression and predicting its onset using longitudinal symptoms captured by passive sensing. ACM Trans. Comput.-Hum. Interact 2021 Feb 28;28(1):1-41. [CrossRef]
- Levin SN, Riley CS, Dhand A, White CC, Venkatesh S, Boehm B, et al. Association of social network structure and physical function in patients with multiple sclerosis. Neurology 2020 Aug 07;95(11):e1565-e1574. [CrossRef]
- Mani A, Santini T, Puppala R, Dahl M, Venkatesh S, Walker E, et al. Applying deep learning to accelerated clinical brain magnetic resonance imaging for multiple sclerosis. Front Neurol 2021 Sep 27;12:685276 [FREE Full text] [CrossRef] [Medline]
- Boorgu DS, Venkatesh S, Lakhani CM, Walker E, Aguerre IM, Riley C, et al. The impact of socioeconomic status on subsequent neurological outcomes in multiple sclerosis. Mult Scler Relat Disord 2022 Jun;65:103994. [CrossRef]
- Kever A, Walker ELS, Riley CS, Heyman RA, Xia Z, Leavitt VM. Association of personality traits with physical function, cognition, and mood in multiple sclerosis. Mult Scler Relat Disord 2022 Feb;59:103648. [CrossRef]
- Epstein S, Xia Z, Lee AJ, Dahk M, Edwards K, Levit E, Multiple Sclerosis Resilience to COVID-19 (MSReCOV) Collaborative. Vaccination against SARS-CoV-2 in neuroinflammatory disease: early safety/tolerability data. Mult Scler Relat Disord 2022 Jan;57:103433. [CrossRef]
- Harris PA, Taylor R, Minor BL, Elliott V, Fernandez M, O'Neal L, REDCap Consortium. The REDCap consortium: Building an international community of software platform partners. J Biomed Inform 2019 Jul;95:103208 [FREE Full text] [CrossRef] [Medline]
- Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform 2009 Apr;42(2):377-381 [FREE Full text] [CrossRef] [Medline]
- Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001 Sep;16(9):606-613 [FREE Full text] [CrossRef] [Medline]
- Wicks P, Vaughan TE, Massagli MP. The multiple sclerosis rating scale, revised (MSRS-R): development, refinement, and psychometric validation using an online community. Health Qual Life Outcomes 2012 Jun 18;10:70 [FREE Full text] [CrossRef] [Medline]
- Meca-Lallana V, Brañas-Pampillón M, Higueras Y, Candeliere-Merlicco A, Aladro-Benito Y, Rodríguez-De la Fuente O, et al. Assessing fatigue in multiple sclerosis: Psychometric properties of the five-item Modified Fatigue Impact Scale (MFIS-5). Mult Scler J Exp Transl Clin 2019 Nov 09;5(4):2055217319887987 [FREE Full text] [CrossRef] [Medline]
- Buysse DJ, Reynolds CIII, Monk TH, Hoch CC, Yeager AL, Kupfer DJ. Quantification of subjective sleep quality in healthy elderly men and women using the Pittsburgh sleep quality index (PSQI). Sleep 1991;14(4):331-338. [CrossRef]
- Fictenberg NL, Putnam SH, Mann NR, Zafonte RD, Millard AE. Insomnia screening in postacute traumatic brain injury: utility and validity of the Pittsburgh Sleep Quality Index. Am J Phys Med Rehabil 2001 May;80(5):339-345. [CrossRef] [Medline]
- Ferreira D, Kostakos V, Dey AK. AWARE: mobile context instrumentation framework. Front. ICT 2015 Apr 20;2:1-9. [CrossRef]
- Hayes A. Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach. New York, US: Guilford Press; 2017.
- Saeb S, Zhang M, Kwasny MM, Karr CJ, Kording K, Mohr DC. The relationship between clinical, momentary, and sensor-based assessment of depression. Int Conf Pervasive Comput Technol Healthc 2015 Aug;2015:1-10 [FREE Full text] [CrossRef] [Medline]
- Wang R, Chen F, Chen Z, Li T, Harari G, Tignor S, et al. Studentlife: assessing mental health, academic performance and behavioral trends of college students using smartphones. 2014 Presented at: ACM international joint conference on pervasive and ubiquitous computing; September 13-17, 2014; Seattle, Washington. [CrossRef]
- Canzian L, Musolesi M. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. 2015 Presented at: ACM international joint conference on pervasive and ubiquitous computing; September 7-11, 2015; Osaka, Japan p. 1293-1304. [CrossRef]
- Xu X, Chikersal P, Dutcher JM, Sefidgar YS, Seo W, Tumminia MJ, et al. Leveraging collaborative-filtering for personalized behavior modeling. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol 2021 Mar 19;5(1):1-27. [CrossRef]
- Xu X, Chikersal P, Doryab A, Villalba DK, Dutcher JM, Tumminia MJ, et al. Leveraging routine behavior and contextually-filtered features for depression detection among college students. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol 2019 Sep 09;3(3):1-33. [CrossRef]
- Tong C, Craner M, Vegreville M, Lane ND. Tracking fatigue and health state in multiple sclerosis patients using connected wellness devices. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol 2019 Sep 09;3(3):1-19. [CrossRef]
- Min JK, Doryab A, Wiese J, Amini S, Zimmerman J, Hong JI. Toss 'n' turn: smartphone as sleep and sleep quality detector. 2014 Presented at: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; April 26, 2014; Toronto, Canada p. 477-486. [CrossRef]
- Sano A, Phillips AJ, Yu AZ, McHill AW, Taylor S, Jaques N, et al. Recognizing academic performance, sleep quality, stress level, and mental health using personality traits, wearable sensors and mobile phones. 2015 Presented at: IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN); October 19, 2015; Cambridge, MA, USA p. 1-6.
- Demirci K, Akgönül M, Akpinar A. Relationship of smartphone use severity with sleep quality, depression, and anxiety in university students. J Behav Addict 2015 Jun;4(2):85-92 [FREE Full text] [CrossRef] [Medline]
- Kwon M, Lee J, Won W, Park J, Min J, Hahn C, et al. Development and validation of a smartphone addiction scale (SAS). PLoS One 2013;8(2):e56936 [FREE Full text] [CrossRef] [Medline]
- Costigan SA, Barnett L, Plotnikoff RC, Lubans DR. The health indicators associated with screen-based sedentary behavior among adolescent girls: a systematic review. J Adolesc Health 2013 Apr;52(4):382-392. [CrossRef] [Medline]
- Nutt D, Wilson S, Paterson L. Sleep disorders as core symptoms of depression. Dialogues in Clinical Neuroscience 2022 Apr 01;10(3):329-336. [CrossRef]
- Press WH, Rybicki GB. Fast algorithm for spectral analysis of unevenly sampled data. The Astrophysical Journal. URL: https://adsabs.harvard.edu/pdf/1989ApJ...338..277P [accessed 2022-07-27]
- Mantua J, Gravel N, Spencer RMC. Reliability of sleep measures from four personal health monitoring devices compared to research-based actigraphy and polysomnography. Sensors (Basel) 2016 May 05;16(5):646 [FREE Full text] [CrossRef] [Medline]
- Cook JD, Prairie ML, Plante DT. Utility of the Fitbit Flex to evaluate sleep in major depressive disorder: A comparison against polysomnography and wrist-worn actigraphy. J Affect Disord 2017 Aug 01;217:299-305 [FREE Full text] [CrossRef] [Medline]
- de Zambotti M, Goldstone A, Claudatos S, Colrain IM, Baker FC. A validation study of Fitbit Charge 2™ compared with polysomnography in adults. Chronobiol Int 2018 Apr;35(4):465-476. [CrossRef] [Medline]
- Chow PI, Fua K, Huang Y, Bonelli W, Xiong H, Barnes LE, et al. Using mobile sensing to test clinical models of depression, social anxiety, state affect, and social isolation among college students. J Med Internet Res 2017 Mar 03;19(3):e62 [FREE Full text] [CrossRef] [Medline]
- Saeb S, Lattie EG, Schueller SM, Kording KP, Mohr DC. The relationship between mobile phone location sensor data and depressive symptom severity. PeerJ 2016;4:e2537 [FREE Full text] [CrossRef] [Medline]
- Block VJ, Bove R, Zhao C, Garcha P, Graves J, Romeo AR, et al. Association of continuous assessment of step Count by remote monitoring with disability progression among adults with multiple sclerosis. JAMA Netw Open 2019 Mar 01;2(3):e190570 [FREE Full text] [CrossRef] [Medline]
- Stuart CM, Varatharaj A, Domjan J, Philip S, Galea I, SIMS study group. Physical activity monitoring to assess disability progression in multiple sclerosis. Mult Scler J Exp Transl Clin 2020 Dec 07;6(4):2055217320975185 [FREE Full text] [CrossRef] [Medline]
- Sun S, Folarin AA, Ranjan Y, Rashid Z, Conde P, Stewart C, RADAR-CNS Consortium. Using smartphones and wearable devices to monitor behavioral changes during COVID-19. J Med Internet Res 2020 Sep 25;22(9):e19992 [FREE Full text] [CrossRef] [Medline]
- Ong J, Lau T, Massar SAA, Chong ZT, Ng BKL, Koek D, et al. COVID-19-related mobility reduction: heterogenous effects on sleep and physical activity rhythms. Sleep 2021 Feb 12;44(2):zsaa179 [FREE Full text] [CrossRef] [Medline]
- Pépin JL, Bruno RM, Yang R, Vercamer V, Jouhaud P, Escourrou P, et al. Wearable activity trackers for monitoring adherence to home confinement during the COVID-19 pandemic worldwide: data aggregation and analysis. J Med Internet Res 2020 Jun 19;22(6):e19787 [FREE Full text] [CrossRef] [Medline]
- Huckins JF, daSilva AW, Wang W, Hedlund E, Rogers C, Nepal SK, et al. Mental health and behavior of college students during the early phases of the COVID-19 pandemic: longitudinal smartphone and ecological momentary assessment study. J Med Internet Res 2020 Jun 17;22(6):e20185 [FREE Full text] [CrossRef] [Medline]
Abbreviations
MFIS-5: Modified Fatigue Impact Scale-5
MS: multiple sclerosis
MSRS-R: Multiple Sclerosis Rating Scale—Revised
PHQ-9: Patient Health Questionnaire-9
PSQI: Pittsburgh Sleep Quality Index
Edited by J Torous; submitted 06.04.22; peer-reviewed by N Marotta, N Chiaravalloti; comments to author 27.06.22; revised version received 15.07.22; accepted 16.07.22; published 24.08.22
Copyright©Prerna Chikersal, Shruthi Venkatesh, Karman Masown, Elizabeth Walker, Danyal Quraishi, Anind Dey, Mayank Goel, Zongqi Xia. Originally published in JMIR Mental Health (https://mental.jmir.org), 24.08.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on https://mental.jmir.org/, as well as this copyright and license information must be included.