This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on http://mental.jmir.org/, as well as this copyright and license information must be included.
Smartphone-based symptom monitoring has gained increased attention in psychiatric research as a cost-efficient tool for prospective and ecologically valid assessments based on participants’ self-reports. However, a meaningful interpretation of smartphone-based assessments requires knowledge about their psychometric properties, especially their validity.
The goal of this study is to systematically investigate the validity of smartphone-administered assessments of self-reported affective symptoms using the Remote Monitoring Application in Psychiatry (ReMAP).
The ReMAP app was distributed to 173 adult participants of ongoing, longitudinal psychiatric phenotyping studies, including healthy control participants, as well as patients with affective disorders and anxiety disorders; the mean age of the sample was 30.14 years (SD 11.92). The Beck Depression Inventory (BDI) and single-item mood and sleep information were assessed via the ReMAP app and validated with non–smartphone-based BDI scores and clinician-rated depression severity using the Hamilton Depression Rating Scale (HDRS).
We found overall high comparability between smartphone-based and non–smartphone-based BDI scores (intraclass correlation coefficient=0.921;
These findings demonstrate that smartphone-based monitoring of depressive symptoms via the ReMAP app provides valid assessments of depressive symptomatology and, therefore, represents a useful tool for prospective digital phenotyping in affective disorder patients in clinical and research applications.
The phasic development of symptoms over time in the form of disease episodes is one of the key characteristics of affective disorders. These disease trajectories can be used as an informative predictor as well as an outcome measure in psychiatric research and personalized medicine. However, the assessment of the development of symptoms over time is challenging. The value of cross-sectional assessments is limited as they can only capture an excerpt of the symptom history and it is unclear whether this excerpt reflects, for example, the peak of an affective episode or a fully or partially remitted state and whether episodes are recurrent. Collecting this information retrospectively from the patients is one approach to gaining insights into their former symptom history, which is likely to be biased by their current depressive state [
Several proof-of-concept studies have pointed to the utility of smartphone-based data in affective disorder research [
The potential of continuous monitoring of psychomotor activity based on acceleration and location for a differentiation of unipolar and bipolar patients has been demonstrated [
Consequently, the comparability between smartphone-based and non–smartphone-based versions (ie, conventional paper-and-pencil or stationary computer-based versions) of psychometric instruments has also received increasing attention [
Data from pilot studies indicate agreement between smartphone-delivered, daily self-rated mood and clinician-rated mood via Hamilton Depression Rating Scale (HDRS) scores among bipolar patients [
In sum, while the aforementioned findings of overall agreement between smartphone-based self-reported depressive symptoms and established clinical scales is encouraging, it appears important to denote that limited sample sizes in previous reports as well as systematic differences, including sample properties, technical properties, and assessment type, currently limit our understanding of the reliability and validity of smartphone-based assessments of depressive symptoms. It thus remains unclear to what degree validation reports of smartphone-based self-reports are generalizable across assessment instruments, cohorts, and applications; hence, app- or study-specific validation of measurements remains the gold standard.
Therefore, the aim of this study is to assess the validity of smartphone-based assessments of depressive symptoms using the Remote Monitoring Application in Psychiatry (ReMAP) app. To this end, we use smartphone-based depression self-reports using single-item and BDI questionnaire data and investigate their comparability with non–smartphone-based versions of the BDI, a well-established and standardized self-report instrument used among psychiatric patients and healthy control participants. We test the hypotheses that both delivery formats—smartphone-based and non–smartphone-based assessments—yield comparable results and, therefore, that smartphone-based monitoring of depressive symptoms via the ReMAP app provides valid assessments of depressive symptomatology. Furthermore, we aim to investigate potential differences in the agreement between smartphone-based and non–smartphone-based assessments of depressive symptoms across diagnostic groups as well as across age and gender.
The ReMAP study was designed as a prospective, naturalistic observational study. An overall sample of 173 participants was included in the analyses; participants had a mean age of 30.14 years (SD 11.92). The single inclusion criterion for this study was availability of a smartphone-based BDI that was completed within 4 weeks of a non–smartphone-based BDI. The sample included adults that were either healthy controls (n=101) or belonged to one of the following diagnostic groups: MDD (n=43), bipolar disorder (n=5), MDD with comorbid social anxiety disorder (SAD) (n=9), SAD only (n=2), or specific phobia (SP), spider subtype (n=13). Participants were recruited for ReMAP participation in the context of ongoing longitudinal cohort studies over which assessments were parallelized; details on subsamples from all cohorts are provided in the
Participants were informed about the possibility of voluntary additional participation in the ReMAP study in a face-to-face meeting at the time they presented at the Department of Psychiatry, University of Münster, Germany, in the context of ongoing, longitudinal cohort assessments. Interested subjects were extensively briefed about aims; methods, especially type and amount of collected data; details on data security (ie, details on data transfer and storage); and financial compensation. The study was approved by the local Institutional Review Board, and written informed consent was obtained before participation.
All measures that were not assessed via smartphone (ie, conventionally administered in interviews or via paper-and-pencil or tablet questionnaires) will be referred to as
Development of ReMAP began in mid-2018 at the Institute for Translational Psychiatry in Münster. It is a native app for iOS and Android, based on Apple ResearchKit, Apple Health, and Google Fit. After an anonymous log-in with a provided subject ID, the app works in background mode and monitors the number of steps taken by the user, the distance walked, the accelerometer, and GPS position data. The data are encrypted on the smartphone and sent regularly via REST-API (REpresentational State Transfer application programming interface) to a back end specifically developed for ReMAP, which is provided on university servers. In addition, the app regularly enables the user to fill out various questionnaires regarding sleep and mood as well as to create short voice recordings. Measures used in this study’s analyses are described below.
After written informed consent was obtained, each participant was provided an individual subject ID (ie, subject code). The participant was then asked to download the developed ReMAP smartphone app and to start the app. At this time, subjects were asked to confirm participation in the study again and to enter their individual subject IDs.
In addition to the continuous assessment of passive data, all participants were asked to provide self-reported ratings of depressive symptoms. To this end, participants filled out a digital version of the BDI-I that was integrated into ReMAP every 2 weeks. Moreover, participants rated their mood and sleep duration by answering single items every 3 days. For the single mood question (ie, “How is your mood today?”), participants provided their responses via touch screen on a scale from 1 (very bad) to 10 (very good). For the single sleep question (ie, “How many hours did you sleep last night?”), participants provided their response on a scale from 0 to 13 hours. For all self-reported data, the app sent out weekly push notifications on a random basis during the daytime with a variance of 2 days or every 2 weeks in case of the BDI. The time of the day when notifications were sent was systematically varied in order to avoid bias from systematically assessing symptom self-reports (eg, only during the morning). Participants were instructed that answering all questions was optional and they were free to choose their time of answering whenever items were made available.
Again, for this study, smartphone-based and non–smartphone-based data were only included if the time interval between completion of the ratings between both delivery formats was less than 4 weeks, in order to minimize potential bias due to temporal change in depressive symptoms. Further, for each participant, the respective BDI, mood, and sleep assessments from the time point with the shortest interval between smartphone-based and non–smartphone-based assessments were included for this study.
Agreement between non–smartphone-based and smartphone-based BDI scores was assessed by absolute agreement using a two-way, mixed-effects intraclass correlation coefficient (ICC) [
This analysis was further repeated for the over-1-week-interval and the under-1-week-interval groups separately in order to assess the influence of the test-retest interval on the agreement between measurements. In addition, the analysis was repeated separately among healthy controls, affective disorder (ie, MDD, SAD + MDD, and bipolar disorder) patients, and anxiety disorder (ie, SP, spider subtype; and SAD) patients, as well as for the two non–smartphone-based BDI versions (ie, BDI-I and BDI-II). The internal consistency of the smartphone-based BDI was assessed via Cronbach α and compared with the internal consistency of the non–smartphone-based BDIs.
The smartphone-based single mood item was correlated with the non–smartphone-based and smartphone-based BDI scores. Although it covers different levels of symptomatology (ie, the BDI assesses complex symptoms over time, while the single mood item assesses only the current subjective mood [
For validation of the smartphone-based single sleep item, it was correlated with the smartphone-based and non–smartphone-based BDI item assessing sleeping disturbance. Analogous to the BDI analysis, one mood and one sleep assessment were used for analysis based on the shortest interval to the non–smartphone-based measures. For further validation, the ReMAP BDI and the ReMAP single mood item were both correlated with clinician-rated depression severity using the HDRS.
All analyses were conducted using SPSS, version 26 (IBM Corp). A multiple test correction was undertaken across all significance tests (n=34) in order to avoid α error accumulation using a false-discovery-rate (FDR) correction following the Benjamini-Hochberg procedure [
Mean BDI scores and their range across all participants were similar for ReMAP (mean 5.35, SD 8.63; range 0-44) and non–smartphone-based BDI (mean 6.46, SD 9.06; range 0-47). Absolute differences between both measurements were, on average, 3.02 points (SD 3.76) with a considerable range covering 0 to 26 points. The mean test-retest interval was 5.84 days (SD 7.29), ranging from 0.20 to 28.70 days. Detailed descriptive statistics across subgroups of the sample are provided in Table S1 in
The overall agreement between ReMAP and the non–smartphone-based BDI was very high (ICC 0.921, 95% CI 0.890-0.942). Separate investigations of the BDI agreement in several subgroups yielded highly comparable ICCs across both BDI versions (ie, BDI-I and BDI-II), across different test-retest intervals, across different age groups, and across males and females—the ICC was over 0.888 for all subgroups. Separate investigations across different diagnostic statuses yielded the highest BDI agreement between delivery formats in the subgroup with affective disorders (ICC 0.912), while healthy controls and participants with anxiety disorders (ie, SP, spider subtype; and SAD) showed moderate agreement between BDIs (ICC 0.639 and ICC 0.736, respectively). Similarly, higher agreement was found among acutely depressed as compared to remitted MDD patients (see
The internal consistency of the ReMAP BDI (Cronbach α=.944, n=174) was virtually identical to both non–smartphone-based BDI versions (BDI-I: α=.945, n=54; BDI-II: α=.944, n=108). For further validation, the ReMAP BDI was correlated with clinician-rated depression severity using the HDRS in a subset of the sample (n=51). The analysis yielded a strong significant correlation (
Intraclass correlation agreement of the Remote Monitoring Application in Psychiatry (ReMAP) Beck Depression Inventory-I (BDI-I) with the full sample and stratified subsamples.
Sample | Number of participants (N=173), n (%) | Intraclass correlation coefficient | 95% CI | |
Full sample | 173 (100) | 0.921 | 0.890-0.942 | <.001 |
BDI-Inon–smartphone based | 64 (37.0) | 0.921 | 0.870-0.952 | <.001 |
BDI-IInon–smartphone based | 109 (63.0) | 0.919 | 0.863-0.850 | <.001 |
≤1-week intervalb | 126 (72.8) | 0.934 | 0.890-0.958 | <.001 |
>1-week intervalc | 47 (27.2) | 0.888 | 0.799-0.938 | <.001 |
Healthy controls | 101 (58.4) | 0.639 | 0.454-0.760 | <.001 |
Affective disorders | 57 (32.9) | 0.912 | 0.851-0.948 | <.001 |
Anxiety disorders | 15 (8.7) | 0.736 | 0.252-0.910 | .008 |
Age ≤35 years | 131 (75.7) | 0.899 | 0.851-0.931 | <.001 |
Age >35 years | 42 (24.3) | 0.962 | 0.930-0.980 | <.001 |
Male | 41 (23.7) | 0.969 | 0.919-0.986 | <.001 |
Female | 132 (76.3) | 0.904 | 0.864-0.933 | <.001 |
aAll
bParticipants completed the smartphone-based BDI within 1 week of completing non–smartphone-based measures.
cParticipants completed the smartphone-based BDI and non–smartphone-based measures more than 1 week apart.
Beck Depression Inventory (BDI) scores via the Remote Monitoring Application in Psychiatry (ReMAP) smartphone app over non–smartphone-based BDI scores across diagnostic groups.
After including all data points with a test-retest interval of up to 4 weeks, the single item for mood assessed via ReMAP correlated moderately with the sum scores of the ReMAP BDI (
The single item for sleep assessed via ReMAP was correlated with the BDI item assessing sleeping disturbance. After including all data points with test-retest intervals of up to 4 weeks, this analysis yielded significant negative associations with the sleep item from the ReMAP BDI (
With this study, we demonstrate that smartphone-based monitoring of depressive symptoms via the ReMAP app provides valid assessments of depressive symptomatology. The overall high agreement between the non–smartphone-based and smartphone-based versions of the BDI confirm that digital assessments via the ReMAP app using the participants’ smartphones have the potential to offer valid estimates of the trajectory of participants’ moods. This notion is additionally supported by the observed correlation of smartphone-administered single-item ratings regarding mood and sleep with corresponding non–smartphone-based assessments. Importantly, the validity of smartphone-based assessments could furthermore be demonstrated by using clinical rating scales as a criterion with a strong correlation of smartphone-based BDI and non–smartphone-based HDRS scores.
The observation of high agreement between self-reported smartphone-based assessments of depressive symptoms and classic non–smartphone-based assessments in this study is supported by previous findings from pilot studies among MDD patients [
Our findings of overall high validity of smartphone-based and conventional non–smartphone-based assessments of depressive symptoms in a relatively large and heterogeneous sample critically underscores the potential of mobile assessment tools in psychiatric research. Considering that smartphone-based assessments offer valid data on patients’ mood states, an expansion of mobile data acquisition in the clinical and research context appears desirable. The cost-efficiency of smartphone-based data might thus allow the acquisition of valid data on patients’ long-term disease trajectories at an unprecedented scale. Together with previous studies investigating the comparability of the BDI versions (ie, BDI-I and BDI-II) [
We furthermore demonstrate that agreement between smartphone-based and non–smartphone-based assessments of depressive symptoms does not depend on the age or gender of participants, which supports the generalizability of smartphone-based assessments of depressive symptoms. This notion appears especially noteworthy considering the relatively large sample size, in comparison with previous reports, as well as the age range of participants included in this study (ie, 18-68 years of age). The inclusion of older participant groups seems relevant, as previous studies have emphasized that smartphone apps for mental health monitoring should meet the needs (eg, easy handling) of older and potentially less technically proficient individuals in order to assure adherence among these group members [
An important observation of this study was that higher agreement between smartphone-based and non–smartphone-based assessments of depressive symptoms was found among affective disorder patients compared to anxiety disorder patients or healthy controls. Notably, while the agreement in the affective disorder sample can be estimated as excellent, intraclass correlations indicate a lower, but still moderate to good, agreement in the healthy control and anxiety disorder samples [
Besides validation of a smartphone version of the BDI, this study found moderate to high agreement between mood ratings via smartphone-based single-item assessments and established clinical scores using the BDI, regardless of the delivery format of the BDI. This finding is of particular importance considering that completion of an entire questionnaire is time-consuming and, hence, the usage of single items might provide a valid possibility of assessing mood on a frequent basis. Importantly, these findings are tentative and limited by the fact that the single mood item and the BDI questionnaires may systematically assess differing concepts in regard to the symptom level as suggested by previous scholars [
In sum, the associations between questionnaire data (ie, the BDI) and single-item mood self-reports pose the following question: Which measure may be better suited for specific research contexts and could one of the two be omitted completely? One may argue that single mood items seem to provide a sufficient proxy for the assessment of mood fluctuations that is more time-efficient and could, therefore, be assessed more frequently as compared to a more exhaustive BDI questionnaire. On the other hand, it may be more beneficial to have a more elaborate symptom profile as obtained, for example, via the BDI in exchange for assessment frequency. It remains to be investigated what temporal and content-related resolution is most beneficial for the investigation of the development of depressive symptoms and for specific feature engineering using machine learning algorithms. Likely, the most beneficial trade-off between the two highly depends on the specific research question.
Compared with the single mood item, the single sleep item showed a lower correlation with corresponding non–smartphone-based assessments in the form of sleep disturbance items within the BDI questionnaire. One possible explanation for this low association is that both items measure slightly different aspects of sleep: while the BDI sleep item assesses increased and decreased sleep duration and, depending on the BDI version, also a combination with subjective sleep quality, the single sleep item assesses purely the duration of sleep during the last night. Further, sleep quality or disturbance may be a more heterogeneous construct and, thus, more difficult to assess via a single item. Another possible explanation for this finding could be that variability in the sleep quality, as well as sleep duration, may be less temporally stable as compared to mood changes. Thus, the test interval of up to 4 weeks may be too long in order to validate the smartphone-based assessment of sleep duration. Considering that smartphone-based and non–smartphone-based assessment methods lie several days or weeks apart, the association between them seems to be reasonably high.
Strengths of this study include the relatively large sample of participants and the availability of smartphone-based data along with conventional psychometric and clinical data. Furthermore, this study included participants with differing psychiatric diagnoses and a high variability in age, thus allowing the assessment of generalizability across such participant groups. Further, a wide variety of assessment forms were used for validation, considering multiple sources of information. The application of non–smartphone-based BDI versions (ie, self-report), as well as clinical ratings (ie, HDRS), underlines the validity of the smartphone-based assessments via the ReMAP app. Limitations include the lack of prospective clinical follow-up data. Future large-scale studies are warranted to assess the prognostic validity of smartphone-based self-reports in affective disorder patients.
Smartphone-based monitoring of depressive symptoms remains a timely matter of critical relevance for translational psychiatry. These results demonstrate overall high validity of smartphone-based assessments of depressive symptoms and should, thus, encourage researchers to apply mobile apps toward continuous prospective assessments of depressive symptoms.
Descriptive statistics of subsamples, correlations of the Remote Monitoring Application in Psychiatry (ReMAP) single mood item with Beck Depression Inventory (BDI) scores across subsamples, and correlations of the ReMAP single sleep item with the BDI sleep item across subsamples.
Beck Depression Inventory
Collaborative Research Center Transregio
German Research Foundation
Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition
false discovery rate
Hamilton Depression Rating Scale
intraclass correlation coefficient
Innovative Medizinische Forschung
Interdisciplinary Center for Clinical Research
major depressive disorder
Patient Health Questionnaire-9
Remote Monitoring Application in Psychiatry
REpresentational State Transfer application programming interface
social anxiety disorder
Structured Clinical Interview for DSM-IV Axis I Disorders
specific phobia
Funding was provided by the German Research Foundation (DFG) (project No. 44541416-TRR 58); project grants C09 and Z02 of the Collaborative Research Center Transregio (CRC-TRR) 58 were awarded to UD. Funding was also provided by the Interdisciplinary Center for Clinical Research (IZKF) of the Faculty of Medicine, University of Münster (grant Dan3/012/17 was awarded to UD and grant SEED 11/19 was awarded to NO). In addition, funding was provided by the Innovative Medizinische Forschung (IMF) of the Faculty of Medicine, University of Münster (grant OP121710 was awarded to NO and TH and grants LE121703 and LE121904 were awarded to EL). Further, we are deeply indebted to all personnel involved in recruitment and data management as well as to all participants of this study.
None declared.