Efficacy, Safety, and Evaluation Criteria of mHealth Interventions for Depression: Systematic Review

Background: Depression is a significant public health issue that can lead to considerable disability and reduced quality of life. With the rise of technology, mobile health (mHealth) interventions, particularly smartphone apps, are emerging as a promising approach for addressing depression. However, the lack of standardized evaluation tools and evidence-based principles for these interventions remains a concern. Objective: In this systematic review and meta-analysis, we aimed to evaluate the efficacy and safety of mHealth interventions for depression and identify the criteria and evaluation tools used for their assessment. Methods: A systematic review and meta-analysis of the literature was carried out following the recommendations of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement. Studies that recruited adult patients exhibiting elevated depressive symptoms or those diagnosed with depressive disorders and aimed to assess the effectiveness or safety of mHealth interventions were eligible for consideration. The primary outcome of interest was the reduction of depressive symptoms, and only randomized controlled trials (RCTs) were included in the analysis. The risk of bias in the original RCTs was assessed using version 2 of the Cochrane risk-of-bias tool for randomized trials.


Background
Depression is the most common mental health condition in the general population and is one of the leading causes of the global burden of disease and disability [1-3]. The worldwide incidence of depression increased by 49.86% between 1990 and 2017, from 172 million cases to 258 million [2]. Unipolar depression is predicted to be the leading cause of disability in high-income countries by 2030, surpassing other health conditions such as ischemic heart disease, dementia, alcohol use disorders, and diabetes [3].
Although there is strong clinical evidence that depression can be treated with a variety of pharmacological and psychological interventions [4], human resources for mental health are inadequate, especially in low- and middle-income countries [5-7], and a global shortage of over 15 million health workers is expected by 2030 [8]. Given the rapid advancement and adoption of technology, digital interventions, particularly mobile health (mHealth) interventions, have the potential to provide novel and viable methods of delivering population-scale mental health care [9,10].
The World Health Organization defines mHealth as "the term used for medical and public health practices supported by mobile devices, such as phones, patient monitoring devices, personal digital assistants, and other wireless devices" [11]. Smartphone apps can be especially powerful vectors for mHealth interventions because of their high connectivity, 24-hour availability, and ubiquitous nature [12]. Compared with most traditional treatment services, smartphone-based interventions offer several advantages, including high accessibility and scalability; relatively low costs; minimal contact; patient anonymity; flexibility of use; and the possibility of self-monitoring activity, symptoms, and progression in real time as well as providing motivational support and targeted care [10,13,14].
Self-management features are commonly found in mHealth interventions aimed at mental health problems, enabling clients to manage symptoms by monitoring their own symptoms and behavior [15]. In addition, mHealth apps for mood disorder management often provide stress-relieving games, meditation instructions, mood trackers, and psychoeducational materials. Despite the abundance of apps available in the commercial market for managing depressive symptoms, only a limited number incorporate a cognitive behavioral therapy (CBT) approach, even though CBT is widely recognized as a first-line psychological treatment [16].
Previous systematic reviews and meta-analyses have shown that smartphone-based interventions can have beneficial effects on clinical and nonclinical depressive symptoms in both general and clinical populations [9,17]. Moreover, digital interventions have been shown to be particularly effective, acceptable, feasible, and user friendly when embedded in a therapeutic context involving social interaction with mental health professionals to monitor progress and provide additional support [18]. A recent meta-review of meta-analyses concluded that apps for anxiety and depression produce definite clinical benefits, whether used for self-management or alongside professional guidance [12].
Several mHealth apps are currently available [19-21]. However, despite increased interest and use, no international standards or apps exist to evaluate mHealth apps in a simple and effective manner. Furthermore, although the number of mobile mental health apps is increasing owing to their convenience and high demand, many of these apps do not apply evidence-based principles or have not been tested for efficacy [16,22]. Therefore, selecting an app that is likely to be effective is problematic for users [9]. Health professionals and services are also increasingly using digital tools to facilitate disease management and need to be sure that the apps they recommend meet the minimum quality requirements [23]. Although several initiatives have been launched to define how mHealth apps should be assessed, these initiatives only address a part of the evaluation process and are mostly concerned with developing a methodology for evaluating all types of mHealth apps. As every health condition has specific needs, new tools and methodologies are required to evaluate apps targeting each condition.

Objectives
This systematic review is part of the EvalDepApps research project [24], the primary objective of which is to develop and pilot an assessment tool for mobile apps aimed at treating and monitoring people with depressive symptoms. To that end, it is critical to comprehensively understand the effectiveness and safety of mHealth interventions based on available scientific evidence as well as the evaluation criteria used to measure these outcomes. Accordingly, the aims of this systematic review are (1) to assess how effective and safe mHealth interventions are in the treatment of depression and (2) to identify the criteria and evaluation tools used to assess these mHealth interventions.

Methods
A systematic review and meta-analysis of the literature was performed following recommendations in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [25] (Multimedia Appendix 1). The protocol for this systematic review and meta-analysis was prospectively registered on PROSPERO on February 19, 2022 (CRD42022304684).

Search Strategy
A scoping search conducted to identify relevant search terms resulted in the following: "apps," "mHealth," "eHealth," and "depression." These were applied individually or combined according to Medical Subject Headings keyword terms in 3 electronic databases from inception to February 2022: MEDLINE, PsycINFO, and Embase. In addition, the reference lists of all eligible studies were screened to identify additional studies meeting the inclusion criteria.

Inclusion and Exclusion Criteria
We considered studies recruiting adult patients with elevated depressive symptoms (ie, scoring above the cutoff criteria on a validated depression screening instrument) or diagnosed with depressive disorder (ie, diagnosed by a clinician or using any recognized diagnostic criteria). Studies recruiting children or adolescents aged ≤18 years were excluded. Studies assessing the effectiveness or safety of mHealth-based interventions for treating depression were included, whereas those using no mobile tools or relating to diagnosis or prevention were excluded. Studies referring to the management of other conditions, such as cancer, stroke, Alzheimer disease, epilepsy, social anxiety, alcoholism, or pain, were also excluded. Any comparator other than mHealth interventions was considered, including passive (eg, no intervention or waiting list) or active (eg, antidepressants or face-to-face psychotherapy) groups. The primary outcome was the reduction of depressive symptoms, and secondary outcomes included undesirable effects of the mHealth intervention and the criteria and evaluation tools used to assess the effectiveness and safety of mHealth interventions.
Only randomized controlled trials (RCTs) with at least 10 participants were eligible. Nonrandomized studies, uncontrolled studies, observational studies, conference abstracts, letters, commentaries, essays, book chapters, qualitative studies, study protocols, and reviews were excluded. We included studies published in English and Spanish, without imposing any restrictions on the publication year. Studies conducted in any country and clinical setting were considered.

Risk-of-Bias Assessment
The risk of bias in the original RCTs was assessed using version 2 of the Cochrane risk-of-bias tool for randomized trials [26]. Quality assessment was performed by 2 independent reviewers, and any disagreements were resolved by consulting a third reviewer.

Study Selection and Data Extraction
All citations extracted from electronic databases were imported into Rayyan, a web-based software program for systematic reviews, and duplicates were removed. Two members of the research team independently reviewed all titles and abstracts to preselect those studies meeting the inclusion criteria. The full texts of potentially relevant studies were screened for eligibility by 2 reviewers. Any disagreement was resolved by discussion and consensus, and a third reviewer was consulted, if required. Two reviewers then independently extracted data from each included RCT using a standardized data extraction form in Microsoft Excel covering the following variables: (1) first author, (2) year of publication, (3) country, (4) number of participants, (5) study design, (6) study period, (7) study population, (8) intervention and control details, (9) outcome measures, and (10) main results. To gather information about the intervention details and elements included, we primarily relied on the descriptions of the interventions provided in the included studies. We also referred to other publications related to the same study, which offered a more comprehensive description of the intervention's development process. In addition, when necessary, we consulted public descriptions available through websites or app stores.

Data Synthesis and Analysis
Meta-analyses were performed using the inverse variance method [27] and were visually displayed using forest plots. A random effects model using the Sidik-Jonkman method as the tau estimator was applied [28]. Statistical heterogeneity between the different studies included in the meta-analyses was assessed using the Higgins I² value [29]. For each meta-analysis, 2-tailed 95% prediction intervals were calculated. The following post hoc subgroup analyses were carried out: type of nonactive control, intervention length, depression severity at baseline, mHealth intervention framework, delivery mode, mood monitoring, goal setting, and gamification. Furthermore, the Galbraith plot was used to identify possible outliers, and a sensitivity analysis was performed using the leave-one-out function, which performs multiple meta-analyses excluding a single study at a time. We evaluated the publication bias using the Egger test [30], and the trim-and-fill method was used to correct for possible funnel plot asymmetry. All analyses were performed using Stata (version 17; StataCorp).
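As an illustration only, the pooling and sensitivity steps described above can be sketched in Python. This is not the authors' Stata code. The sketch substitutes the simpler DerSimonian-Laird estimator of the between-study variance for the Sidik-Jonkman estimator used in the review, uses a normal-approximation 95% CI rather than a prediction interval, and runs on hypothetical effect sizes and variances that are not taken from the included trials. The function names are likewise hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Inverse-variance random-effects pooling (needs at least 2 studies).

    Assumption: tau^2 is estimated with DerSimonian-Laird, not the
    Sidik-Jonkman estimator used in the review.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # Higgins I^2 (%)
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "se": se,
        "tau2": tau2,
        "i2": i2,
        "q": q,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
    }


def leave_one_out(effects, variances):
    """Re-pool the data k times, omitting one study each time."""
    pooled = []
    for i in range(len(effects)):
        e = effects[:i] + effects[i + 1:]
        v = variances[:i] + variances[i + 1:]
        pooled.append(pool_random_effects(e, v)["pooled"])
    return pooled


# Hypothetical Hedges g values and their variances (not trial data)
example_effects = [-0.8, -0.5, -0.3, -0.9]
example_variances = [0.04, 0.02, 0.05, 0.03]
summary = pool_random_effects(example_effects, example_variances)
print(summary["pooled"], summary["i2"])
print(leave_one_out(example_effects, example_variances))
```

If the leave-one-out estimates stay close to the overall pooled estimate, no single study is driving the result, which is the purpose of the sensitivity analysis described above.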

Overview
The initial search of the electronic databases yielded 3203 references. After removing duplicates, 1714 records were screened by title and abstract and 87 full-text articles were assessed for eligibility. Two additional records were identified through manual searches. Finally, 29 RCTs reported in 28 articles were included. A flowchart of our selection process is shown in Figure 1.

Quality Assessment of the Included Studies
In total, 20 RCTs were identified as having an unclear risk of bias [31,32,34-36,40-42,44,46-53,56-58] and the overall risk of bias in the remaining 9 RCTs was assessed as high [33,37-39,43,45,49,54,55]. Depression symptoms were self-reported, and participants were mostly unblinded; therefore, the main sources of bias were the methods used to assess outcomes. A total of 12 RCTs [31,33,38,39,42,43,49-53,57,58] were judged to have an unclear risk of bias owing to missing outcome data. Most of the studies described treatment allocation as random, but 5 studies [38,39,43,54,55] did not provide enough details on the methods used to generate or conceal the sequence. Blinding is difficult with psychological mHealth interventions as participants are likely to be aware of what they are receiving. Nine studies reported in 8 references [31,33,42,45,47,49,54,55] that did not provide enough information about blinding or the method used to estimate the effect of assignment on the intervention were deemed to have unclear risks of bias due to deviations from the intended interventions. Most studies were reported in accordance with a prespecified plan and judged as having a low risk of bias in the selection of the reported result. A summary of the evaluation of risk of bias for each study is presented in Figure 2 in the form of a risk-of-bias graph with the opinions of review authors about each risk-of-bias item presented as percentages across all included studies.
A subgroup analysis by type of nonactive control was not statistically significant (P=.12). However, the effect was higher in those studies comparing mHealth with minimal intervention or waiting list than in those comparing with treatment as usual (TAU). A subgroup analysis by the severity of depressive symptoms at baseline was not statistically significant, although the effect was higher in people with moderately severe and severe depressive symptoms than in those with moderate depressive symptoms. Similarly, a univariate meta-regression using the baseline Patient Health Questionnaire-9 score as a moderator displayed a trend toward significance, suggesting that people with higher depressive symptoms would benefit more from mHealth interventions (β=−.15, P=.08, k=14). Neither age nor gender was found to be significantly associated with higher effectiveness. In a subgroup analysis using the mHealth content framework, there were no statistically significant differences (P=.73), but CBT-based interventions were the most effective for reducing depressive symptoms, followed by acceptance-based interventions. Regarding the characteristics of mHealth interventions, only subgroup analysis by delivery mode was statistically significant (P=.03), with hybrid interventions, those combining mHealth with face-to-face sessions, showing the highest effect on reducing depressive symptoms. Univariate meta-regression by number of elements in the mHealth intervention was not statistically significant. More details on the subgroup analyses performed are presented in Table 2.
The funnel plot was symmetrical (Figure 4), trim-and-fill did not need to impute any additional study, and Egger tests showed no evidence of a small-study effect (P=.17).
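For readers unfamiliar with the Egger test, a minimal Python sketch of the underlying regression is given here. It regresses each study's standardized effect (effect divided by its SE) on its precision (1 divided by SE); an intercept far from zero suggests funnel plot asymmetry. The function name and the data are hypothetical, and the sketch returns only the intercept, slope, and t statistic. Converting the t statistic to a P value requires a t distribution with k−2 degrees of freedom, which is left out to keep the example dependency-free.

```python
import math


def egger_regression(effects, std_errors):
    """Egger regression of standardized effect on precision.

    Needs at least 3 studies with SEs that are not all identical.
    """
    k = len(effects)
    x = [1.0 / se for se in std_errors]                  # precision
    z = [y / se for y, se in zip(effects, std_errors)]   # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar                    # asymmetry term
    resid_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = resid_ss / (k - 2)                              # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t_stat}


# Hypothetical effect sizes and standard errors (not trial data)
example_effects = [-0.70, -0.45, -0.62, -0.30, -0.85]
example_ses = [0.10, 0.15, 0.20, 0.25, 0.30]
print(egger_regression(example_effects, example_ses))
```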
Four studies reported in 3 articles [31,32,43] compared mHealth interventions against nonactive controls but did not provide means and SDs and therefore were not included in the meta-analysis. In the 2 RCTs reported in the study by Araya et al [31], a digital intervention delivered over a 6-week period significantly improved depressive symptoms at 3 months when compared with usual care, but the magnitude of the effect was small in 1 trial, and the effects were not sustained at 6 months. According to Arean et al [32], mHealth apps designed to engage the cognitive correlates of depression have the greatest effect on reducing depressed mood in people with moderate levels of depression. In addition, Birney et al [43] found that the MoodHacker app produced significant effects on depression symptoms at the 6-week follow-up when compared with minimal intervention.
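To show why group means and SDs are needed for pooling, the following Python sketch computes Hedges g and its variance from summary statistics using the standard small-sample correction. It is an illustrative sketch, not the authors' procedure: the function name and the input values are hypothetical, and the sign convention (intervention minus control, so that negative values favor the intervention on a symptom scale) is an assumption chosen to match the direction of the effects reported in this review.

```python
import math


def hedges_g(mean_int, sd_int, n_int, mean_ctl, sd_ctl, n_ctl):
    """Hedges g (intervention minus control) and its sampling variance."""
    df = n_int + n_ctl - 2
    # Pooled SD across the two arms
    sd_pooled = math.sqrt(
        ((n_int - 1) * sd_int ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df
    )
    d = (mean_int - mean_ctl) / sd_pooled        # Cohen d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)             # small-sample correction
    var_d = (n_int + n_ctl) / (n_int * n_ctl) + d ** 2 / (2.0 * (n_int + n_ctl))
    return j * d, j ** 2 * var_d


# Hypothetical posttreatment depression scores (not trial data)
g, var_g = hedges_g(10.0, 5.0, 50, 14.0, 5.0, 50)
print(g, var_g)
```

The resulting effect size and variance are exactly the inputs that inverse-variance pooling requires, which is why trials reporting neither means nor SDs could only be summarized narratively.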

mHealth Versus Active Control
Three studies compared mHealth interventions with active controls such as bibliotherapy [36], computerized CBT [49], and face-to-face behavioral activation [40]. Liu et al [36] found that a chatbot-delivered self-help depression intervention was superior to bibliotherapy in reducing depression. Watts et al [49] investigated whether a previously validated computerized program would be effective when delivered via a mobile phone app. Both the mobile and computer groups showed significant reductions in depressive symptoms at the 3-month follow-up. The authors concluded that the study provided preliminary evidence of clinically significant improvements in depressive symptoms when CBT is delivered via a mobile app. Ly et al [40] compared a hybrid treatment combining face-to-face behavioral activation and a smartphone app with a 10-session behavioral activation treatment in people with major depression. Although both groups displayed significant improvements after 6 months of treatment, the hybrid intervention required less therapist time.

Outcome Tools and Measures
The main end point outcome in the included studies was a reduction in depressive symptoms. However, several studies included secondary outcomes related to the effectiveness of mHealth interventions, such as quality of life, behavioral activation, and anxiety.

Disability
According to the World Health Organization, depression is a leading cause of disability worldwide and a major contributor to the overall global burden of disease. Disability was measured in 6 (20%) out of 29 studies [31,32,42,49,50]: the Sheehan Disability Scale was used in 3 studies (N=6, 50%), the World Health Organization Disability Assessment Schedule II was used in 2 (N=6, 33%), and 1 used the Disability Symptom Severity (N=6, 16%). Three studies in 2 references found significant effects [31,51], whereas 3 others did not [32,42,49]. Therefore, mHealth interventions have not been conclusively proven to reduce depression-related disability.

Behavioral Activation
As a person with depression may withdraw from their surroundings and disengage from their routines, thus reducing opportunities for positive reinforcement, many depression interventions have included behavioral activation as a goal.Four of the 29 studies (13%) [31,51,56] assessed behavioral activation using the Short Form of the Behavioral Activation for Depression Scale, and 3 of these found statistically significant differences between mHealth interventions and control groups.

Insomnia
In 4 of 29 studies (13%), insomnia was measured using the Insomnia Severity Index [44,50,53,56]. Significant between-group differences favoring the mHealth intervention were found in 3 of these studies (low to large effect sizes compared with the waiting list) [50,53,56]. In contrast, Raevuori et al [44] found no significant differences in sleep disturbance between a group receiving mHealth plus usual care and a control group receiving usual care alone.

Self-Efficacy
Three studies (N=29, 10%) assessed the effectiveness of mHealth interventions on self-efficacy [45,46,58]. Measures used included the General Self-Efficacy Scale, Self-Efficacy Scale, and Parental Sense of Competence Scale. Both studies using general self-efficacy measures found significant results favoring mHealth interventions [45,58], but no effect on parental competence was found [46].

Self-Esteem
The Rosenberg Self-esteem Scale was used in 2 studies (N=29, 6.9%) that compared mHealth interventions with waiting list controls. Although Bruhns et al [52] found a medium to large effect size favoring smartphone self-help apps, Lüdtke et al [37] found no statistically significant differences between groups.
Our review assessed 29 studies reported in 28 articles involving a substantial number of adult patients with elevated depressive symptoms or diagnosed depression. The meta-analysis of 26 studies comparing the effectiveness of mHealth interventions with the waiting list, minimal intervention, and TAU found moderate positive effects (Hedges g=−0.62) for mHealth, despite high levels of heterogeneity. These results align with those of 2 earlier meta-analyses comparing the efficacy of mHealth interventions and nonactive controls on reducing depressive symptoms, which showed effects of Hedges g=−0.56 and Hedges g=−0.51 [9,59]. However, these are higher than findings from other studies that included patients with any mental health issue (Hedges g=−0.33) [60] and compared mHealth with active treatments (Hedges g=−0.22) [9]. Owing to high heterogeneity and the small number of studies, conducting a meta-analysis to compare mHealth with other active interventions was not feasible.
The dynamic between health care professionals and patients is undergoing transformation owing to the influence of numerous technological, social, and environmental factors [61]. As mental health care delivery evolves toward a hybrid model incorporating both in-person and online interventions for diagnosis, therapy, and monitoring, the use of mobile devices becomes increasingly crucial, serving as an integral component in the assessment and intervention of mental health problems [62,63]. Although the number of studies assessing this type of intervention is small, the available evidence suggests that a combination of these 2 modalities can lead to better outcomes for individuals with depression. A potential explanation for the superior efficacy of hybrid therapy is the synergistic combination of app-based and face-to-face interventions. Although app-based interventions provide access to therapeutic content and activities at any time, face-to-face therapy has the advantages of personal interaction, direct guidance, and a supportive environment. An integration of these modalities provides a comprehensive treatment experience for individuals with depression, which may improve the therapeutic process and lead to better outcomes. Furthermore, the complementary nature of the 2 interventions may enhance the reinforcement of skills and strategies learned in face-to-face therapy as well as provide ongoing support and accountability, thereby potentially improving long-term symptom management. As highlighted by Ly et al [40], this could be explained by the dose-response effect, wherein lower doses of psychotherapy have been associated with poorer outcomes [64]. Moreover, hybrid therapy has the potential to be more cost-effective than traditional face-to-face treatments by combining in-person and web- or app-based sessions, reducing medical costs per patient and increasing the capacity of therapists to treat more individuals with depression, thereby expanding access to treatment. Despite the crucial importance of implementation costs and cost-effectiveness in determining the feasibility and scalability of digital and hybrid interventions, there is a lack of sufficient evidence to date, and additional research is required to inform public and private reimbursement systems and enable investment in digital interventions.
In terms of app design, our findings suggest that incorporating CBT and acceptance frameworks can lead to a greater reduction in depressive symptoms. However, subgroup analyses by theoretical framework did not show statistically significant differences. This is consistent with existing evidence on the effectiveness of psychological interventions. Although CBT is one of the main nonpharmacological treatment options for depressive disorders, a recent network meta-analysis covering efficacy, acceptability, and long-term outcomes found little difference in results from various types of psychotherapy and concluded that most are effective and acceptable for treating adult depression [65]. Clearly, it is essential to design mHealth interventions based on evidence-based frameworks to guarantee their foundation in robust and reliable scientific evidence, and studies have highlighted the need for future research to better characterize the app features that maximize therapeutic effects [66]. However, we found that none of the individual elements in the apps (ie, psychoeducation, mood monitoring, in-app feedback, goal setting, gamification, and professional support) was significantly associated with a greater reduction in depressive symptoms. Moreover, mHealth interventions with a larger number of components are not always more successful: in some cases, simpler interventions that focus on a limited number of well-implemented and user-centered elements can be more effective. It is thus necessary to move beyond "one-size-fits-all" approaches in the design and delivery of mHealth interventions and prioritize tailored approaches that consider individual differences, needs, and preferences [67,68].
With the goal of identifying which sociodemographic and clinical characteristics of patients were associated with greater app effectiveness, we performed subgroup analyses and meta-regressions for gender, age, and baseline depression symptom severity variables. Our results show that mHealth interventions have been effective across demographic factors but may be more effective for individuals with moderate to severe depressive symptoms than for those with lower symptom levels. This is consistent with the results of a previous systematic review [59]. Furthermore, it is in line with the findings of other studies that have concluded that individuals with severe burden benefit equally or to a greater extent from low-intensity internet- and mobile-based interventions [69-71]. There are several potential explanations for these findings. One possibility is that patients with more symptoms have a greater capacity for definable and noticeable improvement. In addition, people with moderate to severe depressive symptoms may be more motivated to engage in psychological interventions and more likely to adhere to a treatment plan.
The disparity between RCT data and individual patient characteristics encountered in real-world health care settings is a widely acknowledged challenge in daily clinical practice [72]. To ensure the ultimate success of the mental health technology revolution, it is imperative to bolster the path toward the evaluation of implementation, bridging the gap between research findings and the unique features of each patient [73]. Although RCTs have demonstrated the effectiveness of digital interventions for addressing common mental health issues, it is crucial to shift our focus beyond these controlled settings. Unfortunately, there is a scarcity of reported data regarding the implementation of these interventions in the real-world context. The limited available data suggest that uptake and engagement vary widely among the handful of implemented digital self-help apps and programs that have reported this and that use may vary from that reported in trials [74]. It is essential to assess how these mHealth tools are used in real-life scenarios and to determine the extent to which their effectiveness endures beyond the controlled environment of research studies. This exploration beyond RCTs will provide valuable insights into their practical impact, accessibility, and overall contribution to enhancing the mental health of the population.
The increase in the use of mHealth apps has outstripped the development of international standards or practical evaluation tools to assess their effectiveness in a comprehensive and efficient manner. Despite a plethora of mHealth interventions, few have undergone rigorous scientific evaluation. In addition, most mHealth apps that have been evaluated at all have been assessed in only a single study, typically with a small sample size. Only a minority of the mHealth interventions identified in our review have been subjected to evaluation in more than one study. Our results do indicate consistency in the assessment of depressive symptoms, as most studies use established and validated measurement tools, such as the Beck Depression Inventory-II, Patient Health Questionnaire-9, and Hospital Anxiety and Depression Scale. However, given the high heterogeneity of the identified measures, there appears to be a lack of consensus on how to assess other important outputs that are crucial in determining primary outcome measures, such as adherence, acceptability, usability, and app use. Furthermore, the absence of adequate regulatory bodies to oversee and regulate app development and availability has made accessing trustworthy and validated mHealth interventions a challenging process [21]. Accordingly, there remains a requirement for the development of new tools and methodologies that facilitate the assessment of various aspects of mHealth interventions intended to manage specific conditions. The results of this systematic review enable us to understand the effectiveness and safety of apps targeting depression that have been evaluated in RCTs, as well as the evaluation criteria used, and will serve as a starting point for the design of an evaluation tool within the context of the EvalDepApps research project [24].

Strengths and Limitations of This Study
Our study has several key strengths, including a rigorous and systematic search and selection process that ensured comprehensive coverage of the available evidence. Furthermore, the use of validated quality assessment tools facilitated a robust evaluation of the risk of bias in the included studies. Clear and transparent reporting of methods and results enhances the reproducibility of the findings and strengthens the validity and reliability of the results. However, there are also several limitations that should be considered when interpreting our results. Our search for studies was limited to those published in English or Spanish and did not incorporate gray literature, which may have excluded some relevant studies. It should also be noted that most of the reviewed studies were conducted in Western high-income countries; thus, it is unclear whether these results can be generalized to low- and middle-income countries. Our analyses revealed moderate heterogeneity that could not be fully accounted for through subgroup analyses. This heterogeneity may be due to differences in populations, interventions (including the framework, elements included, and definitions of these elements), and comparators across the trials. For example, we compared mHealth interventions with a variety of control conditions, including waiting list, minimal intervention, and TAU. Although we found no significant differences between these control conditions, the variability among them may have contributed to the overall heterogeneity. Another noteworthy limitation of our review was the exclusion of studies that did not present results from RCTs. Although observational studies and nonrandomized trials could potentially offer valuable insights into the practical use and effectiveness of mHealth in the real-world context, we decided to exclude them because of the higher susceptibility of these trial designs to various biases, which may compromise the reliability of the findings. Finally, there are important limitations associated with the small sample sizes and moderate to high risk of bias present in most of the studies reviewed.

Conclusions
This study suggests that mHealth interventions directed toward adults experiencing elevated symptoms of depression result in moderate decreases in these symptoms, regardless of age and gender, with hybrid interventions achieving the best results. However, it should be noted that most of the studies in this review had small sample sizes and were associated with a moderate to high risk of bias. In addition, a high level of heterogeneity was observed in the characteristics and components of the mHealth interventions, with no singular element found to be associated with improved outcomes. Hence, it is imperative to move beyond generic solutions when designing and delivering mHealth interventions and prioritize individualized approaches that take into consideration individual differences, needs, and preferences.

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart of the selection process.
a TAU: treatment as usual. b BDI-II: Beck Depression Inventory-II. c CES-D: Center for Epidemiologic Studies Depression Scale.

Table 1. Characteristics of the included studies. If there are 3 numbers, the first 2 numbers are intervention groups and the third one is the control group.

Table 2. Random effects models and subgroup analyses with depressive symptoms as the outcome.