Published in Vol 8, No 11 (2021): November

Machine Learning Methods for Predicting Postpartum Depression: Scoping Review


Authors of this article:

Kiran Saqib1; Amber Fozia Khan1; Zahid Ahmad Butt1


School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada

Corresponding Author:

Zahid Ahmad Butt, MBBS, MSc, PhD

School of Public Health Sciences

University of Waterloo

200 University Avenue West

Waterloo, ON, N2L 3G1


Phone: 1 5198884567 ext 45107


Background: Machine learning (ML) offers vigorous statistical and probabilistic techniques that can successfully predict certain clinical conditions using large volumes of data. A review of ML and big data research analytics in maternal depression is pertinent and timely, given the rapid technological developments in recent years.

Objective: This study aims to synthesize the literature on ML and big data analytics for maternal mental health, particularly the prediction of postpartum depression (PPD).

Methods: We conducted a scoping review using the Arksey and O’Malley framework to rapidly map research activity in ML for predicting PPD. Two independent researchers searched PsycINFO, PubMed, IEEE Xplore, and the ACM Digital Library in September 2020 to identify relevant publications in the past 12 years. Data were extracted on each article’s ML model, data type, and study results.

Results: A total of 14 studies were identified. All studies reported the use of supervised learning techniques to predict PPD. Support vector machine and random forest were the most commonly used algorithms in addition to Naive Bayes, regression, artificial neural network, decision trees, and XGBoost (Extreme Gradient Boosting). There was considerable heterogeneity in the best-performing ML algorithm across the selected studies. The area under the receiver operating characteristic curve values reported for different algorithms were support vector machine (range 0.78-0.86), random forest method (0.88), XGBoost (0.80), and logistic regression (0.93).

Conclusions: ML algorithms can analyze larger data sets and perform more advanced computations, which can significantly improve the detection of PPD at an early stage. Further clinical research collaborations are required to fine-tune ML algorithms for prediction and treatment. ML might become part of evidence-based practice in addition to clinical knowledge and existing research evidence.

JMIR Ment Health 2021;8(11):e29838




Postpartum depression (PPD) is considered one of the most frequent maternal morbidities after delivery, with severe implications for the mother and child. According to the National Institute of Mental Health, United States, 10%-15% of women have maternal depression during and after pregnancy worldwide, whereas in low- and middle-income countries, this percentage could be as high as 18%-25% [1] and seems to depend on the cultural and traditional characteristics of the population [2]. Both the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) and the International Classification of Diseases (ICD)-10 recognize maternal depression as a mental illness with different classifications [3].

PPD, the most common complication of childbearing, is a term applied to depressive symptoms that occur within 4 weeks of giving birth and possibly as late as 30 weeks postpartum [4]. PPD is a significant public health issue that affects the mother’s health as well as the child’s physical and mental health and cognitive and interactive development [5], thus making the child vulnerable to developing psychiatric disorders during adolescence [6]. A depressed mother may not establish a positive relationship with her infant [7], and this may continue to affect children into toddlerhood, preschool years, and beyond [8]. Infants of depressed mothers have shown poor nutrition, poor general health, and more frequent diarrheal episodes, and in extreme cases, maternal suicide and infanticide may occur [9,10]. PPD is generally an overlooked health problem that can lead to serious complications and should be addressed in a timely manner [11].

As there is no single etiology for PPD, a single prevention method or treatment will be ineffective. There is a need for a multifactorial approach combining psychological, psychosocial, and biological predictive factors of PPD to contemplate various etiological factors and individual variations [12,13]. An effective PPD prediction model can help health care providers in the early identification and effective management of at-risk patients [14], with evidence from previous studies exploring this possibility and feasibility [15].

Machine learning (ML) algorithms are broadly grouped into 3 categories: (1) supervised, (2) unsupervised, and (3) semisupervised learning. In supervised learning, data with known labels are used to train a model that can predict the label for new data [16]. ML-based predictive models are gaining popularity for combining a huge amount of data into a single model and evaluating the model’s predictive value for previously unseen individuals, for example, at-risk and new patients. ML approaches rely on the use of advanced statistical and probabilistic techniques to construct systems with the ability to automatically learn from data. This enables patterns in data to be more readily and accurately identified and more accurate predictions to be made from data sources (eg, more accurate diagnosis and prognosis) [17]. ML has been used for prediction in psychiatry [18]. ML methods have been successfully used to predict major depressive disorder persistence, chronicity, severity [19], and treatment response [20]. The key to building good ML models is in the rigorous selection of appropriate features and algorithms [17]. Recently, a scoping review of ML application in mental health identified over 190 studies that applied ML in the detection and diagnosis of mental disorders and over 60 studies to predict the progression of mental health problems over time [21]. These studies reported the use of electronic health records (EHRs), mood rating scales, brain imaging data, smartphone monitoring systems, and social media platforms to predict, classify, or subgroup mental health illnesses, including depression, schizophrenia, and suicide ideation and attempts [22]. Two main ML algorithms have been commonly reported in depression prediction studies, namely, support vector machine (SVM) and random forest (RF) algorithms [21]. Depression prediction studies using these 2 methods have achieved relatively good results [23-25].

There is an opinion that ML will help mental health practitioners redefine mental illnesses more objectively than is currently done in the Diagnostic and Statistical Manual of Mental Disorders [3] and would help in the early identification of these illnesses to make interventions more effective [22]. Thus, in addition to disease-model refinement, ML may benefit psychiatry by characterizing those at risk and personalizing and discovering pharmacological therapeutics [26,27].

A literature review of ML and big data research analytics in maternal depression is pertinent and timely, given the rapid technological developments in recent years. This review aims to provide a concise snapshot of the literature on ML applications for predicting PPD. Previous reviews have demonstrated ML techniques to be robust and scalable for general depression and mental health, but no review to date has mapped ML applications within maternal mental health research and practice. Our overall aim is to examine the current state of affairs of ML applications in PPD, providing a snapshot of the methods used. Keeping in view the rapid advancements in ML and the recent use of ML in mental health research, we chose to focus specifically on exploring broadly the nature of research activity, as per the first goal of scoping reviews by Arksey and O’Malley [28].


It is hoped that this scoping review will (1) inform mental health researchers of the methods and applications of ML in the context of prediction of PPD, (2) identify the best-performing algorithm, and (3) identify the evaluation criteria for the best-performing algorithm.


The Arksey and O’Malley framework was used in addition to methodological improvements for scoping review [28-30]. Our methods also align with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist [31]. A scoping review methodology was chosen to map the body of literature on the use of ML in predicting PPD, including a greater range of study designs and methodologies, to provide a descriptive overview of the reviewed material.

Search Strategy

The search strategy was adapted from Shatte et al [21], which is a similar review of big data applications in mental health. As ML and PPD stretch across interdisciplinary fields, the search was conducted in both health and information technology databases. First, a literature search was conducted using health-related research databases, including PsycINFO and PubMed. Next, the information technology databases IEEE Xplore and the ACM Digital Library were searched. Finally, databases that index both fields, including Scopus and Web of Science, were searched. The search for relevant studies was conducted in September 2020. The search terms included variations in the terms for the following:

  • (a) PPD (maternal*, perinatal*, postpartum blues*, baby blues*, depression*, post birth depression*)
  • (b) ML (machine learning*, artificial intelligence*, supervised learning*, big data*)
  • (c) Prediction (predictive models*, prediction*, detection*)

The search was conducted on titles, keywords, and abstracts, with AND entered into the database search to link the different categories (a, b, and c) of search terms. Truncation symbols (*) were used to search for all possible forms of a search term (Multimedia Appendix 1). Backward reference searching, that is, examining the references cited in these articles, and forward reference searching, that is, reviewing later publications that cite these articles, were applied to identify further studies that met the inclusion criteria.
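As an illustration of how the 3 term categories combine, the following Python sketch assembles a Boolean query string. The helper name `build_query`, the quoting of multiword terms, and the OR/AND layout are assumptions made for illustration; each database has its own field tags and truncation rules, and the search strings actually used are those in Multimedia Appendix 1.

```python
# Hypothetical helper: synonyms are joined with OR inside a category,
# and the categories (a, b, c) are linked with AND.
SEARCH_TERMS = {
    "ppd": ["maternal*", "perinatal*", "postpartum blues*", "baby blues*",
            "depression*", "post birth depression*"],
    "ml": ["machine learning*", "artificial intelligence*",
           "supervised learning*", "big data*"],
    "prediction": ["predictive models*", "prediction*", "detection*"],
}


def build_query(categories):
    """Return a string of the form '(a1 OR a2) AND (b1 OR b2) AND (c1 OR c2)'."""
    groups = []
    for terms in categories.values():
        # Quote multiword terms so they are treated as phrases (an assumption;
        # not every database accepts truncation inside a quoted phrase).
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


query = build_query(SEARCH_TERMS)
print(query)
```

Keeping the terms in a dictionary makes it straightforward to rerun the same logical search against several databases while adapting only the syntax.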

Study Selection

Articles were included in or excluded from the review according to the following criteria (Textbox 1).


Inclusion criteria

  • The article reported on a method or application of machine learning (ML) to address postpartum depression only, based on the authors’ descriptions of their analyses: if they deemed it ML, the paper was included.
  • The article evaluated the performance of the ML algorithm or big data technique used to predict postpartum depression.
  • The article was published in a peer-reviewed publication.
  • The article was available in English.
  • The article was published between 2009 and 2021.

Exclusion criteria

  • The article did not report ML applications in postpartum depression (eg, the paper commented on the use of ML in diagnosis, treatment, or prognosis of general depression, anxiety, and other mental health issues).
  • The article did not focus on postpartum depression.
  • The full text of the article was not available (eg, conference papers or abstracts).
  • The article was a commentary or an essay.
Textbox 1. Inclusion and exclusion criteria.

Two reviewers (KS and AFK) independently reviewed all studies and reached a consensus on all included studies after consultation with the third author (ZAB).

Data Extraction and Analysis Plan

For data extraction and analysis, we used the same framework already used in a similar scoping review [32]. For each article, data were extracted regarding (1) overall aim of research, that is, prediction and area of focus, that is, PPD; (2) input data type used; (3) type of ML algorithms used; and (4) the best-performing algorithm, that is, results.
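The 4 extraction items can be pictured as one record per article. The sketch below is only an illustration: the class name `ExtractionRecord` and its field names are hypothetical, and the example values are copied from the rows for Shin et al [44] in Tables 1 and 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractionRecord:
    """One row of extracted data; field names are illustrative only."""
    study: str
    aim: str                      # (1) overall aim and area of focus
    input_data: str               # (2) input data type used
    algorithms: List[str] = field(default_factory=list)  # (3) ML algorithms used
    best_result: str = ""         # (4) best-performing algorithm and result


example = ExtractionRecord(
    study="Shin et al [44]",
    aim="To develop predictive models for PPD using ML approaches",
    input_data="Pregnancy risk assessment and monitoring system data",
    algorithms=["RF", "stochastic gradient boosting", "SVM", "regression trees",
                "NB", "k-nearest neighbor", "LR", "ANN"],
    best_result="RF method (AUC 0.884)",
)
print(example.study, "->", example.best_result)
```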

To analyze the data, a narrative review synthesis method [32] was selected to capture the extensive range of research investigating ML and big data for PPD prediction. A meta-analysis was not deemed appropriate, given the aim of identifying research activity in the interdisciplinary field of big data and maternal mental health.


The search strategies using a combination of search terms identified 1392 articles that included a search term from each category in their abstract or title (PRISMA-ScR flowchart). The publication years of relevant articles ranged from 2009 to 2021. A total of 24 articles were duplicates. The database search was carried out by KS and AFK. Abstracts of 368 articles were read by both authors to perform an initial screening of eligibility for this scoping review. Of these, 347 were excluded because they did not focus specifically on PPD. A total of 21 articles were selected for full-text review, but 3 were conference papers or abstracts only, and 4 did not use ML to predict PPD. This resulted in a total sample of 14 studies, including one preprint and one focused on predicting PPD in fathers, which met the inclusion criteria according to all authors (Figure 1). The selected 14 studies were reviewed in full by 2 authors (KS and AFK). A mutual consensus was reached after the final approval from ZAB. In the subsequent narrative analysis, we focus on the 14 studies that reported using an ML model to predict PPD (see Tables 1 and 2 for a summary of the main study characteristics).
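The screening counts from abstract review onward can be checked with simple arithmetic. The sketch below covers only the steps that are itemized in the text; the step from the 1392 identified records to the 368 abstracts read is not broken down here and is therefore left out. Variable names are illustrative.

```python
# Counts as stated in the text, from abstract screening onward.
abstracts_screened = 368
excluded_not_ppd_specific = 347
excluded_conference_or_abstract_only = 3
excluded_no_ml_prediction = 4

# Articles passed on to full-text review.
full_text_reviewed = abstracts_screened - excluded_not_ppd_specific

# Articles remaining after the full-text exclusions.
included = (full_text_reviewed
            - excluded_conference_or_abstract_only
            - excluded_no_ml_prediction)

print(f"full-text review: {full_text_reviewed}, included: {included}")
```

Both results agree with the 21 full-text articles and 14 included studies reported above.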

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) procedural flowchart. ML: machine learning; PPD: postpartum depression.
Table 1. Summary of the main study characteristics (N=14).
| # | Study | Aims or objectives | Sample size; input data used | Diagnosis criteria for PPDa |
|---|---|---|---|---|
| 1 | Jiménez-Serrano et al [24] | Develop classification models for detecting the risk of PPD during the first week after childbirth | 1880; hospital data | EPDSb >9; 8th or 32nd week postbirth |
| 2 | Betts et al [33] | Develop a prediction model to identify women at risk of postpartum psychiatric admission | 75,054; linked administrative health data | ICDc-10 (F20.0-F39.9) or ICD-10 (F53.0-F53.1) |
| 3 | Tortajada et al [34] | To obtain a classification model based on feedforward multilayer perceptron to improve PPD prediction during the 32 weeks after childbirth with a high sensitivity and specificity | 1397; hospital data | EPDS >9; 8th or 32nd week postbirth |
| 4 | Wang et al [35] | To develop a PPD prediction model, using EHRsd | 179,980; EHRs | ICD-10-CM codes O99.3 and O99.34 as well as their ICD-9-CM equivalents for a diagnosis of PPD within 12 months after childbirth |
| 5 | Zhang et al [36] | To compare the effects of 4 different MLe models using data during pregnancy to predict PPD | 508; hospital data | EPDS >9.5; within 42 days postdelivery |
| 6 | Zhang et al [37] | Propose an ML framework for PPD risk prediction | 17,633 and 71,106; 2 data sets from EHRs | PPD within 1 year of childbirth |
| 7 | Hochman et al [38] | To apply ML approach to create a prediction tool for PPD to be implemented in health care systems | 214,359; EHRs | PPD within first year postpartum (ICD-9 codes: 300 and 309 or ICD-10 codes: F40-F48) or acute psychotic manic episodes (ICD-9 codes: 296.0, 296.1, 296.4, 296.6, 296.81, 298.3, 298.4, 298.8) |
| 8 | De Choudhury et al [39] | Detect and predict PPD | 165; Facebook survey using PHQf-9 | PHQ-9 |
| 9 | Natarajan et al [23] | Propose an ML-based approach for PPD prediction and diagnosis from survey information | 207; Facebook and Twitter survey data | Postpartum Depression Predictors Inventory |
| 10 | Fatima et al [40] | Use linguistic features to propose a solution for PPD that can be generalized and deployed across web-based social platforms | 21; text posts from Reddit | PPD based on linguistic feature |
| 11 | Trifan et al [41] | To use social media for potential diagnosis of mothers at risk of PPD and thus the implementation of early interventions | 512; Reddit text posts | Not described |
| 12 | Shatte et al [42] | To identify fathers at the risk of PPD | 365; Reddit text posts | ICD-10 depression; symptom 06 months postbirth |
| 13 | Moreira et al [43] | Propose an algorithm for emotion-aware smart systems, capable for predicting the risk of PPD during pregnancy through biomedical and sociodemographic data analysis | Performance evaluation used data generated by wearable devices and sensors | Not described |
| 14 | Shin et al [44] | To develop predictive models for PPD using ML approaches | 28,755; pregnancy risk assessment and monitoring system data | PHQ-2 |

aPPD: postpartum depression.

bEPDS: Edinburgh Postnatal Depression Scale.

cICD: International Classification of Diseases.

dEHR: electronic health record.

eML: machine learning.

fPHQ: Patient Health Questionnaire.

Table 2. Summary of the main study characteristics (N=14).
| # | Study | Performance metric | MLa algorithms used | Best-performing algorithm |
|---|---|---|---|---|
| 1 | Jiménez-Serrano et al [24] | Hold-out validation | Naive Bayes; LRb; SVMc; ANNd | Naive Bayes model; G function value of 0.73 |
| 2 | Betts et al [33] | 5-fold cross-validation in R | Gradient boosting; elastic net methods | Boosted trees algorithm (AUCe 0.80, 95% CI 0.76-0.83) |
| 3 | Tortajada et al [34] | Hold-out validation | ANN | Multilayer perceptrons; G of 0.82 and accuracy of 0.81 (95% CI 0.76-0.86), with sensitivity of 0.84 and specificity of 0.81 |
| 4 | Wang et al [35] | 10-fold cross-validation | SVM; RFf; Naive Bayes; L2-regularized LR; XGBoostg; DTh | SVM (AUC 0.79) |
| 5 | Zhang et al [36] | sklearn.cross_validation package in Python | SVM; RF | SVM and feature selection RF (sensitivity=0.69; AUC=0.78) |
| 6 | Zhang et al [37] | 5-fold cross-validation | RF; DT; XGBoost; regularized LR; multilayer perceptron | LR with L2 regularization; AUC 0.937 (95% CI 0.912-0.962) |
| 7 | Hochman et al [38] | Hold-out cross-validation | XGBoost | AUC of 0.712 (95% CI 0.690-0.733), with a sensitivity of 0.349 and a specificity of 0.905 |
| 8 | De Choudhury et al [39] | Not described | Regression models to develop a series of statistical models | Postnatal model |
| 9 | Natarajan et al [23] | Information not provided | Functional gradient boosting; DT; SVM; NBi | Functional gradient boosting (ROC 0.952) |
| 10 | Fatima et al [40] | 10-fold cross-validation | LR; SVM; multilayer perceptron | Multilayer perceptron; 91.7% accuracy for depressive content identification and up to 86.9% accuracy for PPD content prediction |
| 11 | Trifan et al [41] | Hold-out validation | SVM; stochastic gradient descent; passive aggressive classifiers | SVM (slightly better F1 than the other classifiers in the validation stage) |
| 12 | Shatte et al [42] | 10-fold cross-validation | SVM classifiers using behavior, emotion, linguistic style, and discussion topics as features | 0.67 precision, 0.68 recall, and 0.67 F-measure in model including all features |
| 13 | Moreira et al [43] | 10-fold cross-validation | DT; SVM; nearest neighbor; ensemble classifiers | Ensemble classifiers |
| 14 | Shin et al [44] | 10-fold cross-validation | RF; stochastic gradient boosting; SVM; regression trees; NB; k-nearest neighbor; LR; ANN | RF method (AUC 0.884) |

aML: machine learning.

bLR: logistic regression.

cSVM: support vector machine.

dANN: artificial neural network.

eAUC: area under the curve.

fRF: random forest.

gXGBoost: Extreme Gradient Boosting.

hDT: decision tree.

iNB: Naive Bayes.

A narrative synthesis of ML activity, particularly in the context of PPD, indicated the emerging nature of this field, with most studies being published in recent years. Publication dates ranged from 2009 to 2020; however, most articles were very recent. There is a 5-year gap between the first 2009 article [34] and the next study in 2014 [39], and publications have accelerated recently with 7 papers published in 2020.

A few studies focused on developing and testing an ML algorithm for the detection and prediction of PPD, whereas others compared the performance of different ML algorithms for predicting PPD and explored which factors in the model were the most important for prediction.

Type of Input Data

When we examined the 14 studies, we identified a subgroup of 7 studies that reported on the use of ML-based models to predict PPD using clinical or hospital data and EHRs. The other 5 studies reported on the application of ML algorithms for the prediction of PPD using data from social media platforms, including Facebook, Twitter, and Reddit. However, these studies were designed to evaluate a prediction model more broadly and did not report details on ML algorithms, training, and testing procedures. Of the remaining 2 studies, one reported on the use of population data and the other used emotion-aware system data. The outcome variable PPD was assessed using psychometric tools such as Patient Health Questionnaire-9, Patient Health Questionnaire-2, Edinburgh Postnatal Depression Scale, Postpartum Depression Predictors Inventory, and ICD-9 and ICD-10 codes in the case of hospital and EHR data, whereas linguistic features were used to predict PPD from text data of social networks.

Type of ML Algorithms Used

All studies reported on the use of supervised ML models, including classification and regression algorithms, to predict PPD. Most of the studies (n=7) reported using more than one algorithm, whereas one study used only regression models to develop statistical models for their data. The algorithms included SVM (n=8), logistic regression (LR; n=6), multilayer perceptron using artificial neural network (ANN; n=5), RF (n=4), Naive Bayes (n=3), decision trees (DTs; n=3), gradient boosting (n=2), XGBoost (Extreme Gradient Boosting; n=2), functional gradient boosting (n=1), elastic net methods (n=1), k-nearest neighbor (kNN; n=2), stochastic gradient boosting (n=1), passive aggressive classifiers (n=1), and ensemble classifier (n=1). The data types used to develop ML algorithms included EHRs, either administrative hospital data or organizational data (n=8), mobile and wearable sensor data (n=1), and social media data (n=5).

Reported Best-Performing Algorithm

There was considerable heterogeneity in the best-performing ML algorithm across the selected studies. To report the best-performing algorithm, most studies used sensitivity, specificity, and area under the curve (AUC). Only 5 studies described the technical approaches to cross-validation using either 5-fold or 10-fold cross-validation. One study reported that of 4 ML algorithms (Naive Bayes, LR, SVM, and ANN), Naive Bayes showed the best balance between sensitivity and specificity as a predictive model for PPD during the first week after delivery according to the G function, with a value of 0.73 [24]. Another study using 6 ML models (SVM, RF, Naive Bayes, L2-regularized LR, XGBoost, and DT) reported that SVM had the best performance; the difference in performance across SVM, L2-regularized LR, RF, Naive Bayes, and XGBoost was minimal, although differences existed with respect to sensitivity and specificity [35]. A third study compared 9 ML algorithms, including RF, stochastic gradient boosting, SVM, recursive partitioning and regression trees, Naive Bayes, kNN, LR, and neural network, and reported overall classification accuracies ranging from 0.650 (kNN) to 0.791 (RF). The RF method achieved the highest area under the receiver operating characteristic curve (AUROC) value of 0.884, followed by SVM, with an AUROC value of 0.864 [44].
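For readers less familiar with k-fold cross-validation, the following sketch shows the splitting step in plain Python. It is a generic illustration and not the procedure of any reviewed study; the function name `k_fold_indices`, the sample count, and the seed are assumptions. In practice, ML libraries provide equivalent utilities (for example, `KFold` in scikit-learn's `model_selection` module).

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Training data are all folds except the one held out for testing.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Toy setup: 23 samples split into 5 folds.
splits = list(k_fold_indices(n_samples=23, k=5))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train_idx)} train, {len(test_idx)} test")
```

Each sample is held out exactly once, so the model is always evaluated on data it was not trained on, and the k test scores can be averaged.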

In a study using the SVM and RF algorithms, the model based on SVM and feature selection RF had the best prediction effects (sensitivity=0.69, AUC=0.78) [36]. In another study, 5 ML algorithms were trained: RF, DT, XGBoost, regularized LR, and multilayer perceptron. LR with L2 regularization was found to be the best-performing algorithm using data available up to childbirth, with AUCs of 0.937 (95% CI 0.912-0.962) and 0.886 (95% CI 0.879-0.893) in the 2 data sets, respectively [37]. SVM led to slightly better results in terms of F1 in the validation stage compared with stochastic gradient descent and passive aggressive classifiers [41].

Tortajada et al [34] developed 4 models for predicting PPD using a multilayer perceptron and evaluated them with the geometric mean of accuracies using a hold-out strategy. They reported that the developed models could predict PPD during the first 32 weeks after delivery with high accuracy. A similar study reported that, in hold-out validation, the multilayer perceptron outperformed the other techniques used (SVM and LR), with 91.7% accuracy for depressive content identification and up to 86.9% accuracy for PPD content prediction [40]. Another study using gradient boosting and elastic net methods reported that the boosted trees algorithm produced the best-performing model, predicting postpartum psychiatric admission in the validation data with good discrimination (AUC 0.80, 95% CI 0.76-0.83) and good calibration. This model outperformed the benchmark LR model and the elastic net model [33]. Natarajan et al [23] reported a successful functional gradient boosting algorithm that demonstrated the potential of ML in predicting PPD.
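The geometric mean (G) used to balance sensitivity and specificity can be illustrated with a short calculation. The sketch assumes the usual definition, G = sqrt(sensitivity × specificity), and uses the sensitivity (0.84) and specificity (0.81) listed for Tortajada et al [34] in Table 2; the function name `g_mean` is illustrative.

```python
import math


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (assumed definition of G)."""
    return math.sqrt(sensitivity * specificity)


# Values listed for Tortajada et al [34] in Table 2.
g = g_mean(sensitivity=0.84, specificity=0.81)
print(round(g, 2))  # 0.82
```

The result matches the G value of 0.82 listed in Table 2. Because G is a product, it drops sharply when either sensitivity or specificity is low, which is why it is used as a balance measure.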

Hochman et al [38] built a model using XGBoost, an algorithm based on gradient-boosted DTs, and assessed the overall model predictive performance using the AUROC; 95% CIs were estimated using bootstrapping. The prediction model achieved an AUC of 0.712 (95% CI 0.690-0.733), with a sensitivity of 0.349 and a specificity of 0.905 at the 90th percentile risk threshold, identifying PPD cases at a rate more than 3 times higher than in the overall set (positive and negative predictive values were 0.074 and 0.985, respectively).
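The reported sensitivity, specificity, and predictive values are linked through Bayes' rule, which makes it possible to check them against each other. The sketch below back-calculates the prevalence implied by the reported sensitivity, specificity, and positive predictive value. This prevalence is derived here under the assumption that the rounded values are mutually consistent; it is not a figure reported in the text, and the function names are illustrative.

```python
def implied_prevalence(sens, spec, ppv):
    """Solve PPV = sens*p / (sens*p + (1 - spec)*(1 - p)) for prevalence p."""
    false_positive_rate = 1.0 - spec
    return ppv * false_positive_rate / (sens * (1.0 - ppv) + ppv * false_positive_rate)


def npv_from(sens, spec, prevalence):
    """Negative predictive value from Bayes' rule."""
    true_negatives = spec * (1.0 - prevalence)
    false_negatives = (1.0 - sens) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Values reported for Hochman et al [38] at the 90th percentile risk threshold.
sens, spec, ppv = 0.349, 0.905, 0.074

prevalence = implied_prevalence(sens, spec, ppv)  # derived, not reported
npv = npv_from(sens, spec, prevalence)
lift = ppv / prevalence  # case rate in the flagged group relative to everyone

print(f"implied prevalence: {prevalence:.3f}")
print(f"NPV: {npv:.3f}")
print(f"lift: {lift:.2f}")
```

Under this assumption, the implied prevalence is roughly 2%, the recomputed negative predictive value agrees with the reported 0.985, and the flagged group contains cases at about 3.5 times the overall rate, in line with the "more than 3 times" statement.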

In a study that developed a series of regression-based statistical models to predict a mother’s likelihood of PPD, the postnatal model performed the best [39]. In another study, predictive models were developed as a series of SVM classifiers using behavior, emotion, linguistic style, and discussion topics as features. The models incorporating behavior features alone and discussion topic features alone yielded greater recall (0.77 and 0.82, respectively), which may be useful for screening purposes [42]. A study using hospital data showed that ensemble classifiers represent a leading solution for predicting psychological disorders related to pregnancy [43].

Many studies did not mention which statistical tools were used for analysis; however, those that did mostly used a variety of software packages in R, SAS, and Python 3. Studies reported the use of standard libraries for data preparation (eg, handling missing values), a variety of typical ML models, and natural language processing (NLP) analyses (such as topic modeling) included in standard packages such as those available in R.

Principal Findings

Most of the reviewed studies used supervised classification techniques rather than other ML techniques to predict PPD. This is perhaps indicative of the extensive focus on detection and diagnosis in the literature, which is typically designed using large, retrospective, labeled data sets ideal for classification tasks [45]. All reviewed studies concluded that ML models were effective in predicting PPD, whether using clinical data, EHRs, population data, or data from social media platforms. All the studies implied that the ML approach was more beneficial compared with traditional statistical approaches. However, the level of accuracy, sensitivity, or specificity that is considered acceptable varies depending on the aims of the study and the data set. None of the studies explicitly compared the ML performance with other traditional statistical analyses. In all studies, the ML approach aided researchers in answering their research questions.

A cohort study for predicting PPD using hospital data reported that, with a small sample size, SVM can avoid overfitting while offering efficient computing time and better prediction results for depression [46,47]. The same study proposed that when the data set is small, SVM is more practical than RF in prediction research for PPD [36]. Several previous studies used the SVM algorithm to make PPD predictions, as SVM is a supervised learning method that is most commonly used in classification problems. It focuses on minimizing the structural risks within a set of available data [36]. It has significant advantages and performs well in situations with relatively little available sample data [48]. SVM is a classifier that uses kernels to map input data into a higher-dimensional feature space, in which a hyperplane is found to discriminate between 2 classes [49]. Jiménez-Serrano et al [24] collected data on postpartum women from 7 Spanish hospitals and used the Edinburgh Postnatal Depression Scale score as the outcome indicator to train a PPD prediction model based on SVM. Natarajan et al [23] used social media as a data source and established an SVM-based PPD prediction model based on the mental health data of 173 mothers. De Choudhury et al [39] developed an SVM model to identify high-risk emotions and behaviors predictive of PPD using the content of Twitter posts. As these studies either target different populations or use different methods to detect the occurrence of PPD, the model prediction effects cannot be easily compared [36].

In contrast, RF models are built using a DT as the basic classifier. RF approaches have high classification accuracy, strong inductive capacity, a simple parameter adjustment process, fast calculation speed, relatively low sensitivity to missing data values, and the ability to output feature importance [50,51]. RF is an ensemble learning method that operates by constructing a multitude of DTs and outputting the class that is voted for by a majority of the trees [52]. Shin et al [44] reported RF to be the best-performing algorithm for predicting PPD.
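The voting step of an RF can be sketched in a few lines. The example covers only how the individual trees' predictions are combined, not how the trees are trained; the function name `majority_vote` and the toy votes are made up for illustration.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees.

    Ties are resolved in favor of the class that appears first in the list.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


# Toy example: predictions from 5 hypothetical trees for one patient.
votes = ["PPD", "no PPD", "PPD", "PPD", "no PPD"]
label = majority_vote(votes)
print(label)  # PPD
```

Averaging over many trees in this way is what reduces the variance of any single DT.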

Tortajada et al [34] developed another prediction model for PPD using multilayer perceptron and pruning for pregnant Spanish women using data from 7 Spanish general hospitals from 2003 to 2004. ANNs have a remarkable ability to characterize discriminating patterns and derive meaning from complex and noisy data sets. They have been widely applied in general medicine for the differential diagnosis, classification, prediction of disease, and condition prognosis. For instance, ANNs have been applied to the diagnosis of dementia using clinical data [53] and more recently for predicting Alzheimer disease using mixed effects neural networks [54].

There is a great deal of debate about which ML model evaluation metric is best [55]. Making sense of reported ML evaluation metrics is made even more difficult because different performance parameters often provide conflicting results and the optimal ML algorithm also depends significantly on the composition of the data set [56]. Some reviewed studies reported varying degrees of accuracy and were not always explicitly clear regarding the meaning of the resulting performance metrics. Owing to the negative effects of PPD on mothers and infants [57,58], such as the negative effects on the physical and mental health of mothers, the closeness of the mother-infant bond, and infant development, it is important to have a model with high sensitivity while maintaining a high AUROC value. The selection of indicators for evaluating depression prediction models varies across studies. For example, Natarajan et al [23] and De Choudhury et al [39] emphasized the accuracy of the model’s prediction of PPD. Jiménez-Serrano et al [24] emphasized the sensitivity and specificity of the model; the balance between the two is summarized by their geometric mean (the G function). The AUROC is also widely used to evaluate the comprehensive performance of a model [23,25].
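The AUROC has a direct interpretation that helps when comparing the values reported across studies: it equals the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes it from that definition by pairwise comparison; the function name `auroc` and the toy labels and scores are made up, and ML libraries offer faster equivalents.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data: 1 = PPD, 0 = no PPD, with hypothetical model risk scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.8]
value = auroc(labels, scores)
print(round(value, 3))
```

Unlike sensitivity and specificity, this measure does not depend on a single decision threshold, so a model can have a high AUROC and still show low sensitivity at the threshold chosen for screening.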

PPD is a highly prevalent problem but frequently goes undetected, leading to substantial treatment delays [59]. EHRs collect a large number of biometric markers and patient characteristics that could foster the detection of PPD in primary care settings. NLP and ML have the potential to complement clinical practice by categorizing and analyzing data from clinical notes [60]. NLP is a computerized process that analyzes and codes human language into text [61] that ML algorithms can analyze and use to predict outcomes [62]. Advances in technology, such as social media, smartphones, wearables, and neuroimaging, have allowed mental health researchers and clinicians to collect a vast range of data at a rapidly growing rate [63]. ML is a vigorous technique with the ability to analyze these data. A data-driven primary intervention approach using ML and EHR data may be leveraged to reduce the burden of health care providers in identifying PPD risk [37].

In the studies included in our review, individuals experiencing PPD were identified through screening surveys or through their public sharing of a diagnosis on social media (Twitter, Facebook, or Reddit), and they were distinguishable from control users by patterns in their language and web-based activity [23,40,42]. Automated detection methods may help identify depressed or otherwise at-risk individuals through the large-scale passive monitoring of social media and, in the future, may complement existing screening procedures [64]. Social media data and EHRs both hold promise for innovation in the maternal mental health domain, particularly when leveraged by ML techniques [21].

Finally, there are some challenges to consider when using ML techniques in mental health applications. ML models are inevitably limited by the quality of the data used to develop them. As such, ML does not replace other research or analytic approaches; rather, it has the potential to add value to mental health research and might become part of evidence-based practice, alongside clinical knowledge and existing research evidence. Many ML techniques require access to training data sets, so greater collaboration between mental health researchers and clinicians (eg, for the provision of training data sets and for feedback on the clinical usefulness of ML algorithms) will be needed to maximize the usefulness of the models developed and to continue to advance the applications of ML in mental health. Analyzing big data on clinical outcomes, in addition to genetic, biomedical, behavioral, environmental, and demographic patient characteristics, could help predict maternal depression. EHR databases can provide valuable, real-world, practice-based evidence to support better prediction models for at-risk patients [65]. In this way, ML offers a solution for analyzing idiographic research questions in big data [66].


Limitations

This study has a few limitations. The aim of this scoping review was to provide a snapshot of the research activity in a summarized format while using a systematic search method. In line with the aims of a scoping review, we did not identify specific study designs in advance and did not assess the quality of the included studies [28]. Moreover, because of restrictions in the search methodology, such as the use of broad search terms and the exclusion of non-peer-reviewed literature, we may have missed some relevant articles. This is a common limitation of scoping reviews, attributed to the need to balance breadth and depth of analysis within a rapid timeframe [67]. Nevertheless, this review mapped a cross-section of the literature on the use of ML for PPD prediction and provides a useful synthesis for researchers and clinicians seeking to understand the potential of ML in this field. This study did not examine the effectiveness of individual ML models for predicting PPD. Such research questions would be suitable for future systematic reviews, guided by the framework outlined in our results tables, that is, the effectiveness of specific ML techniques within specific data types for specific clinical applications.


Conclusions

To conclude, the use of ML to predict PPD has shown promising advances, particularly in recent years. Compared with traditional statistical methods, ML algorithms are capable of analyzing larger data sets and performing more advanced computations. Overall, the reviewed studies suggest that ML can improve the detection of PPD at an early stage, and research into the applications of ML to identify potential PPD predictors has demonstrated positive results. However, this work is currently limited, and further research is required to identify additional benefits of ML for maternal mental health. The choice of ML technique and the performance of ML models may differ depending on the type, content, and accuracy of the original data; thus, it may be challenging to evaluate the performance of a single model. With ML tools becoming more accessible to researchers and clinicians, it is expected that the field will continue to grow and that novel applications for mental health will follow. Further clinical research collaborations are required to fine-tune ML algorithms for prediction and treatment. As ML algorithms continue to be refined and improved, they might help clinicians identify maternal mental illnesses at an earlier stage, when interventions may be more effective, and personalize treatments based on an individual’s unique characteristics. Moreover, the current lack of procedural evaluation guidelines leaves many clinicians and researchers in the field with no means to systematically evaluate the claims, maturity, and clinical readiness of an ML study [68].


Acknowledgments

We are thankful to Jackie Stapleton for her continuous support and help with the review.

Authors' Contributions

KS conceived the study, participated in its design and coordination, performed the search and data extraction, interpreted the data, and drafted the manuscript. AFK assisted with the search and data extraction and helped revise the manuscript. ZAB conceived the study, participated in its design and coordination, contributed to the interpretation of the data, and helped to draft and revise the manuscript. All authors read and approved the final manuscript.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Databases and search strings used for this review.

DOCX File, 16 KB

References

  1. Fisher J, Cabral de Mello M, Patel V, Rahman A, Tran T, Holton S, et al. Prevalence and determinants of common perinatal mental disorders in women in low- and lower-middle-income countries: a systematic review. Bull World Health Organ 2011 Nov 24;90(2):139-149.
  2. Gaynes BN, Gavin N, Meltzer-Brody S, Lohr KN, Swinson T, Gartlehner G, et al. Perinatal depression: prevalence, screening accuracy, and screening outcomes. Evid Rep Technol Assess (Summ) 2005 Feb(119):1-8.
  3. Wisner KL, Moses-Kolko EL, Sit DK. Postpartum depression: a disorder in search of a definition. Arch Womens Ment Health 2010 Feb;13(1):37-40.
  4. Andrews-Fike C. A review of postpartum depression. Prim Care Companion J Clin Psychiatry 1999 Feb 01;1(1):9-14.
  5. Murray L, Halligan S, Cooper P. Effects of postnatal depression on mother-infant interactions and child development. In: Wachs T, Bremner G, editors. Handbook of Infant Development (2nd ed.). Oxford, United Kingdom: Wiley-Blackwell; 2010:192-220.
  6. Pawlby S, Sharp D, Hay D, O'Keane V. Postnatal depression and child outcome at 11 years: the importance of accurate diagnosis. J Affect Disord 2008 Apr;107(1-3):241-245.
  7. Wisner KL, Sit DK, McShea MC, Rizzo DM, Zoretich RA, Hughes CL, et al. Onset timing, thoughts of self-harm, and diagnoses in postpartum women with screen-positive depression findings. JAMA Psychiatry 2013 May;70(5):490-498.
  8. Goodman S, Brand S. Parental psychopathology and its relation to child psychopathology. In: Hersen M, Gross A, editors. Handbook of clinical psychology, vol 2: Children and adolescents. Hoboken, New Jersey, United States: John Wiley & Sons Inc; 2008:937-965.
  9. Muzik M, Borovska S. Perinatal depression: implications for child mental health. Ment Health Fam Med 2010 Dec;7(4):239-247.
  10. Rahman A, Malik A, Sikander S, Roberts C, Creed F. Cognitive behaviour therapy-based intervention by community health workers for mothers with depression and their infants in rural Pakistan: a cluster-randomised controlled trial. Lancet 2008 Sep;372(9642):902-909.
  11. Aliani R, Khuwaja B. Epidemiology of postpartum depression in Pakistan: a review of literature. Nat J Health Sci 2017 Feb 01;2(1):24-30.
  12. Johnstone S, Boyce P, Hickey A, Morris-Yatees AD, Harris M. Obstetric risk factors for postnatal depression in urban and rural community samples. Aust N Z J Psychiatry 2001 Feb;35(1):69-74.
  13. Beck CT. Predictors of postpartum depression: an update. Nurs Res 2001;50(5):275-285.
  14. Robertson E, Grace S, Wallington T, Stewart DE. Antenatal risk factors for postpartum depression: a synthesis of recent literature. Gen Hosp Psychiatry 2004;26(4):289-295.
  15. Righetti-Veltema M, Conne-Perréard E, Bousquet A, Manzano J. Risk factors and predictive signs of postpartum depression. J Affect Disord 1998 Jun;49(3):167-180.
  16. Naqa IE, Murphy MJ. What is machine learning? In: Naqa IE, Li R, Murphy M, editors. Machine Learning in Radiation Oncology. Cham: Springer; 2015:3-11.
  17. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science 2015 Jul 17;349(6245):255-260.
  18. Gillan CM, Whelan R. What big data can do for treatment in psychiatry. Curr Opin Behav Sci 2017 Dec;18:34-42.
  19. Kessler RC, van Loo HM, Wardenaar KJ, Bossarte RM, Brenner LA, Cai T, et al. Testing a machine-learning algorithm to predict the persistence and severity of major depressive disorder from baseline self-reports. Mol Psychiatry 2016 Oct;21(10):1366-1371.
  20. Chekroud AM, Zotti RJ, Shehzad Z, Gueorguieva R, Johnson MK, Trivedi MH, et al. Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatry 2016 Mar;3(3):243-250.
  21. Shatte AB, Hutchinson DM, Teague SJ. Machine learning in mental health: a scoping review of methods and applications. Psychol Med 2019 Jul;49(9):1426-1448.
  22. Graham S, Depp C, Lee EE, Nebeker C, Tu X, Kim HC, et al. Artificial intelligence for mental health and mental illnesses: an overview. Curr Psychiatry Rep 2019 Nov 07;21(11):116.
  23. Natarajan S, Prabhakar A, Ramanan N, Bagilone A, Siek K, Connelly K. Boosting for postpartum depression prediction. In: Proceedings of the IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE). 2017 Presented at: IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE); July 17-19, 2017; Philadelphia, PA, USA.
  24. Jiménez-Serrano S, Tortajada S, García-Gómez JM. A mobile health application to predict postpartum depression based on machine learning. Telemed J E Health 2015 Jul;21(7):567-574.
  25. Jin H, Wu S, Di Capua P. Development of a clinical forecasting model to predict comorbid depression among diabetes patients and an application in depression screening policy making. Prev Chronic Dis 2015 Sep 03;12:E142.
  26. Ćosić K, Popović S, Šarlija M, Kesedžić I, Jovanovic T. Artificial intelligence in prediction of mental health disorders induced by the COVID-19 pandemic among health care workers. Croat Med J 2020 Jul 05;61(3):279-288.
  27. Bickman L. Improving mental health services: a 50-year journey from randomized experiments to artificial intelligence and precision mental health. Adm Policy Ment Health 2020 Sep 26;47(5):795-843.
  28. Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol 2005 Feb;8(1):19-32.
  29. Daudt HM, van Mossel C, Scott SJ. Enhancing the scoping study methodology: a large, inter-professional team's experience with Arksey and O'Malley's framework. BMC Med Res Methodol 2013 Mar 23;13:48.
  30. O'Brien KK, Colquhoun H, Levac D, Baxter L, Tricco A, Straus S, et al. Advancing scoping study methodology: a web-based survey and consultation of perceptions on terminology, definition and methodological steps. BMC Health Serv Res 2016 Jul 26;16:305.
  31. Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for Scoping Reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018 Oct 02;169(7):467-473.
  32. van Doorn KA, Kamsteeg C, Bate J, Aafjes M. A scoping review of machine learning in psychotherapy research. Psychother Res 2021 Jan 29;31(1):92-116.
  33. Betts KS, Kisely S, Alati R. Predicting postpartum psychiatric admission using a machine learning approach. J Psychiatr Res 2020 Nov;130:35-40.
  34. Tortajada S, García-Gomez JM, Vicente J, Sanjuán J, de Frutos R, Martín-Santos R, et al. Prediction of postpartum depression using multilayer perceptrons and pruning. Methods Inf Med 2009;48(3):291-298.
  35. Wang S, Pathak J, Zhang Y. Using electronic health records and machine learning to predict postpartum depression. Stud Health Technol Inform 2019 Aug 21;264:888-892.
  36. Zhang W, Liu H, Silenzio VM, Qiu P, Gong W. Machine learning models for the prediction of postpartum depression: application and comparison based on a cohort study. JMIR Med Inform 2020 Apr 30;8(4):e15516.
  37. Zhang Y, Wang S, Hermann A, Joly R, Pathak J. Development and validation of a machine learning algorithm for predicting the risk of postpartum depression among pregnant women. J Affect Disord 2021 Jan 15;279:1-8.
  38. Hochman E, Feldman B, Weizman A, Krivoy A, Gur S, Barzilay E, et al. Development and validation of a machine learning-based postpartum depression prediction model: a nationwide cohort study. Depress Anxiety 2021 Apr 07;38(4):400-411.
  39. De Choudhury M, Counts S, Horvitz EJ, Hoff A. Characterizing and predicting postpartum depression from shared Facebook data. In: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. USA: Association for Computing Machinery; 2014 Presented at: CSCW'14: Computer Supported Cooperative Work; February 15-19, 2014; Baltimore, Maryland, USA p. 626-638.
  40. Fatima I, Abbasi BU, Khan S, Al-Saeed M, Ahmad HF, Mumtaz R. Prediction of postpartum depression using machine learning techniques from social media text. Expert Syst 2019 Apr 26;36(4):e12409.
  41. Trifan A, Semeraro D, Drake J, Bukowski R, Oliveira JL. Social media mining for postpartum depression prediction. Stud Health Technol Inform 2020 Jun 16;270:1391-1392.
  42. Shatte AB, Hutchinson DM, Fuller-Tyszkiewicz M, Teague SJ. Social media markers to identify fathers at risk of postpartum depression: a machine learning approach. Cyberpsychol Behav Soc Netw 2020 Sep;23(9):611-618.
  43. Moreira MW, Rodrigues JJ, Kumar N, Saleem K, Illin IV. Postpartum depression prediction through pregnancy data analysis for emotion-aware smart systems. Inf Fusion 2019 May;47:23-31.
  44. Shin D, Lee KJ, Adeluwa T, Hur J. Machine learning-based predictive modeling of postpartum depression. J Clin Med 2020 Sep 08;9(9):2899.
  45. Ghassemi M, Naumann T, Schulam P, Beam AL, Chen IY, Ranganath R. A review of challenges and opportunities in machine learning for health. AMIA Jt Summits Transl Sci Proc 2020;2020:191-200.
  46. Malki K, Koritskaya E, Harris F, Bryson K, Herbster M, Tosto MG. Epigenetic differences in monozygotic twins discordant for major depressive disorder. Transl Psychiatry 2016 Jun 14;6(6):e839.
  47. Patel MJ, Khalaf A, Aizenstein HJ. Studying depression using imaging and machine learning methods. Neuroimage Clin 2016;10:115-123.
  48. Boser B, Guyon I, Vapnik V. A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. New York, New York, USA: ACM Press; 1992 Presented at: COLT92: 5th Annual Workshop on Computational Learning Theory; July 27-29, 1992; Pittsburgh, Pennsylvania, USA p. 144-152.
  49. Gultepe E, Green J, Nguyen H, Adams J, Albertson T, Tagkopoulos I. From vital signs to clinical outcomes for patients with sepsis: a machine learning basis for a clinical decision support system. J Am Med Inform Assoc 2014;21(2):315-325.
  50. Yao D, Yang J. Research on feature selection and classification method based on random forest for medical data. Dissertation - Harbin Engineering University. 2017. [accessed 2021-11-09]
  51. Hapfelmeier A, Hothorn T, Ulm K, Strobl C. A new variable importance measure for random forests with missing data. Stat Comput 2012 Aug 28;24(1):21-34.
  52. Breiman L. Random forests. Mach Learn 2001;45:5-32.
  53. Mulsant B, Servan-Schreiber E. A connectionist approach to the diagnosis of dementia. In: Proceedings of the Annual Symposium on Computer Application in Medical Care. 1988 Presented at: Annual Symposium on Computer Application in Medical Care; November 7-11, 1988; Orlando, FL, USA.
  54. Tandon R, Adak S, Kaye JA. Neural networks for longitudinal studies in Alzheimer's disease. Artif Intell Med 2006 Mar;36(3):245-255.
  55. Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med 2018 Dec 03;284(6):603-619.
  56. Rácz A, Bajusz D, Héberger K. Multi-level comparison of machine learning classifiers and their performance metrics. Molecules 2019 Aug 01;24(15):2811.
  57. Martini J, Petzoldt J, Einsle F, Beesdo-Baum K, Höfler M, Wittchen HU. Risk factors and course patterns of anxiety and depressive disorders during pregnancy and after delivery: a prospective-longitudinal study. J Affect Disord 2015 Apr 01;175:385-395.
  58. Glasheen C, Richardson GA, Fabio A. A systematic review of the effects of postnatal maternal anxiety on children. Arch Womens Ment Health 2010 Feb;13(1):61-74.
  59. Hansotte E, Payne SI, Babich SM. Positive postpartum depression screening practices and subsequent mental health treatment for low-income women in Western countries: a systematic literature review. Public Health Rev 2017 Jan 31;38:3.
  60. Longhurst CA, Harrington RA, Shah NH. A 'green button' for using aggregate patient data at the point of care. Health Aff (Millwood) 2014 Jul;33(7):1229-1235.
  61. Lacson R, Khorasani R. Natural language processing: the basics (part 1). J Am Coll Radiol 2011 Jun;8(6):436-437.
  62. Monuteaux MC, Stamoulis C. Machine learning: a primer for child psychiatrists. J Am Acad Child Adolesc Psychiatry 2016 Oct;55(10):835-836.
  63. Chen M, Mao S, Liu Y. Big data: a survey. Mobile Netw Appl 2014 Jan 22;19(2):171-209.
  64. Guntuku S, Yaden D, Kern M, Ungar L, Eichstaedt J. Detecting depression and mental illness on social media: an integrative review. Curr Opin Behav Sci 2017 Dec;18:43-49.
  65. Lutz W, Rubel JA, Schwartz B, Schilling V, Deisenhofer AK. Towards integrating personalized feedback research into clinical practice: development of the Trier Treatment Navigator (TTN). Behav Res Ther 2019 Sep;120:103438.
  66. Silberschatz G. Improving the yield of psychotherapy research. Psychother Res 2017 Jan 11;27(1):1-13.
  67. Pham MT, Rajić A, Greig JD, Sargeant JM, Papadopoulos A, McEwen SA. A scoping review of scoping reviews: advancing the approach and enhancing the consistency. Res Synth Methods 2014 Dec;5(4):371-385.
  68. Cearns M, Hahn T, Baune BT. Recommendations and future directions for supervised machine learning in psychiatry. Transl Psychiatry 2019 Oct 22;9(1):271.

Abbreviations

ANN: artificial neural network
AUC: area under the curve
AUROC: area under the receiver operating characteristic curve
DT: decision tree
EHR: electronic health record
ICD: International Classification of Diseases
kNN: k-nearest neighbor
LR: logistic regression
ML: machine learning
NLP: natural language processing
PPD: postpartum depression
PRISMA-ScR: Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews
RF: random forest
SVM: support vector machine
XGBoost: Extreme Gradient Boosting

Edited by J Torous; submitted 22.04.21; peer-reviewed by H Oyama, C Gorrostieta; comments to author 15.08.21; revised version received 26.08.21; accepted 30.08.21; published 24.11.21


©Kiran Saqib, Amber Fozia Khan, Zahid Ahmad Butt. Originally published in JMIR Mental Health, 24.11.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.