This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on https://mental.jmir.org/, as well as this copyright and license information must be included.
Detection of depression gained prominence soon after depression emerged as a serious public health concern worldwide.
This systematic review aims to summarize the findings of previous studies on applying machine learning (ML) methods to text data from social media to detect depressive symptoms, and to suggest directions for future research in this area.
A bibliographic search was conducted for the period of January 1990 to December 2020 in Google Scholar, PubMed, Medline, ERIC, PsycINFO, and BioMed. Two reviewers retrieved and independently assessed the 418 studies, consisting of 322 articles identified through database searching and 96 articles identified through other sources; 17 of the studies met the inclusion criteria.
Of the 17 studies, 10 identified depression based on researcher-inferred mental status, 5 based on users’ own descriptions of their mental status, and 2 based on community membership. Thirteen of the 17 studies used supervised learning approaches, 3 used unsupervised learning approaches, and the remaining study did not describe its ML approach. Challenges in areas such as sampling, optimization of prediction approaches and their features, generalizability, privacy, and other ethical issues call for further research.
ML approaches applied to text data from users on social media can work effectively in depression detection and could serve as complementary tools in public mental health practice.
Over recent decades, depression has increasingly become a matter of global public health concern [
Traditionally, depression is detected using standardized scales that require patients’ subjective responses, or through clinical diagnoses given by attending clinicians—methods that have some shortcomings. First, people’s responses to standardized scales administered in the traditional way are likely to be affected by context, the patient’s mental status and current mood, the relationship between the clinician and the patient, and the patient’s past experiences and memory bias. Second, traditional diagnostic methods lack temporal granularity [
Fortunately, applying machine learning (ML) approaches to text data from social media can provide an effective solution to this problem. Social media such as Twitter, Facebook, discussion forums, and microblogs have long been popular platforms for expressing and recording individuals’ personalities, feelings, moods, thoughts, and behaviors. Social media in this review refers to a cluster of applications that build upon technological and ideological foundations [
As far as we know, there are few existing reviews of ML approaches to depression detection that use text data from social media. Some previous reviews have focused on ML applications that use neuroimaging data to predict depression. For example, Mumtaz et al [
In this paper, we systematically reviewed studies that adopted ML approaches to measure depressive symptoms, using text mining techniques to identify sentiments in social media data. We specify the ML methods that were used to identify mental status, discuss the evolution of the methods and their pros and cons, and provide suggestions for future research in the area.
We searched several English- and Chinese-language online bibliographic databases for relevant articles, specifically Google Scholar, PubMed, Medline, ERIC, PsycINFO, and BioMed, and the Chinese Wanfang, Weipu, and China National Knowledge Infrastructure databases. Our search placed no restrictions on publication type. However, because the age of social media began in the 1990s [
Flowchart for the systematic search of studies in this review.
The article titles and abstracts were screened independently by 2 reviewers (JG and DL). The reviewers then retrieved and assessed the available full texts of the studies and excluded articles that (1) did not discuss ML approaches or detection of depression, (2) were not focused on the use of textual (as opposed to video and image) data from social media, or (3) were themselves reviews of existing research on the use of texts from social media to detect depressive symptoms with ML approaches. The 2 reviewers also recorded important data about the articles such as authors, sample size, platform, study design, assessment tools, outcome of interest, and findings. Disagreements concerning particular articles were resolved through discussions aimed at reaching consensus. Details of the process are shown in
The study quality assessment for the 17 studies included was conducted by 2 independent reviewers, using the 14-item NIH Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies [
Study quality assessment.
Reference | Q1a | Q2b | Q3c | Q4d | Q5e | Q6f | Q7g | Q8h | Q9i | Q10j | Q11k | Q12l | Q13m | Q14n | Total Score | Rank |
Wang et al [ | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 12 | high |
Burdisso et al [ | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 12 | high |
Nguyen et al [ | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 10 | medium |
Fatima et al [ | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 10 | medium |
Tung & Lu [ | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 9 | medium |
Husseini Orabi et al [ | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 10 | medium |
Islam et al [ | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 11 | high |
Shen et al [ | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 9 | medium |
De Choudhury, Gamon [ | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 10 | medium |
Mariñelarena-dondena et al [ | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 10 | medium |
Tsugawa et al [ | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 13 | high |
Chen et al [ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 6 | low |
De Choudhury, Counts [ | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 9 | medium |
Dinkel et al [ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 6 | low |
Sadeque et al [ | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 12 | high |
Shatte et al [ | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 12 | high |
Li et al [ | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 10 | medium |
aQ1: Was the research question or objective in this paper clearly stated?
bQ2: Was the study population clearly specified and defined?
cQ3: Was the participation rate of eligible persons at least 50%?
dQ4: Were all the subjects selected or recruited from the same or similar populations? Were inclusion and exclusion criteria for being in the study prespecified and applied uniformly to all participants?
eQ5: Was a sample size justification, power description, or variance and effect estimates provided?
fQ6: Were the exposure(s) of interest measured before the outcome(s) were measured?
gQ7: Was the timeframe sufficient so that one could reasonably expect to see an association between exposure and outcome if it existed?
hQ8: For exposures that can vary in amount or level, did the study examine different levels of the exposure in relation to the outcome?
iQ9: Were the exposure measures clearly defined, valid, reliable, and implemented consistently across all study participants?
jQ10: Were the exposure(s) assessed more than once over time?
kQ11: Were the outcome measures clearly defined, valid, reliable, and implemented consistently across all study participants?
lQ12: Were the outcome assessors blinded to the exposure status of participants?
mQ13: Was loss to follow-up after baseline 20% or less?
nQ14: Were key potential confounding variables measured and their impact on the relationship between exposure(s) and outcome(s) statistically adjusted for?
The samples, methods, and results of the 17 studies that met the inclusion criteria are summarized in
Summary of machine learning studies of detection of depression using text data from social media.
Reference | Sample | Platform | Outcome | Depression identification method | MLa approach type | Features examined | Cross-validation | Type of study |
Wang et al [ | 122 depressed and 346 nondepressed subjects; ages not reported | Sina microblog | Bayes: mean absolute error=0.186, ROCb=0.908, F-measure=0.85; Trees: mean absolute error=0.239, ROC=0.798, F-measure=0.762; Rules: mean absolute error=0.269, ROC=0.869, F-measure=0.812 | Researcher-inferred | 3 classification approaches: Bayes, trees, and rules | Ten features from three dimensions: four pertain to microblog content (first person singular, first person plural, positive emoticons, and negative emoticons), three to interactions (mentioning, being forwarded, and commenting), and two to behaviors (original blogs and blogs posted between midnight and 6:00 AM) | 10-fold cross-validation | Observational cohort study |
Burdisso et al [ | 486 training subjects (83 depressed/403 nondepressed) and 401 test subjects (52 depressed/349 nondepressed); ages not reported | Reddit | SS3c: F-measure=0.61, precision=0.63, recall=0.60 | User-declared | The proposed SS3 model | Words in users’ online text posts on Reddit | 4-fold cross-validation | Observational cohort study |
Nguyen et al [ | 5000 posts made by users from clinical communities and 5000 posts from control communities; ages not reported | LiveJournal | Lasso to classify communities (accuracy): ANEWd=0.89, mood=0.96, topic=1, LIWCe=1; Lasso to classify posts (accuracy): topic=0.93, LIWC=0.88 | Community membership-based | The Lasso model | Affective features, mood tags, and LIWC features and topics, all extracted from posts on LiveJournal | 10-fold cross-validation | Observational cohort study |
Fatima et al [ | 4026 posts (2019 depressive/2007 nondepressive) from depressive and nondepressive communities; ages not reported | LiveJournal | The proposed RFf-based model (accuracy): post=0.898, community=0.950, depression degree=0.923; SVMg (accuracy): post=0.8, community=0.895 | Community membership-based | Random forest, SVM | Feature-set values extracted from first person singular, positive emotion, negative emotion, anxiety, cognitive process, insight, cause, affiliation, health, and informal language of online text, serving as inputs to the classification algorithm | 10-fold cross-validation | Observational cohort study |
Tung & Lu [ | 724 posts; ages not reported | PTTh | EDDTWi: precision=0.593, recall=0.668, F-measure=0.624 | Researcher-inferred | EDDTW | Negative emotion lexicon, negative thought lexicon, negative event lexicon, and symptom lexicon | 10-fold cross-validation | Observational cohort study |
Husseini Orabi et al [ | 154 subjects (53 labeled depressed/101 labeled control); ages not reported | Twitter | The optimized CNNj model: accuracy=0.880 | User-declared | CNN-based models, RNNk-based models, SVM | Twitter texts from which all @mentions, retweets, nonalphanumeric characters, and URLs were removed by the researchers | 5-fold cross-validation | Observational cohort study |
Islam et al [ | 7145 Facebook comments (58% depressed/42% nondepressed); ages not reported | Facebook | Decision tree (F-measure): emotional process=0.73, linguistic style=0.73, temporal process=0.73, all features=0.73; SVM (F-measure): emotional process=0.73, linguistic style=0.73, temporal process=0.73, all features=0.73; KNNl (F-measure): emotional process=0.71, linguistic style=0.70, temporal process=0.70, all features=0.67; Ensemble (F-measure): emotional process=0.73, linguistic style=0.73, temporal process=0.73, all features=0.73 | User-declared | SVM, decision tree, ensemble, KNN | Emotional information (positive, negative, anxiety, anger, and sad), linguistic style (prepositions, articles, personal, conjunctions, auxiliary verbs), temporal process information (past, present, and future) | 10-fold cross-validation | Observational cohort study |
Shen et al [ | 1402 depressed users, 36,993 depression-candidate users, and over 300 million nondepressed users; ages not reported | Twitter | Accuracy: NBm=0.73, MSNLn=0.83, WDLo=0.77, MDLp=0.85 | User-declared | MDL, NB, MSNL, WDL | Features of network interactions (number of tweets, social interactions, and posting behaviors), user profiles (users’ personal information in social networks), and visual, emotional, topic-level, and domain-specific features | 5-fold cross-validation | Observational cohort study |
De Choudhury, Gamon, et al [ | 476 users (171 depressed/305 nondepressed), with a median age of 25 years | Twitter | Accuracy: engagement=0.553, ego-network=0.612, emotion=0.643, linguistic style=0.684, depression language=0.692, demographics=0.513, all features=0.712 | Researcher-inferred | SVM | Engagement, egocentric social graph, emotion, linguistic style, depression language, demographics | 10-fold cross-validation | Observational cohort study |
Mariñelarena-dondena et al [ | 135 articles (20 depressed/115 nondepressed); ages not reported | | Precision=0.850, recall=0.810, F-measure=0.829, accuracy=0.948 | User-declared | SVD, GBMq, SMOTEr | n-grams, which can create a large feature space and hold much important information | Not reported | Observational cohort study |
Tsugawa et al [ | 209 Japanese users (81 depressed/128 nondepressed) aged 16-55 years, with a median age of 28.8 years | Twitter | Precision=0.61, recall=0.37, F-measure=0.46, accuracy=0.66 | Researcher-inferred | LDAs, SVM | Frequencies of words used in the tweet, ratio of tweet topics found by LDA, ratio of positive-affect words contained in the tweet, ratio of negative-affect words contained in the tweet, hourly posting frequency, tweets per day, average number of words per tweet, overall retweet rate, overall mention rate, ratio of tweets containing a URL, number of users following, number of users followed | 10-fold cross-validation | Observational cohort study |
Chen et al [ | 446 perinatal users; ages not reported | WeChat circle of friends | The results of LSTMw were similar to those of the EPDSx | Researcher-inferred | LSTM | Top 10 emotions in the data set | Not reported | Observational cohort study |
De Choudhury, Counts, et al [ | 489 users, with a median age of 25 years | Twitter | Accuracy: eng.+ego=0.593, n-grams=0.600, style=0.658, emo.+time=0.686, all features=0.701 | Researcher-inferred | PCAt, SVM | Post-centric features (emotion, time, linguistic style, n-grams), user-centric features (engagement, ego-network) | 5-fold cross-validation | Observational cohort study |
Dinkel et al [ | 142 speakers (42 depressed/100 nondepressed); ages not reported | Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) database | Precision=0.93, recall=0.83, F-measure=0.87 | Researcher-inferred | LSTM | Words from online posts | 10-fold cross-validation | Observational cohort study |
Sadeque et al [ | 888 users (136 depressed/752 nondepressed); ages not reported | | F-measure: LibSVMu=0.40, WekaSVMv=0.30, RNN=0.34, Ensemble=0.45 | User-declared | LibSVM, RNN, Ensemble, WekaSVM | Depression lexicony, MetaMap featuresz | 5-fold cross-validation | Observational cohort study |
Shatte et al [ | 365 fathers in the perinatal period; ages not reported | | Precision=0.67, recall=0.68, F-measure=0.67, accuracy=0.66 | Researcher-inferred | SVM | Fathers’ behaviors, emotions, linguistic style, and discussion topics | 10-fold cross-validation | Observational cohort study |
Li et al [ | 1,410,651 users; ages not reported | Twitter | Accuracy: SVM (radial basis function kernel)=0.82, SVM (linear kernel)=0.87, logistic regression=0.86, naïve Bayes=0.81, simple neural network=0.87 | Researcher-inferred | SVM, logistic regression, naïve Bayes, simple neural network | 512 features extracted from tweets using a universal sentence encoder | Not reported | Observational cohort study |
aML: machine learning.
bROC: receiver operating characteristic.
cSS3: sequential S3 (smoothness, significance, and sanction).
dANEW: affective norms for English words.
eLIWC: linguistic inquiry and word count.
fRF: random forest.
gSVM: support vector machine.
hPTT: the gossip forum on the Professional Technology Temple.
iEDDTW: event-driven depression tendency warning.
jCNN: convolutional neural networks.
kRNN: recurrent neural network.
lKNN: k-nearest neighbor.
mNB: naive Bayesian.
nMSNL: multiple social networking learning.
oWDL: Wasserstein Dictionary Learning.
pMDL: multimodal depressive dictionary learning.
qGBM: gradient boosting machine.
rSMOTE: synthetic minority oversampling technique.
sLDA: latent Dirichlet allocation.
tPCA: principal component analysis.
uLibSVM: library for support vector machines.
vWekaSVM: Waikato Environment for Knowledge Analysis for support vector machines.
wLSTM: long short-term memory.
xEPDS: Edinburgh Postnatal Depression Scale.
yA cluster of unigrams that has a great likelihood of appearing in depression-related posts.
zThe features were extracted using MetaMap, based on concepts from the Unified Medical Language System Metathesaurus.
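For reference, the precision, recall, F-measure, and accuracy values quoted in the table above all derive from a binary confusion matrix in the standard way. The sketch below shows the arithmetic; the labels and predictions are invented for illustration, not taken from any reviewed study.

```python
def confusion(y_true, y_pred):
    """Counts for a binary confusion matrix (positive class 1 = 'depressed')."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    """Precision, recall, F-measure, and accuracy from the confusion counts."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity toward the depressed class
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, f_measure, accuracy

# 10 hypothetical users: 4 truly depressed (1), 6 controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
p, r, f, a = metrics(y_true, y_pred)
print(p, r, f, a)  # prints 0.75 0.75 0.75 0.8
```

Note that with imbalanced samples, as in several of the reviewed studies, accuracy alone can be misleading, which is why most of the tabulated results also report the F-measure.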
Summary of the studies’ depression identification methods.
Type of depression identification method and reference | Platform | Specific diagnostic method |
Researcher-inferred mental status
De Choudhury, Gamon, et al [ | Twitter | CES-Da questionnaire |
De Choudhury, Counts, et al [ | Twitter | CES-Da questionnaire |
Tsugawa et al [ | Twitter | CES-Da questionnaire |
Chen et al [ | WeChat circle of friends | The Edinburgh Postnatal Depression Scale (EPDS) questionnaire |
Dinkel et al [ | Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) database | The Patient Health Questionnaire (PHQ-8) |
Li et al [ | Twitter | The Patient Health Questionnaire (PHQ) |
Wang et al [ | Sina Microblog | Diagnosis by psychologists using interviews and questionnaires |
Tung et al [ | PTTb | Diagnosis by three professional students |
Shatte et al [ | | ICD-10c and diagnosis by a clinical psychologist specializing in perinatal mental health |
User-declared mental status
Burdisso et al [ | Reddit | Statements specifically indicating depression, such as “I was diagnosed with depression.” |
Mariñelarena-dondena et al [ | | Documents declaring depression diagnoses |
Sadeque et al [ | | Statements like “I have been diagnosed with depression.” |
Husseini Orabi et al [ | Twitter | Documents declaring depression diagnoses |
Shen et al [ | Twitter | Tweets of statements like “I was diagnosed with depression.” |
Islam et al [ | Facebook | Indication of depression by ground truth label information on selected posts |
Community membership-based
Nguyen et al [ | LiveJournal | Five “clinical” communities and five “control” communities |
Fatima et al [ | LiveJournal | Five depressed and five nondepressed communities |
aCES-D: Center for Epidemiologic Studies Depression Scale.
bPTT: the gossip forum on the Professional Technology Temple.
cICD-10: International Classification of Diseases, Tenth Revision.
Researcher-inferred mental status means that the researchers identified users’ mental status from the content of their online posts, using ML approaches together with professional diagnostic scales or expert opinions. Among the studies reviewed, 9 out of 17 studies [
In particular, De Choudhury et al [
In addition, Tsugawa et al [
Finally, Dinkel et al [
In addition, Li et al [
One of the 2 studies in which depression was identified in a traditional way was Wang et al’s [
One other study combined depression diagnostic criteria with expert opinions. Shatte et al [
“User-declared mental status” means that users declared, in social media posts, that they had been diagnosed with depression. Six studies used depression identification based on user declarations of mental status in the social media data. Burdisso et al [
Two other studies constructed their experimental data sets using Twitter data. Husseini Orabi et al [
Islam et al [
Two studies explored depression identification using community membership as an identifier; both collected their data from LiveJournal. To construct a balanced data set, Nguyen et al [
The ML approaches used in these studies included supervised learning (SL) and unsupervised learning (UL) approaches. SL methods specify a targeted outcome variable, such as the presence of a mental disorder, and are often used in prediction tasks. UL methods are used to detect relationships among the variables in a data set in the absence of a specified target outcome or response variable to supervise the analyses. UL aims to discover underlying structures, such as clusters, components, or dimensions, in the data set [
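The contrast between the two paradigms can be made concrete with a small sketch. Everything here is illustrative: the toy posts, labels, and the pairing of a nearest-centroid classifier (supervised) with 2-means clustering (unsupervised) are invented for the example and do not come from any of the reviewed studies.

```python
from collections import Counter
import math

def vectorize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

# Toy corpus (invented for illustration, not from any reviewed study).
posts = [
    ("i feel hopeless and sad tonight", 1),   # 1 = depressed label
    ("so empty and sad i cry alone", 1),
    ("great hike today feeling happy", 0),    # 0 = control label
    ("happy dinner with friends tonight", 0),
]
vocab = sorted({w for text, _ in posts for w in text.split()})
X = [vectorize(t, vocab) for t, _ in posts]
y = [label for _, label in posts]

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

# --- Supervised: labels guide training (nearest-centroid classifier).
centroids = {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in (0, 1)}

def predict(text):
    v = vectorize(text, vocab)
    return min(centroids, key=lambda c: distance(v, centroids[c]))

# --- Unsupervised: no labels; discover structure (plain 2-means clustering).
def kmeans2(vectors, iters=10):
    centers = [vectors[0], vectors[2]]  # seed with two distinct points
    for _ in range(iters):
        groups = {0: [], 1: []}
        for v in vectors:
            groups[min((0, 1), key=lambda k: distance(v, centers[k]))].append(v)
        centers = [centroid(g) if g else centers[k] for k, g in groups.items()]
    return [min((0, 1), key=lambda k: distance(v, centers[k])) for v in vectors]

print(predict("sad and hopeless again"))  # prints 1
print(kmeans2(X))                         # prints [0, 0, 1, 1]
```

The supervised path needs the labels to build its class centroids; the unsupervised path recovers the same two groups from the vectors alone, but it cannot by itself say which cluster is the depressed one.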
Summary of the machine learning approaches used in the depression detection studies.
Study | Machine learning approaches | Features | Outcomes |
Supervised learning approaches
Nguyen et al [ | The Lasso model | Affective features, mood tags, and linguistic inquiry and word count (LIWC) features and topics, all extracted from posts on LiveJournal | Community classification of users (accuracy): ANEW=0.89, mood=0.96, topic=1, LIWC=1; community classification of posts (accuracy): topic=0.93, LIWC=0.88 |
Chen et al [ | LSTMa | Top 10 emotions in the data set | Depression according to the LSTM and according to the EPDSb; the results were similar for both |
Dinkel et al [ | LSTM | Words from online posts | Precision=0.93, recall=0.83, F-measure=0.87 |
Wang et al [ | Bayes, trees, and rules | Microblog content, interactions, and behaviors | Bayes: mean absolute error=0.186, ROC=0.908, F-measure=0.85; Trees: mean absolute error=0.239, ROC=0.798, F-measure=0.762; Rules: mean absolute error=0.269, ROC=0.869, F-measure=0.812 |
Burdisso et al [ | The proposed SS3 model | Words in users’ online text posts on Reddit | SS3: F-measure=0.61, precision=0.63, recall=0.60 |
De Choudhury, Gamon, et al [ | SVMc | Engagement, egocentric social graph, emotion, linguistic style, depression language, demographics | Accuracy: engagement=0.553, ego-network=0.612, emotion=0.643, linguistic style=0.684, depression language=0.692, demographics=0.513, all features=0.712 |
Tsugawa et al [ | LDAd, SVM | Frequencies of words used in the tweet, ratio of tweet topics found by LDA, ratio of positive-affect words contained in the tweet, ratio of negative-affect words contained in the tweet, hourly posting frequency, tweets per day, average number of words per tweet, overall retweet rate, overall mention rate, ratio of tweets containing a URL, number of users following, number of users followed | Precision=0.61, recall=0.37, F-measure=0.46, accuracy=0.66 |
Islam et al [ | SVM, decision tree, ensemble, KNNe | Emotional information (positive, negative, anxiety, anger, and sad), linguistic style (prepositions, articles, personal, conjunctions, auxiliary verbs), temporal process information (past, present, and future) | Decision tree (F-measure): emotional process=0.73, linguistic style=0.73, temporal process=0.73, all features=0.73; SVM (F-measure): emotional process=0.73, linguistic style=0.73, temporal process=0.73, all features=0.73; KNN (F-measure): emotional process=0.71, linguistic style=0.70, temporal process=0.70, all features=0.67; Ensemble (F-measure): emotional process=0.73, linguistic style=0.73, temporal process=0.73, all features=0.73 |
Fatima et al [ | Random forest, SVM | Feature-set values extracted from first person singular, positive emotion, negative emotion, anxiety, cognitive process, insight, cause, affiliation, health, and informal language of online text, serving as inputs to the classification algorithm | The proposed RF-based model (accuracy): post=0.898, community=0.950, depression degree=0.923; SVM (accuracy): post=0.82, community=0.895 |
Shatte et al [ | SVM | Fathers’ behaviors, emotions, linguistic style, and discussion topics | Precision=0.67, recall=0.68, F-measure=0.67, accuracy=0.66 |
Husseini Orabi et al [ | CNNf-based models, RNNg-based models, SVM | Twitter texts from which all @mentions, retweets, nonalphanumeric characters, and URLs were removed by the researchers | The optimized CNN model: accuracy=0.880 |
Sadeque et al [ | LibSVMh, RNN, Ensemble, WekaSVMi | Depression lexicon, MetaMap features | F-measure: LibSVM=0.40, WekaSVM=0.30, RNN=0.34, Ensemble=0.45 |
Li et al [ | SVM, logistic regression, naïve Bayes classifier, simple neural network | 512 features extracted from tweets using a universal sentence encoder | Accuracy: SVM (radial basis function kernel)=0.82, SVM (linear kernel)=0.87, logistic regression=0.86, naïve Bayes=0.81, simple neural network=0.87 |
Tung et al [ | EDDTWj | Negative emotion lexicon, negative thought lexicon, negative event lexicon, and symptom lexicon | Precision=0.593, recall=0.668, F-measure=0.624 |
Combined supervised and unsupervised learning approaches
Shen et al [ | MDLk, NBl, MSNLm, WDLn | Social network features (number of tweets, social interactions, and posting behaviors), user profile features (users’ personal information in social networks), visual features, emotional features, topic-level features, and domain-specific features | Accuracy: NB=0.73, MSNL=0.83, WDL=0.77, MDL=0.85 |
De Choudhury, Counts, et al [ | PCAo, SVM | Post-centric features (emotion, time, linguistic style, n-grams), user-centric features (engagement, ego-network) | Accuracy: eng.+ego=0.593, n-grams=0.600, style=0.658, emo.+time=0.686, all features=0.701 |
Mariñelarena-dondena et al [ | SVDp, GBMq, SMOTEr | n-grams, which can produce a large feature space and hold important information | Precision=0.850, recall=0.810, F-measure=0.829, accuracy=0.948 |
aLSTM: long short-term memory.
bEPDS: Edinburgh Postnatal Depression Scale Questionnaire.
cSVM: support vector machine.
dLDA: latent Dirichlet allocation.
eKNN: k-nearest neighbor.
fCNN: convolutional neural networks.
gRNN: recurrent neural network.
hLibSVM: a library for support vector machines.
iWekaSVM: Waikato Environment for Knowledge Analysis for support vector machines.
jEDDTW: event-driven depression tendency warning.
kMDL: multimodal depressive dictionary learning.
lNB: naive Bayesian.
mMSNL: multiple social networking learning.
nWDL: Wasserstein Dictionary Learning.
oPCA: principal component analysis.
pSVD: singular value decomposition.
qGBM: gradient boosting machine.
rSMOTE: synthetic minority oversampling technique.
The SL approaches used include regression and classification. Among the 14 studies that employed SL, 3 used regression [
Of the 3 that used regression-type SL approaches, Nguyen et al [
Additionally, 8 studies focused on depression detection using classification-type SL approaches [
Of the 4 studies that focused on posting submissions, Tung et al [
Finally, 3 studies combined regression and classification [
Three studies combined SL and UL approaches. Shen et al [
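In miniature, the shared pattern of these three studies—unsupervised dimensionality reduction followed by a supervised classifier—can be sketched as follows. This is only an illustration under invented assumptions: a single principal component is found by power iteration, a nearest-class-mean rule stands in for the SVM, and the 3-feature vectors are made up, not data from any reviewed study.

```python
import math

def first_component(X, iters=200):
    """Leading principal component of mean-centered data via power iteration."""
    d = len(X[0])
    means = [sum(row[i] for row in X) / len(X) for i in range(d)]
    C = [[row[i] - means[i] for i in range(d)] for row in X]  # centered rows
    v = [1.0] * d
    for _ in range(iters):
        # One multiply by C^T C (the covariance matrix up to a 1/n factor).
        Cv = [sum(row[j] * v[j] for j in range(d)) for row in C]
        w = [sum(Cv[r] * C[r][i] for r in range(len(C))) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(x, means, v):
    return sum((xi - m) * vi for xi, m, vi in zip(x, means, v))

# Toy 3-feature vectors (invented; think negative-word, positive-word, and
# first-person ratios), NOT data from any reviewed study.
X = [[0.4, 0.0, 0.2], [0.5, 0.1, 0.3], [0.45, 0.0, 0.25],   # depressed-like
     [0.0, 0.5, 0.1], [0.1, 0.6, 0.0], [0.05, 0.55, 0.05]]  # control-like
y = [1, 1, 1, 0, 0, 0]

# Unsupervised step: reduce each vector to its score on the first component.
means, v = first_component(X)
scores = [project(x, means, v) for x in X]

# Supervised step: nearest class mean in the reduced space (SVM stand-in).
m1 = sum(s for s, t in zip(scores, y) if t == 1) / y.count(1)
m0 = sum(s for s, t in zip(scores, y) if t == 0) / y.count(0)

def predict(x):
    s = project(x, means, v)
    return 1 if abs(s - m1) < abs(s - m0) else 0

print(predict([0.42, 0.05, 0.22]))  # prints 1
```

The unsupervised step never sees the labels; they enter only when the reduced scores are classified, which is the division of labor described in the studies above.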
Our review aimed to outline studies that detected depression with ML approaches applied to text from social media. In the studies included in this review, researchers first extracted features from the text users posted on social media, using text analysis strategies such as LIWC and other word-embedding models, and then fed those features into ML models to predict depression. The features in all 17 studies were derived from the words of the online text, including emotional information, linguistic style, temporal process information, and social network features. As for the ML approaches used in depression prediction, SL was adopted more often than UL. According to the above-mentioned studies [
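The common pipeline just described—extract word-based features from posts, feed them to a classifier, evaluate with k-fold cross-validation—can be sketched as follows. The mini-lexicons, posts, labels, and nearest-centroid classifier are invented stand-ins, not actual LIWC categories, reviewed-study data, or any particular study's model.

```python
import math

# Hypothetical mini-lexicons standing in for LIWC-style categories.
NEGATIVE = {"sad", "hopeless", "empty", "alone", "cry", "worthless"}
POSITIVE = {"happy", "great", "fun", "friends", "excited"}
FIRST_PERSON = {"i", "me", "my", "myself"}

def features(post):
    """Per-post ratios of lexicon hits: a crude LIWC-style feature vector."""
    words = post.lower().split()
    n = len(words) or 1
    return [sum(w in L for w in words) / n for L in (NEGATIVE, POSITIVE, FIRST_PERSON)]

def train(rows):
    """Learn per-class mean feature vectors (a nearest-centroid classifier)."""
    model = {}
    for label in (0, 1):
        vs = [f for f, lab in rows if lab == label]
        model[label] = [sum(col) / len(vs) for col in zip(*vs)]
    return model

def classify(model, f):
    return min(model, key=lambda lab: math.dist(f, model[lab]))

def k_fold_accuracy(data, k=5):
    """Plain k-fold cross-validation: hold out each fold once, train on the rest."""
    folds = [data[i::k] for i in range(k)]
    correct = total = 0
    for i, held_out in enumerate(folds):
        rest = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train(rest)
        correct += sum(classify(model, feats) == lab for feats, lab in held_out)
        total += len(held_out)
    return correct / total

posts = [
    ("i feel so sad and hopeless", 1), ("i cry alone i am worthless", 1),
    ("so empty tonight i am sad", 1), ("my mind feels empty and hopeless", 1),
    ("great fun with friends", 0), ("happy and excited today", 0),
    ("friends made my day great", 0), ("fun trip so happy", 0),
] * 2  # repeat so every fold contains both classes
data = [(features(p), lab) for p, lab in posts]
print(k_fold_accuracy(data, k=4))  # prints 1.0
```

Real studies substitute richer features (full LIWC, n-grams, embeddings) and stronger models (SVMs, random forests, neural networks), but the evaluation loop—the 5- or 10-fold cross-validation reported in the tables above—has this shape.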
It is worth noting that the studies reviewed here share some common patterns. In terms of depression identification, each study first identified depressed and nondepressed groups among its subjects, according to researcher-inferred mental status, user-declared mental status, community membership, or clinicians’ judgments; then explored ways of classifying the subjects using ML approaches; and finally measured the accuracy of the models’ predictions. Furthermore, most of the experiments reviewed here used SL rather than UL models. SL uses classifications established ahead of time to explore ways to forecast a specific outcome of interest, such as the presence of a psychiatric disorder (eg, depression or anxiety), drawing on existing information in the feature database for higher-level analyses concerning identification. UL, by contrast, is used to identify unobserved or underlying psychological dimensions and to explore optimal classification, examining phenomena such as clustering and compression within sets of unlabeled data [
Several limitations and challenges of the depression identification and prediction models reviewed here should be acknowledged. First, for depression identification, some types of information about the individuals, such as sociodemographic characteristics, offline behaviors, and psychological, social, and cultural environment, are often lacking in social media data, which poses challenges that may be hard to resolve [
The challenges facing ML approaches for depression detection may, however, be resolvable. For example, existing studies indicate that homophily exists among depressed users; that is to say, friends who interact with depressed users frequently are more likely to have depressive symptoms themselves. Therefore, the interactions and ties between users are significant. But the data used in such prediction models tend to be widely scattered on social media, and it is difficult to analyze the connections among individuals in such a way as to improve the accuracies of the ML approaches [
In addition, the approaches and features selected for the analyses are crucial aspects of studies in this area. Wider ranges of possible features, such as specific depression lexicons appropriate for particular cultural populations or groups, and more complex techniques for analyzing posts should be explored with a view to improving experimental processes and model accuracy [
To improve the validity and feasibility of depression detection research based on the application of ML approaches to social media data, increased efforts to reduce research bias will be needed. For depression identification, researchers should employ criteria and tools for depression diagnosis that are both accurate and suitable for different online populations. Moreover, collecting personal information such as sociodemographic characteristics and offline behaviors should also be considered, where necessary and ethical [
In summary, the studies described in this review have demonstrated that ML approaches can be effective for detecting depression using text data from social media and that the objective of developing a highly valid approach for such research may be within reach. Additionally, it seems appropriate and applicable for these methods to function as a complementary tool to the more traditional, established methods for diagnosing depression. However, further research is still needed in the areas of sample size, optimization of predictive approaches and features, generalizability, privacy issues, and general research ethics.
application programming interface
convolutional neural networks
event-driven depression tendency warning
latent Dirichlet allocation
linguistic inquiry and word count
long short-term memory
multimodal depressive dictionary learning
machine learning
Gossip Forum on the Professional Technology Temple
recurrent neural network
supervised learning
sequential S3 (smoothness, significance, and sanction)
support vector machine
unsupervised learning
Distress Analysis Interview Corpus-Wizard of Oz
This work was supported by the National Natural Science Foundation of China (grant numbers 71761130083 and 82173636).
JG and DL designed the study. DL conducted the analysis, interpreted the collected articles, and wrote the first draft of the manuscript. DL, XLF, JG, FA, and MS contributed to valuable revisions of this manuscript. All authors contributed to further writing and approved the final version.
None declared.