Review
Abstract
Background: A growing body of literature has detailed the use of qualitative analyses to measure the therapeutic processes and intrinsic effectiveness of psychotherapies, which yield small databases. Nonetheless, these approaches have several limitations and machine learning algorithms are needed.
Objective: The objective of this study is to conduct a systematic review of the use of machine learning for automated text classification for small data sets in the fields of psychiatry, psychology, and social sciences. This review will identify available algorithms and assess if automated classification of textual entities is comparable to the classification done by human evaluators.
Methods: A systematic search was performed in the electronic databases of Medline, Web of Science, PsycNet (PsycINFO), and Google Scholar from their inception dates to 2021. The fields of psychiatry, psychology, and social sciences were selected as they include a vast array of textual entities in the domain of mental health that can be reviewed. Additional records identified through cross-referencing were used to find other studies.
Results: This literature search identified 5442 articles that were eligible for our study after the removal of duplicates. Following abstract screening, 114 full articles were assessed in their entirety, of which 107 were excluded. The remaining 7 studies were analyzed. Classification algorithms such as naive Bayes, decision tree, and support vector machine classifiers were identified. The support vector machine was the most used and best-performing algorithm in the identified articles. Prediction classification scores for the identified algorithms ranged from 53%-91% for the classification of textual entities in 4-7 categories. In addition, 3 of the 7 studies reported an interjudge agreement statistic; these were consistent with agreement statistics for text classification done by human evaluators.
Conclusions: A systematic review of available machine learning algorithms for automated text classification for small data sets in several fields (psychiatry, psychology, and social sciences) was conducted. We compared automated classification with classification done by human evaluators. Our results show that it is possible to automatically classify textual entities of a transcript based solely on small databases. Future studies are nevertheless needed to assess whether such algorithms can be implemented in the context of psychotherapies.
doi:10.2196/22651
Introduction
The intrinsic effectiveness of psychotherapies is generally measured through semistructured interviews or self-reported questionnaires [ - ]. However, these instruments have limitations in relation to constructs that can be set a priori, for which there are standardized measures available. To assess the intrinsic effectiveness of psychotherapies (the psychotherapeutic process itself), an increasing number of research teams have started to use qualitative methods. Although these approaches have inherent biases (eg, data analysis subjectivity), mathematical algorithms can be used to reduce such biases. Furthermore, assessment of a psychotherapy’s intrinsic effectiveness usually refers to an assessment of a patient’s characteristics and the therapeutic process [ ]. Studies often use therapy session transcripts to qualitatively evaluate psychotherapies [ ]. For in-person therapies, transcriptions are often time-consuming and classifying therapeutic interactions under various themes (labels) for analysis is even more demanding. Machine learning is a potential solution to reduce the amount of labor-intensive work required [ ]. With the increasing development of new psychotherapies for various psychopathologies, there is a higher need for tools to measure and understand their effectiveness.
Text mining is one of the few techniques used in psychiatry to derive data from the large number of interactions that occur during therapy sessions [ ]. One such technique is the use of artificial intelligence by means of machine learning. It is currently being used in many areas in the medical field, ranging from surgical procedure analyses to medical diagnostics [ ]. Textual entities from medical fields are often classified into only a few categories. This can be done by applying a set of rules to an algorithm to be used for classification and is usually facilitated by the nature of the entity being classified (eg, signs and symptoms relating to a particular diagnosis or treatment) [ ]. Classification of therapeutic interactions can be tricky considering the vast array of information associated with the therapy itself, the ability of the patient to communicate, and the context in which the therapy is being conducted [ ]. This leads to transcripts that may vary widely from patient to patient; therefore, the information is less directly interpretable than medical records or results. In relevant fields where such data are usually used for research, such as psychiatry and psychology, the use of machine learning in the context of text mining in psychotherapy has been limited [ ]. Many algorithms are readily available to conduct automated text classification [ ]. Simple probabilistic mathematical algorithms (ie, naive Bayesian probability algorithms) as well as more complex ones (ie, neural networks) are available via open access libraries on the web [ ]. Machine learning algorithms often need large databases to adequately classify new data by creating training sets and testing sets [ - ]. Large databases, such as some seen in the field of internet-enabled cognitive behavioral therapy, are required for complex machine learning algorithms to adequately learn and classify new information [ ].
However, in-person therapies often yield databases that are smaller than the ones generated by internet-enabled cognitive behavioral therapy because of the need for human-driven transcriptions. This creates a need to find potential algorithms that can operate on small databases [ , ]. A machine learning algorithm applicable for small databases is therefore needed for such cases.
The objective of this study is to conduct a systematic review of the use of machine learning for automated text classification for small databases in the fields of psychiatry, psychology, and social sciences to determine the best algorithm for automatically classifying the content of psychotherapy transcripts. This would provide an interesting solution for automated therapy annotations in the context of qualitative analysis and could generate data to enable the evaluation of therapeutic processes.
Methods
Search Strategies
A systematic search was performed in the electronic databases of Medline, Web of Science, PsycNet (PsycINFO), and Google Scholar from their inception dates until 2021 using text words and indexing (MeSH) terms with keywords that were inclusive for the fields of psychiatry (eg, psychiatric, psychiatry), psychology (eg, psychology, psychotherapy, neuropsychology) and social sciences (eg, social science) and machine learning. Additional records identified through cross-referencing were used to find other studies. The fields of psychiatry, psychology, and social sciences were selected as they include a vast array of textual entities in the domain of mental health that can be reviewed. A complete electronic search strategy is available in the supplementary material. The search methodology was developed by the corresponding author and a librarian specialized in mental health at the Institut universitaire en santé mentale de Montréal. Searches were completed by AH and cross-validated by MB in May 2021. No setting, date, or geographical restrictions were applied. Searches were limited to English- or French-language sources.
Study Eligibility
Studies were included if they met the following criteria: (1) classification in various data categories of textual entities (eg, medical records, letters, transcripts); (2) the study was conducted in the fields of psychiatry, psychology, or social sciences; (3) automated classification of text was conducted in more than 2 data categories (text was classified in more than two features); (4) automated text classification was conducted by machine learning (either supervised or unsupervised algorithms); and (5) the number of elements in the database used was fewer than 10,000, which corresponds to a small database. Although there is no consensus on what a small database is, we defined a small database as one that had a maximum of 10,000 items since 5000-10,000 items have been referred to as small samples in prior studies [ - ]. Studies that used a combination of many algorithms, instead of a single algorithm, were also included. Unpublished literature was excluded, as were studies using artificial intelligence algorithms outside the scope of machine learning.
Data Extraction
Data were extracted with a standardized form and cross-verified for consistency and integrity by two authors, AH and MB. Information such as size of the database, number of classification categories, algorithms used, prediction success rate (in %), and interjudge agreement were recorded.
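As an illustration of the two outcome measures recorded here, the following minimal sketch computes a prediction success rate (simple accuracy) and Cohen's κ between two sets of labels. It is not taken from any of the reviewed studies: the function names, the theme labels, and the toy annotations are hypothetical, and the sketch assumes that the reported κ values are Cohen's κ for two raters.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Share of items whose predicted label matches the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labels at random according to their own label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of 8 transcript segments into 4 themes.
HUMAN_LABELS = ["A", "A", "B", "B", "C", "C", "D", "D"]
MODEL_LABELS = ["A", "A", "B", "C", "C", "C", "D", "A"]

print(accuracy(MODEL_LABELS, HUMAN_LABELS))               # 0.75
print(round(cohen_kappa(MODEL_LABELS, HUMAN_LABELS), 2))  # 0.67
```

In this toy case the classifier agrees with the human annotator on 6 of 8 segments, and κ is lower than the raw agreement because part of that agreement would be expected by chance.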
Results
Description of Studies
Our systematic review assessed studies that used machine learning to classify text in the fields of psychiatry, psychology, and social sciences. This literature search identified 5442 articles that were eligible for our study after the removal of duplicates. Following abstract screening, 114 full articles were assessed in their entirety, of which 107 were excluded. The remaining 7 studies were analyzed. The flowchart for the inclusion of studies in this systematic review is found in . The details of the studies are provided in . Notably, a limited number of articles on automated text classification with small databases were found. Studies that met inclusion criteria reported different types of documents used for automated annotation. Social media content, such as forum posts in the study by Yu et al [ ] and Twitter entries in the study by Balakrishnan et al [ ], generated the largest data sets (5000 and 5453 items, respectively). Those textual entities consisted of complete or partial sentences manually written by users and were annotated in their entirety. The remaining types of documents were mainly medical records completed by physicians or health science professionals. No image or mathematical data were classified by the algorithms as part of these studies.
Algorithms
Overview
Several algorithms have been used on the presented textual entities. Naive Bayes classifier, decision tree–based algorithms, support vector machine (SVM) classifiers, and combinations of multiple algorithms were the main strategies used by the included studies. The number of categories for text classification ranged from 4-7 and overall precision classification ranged from 77.0%-91.8%. For the studies that included multiple algorithms, SVM-based algorithms demonstrated the best accuracy in 5 of 7 studies.
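To make the first of these strategies concrete, the sketch below implements a small multinomial naive Bayes text classifier in plain Python, treating each word as a feature and using add-one (Laplace) smoothing. It is an illustrative assumption rather than the method of any included study: the whitespace tokenizer, the four category names, and the training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: 8 short utterances in 4 made-up categories.
TRAINING_SAMPLES = [
    ("i feel sad and hopeless today", "negative_emotion"),
    ("i am so sad and tired", "negative_emotion"),
    ("i feel proud and happy", "positive_emotion"),
    ("today i am happy and calm", "positive_emotion"),
    ("the voice keeps insulting me", "accusation"),
    ("the voice says i am worthless", "accusation"),
    ("i told the voice to stop", "self_assertion"),
    ("i will not listen to the voice", "self_assertion"),
]

model = train_naive_bayes(TRAINING_SAMPLES)
print(predict(model, "i feel sad"))  # negative_emotion
```

The independence assumption shows up in the inner loop, where each word contributes its own log probability regardless of the words around it.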
Naive Bayes Classifier
A naive Bayes classifier is a probabilistic classifier that makes use of Bayes’ theorem to classify items into different categories [ ]. This type of classifier achieves average performance in the context of supervised learning [ ]. It is advantageous when little data is available, as it can be optimally parameterized in the event of a small data set [ ]. The algorithm assumes that the predictors are independent of one another. For text classification, Balakrishnan et al [ ] outlined that this algorithm works best when each word is treated as a separate variable.
Decision Tree–Based Classifiers
Decision tree–based classifiers are nonparametric, supervised learning methods that can be used to classify items [ ]. Observations about an item are represented as branches and conclusions about an item's value (score) are represented as leaves [ ]. Splitting across the different branches is based on defined rules according to the categories used to classify the items. In text classification, the general idea is that every piece of text being classified is split across the branches until it reaches a leaf (category) based on probabilistic rules set by the designer of the tree [ ].
SVM Classifiers
SVM classifiers can be used in both supervised and unsupervised learning contexts. In simple terms, these classifiers use the concept of a hyperplane that divides a data set into classes. A hyperplane in an n-dimensional Euclidean space is a flat, (n–1)-dimensional subset of that space that divides the space into two disconnected parts [ ]. The items in the data set are considered as data points in that space. The item being classified is therefore assigned to a category according to the part in which it falls.
Outcomes
In the 7 identified studies, SVM classifiers and algorithms combined with SVM classifiers tended to achieve the best prediction score (in %) as compared to other algorithms for small data sets. Studies by Zolnoori et al [ ], Singh et al [ ], and Yu et al [ ] reported prediction scores of SVM classifiers that were superior to other classifiers for their data sets. Their precision scores ranged from 77%-90%. Only 3 studies attempted to compare the classification done by the classifiers with human annotators. The statistics used to assess these automated annotations were κ and pairwise agreements. The interrater agreement of these studies was comparable to interrater agreements for annotation done by human annotators; the κ scores were 0.84 [ ], 0.67 [ ], and 0.86 [ ], respectively.
Discussion
Review of Findings
In this study, we conducted a systematic review to identify potential algorithms that could be useful for small databases for the automatic annotation of unannotated interview transcripts from the field of psychotherapy. The review demonstrated that limited literature exists on the subject. However, a few algorithms displayed sufficient accuracy when performing text classification on small databases. SVM classifiers tended to display the best accuracy in the context of small databases.
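The hyperplane idea behind the SVM classifiers discussed above can be sketched as follows. The code shows only the prediction step of an already trained linear classifier in a one-vs-rest arrangement, which is one common way to extend a two-class SVM to several categories; it does not show the margin-maximizing training itself. The weights, biases, category names, and the three-word feature vector are hypothetical values chosen by hand for illustration, not parameters learned from data or reported by any reviewed study.

```python
def decision_value(weights, bias, features):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify_one_vs_rest(hyperplanes, features):
    """Pick the category whose hyperplane yields the largest signed score."""
    return max(
        hyperplanes,
        key=lambda label: decision_value(*hyperplanes[label], features),
    )


# Hypothetical features: counts of the words "sad", "happy", and "voice".
# Each category has its own hyperplane, stored as (weights, bias).
HYPERPLANES = {
    "negative_emotion": ([1.0, -1.0, 0.0], -0.5),
    "positive_emotion": ([-1.0, 1.0, 0.0], -0.5),
    "voice_related": ([0.0, 0.0, 1.0], -0.5),
}

print(classify_one_vs_rest(HYPERPLANES, [2, 0, 0]))  # negative_emotion
print(classify_one_vs_rest(HYPERPLANES, [0, 1, 2]))  # voice_related
```

Because a linear model of this kind has only one weight per feature and one bias per category, it has relatively few parameters to estimate, which may be one reason it remains usable when few annotated examples are available.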
Compared to other reviews on the subject, this study highlights algorithms being used in the context of small data sets, which is consistent with the reality of studies of therapies [ ], as transcribing therapy sessions is time-consuming and demanding. Regarding novel therapy developments, such as virtual reality–based therapy, this is even more needed considering the small number of patients that have received these treatments so far [ ]. Therapy usually involves a wider range of words and contextual sentences compared to other areas of medicine where specific words (eg, symptoms, signs) can be used to facilitate classification. Therefore, it is not surprising to see that this systematic review identified algorithms that differ from those that are widely used in other medical fields. For example, Srivastava et al [ ] reviewed the efficiency of different text classifiers in the context of social media posts referring to medical content. They found that a multilayer perceptron–based neural network performed best in their study as compared to an SVM classifier. Another study, conducted by Visweswaran and colleagues [ ], identified convolutional long short-term memory neural networks as the best at predicting vaping habits. This can be explained by the fact that most classifiers are combined with a vectorizer when used to classify textual entities. A vectorizer transforms text into a meaningful number vector that can then be used by classifiers [ ]. Considering that classification of textual entities to identify a specific diagnosis or medical condition usually requires specific terms that pertain to the diagnosis or condition, vectors tend to discriminate better between the textual entities of these fields [ ]. This is usually not the case with therapy transcripts in the context of analysis of the psychotherapeutic process as this analysis often requires a larger array of categories that can sometimes overlap.
In contrast with other types of medical data, such as imagery or numerical entities (eg, laboratory results), where neural networks seem to be the most used class of algorithms for classification, textual classification appears to be performed with a more restricted number of classifiers [ ]. This can be explained by the fact that text classification requires additional considerations. Automated classifications lack the ability to interpret a sentence out of a given context (eg, a therapeutic session), while the meaning of a sentence could change based on the context. Another complexity is that words can refer to different entities based on the sociocultural context. Therefore, considering such complexities can require further parameterizations and considerations, which may also explain why, in the identified studies, the same algorithm used on data sets of a similar size could have a diverging predictive score.
Consistent with our findings, linear SVM classifiers tend to be regarded as one of the best text classifying algorithms in the literature [ ]. Many types of classifiers are available, but it appears that only a few are consistently used for the classification of textual entities [ ]. This is consistent with our review, as the identified studies tended to use similar strategies when classifying textual entities. A recent literature review on data classification of clinical text data explains this phenomenon by the fact that there is a bottleneck of annotations in the context of supervised learning [ ].
Limitations
This systematic review of literature focuses on the fields of psychiatry, psychology, and social sciences to reflect the type of textual entities usually found in therapy transcripts. A limitation of this study is the small number of classification algorithm studies published in these fields. As this is an emerging domain, the number of studies on the topic should increase in the future.
Conclusions
Machine learning can be beneficial for the field of psychiatry. Automated text classification for psychotherapy is a promising avenue to generate quantitative and qualitative data in an efficient way to make the data readily available for analyses. SVM classifiers appear to be preferred over other types of classifiers in the context of small databases. Using such classifiers could be useful in the evaluation of therapeutic processes of novel therapies where data are limited. Nevertheless, the limited number of articles found on the subject outlines the need for more development in this field, especially regarding the use of such classifiers in the domain of mental health.
Acknowledgments
This study was funded by Le Fonds de recherche du Québec – Santé (FRQS) and Services et recherches psychiatriques AD.
Authors' Contributions
The study was designed by AH, SP, and AD. Statistical analyses were performed by AH and MB. All the authors have made substantial contributions and have revised, edited, and approved the manuscript.
Conflicts of Interest
None declared.
Electronic search strategy for the systematic review conducted.
DOCX File, 14 KB
Detailed results of the systematic review study selection.
DOCX File, 17 KB
References
- Ewbank MP, Cummins R, Tablan V, Bateup S, Catarino A, Martin AJ, et al. Quantifying the Association Between Psychotherapy Content and Clinical Outcomes Using Deep Learning. JAMA Psychiatry 2020 Jan 01;77(1):35-43.
- Cook SC, Schwartz AC, Kaslow NJ. Evidence-Based Psychotherapy: Advantages and Challenges. Neurotherapeutics 2017 Jul 26;14(3):537-545.
- Hill C, Chui H, Baumann E. Revisiting and reenvisioning the outcome problem in psychotherapy: An argument to include individualized and qualitative measurement. In: Kazdin AE, editor. Methodological issues and strategies in clinical research (4th ed). Washington, DC: American Psychological Association; 2016:373-386.
- Szymańska A, Dobrenko K, Grzesiuk L. Characteristics and experience of the patient in psychotherapy and the psychotherapy’s effectiveness. A structural approach. Psychiatr Pol 2017;51(4):619-631.
- Perepletchikova F. On the topic of treatment integrity. Clinical Psychology: Science and Practice 2011 Jun;18(2):148-153.
- Sebastiani F. Machine learning in automated text categorization. ACM Comput Surv 2002 Mar;34(1):1-47.
- Abbe A, Grouin C, Zweigenbaum P, Falissard B. Text mining applications in psychiatry: a systematic literature review. Int J Methods Psychiatr Res 2016 Jun 17;25(2):86-100.
- Khalid S, Goldenberg M, Grantcharov T, Taati B, Rudzicz F. Evaluation of Deep Learning Models for Identifying Surgical Actions and Measuring Performance. JAMA Netw Open 2020 Mar 02;3(3):e201664.
- Tang S, Chappell GT, Mazzoli A, Tewari M, Choi SW, Wiens J. Predicting Acute Graft-Versus-Host Disease Using Machine Learning and Longitudinal Vital Sign Data From Electronic Health Records. JCO Clinical Cancer Informatics 2020 Sep(4):128-135.
- Høglend P. Exploration of the patient-therapist relationship in psychotherapy. Am J Psychiatry 2014 Oct;171(10):1056-1066.
- Durstewitz D, Koppe G, Meyer-Lindenberg A. Deep neural networks in psychiatry. Mol Psychiatry 2019 Feb 15;24(11):1583-1598.
- Gupta A, Katarya R. Social media based surveillance systems for healthcare using machine learning: A systematic review. Journal of Biomedical Informatics 2020 Aug;108:103500.
- Vora S, Yang H. A Comprehensive Study of Eleven Feature Selection Algorithms and their Impact on Text Classification. In: 2017 Computing Conference. 2017 Presented at: 2017 Computing Conference; July 18-20, 2017; London, UK p. 440-449.
- Deo RC. Machine Learning in Medicine. Circulation 2015 Nov 17;132(20):1920-1930.
- Cao H, Meyer-Lindenberg A, Schwarz E. Comparative Evaluation of Machine Learning Strategies for Analyzing Big Data in Psychiatry. Int J Mol Sci 2018 Oct 29;19(11):3387.
- Kowsari K, Jafari Meimandi K, Heidarysafa M, Mendu S, Barnes L, Brown D. Text Classification Algorithms: A Survey. Information 2019 Apr 23;10(4):150.
- Hämäläinen W, Vinni M. Comparison of Machine Learning Methods for Intelligent Tutoring Systems. In: Intelligent Tutoring Systems. 2006 Presented at: ITS 2006; June 26-30, 2006; Jhongli, Taiwan p. 525-534.
- Wanigasekara C, Swain A, Nguang SK, Prusty BG. Improved Learning from Small Data Sets Through Effective Combination of Machine Learning Tools with VSG Techniques. In: International Joint Conference on Neural Networks. 2018 Presented at: International Joint Conference on Neural Networks; 2018; Rio, Brazil.
- Shiner B, D'Avolio L, Nguyen T, Zayed M, Watts B, Fiore L. Automated classification of psychotherapy note text: implications for quality assessment in PTSD care. J Eval Clin Pract 2012 Jun;18(3):698-701.
- Slonim N, Tishby N. The Power of Word Clusters for Text Classification. In: 23rd European Colloquium on Information Retrieval Research. 2001 Jan 12 Presented at: 23rd European Colloquium on Information Retrieval Research; 2001; Darmstadt, Germany URL: http://www-old.cs.huji.ac.il/site/labs/learning/Papers/irsg3.pdf
- Joachims T. Transductive inference for text classification using support vector machines. ICML. 1999. URL: http://www1.cs.columbia.edu/~dplewis/candidacy/joachims99transductive.pdf [accessed 2020-06-15]
- Yu L, Chan C, Lin C, Lin I. Mining association language patterns using a distributional semantic model for negative life event classification. J Biomed Inform 2011 Aug;44(4):509-518.
- Balakrishnan V, Khan S, Arabnia HR. Improving cyberbullying detection using Twitter users’ psychological features and machine learning. Computers & Security 2020 Mar;90:101710.
- Zhang W, Gao F. An Improvement to Naive Bayes for Text Classification. Procedia Engineering 2011;15:2160-2164.
- Huang Y, Li L. Naive Bayes classification algorithm based on small sample set. 2011 Presented at: IEEE International Conference on Cloud Computing and Intelligence Systems; 2011; Beijing, China.
- Vijayan VK, Bindu KR, Parameswaran L. A comprehensive study of text classification algorithms. 2017 Presented at: International Conference on Advances in Computing, Communications and Informatics (ICACCI); 2017; Udipi, India.
- Kamiński B, Jakubczyk M, Szufel P. A framework for sensitivity analysis of decision trees. Cent Eur J Oper Res 2018 May 24;26(1):135-159.
- Noble WS. What is a support vector machine? Nat Biotechnol 2006 Dec;24(12):1565-1567.
- Zolnoori M, Fung KW, Patrick TB, Fontelo P, Kharrazi H, Faiola A, et al. A systematic approach for developing a corpus of patient reported adverse drug events: A case study for SSRI and SNRI medications. J Biomed Inform 2019 Feb;90:103091.
- Singh V, Shrivastava U, Bouayad L, Padmanabhan B, Ialynytchev A, Schultz S. Machine learning for psychiatric patient triaging: an investigation of cascading classifiers. J Am Med Inform Assoc 2018 Nov 01;25(11):1481-1487.
- Hartmann J, Huppertz J, Schamp C, Heitmann M. Comparing automated text classification methods. International Journal of Research in Marketing 2019 Mar;36(1):20-38.
- Fodor LA, Coteț CD, Cuijpers P, Szamoskozi Ș, David D, Cristea IA. The effectiveness of virtual reality based interventions for symptoms of anxiety and depression: A meta-analysis. Sci Rep 2018 Jul 09;8(1):10323.
- Srivastava SK, Singh SK, Suri JS. Healthcare Text Classification System and its Performance Evaluation: A Source of Better Intelligence by Characterizing Healthcare Text. J Med Syst 2018 Apr 13;42(5):97.
- Visweswaran S, Colditz JB, O'Halloran P, Han N, Taneja SB, Welling J, et al. Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study. J Med Internet Res 2020 Aug 12;22(8):e17478.
- Shahmirzadi O, Lugowski A, Younge K. Text Similarity in Vector Space Models: A Comparative Study. 2017 Presented at: 18th IEEE International Conference On Machine Learning And Applications (ICMLA); 2019; Boca Raton, Florida.
- Khattak FK, Jeblee S, Pou-Prom C, Abdalla M, Meaney C, Rudzicz F. A survey of word embeddings for clinical text. J Biomed Inform 2019 Dec;100S:100057.
- Yadav SS, Jadhav SM. Deep convolutional neural network based medical image classification for disease diagnosis. J Big Data 2019 Dec 17;6(1):113.
- Agnihotri D, Verma K, Tripathi P. An automatic classification of text documents based on correlative association of words. J Intell Inf Syst 2017 Aug 14;50(3):549-572.
- Spasic I, Nenadic G. Clinical Text Data in Machine Learning: Systematic Review. JMIR Med Inform 2020 Mar 31;8(3):e17984.
Abbreviations
SVM: support vector machine
Edited by G Eysenbach; submitted 19.07.20; peer-reviewed by T Craig, E Frontoni, JA Benítez-Andrades, G Kannan, J Gleeson; comments to author 25.09.20; revised version received 06.10.20; accepted 27.07.21; published 22.10.21
Copyright©Alexandre Hudon, Mélissa Beaudoin, Kingsada Phraxayavong, Laura Dellazizzo, Stéphane Potvin, Alexandre Dumais. Originally published in JMIR Mental Health (https://mental.jmir.org), 22.10.2021.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on https://mental.jmir.org/, as well as this copyright and license information must be included.