
JMIR Mental Health (JMIR Ment Health; JMH), ISSN 2368-7959

JMIR Publications, Toronto, Canada

Article ID v8i10e22651; PMID 34677133; DOI 10.2196/22651

Article type: Review

Use of Automated Thematic Annotations for Small Data Sets in a Psychotherapeutic Context: Systematic Review of Machine Learning Algorithms

Editor and reviewers: Gunther Eysenbach, Tom Craig, Emanuele Frontoni, José Alberto Benítez-Andrades, Gokul Kannan, John Gleeson

Alexandre Hudon, BEng, MD (1,2), https://orcid.org/0000-0002-4868-0928

Mélissa Beaudoin, MSc (1,2), https://orcid.org/0000-0002-0169-8055

Kingsada Phraxayavong, BSc (3), https://orcid.org/0000-0003-3113-9104

Laura Dellazizzo, MSc (1,2), https://orcid.org/0000-0001-8262-130X

Stéphane Potvin, PhD (1,2), https://orcid.org/0000-0003-1624-378X

Alexandre Dumais, MD, PhD (1,2,3,4), https://orcid.org/0000-0002-4480-0064

1 Centre de recherche de l'Institut Universitaire en Santé Mentale de Montréal, Montréal, QC, Canada

2 Department of Psychiatry and Addictology, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada

3 Services et Recherches Psychiatriques AD, Montréal, QC, Canada

4 Institut national de psychiatrie légale Philippe-Pinel, Montréal, QC, Canada

Correspondence address: Centre de recherche de l'Institut Universitaire en Santé Mentale de Montréal, 7331, rue Hochelaga, Montréal, QC, Canada. Phone: 1 (514) 251 4000

Corresponding Author: Alexandre Dumais alexandre.dumais@umontreal.ca

Published: 22 October 2021. JMIR Ment Health 2021;8(10):e22651

Article history dates: 19 July 2020; 25 September 2020; 6 October 2020; 27 July 2021

©Alexandre Hudon, Mélissa Beaudoin, Kingsada Phraxayavong, Laura Dellazizzo, Stéphane Potvin, Alexandre Dumais. Originally published in JMIR Mental Health (https://mental.jmir.org), 22.10.2021.


This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on https://mental.jmir.org/, as well as this copyright and license information must be included.

Background

A growing body of literature has detailed the use of qualitative analyses to measure the therapeutic processes and intrinsic effectiveness of psychotherapies, which yield small databases. Nonetheless, these approaches have several limitations, and machine learning algorithms are needed to address them.

Objective

The objective of this study is to conduct a systematic review of the use of machine learning for automated text classification for small data sets in the fields of psychiatry, psychology, and social sciences. This review will identify available algorithms and assess if automated classification of textual entities is comparable to the classification done by human evaluators.

Methods

A systematic search was performed in the electronic databases of Medline, Web of Science, PsycNet (PsycINFO), and Google Scholar from their inception dates to 2021. The fields of psychiatry, psychology, and social sciences were selected as they include a vast array of textual entities in the domain of mental health that can be reviewed. Additional records identified through cross-referencing were used to find other studies.

Results

This literature search identified 5442 articles that were eligible for our study after the removal of duplicates. Following abstract screening, 114 full articles were assessed in their entirety, of which 107 were excluded. The remaining 7 studies were analyzed. Classification algorithms such as naive Bayes, decision tree, and support vector machine classifiers were identified. Support vector machine classifiers were the most frequently used and the best-performing algorithms in the identified articles. Prediction classification scores for the identified algorithms ranged from 53% to 91% for the classification of textual entities into 4 to 7 categories. In addition, 3 of the 7 studies reported an interjudge agreement statistic; these were consistent with agreement statistics for text classification done by human evaluators.

Conclusions

A systematic review of available machine learning algorithms for automated text classification for small data sets in several fields (psychiatry, psychology, and social sciences) was conducted. We compared automated classification with classification done by human evaluators. Our results show that it is possible to automatically classify textual entities of a transcript based solely on small databases. Future studies are nevertheless needed to assess whether such algorithms can be implemented in the context of psychotherapies.

Keywords: psychotherapy; artificial intelligence; automated text classification; machine learning; systematic review

Introduction

The intrinsic effectiveness of psychotherapies is generally measured through semistructured interviews or self-reported questionnaires [1-3]. However, these instruments have limitations in relation to constructs that can be set a priori, for which there are standardized measures available. To assess the intrinsic effectiveness of psychotherapies (the psychotherapeutic process itself), an increasing number of research teams have started to use qualitative methods. Although these approaches have inherent biases (eg, data analysis subjectivity), mathematical algorithms can be used to reduce such biases. Furthermore, assessment of a psychotherapy’s intrinsic effectiveness usually refers to an assessment of a patient’s characteristics and the therapeutic process [4]. Studies often use therapy session transcripts to qualitatively evaluate psychotherapies [5]. For in-person therapies, transcriptions are often time-consuming and classifying therapeutic interactions under various themes (labels) for analysis is even more demanding. Machine learning is a potential solution to reduce the amount of labor-intensive work required [6]. With the increasing development of new psychotherapies for various psychopathologies, there is a higher need for tools to measure and understand their effectiveness.

Text mining is one of the few techniques used in psychiatry to derive data from the large number of interactions that occur during therapy sessions [7]. One such technique is the use of artificial intelligence by means of machine learning. It is currently being used in many areas in the medical field, ranging from surgical procedure analyses to medical diagnostics [8]. Textual entities from medical fields are often classified into only a few categories. This can be done by applying a set of rules to an algorithm to be used for classification and is usually facilitated by the nature of the entity being classified (eg, signs and symptoms relating to a particular diagnosis or treatment) [9]. Classification of therapeutic interactions can be difficult considering the vast array of information associated with the therapy itself, the ability of the patient to communicate, and the context in which the therapy is being conducted [10]. This leads to transcripts that may vary widely from patient to patient; therefore, the information is less directly interpretable than medical records or results. In relevant fields where such data are usually used for research, such as psychiatry and psychology, the use of machine learning in the context of text mining in psychotherapy has been limited [11].

Many algorithms are readily available to conduct automated text classification [12]. Simple probabilistic mathematical algorithms (eg, naive Bayesian probability algorithms) as well as more complex ones (eg, neural networks) are available via open access libraries on the web [13]. Machine learning algorithms often need large databases to adequately classify new data by creating training sets and testing sets [14-16]. Large databases, such as some seen in the field of internet-enabled cognitive behavioral therapy, are required for complex machine learning algorithms to adequately learn and classify new information [1]. However, in-person therapies often yield databases that are smaller than the ones generated by internet-enabled cognitive behavioral therapy because of the need for human-driven transcriptions. This creates a need to find algorithms that can operate on small databases [17,18].

The objective of this study is to conduct a systematic review of the use of machine learning for automated text classification for small databases in the fields of psychiatry, psychology, and social sciences to determine the best algorithm for automatically classifying the content of psychotherapy transcripts. This would provide an interesting solution for automated therapy annotations in the context of qualitative analysis and could generate data to enable the evaluation of therapeutic processes.

Methods

Search Strategies

A systematic search was performed in the electronic databases of Medline, Web of Science, PsycNet (PsycINFO), and Google Scholar from their inception dates until 2021 using text words and indexing (MeSH) terms with keywords that were inclusive for the fields of psychiatry (eg, psychiatric, psychiatry), psychology (eg, psychology, psychotherapy, neuropsychology), and social sciences (eg, social science), as well as machine learning. Additional records identified through cross-referencing were used to find other studies. The fields of psychiatry, psychology, and social sciences were selected as they include a vast array of textual entities in the domain of mental health that can be reviewed. A complete electronic search strategy is available in Multimedia Appendix 1. The search methodology was developed by the corresponding author and a librarian specialized in mental health at the Institut universitaire en santé mentale de Montréal. Searches were completed by AH and cross-validated by MB in May 2021. No setting, date, or geographical restrictions were applied. Searches were limited to English- or French-language sources.

Study Eligibility

Studies were included if they met the following criteria: (1) textual entities (eg, medical records, letters, transcripts) were classified into various data categories; (2) the study was conducted in the fields of psychiatry, psychology, or social sciences; (3) automated classification of text was conducted in more than 2 data categories (text was classified in more than 2 features); (4) automated text classification was conducted by machine learning (either supervised or unsupervised algorithms); and (5) the number of elements in the database used was less than 10,000, which corresponds to a small database. Although there is no consensus on what a small database is, we defined a small database as one that had a maximum of 10,000 items since 5000-10,000 items have been referred to as small samples in prior studies [19-21]. Studies that used a combination of many algorithms, instead of a single algorithm, were also included. Unpublished literature was excluded, as were studies using artificial intelligence algorithms outside the scope of machine learning.

Data Extraction

Data were extracted with a standardized form and cross-verified for consistency and integrity by two authors, AH and MB. Information such as size of the database, number of classification categories, algorithms used, prediction success rate (in %), and interjudge agreement were recorded.

Results

Description of Studies

Our systematic review assessed studies that used machine learning to classify text in the fields of psychiatry, psychology, and social sciences. This literature search identified 5442 articles that were eligible for our study after the removal of duplicates. Following abstract screening, 114 full articles were assessed in their entirety, of which 107 were excluded. The remaining 7 studies were analyzed. The flowchart for the inclusion of studies in this systematic review is shown in Figure 1. The details of the studies are provided in Multimedia Appendix 2. Notably, a limited number of articles on automated text classification with small databases were found. Studies that met inclusion criteria reported different types of documents used for automated annotation. Social media content, such as forum posts in the study by Yu et al [22] and Twitter entries in the study by Balakrishnan et al [23], generated the largest data sets (5000 and 5453 items, respectively). Those textual entities consisted of complete or partial sentences manually written by users and were annotated in their entirety. The remaining types of documents were mainly medical records completed by physicians or health science professionals. No image or mathematical data were classified by the algorithms as part of these studies.

Figure 1. Flowchart depicting the process of study selection.

Algorithms Overview

Several algorithms have been used on the presented textual entities. Naive Bayes classifier, decision tree–based algorithms, support vector machine (SVM) classifiers, and combinations of multiple algorithms were the main strategies used by the included studies. The number of categories for text classification ranged from 4-7 and overall precision classification ranged from 77.0%-91.8%. For the studies that included multiple algorithms, SVM-based algorithms demonstrated the best accuracy in 5 of 7 studies.

Naive Bayes Classifier

A naive Bayes classifier is a probabilistic classifier that makes use of Bayes’ theorem to classify items into different categories [12]. This type of classifier achieves average performance in the context of supervised learning [24]. It is advantageous when few data are available, as it can be optimally parameterized in the event of a small data set [25]. This algorithm assumes that the predictors are independent of one another. For text classification, Balakrishnan et al [23] outlined that this algorithm works best when using each word as a variable that needs to be classified.
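To make the word-as-variable idea concrete, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. The toy utterances, the theme labels, and the function names (`train_nb`, `predict_nb`) are illustrative assumptions; they are not taken from any of the included studies and do not reproduce their implementations.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; each word is treated as one feature."""
    return text.lower().split()

def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log posterior, assuming independent words."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen word/class pairs from zeroing the product
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data (not from the reviewed studies)
docs = [
    "i feel sad and alone",
    "i feel hopeless and sad",
    "i am proud of my progress",
    "i feel proud and hopeful",
]
labels = ["negative_emotion", "negative_emotion", "self_esteem", "self_esteem"]
model = train_nb(docs, labels)
prediction = predict_nb(model, "sad and hopeless today")
```

With only four training sentences, every probability is estimated from a handful of counts, which is the situation the smoothing term is meant to handle.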

Decision Tree–Based Classifiers

Decision tree–based classifiers are nonparametric supervised learning methods that can be used to classify items [26]. Observations about an item are represented as branches and conclusions about an item's value (score) are represented as leaves [27]. Splitting across the different branches is based on defined rules according to the categories used to classify the items. In text classification, the general idea is that every piece of text being classified is split across the branches until it reaches a leaf (category) based on probabilistic rules set by the designer of the tree [27].
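The branch-and-leaf structure can be illustrated with a small sketch in plain Python. Here each branch asks whether a given word is present in the text, and each leaf is a category. As an assumption made for this example, the splitting rules are learned from the data by minimizing Gini impurity rather than being set by hand; the toy sentences, labels, and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split on the word whose presence or absence lowers impurity most."""
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)  # leaf: a category
    best = None
    for word in sorted(set().union(*rows)):
        yes = [l for r, l in zip(rows, labels) if word in r]
        no = [l for r, l in zip(rows, labels) if word not in r]
        if not yes or not no:
            continue  # this word does not split the node
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if best is None or score < best[0]:
            best = (score, word)
    if best is None:
        return majority(labels)
    word = best[1]
    yes_idx = [i for i, r in enumerate(rows) if word in r]
    no_idx = [i for i, r in enumerate(rows) if word not in r]
    return {
        "word": word,  # branch: test for this word
        "yes": build_tree([rows[i] for i in yes_idx],
                          [labels[i] for i in yes_idx], depth + 1, max_depth),
        "no": build_tree([rows[i] for i in no_idx],
                         [labels[i] for i in no_idx], depth + 1, max_depth),
    }

def classify(tree, text):
    """Follow branches until a leaf (category) is reached."""
    words = set(text.lower().split())
    while isinstance(tree, dict):
        tree = tree["yes"] if tree["word"] in words else tree["no"]
    return tree

# Hypothetical toy data (not from the reviewed studies)
docs = ["i feel sad today", "so sad and tired", "i am angry at him",
        "angry and upset again", "i feel calm now", "calm and rested today"]
labels = ["sadness", "sadness", "anger", "anger", "calm", "calm"]
tree = build_tree([set(d.split()) for d in docs], labels)
```

On this toy set, the learned tree first tests for "angry" and then for "calm"; any text containing neither word falls through to the remaining category, which shows how brittle such rules can be when very little data is available.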

SVM Classifiers

SVM classifiers can be used in both supervised and unsupervised learning contexts. In simple terms, these classifiers use the concept of a hyperplane that divides a data set into classes. A hyperplane in an n-dimensional Euclidean space is a flat, n–1 dimensional subset of that space that divides the space into two disconnected parts [28]. The items in the data set are treated as data points in that space. The item being classified is therefore assigned to the class that corresponds to the side of the hyperplane on which it falls.
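The hyperplane idea can be sketched in a few lines of plain Python. The example below fits a linear SVM for two classes by subgradient descent on the regularized hinge loss and then classifies a point according to the sign of w·x + b. The two-dimensional toy points, the hyperparameters (`lam`, `epochs`, `lr`), and the function names are assumptions chosen for illustration; a task with 4 to 7 categories would additionally need a multiclass scheme such as one-vs-rest, which is not shown.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200, lr=0.1):
    """Fit a hyperplane w.x + b = 0 by subgradient descent on the hinge loss.

    labels must be +1 or -1.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin: move the hyperplane
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only the regularizer shrinks w
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data in 2 dimensions
points = [(2, 2), (3, 3), (2, 3), (0, 0), (1, 0), (0, 1)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
```

For text, each point would be a document vector with one dimension per vocabulary term, so the hyperplane lives in a space with as many dimensions as there are features.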

Outcomes

In the 7 identified studies, SVM classifiers and algorithms combined with SVM classifiers tended to achieve the best prediction score (in %) as compared to other algorithms for small data sets. Studies by Zolnoori et al [29], Singh et al [30], and Yu et al [22] reported prediction scores of SVM classifiers that were superior to other classifiers for their data sets. Their precision scores ranged from 77%-90%. Only 3 studies attempted to compare the classification done by the classifiers with human annotators. The statistics used to assess these automated annotations were κ and pairwise agreements. The interrater agreement of these studies was comparable to interrater agreements for annotation done by human annotators; the κ scores were 0.84 [23], 0.67 [30], and 0.86 [29], respectively.
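For readers unfamiliar with the κ statistic, Cohen's κ compares the observed agreement between two annotators, p_o, with the agreement expected by chance, p_e, as κ = (p_o − p_e) / (1 − p_e). The sketch below computes it in plain Python for a human annotator and an automated classifier; the two label sequences are invented for illustration and are not data from the included studies.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies, summed over labels
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of 10 text segments into 3 themes
human = ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"]
machine = ["A", "A", "B", "B", "C", "C", "A", "B", "C", "B"]
kappa = cohen_kappa(human, machine)
```

In this toy case the raters agree on 9 of 10 items (p_o = 0.90) and the chance agreement is p_e = 0.33, which gives κ ≈ 0.85.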

Discussion

Review of Findings

In this study, we conducted a systematic review to identify potential algorithms that could be useful for small databases for the automatic annotation of unannotated interview transcripts from the field of psychotherapy. The systematic review we conducted demonstrated that limited literature exists on the subject. However, a few algorithms displayed sufficient accuracy when performing text classification on small databases. SVM classifiers tended to display the best accuracy in the context of small databases.

Compared to other reviews on the subject, this study highlights algorithms being used in the context of small data sets, which is consistent with the reality of studies of therapies [31], as transcribing therapy sessions is time-consuming and demanding. Regarding novel therapy developments, such as virtual reality–based therapy, this is even more needed considering the small number of patients that have received these treatments so far [32]. Therapy usually involves a wider range of words and contextual sentences compared to other areas of medicine where specific words (eg, symptoms, signs) can be used to facilitate classification. Therefore, it is not surprising to see that this systematic review identified algorithms that differ from those that are widely used in other medical fields. For example, Srivastava et al [33] reviewed the efficiency of different text classifiers in the context of social media posts referring to medical content. They found that a multilayer perceptron–based neural network performed best in their study as compared to an SVM classifier. Another study, conducted by Visweswaran and colleagues [34], identified convolutional long short-term memory neural networks as the best at predicting vaping habits. This can be explained by the fact that most classifiers are combined with a vectorizer when used to classify textual entities. A vectorizer transforms text into a meaningful number vector that can then be used by classifiers [35]. Considering that classification of textual entities to identify a specific diagnosis or medical condition usually requires specific terms that pertain to the diagnosis or condition, vectors tend to discriminate better between the textual entities of these fields [36]. This is usually not the case with therapy transcripts in the context of analysis of the psychotherapeutic process, as this analysis often requires a larger array of categories that can sometimes overlap.
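As an illustration of what a vectorizer does, the sketch below builds term frequency–inverse document frequency (TF-IDF) vectors in plain Python. The weighting formula used here (raw term count multiplied by log(N/df) + 1) is one common variant and is an assumption of this example, as are the toy sentences and function names; the reviewed studies may have used other vectorizers.

```python
import math
from collections import Counter

def fit_tfidf(docs):
    """Learn the vocabulary and an inverse document frequency weight per term."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears
    df = Counter(w for doc in tokenized for w in set(doc))
    idf = {w: math.log(n_docs / df[w]) + 1.0 for w in vocab}
    return vocab, idf

def transform(text, vocab, idf):
    """Map a text to a numeric vector with one dimension per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[w] * idf[w] for w in vocab]

# Hypothetical toy corpus (not from the reviewed studies)
docs = [
    "patient reports chest pain",
    "patient reports low mood",
    "patient feels low and tired",
]
vocab, idf = fit_tfidf(docs)
vector = transform("patient reports chest pain", vocab, idf)
```

A term that appears in every document ("patient") receives the lowest weight, whereas a term specific to one document ("chest") receives a higher one. This is why condition-specific vocabularies tend to produce vectors that separate categories well, and why the broader, overlapping vocabulary of therapy transcripts is harder to separate.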

In contrast with other types of medical data—such as imagery or numerical entities (eg, laboratory results)—where neural networks seem to be the most used class of algorithms for classification, textual classification appears to be performed with a more restricted number of classifiers [37]. This can be explained by the fact that text classification requires additional considerations. Automated classifications lack the ability to interpret a sentence out of a given context (eg, a therapeutic session), while the meaning of a sentence could change based on the context. Another complexity is that words can refer to different entities based on the sociocultural context. Therefore, considering such complexities can require further parameterizations and considerations, which may also explain why, in the identified studies, the same algorithm used on data sets of a similar size could have a diverging predictive score.

Consistent with our findings, linear SVM classifiers tend to be regarded as one of the best text classifying algorithms in the literature [38]. Many types of classifiers are available, but it appears that only a few are consistently used for the classification of textual entities [26]. This is consistent with our review, as the identified studies tended to use similar strategies when classifying textual entities. A recent literature review on data classification of clinical text data explains this phenomenon by the fact that there is a bottleneck of annotations in the context of supervised learning [39].

Limitations

This systematic review of literature focuses on the fields of psychiatry, psychology, and social sciences to reflect the type of textual entities usually found in therapy transcripts. A limitation of this study is the small number of classification algorithm studies published in these fields. As this is an emerging domain, the number of studies on the topic should increase in the future.

Conclusions

Machine learning can be beneficial for the field of psychiatry. Automated text classification for psychotherapy is a promising avenue to generate quantitative and qualitative data in an efficient way to make the data readily available for analyses. SVM classifiers appear to be preferred over other types of classifiers in the context of small databases. Using such classifiers could be useful in the evaluation of therapeutic processes of novel therapies where data are limited. Nevertheless, the limited number of articles found on the subject outlines the need for more development in this field, especially regarding the use of such classifiers in the domain of mental health.

Multimedia Appendix 1

Electronic search strategy for the systematic review conducted.

Multimedia Appendix 2

Detailed results of the systematic review study selection.

Abbreviations

SVM: support vector machine

Acknowledgments

This study was funded by Le Fonds de recherche du Québec – Santé (FRQS) and Services et recherches psychiatriques AD.

Authors' Contributions

The study was designed by AH, SP, and AD. Statistical analyses were performed by AH and MB. All the authors have made substantial contributions and have revised, edited, and approved the manuscript.

Conflicts of Interest

None declared.

References

1. Ewbank, Cummins, Tablan, Bateup, Catarino, Martin, Blackwell. Quantifying the Association Between Psychotherapy Content and Clinical Outcomes Using Deep Learning. JAMA Psychiatry 2020 Jan 01;77(1):35-43. doi:10.1001/jamapsychiatry.2019.2664. PMID 31436785. pii 2748757. PMCID PMC6707006.

2. Cook, Schwartz, Kaslow. Evidence-Based Psychotherapy: Advantages and Challenges. Neurotherapeutics 2017 Jul 26;14(3):537-545. doi:10.1007/s13311-017-0549-4. PMID 28653278. PMCID PMC5509639.

3. Hill, Chui, Baumann. Revisiting and reenvisioning the outcome problem in psychotherapy: An argument to include individualized and qualitative measurement. In: Kazdin, editor. Methodological issues and strategies in clinical research (4th ed). Washington, DC: American Psychological Association; 2016:373-386.

4. Szymańska, Dobrenko, Grzesiuk. Characteristics and experience of the patient in psychotherapy and the psychotherapy’s effectiveness. A structural approach. Psychiatr Pol 2017;51(4):619-631. doi:10.12740/pp/62483.

5. Perepletchikova. On the topic of treatment integrity. Clinical Psychology: Science and Practice 2011 Jun;18(2):148-153. doi:10.1111/j.1468-2850.2011.01246.x.

6. Sebastiani. Machine learning in automated text categorization. ACM Comput Surv 2002 Mar;34(1):1-47. doi:10.1145/505282.505283.

7. Abbe, Grouin, Zweigenbaum, Falissard. Text mining applications in psychiatry: a systematic literature review. Int J Methods Psychiatr Res 2016 Jun 17;25(2):86-100. doi:10.1002/mpr.1481. PMID 26184780. PMCID PMC6877250.

8. Khalid, Goldenberg, Grantcharov, Taati, Rudzicz. Evaluation of Deep Learning Models for Identifying Surgical Actions and Measuring Performance. JAMA Netw Open 2020 Mar 02;3(3):e201664. doi:10.1001/jamanetworkopen.2020.1664. PMID 32227178. pii 2763474.

9. Tang, Chappell, Mazzoli, Tewari, Choi, Wiens. Predicting Acute Graft-Versus-Host Disease Using Machine Learning and Longitudinal Vital Sign Data From Electronic Health Records. JCO Clinical Cancer Informatics 2020 Sep;4:128-135. doi:10.1200/cci.19.00105.

10. Høglend. Exploration of the patient-therapist relationship in psychotherapy. Am J Psychiatry 2014 Oct;171(10):1056-66. doi:10.1176/appi.ajp.2014.14010121. PMID 25017093. pii 1888993.

11. Durstewitz, Koppe, Meyer-Lindenberg. Deep neural networks in psychiatry. Mol Psychiatry 2019 Feb 15;24(11):1583-1598. doi:10.1038/s41380-019-0365-9.

12. Gupta, Katarya. Social media based surveillance systems for healthcare using machine learning: A systematic review. Journal of Biomedical Informatics 2020 Aug;108:103500. doi:10.1016/j.jbi.2020.103500.

13. Vora, Yang. A Comprehensive Study of Eleven Feature Selection Algorithms and their Impact on Text Classification. In: 2017 Computing Conference. 2017 Presented at: 2017 Computing Conference; July 18-20, 2017; London, UK. p. 440-449. doi:10.1109/sai.2017.8252136.

14. Deo. Machine Learning in Medicine. Circulation 2015 Nov 17;132(20):1920-30. doi:10.1161/CIRCULATIONAHA.115.001593. PMID 26572668. pii CIRCULATIONAHA.115.001593. PMCID PMC5831252.

15. Cao, Meyer-Lindenberg, Schwarz. Comparative Evaluation of Machine Learning Strategies for Analyzing Big Data in Psychiatry. Int J Mol Sci 2018 Oct 29;19(11):3387. doi:10.3390/ijms19113387. PMID 30380679. pii ijms19113387. PMCID PMC6274760.

16. Kowsari, Jafari Meimandi, Heidarysafa, Mendu, Barnes, Brown. Text Classification Algorithms: A Survey. Information 2019 Apr 23;10(4):150. doi:10.3390/info10040150.

17. Hämäläinen, Vinni. Comparison of Machine Learning Methods for Intelligent Tutoring Systems. In: Intelligent Tutoring Systems. 2006 Presented at: ITS 2006; June 26-30, 2006; Jhongli, Taiwan. p. 525-534. doi:10.1007/11774303_52.

18. Wanigasekara, Swain, Nguang, Prusty. Improved Learning from Small Data Sets Through Effective Combination of Machine Learning Tools with VSG Techniques. In: International Joint Conference on Neural Networks. 2018 Presented at: International Joint Conference on Neural Networks; 2018; Rio, Brazil.

19. Shiner, D'Avolio, Nguyen, Zayed, Watts, Fiore. Automated classification of psychotherapy note text: implications for quality assessment in PTSD care. J Eval Clin Pract 2012 Jun;18(3):698-701. doi:10.1111/j.1365-2753.2011.01634.x. PMID 21668796. PMCID PMC4539242.

20. Slonim, Tishby. The Power of Word Clusters for Text Classification. In: 23rd European Colloquium on Information Retrieval Research. 2001 Jan 12 Presented at: 23rd European Colloquium on Information Retrieval Research; 2001; Darmstadt, Germany.

21. Joachims. Transductive inference for text classification using support vector machines. ICML 1999. URL: http://www1.cs.columbia.edu/~dplewis/candidacy/joachims99transductive.pdf [accessed 2020-06-15].

22. Chan, Lin. Mining association language patterns using a distributional semantic model for negative life event classification. J Biomed Inform 2011 Aug;44(4):509-18. doi:10.1016/j.jbi.2011.01.006. PMID 21292030. pii S1532-0464(11)00008-6.

23. Balakrishnan, Khan, Arabnia. Improving cyberbullying detection using Twitter users’ psychological features and machine learning. Computers & Security 2020 Mar;90:101710. doi:10.1016/j.cose.2019.101710.

24. Zhang, Gao. An Improvement to Naive Bayes for Text Classification. Procedia Engineering 2011;15:2160-2164. doi:10.1016/j.proeng.2011.08.404.

25. Huang. Naive Bayes classification algorithm based on small sample set. 2011 Presented at: IEEE International Conference on Cloud Computing and Intelligence Systems; 2011; Beijing, China. doi:10.1109/ccis.2011.6045027.

26. Vijayan, Bindu, Parameswaran. A comprehensive study of text classification algorithms. 2017 Presented at: International Conference on Advances in Computing, Communications and Informatics (ICACCI); 2017; Udipi, India. doi:10.1109/icacci.2017.8125990.

27. Kamiński, Jakubczyk, Szufel. A framework for sensitivity analysis of decision trees. Cent Eur J Oper Res 2018 May 24;26(1):135-159. doi:10.1007/s10100-017-0479-6. PMID 29375266. pii 479. PMCID PMC5767274.

28. Noble. What is a support vector machine? Nat Biotechnol 2006 Dec;24(12):1565-7. doi:10.1038/nbt1206-1565. PMID 17160063. pii nbt1206-1565.

29. Zolnoori, Fung, Patrick, Fontelo, Kharrazi, Faiola, YSS, Eldredge, Luo, Conway, Zhu, Park, Moayyed, Goudarzvand. A systematic approach for developing a corpus of patient reported adverse drug events: A case study for SSRI and SNRI medications. J Biomed Inform 2019 Feb;90:103091. doi:10.1016/j.jbi.2018.12.005. PMID 30611893. pii S1532-0464(19)30001-2.

30. Singh, Shrivastava, Bouayad, Padmanabhan, Ialynytchev, Schultz. Machine learning for psychiatric patient triaging: an investigation of cascading classifiers. J Am Med Inform Assoc 2018 Nov 01;25(11):1481-1487. doi:10.1093/jamia/ocy109. PMID 30380082. pii 5149328. PMCID PMC6213089.

31. Hartmann, Huppertz, Schamp, Heitmann. Comparing automated text classification methods. International Journal of Research in Marketing 2019 Mar;36(1):20-38. doi:10.1016/j.ijresmar.2018.09.009.

32. Fodor, Coteț, Cuijpers, Szamoskozi, David, Cristea. The effectiveness of virtual reality based interventions for symptoms of anxiety and depression: A meta-analysis. Sci Rep 2018 Jul 09;8(1):10323. doi:10.1038/s41598-018-28113-6. PMID 29985400. PMCID PMC6037699.

33. Srivastava, Singh, Suri. Healthcare Text Classification System and its Performance Evaluation: A Source of Better Intelligence by Characterizing Healthcare Text. J Med Syst 2018 Apr 13;42(5):97. doi:10.1007/s10916-018-0941-6. PMID 29654417.

34. Visweswaran, Colditz, O'Halloran, Han, Taneja, Welling, Chu, Sidani, Primack. Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study. J Med Internet Res 2020 Aug 12;22(8):e17478. doi:10.2196/17478. PMID 32784184. pii v22i8e17478. PMCID PMC7450367.

35. Shahmirzadi, Lugowski, Younge. Text Similarity in Vector Space Models: A Comparative Study. 2017 Presented at: 18th IEEE International Conference On Machine Learning And Applications (ICMLA); 2019; Boca Raton, Florida. doi:10.1109/icmla.2019.00120.

36. Khattak, Jeblee, Pou-Prom, Abdalla, Meaney, Rudzicz. A survey of word embeddings for clinical text. J Biomed Inform 2019 Dec;100S:100057. doi:10.1016/j.yjbinx.2019.100057. PMID 34384583. pii S2590-177X(19)30056-3.

37. Yadav, Jadhav. Deep convolutional neural network based medical image classification for disease diagnosis. J Big Data 2019 Dec 17;6(1):113. doi:10.1186/s40537-019-0276-2.

38. Agnihotri, Verma, Tripathi. An automatic classification of text documents based on correlative association of words. J Intell Inf Syst 2017 Aug 14;50(3):549-572. doi:10.1007/s10844-017-0482-3.

39. Spasic, Nenadic. Clinical Text Data in Machine Learning: Systematic Review. JMIR Med Inform 2020 Mar 31;8(3):e17984. doi:10.2196/17984. PMID 32229465. pii v8i3e17984. PMCID PMC7157505.