This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on http://mental.jmir.org/, as well as this copyright and license information must be included.
We recently described a new questionnaire to monitor mood called Mood Zoom (MZ). MZ comprises 6 items assessing mood symptoms on a 7-point Likert scale; we had previously used standard principal component analysis (PCA) to tentatively understand its properties, but the presence of multiple nonzero loadings obstructed the interpretation of its latent variables.
The aim of this study was to rigorously investigate the internal properties and latent variables of MZ using an algorithmic approach that may lead to more interpretable results than PCA. Additionally, we explored three other widely used psychiatric questionnaires to investigate latent variable structure similarities with MZ: (1) Altman self-rating mania scale (ASRM), assessing mania; (2) quick inventory of depressive symptomatology (QIDS) self-report, assessing depression; and (3) generalized anxiety disorder (7-item) (GAD-7), assessing anxiety.
We elicited responses from 131 participants: 48 with bipolar disorder (BD), 32 with borderline personality disorder (BPD), and 51 healthy controls (HC), with data collected longitudinally (median [interquartile range, IQR]: 363 [276] days). Participants were requested to complete ASRM, QIDS, and GAD-7 weekly (all 3 questionnaires were completed on the Web) and MZ daily (using a custom-based smartphone app). We applied sparse PCA (SPCA) to determine the latent variables for the four questionnaires, where a small subset of the original items contributes toward each latent variable.
We found that MZ was highly consistent across the three cohorts studied. Three main principal components were derived using SPCA, which can be tentatively interpreted as (1) anxiety and sadness, (2) positive affect, and (3) irritability. The MZ principal component comprising anxiety and sadness explains most of the variance in BD and BPD, whereas the positive affect of MZ explains most of the variance in HC. The latent variables in ASRM were identical for the patient groups but different for HC; nevertheless, the latent variables shared common items across both the patient groups and HC. In contrast, QIDS had overall very different principal components across groups; sleep was a key element in HC and BD but was absent in BPD. In GAD-7, nervousness was the principal component explaining most of the variance in BD and HC.
This study has important implications for understanding self-reported mood. MZ has a consistent, intuitively interpretable latent variable structure and hence may be a good instrument for generic mood assessment. Irritability appears to be the key distinguishing latent variable between BD and BPD and might be useful for differential diagnosis. Anxiety and sadness are closely interlinked, a finding that might inform treatments that jointly address these covarying symptoms. Anxiety and nervousness appear to be among the cardinal latent variable symptoms in BD and merit close attention in clinical practice.
Regular monitoring of symptom severity and disease progression in mental disorders is widely encouraged in treatment guidelines [
One approach toward PROMs is to develop generic instruments capturing universal outcomes that are relevant across a wide range of diseases and conditions such as pain and fatigue. This motivated the development of the patient-reported outcomes measurement information system (PROMIS), an instrument for self-reporting physical, mental, and social health aspects in the general population [
In this study, we focus on mining PROMs using disease-specific clinical scales to better understand the underlying symptoms in bipolar disorder (BD) and borderline personality disorder (BPD), comparing findings against healthy controls (HC). BD is characterized by recurrent alternating periods of elated mood (known as mania or hypomania, depending on symptom severity) and depression, which is usually more common [
A critical aspect of understanding PROMs is deciphering the underlying structure inherent in the questionnaires eliciting the participants’ responses, that is, identifying characteristics (latent variables) that are not directly observed through the items in the questionnaires but are inferred through algorithmic processing of the observed items. One of the main advantages of using latent variables is that they explain most of the data using a few variables that may be tentatively interpretable. They comprise items grouped together, thus indicating which symptoms may be related. Hence, latent variables might offer additional insight into the underlying mood symptoms and suggest new directions for clinical assessment and care.
The aims of this study were to: (1) explore the latent variable structure of a recently introduced psychiatric questionnaire known as Mood Zoom (MZ) [
The data were collected as part of a large ongoing research project known as automated monitoring of symptom severity (AMoSS) [
We excluded data from participants who either withdrew consent (1 participant) or completed participation without providing at least two months of useful data for all questionnaires (9 participants). We processed data from 131 participants, 120 of whom had provided data for at least three months, and 108 of whom had provided data for at least 12 months. All participants gave written informed consent to participate in the study. All patient participants were screened by an experienced psychiatrist (KEAS) using the structured clinical interview for diagnostic and statistical manual of mental disorders, 4th edition (DSM-IV) and the borderline items of the international personality disorder examination (IPDE) [
The participants reported their mood on a weekly basis using three validated questionnaires: (1) Altman self-rating mania scale (ASRM) [
ASRM comprises 5 items: (1) mood, (2) self-confidence, (3) sleep disturbance, (4) speech, and (5) activity. Items are scored on a scale from 0 (symptom-free) to 4 (present nearly all the time), and the total ASRM score is computed by summing the 5 item scores, giving a range of 0 to 20. Miller et al [
QIDS comprises 16 items, where each item is scored on a scale from 0 (symptom-free) to 3. The items map onto 9 DSM-IV symptom criteria domains for depression: (1) sad mood, (2) concentration, (3) self-criticism, (4) suicidal ideation, (5) loss of interest, (6) energy or fatigue, (7) sleep disturbance, (8) changes in appetite or weight, and (9) psychomotor agitation or retardation. Each domain is either the highest score of a subset of the 16 QIDS items or one of the original QIDS items; see Rush et al for details [
GAD-7 comprises 7 items, which are scored on a scale from 0 (symptom-free) to 3 (nearly every day), with total scores ranging from 0 to 21. Kroenke et al [
MZ comprises 6 items: (1) anxious, (2) elated, (3) sad, (4) angry, (5) irritable, and (6) energetic. Each item is scored on a Likert scale ranging from 1 (“not at all”) to 7 (“very much”). Throughout the study, participants were prompted to complete MZ daily in the evening at a chosen, prespecified time.
We constructed 4 data matrices, one for each questionnaire, to hold the data for subsequent processing. We then worked independently on each of those 4 matrices to determine the properties of each questionnaire.
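The collapse from 16 QIDS items to 9 domain scores described above can be sketched as follows. This is a minimal illustration, not the scoring code used in the study: the function name `qids_domain_scores` and the item-to-domain grouping in `QIDS_DOMAINS` are assumptions made for the example and should be checked against Rush et al before any real use.

```python
# Hypothetical item-to-domain grouping (1-based QIDS item numbers).
# This mapping is an assumption for illustration; verify it against Rush et al.
QIDS_DOMAINS = {
    "sleep": [1, 2, 3, 4],
    "sad_mood": [5],
    "appetite_or_weight": [6, 7, 8, 9],
    "concentration": [10],
    "self_view": [11],
    "suicidal_ideation": [12],
    "interest": [13],
    "energy": [14],
    "psychomotor": [15, 16],
}


def qids_domain_scores(item_scores):
    """Collapse 16 item scores (each 0 to 3) into 9 domain scores.

    Each domain score is the highest score among the items assigned to it,
    so single-item domains simply keep the original item score.
    """
    if len(item_scores) != 16:
        raise ValueError("expected 16 QIDS item scores")
    if any(not 0 <= s <= 3 for s in item_scores):
        raise ValueError("each item score must lie between 0 and 3")
    return {
        domain: max(item_scores[i - 1] for i in items)
        for domain, items in QIDS_DOMAINS.items()
    }
```

Applied to one completed questionnaire, the function returns a dictionary with 9 entries, which corresponds to one row of the 9-column QIDS data matrix.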
For ASRM we used a 5719×5 data matrix. There were 2363 samples for BD, 1298 samples for BPD, and 2058 samples for HC.
For QIDS we used a 4871×9 data matrix. There were 2054 samples for BD, 1099 samples for BPD, and 1718 samples for HC.
For GAD-7 we used a 5652×7 data matrix. There were 2208 samples for BD, 1389 samples for BPD, and 2055 samples for HC.
For MZ we used a 44725×6 data matrix (44725 samples and 6 items). There were 17317 samples for BD, 11120 samples for BPD, and 16288 samples for HC.
Any missing entries (~20% as we reported in our previous study [
Summary of the key demographics of participants in automated monitoring of symptom severity (AMoSS).
Characteristic  Bipolar disorder  Borderline personality disorder  Healthy controls
Originally recruited  53  34  54
Processed data from  48  32  51
Days in study, median (IQR^{a}; range)  365 (325; 69-867)  364 (194; 81-858)  363 (191; 80-651)
Age (years), median (IQR; range)  38 (19; 18-64)  34 (14; 21-56)  37 (20; 19-63)
Gender (male)  17  2  18
Unemployed  7  15  6
Any psychotropic medication  47  23  0
Lithium  19  0  0
Anticonvulsant  19  1  0
Antipsychotic  33  6  0
Antidepressants  17  23  0
Hypnotics  3  2  0
^{a}IQR: interquartile range.
Before processing the data, we standardized entries to account for individual reporting bias so that they are directly comparable across participants. This preprocessing step was deemed necessary because the same level of mood may be assigned a different item score by different participants, and hence the raw item scores are not directly comparable across participants. Therefore, for each questionnaire and each participant, we subtracted from each item entry that participant's mean value for the item. Effectively, this transformed the discrete data matrices into continuous data matrices. This step is particularly useful in combination with the latent variable structure approach described below.
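The per-participant centering step can be sketched as below. This is a minimal illustration under stated assumptions: the input layout (a list of `(participant_id, scores)` pairs) and the function name `center_per_participant` are hypothetical and are not taken from the study's own code.

```python
from collections import defaultdict


def center_per_participant(rows):
    """Subtract each participant's own item means from their responses.

    rows: list of (participant_id, [item scores]) tuples, one tuple per
    completed questionnaire. Returns a list of the same shape in which
    every score is expressed relative to that participant's mean for the item.
    """
    sums = {}
    counts = defaultdict(int)
    for pid, scores in rows:
        if pid not in sums:
            sums[pid] = [0.0] * len(scores)
        for j, score in enumerate(scores):
            sums[pid][j] += score
        counts[pid] += 1

    # Per-participant, per-item mean.
    means = {
        pid: [total / counts[pid] for total in totals]
        for pid, totals in sums.items()
    }
    return [
        (pid, [score - mean for score, mean in zip(scores, means[pid])])
        for pid, scores in rows
    ]
```

For example, a participant who answers `[1, 7]` and `[3, 5]` on two occasions has item means `[2, 6]`, so the centered responses become `[-1, 1]` and `[1, -1]`. The integer scores turn into real-valued deviations, which is why the data matrices become continuous.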
Given a data matrix
The mathematical approaches to achieve this can be generally divided into linear and nonlinear methods, depending on how the original variables in the data matrix are combined to derive the latent variables. Although sophisticated nonlinear methods may work well in complicated toy problems, they are often more difficult to interpret than some standard linear projection techniques (which in many practical settings may also work very well). One of the most widely used methods for detecting the latent variable structure of a data matrix is principal component analysis (PCA) [
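$$
\begin{aligned}
P_1 &= l_{11}x_1 + l_{12}x_2 + \dots + l_{1M}x_M \\
P_2 &= l_{21}x_1 + l_{22}x_2 + \dots + l_{2M}x_M \\
&\;\;\vdots \\
P_M &= l_{M1}x_1 + l_{M2}x_2 + \dots + l_{MM}x_M
\end{aligned}
$$
In the equation, $P_1 \dots P_M$ are the principal components, $x_1 \dots x_M$ are the items in each questionnaire, and $l_{ij}$ are the loadings, that is, the weight of item $x_j$ in principal component $P_i$.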
In practice, each principal component is a linear combination of all the original variables; that is, the loadings are generally nonzero, and therefore the interpretation of the resulting principal components may be challenging. Ideally the structure (ie, collectively the loadings) should be simple, comprising a few nonzero entries associating a small subset of the variables in subset of the
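To make the notion of loadings concrete, the following sketch computes the first principal component of a small data matrix. It is an illustration only, written in plain Python with power iteration rather than a numerical library; the function names and the toy data are assumptions for the example, not the study's implementation.

```python
import math


def covariance_matrix(data):
    """Sample covariance of the columns of data (rows = samples, columns = items)."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in data]
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
        for a in range(m)
    ]


def first_component(cov, iterations=500):
    """Loadings and variance of the first principal component.

    Power iteration: repeatedly multiply a vector by the covariance matrix
    and renormalize. Assumes cov is nonzero and the start vector is not
    orthogonal to the leading eigenvector.
    """
    m = len(cov)
    w = [float(j + 1) for j in range(m)]
    for _ in range(iterations):
        v = [sum(cov[a][b] * w[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    variance = sum(
        w[a] * sum(cov[a][b] * w[b] for b in range(m)) for a in range(m)
    )
    return w, variance


def variance_explained(cov, variance):
    """Percentage of the total variance captured by one component."""
    return 100.0 * variance / sum(cov[j][j] for j in range(len(cov)))
```

With two perfectly correlated items, for instance `[[1, 1], [2, 2], [3, 3], [4, 4]]`, both loadings are equal and the single component explains all of the variance. In real questionnaire data every loading is typically nonzero, which is the interpretability problem described above.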
In this study, we followed the methodology proposed in Hein and Buehler [
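The idea of a sparse loading vector can be illustrated with a much simpler technique than the one followed in this study. The sketch below is not the Hein and Buehler method; it is a truncated power iteration, swapped in purely for illustration, that keeps only the `k` largest-magnitude loadings at each step. The function name and the parameter `k` are assumptions for the example.

```python
import math


def sparse_component(cov, k, iterations=200):
    """Unit-norm loading vector with at most k nonzero entries.

    Power iteration on the covariance matrix cov in which, after every
    multiplication, all but the k largest-magnitude loadings are set to zero.
    Assumes cov is a nonzero covariance matrix.
    """
    m = len(cov)
    # Start from the single item with the largest variance.
    start = max(range(m), key=lambda j: cov[j][j])
    w = [1.0 if j == start else 0.0 for j in range(m)]
    for _ in range(iterations):
        v = [sum(cov[a][b] * w[b] for b in range(m)) for a in range(m)]
        keep = set(sorted(range(m), key=lambda j: abs(v[j]), reverse=True)[:k])
        v = [v[j] if j in keep else 0.0 for j in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    return w
```

For a covariance matrix in which two items covary strongly and a third is nearly independent, asking for `k=2` yields equal loadings on the two related items and an exact zero on the third, which is the kind of structure reported in the sparse tables.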
We computed the densities using kernel density estimation with Gaussian kernels to visualize the differences in the latent variables for the three cohorts and used the 2-sample Kolmogorov-Smirnov goodness-of-fit statistical hypothesis test to determine whether the distributions are statistically significantly different. We tested the null hypothesis that the random samples are drawn from the same underlying continuous distribution.
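Both steps can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the bandwidth is passed in by the caller because the source does not state how it was chosen, and only the Kolmogorov-Smirnov statistic is computed here. A P value would normally come from a statistics library, for example `scipy.stats.ks_2samp`.

```python
import math
from bisect import bisect_right


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of sample at a point x.

    The estimate is the average of Gaussian kernels centered on each
    observation; bandwidth is the kernel standard deviation.
    """
    n = len(sample)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return scale * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


def ks_two_sample_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

The statistic is 0 for identical samples and 1 for samples that do not overlap at all; larger values give stronger evidence against the null hypothesis that both samples come from the same distribution.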
Next, we wanted to quantify the difference in the distributions of the principal components for the different groups. The computation of effect sizes is one widely used approach to quantify these differences, but relies on having Gaussian distributions which is not necessarily the case here. A more generic methodology to quantify differences between two distributions relies on the divergence metrics [
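As an illustration of a divergence measure that does not assume Gaussian distributions, the sketch below computes a symmetric Kullback-Leibler divergence between two discrete distributions, such as densities evaluated on a common grid and normalized to sum to 1. The symmetrization used here, the sum of the two directed divergences, is an assumption; other conventions (for example, the average) exist, and the exact variant used in the study is not shown in this text.

```python
import math


def kl_divergence(p, q):
    """Directed Kullback-Leibler divergence D(p || q) in nats.

    p and q are sequences of probabilities over the same bins.
    Returns infinity if q assigns zero mass where p does not.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a zero-probability bin contributes nothing
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def symmetric_kl(p, q):
    """Sum of the two directed divergences, D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Unlike the directed divergence, this quantity gives the same value regardless of which cohort is treated as the reference, which makes it convenient for pairwise cohort comparisons.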
In
Next, we applied SPCA on ASRM (
The latent variable structure of ASRM is not consistent across the 3 groups, but it is consistent for the psychiatric groups. Some of the computed latent variables are not easily interpretable: for example, it is not clear how we should interpret the latent variable consisting of the items “sleepy” and “talkative.” The “positive affect” in the ASRM latent variable reported in
Mood zoom (MZ) latent variable structure using standard principal component analysis (PCA).
MZ item  P1  P2  P3
Bipolar disorder (BD)
Anxious  0.52  0.10  0.81
Elated  −0.19  0.72  0.07
Sad  0.49  0.09  −0.05
Angry  0.45  0.17  −0.44
Irritable  0.47  0.19  −0.38
Energetic  −0.19  0.63  0.03
% total variance explained (cumulative)  57.8  77.2  84.6
Tentative interpretation  Negative affect  Positive affect
Borderline personality disorder (BPD)
Anxious  0.51  −0.01  0.39
Elated  −0.13  0.70  0.24
Sad  0.48  −0.24  0.56
Angry  0.48  0.24  −0.36
Irritable  0.51  0.27  −0.49
Energetic  −0.07  0.58  0.32
% total variance explained (cumulative)  48.9  69.6  81.2
Tentative interpretation  Negative affect  Positive affect
Healthy controls (HC)
Anxious  0.18  0.57  −0.06
Elated  0.74  −0.23  −0.63
Sad  0.15  0.50  −0.02
Angry  0.12  0.37  0.03
Irritable  0.12  0.46  0.05
Energetic  0.61  −0.17  0.77
% total variance explained (cumulative)  51.7  78.2  87.8
Tentative interpretation  Positive affect  Negative affect
Sparse mood zoom (MZ) latent variable structure.
MZ item  P1  P2  P3
Bipolar disorder (BD)
Anxious  0.75  0  0
Elated  0  −0.64  0
Sad  0.66  0  0
Angry  0  0  0.62
Irritable  0  0  0.79
Energetic  0  −0.77  0
% total variance explained (cumulative)  33.1  56.6  75.8
Tentative interpretation  Anxiety and sadness  Positive affect  Irritability
Borderline personality disorder (BPD)
Anxious  0.66  0  0
Elated  0  0  −0.71
Sad  0.75  0  0
Angry  0  0.67  0
Irritable  0  0.74  0
Energetic  0  0  −0.70
% total variance explained (cumulative)  31.5  54.9  74.7
Tentative interpretation  Anxiety and sadness  Irritability  Positive affect
Healthy controls (HC)
Anxious  0  0.73  0
Elated  −0.66  0  0
Sad  0  0.68  0
Angry  0  0  −0.59
Irritable  0  0  −0.81
Energetic  −0.75  0  0
% total variance explained (cumulative)  37.9  58.9  73.5
Tentative interpretation  Positive affect  Anxiety and sadness  Irritability
QIDS appears to have a very inconsistent structure when examined with SPCA. In most cases, it is not easy to interpret what the resulting principal components mean; this may reflect that the QIDS items are disjoint, and there is no clear underlying latent variable structure.
GAD-7, like QIDS, is not very consistent across the 3 cohorts. Moreover, some of the resulting latent variables are difficult to interpret, for example, the meaning of the principal component comprising the items “relaxed” and “restless.” Nevertheless, some of the latent variables across cohorts are consistent: the latent variable “nervousness” explains most of the variance in HC and BD. This is effectively the equivalent latent variable of MZ “anxiety and sadness” in
Sparse Altman self-rating mania (ASRM) scale latent variable structure.
ASRM item  P1  P2  P3
Bipolar disorder (BD)
Happy  0.65  0  −0.45
Confident  0  0  −0.89
Sleepy  0  0.92  0
Talkative  0  0.38  0
Active  0.76  0  0
% total variance explained (cumulative)  50.1  69.1  82
Tentative interpretation  Positive affect  Sleepy and talkative  Assertiveness
Borderline personality disorder (BPD)
Happy  0.57  0  −0.58
Confident  0  0  −0.81
Sleepy  0  0.88  0
Talkative  0  0.47  0
Active  0.82  0  0
% total variance explained (cumulative)  47.4  67  80.9
Tentative interpretation  Positive affect  Sleepy and talkative  Assertiveness
Healthy controls (HC)
Happy  0.90  0  0
Confident  0.44  0  0
Sleepy  0  0  0
Talkative  0  0.31  −0.95
Active  0  0.95  0.31
% total variance explained (cumulative)  39.7  66.2  79.9
Tentative interpretation  Assertiveness  Active and talkative  Quiet and active
We investigated whether the principal components could differentiate the 3 cohorts in the study, BD, BPD, and HC. Since only MZ has a consistent latent variable structure across all 3 cohorts, the comparisons are only reported for that questionnaire in
The densities of the principal components for the 3 cohorts are presented in
We summarized the MZ latent variable values and quantified the differences between pairs of distributions using the symmetric Kullback-Leibler divergence in
Overall, the findings in
Sparse quick inventory of depressive symptomatology (QIDS) self-report latent variable structure.
QIDS item  P1  P2  P3
Bipolar disorder (BD)
Sleep  0  −0.96  0
Sad  −0.72  0  0
Appetite or weight  0  0  −0.98
Concentration  0  0  0
Self-view  −0.69  0  0
Suicide  0  0  0
Interest  0  0  0
Energy  0  0  −0.22
Restless  0  0.28  0
% total variance explained (cumulative)  30.2  50.4  68.4
Tentative interpretation  Esteem and sadness  Sleep changes  Appetite and energy
Borderline personality disorder (BPD)
Sleep  0  0  0
Sad  0  0  0
Appetite or weight  0  −0.94  0
Concentration  0  0  0
Self-view  0  0  0.89
Suicide  0  0  0.45
Interest  −0.78  0  0
Energy  −0.62  0  0
Restless  0  −0.33  0
% total variance explained (cumulative)  31.2  50.3  68
Tentative interpretation  Energetic  Appetite and restlessness  Self-esteem and suicide
Healthy controls (HC)
Sleep  −0.99  0  0
Sad  0  0  −0.83
Appetite or weight  0  −0.96  0
Concentration  0  0  0
Self-view  0  0  −0.55
Suicide  0  0  0
Interest  0  0  0
Energy  −0.15  −0.29  0
Restless  0  0  0
% total variance explained (cumulative)  37.9  59.7  76
Tentative interpretation  Sleep  Appetite and energy  Esteem and sadness
Sparse generalized anxiety disorder 7 (GAD-7) latent variable structure.
GAD-7 item  P1  P2  P3
Bipolar disorder (BD)
Nervous or anxious  −0.75  0  0
Control worries  −0.67  0  0
Worried  0  0  0
Relaxed  0  −0.37  0.54
Restless  0  0  0.84
Irritable  0  −0.93  0
Afraid  0  0  0
% total variance explained (cumulative)  41.2  60.5  72.9
Tentative interpretation  Nervousness  Irritability and relaxation  Activity
Borderline personality disorder (BPD)
Nervous or anxious  0  0  0
Control worries  0  0  −0.71
Worried  0  0  −0.70
Relaxed  0.63  0  0
Restless  0.78  0  0
Irritable  0  0.81  0
Afraid  0  0.58  0
% total variance explained (cumulative)  29.3  48.4  69.8
Tentative interpretation  Activity  Irritability and fear  Worry
Healthy controls (HC)
Nervous or anxious  0.81  0  0
Control worries  0.58  0  0.46
Worried  0  −0.23  0.89
Relaxed  0  0  0
Restless  0  0  0
Irritable  0  −0.97  0
Afraid  0  0  0
% total variance explained (cumulative)  36.2  59.9  73.5
Tentative interpretation  Nervousness  Irritability and worry  Worry
Summary statistics for the sparse principal components computed in
Sparse principal component  BD^{a}, median (IQR^{d})  BPD^{b}, median (IQR)  HC^{c}, median (IQR)  BD versus BPD  BD versus HC  BPD versus HC
P1^{e}  −0.16 (1.89)  −0.12 (2.56)  −0.08 (0.63)  1.78  4.46  4.72
P2^{f}  0.16 (1.60)  0.11 (1.98)  0.03 (1.29)  1.15  0.97  1.25
P3^{g}  −0.27 (1.47)  −0.16 (2.31)  −0.05 (0.34)  3.67  3.17  6.78
^{a}BD: bipolar disorder.
^{b}BPD: borderline personality disorder.
^{c}HC: healthy controls.
^{d}IQR: interquartile range.
^{e}P1= “anxiety and sadness.”
^{f}P2= “positive affect.”
^{g}P3= “irritability.”
Density estimates of the “anxiety and sadness” principal component for the three cohorts.
Density estimates of the “positive affect” principal component for the three cohorts.
Density estimates of the “irritability” principal component for the three cohorts.
We have applied a recently developed form of SPCA to explore the latent variables of four psychiatric questionnaires across BD, BPD, and HC. We emphasize that the SPCA used here was guided primarily by the need to develop simple latent variables that would facilitate interpretation over and above findings computed using the standard PCA. As expected, in most cases the loadings in the two patient cohorts were more similar to each other than to those in HC. The latent variable structure was stable across all three cohorts for MZ and stable across the patient cohorts for ASRM. In contrast, the latent variable structure was quite different for the three cohorts for QIDS and GAD-7. Broadly speaking, having the same latent variables across cohorts indicates internal consistency of a questionnaire and is a convenient property because it enables direct quantitative comparisons of the resulting latent variables (see
The recently proposed MZ [
The latent variable structure of ASRM was identical for BD and BPD but differed when compared with HC; this may indicate that the psychiatric groups have the same underlying effects when reporting mania symptoms. However, we view this finding very cautiously, because the ASRM variability was extremely low for HC. Sleep appears to be a key item in the latent variables of QIDS for HC and BD but not BPD. This might reflect a true difference in the perception of the effect of sleep on mood symptoms in BPD; again, this finding should be treated with caution because most BPD participants in the study were unemployed and hence, this may have skewed their responses.
It is difficult to cross-reference the questionnaires since they have been fundamentally developed to capture different mood symptoms (ASRM for mania, QIDS for depression, and GAD-7 for anxiety). Nevertheless, we have seen that irritability is a key latent variable in MZ, and that item dominates the second latent variable in GAD-7. Similarly, “anxiety and sadness” is the primary latent variable of MZ, which is similar to the first latent variable observed for BD and HC in GAD-7 (
Understanding and interpreting the latent variables may have important implications for understanding mood traits and mood trait interactions and could lead into new hypotheses and clinical research insights. We found that anxiety and sadness are mood characteristics that covary consistently across groups (
We have presented results from a relatively large number of participants in the context of longitudinal mood monitoring, tracking their mood variation for multiple months as opposed to other studies, which were confined to a few weeks (eg, [
There is a large number of PROMs developed for (1) the general population, (2) broad population cohorts (eg, people diagnosed with mental disorders), and (3) specific disorders such as BD. Well-known generic instruments include the profile of mood states (POMS) [
Alternative specialized PROM instruments such as the Young mania rating scale (YMRS) [
Clinical diagnosis of mental disorders has traditionally relied on conventional DSM guidelines, which take a symptom-based approach. A relatively recently proposed framework for studying mental disorders is the research domain criteria (RDoC), which aims to provide a more inclusive, multidimensional approach including genetic, neural, and behavioral features [
Although some previous studies have studied the internal consistency of psychiatric questionnaires [
In a recent previous study [
Notwithstanding the relatively large number of participants for the studied patient groups, there were certain limitations. First, we used three widely established questionnaires used for self-assessment of mood symptoms (ASRM, QIDS, and GAD-7) and the recently proposed MZ. There are numerous other questionnaires in the psychiatric literature, some of which have also been used in the context of BD.
Second, most of the BD participants were recruited from a larger study; therefore, they might be more compliant than a new cohort in this diagnostic group. However, we stress that participants were originally recruited for 3 months with the option to stay longer; the majority found the study engaging and provided data for at least a year. Although the study cohort was representative of a subgroup of psychiatric outpatients, it did not include those who were psychotic or who had significant comorbidities. Moreover, the vast majority of the BD cohort was euthymic for the larger part of the AMoSS study with very few participants exhibiting the characteristic alternating periods of mania and depression. Future studies could investigate differences within BD to compare questionnaire latent variable structures and loadings of a euthymic subgroup versus a subgroup cycling through mania and depression.
Third, the study was observational in nature, and we had very little contact with participants. The pharmacological treatment at trial onset was recorded, but we do not have accurate information on changes in medication through the duration of the study. All the reported scores rely on self-assessment; there is a lack of ongoing clinical assessment by experts to validate the findings. For example, Faurholt-Jepsen et al [
Finally, there are multiple machine learning techniques to determine the latent variable structure of the data. In addition to different types of SPCA with different penalties and regularization settings, there are alternative techniques such as factor analysis, nonnegative matrix factorization, and more complicated manifold embedding methods [
We tried to identify the underlying psychological processes for the three cohorts by interpreting the latent variables computed from a single modality: self-assessed questionnaires. It could be argued that using latent variables compared with single items might be more robust in defining underlying psychological processes because they rely on multiple items which covary, and hence these provide a better means to identify differences between cohorts. Nevertheless, this argument would need to be validated using additional data looking in more detail at how these facets overlap with markers from other modalities. We have collected a large set of additional modalities in AMoSS (electrocardiogram, geolocation, activity, sleep, and social interaction), which we will be exploring in future work. Ultimately, as suggested in RDoC, mental health is not a single-dimensional concept, and fusing information from multiple modalities can bring additional key insights and improve understanding of the underlying processes and clinical assessment.
The findings in this study further support the recent introduction of MZ in clinical psychiatric practice. Its structure in terms of the first three principal components is consistent across BD, BPD, and HC, and the order of the principal components can be tentatively understood intuitively. ASRM is consistent across the patient groups but differs for HC. QIDS and GAD-7 are more varied and do not lead to easily interpretable principal components. We found that BD and BPD are very similar in terms of some standardized questionnaires (ASRM) but quite divergent in terms of QIDS and GAD-7. Further work is warranted to understand the similarities and differences between BD and BPD, which may facilitate differential diagnosis and long-term monitoring of their treatment approaches.
AMoSS: automated monitoring of symptom severity
ASRM: Altman self-rating mania
BD: bipolar disorder
BPD: borderline personality disorder
DSM: diagnostic and statistical manual of mental disorders
GAD-7: generalized anxiety disorder (7-item)
QIDS: quick inventory of depressive symptomatology
HC: healthy controls
IPDE: international personality disorder examination
MZ: mood zoom
PANAS: positive and negative affectivity schedule
PCA: principal component analysis
POMS: profile of mood states
PROMs: patient reported outcome measures
RDoC: research domain criteria
SPCA: sparse principal component analysis
TC: true colors
YMRS: Young mania rating scale
We are grateful to the research assistants in the AMoSS project: L. Atkinson, D. Brett, and P. Panchal for assistance in the data collection. The study was supported by the Wellcome Trust through a Centre Grant No. 098461/Z/12/Z, “The University of Oxford Sleep and Circadian Neuroscience Institute (SCNi).” This work was also funded by a Wellcome Trust Strategic Award (CONBRIO: Collaborative Oxford Network for Bipolar Research to Improve Outcomes, Reference number 102616/Z). NP acknowledges the support of the RCUK Digital Economy Programme grant number EP/G036861/1 (Oxford Centre for Doctoral Training in Healthcare Innovation). The sponsors had no involvement in the data collection, processing, and the decision to submit the manuscript for publication. Requests for access to the data can be made to GMG, but the data cannot be placed into a publicly accessible repository.
ACB has received salaries from P1vital Ltd. GMG has held grants from Servier; received honoraria for speaking or chairing educational meetings from Abbvie, AZ, GSK, Lilly, Lundbeck, Medscape, Servier; advised AZ, Cephalon/Teva, Lundbeck, Merck, Otsuka, P1vital, Servier, Sunovion and Takeda; and holds shares in P1vital.