Dynamic Interactive Social Cognition Training in Virtual Reality (DiSCoVR) for People With a Psychotic Disorder: Single-Group Feasibility and Acceptability Study

Background: People with a psychotic disorder commonly experience problems in social cognition and functioning. Social cognition training (SCT) improves social cognition, but may inadequately simulate real-life social interactions. Virtual reality (VR) provides a realistic, interactive, customizable, and controllable training environment, which could facilitate the application of skills in daily life. Objective: We developed a 16-session immersive VR SCT (Dynamic Interactive Social Cognition Training in Virtual Reality [DiSCoVR]) and conducted a single-group feasibility pilot study. Methods: A total of 22 people with


Introduction
People with a psychotic disorder commonly experience problems with social functioning, that is, impairments in the ability to interact successfully with the social environment and to adequately fulfil a societal role (eg, work, personal relationships) [1]. These are often related to problems in the cognitive processes used in understanding and thinking about interactions with other people, known as social cognition [2,3].
The most commonly identified domains of social cognition are emotion perception (ie, the identification and processing of emotional cues), social perception and social knowledge (understanding social cues, rules, and context), theory of mind (ToM; identifying and understanding others' mental states and separating these from one's own perspective), and attribution style (inferences about the causes and intentions underlying events and others' behavior) [4]. A meta-analysis found moderate to large deficits in people with schizophrenia in emotion perception, social perception, and ToM, but not in attribution style [2].
Social cognition has become an important treatment target for improvement of social functioning; a multitude of behavioral approaches to improve social cognition and social functioning has emerged in recent years [5]. Three meta-analyses have found moderate to large effects of social cognition training (SCT) interventions on social cognition [6-8]. Broad-based or comprehensive forms of SCT (eg, Social Cognition and Interaction Training [9]) appear to be the most effective overall [8]. Improvements in social functioning were found in 2 meta-analyses [6,8], though only for broad-based SCT in the latter meta-analysis [8].
However, there are some concerns about SCT [5]. First, while positive effects on lower-order social cognitive domains are robust, findings regarding higher-order domains are more heterogeneous, and depend on the type of task used [5]. Second, findings regarding the generalization of social cognitive gains, as measured by social cognition tasks, to daily life social functioning are mixed; many studies do not find a significant effect [5]. Third, the durability of treatment effects has not yet been established; of the available studies, some find sustained improvements at follow-up, whereas others do not [5,8]. Thus, while SCT has a demonstrated effect on social cognition, particularly lower-order domains, these improvements do not always enduringly carry over to higher-order social cognitive domains and social functioning. One possible explanation for these mixed findings could be that the generalization of training material to social functioning requires better opportunities to apply training techniques in real-life social situations. From studies on cognitive remediation, we know that combining treatment with a meaningful context to practice newly learned behavior is vital for improvement of social functioning [10]. However, the techniques and stimuli that are typically used in SCT, such as group discussions, videos, and pictures, lack the complexity and dynamic interaction that are present in real-life social situations [5,11]. Although it is theoretically possible to accompany patients and practice in real-life social situations, doing so is generally not feasible in clinical practice. Furthermore, real-life situations cannot be controlled for training purposes.
These shortcomings of SCT could be addressed by administering interventions using virtual reality (VR). VR involves wearing a headset that projects continuously rendered 3D images [12]. With VR, highly immersive, dynamic, and interactive social environments can be created, providing a high degree of ecological validity for assessment as well as treatment [13]. Furthermore, VR is controllable, facilitating structured SCT in realistic social situations, and allows for scenarios to be personalized, repeated, and varied. Therapists can observe participants unobtrusively and provide real-time feedback. In addition, VR has practical benefits, because a wide scope of social situations can be simulated without leaving the treatment setting. Finally, barriers to practice may be smaller with VR than in real life, as users know that their actions have no real-world consequences, and that the VR can be stopped at any time.
VR has been found to be an effective tool in the treatment of psychotic disorders, for example, for treating paranoid ideation [14] and auditory verbal hallucinations [15]. A recent pilot study (n=19) of SCT using virtual environments in people with first-episode psychosis reported that SCT in a virtual world was acceptable and feasible, and found improvements in emotion recognition and anxiety. However, no significant change was observed in other domains of social cognition and social functioning [16]. In addition, a case series (n=2) of VR SCT in people with a psychotic disorder reported improvements in social cognition and social functioning [17]. Furthermore, 2 trials studied the effect of VR social skills training (SST) on participants with a psychotic disorder: Park and colleagues [18] demonstrated greater improvement of assertiveness and conversational skills with VR SST than with conventional SST, and a pilot study [19] showed improvements in social anxiety, social functioning, and emotion perception after VR SST. Finally, promising results of VR SCT have been reported in other clinical populations, particularly in those with autism spectrum disorder [20,21]. Together, these encouraging preliminary findings support the viability and utility of VR SCT.
We have developed a VR SCT called "Dynamic Interactive Social Cognition Training in Virtual Reality" (DiSCoVR). In this pilot study, our aims were twofold:
• To determine whether providing VR SCT is feasible and acceptable to participants and therapists, evaluated in terms of commonly used criteria for feasibility (ie, acceptability, user satisfaction, demand, perceived usefulness,
• To explore the effect of DiSCoVR on social cognition, neurocognition, and psychiatric symptoms, by examining participants' baseline and posttreatment scores.

Design and Participants
This pilot study had a pretest-posttest design with a single treatment group. All participants continued to receive their treatment as usual alongside their participation in the study. People with a psychotic disorder were recruited from 3 mental health treatment centers in the Netherlands (University Medical Center Groningen, GGZ Drenthe, and GGZ Delfland). Potential participants were referred to the study by their treating clinician. To help clinicians determine which patients might be eligible, screening questions were provided: (1)

Intervention
DiSCoVR consisted of sixteen 45-60-minute individual treatment sessions, which took place twice a week. The intervention was provided on-site by therapists with (at minimum) a clinical psychology master's degree. A treatment protocol was used; all therapists were trained in its use. The protocol included background information, examples of goals and strategies, software manuals, exercises (eg, standard situations to practice in role play and their relation to social cognition), and detailed instructions on how to carry out sessions. Therapists received supervision at least once for each client and could consult the research team as needed for additional supervision and technical support.
Social cognition was trained by practicing with social material in immersive virtual environments (Figure 1) and by learning to apply strategies in these environments (eg, verbalizing facial characteristics, or verifying with others whether a social assessment is correct). Participants formulated concrete personal goals that could be achieved with improvement of social cognition. At the end of each session, participants reflected on how they could use new knowledge and skills to achieve their goals. In homework assignments throughout the intervention (optional, although strongly encouraged), participants were asked to apply strategies in daily life. The intervention was structured to start with lower-order social cognition, and complexity was increased in each module.
Module 1 (Sessions 1-5) targeted emotion perception, practiced using a VR facial emotion recognition task. Participants explored a shopping street with avatars (virtual characters) who showed an emotion upon approach. Participants then selected the emotion they thought the avatar expressed in a multiple-choice menu. Immediate visual feedback on the correctness of their answer (green or red screen) was given. If an answer was incorrect, the same emotion was shown again with greater intensity. Several characteristics of the avatars and environment (eg, intensity of emotions, allotted time for answers) could be altered. Seven standard practice levels were created in which all available parameters increased in difficulty, but therapists could customize these parameters to create tailored levels. Participants learned strategies to recognize emotions (eg, verbalizing salient features) and practiced them both in VR and in their (real-life) home environment.
Module 2 (Sessions 6-9) targeted social perception and ToM. By practicing with interactive social scenarios, participants learned to understand the social context, hints, social missteps and ambiguity, perception of body language, and tone of voice. Participants observed social interactions between avatars in a café and supermarket, containing misunderstandings, hints, true and false beliefs, and social missteps. They answered multiple-choice and open-ended questions within the VR environment about the emotions, thoughts, and intentions of the avatars. Outside of the VR environment, participants continued to practice strategies (eg, remembering your thoughts/emotions in a similar situation) and tried to assess behavior, thoughts, and emotions of themselves and others in (real-life) daily social situations.
Module 3 (Sessions 10-16) targeted application of higher-order social cognition in social interactions, practiced using interactive role-play exercises. Participants interacted with an avatar, whose appearance, voice, and emotions were controlled by the therapist. Participants practiced with situations that were difficult for them or that fit their goals. Therapists could also use standard (nonpersonalized) role-play exercises from the protocol, containing sarcasm, hinting, misunderstandings, and social missteps. Participants learned a social cognitive problem-solving technique in which they first considered the behavior, thoughts, and emotions of themselves and the other person, then formulated (and role-played) different possible reactions, and finally executed the reaction they preferred. Participants were encouraged to also apply this technique in their daily lives.
The virtual environments (a shopping street, a supermarket, and a bar) were shown using an Oculus Rift VR headset (Consumer Version 1). The software was developed by CleVR BV. The VR software was controlled by the therapist, using one monitor to observe the participant's field of vision, and another monitor to control the virtual environment with the user interface.

Participants used a Microsoft Xbox game controller to move around and to indicate answers in multiple-choice menus.

Diagnostic Measures
The following measures were administered at baseline for diagnostic purposes.

National Adult Reading Test
The National Adult Reading Test [26] (Dutch version: Nederlandse Leestest voor Volwassenen [27]) is a proxy measure of premorbid intelligence. Participants recite a list of 50 increasingly uncommon words. Correct pronunciations yield 2 points.

Feasibility and Acceptability
Feasibility and acceptability of DiSCoVR were assessed in participants by a questionnaire consisting of 2 parts: statements about the intervention (eg, "I enjoyed the training") that were rated on 10-point Likert scales, and open-ended interview questions (eg, "What were strengths of the intervention?"). The complete questionnaire can be found in Tables 2 and 3. We also recorded dropout rate and number of sessions completed as well as the time taken to complete DiSCoVR.
In addition, therapists completed an open-ended questionnaire about their satisfaction with the treatment protocol and materials (Table 4). Protocol fidelity was assessed with a self-report form and checklist after each treatment session, on which therapists could indicate any particularities, whether they deviated from the protocol, and why.

Social Cognition

Ekman 60 Faces Test
The Ekman 60 Faces Test [28] is a 60-item computerized picture task, measuring emotion perception. Participants are asked to identify 6 basic emotions (happy, surprised, anxious, disgusted, sad, and angry). The total score (ie, the number of correctly identified stimuli) across all emotions was analyzed.

Bell-Lysaker Emotion Recognition Test
The Bell-Lysaker Emotion Recognition Test (BLERT [29], Dutch version unpublished) is a video task measuring emotion perception. It consists of 35 clips in which actors deliver an emotionally ambiguous sentence either neutrally or with 1 of 6 basic emotions. Participants identify the portrayed emotion. Total scores were analyzed.

The Awareness of Social Inference Task
The Awareness of Social Inference Task (TASIT [30]; Dutch version [31]) is a video task containing social vignettes, and consists of 3 parts (I-III). TASIT-I assesses emotion perception, distinguishing between neutral and 6 basic emotions. TASIT-II and TASIT-III measure social perception and ToM, and have questions about the intentions, message, thoughts, and feelings of the people in the video. TASIT-II consists of clips with genuine utterances or sarcasm; TASIT-III contains clips of lies or sarcasm. Parallel versions were used; the version order was A-B for all participants. For analysis, we used the total score for each part of TASIT.

Empathic Accuracy Task
The Empathic Accuracy Task (EAT [32]; Dutch version [33]) is a computerized video task measuring empathy, with clips of people speaking about emotionally charged autobiographical events with either a positive or a negative valence. In line with previous studies [34,35], a shortened version (4 videos; 2 positive and 2 negative) was used. Parallel versions were administered, using counterbalanced randomization. Participants used a rating dial to indicate continuously how speakers were feeling while speaking (very negative to very positive). Empathic accuracy scores were generated for each video clip by correlating participants' affect ratings with the original speakers' own affect ratings. These correlations (-1 to +1) underwent a Fisher z transformation prior to data analysis. For each participant, the mean Fisher z transformed EAT score across video clips was used in the data analyses.
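The EAT scoring steps described above (per-clip correlation, Fisher z transformation, mean across clips) can be sketched as follows. This is a minimal illustration, not the authors' scoring code: the function names and rating series are hypothetical, and Pearson correlation is an assumption, because the text does not name the correlation coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length rating series (assumed coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z(r):
    """Fisher z transformation, z = atanh(r); undefined for r = -1 or r = +1."""
    return math.atanh(r)

def eat_score(participant_clips, speaker_clips):
    """Mean Fisher z transformed correlation across video clips."""
    zs = [fisher_z(pearson_r(p, s)) for p, s in zip(participant_clips, speaker_clips)]
    return sum(zs) / len(zs)

# Hypothetical continuous ratings for 2 clips (the study used 4 clips per version).
participant = [[1, 2, 3, 4], [4, 3, 2, 2]]
speaker = [[1, 3, 2, 4], [4, 2, 3, 1]]
score = eat_score(participant, speaker)
```

Averaging in z space rather than averaging raw correlations avoids the compression of values near -1 and +1.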

Faux Pas
The Faux Pas test [36] is a measure of ToM. Ten stories are read to the participant, 5 of which contain a faux pas. Participants are asked whether a faux pas occurred, who committed it, why it was a faux pas, and why it happened. A story comprehension question and an empathy question are also asked after each story. Parallel versions were used, but the order was not counterbalanced or randomized. The total score was used for analysis.

Rapid Visual Processing
Rapid Visual Processing (RVP [37]) is a measure of sustained visual attention. A white box with alternating numbers (0-9, 100 digits per minute) is shown on a computer screen. Participants press a button if 1 of the 3 target sequences occurs. Outcome variables of the RVP are response latency in milliseconds, sensitivity, and probability of hit (0-1).

Trail Making Test
The Trail Making Test (TMT [38]) assesses processing speed and executive function. Numbers (TMT-A) or numbers and letters (TMT-B) are shown in circles, scattered across a sheet of paper. Participants connect the numbers (and letters) in consecutive order (eg, 1-2-3 or 1-A-2-B). The completion time in seconds for each subtest was used for analysis.

Green Paranoid Thought Scale
The Green Paranoid Thought Scale (GPTS [39]) is a 32-item self-report questionnaire measuring paranoid thoughts on 2 dimensions (social reference and social persecution), using a 5-point Likert scale. We analyzed the total score of each subscale separately.

Social Interaction Anxiety Scale
The Social Interaction Anxiety Scale (SIAS [40]) is a 20-item self-report questionnaire investigating verbal and nonverbal social anxiety, using a Likert scale from 0 (not at all) to 4 (completely). The total score was analyzed.

Beck Depression Inventory
The Beck Depression Inventory (BDI [41]) is a 21-item self-report questionnaire on symptoms of depression. Each item of the BDI consists of statements reflecting increasing severity of a depressive symptom. We used the total BDI score for analysis.

Self-Esteem Rating Scale
The Self-Esteem Rating Scale (SERS [42]) is a 20-item self-report questionnaire on (explicit) self-esteem. The SERS uses statements that are rated on a Likert scale from 1 (disagree totally) to 7 (agree totally). The total score was used for analysis.

Positive and Negative Syndrome Scale
The Positive and Negative Syndrome Scale (PANSS [43]) is a semistructured interview investigating symptoms of psychosis. The positive (7 items) and negative (7 items) subscales were administered. Total subscale scores were used for analysis.

Procedure
After referral from a clinician or self-enrollment, interested patients were contacted and screened by the research team. Participants provided written informed consent during a face-to-face meeting, after which the baseline assessment (approximately 3 hours) took place. An overview of the measures, including their order and length, is included in Multimedia Appendix 1. Measurements were performed by trained assessors. After the baseline measurement, participants were enrolled in DiSCoVR. Upon finishing the training, a (face-to-face) posttreatment assessment (approximately 2.5 hours) took place. Participants who dropped out were asked to participate in the evaluation survey.
This study was approved by the Medical Ethical Committee of the University Medical Center Groningen (ABR: NL55477.042.16, METC: 2016/050), as well as by the ethics boards of the other participating centers (ie, the Committee for Research and Health Care Innovation of GGZ Drenthe and the Committee for Scientific Research of GGZ Delfland). All participants gave written informed consent in accordance with the Declaration of Helsinki.

Analysis
We assessed 3 types of feasibility and acceptability data: (1) relevant quantitative parameters, such as dropout rates, time to recruit, intervention completion time, protocol adherence, and occurrence of issues (eg, technical problems) in sessions; (2) the participant survey; and (3) the therapist survey.
For quantitative data, descriptive statistics were examined. For qualitative items, similar answers or categories were grouped together and absolute and relative frequencies were evaluated. That is, because questions were open ended, we grouped comparable answers and counted how frequently they occurred. For example, for the question "Did the training meet your treatment needs?", the answers "I've learned what I'd wanted to learn" and "After setting goals, I could work on them very well" were grouped as "Yes," whereas "I did not fully succeed in developing better empathy, but I was able to practice," and "On some points, but not others" were grouped as "Partly."
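The grouping and counting of open-ended answers can be illustrated with a short sketch. The coding itself was a manual judgment; the dictionary below only records the four example answers quoted above, and the helper name `tabulate` and the denominator of 4 are hypothetical and used for illustration only.

```python
from collections import Counter

# Manually assigned categories for the example answers quoted in the text.
coded_answers = {
    "I've learned what I'd wanted to learn": "Yes",
    "After setting goals, I could work on them very well": "Yes",
    "I did not fully succeed in developing better empathy, but I was able to practice": "Partly",
    "On some points, but not others": "Partly",
}

def tabulate(categories, n_respondents):
    """Return {category: (absolute frequency, percentage of respondents)}."""
    counts = Counter(categories)
    return {cat: (n, round(100 * n / n_respondents)) for cat, n in counts.items()}

# Illustration only: 4 answers treated as 4 respondents.
table = tabulate(coded_answers.values(), n_respondents=len(coded_answers))
```

Because a participant could give several answers to one question, percentages computed against the number of respondents need not sum to 100.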
To compare baseline and posttreatment scores, paired t tests were used, unless difference scores (T1 - T0) were not normally distributed. This was the case for the BLERT, GPTS-A, PANSS Positive, and TMT-B. Thus, Wilcoxon tests were carried out for these measures. Pairwise complete-case analysis was used in case of missing data. To account for the multitude of measures, we adopted an α of .01 as a threshold of significance.
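As a minimal sketch of the paired comparison with pairwise complete-case handling, the code below computes the paired t statistic from difference scores using only the Python standard library. The data and the function name `paired_t` are hypothetical. The normality check, the Wilcoxon alternative, and the P value (which requires the Student t distribution from a statistics package) are deliberately left out.

```python
import math
from statistics import mean, stdev

ALPHA = 0.01  # significance threshold stated in the text

def paired_t(baseline, post):
    """Return (t, df, mean_diff) for difference scores post - baseline.

    Pairs with a missing value (None) at either time point are dropped,
    which corresponds to a pairwise complete-case analysis.
    """
    diffs = [t1 - t0 for t0, t1 in zip(baseline, post)
             if t0 is not None and t1 is not None]
    n = len(diffs)
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean_diff / standard_error, n - 1, mean_diff

# Hypothetical scores; None marks a missing assessment.
baseline = [10, 12, None, 9, 14, 11]
post = [13, 14, 11, 12, None, 12]
t_value, df, mean_diff = paired_t(baseline, post)
```

The sign of t depends only on the order of subtraction, so a negative t can indicate improvement when the difference is taken as baseline minus posttreatment.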

Participants
Demographic and clinical characteristics of the sample are presented in Table 1. Participants were recruited between January and August 2017; 17 of the 22 participants (77%) completed the study. Reasons for dropout (n=5) were having too much going on (n=2), finding the intensity too high (n=1), not feeling a connection with the therapist (n=1), and (self-reported) negative symptoms and social anxiety (n=1). Noncompleters dropped out at Sessions 2 (n=1), 4 (n=1), 7 (n=2), and 10 (n=1). Three of the five participants who dropped out participated in the evaluation survey. The results of the survey are presented in Tables 2 and 3. In the quantitative survey, participants gave positive ratings to their enjoyment of DiSCoVR (mean 7.25, SD 2.05; range 3-10), to the amount they learned (mean 6.65, SD 1.81; range 3-10), and to its usefulness for daily social activities (mean 7.00, SD 2.05; range 3-10). Participants positively evaluated the combination of a therapist and VR (mean 7.85, SD 2.11; range 3-10) and the appropriateness of the difficulty level (mean 7.20, SD 1.91; range 3-10). Participants gave relatively low ratings to the realism of the appearance of the avatars (mean 5.45, SD 2.18; range 2-10), but ratings for their facial expressions (mean 6.65, SD 2.06; range 3-10) and voices (mean 6.95, SD 2.35; range 3-10) were higher.
In the open-ended questions of the qualitative survey (N=20), participants most commonly (n=14, 70%) mentioned the opportunity to practice with social situations in VR as a strength of the intervention. Other common subjective strengths of DiSCoVR were the personalization of the intervention (ie, targeting specific personal goals and situations; n=5, 25%), the (role of the) therapist (n=5, 25%), the emotion recognition module (n=3, 15%), and the realism of emotions and role-play exercises (n=3, 15%). For a majority of people (n=13, 65%), the intervention fit their treatment needs. The most commonly reported subjective effect of DiSCoVR was improved social skills (n=7, 35%), followed by improved emotion recognition (n=6, 30%) and increased assertiveness and confidence (n=5, 25%).
The aspect of DiSCoVR most commonly named as a weakness was technical issues (n=7, 35%), particularly problems with sound regulation in the interaction module, as well as limitations of the content (eg, the inability to practice group conversations) and the graphical quality. Some participants criticized the realism of the intervention (n=4, 20%), particularly the avatars' movements and facial expressions. A few participants (n=4, 20%) indicated that the treatment only partly fit their needs; reasons given were that they felt their social cognition had improved, but not as much as they had wanted (n=1); that the latter part of the training was useful, but not the emotion perception module (n=1); that it was relevant and they had learned useful strategies, but that the intervention could be more focused and that they needed to keep reminding themselves to use them (n=1); and that there was insufficient opportunity to practice "small talk" (n=1). Overall, 3/20 (15%) participants stated that the intervention did not fit their needs: one participant felt it was too focused on (others') behavior; one indicated that recognizing emotions in conversations was still difficult; and one thought the role-play exercises were insufficiently realistic.
Finally, while a majority indicated that they were satisfied with the number (n=10, 50%), intensity (n=5, 25%), and duration (n=13, 65%) of sessions, those who were employed or had a long commute found the intensity too high (n=1, 5%) or somewhat high (n=4, 20%).

11 (55) Good/realistic/opportunity for learning
6 (30) Okay, needs some improvement
1 (5) Fake/unrealistic
3 (15) Funny/takes getting used to
1 (5) Not applicable

a: n (%) refers to the number and percentage of participants who provided a certain answer. Because participants could provide multiple answers to a single question (eg, "I learned social skills and assertiveness"), and some participants did not answer all questions completely (eg, "The number of sessions was fine," but said nothing about the intensity/duration), n (%) may not add to 20 (100).

Therapists
The results of the evaluation of DiSCoVR by therapists are presented in Table 4. Therapists noted the role-play exercises and the opportunity to practice with social situations in VR as the main strength of DiSCoVR (4/6, 67%), considering these to be the most important and effective component of the intervention (5/6, 83%), followed by reflection on social situations (3/6, 50%). Other commonly named strengths were the treatment protocol (3/6, 50%) and the structure of the intervention (2/6, 33%). The majority of therapists considered the VR software to be adequate (4/6, 67%) or good (2/6, 33%), stating that it was easy and intuitive to work with (4/6, 67%), and praising its technical support (2/6, 33%).
The therapists mainly criticized the lack of technical reliability and limited capabilities of the software (5/6, 83%). Half of them recommended improving existing functionality, particularly the sound and graphical quality, and half of them recommended adding new features (eg, environments or scenarios). Regarding the relevance of scenarios for daily life, 2/6 (33%) therapists were satisfied, 3/6 (50%) felt they were relevant but could be improved, and 1/6 (17%) was dissatisfied and noted that they felt unnatural.
Therapists reported deviating from the protocol in 18.2% (55/303) of the total number of sessions that were carried out. In addition, technical issues were reported in 14.9% (45/303) of the sessions. Other issues (eg, a participant being late) occurred in 11.6% (35/303) of sessions. A total of 3/303 sessions (0.99%) were terminated early. It took a mean of 12.4 weeks (SD 5.2; range 8-22; median 11) for participants to complete the intervention. The reports indicated that participants spent a mean of 342 minutes practicing in VR across the 16 sessions (SD 28.8; range 227-451; median 362), which is equivalent to a mean of 17.9 minutes per session (SD 6.06; range 5.0-28.2; median 17.9). The mean duration of a session was 55.4 minutes (SD 7.2; range 34-67; median 57).

n (%)a  Question and answers
1 (17) Fine, but should be structured differently
1 (17) Needs more sessions
1 (17) Needs longer sessions

a: n (%) refers to the number and percentage of therapists who provided a certain answer. Because they could provide multiple answers to a single question and some therapists did not answer all questions completely, n (%) may not add to 6 (100).

Effects of DiSCoVR (Baseline Versus Posttreatment)
Baseline and posttreatment means, standard deviations, test statistics, and effect sizes are presented in Table 5. Analyses were conducted with 17 participants (unless indicated otherwise). At α=.01, only emotion perception, as measured by the Ekman 60 Faces Test, improved significantly after DiSCoVR (t(16)=-4.79, P<.001, mean difference 4.18). No significant improvement was observed on the other measures of emotion perception, on any of the ToM measures, in neurocognition, or in levels of psychiatric symptoms.
For emotion perception, a moderate effect size (d=-0.67) was found for the Ekman 60 Faces Test, but for the other measures, effect sizes were negligible (BLERT: d=0.03) or small (TASIT-I: d=-0.15). For social perception and ToM, we found negligible to small effects on all outcome measures (ranging between d=-0.15 and d=0.25). Negligible effects were found for information processing (TMT-A and B; d=0.11 and d=0.08), but small to moderate effects were found for sensitivity and probability of hit of the RVP (d=-0.47 for both). Small improvements were also observed for most symptom domains, with effect sizes ranging between d=0.16 and d=0.34. A small effect size was also found for self-esteem (d=-0.25).
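The text does not state which formula was used for d. The sketch below shows two common definitions for pre-post designs, purely as an illustration: a d standardized by the pooled SD of the two time points, and d_z, which is standardized by the SD of the difference scores and can be recovered from t and n. The function names and example scores are hypothetical, and the sign convention (baseline minus posttreatment) is inferred from the reported signs, where improvement on measures with higher-is-better scoring yields a negative d.

```python
import math
from statistics import mean, stdev

def cohens_d_pooled(baseline, post):
    """(mean baseline - mean post) divided by the pooled SD of both time points."""
    pooled_sd = math.sqrt((stdev(baseline) ** 2 + stdev(post) ** 2) / 2)
    return (mean(baseline) - mean(post)) / pooled_sd

def d_z_from_t(t, n):
    """d_z = t / sqrt(n): mean difference divided by the SD of the differences."""
    return t / math.sqrt(n)

# Hypothetical scores in which every participant improves by 4 points.
example_d = cohens_d_pooled([40, 42, 44], [44, 46, 48])

# d_z implied by the reported Ekman 60 Faces result (t = -4.79, n = 17).
ekman_d_z = d_z_from_t(-4.79, 17)
```

Under these assumptions, d_z for the Ekman 60 Faces Test is about -1.16, which is larger in magnitude than the reported d=-0.67. This suggests that the reported effect sizes were not standardized by the SD of the difference scores.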

Principal Findings
The main goal of this study was to evaluate the feasibility and acceptability of DiSCoVR, and to identify aspects in need of improvement [22]. We found that participants and therapists were generally satisfied with the intervention. The interactive role-play exercises were most commonly named as a strength of the intervention, as well as the opportunity to practice with social situations and the combination of VR and a therapist. Both therapists and participants provided useful feedback for further development, particularly regarding technical issues. A secondary goal was to obtain an estimate of treatment effect sizes on various outcome domains. We found a significant improvement in emotion perception. However, no significant change was observed on the other measures, and most effect sizes were negligible to small. As stated in the "Introduction" section, commonly used criteria to evaluate feasibility and acceptance are acceptability, implementation potential, practicality, and limited-efficacy testing [22]. Additional areas of interest regarding the feasibility of technological interventions are provided by the Technology Acceptance Model, which emphasizes perceived ease of use, perceived usefulness, and user attitudes toward the technology [23]. Finally, the Systems Usability Scale [24] was likewise developed to evaluate technological innovations, and enquires about effectiveness, efficiency, and user satisfaction. In the following, we will focus on these criteria to evaluate the feasibility and acceptability of DiSCoVR.

Acceptability, User Attitudes, and Satisfaction
Participants gave positive ratings to the enjoyability of the intervention, its usefulness for daily social interactions, the combination of VR and a therapist, and the appropriateness of the difficulty level. The most important strength of the intervention, as indicated by both participants and therapists, was the opportunity to practice with interactive social situations resembling daily life. As such, we succeeded in our goal of creating a method to facilitate practice in realistic social situations. However, the use of new technology to accomplish this also has an important disadvantage, in the form of technical issues (particularly problems with sound settings) and limited capability of the software (eg, lack of sophisticated animations and group role-play features). Technical capability and reliability were the most important point of criticism from participants as well as therapists. While these technical limitations were troublesome, it is possible to address them in future iterations of the software.
Notably, some participants considered the emotion recognition module to be particularly useful, but an equal number considered it to be unnecessary. Given the considerable variation in baseline social cognitive ability, it is likely that some participants were relatively unimpaired in emotion perception. For them, the first module may have been unnecessary. Therefore, (VR) SCTs may need to take a modular approach, that is, emphasize different domains for different people, based on their needs.

Demand and Perceived Usefulness
We found that we could recruit participants relatively quickly, and most of them completed the intervention. DiSCoVR met treatment needs for the majority of participants, but not for everyone, mainly because people did not learn (all) the things they had wanted to learn, or because their subjective progress fell short of expectations. From this, we can learn that DiSCoVR does meet a demand, but it remains important that therapists and participants communicate clearly and regularly on social goals and their feasibility.

Implementation Potential, Practicality, and Perceived Ease of Use
The intervention could generally be delivered as intended, with therapists reporting protocol deviations (55/303, 18.2% of sessions) and technical issues (45/303, 14.9% of sessions) relatively rarely. Therapists also indicated that the software and treatment protocol were intuitive and easy to use, and praised the quality and availability of technical support. However, the average time taken to complete the intervention was approximately 50% longer than intended, possibly reflecting that twice-weekly sessions were impractical. Nonetheless, the majority of participants were satisfied with the intensity of DiSCoVR, and the frequency and number of sessions match similar VR studies [14,44] and previous SCT studies [45][46][47]. Moreover, research suggests that for cognitive training, higher treatment intensity may produce better outcomes [48].
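As an arithmetic check on the session-level rates reported above, the following minimal sketch recomputes both percentages from the stated counts (55 and 45 of 303 sessions). The function name `session_rate` is hypothetical and introduced only for illustration; it is not part of the study's analysis code.

```python
def session_rate(count: int, total: int) -> float:
    """Return count/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be a positive number of sessions")
    return round(100 * count / total, 1)


# Counts as reported in the text: 303 sessions in total.
TOTAL_SESSIONS = 303
protocol_deviations = session_rate(55, TOTAL_SESSIONS)
technical_issues = session_rate(45, TOTAL_SESSIONS)

print(f"Protocol deviations: {protocol_deviations}% of sessions")
print(f"Technical issues: {technical_issues}% of sessions")
```

Both values agree with the percentages given in the text when rounded to one decimal place.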

Limited-Efficacy Testing and (Perceived) Effectiveness
Common subjective effects of the intervention were enhanced social skills, improved emotion recognition, and increased confidence and assertiveness. While demonstrating efficacy was not the goal of this study, we observed a significant improvement in emotion perception (specifically in the Ekman 60 Faces Test), but we did not see a statistically significant or clinically relevant improvement in higher-order social cognitive domains. Thus, despite our reasoning that the use of VR would facilitate generalization to higher-order social processes, none of the measures of ToM showed a statistically significant or clinically relevant change.
Although this study was small and uncontrolled, our findings appear to be consistent with previous studies indicating that lower-order processes such as emotion perception can be improved more effectively than higher-order processes [5]. Perhaps this is because, more so than identification of emotions in static pictures, ToM requires synthesis of multiple processes and sources of information [49], such as identifying and remembering relevant contextual details and processing of emotional cues (eg, verbal, auditory, facial, body language).
Our second module may have involved too much emphasis on emotion perception and too little on higher-order reflective processes. While little is known about what it takes to improve ToM [50], a recent meta-analysis [8] suggested that SCTs encompassing multiple domains of social cognition may be more effective than targeted interventions. Therefore, going forward, a greater emphasis on integration of higher-order social cognitive processes and application in social situations may be necessary.

Adjustments as a Result of the Pilot Study
To address concerns about the interaction module (Module 3), the graphical quality, character animations, and sound control settings of the software were updated. Given the lack of an effect on ToM, we have also updated the second VR module. We changed the multiple-choice questions into open-ended questions to prompt in-depth reflection on avatars' behavior, thoughts, and feelings. That is, instead of asking only how an avatar is feeling (from a 4-choice menu), we now also ask why, what they are thinking, and how this relates to their behavior. This way, we hope to stimulate integrative reflection on social situations and engagement of higher-order social cognition.
We also adjusted the treatment protocol, to stimulate practical application of social cognition in daily life, and to better align the intervention with participants' treatment needs. First, we placed a stronger emphasis on the use and practice of strategies throughout the intervention: therapists were more explicitly instructed to select a strategy with participants before each VR session, and to encourage its application in homework exercises. Second, we simplified the way goals are set, reflected upon, and evaluated, allowing goals to target not only social cognition (eg, recognizing social cues better) but also social functioning (eg, making friends). This way, we aim to enhance the relevance and generalization of training content, to ensure treatment needs are met. Lastly, we have added a monthly supervision group where therapists can receive input and advice from the research team and one another.

Limitations
As an uncontrolled pilot study, this study lacks the statistical power and methodological rigor to draw conclusions concerning the efficacy of DiSCoVR. Moreover, parallel versions were unavailable (Ekman 60 Faces, BLERT, TMT, all questionnaires) or were not administered in a randomized order (TASIT, Faux Pas). We therefore cannot exclude the possibility of learning, repetition, and order effects. For example, administering TASIT-A before B has been shown to result in significantly higher scores on TASIT-A [31] than if B is administered before A. This could potentially obscure treatment effects, as in this study TASIT-A was administered first. In addition, our recruitment method required potential participants to have sufficient ability for reflection to recognize a need for social-cognitive treatment. Therefore, we may have failed to recruit people with more severe (social) cognitive deficits, limiting the generalizability of our findings. Finally, we did not use a standardized measure to assess feasibility and acceptability, such as the System Usability Scale [24] or the Simulator Sickness Questionnaire [51], because we were interested in elements specific to our intervention. Thus, while our custom survey was informative for further development, it cannot be directly compared with previous research.

Conclusions
We set out to develop a new type of SCT, building upon existing interventions by using VR as a tool. Taking into account the criteria described above [22][23][24], we can conclude that VR SCT is feasible and acceptable for both patients and therapists, and captures the interactive nature of social situations. This pilot study therefore demonstrates that a larger-scale clinical trial using these research and treatment protocols is feasible and acceptable. However, this study also demonstrated that there is room for improvement, particularly regarding the content and reliability of the VR software and hardware. Based on these results, we have adjusted DiSCoVR. Our next step will be to test this adjusted version in a randomized controlled trial [52], comparing it with an active control condition (VR relaxation).
Does this person struggle to recognize what goes on in another person's mind?, (2) Are there observable deficits in their assessment of social situations?, (3) Does this person have problems understanding what other people mean?, and (4) Do these problems lead to social dysfunction? Promotional flyers and posters were also distributed. Participants received a compensation of €15 (US $17) for each completed assessment (up to €30 [US $34] if they completed the study), and reimbursement of any travel costs incurred for the assessments. Inclusion criteria were (1) a diagnosis of a psychotic disorder as determined by a structured diagnostic instrument (eg, Mini-International Neuropsychiatric Interview [M.I.N.I.] [25]) in the past 3 years, or as verified by a structured clinical interview (M.I.N.I. Plus) at baseline; (2) problems in social cognition as indicated by the treating clinician; and (3) an age between 18 and 65 years. Exclusion criteria were (1) an estimated IQ below 70; (2) substance dependence; (3) a diagnosis of a neurological disorder, such as epilepsy or dementia; and (4) inadequate Dutch language proficiency.

A semistructured interview (M.I.N.I. Plus [25]) was used to verify the diagnosis of psychotic disorder, if a diagnosis of psychosis had not been determined by a (semi)structured clinical interview (eg, Structured Clinical Interview for DSM, Schedules for Clinical Assessment in Neuropsychiatry, M.I.N.I.) in the past 3 years. This was the case for 17 of our 22 participants (77%).

Table 2. Quantitative evaluation of VR intervention by participants (N=20). Example item: "The difficulty level of the training was exactly right" (1-10).

Table 3. Qualitative evaluation of the VR intervention by participants (N=20), reported as n (%) per question and answer. Example questions: "Were you satisfied with the number, intensity, and duration of sessions?" and "What did you think of the conversations of (Module 2) and with (Module 3) the avatars?"

Table 1. Demographic and clinical characteristics of the sample (N=22).

Table 5. Means, standard deviations, and test statistics (baseline and posttreatment). Wilcoxon tests were carried out for the BLERT, GPTS-A, PANSS Positive, and TMT-B.