This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on http://mental.jmir.org/, as well as this copyright and license information must be included.
Computer-delivered interventions have been shown to be effective in reducing alcohol consumption in heavy drinking college students. However, these interventions rely on mouse, keyboard, or touchscreen input for all interaction with the user. The principles of motivational interviewing suggest that in-person interventions may be effective, in part, because they encourage individuals to think through and speak aloud their motivations for changing a health behavior, which current computer-delivered interventions do not allow.
The objective of this study was to take the initial steps toward development of a voice-based computer-delivered intervention that can ask open-ended questions and respond appropriately to users’ verbal responses, more closely mirroring a human-delivered motivational intervention.
We developed (1) a voice-based computer-delivered intervention that was run by a human controller and that allowed participants to speak their responses to scripted prompts delivered by speech generation software and (2) a text-based computer-delivered intervention that relied on the mouse, keyboard, and computer screen for all interactions. We randomized 60 heavy drinking college students to interact with the voice-based computer-delivered intervention and 30 to interact with the text-based computer-delivered intervention and compared their ratings of the systems as well as their motivation to change drinking and their drinking behavior at 1-month follow-up.
Participants reported that the voice-based computer-delivered intervention engaged positively with them in the session and delivered content in a manner consistent with motivational interviewing principles. At 1-month follow-up, participants in the voice-based computer-delivered intervention condition reported significant decreases in quantity, frequency, and problems associated with drinking, and increased perceived importance of changing drinking behaviors. In comparison to the text-based computer-delivered intervention condition, those assigned to voice-based computer-delivered intervention reported significantly fewer alcohol-related problems at the 1-month follow-up (incident rate ratio 0.60, 95% CI 0.44-0.83,
Results indicate that it is feasible to construct a series of open-ended questions and a bank of responses and follow-up prompts that can be used in a future fully automated voice-based computer-delivered intervention that may mirror more closely human-delivered motivational interventions to reduce drinking. Such efforts will require using advanced speech recognition capabilities and machine-learning approaches to train a program to mirror the decisions made by human controllers in the voice-based computer-delivered intervention used in this study. In addition, future studies should examine enhancements that can increase the perceived warmth and empathy of voice-based computer-delivered intervention, possibly through greater personalization, improvements in the speech generation software, and embodying the computer-delivered intervention in a physical form.
In the United States, heavy drinking among college students is a major public health concern that results in negative consequences for both drinking and nondrinking students [
A central tenet of MI, supported by research, is that the elicitation of “change talk” (ie, verbal behavior that is supportive of behavior change) is a key active ingredient of the intervention that predicts later changes in behavior [
There are numerous challenges to developing a voice-based computer-delivered intervention that mirrors the processes occurring in human-delivered MI more closely than existing computer-delivered interventions. Although it is relatively straightforward to program open-ended prompts for a computer to deliver using speech software and although natural language recognition programs are becoming increasingly sophisticated [
The purpose of this project was to take initial steps toward development of a voice-based computer-delivered intervention by creating a system of questions and responses that would mirror the content and style of a brief MI. For this initial development, we chose to create a “Wizard of Oz” computerized system in which participants would speak directly to a computer screen and a human controller would select appropriate responses and follow-up questions from an onscreen menu, which would then be “spoken” by the computer using voice-generation software. Thus, our software was responsible for answer generation and speech synthesis, and a human operator handled speech understanding and dialog flow. Because automating these features will require significant engineering work, we focused on the proof of concept as demonstrated by this mixed human/computer approach. The system was designed to ask open-ended questions, encourage deeper reflection on motivations, and provide MI-consistent responses such as paraphrased reflections, double-sided reflections, affirmations, and summary statements.
We tested the feasibility and acceptability of our human-controlled version of a voice-based computer-delivered intervention with a sample of heavy drinking college students. We examined (1) participants’ ratings of how well the voice-based computer-delivered intervention attained key goals of MI, such as understanding the participant, being nonjudgmental, and being empathic and engaging; (2) whether participants were willing to set a goal to change drinking during the interaction; and (3) whether participants accepted a printed sheet on tips for reducing drinking at the end of the session. We also conducted a follow-up assessment with participants 1 month after the initial interaction with the voice-based computer-delivered intervention in order to test our primary hypotheses that participants receiving the voice-based computer-delivered intervention would report a significant increase in perceived importance of changing their drinking and report significant reductions in drinking and alcohol-related problems, consistent with the literature on computer-delivered interventions in college student populations. In order to gauge in a preliminary manner how the voice-based computer-delivered intervention might differ in its effect from traditional text-based computer-delivered intervention, we randomized one-third of participants to a text-based computer-delivered intervention, which matched the voice-based computer-delivered intervention in content, but relied on mouse and keyboard entries of participant responses and provided only text-based responses from the computer. We compared the voice-based computer-delivered intervention to the text-based computer-delivered intervention on acceptability measures. We also examined the drinking outcomes of participants assigned to the voice-based computer-delivered intervention versus the text-based computer-delivered intervention at 1-month follow-up. 
Given the literature cited previously regarding the importance of change talk and our supposition that a voice-based computer-delivered intervention may increase processing of change talk through verbalization, we hypothesized that the voice-based computer-delivered intervention, compared to the text-based computer-delivered intervention, would result in greater increases in perceived importance of changing drinking and greater reductions in drinking behavior and related problems. These secondary hypotheses were considered preliminary because the study was not fully powered to assess differences between conditions over time, and our emphasis was on the overall direction of effects across measures.
Participants were recruited from local colleges and universities using flyers and Web-based advertisements. Eligible participants were enrolled in undergraduate or graduate programs in the Northeastern United States, were 18 years of age or older, and endorsed at least one episode of heavy drinking (≥5 drinks in a single sitting for men, ≥4 drinks for women) in the past 30 days.
Sample size was determined by taking the following considerations into account. First, we wanted ample power to detect—within the voice-based computer-delivered intervention condition—significant changes in importance of changing drinking and drinking-related outcomes, our primary hypotheses. An initial sample size of 60, assuming an 85% follow-up rate, provided power of .94 to detect a medium effect size of
All procedures were approved by the Brown University Institutional Review Board. Following eligibility assessment by telephone, those who appeared eligible were invited for a baseline session, which occurred in the laboratory. At the baseline interview, participants first completed written informed consent followed by measures of demographics, alcohol use over the past 30 days, alcohol-related problems, and importance of changing drinking. Breath alcohol concentration was measured at baseline; those with values greater than zero were asked to reschedule. After baseline assessments were completed, participants were randomized in a 2:1 ratio using the urn procedure [
Thirty days after the baseline session, email links were sent with instructions to complete follow-up surveys. (See
Participant flow.
Alcohol use over the past 30 days was assessed using an online timeline follow-back measure [
To assess the extent to which the system approximated MI counseling characteristics, we administered to all participants two brief surveys designed specifically for this project: (1) five items (7-point Likert scale: 1=“not at all” to 7=“very”) reflecting general therapist traits (eg, warmth, understanding) and (2) eight items (4-point Likert scale: 1=“strongly disagree” to 4=“strongly agree”) reflecting MI strategies (eg, helped me to talk about my ideas for change). Both the five-item scale assessing general therapist traits and the eight-item scale assessing MI strategies demonstrated good internal consistency (Cronbach alpha=.89 and .83, respectively).
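For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of how Cronbach alpha can be computed from item-level ratings, using the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of total scores). The function name and the ratings shown are hypothetical illustrations, not the study data or the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach alpha for a list of respondents, each a list of k item ratings.

    Hypothetical helper for illustration; requires at least two respondents
    and at least two items.
    """
    k = len(ratings[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up ratings from four respondents on a five-item, 7-point scale.
example_ratings = [
    [5, 4, 4, 5, 5],
    [3, 3, 2, 3, 4],
    [6, 5, 6, 6, 6],
    [4, 4, 3, 5, 4],
]
print(round(cronbach_alpha(example_ratings), 2))
```

Values closer to 1 indicate that the items vary together across respondents, which is the sense in which the two scales are described as internally consistent.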
Other relevant measures were assessed from within the intervention as participants completed it, including whether participants (1) set a goal for reducing their drinking and/or (2) agreed to receive further information on changing their drinking.
The computer-delivered interventions contained several common facets. They both assessed users’ levels of drinking and provided feedback in the form of peer-based norms. The computer-delivered interventions assessed positive and negative consequences of drinking, and used 0 to 10 rulers to assess participants’ perceived importance of and confidence in changing drinking behavior, followed by assessment of reasons for those ratings. Finally, if participants endorsed willingness, the computer-delivered interventions assisted users in setting a goal for change. Additional information (a pamphlet) on reducing drinking was also offered to users at the end of the session.
Participants assigned to the text-based computer-delivered intervention completed the session with no observer, interacting with the system by entering their responses using a keyboard and mouse. For example, the system presented an onscreen question asking what the user liked about drinking, and the participant responded by viewing a list of possible options and checking the corresponding boxes that applied to their experience. The system then reflected the positive and negative consequences endorsed by the participant in text presented on the computer screen.
Participants assigned to the voice-based computer-delivered intervention completed the intervention by speaking to the system. Their verbal responses were captured by a microphone in the interview room and were monitored by a research assistant outside of the room, who could also see the participant through a one-way mirror. The research assistant listened to the questions that the system asked and, based on the participant’s responses, selected appropriate paraphrases of content or prompts for further information from a pre-established list of possible responses. For example, the system verbally asked what the user liked about drinking and, as the user responded verbally, the human controller checked off responses such as drinking “helps you have fun” or drinking “tastes good.” The positive and negative consequences of drinking were then verbally reflected to the participant via computerized voice, with the phrases strung together to create a double-sided reflection: “On the one hand you like that drinking..., but on the other hand, you do not like that...” The voice-based computer-delivered intervention also allowed custom user responses to be entered and allowed the human controller to have the system inject common follow-up questions and comments, such as “Can you repeat that?” and “What else?” The voice used to speak the computer responses was selected from the standard text-to-speech voice options available on Mac OS X.
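The way checked-off phrases can be strung together into a double-sided reflection might be sketched as follows. This is an illustrative assumption about the assembly step, not the study software: the function names are hypothetical, the "liked" phrases are the two examples quoted above, and the "disliked" phrase is invented for the example.

```python
def join_phrases(phrases):
    """Join phrases into a natural-sounding list ("a", "a and b", "a, b, and c")."""
    if len(phrases) == 1:
        return phrases[0]
    if len(phrases) == 2:
        return f"{phrases[0]} and {phrases[1]}"
    return ", ".join(phrases[:-1]) + ", and " + phrases[-1]


def double_sided_reflection(likes, dislikes):
    """Build one reflection from the phrases the human controller checked off."""
    if not likes or not dislikes:
        raise ValueError("A double-sided reflection needs both sides.")
    return (
        f"On the one hand you like that drinking {join_phrases(likes)}, "
        f"but on the other hand, you do not like that {join_phrases(dislikes)}."
    )


# "helps you have fun" and "tastes good" come from the article;
# the disliked consequence is a hypothetical example.
reflection = double_sided_reflection(
    ["helps you have fun", "tastes good"],
    ["you feel tired the next day"],
)
print(reflection)
```

In a sketch like this, the resulting string would then be passed to the text-to-speech voice; that hand-off is omitted here because the article does not describe it.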
We first examined participants’ ratings of the characteristics of the voice-based computer-delivered intervention to determine how well the system met the objective of reflecting positive therapist traits (eg, how supportive was the system) and MI-based therapy traits (eg, how well did the system help you talk about your own reasons for change). We compared these ratings to those given to the text-based computer-delivered intervention using
Demographic characteristics of the 90 participants in the study are shown in
General therapist traits were rated at the midpoint between “not at all” and “very” for the voice-based computer-delivered intervention group, and participants agreed that the voice-based computer-delivered intervention system was consistent with MI counseling style (mean 3.0). Within the voice-based computer-delivered intervention condition, 61% (37/60) of participants were willing to set a goal to reduce their drinking, and 60% (36/60) accepted additional information on reducing their drinking at the conclusion of the session. As shown in
Demographics for the full sample and by intervention.
Variable | Total (N=90) | Voice-based intervention (n=60) | Text-based intervention (n=30)
Age (years), mean (SD) | 21.6 (2.8) | 21.7 (2.3) | 21.47 (3.5)
Gender, n (%) | | |
Female | 51 (57) | 32 (53) | 19 (63)
Male | 38 (42) | 27 (45) | 11 (37)
Other | 1 (1) | 1 (2) | 0 (0)
Race, n (%) | | |
Asian | 12 (13) | 8 (13) | 4 (13)
Black | 13 (14) | 12 (20) | 1 (3)
Biracial | 1 (1) | 1 (2) | 0 (0)
Multiracial | 5 (6) | 3 (5) | 2 (7)
Other race | 5 (6) | 3 (5) | 2 (7)
Pacific Islander | 1 (1) | 0 (0) | 1 (3)
White | 53 (59) | 33 (55) | 20 (67)
Years of education, mean (SD) | 15.0 (1.7) | 15.1 (1.5) | 14.9 (1.8)
Full-time student, n (%) | 43 (48) | 27 (45) | 16 (53)
Ratings of therapist and brief motivational interviewing traits by intervention.
Traits | Voice, mean (SD) | Text, mean (SD) | Test statistic | P | α
General therapist traits^a | | | 0.882 | .46 | .89
How engaging was the system? | 4.6 (1.4) | 4.3 (1.7) | | |
How empathetic was the system? | 3.7 (1.5) | 4.3 (1.6) | | |
How warm was the system? | 3.8 (1.5) | 4.0 (1.7) | | |
How well did the system understand you? | 4.4 (1.6) | 4.5 (1.9) | | |
How satisfied did you feel with the system? | 4.3 (1.5) | 4.7 (1.6) | | |
Total | 4.2 (1.2) | 4.4 (1.5) | | |
Motivational interviewing strategies^b | | | –0.555 | .48 | .83
Was easy to interact with | 3.0 (0.6) | 3.2 (0.6) | | |
Understood me | 2.8 (0.7) | 2.8 (0.7) | | |
Asked about my ideas before presenting its own | 3.2 (0.5) | 2.9 (0.6) | | |
Helped me talk about my own reasons for change | 3.1 (0.6) | 2.8 (0.7) | | |
Respected my ideas about how I might make changes | 3.1 (0.5) | 2.9 (0.6) | | |
Did not push me into something I wasn’t ready for | 3.1 (0.6) | 3.0 (0.5) | | |
Accepted that I might not want to change | 3.0 (0.7) | 2.9 (0.7) | | |
I felt engaged in the session (willing to discuss drinking) | 3.1 (0.7) | 3.0 (0.6) | | |
Total | 3.0 (0.4) | 2.9 (0.5) | | |
^a Five items rated on a 7-point Likert scale (1=“not at all” to 7=“very”).
^b Eight items rated on a 4-point Likert scale (1=“strongly disagree” to 4=“strongly agree”).
Attrition analyses were conducted to assess if there were any significant differences between participants who completed the follow-up assessment and those who did not. Noncompleters were not significantly different from completers in terms of demographics, number of drinks consumed per week, or number of heavy drinking episodes in past month. However, a significant difference was observed between completers and noncompleters in number of alcohol-related problems (BYAACQ), with noncompleters (n=12; voice: n=8, 13%; text: n=4, 13%) endorsing significantly more alcohol-related problems at baseline (mean difference 2.94;
Paired
Baseline and follow-up alcohol-related measures for the full sample and by condition. All values are mean (SD).
Variable | Full sample (N=90), baseline | Full sample (N=90), follow-up | Voice-based (n=60), baseline | Voice-based (n=60), follow-up | Text-based (n=30), baseline | Text-based (n=30), follow-up
Number of drinks per week^a | 10.1 (8.2) | 7.5 (6.0) | 9.6 (7.2) | 7.0 (5.4) | 11.2 (9.9) | 8.6 (7.0)
Number of heavy drinking days^a | 4.4 (3.5) | 2.8 (2.7) | 4.3 (3.3) | 2.7 (2.8) | 4.6 (3.9) | 2.9 (2.5)
Alcohol-related problems^a | 6.1 (4.3) | 4.9 (4.2) | 5.9 (4.3) | 4.0 (3.3) | 6.5 (4.1) | 6.8 (5.2)
Importance of changing drinking | 2.7 (2.2) | 3.5 (3.0) | 2.5 (2.0) | 3.6 (3.1) | 3.0 (2.5) | 3.3 (2.9)
Confidence to change drinking | 7.7 (2.1) | 8.3 (1.9) | 7.8 (2.1) | 8.4 (1.8) | 7.7 (2.2) | 8.2 (2.3)
^a Number of drinks per week and number of heavy drinking days in the past month were collected via Alcohol Timeline Follow-back. Alcohol-related problems experienced in the past month were assessed via BYAACQ.
Covarying baseline alcohol-related problems, participants randomized to the voice-based computer-delivered intervention reported 40% fewer alcohol-related problems at follow-up compared to participants in the text-based condition (incident rate ratio [IRR]=0.60, 95% CI 0.44-0.83,
This study represents a promising initial step toward developing a computer-delivered intervention for heavy drinking that relies on an interactive voice-based system rather than a traditional keyboard-and-mouse text-based system. Results showed that it was feasible to create a set of predetermined questions and responses that were sufficient to direct a user through the typical components of a brief MI, while demonstrating to users that their responses were heard and understood. Participants receiving the voice-based computer-delivered intervention agreed that the system demonstrated MI-consistent behavior (eg, helped me talk about reasons for change, asked me about my ideas before presenting its own) and displayed at least moderate levels of general therapist traits (eg, was understanding, was engaging). When compared to a text-based computer-delivered intervention, the voice-based computer-delivered intervention appeared to perform equally well in terms of these system ratings. Although no significant differences between conditions were observed on the total score for either scale, several individual items that might have been expected to favor the voice-based computer-delivered intervention were rated numerically lower for it than for the text-based computer-delivered intervention; for example, empathy and warmth were rated lower on average for the voice-based computer-delivered intervention. The observation that the point-and-click interface (text-based computer-delivered intervention) may be rated as at least as empathetic and warm as the voice-based computer-delivered intervention highlights potential areas for improvement. We speculate that the voice we used for the voice-based computer-delivered intervention system, which had a distinctly robotic tone, may have contributed to these relatively low user ratings.
Furthermore, we did not have an onscreen avatar or other visual presence during the session, and some participants expressed, while interacting with the voice-based computer-delivered intervention, that they were unsure whether they should be speaking to the static image on the computer screen or looking elsewhere.
Participants in the voice-based computer-delivered intervention condition reported significant decreases in number of drinks consumed and number of heavy drinking days, and significant increases in perceived importance of changing drinking, but confidence in their ability to change drinking, which was high at baseline, did not increase significantly. The voice-based computer-delivered intervention, compared to the text-based computer-delivered intervention, did not result in significantly greater change on any of these variables, and the differences between the conditions on these variables were small. However, we did observe a significant difference between conditions in alcohol-related problems reported at 1-month follow-up. Specifically, those randomized to the voice-based computer-delivered intervention, compared to those in the text-based computer-delivered intervention, reported about a 40% lower number of alcohol problems in the month after intervention. Although drinking was reduced following both computer-delivered interventions, only the voice-based computer-delivered intervention appeared to lead to a reduction in alcohol problems.
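The figure of about 40% follows directly from the reported incident rate ratio: an IRR of 0.60 means the expected count in one group is 60% of that in the comparison group, that is, 40% lower. The short sketch below makes that conversion explicit for the point estimate and the reported 95% CI bounds (0.44 and 0.83), which correspond to roughly 56% and 17% fewer problems. The function name is a hypothetical illustration, not part of the study's analysis.

```python
def irr_to_percent_change(irr: float) -> float:
    """Percent change in the expected count relative to the comparison group."""
    if irr <= 0:
        raise ValueError("An incident rate ratio must be positive.")
    return (irr - 1.0) * 100.0


# Values reported in the article: IRR 0.60, 95% CI 0.44-0.83.
for label, irr in [("point estimate", 0.60), ("lower bound", 0.44), ("upper bound", 0.83)]:
    print(f"{label}: IRR {irr:.2f} -> {irr_to_percent_change(irr):.0f}% change")
```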
The fact that the voice-based computer-delivered intervention, compared to the text-based computer-delivered intervention, resulted in significantly lower alcohol-related problems but did not appear to have a greater effect on reducing alcohol consumption was unexpected. However, previous studies have demonstrated that alcohol consumption and problems have distinct etiological pathways [
Several important limitations should be taken into consideration when evaluating results of this study. First, the sample consisted of college-aged participants who met criteria for heavy drinking, but whose overall levels of drinking were relatively low compared to other intervention studies with college students (eg, [
The task of constructing a voice-based computer-delivered intervention that can ask questions about alcohol use and respond in a manner consistent with MI practice is a challenging one. First, the voice-based computer-delivered intervention used in this study relied on a human controller. We have recorded participant responses and therefore can analyze the participant verbal behavior that led to specific choices by the human controller about which response button to push. Machine-learning algorithms may be able to detect the key verbal content and configurations that suggest the appropriate response, which can then be used to develop a prototype of an automated system.
Prior research has shown that people respond more strongly to automated systems that are more emotive in speech and animation. For example, users tasked with training a robot how to dance trained with the robot longer and with more accurate examples when the robot’s reactions to its progress were more emotive [
The use of a voice-based system that can allow for greater personalization of the computerized interventionist (eg, allowing the system to introduce itself and address the participant directly) may help to increase general therapist ratings. The system could also be made more sophisticated by creating ways in which information obtained earlier in the interaction is reintroduced later in the interaction, such as when the user is making a change plan. This would be particularly important with regard to change talk, which could be reiterated in later portions of the session to make it more salient to the user. Identifying mediating variables that account for the differences observed between the interventions will help inform future directions for improving the voice-based computer-delivered intervention. In particular, it would be useful to know what strategies participants used to avoid alcohol-related problems. That information could be used, in turn, to improve the voice-based computer-delivered intervention by highlighting those potential strategies when completing a change plan. Finally, an emerging line of experimental research has shown that compared to screen avatars, embodied robots (ie, robots that have a physical form and are in the room with participants) elicit greater engagement and compliance from people who are following directions from the automated system [
BYAACQ: Brief Young Adult Alcohol Consequences Questionnaire
IRR: incident rate ratio
MI: motivational interviewing
PC: personal computer
The authors thank Catherine Costantino, Jennifer Duff, and Majesta Kitts for their help in running this study. Funding for this project was provided by an internal Seed Fund Award from the Office of the Vice President for Research at Brown University.
None declared.