This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on http://mental.jmir.org/, as well as this copyright and license information must be included.
The use of online surveys for data collection has increased exponentially, yet it is often unclear whether interview-based cognitive assessments (such as face-to-face or telephonic word recall tasks) can be adapted for use in application-based research settings.
The objective of the current study was to compare and characterize the results of online word recall tasks to those of the Health and Retirement Study (HRS) and determine the feasibility and reliability of incorporating word recall tasks into application-based cognitive assessments.
The results of the online immediate and delayed word recall assessment, included within the Women’s Health Valuation (WHV) study, were compared to the results of the immediate and delayed recall tasks of Waves 5-11 (2000-2012) of the HRS.
Performance on the WHV immediate and delayed tasks demonstrated strong concordance with performance on the HRS tasks (ρc=.79, 95% CI 0.67-0.91), despite significant differences between study populations (
The key finding of this study is that the HRS word recall tasks performed similarly when used as an online cognitive assessment in the WHV. Online administration of cognitive tests, which has the potential to significantly reduce participant and administrative burden, should be considered in future research studies and health assessments.
The use of Internet-enabled devices, such as computers, smartphones, and tablets, to conduct cognitive research has increased dramatically over the past decade [
Application-based administration of cognitive tests has the potential to significantly advance research examining changes in cognition due to aging or illness. Repeated, short online cognitive batteries can provide a fine-grained assessment of cognitive capabilities in everyday life. For example, studies could examine situations or times of day in which cognitive lapses are most likely to occur (eg, during stress) [
Frequent use of cognitive assessments may be particularly important in clinical and primary care settings, where early indicators of mild cognitive impairment can be misdiagnosed as typical age-related declines in as many as 91% of cases [
Episodic memory is one of the first domains in which people experience subclinical changes in cognitive performance [
Recent evidence indicates that subtle changes in episodic memory can be detected in individuals with normal or slightly impaired cognitive abilities [
In response to this gap, the current study replicated the episodic memory tasks (immediate and delayed recall) of the Health and Retirement Study (HRS) in an online survey. These tasks were selected for a number of reasons. First, performance on the cognitive measures of the HRS has been shown to be stable from wave to wave, after controlling for cohort effects and test-retest bias [
This study examines the performance of an online word recall task that was originally developed as part of the HRS for cognitively healthy adults. Specifically, the results of an online immediate and delayed word recall task in a nationally representative sample of women aged 40 to 69 years were compared to the results of female respondents from waves 5-11 (2000-2012) of the HRS. Using these primary and secondary data, two questions were examined: (1) Do the online word recall tasks demonstrate sufficient equivalence to the HRS word recall tasks? (2) Does word recall performance vary as a function of respondent characteristics and task modality? Ultimately, the results of this study will aid in the evaluation of the potential of cognitive assessments in online surveys and health assessments.
Since its launch in 1992, the goal of the Health and Retirement Study (HRS) has been to provide a detailed, national representation of US adults aged 50 years and older. Jointly managed through the National Institute on Aging (U01 AG009740), the Institute for Social Research, and the University of Michigan (IRB Protocols HUM00056464, HUM00061128, HUM00002562, HUM00079949, HUM00080925, and HUM00074501), the HRS is widely cited as an excellent source of data for use in examining cognitive trends and abilities of the aging US population [
Conducted at Moffitt Cancer Center in Tampa, Florida, the Women’s Health Valuation (WHV) study is an Internet-based health valuation study that included health measures and a discrete choice experiment (DCE) where respondents reported their preferences between possible health outcomes. The approach and methods, including its sampling design and survey instrument, were adapted from the PROMIS-29 valuation study (1R01CA160104) [
The WHV online survey instrument had four components: screener, health, DCE, and follow-up. Each component had a series of questions distributed across a continuous series of pages, and responses were recorded by clicking or typing answers and then hitting the
Participants were recruited from a pre-existing national panel of US adults. To promote concordance with the 2010 US Census, participants were sampled according to 6 demographic quotas: age in years (40-54 and 55-69) and race/ethnicity (Hispanic; black, non-Hispanic; white; and other, non-Hispanic). Further details about the methods of this study are available online [
The cognitive battery of the HRS has been evaluated for internal consistency and validity [
As a measure of episodic memory, the immediate and delayed recall tasks are drawn from four categorized lists of 10 English nouns that do not overlap in content. Respondents are randomly assigned to one of the four lists at the initial interview. Longitudinally, each respondent is randomly assigned an alternative word list, such that each respondent receives a different set of words in each of the three successive waves of data collection. With this counterbalanced approach, each respondent is assigned to each word list only once over 4 waves of data collection, and approximately 8 years pass before a respondent is reassigned to the same set of words as at their initial interview.
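The counterbalanced assignment described above can be illustrated with a minimal sketch. It assumes that each respondent's four lists are drawn as a random permutation and that the same order repeats once all four lists have been used; the source does not state how the HRS implements the rotation, and the function name `assign_lists` is hypothetical.

```python
import random

WORD_LISTS = (1, 2, 3, 4)  # identifiers for the four 10-noun lists


def assign_lists(n_waves, rng=None):
    """Return one word-list identifier per wave for a single respondent."""
    rng = rng or random.Random()
    order = list(WORD_LISTS)
    rng.shuffle(order)  # random initial list, random order of the alternatives
    # Assumption (not stated in the source): the order repeats after four waves.
    return [order[wave % len(order)] for wave in range(n_waves)]


# Seven waves, matching the HRS waves 5-11 analyzed in this study.
schedule = assign_lists(7, random.Random(0))
```

Under these assumptions, the first four entries of `schedule` are all different, and the fifth wave (about 8 years after the first, given biennial waves) returns to the list used at the initial interview.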
During the immediate recall task, an interviewer reads a list of 10 words to each respondent at a rate of approximately 2 seconds per word, and the respondent then verbally recalls as many words as possible. Approximately 5 minutes after the immediate word recall test, during which respondents answer questions about their emotional state and complete two mental status tasks (ie, counting backwards and serial 7s), respondents are asked to recall the words from the immediate recall task. For each task, the number of correctly recalled words is scored, with higher scores indicating better performance.
In addition to episodic memory, HRS respondents are also asked to self-report their memory at the present time (excellent, very good, good, fair, or poor) and compare their current memory to their memory 2 years ago (better, same, or worse).
For the purpose of comparison, this study examines all word recall responses from waves 5-11 (2000-2012) of the HRS. Since the WHV was restricted to female respondents, we excluded male respondents from the HRS to decrease the risk of gender bias. Participants of the HRS who reported using a proxy respondent; refused to respond to word recall tasks; or had missing data on demographic, memory, or word recall variables (less than 2.0% of the sample) were also excluded. After these exclusion criteria were applied, 12,545 women remained, each of whom completed between 1 and 7 word recall tasks, with a median (interquartile range) of 3 tasks (2-5 tasks). These tasks were restructured to represent a cross-sectional dataset with a total of 43,417 word recall tasks.
The episodic memory task of the WHV replicated the word recall task conducted as part of the HRS. All respondents were asked to recall 10 English nouns immediately after they were presented on-screen (immediate recall) and after a delay (delayed recall). Each respondent received one of four randomly assigned sets of words, which were taken verbatim from the HRS and presented in the same order. Prior to the immediate recall task, respondents were presented with a screen that informed them that they would be shown a set of 10 words and would be asked to recall as many words as they could. These instructions were largely based on those given to HRS respondents but modified for online presentation. Words appeared on the computer screen one at a time for approximately 3 seconds. Respondents were asked to recall the words directly after the presentation of all 10 words (immediate recall) and then approximately 20 minutes later at the end of the DCE component (delayed recall). For each recall, respondents typed as many words as they could remember, in any order, in empty text boxes within the survey. As with the HRS, the primary measure of episodic memory was the sum of correctly recalled words for each task, regardless of order.
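The scoring rule (the sum of correctly recalled words, regardless of order) can be sketched as follows. This is an illustration rather than the study's actual scoring code: it assumes that matching is case-insensitive, that surrounding whitespace is ignored, and that a word typed twice counts once. The word list is a made-up placeholder, not one of the HRS lists.

```python
def score_recall(responses, target_words):
    """Count distinct target words among the typed responses, ignoring order."""
    targets = {w.strip().lower() for w in target_words}
    # Empty text boxes are skipped; repeated entries collapse into one.
    typed = {r.strip().lower() for r in responses if r.strip()}
    return len(targets & typed)


# Hypothetical 10-noun list used only for illustration.
EXAMPLE_LIST = ["apple", "bridge", "candle", "drum", "engine",
                "forest", "garden", "hammer", "island", "jacket"]

# Typed text boxes: two correct words, one repeat, one intrusion, one blank.
score = score_recall(["Apple", " drum ", "apple", "window", ""], EXAMPLE_LIST)
```

Because the comparison requires an exact spelling match, this sketch corresponds to the uncorrected word counts; misspelled responses would not be credited.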
The self-reported memory questions of the WHV were replicated from the self-reported memory questions of the HRS. As part of the health component, the self-reported memory questions asked participants to rate their memory at the present time (excellent, very good, good, fair, or poor) and compare their current memory to their memory 2 years ago (better, same, or worse).
Compared to the word recall task in the HRS, the online task in the WHV differed in several ways. The word lists were displayed visually on a computer device/browser as opposed to being spoken by an interviewer (basic literacy skills were required, with less reliance on verbal communication), and respondents recalled words by typing them rather than speaking them (basic typing skills were required, with less reliance on verbal communication). Because some words sound the same but are spelled differently (eg, see vs sea and rock vs roc), typed responses may make the WHV task more specific. In addition, the delay between the immediate and delayed recall tasks was longer in the WHV (approximately 20 minutes vs 5 minutes in the HRS), and the WHV version was purely cross-sectional, whereas HRS respondents may have completed the tasks up to seven times. Nevertheless, the study took all available steps to replicate the original HRS tasks.
Demographic and descriptive statistics (
The WHV online survey had 4474 respondents, each of whom completed 1 word recall task. The HRS survey had 12,545 respondents who completed between 1 and 7 recall tasks. As shown in
For the immediate and delayed recall tasks, this study assessed differences in association between the number of correctly recalled words by study sample and word list assignment (
Respondent characteristics by modality.
| Characteristic | WHV | HRS | P value |
|---|---|---|---|
| Number of respondents | 4474 | 12,545 | |
| Number of tasks per respondent, median (IQR) | 1 | 3 (2-5) | |
| Total number of tasks | 4474 | 43,417 | |
| Age in years, median (IQR) | 53 (48-61) | 60 (55-64) | <.001 |
| 40-44, n (%) | 629 (14.17) | 735 (1.68) | |
| 45-49, n (%) | 754 (16.98) | 2158 (4.94) | |
| 50-54, n (%) | 1051 (23.67) | 7701 (17.63) | |
| 55-59, n (%) | 661 (14.89) | 10,951 (25.07) | |
| 60-64, n (%) | 641 (14.44) | 11,260 (25.78) | |
| 65-69, n (%) | 704 (15.86) | 10,868 (24.88) | |
| Race | | | <.001 |
| White, n (%) | 3556 (80.09) | 33,992 (75.14) | |
| Black, n (%) | 632 (14.23) | 8832 (19.52) | |
| Other, n (%) | 252 (5.68) | 2412 (5.33) | |
| Hispanic | | | <.001 |
| No, n (%) | 3743 (84.30) | 39,604 (87.27) | |
| Yes, n (%) | 697 (15.70) | 5774 (12.72) | |
| Education | | | <.001 |
| No degree, n (%) | 168 (3.78) | 8334 (18.40) | |
| High school diploma/GED, n (%) | 1955 (44.03) | 24,941 (55.05) | |
| Associate's degree/some college, n (%) | 1257 (28.31) | 2747 (6.06) | |
| Bachelor's degree, n (%) | 669 (15.07) | 5664 (12.50) | |
| Master's degree, n (%) | 320 (7.21) | 3198 (7.06) | |
| Law/MD/PhD, n (%) | 71 (1.60) | 419 (0.92) | |
| Marital status | | | <.001 |
| Married, n (%) | 2338 (53.78) | 28,333 (62.35) | |
| Partnered, n (%) | 231 (5.20) | 2069 (4.55) | |
| Separated/divorced, n (%) | 1048 (23.60) | 8654 (19.04) | |
| Widowed, n (%) | 262 (5.90) | 4922 (10.83) | |
| Never married, n (%) | 511 (11.51) | 1467 (3.23) | |
| Self-reported current memory | | | <.001 |
| Excellent, n (%) | 468 (10.54) | 2518 (5.77) | |
| Very good, n (%) | 1743 (39.04) | 11,353 (26.00) | |
| Good, n (%) | 1704 (38.38) | 18,991 (43.48) | |
| Fair, n (%) | 486 (10.93) | 9137 (20.92) | |
| Poor, n (%) | 48 (1.08) | 1654 (3.79) | |
Likelihood of immediate recall by word.
Average number of correctly recalled words by list and modality.
| List | Immediate recall, WHV | Immediate recall, HRS | P value | Delayed recall, WHV | Delayed recall, HRS | P value |
|---|---|---|---|---|---|---|
| Overall | 7.24 | 6.06 | <.001 | 5.41 | 5.21 | <.001 |
| List 1a | 7.40 | 6.14 | <.001 | 5.57 | 5.24 | <.001 |
| List 2a | 7.16 | 5.93 | <.001 | 5.29 | 5.08 | .013 |
| List 3a | 7.12 | 6.12 | <.002 | 5.27 | 5.28 | .922 |
| List 4a | 7.30 | 6.07 | <.001 | 5.50 | 5.23 | .002 |
aSignificant differences were detected between lists for immediate (
Associated respondent characteristics and number of correctly recalled words by survey modality: WHV versus HRS.
| Characteristic | Immediate recall, WHV | Immediate recall, HRS | P value | Delayed recall, WHV | Delayed recall, HRS | P value |
|---|---|---|---|---|---|---|
| Constantb | 7.19d | 6.34d | <.001 | 5.62d | 5.48d | <.001 |
| Age (years) | | | | | | |
| 40-44 | −.04 | .25d | .01 | −.13 | .31d | <.001 |
| 45-49 | −.01 | .01 | .84 | −.05 | .01 | .65 |
| 50-54 | — | — | — | — | — | — |
| 55-59 | .14 | −.01 | .10 | −.04 | .04 | .57 |
| 60-64 | .01 | −.07d | .48 | −.10 | .02 | .39 |
| 65-69 | .02 | −.24d | <.001 | .15 | −.16d | .03 |
| Race | | | | | | |
| White | — | — | — | — | — | — |
| Black | −.55d | −.50d | .57 | −.71d | −.74d | .86 |
| Other | −.16 | −.34d | .26 | −.32 | −.41d | .70 |
| Hispanic | | | | | | |
| No | — | — | — | — | — | — |
| Yes | −.19c | −.36d | .08 | −.10 | −.35d | .09 |
| Education | | | | | | |
| No degree | −.36c | −.61d | .14 | −.25 | −.64d | .07 |
| High school diploma/GED | — | — | — | — | — | — |
| Associate’s degree/some college | .12 | .19d | .35 | −.13 | .18d | <.001 |
| Bachelor's degree | .35d | .47d | .19 | .26c | .50d | .07 |
| Master's degree | .60d | .61d | .92 | .40c | .68d | .12 |
| Law/MD/PhD | .67d | 1.01d | .13 | .52 | 1.09d | .16 |
| Marital status | | | | | | |
| Married | — | — | — | — | — | — |
| Partnered | −.14 | −.10d | .79 | .17 | −.11c | .17 |
| Separated/divorced | −.02 | −.07d | .45 | −.17 | −.10d | .52 |
| Widowed | .04 | −.04 | .48 | .05 | −.03 | .67 |
| Never married | .05 | −.17d | .04 | −.05 | −.14c | .56 |
| Self-reported current memory | | | | | | |
| Excellent | .18 | .13d | .64 | .12 | .12d | .99 |
| Very good | .22d | .21d | .97 | .07 | .24d | .10 |
| Good | — | — | — | — | — | — |
| Fair | −.52d | −.31d | .03 | −.60d | −.35d | <.001 |
| Poor | −1.53d | −.74d | .01 | −1.93d | −.85d | <.001 |
aRepresents
bBase scenario represents the average number of words that are correctly recalled by a white female, aged 50-54 years, who is married, has a high school diploma, and self-reports her current memory as good.
c
d
Immediate word recall was significantly associated with respondent characteristics in both the WHV and HRS tasks, and there were significant differences between the two modalities. Overall, WHV respondents immediately recalled about one more word (0.85) than HRS respondents did, after adjusting for respondent characteristics. In terms of demographics, age was significantly associated with immediate recall for the HRS task but not the WHV task. Specifically, younger respondents recalled more words than older respondents in the HRS tasks but not in the WHV tasks. Being non-white and/or Hispanic was significantly associated with reduced immediate recall in both modalities; however, these associations did not differ significantly by modality.
Levels of educational attainment were significantly associated with immediate recall for both the HRS and WHV tasks. Detrimental effects were seen for the lowest education level; respondents with less than a high school diploma recalled fewer words. The benefits of obtaining education beyond high school were incrementally significant, with the exception of WHV respondents who earned an associate’s degree. Marital status was significantly associated with immediate recall in the HRS tasks but not the WHV tasks. Specifically, respondents who reported being partnered, separated, divorced, or never married recalled fewer words than their married counterparts. However, the only associations that differed significantly between modalities were those for individuals who were never married.
Self-reported current memory was significantly associated with immediate word recall in both modalities. As expected, those who reported their memory as excellent or very good tended to recall more words than those who reported a fair or poor memory. However, it is unclear whether those who reported excellent memory had better recall than those who reported very good memory. The association between a poor memory and immediate word recall was statistically significant and noteworthy in size (in the WHV, 1.53 fewer words than respondents reporting a good memory). The associations with fair and poor memory were greater for the WHV task than the HRS task, possibly because of interviewer biases (eg, slowing the task for persons who reported poor memory).
As with immediate word recall, the associations between respondent characteristics and delayed word recall were significant, and their associations differed by modality. Adjusting for respondent characteristics, WHV respondents recalled approximately 0.14 more words after a delay than HRS respondents. Like the immediate recall results, the association between age and delayed recall was significant for the HRS task but not the WHV task. For both modalities, respondents who were Non-white and/or Hispanic performed significantly worse on the delayed recall tasks, but the associations did not differ significantly.
Levels of educational attainment were significantly associated with delayed recall in both modalities, and the pattern differed slightly from what was seen for the immediate recall task. Significant detrimental effects were no longer seen for WHV respondents with less than a high school diploma but persisted for HRS respondents. Higher levels of education beyond an associate’s degree remained significantly associated with greater delayed recall, with the exception of WHV respondents who earned an associate’s or advanced degree. The association between advanced education levels and recall was very strong for HRS respondents, who recalled approximately 0.50 more words compared to similarly educated WHV respondents. Marital status was significantly associated with delayed recall for the HRS modality but not the online modality. HRS respondents who reported being partnered, separated or divorced, or never married recalled significantly fewer words compared to married respondents. The associations between modalities were not significantly different.
Self-reported current memory was significantly associated with delayed word recall in both modalities. Similar to the immediate recall task, respondents who reported their memory as excellent or very good tended to recall more words than those who reported a fair or poor memory. The association between poor memory and delayed recall intensified for WHV respondents, who recalled nearly 2 fewer words compared to the base scenario and more than 1 word fewer compared to HRS respondents with a similar memory rating.
To explore the possibility that word recall scores for WHV respondents were influenced by literacy level and typing skills (ie, misspelled words would not be counted as correct), the previous analyses were rerun after correcting words that were misspelled by one letter. This arbitrary adjustment was based on the number of WHV responses that appeared to be related to misspellings (eg, doller for dollar) or mistyping (eg, ovean for ocean), and is akin to the best-judgment practice granted to HRS interviewers when determining whether an HRS response should be counted as correct (eg, woman for women or shoe for shoes). When the analyses were rerun using the spell-corrected word counts, none of the results differed significantly. Therefore, the results reported here are based on the uncorrected word recall responses for WHV respondents.
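One way to operationalize "misspelled by one letter" is sketched below. It assumes this means a single substitution, insertion, or deletion (an edit distance of at most 1); the source does not specify the exact rule, and both function names are hypothetical.

```python
def within_one_edit(a, b):
    """True if a and b differ by at most one substitution, insertion, or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the shared prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # exactly one substituted letter
    return a[i:] == b[i + 1:]  # exactly one extra letter in b


def score_with_spelling_tolerance(responses, target_words):
    """Count target words matched by any response within one edit."""
    typed = [r.strip().lower() for r in responses if r.strip()]
    return sum(
        any(within_one_edit(r, t.lower()) for r in typed)
        for t in target_words
    )


# Examples taken from the text: doller/dollar and ovean/ocean are credited.
corrected = score_with_spelling_tolerance(
    ["doller", "ovean", "table"], ["dollar", "ocean", "women"]
)
```

A rule this permissive also credits homophones that differ by one letter (eg, see for sea), so it trades some of the specificity of typed responses for tolerance of typing errors.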
This study compared and characterized the results of the WHV word recall task to those of a gold standard HRS word recall task in order to determine reliability for future surveys. The results of this study provide support for the inclusion of online cognitive assessments in health surveys. This is the first study attempting to replicate the HRS word recall tasks in an application-based assessment. The results indicate that the immediate and delayed word recall tasks were equivalent to the HRS tasks, as evidenced by high levels of concordance (precision) and association with self-reported memory (convergent validity). Even after controlling for age, education, and self-reported memory, WHV respondents recalled nearly one more word than HRS respondents for the immediate recall tasks. This difference decreased but remained significant for the delayed recall and may be attributed to study design differences or other unobservable sample selection biases. In summary, both HRS and WHV tasks appear to perform well despite key differences between the studies.
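For readers unfamiliar with the concordance statistic, ρc conventionally denotes Lin's concordance correlation coefficient, which penalizes both weak correlation and systematic shifts between two sets of measurements. The sketch below assumes that this is the statistic reported here; which pairs of values the authors entered is not stated in this section, so the inputs are made-up numbers and the function name is hypothetical.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired values x and y."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Variances and covariance use the 1/n form, as in Lin's definition.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Made-up paired values: y tracks x perfectly but is shifted up by 1.
shifted = concordance_correlation([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this toy example the Pearson correlation would be 1, yet the concordance coefficient is lower because of the constant offset. This is why high concordance can coexist with, but is reduced by, a mean difference such as the roughly one-word advantage of WHV respondents.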
While our normalized results demonstrated a high level of concordance between the WHV and HRS tasks and thus support the primary goal of this study, we did note significant differences between samples that may be related to a number of potential confounders, such as differences in study design. For example, the HRS recall lists were presented verbally, whereas the words of the WHV lists were presented visually. Upon initial review, one may think that differences in how the brain processes auditory versus visual information may contribute to modality differences. However, research has shown that auditory and visual recall tasks activate overlapping regions of the brain, and while the left hemisphere of the brain is activated slightly more during visual tasks, there is no evidence that recall performance is impacted by modality [
An additional difference in study design is the length of time and type of activities that were completed by respondents between the immediate and delayed recall tasks. HRS respondents answered questions regarding their emotional state over the past week (eg, levels of motivation, happiness, and loneliness) and completed two mental math tasks (ie, counting backwards and subtracting 7s) for 5 minutes. WHV respondents completed a series of DCE tasks during the 20-minute delay, which may arguably require greater levels of cognitive engagement. These dissimilarities in the amount of delay and the complexity of the tasks completed during the delay may have contributed to the observed modality differences. The regression analysis may control for some of the sample selection issues, but panel and delay attributes may also explain differences by modality.
In addition to modality differences, there is a potential concern for practice effects to bias the results of repeated word recall tasks, particularly since such effects mask true declines in cognitive performance [
In a more recent analysis, McArdle et al found evidence of practice effects in cognitive data from earlier waves of the HRS (1992-2004) [
These methodological changes clearly restrict the interpretability of potential practice effects noted within the HRS. The results of the current study are less subject to such biases since the analyses are restricted to the 2000-2012 waves of the HRS (ie, the counterbalanced assignment of word recall lists is uniform across waves). Despite this counterbalanced approach, it is not possible to completely rule out the potential influence of practice effects. Future studies should attempt to measure the presence and impact of practice effects in the HRS using only the waves with identical methodological approaches.
We also found several interesting associations between episodic memory performance and sociodemographic characteristics. The effect of marital status on word recall was significant only for HRS respondents; individuals who were partnered, separated or divorced, or never married performed worse compared to those who were married. The presence of significant results in the HRS sample but not the WHV sample may be related to the fact that married/partnered HRS respondents are often interviewed one after the other. Previous research has indicated that spouses who are interviewed second may be at a disadvantage in free recall tasks [
Education was another sociodemographic characteristic that was significantly associated with word recall performance, with higher levels of education significantly predicting higher episodic memory performance. Higher levels of education are thought to influence cognitive function by increasing individual levels of brain and cognitive reserve [
While several computer-based cognitive batteries have been developed [
Future research should assess additional cognitive tasks included in the HRS. This type of research might expand the results of the current study to investigate the effects of setting (eg, waiting room, hospital room, home use of online tasks) or to support the use of routine online cognitive assessments to track cognitive change in healthy older adults or clinical populations. Furthermore, clear standards for measurement using online tasks similar to the electronic patient-reported outcome literature should be created [
A key limitation of the study is the use of an existing panel in the community setting. While some may argue that sampling bias is introduced by using research panels who demonstrate high levels of technological capabilities (ie, use of computers, smartphones, tablets), it has also been noted that such panels allow researchers to collect large amounts of data from diverse populations [
Inability to monitor respondent behavior is a limitation of online and telephone surveys [
In summary, this study found a high level of convergent validity between the WHV and HRS word recall tasks, after controlling for age, education, and self-reported memory. Use of application-based cognitive assessments should continue to expand in community research and clinical settings, but greater efforts are needed to validate such online measures. Additionally, researchers should be wary of a number of potential biases, including modality differences, retest effects, and gender differences in cognitive performance.
DCE: discrete choice experiment
HRS: Health and Retirement Study
WHV: Women’s Health Valuation Study
The authors thank Carol Templeton, Michelle Owens, and Nawreen Jahan at Moffitt Cancer Center for their contributions to the research and creation of this paper.
None declared.