Original Paper
Abstract
Background: A smartphone is a promising tool for daily cardiovascular measurement and mental stress monitoring. A smartphone camera–based photoplethysmography (PPG) and a low-cost thermal camera can be used to create cheap, convenient, and mobile monitoring systems. However, to ensure reliable monitoring results, a person must remain still for several minutes while a measurement is being taken. This is cumbersome and makes its use in real-life situations impractical.
Objective: We proposed a system that combines PPG and thermography with the aim of improving cardiovascular signal quality and detecting stress responses quickly.
Methods: Using a smartphone camera with a low-cost thermal camera added on, we built a novel system that continuously and reliably measures 2 different types of cardiovascular events: (1) blood volume pulse and (2) vasoconstriction/dilation-induced temperature changes of the nose tip. A total of 17 participants, involved in stress-inducing mental workload tasks, measured their physiological responses to stressors over a short time period (20 seconds) immediately after each task. Participants reported their perceived stress levels on a 10-cm visual analog scale. For the instant stress inference task, we built novel low-level feature sets representing cardiovascular variability. We then used the automatic feature learning capability of artificial neural networks to improve the mapping between the extracted features and the self-reported ratings. We compared our proposed method with existing machine learning methods based on hand-engineered features.
Results: First, we found that the measured PPG signals presented high quality cardiac cyclic information (mean pSQI: 0.755; SD 0.068). We also found that the measured thermal changes of the nose tip presented high-quality breathing cyclic information and filtering helped extract vasoconstriction/dilation-induced patterns with fewer respiratory effects (mean pSQI: from 0.714 to 0.157). Second, we found low correlations between the self-reported stress scores and the existing metrics of the cardiovascular signals (ie, heart rate variability and thermal directionality) from short measurements, suggesting they were not very dependent upon one another. Third, we tested the performance of the instant perceived stress inference method. The proposed method achieved significantly higher accuracies than existing precrafted features-based methods. In addition, the 17-fold leave-one-subject-out cross-validation results showed that combining both modalities produced higher accuracy than using PPG or thermal imaging only (PPG+Thermal: 78.33%; PPG: 68.53%; Thermal: 58.82%). The multimodal results are comparable to the state-of-the-art stress recognition methods that require long-term measurements. Finally, we explored effects of different data labeling strategies on the sensitivity of our inference methods. Our results showed the need for separation of and normalization between individual data.
Conclusions: The results demonstrate the feasibility of using smartphone-based imaging for instant stress detection. Given that this approach does not need long-term measurements requiring attention and reduced mobility, we believe it is more suitable for mobile mental health care solutions in the wild.
doi:10.2196/10140
Introduction
Human physiological events are controlled by the actions of the sympathetic nervous system (SNS) and the parasympathetic nervous system (PSNS). Of the many different types, cardiovascular and respiratory events have been shown to be important for monitoring a person’s mental health and stress [
- ]. Recent studies have demonstrated that it is possible to use smartphone cameras (ie, Red Green Blue vision) to measure blood volume pulse (BVP) [ - ] and mobile thermal cameras attached to a smartphone (or integrated into it, for example, Cat S60) to measure respiratory cycles [ ]. These encouraging results suggest that smartphones could become a powerful apparatus for monitoring and supporting mental stress management on a daily basis through biofeedback [ ]. Indeed, the combination of RGB and thermal cameras in one device has the potential to provide a very large set of physiological measurements for stress monitoring in our daily life. Smartphone apps with such capabilities are increasingly desired as possible tools for facilitating stress self-management [ - ] as people are often unaware of their level of stress and of being stress-sensitive to particular situations, for example, chronic pain can cause a fear of movement [ ]. There is also a strong interest within the industry in complementing typically used questionnaires in order to enable improved assessment of well-being with personnel as well as revisiting work plans and work environments [ ]. Given their size and mobility, such sensors could be embedded into employees’ aids for ease of use. Although these low-cost sensors are still not perfect, the literature shows that their reliability is increasing, and we are contributing to this body of work. At the same time, we hope that our work contributes to the literature in general using these signals as stress measures [ - ]. In this paper, we aim to focus on 2 important cardiovascular events that can be captured by low-cost, low-resolution sensors: cardiac cyclic events with smartphone photoplethysmography (PPG) and vasoconstriction/dilation-induced nose tip temperature dynamics with a low-cost thermal camera. 
In particular, we investigate how to instantly capture stress-induced variability of such physiological patterns.
Heart rate variability (HRV) is the variation over time of the interval between successive heartbeats. It has been used to measure a person’s mental stress [
, , - ]. HRV’s popularity arises from the fact that it has been shown to abstract information about the sympathovagal balance between the SNS and PSNS. When confronted with a stressor, the autonomic nervous system can produce a sequence of fight-or-flight responses [ ]. These manifest themselves as alternations of accelerated and decelerated cardiovascular patterns [ , ]. To characterize the HRV, various authors [ , , , ] have proposed a variety of hand-crafted HRV metrics that are computed over time intervals between heartbeats. Although most of the HRV metrics were originally built based on the RR intervals from electrocardiogram (ECG) measurements [ ], the metrics have been applied to the PP intervals from PPG measuring BVP [ , , , ]. In the case of PPG, the term pulse rate variability (PRV) or PPG HRV is often used to clarify the different type (even if related) of event measured [ , - ] with respect to ECG. Among the most commonly used are statistical metrics (such as the standard deviation of RR or PP intervals) and frequency-band metrics (eg, the normalized power in a frequency band of interest). In particular, various studies have found that the Low Frequency (LF; 0.04 Hz-0.15 Hz) and High Frequency (HF; 0.15 Hz-0.4 Hz) bands of the time intervals in heart rates appear to reflect the SNS and PSNS activities [ ]. Based on this observation, many studies have proposed to use the LF/HF ratio as a stress indicator [ , , , ]. However, the use of such metrics has remained controversial in that they tend to oversimplify physiological phenomenon [ - ]. In particular, a single physiological metric itself does not strongly contribute to automatically detecting a person’s stress levels (ie, machine learning tasks) [ , ]. 
Hence, features derived from multiple HRV metrics have been used together with those from other physiological activities, such as perspiration and respiration, to automatically infer mental stress, for example, during driving tasks [ ] and desk activities [ ]. To ensure reliable measurements with such features, a relatively long window of data (several minutes to a few hours) must also be used [ , ]. Although this is acceptable in specialist settings or with medical devices, it is highly inconvenient in unstructured real-world settings using low-cost devices (in particular, the PPG). For example, if smartphone-based finger PPG were to be used, a user would have to continuously make sure their finger is held stably in front of the camera. Another issue is that changes in ambient light levels, as a user moves around, can corrupt long-term measurements.
Another documented cardiovascular event that happens as a reaction to mental stressors is vasoconstriction of blood vessels in a person’s nasal peripheral tissues [
, ]. This causes blood flow to drop, resulting in a decrease in temperature, which can be detected by monitoring the temperature of the nose tip. One study [ ] found that a contact-based multichannel thermistor was able to detect a significant decrease in temperature of the nasal area relative to the forehead in mentally stressful conditions. The same result has been repeatedly reported from the use of thermal imaging in mental stress induction studies [ , ], indicating that the thermal directionality (ie, temperature drop) can be a potential barometer of mental stress. However, these studies share similar limitations, as they require keeping the head still (authors often use a chinrest). In addition, they require measuring baseline temperatures to compute the thermal direction, which may limit their use in real-life applications [ , ]. In this work, we address the former issue by using a state-of-the-art tracking method [ ]. Furthermore, we rely only on the instant measurement of the area of interest (nose tip) to address the latter.
The reason for proposing the use of 2 sensors in this study rather than just 1 is that despite the potential of thermal imaging in measuring BVP [
], its accuracy is low and its ability to measure PP intervals has not yet been validated. Instead, camera-based PPG has been shown to be more reliable [ , ] and can be used simultaneously with thermal imaging, possibly compensating for the limitations of each modality in inference tasks. In addition, the use of finger PPG and a thermal camera raises far fewer privacy concerns than RGB-based facial analysis, that is, remote PPG [ ]. Furthermore, the use of multiple measurements increases the reliability of stress monitoring. Finally, even if not investigated in this paper, low-cost thermal imaging could provide further measurements of stress-related phenomena (respiration rate [ , ], which has already been shown to be measurable with a mobile thermal camera, and possibly sweat [ ]), providing a wide battery of cues for reliable assessment.
Rather than focusing on all possible physiological signals that could be later added, this paper investigates the possibility of building a fast stress recognition system that only requires a very short time window of PPG and thermal measurements. This is to ensure its possible use in real-life ubiquitous situations. In particular, we contribute to the literature on 4 fronts. First, we propose new preprocessing techniques to enhance the quality of the signals that are extracted from both the smartphone-based PPG and thermal camera and to reliably produce PP intervals and thermal variability data as low-level features. This is particularly important when working with ultrashort measurements [
]. Second, we explore correlations between self-reported stress scores and currently used metrics computed from thermal and PPG signals over a short period of time. Third, instead of using the existing metrics as high-level features, we propose to use the low-level features and let artificial neural networks (NNs) learn informative high-level ones themselves. We evaluate the approach on a multimodal dataset purposely collected for this study. Finally, we further investigate how sensitive the perceived stress recognition performance is to different strategies for labeling self-reported stress scores.
Methods
Overview
This section presents a method that enables quick inference of a person’s perceived stress level using smartphone-integrated PPG and thermography. We call these measurements instant measurements to differentiate them from the short measurements (typically between 2 min and 5 min), which have been previously defined in the literature [
].
First, we describe the software we implemented. This includes a recording setup and a set of techniques to produce reliable PPG-derived HRV profiles and sequential nose tip thermal variations (called hereafter the thermal variability sequence) from the thermal imaging sensor. Second, we introduce our study protocol to induce different levels of mental stress and collect short sequences (20 seconds) of cardiac pulse–related and thermal events together with self-reports of perceived mental stress scores. Third, we extract low-level (1-dimensional PP intervals and thermal variability sequences) and high-level hand-engineered features, comparing the performance of our system over the 2 sets of features and sensor modalities. We conclude by comparing our approach to data labeling with standard approaches to discuss the effect of intersubjective variability in reporting stress scores.
Toward Smartphone as a Reliable Multiple Cardiovascular Measure
The main cardiovascular sensing channels of this work are the rear RGB camera of a mobile phone (LG Nexus 5) and a low-cost thermal camera (FLIR One 2G) attached to the phone.
shows the smartphone with the attached thermal camera, the required finger placement and light emission for PPG, and the physiological measurement interface.
Although the smartphone imaging–based PPG measurement can be performed in either a contact [
, ] or a contactless manner [ ], in our work, we only focus on a contact-based imaging PPG. The reason is that clinical studies [ , ] have repeatedly reported its high accuracy. In addition, given that a normal RGB camera is only sensitive to the narrow electromagnetic range known as the visible spectrum [ ], adequate lighting is required before it can be used as a PPG sensor. Hence, light from the rear flash light-emitting diode (LED) is used, and a user is required to hold the smartphone body and place his/her finger over both the back camera and the flash light ( ). Unfortunately, the use of the back flash limits the duration of the measurements in some devices since its heat can potentially burn a person’s skin. As shown in , a large amount of heat is produced by the LED emission from the chosen smartphone (LG Nexus 5) in just 25-30 seconds of operation. A similar amount of heat was observed from another mobile phone (Samsung Galaxy 6 in ). Since temperatures above 50°C are potentially damaging to human skin tissues (for example, skin erythema could occur after 25 seconds of heating at 51.07°C [ ]), we limit the cardiovascular measurement to a 20-second time period. This is also the required minimum duration for obtaining valid HRV metrics values, particularly LF/HF [ ].
To capture a time series of apparent thermal sequences, we developed bespoke recording software using the FLIR One library (FLIR Systems). The interface is shown in
. Considering the thermal properties of human skin, the emissivity of the thermal imaging sensor was fixed at 0.98 [ ]. As the thermal imaging system does not guarantee a consistent frame rate [ ], the recording interface stores the time stamp with each image frame.
Blood Volume Pulse and PP Interval Estimation Through Photoplethysmography
summarizes the approach we use to extract BVP and PP intervals through the smartphone imaging PPG. Following previous studies [ , , ], our method estimates the BVP signals by capturing subtle color variations associated with light absorptivity patterns of hemoglobin in the capillaries of a person’s skin. However, rather than using average values of the pixels of the red (or green) channel to estimate the BVP value, which is the most widely used method [ , , ], we propose to use the negative temporal variations in spatial Shannon entropy [ ] of sequential R-channel images, $-H_t(X)$, as raw BVP signals. This is because averaging tends to ignore fairly small but important variations in color distribution [ ]. The estimated BVP value at a given time $t$ can be expressed in the following manner (equation 1):
$$B(t) = -H_t(X) = \sum_{i,j} p(x_{i,j}) \log p(x_{i,j}) \quad (1)$$
where $x_{i,j}$ is the brightness of pixel $(i,j)$ and $p(x_{i,j})$ is the probability distribution, which is generally estimated using a grayscale histogram in image analysis [ ] (here, for the R channel).
As our interest is in measuring raw PP intervals from PPG signals, we used a simple signal processing technique to equalize the amplitudes of the BVP peaks, which helps detect the peaks for measuring the time interval between them (ie, the PP interval). This was done by subtracting the k-sample moving average from the raw entropy signal ( ), which can be expressed in the following manner (equation 2):
$$\hat{B}(t) = B(t) - \bar{B}_k(t) \quad (2)$$
where $\bar{B}_k(t)$ denotes the $k$-sample moving average of $B$ at time $t$.
Since a high sampling rate produces a higher sensitivity of the PP intervals [
], we upsampled the raw sequences to 256 Hz with spline interpolation and used a 1-second moving average to smooth heartbeat-induced variations, a duration within which at least one heartbeat of a healthy person is expected to appear [ ]. Finally, we used simple local maxima detection [ ] with a 0.5-second sliding window to recover PP intervals ( ).
Continuous Extraction of Nose Tip Thermal Variability Sequence
To extract the 1D sequential nose tip thermal changes, our approach uses the 3 computational steps shown in
. These are (1) nose tip region-of-interest (ROI) tracking, (2) breathing artifact reduction, and (3) postprocessing for extracting low-level features representing thermal variability.
For ROI tracking, we take advantage of recent advances in thermal ROI-tracking techniques, which help minimize the effects of motion artifacts and thermal environmental changes. In particular, we used the Optimal Quantization and Thermal Gradient Flow methods (
) introduced in a study by Cho Y et al [ ]. Through the use of these techniques, we can continuously extract a spatial average temperature sequence over the ROI. As breathing causes thermal changes in the area close to the nose tip (see ), we need to remove such effects from the ROI for reliable measurements. This is necessary despite the fact that breathing dynamics are significant indicators of mental stress [ , ]. For this, we propose to use a low-pass filter with a cutoff frequency lower than the normal range of breathing rates of healthy people, for example, 0.1 Hz-0.85 Hz [ ]. As a thermal directional change is a relatively slow physiological event [ ], we set the cutoff to 0.08 Hz, which is below the lower boundary of that range. For the implementation, we used zero-phase filtering (seventh-order Butterworth) to avoid a phase-shifted result. Finally, we computed the thermal variability sequences of the nose tip ( ) by downsampling with linear interpolation and feature scaling the signal. Here, downsampling (to 1 Hz) is used to address the unsteady frame rate of the thermal camera and to compute successive temperature differences sampled at regular temporal points. Feature scaling ( ) was applied to minimize the effect of different levels of nasal temperatures across participants and sessions and to explore the thermal temporal variability within short-term data. As this new method helps extract nose tip thermal variability sequences continuously, it can produce richer feature sets in comparison with earlier methods [ , , ]. In turn, this could possibly provide useful information, even from an instant measurement, contributing to the automatic inference of a person’s stress.
Data Collection Protocol
A data collection study was carried out to gather physiological data from participants during different tasks that induced different levels of mental load. The data collection protocol is described below.
Participants
A total of 17 healthy adults (mean age 29.82 years, SD 12.02; 9 female) of varying ethnicities and different skin tones (pale white to black) were recruited from the University College London (UCL) and nonresearch community through the UCL psychology subject pool system. Participants completed prescreening through the system that was designed to exclude participants with any history of psychiatric disorders or medicine intakes, which may influence their physiological signatures. Each participant was given the information sheet, asked to provide a signed consent to take part in the study, and to fill in the demographics form before the start of data acquisition. The study was conducted in a quiet lab room with no distractions. Participants were informed that they could stop the study at any time if they felt uncomfortable. Only 1 experimenter was present in the room during the data collection but kept his distance from the participant (further than 1.5 m). We compensated each participant with an £8 Amazon voucher after completion of the study. The experimental protocol was approved by the Ethics Committee of the University College London Interaction Centre (ID Number: STAFF/1011/005).
Task Structure and Instant Measurements of Lasting Stress-Induced Physiological Events
We designed a stress induction study protocol to collect physiological data and subjective self-reports in association with mental stress levels [
, ]. From the literature on mental stress induction studies in psychology, neuroscience, and affective computing [ , , , ], we chose 2 cognitive-load induction tasks: the Stroop Color-Word test [ ] and the Mathematical Serial Subtraction test [ ]. These tests were selected as they have been shown in various studies to induce mental stress by increasing cognitive load. They have also been used in other thermal imaging studies [ , ]. Each task was divided into 2 subtasks with varying difficulty levels to elicit different stress levels (easy and hard: Se=Stroop easy, Sh=Stroop hard, Me=Math easy, Mh=Math hard), and each subtask was counterbalanced in a Latin square design as done in a study by Cho Y et al [ ]. Between subtasks, we added a break period encouraging participants to fully recover (without any measurements or constraints) so as to avoid potential effects from previous sessions.
Although it has been shown that the Stroop and Math tasks lead to cognitive overload [
, ], they are limited in the amount of stress they induce because of the lack of psychosocial or other stressors [ , ]. Hence, following previous studies [ , , , ], we also introduced further stressors: (1) social evaluative threats, that is, close observation and assessment of a person’s performance [ , ], (2) time pressure, for example, a 1.5-second limit for each Stroop question [ ], and (3) loud sound feedback, in particular, an unpleasant sound for wrong answers [ ].
As described above, heat caused by the use of the smartphone PPG limited our data gathering to a 20-second window immediately after each task. The aim is to capture the cardiovascular changes related to stress responses and their dynamics immediately after the stressor has ended instead of measuring the signals during each task (
). shows the overall study protocol.
Measuring and Self-Report of Perceived Mental Stress
For the 20-second physiological measurements, the participants were asked to hold their index finger on the smartphone RGB camera while keeping the smartphone add-on thermal camera facing their nose, as shown in
. After each 20-second physiological measurement, all participants were asked to answer a questionnaire about their perceived level of mental stress. We used a 10-cm visual analog scale (VAS), which allows participants to answer on a continuous (analog) basis to avoid nonparametric properties [ , ]. The question asked was “How much did you feel mentally stressed?” (ranging from 0, not at all, to 10, very much). Only 1 VAS straight line was used for each participant to self-report his/her perceived stress levels across all tasks and sessions. This is to help participants easily compare the stress scores they report across sessions, as shown in . This approach combines a numerical approach to self-reporting with a ranking one, as ranking is generally more reliable than simple quantization of a subjective state [ - ]. The labels in have been added to the figure by the researcher to clarify their reference to each of the tasks (R1, R2: Rest from Session 1 and 2, Se: Stroop easy, Sh: Stroop hard, Me: Math easy, Mh: Math hard).
- Introduction
  - Waiting in the corridor, introduction and entering the study room (5 min-10 min)
  - Information/consent/demographics forms filled in (5 min-10 min)
- Session 1
  - Rest 1: sitting, resting (5 min)
  - 20-second measurement and self-reporting of perceived stress (1 min-2 min)
  - Task 1: Stroop Test 1 (5 min)
  - 20-second measurement and self-reporting of perceived stress (1 min-2 min)
  - Break (5 min)
  - Task 2: Stroop Test 2 (5 min)
  - 20-second measurement and self-reporting of perceived stress (1 min-2 min)
  - Break (3 min)
- Session 2
  - Rest 2: sitting, resting (5 min)
  - 20-second measurement and self-reporting of perceived stress (1 min-2 min)
  - Task 3: Math Test 1 (5 min)
  - 20-second measurement and self-reporting of perceived stress (1 min-2 min)
  - Break (5 min)
  - Task 4: Math Test 2 (5 min)
  - 20-second measurement and self-reporting of perceived stress (1 min-2 min)
  - Break (5 min)
- Closing
  - Wrap-up and participant’s feedback (5 min-20 min)
Automatic Inference of Perceived Mental Stress From Instant Measurement
Low-Level and High-Level Features From Cardiovascular Events
The 20-second cardiovascular measurement with the developed interface (
and ) simultaneously produces the following signals: (1) 1-dimensional PP intervals and (2) a 1-dimensional thermal variability sequence.
We take the PP intervals (
) and thermal variability sequence ( ) as low-level features representing each modality throughout this paper.
To evaluate the effectiveness of our approach against standard approaches, we also extracted high-level engineered features for both BVP and nose tip temperature variations as an evaluation benchmark. We followed earlier studies on stress inference using HRV metrics as the features [
, , , ] (in our case, PPG-derived HRV; for readability, hereafter simply called PRV), although we excluded features derived directly from HR given the minor role repeatedly found for them in stress inference studies [ ]. After the preprocessing described above, we extracted the following PRV features:
- PRV F1 (LF Power)
- PRV F2 (HF Power)
- PRV F3 (LF/HF ratio)
- PRV F4 (SDPP: Standard Deviation of PP intervals)
- PRV F5 (RMSSD: Root Mean Square of the Successive Differences of PP intervals)
- PRV F6 (pPP50: Proportion of successive differences of PP intervals greater than 50 ms out of the total number of intervals)
As for high-level features representing the nose tip thermal signature, we used the feature most commonly used in the literature [ , - ]:
- Nose temperature F1 (TD: Temperature Difference between data from the start and the end).
In addition, we extracted basic statistical features from the processed thermal variability sequence, similar to SDPP from the PP intervals:
- Nose temperature F2 (SDSTV: Standard Deviation of the Successive differences of the Thermal Variability sequence)
- Nose temperature F3 (SDTV: Standard Deviation of the Thermal Variability sequence).
The sliding window was not used to extract these features given the short period of time over which they were measured.
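As an illustration of the time-domain features listed above, the following is a minimal sketch rather than the study's implementation. The function names are hypothetical, the sample (n-1) standard deviation is an assumption, TD is assumed to be end minus start, and pPP50 divides by the number of intervals as the definition above is worded. The frequency-band features (LF, HF, LF/HF) are omitted because they require a spectral estimate.

```python
import math
import statistics


def successive_differences(seq):
    """Differences between neighbouring samples: seq[i + 1] - seq[i]."""
    return [b - a for a, b in zip(seq, seq[1:])]


def prv_time_features(pp_ms):
    """Time-domain PRV features from PP intervals given in milliseconds."""
    diffs = successive_differences(pp_ms)
    return {
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "SDPP": statistics.stdev(pp_ms),
        "RMSSD": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        # Assumption: denominator is the number of intervals, as worded above.
        "pPP50": sum(1 for d in diffs if abs(d) > 50) / len(pp_ms),
    }


def thermal_features(tv):
    """Features from a (feature-scaled, 1 Hz) thermal variability sequence."""
    diffs = successive_differences(tv)
    return {
        # Assumption: TD is the last value minus the first value.
        "TD": tv[-1] - tv[0],
        "SDSTV": statistics.stdev(diffs),
        "SDTV": statistics.stdev(tv),
    }


if __name__ == "__main__":
    print(prv_time_features([800, 900, 800, 860]))
    print(thermal_features([1.0, 0.8, 0.5, 0.0]))
```

With only about 20 seconds of data, each feature is computed once over the whole recording, which matches the note above that no sliding window is used.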
Labeling Strategy and Machine Learning Classifiers
Given the focus on automated inference of a person’s perceived stress level, the labeling of self-reported stress scores is an important step. However, interpersonal variability has been repeatedly found from self-reports of perceived mental stress [
, , ]. This is a key issue that must be addressed if we are to create automatic stress recognition systems that can generalize across people. Following our earlier work [ ], we use the normalized K-means clustering technique to label the measured events, as K-means has been shown to be effective in handling self-reported data [ ]. In detail, all perceived stress scores collected from each participant are normalized through feature scaling, which identifies the minimum and maximum scores for a participant and rescales all the scores so that the range is the same across all participants. Then, the K-means algorithm (k=3) is used to group the participants’ VAS scores into 3 levels of perceived stress corresponding to “None or low stress,” “Moderate,” and “Very high” on the VAS we used (see ). In this paper, we focus on discriminating between 2 levels of stress, No-Stress and Stress, given the limited amount of data for a more refined discrimination. Hence, a third step is required. We split the labels into 2 groups: the No-Stress group, referring to the K-means “None or low stress” cluster, and the Stress group, containing both the K-means “Moderate” and “Very high” clusters. The 2 resulting groups are then used to label the related physiological signatures from each 20-second window (L1).
Furthermore, we explored the possible effect of different data labeling strategies: (1) L2, combining the first and second K-means clusters (from k=3) into No-Stress, in contrast with L1, (2) L3, K-means with k=2, and (3) L4, directly dividing the VAS scale into 3 equal sections and then combining the “Moderate” and “Very high” stress classes into 1, giving “Not at all” and “Moderate+Very high” (threshold at point 3.334 on the VAS scale in
). The aim of L2 and L3 was to understand the sensitivity of our approach in separating the moderate level of stress from the other 2 classes. L4 was used as a way to compare with more standard techniques used in the field [ ].
A total of 2 machine learning algorithms were tested. First, we used a single hidden-layer NN, which is suitable for working with low-level features (ie, PP intervals and thermal variability vectors) and capturing their temporal dynamics. Artificial NNs enable automatic learning of informative physiological features: backpropagation repeatedly tunes internal parameters to let the features emerge from the data (this is also called representation learning). Second, with the high-level engineered features, we used the k-Nearest Neighbor classifier (denoted as kNN, k=1) as a benchmark stress inference model given that it is typically used in this area [
]. By choosing this second algorithm, we aim to assess the limitations of handcrafted features, which may oversimplify a person’s dynamic physiological events and, in turn, possibly miss some fast, informative moments. In particular, in the case of instant measurements (short period of time), this cannot be compensated for by a sliding window producing sequential feature values, for example, the 120-second sliding window used in a study by McDuff DJ et al [ ] to continuously produce PRV features during a 180-second task session.
For the implementation of NNs, we tested 2 sizes of hidden layer: (1) small (n=80 nodes, NN1) and (2) large (n=260 nodes, NN2); each size was empirically chosen. The mean and standard deviation of the training dataset were used to normalize both the training and testing datasets. The sigmoid was used as the activation function. In the training process, a fixed learning rate of 0.5 was used for 100 epochs.
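The L1 labeling strategy described in this section (per-participant feature scaling, K-means with k=3, then merging the two higher clusters into Stress) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code. All function names are hypothetical, the normalized scores are assumed to be pooled across participants before clustering, and the deterministic initial centroids (0, 0.5, 1) are an assumption made so that the example is reproducible.

```python
def min_max_normalize(scores):
    """Rescale one participant's VAS scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def nearest(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


def kmeans_1d(values, init_centroids, max_iter=100):
    """Plain 1-dimensional K-means; returns (assignments, centroids)."""
    centroids = list(init_centroids)
    for _ in range(max_iter):
        assign = [nearest(v, centroids) for v in values]
        updated = []
        for c in range(len(centroids)):
            members = [v for v, a in zip(values, assign) if a == c]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return [nearest(v, centroids) for v in values], centroids


def label_l1(scores_by_participant):
    """L1: 0 = No-Stress (lowest cluster), 1 = Stress (two higher clusters)."""
    normalized = {p: min_max_normalize(s) for p, s in scores_by_participant.items()}
    # Assumption: normalized scores are pooled across participants for clustering.
    pooled = [v for scores in normalized.values() for v in scores]
    _, centroids = kmeans_1d(pooled, [0.0, 0.5, 1.0])
    lowest = min(range(len(centroids)), key=lambda c: centroids[c])
    return {
        p: [0 if nearest(v, centroids) == lowest else 1 for v in scores]
        for p, scores in normalized.items()
    }


if __name__ == "__main__":
    vas = {
        "p1": [0.5, 2.0, 6.0, 1.0, 4.5, 9.0],
        "p2": [3.0, 3.5, 5.0, 3.2, 4.2, 7.0],
    }
    print(label_l1(vas))
```

In this made-up example, the second participant's raw score of 3.5 falls in the No-Stress group after normalization, whereas a fixed threshold at 3.334 on the raw scale (as in L4) would place it in the Stress group. This is the kind of interpersonal difference the normalization step is meant to absorb.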
Results
In this section we evaluate our proposed approach. First, we report the statistical analysis of the collected data. Second, we discuss the recognition performance of our system over the different modalities and types of features. Finally, we compare the results for the different labeling approaches.
Reliability of Measured Physiological Patterns
First of all, we tested the reliability of the physiological measurements. From the 17 participants, we collected 102 sets of the estimated BVP signals, PP intervals, and thermal variability sequences from 20-second instant measurements taken after each Stroop and Math task and after each resting session. However, 2 sets of data were not recorded because of phone battery issues at the end of 1 experiment, and 1 set was not recorded as 1 participant clicked the turn-off button on the phone by mistake. A total of 6 further sets had to be discarded because some participant’s nose was not visible on thermal images (nose outside of the range of view because of sudden severe coughing during the 20 seconds, or because of head turned toward the experimenter, or the nose was covered by a person’s hand). Although these disturbances were often transient, they meant that data could not be collected within the 20 seconds immediately following the end of the stressor. An analysis of the thermal data from Rest 1 also showed some extreme patterns in the nose tip temperature (eg, sudden increase in temperature). This may be explained by the fact that the experiment was conducted during the winter and temperatures outside of the experimental room were often significantly lower. This included both outdoors and indoors, in the corridor where the participants waited for the experiment. Despite the temperature changes, the Rest 1 data were kept in the dataset. A total of 93 sets were used for the study.
As the measurement capability of smartphone PPG has been thoroughly investigated in earlier studies [ , , ], we only tested the reliability of the cardiac pulse signals measured with our approach and compared it with the mean brightness intensity–based method, which has been predominantly used [ , , ]. For this, we used the relative power Signal Quality Index (pSQI), which assesses the strength of a physiological signal in a frequency range of interest as a measure of quality [ , , , ]. The pSQI for the BVP signals can be expressed in the following manner (equation 3):
$$P_{SQI} = \frac{\int_{f_l}^{f_h} S(f)\,df}{\int_{0}^{\infty} S(f)\,df} \quad (3)$$
where $0 \le P_{SQI} \le 1$, $S(f)$ is the power spectral density of the BVP signal (in our case, B̂ in equation 2), and $f_l$ and $f_h$ are the lower and upper boundaries of the expected HRs, respectively. Here, we set the expected HR range to 0.8 Hz (48 bpm) to 2.0 Hz (120 bpm), given that the HRs of healthy adults mostly fall into this range [ ]. To minimize the effects of baseline wander and high-frequency noise on this signal quality test [ , ], we used band-pass filtered BVP signals (0.7 Hz-4.0 Hz), as in a study by Chan P-H et al [ ]. The signal quality test shows that the BVP signals B̂ estimated with the proposed method, equation (2), were of higher quality than those from the mean intensity method (Proposed: mean 0.755, SD 0.068; Traditional: mean 0.692, SD 0.075).
From our observation of the thermal images taken from the participants during the data collection study, we found that respiration influences the nasal tip temperature measurement in some cases. For instance, sequentially captured thermal images of a person’s nose tip surface show that inhaled air changed the nose tip temperature. Hence, we tested how much participants’ respiratory cyclic events affected the nose tip temperature measurements by using the pSQI in equation (3) with the expected respiratory frequency range of interest (0.1 Hz-0.85 Hz), as used by Cho Y et al [ ]. The measured nose tip temperatures contained strong respiratory cyclic patterns (respiratory pSQI: mean 0.714, SD 0.163), indicating that such affected temperature patterns may lead to wrong stress-level classification. In contrast, the processing technique we propose reduced respiratory artifacts in the measurement (respiratory pSQI: mean 0.157, SD 0.091).
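The relative power SQI used above is the share of spectral power that falls inside a frequency band of interest. A minimal sketch of that ratio is given below, under several assumptions that are ours rather than the paper's: the spectrum is estimated with a plain periodogram computed by a direct DFT, the signal mean is removed beforehand, and the function names and the sampling rate in the usage note are hypothetical. The band limits would be 0.8 Hz to 2.0 Hz for the cardiac check and 0.1 Hz to 0.85 Hz for the respiratory check.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """One-sided periodogram via a direct DFT; returns (freqs_hz, power)."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]  # assumption: remove the DC offset first
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred)
        )
        # Interior bins stand for both the positive and the negative frequency.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        freqs.append(k * fs / n)
        power.append((1.0 if is_edge else 2.0) * abs(coeff) ** 2)
    return freqs, power


def relative_power_sqi(signal, fs, f_low, f_high):
    """Fraction of total spectral power inside [f_low, f_high], between 0 and 1."""
    freqs, power = power_spectrum(signal, fs)
    total = sum(power)
    if total == 0:
        return 0.0  # a flat signal carries no cyclic information
    in_band = sum(p for f, p in zip(freqs, power) if f_low <= f <= f_high)
    return in_band / total
```

For example, with a hypothetical 30 Hz sampling rate, `relative_power_sqi(bvp, 30.0, 0.8, 2.0)` scores a 20-second BVP estimate, and the same call with `0.1` and `0.85` scores how strongly breathing appears in a nose tip temperature sequence. The direct DFT is quadratic in the signal length, which is acceptable for 20-second recordings; an FFT routine would be the usual choice for longer signals.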
Self-Reported Stress Ratings and Hand-Engineered Metrics
An important step was the analysis and possible normalization of the self-reported stress scores. The boxplot (top panel) shows the distribution of the self-reported scores over the resting periods and the different sessions and tasks. Overall, the stress elicitation procedures produced the intended levels of stress, with the hard sessions scoring higher than the easy sessions and the latter scoring higher than the resting periods (Rest from Session 1: mean 1.49, SD 1.94; Rest from Session 2: mean 1.30, SD 1.26; Stroop Easy: mean 2.17, SD 1.46; Math Easy: mean 2.66, SD 1.80; Stroop Hard: mean 3.92, SD 2.11; Math Hard: mean 5.17, SD 2.55) despite 2 outliers. However, the wide boxplots also show intersubject variability in self-reporting. In addition, the score ranges (maximum minus minimum) differ considerably among participants (maximum range: 8.75; minimum range: 1.5; mean 4.7, SD 2.1), further suggesting the need for normalization of the scores.
Therefore, we normalized the data for each participant with respect to their range of scores over all the sessions. The middle panel shows the original data and the bottom panel shows the normalized data. The normalization helps to identify 2 main modes in the score distributions, suggesting the presence of 2 main clusters of stress levels. Given the subjectivity of stress ratings and the limited amount of data available to support a multilevel model, in this paper we focused on binary classification of perceived mental stress: no/low stress versus medium/high (or very high) stress. The K-means separation between the 2 clusters is represented by the different colors in the bottom panel.
We tested the correlations among the original self-reported scores, the normalized self-reported scores, and the high-level hand-crafted PRV and thermal metrics, as summarized in the table below (using Pearson correlation coefficients). The normalized self-reported scores maintained a high correlation with the original scores (r=.752, P<.001). Although some metrics of each physiological sensing channel were significantly correlated among themselves (eg, PRV F2-F4: r=.838, P<.001; Thermal F1-F3: r=.803, P<.001), the correlation values were lower across sensing channels. In addition, only SDSTV showed a correlation with the self-reported scores that approached significance, and this correlation was low (r=.196, P=.059), indicating that each individual engineered metric alone could not lead to high discrimination among perceived levels of stress.
Scores | Statistic | S1a | S2b | PRV LF (F1) | PRV HF (F2) | PRV LF/HF (F3) | PRV SDPP (F4) | PRV RMSSD (F5) | PRV pPP50 (F6) | Thermal TD (F1) | Thermal SDSTV (F2) | Thermal SDTV (F3)
Self-report S1 | Corrc | 1 | .752 | .007 | .011 | −.044 | .03 | .146 | .058 | −.154 | .196 | .02
Self-report S1 | P | | <.001 | .94 | .91 | .66 | .77 | .15 | .57 | .14 | .059 | .85
Self-report S2 | Corr | | 1 | −.079 | −.044 | −.082 | −.002 | .083 | .097 | −.153 | .197 | .032
Self-report S2 | P | | | .44 | .66 | .42 | .99 | .41 | .34 | .14 | .06 | .76
PRV F1 | Corr | | | 1 | .394 | .573 | .638 | .098 | .134 | .016 | .12 | .047
PRV F1 | P | | | | <.001 | <.001 | <.001 | .34 | .19 | .88 | .25 | .66
PRV F2 | Corr | | | | 1 | −.293 | .838 | .13 | .39 | .083 | .2 | .054
PRV F2 | P | | | | | .003 | <.001 | .20 | <.001 | .43 | .054 | .61
PRV F3 | Corr | | | | | 1 | .007 | −.027 | −.178 | .056 | .057 | .123
PRV F3 | P | | | | | | .95 | .79 | .08 | .60 | .59 | .24
PRV F4 | Corr | | | | | | 1 | .139 | .571 | .1 | .198 | .084
PRV F4 | P | | | | | | | .17 | <.001 | .34 | .06 | .43
PRV F5 | Corr | | | | | | | 1 | −.067 | −.059 | .174 | −.067
PRV F5 | P | | | | | | | | .51 | .57 | .095 | .52
PRV F6 | Corr | | | | | | | | 1 | .134 | .212 | .127
PRV F6 | P | | | | | | | | | .2 | .042 | .23
Thermal F1 | Corr | | | | | | | | | 1 | .213 | .803
Thermal F1 | P | | | | | | | | | | .039 | <.001
Thermal F2 | Corr | | | | | | | | | | 1 | .487
Thermal F2 | P | | | | | | | | | | | <.001
Thermal F3 | Corr | | | | | | | | | | | 1
Thermal F3 | P | | | | | | | | | | | |
aS1: normalized self-reported scores.
bS2: original self-reported scores.
cCorr: correlation coefficients.
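The coefficients in the table above are Pearson correlations, and a worked example may help in reading them. The sketch below computes r from two sequences and converts a coefficient into the t statistic on which its significance test is based. The function names are hypothetical, and the sample size of 93 in the usage note is an assumption on our part (the text does not state how many sets entered each correlation).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

Under the assumed n=93, `t_statistic(0.196, 93)` is roughly 1.91, which sits just below the conventional two-sided 5% threshold and is in line with the SDSTV correlation being reported as approaching, but not reaching, significance.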
We compared the values of each precrafted metric across the sessions (rest and 3 stressful events, ie, Stroop: easy/hard and Math: easy/hard) and across the labels produced by the labeling technique. No common pattern was found between the 2 easy tasks or between the 2 hard tasks, although they were designed to induce similar levels of mental stress (eg, easy: low stress level; hard: high stress level). For example, Thermal F1 appeared to strongly decrease during the Math hard task but not during the Stroop hard task, whereas Thermal F2 increased with the Stroop hard task but less so during the Math hard task. PRV F5 was generally higher after both the Math easy and hard task sessions than after the Stroop hard session. This further indicates that each metric alone, taken from an instant measurement, is unlikely to contribute to the inference of each session. On the other hand, when we applied our labeling technique, Thermal F1 values grouped into Stress were generally lower than those grouped into No-Stress, consistent with findings from the literature [ , , ].
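The labeling technique referred to above (per-participant normalization followed by K-means clustering of the scores) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the min-max form of the range normalization, the quantile-based centroid initialization, and all function names are ours. The choice of k=3 with the moderate cluster merged into the stress class follows the description of the main labeling method (L1) in this paper; merging it into the no-stress class instead corresponds to the variant described as L2.

```python
def normalize_by_range(scores_by_participant):
    """Scale each participant's scores by their own range (assumed min-max form)."""
    normalized = {}
    for participant, scores in scores_by_participant.items():
        low, high = min(scores), max(scores)
        span = (high - low) or 1.0  # assumption: constant scores map to 0
        normalized[participant] = [(s - low) / span for s in scores]
    return normalized


def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars (k >= 2); centroids start at spread-out quantiles."""
    ordered = sorted(values)
    centroids = [ordered[int(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        updated = []
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: assignments match the current centroids
        centroids = updated
    return assignments, centroids


def binary_stress_labels(values, merge_moderate_with="stress"):
    """Cluster into 3 levels, then fold the moderate level into one of 2 classes."""
    assignments, centroids = kmeans_1d(values, k=3)
    low, moderate, high = sorted(range(3), key=lambda c: centroids[c])
    stress = {moderate, high} if merge_moderate_with == "stress" else {high}
    return [1 if a in stress else 0 for a in assignments]
```

A typical call would flatten the output of `normalize_by_range` into a single list of scores and pass it to `binary_stress_labels`, so that clustering operates on the pooled, normalized scores of all participants.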
Instant Stress Inference Results
To evaluate the performance of instant stress recognition, we used a 17-fold leave-one-subject (participant)-out (LOSO) cross-validation. LOSO was chosen to test the ability to generalize to unseen participants (one size fits all) [ , ]. We obtained the accuracy results of the 3 classifiers (NN1, NN2, and kNN) using LOSO (N=17) for 3 different cases: (1) a multimodal approach simply combining features from both sensing channels (PRV, Thermal), (2) a unimodal approach using thermal features, and (3) a unimodal approach using PRV features. Both NN1 and NN2 used our proposed low-level features only (ie, PP intervals and thermal variability sequences). Overall, the NN2-based multimodal approach produced the highest mean accuracy of 78.33% (SD 15.43), with a mean F1 score of 77.92%, in discriminating between no-stress and perceived stress (see the confusion matrix for details). NN1 (whose hidden layer is smaller than that of NN2) produced a lower accuracy (mean 66.76%, SD 21.75). Across all modalities, the kNN with the high-level features (ie, the 6 hand-engineered PRV metrics and 3 thermal metrics) performed worst. A similar pattern can be seen for the PRV unimodal channel (NN1: mean 65.78%, SD 20.55; NN2: mean 68.53%, SD 18.89; kNN: mean 50.20%, SD 19.63). For the thermal channel, NN1 appears to perform marginally better (mean 58.82%, SD 21.11) than NN2 (mean 56.67%, SD 18.79), but both NNs again perform better than the kNN (mean 48.14%, SD 16.52).
However, it should be noted that, for all the models, the confusion matrices for the thermal case show a clear bias toward the no-stress class. Given this bias and the fact that thermal data from the Rest 1 sessions appeared to be affected by the large variation in temperature between the waiting space and the experiment room (in addition, some participants had just arrived from outside, whereas others had already been indoors for some time), we reran the models, discarding the data from the Rest 1 sessions. Although the overall performance for this modality did not change much (NN1: mean 58.14%, SD 23.33; NN2: mean 58.14%, SD 21.59; kNN: mean 55.88%, SD 22.38) and NN1 and NN2 still performed better than the kNN with hand-engineered features, all the confusion matrices (bottom) show more balanced results and a better prediction of the stress class overall.
A repeated measures analysis of variance was carried out on the results from the 17 folds (including the Rest 1 data) to compare the 2 NN modeling approaches (which use our proposed low-level features) with the kNN (which uses hand-engineered metrics) and determine whether there was a statistically significant mean difference in performance. The results show significant differences between the methods for the multimodal and PRV cases (PRV+Thermal: F2,32=3.763, P=.034, ηp2=.190; PRV: F2,32=6.001, P=.006, ηp2=.273). No differences were found for the thermal case (Thermal: F2,32=2.304, P=.116, ηp2=.126). Post hoc paired t tests with Bonferroni correction showed that NN2 performed significantly better than kNN for the unimodal PRV case (PRV: P=.023). For the multimodal case, the advantage of NN2 over kNN (PRV+Thermal: P=.064) and over NN1 (PRV+Thermal: P=.052) approached significance. NN1 did not perform significantly better than kNN; however, it showed a positive trend in the unimodal PRV case (PRV: P=.091). Even though no significant differences were found for the unimodal thermal case, the 2 NN models performed slightly better than the kNN in all cases, including the thermal one. It could be expected that, in the case of deployment, a larger sample of data for each class could indeed lead to statistical significance.
Lastly, we investigated the effect of the normalization and K-means clustering of self-reported scores on inferring the perceived stress levels. For this part of the study, we removed the Rest 1 data. There were 2 reasons for this. First, we wanted to prevent the noise in this set of data from affecting the comparison among the labeling methods. Second, this yielded a more balanced number of instances in each class for testing the different labeling methods, which biases the learning process less. The comparison of models over the different labeling techniques did not aim to obtain better performance but to understand how normalization and different clustering approaches could affect the modeling by acting on class separation and interperson variability in subjective self-reports. We were also interested in understanding how sensitive the system was in separating stress scores by using the same dataset and merging the intermediate levels with 1 of the 2 classes (L1 and L2).
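The leave-one-subject-out protocol behind the accuracy figures reported in this section can be sketched as follows. This is a generic illustration, not the evaluation code used in the study: `fit` and `predict` are hypothetical callables standing in for any of the classifiers, and the sample standard deviation across folds is an assumption about how the reported SDs were computed.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) once per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def loso_accuracy(features, labels, subject_ids, fit, predict):
    """Train on all subjects but one, test on the held-out one, repeat for each."""
    per_fold = []
    for _, train, test in leave_one_subject_out(subject_ids):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        predictions = predict(model, [features[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        per_fold.append(correct / len(test))
    mean = sum(per_fold) / len(per_fold)
    if len(per_fold) > 1:
        variance = sum((a - mean) ** 2 for a in per_fold) / (len(per_fold) - 1)
    else:
        variance = 0.0
    return mean, variance ** 0.5, per_fold
```

With 17 participants this yields 17 folds, and every measurement from the held-out participant is unseen during training, which is what makes the estimate a test of generalization to new people.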
We tested the 3 models (NN1, NN2, and kNN) for the multimodal approach with the different labeling strategies (L2-L4, introduced in the previous section). We compared the accuracy results for 4 different strategies: L1, the main method; L2, K-means with k=3 but combining no-stress and moderate-level stress scores as 1 group; L3, K-means with k=2, which splits the moderate-level scores between no-stress and stress; and L4, the original scores divided at a point between the no-stress and moderate levels (ie, 3.334 of 10). The results showed that L1 performed best in separating the bimodal distribution of normalized self-reported scores and helped address the interpersonal variability issue. Indeed, all 3 models obtained the best accuracy with L1 and the worst performance with L3 and L4, with L4 being marginally better than L3. Finally, it should be noted that in the case of L3 and L4, the best performance was obtained with NN2 rather than NN1. This may indicate that mapping feature values to perceived stress scores may benefit from a larger hidden layer to capture the complexity of the relation.
Discussion
This paper contributed to the body of work that aims to make mobile measurements of mental stress more feasible and robust. We focused on 2 stress-related cardiovascular signals: BVP and vasoconstriction/dilation-related nose tip temperature. They have been widely investigated in both the mental health and computing literature [ , , , , ], but their applicability together with the low-cost sensing offered by mobile devices has not been explored. Our work makes 4 key contributions: (1) a set of methods to improve the quality of the sensed signal, (2) a demonstration of the limited capability of typically used engineered features in the context of very short-term (instant) measurements, (3) a new set of low-level features to capture the dynamic variability of the 2 signals, and (4) a demonstration of the feasibility of using 20-second measurements to discriminate between no-stress and stress responses. Finally, we report on the lessons learned from the analysis of different labeling methods and their effect on the modeling process. Below are detailed discussions of these contributions.
Toward Smartphones as Reliable Cardiovascular Measures
Our first contribution is to develop a new set of preprocessing techniques to enhance the quality of the signal extracted from either the PPG channel, which detects blood pulse variability, or the thermal camera, which detects vasoconstriction/dilation induced nasal temperature variability. This is particularly important in mobile, ubiquitous settings where physiological sensing setups are still of lower quality and have to be less controlled in comparison with the ones generally used in medical environments.
With the data collected from our stress-inducing tasks, we wanted to test the possibility of building algorithms that can reliably and continuously capture (1) a person’s BVP pattern from the smartphone camera and (2) the nose tip temperature sequence from the add-on thermal camera. Reliable BVP recording is critical, particularly for short-term measurements [ , ]. The signal quality test with the pSQI showed that our method produced higher-quality BVP signals than those obtained with traditional camera-based PPG approaches [ , , ]. In addition, we found that a person’s respiratory cycles interfered with accurately capturing thermal variations from the nasal area. Hence, we built a new technique to minimize such effects and gather a more reliable nose tip thermal signature. This was achieved through the use of advanced thermal ROI tracking [ ] and signal processing techniques that filter out breathing cyclic events from the temperatures measured in the nose area.
However, it should be noted that despite the use of the quantization approach that helps handle environmental temperature changes [ ], the thermal data during Rest 1 were affected by the difference in temperature between the waiting area and the experiment area. This effect was further amplified when participants had just arrived from outdoors, with their body temperature strongly influenced by the cold winter weather. This is important because, if the system is to be used, the person should use it in the same environment where the stressful events occur. Future studies should also test whether a decrease in nose tip temperature may be saturated in very cold environments and therefore be less informative in such situations for automatically detecting mental stress.
Traditional Cardiovascular Metrics Do Not Capture Stress-Related Variability From an Instant Measurement
We found that the capability of the HRV metrics, used as high-level features in the literature [ , , ], to instantly quantify stress was very limited. This is important because, despite their widespread use (eg, in the psychology and affective computing literature), such metrics have been criticized for possibly oversimplifying physiological responses [ - ]. It should be noted that although we used PPG-derived metrics rather than the more widely investigated ECG-derived metrics, strong correlations have been found between the 2 sets of metrics in healthy participants with limited physical movement [ , ]. Stressors in general affect cardiac pulse–related events, even if the 2 types of events (heart rate and BVP) may be affected differently in nonhealthy or elderly populations and in extreme situations (hot temperature) [ - ]. It should also be noted that although, mathematically, a shorter measurement period leads to a lower frequency resolution and hence a lower accuracy in computing metrics such as LF/HF [ ], recent studies have validated their use with very short measurements, from 10 seconds to 30 seconds [ ].
Similarly, the metrics applied to short-term nasal thermal data (eg, TD: Temperature Difference) contributed only weakly to stress quantification. This may explain the inconsistent findings in the literature where such metrics have been used to capture thermal responses to stressful events [ , ]. All in all, the results suggest the need for a novel way of describing the dynamic information of BVP and vasoconstriction/dilation-related nasal temperature to help improve the understanding and capture of these complex phenomena.
Overcoming Limitations to Mobile Automatic Stress Inference
On the basis of the low correlation between perceived mental stress levels and typically engineered metrics for these 2 signals, we proposed to use thermal variability and PP interval sequences as a novel set of low-level features to capture stress responses of cardiovascular activities. With this, we investigated how to benefit from the automatic feature learning capabilities of machine learning classifiers (ie, NNs) in instantly inferring mental stress. The results showed clear improvements in performance. Indeed, our proposed method with the 2 cardiovascular signals achieved 78.33% recognition accuracy with NN2, whereas the kNN with the hand-engineered features achieved only 60.59%. Similarly, using the HRV-related features only, there was an improvement of 18.33 percentage points over the traditional approach (50.20%). The improvement on the thermal channel was smaller but still evident from the results.
In addition, 2 further contributions can be highlighted from our approach to the modeling of automatic stress inference: instant measurements and no need for a baseline. First, previous work required relatively long-term measurements of between 2 minutes and 5 minutes [ , , ]. In contrast, our results demonstrated the possibility of using just a 20-second measurement to automatically discriminate between stress and nonstress moments. This approach achieved state-of-the-art performance when compared with approaches using much longer measurements, which reach around 70%-80% correct recognition in LOSO cross-validation [ ]. This is very important given that stillness is critical during PPG measurements and, to a certain extent, for thermal imaging. In fact, even if automatic ROI-tracking methods may help with thermal measurements, people tend to easily move away from the camera or cover their nose with their hands (5 participants did so at least once even within 20 seconds).
Second, our approach (more reliable signal and richer features) led to state-of-the-art results without the use of a baseline. This is critical in everyday life settings, where such baselines may be difficult to establish. Resting periods just before a stressful event cannot be planned, and continuously gathering such measures can be costly, whereas at the same time, nonstressful resting periods would also need to be automatically detected. In addition, our data from resting periods show that such a gold standard resting situation does not exist and that the environment temperature may change drastically, affecting skin temperature. This could have been because of a lab effect, but general everyday life may also have specific effects on the data. Even when differential features were used (eg, temperature differences between 2 areas of the face: forehead and nose tip), a baseline period was required [ ]. The lack of a baseline is overcome here by proposing richer features capturing informative physiological variations over time.
How Do We Define the Ground Truth: What is the Best Approach?
Setting the ground truth is a difficult process when dealing with subjective reports. How to use self-reports to label the data is a critical issue in the field because of their subjectivity. Interpersonal variability has been repeatedly reported as a critical barrier to building stress inference or quantification systems that can generalize across people [ , ]. The intersubjectivity of self-reports and the need to reduce the number of classes, depending on the type of application or the size of the dataset, require decisions on how to refine the labels. In doing so, there is a danger of adding noise to the dataset and hence to the modeling process. We explored how different labeling techniques may affect the modeling process.
We proposed an approach to address this problem. The first step was to use a standard normalization technique to take into account personal score ranges over all tasks that aimed to induce a wide range of stress levels (from none to medium to quite high). This transformation led to a bimodal distribution highlighting at least 2 opposite levels of stress (low and high), while maintaining a strong correlation with the original scores (r=.752, P<.001). The bimodal distribution is interesting as, given the low number of participants, it suggests that the moderate level of stress is not well separated from the other 2 classes. A binary classification was hence a sensible approach to take in this paper; however, with larger datasets, a more refined analysis and modeling should be carried out. Second, we used a machine learning clustering technique, K-means, to improve the separation of the scores into 2 classes of stress. The results obtained from the comparison of our approach (L1) with its variation (L2) and the more typically used approaches (L3 and L4) led to an interesting lesson on how to create a more reliable ground truth rather than increase noise in labeling.
Then, how should the data be clustered: according to the number of stress levels to be recognized or according to the number of stress levels the data collection experiment was set to induce? The latter approach appeared to be more successful. All labeling methods using K=3 (L1, L2, and to a certain extent L4) performed better than L3 using K=2. This suggests that directly clustering according to the number of classes to be recognized (2 in our case) may spread instances with similar stress-level responses (in this case, medium responses) across classes, introducing noise rather than overcoming the problems of intersubjectivity. However, it should be noted that the normalization step was important. Indeed, the models built on either L1 or L2 using the normalized scores performed better than those built on L4, where the original scores were used instead.
Another important issue to be addressed is how should the data be grouped when the number of classes to be detected is smaller than the number of levels induced? This decision could be needed either because there were no sufficient instances for a more refined inference or because the application at hand did not require such level of granularity (at the risk of introducing noise because of intersubjective variability). The results showed that L1, collapsing the moderate level into the high-level class, led to better performance than L2, where medium and no/low stress scores were instead combined. This may suggest that unless the stress level is very low, stress responses share more similarities than with no-stress responses. A more in-depth analysis of this aspect could be part of a future work and it may require an in-depth analysis of individual responses and validations over other datasets.
Although the results provide some interesting insights on how to cluster data from experiments, a question remains on how to deal with data from real-life situations. It is expected that in real-life situations, larger datasets may enable finer levels of discrimination personalized to a specific person. In such situations, as the dataset grows, parameters for labeling may need to be adapted to optimize the personalization. However, such rules we used could be helpful to bootstrap models on the basis of experimental datasets or well-structured initial real-life data collections. The bootstrapped models could then be personalized to specific users and recognition levels as data would be continuously collected by the person.
Limitations and Future Directions
Despite the findings and contributions described above, there is still room for improvement. First, our proposed approach did not perform properly on multiple levels of stress (labeling the data using perceived self-scores). As discussed, this was most probably because of the limited size of the dataset, especially for the medium level of stress (out of 3 levels). Deploying the built software in real life could be a way to build a larger dataset. With a function to collect a person’s self-reported perceived stress scores (eg, a digitalized VAS sliding bar in an app), this data collection in the wild could produce a sufficiently large set of cardiovascular signals to support more reliable performance in inferring multiple levels. In addition, it would be interesting to investigate how the transformation of the self-reported scores could be used to support multiclass classification.
Second, this work focused on sedentary situations (but without constraining one’s mobility) and did not include physical activity (eg, walking). It is well known that physical activity induces cardiovascular changes, in turn affecting stress inference performance [ ]. Hence, it would be interesting to test the instant stress inference ability of our system in situations where there is a considerable amount of physical activity, for example, an industrial factory work floor.
Finally, investigating the reliability of mobile sensing technologies themselves was outside the scope of this paper (see reviews on this topic [ ]). We aimed to contribute a better stress inference method that can be used regardless of the sensing technology. This may be even more crucial when the sensing technology is not as accurate and fine-grained as more expensive, medically approved technology.
Conclusions
With the long-term aim of building a stress monitoring system for mobile, everyday use, this paper focuses on the use of smartphone-based imaging capabilities: PPG and thermal imaging. To overcome the difficulties in using smartphone imaging for long period measurements, we propose a novel method that quickly infers a person's perceived level of stress from instant physiological measurements. This is achieved by (1) developing a more reliable PPG-sensing technique to extract a person’s BVP and its variability, (2) building a thermal imaging–based vasoconstriction monitoring system, (3) investigating the performance of widely used high-level features from PPG and nasal temperature in instant stress inference tasks, (4) proposing novel low-level features to represent HRV and thermal variability, (5) building an automatic feature learning–based multimodal perceived stress recognizer, and (6) investigating effects of clustering self-report scores to take into account the subjectivity of self-reports and ensure clear separation among the levels of stress to be modeled.
Through the data collection study with 17 participants and a series of stress-inducing tasks with different levels, we demonstrated how this system was able to achieve state-of-the-art performance using 20 seconds of data, rather than 2 to 5 minutes typically required by existing methods. This work makes smartphone imaging–based physiological computing capabilities more feasible for real-life applications, opening new possibilities for the development of mental-stress support apps and research.
Acknowledgments
The authors thank all the participants who took part in the experiment.
Conflicts of Interest
None declared.
Multimedia Appendix 1
Single hidden-layer Neural Network (NN) and k-Nearest Neighbor (kNN).
PDF File (Adobe PDF File), 190KB
References
- Everly GS, Lating JM. A Clinical Guide to the Treatment of the Human Stress Response. New York. Springer; 2012.
- Dedovic K, D'Aguiar C, Pruessner JC. What stress does to your brain: a review of neuroimaging studies. Can J Psychiatry. Jan 2009;54(1):6-15.
- Grossman P. Respiration, stress, and cardiovascular function. Psychophysiology. May 1983;20(3):284-300.
- Pagani M, Lombardi F, Guzzetti S, Rimoldi O, Furlan R, Pizzinelli P, et al. Power spectral analysis of heart rate and arterial pressure variabilities as a marker of sympatho-vagal interaction in man and conscious dog. Circ Res. Aug 1986;59(2):178-193.
- Kirschbaum C, Pirke KM, Hellhammer DH. The 'Trier Social Stress Test'--a tool for investigating psychobiological stress responses in a laboratory setting. Neuropsychobiology. 1993;28(1-2):76-81.
- Chan P, Wong C, Poh Y, Pun L, Leung WW, Wong YF, et al. Diagnostic performance of a smartphone-based photoplethysmographic application for atrial fibrillation screening in a primary care setting. J Am Heart Assoc. Dec 21, 2016;5(7).
- Jonathan E, Leahy M. Investigating a smartphone imaging unit for photoplethysmography. Physiol Meas. Nov 2010;31(11):N79-N83.
- Xu S, Sun L, Rohde GK. Robust efficient estimation of heart rate pulse from video. Biomed Opt Express. Apr 1, 2014;5(4):1124-1135.
- McManus DD, Lee J, Maitas O, Esa N, Pidikiti R, Carlucci A, et al. A novel application for the detection of an irregular pulse using an iPhone 4S in patients with atrial fibrillation. Heart Rhythm. Mar 2013;10(3):315-319.
- White RD, Flaker G. Smartphone-based arrhythmia detection: should we encourage patients to use the ECG in their pocket? J Atr Fibrillation. 2017;9(6):1605.
- Cho Y, Julier SJ, Marquardt N, Bianchi-Berthouze N. Robust tracking of respiratory rate in high-dynamic range scenes using mobile thermal imaging. Biomed Opt Express. Oct 1, 2017;8(10):4480-4503.
- Yu B, Funk M, Hu J, Wang Q, Feijs L. Biofeedback for everyday stress management: a systematic review. Front ICT. Sep 7, 2018;5.
- Ptakauskaite N, Cox AL, Berthouze N. Knowing what you’re doing or knowing what to do: how stress management apps support reflection and behaviour change. In: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems. 2012. Presented at: CHIEA'18; April 21-26, 2018:1; Montreal QC, Canada.
- Coulon SM, Monroe CM, West DS. A systematic, multi-domain review of mobile smartphone apps for evidence-based stress management. Am J Prev Med. Dec 2016;51(1):95-105.
- Konrad A, Bellotti V, Crenshaw N, Tucker S, Nelson L, Du H, et al. Finding the adaptive sweet spot: Balancing compliance and achievement in automated stress reduction. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM; 2015. Presented at: CHI'15; April 18-23, 2015:3829; Seoul, Republic of Korea.
- Felipe S, Singh A, Bradley C, Williams A, Bianchi-Berthouze N. Roles for personal informatics in chronic pain. In: Proceedings of the 9th International Conference on Pervasive Computing Technologies for Healthcare. 2015. Presented at: PervasiveHealth'15; May 20-23, 2015:161-168; Istanbul, Turkey.
- Fleck R, Cox A, Robison R. Balancing Boundaries: Using Multiple Devices to Manage Work-Life Balance. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. 2015. Presented at: CHI'15; April 18-23, 2015:3985-3988; Seoul, Republic of Korea.
- Jobbágy A, Majnár M, Tóth L, Nagy P. HRV-based stress level assessment using very short recordings. Period Polytech Electr Eng Comput Sci. 2017;61(3):245.
- Charlton PH, Celka P, Farukh B, Chowienczyk P, Alastruey J. Assessing mental stress from the photoplethysmogram: a numerical study. Physiol Meas. May 15, 2018;39(5):054001.
- Mohan P, Nagarajan V, Das S. Stress measurement from wearable photoplethysmographic sensor using heart rate variability data. 2016. Presented at: 2016 International Conference on Communication and Signal Processing; April 6-8, 2016:1141; Melmaruvathur, India.
- Task Force of The European Society of Cardiology and The North American Society of Pacing and Electrophysiology. Heart rate variability: standards of measurement, physiological interpretation and clinical use. Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiolog. Circulation. Mar 1, 1996;93(5):1043-1065. [Medline]
- Hjortskov NA, Rissén D, Blangsted AT, Fallentin N, Lundberg U, Søgaard K. The effect of mental stress on heart rate variability and blood pressure during computer work. Eur J Appl Physiol. Jun 2004;92(1-2):84-89. [CrossRef] [Medline]
- Bernardi L, Wdowczyk-Szulc J, Valenti C, Castoldi S, Passino C, Spadacini G, et al. Effects of controlled breathing, mental activity and mental stress with or without verbalization on heart rate variability. J Am Coll Cardiol. May 2000;35(6):1462-1469. [FREE Full text] [Medline]
- Zhu B, Hedman A, Feng S, Li H, Osika W. Designing, prototyping and evaluating digital mindfulness applications: a case study of mindful breathing for stress reduction. J Med Internet Res. Dec 14, 2017;19(6):e197. [FREE Full text] [CrossRef] [Medline]
- McDuff D, Hernandez J, Gontarek S, Picard R. Contact-free Measurement of Cognitive Stress During Computer Tasks with a Digital Camera. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM; 2016. Presented at: CHI'16; May 07-12, 2016:4000-4004; San Jose, California, USA. [CrossRef]
- Shaffer F, McCraty R, Zerr CL. A healthy heart is not a metronome: an integrative review of the heart's anatomy and heart rate variability. Front Psychol. Sep 2014;5:1040. [FREE Full text] [CrossRef] [Medline]
- Williamon A, Aufegger L, Wasley D, Looney D, Mandic DP. Complexity of physiological responses decreases in high-stress musical performance. J R Soc Interface. Dec 6, 2013;10(89):20130719. [FREE Full text] [CrossRef] [Medline]
- Billman GE. Heart rate variability - a historical perspective. Front Physiol. 2011;2:86. [FREE Full text] [CrossRef] [Medline]
- Heathers JA. Smartphone-enabled pulse rate variability: an alternative methodology for the collection of heart rate variability in psychophysiological research. Int J Psychophysiol. Sep 2013;89(3):297-304. [CrossRef] [Medline]
- Giardino ND, Lehrer PM, Edelberg R. Comparison of finger plethysmograph to ECG in the measurement of heart rate variability. Psychophysiology. Mar 2002;39(2):246-253. [CrossRef] [Medline]
- Schäfer A, Vagedes J. How accurate is pulse rate variability as an estimate of heart rate variability? A review on studies comparing photoplethysmographic technology with an electrocardiogram. Int J Cardiol. Jun 5, 2013;166(1):15-29. [CrossRef] [Medline]
- Salahuddin L, Kim D. Detection of Acute Stress by Heart Rate Variability Using a Prototype Mobile ECG Sensor. In: Proceedings of the 2006 International Conference on Hybrid Information Technology-Volume 02. 2016. Presented at: ICHIT'06; November 09-11, 2006:453-459; Jeju Island, Korea. [CrossRef]
- Billman GE. The LF/HF ratio does not accurately measure cardiac sympatho-vagal balance. Front Physiol. 2013;4:26. [FREE Full text] [CrossRef] [Medline]
- Karemaker J. Heart rate variability: why do spectral analysis? Heart. 1997;77(2):101. [CrossRef] [Medline]
- Eckberg D. Sympathovagal balance: a critical appraisal. Circulation. Nov 4, 1997;96(9):3224-3232. [CrossRef] [Medline]
- Cho Y, Bianchi-Berthouze N, Julier S. DeepBreath: Deep Learning of Breathing Patterns for Automatic Stress Recognition using Low-Cost Thermal Imaging in Unconstrained Settings. 2017. Presented at: Seventh International Conference on Affective Computing and Intelligent Interaction (ACII); 2017:456-463; San Antonio. URL: https://arxiv.org/ftp/arxiv/papers/1708/1708.06026.pdf
- Healey J, Picard R. Detecting stress during real-world driving tasks using physiological sensors. IEEE Trans Intell Transport Syst. Jun 2005;6(2):156-166. [CrossRef]
- Or CK, Duffy VG. Development of a facial skin temperature-based methodology for non-intrusive mental workload measurement. Occup Ergon. 2007;7(2):94. [FREE Full text]
- Ioannou S, Gallese V, Merla A. Thermal infrared imaging in psychophysiology: potentialities and limits. Psychophysiology. Oct 2014;51(10):951-963. [FREE Full text] [CrossRef] [Medline]
- Genno H, Ishikawa K, Kanbara O, Kikumoto M, Fujiwara Y, Suzuki R, et al. Using facial skin temperature to objectively evaluate sensations. Int J Ind Ergon. Feb 1997;19(2):161-171. [CrossRef]
- Engert V, Merla A, Grant JA, Cardone D, Tusche A, Singer T. Exploring the use of thermal infrared imaging in human stress research. PLoS One. Mar 2014;9(3):e90782. [FREE Full text] [CrossRef] [Medline]
- Abdelrahman Y, Velloso E, Dingler T, Schmidt A, Vetere F. Cognitive heat exploring the usage of thermal imaging to unobtrusively estimate cognitive load. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2017;1(3). [CrossRef]
- Cho Y. Automated Mental Stress Recognition through Mobile Thermal Imaging. 2017. Presented at: 2017 Seventh International Conference on Affective Computing and Intelligent Interaction; October 23-26, 2017:596-600; San Antonio, Texas. [CrossRef]
- Garbey M, Sun N, Merla A, Pavlidis I. Contact-free measurement of cardiac pulse based on the analysis of thermal imagery. IEEE Trans Biomed Eng. Aug 2007;54(8):1418-1426. [CrossRef] [Medline]
- Plews DJ, Scott B, Altini M, Wood M, Kilding AE, Laursen PB. Comparison of heart-rate-variability recording with smartphone photoplethysmography, Polar H7 chest strap, and electrocardiography. Int J Sports Physiol Perform. Nov 1, 2017;12(10):1324-1328. [CrossRef] [Medline]
- Pavlidis I, Tsiamyrtzis P, Shastri D, Wesley A, Zhou Y, Lindner P, et al. Fast by nature - how stress patterns define human experience and performance in dexterous tasks. Sci Rep. Mar 2012:2. [CrossRef] [Medline]
- Shaffer F, Ginsberg J. An overview of heart rate variability metrics and norms. Front Public Health. 2017;5:258. [FREE Full text] [CrossRef] [Medline]
- Cho Y, Bianchi-Berthouze N, Marquardt N, Julier SJ. Deep Thermal Imaging: Proximate Material Type Recognition in the Wild through Deep Learning of Spatial Surface Temperature Patterns. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 2018. Presented at: CHI'18; April 21-26, 2018; Montreal QC, Canada. [CrossRef]
- Yarmolenko PS, Moon EJ, Landon C, Manzoor A, Hochman DW, Viglianti BL, et al. Thresholds for thermal damage to normal tissues: an update. Int J Hyperthermia. 2011;27(4):320-343. [FREE Full text] [CrossRef] [Medline]
- Steketee J. Spectral emissivity of skin and pericardium. Phys. Med. Biol. May 27, 2002;18(5):686-694. [CrossRef] [Medline]
- Shannon CE. A mathematical theory of communication. Bell Syst Tech J. Jul 1947;27(3):423. [FREE Full text] [CrossRef]
- Sonka M, Hlavac V, Boyle R. Image Processing: Analysis and Machine Vision. Stamford, USA. Cengage Learning; 2014.
- Kumar M, Veeraraghavan A, Sabharwal A. DistancePPG: robust non-contact vital signs monitoring using a camera. Biomed Opt Express. May 1, 2015;6(5):1565-1588. [FREE Full text] [CrossRef] [Medline]
- Tsuji H, Venditti FJ, Manders ES, Evans JC, Larson MG, Feldman CL, et al. Determinants of heart rate variability. J Am Coll Cardiol. Nov 15, 1996;28(6):1539-1546. [FREE Full text] [CrossRef] [Medline]
- Hamilton PS, Tompkins WJ. Quantitative investigation of QRS detection rules using the MIT/BIH arrhythmia database. IEEE Trans Biomed Eng. Dec 1986;BME-33(12):1157-1165. [CrossRef] [Medline]
- Kuraoka K, Nakamura K. The use of nasal skin temperature measurements in studying emotion in macaque monkeys. Physiol Behav. Mar 1, 2011;102(3-4):347-355. [CrossRef] [Medline]
- Lazarus R. From psychological stress to the emotions: a history of changing outlooks. Annu Rev Psychol. 1993;44:1-21. [CrossRef] [Medline]
- Hong J, Ramos J, Dey A. Understanding Physiological Responses to Stressors During Physical Activity. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing. 2012. Presented at: UbiComp'12; September 5-8, 2012:270-279; Pittsburgh, Pennsylvania. [CrossRef]
- Akerstedt T, Gillberg M, Hjemdahl P, Sigurdson K, Gustavsson I, Daleskog M, et al. Comparison of urinary and plasma catecholamine responses to mental stress. Acta Physiol Scand. Jan 1983;117(1):19-26. [CrossRef] [Medline]
- Stroop JR. Studies of interference in serial verbal reactions. J Exp Psychol. 1935;18(6):643-662. [CrossRef]
- Soufer R, Bremner JD, Arrighi JA, Cohen I, Zaret BL, Burg MM, et al. Cerebral cortical hyperactivation in response to mental stress in patients with coronary artery disease. Proc Natl Acad Sci U S A. May 26, 1998;95(11):6454-6459. [FREE Full text] [Medline]
- Setz C, Arnrich B, Schumm J, La MR, Tröster G, Ehlert U. Discriminating stress from cognitive load using a wearable EDA device. IEEE Trans Inf Technol Biomed. 2010;14(2):417. [CrossRef]
- Bijur PE, Silver W, Gallagher EJ. Reliability of the visual analog scale for measurement of acute pain. Acad Emerg Med. Dec 2001;8(12):1153-1157. [FREE Full text] [Medline]
- Lesage FX, Berjot S, Deschamps F. Clinical stress assessment using a visual analogue scale. Occup Med (Lond). Dec 2012;62(8):600-605. [FREE Full text] [CrossRef] [Medline]
- Yannakakis G, Cowie R, Busso C. The ordinal nature of emotions. 2017. Presented at: 2017 Seventh International Conference on Affective Computing and Intelligent Interaction; October 23-26, 2017:248-255; San Antonio, Texas. [CrossRef]
- Atkinson D, Baurley S, Petreca B, Bianchi-Berthouze N, Watkins P. The tactile triangle: a design research framework demonstrated through tactile comparisons of textile materials. J Des Res. 2016;14(2):170. [CrossRef]
- Bang A. Triads as a means for dialogue about emotional values in textile design. 2009. Presented at: 8th European Academy Of Design Conference; April 1-3, 2009; Aberdeen, Scotland.
- Hovsepian K, al Absi M, Ertin E, Kamarck T, Nakajima M, Kumar S. cStress: Towards a Gold Standard for Continuous Stress Assessment in the Mobile Environment. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2015. Presented at: UbiComp'15; September 07-11, 2015:493; Osaka, Japan. [CrossRef]
- Wang J, Lin C, Yang YC. A k-nearest-neighbor classifier with heart rate variability feature-based transformation algorithm for driving stress recognition. Neurocomputing. Sep 2013;116:136-143. [CrossRef]
- Hernandez J, Morris R, Picard R. Call Center Stress Recognition with Person-Specific Models. In: Proceedings of the 4th international conference on Affective computing and intelligent interaction - Volume Part I. 2011. Presented at: ACII'11; October 09-12, 2011:125-134; Memphis, TN. [CrossRef]
- Salmivalli C, Kaukiainen A, Kaistaniemi L, Lagerspetz KMJ. Self-evaluated self-esteem, peer-evaluated self-esteem, and defensive egotism as predictors of adolescents' participation in bullying situations. Pers Soc Psychol Bull. Jul 2, 2016;25(10):1268-1278. [CrossRef]
- Sano A, Picard R. Stress Recognition Using Wearable Sensors and Mobile Phones. 2013. Presented at: 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction; September 2-5, 2013:671-676; Geneva, Switzerland. [CrossRef]
- Clifford G, Behar J, Li Q, Rezek I. Signal quality indices and data fusion for determining clinical acceptability of electrocardiograms. Physiol Meas. Sep 2012;33(9):1419-1433. [CrossRef] [Medline]
- Elgendi M. Optimal signal quality index for photoplethysmogram signals. Bioengineering (Basel). Sep 22, 2016;3(4). [FREE Full text] [CrossRef] [Medline]
- Allen J. Photoplethysmography and its application in clinical physiological measurement. Physiol Meas. 2007;28(3). [CrossRef] [Medline]
- McKinley PS, Shapiro PA, Bagiella E, Myers MM, de Meersman RE, Grant I, et al. Deriving heart period variability from blood pressure waveforms. J Appl Physiol (1985). Oct 2003;95(4):1431-1438. [FREE Full text] [CrossRef] [Medline]
- Shin H. Ambient temperature effect on pulse rate variability as an alternative to heart rate variability in young adult. J Clin Monit Comput. Dec 2016;30(6):939-948. [FREE Full text] [CrossRef] [Medline]
- Veltman J, Vos W. Wright State University: CORE Scholar. 2005. URL: https://corescholar.libraries.wright.edu/cgi/viewcontent.cgi?article=1124&context=isap_2005
- Lane N, Miluzzo E, Lu H, Peebles D, Choudhury T, Campbell A. A survey of mobile phone sensing. IEEE Commun Mag. 2010;48(9). [CrossRef]
Abbreviations
BVP: blood volume pulse
ECG: electrocardiogram
HF: high frequency
HRV: heart rate variability
kNN: k-nearest neighbor
LED: light-emitting diode
LF: low frequency
LOSO: leave-one-subject-out
NN: neural network
PPG: photoplethysmography
pPP50: proportion of successive differences of PP intervals greater than 50 ms, relative to the total number of intervals
PRV: pulse rate variability
PSNS: parasympathetic nervous system
pSQI: power signal quality index
RMSSD: root mean square of the successive differences of PP intervals
ROI: region of interest
SDPP: standard deviation of PP intervals
SDSTV: standard deviation of the successive differences of the thermal variability
SDTV: standard deviation of the thermal variability sequence
SNS: sympathetic nervous system
TD: temperature difference
UCL: University College London
UCL: University College London |
Edited by J Torous; submitted 15.02.18; peer-reviewed by A Sano, F Andrasik, M Lang; comments to author 13.03.18; revised version received 21.08.18; accepted 05.12.18; published 09.04.19.
Copyright © Youngjun Cho, Simon J Julier, Nadia Bianchi-Berthouze. Originally published in JMIR Mental Health (http://mental.jmir.org), 09.04.2019.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on http://mental.jmir.org/, as well as this copyright and license information must be included.