OPEN

SUBJECT AREAS: COMPUTATIONAL BIOPHYSICS, BIOMEDICAL ENGINEERING, APPLIED MATHEMATICS, COMPUTATIONAL SCIENCE

Revealing Real-Time Emotional Responses: a Personalized Assessment based on Heartbeat Dynamics

Gaetano Valenza1,2,3, Luca Citi1,2,4, Antonio Lanatà3, Enzo Pasquale Scilingo3 & Riccardo Barbieri1,2

1 Neuroscience Statistics Research Laboratory, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA; 2 Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; 3 Department of Information Engineering and Research Centre E. Piaggio, University of Pisa, Pisa, Italy; 4 School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK.

Received 25 September 2013. Accepted 4 March 2014. Published 21 May 2014.

Correspondence and requests for materials should be addressed to G.V. (g.valenza@ieee.org).

Emotion recognition through computational modeling and analysis of physiological signals has been widely investigated in the last decade. Most of the proposed emotion recognition systems require relatively long time series of multivariate records and do not provide accurate real-time characterizations using short time series. To overcome these limitations, we propose a novel personalized probabilistic framework able to characterize the emotional state of a subject through the analysis of heartbeat dynamics exclusively. The study includes thirty subjects presented with a set of standardized images gathered from the International Affective Picture System, alternating levels of arousal and valence. Due to the intrinsic nonlinearity and nonstationarity of the RR interval series, a specific point-process model was devised for instantaneous identification, considering autoregressive nonlinearities up to the third order according to the Wiener-Volterra representation, thus tracking very fast stimulus-response changes.
Features from the instantaneous spectrum and bispectrum, as well as the dominant Lyapunov exponent, were extracted and considered as input features to a support vector machine for classification. Results, estimating emotions every 10 seconds, achieve an overall accuracy in recognizing four emotional states based on the circumplex model of affect of 79.29%, with 79.15% on the valence axis, and 83.55% on the arousal axis.

The detection and recognition of emotional information is an important topic in the field of affective computing, i.e., the study of human affects by technological systems and devices1. Changes in emotional states often reflect facial, vocal, and gestural modifications intended to communicate, sometimes unconsciously, personal feelings to other people. Such changes can be generalized across cultures, e.g., nonverbal emotional expressions, or can be culture-specific2. Since mood alteration strongly affects the normal emotional process, emotion recognition is also an ambitious objective in the field of mood disorder psychopathology. In the last decade, several efforts have been made to obtain a reliable methodology to automatically identify the emotional/mood state of a subject, starting from the analysis of facial expressions, behavioral correlates, and physiological signals. Despite such efforts, current practice still relies on simple mood questionnaires or interviews for emotional assessment. In mental care, for instance, the diagnosis of pathological emotional fluctuations is mainly made through the physician's experience. Several epidemiological studies report that more than two million Americans have been diagnosed with bipolar disorder3, and about 82.7 million European adults aged 18 to 65 have been affected by at least one mental disorder4.
Several computational methods for emotion recognition based on variables associated with the Central Nervous System (CNS), for example the Electroencephalogram (EEG), have been recently proposed5–12. These methods are justified by the fact that human emotions originate in the cerebral cortex, which involves several areas for their regulation and feeling. The prefrontal cortex and amygdala, in fact, represent the essence of two specific pathways: affective elicitations longer than 6 seconds allow the prefrontal cortex to encode the stimulus information and transmit it through other areas of the Central Autonomic Network (CAN) down to the brainstem, thus producing a context-appropriate response13; briefly presented stimuli access the fast route of emotion recognition via the amygdala. Of note, it has been found that the visual cortex is involved in emotional reactions to different classes of stimuli14. Dysfunctions in these CNS recruitment circuits lead to pathological effects15 such as anhedonia, i.e., the loss of pleasure or interest in previously rewarding stimuli, which is a core feature of major depression and other serious mood disorders.

SCIENTIFIC REPORTS | 4: 4998 | DOI: 10.1038/srep04998

Figure 1 | A graphical representation of the circumplex model of affect, with the horizontal axis representing the valence or pleasantness dimension and the vertical axis representing the arousal or activation dimension49.

Given the CAN involvement in emotional responses, an important direction for affective computing studies is related to changes in Autonomic Nervous System (ANS) activity as elicited by specific emotional states. Monitoring physiological variables linked to ANS activity, in fact, can be easily performed through wearable systems, e.g., sensorized t-shirts16,17 or gloves18,46. ANS dynamics are also thought to be less sensitive to artifacts than the EEG.
Moreover, the human vagus nerve is anatomically linked to the cranial nerves that regulate social engagement via facial expression and vocalization. Engineering approaches to assess ANS patterns related to emotions constitute a relevant part of the state-of-the-art methods used in affective computing. For example, a recent review by Calvo et al.19 reports on emotion theories as well as on affect detection systems using physiological and speech signals (also reviewed in20), facial expression, and movement analysis. Long multivariate recordings are currently needed to accurately characterize the emotional state of a subject. Such a constraint surely reduces the potential range of real applications due to computational cost and number of sensors. More recently, ECG morphological analysis by the Hilbert-Huang transform21, mutual information analysis of respiratory signals22, and a multiparametric approach related to ANS activity23 have been proposed to assess human affective states. Experimental evidence over the past two decades shows that Heart Rate Variability (HRV) analysis, in both the time and frequency domains, can provide a unique, noninvasive assessment of autonomic function24,25,88. Nevertheless, HRV analysis by means of standard procedures presents several limitations when high time and frequency resolutions are needed, mainly due to the inherent stationarity assumptions required to define most of the relevant HRV time- and frequency-domain indices24,25. More importantly, standard methods are generally not suitable to provide accurate nonlinear measures in the absence of information regarding phase space fitting. It is well accepted by the scientific community that physiological models should be nonlinear in order to thoroughly describe the characteristics of such complex systems.
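For context on the standard time- and frequency-domain HRV indices mentioned above, a minimal frequency-domain sketch in Python follows: the RR series is interpolated to an evenly sampled tachogram and the conventional LF (0.04–0.15 Hz) and HF (0.15–0.4 Hz) band powers are integrated from a Welch periodogram. Function names, the resampling rate, and the synthetic series are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.signal import welch

def lf_hf_power(rr_s, fs_resample=4.0):
    """Band powers of the LF and HF HRV bands from an RR interval series
    (in seconds), via even resampling and Welch's PSD estimate."""
    t = np.cumsum(rr_s)                              # beat times (s)
    t_even = np.arange(t[0], t[-1], 1.0 / fs_resample)
    rr_even = np.interp(t_even, t, rr_s)             # evenly sampled tachogram
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs_resample, nperseg=256)
    df = f[1] - f[0]
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum() * df    # LF band power
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum() * df    # HF band power
    return lf, hf

# Synthetic RR series: 0.8 s mean with a respiratory-like 0.25 Hz modulation,
# which should concentrate power in the HF band.
beats = 0.8 + 0.05 * np.sin(2 * np.pi * 0.25 * np.arange(300) * 0.8)
lf, hf = lf_hf_power(beats)
```

Note that this window-based estimate is exactly the kind of stationarity-bound index whose limitations the text points out; the point-process approach replaces it with instantaneous estimates.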
Within the cardiovascular system, the complex and nonstationary dynamics of heartbeat variations have been associated with nonlinear neural interactions and integrations occurring at the neuron and receptor levels, so that the sinoatrial node responds in a nonlinear way to the changing levels of efferent autonomic inputs27. In fact, HRV nonlinear measures have been demonstrated to be of prognostic value in aging and diseases24–26,28–36,41. In several previous works37–43, we have demonstrated how it is possible to estimate heartbeat dynamics in cardiovascular recordings under nonstationary conditions by means of the analysis of the probabilistic generative mechanism of the heartbeat. Concerning emotion recognition, we recently demonstrated the important role of nonlinear dynamics for a correct arousal and valence recognition from ANS signals44–46,71, including a preliminary feasibility study on the dataset considered here47. In light of all these issues, we here propose a new methodology in the field of affective computing, able to recognize emotional swings (positive or negative), as well as two levels of arousal and valence (low-medium and medium-high), using only one biosignal, the ECG, and able to instantaneously assess the subject's state even for short-time events (<10 seconds). Emotions associated with a short-time stimulus are identified through a self-reported label as well as four specific regions in the arousal-valence orthogonal dimension (see Fig. 1). The proposed methodology is fully based on a personalized probabilistic point-process nonlinear model. In general, we model the probability function of the next heartbeat given the past R events. The probability function is fully parametrized, considering an autoregressive Wiener-Volterra relationship up to the cubic order to model its first-order moment.
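The probability structure just described can be sketched as follows: an inverse-Gaussian inter-beat density (the structure used in the point-process literature the paper builds on) whose instantaneous mean is a Wiener-Volterra expansion of past RR intervals. The kernel values `g0`–`g3` are placeholders for fitted model coefficients, not the paper's estimates.

```python
import math
import numpy as np

def volterra_mean(rr_past, g0, g1, g2=None, g3=None):
    """First-order moment (instantaneous RR mean) as a cubic autoregressive
    Wiener-Volterra expansion of the past RR intervals."""
    x = np.asarray(rr_past, dtype=float)
    mu = g0 + np.asarray(g1) @ x                                  # linear terms
    if g2 is not None:
        mu += np.einsum('ij,i,j->', np.asarray(g2), x, x)         # quadratic terms
    if g3 is not None:
        mu += np.einsum('ijk,i,j,k->', np.asarray(g3), x, x, x)   # cubic terms
    return float(mu)

def inverse_gaussian_pdf(w, mu, lam):
    """Inverse-Gaussian density of the waiting time w until the next R event,
    given the instantaneous mean mu and shape parameter lam."""
    return math.sqrt(lam / (2 * math.pi * w ** 3)) * math.exp(
        -lam * (w - mu) ** 2 / (2 * mu ** 2 * w))

# Example: mean predicted from two past RR intervals with toy kernels.
mu_hat = volterra_mean([0.8, 0.8], 0.1, [0.5, 0.5])
```

Because the density is defined in continuous time, its parameters (and hence all derived features) can be updated at every instant, which is the key property exploited in the following sections.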
All the considered features are estimated from the linear, quadratic, and cubic coefficients of such a relationship. As a consequence, our model provides the unique opportunity to take into account all the possible linear and nonlinear features which can be estimated from the model parameters. Importantly, as the probability function is defined at each moment in time, the parameter estimation is performed instantaneously, a feature not reliably achievable using other more standard linear and nonlinear indices such as pNN50%, the triangular index, RMSSD, Recurrence Quantification Analysis, etc.24,25,88. In particular, the linear terms allow for the instantaneous spectral estimation, the quadratic terms allow for the instantaneous bispectral estimation, whereas the dominant Lyapunov exponent can be defined by considering the cubic terms. Of note, the use of higher order statistics (HOS) to estimate our features is encouraged by the fact that quantification of HRV nonlinear dynamics plays a crucial role in emotion recognition systems44,48,71, extending the information given by spectral analysis and providing useful information on the nonlinear frequency interactions.

Results

Experimental protocol. The recording paradigm related to this work has been previously described in44,48. We adopted a common dimensional model which uses multiple dimensions to categorize emotions, the Circumplex Model of Affect (CMA)49. The CMA used in our experiment takes into account two main dimensions conceptualized by the terms of valence and arousal (see Fig. 1). Valence represents how much an emotion is perceived as positive or negative, whereas arousal indicates how strongly the emotion is felt. Accordingly, we employed visual stimuli belonging to an international standardized database having a specific emotional rating expressed in terms of valence and arousal.
Specifically, we chose the International Affective Picture System (IAPS)50, which is one of the most frequently cited tools in the area of affective stimulation. The IAPS is a set of 944 images with emotional ratings based on several previously conducted studies in which subjects were requested to rank these images using the Self-Assessment Manikin (both the valence and arousal scales range from 0 to 10). A general overview of the experimental protocol and analysis is shown in Fig. 2. The passive affective elicitation performed through the IAPS images stimulates several cortical areas, also allowing the prefrontal cortex modulation that generates cognitive perceptions13. A homogeneous population of 30 healthy subjects (aged from 21 to 24), not suffering from cardiovascular or evident mental pathologies, was recruited to participate in the experiment. The experimental protocol for this study was approved by the ethical committee of the University of Pisa, and informed consent was obtained from all participants involved in the experiment. All participants were screened by the Patient Health Questionnaire™ (PHQ), and only participants with a score lower than 5 were included in the study51. The affective elicitation was performed by projecting the IAPS images on a PC monitor. The slideshow was comprised of 9 image sessions, alternating neutral sessions and arousal sessions (see Fig. 3). The neutral sessions consist of 6 images having a valence range (min = 5.52, max = 7.08) and an arousal range (min = 2.42, max = 3.22). The arousal sessions are divided into Low-Medium (L-M) and Medium-High (M-H) classes, according to the associated arousal score. Such sessions include 20 images eliciting an increasing level of valence (from unpleasant to pleasant). The L-M arousal sessions had a valence range (min = 1.95, max = 8.03) and an arousal range (min = 3.08, max = 4.99). The M-H arousal sessions had a valence range (min = 1.49, max = 7.77) and an arousal range (min = 5.01, max = 6.99).
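The session score ranges above imply a simple labeling rule, sketched below. The thresholds are read off the reported ranges (in particular, the cut at 5 on the arousal scale separating L-M from M-H sessions); they are illustrative and not stated by the authors as their assignment rule.

```python
def label_session(valence, arousal):
    """Assign an IAPS picture to one of the protocol's session classes from
    its normative valence/arousal scores. Thresholds are inferred from the
    session score ranges reported in the text (illustrative only)."""
    # Neutral sessions: valence 5.52-7.08, arousal 2.42-3.22.
    if arousal <= 3.22 and 5.52 <= valence <= 7.08:
        return "neutral"
    # Arousal sessions split at 5 on the arousal scale (L-M max 4.99, M-H min 5.01).
    return "M-H arousal" if arousal >= 5.0 else "L-M arousal"
```

For example, a picture rated (valence 6.0, arousal 2.5) would fall in a neutral session, while (valence 2.0, arousal 5.5) would belong to an M-H arousal session.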
The overall protocol utilized 110 images. Each image was presented for 10 seconds, for a total duration of the experiment of 18 minutes and 20 seconds. During the visual elicitation, the electrocardiogram (ECG) was acquired by using the ECG100C Electrocardiogram Amplifier from BIOPAC Inc., with a sampling rate of 250 Hz. A block diagram of the proposed recognition system is illustrated in Fig. 4. In line with the CMA model, the combination of two levels of arousal and valence leads to the definition of four different emotional states. The stimuli, with high and low arousal and high and low valence, produce changes in the ANS dynamics through both sympathetic and parasympathetic pathways that can be tracked by a multidimensional representation estimated in continuous time by the proposed point-process model. The obtained features are then processed for classification by adopting a leave-one-out procedure.

Algorithms. The ECG signal was analyzed off-line to extract the RR intervals24, then further processed to correct for erroneous and ectopic beats by a previously developed algorithm52. The presence of nonlinear behaviors in such heartbeat series was tested by using a well-established time-domain test based on high-order statistics53. The null hypothesis assumes that the time series are generated by a linear system. We set the number of lags to M = 8, and a total of 500 bootstrap replications for every test. Experimental results are shown in Table 1. The nonlinearity test gave significant results (p < 0.05) in 27 out of 30 subjects. In light of this result, we based our methodology on Nonlinear Autoregressive Integrative (NARI) models. Nonlinearities are intended as quadratic and cubic functions of the past RR intervals according to the Wiener-Volterra representation54,55.
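The cited time-domain test53 is not reproduced here; the sketch below only illustrates the same null hypothesis (the series is generated by a linear system) with a third-order moment statistic over M = 8 lags, compared against phase-randomized surrogates in a bootstrap-style loop. All function names, the surrogate scheme, and the synthetic series are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def third_order_stat(x, max_lag=8):
    """Largest absolute sample third-order moment E[x_t x_{t+l} x_{t+2l}]
    over lags l = 1..max_lag; near zero for linear Gaussian series."""
    x = (x - x.mean()) / x.std()
    n = len(x)
    return max(abs(np.mean(x[:n - 2 * l] * x[l:n - l] * x[2 * l:]))
               for l in range(1, max_lag + 1))

def nonlinearity_pvalue(x, n_surr=500, max_lag=8):
    """Surrogate test of the linearity null: compare the data statistic with
    its distribution over phase-randomized surrogates, which preserve the
    linear (second-order) correlation structure but destroy nonlinearity."""
    t0 = third_order_stat(x, max_lag)
    X = np.fft.rfft(x)
    exceed = 0
    for _ in range(n_surr):
        Xs = X.copy()
        # Randomize all phases except DC and Nyquist, then invert.
        Xs[1:-1] *= np.exp(1j * rng.uniform(0.0, 2 * np.pi, len(X) - 2))
        exceed += third_order_stat(np.fft.irfft(Xs, n=len(x)), max_lag) >= t0
    return (exceed + 1) / (n_surr + 1)

# A toy series with a genuine quadratic (nonlinear) interaction term.
z = rng.normal(size=514)
x_nl = z[2:] + 0.8 * z[1:-1] * z[:-2]
```

On a series like `x_nl`, the test should reject linearity, mirroring the significant results reported for 27 of the 30 subjects.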
Major improvements of our approach rely on the possibility of performing a regression on the derivative RR series based on an Inverse Gaussian (IG) probability structure37–39. The quadratic nonlinearities contribute to the complete emotional assessment through features coming from the instantaneous spectrum and bispectrum56,57. It is worth noticing that our feature estimation is derived from an equivalent nth-order input-output Wiener-Volterra model54,55, thus allowing for the potential estimation of the nth-order polyspectra of the physiological signal60 (see the Materials and Methods section for details). Moreover, by representing the RR series with cubic autoregressive functions, it is possible to perform a further instantaneous nonlinear assessment of the complex cardiovascular dynamics and estimate the dominant Lyapunov exponent at each moment in time61. Indices from a representative subject are shown in Fig. 5. Importantly, the NARI model as applied to the considered data provides excellent results in terms of goodness-of-fit and independence tests, with KS distances never above 0.056. A comparison analysis was performed between the simple linear and NARI models, considering the Sum of the Squared Distances (SSD) of the points outside the confidence interval of the autocorrelation plot (see Table 1). We report that the nonlinear point-process models resulted in a lower SSD for all the considered subjects. Further results, reporting the number of points outside the confidence interval of the autocorrelation plot, are shown in the Supporting Information.
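The KS distances quoted above come from the time-rescaling construction used for point-process goodness-of-fit: under a correct model, the time-rescaled inter-event intervals are i.i.d. unit-rate exponentials, so their transform z = 1 − exp(−τ) must be uniform on (0, 1). A minimal sketch, assuming the rescaled intervals have already been computed from the fitted conditional intensity:

```python
import numpy as np

def ks_distance(rescaled):
    """KS distance between the transformed rescaled intervals and the
    uniform CDF; small values indicate a well-fitted point-process model."""
    z = np.sort(1.0 - np.exp(-np.asarray(rescaled)))
    n = len(z)
    grid = (np.arange(1, n + 1) - 0.5) / n       # uniform quantiles
    return float(np.max(np.abs(z - grid)))

rng = np.random.default_rng(1)
d_good = ks_distance(rng.exponential(1.0, 2000))  # correct model: unit-rate
d_bad = ks_distance(rng.exponential(0.5, 2000))   # misspecified model
```

A well-fitted model yields a small distance (compare the paper's maximum of 0.056), whereas a misspecified intensity inflates it markedly.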
To summarize, the necessary algorithmic steps for the assessment of instantaneous ANS responses to short-time emotional stimuli are as follows: a) extract an artifact-free RR interval series from the ECG; b) use the autoregressive coefficients of the quadratic NARI expansion to extract the input-output kernels; c) estimate the instantaneous spectral and bispectral features; d) use the autoregressive coefficients of the cubic NARI expansion and the fast orthogonal search algorithm to estimate the instantaneous dominant Lyapunov exponent. All the extracted instantaneous features (see Materials and Methods) are used as inputs to the classification procedure described below. Of note, since no other comparable statistical models have been advocated for a similar application, the goodness-of-fit and classification performance of the proposed nonlinear approach are compared with those of its linear counterpart: the basic linear point-process model described in37 is also used here, for the first time, for the proposed classification analysis. Clearly, in order to perform a fair comparison, the order of such a simple linear IG-based point-process model was chosen considering an equal number of parameters to be estimated as for the nonlinear models. Moreover, here we report on the usability of other simple point-process models having the Poisson distribution as the inter-beat probability. Such models gave poor performance in terms of goodness-of-fit and did not provide sufficient reliability to solve the proposed classification problems. All of the algorithms were implemented using Matlab® R2013b together with custom code and three additional toolboxes for pattern recognition and signal processing, i.e., LIBSVM62, PRTools63, and a time series analysis toolbox64.

Classification. To perform pattern recognition of the elicited emotional states, a two-class problem was considered for arousal, valence, and the self-reported emotion: Low-Medium (L-M) and Medium-High (M-H).
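Step (a) can be sketched with a naive detector: a threshold on the squared first difference plus a refractory period, followed by a simple median-based rejection of suspect beats. This is only a stand-in illustrating the step, not the previously developed correction algorithm52 the paper actually uses.

```python
import numpy as np

def extract_rr(ecg, fs):
    """Naive R-peak detection and RR extraction (illustrative stand-in):
    threshold the squared first difference, enforce a 250 ms refractory
    period, then reject intervals >20% away from the median."""
    energy = np.diff(ecg) ** 2
    thr = 8.0 * energy.mean()                 # crude adaptive threshold
    refractory = int(0.25 * fs)               # 250 ms in samples
    peaks, last = [], -refractory - 1
    for i, e in enumerate(energy):
        if e > thr and i - last > refractory:
            peaks.append(i)
            last = i
    rr = np.diff(peaks) / fs                  # RR intervals in seconds
    med = np.median(rr)
    return rr[np.abs(rr - med) < 0.2 * med]   # drop ectopic/artifact beats

# Synthetic ECG: unit R spikes every 0.8 s at fs = 250 Hz, plus mild noise.
fs = 250
ecg = np.zeros(30 * fs)
ecg[::int(0.8 * fs)] = 1.0
ecg += 0.01 * np.random.default_rng(0).normal(size=ecg.size)
rr = extract_rr(ecg, fs)
```

Steps (b)–(d) then operate entirely on the resulting artifact-free RR series through the fitted NARI coefficients.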
The arousal classification was linked to the capability of our methodology to distinguish the L-M arousal stimuli from the M-H ones, with the neutral sessions assigned to the L-M arousal class. The overall protocol utilized 110 images. According to the scores associated with each image, for each subject the dataset was comprised of 64 examples for the L-M arousal class and 40 examples for the M-H arousal class.

Figure 2 | An overview of the experimental set-up and block scheme of the overall signal processing and classification chain. The central nervous system is emotionally stimulated through images gathered from the International Affective Picture System. Such a standardized dataset associates multiple scores with each picture, quantifying the supposed elicited pleasantness (valence) and activation (arousal). Accordingly, the pictures are grouped into arousal and valence classes, including the neutral ones. During the slideshow each image is displayed for 10 seconds, activating the prefrontal cortex and other cortical areas, consequently producing the proper autonomic nervous system changes through both parasympathetic and sympathetic pathways. Starting from the ECG recordings, the RR interval series are extracted by using automatic R-peak detection algorithms applied to artifact-free ECG. The absence of algorithmic errors (e.g., mis-detected peaks) and ectopic beats in such a signal is ensured by the application of effective artifact removal methods as well as visual inspection. The proposed point-process model is fitted on the RR interval series, and several features are estimated in an instantaneous fashion. Then, for each subject, a feature set is chosen and split into training and test sets for support vector machine-based classification. This image was drawn by G. Valenza, who holds both copyright and responsibility.
Regarding valence, we distinguished the L-M from the M-H valence classes, disregarding the images belonging to the neutral class. This choice is justified by the fact that the neutral images could be equally associated with the L-M or M-H valence classes. According to the experimental protocol timeline, for each subject the dataset was comprised of 40 examples for the L-M valence class and 40 examples for the M-H valence class. For the self-reported emotions, we used labels given by the Self-Assessment Manikin (SAM) report. After the visual elicitation, in fact, each subject was asked to fill out a SAM test associating either a positive or a negative emotion with each of the seen images. During this phase,

algorithm. Specifically, we used a nu-SVM (ν = 0.5) having a radial basis kernel function K(x_i, x_j) = exp(−γ ||x_i − x_j||²), with γ = N⁻¹, x ∈
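The classifier just described maps directly onto LIBSVM's nu-SVC formulation; a sketch using scikit-learn (whose SVM solver is LIBSVM) follows, combined with the leave-one-out scheme mentioned earlier. The feature matrix is synthetic, standing in for one subject's point-process feature set, and taking N in γ = N⁻¹ as the number of features is an assumption.

```python
import numpy as np
from sklearn.svm import NuSVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a per-subject feature set: two classes of
# instantaneous features (dimensions and separation are illustrative).
X = np.vstack([rng.normal(0.0, 1.0, (40, 6)),
               rng.normal(1.5, 1.0, (40, 6))])
y = np.repeat([0, 1], 40)

# nu-SVM with RBF kernel K(xi, xj) = exp(-gamma * ||xi - xj||^2),
# nu = 0.5 and gamma = 1/N, with N assumed to be the feature count.
clf = NuSVC(nu=0.5, kernel="rbf", gamma=1.0 / X.shape[1])

# Leave-one-out validation, as in the classification procedure described.
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
```

With reasonably separated classes this scheme yields a single accuracy per subject, analogous to the per-axis accuracies reported in the abstract.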