Adaptive Normalization and Feature Extraction for Electrodermal Activity Analysis

Viana-Matesanz, Miguel; Sánchez-Ávila, Carmen

doi:10.3390/math12020202

by Miguel Viana-Matesanz ¹,² and Carmen Sánchez-Ávila ²,³,*

¹ PhD Programme in Biomedical Engineering, ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain

² Research Group on Biometrics, Biosignals, Security and Smart Mobility, UPM’s R&D+i Center for Energy Efficiency, Virtual Reality, Optical Engineering and Biometrics (CeDInt-UPM), Campus Montegancedo of International Excelence, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Spain

³ Department of Applied Mathematics to ICT, ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(2), 202; https://doi.org/10.3390/math12020202

Submission received: 18 November 2023 / Revised: 22 December 2023 / Accepted: 27 December 2023 / Published: 8 January 2024

(This article belongs to the Special Issue Advanced Applications of Artificial Intelligence and Machine Learning in Biomedical Engineering)

Abstract:

Electrodermal Activity (EDA) has shown great potential for emotion recognition and the early detection of physiological anomalies associated with stress. However, its non-stationary nature limits the capability of current analytical and detection techniques, which are highly dependent on signal stability and controlled environmental conditions. This paper proposes a framework for EDA normalization based on the exponential moving average (EMA) with outlier removal applicable to non-stationary heteroscedastic signals and a novel set of features for analysis. The normalized time series preserves the morphological and statistical properties after transformation. Meanwhile, the proposed features expand on typical time-domain EDA features and profit from the resulting normalized signal properties. Parameter selection and validation were performed using two different EDA databases on stress assessment, accomplishing trend preservation using windows between 5 and 20 s. The proposed normalization and feature extraction framework for EDA analysis showed promising results for the identification of noisy, relaxed and arousal-like patterns in data with conventional clustering approaches like K-means over the aforementioned normalized features.

Keywords:

electrodermal activity; normalization; stochastic; biosignals; stress; feature extraction; rolling window; unsupervised classification

MSC:

68T20

1. Introduction

The main tenet of operationalism establishes that concepts cannot be fully understood until they can be measured [1]. For a long period, the analysis of Electrodermal Activity (EDA) has focused on understanding the underlying physiological processes that trigger any observable and measurable phenomena [2]. Sweat gland activity is regulated by the sympathetic function and has shown higher levels of reactivity to psychological stimuli than to environmental factors like temperature and humidity [3]. EDA can be measured from the skin conductance produced when a current is applied. Therefore, EDA can objectively assess the level of cognitive arousal and the operation of the autonomic nervous system, as the time and amplitude of the stimuli generated in the brain control centers dictate the observable properties. Its analysis has proven useful in clinical, behavioural and cognitive research, for example, in the assessment of acute stress and anxiety derived from pain levels or cognitive load [4,5], emotional and cognitive stress detection [6,7,8,9,10], or emotional response detection [11], among others.

Nowadays, ambulatory and clinical EDA acquisition is trending towards the usage of embedded sensors in wearable devices. For example, devices with EDA measurements like Shimmer3 GSR+, Biopac or the Fitbit Sense 2 have contributed to establishing the prominence of wearable technology in the landscape of EDA analysis [11,12]. Nevertheless, research studies on EDA have shown Empatica devices (E4, EmbracePlus) to be the most prevalent among these due to the availability of their raw EDA data as well as their accessibility and reliability despite a more limited sampling rate and sensitivity than conventional approaches [7,13]. EDA as an indicator of sympathetic nervous activity has proven helpful for classification problems in clinical areas like stress, epilepsy or perioperative monitoring through pain detection [2,6,14]. EDA has successfully replaced more complex signals like EEG or EMG in these scenarios, increasing the viability of ambulatory analysis. In most cases, EDA is one of several biosignals acquired to achieve an enhanced data context of the user’s state, such as ECG, skin temperature or body movement. Shifting from clinical applications, emotion recognition applications have incorporated the usage of EDA to assist the decision [15,16].

However, EDA data acquisition and experimental repeatability have shown great complexity due to its nonstochastic behavior and sensitivity to environmental conditions [2], reducing the amount of available data and their quality. The quick fluctuations or responses in EDA are the most valuable asset with regard to its analysis. Several factors determine the variability observed over the response properties: first, the diversity in skin properties associated with gender, age and ethnicity [2]; second, the major differences in response from the induced stimuli [2,13]; last but not least, recording aspects like artifacts from movement or respiration and electrode malfunctions that occur after long periods or excessive sweat.

In light of these considerations, the statistical properties can drastically change between studies and users. Data normalization can help standardize EDA analysis, as there is no general consensus on the processing approach. As a matter of fact, novel EDA-based applications like stress detection rely on Support Vector Machines (SVMs) [17], Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) Neural Networks [18,19,20], which are techniques that benefit from standardized and normalized features and values.

This paper proposes a novel technique for the normalization of EDA using the exponential moving average (EMA) inspired by the works of Ogasawara et al. [21] on the normalization of the monetary exchange rate time series. EMA is a frequently used indicator for sudden trend change detection due to its parameterization controlling the weight of the most recent values. Our method adapts the cited normalization technique to EDA properties and proposes a new set of features from the normalized time series for EDA analysis. The proposed features are in the time domain and can be easily normalized with minor considerations over the selected parameter values. In order to test the potential of the proposed normalized features, unsupervised classification with K-means after PCA feature selection was performed using two different stress-induced response EDA datasets. The clustering results showed that the proposed features and their distribution identified groups in data with response, noise and relaxation patterns as determined by their centroid values.

The remaining parts of the document are structured as follows: first, the methodology section presents the background on EDA analysis and the proposed EDA normalization and feature extraction framework. Then, the results section displays the parameterization analysis and the framework validation scenario with unsupervised learning. Finally, this paper concludes with a discussion around the normalization, parameterization, features and potential application scenarios for the framework described in this study.

2. Methodology

This section covers the basis and details of the proposed framework. First is the background on the properties of EDA and its variability as well as the current approaches for processing and analysis. After that, the different modules of the EDA normalization and feature extraction are presented. Last of all, we propose the validation scenario for the framework with K-means unsupervised classification and feature selection over stress data.

2.1. Background on EDA Analysis

The most commonly used measure of EDA is skin conductance (SC). It is acquired by applying direct current (DC) with constant voltage to the skin. SC can be decomposed into two components: tonic and phasic. The tonic component includes slow drifts of the baseline skin conductance level (SCL) and spontaneous fluctuations (SFs) in SC. On the other hand, the phasic component corresponds to the rapid variations in the conductance caused by changes in the sweat levels. A high correlation has been observed between the amplitude of the rapid transient events in the EDA signal and both the sweat produced by sympathetic arousal and bursts of sympathetic nerve activity [22].

The skin conductance response (SCR) is the most outstanding property and reflects the short-time reaction to a stimulus, while the non-specific response, known as Ns.SCR, relates to the remaining underlying processes. The typical shape of an SCR comprises a relatively rapid rise from the conductance level followed by a slower, asymptotic exponential decay back to the baseline [23], as shown in Figure 1. The morphology of the responses has been accurately parameterized and described, in relation to the acquisition, in terms of the observed latency, recovery times, amplitude, shape, and area under the curve [2,23]. As a result, the tonic component can be interpreted as the combination of SCL and the Ns.SCRs because they cannot be linked to a specific stimulus but are the result of spontaneous fluctuations in EDA.

Artifacts from motion during acquisition also impact the observed EDA distribution, and in some cases they can be misidentified as sudden and brief responses followed by a change in the observed EDA baseline [2]. The major challenge regarding artifacts is the lack of a predetermined range of amplitude values. Nevertheless, several approaches have been deemed useful for artifact removal, such as low-pass filtering, curve fitting or Wavelets [13].

2.1.1. Statistical Differences in EDA Properties

Specifically, the SCL and Ns.SCRs exhibited higher coefficients of variation [13]. Aging and gender stand out as the two main demographic factors for the variability observed in EDA. On the one hand, a decrease in SCL is observed for older age groups during all main scenarios (resting, active, etc.) along with lower Ns.SCR frequency and SCR amplitude in older males [2]. On the other hand, gender differences focus on higher SCL scores in females, while males present higher reactivity under stimulation [2,24].

The SCL oscillates with sweat, blood flow changes, phase of the day and environmental factors, like temperature or humidity [2,25]. Under high-arousal scenarios, the SCL can undergo a quick upward shift caused by unidentified high-frequency Ns.SCRs. Consequently, the upper boundary of the tonic component cannot be accurately delimited for all scenarios. In some cases, the baseline value can exceed 20 μS. Genuine SCL determination is hindered by poorly predictable response fluctuations, sweat accumulated after prolonged electrode contact, and environmental factors that can alter skin properties, like temperature or humidity.

Meanwhile, the phasic component presents amplitude, latency and recovery parameters that describe a course very different from the shape of artifacts. Typical latency times range between 1 and 2 s but can be prolonged up to 5 s under skin cooling [2,26]. Minimal conductance values can range between 0.1 and 0.5 μS depending on the acquisition ranges, signal-to-noise ratio and reactivity during acquisition [2]. Moreover, the response has shown a dependence on the SCL observed at its initial point. This level serves as reference for a relative amplitude criterion of between 0.1% and 10% of the initial level, and 0.05 μS in overall terms [2]. As these parameters depend on the tonic component, major variations may appear from the previously described conditions.

Another case of concern occurs when the inter-stimulus interval (ISI) is shorter than the recovery time of the first response, and then two SCRs overlap. This occurrence is observed in many experimental paradigms, particularly in cognitive neuroscience where common values of ISI (1–2 s) are generally shorter than the recommended minimum ISI to avoid such an overlap, which is around 10–20 s [25,27]. In general, SC decomposition to phasic and tonic is limited by the presence of overlapping SCRs and artifacts.

2.1.2. Normalization and Analysis

Current EDA applications rely on EDA as a time series or on feature-based approaches. In this respect, artifact removal and data transformation, like normalization, have been shown to improve the overall quality and ease of analysis by increasing the signal-to-noise ratio and achieving less sparse statistical properties and reduced variance.

One valid approach for EDA response extraction consists in estimating and subtracting the SCL component. SCL Score Averaging [2,28] and low-pass signal-filtering techniques have shown acceptable SCL scores [29]. Nevertheless, these approaches tend to overestimate the SCL score under high-arousal scenarios with overlapping Ns.SCR [28] and movement-induced artifacts. When internal or external stimuli are not ascertained, any observed spontaneous fluctuations are regarded as part of tonic EDA, which may impact the statistical distribution of the amplitude and frequency of SC. Both the Ns.SCR frequency and SCL score play an important role for arousal, emotional and stress research [4,30,31].

Range correction procedures adjust values to intraindividual ranges for the SCL and SCR through the determination of lower and upper boundaries for each component [32]. However, the extended periods of time required to acquire the minimum SCL and the maximum SCR [2] reduce its feasibility. Alternatively, z-scoring [2] has been shown to provide a more accurate depiction of the interindividual variance in the response, achieving different normal distributions. Nevertheless, this approach is not reliable for small datasets and uncontrolled acquisition conditions.
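The two transformations discussed above can be illustrated with a minimal Python sketch. The function names, the sample values and the use of NumPy are assumptions made for this example, and the range correction follows the common (x − min)/(max − min) form, which may differ in detail from the procedure in [32].

```python
import numpy as np

def range_correct(values, lower, upper):
    """Map values onto [0, 1] using an individual's own lower/upper bounds."""
    values = np.asarray(values, dtype=float)
    return (values - lower) / (upper - lower)

def z_score(values):
    """Center on the mean and scale by the (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Hypothetical skin conductance samples in μS for one participant.
sc = np.array([2.0, 3.0, 4.0, 6.0])
corrected = range_correct(sc, lower=sc.min(), upper=sc.max())
standardized = z_score(sc)
```

In practice the lower and upper bounds would come from dedicated resting and maximal-response recordings rather than from the analyzed segment itself, which is the feasibility limitation noted above.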

Artifact removal can be easily achieved in most cases using band-pass filters or spike detection algorithms at the expense of losing some of the response properties [2,4,31]. Recent computer-based techniques like cvxEDA [25] are able to accurately extract the tonic and phasic components separately. These approaches do not require high computational resources but are constrained by properties like the sampling frequency or sensitivity of the sensor.

When it comes to analyzing EDA, the time and frequency domains are the historically recurrent options for feature extraction. The time domain focuses on the extraction of the previously cited SCL, SCR and Ns.SCR properties (e.g., heights, times, and area under the curve), as well as the overall signal information. In the case of the frequency domain, Fourier Transform (FT) and Power Spectral Density (PSD) analyses are common approaches [33]. Alternatively, wavelet coefficients have been proposed due to their robustness and merging of time and frequency properties, especially for non-stationary signals like EDA [30,34]. Windowing is applied to split the data for the extraction of statistical and transform features, like the ones described in Table 1. The window length, which is typically in the range of 10 to 60 s, determines the observed feature distribution. Feature value normalization is typically applied on a database scope, as environmental and acquisition condition differences may lead to different values and ranges among the EDA properties. Overall, frequency-domain features demand higher computational costs and requirements compared to time-domain features but display greater inter-user consistency and stability among different types of induced-stress scenarios [13]. This suggests that time-domain features may benefit from better normalization approaches.

2.2. Proposed Normalization and Feature Extraction Framework

This section presents the details of the normalization and feature extraction modules involved in our framework. The signal is first smoothed with a band-pass filter before being transformed using the adaptive normalization of non-stationary heteroscedastic time series. The feature extraction module processes the normalized signal in windows to generate the proposed set of features. The pipeline described in Figure 2 illustrates the connections and outputs from each module.

Along with the Adaptive Normalization (AN) parameters described in the work by Ogasawara et al. [21], we incorporate the interval and feature extraction window parameters. Two stress databases with EDA data were used for parameter tuning across the framework and the analysis of feature distributions.

2.2.1. Databases

The EDA data used to adjust the normalization parameters come from acute stress-inducing experiments recorded with the Empatica E4 wearable wristband. This device possesses a 4 Hz sampling frequency for the EDA sensor with a precision of 0.9 nS and range of sensitivity between 0.01 μS and 100 μS.

First, the Wearable Exam Stress Dataset [35,36,37] provides 60 h of data from 10 different anonymous young students. The experiment was divided into two midterm exams of 90 min and a final test of 180 min long with additional test performance results to address the effects of stress. Differences among students are observed in exam duration and response to the induced conditions. For this reason, we discarded recording sections with inactivity from the user when detecting low or null amplitude over several minutes.

Secondly, a stress induction dataset was acquired at GB2S-UPM [38], with stress induced through the psychological Stroop test, Anagram and Arithmetic tests, preceded by 120 s resting phases and followed by a final recovery phase. A total of 45 healthy individuals (males and females) aged between 25 and 50 with technical and administrative profiles took part. Each participant provided 20 min of EDA recording in addition to test performance metrics. The main observed differences relate to the induced user responses for each test, especially under Stroop.

The acquisition protocol consisted of the sitting-down position and free hand movements for both datasets. As shown in Table 2, there are significant intra and intersubject differences in the variability and ranges of EDA, which is desirable for the purpose of this analysis.

2.2.2. Filtering and Adaptive Normalization

The 4 Hz sampling frequency used in both databases restricts the upper cutoff frequency of any low-pass or band-pass filter to 2 Hz, as defined by the Nyquist theorem. In order to achieve a short and abrupt slope at this cutoff frequency, complex high-order filters are required. Alternatively, a median filter with a kernel size of five samples (1.25 s at 4 Hz) was chosen as the smoothing and high-frequency removal approach due to its lower complexity and cost of implementation. The median filter replaces each sample with the central value of the sorted kernel, achieving noise reduction while preserving some higher-frequency properties.
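A five-sample median filter of this kind can be sketched as follows. This is only an illustration: the function name, the edge-padding strategy and the sample values are assumptions, as the text does not specify how the boundaries of the recording are handled.

```python
import numpy as np

def median_filter(signal, kernel_size=5):
    """Sliding median with edge padding (odd kernel sizes only)."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    # Repeat the edge samples so the output keeps the input length.
    padded = np.pad(np.asarray(signal, dtype=float), half, mode="edge")
    # One row per output sample, each holding its kernel_size neighbours.
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel_size)
    return np.median(windows, axis=1)

# Hypothetical 4 Hz trace (in μS) with a one-sample spike at index 3.
eda = np.array([1.0, 1.1, 1.0, 5.0, 1.2, 1.1, 1.3, 1.2])
smoothed = median_filter(eda, kernel_size=5)
```

The isolated spike is removed because it never occupies the central position of any sorted kernel, whereas a change that persists for three or more samples would survive the filter.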

For the normalization module, we propose a variation of the AN in the works by Ogasawara et al. [21] based on the sliding window technique normalization. The raw recorded EDA time series is split into intervals of size n, then transformed and processed before finally being normalized. The original signal properties are carried in each of the Disjoint Sliding Windows (DSWs) created, which represent various levels of volatility in the data. Two main parameters govern the transformation: the order k of the EMA and the length ω of the DSW. Let S be the interval of data with length n from the original time series of length N, and let S^{(k)} be the EMA of order k (α = 2/(k + 1)). The sequence R is then defined as:

R[i] = \frac{S\left[\lceil i/ω \rceil + ((i - 1) \bmod ω)\right]}{S^{(k)}\left[\lceil i/ω \rceil\right]} \qquad (1)

where 1 ≤ i ≤ (n − ω + 1) · ω. The new sequence R described in (1) is composed of (n − ω + 1) different DSWs. The EMA factor in the denominator is important to preserve the original trend of the time series and to bring the same inertia to all the values in a DSW. The applied EMA performs overall data smoothing, thus reducing the small variations caused by the sensor’s sensitivity while retaining high-frequency changes as shown by Figure 3. As a result, artifacts are preserved in the resulting sequence R but are expected to represent boundary or outlier values inside the interval data.

Our proposal incorporates the additional parameter n for the interval size and normalization. The interval size parameter n ≤ N aims to preserve the localized seasonality in the non-stochastic time series. As previously shown, the time-related parameters of the SCR (rise, recovery, and ISI) are delimited to a well-known range between 1 and 10 s. In general, these parameters aim to achieve regularization and standardization of the resulting data and features.

Processing is composed of outlier removal in the resulting sequence R based on box plots, followed by data normalization, in order to reduce the presence of artifacts in the final data. The original approach in the work by Ogasawara et al. treats values outside of the [Q1 − 1.5·IQR, Q3 + 1.5·IQR] range as outliers, with Q1, Q3 and IQR representing the first quartile, third quartile and interquartile range, respectively. In our case, the proposed range needs to be adjusted after observing the EDA values, artifact properties, device sensitivity and range of precision. Finally, a min-max normalization of sequence R to the range [0, 1] is applied to the resulting distribution.
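The processing step can be sketched as below. The function name and the sample values are hypothetical, the IQR multipliers are exposed as parameters because the text adjusts them later, and dropping the outliers (rather than clipping them to the boundary) is an assumption of this sketch.

```python
import numpy as np

def remove_outliers_and_scale(r, lower_mult=1.5, upper_mult=1.5):
    """Box-plot outlier removal followed by min-max scaling to [0, 1]."""
    r = np.asarray(r, dtype=float)
    q1, q3 = np.percentile(r, [25, 75])
    iqr = q3 - q1
    low, high = q1 - lower_mult * iqr, q3 + upper_mult * iqr
    kept = r[(r >= low) & (r <= high)]   # assumption: outliers are dropped
    span = kept.max() - kept.min()
    if span == 0:
        return np.zeros_like(kept)       # flat segment: nothing to scale
    return (kept - kept.min()) / span

# Hypothetical values of R: five regular samples and one artifact-like value.
r = np.array([0.95, 0.98, 1.0, 1.02, 1.05, 3.0])
scaled = remove_outliers_and_scale(r)
```

Here the value 3.0 lies above the upper fence and is discarded before scaling, so the remaining five values span the full [0, 1] range instead of being compressed by the artifact.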

2.2.3. Feature Extraction

Several conventional EDA-related features are extracted from the normalized signal: first, the signal amplitude from the normalized time series R; second, the local maxima and minima properties, like height and width; and third, the difference in time (distance) and amplitude between consecutive peaks of the same type. The volatility sharpness and frequency properties can be described through the analysis of consecutive minima and maxima separately. These critical points are detected using peak detection algorithms with configurable peak height and width thresholds. Therefore, configuring the peak parameters makes it possible to separate true response changes from small shifts caused by the sensitivity limitations. Figure 4 displays the step-by-step process of the feature extraction module.
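As an illustration of peak detection with height and width thresholds, the sketch below uses `scipy.signal.find_peaks`. The choice of SciPy, the threshold values and the sample segment are assumptions of this example; the text does not name the peak detection algorithm that was used.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 4  # Hz, sampling rate of the recordings described in the text

# Hypothetical normalized segment: two broad peaks and one small blip.
r = np.array([0.0, 0.3, 0.6, 0.9, 0.6, 0.3, 0.1, 0.15,
              0.1, 0.2, 0.5, 0.8, 0.5, 0.2, 0.0])

# Keep only maxima that are high and wide enough (illustrative thresholds).
maxima, props = find_peaks(r, height=0.3, width=2)
# Minima can be located the same way on the inverted signal.
minima, _ = find_peaks(-r)

heights = props["peak_heights"]      # amplitude of each retained maximum
widths_s = props["widths"] / FS      # peak widths converted to seconds
distance_s = np.diff(maxima) / FS    # time between consecutive maxima
height_diff = np.diff(heights)       # amplitude change between them
```

The small blip at index 7 is rejected by the height threshold, leaving the two broad maxima, which are 8 samples (2 s) apart.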

The extraction of shape-related features from the resulting normalized time series aims to enhance the information typically present in amplitude and critical point properties. Tendency features focus on describing rising or descending sections composed of consecutive local maxima or minima. The duration, height, and length properties are extracted from the tendencies like the ones shown in Figure 5.

Max, Min, Median, Std. deviation and IQR are the statistical properties calculated for the features in each of the extraction windows. Additionally, count features aim to provide a complementary and comparative background to the statistical features. The window size for feature extraction is expected to capture multiple DSWs so that the statistics from the features reflect the transformation properties. For example, selecting 5 s for the feature extraction and ω window sizes provides a sufficient time range to capture the effects of volatility through multiple DSWs. The usage of width and height thresholds reduces the number of peaks arising from small fluctuations caused by sensor sensitivity. For this reason, the extraction of the tendency features from Table 3 requires prior peak filtering to reduce noise and improve results.
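The per-window statistics can be sketched as follows. The function name and the input series are hypothetical, and consecutive non-overlapping windows are assumed since the text does not state whether the extraction windows overlap.

```python
import numpy as np

def window_statistics(values, window):
    """Max, min, median, std and IQR over consecutive non-overlapping windows."""
    values = np.asarray(values, dtype=float)
    rows = []
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        q1, q3 = np.percentile(chunk, [25, 75])
        rows.append({
            "max": chunk.max(),
            "min": chunk.min(),
            "median": np.median(chunk),
            "std": chunk.std(),
            "iqr": q3 - q1,
        })
    return rows

# Hypothetical feature series: 40 samples split into two 20-sample windows.
stats = window_statistics(np.arange(40), window=20)
```

A trailing segment shorter than the window is ignored in this sketch; a count feature would simply be the number of retained peaks that fall inside each window.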

2.2.4. Feature Distribution

The statistical distributions from the list of features present in Table 3 are either normal or multimodal, with the latter being the predominant one. A limited set of features, such as the height and first difference of the normalized signal, present a Gaussian distribution. Null values are expected, as peak and tendency features are determined by the presence of peaks after the application of the height and width thresholds. For example, Figure 3 shows multiple intervals with no descending maxima tendencies.

One of the main challenges in feature creation is achieving a low percentage of nulls and a simple filling strategy. Filling values in the case of count features can be trivially set to 0. On the other hand, the choice of filling value for duration features may affect cases with a short length. Moreover, features with a high probability of nulls are expected to present multimodal distributions with one of the modes located at the filling value.

All proposed feature distributions are impacted by the chosen parameter values. For example, the smoothing factor k is expected to help detect more volatility after normalization when set to relatively higher values. Large values of n and ω may result in excessive levels of normalization smoothing. On the other hand, reducing the number of DSWs, (n − ω + 1), by setting ω close to n is expected to cause dissimilarity, especially under high volatility levels.

Ideally, these parameter values need to be set so that the volatility caused by regular and overlapping SCRs as well as artifacts and sudden changes can be easily separated from stable intervals. Reducing the number of peaks to more predominant or isolated cases can help detect more volatility in median and IQR features. Peak detection threshold parameters can be set over the results of normalization to adjust the distributions. Multimodal distributions may indicate the presence of multiple groups in the data. Combining multiple multimodal or bimodal features from a dataset can lead to the discovery of patterns in the data.

2.3. Application to Unsupervised Classification Scenario with Stress Data

The purpose of this scenario is to validate the proposed framework by aiding with parameter value optimization, searching for patterns in data through feature distributions, and identifying those features from Table 3 that are most useful for future analysis and classification applications. The main obstacles are the significant presence of noise, the high dimensionality of the proposed features, and their probable correlation.

Adaptive normalization behaves similarly to a convolution with a window of size ω. In cases where the number of maxima or minima is low or equivalent to the number of DSWs, the peak and tendency features may present a high correlation between them. This is because tendency features originate from peak properties, like height or distance. High correlation can be mitigated using feature selection approaches.

2.3.1. Feature Selection

For this scenario, we propose a feature selection approach based on feature importance and variability, split in two phases. First, identifying those features that present high correlation coefficients with others in the proposed set can help reduce information redundancy. The structure shown in Table 3 implies that the features grouped under the same domain and/or property are expected to share some level of correlation. Additionally, features with a high percentage of null cases are expected to show positive correlation with each other. Correlation coefficients provide an intuitive reference on the level of redundancy while allowing different filtering criteria, for example, selecting the top 10 features whose absolute correlation coefficients fall below a threshold value for at least 90% of the other proposed features.
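One possible reading of the example filtering criterion is sketched below. The function name, the threshold of 0.5, the ranking by mean absolute correlation and the synthetic data are all assumptions of this illustration, not the configuration used in the study.

```python
import numpy as np

def select_low_correlation(features, threshold=0.5, min_share=0.9, top_k=10):
    """Indices of features weakly correlated with most of the other features."""
    corr = np.abs(np.corrcoef(np.asarray(features, dtype=float), rowvar=False))
    np.fill_diagonal(corr, 0.0)              # ignore self-correlation
    n_others = corr.shape[0] - 1
    # Share of the other features whose |r| with this one is under the threshold
    # (the zeroed diagonal entry is subtracted from the count).
    share_below = ((corr < threshold).sum(axis=1) - 1) / n_others
    mean_corr = corr.sum(axis=1) / n_others
    candidates = np.flatnonzero(share_below >= min_share)
    # Prefer the candidates with the lowest average absolute correlation.
    order = candidates[np.argsort(mean_corr[candidates])]
    return order[:top_k].tolist()

# Synthetic example: column 1 is a linear copy of column 0, columns 2-3 are independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
x = np.column_stack([a, 2.0 * a + 1.0, rng.normal(size=500), rng.normal(size=500)])
selected = select_low_correlation(x)
```

With only four features, the 90% requirement means a feature must stay under the threshold against all three others, so the two redundant columns are excluded and the independent ones are retained.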

Secondly, feature variability is measured to locate those features which may have more predictive power. Principal Component Analysis (PCA) is a recurrent technique for obtaining insight through dimensionality reduction [39]. Additional normalization is applied over the proposed set of features in order to achieve scaled variables and linearity for PCA. As displayed in Table 4, different normalization approaches are proposed for each feature type. Height-related features already present normalized values from the proposed AN transformation. For count features, for example, the normalization factor can be set to the ω parameter value or the selected feature extraction window length.
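The scaling of count features and the subsequent PCA can be sketched as follows. The PCA is computed here through a singular value decomposition of the centered feature matrix; the function name, the window length of 20 samples and the feature values are hypothetical.

```python
import numpy as np

def pca(features, n_components):
    """PCA via SVD: returns projected scores and explained variance ratios."""
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    _, singular, components = np.linalg.svd(centered, full_matrices=False)
    explained = singular ** 2 / np.sum(singular ** 2)
    scores = centered @ components[:n_components].T
    return scores, explained[:n_components]

WINDOW = 20  # hypothetical feature extraction window length, in samples
heights = np.array([0.10, 0.40, 0.55, 0.90, 0.30])  # already within [0, 1]
peak_counts = np.array([1, 3, 4, 8, 2])             # raw counts per window
# Dividing counts by the window length brings them onto a comparable scale.
scaled = np.column_stack([heights, peak_counts / WINDOW])
scores, explained = pca(scaled, n_components=2)
```

The explained variance ratios indicate how much of the variability each component captures, which is the quantity used to judge which features may carry more predictive power.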

2.3.2. Interval Clustering

The previously described EDA feature selection approach aims to strengthen the framework for classification scenarios. Emotion recognition, arousal and stress classification with EDA have shown poor performance with probabilistic model classifiers like Naïve Bayes [15] or Trees [40] because of low feature independence. PCA has served as a reliable approach for achieving independent variables and further cluster analysis [4].

Dimensionality reduction methods like PCA have been shown to improve the performance of SVM classifiers and K-means clustering [17,34] but could also be applied to different NNs. Scenarios of application vary from emotion recognition to sleep analysis studies, where different response properties are expected to be observed over time. Wavelet coefficients, AUC and deviation of each EDA component have shown the best results among the features shown in Table 1.

Unsupervised classification is a recurrent technique for decision support and data understanding. The K-means algorithm serves as a reliable clustering approach for the partition of data spaces. For example, it has been applied to cluster EDA responses and users in groups [4]. Similarly, Self-Organizing Maps (SOMs) have been used for EDA feature selection over arousal state discrimination with good overall performance results [41].

Due to its more intuitive approach for analysis, K-means was chosen as a baseline clustering method to validate the proposed framework. The optimal number of clusters for K-means may contribute to determining the actual volatility groups present in the data and the responsible key features from the obtained cluster centroids.
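For reference, a plain Lloyd-style K-means can be sketched in a few lines. The function name, the deterministic farthest-first initialization and the sample points are assumptions of this illustration and do not reproduce the implementation or settings used in the study.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-first initialization."""
    x = np.asarray(points, dtype=float)
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [x[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[np.argmax(dist)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Update step: mean of the points in each cluster (empty clusters keep
        # their previous centroid).
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical 2-D scores (e.g., two principal components) forming three groups.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
                   [0.0, 1.0], [0.1, 1.0], [0.0, 1.1]])
labels, centroids = kmeans(points, k=3)
```

Inspecting `centroids` feature by feature is what allows each cluster to be interpreted, for instance as a noisy, relaxed or response-like group.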

3. Results

The analysis provides insights into the different parts of the proposed framework: parameterization of the normalization, feature analysis and clustering results. Parameter and feature selection was performed using the main phases of activity from all users in the datasets and validated with 1000 randomly chosen intervals previously excluded from the selection stage.

3.1. Framework Parameters

The focus of parameter optimization is the preservation of volatility and the reduction in information present in the original signal. These can be analyzed through the morphology of the transformed signal and the proposed feature distributions. First, the EMA regularization parameter k shows lower influence in the preservation of volatility properties than ω despite being the main parameter controlling the denominator in (1). Higher values of k are especially relevant during volatile sections of data, causing greater variability in the DSWs. This also has some implications for the extracted statistical features, like std. deviation and IQR.

The DSW size parameter ω is shown to influence the number of peaks and the consistency of the distribution of values in R. Since volatility in EDA is caused by regular and overlapping SCRs and artifacts, values of ω between 2 and 10 s can fit this purpose. Nonetheless, higher values are more likely to ignore volatility. On the other hand, critical points can be found where two different DSWs meet. Lower values of ω increase the number of DSWs and the relative volatility observed in the samples. Ideally, we aim to preserve some level of variability in the volatility levels of each extraction window and, as a result, the AN parameters were set to n = 20, ω = 5 and k = 2 s (Figure 6).
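As a rough illustration of how ω and k interact, the sketch below applies an EMA-based transformation over consecutive DSWs. It is not the authors' implementation: Equation (1) is not reproduced in this section, so the sketch assumes the standard EMA recursion with smoothing factor 2/(k + 1), assumes that every sample of a DSW is divided by the EMA value at the first sample of that window, assumes a strictly positive signal, and assumes a 4 Hz sampling rate to convert ω from seconds to samples. The function names `ema` and `transform_dsw` are hypothetical.

```python
def ema(signal, k):
    """Standard exponential moving average with smoothing 2 / (k + 1)."""
    alpha = 2.0 / (k + 1)
    out = [signal[0]]  # seed with the first sample
    for x in signal[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def transform_dsw(signal, omega_s, k, fs=4):
    """Divide each window of omega_s seconds by an EMA-based level.

    Assumption: the divisor for a window is the EMA value at the
    window's first sample, so it stays constant inside the window.
    The signal is assumed to be strictly positive.
    """
    width = int(omega_s * fs)  # window size in samples
    level = ema(signal, k)
    ratios = []
    for start in range(0, len(signal), width):
        divisor = level[start]
        ratios.extend(x / divisor for x in signal[start:start + width])
    return ratios


# A flat signal maps to 1.0 everywhere; a step in the signal appears
# as a jump at the point where two windows meet.
flat = transform_dsw([2.0] * 40, omega_s=5, k=2)
step = transform_dsw([1.0] * 20 + [2.0] * 20, omega_s=5, k=2)
```

Under these assumptions, a constant divisor per window explains why discontinuities can appear at window boundaries, and a larger k makes the divisor react more slowly to sudden changes in the signal.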

Under transformation with ω = 5, the resulting distribution for the test data presented a tightly clustered shape with Q1 = 0.96 and Q3 = 1.09 over the test samples. Comparing the IQR of 0.13 with the minimum (0.81) and maximum (1.54) shows that the proposed upper outlier boundary of Q3 + 1.5IQR = 1.29 corresponds to the 97th percentile. In light of these results, the outlier boundaries need to be redefined for our scenario. Distribution analysis determined that the range [Q1 − 0.25IQR, Q3 + 0.75IQR] preserved the 5th to 95th percentiles of the values. This new boundary applies a more statistically reasonable outlier removal over the transformed data.
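The boundary arithmetic above can be reproduced with a short sketch. The quartiles are the values reported in the text; the helper names `iqr_bounds` and `remove_outliers` are hypothetical, and dropping (rather than clipping) out-of-range values is an assumption made for illustration.

```python
def iqr_bounds(q1, q3, lower_factor, upper_factor):
    """Return (lower, upper) outlier boundaries around the quartiles."""
    iqr = q3 - q1
    return q1 - lower_factor * iqr, q3 + upper_factor * iqr


def remove_outliers(values, lower, upper):
    """Keep only the values inside the closed range [lower, upper].

    Assumption: outliers are dropped rather than clipped.
    """
    return [v for v in values if lower <= v <= upper]


q1, q3 = 0.96, 1.09  # quartiles reported for the test data

tukey = iqr_bounds(q1, q3, 1.5, 1.5)      # conventional 1.5 IQR fences
adapted = iqr_bounds(q1, q3, 0.25, 0.75)  # range selected in this study

# Illustrative values of R, including the reported min and max.
kept = remove_outliers([0.81, 0.95, 1.00, 1.10, 1.54], *adapted)
```

With the reported quartiles, the conventional fences evaluate to 0.765 and 1.285, while the adapted range evaluates to 0.9275 and 1.1875, so the reported minimum and maximum both fall outside the adapted range.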

The distortion in the actual distribution caused by thresholding and excessive parameterization was analyzed for the proposed features. The number of samples after the normalization module is increased as a result of the transformation, but the width of peaks is preserved due to the creation of the different DSWs. In detail, inspection of the local maxima and minima width during the main recording sections of both databases showed a distribution with a Q1 of 4.2 and IQR of 7, with lower probability in the range of [1, 4] for the maxima. Therefore, peaks under five samples of width were assumed to be small sensitivity changes. As shown in Figure 7, increasing the DSW size and the peak thresholds reduces the number of peaks and narrows the distribution range. In the case of tendency-related features, a bimodal distribution appears after applying higher thresholds. Other peak properties, such as peak and interval heights, appear to center around one specific value as the threshold value increases. As a result, a minimum peak width of four samples was selected in order to preserve as much of the bimodal and multimodal feature distributions as before thresholding while discarding small and localized volatility cases for the feature extraction process.

3.2. Feature Selection

The proposed feature selection approach consisted of minimizing feature correlation while ensuring variability among the final set. Additional feature engineering was applied after the feature normalization to improve the PCA performance, because the count features had upper values far from the desirable [0, 1] range. The tendency count feature distributions were transformed from continuous to discrete values to reflect the presence (1) or absence (0) of these properties. Additionally, the PCA results are expected to improve, as the previous distribution’s upper boundary n_cases/n_samples_window is far from the desirable standardized and normalized value. The information on the number of cases is expected to be captured by the properties in other features. Nevertheless, high correlation values between the engineered tendency presence features and the remaining ones are to be expected.
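A minimal sketch of the two representations of a tendency count feature is given below. The function names and the example counts are hypothetical and only illustrate the transformation from a window-normalized count to a binary presence value.

```python
def normalized_count(n_cases, n_samples_window):
    """Count feature scaled by the number of samples in the window."""
    return n_cases / n_samples_window


def presence(n_cases):
    """Binary presence feature: 1 if any section exists, else 0."""
    return 1 if n_cases > 0 else 0


# Number of rising sections found in four 20-sample windows
# (illustrative values, not taken from the datasets).
counts = [0, 1, 3, 0]
scaled = [normalized_count(c, 20) for c in counts]
flags = [presence(c) for c in counts]
```

In this toy case the scaled counts never exceed 0.15, which shows why they sit far from the upper end of the [0, 1] range, whereas the presence values use the full range at the cost of discarding the number of cases.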

A final set of around 8 to 15 features was considered a suitable dimensionality. The optimal correlation index limit was analyzed through iteration using a 0.03 step increase. Specifically, nine features presented an absolute correlation coefficient below 0.7 with all the remaining ones. Figure 8 displays the mutual correlation levels between them. Height and duration features still present values close to the selected correlation boundary. Additionally, the redefined tendency presence features showed high correlation with multiple other features, but their usage is essential to differentiate low volatility.
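Correlation-based filtering of this kind can be sketched as follows. The greedy strategy (a feature is kept only if its absolute Pearson correlation with every feature already kept is below the limit) is an assumption for illustration and may differ from the procedure used in the study; `pearson` and `select_features` are hypothetical names, and constant features are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(features, limit=0.7):
    """Greedy filter: keep a feature only if |r| < limit with every
    feature kept so far. `features` maps name -> list of values."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < limit for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" duplicates "a" up to scale, "c" is uncorrelated with "a".
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, -1.0, -1.0, 1.0],
}
selected = select_features(toy)
```

Repeating the call with `limit` raised in steps of 0.03 would show how the size of the selected set changes with the correlation limit.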

The selected features related to the count and first difference properties predominantly showed bimodal distributions. Furthermore, filling values were typically the most probable value in the multimodal feature distributions. PCA explained around 95% of the variance with six components over the correlation-based feature selection, as shown in Table 5. For reference, this is a 20% increase in captured variability compared to two components. The previously cited tendency presence distribution changes may be responsible for the increase in the number of components required.
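The component count can be checked directly from the values in Table 5. The sketch below assumes those values are cumulative explained variance ratios; `components_for` is a hypothetical helper.

```python
def components_for(cumulative, target):
    """Smallest number of components whose cumulative explained
    variance reaches `target`; None if the target is never reached."""
    for n, value in enumerate(cumulative, start=1):
        if value >= target:
            return n
    return None


# Cumulative explained variance for 1..6 components (Table 5).
explained = [0.70, 0.75, 0.85, 0.90, 0.92, 0.95]

n_components = components_for(explained, 0.95)
gain_over_two = explained[5] - explained[1]
```

Here `n_components` evaluates to 6 and `gain_over_two` to approximately 0.20, matching the figures quoted above.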

3.3. Clustering Analysis

Silhouette coefficients and elbow plots were used to determine the optimal number of clusters for the K-means algorithm. In this scenario, ideal clustering should guarantee highly differentiated signal shapes and uneven populations, as the response inference phases are expected to be longer and continue through some resting states. Additionally, the decision on the selected number of clusters was supported by visual observations of the time-series labelling. The maximum number of clusters for this scenario was set to 10. As shown in the silhouette coefficient graph in Figure 9, two points of emphasis at K = 4 and K = 6 clusters were found. The elbow plot was used to determine that the optimal number of clusters was four. Nevertheless, we analyzed the impact for these two cases.
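For reference, the mean silhouette coefficient used to compare candidate values of K can be computed as in the sketch below. It uses Euclidean distance and the usual definition s = (b − a) / max(a, b); the clustering step itself is not included, and the points and labels are toy values rather than data from the study.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance.

    Assumes at least two clusters; a point alone in its cluster
    scores 0, following the usual convention.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # compact, separated
bad = mean_silhouette(points, [0, 1, 0, 1])   # clusters interleaved
```

Well-separated assignments give values close to 1, while interleaved assignments give negative values, which is the behavior used to rank the candidate numbers of clusters.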

Using K = 6 clusters provided mixed SCR response clusters due to the Gaussian ball behavior of K-means and its aim for distanced centroids. On the contrary, using four clusters resulted in two of them (clusters 0 and 1) presenting opposite SCR responses, while the remaining clusters grouped cases of noise and irregular behavior. Section presence and number of minima proved to be the leading factors in separating the clusters for K = 4. Nevertheless, both cases showed clusters with expected response outliers due to the issues regarding the decision boundaries using K-means clustering. Moreover, visual validation over the original signal revealed that the usage of K = 4 clusters achieved reliable identification of no-response intervals or SCR recovery-time windows while grouping noisy or non-specific responses together. Table 6 summarizes the cluster assignments in terms of the original signal morphology and the percentage of samples in a test set of 1000 intervals.

As shown in Figure 10, an average silhouette coefficient of 0.59 was achieved, with cluster 2 being the only case with negative difference values. Most outliers were assigned to cluster 2. When applied to the original signal intervals, clusters 0 and 1 presented a clear definition of cases, with cluster 0 capturing some of the recovery time intervals. Cluster 2 can be defined as regions with soft response and stable tonic levels. Finally, cluster 3 indicates the presence of volatility with no recovery.

Figure 11 shows the clustering method applied to each 5 s feature extraction window using 20 s interval normalization throughout 120 s of recording during the midterm exam. Clear descending sections are captured by cluster 0 (red), and SCR-shaped sections are assigned to cluster 1, which is the most predominant. The remaining two clusters correspond to high- and low-volatility cases. Table 6 expands on the cluster centroid information and description.

4. Discussion

This paper introduced a novel normalization and feature extraction framework for the analysis of EDA signals. The aim of this study was to analyze the feasibility and impact of using the exponential moving average (EMA) as the main standardization component prior to data normalization. Two different stress-related datasets with non-equivalent acquisition protocols and frameworks and considerable EDA variability among subjects were used for that purpose. In addition, we proposed a new set of morphology EDA features based on peak tendencies over the normalized signal. The proposed framework is neither complex nor computationally costly, which eases its deployment across different environments.

EMA proved able to smooth the signal and preserve sudden trend changes. The observed signal volatility can be interpreted as response changes or artifacts. The main regularization parameter ω showed the most impact over the transformed signal in terms of trend and volatility preservation. Small values in the range of 2–5 s were able to preserve these traits adequately enough for the study. Overall, the parameterization applied can be susceptible to scenario changes, such as different acquisition sampling frequencies or sensitivity. However, the ω parameter value is set to deal with the expected shape of EDA under induced responses. Downsampling to 5–10 Hz may be required for sampling frequencies higher than the one used in this study. The other parameterization before the final data normalization corresponds to the boundaries set for outlier data. The selected [Q1 − 0.25IQR, Q3 + 0.75IQR] range should be reviewed and adapted to the observed signal and study conditions accordingly. For the proposed scenario, not all artifacts fell outside the boundaries. Therefore, outlier boundaries validation must be addressed prior to data normalization with the proposed framework.

Regarding the proposed features from the normalized signal, the observed bimodal and multimodal distributions are caused by the previously chosen parameter values. The applied peak threshold is key for the tendency features, as small and short peaks may introduce noise to these values. Each of the modal values in the distributions implies the existence of groups in the data. In parallel to the high dimensionality and correlation levels, more than 30% of the proposed tendency features presented null values. Excessive threshold values may be responsible, as well as low information originally present in the feature. The distributions observed for the peak distances, first difference values and tendency properties showed reliability for their usage in response change detection.
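To make the tendency features concrete, the sketch below groups consecutive peaks of increasing height into rising sections and reports the section height, duration and length described for Figure 5 and Table 3. It is an illustrative reading of those descriptions rather than the pseudocode of Figure 4; the function name, the requirement of at least two peaks per section and the example values are assumptions.

```python
def rising_sections(times, heights):
    """Group consecutive peaks with increasing height into sections.

    Returns one dict per section with its height (absolute difference
    between first and last peak), duration and length (peak count).
    Assumption: a section needs at least two peaks.
    """
    sections = []
    start = 0
    for i in range(1, len(heights) + 1):
        run_ends = i == len(heights) or heights[i] <= heights[i - 1]
        if run_ends:
            if i - start >= 2:
                sections.append({
                    "height": abs(heights[i - 1] - heights[start]),
                    "duration": times[i - 1] - times[start],
                    "length": i - start,
                })
            start = i
    return sections


# Local minima as time (s) and normalized height; illustrative values.
times = [0.5, 1.0, 1.75, 2.5, 3.0]
heights = [0.2, 0.4, 0.7, 0.3, 0.5]
found = rising_sections(times, heights)
```

Falling sections can be obtained in the same way by negating the heights before the call. A window in which `found` is empty corresponds to the null tendency values mentioned above.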

The validation scenario using PCA for feature selection and K-means clustering over stress data illustrated some key aspects. First, the approach was able to locate three main clusters of intervals in the data: responses, relaxation and noisy/undetermined. These clusters present very distant volatility levels between one another. Secondly, the selected or most prevalent features came from tendency and peak properties in the normalized signal, where filling values clearly specified absence, such as interval duration or height differences. As evidenced by the K-means centroid feature values, tendency pattern distributions were identified for the intervals with response and relaxation. The remaining clusters presented higher numbers and variability in their peak properties. Therefore, tendency features could prove useful, and the shape of the normalized signal may play a decisive part in future classification scenarios. Peak width thresholds contributed to data grouping by establishing the boundaries for each state or cluster in exchange for loss of the actual height, width and length information present in features.

Future work can explore different classification approaches over the transformed and normalized EDA signal and the selected features from our validation scenario. As cited in this study, the usage of neural networks can expand the areas of application of the proposed normalization approach. On the one hand, self-organizing maps (SOMs) may contribute to the identification of more volatility-related groups and a better understanding of feature association to them. On the other hand, CNNs and deep neural networks (DNNs) over the normalized signal R can be applied to supervised classification scenarios, thus addressing the previously cited issues of high dimensionality and mutually correlated and null features.

Author Contributions

Conceptualization, M.V.-M. and C.S.-Á.; Methodology, M.V.-M.; Validation, M.V.-M.; Formal analysis, M.V.-M.; Resources, C.S.-Á.; Data curation, M.V.-M.; Writing—review & editing, M.V.-M.; Supervision, C.S.-Á. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Original and median filtered EDA acquired using a wearable sensor placed on the wrist (Empatica E4). The evolution of the EDA signal with time shows the baseline and response during cognitive activity. The highlighted area around 35 s shows the amplitude, rise and recovery times from an observed Ns.SCR.

Figure 2. Pipeline of the proposed method for the normalization and extraction of EDA features.

Figure 3. On the left, original signal after median filtering. On the right, the transformed signal after min-max normalization (n = 20, ω = 5 s) shows abrupt rising sections in the DSWs within 0 and 2 s. After the minima at 2 s, the DSWs present less variability due to the increase in the original signal.

Figure 4. Pseudocode for the proposed feature extraction. First, the local maxima and minima (peaks) are detected. After that, the location and height of these peaks are used to determine the variables (height, duration and length) of the rising and falling sections for each type of peak.

Figure 5. Example of tendency features. Section height and duration values can be interpreted as the absolute value of the difference between the first and last peak that compose the section. Highlighted in red are two rise sections composed by consecutive rising local minima. The one to the left has 0.4 height, two peaks and 0.15 s of duration. The second rising section highlighted is shorter in height but longer. The two purple highlighted falling sections from left to right are composed by the local minima and maxima, respectively.

Figure 6. Comparison between the transformed and normalized signal using different values of the ω parameter with same n value.

Figure 7. Effect of the peak threshold and ω parameters over different proposed features.

Figure 8. Correlation heatmap of the selected 9 features that presented less than 0.7 absolute correlation with all of the features in the proposed list.

Figure 9. Silhouette coefficients and elbow graph for the K-means clustering over the PCA data with 6 components.

Figure 10. Silhouette plots for the 4 identified clusters in the test data.

Figure 11. Cluster assignment to the 5-second feature extraction windows over 120 s of the raw EDA signal after the first 20 min of an exam. Predominance from cluster 0 and 1 can be observed in this case.

Table 1. List of typically used EDA features in analysis from the time and frequency domains.

Domain	Component	Variable	Description	Unit
Time	SCR	Peak Amplitude	SCR peak amplitude	μS
		AUC	Area under the SCRs	-
		Amplitude	Signal amplitude in SCRs	-
		Rise Time	SCR rise times	s
		Recovery Time	SCR recovery times	s
		Number of Peaks	Number of SCRs detected in the window	-
	SCL	Amplitude	Signal amplitude of the tonic component	μS
		1st and 2nd derivative	Slope and magnitude of the tonic component	-
		AUC	Area under the tonic component	-
	Ns.SCR	Peak Amplitude	Ns.SCR peak amplitudes	μS
		AUC	Area under the Ns.SCRs	-
		Number of Peaks	Number of Ns.SCRs detected in the window	-
		Peak distance	Distance between consecutive Ns.SCRs	-
Frequency	FT	FFT	Frequency Band coefficients through Fast Fourier Transform	-
	FT	STFT	Frequency Band coefficients through Short Time Fourier Transform	-
	PSD	Signal Energy	Energy signal	-
		Spectral Power	Spectral power in the [0.05–0.5 Hz] bands	-
		Ns.SCR	Frequency spectrum associated to the Ns.SCRs	-
Wavelets	CWT	Amplitude	Amplitude of the Wavelet Transform	-
	CWT	Morlet coeffs	Amplitude of Morlet WT coefficients between 0.5 and 50 Hz	-
	DWT	Haar WT	Coefficients at 4, 2 and 1 Hz	-

Table 2. Dataset participants, durations and EDA recorded variation for each user (intraindividual) and between users (interindividual). Shown EDA values and ranges only with user activity detected.

Database	Property	Max	Min	Median
GB2S-45 Users-4 Hz	EDA (μS)	20	0.7	1.5
	Recording (min)	23	18	21
	Intraindividual EDA variation	20.77	0.20	1.0
	Interindividual EDA variation	24.07	0.06	12.19
Exams-10 Users-4 Hz	EDA (μS)	7.14	0.00	0.01
	Recording (min)	23	18	21
	Intraindividual EDA variation	7.40	2.10	3.86
	Interindividual EDA variation	6.48	0.00	0.12

Table 3. List of proposed features extracted from the transformed and normalized signal R. Tendency features are used for rising and falling sections separately. Min, max, median, std. deviation and IQR extracted from all feature types except the count cases.

Domain	Property	Feature	Details
Height	R normalized	Height	Normalized height
	R normalized	Difference	First difference of consecutive height values
	Local Maxima	Height	Height of the local maxima
		Count	Number of local maxima in the interval
		Difference	Height difference between consecutive local maxima
	Local Minima	Height	Height of the local minima
		Count	Number of local minima in the interval
		Difference	Height difference between consecutive local minima
Time	Local Maxima	Position	Position of the local maxima in interval
	Local Maxima	Difference	Time difference between consecutive local maxima
	Local Minima	Position	Position of the local minima in interval
	Local Minima	Difference	Time difference between consecutive local minima
Tendency	Local Maxima	Height	Section height from start to finish.
		Duration	Section length in time.
		Length	Section length in number of section elements.
		Count	Number of sections in the interval
	Local Minima	Height	Section height from start to finish.
		Duration	Section length in time.
		Length	Section length in number of section elements.
		Count	Number of sections in the interval

Table 4. Windowed Normalization approaches for the different types of proposed features after AN transformation.

Feature Type	Window Normalization	Example
Height	No additional normalization required	Maxima Height Median
Length	Min-Max normalization over the windowed values	Minima Fall interval duration
Number	Normalized using the total number of samples in the window	Maxima count

Table 5. Explained variance of the applied PCA by number of components used over the dataset with the 9 selected features.

No. Components	1	2	3	4	5	6
Explained Variance	0.7	0.75	0.85	0.90	0.92	0.95

Table 6. Cluster descriptions for the proposed K = 4 cluster extracted with K-means over the data with PCA and 6 components.

Cluster	Centroid	Description	% Samples
0	No Fall sections, only Rising sections	Intervals where the original EDA signal tends to decrease monotonically; no response	20.7
1	Fall and rise sections in window	SCR with rise and recovery times (no overlap)	52.7
2	No minima, major height diff. features	Stable sections (low volatility) with occasional volatility cases that are not necessarily SCRs	17.0
3	Fall section, major fall height	Volatility detected but no recovery phase (not finished SCR-Overlapping)	10.6

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

