1 Introduction
The utilization of simulated signals allows for training, evaluating, or comparing different digital signal processing techniques or pattern recognition algorithms, providing researchers with an unlimited number of test signals for experimentation [1]. While monitoring brain activity through electroencephalographic recordings is widely practiced, assessing the methods for analyzing these signals poses a challenge due to the absence of a reliable gold standard for comparison.
However, to assess various algorithms proposed for signal analysis and pattern detection, researchers often resort to using simulated signals instead of real signals, which typically conform to oversimplified models that do not accurately represent reality. In [2], a system is introduced for generating simulations of evoked potential recordings that exhibit a high level of realism.
This simulation takes into consideration potential variations in latency, width, and amplitude, which are common in real-world scenarios. Event-related potentials in actual contexts can be contaminated by additive and multiplicative noise, as well as affected by recording instrument effects such as analog filtering, sampling, and quantization. Unfortunately, these aspects are often overlooked in most current evaluations.
All these parameters were estimated from real signals described in [3]. Using realistic simulations of evoked potentials and their associated noise and interference, different methods of robust estimation of the evoked response waveform will be evaluated.
2 Methods
2.1 Selection of the Parameters for Realistic Simulation
The simulation scheme (Fig. 1) used was previously proposed in [3] to simulate event-related potentials in a wide sense. The selection of the parameters for the realistic simulations of evoked potentials was carried out in [2].
In block 1 of Fig. 1,
In this specific case, the clean recordings of Auditory Evoked Potentials from healthy individuals, obtained from the database published in [4] and described in [3], are chosen as the basic waveforms for simulation. These signals specifically correspond to auditory brainstem responses (ABR) and are characterized by their short-latency potentials.
The parameter
Similarly, the parameter
The selection of the event period
These parameter values can be adjusted to accommodate the specific choice of the
initial basic epoch and the potential being simulated. Regarding the additive
noise component, denoted as
In this case, it consists of the estimated background noise, the 60 Hz interference, and its harmonics, as well as the alpha rhythm commonly present in many signals from the selected database. To process this noise, a low-pass filter is applied using the coefficients estimated by an 8th-order Burg model. Subsequently, a high-pass filter is employed, following the specifications detailed in Table 1, as outlined in the approach presented in [2].
Filter's design characteristics | Value |
fc in the passband | 30 Hz |
fc in the stopband | 15 Hz |
Attenuation in the passband | 1 dB |
Attenuation in the stopband | -6 dB |
Order | 2 |
To simulate the alpha rhythm, which appears in certain signals and must be considered when analyzing signal non-homogeneities that can impact the estimation of the average signal, a white noise signal is employed. This white noise signal is subjected to bandpass filtering using a second-order Butterworth approximation with cutoff frequencies ranging from 9 to 11 Hz, as detailed in [6].
The amplitude of the alpha rhythm is randomly distributed between 30 µV and 50 µV, following a normal distribution. This distribution aligns with the analysis carried out on the dataset, ensuring consistency with the characteristics of the actual signals. The parameters associated with filtering, sampling, and quantization are derived from the description provided in the database acquisition documentation.
2.2 Average Methods
The coherent average also referred to as the arithmetic mean (M_Mean as denoted in this study), can be computed using the ensemble matrix formed by the simulated 𝑀 evoked responses [7].
In this context, the response
The estimation of the deterministic component of the signal, denoted as
The application of signal averaging assumes that the underlying noise is stationary and follows a normal distribution with a mean of zero. Additionally, the noise variance should be consistent and equal across all potentials.
However, this condition is not always met, which can impact the effectiveness of the coherent average. To address this limitation, various methods have been proposed in the literature, including weighted averaging and robust averaging [1]. In the case of estimating the deterministic component of the signal, a weighted average approach is employed [8], as described by formula 3:
In the weighted average (Weighted_Mean) approach, each evoked response is assigned a weight based on specific criteria. One commonly used criterion is to assign weights based on the variance of the estimated noise in each response.
In this method, a smaller weight is assigned to potentials with higher levels of noise [5, 9, 11, 12]. Equation (4) represents the formulation corresponding to these weight assignment criteria:
In formula 4,
However, these techniques have limitations when occasional artifacts with large amplitude values (outliers) are present. In the literature, a family of estimators known as trimmed mean has been proposed to mitigate contamination by outliers [1, 5, 8]. The trimmed estimators are based on the median, which serves as the Maximum Likelihood (ML) estimator of s when the noise is assumed to follow a Laplacian distribution [1].
To compute the median, the samples in the ensemble matrix are ordered based on their amplitudes independently for each time point relative to the stimulus, regardless of other time points. This independence allows the median averaging to be unaffected by the nonstationarity of the noise within an epoch.
In the trimmed mean methods, the coherent average is combined with the median to obtain the final response. The ensemble matrix is ordered, and a portion of extreme values is discarded or modified, while all other values are used for averaging similarly to conventional mean averaging. It is important to note that the rejection of extreme values differs from the concept of artifact rejection.
The α-trimmed mean (Trimmed_Mean) is one of the most popular trimmed estimators [9]. Equation (5) represents the estimation of the deterministic signal using the α-trimmed mean:
If we compare formula 5 with formula 3, it becomes apparent that the weights can be assigned using the following equation:
where
2.3 Quality Measures
Given that the acquired signal (p) can be modelled as the desired signal (s) plus additive noise (r), as shown in formula 1, by applying different techniques to estimate the desired signal, attenuating the different existing interferences (Fig. 2). The output (y), after applying these techniques to estimate the desired signal (ŝ), can be seen as the combination of the desired signal (s) plus a remaining noise (θ).
The quality of the estimation of a signal segment can be expressed in terms of several parameters. Some of them are the signal-to-noise ratio (SNR) and the bias, which are expressed in equations 7 and 8, respectively [10]:
In each of the above equations, N represents the total number of samples of the segment to be evaluated, θ is the remaining noise in the signal (signal obtained after attenuating the noise minus the ideal signal). The subscript j refers to the jth sample of the affected parameter and s is pure ideal signal. It is important to reach a compromise between bias and SNR (Equations 7 and 8).
Since bias is the factor that indicates the distortion that is introduced by using a given noise and interference attenuation method, it is necessary to achieve high SNR values but low bias values. Unfortunately, in real situations, the pure ideal signal is not available a priori, so it is impossible to use these measures. But in a controlled environment, such as when using simulated signals, these measures can be used.
2.4 Experiment Description
To evaluate the averaging methods described using simulated signals, 100 data sets of 2,000 epochs each were obtained, without adding artifacts, and then the same 100 data sets of 2,000 epochs, where the 10%, 20%, and 30 % of the total samples of the array were randomly contaminated with outliers.
It was decided to add this level of artifacts based on other experiments found in the literature [6]. From each data set, 512 epochs were randomly selected, 100 times, thus forming a Monte Carlo experiment of 100 runs.
Evoked responses were then estimated using the classic Ensemble Average (M Mean), the Trimmed Average (Trimmed Mean), with a 20% trim factor, and the Weighted Average (Weighted Mean). The signal-to-noise ratio and bias values were calculated on the estimated signals to compare the estimation methods.
3 Results and Discussion
3.1 Simulated Noised
To generate the simulated noise signals, we first generated Gaussian white noise, which was subsequently subjected to low-pass filtering using coefficients estimated by the model. Next, the noise was high-pass filtered using a filter with the specifications outlined in Table 1.
Additionally, 60 Hz powerline interference was added to the noise signal. In Fig. 3a, we can observe a comparison of the spectra between one of the background noise signals simulated using this approach and the actual reference signal.
Figure 3b illustrates the comparison of the signals in the time domain. It is worth noting that the analysis was conducted on one-second segments of the signal to ensure stationary conditions. The simulated signal demonstrates the variations that have occurred in the signal's variance over time.
3.2 Simulated EP (Evoked Potential) Records
Figure 4a presents a comparison in the
frequency domain between a simulated signal generated according to the
specifications described earlier. In this case, the initial epoch, denoted as
The tests yielded a consistent NRMSE adjustment of 92.5%. Randomly selected initial epochs were used, and simulated noise, interferences, and artifacts were added, resulting in a signal-to-noise ratio of -26.71 dB. The top part of Figure 5 visually demonstrates the effects of relative displacements that can occur in real scenarios.
This simulation example includes noise, interference, and artifacts. The lower part of Figure 5 shows the ensemble matrix, which combines evoked responses with noise and interference. Due to the high level of noise and interference, with an initial signal-to-noise ratio of -26.71 dB, it is not possible to discern any waveform associated with the desired signal.
It should be noted that the achieved level of realism in this simulation is significantly higher than in previous simulations, making direct comparisons with previous simulations inappropriate.
The codes used in these simulations are available in the GitHub repository for benchmarkingpurposesfn.
3.3 Estimation of Event-Related Potentials Using Realistic Simulation
Table 2 displays the average Signal-to-Noise Ratio (SNR) values and their corresponding standard deviations obtained from the experiment described in section 2.4.
Average Methods | 0% artifacts | 10% artifacts | 20% artifacts | 30% artifacts |
Initial SNR (dB) | -26.04 ± 1.17 | -30.02 ± 1.06 | -32.54 ± 0.93 | -34.15 ± 1.01 |
Mean | -0.20 ± 1.09 | -5.90 ± 0.35 | -5.68 ± 0.78 | -7.92 ± 0.23 |
Weighted Mean | 1.96 ± 0.29 | 0.82 ± 0.04 | -0.25 ± 0.20 | -1.30 ± 0.04 |
Trimmed Mean | -0.66 ± 0.77 | -3.88 ± 0.53 | -0.81 ± 0.35 | -1.83 ± 0.33 |
Average Methods | 0% artifacts | 10% artifacts | 20% artifacts | 30% artifacts |
Mean | 0.16 ± 0.07 | 0.24 ± 0.03 | 0.28 ± 0.02 | 0.31 ± 0.02 |
Weighted Mean | 0.09 ± 0.01 | 0.10 ± 0.01 | 0.23 ± 0.01 | 0.19- ± 0.01 |
Trimmed Mean | 0.17 ± 0.01 | 0.21 ± 0.01 | 0.20 ± 0.01 | 0.15 ± 0.01 |
The results indicate that the Weighted Mean method consistently yielded the highest SNR values across all cases. This suggests that the Weighted Mean method performed better in terms of minimizing the impact of noise and maximizing the clarity of the desired signal compared to the other methods evaluated in the experiment.
Based on the results obtained, a Friedman test was performed to analyze whether there were significant differences in at least one of the averaging methods used for estimation. A value of p < 0.05 was obtained, so at least two combinations have significant differences concerning their means.
A post-hoc was performed using the Bonferroni test with an alpha of 0.05 to determine which combinations have differences. Figure 6 shows the multicomparison between the three average methods used, it can be seen how there are differences between the Weighted Mean method with respect to the other two. The differences between the M Mean and Trimmed Mean are not significant.
A similar analysis was performed when 10%, 20%, and 30% of the samples were contaminated (Fig. 7 - Fig. 9). In this case, the results of the SNR values have significant differences between the three methods. No critical distance overlaps. In all cases, with 0% artifacts, 10%, 20% and 30%, the Weighted_Mean method offered the best results in terms of the value of the signal-to-noise ratio.
With the bias, an analysis similar to that performed with the SNR was performed, and the lowest distortion values of the resulting signal were obtained for the weighted average when there were 0% artifacts and for the trimmed average when there were 30% artifacts.
Figures 10, 11, 12, and 13 show the Bonferroni-Holm post hoc analysis with alpha equal to 0.05 to assess the differences between the mean values of bias obtained for the data set without artifacts and with outliers, after finding through a Friedman test that at least one of the methods had significant differences.
When the data set has 0% artifacts, the lowest degree of distortion is presented by the weighted average, with significant differences concerning the average and trimmed average, however, there are no significant differences between these last two methods.
When the samples are contaminated with artifacts, the best results are offered by the trimmed mean for 20% and 30%, significantly different from the other two methods.
In the case of the SNR calculation, it is not the one that offers the highest value, but let us remember that the objective of these two measures is to provide a compromise ratio. So, when there are artifacts, the best compromise is offered by the trimmed mean.
4 Conclusions
Simulated raw recordings of evoked potentials provide a controlled dataset for benchmarking new methods in detecting evoked responses.
Burg's method, utilizing an 8th-order autoregressive (AR) model, offers a reliable estimate of the background noise. Simulations can also incorporate interferences commonly found in real signals, such as 60 Hz powerline interference, alpha rhythm, and instrumentation channel noises. Furthermore, the simulation scheme allows for the introduction of out-of-range values and impulsive noise.
In the benchmarking study of Averaging Methods using Realistic Simulation of Evoked Potentials, it was observed that the weighted average method yields superior results when the data is free from artifacts.
However, in cases where artifacts are present, the trimmed mean method provides the best compromise in terms of performance.