1. Introduction
Ambulatory wearable heart rate trackers provide physiological measurements during dynamic, everyday, real-world activities. However, they have been characterized as less accurate during tasks involving high levels of movement than measurements acquired in the clinic. The accuracy of wearable devices depends on their placement; for example, a device on the wrist is more likely to pick up movement artefacts than one on the chest [1]. Nonetheless, even when placed on the chest, the relative error of a heart rate monitor increases with exercise intensity [2].
In addition, the performance of automatic heartbeat detection algorithms depends on the signal-to-noise ratio [3]; for example, for signal-to-noise ratios below 5 dB, R-peak detection in the electrocardiogram (ECG) is considered unreliable [4]. In general, detector sensitivity and positive predictive value decrease for ambulatory data compared with data from a standard clinical system [5]. Even when poor-quality data or data corrupted by motion artefacts are excluded from analysis, the accuracy of detectors applied to ambulatory data is typically worse than for data acquired in clinics [6]. Georgiou et al. [7] pointed out that, so far, wearable devices can only be used as a surrogate for heart rate variability measurement at rest or under mild exercise conditions, as their accuracy declines with increasing exercise load.
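The 5 dB threshold quoted above refers to the usual power ratio, SNR = 10·log10(P_signal / P_noise). The sketch below, a minimal illustration assuming the clean signal and the noise are available separately (which is only true for synthetic data, such as ECG with artificially added noise), computes it and applies that threshold; the helper `r_peak_detection_reliable` is hypothetical and not part of any cited detector.

```python
import numpy as np

# Threshold quoted in the text: below ~5 dB, R-peak detection is considered unreliable [4].
R_PEAK_SNR_THRESHOLD_DB = 5.0


def snr_db(clean: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise ratio in dB, using the mean-square power of each component."""
    p_signal = np.mean(np.square(np.asarray(clean, dtype=float)))
    p_noise = np.mean(np.square(np.asarray(noise, dtype=float)))
    return float(10.0 * np.log10(p_signal / p_noise))


def r_peak_detection_reliable(clean: np.ndarray, noise: np.ndarray) -> bool:
    """Hypothetical helper: flag a segment as reliable if its SNR reaches the threshold."""
    return snr_db(clean, noise) >= R_PEAK_SNR_THRESHOLD_DB


# Toy example: noise with half the signal's RMS amplitude gives 10*log10(4) dB.
t = np.linspace(0.0, 1.0, 500, endpoint=False)
ecg_like = np.sin(2 * np.pi * 5 * t)        # stand-in for a clean signal
noise = 0.5 * np.sin(2 * np.pi * 50 * t)    # stand-in for additive noise
print(round(snr_db(ecg_like, noise), 2))    # prints 6.02
```

Because SNR is a power ratio, the 5 dB limit corresponds to a signal RMS amplitude only about 1.8 times that of the noise.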
Hearables are a very convenient wearable modality, benefiting from the privileged position of the head and ear canal on the human body and their fixed distance to vital organs. However, the in-ear ECG [8] signal, measured between electrodes placed inside the ear canal, has a smaller amplitude than standard ECG acquired from the torso and a lower signal-to-noise ratio, making it difficult to detect R peaks automatically with standard algorithms [9]. On the other hand, earpieces provide a good fit and benefit from the collocated position of multiple sensors (electrodes, accelerometer, microphone, and photoplethysmography (PPG) sensor) on a single earplug [10].
The multimodality of hearables has already been employed in mental stress detection [11], showing that classification performance can be improved by combining heart rate variability features extracted from ear-ECG with breathing and oxygen saturation features extracted from ear-PPG signals. Multimodality also plays a crucial role in artefact removal [12], where signals from microphones and accelerometers were used to model artefacts and remove them from ear-EEG recordings. Heart rate itself can be monitored using various in-ear signal modalities [13]:
Sounds (heart tones) [16].
The multimodality of hearables provides an opportunity for more robust HR estimation by combining heart rate data from various sources using data fusion methods.
Data fusion techniques have the potential to improve estimation accuracy through sensor redundancy (e.g. multiple PPG signals) or by estimating HR from different sensor modalities (PPG and ECG). For example, data fusion can be achieved with a weighted average, where the weights are automatically adjusted based on signal quality indices (SQIs). This approach ensures that data from high-SQI signals, which are likely to be more accurate, contribute most to the HR estimate.
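As an illustration of the weighted-average idea, the following minimal sketch fuses per-source HR estimates with weights proportional to their SQIs. It is a generic weighted mean, not the exact algorithm of either method evaluated here; the function name, the clipping of negative SQIs to zero, and the example numbers are assumptions for illustration.

```python
import numpy as np


def fuse_sqi_weighted(hr_bpm, sqi) -> float:
    """SQI-weighted average of per-source HR estimates.

    hr_bpm: HR estimate from each source (e.g. in-ear ECG, PPG channels), in bpm.
    sqi:    signal quality index of each source; negative values are clipped to 0.
    Returns NaN when no source has a positive SQI.
    """
    hr = np.asarray(hr_bpm, dtype=float)
    w = np.clip(np.asarray(sqi, dtype=float), 0.0, None)
    if w.sum() == 0.0:
        return float("nan")
    return float(np.sum(w * hr) / np.sum(w))


# Hypothetical segment: one ECG and two PPG sources.
hr = [72.0, 90.0, 75.0]
sqi = [0.9, 0.2, 0.8]
print(round(fuse_sqi_weighted(hr, sqi), 1))  # pulled towards the two high-SQI sources
```

The low-quality 90 bpm source still shifts the result upwards, which foreshadows the limitation of all-source weighting discussed in Section 4.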
In this paper, we evaluate two data fusion methods, the method described by Li et al. [17] and the method proposed by Rankawat and Dubey [18], for heart rate estimation from simultaneous in-ear ECG and in-ear PPG recorded from eight subjects performing 5-minute sitting and walking tasks.
3. Results
Figure 1 shows scatter plots of the HR estimates obtained from the individual in-ear source signals and from the data fusion methods against the reference HR estimated from torso ECG. The highest correlation with the reference for a single source was R = 0.38 (PPG1 IR signal), while the highest overall correlation, R = 0.60, was obtained with Rankawat and Dubey’s method [18].
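The text does not state which correlation coefficient R denotes; assuming it is Pearson’s r computed over segments, a minimal sketch is shown below. Excluding segments for which a method produced no estimate (encoded here as NaN) is also an assumption about how rejected segments were handled.

```python
import numpy as np


def pearson_r(estimate_bpm, reference_bpm) -> float:
    """Pearson correlation between HR estimates and the reference HR.

    Segments where either value is NaN (e.g. rejected by a fusion method)
    are excluded before the coefficient is computed.
    """
    est = np.asarray(estimate_bpm, dtype=float)
    ref = np.asarray(reference_bpm, dtype=float)
    valid = ~(np.isnan(est) | np.isnan(ref))
    if valid.sum() < 2:
        return float("nan")
    return float(np.corrcoef(est[valid], ref[valid])[0, 1])


# Hypothetical per-segment values for one source against torso-ECG reference HR.
print(round(pearson_r([70.0, np.nan, 82.0, 95.0], [68.0, 75.0, 80.0, 90.0]), 3))
```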
Notably, HR estimates derived from PPG signals tend to be overestimated (frequently lying above the orange y = x line of perfect agreement). In contrast, HR estimates obtained from in-ear ECG signals using the DeepMF method are more likely to underestimate the true HR.
Note that Rankawat and Dubey’s method can reject outliers and may not provide a result when no signal has an adequate SQI. For example, for subject 7 the method gave results in only two of thirty segments.
Table 2 summarizes the MAE obtained during sitting from the in-ear source signals and the data fusion methods; the corresponding MAE values during walking are summarized in Table 3.
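Because a rejecting method may return no estimate for some segments, its MAE is naturally computed only over the segments it does report, and the fraction of reported segments (coverage) is worth tracking alongside it. The sketch below assumes that convention and marks rejected segments with NaN; the paper does not state how rejected segments entered the MAE, so this is an illustration rather than the authors’ procedure.

```python
import numpy as np


def mae_and_coverage(estimate_bpm, reference_bpm):
    """MAE (bpm) over segments that have an estimate, plus the fraction of such segments.

    NaN in estimate_bpm marks a segment for which the method gave no output.
    """
    est = np.asarray(estimate_bpm, dtype=float)
    ref = np.asarray(reference_bpm, dtype=float)
    valid = ~np.isnan(est)
    coverage = float(valid.mean())
    if not valid.any():
        return float("nan"), coverage
    return float(np.mean(np.abs(est[valid] - ref[valid]))), coverage


mae, cov = mae_and_coverage([70.0, np.nan, 80.0], [72.0, 75.0, 77.0])
print(mae, cov)  # MAE 2.5 bpm over 2 of 3 segments

# Mean MAE across subjects, skipping subjects with no reported segment (hypothetical values).
subject_maes = [2.5, 8.0, float("nan")]
print(np.nanmean(subject_maes))  # 5.25
```

A low MAE with very low coverage (as for subject 7, two of thirty segments) should be read with that coverage in mind.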
Rankawat and Dubey’s method consistently demonstrated the lowest mean MAE across subjects for both activities, with values of 8.0 bpm during sitting and 15 bpm during walking. Notably, the Li et al. method also outperformed the best single-source HR estimate, both during sitting (17 bpm compared with 23 bpm for the PPG2 IR signal) and during walking (18 bpm compared with 23 bpm for the Green PPG2 signal).
Rankawat and Dubey’s method had high MAE values in subjects 2, 5, and 10 (Table 2). However, in subjects 2 and 10 the method still had the lowest MAE of all methods. In subject 5, the method follows the DeepMF results too closely, whereas better outcomes are observed with the Red and IR PPG signals (Figure 1).
During walking, Rankawat and Dubey’s method maintained an acceptable MAE (below 5 bpm) in three subjects (1, 4, and 8). Otherwise, acceptably low MAE values were obtained only in subject 1, using the PPG2 IR and Green signals.
Figure 2 shows the relationship between the MAE of HR estimation and the mean SQI, with each data point representing one subject. For PPG signals with high SQI values, the MAE was consistently below 20 bpm; with SQI values below 50, the MAE tends to rise. In contrast, for in-ear ECG signals the relationship between MAE and SQI was less clear: the MAE was low for subjects 1 and 4 even though their SQI was below 50, whereas subjects 3 and 5, with higher SQI, had larger MAE.
4. Discussion
We have shown that data fusion methods achieve lower MAE than single-source HR estimates (Table 2). Furthermore, data fusion methods reduced the variation of MAE and provided more robust HR estimation, especially during walking (Table 3), when signals are affected by motion artefacts.
The main idea of data fusion methods is to select the best available sources for HR estimation. The main advantage of Rankawat and Dubey’s method is its ability to reject measurements when there is no valid source (every signal has an SQI lower than 0.7); in this case, the method does not provide an HR estimate. In contrast, the weighting algorithm of Li et al. [17] uses information from all sources, including low-quality ones. When all sources are poor and have very low SQI, the resulting HR estimate still combines all of them and can be unreasonable.
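The contrast between rejecting and all-source fusion can be sketched in a few lines. `fuse_with_rejection` below is a simplified stand-in that only reproduces the rejection behaviour described above (SQI threshold 0.7, no output when no source is valid); it is not Rankawat and Dubey’s published algorithm. Likewise, `fuse_all_sources` is a plain SQI-weighted mean and omits the Kalman filtering used by Li et al. Both names and the example values are hypothetical.

```python
import numpy as np

SQI_VALID = 0.7  # validity threshold quoted in the text


def fuse_with_rejection(hr_bpm, sqi, threshold=SQI_VALID) -> float:
    """Simplified stand-in for threshold-based rejection (not the published method):
    SQI-weighted mean of sources with SQI >= threshold; NaN if no source is valid."""
    hr = np.asarray(hr_bpm, dtype=float)
    q = np.asarray(sqi, dtype=float)
    valid = q >= threshold
    if not valid.any():
        return float("nan")
    return float(np.average(hr[valid], weights=q[valid]))


def fuse_all_sources(hr_bpm, sqi) -> float:
    """SQI-weighted mean over every source, however poor (assumes sum(sqi) > 0)."""
    return float(np.average(np.asarray(hr_bpm, dtype=float),
                            weights=np.asarray(sqi, dtype=float)))


# Hypothetical segment in which every source is poor.
hr = [55.0, 120.0, 140.0]
sqi = [0.3, 0.2, 0.1]
print(fuse_with_rejection(hr, sqi))  # nan: no estimate is reported
print(fuse_all_sources(hr, sqi))     # about 90.8 bpm, driven entirely by unreliable sources
```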
In this study, for in-ear recordings during sitting, the HR estimated from individual signals had MAE greater than 5 bpm in five subjects (2, 6, 7, 8, and 10). In these cases, Rankawat and Dubey’s method [18] correctly rejected invalid measurements and kept MAE at a reasonable level, lower than single-source estimation. In contrast, the Li et al. [17] algorithm resulted in MAE slightly higher than the best single-source method.
For correct data fusion, it is critical to estimate SQIs correctly and to prevent the use of invalid data when estimating HR. Rahman et al. [25] evaluated the performance of different SQIs on synthetic data (ECG recordings with artificially added noise). They found that SQI performance fluctuated considerably across datasets and concluded that fixed threshold-based SQIs cannot be used as a robust noise detection system. They suggested using adaptive thresholds and machine learning to improve signal quality assessment.
In our study, SQI estimation was especially challenging in subject 5 (Figure 1), where the SQI for in-ear ECG was overestimated, leading to incorrect HR estimates from Rankawat and Dubey’s method.
The same problem with SQI estimation for in-ear ECG recordings is visible in Figure 2: SQIs did not appear to be related to MAE, whereas in a standard scenario higher SQI should lead to lower MAE, as in the case of PPG signals. The SQI estimation method used in this study is based on the correlation of each individual beat with a template computed as the average of the 30 previous beats. This method seems reasonable for PPG signals, where the repeatable pulsation has a much larger amplitude than the noise. However, it does not appear to work properly for in-ear ECG signals, where the signal-to-noise ratio is likely to be lower (the noise level is similar to the ECG amplitude).
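A minimal sketch of a template-correlation SQI of the kind described above is given below. It assumes that beats have already been segmented into equal-length windows aligned on a detected fiducial point, that the correlation is Pearson’s, and that no SQI is assigned to the first 30 beats; these details, and the synthetic test data, are assumptions rather than the exact implementation used in this study.

```python
import numpy as np


def template_sqi(beats: np.ndarray, n_template: int = 30) -> np.ndarray:
    """Template-correlation SQI for a sequence of beats.

    beats: array of shape (n_beats, n_samples); each row is one beat window,
           assumed aligned on a detected fiducial point and of equal length.
    SQI of beat k = Pearson correlation between beat k and the mean of the
    n_template preceding beats. The first n_template beats get NaN (no template yet).
    """
    beats = np.asarray(beats, dtype=float)
    sqi = np.full(beats.shape[0], np.nan)
    for k in range(n_template, beats.shape[0]):
        template = beats[k - n_template:k].mean(axis=0)
        sqi[k] = np.corrcoef(beats[k], template)[0, 1]
    return sqi


# Synthetic check: 40 noisy copies of one pulse shape, last beat replaced by pure noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
pulse = np.sin(2 * np.pi * t)
beats = pulse + 0.05 * rng.standard_normal((40, t.size))
beats[-1] = rng.standard_normal(t.size)
sqi = template_sqi(beats)
print(np.round(sqi[30:], 2))  # high for clean beats, low for the noise-only beat
```

Note that the score depends entirely on the beat windows it is given: if windows are cut around peaks picked out of noise, their alignment can make them resemble the template more than their true quality warrants, which is one possible route to the overestimated in-ear ECG SQIs reported above.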
In-ear ECG signals require a more dedicated SQI estimation method. Improvements in signal quality assessment, for example with deep neural networks [26] or a cascade of classifiers [27], may further improve the performance of data fusion methods.
Notably, HR estimates derived from PPG and ECG show opposite trends. ECG signals processed with the DeepMF method tend to underestimate HR (i.e. miss some peaks), while PPG signals tend to overestimate it (i.e. detect more peaks than are actually present), as depicted in Figure 1. This contrasting behaviour makes the problem well suited to data fusion, where combining different estimates compensates for distinct and opposite biases, resulting in a more reliable estimate.
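The opposite biases can be made concrete with an idealised toy example (hypothetical peak times, not study data): one missed peak lengthens the mean inter-beat interval and lowers the HR, while one spurious peak shortens it and raises the HR. In this contrived case the two errors cancel exactly in a simple average; real errors are neither equal in size nor synchronous, so this only illustrates why combining the two modalities can help.

```python
import numpy as np


def mean_hr_bpm(peak_times_s) -> float:
    """Mean HR (bpm) from detected peak times: 60 / mean inter-beat interval."""
    ibi = np.diff(np.sort(np.asarray(peak_times_s, dtype=float)))
    return float(60.0 / ibi.mean())


true_peaks = np.arange(0.0, 11.0, 1.0)   # 11 beats, 1 s apart: 60 bpm
missed = np.delete(true_peaks, 5)        # ECG-like error: one beat missed
extra = np.append(true_peaks, 5.5)       # PPG-like error: one spurious peak

hr_ecg = mean_hr_bpm(missed)  # about 54 bpm (underestimate)
hr_ppg = mean_hr_bpm(extra)   # about 66 bpm (overestimate)
print(hr_ecg, hr_ppg, (hr_ecg + hr_ppg) / 2)
```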
Beat detectors for PPG perform well on high-quality PPG signals, but their performance decreases for noisy or low-amplitude signals such as those recorded in the ear. The reliability of PPG for HR estimation has recently been questioned. Weiler et al. [28] compared averaged HR readings from PPG and ECG signals and did not find a statistically significant difference, but when HR reached around 155–160 bpm, a difference of ±5 bpm was observed. Charlton et al. [29] evaluated 8 different PPG beat detectors and found that they performed well on hospital data and at rest, but worse during movement, stress, atrial fibrillation, and in neonates. In that study, the detectors denoted MSPTD [30] and qppg [21] (used in this study) performed best, with complementary performance characteristics.
MSPTD looks for peaks in PPG signals without using a priori knowledge of the signal characteristics, while qppg searches for systolic upslopes based on their expected characteristics.
The performance of detectors may be improved; for example, Galli et al. [31] proposed an algorithm that combines three sequential signal processing stages: signal denoising by joint principal component analysis of PPG and accelerometer signals, Fourier-based heart rate measurement, and smoothing of the HR estimate via Kalman filtering. Galli et al. [31] reported an average deviation from reference values of 1.66 bpm during running and 2.92 bpm during boxing. The development of a dedicated onset detector for in-ear PPG signals, analogous to DeepMF for in-ear ECG, is an interesting route for further study.
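Kalman smoothing of an HR track, the last of the stages listed above, can be sketched with a scalar filter that treats the true HR as a slowly drifting random walk. This is a generic textbook filter, not the Galli et al. pipeline (which also includes PCA-based denoising and Fourier HR estimation); the noise variances `q` and `r` are hypothetical tuning values, and handling missing estimates as predict-only steps is an assumption.

```python
import numpy as np


def kalman_smooth_hr(hr_bpm, q=1.0, r=25.0, p0=100.0):
    """Scalar Kalman filter with a random-walk model for HR.

    q:  process-noise variance per step (bpm^2), i.e. how fast true HR may drift.
    r:  measurement-noise variance (bpm^2), i.e. how noisy each raw estimate is.
    p0: initial state variance. NaN inputs (no estimate) give a predict-only step.
    """
    z = np.asarray(hr_bpm, dtype=float)
    valid = z[~np.isnan(z)]
    if valid.size == 0:
        return z.copy()
    x, p = valid[0], p0          # initialise with the first available estimate
    out = np.empty_like(z)
    for k, zk in enumerate(z):
        p += q                   # predict: HR assumed constant plus random drift
        if not np.isnan(zk):
            gain = p / (p + r)   # update: blend prediction and new measurement
            x += gain * (zk - x)
            p *= 1.0 - gain
        out[k] = x
    return out


# Hypothetical raw HR track with one outlier (130 bpm) and one missing segment.
raw = np.array([70.0] * 10 + [130.0] + [np.nan] + [71.0] * 5)
print(np.round(kalman_smooth_hr(raw), 1))
```

With these settings the 130 bpm outlier moves the smoothed track by only a fraction of the jump; a larger `r` relative to `q` damps outliers more but also slows the response to genuine HR changes.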
Moreover, PPG quality is affected by skin colour, which interferes with the reflection of the light used for measurement and disturbs optical measurements. Racial bias has been observed in blood oxygen saturation measurement using PPG [32]. Other measurement sites may have a thinner epidermis than the finger and lower exposure to sunlight, and may therefore be less prone to the influence of melanin and pigmentation [33]. Hartman et al. [34] noted that PPG signals acquired from different locations vary in amplitude and shape and in some cases may be unsuitable for analysis. In the study by Hartman et al. [34], 95% of recordings from the finger were suitable for analysis, followed by 86% of recordings from the wrist and 81% from the earlobe.
Data fusion methods seem to be a necessity for PPG at the earlobe location, where signal amplitudes are smaller and likely to be corrupted by motion artefacts. The data fusion and Kalman filter approach used by Li et al. [17] provides ways to reject outlier results. Data fusion could be further improved by modifying the weighting equation. Our observations suggest that better results may come from fusion in a winner-take-all fashion. We hypothesize that data fusion should mostly use the best available signal, with weights concentrated on the best signal and dropping rapidly with a relative drop in SQI.
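One way to express the hypothesized weighting is to let each weight decay exponentially with the source’s relative SQI drop from the best source. The sketch below is a hypothetical formulation: the exponential form and the sharpness parameter `beta` are assumptions, not a method evaluated in this study. With `beta = 0` it reduces to a plain average, and as `beta` grows it approaches a winner-take-all selection of the best signal.

```python
import numpy as np


def fuse_winner_take_all(hr_bpm, sqi, beta=20.0) -> float:
    """Hypothetical SQI weighting concentrated on the best source.

    w_i = exp(-beta * (max(SQI) - SQI_i) / max(SQI)), i.e. each weight decays with
    the source's relative SQI drop from the best source.
    beta = 0 gives a plain average; large beta approaches winner-take-all.
    """
    hr = np.asarray(hr_bpm, dtype=float)
    q = np.asarray(sqi, dtype=float)
    best = q.max()
    if best <= 0.0:
        return float("nan")
    w = np.exp(-beta * (best - q) / best)
    return float(np.sum(w * hr) / np.sum(w))


hr = [72.0, 90.0, 75.0]
sqi = [0.9, 0.2, 0.8]
for beta in (0.0, 5.0, 20.0, 1000.0):
    print(beta, round(fuse_winner_take_all(hr, sqi, beta), 2))
```

Compared with weights proportional to SQI, this form makes the low-quality source’s contribution negligible while still letting a near-best source (here SQI 0.8 against 0.9) contribute; it could also be combined with the 0.7 rejection threshold discussed earlier.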
Data fusion methods provide more robust HR estimation than any single cardiovascular signal. In particular, they are useful for data recorded during movement, where signals may be affected by motion artefacts. By integrating the multimodal signals available at the in-ear location, data fusion methods can enhance the HR tracking performance of a wearable device.
Future work will include:
- 1. Enhancement of data fusion methods by refining the assessment of weights.
- 2. Development of a PPG beat detector optimized for low-amplitude in-ear PPG signals.
- 3. Improvement of SQI estimation methods towards more reliable HR estimation.
Author Contributions
Conceptualization, M.Z. and E.O.; methodology, M.Z. and E.O.; software, M.Z.; validation, E.O. and M.Z.; formal analysis, M.Z. and E.O.; investigation, M.Z.; resources, M.B., A.N., A.M., and H.D.; data curation, M.Z.; writing—original draft preparation, M.Z.; writing—review and editing, E.O. and A.N.; visualization, M.Z. and E.O.; supervision, D.M.; project administration, D.M.; funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.