Automated Heartbeat Abnormality Detection Using Real-Time R-Assisted Lightweight Feature Extraction Algorithm

Automated electrocardiogram (ECG) processing is an important technique that helps identify heart abnormalities before any formal diagnosis. This research presents a real-time, lightweight R-assisted feature extraction algorithm and a heartbeat classification scheme that achieves highly accurate abnormality detection. The proposed algorithm extracts fifteen features from each heartbeat in raw Lead II ECG signals. The features carry medically valuable information, such as the locations, amplitudes and energies of the ECG waves (P, Q, R, S and T), which various classification algorithms then use to detect any abnormality present in the heartbeat. We used four popular databases from PhysioNet and extracted ten thousand heartbeats from each to train the models and benchmark the results. Four classification models, namely Naïve Bayes, k-Nearest Neighbor, Neural Network and Decision Tree, were used for abnormality detection, validating the efficiency of the system.
Keywords: abnormal ECG; ECG processing; feature extraction; heartbeat classification; abnormality detection.


I. INTRODUCTION
An electrocardiogram (ECG) records the electrical activity of the myocardium. The ideal representation of a heartbeat in a Lead II ECG signal is shown in Figure 1, where the horizontal axis represents time and the vertical axis the amplitude of the signal. Physicians analyse the positions and shapes of these waves to detect and diagnose abnormalities that may be present in the heart. This process can be automated using computer programs, helping physicians quickly find the regions of the ECG where problems might be present.
Cardiovascular disease is a leading cause of death worldwide. It results in around 17.3 million deaths per year, a figure predicted to rise rapidly to around 23.6 million per year by 2030 [1]. As a result, hardware for diagnosing heart problems has increased rapidly: more than 120 ECG monitoring systems have been designed specifically for older adults to tackle this problem [2].
Numerous studies of automated ECG processing algorithms can be found in the literature. Recent works [3], [4] discuss the advantages and limitations of various ECG processing methods. Automated ECG processing generally consists of four main stages: 1) noise reduction, 2) feature extraction, 3) feature generation or optimization, and 4) classification. Various noise reduction methods can be applied to ECG signals, such as the wavelet transform [5]-[8] and adaptive filters [9]-[11]. Although these methods help remove noise at an early stage, they add computational load and are therefore not suitable for lightweight analysis. The next stage is feature extraction, for which numerous methods are available; several are summarised in the surveys [3], [12], [13]. The most popular and widely used feature extraction technique is the wavelet transform [14]. Many other accurate but computationally expensive methods exist, such as the Fast Fourier Transform [15]-[17], Linear Prediction [18], [19], Independent Component Analysis [20]-[22], Spectral Analysis [23], [24] and the Hilbert transform [25].
In the final stage, the extracted features are classified using statistical classification techniques. The most common classification methods used in ECG analysis are Neural Networks (NN), Support Vector Machines (SVM) and Decision Trees (DTree) [26].
A Lead II ECG signal consists of P, Q, R, S and T waves [27]. The standard extractable features are marked in Figure 1. The features most commonly used in abnormality detection are the locations, durations and shapes of the waves in each heartbeat. Locating the exact start and end of a wave is difficult; consequently, features that depend on these boundaries, such as the PR segment, PR interval, ST segment and QT interval, are not suitable for lightweight analysis. Finding the locations of peaks and minima, however, is relatively easy, and these can be used to construct position-based features. We therefore propose a lightweight scheme that uses this idea to find pseudo locations of the ECG waves and construct nonstandard features. The constructed features convey a subset of medically important information, which is used to identify abnormal heartbeats. We have previously proposed several standalone ECG processing schemes [28], [29]; in this work, we propose a lightweight system and benchmark it on multiple databases using multiple classification techniques.
The remainder of the paper is organised as follows. Section II describes the methodology used in the experiments. Section III presents the experimental setup, including the databases used and the classifier configurations. Section IV presents the experimental results. Finally, Section V concludes the paper with remarks and future work.

II. METHODOLOGY
This section describes the methodology used in all the experiments. Figure 2 shows a complete system for detecting heartbeat abnormalities from ECG signals. In the first stage, a QRS detector locates each heartbeat. Next, ECG features are extracted for the corresponding heartbeat. To improve accuracy, additional features are calculated in the following stage. Finally, all the features are fed to the classifiers for abnormality detection. The proposed system does not use any conventional noise reduction process; instead, a moving window average filter is applied during pre-processing to reduce spikes. In addition, the proposed system does not include its own QRS detector: it assumes that the signal to be processed has already been R-annotated by a lightweight QRS detector such as those in [30]-[32]. Feature extraction and feature generation are described in detail in the following subsections, and multiple classification techniques are used for benchmarking.
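The moving window average used to reduce spikes can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code: the centred window, the default window length of 5 samples, the edge handling (truncating the window at the signal boundaries) and the name `moving_average` are all assumptions.

```python
def moving_average(signal, window=5):
    """Centred moving-window average (hypothetical helper).

    window must be a positive odd integer. Near the edges the window is
    truncated, so each output sample is the mean of the samples that
    actually fall inside the signal.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)
    # Prefix sums give O(1) window sums: sum(signal[a:b]) == prefix[b] - prefix[a]
    prefix = [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

A single 10-unit spike is spread over the window, so its peak drops to 2.0 with a 5-sample window: `moving_average([0, 0, 0, 10, 0, 0, 0], 5)` gives `[0.0, 2.5, 2.0, 2.0, 2.0, 2.5, 0.0]`. In practice the window length would be chosen relative to the sampling rate, since too long a window also flattens the QRS complex.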

A. Feature Extraction
The proposed algorithm takes the raw Lead II ECG signal and the R-wave locations as inputs and outputs the relative locations of the remaining waves of the corresponding heartbeat. The feature extraction algorithm extracts only pseudo locations of the P, Q, S and T waves. 'Pseudo' is used in the sense that a wave might be absent or even inverted; the algorithm nevertheless always returns an assumed location, which might not represent the exact location of the corresponding wave. The algorithm runs in real time and requires at least three R locations to compute its outputs. The pseudo-code of the proposed algorithm is given below.

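Algorithm: R-Assisted Feature Extraction
Input: raw Lead II ECG signal; R-wave locations (at least three)
Output: relative pseudo locations of the P, Q, S and T waves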
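To illustrate how pseudo locations can be found from R annotations alone, the following Python sketch searches fixed fractions of the neighbouring RR intervals for extrema. It is a hypothetical reconstruction of the general window-based idea, not the authors' algorithm: the search-window fractions (`frac_qs`, `frac_p`, `frac_t`), the helper names and the choice of maximum or minimum in each window are all assumptions. It does reflect two properties stated above: it needs three R locations (previous, current and next), and it always returns a location even when a wave is absent or inverted.

```python
def _arg_extreme(ecg, lo, hi, largest):
    """Index of the max (largest=True) or min sample in ecg[lo:hi]; needs lo < hi."""
    window = range(lo, hi)
    if largest:
        return max(window, key=lambda k: ecg[k])
    return min(window, key=lambda k: ecg[k])


def pseudo_locations(ecg, r_peaks, i, frac_qs=0.1, frac_p=0.3, frac_t=0.5):
    """Hypothetical pseudo locations (sample indices) of P, Q, S, T for beat i.

    Window widths are assumed fractions of the previous/next RR interval.
    """
    if i < 1 or i + 1 >= len(r_peaks):
        raise ValueError("beat i needs a preceding and a following R peak")
    r = r_peaks[i]
    rr_before = r - r_peaks[i - 1]  # previous RR interval in samples
    rr_after = r_peaks[i + 1] - r   # next RR interval in samples

    # Q: lowest sample in a short window immediately before R
    q_lo = max(0, r - max(1, int(frac_qs * rr_before)))
    q = _arg_extreme(ecg, q_lo, r, largest=False)
    # S: lowest sample in a short window immediately after R
    s_hi = min(len(ecg), r + 1 + max(1, int(frac_qs * rr_after)))
    s = _arg_extreme(ecg, r + 1, s_hi, largest=False)
    # P: highest sample in the window preceding Q
    p_lo = max(0, q - max(1, int(frac_p * rr_before)))
    p = _arg_extreme(ecg, p_lo, q, largest=True)
    # T: highest sample after S; the minimum of the same window covers
    # inverted T waves (the "minimum in T region" feature)
    t_lo = s + 1
    t_hi = min(len(ecg), t_lo + max(1, int(frac_t * rr_after)))
    t = _arg_extreme(ecg, t_lo, t_hi, largest=True)
    t_min = _arg_extreme(ecg, t_lo, t_hi, largest=False)
    return {"P": p, "Q": q, "R": r, "S": s, "T": t, "T_min": t_min}
```

Because beat i needs R peak i+1, a real-time implementation of this sketch outputs each beat one RR interval late. Locations relative to R follow by subtracting the R index, e.g. `loc["P"] - loc["R"]`.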

B. Feature Generation
In our experiments we used a total of 15 features, comprising the relative locations and amplitudes of the wave peaks and the wave energies. All the features are summarised in Table 1. The first four features are the pseudo locations of the P, Q, S and T waves. The fifth feature is the minimum in the T region; this feature is important because T waves vary in shape and may be inverted. Features six to eleven are the amplitudes of the corresponding waves relative to the baseline of the signal. The last three features are the P, QRS and T region energies relative to the baseline. These features indirectly carry information about the approximate durations and shapes of the waves without requiring their exact boundaries. The energy of each region is calculated using the formula shown in Table 1. The features are also illustrated on an ideal Lead II ECG signal in Figure 4, where the numbers correspond to the features described in Table 1.
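The location, amplitude and energy features can be illustrated with the following Python sketch. The exact energy formula is the one given in Table 1; this sketch instead assumes that a region's energy is the sum of squared deviations from the baseline, and it estimates the baseline as the median of the beat segment. Both choices, the region boundaries passed in `regions`, and the names `beat_features`, `region_energy` and `relative_amplitude` are hypothetical.

```python
from statistics import median


def relative_amplitude(ecg, idx, baseline):
    """Amplitude of sample idx measured from the baseline."""
    return ecg[idx] - baseline


def region_energy(ecg, lo, hi, baseline):
    """Assumed energy: sum of squared deviations from the baseline over ecg[lo:hi]."""
    return sum((x - baseline) ** 2 for x in ecg[lo:hi])


def beat_features(ecg, loc, regions):
    """Build a feature dict for one beat (hypothetical layout).

    loc: sample indices with keys "P", "Q", "R", "S", "T", "T_min".
    regions: maps "P", "QRS", "T" to (lo, hi) sample ranges.
    """
    r = loc["R"]
    # Assumed baseline: median of the samples spanning all regions of the beat
    lo = min(a for a, _ in regions.values())
    hi = max(b for _, b in regions.values())
    baseline = median(ecg[lo:hi])
    feats = {}
    for w in ("P", "Q", "S", "T", "T_min"):
        feats[f"loc_{w}"] = loc[w] - r  # location relative to R
    for w in ("P", "Q", "R", "S", "T", "T_min"):
        feats[f"amp_{w}"] = relative_amplitude(ecg, loc[w], baseline)
    for name, (a, b) in regions.items():
        feats[f"energy_{name}"] = region_energy(ecg, a, b, baseline)
    return feats
```

The sketch produces location, amplitude and energy entries for the feature groups described above; it is not claimed to reproduce the exact fifteen-feature layout of Table 1.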

III. EXPERIMENTAL SETUP
A. Databases
We used raw ECG signals from the PhysioNet databank [33]. Four popular databases were used in the evaluation: the MIT-BIH Arrhythmia Database (MITDB) [34], the QT Database (QTDB) [35], the European ST-T Database (EDB) [36] and the St. Petersburg Institute of Cardiological Technics (INCART) 12-lead Arrhythmia Database (INCARTDB) [ref]. In all of these databases, normal heartbeats greatly outnumber abnormal heartbeats. To avoid this class imbalance problem, five thousand normal and five thousand abnormal heartbeats were randomly selected from each database for evaluation.
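The balanced sampling step can be sketched in Python as below. The label strings, the fixed random seed and the function name `balanced_sample` are assumptions; the text states only that five thousand normal and five thousand abnormal beats were drawn at random from each database.

```python
import random


def balanced_sample(labels, per_class=5000, seed=0):
    """Indices of per_class randomly chosen beats from each class (hypothetical helper).

    labels: one class label per beat, e.g. "normal" / "abnormal".
    Sampling is without replacement; raises ValueError if a class is too small.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    chosen = []
    for lab in sorted(by_class):
        pool = by_class[lab]
        if len(pool) < per_class:
            raise ValueError(f"class {lab!r} has only {len(pool)} beats")
        chosen.extend(rng.sample(pool, per_class))
    return chosen
```

With the default `per_class=5000` and two classes, each database yields ten thousand beats, matching the counts given above.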

B. Experimental Sequence
The sequence of operations for ECG abnormality detection is illustrated in Figure 2. However, since we did not include a QRS detection algorithm, the corresponding stage is omitted; instead, the R-wave annotations were taken directly from the databases. Using the proposed algorithm, the pseudo locations of the P, Q, S and T waves are extracted and used to construct the fifteen features described in Table 1. The feature vectors were fed to four classifiers: k-Nearest Neighbours (KNN), Naive Bayes (N-Bayes), Decision Tree and Artificial Neural Network (ANN). The feature extraction algorithm is implemented in MATLAB [37], and the abnormality detection process is benchmarked using RapidMiner [38] visual scripting with the default settings for each classifier. The benchmarking process designed in RapidMiner is shown in Figure 5. 10-fold cross-validation is used to compare the performance of all the classifiers on each test database.
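The benchmarking itself used RapidMiner's default operators, which are not reproduced here. The following standard-library Python sketch only illustrates the mechanics of 10-fold cross-validation, using a from-scratch 1-nearest-neighbour classifier as a stand-in; the plain (unstratified) shuffled split, the squared Euclidean distance and all function names are assumptions.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and deal it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def knn_predict(train_X, train_y, x, k=1):
    """Majority label among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, classify, k=10, seed=0):
    """Mean accuracy over k folds; classify(train_X, train_y, x) returns a label.

    Assumes len(X) >= k so that no fold is empty.
    """
    accs = []
    for fold in k_fold_indices(len(X), k, seed):
        test = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in test]
        train_y = [y[i] for i in range(len(X)) if i not in test]
        correct = sum(classify(train_X, train_y, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / len(accs)
```

Any other classifier could be plugged in through the `classify(train_X, train_y, x)` callable; each beat is used for testing exactly once across the ten folds.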

IV. EXPERIMENTAL RESULTS
In this section, the classification results are described in detail. The performance metrics are defined first; then the results obtained using multiple classifiers on the databases are presented and discussed.

A. Performance Metrics
Multiple performance metrics are used in the benchmarking, with abnormal heartbeats treated as the positive class. The number of abnormal heartbeats correctly identified by a classifier is the number of true positives (TP), and the number of abnormal heartbeats classified as normal is the number of false negatives (FN). Similarly, the number of normal heartbeats correctly identified is the number of true negatives (TN), and the number of normal heartbeats classified as abnormal is the number of false positives (FP). Sensitivity (SN), Positive Predictivity (PP) and Overall Accuracy (OA) are used to evaluate the classifiers' performance and are defined as follows:
$$\mathrm{SN} = \frac{TP}{TP + FN}, \qquad \mathrm{PP} = \frac{TP}{TP + FP}, \qquad \mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN}$$
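The three metrics can be computed from true and predicted labels as in the following Python sketch. It uses the standard convention that abnormal beats are the positive class, so an abnormal beat predicted as normal counts as a false negative; the label strings and function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="abnormal"):
    """Return (TP, FP, TN, FN) with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # abnormal beat flagged as abnormal
            else:
                fp += 1  # normal beat flagged as abnormal
        else:
            if t == positive:
                fn += 1  # abnormal beat missed
            else:
                tn += 1  # normal beat correctly passed
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Sensitivity, positive predictivity and overall accuracy."""
    sn = tp / (tp + fn)  # share of abnormal beats that were detected
    pp = tp / (tp + fp)  # share of "abnormal" predictions that were correct
    oa = (tp + tn) / (tp + fp + tn + fn)
    return sn, pp, oa
```

For 4 abnormal and 6 normal beats with one error in each class, `metrics(*confusion_counts(y_true, y_pred))` gives SN = 0.75, PP = 0.75 and OA = 0.8.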

B. Results
Table 2 summarises the experimental results, which are also plotted in Figure 6. Among the classifiers, ANN performed best, detecting abnormalities with an average overall accuracy of 96.54%, whereas KNN performed worst. Decision Tree also performed well, with an average overall accuracy of more than 90%. Among the databases, the system performed best on INCARTDB and worst on QTDB. This might be due to variation in the proportion of abnormal heartbeats present in QTDB. Only four types of abnormalities were used, with most examples drawn from EDB and MITDB, which led to relatively high accuracy. Detailed comparison results for each database are given in the appendix.
V. CONCLUSION
In this research we presented a window-based, lightweight, real-time R-assisted feature extraction algorithm that extracts valuable information from raw Lead II ECG signals with the help of a QRS detector. Given the locations of the R waves, the proposed abnormality detection scheme extracts 15 features and passes them to several classifiers for abnormality detection. The performance of the system is benchmarked on four popular ECG databases. One shortcoming of the system is that it cannot work without an R-peak detector. Furthermore, although peak-based wave localisation is a common strategy, the extracted features do not accurately represent the locations of the waves, especially when the corresponding heartbeat is abnormal. Even so, these atypical positions produce deviations in the feature values, which in turn contribute to abnormality detection.
In future work, this algorithm can be combined with a lightweight QRS detector to produce a complete heartbeat abnormality detection system. Because of its low processing requirements, it could be well suited to mobile or hand-held devices, providing efficient real-time abnormality detection with low battery consumption.

Figure 2: Experiment sequence for abnormality detection

Figure 3: Feature search windows

Table 2: Performance of classifiers