In Computers in biology and medicine
Cardiovascular disease (CVD) is the leading cause of death worldwide, so accurate, automated detection at an early stage can support medical experts in timely diagnosis and treatment and save many lives. Much research has been carried out in this area, but because medical and health care data are often class-imbalanced, existing approaches may not deliver the desired results in all respects. To overcome this problem, a sequential ensemble technique is proposed that detects 6 types of cardiac arrhythmias on large, imbalanced ECG datasets; the data imbalance is addressed with a hybrid resampling technique, "Synthetic Minority Oversampling Technique and Tomek Link (SMOTE + Tomek)". The sequential ensemble employs two distinct deep learning models: a Convolutional Neural Network (CNN) and a hybrid model that combines a CNN with a Long Short-Term Memory network (CNN-LSTM). Two standard datasets, the "MIT-BIH arrhythmia database" (MITDB) and the "PTB diagnostic database" (PTBDB), were combined, and 23,998 ECG beats were extracted for model validation. The three models (CNN, CNN-LSTM, and the ensemble) were tested on four versions of the ECG dataset: the original imbalanced data, data resampled with random oversampling, data resampled with SMOTE, and data resampled with the SMOTE + Tomek algorithm. The highest overall accuracy, 99.02%, was obtained by the ensemble technique on the SMOTE + Tomek resampled dataset, and the minority-class recall improved by 20% compared with the imbalanced data.
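The SMOTE + Tomek resampling named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary problem, Euclidean distance, at least two minority samples, and removal of both members of each Tomek link. All function names and the toy data are hypothetical and use only the Python standard library.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples on segments between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        order = sorted((j for j in range(len(minority)) if j != i),
                       key=lambda j: euclidean(base, minority[j]))
        neighbor = minority[rng.choice(order[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbor)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: two samples of different
    classes that are each other's nearest neighbour."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: euclidean(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    drop = {i for i in range(len(X))
            if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class up to the majority count (binary
    assumption), then clean the result by removing Tomek links."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    synthetic = smote(minority, n_majority - len(minority), k, seed)
    X_over = list(X) + synthetic
    y_over = list(y) + [minority_label] * len(synthetic)
    return remove_tomek_links(X_over, y_over)


# Toy data (hypothetical): six majority points near the origin,
# two minority points near (5, 5).
X = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.1), (0.2, 0.9), (0.8, 0.7),
     (1.2, 1.1), (5.0, 5.0), (5.5, 5.2)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = smote_tomek(X, y, minority_label=1)
print("class counts after resampling:", y_res.count(0), y_res.count(1))
```

The oversampling step balances the classes and the cleaning step removes borderline pairs where the classes overlap. In practice a maintained library would normally be preferred over a hand-written version; imbalanced-learn, for example, provides `SMOTETomek` in `imblearn.combine`.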
Rai Hari Mohan, Chatterjee Kalyan, Dashkevych Serhii
Convolutional neural network (CNN), Deep learning, Electrocardiogram (ECG), Ensemble technique, Long short-term memory network (LSTM), Synthetic minority oversampling technique (SMOTE)