In Computers in Biology and Medicine
Class imbalance and the presence of irrelevant or redundant features in training data can pose serious challenges to the development of a classification framework. This paper proposes a framework for developing a Clinical Decision Support System (CDSS) that addresses both class imbalance and the feature selection problem. Under this framework, the dataset is balanced at the data level and a wrapper approach is used to perform feature selection. Three clinical datasets from the University of California Irvine (UCI) machine learning repository were used for experimentation: the Indian Liver Patient Dataset (ILPD), the Thoracic Surgery Dataset (TSD) and the Pima Indian Diabetes (PID) dataset. The Synthetic Minority Over-sampling Technique (SMOTE), enhanced using Orchard's algorithm, was used to balance the datasets. A wrapper approach that uses Chaotic Multi-Verse Optimisation (CMVO) was proposed for feature subset selection. The arithmetic mean of the Matthews correlation coefficient (MCC) and the F-score (F1), both measured using a Random Forest (RF) classifier, was used as the fitness function. After the relevant features were selected, an RF comprising 100 estimators and using the Information Gain Ratio as the split criterion was used for classification. The classifier achieved an MCC of 0.65, an F1 of 0.84 and an accuracy of 82.46% for the ILPD; an MCC of 0.74, an F1 of 0.87 and an accuracy of 86.88% for the TSD; and an MCC of 0.78, an F1 of 0.89 and an accuracy of 89.04% for the PID dataset. The effects of balancing and feature selection on the classifier were investigated, and the performance of the framework was compared with existing works in the literature. The results showed that the proposed framework is competitive in terms of the three performance measures used. The results of a Wilcoxon test confirmed the statistical superiority of the proposed method.
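The fitness function described above, the arithmetic mean of MCC and F1, can be illustrated with a minimal sketch. The function names (confusion_counts, mcc, f1, fitness) are hypothetical and are not taken from the paper; the sketch assumes binary labels with 1 as the positive class, and it scores a given set of predictions rather than training the RF classifier or running CMVO.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f1(tp, fp, fn):
    """F1 score; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def fitness(tp, fp, tn, fn):
    """Arithmetic mean of MCC and F1 (assumed form of the wrapper's fitness)."""
    return (mcc(tp, fp, tn, fn) + f1(tp, fp, fn)) / 2
```

For example, with 40 true positives, 10 false positives, 40 true negatives and 10 false negatives, MCC is 0.6 and F1 is 0.8, so fitness(40, 10, 40, 10) is approximately 0.7. In a wrapper setting, these counts would come from the classifier's predictions on the candidate feature subset.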
Sreejith S, Khanna Nehemiah H, Kannan A
Chaotic maps, Class imbalance, Classification, Clinical decision support system, Feature selection, Multi-Verse Optimisation, SMOTE