In Health information science and systems
Given the demand for efficient Machine Learning (ML) classification models for healthcare data, and the potential of Bio-Inspired Optimization (BIO) algorithms to tackle the problem of high-dimensional data, we investigate a range of ML classification models trained on an optimal subset of features of a Parkinson's disease (PD) data set for efficient PD classification. We used two BIO algorithms, Genetic Algorithm (GA) and Binary Particle Swarm Optimization (BPSO), to determine the optimal subset of features of the PD data set. The data set chosen for investigation comprises 756 observations (rows or records) over 755 attributes (columns, dimensions, or features) from 252 PD patients. We employed the maximum-absolute (MaxAbs) feature scaling method to normalize the data and the hold-out validation method to avoid biased results. Accordingly, the data were split into training and testing sets in a ratio of 70% to 30%. Subsequently, we applied the GA and BPSO algorithms separately to 11 ML classifiers (Logistic Regression (LR), linear Support Vector Machine (lSVM), radial basis function Support Vector Machine (rSVM), Gaussian Naïve Bayes (GNB), Gaussian Process Classifier (GPC), k-Nearest Neighbor (kNN), Decision Tree (DT), Random Forest (RF), Multilayer Perceptron (MLP), AdaBoost (AB), and Quadratic Discriminant Analysis (QDA)) to determine the optimal subset of features (that is, the reduction of dimensionality) contributing to the highest classification accuracy.
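The workflow described above (max-abs scaling, a 70/30 hold-out split, and a wrapper search over binary feature masks) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the nearest-centroid classifier is a stand-in for the 11 classifiers named above, the BPSO update uses the common sigmoid transfer function, and all function names and hyperparameters (`w`, `c1`, `c2`, swarm size, and scoring each mask on the held-out split) are hypothetical choices.

```python
import math
import random

def max_abs_scale(rows):
    """Divide each column by its maximum absolute value (all-zero columns stay as-is)."""
    n_cols = len(rows[0])
    scales = [max(abs(r[j]) for r in rows) or 1.0 for j in range(n_cols)]
    return [[r[j] / scales[j] for j in range(n_cols)] for r in rows]

def holdout_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle once and hold out a fixed fraction of the records for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

def centroid_accuracy(mask, X_tr, y_tr, X_te, y_te):
    """Fitness: hold-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in set(y_tr):
        members = [x for x, y in zip(X_tr, y_tr) if y == label]
        centroids[label] = [sum(x[j] for x in members) / len(members) for j in cols]
    def predict(x):
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
    return sum(predict(x) == y for x, y in zip(X_te, y_te)) / len(y_te)

def bpso(fitness, n_features, n_particles=10, n_iters=20,
         w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: real-valued velocities, bits resampled through a sigmoid."""
    rng = random.Random(seed)
    pos = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (best_pos[i][j] - pos[i][j])
                     + c2 * rng.random() * (g_pos[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid responsive
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], f
                if f > g_fit:
                    g_pos, g_fit = pos[i][:], f
    return g_pos, g_fit

# Synthetic stand-in data: 60 records, 8 features, only the first 2 informative.
data_rng = random.Random(1)
labels = [i % 2 for i in range(60)]
rows = [[(3.0 * y if j < 2 else 0.0) + data_rng.gauss(0, 1) for j in range(8)]
        for y in labels]

X = max_abs_scale(rows)
X_tr, y_tr, X_te, y_te = holdout_split(X, labels)
mask, acc = bpso(lambda m: centroid_accuracy(m, X_tr, y_tr, X_te, y_te), n_features=8)
print(sum(mask), "of 8 features kept; hold-out accuracy", round(acc, 3))
```

A GA-based search could reuse the same fitness function, replacing the velocity update with selection, crossover, and mutation over the bit masks.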
Among the GA-based classifiers, GA-inspired MLP produced the maximum dimensionality reduction of 52.32%, selecting only 359 features and delivering a classification accuracy of 85.1%, while GA-inspired AB delivered the maximum classification accuracy of 90.7%, with a dimensionality reduction of 41.43% from selecting only 441 features. Among the BPSO-based classifiers, BPSO-inspired GNB produced the maximum dimensionality reduction of 47.14%, selecting 396 features and delivering a classification accuracy of 79.3%, while BPSO-inspired MLP delivered the maximum classification accuracy of 89%, with a dimensionality reduction of 46.48% from selecting only 403 features.
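The relation between the number of selected features and the dimensionality reduction can be checked with a short calculation. The sketch below assumes that reduction is defined as the share of candidate features discarded, and that the baseline is 753 candidate features (an assumption, for example the 755 attributes less two non-predictor columns; the text does not state the baseline). Under that assumption the figures for 359, 441, and 403 features are reproduced, whereas 396 features gives 47.41% rather than the 47.14% stated above. The function name is hypothetical.

```python
def reduction_percent(n_selected, n_total):
    """Percentage of candidate features discarded by the selection step."""
    return 100.0 * (n_total - n_selected) / n_total

N_TOTAL = 753  # assumed baseline; not stated in the text

for name, n_selected in [("GA-MLP", 359), ("GA-AB", 441),
                         ("BPSO-GNB", 396), ("BPSO-MLP", 403)]:
    print(f"{name}: {n_selected} features -> "
          f"{reduction_percent(n_selected, N_TOTAL):.2f}% reduction")
```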
Pasha Akram, Latha P H
Binary particle swarm optimization, Bio-inspired computing, Classification, Data mining, Dimensionality reduction, Feature selection, Genetic algorithm, Machine learning