In Neural Networks: The Official Journal of the International Neural Network Society
Compared with the relatively easy task of feature creation or generation in data analysis, manual data labeling is usually time-consuming and labor-intensive. Even when automated data labeling eases this burden, the labeling results still need to be checked and verified manually. High-Dimension and Low-Sample-Size (HDLSS) data are therefore very common in data mining and machine learning. For classification problems on HDLSS data, traditional classifiers often give poor predictive performance because of data piling and the approximate equidistance between any two input points in high-dimensional space. In this paper, we propose a Maximum Decentral Projection Margin Classifier (MDPMC) in the framework of a Support Vector Classifier (SVC). In the MDPMC model, constraints that maximize the projection distance between the decentralized input points and their supporting hyperplane are integrated into the SVC model, in addition to maximizing the margin between the two supporting hyperplanes. Experimental results on ten real HDLSS datasets show that the proposed MDPMC approach deals well with the data-piling and approximate-equidistance problems. MDPMC achieves higher predictive accuracy and lower classification error than seven competing classifiers: SVC with a Linear Kernel (SVC-LK) and with a Radial Basis Function Kernel (SVC-RBFK), Distance Weighted Discrimination (DWD), weighted DWD (wDWD), Distance-Weighted Support Vector Machine (DWSVM), Population-Guided Large Margin Classifier (PGLMC), and Data Maximum Dispersion Classifier (DMDC).
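The abstract does not state the MDPMC optimization problem itself. As a rough, hypothetical sketch of the idea it describes, one could augment the standard soft-margin SVC primal with a decentral-projection term; the trade-off weight \lambda and the class-wise means \bar{x}_{y_i} used to decentralize the inputs are illustrative assumptions, and the dispersion objective is encoded here as a penalty rather than through constraints, so this is not the paper's actual formulation:

  \begin{aligned}
  \min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} \;+\; C\sum_{i=1}^{n}\xi_{i} \;-\; \lambda\sum_{i=1}^{n}\bigl(w^{\top}(x_{i}-\bar{x}_{y_{i}})\bigr)^{2} \\
  \text{s.t.}\quad & y_{i}\,(w^{\top}x_{i}+b)\;\ge\;1-\xi_{i}, \qquad \xi_{i}\ge 0, \quad i=1,\dots,n.
  \end{aligned}

The first two terms are the usual SVC margin and slack penalty; the hypothetical third term rewards large projections of the decentralized inputs x_i - \bar{x}_{y_i} onto the normal direction w, spreading points away from the supporting hyperplanes w^{\top}x + b = \pm 1 and thereby counteracting data piling. As written, the extra term makes the objective non-convex, so the paper's constraint-based formulation presumably handles tractability differently.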
Zhang Zhiwang, He Jing, Cao Jie, Li Shuqing
2022-Oct-22
Classification, High dimension, Low sample size, Support vector classifier