In IEEE Transactions on Neural Networks and Learning Systems
Attribute reduction, also called feature selection, is one of the most important issues in rough set theory and is regarded as a vital preprocessing step in pattern recognition, machine learning, and data mining. High-dimensional mixed and incomplete data sets are now very common in real-world applications, and selecting a promising feature subset from such data sets is an interesting but challenging problem. Almost all existing methods generate a cover on the space of objects to determine important features. However, some tolerance classes in the cover are useless for the computational process. This article therefore introduces a new concept, the stripped neighborhood cover, which removes unnecessary tolerance classes from the original cover. Based on the proposed stripped neighborhood cover, we define a new reduct in mixed and incomplete decision tables and design an efficient heuristic algorithm to find this reduct. In each iteration of the algorithm's main loop, we use an error measure to select an optimal feature and add it to the selected feature subset. In addition, to deal more efficiently with high-dimensional data sets, we identify redundant features after each iteration and remove them from the candidate feature subset. To verify the performance of the proposed algorithm, we carry out experiments on data sets downloaded from public data sources and compare the results with existing state-of-the-art algorithms. The experimental results show that our algorithm outperforms the compared algorithms, especially in classification accuracy.
Thuy Nguyen Ngoc, Wongthanavasu Sartra
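
The greedy selection loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the tolerance relation (missing values match anything, numeric values match within a threshold `delta`), the stripping rule (drop tolerance classes that are already pure with respect to the decision), and the error measure (number of remaining classes) are all assumptions made for illustration, and the redundant-feature pruning step is omitted. All function names are hypothetical.

```python
def tolerant(a, b, delta=0.1):
    """Assumed tolerance relation for mixed, incomplete data:
    missing (None) matches anything, numbers match within delta,
    everything else must be equal."""
    if a is None or b is None:
        return True
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) <= delta
    return a == b


def tolerance_classes(rows, attrs, delta=0.1):
    """Cover of the object space: one tolerance class (set of row indices) per object."""
    n = len(rows)
    return [
        {j for j in range(n)
         if all(tolerant(rows[i][a], rows[j][a], delta) for a in attrs)}
        for i in range(n)
    ]


def stripped_cover(rows, labels, attrs, delta=0.1):
    """Assumed stripping rule: keep only classes that mix decision labels."""
    return [c for c in tolerance_classes(rows, attrs, delta)
            if len({labels[j] for j in c}) > 1]


def error(rows, labels, attrs, delta=0.1):
    """Assumed error measure: size of the stripped cover."""
    return len(stripped_cover(rows, labels, attrs, delta))


def greedy_reduct(rows, labels, delta=0.1):
    """Forward selection: add the feature with the lowest error until the
    selected subset is as good as the full feature set."""
    all_attrs = list(range(len(rows[0])))
    target = error(rows, labels, all_attrs, delta)
    selected, candidates = [], list(all_attrs)
    while candidates and error(rows, labels, selected, delta) > target:
        best = min(candidates,
                   key=lambda a: error(rows, labels, selected + [a], delta))
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy decision table: one numeric column, two categorical columns, one missing value.
rows = [(1.0, "a", "x"), (1.05, "a", "y"), (5.0, "b", "x"), (5.02, None, "y")]
labels = [0, 0, 1, 1]
print(greedy_reduct(rows, labels))
```

On this toy table the numeric column alone separates the two decision classes, so the loop stops after one iteration. Recomputing the cover from scratch for every candidate is quadratic in the number of objects; it is kept here only for readability.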