In Neural networks : the official journal of the International Neural Network Society
Document classification aims to assign one or more classes to a document for ease of management by understanding the content of a document. Hierarchical attention network (HAN) has been showed effective to classify documents that are ambiguous. HAN parses information-intense documents into slices (i.e., words and sentences) such that each slice can be learned separately and in parallel before assigning the classes. However, introducing hierarchical attention approach leads to the redundancy of training parameters which is prone to overfitting. To mitigate the concern of overfitting, we propose a variant of hierarchical attention network using adversarial and virtual adversarial perturbations in 1) word representation, 2) sentence representation and 3) both word and sentence representations. The proposed variant is tested on eight publicly available datasets. The results show that the proposed variant outperforms the hierarchical attention network with and without using random perturbation. More importantly, the proposed variant achieves state-of-the-art performance on multiple benchmark datasets. Visualizations and analysis are provided to show that perturbation can effectively alleviate the overfitting issue and improve the performance of hierarchical attention network.
Poon Hoon-Keng, Yap Wun-She, Tee Yee-Kai, Lee Wai-Kong, Goi Bok-Min
Adversarial training, Machine learning, Neural network, Small-scale datasets, Text classification