
In Annals of translational medicine

Background : Entity relation extraction is an important task in constructing professional knowledge graphs in the medical field. Research on entity relation extraction from medical academic books has shown that the number of instances differs greatly across entity relation classes. The result is a typical unbalanced data set whose relations are difficult to recognize but which has research value.

Methods : In this article, we propose a new entity relation extraction method based on data augmentation. The probability that a text is augmented during training is calculated from the distribution of the individual entity relation classes in the data set. In text-oriented data augmentation, different augmentation methods perform differently in different language environments, so reinforcement learning is used to determine which data augmentation method to apply in the current language environment. This strategy was applied to entity relation extraction from a medical professional book, the Pharmacopoeia of the People's Republic of China, and three settings (no data augmentation, traditional data augmentation, and reinforcement learning-based data augmentation) were compared under the same neural network model.
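The two ideas in the Methods paragraph can be illustrated with a minimal sketch. The abstract does not give the formula for the augmentation probability or name the reinforcement learning algorithm, so everything below is an assumption made for illustration: the rule `p = 1 - count / max_count`, the epsilon-greedy bandit used as a simple stand-in for the reinforcement learning component, and all names (`augmentation_probabilities`, `EpsilonGreedySelector`, the method labels) are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def augmentation_probabilities(labels):
    """Map each relation class to a probability of augmenting its samples.

    Assumed rule (not from the paper): p = 1 - count / max_count, so the
    largest class is never augmented and rarer classes are augmented more often.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: 1.0 - n / largest for cls, n in counts.items()}


class EpsilonGreedySelector:
    """Hypothetical stand-in for the reinforcement learning component.

    Treats each augmentation method as a bandit arm and keeps a running
    mean of the reward observed for it (for example, a validation score).
    """

    def __init__(self, methods, epsilon=0.1, seed=0):
        self.methods = list(methods)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {m: 0 for m in self.methods}
        self.value = {m: 0.0 for m in self.methods}

    def choose(self):
        # Explore with probability epsilon, otherwise pick the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.methods)
        return max(self.methods, key=lambda m: self.value[m])

    def update(self, method, reward):
        # Incremental update of the mean reward for the chosen method.
        self.pulls[method] += 1
        self.value[method] += (reward - self.value[method]) / self.pulls[method]


if __name__ == "__main__":
    labels = ["contains"] * 8 + ["treats"] * 2
    print(augmentation_probabilities(labels))

    selector = EpsilonGreedySelector(["synonym_replacement", "random_swap"])
    method = selector.choose()
    selector.update(method, reward=0.6)  # reward value is illustrative only
```

Under these assumptions, a training loop would draw a random number per sample, augment the sample when the draw falls below its class probability, ask the selector which method to use, and feed back a reward once the effect on the model is measured. A contextual policy that conditions on the text itself would be closer to choosing a method "in the current language environment", but the abstract gives no detail on that design.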

Results : The deep-learning model using data augmentation was better than the model without data augmentation: data augmentation significantly improved the evaluation indicators of the relation classes with low data volumes in the unbalanced data set and slightly improved those of the relation classes with sufficient features and large data volumes. Additionally, the deep-learning model using reinforcement learning-based data augmentation was superior to the deep-learning model using traditional data augmentation. With reinforcement learning-based data augmentation, the evaluation indicators of multiple relation classes were much better than without it.

Conclusions : For unbalanced data sets, data augmentation can effectively improve the ability of the deep-learning model to obtain data features, and reinforcement learning-based data augmentation can further enhance this ability. Our experiments confirmed the superiority of reinforcement learning-based data augmentation.

Wang Anli, Li Linyi, Wu Xuehong, Zhu Jianping, Yu Shanshan, Chen Xi, Li Jianhua, Zhu Hongtao

2022-Oct

Data augmentation, medical entity and relation extraction, reinforcement learning, unbalanced data set