In Studies in Health Technology and Informatics; h5-index 23.0

This paper explores a methodology for quantifying bias in transformer-based deep neural network language models for Chinese, English, and French. When the models are queried with health-related COVID-19 mythbusters, we observe a bias that is syntactic rather than semantic or encyclopaedic in nature, as predicted by theoretical insights into structural complexity. Our results highlight the need to create health-communication corpora to serve as training sets for deep learning models.

Samo Giuseppe, Bonan Caterina, Si Fuzhen


Keywords: COVID-19, Corpora, Knowledge Reproduction, Language Models, Natural Language Processing