ArXiv Preprint
The medical domain is receiving increasing attention in applications
involving Artificial Intelligence. In their everyday work, clinicians have to
deal with an enormous amount of unstructured textual data to draw conclusions
about patients' health. Argument mining helps to provide a
structure to such data by detecting argumentative components in the text and
classifying the relations between them. However, as is the case for many
tasks in Natural Language Processing in general and in medical text processing
in particular, the large majority of the work on computational argumentation
has been done only for English. This is also the case with the only dataset
available for argumentation in the medical domain, namely, the annotated
medical data of abstracts of Randomized Controlled Trials (RCT) from the
MEDLINE database. In order to mitigate the lack of annotated data for other
languages, we empirically investigate several strategies to perform argument
mining and classification in medical texts for a language for which no
annotated data is available. This project shows that automatically translating
and projecting annotations from English to a target language (Spanish) is an
effective way to generate annotated data without manual intervention.
Furthermore, our experiments demonstrate that the translation and projection
approach outperforms zero-shot cross-lingual approaches using a large masked
multilingual language model. Finally, we show how the automatically generated
data in Spanish can also be used to improve results in the original English
evaluation setting.
Anar Yeginbergenova, Rodrigo Agerri
2023-01-25
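
The translate-and-project strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the projection is carried out, so everything below is an assumption: BIO-tagged argument components on the English side, a precomputed word alignment given as `(source_index, target_index)` pairs, and a simple heuristic that labels the smallest contiguous target span covering all aligned tokens. The function names `bio_spans` and `project_labels`, the `Claim` label, and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def bio_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Return (type, start, end) for each BIO span; end is inclusive."""
    spans = []
    start, kind = None, None
    for i, tag in enumerate(labels + ["O"]):  # trailing "O" closes an open span
        if start is not None and not tag.startswith("I-"):
            spans.append((kind, start, i - 1))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def project_labels(
    src_labels: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO labels onto target tokens through a word alignment.

    Assumed heuristic: each source span is mapped to the smallest contiguous
    target span that covers every target token aligned to it.
    """
    tgt = ["O"] * tgt_len
    for kind, start, end in bio_spans(src_labels):
        hits = [t for (s, t) in alignment if start <= s <= end]
        if not hits:
            continue  # nothing aligned, so the span is dropped
        lo, hi = min(hits), max(hits)
        tgt[lo] = "B-" + kind
        for j in range(lo + 1, hi + 1):
            tgt[j] = "I-" + kind
    return tgt


# Hypothetical example: English source and its Spanish translation.
src_tokens = ["Treatment", "reduced", "pain", "significantly", "."]
src_labels = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "O"]
tgt_tokens = ["El", "tratamiento", "redujo", "el", "dolor", "significativamente", "."]
alignment = [(0, 1), (1, 2), (2, 4), (3, 5), (4, 6)]

tgt_labels = project_labels(src_labels, alignment, len(tgt_tokens))
for token, label in zip(tgt_tokens, tgt_labels):
    print(f"{token}\t{label}")
```

In this example the unaligned determiner "El" at the start of the Spanish sentence stays outside the projected span, which shows one way that alignment gaps can shift component boundaries; a real pipeline would likely need additional post-processing for such cases.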