ArXiv Preprint
According to the Global Burden of Disease list provided by the World Health
Organization (WHO), mental disorders are among the most debilitating
disorders. In recent years, researchers have tried to identify individual
biomarkers to improve diagnosis and therapy effectiveness. Gathering
neurobiological data, however, is costly and time-consuming. Another potential
source of information, which is already part of the clinical routine, is
therapist-patient dialogues. While there are some pioneering works
investigating the role of language as a predictor of various therapeutic
parameters, for example patient-therapist alliance, there are no large-scale
studies. A major obstacle to conducting such studies is the limited
availability of sizeable datasets, which are needed to train machine learning
models. While
these conversations are part of the daily routine of clinicians, gathering them
is usually hindered by various ethical (purpose of data usage), legal (data
privacy) and technical (data formatting) limitations. Some of these limitations
are particular to the domain of therapy dialogues, such as the increased
difficulty of anonymisation or the transcription of the recordings. In this
paper, we elaborate on the challenges we faced in starting our collection of
therapist-patient dialogues in a psychiatry clinic under the General Data
Protection Regulation of the European Union, with the goal of using the data for
Natural Language Processing (NLP) research. We give an overview of each step in
our procedure and point out the potential pitfalls to motivate further research
in this field.
Tobias Mayer, Neha Warikoo, Oliver Grimm, Andreas Reif, Iryna Gurevych
2022-11-22