ArXiv Preprint
Classification algorithms using Transformer architectures can be affected by
the sequence length learning problem whenever observations from different
classes have different length distributions. This problem leads models to use
sequence length as a predictive feature instead of relying on important textual
information. Although most public datasets are not affected by this problem,
private corpora for fields such as medicine and insurance may carry this data
bias. This poses challenges throughout the value chain when these corpora are
used in a machine learning application. In this paper, we empirically expose
this problem and present approaches to minimize its impact.
Jean-Thomas Baillargeon, Luc Lamontagne
2022-12-16
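
One way to get a feel for the sequence length learning problem described in the abstract is to check how well class labels can be predicted from length alone. The sketch below is only an illustration and is not the authors' procedure: the helper `length_only_accuracy`, the whitespace tokenization, and the synthetic two-class corpus are all assumptions made for this example. It fits a single length threshold on the same data it scores, so the result is a rough diagnostic rather than a held-out estimate.

```python
import random


def length_only_accuracy(texts, labels):
    """Best accuracy of a single-threshold rule on token count (labels are 0/1).

    Hypothetical helper for illustration: tokens are whitespace-separated words.
    """
    lengths = [len(t.split()) for t in texts]
    best = 0
    for threshold in sorted(set(lengths)):
        # Rule: predict class 1 when the text is longer than the threshold.
        correct = sum((n > threshold) == bool(y) for n, y in zip(lengths, labels))
        # Also consider the flipped rule, since either class may be the longer one.
        best = max(best, correct, len(labels) - correct)
    return best / len(labels)


def make_text(n_tokens):
    """Build a content-free text with exactly n_tokens identical tokens."""
    return " ".join("tok" for _ in range(n_tokens))


# Synthetic corpus (an assumption, not data from the paper): the two classes
# share the same vocabulary and differ only in their length distributions.
rng = random.Random(0)
short_texts = [make_text(rng.randint(5, 20)) for _ in range(200)]
long_texts = [make_text(rng.randint(15, 40)) for _ in range(200)]
texts = short_texts + long_texts
labels = [0] * 200 + [1] * 200

acc = length_only_accuracy(texts, labels)
print(f"length-only accuracy: {acc:.2f}")
```

Because every text here consists of the same token, any accuracy above chance can only come from length. On a real corpus, a length-only score well above the majority-class rate would suggest that a Transformer classifier could exploit the same shortcut.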