DATA CARTOGRAPHY BASED AUGMENTATION TECHNIQUES FOR STANCE DETECTION

Bianca-Ștefania MUȘAT; Cornelia CARAGEA; Florentina HRISTEA

doi:10.62229/aubinf63/93-113

Authors

Bianca-Ștefania MUȘAT, Quant Risk Analyst at London Stock Exchange Group
Cornelia CARAGEA, University of Illinois at Chicago, USA
Florentina HRISTEA, University of Bucharest, Romania

DOI:

https://doi.org/10.62229/aubinf63/93-113

Keywords:

stance detection, data cartography, training dynamics, data augmentation

Abstract

Stance detection is the task of determining whether the information conveyed in a text is against, neutral toward, or in favor of a particular target. Because positions can be adopted on a plethora of targets, one common challenge of the stance detection task is the scarcity of annotations. At the same time, an emphasis on data quantity frequently comes at the expense of data quality. To address both challenges, we propose two data augmentation techniques that leverage training dynamics (the model's behavior on individual instances during training) to identify and combine data instances with differing properties, which can, for example, improve the model's generalization capabilities or enhance its optimization process. The first data augmentation method uses training dynamics to generate additional virtual samples during model training by interpolating existing annotated samples with differing characteristics. The second data augmentation approach is defined as a conditional masked language modeling task that generates additional samples by predicting the masked words of the input sentence, conditioned not only on its context but also on an auxiliary sentence sampled based on its characteristics. We empirically validated that fine-tuning a pre-trained language model on a subset of the training data that excludes the instances harming the training process achieves better performance than fine-tuning the same model on the entire training dataset. Moreover, in most cases, the performance of existing augmentation approaches also improved when data with differing properties, rather than randomly sampled data, was used during the augmentation process.
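
The abstract does not spell out how training dynamics are computed or how samples are interpolated. As a minimal sketch only, the following assumes the statistics commonly used in data cartography (confidence as the mean probability assigned to the gold label across epochs, variability as its standard deviation) and a mixup-style convex combination for the interpolation step. The function names, the region thresholds, and the choice of interpolating feature vectors are hypothetical illustrations, not the authors' implementation.

```python
from statistics import mean, pstdev


def training_dynamics(gold_probs):
    """Summarize one instance from its per-epoch gold-label probabilities.

    gold_probs: probabilities the model assigned to the correct label,
    one value per training epoch (assumed to be logged during fine-tuning).
    """
    confidence = mean(gold_probs)      # mean gold-label probability
    variability = pstdev(gold_probs)   # spread of that probability across epochs
    return confidence, variability


def categorize(confidence, variability, conf_hi=0.75, conf_lo=0.35, var_hi=0.2):
    """Assign a data-map region. The thresholds are illustrative assumptions."""
    if variability >= var_hi:
        return "ambiguous"
    if confidence >= conf_hi:
        return "easy-to-learn"
    if confidence <= conf_lo:
        return "hard-to-learn"
    return "other"


def interpolate(x_a, y_a, x_b, y_b, lam):
    """Mixup-style virtual sample: convex combination of two feature
    vectors and of their one-hot label vectors, with weight lam in [0, 1]."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# Hypothetical usage: pair an easy-to-learn instance with an ambiguous one.
easy = training_dynamics([0.9, 0.9, 0.9])
ambiguous = training_dynamics([0.1, 0.9, 0.1, 0.9])
regions = (categorize(*easy), categorize(*ambiguous))
virtual_x, virtual_y = interpolate(
    [1.0, 0.0], [1.0, 0.0, 0.0],   # features and "against" label
    [0.0, 1.0], [0.0, 0.0, 1.0],   # features and "favor" label
    lam=0.5,
)
```

Under these assumptions, instances would be paired according to their data-map regions rather than drawn at random, which is the contrast the abstract draws with random sampling.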

Author Biographies

Bianca-Ștefania MUȘAT, Quant Risk Analyst at London Stock Exchange Group

Quant Risk Analyst at London Stock Exchange Group

Cornelia CARAGEA, University of Illinois at Chicago, USA

Full Professor in the Department of Computer Science at the University of Illinois at Chicago, USA, and Adjunct Associate Professor at Kansas State University, USA

Florentina HRISTEA, University of Bucharest, Romania

Full Professor Univ. Dr. in the Department of Computer Science at the University of Bucharest, Romania
