Publication details
Bibliographic description
Data augmentation for sentiment analysis in English – the online approach / Michał JUNGIEWICZ, Aleksander SMYWIŃSKI-POHL // In: Artificial neural networks and machine learning – ICANN 2020 : 29th International Conference on Artificial Neural Networks : Bratislava, Slovakia, September 15–18, 2020 : proceedings, Pt. 2 / eds. Igor Farkaš, Paolo Masulli, Stefan Wermter. — Cham : Springer Nature Switzerland, cop. 2020. — (Lecture Notes in Computer Science ; ISSN 0302-9743 ; LNCS 12397. Theoretical Computer Science and General Issues ; ISSN 0302-9743). — ISBN: 978-3-030-61615-1; e-ISBN: 978-3-030-61616-8. — pp. 584–595. — Bibliography, abstract. — Publication available online since: 2020-10-14
Authors (2)
Keywords
Bibliometric data
| BaDAP ID | 130729 |
|---|---|
| Date added to BaDAP | 2020-10-20 |
| DOI | 10.1007/978-3-030-61616-8_47 |
| Publication year | 2020 |
| Publication type | conference proceedings (author) |
| Open access | |
| Publisher | Springer |
| Conference | International Conference on Artificial Neural Networks 2020 |
| Journals/series | Lecture Notes in Computer Science, Theoretical Computer Science and General Issues |
Abstract
This paper investigates a change of approach to textual data augmentation for sentiment classification: switching from offline to online data modification, i.e., from transforming the data before training starts to transforming samples during the training process. This makes it possible to exploit the classifier's current loss. We try training with augmented examples that maximize the loss, minimize it, or are sampled at random, and observe that the loss-maximizing variant performs best in most cases. We use 2 neural network architectures and 3 data augmentation methods, and test them on 4 different datasets. Our experiments indicate that the switch to online data augmentation improves the results for recurrent neural networks in all cases and for convolutional networks in some cases. The improvement in accuracy reaches 2.3% above the baseline averaged over all datasets, and 2.25% on one of the datasets averaged over dataset sizes.
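The selection step the abstract describes – generating augmented variants of a sample during training and picking the one that maximizes (or minimizes) the classifier's current loss – can be sketched as follows. This is a minimal illustration under assumed interfaces, not the authors' implementation: `augmenters` (a list of text-transforming functions) and `loss_fn` (the classifier's current per-sample loss) are hypothetical placeholders.

```python
import random

def select_augmented(sample, augmenters, loss_fn, mode="max"):
    """Pick one augmented variant of `sample` for the current training step.

    mode="max"  -> variant with the highest current loss (best in the paper)
    mode="min"  -> variant with the lowest current loss
    mode="rand" -> a randomly chosen variant
    """
    # Generate candidate variants with every augmentation method.
    candidates = [aug(sample) for aug in augmenters]
    if mode == "rand":
        return random.choice(candidates)
    # Score each candidate with the classifier's current loss and
    # keep the extremal one; this is the "online" part, since the
    # loss changes as training progresses.
    selector = max if mode == "max" else min
    return selector(candidates, key=loss_fn)
```

In an online training loop this function would be called once per sample per step, so the choice of variant adapts to the classifier's evolving loss surface rather than being fixed before training, as in the offline setting.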