Publication details
Bibliographic description
Text annotation automation for hate speech detection using SVM-classifier based on feature extraction / Shoffan SAIFULLAH, Nur Heri Cahyana, Yuli Fauziah, Agus Sasmito Aribowo, Felix Andika DWIYANTO, Rafał DREŻEWSKI // In: ICSSET 2022 [Electronic document] : 2nd International Conference Series on Science, Engineering, and Technology : 22 June 2022, Sidoarjo, Indonesia. - Windows version. - Text data. - [Indonesia] : AIP Publishing, [2024]. - (AIP Conference Proceedings ; ISSN 0094-243X ; vol. 3167). - e-ISBN: 978-0-7354-5005-9. - pp. 040003-1–040003-7. - System requirements: Adobe Reader. - Bibliography: pp. 040003-5–040003-7, abstract. - S. Saifullah, additional affiliation: University of Pembangunan Nasional Veteran Yogyakarta, Yogyakarta, Indonesia
Authors (6)
- Saifullah Shoffan (AGH)
- Cahyana Nur Heri
- Fauziah Yuli
- Aribowo Agus Sasmito
- Dwiyanto Felix Andika (AGH)
- Dreżewski Rafał (AGH)
Keywords
Bibliometric data
| BaDAP ID | 154726 |
|---|---|
| Date added to BaDAP | 2024-08-06 |
| Source text | URL |
| DOI | 10.1063/5.0218034 |
| Publication year | 2024 |
| Publication type | conference proceedings (authored) |
| Open access | |
| Publisher | American Institute of Physics |
| Journal/series | AIP Conference Proceedings |
Abstract
This article aims to develop a semi-supervised method for automatically annotating hate speech in social media using natural language processing (NLP) techniques. The approach is based on a Support Vector Machine (SVM) classifier that combines feature extraction algorithms, including ensemble meta-learners and meta-vectorizers. The system was trained on a dataset of 13,169 elements, and the results show that the accuracy of the model is highly dependent on the feature extraction method used. The optimal automatic annotation was achieved using TF-IDF feature extraction, resulting in an accuracy of 92.5%. The implications of this study are that automated hate speech annotation using NLP techniques can significantly improve the accuracy, reliability, and inclusiveness of identifying hate speech online. The results of this study suggest that SVM and TF-IDF are the most suitable methods for this task.