Publication details

Bibliographic description

Automated text annotation using a semi-supervised approach with meta vectorizer and machine learning algorithms for hate speech detection / Shoffan SAIFULLAH, Rafał DREŻEWSKI, Felix Andika DWIYANTO, Agus Sasmito Aribowo, Yuli Fauziah, Nur Heri Cahyana // Applied Sciences (Basel) [Electronic document]. — Electronic journal ; ISSN 2076-3417. — 2024 — vol. 14 iss. 3 art. no. 1078, pp. 1–19. — System requirements: Adobe Reader. — Bibliogr. pp. 17–19, Abstr. — Publication available online since: 2024-01-26. — S. Saifullah - additional affiliation: Department of Informatics, Universitas Pembangunan Nasional Veteran Yogyakarta, Indonesia. — R. Dreżewski - additional affiliation: Artificial Intelligence Research Group (AIRG), Informatics Department, Faculty of Industrial Technology, Universitas Ahmad Dahlan, Indonesia. — F. A. Dwiyanto - additional affiliation: Department of Electrical Engineering, Universitas Negeri Malang, Malang, Indonesia


Authors (6)


Keywords

text mining, semi-supervised learning, self-learning, sentiment analysis, hate speech detection, machine learning

Bibliometric data

BaDAP ID: 151683
Date added to BaDAP: 2024-01-30
Source text: URL
DOI: 10.3390/app14031078
Publication year: 2024
Publication type: journal article
Open access: yes
Creative Commons
Journal/series: Applied Sciences (Basel)

Abstract

Text annotation is an essential element of natural language processing approaches. Manual annotation by humans has various drawbacks, such as subjectivity, slowness, fatigue, and possible carelessness. In addition, annotators may annotate ambiguous data. Therefore, we have developed a concept of automated annotation that aims to obtain the best annotations using several machine learning approaches. The proposed approach is based on an ensemble algorithm of meta-learners and meta-vectorizer techniques. It employs a semi-supervised learning technique for automated annotation to detect hate speech. This involves leveraging various machine learning algorithms, including Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (KNN), and Naive Bayes (NB), in conjunction with the Word2Vec and TF-IDF text extraction methods. The annotation process is performed on 13,169 Indonesian YouTube comments. The proposed model uses a stemming approach with data from Sastrawi and new data of 2245 words. Semi-supervised learning uses 5%, 10%, and 20% of labeled data, compared to performing labeling based on 80% of the datasets. In semi-supervised learning, the model learns from the labeled data, which provides explicit information, and from the unlabeled data, which offers implicit insights. This hybrid approach, based on self-learning, enables the model to generalize and make informed predictions even when limited labeled data is available. Ultimately, this enhances its ability to handle real-world scenarios with scarce annotated information. In addition, the proposed method uses several thresholds for matching words labeled as hate speech: 0.6, 0.7, 0.8, and 0.9. The experiments indicated that the DT-TF-IDF model has the best accuracy, 97.1%, in the 5%:80%:0.9 scenario. However, several other methods achieve accuracy above 90% with both text extraction methods in several test scenarios, such as SVM (TF-IDF and Word2Vec) and KNN (Word2Vec).
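
The self-learning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the threshold acts as a minimum predicted-class probability for accepting a pseudo-label, it uses a small hand-written multinomial Naive Bayes on raw word counts instead of the TF-IDF or Word2Vec features and the ensemble from the paper, and the names `ToyNaiveBayes`, `self_train`, `seed`, and `comments`, as well as the English toy comments, are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; the paper additionally applies stemming.
    return text.lower().split()


class ToyNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative stand-in)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, doc):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(self.class_counts[c] / total_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            log_scores[c] = score
        # Normalize the log scores into probabilities.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=10):
    """Repeatedly pseudo-label unlabeled texts whose confidence >= threshold."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = ToyNaiveBayes().fit([d for d, _ in labeled], [y for _, y in labeled])
        accepted, remaining = [], []
        for doc in pool:
            probs = model.predict_proba(doc)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                accepted.append((doc, label))
            else:
                remaining.append(doc)
        if not accepted:  # nothing confident enough: stop early
            break
        labeled.extend(accepted)
        pool = remaining
    # Retrain once more so the returned model reflects every accepted label.
    model = ToyNaiveBayes().fit([d for d, _ in labeled], [y for _, y in labeled])
    return model, labeled, pool


# Hypothetical seed annotations and unlabeled comments.
seed = [
    ("i hate you idiot", "hate"),
    ("you are stupid idiot", "hate"),
    ("nice video thanks", "neutral"),
    ("great song love it", "neutral"),
]
comments = ["stupid idiot", "love this great video", "hello"]

model, labeled, pool = self_train(seed, comments, threshold=0.8)
print(labeled[len(seed):])  # pseudo-labeled comments
print(pool)                 # comments left unlabeled
```

In this sketch a higher threshold admits fewer pseudo-labels per round, which trades annotation coverage for label quality; comments that never reach the threshold stay in the unlabeled pool.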

Publications that may interest you

article
Semi-supervised text annotation for hate speech detection using K-nearest neighbors and term frequency-inverse document frequency / Nur Heri Cahyana, Shoffan SAIFULLAH, Yuli Fauziah, Agus Sasmito Aribowo, Rafał DREŻEWSKI // International Journal of Advanced Computer Science and Applications (IJACSA) ; ISSN 2158-107X. — 2022 — vol. 13 no. 10, pp. 147-151. — Bibliogr. pp. 150-151, Abstr. — S. Saifullah - additional affiliation: Department of Informatics, Universitas Pembangunan Nasional Veteran Yogyakarta, Yogyakarta, Indonesia
book chapter
Sentiment analysis using machine learning approach based on feature extraction for anxiety detection / Shoffan SAIFULLAH, Rafał DREŻEWSKI, Felix Andika DWIYANTO, Agus Sasmito Aribowo, Yuli Fauziah // In: Computational Science – ICCS 2023 : 23rd international conference : Prague, Czech Republic, July 3–5, 2023 : proceedings, Pt. 2 / eds. Jiří Mikyška [et al.]. — Cham, Switzerland : Springer, cop. 2023. — (Lecture Notes in Computer Science ; ISSN 0302-9743 ; LNCS 14074). — ISBN: 978-3-031-36020-6; e-ISBN: 978-3-031-36021-3. — pp. 365–372. — Bibliogr., Abstr. — Publication available online since: 2023-06-26. — S. Saifullah - additional affiliation: Universitas Pembangunan Nasional Veteran Yogyakarta, Indonesia