Publication details

Bibliographic description

Semi-supervised sentiment classification using self-learning and enhanced co-training / Agus Sasmito Aribowo, Siti Khomsah, Shoffan SAIFULLAH // Jurnal Infotel ; ISSN 2085-3688. — 2025 — vol. 17 no. 3, pp. 472–489. — Bibliography pp. 487–489, Abstract. — Available online from: 2025-08-31

Authors (3)

Keywords

co-training; sentiment analysis; semi-supervised learning; self-learning; SSL

Bibliometric data

BaDAP ID: 162622
Date added to BaDAP: 2025-09-19
Source text: URL
DOI: 10.20895/infotel.v17i3.1344
Publication year: 2025
Publication type: journal article
Open access: yes
License: Creative Commons
Journal/series: Jurnal Infotel

Abstract

Sentiment classification is usually done manually by humans, and manual sentiment labeling is ineffective. Automated labeling using machine learning is therefore essential. Building an automated labeling model is challenging when labeled data are scarce, because scarcity can reduce model accuracy. This study proposes a semi-supervised learning (SSL) framework for sentiment analysis with limited labeled data. The framework integrates self-learning and enhanced co-training. The co-training model combines three machine learning methods: Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR). We use TF-IDF and FastText for feature extraction. The co-training model generates pseudo-labels; the pseudo-labels from the three models (SVM, RF, LR) are then compared and the one with the highest confidence is selected, a step we call self-learning. The framework is applied to English and Indonesian datasets, and each dataset was run five times. The performance difference between the baseline model (without pseudo-labels) and SSL (with pseudo-labels) is not significant; the Wilcoxon signed-rank test confirms this, obtaining a p-value < 0.05. Results show that SSL produces pseudo-labels for unlabeled data whose quality is close to that of the original labels. Although the significance test gives good results on four datasets, SSL has not yet surpassed the performance of supervised classification (the baseline). Labeling with SSL proves more efficient than manual labeling: processing takes around 10–20 minutes to label thousands to tens of thousands of samples. In conclusion, self-learning in SSL with co-training can effectively label unlabeled data in multilingual and limited datasets, but it has not yet converged across various datasets.
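The self-learning selection step described in the abstract (pick, for each unlabeled sample, the pseudo-label from whichever of SVM, RF, or LR is most confident) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_pseudo_labels`, the confidence `threshold`, and the hard-coded probability values are all hypothetical. In practice the per-class probabilities would come from trained classifiers, for example scikit-learn's `predict_proba` (which for `SVC` requires `probability=True`).

```python
from typing import Dict, List, Optional, Tuple


# Hypothetical helper (not the authors' code): for each unlabeled sample,
# keep the prediction of whichever classifier is most confident.
def select_pseudo_labels(
    probas: Dict[str, List[List[float]]],
    classes: List[str],
    threshold: float = 0.0,
) -> List[Optional[Tuple[str, str, float]]]:
    """Return (label, model_name, confidence) per sample, or None when the
    best confidence is below `threshold` (an assumed, tunable cutoff)."""
    model_names = list(probas)
    n_samples = len(probas[model_names[0]])
    results: List[Optional[Tuple[str, str, float]]] = []
    for i in range(n_samples):
        best: Optional[Tuple[str, str, float]] = None
        for name in model_names:
            row = probas[name][i]
            # Index of the class this model considers most likely
            k = max(range(len(row)), key=row.__getitem__)
            # Strict '>' means ties go to the model listed first
            if best is None or row[k] > best[2]:
                best = (classes[k], name, row[k])
        # Discard samples whose winning confidence is too low
        results.append(best if best is not None and best[2] >= threshold else None)
    return results


# Toy per-class probabilities standing in for SVM, RF and LR outputs
classes = ["negative", "positive"]
probas = {
    "SVM": [[0.40, 0.60], [0.55, 0.45], [0.50, 0.50]],
    "RF": [[0.30, 0.70], [0.20, 0.80], [0.52, 0.48]],
    "LR": [[0.10, 0.90], [0.65, 0.35], [0.49, 0.51]],
}
labels = select_pseudo_labels(probas, classes, threshold=0.6)
print(labels)  # [('positive', 'LR', 0.9), ('positive', 'RF', 0.8), None]
```

Note the second sample: SVM and LR both lean negative, yet RF's 0.80 confidence wins. Max-confidence selection of this kind can therefore disagree with a simple majority vote; which behavior is preferable is a design choice the sketch does not settle.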
