Publication details

Bibliographic description

The evaluation of effects of oversampling and word embedding on sentiment analysis / Nur Heri Cahyana, Yuli Fauziah, Wisnalmawati, Agus Sasmito Aribowo, Shoffan SAIFULLAH // Jurnal Infotel ; ISSN 2085-3688. — 2025 — vol. 17 no. 1, pp. 54–67. — Bibliogr. pp. 66–67, Abstr. — Publication available online since: 2025-04-17

Authors (5)

Keywords

oversampling, sentiment analysis, word embedding

Bibliometric data

BaDAP ID: 159649
Date added to BaDAP: 2025-06-09
Source text: URL
DOI: 10.20895/INFOTEL.V17I1.1077
Publication year: 2025
Publication type: journal article
Open access: yes
Creative Commons
Journal/series: Jurnal Infotel

Abstract

Opinion datasets for sentiment analysis are generally imbalanced. Imbalanced data tends to bias classification toward the majority class. Balancing the data by adding synthetic samples to the minority class requires an oversampling strategy. This research aims to overcome the imbalance by combining oversampling with word embedding (Word2Vec or FastText). We convert the opinion dataset into sentence vectors and then apply an oversampling method to them. We use five datasets of comments on YouTube videos that differ in terms, number of records, and imbalance conditions. We observed increased sentiment analysis accuracy when combining Word2Vec or FastText with one of three oversampling methods: SMOTE, Borderline SMOTE, or ADASYN. Random Forest is used as the machine learning classifier, and a confusion matrix is used for validation. Model performance is measured with accuracy and F-measure. After testing on the five datasets, the performance of Word2Vec is almost equal to that of FastText, while the best oversampling method is Borderline SMOTE. Combining Word2Vec or FastText with Borderline SMOTE could be the best choice because its accuracy and F-measure reach 91.0%–91.3%. We hope that a sentiment analysis model using Word2Vec or FastText with Borderline SMOTE can become a high-performance alternative.
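
The workflow the abstract describes (sentence vectors, then oversampling, then accuracy and F-measure) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The `TOY_VECTORS` table is a hypothetical stand-in for a trained Word2Vec or FastText model, the sample sentences are invented, and `smote` implements only the basic SMOTE interpolation step, not Borderline SMOTE or ADASYN. The Random Forest classifier is omitted, so the predictions passed to `accuracy_and_f1` are made up purely to show the metric arithmetic.

```python
import random

# Hypothetical 2-dimensional embedding table (assumption, not from the paper);
# a real pipeline would look words up in a trained Word2Vec or FastText model.
TOY_VECTORS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [0.1, 0.9],
    "awful": [0.2, 0.8],
    "video": [0.5, 0.5],
}


def sentence_vector(tokens, vectors, dim=2):
    """Average the vectors of the known tokens; zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def smote(minority, n_new, k=2, seed=0):
    """Basic SMOTE: place each synthetic point on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def accuracy_and_f1(y_true, y_pred, positive):
    """Accuracy and F-measure for one class, from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return accuracy, f1


# Invented toy comments: four majority (positive), two minority (negative).
majority = [sentence_vector(s.split(), TOY_VECTORS)
            for s in ["good video", "great video", "good great", "great"]]
minority = [sentence_vector(s.split(), TOY_VECTORS)
            for s in ["bad video", "awful"]]

# Oversample the minority class until both classes have the same size.
synthetic = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + synthetic

# Made-up labels and predictions, only to exercise the metric function.
acc, f1 = accuracy_and_f1(["neg", "neg", "pos", "pos"],
                          ["neg", "pos", "pos", "pos"], positive="neg")
```

In practice the oversampling and classification steps would more likely come from established libraries than from hand-written code. The sketch only shows where oversampling sits in the pipeline: after the text has been converted to vectors and before the classifier is trained.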

Publications that may interest you

article
#144918, date added: 30.1.2023
Sentiment analysis of Indonesian reviews using fine-tuning IndoBERT and R-CNN / Herlina Jayadianti, Wilis Kaswidjanti, Agung Tri Utomo, Shoffan SAIFULLAH, Felix Andika DWIYANTO, Rafał DREŻEWSKI // Ilkom Jurnal Ilmiah ; ISSN 2087-1716. — 2022 — vol. 14 no. 3, pp. 348-354. — Bibliogr. pp. 352-354, Abstr. — Publication available online since: 2022-12-20. — S. Saifullah - additional affiliation: Universitas Pembangunan Nasional Veteran Yogyakarta, Indonesia ; F. A. Dwiyanto - additional affiliation: Universitas Negeri Malang, Indonesia
article
#162622, date added: 19.9.2025
Semi-supervised sentiment classification using self-learning and enhanced co-training / Agus Sasmito Aribowo, Siti Khomsah, Shoffan SAIFULLAH // Jurnal Infotel ; ISSN 2085-3688. — 2025 — vol. 17 no. 3, pp. 472–489. — Bibliogr. pp. 487–489, Abstr. — Publication available online since: 2025-08-31