Publication details
Bibliographic description
Are n-gram categories helpful in text classification? / Jakub Kruczek, Paulina Kruczek, Marcin KUTA // In: Computational Science - ICCS 2020 : 20th International Conference : Amsterdam, The Netherlands, June 3–5, 2020 : proceedings, Pt. 2 / eds. Valeria V. Krzhizhanovskaya, [et al.]. — Cham : Springer Nature Switzerland, cop. 2020. — (Lecture Notes in Computer Science ; ISSN 0302-9743 ; LNCS 12138. Theoretical Computer Science and General Issues ; ISSN 0302-9743). — ISBN: 978-3-030-50416-8; e-ISBN: 978-3-030-50417-5. — pp. 524–537. — Bibliography pp. 536–537, abstract. — Publication available online since: 2020-06-15
Authors (3)
Keywords
Bibliometric data
BaDAP ID | 129152 |
---|---|
Date added to BaDAP | 2020-06-25 |
Source text | URL |
DOI | 10.1007/978-3-030-50417-5_39 |
Publication year | 2020 |
Publication type | conference proceedings (authored) |
Open access | |
Publisher | Springer |
Conference | 20th International Conference on Computational Science |
Journals/series | Lecture Notes in Computer Science, Theoretical Computer Science and General Issues |
Abstract
Character n-grams are widely used in text categorization problems and are the single most successful type of feature in authorship attribution. Their primary advantage is language independence, as they can be applied to a new language with no additional effort. Typed character n-grams reflect information about their content and context. According to previous research, typed character n-grams improve the accuracy of authorship attribution. This paper examines their effectiveness in three domains: authorship attribution, author profiling and sentiment analysis. The problem of a very high number of features is tackled with distributed Apache Spark processing.
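As a rough illustration of the plain (untyped) character n-grams the abstract refers to, the sketch below counts overlapping character sequences of a fixed length. It is not the authors' implementation: the function name `char_ngrams` and the default `n=3` are assumptions made for this example, and neither the typed categories nor the Apache Spark processing described in the paper are reproduced here.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of length n in text.

    Hypothetical helper for illustration only. It works on raw
    characters, so it needs no tokenizer or language-specific
    resources, which is the language-independence property the
    abstract mentions.
    """
    # Slide a window of width n over the string; a text shorter
    # than n yields an empty range and therefore an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


counts = char_ngrams("the cat", n=3)
```

The resulting counts could serve as a feature vector for a text classifier. Spaces are kept inside the n-grams, so sequences that cross word boundaries are counted as well.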