Publication details
Bibliographic description
Data augmentation using Large Language Models: methods, challenges, and perspectives / Agata Kozina, Michał PIKUS, Jarosław WĄS // In: Emerging challenges in intelligent management information systems : proceedings of 28th European Conference on Artificial Intelligence ECAI 2025 - IMIS Workshop : [Bologna, Italy, October 25–30, 2025], Vol. 1 / eds. Marcin Hernes, Ewa Walaszczyk, Artur Rot. — Cham : Springer Nature Switzerland, 2026. — (Lecture Notes in Networks and Systems ; ISSN 2367-3370 ; LNNS 1767). — ISBN: 978-3-032-13868-2; e-ISBN: 978-3-032-13869-9. — pp. 58–68. — Bibliogr., Abstr. — Available online since: 2026-01-02
Authors (3)
- Kozina Agata
- Pikus Michał (AGH)
- Wąs Jarosław (AGH)
Bibliometric data
| Field | Value |
|---|---|
| BaDAP ID | 165498 |
| Added to BaDAP | 2026-02-18 |
| DOI | 10.1007/978-3-032-13869-9_5 |
| Publication year | 2026 |
| Publication type | conference proceedings (author) |
| Open access | |
| Publisher | Springer |
| Conference | European Conference on Artificial Intelligence 2025 |
| Journal/series | Lecture Notes in Networks and Systems |
Abstract
Data augmentation plays a pivotal role in improving machine learning performance, especially when labeled data are limited. With the rapid advancement of Large Language Models (LLMs), their ability to generate high-quality synthetic data has attracted significant attention. In this study, we examine the use of LLMs for data augmentation, particularly the Polish large language model Bielik. We assess its proficiency in producing diverse and contextually relevant synthetic text to enrich datasets within the financial sector, using data sourced from a leasing company. Our investigation covers a variety of enhancement strategies. These include controlled text generation through paraphrasing (with adjustable prompt parameters), the addition of noise to continuous variables (with modifiable scaling), and binary variable modifications through flipping at predetermined rates. We evaluate the impact of these techniques on overall model performance while addressing challenges such as data quality, bias mitigation, and ethical considerations in integrating LLM-generated data into machine learning workflows. Ultimately, this research provides forward-looking perspectives on the evolving applications and potential enhancements of LLM-driven data augmentation, offering valuable insights for both practitioners and researchers.
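The two tabular augmentation strategies mentioned in the abstract (noise added to continuous variables with a modifiable scale, and binary flipping at a predetermined rate) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the choice of zero-mean Gaussian noise scaled by each value's magnitude, and the default parameter values are all assumptions.

```python
import random

def add_gaussian_noise(values, scale=0.05, seed=None):
    """Perturb continuous features with zero-mean Gaussian noise.

    `scale` controls the noise magnitude relative to each value
    (an assumed parameterization, analogous to the paper's
    "modifiable scaling").
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale * abs(v)) for v in values]

def flip_binary(values, rate=0.1, seed=None):
    """Flip each 0/1 feature independently with probability `rate`
    (the paper's "flipping at predetermined rates")."""
    rng = random.Random(seed)
    return [1 - v if rng.random() < rate else v for v in values]

# Hypothetical leasing-style features for illustration only:
contract_values = [1200.0, 850.5, 43000.0]   # continuous variable
default_flags = [0, 1, 1, 0, 0]              # binary variable

noisy = add_gaussian_noise(contract_values, scale=0.05, seed=42)
flipped = flip_binary(default_flags, rate=0.1, seed=42)
```

The text-paraphrasing strategy would sit alongside these, prompting the Bielik model to rewrite each text field; it is omitted here because it depends on a model-serving API not described in this record.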