Publication details

Bibliographic description

FinOps-driven optimization of cloud resource usage for high-performance computing using machine learning / Piotr NAWROCKI, Mateusz SMENDOWSKI // Journal of Computational Science ; ISSN 1877-7503. — 2024 — vol. 79 art. no. 102292, pp. 1–18. — Bibliogr. pp. 17–18, Abstr. — Publication available online since: 2024-04-27

Authors (2)

Keywords

machine learning, long-term resource prediction, FinOps, resource reservation, time series forecasting, cloud computing, high performance computing

Bibliometric data

BaDAP ID: 153060
Date added to BaDAP: 2024-05-16
Source text: URL
DOI: 10.1016/j.jocs.2024.102292
Publication year: 2024
Publication type: journal article
Open access: yes
Journal/series: Journal of Computational Science

Abstract

Cloud computing is gaining popularity in high-performance computing (HPC) applications. Its use enables advanced simulations when local computing resources are limited. However, cloud usage may increase costs and entail the risk of resource unavailability. This article presents an original approach that employs machine learning to predict long-term cloud resource usage. This enables optimizing resource utilization through appropriate reservation plans, reducing the associated costs. The solution developed utilizes statistical models, XGBoost, neural networks and the Temporal Fusion Transformer (TFT). Long-term prediction of cloud resource consumption is critical for prolonged simulations; in the Cloud Resource Usage Optimization System, prediction results are used to dynamically create resource reservation plans across various virtual machine types for HPC on the Google Cloud Platform. Experiments using real-life production data demonstrate that the TFT prediction model improved prediction quality (by 31.4%) compared to the best baseline method, particularly in adapting to chaotic changes in resource consumption. However, the best prediction model in terms of error magnitude might not be the most suitable for resource reservation planning. This was validated using the neural network-based method and a newly introduced FR metric for forecast evaluation. Resource reservation plans were assessed both qualitatively and quantitatively, focusing on aspects such as service-level agreement compliance and potential downtime. This paper is an extension of work originally presented at the International Conference on Computational Science (ICCS 2023), entitled “Long-Term Prediction of Cloud Resource Usage in High-Performance Computing”.

Publications that may interest you

book chapter
#147728 Date added: 2023-07-20
Long-term prediction of cloud resource usage in high-performance computing / Piotr NAWROCKI, Mateusz Smendowski // In: Computational Science – ICCS 2023 : 23rd international conference : Prague, Czech Republic, July 3–5, 2023 : proceedings, Pt. 2 / eds. Jiří Mikyška [et al.]. — Cham, Switzerland : Springer, cop. 2023. — (Lecture Notes in Computer Science ; ISSN 0302-9743 ; LNCS 14074). — ISBN: 978-3-031-36020-6; e-ISBN: 978-3-031-36021-3. — pp. 532–546. — Bibliogr., Abstr. — Publication available online since: 2023-06-26
article
#152873 Date added: 2024-05-07
Signature-based adaptive cloud resource usage prediction using machine learning and anomaly detection / Wiktor SUS, Piotr NAWROCKI // Journal of Grid Computing ; ISSN 1570-7873. — 2024 — vol. 22 iss. 2 art. no. 46, pp. 1–15. — Bibliogr. pp. 14–15, Abstr. — Publication available online since: 2024-04-23