Publication details
Bibliographic description
Fast prediction of job execution times in the ALICE grid through GPU-based inference with quantization and sparsity techniques / Tomasz LELEK, Szymon MAZUREK, Maciej WIELGOSZ, Bartosz BALIŚ // In: Computational Science – ICCS 2025 : 25th international conference : Singapore, Singapore, July 7–9, 2025 : proceedings, Pt. 4 / eds. Michael H. Lees [et al.]. — Cham : Springer Nature Switzerland, cop. 2025. — (Lecture Notes in Computer Science ; ISSN 0302-9743 ; LNCS 15906). — ISBN: 978-3-031-97634-6; e-ISBN: 978-3-031-97635-3. — Pp. 97–105. — Bibliography, abstract. — Available online from: 2026-07-06. — Sz. Mazurek, M. Wielgosz: additional affiliation ACC Cyfronet AGH
Authors (4)
Bibliometric data
| BaDAP ID | 161043 |
|---|---|
| Date added to BaDAP | 2025-07-31 |
| DOI | 10.1007/978-3-031-97635-3_12 |
| Year of publication | 2025 |
| Publication type | conference proceedings (author) |
| Open access | |
| Publisher | Springer |
| Conference | International Conference on Computational Science 2025 |
| Journal/series | Lecture Notes in Computer Science |
Abstract
We propose a latency-optimized neural network model to dynamically predict job execution times for the ALICE experiment at CERN, replacing static Time-To-Live (TTL) allocations. Utilizing Nvidia A100 GPUs, we optimize inference latency via FP16 and INT8 quantization, 2:4 sparsity, quantization-aware training, and graph compilation. Results show that FP16 and sparsity reduce latency for larger batches, while INT8 is optimal for single-sample predictions. For single-sample online inference, static INT8 quantization achieves a median 0.38 ms prediction time, a 1.8x improvement over the 0.71 ms baseline. The model achieves a 1.9-hour RMSE, improving on the 14.23-hour RMSE of current TTL assignments. With sub-40ms inference latency on GPU hardware, this work demonstrates how neural network optimization can help meet the performance demands of large-scale distributed computing systems.
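The two compression techniques named in the abstract can be illustrated with a small, self-contained sketch. Note this is purely illustrative: the paper targets TensorRT-style inference on Nvidia A100 GPUs, whereas the NumPy code below only demonstrates the arithmetic behind symmetric per-tensor INT8 quantization and 2:4 structured sparsity (keeping the 2 largest-magnitude weights in every group of 4). All function names here are the editor's own, not from the paper.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the INT8 representation."""
    return q.astype(np.float32) * scale

def prune_2_4(w):
    """2:4 structured sparsity: zero the 2 smallest-magnitude weights
    in every group of 4 consecutive weights (50% density)."""
    flat = w.reshape(-1, 4)
    smallest = np.argsort(np.abs(flat), axis=1)[:, :2]  # 2 smallest per group
    out = flat.copy()
    np.put_along_axis(out, smallest, 0.0, axis=1)
    return out.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)).astype(np.float32)

q, s = quantize_int8(w)
err = np.abs(dequantize_int8(q, s) - w).max()  # round-off error < one step

sparse = prune_2_4(w)
density = np.count_nonzero(sparse) / sparse.size  # exactly 0.5
```

On hardware such as the A100, the 2:4 pattern matters because the GPU's sparse tensor cores can skip the zeroed half of each group, which is why the abstract reports latency gains from sparsity at larger batch sizes.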