Publication details
Bibliographic description
Seamlessly managing HPC workloads through Kubernetes / Sergio López-Huguet, J. Damià Segrelles, Marek KASZTELNIK, Marian BUBAK, Ignacio Blanquer // In: High performance computing : ISC high performance 2020 : international workshops : Frankfurt, Germany, June 21–25, 2020 : revised selected papers / eds. Heike Jagode, [et al.]. Cham: Springer Nature Switzerland AG, cop. 2020. (Lecture Notes in Computer Science ; ISSN 0302-9743 ; LNCS 12321. Theoretical Computer Science and General Issues ; ISSN 0302-9743). ISBN: 978-3-030-59850-1; e-ISBN: 978-3-030-59851-8. Pp. 310–320. Bibliogr., Abstr. Publication available online since: 2020-10-20
Authors (5)
- López-Huguet Sergio
- Segrelles J. Damià
- Kasztelnik Marek (AGH)
- Bubak Marian (AGH)
- Blanquer Ignacio
Bibliometric data
Field | Value |
---|---|
BaDAP ID | 131326 |
Date added to BaDAP | 2020-12-10 |
DOI | 10.1007/978-3-030-59851-8_20 |
Publication year | 2020 |
Publication type | conference proceedings (author) |
Open access | |
Publisher | Springer |
Conference | High performance computing |
Journals/series | Theoretical Computer Science and General Issues, Lecture Notes in Computer Science |
Abstract
This paper describes an approach to integrating the job management of High Performance Computing (HPC) infrastructures into cloud architectures by managing HPC workloads seamlessly from the cloud job scheduler. The paper presents hpc-connector, an open source tool designed to manage the full life cycle of jobs in the HPC infrastructure from the cloud job scheduler by interacting with the workload manager of the HPC system. The key point is that, because hpc-connector runs in the cloud infrastructure, the execution of a job running in the HPC infrastructure and managed by hpc-connector is reflected in the cloud infrastructure. If the user cancels the cloud job, hpc-connector catches the resulting Operating System (OS) signal (for example, SIGINT) and cancels the job in the HPC infrastructure too. Furthermore, it can retrieve logs if requested. Therefore, by using hpc-connector, the cloud job scheduler can manage jobs in the HPC infrastructure without requiring any special privilege, as it does not require changes to the job scheduler. Finally, we perform an experiment in which a neural network for automated segmentation of neuroblastoma tumours is trained on the Prometheus supercomputer, using hpc-connector as a batch job from a Kubernetes infrastructure. © 2020, Springer Nature Switzerland AG.
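
The cancellation behaviour described in the abstract can be illustrated with a minimal sketch. It is not the hpc-connector implementation: `FakeHpcClient`, its methods, and `install_cancel_handler` are hypothetical names introduced here, and the in-memory client merely stands in for a real workload manager. The sketch assumes the cloud-side process receives SIGINT or SIGTERM when the cloud job is cancelled and forwards that event as a cancel request for the remote job.

```python
import signal


class FakeHpcClient:
    """Hypothetical in-memory stand-in for an HPC workload-manager client."""

    def __init__(self):
        self.jobs = {}
        self._next_id = 1

    def submit(self, script):
        # Record a new job as running and return its identifier.
        job_id = self._next_id
        self._next_id += 1
        self.jobs[job_id] = "RUNNING"
        return job_id

    def cancel(self, job_id):
        # A real client would ask the workload manager to cancel the job.
        self.jobs[job_id] = "CANCELLED"

    def status(self, job_id):
        return self.jobs[job_id]


def install_cancel_handler(client, job_id,
                           signals=(signal.SIGINT, signal.SIGTERM)):
    """Cancel the remote job when the local process receives one of `signals`.

    Must be called from the main thread, as required by signal.signal().
    """

    def handler(signum, frame):
        client.cancel(job_id)

    for sig in signals:
        signal.signal(sig, handler)
    return handler
```

In this sketch the handler only forwards the cancellation. A surrounding polling loop (not shown) would be expected to notice the `CANCELLED` status and exit, which is how the cloud scheduler would see the job end.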