Publication details

Bibliographic description

Seamlessly managing HPC workloads through Kubernetes / Sergio López-Huguet, J. Damià Segrelles, Marek KASZTELNIK, Marian BUBAK, Ignacio Blanquer // In: High performance computing : ISC high performance 2020 : international workshops : Frankfurt, Germany, June 21–25, 2020 : revised selected papers / eds. Heike Jagode, [et al.]. Cham: Springer Nature Switzerland AG, cop. 2020. (Lecture Notes in Computer Science ; ISSN 0302-9743 ; LNCS 12321. Theoretical Computer Science and General Issues ; ISSN 0302-9743). ISBN: 978-3-030-59850-1; e-ISBN: 978-3-030-59851-8. Pp. 310–320. Includes bibliography and abstract. Publication available online since: 2020-10-20


Authors (5)


Keywords

Kubernetes; integrating cloud and HPC; docker and singularity containers

Bibliometric data

BaDAP ID: 131326
Date added to BaDAP: 2020-12-10
DOI: 10.1007/978-3-030-59851-8_20
Publication year: 2020
Publication type: conference materials (author)
Open access: yes
Publisher: Springer
Conference: High performance computing
Journals/series: Theoretical Computer Science and General Issues, Lecture Notes in Computer Science

Abstract

This paper describes an approach to integrating the job management of High Performance Computing (HPC) infrastructures into cloud architectures by managing HPC workloads seamlessly from the cloud job scheduler. The paper presents hpc-connector, an open source tool designed to manage the full life cycle of jobs in the HPC infrastructure from the cloud job scheduler by interacting with the workload manager of the HPC system. The key point is that, because hpc-connector runs in the cloud infrastructure, the execution of a job that it manages in the HPC infrastructure can be reflected in the cloud infrastructure. Because hpc-connector catches Operating System (OS) signals (for example, SIGINT), if the user cancels the cloud job, it cancels the job in the HPC infrastructure too. Furthermore, it can retrieve logs if requested. Therefore, by using hpc-connector, the cloud job scheduler can manage the jobs in the HPC infrastructure without requiring any special privileges, as no changes to the job scheduler are needed. Finally, we perform an experiment that trains a neural network for automated segmentation of neuroblastoma tumours on the Prometheus supercomputer, using hpc-connector as a batch job from a Kubernetes infrastructure. © 2020, Springer Nature Switzerland AG.
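
The signal-forwarding idea in the abstract can be illustrated with a minimal Python sketch. This is not the hpc-connector source code. The class name `RemoteJobWatcher`, the callbacks `cancel_fn` and `poll_fn`, and the state strings `"RUNNING"`, `"COMPLETED"` and `"FAILED"` are all hypothetical, chosen only for illustration. The abstract names SIGINT as an example; handling SIGTERM as well is an added assumption.

```python
import signal
import time


class RemoteJobWatcher:
    """Hypothetical local stand-in for a job that runs on a remote HPC system."""

    def __init__(self, job_id, cancel_fn, poll_fn, poll_interval=0.1):
        self.job_id = job_id
        self.cancel_fn = cancel_fn      # assumed callback: cancels the remote job
        self.poll_fn = poll_fn          # assumed callback: returns the remote job state
        self.poll_interval = poll_interval
        self.cancelled = False

    def _on_signal(self, signum, frame):
        # Keep the handler tiny: record the request and let the main loop act on it.
        self.cancelled = True

    def run(self):
        # Install handlers and remember the previous ones so they can be restored.
        watched = (signal.SIGINT, signal.SIGTERM)
        previous = {s: signal.signal(s, self._on_signal) for s in watched}
        try:
            while True:
                if self.cancelled:
                    # Propagate the local cancellation to the remote job.
                    self.cancel_fn(self.job_id)
                    return "CANCELLED"
                state = self.poll_fn(self.job_id)
                if state in ("COMPLETED", "FAILED"):
                    return state
                time.sleep(self.poll_interval)
        finally:
            for s, handler in previous.items():
                signal.signal(s, handler)
```

In this sketch the local process stays alive for as long as the remote job runs, so a scheduler that observes the local process also observes the remote job's lifetime, and stopping the local process triggers `cancel_fn`. A real tool would replace the two callbacks with calls to the HPC workload manager.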