Publication details
Bibliographic description
Efficient astronomical data condensation using approximate nearest neighbors / Szymon ŁUKASIK, Konrad Lalik, Piotr Sarna, Piotr A. KOWALSKI, Małgorzata Charytanowicz, Piotr KULCZYCKI // International Journal of Applied Mathematics and Computer Science ; ISSN 1641-876X. 2019, vol. 29, no. 3, pp. 467–476. Bibliography: pp. 474–475. Additional affiliation of S. Łukasik, P. A. Kowalski, P. Kulczycki: Systems Research Institute, Polish Academy of Sciences, Warsaw
Authors (6)
- Łukasik Szymon (AGH)
- Lalik Konrad (AGH)
- Sarna Piotr (AGH)
- Kowalski Piotr Andrzej (AGH)
- Charytanowicz Małgorzata
- Kulczycki Piotr (AGH)
Keywords
Bibliometric data
BaDAP ID | 125699 |
---|---|
Date added to BaDAP | 2020-01-21 |
Source text | URL |
DOI | 10.2478/amcs-2019-0034 |
Publication year | 2019 |
Publication type | journal article |
Open access | |
Creative Commons | |
Journal/series | International Journal of Applied Mathematics and Computer Science |
Abstract
Extracting useful information from astronomical observations is one of the most challenging tasks in data exploration, largely because of the volume of data acquired with advanced observational tools. While other challenges typical of big data problems (such as data variety) are also present, dataset size is the most significant obstacle to visualization and subsequent analysis. This paper studies an efficient data condensation algorithm that produces a compact representation of a dataset. It is based on fast nearest neighbor calculation using tree structures and parallel processing. The possibility of using approximate identification of neighbors to further improve the algorithm's running time is also evaluated. The properties of the proposed approach, in terms of both performance and condensation quality, are experimentally assessed on astronomical datasets related to the GAIA mission. It is concluded that the introduced technique might serve as a scalable method of alleviating the problem of dataset size.
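
The abstract does not list the steps of the algorithm, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a simplified one-pass form of density-based condensation: keep the point with the smallest distance to its k-th nearest neighbor, then discard every point within twice that distance. Neighbor distances come from a plain k-d tree, and a hypothetical `eps` parameter relaxes the pruning test so that the reported distance may exceed the exact one by at most a factor of `1 + eps`. All function names, parameters, and the synthetic data are illustrative; parallel processing is left out.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree; each node is a tuple (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn_distance(tree, query, k, eps=0.0):
    """Distance from `query` to its k-th nearest point stored in the tree.

    With eps > 0 the far subtree is skipped more eagerly, so the result is
    approximate: at most (1 + eps) times the exact k-th distance.
    Assumes the tree holds at least k points.
    """
    best = []  # max-heap (negated squared distances) of the k closest so far
    shrink = (1.0 + eps) ** 2

    def visit(node):
        if node is None:
            return
        point, axis, left, right = node
        d = sq_dist(point, query)
        if len(best) < k:
            heapq.heappush(best, -d)
        elif d < -best[0]:
            heapq.heapreplace(best, -d)
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Cross the splitting plane only if it could still hide a closer point.
        if len(best) < k or diff * diff * shrink < -best[0]:
            visit(far)

    visit(tree)
    return (-best[0]) ** 0.5


def condense(points, k=3, eps=0.0):
    """Return a list of (representative, radius) pairs covering `points`."""
    tree = build_kdtree(points)
    # k + 1 because every point is its own nearest neighbor at distance 0.
    radius = [knn_distance(tree, p, k + 1, eps) for p in points]
    order = sorted(range(len(points)), key=radius.__getitem__)
    removed = [False] * len(points)
    kept = []
    for i in order:  # densest regions (smallest radius) first
        if removed[i]:
            continue
        kept.append((points[i], radius[i]))
        limit = (2.0 * radius[i]) ** 2
        # Brute-force removal scan, chosen for brevity rather than speed.
        for j, q in enumerate(points):
            if not removed[j] and sq_dist(points[i], q) <= limit:
                removed[j] = True
    return kept


if __name__ == "__main__":
    random.seed(0)
    cloud = [(random.random(), random.random()) for _ in range(500)]  # synthetic
    print(len(cloud), "->", len(condense(cloud, k=5, eps=0.1)))
```

Larger `k` removes more points per representative and so yields a coarser condensed set. Raising `eps` trades neighbor accuracy for fewer tree nodes visited, which is the kind of speed versus quality trade-off the abstract says was evaluated.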