Publication details
Bibliographic description
ivhd: a robust linear-time and memory efficient method for visual exploratory data analysis / Witold DZWINEL, Rafał WCISŁO // In: Machine learning and data mining in pattern recognition : 13th international conference : MLDM 2017, New York, NY, USA, July 15–20, 2017 : proceedings / ed. Petra Perner. — Cham : Springer, cop. 2017. — (Lecture Notes in Computer Science ; ISSN 0302-9743 ; vol. 10358). — ISBN: 978-3-319-62415-0; e-ISBN: 978-3-319-62416-7. — pp. 345–360. — Bibliography p. 360, abstract. — Available online since: 2017-07-02. — Author's surname given incorrectly: W. Dzwine
Authors (2)
Keywords
Bibliometric data
BaDAP ID | 107792 |
---|---|
Date added to BaDAP | 2017-10-02 |
Source text | URL |
DOI | 10.1007/978-3-319-62416-7_25 |
Publication year | 2017 |
Publication type | conference proceedings (author) |
Open access | |
Publisher | Springer |
Conference | 13th International Conference on Machine Learning and Data Mining in Pattern Recognition |
Journal/series | Lecture Notes in Computer Science |
Abstract
Data embedding (DE) and graph visualization (GV) methods are very compatible tools used in Exploratory Data Analysis for visualization of complex data such as high-dimensional data and complex networks. However, the high computational complexity and memory load of existing DE and GV algorithms considerably hinder visualization of truly large and big data consisting of as many as M ~ 10^6+ data objects and N ~ 10^3+ dimensions. Recently, we have shown that by employing only a small fraction of distances between data objects one can obtain a very satisfactory reconstruction of the topology of complex data in 2D in linear time O(M). In this paper, we demonstrate the high robustness of our approach. We show that even poor approximations of the nn-nearest neighbor graph, representing high-dimensional data, can yield acceptable data embeddings. Furthermore, some incorrectness in the nearest neighbor list can often be useful to improve the quality of data visualization. This robustness of our DE method, together with its high memory and time efficiency, perfectly meets the requirements of big and distributed data visualization, when finding the accurate nearest neighbor list represents a great computational challenge.
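
The idea of embedding from only a small fraction of distances can be illustrated with a minimal sketch. This is not the authors' ivhd implementation: the function names, the parameters `nn` and `rn` (nearest and random neighbours per point), the binary target distances (0 for near pairs, 1 for random pairs), and the simple pairwise relaxation are all assumptions made for illustration. The neighbour search is brute force, so only the embedding loop scales linearly with the number of points here.

```python
import math
import random


def sparse_pairs(points, nn=2, rn=1, seed=0):
    """Return (i, j, target) triples: nn nearest and rn random neighbours per point.

    Assumption (not stated in the abstract): near pairs get target distance 0
    and random pairs get target distance 1. The brute-force neighbour search
    below is O(M^2) and is used only to keep the demo self-contained.
    """
    rng = random.Random(seed)
    m = len(points)
    pairs = []
    for i in range(m):
        order = sorted((j for j in range(m) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:nn]:
            pairs.append((i, j, 0.0))
        for j in rng.sample(order[nn:], rn):
            pairs.append((i, j, 1.0))
    return pairs


def embed_2d(m, pairs, steps=200, lr=0.05, seed=0):
    """Relax 2D positions so each listed pair approaches its target distance.

    Each sweep touches every stored pair once, so its cost is proportional
    to len(pairs) = M * (nn + rn), not to the M * (M - 1) / 2 possible pairs.
    """
    rng = random.Random(seed)
    y = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(m)]
    for _ in range(steps):
        for i, j, target in pairs:
            dx = y[i][0] - y[j][0]
            dy = y[i][1] - y[j][1]
            d = math.hypot(dx, dy) + 1e-9  # avoid division by zero
            g = lr * (d - target) / d
            # Move both endpoints along the connecting line toward the target.
            y[i][0] -= g * dx
            y[i][1] -= g * dy
            y[j][0] += g * dx
            y[j][1] += g * dy
    return y


# Toy data (hypothetical): two Gaussian clusters of 10 points each in 5 dimensions.
rng = random.Random(1)
points = [[rng.gauss(c, 0.1) for _ in range(5)]
          for c in (0.0, 3.0) for _ in range(10)]
pairs = sparse_pairs(points, nn=2, rn=1)
coords = embed_2d(len(points), pairs)
print(len(pairs), "pair entries used;",
      len(points) * (len(points) - 1) // 2, "pairs exist in total")
```

With 20 points and `nn + rn = 3`, the sketch stores 3 entries per point, so the stored list grows linearly with M while the full distance matrix grows quadratically. Replacing the brute-force search with an approximate nearest-neighbour method would be in the spirit of the robustness claim above, but that choice is left open here.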