Digitale Bibliotheek
Sluiten Bladeren door artikelen uit een tijdschrift
<< vorige    volgende >>
     Tijdschrift beschrijving
       Alle jaargangen van het bijbehorende tijdschrift
         Alle afleveringen van het bijbehorende jaargang
           Alle artikelen van de bijbehorende aflevering
                                       Details van artikel 5 van 9 gevonden artikelen
  Mining Outliers in Correlated Subspaces for High Dimensional Data Sets
Titel: Mining Outliers in Correlated Subspaces for High Dimensional Data Sets
Auteur: Leng, Jinsong
Hong, Tzung-Pei
Verschenen in: Fundamenta informaticae
Paginering: Jaargang 98 (2010) nr. 1 pagina's 71-86
Jaar: 2010-03-15
Inhoud: Outlier detection in high dimensional data sets is a challenging data mining task. Mining outliers in subspaces seems to be a promising solution, because outliers may be embedded in some interesting subspaces. Searching for all possible subspaces can lead to the problem called "the curse of dimensionality". Due to the existence of many irrelevant dimensions in high dimensional data sets, it is of paramount importance to eliminate the irrelevant or unimportant dimensions and identify interesting subspaces with strong correlation. Normally, the correlation among dimensions can be determined by traditional feature selection techniques or subspace-based clustering methods. The dimension-growth subspace clustering techniques can find interesting subspaces in relatively lower dimension spaces, while dimension-reduction approaches try to group interesting subspaces with larger dimensions. This paper aims to investigate the possibility of detecting outliers in correlated subspaces. We present a novel approach by identifying outliers in the correlated subspaces. The degree of correlation among dimensions is measured in terms of the mean squared residue. In doing so, we employ a dimension-reduction method to find the correlated subspaces. Based on the correlated subspaces obtained, we introduce another criterion called "shape factor" to rank most important subspaces in the projected subspaces. Finally, outliers are distinguished from most important subspaces by using classical outlier detection techniques. Empirical studies show that the proposed approach can identify outliers effectively in high dimensional data sets.
Uitgever: IOS Press
Bronbestand: Elektronische Wetenschappelijke Tijdschriften

                             Details van artikel 5 van 9 gevonden artikelen
<< vorige    volgende >>
 Koninklijke Bibliotheek - Nationale Bibliotheek van Nederland