Digital Library
 
 
  Models for association rules based on clustering and correlation
 
 
Title: Models for association rules based on clustering and correlation
Author: Ordonez, Carlos
Published in: Intelligent data analysis
Pagination: Volume 13 (2009) no. 2, pages 337-358
Year: 2009-04-22
Abstract: Association rules require models to understand their relationship to statistical properties of the data set. In this work, we study mathematical relationships between association rules and two fundamental techniques: clustering and correlation. Each cluster represents an important itemset. We show that the sufficient statistics for clustering and correlation on binary data sets are the linear sum of points and the quadratic sum of points, respectively. We prove that itemset support can be bounded and approximated from both models. Support bounds and support estimation obey the set downward closure property for fast bottom-up search for frequent itemsets. Both models can be efficiently computed with sparse matrix computations. Experiments with real and synthetic data sets evaluate model accuracy and speed. The clustering model estimates support accurately given a sufficiently large number of clusters, and it is more accurate than correlation, except for sets of two items. Accuracy increases as the number of clusters grows, but decreases as the minimum support threshold decreases. Once built, the clustering model is a faster alternative to the traditional A-priori algorithm and the correlation model for mining associations. The correlation model is faster to compute than clustering, but it is less accurate. Time complexity to compute both models is linear in data set size, whereas dimensionality marginally impacts time when analyzing large transaction data sets.
Publisher: IOS Press
Source file: Electronic Scientific Journals
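
The abstract names the linear sum and the quadratic sum of points as sufficient statistics on binary data. As a minimal sketch of what those sums contain, the plain-Python example below accumulates both over a toy transaction list and reads off item counts, exact pair support, a Pearson correlation, and a simple upper bound on itemset support that follows from downward closure. All function names and the toy data are hypothetical illustrations; this is not the article's implementation, its clustering model is not shown, and the bound here is a generic one that may differ from the bounds proved in the article.

```python
from itertools import combinations

def sufficient_statistics(transactions, d):
    """Return (n, L, Q) for transactions given as sets of item indices in range(d).

    L[a] is the linear sum for item a (how many transactions contain a).
    Q[a][b] is the quadratic sum (how many transactions contain both a and b).
    Only the items present in each transaction are visited, so the cost per
    transaction depends on its length, not on d.
    """
    n = len(transactions)
    L = [0] * d
    Q = [[0] * d for _ in range(d)]
    for t in transactions:
        items = sorted(t)
        for a in items:
            L[a] += 1
            for b in items:
                Q[a][b] += 1
    return n, L, Q

def pair_support(n, Q, a, b):
    """Exact support of the two-item set {a, b}."""
    return Q[a][b] / n

def support_upper_bound(n, Q, itemset):
    """Generic bound: an itemset is no more frequent than its rarest pair."""
    items = sorted(itemset)
    if len(items) == 1:
        return Q[items[0]][items[0]] / n
    return min(Q[a][b] for a, b in combinations(items, 2)) / n

def correlation(n, L, Q, a, b):
    """Pearson correlation of items a and b; on binary data sum(x^2) == sum(x)."""
    num = n * Q[a][b] - L[a] * L[b]
    den = ((n * L[a] - L[a] ** 2) * (n * L[b] - L[b] ** 2)) ** 0.5
    return num / den if den else float("nan")

# Toy data (made up for illustration): five transactions over three items.
transactions = [{0, 1, 2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
n, L, Q = sufficient_statistics(transactions, d=3)

true_support = sum(1 for t in transactions if {0, 1, 2} <= t) / n
bound = support_upper_bound(n, Q, {0, 1, 2})
print(L, pair_support(n, Q, 0, 1), correlation(n, L, Q, 0, 1), true_support, bound)
```

In this toy run the true support of the three-item set stays below the pair-based bound, which is the kind of pruning information a bottom-up frequent itemset search can use.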
 
 

 
 Koninklijke Bibliotheek - National Library of the Netherlands