  In search of deterministic methods for initializing K-means and Gaussian mixture clustering
 
 
Title: In search of deterministic methods for initializing K-means and Gaussian mixture clustering
Authors: Su, Ting; Dy, Jennifer G.
Published in: Intelligent data analysis
Pagination: Volume 11 (2007), no. 4, pages 319-338
Year: 2007-08-01
Abstract: The performance of K-means and Gaussian mixture model (GMM) clustering depends on the initial guess of partitions. Typically, clustering algorithms are initialized by random starts. In our search for a deterministic method, we found two promising approaches: principal component analysis (PCA) partitioning and Var-Part (Variance Partitioning). K-means clustering tries to minimize the sum-squared-error criterion. The eigenvector with the largest eigenvalue is the component that contributes most to the sum-squared-error. Hence, a good candidate direction along which to project a cluster for splitting is the cluster's largest eigenvector, which is the basis for PCA partitioning. Similarly, GMM clustering maximizes the likelihood; minimizing the determinant of each cluster's covariance matrix helps to increase the likelihood. The largest eigenvector contributes most to the determinant and is thus also a good candidate direction for splitting. However, PCA is computationally expensive. We therefore introduce Var-Part, which is computationally less complex (with complexity equal to one K-means iteration) and approximates PCA partitioning under a diagonal covariance assumption. Experiments reveal that Var-Part performs similarly to PCA partitioning, sometimes better, and leads K-means (and GMM) to sum-squared-error (and maximum-likelihood) values close to the optimum values obtained by several random-start runs, often at faster convergence rates.
Publisher: IOS Press
Source: Elektronische Wetenschappelijke Tijdschriften (Electronic Scientific Journals)
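
The abstract describes two deterministic initialization schemes for K-means: PCA partitioning (split a cluster along the eigenvector with the largest eigenvalue) and Var-Part (split along the coordinate axis with the largest variance, a diagonal-covariance approximation of the former). The Python sketch below illustrates both ideas under stated assumptions; the choice of which cluster to split next (here, the one with the largest within-cluster sum-squared-error) and the split point (the projected mean) are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.cluster import KMeans


def split_direction(c, method="var-part"):
    """Direction along which one cluster is split."""
    if method == "var-part":
        # Var-Part: the coordinate axis with the largest variance
        # (a diagonal-covariance approximation of PCA partitioning).
        d = np.zeros(c.shape[1])
        d[np.argmax(c.var(axis=0))] = 1.0
        return d
    # PCA partitioning: the eigenvector of the cluster's covariance
    # matrix with the largest eigenvalue (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(np.cov(c, rowvar=False))
    return eigvecs[:, -1]


def deterministic_init(X, k, method="var-part"):
    """Return k initial centers by recursively splitting clusters."""
    clusters = [X]
    while len(clusters) < k:
        # Assumption: split the cluster with the largest within-cluster
        # sum-squared-error next.
        sse = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        c = clusters.pop(int(np.argmax(sse)))
        proj = c @ split_direction(c, method)
        mask = proj <= proj.mean()           # split at the projected mean
        clusters.extend([c[mask], c[~mask]])
    return np.vstack([c.mean(axis=0) for c in clusters])


# Usage: the initialization is deterministic, so a single K-means run suffices.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(100, 2))
               for m in ((0.0, 0.0), (5.0, 5.0), (0.0, 8.0))])
centers = deterministic_init(X, k=3, method="var-part")
km = KMeans(n_clusters=3, init=centers, n_init=1).fit(X)
print(km.inertia_)                           # final sum-squared-error

As the abstract notes, Var-Part replaces the eigendecomposition with a per-axis variance computation, which is roughly the cost of one K-means iteration over the cluster.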
 
 

 
Koninklijke Bibliotheek - National Library of the Netherlands