Person:

Teerapittayanon, Surat

Loading...
Profile Picture

Email Address

AA Acceptance Date

Birth Date

Research Projects

Organizational Units

Job Title

Last Name

Teerapittayanon

First Name

Surat

Name

Teerapittayanon, Surat

Search Results

Now showing 1 - 1 of 1
  • Publication

    Lambda means clustering: Automatic parameter search and distributed computing implementation

    (2016) Comiter, Marcus; Cha, Miriam; Kung, H.; Teerapittayanon, Surat

    Recent advances in clustering have shown that ensuring a minimum separation between cluster centroids leads to higher quality clusters compared to those found by methods that explicitly set the number of clusters to be found, such as k-means. One such algorithm is DP-means, which sets a distance parameter λ for the minimum separation. However, without knowing either the true number of clusters or the underlying true distribution, setting λ itself can be difficult, and poor choices in setting λ will negatively impact cluster quality. As a general solution for finding λ, in this paper we present λ-means, a clustering algorithm capable of deriving an optimal value for λ automatically. We contribute both a theoretically-motivated cluster-based version of λ-means, as well as a faster conflict-based version of λ-means. We demonstrate that λ-means discovers the true underlying value of λ asymptotically when run on datasets generated by a Dirichlet Process, and achieves competitive performance on a real world test dataset. Further, we demonstrate that when run on both parallel multicore computers and distributed cluster computers in the cloud, cluster-based λ-means achieves near perfect speedup, and while being a more efficient algorithm, conflict-based λmeans achieves speedups only a factor of two away from the maximum-possible.