DSpace Repository

New Development in Cluster Analysis and Other Related Multivariate Analysis Methods

dc.contributor.advisor Zhu, Wei en_US
dc.contributor.author Zhang, Shaonan en_US
dc.contributor.other Department of Applied Mathematics and Statistics en_US
dc.date.accessioned 2013-05-24T16:38:17Z
dc.date.accessioned 2015-04-24T14:47:43Z
dc.date.available 2013-05-24T16:38:17Z
dc.date.available 2015-04-24T14:47:43Z
dc.date.issued 2012-05-01
dc.identifier.uri http://hdl.handle.net/1951/60237 en_US
dc.identifier.uri http://hdl.handle.net/11401/71482 en_US
dc.description 121 pg. en_US
dc.description.abstract Cluster analysis is a multivariate analysis method aimed at (1) unraveling the natural groupings embedded within the data, and (2) dimension reduction. With the wide application of cluster analysis in diverse modern research and business fields, including machine learning, bioinformatics, medical image analysis, pattern recognition, market research and global climate research, many clustering algorithms have been developed to date. However, novel and/or special circumstances always call for better customized cluster analysis methods, and thus this thesis. This thesis work consists of two parts. In the first part, we extend modern multiple-objective cluster analysis from using a single set of features to using multiple distinct sets of features by developing the novel compound clustering method and the constrained clustering method. We also developed a new statistic, the "complete linkage" R^2, to be used along with the well-known largest average silhouette to determine the optimal number of clusters in compound clustering. The novel compound/constrained clustering methods are illustrated through a gene microarray study with both gene expression data and gene function information. In the second part of this thesis we propose a novel algorithm for weighted k-means clustering. Weighted k-means clustering is an extension of k-means clustering in which a set of nonnegative weights is assigned to the variables. We first derived the optimal variable weights for weighted k-means clustering in order to obtain more meaningful and interpretable clusters. We then improved the current weighted k-means clustering method (Huh and Lim 2009) by incorporating our novel algorithm, which obtains variable weights guaranteed to be globally optimal, based on the method of Lagrange multipliers and the Karush-Kuhn-Tucker conditions. Here we first present the related theoretical formulation and derivation of the optimal weights. Then we provide an iteration-based computing algorithm to calculate such optimal weights. Numerical examples on both simulated and well-known real data are provided to illustrate our method. It is shown that our method outperforms the originally proposed method in terms of classification accuracy, stability and computational efficiency. en_US
dc.description.sponsorship This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. en_US
dc.format Monograph en_US
dc.format.medium Electronic Resource en_US
dc.language.iso en_US en_US
dc.publisher The Graduate School, Stony Brook University: Stony Brook, NY. en_US
dc.subject.lcsh Statistics--Applied mathematics--Biostatistics en_US
dc.subject.other Cluster Analysis, Multi-objective clustering, Multivariate Analysis, Optimization, Weighted k-means clustering en_US
dc.title New Development in Cluster Analysis and Other Related Multivariate Analysis Methods en_US
dc.type Dissertation en_US
dc.mimetype Application/PDF en_US
dc.contributor.committeemember Hu, Jiaqiao en_US
dc.contributor.committeemember Xing, Haipeng en_US
dc.contributor.committeemember Benveniste, Helene D. en_US

