DSpace Repository

Statistical Methods for Association Analysis of Biological Data

dc.contributor.advisor Zhu, Wei en_US
dc.contributor.advisor Wu, Song en_US
dc.contributor.author Huang, Erya en_US
dc.contributor.other Department of Applied Mathematics and Statistics. en_US
dc.date.accessioned 2017-09-20T16:53:15Z
dc.date.available 2017-09-20T16:53:15Z
dc.date.issued 2015-12-01 en_US
dc.identifier.uri http://hdl.handle.net/11401/77665 en_US
dc.description 138 pg. en_US
dc.description.abstract Genome-wide association studies (GWA studies) are an important tool for identifying disease susceptibility variants for common and complex diseases. Traditional approaches to data analysis in GWA studies suffer from the multiple testing problem and ignore potential relationships between gene variants. Here we introduce a novel two-stage framework that combines partial correlation network analysis (PCNA) with data mining techniques. This network-based technique, which focuses on joint modeling of SNPs and their partial associations, alleviates the multiple testing problem and consequently increases the power to detect biologically relevant variants and their associations. Variable selection was achieved through penalized logistic regression with a sparse-group lasso (SGL) penalty, grouping SNPs by either 1) pairwise canonical correlation measurements or 2) biological information such as gene mapping. Network construction was based on pairwise partial correlation coefficients. Simulation studies indicated that this two-stage approach achieved high accuracy and a low false-positive rate in identifying known individual and two-way association targets, which suggests that the true direct relationships can be recovered even in high-dimensional settings. We then illustrated the proposed approach in a search for potentially significant SNP-SNP and gene-gene associations with nicotine dependence, using real data from a GWA study conducted by Washington University in St. Louis. The results provide researchers with potentially biologically relevant genetic networks for further investigation. Another contribution of this thesis is the exploration of the miRNA-mRNA regulatory set associated with essential thrombocytosis (ET), carried out by applying a penalized technique to canonical correlation analysis of microarray data sets. The identified variables were successfully tested by leave-one-out cross-validation and a network exploration system. en_US
dc.description.sponsorship This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. en_US
dc.format Monograph en_US
dc.format.medium Electronic Resource en_US
dc.language.iso en_US en_US
dc.publisher The Graduate School, Stony Brook University: Stony Brook, NY. en_US
dc.subject.lcsh Statistics en_US
dc.title Statistical Methods for Association Analysis of Biological Data en_US
dc.type Dissertation en_US
dc.mimetype Application/PDF en_US
dc.contributor.committeemember Wang, Xuefeng en_US
dc.contributor.committeemember Bahou, Wadie en_US
