DSpace Repository

Hardy-Weinberg Deviation and EM-based Haplotype Frequency Estimation

dc.contributor.advisor John J. Chen en_US
dc.contributor.author Ahn, Hyeong Jun en_US
dc.contributor.other Department of Applied Mathematics and Statistics en_US
dc.date.accessioned 2012-05-17T12:19:44Z
dc.date.accessioned 2015-04-24T14:47:54Z
dc.date.available 2012-05-17T12:19:44Z
dc.date.available 2015-04-24T14:47:54Z
dc.date.issued 2011-08-01
dc.identifier Ahn_grad.sunysb_0771E_10595.pdf en_US
dc.identifier.uri http://hdl.handle.net/1951/55943 en_US
dc.identifier.uri http://hdl.handle.net/11401/71557 en_US
dc.description.abstract Single-nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human genome. Haplotypes, which combine multiple SNPs into super-alleles, have been widely used in modern genetic analysis, especially in human disease association studies. The Expectation Maximization (EM) algorithm is commonly used for haplotype phasing and frequency estimation, and Hardy-Weinberg (HW) equilibrium is a key assumption built into the EM algorithm. Several studies have explored the accuracy of EM-based haplotype frequency estimation when the HW equilibrium assumption is violated. The general consensus is that sampling error plays a more dominant role in haplotype estimation than the estimation error due to HW deviation, and that the accuracy of haplotype frequency estimation tends to improve with increasing homozygosity in the sample. However, these studies mainly concentrated on the impact of SNP-level HW deviation. A theoretical foundation for the impact of haplotype-level HW deviation on haplotype frequency estimation has not been established. In this dissertation, we derived the theoretical relationship among three haplotype mean squared errors: between population and sample frequencies (MSEPS), between true sample and sample estimated frequencies (MSESE), and between population and sample estimated frequencies (MSEPE). We also established the theoretical relationship between SNP-level and haplotype-level HW deviations. Our simulations show that violation of HW equilibrium at the haplotype level could result in more severe haplotype estimation error than sampling error, and that the accuracy of haplotype frequency estimation does not always improve with increasing homozygosity.
To incorporate possible haplotype-level HW deviations into the haplotype frequency estimation process, we propose a Hardy-Weinberg Deviation-Expectation/Conditional Maximization (HWD-ECM) method that estimates HW deviation parameters and haplotype frequencies simultaneously. For the two-SNP case, the HWD-ECM algorithm iterates three steps: (1) an expectation step that estimates genotype frequencies while allowing for HW deviation parameters; (2) a conditional maximization step that estimates the HW deviation parameters using constraints on SNP-level or haplotype-level HW deviation parameters; and (3) a conditional maximization step for the haplotype frequencies. Simulation results show that the HWD-ECM method performs significantly better than the EM-based approach in haplotype estimation when the HW equilibrium assumption is violated. An extension of the HWD-ECM algorithm to multiple SNPs is also discussed. en_US
dc.description.sponsorship This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. en_US
dc.format Monograph en_US
dc.format.medium Electronic Resource en_US
dc.language.iso en_US en_US
dc.publisher The Graduate School, Stony Brook University: Stony Brook, NY. en_US
dc.subject.lcsh Statistics -- Biostatistics en_US
dc.subject.other Expectation/Conditional Maximization (ECM) algorithm, Expectation Maximization (EM) algorithm, haplotype frequency estimation, Hardy-Weinberg Deviation-Expectation/Conditional Maximization (HWD-ECM) algorithm, Hardy-Weinberg (HW) deviation, Single-nucleotide polymorphism (SNP) en_US
dc.title Hardy-Weinberg Deviation and EM-based Haplotype Frequency Estimation en_US
dc.type Dissertation en_US
dc.mimetype Application/PDF en_US
dc.contributor.committeemember Nancy R. Mendell en_US
dc.contributor.committeemember Wei Zhu en_US
dc.contributor.committeemember Barbara Nemesure en_US
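
The abstract notes that HW equilibrium is built into EM-based haplotype frequency estimation. As a minimal sketch of the standard two-SNP EM (not the HWD-ECM method proposed in this dissertation), assume two biallelic SNPs with alleles A/a and B/b and unphased genotypes coded as (copies of A, copies of B). Only the double heterozygote (1, 1) is phase-ambiguous; the E-step splits it between AB/ab and Ab/aB in proportion to their HWE probabilities. All function names, the haplotype coding, and the illustrative frequencies are hypothetical.

```python
from itertools import combinations_with_replacement

# Hypothetical coding (not from the dissertation): haplotype index -> alleles at (SNP1, SNP2),
# where 1 marks the uppercase allele (A or B) and 0 the lowercase allele (a or b).
HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]  # AB, Ab, aB, ab


def compatible_pairs():
    """Map each two-SNP genotype (copies of A, copies of B) to the unordered
    haplotype pairs (diplotypes) that produce it."""
    table = {}
    for h1, h2 in combinations_with_replacement(range(4), 2):
        genotype = (HAPLOTYPES[h1][0] + HAPLOTYPES[h2][0],
                    HAPLOTYPES[h1][1] + HAPLOTYPES[h2][1])
        table.setdefault(genotype, []).append((h1, h2))
    return table


def hwe_pair_prob(freqs, h1, h2):
    """Diplotype probability under HW equilibrium: p1*p2, doubled for heterozygous pairs."""
    return freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2)


def em_haplotype_freqs(genotype_counts, n_iter=500, tol=1e-10):
    """Standard HWE-based EM estimate of the four haplotype frequencies."""
    pairs = compatible_pairs()
    n = sum(genotype_counts.values())
    freqs = [0.25] * 4  # uniform starting values
    for _ in range(n_iter):
        expected = [0.0] * 4
        for genotype, count in genotype_counts.items():
            if count == 0:
                continue
            # E-step: split individuals among compatible diplotypes by their HWE probabilities.
            # Only the double heterozygote (1, 1) has two candidates: AB/ab and Ab/aB.
            weights = [hwe_pair_prob(freqs, h1, h2) for h1, h2 in pairs[genotype]]
            total = sum(weights)
            for (h1, h2), w in zip(pairs[genotype], weights):
                share = count * w / total
                expected[h1] += share
                expected[h2] += share
        # M-step: expected haplotype counts divided by the 2n sampled chromosomes.
        new_freqs = [e / (2 * n) for e in expected]
        if max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol:
            return new_freqs
        freqs = new_freqs
    return freqs


def expected_genotype_counts(freqs, n):
    """Genotype counts expected in n individuals when diplotypes follow HWE."""
    return {g: n * sum(hwe_pair_prob(freqs, h1, h2) for h1, h2 in ps)
            for g, ps in compatible_pairs().items()}


if __name__ == "__main__":
    true_freqs = [0.4, 0.1, 0.2, 0.3]  # illustrative values only
    counts = expected_genotype_counts(true_freqs, 1000)
    print([round(f, 4) for f in em_haplotype_freqs(counts)])  # approximately the true values
```

When the genotype counts really do follow HWE, the EM recovers the generating haplotype frequencies; the estimator has no way to represent diplotype frequencies that depart from the HWE products, which is the gap the abstract's HWD-ECM proposal targets.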
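
To illustrate the abstract's distinction between SNP-level and haplotype-level HW deviation, the sketch below uses a constructed toy population (not data from the dissertation). It starts from HWE with every haplotype at frequency 0.25 and moves all Ab/aB double heterozygotes into the AB/ab phase. The single-SNP coefficient D = P(AA) - p_A^2 used here is the usual one-locus disequilibrium measure; the dissertation's own deviation parameters may be defined differently, and all names are hypothetical.

```python
from itertools import combinations_with_replacement

HAPS = ["AB", "Ab", "aB", "ab"]  # hypothetical labels: uppercase = allele 1 at each SNP


def diplotypes():
    """All unordered haplotype pairs, in a fixed canonical order."""
    return list(combinations_with_replacement(HAPS, 2))


def haplotype_freqs(dip):
    """Haplotype frequencies implied by diplotype frequencies {(h1, h2): freq}."""
    p = {h: 0.0 for h in HAPS}
    for (h1, h2), f in dip.items():
        p[h1] += f / 2
        p[h2] += f / 2
    return p


def haplotype_level_deviation(dip):
    """Observed diplotype frequency minus its HWE expectation from the haplotype frequencies."""
    p = haplotype_freqs(dip)
    return {(h1, h2): dip.get((h1, h2), 0.0) - p[h1] * p[h2] * (1 if h1 == h2 else 2)
            for h1, h2 in diplotypes()}


def snp_level_deviation(dip, locus):
    """Single-SNP coefficient D = P(uppercase homozygote) - p^2 at locus 0 or 1."""
    upper = "AB"[locus]
    p_hom = sum(f for (h1, h2), f in dip.items()
                if h1[locus] == upper and h2[locus] == upper)
    p = sum(f / 2 * ((h1[locus] == upper) + (h2[locus] == upper))
            for (h1, h2), f in dip.items())
    return p_hom - p ** 2


def genotype_freqs(dip):
    """Unphased two-SNP genotype frequencies, keyed by (copies of A, copies of B)."""
    g = {}
    for (h1, h2), f in dip.items():
        key = tuple((h1[i] == "AB"[i]) + (h2[i] == "AB"[i]) for i in range(2))
        g[key] = g.get(key, 0.0) + f
    return g


# Toy population: HWE with every haplotype at frequency 0.25 ...
hwe = {(h1, h2): 0.25 * 0.25 * (1 if h1 == h2 else 2) for h1, h2 in diplotypes()}
# ... then move all Ab/aB double heterozygotes into the AB/ab phase.
shifted = dict(hwe)
shifted[("AB", "ab")] += shifted.pop(("Ab", "aB"))

if __name__ == "__main__":
    print(genotype_freqs(shifted) == genotype_freqs(hwe))      # True
    print([snp_level_deviation(shifted, k) for k in (0, 1)])   # [0.0, 0.0]
    print(haplotype_level_deviation(shifted)[("AB", "AB")])    # -0.03515625
```

Both SNPs remain in HWE and the unphased genotype frequencies are unchanged, yet the diplotypes deviate from HWE at the haplotype level. Because the genotype frequencies match the HWE population exactly, an HWE-based EM fit to them returns 0.25 for every haplotype, whereas the true haplotype frequencies are 0.3125 (AB, ab) and 0.1875 (Ab, aB); a SNP-level HW check alone would not flag the problem.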
