DSpace Repository

Modeling the effect of sequencing error

dc.contributor.advisor Finch, Stephen J. en_US
dc.contributor.author Zhang, Ruiqi en_US
dc.contributor.other Department of Applied Mathematics and Statistics en_US
dc.date.accessioned 2017-09-20T16:50:35Z
dc.date.available 2017-09-20T16:50:35Z
dc.date.issued 2014-12-01 en_US
dc.identifier.uri http://hdl.handle.net/11401/76538 en_US
dc.description 110 pgs en_US
dc.description.abstract Genotype misclassification errors are known to reduce the power to detect genetic association, but the size of this effect is not known for next-generation sequencing (NGS). The non-centrality parameter (NCP), and hence the power, of the association test allowing for errors was derived for a specified error model at a base pair. This NCP was compared to the NCP for the usual chi-square test. The asymptotic power was compared to simulated power for specific settings of the true genotype and phenotype frequencies in the case and control populations, genotype misclassification rates, and total sample size. An R script was provided for calculating the NCP. Next, the effect of misclassification error in case-control genetic association studies using NGS data was modeled. The Likelihood Ratio Test Allowing for Error using NGS data (LRTNGS) was derived. The genotype frequencies and misclassification rates were estimated from the observed base pair reads using the expectation-maximization (EM) algorithm. This statistic allows for both non-differential and differential misclassification. The distribution of LRTNGS was studied by simulation under both null and alternative settings. The effects of genotype misclassification rates on the sample size needed to maintain constant asymptotic Type I and Type II error rates were studied. For at-risk minor allele frequencies less than 0.01, large sample sizes were required for the asymptotic distribution to be a good approximation. Increasing the sequencing coverage increased the estimated power and the adequacy of simulated power. en_US
dc.description.sponsorship This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. en_US
dc.format Monograph en_US
dc.format.medium Electronic Resource en_US
dc.language.iso en_US en_US
dc.publisher The Graduate School, Stony Brook University: Stony Brook, NY. en_US
dc.subject.lcsh Statistics en_US
dc.title Modeling the effect of sequencing error en_US
dc.type Dissertation en_US
dc.mimetype Application/PDF en_US
dc.contributor.committeemember Mendell, Nancy en_US
dc.contributor.committeemember Zhu, Wei en_US
dc.contributor.committeemember Gordon, Derek en_US
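
The abstract describes comparing non-centrality parameters and asymptotic power with and without genotype misclassification. The sketch below is not the dissertation's R script or its error model. It is a minimal illustration of the general idea, assuming the standard asymptotic NCP of the Pearson chi-square test on a 2 x 3 case-control genotype table and a non-differential 3 x 3 misclassification matrix. All function names, genotype frequencies, error rates, and sample sizes are hypothetical.

```python
import math


def observed_freqs(true_freqs, error_matrix):
    """Observed genotype frequencies after misclassification.

    error_matrix[i][j] is the assumed probability that true genotype i
    is recorded as genotype j (each row sums to 1).
    """
    return [sum(true_freqs[i] * error_matrix[i][j] for i in range(3))
            for j in range(3)]


def ncp_2x3(case_freqs, control_freqs, n_cases, n_controls):
    """Asymptotic NCP of the Pearson chi-square test on a 2 x 3 table."""
    n = n_cases + n_controls
    row = [n_cases / n, n_controls / n]
    # Joint cell probabilities under the alternative hypothesis.
    cells = [[row[0] * f for f in case_freqs],
             [row[1] * f for f in control_freqs]]
    col = [cells[0][j] + cells[1][j] for j in range(3)]
    total = 0.0
    for i in range(2):
        for j in range(3):
            expected = row[i] * col[j]  # cell probability under independence
            if expected > 0:
                total += (cells[i][j] - expected) ** 2 / expected
    return n * total


def power_df2(ncp, alpha=0.05, terms=500):
    """Power of a 2-df chi-square test, using the Poisson mixture form of
    the non-central chi-square distribution (intended for moderate NCP)."""
    half_x = -math.log(alpha)          # critical value x solves exp(-x/2) = alpha
    half_ncp = ncp / 2.0
    weight = math.exp(-half_ncp)       # Poisson(ncp/2) probability at j = 0
    term = 1.0                         # (x/2)^i / i! at i = 0
    partial = 1.0                      # running sum of term for i = 0..j
    power = 0.0
    for j in range(terms):
        # alpha * partial is the upper tail of a central chi-square
        # with 2 + 2j degrees of freedom evaluated at x.
        power += weight * alpha * partial
        weight *= half_ncp / (j + 1)
        term *= half_x / (j + 1)
        partial += term
    return power


# Hypothetical settings, chosen only for illustration.
cases = [0.55, 0.38, 0.07]
controls = [0.64, 0.32, 0.04]
errors = [[0.98, 0.02, 0.00],
          [0.01, 0.98, 0.01],
          [0.00, 0.02, 0.98]]

ncp_true = ncp_2x3(cases, controls, 500, 500)
ncp_err = ncp_2x3(observed_freqs(cases, errors),
                  observed_freqs(controls, errors), 500, 500)
print("NCP without error:", ncp_true, "power:", power_df2(ncp_true))
print("NCP with error:   ", ncp_err, "power:", power_df2(ncp_err))
```

Under these assumptions the NCP computed from the misclassified frequencies cannot exceed the error-free NCP, which is the power loss the abstract refers to. Differential misclassification could be sketched by passing different error matrices for cases and controls.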

