DSpace Repository

An Isoform-free Model for Differential Expression Analysis in RNA-seq Data


dc.contributor.advisor Wu, Song en_US
dc.contributor.advisor Zhu, Wei en_US
dc.contributor.author Liu, Yang en_US
dc.contributor.other Department of Applied Mathematics and Statistics en_US
dc.date.accessioned 2017-09-20T16:52:13Z
dc.date.available 2017-09-20T16:52:13Z
dc.date.issued 2016-12-01 en_US
dc.identifier.uri http://hdl.handle.net/11401/77221 en_US
dc.description 90 pg. en_US
dc.description.abstract Next-generation sequencing (NGS) technology has been widely used in biomedical research, particularly in genomics-related studies. One NGS application is high-throughput mRNA sequencing (RNA-seq), which is typically used to discover alternative splicing events, to evaluate gene expression levels, and to identify differentially expressed genes. Compared with traditional microarrays, RNA-seq is more efficient and economical. Many software tools have been developed for RNA-seq differential expression (DE) analysis, such as edgeR, DESeq and Cufflinks; however, these methods either ignore the isoforms of mRNA transcripts, rely on predefined isoform structures, or depend on de novo isoform reconstruction from the sequencing data, all of which can lead to less accurate inference. In this thesis, we developed and implemented a novel splicing-graph-based negative binomial (SGNB) model for gene differential expression analysis in RNA-seq data. The principle of our model is to move the expression comparison from the unobservable transcript level to the observable read-type level, based on fundamental results of linear algebra. A likelihood ratio test is used to identify DE genes. Computationally, we employed the expectation-maximization (EM) and Newton-Raphson algorithms for parameter estimation. The main advantage of our model is that it accounts for isoforms without requiring predefined isoform structures, and it is therefore expected to be more robust and powerful. At the same time, our method does not require de novo isoform reconstruction, which saves time and avoids errors in reconstructing isoforms. We performed extensive simulations to compare our new method with one of the most popular packages, edgeR. Under the various scenarios we examined, the results showed that our new model achieves better power while correctly controlling the false discovery rate. We also applied our method to a real data set to demonstrate its applicability in practice. en_US
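To illustrate what a likelihood ratio test on read-type counts looks like, the following is a minimal sketch, not the SGNB model itself. It assumes (hypothetically) that the counts for one gene are already tabulated per read type and per replicate, that the negative binomial dispersion phi is known and shared, that there is no library-size normalization, and that read types are independent. Under those simplifications the maximum-likelihood group means are just sample means, so the EM and Newton-Raphson steps described in the abstract are not needed here. The function names nb_logpmf, chi2_sf and read_type_lrt are illustrative only.

```python
import math
from typing import Sequence, Tuple


def nb_logpmf(y: int, mu: float, phi: float) -> float:
    """Log-probability of count y under NB(mean=mu, dispersion=phi).

    Parameterized so that Var(y) = mu + phi * mu**2.
    """
    if mu == 0.0:
        # Degenerate mean: all probability mass sits at zero.
        return 0.0 if y == 0 else -math.inf
    r = 1.0 / phi  # NB "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    s = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = s, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return math.erfc(s / math.sqrt(2.0)) + 2.0 * density * total


def read_type_lrt(group_a: Sequence[Sequence[int]],
                  group_b: Sequence[Sequence[int]],
                  phi: float) -> Tuple[float, float]:
    """LRT of equal read-type means between two conditions.

    group_a[k] holds the replicate counts of read type k in condition A
    (likewise group_b). H0: one mean per read type; H1: one mean per read
    type and condition. Returns (statistic, p-value) with df = #read types.
    """
    if len(group_a) != len(group_b) or not group_a:
        raise ValueError("both groups need the same, non-zero number of read types")
    stat = 0.0
    for ya, yb in zip(group_a, group_b):
        pooled = list(ya) + list(yb)
        # With phi fixed and no offsets, the NB MLE of the mean is the sample mean.
        mu0 = sum(pooled) / len(pooled)
        mu_a = sum(ya) / len(ya)
        mu_b = sum(yb) / len(yb)
        loglik_h1 = (sum(nb_logpmf(y, mu_a, phi) for y in ya)
                     + sum(nb_logpmf(y, mu_b, phi) for y in yb))
        loglik_h0 = sum(nb_logpmf(y, mu0, phi) for y in pooled)
        stat += 2.0 * (loglik_h1 - loglik_h0)
    return stat, chi2_sf(stat, len(group_a))


if __name__ == "__main__":
    # Two read types, three replicates per condition (made-up counts).
    stat, p = read_type_lrt([[100, 110, 95], [50, 55, 48]],
                            [[10, 12, 9], [5, 6, 4]], phi=0.05)
    print(f"LRT statistic = {stat:.2f}, p-value = {p:.3g}")
```

Treating each read type as its own mean parameter is what makes this test "isoform-free" in spirit: no isoform structure enters the likelihood. The thesis model differs in how it links read types to isoforms and in how it estimates parameters.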
dc.description.sponsorship This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree. en_US
dc.format Monograph en_US
dc.format.medium Electronic Resource en_US
dc.language.iso en_US en_US
dc.publisher The Graduate School, Stony Brook University: Stony Brook, NY. en_US
dc.subject.lcsh Statistics en_US
dc.title An Isoform-free Model for Differential Expression Analysis in RNA-seq Data en_US
dc.type Dissertation en_US
dc.mimetype Application/PDF en_US
dc.contributor.committeemember Yang, Jie en_US
dc.contributor.committeemember Galambos, Nora en_US

