Dissertations

Classified Mixed Model Prediction in Epigenetics: Modeling Non-Diverse Samples and Its Multivariate Extension

Author: Hang Zhang

Date: 8/30/2021

Executive Summary:
Epigenetic modifications form a bridge between the environment and gene expression and play a crucial role in tumor development. Thus, epigenetic markers such as DNA methylation are increasingly the focus of cancer research. With the advancement of high-throughput omics (HTO) technologies, a vast amount of -omics data can be accessed from public repositories such as The Cancer Genome Atlas (TCGA), allowing for comprehensive investigation of genomic and epigenomic interrelationships. However, public genomic repositories are notoriously lacking in racially and ethnically diverse samples. This limits the scope of exploration and has in fact been one of the driving factors for the initiation of the All of Us project.

The particular focus of this thesis is to provide a model-based framework for accurately predicting DNA methylation from genetic information using racially sparse public repository data. Epigenetic alterations are of great interest in cancer research, but public repository data are limited in the information they provide; genetic data, however, are more plentiful. The phenotype of interest is cervical cancer (CESC) in TCGA. Being able to generate such predictions would nicely complement other work that has generated predictions of gene expression from normal samples.

We describe an application of the Classified Mixed Model Prediction (CMMP) method (Jiang et al., 2018) that enables accurate race-specific prediction of DNA methylation (DNAm) from genetic data lacking racial diversity, such as the TCGA CESC data. The predictive performance of the model is enhanced by combining different types of cancer data to increase data heterogeneity and induce borrowing of information. These findings have been published in Genomics (Rao et al., 2020).

It is known that dynamic methylation markers exhibit correlative patterns across the genome (Teschendorff et al., 2014).
One limitation of applying CMMP to CESC DNAm prediction is that the joint correlation structure among the methylation outcomes is ignored, which motivates a multivariate version of the CMMP idea (mvCMMP). Therefore, I present mvCMMP, an efficient multivariate classified mixed model prediction approach derived from CMMP that incorporates a flexible estimation method based on ASReml (Gilmour et al., 1995) for mixed effect prediction on multivariate outcomes. We demonstrate, through simulation as well as the TCGA CESC data, that mvCMMP addresses prediction accuracy for correlated outcomes.

In summary, the purpose of this thesis is to integrate real quantitative biological markers with sophisticated statistical modeling while mitigating racial disparities.