Statistical Inference For High Dimensional Models In Genomics And Microbiome

Degree type

Doctor of Philosophy (PhD)

Graduate group

Epidemiology & Biostatistics

Discipline

Subject

Causal inference
Genomics
High dimensional model
Microbiome
Statistical inference
Biostatistics

Funder

Grant number

License

Copyright date

2021-08-31T20:20:00-07:00

Distributor

Related resources

Contributor

Abstract

The human microbiome consists of all living microorganisms in and on the human body. Large-scale microbiome studies, such as the NIH Human Microbiome Project (HMP), have shown that this complex ecosystem has a large impact on human health in multiple ways. The analysis of these datasets raises new statistical challenges that require the development of novel methodologies. Motivated by several microbiome studies, we develop methods of statistical inference for high-dimensional models to address the association between microbiome composition and outcomes of interest. The high dimensionality and compositional nature of microbiome data make the naive application of classical regression models invalid. To study the association between microbiome composition and disease risk, we develop a generalized linear model with linear constraints on the regression coefficients and a related debiasing procedure to obtain asymptotically unbiased and normally distributed estimates. Application of this method to an inflammatory bowel disease (IBD) study identifies several gut bacterial species that are associated with the risk of IBD. We also consider post-selection inference for models with linear equality constraints, where we develop methods for constructing confidence intervals for the non-zero coefficients selected by a Lasso-type estimator with linear constraints. These confidence intervals are shown to have the desired coverage probabilities conditional on the selected model. Finally, the last chapter of this dissertation presents a method for inference in high-dimensional instrumental variable (IV) regression. Associations between gene expression and phenotype can be affected by unmeasured confounders, leading to biased estimates. Using genetic variants as instruments, we consider the problem of hypothesis testing for sparse IV regression models and present methods for testing both single and multiple regression coefficients. A multiple testing procedure is developed for selecting variables and is shown to control the false discovery rate. These methods are illustrated by an analysis of a yeast dataset to identify genes that are associated with growth in the presence of hydrogen peroxide.
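
To illustrate why linear constraints on the regression coefficients suit compositional data, the following is a minimal sketch and not the dissertation's method. It assumes a low-dimensional, unpenalized linear model on log-transformed proportions with a single zero-sum constraint, whereas the dissertation treats high-dimensional generalized linear models with a debiasing step. The function name `zero_sum_least_squares` and the simulated data are hypothetical. Under the zero-sum constraint the fit is unchanged whether raw abundances or normalized proportions are used, because rescaling a sample only adds a constant to every log-covariate in that row.

```python
import numpy as np

def zero_sum_least_squares(Z, y):
    """Least squares for y ~ Z @ beta subject to sum(beta) == 0.

    Solves the KKT system of the equality-constrained problem:
        [Z'Z  C'] [beta]   [Z'y]
        [C    0 ] [lam ] = [ 0 ]
    where C is a row of ones (hypothetical helper, illustration only).
    """
    n, p = Z.shape
    C = np.ones((1, p))
    kkt = np.block([[Z.T @ Z, C.T], [C, np.zeros((1, 1))]])
    rhs = np.concatenate([Z.T @ y, np.zeros(1)])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:p]  # drop the Lagrange multiplier

# Simulated data (assumption: not from any study described in the abstract).
rng = np.random.default_rng(0)
n, p = 200, 5
counts = rng.gamma(shape=2.0, scale=1.0, size=(n, p))   # raw abundances
X = counts / counts.sum(axis=1, keepdims=True)          # compositions, rows sum to 1
Z = np.log(X)                                           # log-proportions

beta_true = np.array([1.0, -1.0, 0.5, -0.5, 0.0])       # sums to zero
y = Z @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = zero_sum_least_squares(Z, y)

# Same estimate from unnormalized abundances: the constraint removes
# the dependence on each sample's total.
beta_hat_counts = zero_sum_least_squares(np.log(counts), y)
```

Here `beta_hat` and `beta_hat_counts` agree up to numerical error, which is the scale invariance that an unconstrained regression on the same covariates would not have.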

Date of degree

2020-01-01

Date Range for Data Collection (Start Date)

Date Range for Data Collection (End Date)

Digital Object Identifier

Series name and number

Volume number

Issue number

Publisher

Publisher DOI

Comments

Recommended citation