Identifiability and Inference of Non-Parametric Rates-Across-Sites Models on Large-Scale Phylogenies

Discipline

Subject

phylogenetic reconstruction
rates-across-sites models
concentration of measure
Biostatistics
Statistics and Probability

Abstract

Mutation rate variation across loci is well known to cause difficulties, notably identifiability issues, in the reconstruction of evolutionary trees from molecular sequences. Here we introduce a new approach for estimating general rates-across-sites models. Our results imply, in particular, that large phylogenies are typically identifiable under rate variation. We also derive sequence-length requirements for high-probability reconstruction. Our main contribution is a novel algorithm that clusters sites according to their mutation rate. Following this site clustering step, standard reconstruction techniques can be used to recover the phylogeny. Our results rely on a basic insight: that, for large trees, certain site statistics experience concentration-of-measure phenomena.
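The concentration idea behind the site-clustering step can be illustrated with a toy simulation. The sketch below is not the algorithm from the paper: it assumes a two-state symmetric substitution model on a star tree with equal branch lengths, two hypothetical rate classes (0.5 and 2.0), and a simple per-site statistic (the fraction of leaves in the minority state). All function names and parameter values are illustrative assumptions. With many leaves, the statistic concentrates around a rate-dependent value, so a one-dimensional split at the largest gap separates the two rate classes.

```python
import math
import random


def flip_probability(rate, branch_length):
    # Two-state symmetric model: probability that a leaf differs from the root.
    return 0.5 * (1.0 - math.exp(-2.0 * rate * branch_length))


def simulate_site(n_leaves, rate, branch_length, rng):
    # Star tree (a simplifying assumption): leaves evolve independently from the root.
    root = rng.randint(0, 1)
    p = flip_probability(rate, branch_length)
    return [root ^ (rng.random() < p) for _ in range(n_leaves)]


def minority_fraction(site):
    # Per-site statistic: fraction of leaves carrying the less common state.
    ones = sum(site)
    return min(ones, len(site) - ones) / len(site)


def split_at_largest_gap(values):
    # Two-way clustering in one dimension: sort, then cut at the widest gap.
    order = sorted(range(len(values)), key=values.__getitem__)
    gaps = [values[order[i + 1]] - values[order[i]] for i in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    labels = [0] * len(values)
    for idx in order[cut + 1:]:
        labels[idx] = 1  # label 1 marks the cluster with the larger statistic
    return labels


def demo(n_leaves=2000, n_sites=200, branch_length=0.2, seed=0):
    # Hypothetical setup: each site draws one of two rates uniformly at random.
    rng = random.Random(seed)
    true_rates = [rng.choice([0.5, 2.0]) for _ in range(n_sites)]
    stats = [
        minority_fraction(simulate_site(n_leaves, r, branch_length, rng))
        for r in true_rates
    ]
    labels = split_at_largest_gap(stats)
    truth = [int(r == 2.0) for r in true_rates]
    # Fraction of sites assigned to the correct rate class.
    return sum(a == b for a, b in zip(labels, truth)) / n_sites


if __name__ == "__main__":
    print("clustering accuracy:", demo())
```

Reducing `n_leaves` in this toy setup widens the spread of the statistic within each rate class, which is one way to see why the argument is stated for large trees. On a real phylogeny the leaves are not independent, so the statistic and its analysis would need to differ from this star-tree simplification.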

Publication date

2013-10-01

Journal title

Journal of Mathematical Biology
