Optimal Phylogenetic Reconstruction

Subject

optimal phylogenetic reconstruction
mutation probability
second author
markov chain
phylogenetic tree
underlying biology
special case
statistical physics
phase transition
reconstruction problem
evolutionary tree
genetic sequence
molecular data
CFN evolutionary model
evolutionary model
clear mathematical formulation
true evolutionary tree
major task
evolutionary biology
critical importance
free measure
Statistics and Probability
Theory and Algorithms

Abstract

One of the major tasks of evolutionary biology is the reconstruction of phylogenetic trees from molecular data. This problem is of critical importance in almost all areas of biology and has a very clear mathematical formulation. The evolutionary model is given by a Markov chain on the true evolutionary tree. Given samples from this Markov chain at the leaves of the tree, the goal is to reconstruct the evolutionary tree. It is crucial to minimize the number of samples, i.e., the length of the genetic sequences, as it is constrained by the underlying biology, the price of sequencing, etc. It is well known that in order to reconstruct a tree on n leaves, sequences of length Ω(log n) are needed. It was conjectured by M. Steel that for the CFN evolutionary model, if the mutation probability on all edges of the tree is less than p* = (√2 − 1)/2^{3/2}, then the tree can be recovered from sequences of length O(log n). This was proven by the second author in the special case where the tree is "balanced". The second author also proved that if all edges have mutation probability larger than p*, then the length needed is n^{Ω(1)}. This "phase transition" in the number of samples needed is closely related to the phase transition for the reconstruction problem (or extremality of the free measure) studied extensively in statistical physics and probability. Here we complete the proof of Steel's conjecture and give a reconstruction algorithm using optimal (up to a multiplicative constant) sequence length. Our results further extend to give an optimal reconstruction algorithm for the Jukes-Cantor model with short edges. All reconstruction algorithms run in time polynomial in the sequence length. The algorithm and the proofs are based on a novel combination of combinatorial, metric and probabilistic arguments.
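
To make the setting in the abstract concrete, the following is a minimal sketch, not the paper's reconstruction algorithm. It computes the threshold p* quoted above and simulates CFN evolution (each binary site flips independently with probability p along every edge) on a complete binary tree. The function names, the choice of a complete balanced tree, and the use of a single mutation probability on all edges are assumptions made purely for illustration.

```python
import math
import random


def critical_mutation_probability() -> float:
    """Return p* = (sqrt(2) - 1) / 2^(3/2), the threshold quoted in the abstract."""
    return (math.sqrt(2) - 1) / 2 ** 1.5


def simulate_cfn_leaves(depth, p, seq_len, rng):
    """Evolve seq_len independent binary sites down a complete binary tree.

    Hypothetical helper: every edge flips each site independently with
    probability p (the CFN model). Returns the 2**depth leaf sequences,
    each a list of 0/1 values.
    """
    # Root sequence: uniformly random binary states.
    level = [[rng.randint(0, 1) for _ in range(seq_len)]]
    for _ in range(depth):
        next_level = []
        for parent in level:
            for _child in range(2):  # each internal node has two children
                child = [s ^ int(rng.random() < p) for s in parent]
                next_level.append(child)
        level = next_level
    return level


if __name__ == "__main__":
    p_star = critical_mutation_probability()
    theta = 1 - 2 * p_star  # per-edge correlation between parent and child states
    print(f"p* = {p_star:.6f}, 2 * theta^2 = {2 * theta ** 2:.6f}")

    # Sample leaf sequences just below the threshold on a tree with 8 leaves.
    leaves = simulate_cfn_leaves(depth=3, p=0.9 * p_star, seq_len=30,
                                 rng=random.Random(0))
    print(len(leaves), "leaf sequences of length", len(leaves[0]))
```

Writing θ = 1 − 2p for the per-edge correlation, the arithmetic gives 2θ² = 1 exactly at p = p*, which is one way to see the link to the reconstruction phase transition on binary trees that the abstract mentions.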

Publication date

2006-01-01

Journal title

Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing
