Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference.
Mol Biol Evol. 2012 Jan; 29(1):325-35.

Abstract

Recent studies have observed that Bayesian analyses of sequence data sets using the program MrBayes sometimes generate extremely large branch lengths, with posterior credibility intervals for the tree length (sum of branch lengths) excluding the maximum likelihood estimates. Suggested explanations for this phenomenon include the existence of multiple local peaks in the posterior, lack of convergence of the chain in the tail of the posterior, mixing problems, and misspecified priors on branch lengths. Here, we analyze the behavior of Bayesian Markov chain Monte Carlo algorithms when the chain is in the tail of the posterior distribution and note that all these phenomena can occur. In Bayesian phylogenetics, the likelihood function approaches a constant instead of zero when the branch lengths increase to infinity. The flat tail of the likelihood can cause poor mixing and undue influence of the prior. We suggest that the main cause of the extreme branch length estimates produced in many Bayesian analyses is the poor choice of a default prior on branch lengths in current Bayesian phylogenetic programs. The default prior in MrBayes assigns independent and identical distributions to branch lengths, imposing strong (and unreasonable) assumptions about the tree length. The problem is exacerbated by the strong correlation between the branch lengths and parameters in models of variable rates among sites or among site partitions. To resolve the problem, we suggest two multivariate priors for the branch lengths (called compound Dirichlet priors) that are fairly diffuse and demonstrate their utility in the special case of branch length estimation on a star phylogeny. Our analysis highlights the need for careful thought in the specification of high-dimensional priors in Bayesian analyses.
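Two claims in the abstract can be illustrated numerically. The sketch below is not the authors' code. It assumes i.i.d. exponential branch length priors on an unrooted binary tree (the prior mean of 0.1 is an illustrative value, not taken from the abstract) and uses the JC69 substitution model for a pair of sequences (also an assumption made only for illustration). The function names are hypothetical.

```python
import math


def iid_exponential_tree_length_prior(n_taxa, branch_mean):
    """Mean, SD and coefficient of variation of the tree length implied by
    i.i.d. exponential priors on the branches of an unrooted binary tree."""
    n_branches = 2 * n_taxa - 3           # branches in an unrooted binary tree
    mean = n_branches * branch_mean       # sum of n exponentials is gamma(n, rate)
    sd = math.sqrt(n_branches) * branch_mean
    return mean, sd, sd / mean


def jc69_pair_loglik(t, n_sites, n_diff):
    """JC69 log-likelihood for two aligned sequences at distance t > 0
    (expected substitutions per site), with n_diff differing sites."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)     # one specific identical site pattern
    p_diff = 0.25 * (0.25 - 0.25 * e)     # one specific differing site pattern
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


if __name__ == "__main__":
    # The induced tree length prior tightens as taxa are added.
    for n_taxa in (10, 100, 1000):
        mean, sd, cv = iid_exponential_tree_length_prior(n_taxa, branch_mean=0.1)
        print(f"taxa={n_taxa:5d}  mean={mean:8.2f}  sd={sd:6.3f}  cv={cv:.3f}")

    # The log-likelihood levels off at n_sites * log(1/16) instead of
    # falling to minus infinity as the distance grows.
    limit = 100 * math.log(1.0 / 16.0)
    for t in (0.1, 1.0, 10.0, 100.0):
        print(f"t={t:6.1f}  logL={jc69_pair_loglik(t, 100, 10):10.4f}  limit={limit:.4f}")
```

Under these assumptions the coefficient of variation of the tree length is 1/sqrt(2s - 3) for s taxa, so the i.i.d. prior becomes more informative about the tree length as the tree grows, while the likelihood's flat tail leaves the prior to determine how quickly the posterior decays for very long branches.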

Authors

Bruce Rannala, Tianqi Zhu, Ziheng Yang

Affiliation: Center for Computational and Evolutionary Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.

Pub Type(s)

Journal Article
Research Support, Non-U.S. Gov't

Language

eng

PubMed ID

21890479

Citation

Rannala, Bruce, et al. "Tail Paradox, Partial Identifiability, and Influential Priors in Bayesian Branch Length Inference." Molecular Biology and Evolution, vol. 29, no. 1, 2012, pp. 325-35.
Rannala B, Zhu T, Yang Z. Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference. Mol Biol Evol. 2012;29(1):325-35.
Rannala, B., Zhu, T., & Yang, Z. (2012). Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference. Molecular Biology and Evolution, 29(1), 325-35. https://doi.org/10.1093/molbev/msr210
Rannala B, Zhu T, Yang Z. Tail Paradox, Partial Identifiability, and Influential Priors in Bayesian Branch Length Inference. Mol Biol Evol. 2012;29(1):325-35. PubMed PMID: 21890479.
TY  - JOUR
T1  - Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference.
AU  - Rannala, Bruce
AU  - Zhu, Tianqi
AU  - Yang, Ziheng
Y1  - 2011/09/02/
PY  - 2011/9/6/entrez
PY  - 2011/9/6/pubmed
PY  - 2012/8/16/medline
SP  - 325
EP  - 35
JF  - Molecular biology and evolution
JO  - Mol Biol Evol
VL  - 29
IS  - 1
SN  - 1537-1719
UR  - https://www.unboundmedicine.com/medline/citation/21890479/Tail_paradox_partial_identifiability_and_influential_priors_in_Bayesian_branch_length_inference_
L2  - https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msr210
DB  - PRIME
DP  - Unbound Medicine
ER  -