Parameter Identifiability for a Profile Mixture Model of Protein Evolution

J Comput Biol. 2021 May 6. doi: 10.1089/cmb.2020.0315. Online ahead of print.

ABSTRACT

A profile mixture (PM) model is a model of protein evolution, describing sequence data in which sites are assumed to follow many related substitution processes on a single evolutionary tree. The processes depend, in part, on different amino acid distributions, or profiles, varying over sites in aligned sequences. A fundamental question for any stochastic model, which must be answered positively to justify model-based inference, is whether the parameters are identifiable from the probability distribution they determine. Here, using algebraic methods, we show that a PM model has identifiable parameters under circumstances in which it is likely to be used for empirical analyses. In particular, for a tree relating 9 or more taxa, both the tree topology and all numerical parameters are generically identifiable when the number of profiles is less than 74.
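To make the idea of a mixture over profiles concrete, the following is a minimal sketch, not the model as specified in the paper. It assumes an F81-style substitution process for each profile (every substitution rate is proportional to the target state's profile frequency), a toy 4-state alphabet in place of the 20 amino acids, and a tree of only two taxa separated by a branch of length t. The function names `f81_transition` and `pair_pattern_prob`, and all numerical values, are hypothetical and chosen for illustration only.

```python
import math


def f81_transition(profile, t):
    """Transition matrix P(t) for an F81-style process with stationary
    distribution `profile` (an assumption made for this sketch)."""
    # Scale rates so that one unit of time equals one expected substitution.
    beta = 1.0 / (1.0 - sum(p * p for p in profile))
    decay = math.exp(-beta * t)
    n = len(profile)
    # Closed form: P_ij(t) = decay * [i == j] + (1 - decay) * profile[j]
    return [
        [decay * (1.0 if i == j else 0.0) + (1.0 - decay) * profile[j]
         for j in range(n)]
        for i in range(n)
    ]


def pair_pattern_prob(x, y, t, profiles, weights):
    """Probability of observing state x in one taxon and y in the other,
    averaged over the mixture of profile-specific processes."""
    total = 0.0
    for w, prof in zip(weights, profiles):
        P = f81_transition(prof, t)
        # Each class is reversible, so the joint probability is pi_x * P_xy(t).
        total += w * prof[x] * P[x][y]
    return total


# Hypothetical two-class mixture on a 4-state alphabet.
profiles = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
weights = [0.4, 0.6]
pattern_probs = [
    [pair_pattern_prob(x, y, 0.5, profiles, weights) for y in range(4)]
    for x in range(4)
]
```

The table `pattern_probs` is the probability distribution over site patterns that the parameters (tree, branch length, profiles, weights) determine. Identifiability asks whether those parameters can be recovered from such a distribution; the paper's result concerns trees with 9 or more taxa, not the two-taxon toy case used here.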

PMID:33960831 | DOI:10.1089/cmb.2020.0315

By Nevin Manimala
