Pitfalls in the statistical analysis of microbiome amplicon sequencing data

Mol Ecol Resour. 2022 Nov 4. doi: 10.1111/1755-0998.13730. Online ahead of print.

ABSTRACT

Microbiome data are characterized by several aspects that make them challenging to analyse statistically: they are compositional, high dimensional, and rich in zeros. A large array of statistical methods is used to analyse these data. Some are borrowed from other fields, such as ecology or RNA sequencing, while others are custom-made for microbiome data. The large and continuously expanding range of available methods means that researchers have to invest considerable effort in choosing which method(s) to apply. In this paper, we list 14 statistical methods or approaches that we think should generally be avoided. In several cases, this is because we believe the assumptions behind the method are unlikely to be met for microbiome data. In other cases, we see methods used in ways for which they were not intended. We believe researchers would be helped by more critical evaluations of existing methods, as not all methods in use are suitable or have been sufficiently reviewed. We hope this paper contributes to such a critical discussion of which methods are appropriate for the analysis of microbiome data.

PMID:36330663 | DOI:10.1111/1755-0998.13730

By Nevin Manimala