Int J Biostat. 2023 Mar 6. doi: 10.1515/ijb-2022-0063. Online ahead of print.
Common statistical approaches are not designed to deal with so-called “short fat data” in biomarker pilot studies, where the number of biomarker candidates exceeds the sample size by orders of magnitude. High-throughput technologies for omics data enable the measurement of tens of thousands or more biomarker candidates for specific diseases or states of a disease. Due to the limited availability of study participants, ethical reasons, and high costs for sample processing and analysis, researchers often prefer to start with a small-sample pilot study in order to judge the potential of finding biomarkers that enable, usually in combination, a sufficiently reliable classification of the disease state under consideration. We developed a user-friendly tool called HiPerMAb that allows pilot studies to be evaluated based on performance measures such as multiclass AUC, entropy, area above the cost curve, hypervolume under manifold, and misclassification rate, using Monte-Carlo simulations to compute p-values and confidence intervals. The number of “good” biomarker candidates is compared to the expected number of “good” biomarker candidates in a data set with no association to the considered disease states. This allows the potential of the pilot study to be judged even if statistical tests with correction for multiple testing fail to provide any hint of significance.
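The comparison of observed and expected numbers of “good” candidates can be illustrated with a minimal sketch. This is not the HiPerMAb implementation. It assumes a two-class problem, continuous measurements without ties, the ordinary AUC as the performance measure, and label permutation as a stand-in for the Monte-Carlo null model. The function names, the threshold of 0.9, and the number of simulations are hypothetical choices made for illustration.

```python
import numpy as np

def auc_per_feature(X, y):
    """Two-class AUC of every column of X (samples x candidates).

    Uses the Mann-Whitney rank-sum identity; assumes no tied values.
    The direction of the effect is ignored by taking max(AUC, 1 - AUC).
    """
    ranks = X.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n within each column
    n1 = int(y.sum())
    n0 = len(y) - n1
    r1 = ranks[y == 1].sum(axis=0)                 # rank sum of class 1
    auc = (r1 - n1 * (n1 + 1) / 2) / (n1 * n0)
    return np.maximum(auc, 1 - auc)

def count_good(X, y, threshold):
    """Number of candidates whose AUC reaches the threshold."""
    return int((auc_per_feature(X, y) >= threshold).sum())

def null_comparison(X, y, threshold=0.9, n_sim=200, seed=0):
    """Compare the observed count of 'good' candidates with its null expectation.

    The null (no association with the class labels) is simulated here by
    permuting the labels; this is an assumption of the sketch.
    """
    rng = np.random.default_rng(seed)
    observed = count_good(X, y, threshold)
    null_counts = np.array(
        [count_good(X, rng.permutation(y), threshold) for _ in range(n_sim)]
    )
    expected = float(null_counts.mean())
    # Monte-Carlo p-value with the usual +1 correction
    p_value = (1 + int((null_counts >= observed).sum())) / (n_sim + 1)
    return observed, expected, p_value

# Synthetic "short fat" example: 20 samples, 500 candidates, 20 of them informative.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 500))
X[y == 1, :20] += 3.0
observed, expected, p_value = null_comparison(X, y)
print(observed, expected, p_value)
```

An observed count that clearly exceeds the simulated expectation suggests that the candidate set as a whole carries signal, even when no single candidate would survive a correction for multiple testing.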
PMID:36867668 | DOI:10.1515/ijb-2022-0063