Nevin Manimala Statistics

Lq-based robust analytics on ultrahigh and high dimensional data

Stat Med. 2022 Sep 13. doi: 10.1002/sim.9563. Online ahead of print.

ABSTRACT

Ultrahigh and high dimensional data are common in regression analysis across various fields, such as omics, finance, and biological engineering. Beyond the problem of dimension, the data might also be contaminated. There are two main types of contamination: outliers and model misspecification. We develop a unique method that addresses ultrahigh or high dimensionality together with both types of contamination. In this article, we propose a framework for feature screening and selection based on the maximum Lq-likelihood estimation (MLqE), which accounts for model misspecification and has also been shown to be robust to outliers. In numerical analysis, we explore the robustness of this framework under different outlier and model misspecification scenarios. To examine the performance of this framework, we conduct real data analysis using skin cutaneous melanoma data. Compared with traditional screening and feature selection methods, the proposed method shows superiority in both variable identification effectiveness and parameter estimation accuracy.
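
To give a feel for why an Lq-likelihood can resist outliers, here is a minimal sketch in Python. It is not the screening and selection framework from the article. It is only a toy one-dimensional illustration that assumes a normal model with known standard deviation. The Lq function replaces the logarithm with (u^(1-q) - 1)/(1-q), which reduces to log(u) as q approaches 1. The function names (`lq`, `normal_pdf`, `mlqe_location`), the choice q = 0.8, and the simulated data are all hypothetical and chosen for illustration.

```python
import math
import random


def lq(u, q):
    """Lq function: (u**(1-q) - 1)/(1-q) for q != 1, log(u) for q == 1."""
    if q == 1.0:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mlqe_location(xs, q, sigma=1.0, n_iter=200, tol=1e-10):
    """Maximize sum_i lq(f(x_i; mu), q) over mu, with sigma assumed known.

    Setting the derivative to zero gives a weighted mean with weights
    f(x_i; mu)**(1-q), which is solved here by fixed-point iteration.
    """
    mu = sorted(xs)[len(xs) // 2]  # start from (roughly) the median
    for _ in range(n_iter):
        w = [normal_pdf(x, mu, sigma) ** (1.0 - q) for x in xs]
        new_mu = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu


# Simulated data (an assumption for this sketch): 200 clean N(0, 1) draws
# plus 20 gross outliers placed at 10.
random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(200)]
data = clean + [10.0] * 20

mle = mlqe_location(data, q=1.0)     # q = 1: all weights equal, i.e. the sample mean
robust = mlqe_location(data, q=0.8)  # q < 1: low-density points get small weights

print("q = 1.0 estimate:", mle)
print("q = 0.8 estimate:", robust)
```

With q = 1 every observation gets equal weight, so the estimate is the ordinary sample mean and is pulled toward the outliers. With q < 1 each observation is weighted by its model density raised to the power 1 - q, so points far from the bulk of the data contribute very little.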

PMID:36098057 | DOI:10.1002/sim.9563

By Nevin Manimala
