Combining wavelength importance ranking to the random forest classifier to analyze multiclass spectral data

Forensic Sci Int. 2021 Sep 14;328:110998. doi: 10.1016/j.forsciint.2021.110998. Online ahead of print.

ABSTRACT

Near-infrared (NIR) spectroscopy is a vibrational spectroscopy technique widely used in different areas to characterize substances. NIR datasets consist of absorbance measurements over a range of wavelengths (λ). Because such datasets are typically noisy and correlated, they tend to compromise the performance of several statistical techniques; one way to overcome this is to select the portions of the spectra in which the wavelengths are more informative. In this paper we investigate the performance of the Random Forest (RF) classifier combined with several wavelength importance ranking approaches on the task of classifying product samples into categories, such as quality levels or authenticity. Our propositions are tested on six NIR datasets comprising two or more classes of food and pharmaceutical products, as well as illegal drugs. Our proposed classification model, an integration of the χ² ranking score and the RF classifier, substantially reduced the number of wavelengths in the dataset while increasing classification accuracy compared with the use of the complete datasets. Our propositions also performed well when compared with competing methods available in the literature.
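The general idea of ranking wavelengths by a χ² score and training a Random Forest on the top-ranked ones can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of real NIR spectra, rescales absorbances to [0, 1] because scikit-learn's chi2 requires non-negative inputs, and fixes the number of retained wavelengths (k = 20) arbitrarily. The helper names rank_wavelengths_chi2 and fit_rf_on_top_k are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def rank_wavelengths_chi2(X, y):
    """Return wavelength (column) indices sorted from highest to lowest chi2 score."""
    # chi2 needs non-negative features, so rescale each wavelength to [0, 1].
    X_scaled = MinMaxScaler().fit_transform(X)
    scores, _ = chi2(X_scaled, y)
    # Constant columns yield NaN scores; treat them as uninformative.
    scores = np.nan_to_num(scores, nan=0.0)
    return np.argsort(scores)[::-1]


def fit_rf_on_top_k(X_train, y_train, k, seed=0):
    """Fit a Random Forest using only the k top-ranked wavelengths."""
    ranking = rank_wavelengths_chi2(X_train, y_train)
    selected = np.sort(ranking[:k])
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train[:, selected], y_train)
    return model, selected


# Synthetic stand-in for NIR spectra (assumption): 120 samples, 200 wavelengths,
# 3 classes, with class information placed in columns 50-59 only.
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 120, 200
y = rng.integers(0, 3, size=n_samples)
X = rng.normal(1.0, 0.05, size=(n_samples, n_wavelengths))
X[:, 50:60] += 0.2 * y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Ranking is computed on the training split only, to avoid leaking test data.
model, selected = fit_rf_on_top_k(X_train, y_train, k=20)
accuracy = model.score(X_test[:, selected], y_test)
print(f"kept {len(selected)} of {n_wavelengths} wavelengths, accuracy = {accuracy:.2f}")
```

In practice the number of retained wavelengths would be tuned, for example by cross-validation over several values of k, and the result compared against a Random Forest trained on the full spectrum.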

PMID: 34551367 | DOI: 10.1016/j.forsciint.2021.110998

By Nevin Manimala
