
Determination of the Stage Composition of Plasmodium Infections from Bulk Gene Expression Data

mSystems. 2022 Jul 5:e0025822. doi: 10.1128/msystems.00258-22. Online ahead of print.

ABSTRACT

Malaria symptoms are caused by the development of the parasites within the blood of an infected host. Bulk RNA sequencing (RNA-seq) of infected blood can reveal interactions between parasites and the host immune system during an infection, but because multiple developmental stages with distinct transcriptional profiles are concurrently present in infected blood, it is necessary to correct such analyses for differences in cell composition among samples. Gene expression deconvolution is a statistical approach that has been developed for inferring the cell composition of complex tissues characterized by bulk RNA-seq using gene expression profiles from reference cell types. Here, we describe the evaluation of a species-agnostic reference data set that can be used for efficient and accurate gene expression deconvolution of bulk RNA-seq data generated from any Plasmodium species and for correcting gene expression analyses for biases caused by differences in stage composition among samples.

IMPORTANCE Differences in cell type proportions among samples can introduce artifacts in gene expression analyses and mask genuine differences in gene regulation. Gene expression deconvolution allows estimation of the proportion of each cell type present in a sample directly from bulk RNA sequencing data, but this approach requires a reference data set with the signature profile of each cell type. Here, we evaluate the suitability of a rodent malaria parasite gene expression data set for estimating the proportions of each parasite developmental stage present in bulk RNA sequencing data generated from blood-stage infections with the human parasites Plasmodium falciparum and Plasmodium vivax. These analyses provide a species-agnostic approach for reliably estimating stage proportions in infected human blood and correcting subsequent gene expression analyses for these variations.
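To make the idea of gene expression deconvolution concrete, here is a minimal sketch in Python. The abstract does not say which algorithm the authors used, so this example substitutes plain non-negative least squares (NNLS) as one common way to pose the problem: a bulk profile is modelled as a non-negative mixture of reference stage profiles. The function name `estimate_stage_proportions`, the stage labels, and all expression values are hypothetical and synthetic; none of them come from the paper.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_stage_proportions(reference, bulk):
    """Estimate mixture proportions by non-negative least squares.

    reference: (genes x stages) matrix of signature expression profiles.
    bulk:      (genes,) vector of bulk expression for one sample.
    Returns a (stages,) vector of proportions that sums to 1.
    """
    # Solve min ||reference @ x - bulk|| subject to x >= 0.
    coefficients, _residual_norm = nnls(reference, bulk)
    total = coefficients.sum()
    if total == 0:
        raise ValueError("bulk profile is not explained by any reference stage")
    # Rescale so the coefficients can be read as proportions.
    return coefficients / total


# Synthetic illustration only: the stage labels and values are assumptions.
rng = np.random.default_rng(0)
stages = ["ring", "trophozoite", "schizont", "gametocyte"]
reference = rng.gamma(shape=2.0, scale=50.0, size=(200, len(stages)))

# Build a noiseless bulk sample from known proportions, then recover them.
true_proportions = np.array([0.60, 0.25, 0.10, 0.05])
bulk = reference @ true_proportions

estimated = estimate_stage_proportions(reference, bulk)
for stage, proportion in zip(stages, estimated):
    print(f"{stage}: {proportion:.3f}")
```

Estimated proportions of this kind could then be included as covariates in a downstream differential expression model, which is the sort of correction for stage composition the abstract describes. Real data are noisy, so recovery would be approximate rather than exact as in this synthetic case.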

PMID:35862820 | DOI:10.1128/msystems.00258-22

By Nevin Manimala
