Categories
Nevin Manimala Statistics

Mapping EQ-5D-5L Score From SGRQ in Patients with Asthma and/or COPD in NOVELTY

Pragmat Obs Res. 2025 Jun 4;16:135-145. doi: 10.2147/POR.S508814. eCollection 2025.

ABSTRACT

PURPOSE: The St George’s Respiratory Questionnaire (SGRQ) measures health status in obstructive airways disease. Starkie et al proposed an algorithm for mapping the SGRQ to EQ-5D-5L, a preference-based utility measure, in chronic obstructive pulmonary disease (COPD) (Value Health 2011;14:354-60); only SGRQ total score, its squared value, and sex were included as covariates. We aimed to determine if including additional covariates could improve the performance of this algorithm type and whether amendments were required to extend this mapping to asthma or asthma+COPD.

PATIENTS AND METHODS: SGRQ and EQ-5D-5L were measured from a large, global, prospective, longitudinal study in asthma and/or COPD (NOVELTY; NCT02760329). We fitted six longitudinal linear mixed models to the development sample (baseline and Year 1 data), with EQ-5D-5L as the response variable. Each model had a different combination of covariates. Mixed model repeated measures methodology was used to enable the accommodation of within-patient correlation among measurements. Restricted maximum likelihood and an unstructured covariance matrix were used to fit all models. Performance (mean square errors [MSE]) was evaluated relative to the Starkie et al algorithm in the validation sample (Year 2 and Year 3 data).

RESULTS: A total of 6813 patients (asthma: 3546; asthma+COPD: 872; COPD: 2395) with available EQ-5D-5L and SGRQ data were included at baseline. MSEs indicated good performance, were similar across models (Year 2: 0.0302-0.0308 [45-46% variance explained]; Year 3: 0.0272-0.0277 [47-48% variance explained]), and were modestly smaller than those obtained by Starkie et al (Year 2: 0.0340; Year 3: 0.0296). Performance was similar across models in the asthma and COPD subgroups.

CONCLUSION: Including additional covariates and SGRQ domains resulted in similar model performance to Starkie et al, suggesting their covariates are adequate for mapping in asthma and/or COPD. NOVELTY coefficients broaden the population with chronic airways disease for whom this mapping can be applied.
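
The covariate set attributed to Starkie et al above (SGRQ total score, its square, and sex) can be illustrated with a small sketch. This is not the NOVELTY analysis: it uses ordinary least squares rather than a longitudinal mixed model, the data are synthetic, and the coefficients in `true_coef` are invented purely to generate that synthetic data. All function names are hypothetical.

```python
import numpy as np


def design_matrix(sgrq_total, male):
    """Columns: intercept, SGRQ total, SGRQ total squared, sex indicator."""
    sgrq_total = np.asarray(sgrq_total, dtype=float)
    male = np.asarray(male, dtype=float)
    return np.column_stack(
        [np.ones_like(sgrq_total), sgrq_total, sgrq_total**2, male]
    )


def fit_mapping(sgrq_total, male, utility):
    """Least-squares fit (a simplification: no within-patient correlation)."""
    X = design_matrix(sgrq_total, male)
    coef, *_ = np.linalg.lstsq(X, np.asarray(utility, dtype=float), rcond=None)
    return coef


def predict_utility(coef, sgrq_total, male):
    """Apply fitted coefficients to new SGRQ scores."""
    return design_matrix(sgrq_total, male) @ coef


def mean_square_error(observed, predicted):
    """MSE, the performance measure named in the abstract."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))


# Synthetic data only; these coefficients are NOT from the paper.
rng = np.random.default_rng(0)
n = 2000
sgrq = rng.uniform(0, 100, n)
male = rng.integers(0, 2, n)
true_coef = np.array([0.95, -0.002, -0.00004, 0.01])
utility = design_matrix(sgrq, male) @ true_coef + rng.normal(0, 0.1, n)

# Fit on a "development" half, evaluate on a held-out "validation" half.
dev, val = slice(0, 1000), slice(1000, None)
coef = fit_mapping(sgrq[dev], male[dev], utility[dev])
mse = mean_square_error(utility[val], predict_utility(coef, sgrq[val], male[val]))
print(f"validation MSE on synthetic data: {mse:.4f}")
```

Evaluating on held-out data mirrors the development/validation split described in the methods, although here the split is across synthetic patients rather than across study years.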

PMID:40502208 | PMC:PMC12152420 | DOI:10.2147/POR.S508814

Cyclo-stationary distributions of mRNA and Protein counts for random cell division times

bioRxiv [Preprint]. 2025 Jun 8:2025.06.06.658238. doi: 10.1101/2025.06.06.658238.

ABSTRACT

There is a long history of using experimental and computational approaches to study noise in single-cell levels of mRNA and proteins. The noise originates from a myriad of factors: intrinsic processes of gene expression, partitioning errors during division, and extrinsic effects such as random cell-cycle times. Although theoretical methods are well developed to analytically understand full statistics of copy numbers for fixed or Erlang-distributed cell-cycle times, the general problem of random division times is still open. For any random (but uncorrelated) division time distribution, we present a method to address this challenging problem and obtain exact series representations of the copy number distributions in the cyclo-stationary state. We provide explicit cell age-specific and age-averaged results, and analyze the relative contribution to noise from intrinsic and extrinsic sources. Our analytical approach will aid the analysis of single-cell expression data and help in disentangling the impact of variability in division times.
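
The abstract's approach is analytical, but the setting it describes can be explored numerically. The following is a minimal stochastic-simulation sketch, not the authors' method: it assumes constant-rate mRNA production, first-order degradation, independent division times drawn from a user-supplied distribution, and binomial partitioning with probability 1/2 at division. The function name, parameter names, and rate values are all hypothetical.

```python
import random


def simulate_lineage(k_prod, g_deg, draw_division_time, n_divisions, rng):
    """Follow one cell lineage; return the mRNA count just before each division.

    k_prod: production rate; g_deg: per-molecule degradation rate;
    draw_division_time: callable taking rng and returning a cycle length.
    """
    count = 0
    counts_at_division = []
    for _ in range(n_divisions):
        t, t_div = 0.0, draw_division_time(rng)
        while True:
            total_rate = k_prod + g_deg * count
            if total_rate == 0:
                break  # nothing can happen before division
            t += rng.expovariate(total_rate)  # waiting time to next reaction
            if t >= t_div:
                break  # division occurs before the next reaction
            if rng.random() * total_rate < k_prod:
                count += 1  # production event
            else:
                count -= 1  # degradation event
        counts_at_division.append(count)
        # Binomial partitioning: each molecule is kept with probability 1/2.
        count = sum(1 for _ in range(count) if rng.random() < 0.5)
    return counts_at_division


# Example with gamma-distributed division times (mean 1); values are assumed.
rng = random.Random(42)
counts = simulate_lineage(
    k_prod=10.0,
    g_deg=0.5,
    draw_division_time=lambda r: r.gammavariate(4.0, 0.25),
    n_divisions=2000,
    rng=rng,
)
late = counts[200:]  # discard early cycles before the lineage settles
print(sum(late) / len(late))
```

Histograms of such simulated counts could serve as a sanity check against analytical distributions, under whatever model assumptions the two share.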

PMID:40502203 | PMC:PMC12157499 | DOI:10.1101/2025.06.06.658238

Z-Form Stabilization by the Zα Domain of ADAR1p150 Has Subtle Effects on A-to-I Editing

bioRxiv [Preprint]. 2025 Jun 3:2025.06.02.657529. doi: 10.1101/2025.06.02.657529.

ABSTRACT

The role of Adenosine Deaminase Acting on RNA 1 (ADAR1)’s Z-conformation stabilizing Zα domain in A-to-I editing is unclear. Previous studies on Zα mutations faced limitations, including variable ADAR1p150 expression, differential editing analysis challenges, and unaccounted changes in ADAR1p150 localization. To address these issues, we developed a Cre-lox system in ADAR1p150 KO cells to generate stable cell lines expressing Zα mutant ADAR1p150 constructs. Using total RNA sequencing analyzing editing clusters as a proxy for dsRNAs, we found that Zα mutations slightly decreased overall A-to-I editing, consistent with recent findings. These decreases correlated with mislocalization of ADAR1p150 rather than reduced editing specificity, and practically no statistically significant differentially edited sites were identified between wild-type and Zα mutant ADAR1p150 constructs. These results suggest that Zα’s impact on editing is minor and that phenotypes in Zα mutant mouse models and human patients may arise from editing-independent inhibition of Z-DNA-Binding Protein 1 (ZBP1), rather than changes in RNA editing.

PMID:40502085 | PMC:PMC12157636 | DOI:10.1101/2025.06.02.657529

Would you agree if N is three? On statistical inference for small N

bioRxiv [Preprint]. 2025 Jun 2:2024.08.26.609821. doi: 10.1101/2024.08.26.609821.

ABSTRACT

Non-human primate studies traditionally use two or three animals. We previously used standard statistics to argue for using either one animal, for an inference about that sample, or five or more animals, for a useful inference about the population. A recently proposed framework argued for testing three animals and accepting the outcome found in the majority as the outcome that is most representative for the population. The proposal tests this framework under various assumptions about the true probability of the representative outcome in the population, i.e. its typicality. On this basis, it argues that the framework is valid across a wide range of typicalities. Here, we show (1) that the error rate of the framework depends strongly on the typicality of the representative outcome, (2) that an acceptable error rate requires this typicality to be very high (87% for a single type of outlier), which actually renders empirical testing beyond a single animal obsolete, (3) that moving from one to three animals decreases error rates mainly for typicality values of 70-90%, and much less for both lower and higher values. Furthermore, we use conjunction analysis to demonstrate that two out of three animals with a given outcome only allow inference of a lower bound on typicality of 9%, which is of limited value. Thus, the use of two or three animals does not allow a useful inference about the population, and if this option is nevertheless chosen, the inferred lower bound of typicality should be reported.
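
The dependence of the majority rule on typicality can be made concrete with a short binomial calculation. This sketch rests on assumptions the abstract does not spell out: outcomes are binary (the representative outcome versus a single outlier type), animals are independent draws, and an "error" means the representative outcome fails to appear in a strict majority. The function name is hypothetical.

```python
from math import comb


def majority_error(typicality, n_animals):
    """Probability that the representative outcome is NOT shown by a strict
    majority of n_animals, given its population probability (typicality)."""
    p = typicality
    needed = n_animals // 2 + 1  # smallest strict majority
    p_majority = sum(
        comb(n_animals, k) * p**k * (1 - p) ** (n_animals - k)
        for k in range(needed, n_animals + 1)
    )
    return 1.0 - p_majority


# Compare one animal with the majority of three across typicality values.
for p in (0.5, 0.6, 0.7, 0.8, 0.87, 0.9, 0.95):
    e1 = majority_error(p, 1)
    e3 = majority_error(p, 3)
    print(f"typicality={p:.2f}  error(n=1)={e1:.3f}  error(n=3)={e3:.3f}  gain={e1 - e3:.3f}")
```

Under these assumptions the gain from moving to three animals is p(1-p)(2p-1), which is zero at p = 0.5 and p = 1 and largest near p ≈ 0.79. If "acceptable" is taken to mean an error rate below 5% (a threshold assumed here, not stated in the abstract), the three-animal error crosses it between typicalities of 0.86 and 0.87.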

PMID:40502083 | PMC:PMC12157567 | DOI:10.1101/2024.08.26.609821

Effects of Natural Lithium and Lithium Isotopes on Voltage-Gated Sodium Channel Activity in SH-SY5Y and iPSC-Derived Cortical Neurons

bioRxiv [Preprint]. 2025 Jun 1:2025.05.28.656602. doi: 10.1101/2025.05.28.656602.

ABSTRACT

Although lithium (Li) is a widely used treatment for bipolar disorder, its exact mechanisms of action remain elusive. Research has shown that the two stable Li isotopes, which differ in their mass and nuclear spin, can induce distinct effects in both in vivo and in vitro studies. Since sodium (Na⁺) channels are the primary pathway for Li⁺ entry into cells, we examined how Li⁺ affects the current of Na⁺ channels using whole-cell patch-clamp techniques on SH-SY5Y neuroblastoma cells and human iPSC-derived cortical neurons. Our findings indicate that mammalian Na⁺ channels in both neuronal models studied here display no selectivity between Na⁺ and Li⁺, unlike previously reported bacterial Na⁺ channels. We observed differences between the two neuronal models in three measured parameters (V_half, G_max, z). We saw no statistically significant differences between any ions in SH-SY5Y cells, but small differences in the half-maximum activation potential (V_half) between Na⁺ and ⁶Li⁺ and between ⁷Li⁺ and ⁶Li⁺ were found in iPSC-derived cortical neurons. Although Na⁺ channels are widely expressed and important in neuronal function, the very small differences observed in this work suggest that Li⁺ regulation through Na⁺ channels is likely not the primary mechanism underlying Li⁺ isotope differentiation.

PMID:40502081 | PMC:PMC12154711 | DOI:10.1101/2025.05.28.656602

It’s a wrap: deriving distinct discoveries with FDR control after a GWAS pipeline

bioRxiv [Preprint]. 2025 Jun 8:2025.06.05.658138. doi: 10.1101/2025.06.05.658138.

ABSTRACT

Recent work has shown how to test conditional independence hypotheses between an outcome of interest and a large number of explanatory variables with false discovery rate (FDR) control, even without access to individual-level data. In the case of genome-wide association studies (GWAS) specifically, summary statistics resulting from the standard analysis pipeline can be used as input to a procedure that identifies distinct signals across the genome with FDR control. This secondary analysis requires sampling negative controls (knockoffs) from a distribution determined by the linkage disequilibrium patterns in the genome of the population under study. In prior work, we pre-computed this distribution for European genomes, starting from information derived from the UK Biobank. Thus, researchers working with European GWASes can carry out a knockoff analysis with minimal computational costs, using the distributed routine GhostKnockoffGWAS. Here we introduce and release new software (solveblock) that extends this capability to a much richer collection of studies. Given a set of genotyped samples, or a reference dataset, our pipeline efficiently estimates the high-dimensional correlation matrices that describe correlation structures across the genome, making rather common sparsity assumptions. Taking this sample-specific estimate as input, the software identifies groups of genetic variants that are highly correlated, and uses them to define an appropriate resolution for conditional independence hypotheses. Finally, we compute the distribution for the exchangeable negative controls necessary to test these hypotheses. The output of solveblock can be passed directly to GhostKnockoffGWAS, allowing users to carry out the complete analysis in a two-step procedure. We illustrate the performance of the routine by analyzing data from five UK Biobank sub-populations. In simulations, our method controls FDR. Analyzing real data for 26 phenotypes of varying polygenicity in British individuals, we make an average of ≈19 additional discoveries, compared to standard marginal association testing. Our code, precompiled software, and processed files for these five sub-populations are openly shared.
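
For readers unfamiliar with how knockoffs yield FDR control, the selection step of the generic knockoff+ filter can be sketched as follows. This is not the interface of GhostKnockoffGWAS or solveblock, neither of which is shown in the abstract. It assumes the feature statistics `w` have already been computed, with large positive values favoring the real variable (or group) over its knockoff; the function names are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Smallest t among the nonzero |w| such that
    (1 + #{w_j <= -t}) / max(1, #{w_j >= t}) <= q; inf if none qualifies."""
    w = np.asarray(w, dtype=float)
    candidates = np.sort(np.abs(w[w != 0]))
    for t in candidates:
        # Negative statistics act as a stand-in count of false positives.
        fdp_estimate = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_estimate <= q:
            return float(t)
    return float("inf")


def select(w, q):
    """Indices whose statistic reaches the knockoff+ threshold."""
    w = np.asarray(w, dtype=float)
    return np.flatnonzero(w >= knockoff_plus_threshold(w, q))


# Toy statistics (made up for illustration, not GWAS output).
w_toy = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1.5, -1, 0.5, -0.2]
print(knockoff_plus_threshold(w_toy, q=0.2), select(w_toy, q=0.2))
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff filter; it makes the rule more conservative when few variables are selected, so that a strict target such as q = 0.05 may return no discoveries on a small toy example like this one.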

PMID:40502041 | PMC:PMC12157521 | DOI:10.1101/2025.06.05.658138

Improved quantitative accuracy in data-independent acquisition proteomics via retention time boundary imputation

bioRxiv [Preprint]. 2025 May 31:2025.05.27.656394. doi: 10.1101/2025.05.27.656394.

ABSTRACT

The traditional approaches to handling missing values in DIA proteomics are to either remove high-missingness proteins or impute them with statistical procedures. Both have their disadvantages: removal can limit statistical power, while imputation can introduce spurious correlations or dilute signal. We present an alternative approach based on imputing peptide retention times (RTs) rather than quantitations. For each missing value, we impute the RT boundaries, then obtain a quantitation by integrating the chromatographic signal within the imputed boundaries. Our method yields more accurate quantitations than existing proteomics imputation methods. RT boundary imputation also identifies differentially abundant peptides from key Alzheimer’s genes that were not identified with library search alone. RT boundary imputation improves the ability to estimate radiation exposure in biological tissues. RT boundary imputation significantly increases the number of peptides with quantitations, leading to increases in statistical power. Finally, RT boundary imputation better quantifies low abundance peptides than library search alone. Our RT boundary imputation method, called Nettle, is available as a standalone tool.
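
The two steps described above, imputing RT boundaries and then integrating the signal between them, can be illustrated with a toy sketch. It is not Nettle's algorithm: the abstract does not say how boundaries are imputed, so the median-across-runs rule below is a placeholder assumption, and the function names and data are hypothetical.

```python
import numpy as np


def impute_boundaries(observed_boundaries):
    """Placeholder rule (an assumption): take the median start and end RT
    from the runs in which the peptide was detected."""
    b = np.asarray(observed_boundaries, dtype=float)
    return float(np.median(b[:, 0])), float(np.median(b[:, 1]))


def integrate_within(rt, intensity, start, end):
    """Trapezoidal area of the chromatogram between start and end (inclusive)."""
    rt = np.asarray(rt, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = (rt >= start) & (rt <= end)
    x, y = rt[mask], intensity[mask]
    if x.size < 2:
        return 0.0  # too few points inside the window to form an area
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


# Boundaries from three runs where the peptide was found (toy values).
start, end = impute_boundaries([(2.0, 8.0), (3.0, 9.0), (2.5, 8.5)])

# Chromatogram from the run where the peptide was missing (toy values).
rt = np.arange(0.0, 11.0, 1.0)
intensity = np.array([0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0], dtype=float)
area = integrate_within(rt, intensity, start, end)
print(start, end, area)
```

The point of the design is that the quantity being filled in comes from measured signal in the run itself rather than from a statistical model of abundances; only the integration window is borrowed from other runs.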

PMID:40502008 | PMC:PMC12154835 | DOI:10.1101/2025.05.27.656394

A Bayesian Approach for Identifying Driver Mutations within Oncogenic Pathways through Mutual Exclusivity

bioRxiv [Preprint]. 2025 May 31:2025.05.27.656485. doi: 10.1101/2025.05.27.656485.

ABSTRACT

Distinguishing driver mutations from the large background of passenger mutations remains a major challenge in cancer genomics. Evidence-based approaches to nominate driver mutations are often limited by the availability of experimental or clinical validation for specific variants. As clinical sequencing becomes integrated into patient care, computational methods provide powerful opportunities to analyze expanding genomic datasets and identify functional candidates beyond the current knowledge base. Among various analytical frameworks, mutual exclusivity, the observation that mutations in two or more genes tend not to co-occur within the same tumor, has been particularly attractive. Building on this principle, we propose BayesMAGPIE, a refined version of a statistical method, MAGPIE, developed previously for identifying driver genes within oncogenic pathways. The new method introduces two key innovations. First, it incorporates information on mutation type using a Bayesian hierarchical modeling framework, enabling the distinction between potential differences in functional effects among variants within the same gene, thereby improving the accuracy of driver identification. Second, it models gene-specific driver frequencies with a Dirichlet prior which effectively controls the sparsity of the inferred driver set and aligns with the biological expectation that most tumor types are driven by a small number of genes. We evaluate BayesMAGPIE through extensive simulation studies to assess its estimation bias and accuracy in driver identification, and benchmark its performance against MAGPIE using TCGA data from eight cancer types.

PMID:40501980 | PMC:PMC12154917 | DOI:10.1101/2025.05.27.656485

Dynamic flexibility of the murine gut microbiota to morphine disturbance enables escape from the stable dysbiosis associated with addiction-like behavior

bioRxiv [Preprint]. 2025 Jun 1:2025.06.01.657215. doi: 10.1101/2025.06.01.657215.

ABSTRACT

Although opioids are effective analgesics, they can lead to problematic drug use behaviors that underlie opioid use disorder (OUD). Opioids also drive gut microbiota dysbiosis, which is linked to altered opioid responses tied to OUD. To interrogate the role of the gut microbiota in a mouse model of OUD, we used a longitudinal paradigm of voluntary oral morphine self-administration to capture multiple facets of drug seeking and preserve both individual behavioral response and gut microbiota variation to examine associations between these two variables. After prolonged morphine consumption, only a subset of mice transitioned to a state we define statistically as compulsive. In compulsive mice, morphine fragmented the microbiota networks, which subsequently reorganized to form robust novel connections. In contrast, the communities of non-compulsive mice also changed but were highly interconnected during morphine disturbance and maintained more continuity post-morphine, suggesting dynamic flexibility. Compulsive mice displayed a greater loss of functional diversity and a shift towards a new stable state dominated by potential pathobionts, whereas non-compulsive mice better preserved genera associated with gut health and broader functional diversity. These findings highlight how persistent and stable gut microbiota dysbiosis aligns with long-term behavioral changes underlying OUD, potentially contributing to relapse.

PMID:40501972 | PMC:PMC12154951 | DOI:10.1101/2025.06.01.657215

Chevreul: An R Bioconductor Package for Exploratory Analysis of Full-Length Single Cell Sequencing

bioRxiv [Preprint]. 2025 Jun 1:2025.05.27.656486. doi: 10.1101/2025.05.27.656486.

ABSTRACT

Chevreul is an open-source R Bioconductor package and interactive R Shiny app for processing and visualization of single cell RNA sequencing (scRNA-seq) data. It differs from other scRNA-seq analysis packages in its ease of use, its capacity to analyze full-length RNA sequencing data for exon coverage and transcript isoform inference, and its support for batch correction. Chevreul enables exploratory analysis of scRNA-seq data using Bioconductor SingleCellExperiment or Seurat objects. Simple processing functions with sensible default settings enable batch integration, quality control filtering, read count normalization and transformation, dimensionality reduction, clustering at a range of resolutions, and cluster marker gene identification. Processed data can be visualized in an interactive R Shiny app with dynamically linked plots. Expression of gene or transcript features can be displayed on PCA, tSNE, and UMAP embeddings, heatmaps, or violin plots, while differential expression can be evaluated with several statistical tests without extensive programming. Existing analysis tools do not provide specialized tools for isoform-level analysis or alternative splicing detection. By enabling isoform-level expression analysis for differential expression, dimensionality reduction and batch integration, Chevreul empowers researchers without prior programming experience to analyze full-length scRNA-seq data.

DATA AVAILABILITY: A test dataset formatted as a SingleCellExperiment object can be found at https://github.com/cobriniklab/chevreuldata

AVAILABILITY & IMPLEMENTATION: Chevreul is implemented in R and the R package and integrated Shiny application are freely available at https://github.com/cobriniklab/chevreul

PMID:40501968 | PMC:PMC12154678 | DOI:10.1101/2025.05.27.656486