Categories
Nevin Manimala Statistics

Effects of Natural Lithium and Lithium Isotopes on Voltage Gated Sodium Channel Activity in SH-SY5Y and IPSC Derived Cortical Neurons

bioRxiv [Preprint]. 2025 Jun 1:2025.05.28.656602. doi: 10.1101/2025.05.28.656602.

ABSTRACT

Although lithium (Li) is a widely used treatment for bipolar disorder, its exact mechanisms of action remain elusive. Research has shown that the two stable Li isotopes, which differ in their mass and nuclear spin, can induce distinct effects in both in vivo and in vitro studies. Since sodium (Na⁺) channels are the primary pathway for Li⁺ entry into cells, we examined how Li⁺ affects the current of Na⁺ channels using whole-cell patch-clamp techniques on SH-SY5Y neuroblastoma cells and human iPSC-derived cortical neurons. Our findings indicate that mammalian Na⁺ channels in both neuronal models studied here display no selectivity between Na⁺ and Li⁺, unlike previously reported bacterial Na⁺ channels. We observed differences between the two neuronal models in three measured parameters (V_half, G_max, z). We saw no statistically significant differences between any ions in SH-SY5Y cells, but small differences in the half-maximum activation potential (V_half) between Na⁺ and ⁶Li⁺ and between ⁷Li⁺ and ⁶Li⁺ were found in iPSC-derived cortical neurons. Although Na⁺ channels are widely expressed and important in neuronal function, the very small differences observed in this work suggest that Li⁺ regulation through Na⁺ channels is likely not the primary mechanism underlying Li⁺ isotope differentiation.
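The abstract does not state the equation behind the three reported parameters (half-maximum activation potential, maximal conductance, and z). One common choice for voltage-gated channel activation is a Boltzmann function, so the sketch below assumes that form. The function name, the units, and the default temperature of 295.15 K are illustrative assumptions, not details from the preprint.

```python
import math

FARADAY = 96485.33212      # Faraday constant, C/mol
GAS_CONSTANT = 8.314462618  # molar gas constant, J/(mol*K)


def boltzmann_conductance(v_mv, v_half_mv, g_max, z, temp_k=295.15):
    """Conductance at membrane potential v_mv (mV) under an assumed
    Boltzmann activation curve:

        G(V) = G_max / (1 + exp(-z * F * (V - V_half) / (R * T)))

    v_half_mv is the potential of half-maximal activation, g_max the
    maximal conductance, and z the apparent gating charge. temp_k is an
    assumed recording temperature (hypothetical default).
    """
    # Convert millivolts to volts before scaling by F / (R * T).
    exponent = -z * FARADAY * (v_mv - v_half_mv) * 1e-3 / (GAS_CONSTANT * temp_k)
    return g_max / (1.0 + math.exp(exponent))
```

Under this form, a shift in the half-maximum potential moves the whole activation curve along the voltage axis without changing its height or steepness, which is why small differences in that parameter alone correspond to a modest functional effect.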

PMID:40502081 | PMC:PMC12154711 | DOI:10.1101/2025.05.28.656602

It’s a wrap: deriving distinct discoveries with FDR control after a GWAS pipeline

bioRxiv [Preprint]. 2025 Jun 8:2025.06.05.658138. doi: 10.1101/2025.06.05.658138.

ABSTRACT

Recent work has shown how to test conditional independence hypotheses between an outcome of interest and a large number of explanatory variables with false discovery rate (FDR) control, even without access to individual-level data. In the case of genome-wide association studies (GWAS) specifically, summary statistics resulting from the standard analysis pipeline can be used as input to a procedure that identifies distinct signals across the genome with FDR control. This secondary analysis requires sampling negative controls (knockoffs) from a distribution determined by the linkage disequilibrium patterns in the genome of the population under study. In prior work, we pre-computed this distribution for European genomes, starting from information derived from the UK Biobank. Thus, researchers working with European GWASes can carry out a knockoff analysis with minimal computational costs, using the distributed routine GhostKnockoffGWAS. Here we introduce and release new software (solveblock) that extends this capability to a much richer collection of studies. Given a set of genotyped samples, or a reference dataset, our pipeline efficiently estimates the high-dimensional correlation matrices that describe correlation structures across the genome, making rather common sparsity assumptions. Taking this sample-specific estimate as input, the software identifies groups of genetic variants that are highly correlated, and uses them to define an appropriate resolution for conditional independence hypotheses. Finally, we compute the distribution for the exchangeable negative controls necessary to test these hypotheses. The output of solveblock can be passed directly to GhostKnockoffGWAS, allowing users to carry out the complete analysis in a two-step procedure. We illustrate the performance of the routine by analyzing data from five UK Biobank sub-populations. In simulations, our method controls FDR. Analyzing real data on 26 phenotypes of varying polygenicity in British individuals, we make an average of ≈19 additional discoveries compared to standard marginal association testing. Our code, precompiled software, and processed files for these five sub-populations are openly shared.
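To make the idea of FDR control with knockoffs concrete, here is a minimal sketch of the generic knockoff+ selection rule. It assumes each variant (or group) already has a feature statistic W, where large positive values favor the real variable over its negative control. The function name and the toy inputs are hypothetical, and this is not the code used by solveblock or GhostKnockoffGWAS.

```python
def knockoff_plus_select(w, q):
    """Return indices selected by the knockoff+ rule at target FDR q.

    w: list of feature statistics, one per hypothesis. A positive value
       means the original variable looked more important than its
       knockoff; a negative value means the opposite.
    q: target false discovery rate, e.g. 0.1.

    The threshold is the smallest t among the nonzero |w| values with
        (1 + #{j: w[j] <= -t}) / max(1, #{j: w[j] >= t}) <= q.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # estimates false positives
        positives = sum(1 for x in w if x >= t)   # would-be discoveries
        if (1 + negatives) / max(1, positives) <= q:
            return [j for j, x in enumerate(w) if x >= t]
    return []  # no threshold meets the target: make no discoveries


# Toy example with made-up statistics.
selected = knockoff_plus_select(
    [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, -1.0, 0.5, 0.2], q=0.2
)
```

The count of large negative statistics acts as an estimate of how many of the large positive ones are false, which is what lets the rule bound the FDR without p-values.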

PMID:40502041 | PMC:PMC12157521 | DOI:10.1101/2025.06.05.658138

Improved quantitative accuracy in data-independent acquisition proteomics via retention time boundary imputation

bioRxiv [Preprint]. 2025 May 31:2025.05.27.656394. doi: 10.1101/2025.05.27.656394.

ABSTRACT

The traditional approaches to handling missing values in DIA proteomics are to either remove high-missingness proteins or impute them with statistical procedures. Both have their disadvantages: removal can limit statistical power, while imputation can introduce spurious correlations or dilute signal. We present an alternative approach based on imputing peptide retention times (RTs) rather than quantitations. For each missing value, we impute the RT boundaries, then obtain a quantitation by integrating the chromatographic signal within the imputed boundaries. Our method yields more accurate quantitations than existing proteomics imputation methods. RT boundary imputation also identifies differentially abundant peptides from key Alzheimer’s genes that were not identified with library search alone. RT boundary imputation improves the ability to estimate radiation exposure in biological tissues. RT boundary imputation significantly increases the number of peptides with quantitations, leading to increases in statistical power. Finally, RT boundary imputation better quantifies low-abundance peptides than library search alone. Our RT boundary imputation method, called Nettle, is available as a standalone tool.
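The second step described above, integrating the chromatographic signal between two retention-time boundaries, can be sketched with a simple trapezoidal rule. This is an illustration only: it takes the boundaries as given and does not reproduce how Nettle imputes them or integrates signal. The function name and the toy chromatogram are hypothetical.

```python
def integrate_peak(rt, intensity, rt_start, rt_end):
    """Trapezoidal area under a chromatogram between rt_start and rt_end.

    rt: strictly increasing retention times.
    intensity: signal at each retention time (same length as rt).
    Boundaries that fall between sampled points are handled by linear
    interpolation within the enclosing segment.
    """
    area = 0.0
    for i in range(len(rt) - 1):
        t0, t1 = rt[i], rt[i + 1]
        y0, y1 = intensity[i], intensity[i + 1]
        lo, hi = max(t0, rt_start), min(t1, rt_end)
        if hi <= lo:
            continue  # segment lies outside the integration window
        slope = (y1 - y0) / (t1 - t0)
        y_lo = y0 + slope * (lo - t0)
        y_hi = y0 + slope * (hi - t0)
        area += 0.5 * (y_lo + y_hi) * (hi - lo)
    return area


# Toy triangular peak sampled at five time points.
toy_rt = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_signal = [0.0, 2.0, 4.0, 2.0, 0.0]
core_area = integrate_peak(toy_rt, toy_signal, 1.0, 3.0)
```

Because the quantity comes from the measured signal inside the window rather than from a statistical model of abundances, a low or noisy peak still yields a data-driven value instead of a borrowed one.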

PMID:40502008 | PMC:PMC12154835 | DOI:10.1101/2025.05.27.656394

A Bayesian Approach for Identifying Driver Mutations within Oncogenic Pathways through Mutual Exclusivity

bioRxiv [Preprint]. 2025 May 31:2025.05.27.656485. doi: 10.1101/2025.05.27.656485.

ABSTRACT

Distinguishing driver mutations from the large background of passenger mutations remains a major challenge in cancer genomics. Evidence-based approaches to nominate driver mutations are often limited by the availability of experimental or clinical validation for specific variants. As clinical sequencing becomes integrated into patient care, computational methods provide powerful opportunities to analyze expanding genomic datasets and identify functional candidates beyond the current knowledge base. Among various analytical frameworks, mutual exclusivity, the observation that mutations in two or more genes tend not to co-occur within the same tumor, has been particularly attractive. Building on this principle, we propose BayesMAGPIE, a refined version of MAGPIE, a statistical method developed previously for identifying driver genes within oncogenic pathways. The new method introduces two key innovations. First, it incorporates information on mutation type using a Bayesian hierarchical modeling framework, allowing it to distinguish potential differences in functional effects among variants within the same gene and thereby improving the accuracy of driver identification. Second, it models gene-specific driver frequencies with a Dirichlet prior, which effectively controls the sparsity of the inferred driver set and aligns with the biological expectation that most tumor types are driven by a small number of genes. We evaluate BayesMAGPIE through extensive simulation studies to assess its estimation bias and accuracy in driver identification, and benchmark its performance against MAGPIE using TCGA data from eight cancer types.

PMID:40501980 | PMC:PMC12154917 | DOI:10.1101/2025.05.27.656485

Dynamic flexibility of the murine gut microbiota to morphine disturbance enables escape from the stable dysbiosis associated with addiction-like behavior

bioRxiv [Preprint]. 2025 Jun 1:2025.06.01.657215. doi: 10.1101/2025.06.01.657215.

ABSTRACT

Although opioids are effective analgesics, they can lead to problematic drug use behaviors that underlie opioid use disorder (OUD). Opioids also drive gut microbiota dysbiosis, which is linked to altered opioid responses tied to OUD. To interrogate the role of the gut microbiota in a mouse model of OUD, we used a longitudinal paradigm of voluntary oral morphine self-administration that captures multiple facets of drug seeking and preserves variation in both individual behavioral response and gut microbiota, allowing us to examine associations between these two variables. After prolonged morphine consumption, only a subset of mice transitioned to a state we define statistically as compulsive. In compulsive mice, morphine fragmented the microbiota networks, which subsequently reorganized to form robust novel connections. In contrast, the communities of non-compulsive mice also changed but were highly interconnected during morphine disturbance and maintained more continuity post-morphine, suggesting dynamic flexibility. Compulsive mice displayed a greater loss of functional diversity and a shift towards a new stable state dominated by potential pathobionts, whereas non-compulsive mice better preserved genera associated with gut health and broader functional diversity. These findings highlight how persistent and stable gut microbiota dysbiosis aligns with long-term behavioral changes underlying OUD, potentially contributing to relapse.

PMID:40501972 | PMC:PMC12154951 | DOI:10.1101/2025.06.01.657215

Chevreul: An R Bioconductor Package for Exploratory Analysis of Full-Length Single Cell Sequencing

bioRxiv [Preprint]. 2025 Jun 1:2025.05.27.656486. doi: 10.1101/2025.05.27.656486.

ABSTRACT

Chevreul is an open-source R Bioconductor package and interactive R Shiny app for processing and visualization of single-cell RNA sequencing (scRNA-seq) data. It differs from other scRNA-seq analysis packages in its ease of use, its capacity to analyze full-length RNA sequencing data for exon coverage and transcript isoform inference, and its support for batch correction. Chevreul enables exploratory analysis of scRNA-seq data using Bioconductor SingleCellExperiment or Seurat objects. Simple processing functions with sensible default settings enable batch integration, quality control filtering, read count normalization and transformation, dimensionality reduction, clustering at a range of resolutions, and cluster marker gene identification. Processed data can be visualized in an interactive R Shiny app with dynamically linked plots. Expression of gene or transcript features can be displayed on PCA, tSNE, and UMAP embeddings, heatmaps, or violin plots, while differential expression can be evaluated with several statistical tests without extensive programming. Existing analysis tools do not provide specialized tools for isoform-level analysis or alternative splicing detection. By enabling isoform-level expression analysis for differential expression, dimensionality reduction and batch integration, Chevreul empowers researchers without prior programming experience to analyze full-length scRNA-seq data.

DATA AVAILABILITY: A test dataset formatted as a SingleCellExperiment object can be found at https://github.com/cobriniklab/chevreuldata.

AVAILABILITY & IMPLEMENTATION: Chevreul is implemented in R and the R package and integrated Shiny application are freely available at https://github.com/cobriniklab/chevreul.

PMID:40501968 | PMC:PMC12154678 | DOI:10.1101/2025.05.27.656486

Robust statistical assessment of Oncogenotype to Organotropism translation in xenografted zebrafish

bioRxiv [Preprint]. 2025 Jun 1:2025.05.28.656734. doi: 10.1101/2025.05.28.656734.

ABSTRACT

Organotropism results from the functional versatility of metastatic cancer cells to survive and proliferate in diverse microenvironments. This adaptivity can originate in clonal variation of the spreading tumor and is often empowered by epigenetic and molecular reprogramming of cell regulatory circuits. Related to organotropic colonization of metastatic sites are environmentally sensitive, differential responses of cancer cells to therapeutic attack. Accordingly, understanding the organotropic profile of a cancer and probing the underlying driver mechanisms are of high clinical importance. However, systematically determining the organotropism of one cancer versus that of another, potentially with the granularity of comparing the same cancer type between patients or tracking the evolution of a cancer in a single patient for the purpose of personalized treatment, has remained very challenging. It requires a host organism that allows observation of the spreading pattern over relatively short experimental times. Moreover, organotropic patterns often tend to be statistically weak and overlaid with experimental variation. Thus, an assay for organotropism must provide enough statistical power to separate ‘meaningful heterogeneity’, i.e., heterogeneity that determines organotropism, from ‘meaningless heterogeneity’, i.e., heterogeneity that causes experimental noise. Here we describe an experimental workflow that leverages the physiological properties of zebrafish larvae for an imaging-based assessment of organotropic patterns over a time frame of 3 days. The workflow incorporates computer vision pipelines to automatically integrate the stochastic spreading behavior of a particular cancer xenograft in tens to hundreds of larvae, allowing subtle trends in the colonization of particular organs to emerge above random cell depositions throughout the host organism. We validate our approach with positive control experiments comparing the spreading patterns of a metastatic sarcoma against non-transformed fibroblasts and the spreading patterns of two melanoma cell lines with previously established differences in metastatic propensity. We then show that integration of the spreading pattern of xenografts in 40 to 50 larvae is necessary and sufficient to generate a Fish Metastatic Atlas page that is representative of the organotropism of a particular oncogenotype and experimental condition. Finally, we apply the power of this assay to determine the function of the EWSR1::FLI1 fusion oncogene and its transcriptional target SOX6 as plasticity factors that enhance the adaptive capacity of metastatic Ewing sarcoma.

PMID:40501949 | PMC:PMC12154647 | DOI:10.1101/2025.05.28.656734

Allele Specific Expression Quality Control Fills Critical Gap in Transcriptome Assisted Rare Variant Interpretation

bioRxiv [Preprint]. 2025 Jun 8:2025.05.30.657086. doi: 10.1101/2025.05.30.657086.

ABSTRACT

Allele-specific expression (ASE) captures the functional impact of genetic variation on transcription, offering a high-resolution view of cis-regulatory effects, but its quality can be diminished by technical, biological, and analysis artifacts. We introduce aseQC, a statistical framework that quantifies sample-level ASE quality in terms of the overall expected extra-binomial variation, so that uncharacteristically noisy samples in a cohort can be excluded to improve the robustness of downstream analyses. Applied to a dataset of rare Mendelian muscular disorders, aseQC successfully identified previously annotated low-quality cases, demonstrating clinical genomic utility. When applied to 15,253 samples in extensively quality-controlled GTEx project data, aseQC uncovered 563 low-quality samples that exhibit excessive allelic imbalance. We find these to be associated with specific processing dates but not otherwise described adequately by any other quality control measures and metadata available in GTEx data. We show that these low-quality samples lead to 23.6- and 31.6-fold increases in ASE and splicing outliers, respectively, degrading the performance of transcriptome analysis for rare variant interpretation. In contrast, we did not observe any adverse effect associated with inclusion of these samples in common-variant analysis using quantitative trait loci mapping. By enabling quick and reliable assessment of sample quality, aseQC presents a critical step for identifying subtle quality issues that remain critical for a successful analysis of rare variant effects using transcriptome data.
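"Extra-binomial variation" means allelic read counts scatter more widely than a plain binomial model predicts. The abstract does not specify aseQC's model, so the sketch below uses a beta-binomial variance formula, a standard way to express overdispersion, purely as an assumed illustration. The function names and the parameter rho (an intra-sample correlation between 0 and 1) are hypothetical.

```python
def binomial_variance(n, p):
    """Variance of the reference-allele read count under a binomial model
    with n total reads and reference-allele probability p."""
    return n * p * (1.0 - p)


def beta_binomial_variance(n, p, rho):
    """Variance under an assumed beta-binomial model with overdispersion
    rho in [0, 1). rho = 0 recovers the binomial variance; larger rho
    inflates it by the factor 1 + (n - 1) * rho."""
    return binomial_variance(n, p) * (1.0 + (n - 1) * rho)


def variance_inflation(n, rho):
    """How many times larger the beta-binomial variance is than the
    binomial variance at read depth n."""
    return 1.0 + (n - 1) * rho
```

For example, at a depth of 100 reads an overdispersion of 0.1 inflates the variance roughly elevenfold, which illustrates how a noisy sample could produce many apparent allelic-imbalance outliers even without true regulatory effects.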

PMID:40501944 | PMC:PMC12157414 | DOI:10.1101/2025.05.30.657086

MARLOWE: Taxonomic Characterization of Unknown Samples for Forensics Using De Novo Peptide Identification

bioRxiv [Preprint]. 2025 Jun 2:2024.09.30.615220. doi: 10.1101/2024.09.30.615220.

ABSTRACT

We present a computational tool, MARLOWE, for source organism characterization of unknown forensic biological samples. The intent of MARLOWE is to address a gap in applying proteomics data analysis to forensic applications. MARLOWE produces a list of potential source organisms from confident peptide tags derived from de novo peptide sequencing, using a statistical approach that assigns peptides to organisms probabilistically based on a broad sequence database. In this way, the algorithm assumes no a priori knowledge of potential sources, and the probabilistic way peptides are taxonomically assigned and then scored enables results to be unbiased (within the constraints of the sequence database). In a proof-of-concept study, we examined MARLOWE’s performance on two datasets, the Biodiversity dataset and the Bacillus cereus superspecies dataset. MARLOWE not only characterized the true contributors in single-source samples and binary mixtures in the Biodiversity dataset, but also provided sufficient specificity to distinguish species within a bacterial superspecies group. We also compared MARLOWE’s results to those of MiCId, a leading microbial identification/characterization tool based on proteomics database search. Comparison of the two tools using 225 mass spectrometry data files yielded comparable performance, with slightly higher accuracy and specificity for MiCId. At the species level, MARLOWE achieved a specificity of 91.4% at 5% FDR. These results suggest that MARLOWE is suitable for candidate- or lead-generation identification of single-organism and binary samples that can generate forensic leads and aid in selecting appropriate follow-on analyses in a forensic context.

PMID:40501933 | PMC:PMC12157597 | DOI:10.1101/2024.09.30.615220

Comparing phenotypic manifolds with Kompot: Detecting differential abundance and gene expression at single-cell resolution

bioRxiv [Preprint]. 2025 Jun 7:2025.06.03.657769. doi: 10.1101/2025.06.03.657769.

ABSTRACT

Kompot is a statistical framework for holistic comparison of multi-condition single-cell datasets, supporting both differential abundance and differential expression. Differential abundance captures changes in how cells populate the phenotypic manifold across conditions, while differential expression identifies condition-specific changes in gene regulation that may be localized to particular regions of that manifold. Kompot models the distribution of cells and gene expression as continuous functions over a low-dimensional representation of cell states, enabling single-cell resolution inference with calibrated uncertainty estimates. Applying Kompot to aging murine bone marrow, we identified a continuum of shifts in hematopoietic stem cell and mature cell states, transcriptional remodeling of monocytes independent of compositional changes, and divergent regulation of oxidative stress response genes across cell types. By capturing both global and cell-state-specific effects of perturbation, Kompot reveals how aging reshapes cellular identity and regulatory programs across the hematopoietic landscape. This framework is broadly applicable to dissecting condition-specific effects in complex single-cell landscapes.

PMID:40501932 | PMC:PMC12157388 | DOI:10.1101/2025.06.03.657769