Categories
Nevin Manimala Statistics

Predicting survival time for critically ill patients with heart failure using conformalized survival analysis

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:576-597. eCollection 2025.

ABSTRACT

Heart failure (HF) is a significant public health challenge, especially among critically ill patients in intensive care units (ICUs). Predicting survival outcomes for these patients with calibrated uncertainty is both challenging and essential for guiding subsequent treatments. This study introduces conformalized survival analysis (CSA) as a novel method for predicting survival times in critically ill HF patients. CSA enhances each predicted survival time with a statistically rigorous lower bound, providing valuable uncertainty quantification. Using the MIMIC-IV dataset, we demonstrate that CSA effectively delivers calibrated uncertainty quantification for survival predictions, in contrast to parametric models like the Cox or Accelerated Failure Time models. Through the application of CSA to a large, real-world dataset, this study underscores its potential to improve decision-making in critical care, offering a more precise and reliable tool for prognosis in a setting where accurate predictions and calibrated uncertainty can profoundly impact patient outcomes.
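The idea of attaching a calibrated lower bound to a predicted survival time can be illustrated with a minimal split-conformal sketch. This is a simplification and not the authors' method: it ignores censoring (which conformalized survival analysis must handle), and the function names and toy numbers are hypothetical.

```python
import math

def conformal_offset(predicted, observed, alpha=0.1):
    """Split-conformal offset from a held-out calibration set.

    Score = predicted - observed time (how far the model overshoots).
    Assumes fully observed (uncensored) survival times, which is a
    simplification made for this sketch only.
    """
    scores = sorted(p - t for p, t in zip(predicted, observed))
    n = len(scores)
    # Finite-sample corrected rank for (1 - alpha) coverage.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only a trivial bound.
        return math.inf
    return scores[k - 1]

def lower_bound(prediction, offset):
    """Lower predictive bound on survival time, clipped at zero."""
    return max(0.0, prediction - offset)

# Hypothetical calibration data (days): model predictions and true times.
cal_pred = [10.0] * 10
cal_true = [12.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0]

offset = conformal_offset(cal_pred, cal_true, alpha=0.2)
bound = lower_bound(10.0, offset)
```

Under exchangeability of calibration and test patients, a bound built this way covers the true time with probability at least 1 - alpha, regardless of how good the underlying point predictor is.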

PMID:40502254 | PMC:PMC12150701

Outpatient Portal Use and Blood Pressure Management during Pregnancy

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:537-545. eCollection 2025.

ABSTRACT

We investigated the association between systole and diastole levels and outpatient portal use during pregnancy. We used electronic and administrative data from our academic medical center. We categorized patients into two groups: in-range (<140 mm Hg; <90 mm Hg) and out-of-range (≥140 mm Hg; ≥90 mm Hg). Random effects linear regression models examined the association between mean trimester blood pressure (BP) levels and portal use, adjusting for covariates. As portal use increased, both systole and diastole levels decreased for the out-of-range group. These differences were statistically significant for patients who were initially out-of-range. For the in-range group, systole and diastole levels were stable as portal use increased. Results provide evidence to support a relationship between outpatient portal use and BP outcomes during pregnancy. More research is needed to expand on our findings, especially research focused on the implementation and design of outpatient portals for pregnancy.

PMID:40502244 | PMC:PMC12150748

Safeguarding Privacy in Genome Research: A Comprehensive Framework for Authors

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:177-186. eCollection 2025.

ABSTRACT

As genomic research continues to advance, sharing of genomic data and research outcomes has become increasingly important for fostering collaboration and accelerating scientific discovery. However, such data sharing must be balanced with the need to protect the privacy of individuals whose genetic information is being utilized. This paper presents a bidirectional framework for evaluating privacy risks associated with data shared (both in terms of summary statistics and research datasets) in genomic research papers, particularly focusing on re-identification risks such as membership inference attacks (MIA). The framework consists of a structured workflow that begins with a questionnaire designed to capture researchers’ (authors’) self-reported data sharing practices and privacy protection measures. Responses are used to calculate the risk of re-identification for their study (paper) when compared with the National Institutes of Health (NIH) genomic data sharing policy. Any gaps in compliance help us to identify potential vulnerabilities and encourage the researchers to enhance their privacy measures before submitting their research for publication. The paper also demonstrates the application of this framework, using published genomic research as case study scenarios to emphasize the importance of implementing bidirectional frameworks to support trustworthy open science and genomic data sharing practices.

PMID:40502226 | PMC:PMC12150713

Mapping EQ-5D-5L Score From SGRQ in Patients with Asthma and/or COPD in NOVELTY

Pragmat Obs Res. 2025 Jun 4;16:135-145. doi: 10.2147/POR.S508814. eCollection 2025.

ABSTRACT

PURPOSE: The St George’s Respiratory Questionnaire (SGRQ) measures health status in obstructive airways disease. Starkie et al proposed an algorithm for mapping the SGRQ to EQ-5D-5L, a preference-based utility measure, in chronic obstructive pulmonary disease (COPD) (Value Health 2011;14:354-60); only SGRQ total score, its squared value, and sex were included as covariates. We aimed to determine if including additional covariates could improve the performance of this algorithm type and whether amendments were required to extend this mapping to asthma or asthma+COPD.

PATIENTS AND METHODS: SGRQ and EQ-5D-5L were measured from a large, global, prospective, longitudinal study in asthma and/or COPD (NOVELTY; NCT02760329). We fitted six longitudinal linear mixed models to the development sample (baseline and Year 1 data), with EQ-5D-5L as the response variable. Each model had a different combination of covariates. Mixed model repeated measures methodology was used to enable the accommodation of within-patient correlation among measurements. Restricted maximum likelihood and an unstructured covariance matrix were used to fit all models. Performance (mean square errors [MSE]) was evaluated relative to the Starkie et al algorithm in the validation sample (Year 2 and Year 3 data).

RESULTS: A total of 6813 patients (asthma: 3546; asthma+COPD: 872; COPD: 2395) with available EQ-5D-5L and SGRQ data were included at baseline. MSEs indicated good performance, were similar across models (Year 2: 0.0302-0.0308 [45-46% variance explained]; Year 3: 0.0272-0.0277 [47-48% variance explained]), and were modestly smaller than those obtained by Starkie et al (Year 2: 0.0340; Year 3: 0.0296). Performance was similar across models in the asthma and COPD subgroups.

CONCLUSION: Including additional covariates and SGRQ domains resulted in similar model performance to Starkie et al, suggesting their covariates are adequate for mapping in asthma and/or COPD. NOVELTY coefficients broaden the population with chronic airways disease for whom this mapping can be applied.

PMID:40502208 | PMC:PMC12152420 | DOI:10.2147/POR.S508814

Cyclo-stationary distributions of mRNA and Protein counts for random cell division times

bioRxiv [Preprint]. 2025 Jun 8:2025.06.06.658238. doi: 10.1101/2025.06.06.658238.

ABSTRACT

There is a long history of using experimental and computational approaches to study noise in single-cell levels of mRNA and proteins. The noise originates from a myriad of factors: intrinsic processes of gene expression, partitioning errors during division, and extrinsic effects such as random cell-cycle times. Although theoretical methods are well developed to analytically understand the full statistics of copy numbers for fixed or Erlang-distributed cell-cycle times, the general problem of random division times is still open. For any random (but uncorrelated) division time distribution, we present a method to address this challenging problem and obtain exact series representations of the copy number distributions in the cyclo-stationary state. We provide explicit cell age-specific and age-averaged results, and analyze the relative contribution to noise from intrinsic and extrinsic sources. Our analytical approach will aid the analysis of single-cell expression data and help in disentangling the impact of variability in division times.

PMID:40502203 | PMC:PMC12157499 | DOI:10.1101/2025.06.06.658238

Z-Form Stabilization By The Zα Domain Of ADAR1p150 Has Subtle Effects On A-To-I Editing

bioRxiv [Preprint]. 2025 Jun 3:2025.06.02.657529. doi: 10.1101/2025.06.02.657529.

ABSTRACT

The role of Adenosine Deaminase Acting on RNA 1 (ADAR1)’s Z-conformation stabilizing Zα domain in A-to-I editing is unclear. Previous studies on Zα mutations faced limitations, including variable ADAR1p150 expression, differential editing analysis challenges, and unaccounted changes in ADAR1p150 localization. To address these issues, we developed a Cre-lox system in ADAR1p150 KO cells to generate stable cell lines expressing Zα mutant ADAR1p150 constructs. Using total RNA sequencing analyzing editing clusters as a proxy for dsRNAs, we found that Zα mutations slightly decreased overall A-to-I editing, consistent with recent findings. These decreases correlated with mislocalization of ADAR1p150 rather than reduced editing specificity, and practically no statistically significant differentially edited sites were identified between wild-type and Zα mutant ADAR1p150 constructs. These results suggest that Zα’s impact on editing is minor and that phenotypes in Zα mutant mouse models and human patients may arise from editing-independent inhibition of Z-DNA-Binding Protein 1 (ZBP1), rather than changes in RNA editing.

PMID:40502085 | PMC:PMC12157636 | DOI:10.1101/2025.06.02.657529

Would you agree if N is three? On statistical inference for small N

bioRxiv [Preprint]. 2025 Jun 2:2024.08.26.609821. doi: 10.1101/2024.08.26.609821.

ABSTRACT

Non-human primate studies traditionally use two or three animals. We previously used standard statistics to argue for using either one animal, for an inference about that sample, or five or more animals, for a useful inference about the population. A recently proposed framework argued for testing three animals and accepting the outcome found in the majority as the outcome that is most representative for the population. The proposal tests this framework under various assumptions about the true probability of the representative outcome in the population, i.e. its typicality. On this basis, it argues that the framework is valid across a wide range of typicalities. Here, we show (1) that the error rate of the framework depends strongly on the typicality of the representative outcome, (2) that an acceptable error rate requires this typicality to be very high (87% for a single type of outlier), which actually renders empirical testing beyond a single animal obsolete, and (3) that moving from one to three animals decreases error rates mainly for typicality values of 70-90%, and much less for both lower and higher values. Furthermore, we use conjunction analysis to demonstrate that two out of three animals with a given outcome only allow inferring a lower bound on typicality of 9%, which is of limited value. Thus, the use of two or three animals does not allow a useful inference about the population, and if this option is nevertheless chosen, the inferred lower bound of typicality should be reported.
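The dependence of the majority-vote error rate on typicality can be sketched with a simple binomial calculation. The sketch assumes a binary outcome (the representative outcome versus a single type of outlier), independent animals, and a 5% error threshold; these are assumptions made here for illustration, and the preprint's own derivation may differ.

```python
from math import comb

def majority_error(typicality, n_animals=3):
    """Probability that the representative outcome is NOT the majority.

    Assumes a binary outcome and independently sampled animals, so the
    number of animals showing the representative outcome is binomial.
    """
    p = typicality
    needed = n_animals // 2 + 1  # smallest count that forms a majority
    p_majority = sum(
        comb(n_animals, k) * p**k * (1 - p) ** (n_animals - k)
        for k in range(needed, n_animals + 1)
    )
    return 1 - p_majority

# Compare one animal with three animals across typicality values.
for p in (0.6, 0.7, 0.8, 0.87, 0.95):
    print(f"typicality={p:.2f}  "
          f"error(1 animal)={majority_error(p, 1):.3f}  "
          f"error(3 animals)={majority_error(p, 3):.3f}")
```

Under these assumptions the three-animal error rate drops below 5% between typicalities of 86% and 87%, which is in line with the 87% figure quoted in the abstract.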

PMID:40502083 | PMC:PMC12157567 | DOI:10.1101/2024.08.26.609821

Effects of Natural Lithium and Lithium Isotopes on Voltage Gated Sodium Channel Activity in SH-SY5Y and IPSC Derived Cortical Neurons

bioRxiv [Preprint]. 2025 Jun 1:2025.05.28.656602. doi: 10.1101/2025.05.28.656602.

ABSTRACT

Although lithium (Li) is a widely used treatment for bipolar disorder, its exact mechanisms of action remain elusive. Research has shown that the two stable Li isotopes, which differ in their mass and nuclear spin, can induce distinct effects in both in vivo and in vitro studies. Since sodium (Na⁺) channels are the primary pathway for Li⁺ entry into cells, we examined how Li⁺ affects the current of Na⁺ channels using whole-cell patch-clamp techniques on SH-SY5Y neuroblastoma cells and human iPSC-derived cortical neurons. Our findings indicate that mammalian Na⁺ channels in both neuronal models studied here display no selectivity between Na⁺ and Li⁺, unlike previously reported bacterial Na⁺ channels. We observed differences between the two neuronal models in three measured parameters (V_half, G_max, z). We saw no statistically significant differences between any ions in SH-SY5Y cells, but small differences in the half-maximum activation potential (V_half) between Na⁺ and ⁶Li⁺ and between ⁷Li⁺ and ⁶Li⁺ were found in iPSC-derived cortical neurons. Although Na⁺ channels are widely expressed and important in neuronal function, the very small differences observed in this work suggest that Li⁺ regulation through Na⁺ channels is likely not the primary mechanism underlying Li⁺ isotope differentiation.

PMID:40502081 | PMC:PMC12154711 | DOI:10.1101/2025.05.28.656602

It’s a wrap: deriving distinct discoveries with FDR control after a GWAS pipeline

bioRxiv [Preprint]. 2025 Jun 8:2025.06.05.658138. doi: 10.1101/2025.06.05.658138.

ABSTRACT

Recent work has shown how to test conditional independence hypotheses between an outcome of interest and a large number of explanatory variables with false discovery rate (FDR) control, even without access to individual-level data. In the case of genome-wide association studies (GWAS) specifically, summary statistics resulting from the standard analysis pipeline can be used as input to a procedure that identifies distinct signals across the genome with FDR control. This secondary analysis requires sampling negative controls (knockoffs) from a distribution determined by the linkage disequilibrium patterns in the genome of the population under study. In prior work, we pre-computed this distribution for European genomes, starting from information derived from the UK Biobank. Thus, researchers working with European GWASes can carry out a knockoff analysis with minimal computational costs, using the distributed routine GhostKnockoffGWAS. Here we introduce and release new software (solveblock) that extends this capability to a much richer collection of studies. Given a set of genotyped samples, or a reference dataset, our pipeline efficiently estimates the high-dimensional correlation matrices that describe correlation structures across the genome, making rather common sparsity assumptions. Taking this sample-specific estimate as input, the software identifies groups of genetic variants that are highly correlated, and uses them to define an appropriate resolution for conditional independence hypotheses. Finally, we compute the distribution for the exchangeable negative controls necessary to test these hypotheses. The output of solveblock can be passed directly to GhostKnockoffGWAS, allowing users to carry out the complete analysis in a two-step procedure. We illustrate the performance of the routine by analyzing data from five UK Biobank sub-populations. In simulations, our method controls FDR. Analyzing real data relative to 26 phenotypes of varying polygenicity in British individuals, we make an average of ≈ 19 additional discoveries, compared to standard marginal association testing. Our code, precompiled software, and processed files for these five sub-populations are openly shared.
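How negative controls translate into FDR control can be sketched with the generic knockoff+ selection rule. This is an illustration of the standard knockoff filter, not the GhostKnockoffGWAS or solveblock implementation; the statistics `w` below are hypothetical, with a large positive value meaning the real variable (or group) beat its knockoff.

```python
def knockoff_plus_threshold(w, q):
    """Smallest threshold t whose estimated false discovery proportion is <= q.

    The estimate (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) uses negative
    statistics as a stand-in for false positives among the positive ones.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")  # no threshold qualifies: select nothing

def select(w, q):
    """Indices of variables selected at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]

# Hypothetical feature statistics for ten variables.
w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 0.5, -0.3, 1.5]
selected = select(w, q=0.3)
```

The rule relies on the knockoffs being exchangeable with the null variables, so that the sign of each null statistic is equally likely to be positive or negative.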

PMID:40502041 | PMC:PMC12157521 | DOI:10.1101/2025.06.05.658138

Improved quantitative accuracy in data-independent acquisition proteomics via retention time boundary imputation

bioRxiv [Preprint]. 2025 May 31:2025.05.27.656394. doi: 10.1101/2025.05.27.656394.

ABSTRACT

The traditional approaches to handling missing values in DIA proteomics are to either remove high-missingness proteins or impute them with statistical procedures. Both have disadvantages: removal can limit statistical power, while imputation can introduce spurious correlations or dilute signal. We present an alternative approach based on imputing peptide retention times (RTs) rather than quantitations. For each missing value, we impute the RT boundaries, then obtain a quantitation by integrating the chromatographic signal within the imputed boundaries. Our method yields more accurate quantitations than existing proteomics imputation methods. RT boundary imputation also identifies differentially abundant peptides from key Alzheimer's genes that were not identified with library search alone. RT boundary imputation improves the ability to estimate radiation exposure in biological tissues. RT boundary imputation significantly increases the number of peptides with quantitations, leading to increases in statistical power. Finally, RT boundary imputation better quantifies low abundance peptides than library search alone. Our RT boundary imputation method, called Nettle, is available as a standalone tool.
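The quantitation step described above, integrating the chromatographic signal between two RT boundaries, can be sketched as a clipped trapezoidal integration. This is a generic illustration and not Nettle's code: the function name and toy chromatogram are hypothetical, the boundaries are taken as given (how they are imputed is not shown), and retention times are assumed to be strictly increasing.

```python
def integrate_within_boundaries(rts, intensities, rt_start, rt_end):
    """Trapezoidal area under a chromatogram between rt_start and rt_end.

    Segments that straddle a boundary are clipped, with the intensity at
    the clipped endpoint obtained by linear interpolation.
    """
    area = 0.0
    points = list(zip(rts, intensities))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        lo, hi = max(t0, rt_start), min(t1, rt_end)
        if hi <= lo:
            continue  # segment lies entirely outside the boundaries
        slope = (y1 - y0) / (t1 - t0)
        y_lo = y0 + slope * (lo - t0)
        y_hi = y0 + slope * (hi - t0)
        area += 0.5 * (y_lo + y_hi) * (hi - lo)
    return area

# Hypothetical chromatogram: a triangular peak centred at RT = 2.
rts = [0.0, 1.0, 2.0, 3.0, 4.0]
intensities = [0.0, 2.0, 4.0, 2.0, 0.0]

peak_area = integrate_within_boundaries(rts, intensities, 1.0, 3.0)
```

Because the quantity comes from the measured signal inside the boundaries rather than from a statistical fill-in, a peptide with no detectable signal in that window simply receives a small area.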

PMID:40502008 | PMC:PMC12154835 | DOI:10.1101/2025.05.27.656394