Categories
Nevin Manimala Statistics

powerROC: An Interactive Web Tool for Sample Size Calculation in Assessing Models’ Discriminative Abilities

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:196-204. eCollection 2025.

ABSTRACT

Rigorous external validation is crucial for assessing the generalizability of prediction models, particularly by evaluating their discrimination (AUROC) on new data. This often involves comparing a new model’s AUROC to that of an established reference model. However, many studies rely on arbitrary rules of thumb for sample size calculations, often resulting in underpowered analyses and unreliable conclusions. This paper reviews crucial concepts for accurate sample size determination in AUROC-based external validation studies, making the theory and practice more accessible to researchers and clinicians. We introduce powerROC, an open-source web tool designed to simplify these calculations, enabling both the evaluation of a single model and the comparison of two models. The tool offers guidance on selecting target precision levels and employs flexible approaches, leveraging either pilot data or user-defined probability distributions. We illustrate powerROC’s utility through a case study on hospital mortality prediction using the MIMIC database.
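For readers who want a feel for what a precision-based AUROC sample size calculation involves, here is a minimal sketch. It is not powerROC's implementation. It assumes the closed-form Hanley and McNeil (1982) approximation to the standard error of an AUROC, a normal-approximation 95% confidence interval, and a simple linear search. The function names, the prevalence parameter, and the example inputs are all hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUROC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(var)


def min_sample_size(auc, prevalence, ci_width, z=1.96, n_max=100_000):
    """Smallest total n whose approximate CI for the AUROC is at most ci_width wide.

    auc        : anticipated AUROC (an assumption supplied by the user)
    prevalence : anticipated outcome proportion in the validation sample
    ci_width   : target full width of the confidence interval
    z          : normal quantile (1.96 for an approximate 95% interval)
    """
    for n in range(2, n_max + 1):
        n_pos = max(1, round(n * prevalence))
        n_neg = n - n_pos
        if n_neg < 1:
            continue  # need at least one case and one non-case
        if 2.0 * z * hanley_mcneil_se(auc, n_pos, n_neg) <= ci_width:
            return n
    raise ValueError("target precision not reached within n_max")


if __name__ == "__main__":
    # Illustrative inputs only: anticipated AUROC 0.80, 10% outcome prevalence,
    # target 95% CI width of 0.10.
    print(min_sample_size(auc=0.80, prevalence=0.10, ci_width=0.10))
```

The closed-form approximation is used here only because it keeps the example self-contained. Approaches based on pilot data or user-defined probability distributions, as mentioned in the abstract, would replace `hanley_mcneil_se` with an estimate derived from those inputs.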

PMID:40502274 | PMC:PMC12150715

Building Trust in Clinical AI: A Web-Based Explainable Decision Support System for Chronic Kidney Disease

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:375-384. eCollection 2025.

ABSTRACT

Chronic Kidney Disease (CKD) is a significant global public health issue, affecting over 10% of the population. Timely diagnosis is crucial for effective management. Leveraging machine learning within healthcare offers promising advancements in predictive diagnostics. We developed a Web-Based Clinical Decision Support System (CDSS) for CKD, incorporating advanced Explainable AI (XAI) methods, specifically SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations). The system evaluates multiple classifiers for predicting CKD: KNN, Random Forest, AdaBoost, XGBoost, CatBoost, and Extra Trees. Model performance is assessed using accuracy, confusion matrix statistics, and the AUC. AdaBoost achieved a 100% accuracy rate. Except for KNN, all classifiers consistently reached perfect precision and sensitivity. Additionally, we present a real-time web-based application to operationalize the model, enhancing trust and accessibility for healthcare practitioners and stakeholders.

PMID:40502268 | PMC:PMC12150721

Predicting survival time for critically ill patients with heart failure using conformalized survival analysis

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:576-597. eCollection 2025.

ABSTRACT

Heart failure (HF) is a significant public health challenge, especially among critically ill patients in intensive care units (ICUs). Predicting survival outcomes for these patients with calibrated uncertainty is both challenging and essential for guiding subsequent treatments. This study introduces conformalized survival analysis (CSA) as a novel method for predicting survival times in critically ill HF patients. CSA enhances each predicted survival time with a statistically rigorous lower bound, providing valuable uncertainty quantification. Using the MIMIC-IV dataset, we demonstrate that CSA effectively delivers calibrated uncertainty quantification for survival predictions, in contrast to parametric models like the Cox or Accelerated Failure Time models. Through the application of CSA to a large, real-world dataset, this study underscores its potential to improve decision-making in critical care, offering a more precise and reliable tool for prognosis in a setting where accurate predictions and calibrated uncertainty can profoundly impact patient outcomes.

PMID:40502254 | PMC:PMC12150701

Outpatient Portal Use and Blood Pressure Management during Pregnancy

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:537-545. eCollection 2025.

ABSTRACT

We investigated the association between blood pressure (systole and diastole) and outpatient portal use during pregnancy. We used electronic and administrative data from our academic medical center. We categorized patients into two groups: in-range (<140 mm Hg; <90 mm Hg) and out-of-range (≥140 mm Hg; ≥90 mm Hg). Random effects linear regression models examined the association between mean trimester blood pressure (BP) levels and portal use, adjusting for covariates. As portal use increased, both systole and diastole levels decreased for the out-of-range group. These differences were statistically significant for patients who were initially out-of-range. For the in-range group, systole and diastole levels were stable as portal use increased. Results provide evidence to support a relationship between outpatient portal use and BP outcomes during pregnancy. More research is needed to expand on our findings, especially those focused on the implementation and design of outpatient portals for pregnancy.

PMID:40502244 | PMC:PMC12150748

Safeguarding Privacy in Genome Research: A Comprehensive Framework for Authors

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:177-186. eCollection 2025.

ABSTRACT

As genomic research continues to advance, sharing of genomic data and research outcomes has become increasingly important for fostering collaboration and accelerating scientific discovery. However, such data sharing must be balanced with the need to protect the privacy of individuals whose genetic information is being utilized. This paper presents a bidirectional framework for evaluating privacy risks associated with data shared (both in terms of summary statistics and research datasets) in genomic research papers, particularly focusing on re-identification risks such as membership inference attacks (MIA). The framework consists of a structured workflow that begins with a questionnaire designed to capture researchers’ (authors’) self-reported data sharing practices and privacy protection measures. Responses are used to calculate the risk of re-identification for their study (paper) when compared with the National Institutes of Health (NIH) genomic data sharing policy. Any gaps in compliance help us to identify potential vulnerabilities and encourage the researchers to enhance their privacy measures before submitting their research for publication. The paper also demonstrates the application of this framework, using published genomic research as case study scenarios to emphasize the importance of implementing bidirectional frameworks to support trustworthy open science and genomic data sharing practices.

PMID:40502226 | PMC:PMC12150713

Mapping EQ-5D-5L Score From SGRQ in Patients with Asthma and/or COPD in NOVELTY

Pragmat Obs Res. 2025 Jun 4;16:135-145. doi: 10.2147/POR.S508814. eCollection 2025.

ABSTRACT

PURPOSE: The St George’s Respiratory Questionnaire (SGRQ) measures health status in obstructive airways disease. Starkie et al proposed an algorithm for mapping the SGRQ to EQ-5D-5L, a preference-based utility measure, in chronic obstructive pulmonary disease (COPD) (Value Health 2011;14:354-60); only SGRQ total score, its squared value, and sex were included as covariates. We aimed to determine if including additional covariates could improve the performance of this algorithm type and whether amendments were required to extend this mapping to asthma or asthma+COPD.

PATIENTS AND METHODS: SGRQ and EQ-5D-5L were measured from a large, global, prospective, longitudinal study in asthma and/or COPD (NOVELTY; NCT02760329). We fitted six longitudinal linear mixed models to the development sample (baseline and Year 1 data), with EQ-5D-5L as the response variable. Each model had a different combination of covariates. Mixed model repeated measures methodology was used to enable the accommodation of within-patient correlation among measurements. Restricted maximum likelihood and an unstructured covariance matrix were used to fit all models. Performance (mean square errors [MSE]) was evaluated relative to the Starkie et al algorithm in the validation sample (Year 2 and Year 3 data).

RESULTS: A total of 6813 patients (asthma: 3546; asthma+COPD: 872; COPD: 2395) with available EQ-5D-5L and SGRQ data were included at baseline. MSEs indicated good performance, were similar across models (Year 2: 0.0302-0.0308 [45-46% variance explained]; Year 3: 0.0272-0.0277 [47-48% variance explained]), and were modestly smaller than those obtained by Starkie et al (Year 2: 0.0340; Year 3: 0.0296). Performance was similar across models in the asthma and COPD subgroups.

CONCLUSION: Including additional covariates and SGRQ domains resulted in similar model performance to Starkie et al, suggesting their covariates are adequate for mapping in asthma and/or COPD. NOVELTY coefficients broaden the population with chronic airways disease for whom this mapping can be applied.

PMID:40502208 | PMC:PMC12152420 | DOI:10.2147/POR.S508814

Cyclo-stationary distributions of mRNA and Protein counts for random cell division times

bioRxiv [Preprint]. 2025 Jun 8:2025.06.06.658238. doi: 10.1101/2025.06.06.658238.

ABSTRACT

There is a long history of using experimental and computational approaches to study noise in single-cell levels of mRNA and proteins. The noise originates from a myriad of factors: intrinsic processes of gene expression, partitioning errors during division, and extrinsic effects such as random cell-cycle times. Although theoretical methods are well developed to analytically understand the full statistics of copy numbers for fixed or Erlang-distributed cell-cycle times, the general problem of random division times is still open. For any random (but uncorrelated) division time distribution, we present a method to address this challenging problem and obtain exact series representations of the copy number distributions in the cyclo-stationary state. We provide explicit cell age-specific and age-averaged results, and analyze the relative contribution to noise from intrinsic and extrinsic sources. Our analytical approach will aid the analysis of single-cell expression data and help in disentangling the impact of variability in division times.

PMID:40502203 | PMC:PMC12157499 | DOI:10.1101/2025.06.06.658238

Z-Form Stabilization By The Zα Domain Of Adar1p150 Has Subtle Effects On A-To-I Editing

bioRxiv [Preprint]. 2025 Jun 3:2025.06.02.657529. doi: 10.1101/2025.06.02.657529.

ABSTRACT

The role of Adenosine Deaminase Acting on RNA 1 (ADAR1)’s Z-conformation stabilizing Zα domain in A-to-I editing is unclear. Previous studies on Zα mutations faced limitations, including variable ADAR1p150 expression, differential editing analysis challenges, and unaccounted changes in ADAR1p150 localization. To address these issues, we developed a Cre-lox system in ADAR1p150 KO cells to generate stable cell lines expressing Zα mutant ADAR1p150 constructs. Using total RNA sequencing and analyzing editing clusters as a proxy for dsRNAs, we found that Zα mutations slightly decreased overall A-to-I editing, consistent with recent findings. These decreases correlated with mislocalization of ADAR1p150 rather than reduced editing specificity, and practically no statistically significant differentially edited sites were identified between wild-type and Zα mutant ADAR1p150 constructs. These results suggest that Zα’s impact on editing is minor and that phenotypes in Zα mutant mouse models and human patients may arise from editing-independent inhibition of Z-DNA-Binding Protein 1 (ZBP1), rather than changes in RNA editing.

PMID:40502085 | PMC:PMC12157636 | DOI:10.1101/2025.06.02.657529

Would you agree if N is three? On statistical inference for small N

bioRxiv [Preprint]. 2025 Jun 2:2024.08.26.609821. doi: 10.1101/2024.08.26.609821.

ABSTRACT

Non-human primate studies traditionally use two or three animals. We previously used standard statistics to argue for using either one animal, for an inference about that sample, or five or more animals, for a useful inference about the population. A recently proposed framework argued for testing three animals and accepting the outcome found in the majority as the outcome that is most representative for the population. The proposal tests this framework under various assumptions about the true probability of the representative outcome in the population, i.e. its typicality. On this basis, it argues that the framework is valid across a wide range of typicalities. Here, we show (1) that the error rate of the framework depends strongly on the typicality of the representative outcome, (2) that an acceptable error rate requires this typicality to be very high (87% for a single type of outlier), which actually renders empirical testing beyond a single animal obsolete, and (3) that moving from one to three animals decreases error rates mainly for typicality values of 70-90%, and much less for both lower and higher values. Furthermore, we use conjunction analysis to demonstrate that two out of three animals with a given outcome only allow the inference of a lower bound on typicality of 9%, which is of limited value. Thus, the use of two or three animals does not allow a useful inference about the population, and if this option is nevertheless chosen, the inferred lower bound of typicality should be reported.
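The dependence of the majority-of-three error rate on typicality can be illustrated with a short binomial calculation. The sketch below is not taken from the preprint. It assumes independent animals and a binary outcome (a single type of outlier), and the function names are hypothetical. Under those assumptions the error rate at a typicality of 0.87 is roughly 4.6%, which would match the 87% figure in the abstract if "acceptable" is read as a 5% error rate (that reading is itself an assumption).

```python
from math import comb


def majority_error_rate(p, n=3):
    """Probability that the majority of n animals does NOT show the
    representative outcome, given typicality p (binary outcome,
    independent animals, odd n)."""
    k_min = n // 2 + 1  # smallest count that forms a majority
    p_majority = sum(
        comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(k_min, n + 1)
    )
    return 1.0 - p_majority


def gain_over_single_animal(p):
    """Reduction in error rate when moving from one animal to three."""
    return (1.0 - p) - majority_error_rate(p, n=3)


if __name__ == "__main__":
    for p in (0.50, 0.60, 0.70, 0.80, 0.87, 0.90, 0.95, 0.99):
        print(
            f"typicality={p:.2f}  "
            f"error(1 animal)={1.0 - p:.4f}  "
            f"error(majority of 3)={majority_error_rate(p):.4f}  "
            f"gain={gain_over_single_animal(p):.4f}"
        )
```

For three animals the gain simplifies algebraically to p(1 - p)(2p - 1), which is zero at p = 0.5 and p = 1 and largest near p ≈ 0.79. Under the stated assumptions, this is in line with the abstract's remark that the benefit is concentrated at typicality values of roughly 70-90%.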

PMID:40502083 | PMC:PMC12157567 | DOI:10.1101/2024.08.26.609821

Effects of Natural Lithium and Lithium Isotopes on Voltage Gated Sodium Channel Activity in SH-SY5Y and IPSC Derived Cortical Neurons

bioRxiv [Preprint]. 2025 Jun 1:2025.05.28.656602. doi: 10.1101/2025.05.28.656602.

ABSTRACT

Although lithium (Li) is a widely used treatment for bipolar disorder, its exact mechanisms of action remain elusive. Research has shown that the two stable Li isotopes, which differ in their mass and nuclear spin, can induce distinct effects in both in vivo and in vitro studies. Since sodium (Na⁺) channels are the primary pathway for Li⁺ entry into cells, we examined how Li⁺ affects the current of Na⁺ channels using whole-cell patch-clamp techniques on SH-SY5Y neuroblastoma cells and human iPSC-derived cortical neurons. Our findings indicate that mammalian Na⁺ channels in both neuronal models studied here display no selectivity between Na⁺ and Li⁺, unlike previously reported bacterial Na⁺ channels. We observed differences between the two neuronal models in three measured parameters (V_half, G_max, z). We saw no statistically significant differences between any ions in SH-SY5Y cells, but small differences in the half-maximum activation potential (V_half) between Na⁺ and ⁶Li⁺ and between ⁷Li⁺ and ⁶Li⁺ were found in iPSC-derived cortical neurons. Although Na⁺ channels are widely expressed and important in neuronal function, the very small differences observed in this work suggest that Li⁺ regulation through Na⁺ channels is likely not the primary mechanism underlying Li⁺ isotope differentiation.

PMID:40502081 | PMC:PMC12154711 | DOI:10.1101/2025.05.28.656602