Categories
Nevin Manimala Statistics

Incidence of malignancy in lung lesions initially classified as organizing pneumonia on CT-guided biopsies

Abdom Radiol (NY). 2025 Jun 12. doi: 10.1007/s00261-025-05048-x. Online ahead of print.

ABSTRACT

PURPOSE: Organizing pneumonia is an inflammatory disorder that may co-exist with malignancy in the lung or elsewhere in the body. We aimed to assess patients with a lung biopsy diagnosis of organizing pneumonia for subsequent pathology confirmation of co-existing malignancy.

METHODS: In this retrospective IRB-approved, HIPAA-compliant study, 1314 consecutive patients who underwent CT-guided lung biopsy for suspected lung cancer or metastatic disease from 02/2014 to 04/2022 at a single tertiary referral hospital were identified. In 98/1314 (7.5%) patients, biopsies showed organizing pneumonia, which represented our study cohort. Clinical outcomes, including follow-up imaging and repeat tissue sampling if performed, were evaluated through chart review. Descriptive statistics were calculated.

RESULTS: Of the 98 patients, 43 (44%) were female; mean age was 66 ± 14 years, mean lesion size was 2.9 ± 2.1 cm, and 11 (11.2%) had a prior history of malignancy. Of the 98 patients initially diagnosed with organizing pneumonia on lung biopsy, 11 (11.2%) were subsequently found to have malignancy. Among these, 6 (54.5%) had pulmonary metastases and 5 (45.5%) had primary lung cancer. Malignancies were confirmed through percutaneous re-biopsy in 3/11 (27%) and through bronchoscopic, endoscopic, or surgical procedures in 8/11 (73%).

CONCLUSION: Malignancy can co-exist with organizing pneumonia in a substantial percentage of initial lung biopsies. Therefore, repeat tissue sampling should be considered when there is high clinical suspicion of malignancy despite an initial histopathological diagnosis of organizing pneumonia. This is especially relevant in lesions that demonstrate FDG avidity on PET/CT or an increase in size on interval imaging, or in instances where the biopsy core sizes are small or where the biopsies have intraprocedural complications.
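
To give a sense of the uncertainty around the reported 11/98 malignancy rate, the sketch below computes a 95% Wilson score interval for that proportion. The abstract reports only descriptive statistics; the choice of the Wilson method and the helper name `wilson_interval` are assumptions made here for illustration.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Counts taken from the abstract: 11 malignancies among 98 patients.
malignant, cohort = 11, 98
low, high = wilson_interval(malignant, cohort)
print(f"{malignant}/{cohort} = {malignant / cohort:.1%}, "
      f"95% Wilson CI {low:.1%} to {high:.1%}")
```

With fewer than 100 patients the interval is wide, which is worth keeping in mind when reading the rate as "substantial".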

PMID:40504392 | DOI:10.1007/s00261-025-05048-x

powerROC: An Interactive Web Tool for Sample Size Calculation in Assessing Models’ Discriminative Abilities

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:196-204. eCollection 2025.

ABSTRACT

Rigorous external validation is crucial for assessing the generalizability of prediction models, particularly by evaluating their discrimination (AUROC) on new data. This often involves comparing a new model’s AUROC to that of an established reference model. However, many studies rely on arbitrary rules of thumb for sample size calculations, often resulting in underpowered analyses and unreliable conclusions. This paper reviews crucial concepts for accurate sample size determination in AUROC-based external validation studies, making the theory and practice more accessible to researchers and clinicians. We introduce powerROC, an open-source web tool designed to simplify these calculations, enabling both the evaluation of a single model and the comparison of two models. The tool offers guidance on selecting target precision levels and employs flexible approaches, leveraging either pilot data or user-defined probability distributions. We illustrate powerROC’s utility through a case study on hospital mortality prediction using the MIMIC database.
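
As a rough illustration of precision-based sample size planning for a single model's AUROC, the sketch below uses the Hanley and McNeil (1982) approximation to the standard error of the AUROC and searches for the smallest total sample whose 95% confidence interval is no wider than a target. This is not taken from powerROC and may differ from what the tool does; the function names, the anticipated AUROC of 0.80, the 10% event prevalence, and the target width of 0.10 are all hypothetical inputs.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUROC (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

def min_sample_size(auc, prevalence, target_ci_width, z=1.959964, max_n=1_000_000):
    """Smallest total n whose approximate CI width is <= target_ci_width."""
    for n in range(10, max_n + 1):
        n_pos = max(1, round(n * prevalence))  # expected number of events
        n_neg = n - n_pos
        if n_neg < 1:
            continue
        if 2 * z * hanley_mcneil_se(auc, n_pos, n_neg) <= target_ci_width:
            return n
    raise ValueError("target precision not reachable within max_n")

# Hypothetical planning inputs, not values from the paper.
n_total = min_sample_size(auc=0.80, prevalence=0.10, target_ci_width=0.10)
print(f"Approximate total sample size needed: {n_total}")
```

Because the rarer class dominates the variance, lowering the assumed prevalence raises the required total sample size sharply.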

PMID:40502274 | PMC:PMC12150715

Building Trust in Clinical AI: A Web-Based Explainable Decision Support System for Chronic Kidney Disease

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:375-384. eCollection 2025.

ABSTRACT

Chronic Kidney Disease (CKD) is a significant global public health issue, affecting over 10% of the population. Timely diagnosis is crucial for effective management. Leveraging machine learning within healthcare offers promising advancements in predictive diagnostics. We developed a Web-Based Clinical Decision Support System (CDSS) for CKD, incorporating advanced Explainable AI (XAI) methods, specifically SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations). The system trains and evaluates multiple classifiers (KNN, Random Forest, AdaBoost, XGBoost, CatBoost, and Extra Trees) to predict CKD. Model effectiveness is assessed using accuracy, confusion matrix statistics, and the AUC. AdaBoost achieved a 100% accuracy rate. Except for KNN, all classifiers consistently reached perfect precision and sensitivity. Additionally, we present a real-time web-based application to operationalize the model, enhancing trust and accessibility for healthcare practitioners and stakeholders.

PMID:40502268 | PMC:PMC12150721

Predicting survival time for critically ill patients with heart failure using conformalized survival analysis

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:576-597. eCollection 2025.

ABSTRACT

Heart failure (HF) is a significant public health challenge, especially among critically ill patients in intensive care units (ICUs). Predicting survival outcomes for these patients with calibrated uncertainty is both challenging and essential for guiding subsequent treatments. This study introduces conformalized survival analysis (CSA) as a novel method for predicting survival times in critically ill HF patients. CSA enhances each predicted survival time with a statistically rigorous lower bound, providing valuable uncertainty quantification. Using the MIMIC-IV dataset, we demonstrate that CSA effectively delivers calibrated uncertainty quantification for survival predictions, in contrast to parametric models like the Cox or Accelerated Failure Time models. Through the application of CSA to a large, real-world dataset, this study underscores its potential to improve decision-making in critical care, offering a more precise and reliable tool for prognosis in a setting where accurate predictions and calibrated uncertainty can profoundly impact patient outcomes.
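
The idea of attaching a lower bound to each predicted survival time can be sketched with plain one-sided split conformal prediction. This is a deliberately simplified illustration: it assumes every calibration patient has a fully observed survival time, whereas the conformalized survival analysis described above is designed to handle censoring. The function name and the toy numbers are hypothetical.

```python
import math

def conformal_lower_bounds(cal_predicted, cal_observed, new_predicted, alpha=0.1):
    """One-sided split conformal lower bounds on survival time.

    Simplifying assumption: no censoring in the calibration data.
    The bound should cover the true time with probability >= 1 - alpha
    when calibration and new patients are exchangeable.
    """
    # Score = how much the model over-predicted survival for each patient.
    scores = sorted(p - t for p, t in zip(cal_predicted, cal_observed))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        raise ValueError("too few calibration points for this alpha")
    margin = scores[rank - 1]
    # Survival times are non-negative, so clipping at zero keeps the bound valid.
    return [max(0.0, p - margin) for p in new_predicted]

# Toy calibration set (days); values are made up for illustration.
cal_pred = [30.0, 45.0, 12.0, 60.0, 25.0, 40.0, 18.0, 55.0, 33.0]
cal_obs = [28.0, 50.0, 5.0, 41.0, 27.0, 22.0, 20.0, 58.0, 15.0]
print(conformal_lower_bounds(cal_pred, cal_obs, [35.0, 10.0], alpha=0.2))
```

The margin is learned from held-out patients rather than from a distributional assumption, which is what allows the coverage guarantee to hold regardless of the underlying prediction model.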

PMID:40502254 | PMC:PMC12150701

Outpatient Portal Use and Blood Pressure Management during Pregnancy

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:537-545. eCollection 2025.

ABSTRACT

We investigated the association between systolic and diastolic blood pressure (BP) and outpatient portal use during pregnancy. We used electronic and administrative data from our academic medical center. We categorized patients into two groups: in-range (<140 mm Hg; <90 mm Hg) and out-of-range (≥140 mm Hg; ≥90 mm Hg). Random effects linear regression models examined the association between mean trimester BP levels and portal use, adjusting for covariates. As portal use increased, both systolic and diastolic levels decreased for the out-of-range group. These differences were statistically significant for patients who were initially out-of-range. For the in-range group, systolic and diastolic levels were stable as portal use increased. Results provide evidence to support a relationship between outpatient portal use and BP outcomes during pregnancy. More research is needed to expand on our findings, especially research focused on the implementation and design of outpatient portals for pregnancy.

PMID:40502244 | PMC:PMC12150748

Safeguarding Privacy in Genome Research: A Comprehensive Framework for Authors

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:177-186. eCollection 2025.

ABSTRACT

As genomic research continues to advance, sharing of genomic data and research outcomes has become increasingly important for fostering collaboration and accelerating scientific discovery. However, such data sharing must be balanced with the need to protect the privacy of individuals whose genetic information is being utilized. This paper presents a bidirectional framework for evaluating privacy risks associated with data shared (both in terms of summary statistics and research datasets) in genomic research papers, particularly focusing on re-identification risks such as membership inference attacks (MIA). The framework consists of a structured workflow that begins with a questionnaire designed to capture researchers’ (authors’) self-reported data sharing practices and privacy protection measures. Responses are used to calculate the risk of re-identification for their study (paper) when compared with the National Institutes of Health (NIH) genomic data sharing policy. Any gaps in compliance help us to identify potential vulnerabilities and encourage the researchers to enhance their privacy measures before submitting their research for publication. The paper also demonstrates the application of this framework, using published genomic research as case study scenarios to emphasize the importance of implementing bidirectional frameworks to support trustworthy open science and genomic data sharing practices.

PMID:40502226 | PMC:PMC12150713

Mapping EQ-5D-5L Score From SGRQ in Patients with Asthma and/or COPD in NOVELTY

Pragmat Obs Res. 2025 Jun 4;16:135-145. doi: 10.2147/POR.S508814. eCollection 2025.

ABSTRACT

PURPOSE: The St George’s Respiratory Questionnaire (SGRQ) measures health status in obstructive airways disease. Starkie et al proposed an algorithm for mapping the SGRQ to EQ-5D-5L, a preference-based utility measure, in chronic obstructive pulmonary disease (COPD) (Value Health 2011;14:354-60); only SGRQ total score, its squared value, and sex were included as covariates. We aimed to determine if including additional covariates could improve the performance of this algorithm type and whether amendments were required to extend this mapping to asthma or asthma+COPD.

PATIENTS AND METHODS: SGRQ and EQ-5D-5L were measured from a large, global, prospective, longitudinal study in asthma and/or COPD (NOVELTY; NCT02760329). We fitted six longitudinal linear mixed models to the development sample (baseline and Year 1 data), with EQ-5D-5L as the response variable. Each model had a different combination of covariates. Mixed model repeated measures methodology was used to enable the accommodation of within-patient correlation among measurements. Restricted maximum likelihood and an unstructured covariance matrix were used to fit all models. Performance (mean square errors [MSE]) was evaluated relative to the Starkie et al algorithm in the validation sample (Year 2 and Year 3 data).

RESULTS: A total of 6813 patients (asthma: 3546; asthma+COPD: 872; COPD: 2395) with available EQ-5D-5L and SGRQ data were included at baseline. MSEs indicated good performance, were similar across models (Year 2: 0.0302-0.0308 [45-46% variance explained]; Year 3: 0.0272-0.0277 [47-48% variance explained]), and were modestly smaller than those obtained by Starkie et al (Year 2: 0.0340; Year 3: 0.0296). Performance was similar across models in the asthma and COPD subgroups.

CONCLUSION: Including additional covariates and SGRQ domains resulted in similar model performance to Starkie et al, suggesting their covariates are adequate for mapping in asthma and/or COPD. NOVELTY coefficients broaden the population with chronic airways disease for whom this mapping can be applied.

PMID:40502208 | PMC:PMC12152420 | DOI:10.2147/POR.S508814

Cyclo-stationary distributions of mRNA and Protein counts for random cell division times

bioRxiv [Preprint]. 2025 Jun 8:2025.06.06.658238. doi: 10.1101/2025.06.06.658238.

ABSTRACT

There is a long history of using experimental and computational approaches to study noise in single-cell levels of mRNA and proteins. The noise originates from a myriad of factors: intrinsic processes of gene expression, partitioning errors during division, and extrinsic effects such as random cell-cycle times. Although theoretical methods are well developed to analytically understand the full statistics of copy numbers for fixed or Erlang-distributed cell-cycle times, the general problem of random division times is still open. For any random (but uncorrelated) division time distribution, we present a method to address this challenging problem and obtain exact series representations of the copy number distributions in the cyclo-stationary state. We provide explicit cell age-specific and age-averaged results, and analyze the relative contribution to noise from intrinsic and extrinsic sources. Our analytical approach will aid the analysis of single-cell expression data and help in disentangling the impact of variability in division times.

PMID:40502203 | PMC:PMC12157499 | DOI:10.1101/2025.06.06.658238

Z-Form Stabilization by the Zα Domain of ADAR1p150 Has Subtle Effects on A-to-I Editing

bioRxiv [Preprint]. 2025 Jun 3:2025.06.02.657529. doi: 10.1101/2025.06.02.657529.

ABSTRACT

The role of the Z-conformation-stabilizing Zα domain of Adenosine Deaminase Acting on RNA 1 (ADAR1) in A-to-I editing is unclear. Previous studies on Zα mutations faced limitations, including variable ADAR1p150 expression, differential editing analysis challenges, and unaccounted changes in ADAR1p150 localization. To address these issues, we developed a Cre-lox system in ADAR1p150 KO cells to generate stable cell lines expressing Zα mutant ADAR1p150 constructs. Using total RNA sequencing and analyzing editing clusters as a proxy for dsRNAs, we found that Zα mutations slightly decreased overall A-to-I editing, consistent with recent findings. These decreases correlated with mislocalization of ADAR1p150 rather than reduced editing specificity, and practically no statistically significant differentially edited sites were identified between wild-type and Zα mutant ADAR1p150 constructs. These results suggest that Zα’s impact on editing is minor and that phenotypes in Zα mutant mouse models and human patients may arise from editing-independent inhibition of Z-DNA-Binding Protein 1 (ZBP1), rather than changes in RNA editing.

PMID:40502085 | PMC:PMC12157636 | DOI:10.1101/2025.06.02.657529

Would you agree if N is three? On statistical inference for small N

bioRxiv [Preprint]. 2025 Jun 2:2024.08.26.609821. doi: 10.1101/2024.08.26.609821.

ABSTRACT

Non-human primate studies traditionally use two or three animals. We previously used standard statistics to argue for using either one animal, for an inference about that sample, or five or more animals, for a useful inference about the population. A recently proposed framework argued for testing three animals and accepting the outcome found in the majority as the outcome that is most representative of the population. The proposal tests this framework under various assumptions about the true probability of the representative outcome in the population, i.e. its typicality. On this basis, it argues that the framework is valid across a wide range of typicalities. Here, we show (1) that the error rate of the framework depends strongly on the typicality of the representative outcome, (2) that an acceptable error rate requires this typicality to be very high (87% for a single type of outlier), which actually renders empirical testing beyond a single animal obsolete, and (3) that moving from one to three animals decreases error rates mainly for typicality values of 70-90%, and much less for both lower and higher values. Furthermore, we use conjunction analysis to demonstrate that two out of three animals with a given outcome only allow the inference of a lower bound on typicality of 9%, which is of limited value. Thus, the use of two or three animals does not allow a useful inference about the population, and if this option is nevertheless chosen, the inferred lower bound of typicality should be reported.
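
The dependence of the error rate on typicality can be illustrated with a simple binomial calculation. The sketch below is one plausible reading of the majority-of-three framework, not the authors' code: it assumes independent animals, a binary outcome (the representative outcome versus a single type of outlier), and defines an error as the majority showing the outlier. The function name is hypothetical.

```python
from math import comb

def majority_error_rate(typicality, n_animals=3):
    """P(the majority of animals does NOT show the representative outcome).

    Assumes independent animals and a binary outcome, so an odd number
    of animals always yields a majority.
    """
    needed = n_animals // 2 + 1
    p_majority = sum(
        comb(n_animals, k) * typicality**k * (1 - typicality)**(n_animals - k)
        for k in range(needed, n_animals + 1)
    )
    return 1 - p_majority

# Compare one animal with three animals over a range of typicality values.
for pct in (60, 70, 80, 86, 87, 90, 95):
    p = pct / 100
    one = majority_error_rate(p, n_animals=1)
    three = majority_error_rate(p, n_animals=3)
    print(f"typicality {pct}%: error with 1 animal {one:.3f}, "
          f"with 3 animals {three:.3f}, reduction {one - three:.3f}")
```

Under these assumptions the three-animal error rate first falls below 5% at a typicality of 87%, which agrees with the threshold quoted in the abstract for a single type of outlier.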

PMID:40502083 | PMC:PMC12157567 | DOI:10.1101/2024.08.26.609821