Categories
Nevin Manimala Statistics

Identifying Key Variances in Clinical Pathways Associated With Prolonged Hospital Stays Using Machine Learning and ePath Real-World Data: Model Development and Validation Study

JMIR Med Inform. 2025 Dec 1;13:e71617. doi: 10.2196/71617.

ABSTRACT

BACKGROUND: Prolonged hospital stays can lead to inefficiencies in health care delivery and unnecessary consumption of medical resources.

OBJECTIVE: This study aimed to identify key clinical variances associated with prolonged length of stay (PLOS) in clinical pathways using a machine learning model trained on real-world data from the ePath system.

METHODS: We analyzed data from 480 patients with lung cancer (age: mean 68.3, SD 11.2 years; n=263, 54.8% men) who underwent video-assisted thoracoscopic surgery at a university hospital between 2019 and 2023. PLOS was defined as a hospital stay exceeding 9 days after video-assisted thoracoscopic surgery. The variables collected between admission and 4 days after surgery were examined, and those that showed a significant association with PLOS in univariate analyses (P<.01) were selected as predictors. Predictive models were developed using sparse linear regression methods (Lasso, ridge, and elastic net) and decision tree ensembles (random forest and extreme gradient boosting). The data were divided into derivation (earlier study period) and testing (later period) cohorts for temporal validation. The model performance was assessed using the area under the receiver operating characteristic curve, Brier score, and calibration plots. Counterfactual analysis was used to identify key clinical factors influencing PLOS.

RESULTS: A 3D heatmap illustrated the temporal relationships between clinical factors and PLOS based on patient demographics, comorbidities, functional status, surgical details, care processes, medications, and variances recorded from admission to 4 days after surgery. Among the 5 algorithms evaluated, the ridge regression model demonstrated the best performance in terms of both discrimination and calibration. Specifically, it achieved area under the receiver operating characteristic curve values of 0.84 and 0.82 and Brier scores of 0.16 and 0.17 in the derivation and test cohorts, respectively. In the final model, a range of variables, including blood tests, care, patient background, procedures, and clinical variances, were associated with PLOS. Among these, particular emphasis was placed on clinical variances. Counterfactual analysis using the ridge regression model identified 6 key variables strongly linked to PLOS. In order of impact, these were abnormal respiratory sounds, postoperative fever, arrhythmia, impaired ambulation, complications after drain removal, and pulmonary air leaks.
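The two headline metrics reported above, the area under the receiver operating characteristic curve and the Brier score, can be illustrated with a minimal sketch. The labels and predicted probabilities below are hypothetical toy values, not study data, and the function names are illustrative only; the study's own pipeline is not described at this level of detail.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive case is scored above a random
    negative case; ties count as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = prolonged stay, 0 = not prolonged.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]

toy_auroc = auroc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)
print(f"AUROC={toy_auroc:.2f}, Brier={toy_brier:.3f}")
```

Higher AUROC indicates better discrimination, whereas a lower Brier score indicates better overall accuracy of the predicted probabilities. This is why the abstract reports both.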

CONCLUSIONS: A machine learning-based model using ePath data effectively identified critical variances in the clinical pathways associated with PLOS. This automated tool may enhance clinical decision-making and improve patient management.

PMID:41325598 | DOI:10.2196/71617

Health Motivation as a Predictor of mHealth Engagement Across BMI: Cross-Sectional Survey

JMIR Mhealth Uhealth. 2025 Dec 1;13:e71625. doi: 10.2196/71625.

ABSTRACT

BACKGROUND: Digital health tools, such as mobile apps and wearable devices, have been widely adopted to support self-management of health behaviors. However, user engagement remains inconsistent, particularly among populations with varying BMI. While digital health technologies have the potential to promote healthier behaviors, little is known about how psychological and behavioral factors interact with BMI to influence use patterns.

OBJECTIVE: This study aimed to explore the relationship between BMI and digital health technology use and to examine how factors such as health awareness, self-efficacy, and health motivation contribute to technology engagement.

METHODS: A cross-sectional online survey was conducted from January 2024 to April 2024. A total of 184 valid questionnaire participants were included in this study. The questionnaire was measured on a 5-point Likert scale. Descriptive statistics, chi-square tests, and multiple regression analyses were applied.

RESULTS: Of the participants, 38.6% (71/184) had a BMI<24 kg/m², 42.4% (78/184) had a BMI between 24 and 29.9 kg/m², and 19% (35/184) had a BMI≥30 kg/m². Significant BMI differences were observed based on sex (P<.001) and age (P<.001) but not based on prior digital health tool use. Use rates for Bluetooth or Wi-Fi devices, wearables, and mobile apps were 32.1% (59/184), 38.6% (71/184), and 39.1% (72/184), respectively. A negative correlation between BMI and mobile app use frequency was identified (P=.02). Multiple regression analysis indicated that health motivation significantly predicted digital health use (P<.001), whereas health awareness, lifestyle, and self-efficacy did not.

CONCLUSIONS: Individuals with higher BMI reported a lower frequency of digital health tool use, potentially due to lower health motivation in the studied population. Health motivation was the strongest predictor of digital health engagement. Integrating personalized medical records into apps may enhance health motivation, thereby improving user engagement and promoting healthier behaviors in individuals with higher BMI.

PMID:41325596 | DOI:10.2196/71625

SmokeBERT: A Bidirectional Encoder Representations From Transformers-Based Model for Quantitative Smoking History Extraction From Clinical Narratives to Improve Lung Cancer Screening

JCO Clin Cancer Inform. 2025 Dec;9:e2500223. doi: 10.1200/CCI-25-00223. Epub 2025 Dec 1.

ABSTRACT

PURPOSE: Tobacco use is a major risk factor for diseases such as cancer. Granular quantitative details of smoking (eg, pack years and years since quitting) are essential for assessing disease risk and determining eligibility for lung cancer screening (LCS). However, existing natural language processing (NLP) tools struggle to extract detailed quantitative smoking data from clinical narratives.

METHODS: We cross-validated four pretrained Bidirectional Encoder Representations from Transformers (BERT)-based models (BERT, BioBERT, ClinicalBERT, and MedBERT) by fine-tuning them on 90% of 3,261 sentences mentioning smoking history to extract six quantitative smoking history variables from clinical narratives. The model with the highest cross-validated micro-averaged F1 scores across most variables was selected as the final SmokeBERT model and was further fine-tuned on the 90% training data. Model performance was evaluated on a 10% holdout test set and an external validation set containing 3,191 sentences.
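As a rough illustration of the selection metric, micro-averaged F1 pools true positives, false positives, and false negatives across all labels before computing a single score. The sketch below uses hypothetical label names and counts, not figures from the study, and contrasts the micro average with the macro average for comparison.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts: Dict[str, Counts]) -> float:
    """Pool counts over all labels, then compute one F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_from_counts(tp, fp, fn)


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    return sum(f1_from_counts(*c) for c in counts.values()) / len(counts)


# Hypothetical per-variable counts (illustrative only).
counts = {
    "pack_years": (8, 1, 1),
    "years_since_quitting": (1, 1, 2),
}

print(f"micro-F1={micro_f1(counts):.3f}, macro-F1={macro_f1(counts):.3f}")
```

Because the micro average weights every extracted item equally, frequent variables dominate the score. The macro average would instead give each variable equal weight.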

RESULTS: ClinicalBERT was selected as the final model based on cross-validation and was fine-tuned on the training data to create the SmokeBERT model. Compared with the state-of-the-art rule-based NLP model and the Generative Pre-trained Transformer Open Source Series 20 billion parameter model, SmokeBERT demonstrated superior performance in smoking data extraction (overall F1 score, holdout test: 0.97 v 0.88-0.90; external validation: 0.86 v 0.72-0.79) and in identifying LCS-eligible patients (97% v 59%-97% for ≥20 pack-years and 100% v 60%-84% for ≤15 years since quitting).

CONCLUSION: We developed SmokeBERT, a fine-tuned BERT-based model optimized for extracting detailed quantitative smoking histories. Future work includes evaluating performance on larger clinical data sets and developing a multilingual, language-agnostic version of SmokeBERT.

PMID:41325572 | DOI:10.1200/CCI-25-00223

Clinical phenotype matters: structural and functional thalamic changes in neuropathic low-back pain

Pain. 2025 Nov 25. doi: 10.1097/j.pain.0000000000003843. Online ahead of print.

ABSTRACT

Neuropathic chronic low-back pain (neuCLBP) is associated with worse clinical outcomes compared with non-neuropathic or axial CLBP (non-neuCLBP) and has limited effective nonsurgical treatment options, reflecting poor understanding of its underlying pathophysiology. In this study, we compared neuCLBP and non-neuCLBP patients using standardized clinical phenotyping of the neuropathic component alongside multimodal brain functional magnetic resonance imaging (fMRI). We hypothesized that, consistent with the definition of neuropathic pain as pain arising from injury to the somatosensory nervous system, neuCLBP patients would exhibit reduced thalamic volume and/or altered thalamic shape, reduced primary somatosensory cortex (S1) thickness, and altered resting-state functional connectivity of these structures compared with non-neuCLBP patients and pain-free healthy controls. Consistent with previous literature, we observed that neuCLBP patients (n = 28) presented with more severe clinical symptoms than non-neuCLBP patients (n = 28). Structurally, neuCLBP patients exhibited extensive differences in thalamic shape but no significant differences in thalamic volume or S1 gray matter thickness. By contrast, by examining resting-state thalamic connectivity gradient maps, we found that non-neuCLBP patients exhibited the most pronounced alterations in these gradients. This study is the first to combine multimodal fMRI with rigorous, standardized phenotyping to investigate neuCLBP. While our results may be influenced by greater symptom severity in the neuCLBP patients, they indicate that these patients may display distinct central plasticity patterns. The findings also highlight the importance of distinguishing between these clinical phenotypes to reduce heterogeneity in future studies.

PMID:41325555 | DOI:10.1097/j.pain.0000000000003843

Materials Dual-Source Knowledge Retrieval-Augmented Generation for Local Large Language Models in Photocatalysts

J Chem Inf Model. 2025 Dec 1. doi: 10.1021/acs.jcim.5c01941. Online ahead of print.

ABSTRACT

Large language models (LLMs) have the potential to serve as collaborative assistants in scientific research. However, adapting them to specialized domains is difficult because it requires the integration of domain-specific knowledge. We propose Materials Dual-Source Knowledge Retrieval-Augmented Generation (MDSK-RAG), a retrieval-augmented generation (RAG) framework that enables domain specialization of LLMs for materials development under fully offline (no-Internet) operation to ensure data confidentiality. The framework unifies two complementary knowledge sources, experimental CSV data (practical knowledge) and scientific PDF literature (theoretical insights), by converting tabular records into template-based text, retrieving relevant passages from each source, summarizing them with a local LLM, and merging the summaries with the user query prior to generation. As a case study, we applied the framework to metal-sulfide photocatalysts using 740 in-house experimental records and 20 scientific PDFs. We evaluated the framework on a benchmark consisting of 14 expert-defined questions and used two-sided Wilcoxon signed-rank tests for paired comparisons. Models with fewer than 10 billion parameters were executed on a laptop, whereas larger models were run on a dedicated local server; the cloud-based LLM (GPT-4o) was evaluated via the cloud service. For practical deployment, gemma-2-9b-it (<10 billion parameters) was chosen as the primary local model; we additionally tested Qwen2.5-7B-Instruct and a larger gemma-2-27b-it to assess model choice and scalability. 
For gemma-2-9b-it, the framework increased the median cosine similarity to expert reference answers from 0.63 to 0.71, an absolute increase of 0.08 (corresponding to a relative percentage gain of 12.70%; Wilcoxon signed-rank test statistic: W = 14.0, two-sided p-value: p = 1.34 × 10⁻²) and improved the median expert 5-point rating from 2 to 3, an absolute increase of 1 point (corresponding to a relative percentage gain of 50.00%; Wilcoxon signed-rank test statistic: W = 3.5, two-sided p-value: p = 7.00 × 10⁻³). For reasoning-type questions, incomplete context retrieved by MDSK-RAG sometimes disrupted the model’s reasoning process and led to incorrect conclusions, indicating remaining room for improvement. Comparable, statistically significant improvements were observed for the other local models (Qwen2.5-7B-Instruct and a larger gemma-2-27b-it) between conditions with and without the framework in the evaluation by cosine similarity to expert reference answers. In comparison to a cloud-based LLM, the gemma-2-9b-it with the framework outperformed GPT-4o. In this case study, the framework effectively incorporated practical experimental knowledge and theoretical literature into local LLM responses, improving accuracy for domain-specific queries. The framework presented here offers a practical and extensible adaptation of local LLMs to domain-specific scientific research.
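The relative gains quoted above follow directly from the reported medians, and the similarity metric itself is simple to state. The sketch below is illustrative only: the helper names are hypothetical, and the abstract does not say which embedding model produced the vectors, so the cosine function is shown on plain numeric vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def relative_gain_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Medians reported in the abstract for gemma-2-9b-it.
similarity_gain = relative_gain_percent(0.63, 0.71)
rating_gain = relative_gain_percent(2, 3)

print(f"cosine similarity gain: {similarity_gain:.2f}%")
print(f"expert rating gain: {rating_gain:.2f}%")
```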

PMID:41325550 | DOI:10.1021/acs.jcim.5c01941

Systemic C-reactive protein levels and central serous chorioretinopathy: A systematic review with meta-analysis

Acta Ophthalmol. 2025 Dec 1. doi: 10.1111/aos.70049. Online ahead of print.

ABSTRACT

Elevated corticosteroid levels are the strongest known risk factor for central serous chorioretinopathy (CSC), and previous studies have explored whether alterations in systemic immunity could play a role in CSC. Here, we explored whether elevated systemic C-reactive protein (CRP), a marker of systemic low-grade inflammation, is associated with CSC. We systematically searched 12 literature databases on 12 April 2025 for studies in which blood CRP is measured in both patients with CSC and a comparable control group. Studies were reviewed qualitatively. Meta-analysis using the random-effects model was performed on the weighted mean difference in systemic CRP levels between patients with CSC and controls. Six studies comprising 544 patients with CSC and 655 control individuals were eligible for this review. The meta-analysis of the difference in CRP between patients with CSC and controls showed no statistically significant difference at 0.86 mg/L (95% CI: -1.03 to 2.75 mg/L; p = 0.37). One study reported a very high degree of association between elevated CRP and CSC, which was not reproduced in the other studies. The lack of association remained consistent in the sensitivity analyses. Current evidence does not suggest that elevated systemic CRP levels are associated with CSC. Further studies on CSC pathophysiology are warranted.
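A random-effects pooled mean difference can be sketched as follows. The abstract does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator here is an assumption. The example inputs are hypothetical and do not come from the included studies.

```python
import math
from typing import Sequence, Tuple


def random_effects_dl(
    effects: Sequence[float], variances: Sequence[float]
) -> Tuple[float, Tuple[float, float], float]:
    """Pool study-level mean differences with DerSimonian-Laird weights.

    Returns (pooled effect, 95% confidence interval, tau^2).
    """
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical mean differences (mg/L) and their variances.
pooled, ci, tau2 = random_effects_dl([0.0, 2.0], [1.0, 1.0])
print(f"pooled={pooled:.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f}), tau^2={tau2:.2f}")
```

A confidence interval that spans zero, as in the abstract, means the pooled difference is not statistically significant at the 5% level.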

PMID:41325540 | DOI:10.1111/aos.70049

Subsistence fishing patterns near food deserts

Proc Natl Acad Sci U S A. 2025 Dec 9;122(49):e2519112122. doi: 10.1073/pnas.2519112122. Epub 2025 Dec 1.

ABSTRACT

Fisheries are critical for sustaining waterfront communities. However, subsistence fishing is not well understood in the United States, despite its potential contributions to health and culture. We piloted a multivariable construct to classify subsistence vs. nonsubsistence fishers, identified the strongest predictor of participating in this practice, and tested for differences in place-based fishing motivations, behaviors, and community sharing. Among shore-based fishers in coastal Alabama, lower household income was the most powerful predictor of subsistence fishing. Subsistence fishers held more fishing motivations, targeted more specific fish groups, were more efficient in catching and keeping fish, and more frequently shared fish across social groups. Informed by these findings, we discussed management strategies to address opportunities and barriers for shore-based subsistence fishing in coastal Alabama. More broadly, the framework piloted here offers a pathway to integrate subsistence fisheries into management using place-based evidence.

PMID:41325526 | DOI:10.1073/pnas.2519112122

Deep Learning vs Classical Methods in Potency and ADME Prediction: Insights from a Computational Blind Challenge

J Chem Inf Model. 2025 Dec 1. doi: 10.1021/acs.jcim.5c01982. Online ahead of print.

ABSTRACT

Reliable prediction of compound potency and the ADME profile is crucial in drug discovery. With the recent surge of AI and deep learning frameworks, it remains unclear whether these modern techniques offer statistically significant improvement over the well-established classical methods. The 2025 ASAP-Polaris-OpenADMET Antiviral Challenge provided a unique benchmarking opportunity to address this question, with over 65 teams of computational scientists worldwide. Our submissions were among the top performers in terms of Pearson r correlation, ranked first in pIC50 prediction for SARS-CoV-2 Mpro and fourth in aggregated ADME. In this work, we present a retrospective analysis of our modeling strategies and highlight our lessons learned. Through rigorous statistical benchmarking, we demonstrate that while classical methods remain highly competitive for predicting potency, modern deep learning algorithms significantly outperformed traditional machine learning in ADME prediction. We also illustrate the importance of appropriate data curation and the benefits of leveraging public datasets via feature augmentation. Finally, we outline current limitations and identify future opportunities including the integration of structure-guided modeling. Overall, these results not only provide practical guidance for building robust predictive models but also offer valuable insights into the field of computational drug discovery.

PMID:41325513 | DOI:10.1021/acs.jcim.5c01982

Cost-effectiveness of Zvandiri, a community-based support intervention to reduce virological failure in adolescents living with HIV in Zimbabwe: Results of a decision analytical model

PLOS Glob Public Health. 2025 Dec 1;5(12):e0005545. doi: 10.1371/journal.pgph.0005545. eCollection 2025.

ABSTRACT

Improving antiretroviral therapy (ART) adherence among adolescents living with HIV (ALHIV) improves outcomes, but with resource implications. We conducted a cost-effectiveness analysis extrapolating the costs and benefits of a community-based peer-support intervention (Zvandiri) among ALHIV in Zimbabwe. We used a de-novo multistate Markov decision-analytic model that simulated Zvandiri lifetime costs and benefits on viral suppression, death rates, life-years (LY) and quality-adjusted-life-years (QALYs) gained from the healthcare system perspective. We estimate the incremental cost-effectiveness ratio (ICER) per LY and QALY gained and compare the ICER to proposed cost-effectiveness thresholds of $500 and $700 per LY or QALY gained. We explore parameter uncertainty using probabilistic sensitivity analyses. Cohort-microsimulation suggests that after 40 years under SoC, 21% of 280 ALHIV will have undetectable viral-load (VL), 12% will have low VL (<1000 copies/mL), 10% will have high VL (≥1000 copies/mL) and 57% would have died. With Zvandiri, ART adherence improves, decreasing annual probability of virological failure or death. After 40 years, 65% will have undetectable viral load, 23% low VL, 3% high VL and 9% would have died. Zvandiri results in 1,345 LYs gained at incremental cost of $500,587, yielding a discounted ICER of $372 per LY gained. Zvandiri also results in 1,246 QALYs at incremental cost of $123,645, yielding a discounted ICER of $99 per QALY. The ICER is highly sensitive to programme costs, health-related utilities, and the discount rate. Zvandiri is a cost-effective intervention for reducing virological failure and death in ALHIV. Our analysis likely underestimates the full benefits of the intervention by not accounting for reductions in HIV transmissions resulting from higher virological suppression observed in full transmission models.
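The incremental cost-effectiveness ratio is the incremental cost divided by the incremental health gain. As a quick arithmetic check, the sketch below reproduces the reported ratios from the figures quoted in the abstract. The function name is illustrative, and the calculation assumes that the quoted costs and gains are the inputs behind the reported discounted ICERs.

```python
def icer(incremental_cost: float, incremental_effect: float) -> float:
    """Incremental cost per unit of health gained (e.g., per LY or QALY)."""
    if incremental_effect <= 0:
        raise ValueError("incremental effect must be positive for a meaningful ICER")
    return incremental_cost / incremental_effect


# Figures quoted in the abstract (US dollars).
icer_per_ly = icer(500_587, 1_345)
icer_per_qaly = icer(123_645, 1_246)

# Thresholds quoted in the abstract.
for threshold in (500, 700):
    below = icer_per_ly < threshold and icer_per_qaly < threshold
    print(f"both ICERs below ${threshold}: {below}")

print(f"ICER per LY: ${icer_per_ly:.0f}; ICER per QALY: ${icer_per_qaly:.0f}")
```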

PMID:41325500 | DOI:10.1371/journal.pgph.0005545

Exploring spatial variation and multilevel modeling of malaria prevalence among children aged 6-59 months based on RDT in Niger: Insights for public health decision-making

PLoS One. 2025 Dec 1;20(12):e0336022. doi: 10.1371/journal.pone.0336022. eCollection 2025.

ABSTRACT

BACKGROUND: Malaria is a life-threatening infectious disease caused by parasites of the genus Plasmodium transmitted through the bite of infected female Anopheles mosquitoes, which act as vectors of the disease. It affects approximately 219 million people globally and results in 435,000 deaths each year. Fever, chills, and exhaustion are among the signs of this illness. If left untreated, these symptoms can develop into serious problems like anemia, respiratory distress, and even organ failure. By identifying determinants related to malaria prevalence, this study supports evidence-based national malaria prevention and control initiatives. The results help improve decision-making for malaria control efforts and guide focused public health initiatives by identifying areas with a high malaria burden.

METHODS: Data from the 2021 Niger Malaria Indicator Survey (NMIS) is used, focusing on RDT-confirmed malaria cases in children aged 6-59 months. The dataset includes individual, household, and community-level variables, such as age, household income, education, healthcare access, and geographic coordinates. Spatial distribution of malaria prevalence is first visualized through maps and hot spot analysis to identify areas with high and low malaria rates. Random effects are incorporated to capture unobserved heterogeneity between regions and communities, allowing for more accurate estimates of malaria prevalence by adjusting for spatial clustering. Multilevel logistic regression models are applied to account for the hierarchical structure of the data. Model fit is evaluated using standard criteria (AIC, BIC and DIC), and diagnostics are performed to ensure reliability.

RESULTS: Of the 4724 children aged 6 to 59 months who were examined, 1121 (23.7%) had positive RDT results for malaria. Malaria prevalence in Niger among children aged 6-59 months is significantly clustered (Moran’s I = 0.434, p < 0.001), revealing distinct hotspots and cold spots unlikely due to chance. Model III provides a better fit for RDT-based malaria prevalence among children aged 6-59 months, as indicated by the smallest AIC, BIC, and deviance statistics compared to other reduced models. Malaria prevalence was associated with several factors, including child age, anemia levels, maternal education, the number of children sleeping under bed nets, the use of insecticide-treated nets, the number of children aged 5 and under, as well as residence and region.
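Moran's I, the clustering statistic quoted above, can be sketched for a small set of locations. The values and the binary adjacency weights below are hypothetical toy inputs. The abstract does not describe the spatial weights actually used in the study, and the significance test is omitted here.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    return (n / s0) * cross / denom


# Four hypothetical clusters along a line; neighbours are adjacent positions.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 1, 0, 0], line_weights)    # similar values sit together
alternating = morans_i([1, 0, 1, 0], line_weights)  # neighbours always differ

print(f"clustered I={clustered:.3f}, alternating I={alternating:.3f}")
```

Positive values indicate that similar prevalence levels tend to be neighbours, which is the pattern the reported I = 0.434 describes. Negative values indicate a checkerboard-like pattern.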

CONCLUSION: The findings show that malaria prevalence among children aged 6-59 months in Niger is significantly influenced by factors such as child age, anemia levels, maternal education, and bed net usage, emphasizing the need for improved coverage of insecticide-treated nets and tailored interventions based on local conditions.

PMID:41325497 | DOI:10.1371/journal.pone.0336022