Categories
Nevin Manimala Statistics

A Machine Learning Algorithm Predicting Acute Kidney Injury in Intensive Care Unit Patients (NAVOY Acute Kidney Injury): Proof-of-Concept Study

JMIR Form Res. 2023 Dec 14;7:e45979. doi: 10.2196/45979.

ABSTRACT

BACKGROUND: Acute kidney injury (AKI) represents a significant global health challenge, leading to increased patient distress and financial health care burdens. The development of AKI in intensive care unit (ICU) settings is linked to prolonged ICU stays, a heightened risk of long-term renal dysfunction, and elevated short- and long-term mortality rates. The current diagnostic approach for AKI is based on late indicators, such as elevated serum creatinine and decreased urine output, which can only detect AKI after renal injury has transpired. There are no treatments to reverse or restore renal function once AKI has developed, other than supportive care. Early prediction of AKI enables proactive management and may improve patient outcomes.

OBJECTIVE: The primary aim was to develop a machine learning algorithm, NAVOY Acute Kidney Injury, capable of predicting the onset of AKI in ICU patients using data routinely collected in ICU electronic health records. The ultimate goal was to create a clinical decision support tool that empowers ICU clinicians to proactively manage AKI and, consequently, enhance patient outcomes.

METHODS: We developed the NAVOY Acute Kidney Injury algorithm using a hybrid ensemble model, which combines the strengths of both a Random Forest (Leo Breiman and Adele Cutler) and an XGBoost model (Tianqi Chen). To ensure the accuracy of predictions, the algorithm used 22 clinical variables for hourly predictions of AKI as defined by the Kidney Disease: Improving Global Outcomes guidelines. Data for algorithm development were sourced from the Massachusetts Institute of Technology Lab for Computational Physiology Medical Information Mart for Intensive Care IV clinical database, focusing on ICU patients aged 18 years or older.
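
The abstract does not specify how the Random Forest and XGBoost predictions are combined; one common hybrid-ensemble approach is to average the two models' predicted probabilities (soft voting). The sketch below illustrates this on synthetic data, using scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost; the feature count (22) mirrors the abstract, but all data and settings are hypothetical:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for 22 routinely collected clinical variables;
# class imbalance roughly mimics an adverse-event prediction task.
X, y = make_classification(n_samples=2000, n_features=22,
                           weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Hybrid ensemble: average the two models' predicted event probabilities.
risk = (rf.predict_proba(X_te)[:, 1] + gb.predict_proba(X_te)[:, 1]) / 2
print(round(roc_auc_score(y_te, risk), 3))
```

Averaging probabilities lets the bagging-based and boosting-based learners compensate for each other's errors; the actual NAVOY combination rule may differ.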

RESULTS: The developed algorithm, NAVOY Acute Kidney Injury, uses 4 hours of input and can predict, with high accuracy, which patients are at high risk of developing AKI 12 hours before onset. The prediction performance compares well with previously published algorithms designed to predict AKI onset in accordance with Kidney Disease: Improving Global Outcomes diagnosis criteria, with an area under the receiver operating characteristic curve (AUROC) of 0.91 and an area under the precision-recall curve (AUPRC) of 0.75. The algorithm's predictive performance was validated on an independent hold-out test data set, confirming its ability to predict AKI with high accuracy.
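
Reporting AUPRC alongside AUROC matters because, under class imbalance, an uninformative classifier still achieves about 0.5 AUROC but only about the positive-class prevalence in AUPRC. The sketch below demonstrates this baseline with random scores; the 10% prevalence is illustrative, not taken from the study:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y = rng.random(100_000) < 0.10   # ~10% positives (illustrative prevalence)
scores = rng.random(100_000)     # uninformative random predictions

# Random scores: AUROC near 0.5, but AUPRC near the prevalence (0.10),
# so an AUPRC of 0.75 is far above the chance baseline.
print(round(roc_auc_score(y, scores), 2),
      round(average_precision_score(y, scores), 2))
```

This is why an AUPRC of 0.75 is informative in its own right rather than redundant with the 0.91 AUROC.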

CONCLUSIONS: NAVOY Acute Kidney Injury is an important development in the field of critical care medicine. It offers the ability to predict the onset of AKI with high accuracy using only 4 hours of data routinely collected in ICU electronic health records. This early detection capability has the potential to strengthen patient monitoring and management, ultimately leading to improved patient outcomes. Furthermore, NAVOY Acute Kidney Injury has been granted Conformite Europeenne (CE)-marking, marking a significant milestone as the first CE-marked AKI prediction algorithm for commercial use in European ICUs.

PMID:38096015 | DOI:10.2196/45979

Potential and Limitations of ChatGPT 3.5 and 4.0 as a Source of COVID-19 Information: Comprehensive Comparative Analysis of Generative and Authoritative Information

J Med Internet Res. 2023 Dec 14;25:e49771. doi: 10.2196/49771.

ABSTRACT

BACKGROUND: The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has necessitated reliable and authoritative information for public guidance. The World Health Organization (WHO) has been a primary source of such information, disseminating it through a question-and-answer format on its official website. Concurrently, ChatGPT 3.5 and 4.0, deep learning-based natural language generation systems, have shown potential in generating diverse text types based on user input.

OBJECTIVE: This study evaluates the accuracy of COVID-19 information generated by ChatGPT 3.5 and 4.0, assessing its potential as a supplementary public information source during the pandemic.

METHODS: We extracted 487 COVID-19-related questions from the WHO's official website and used ChatGPT 3.5 and 4.0 to generate corresponding answers. These generated answers were then compared against the official WHO responses for evaluation. Two clinical experts scored the generated answers on a scale of 0-5 across 4 dimensions (accuracy, comprehensiveness, relevance, and clarity), with higher scores indicating better performance in each dimension. The WHO responses served as the reference for this assessment. Additionally, we used the BERT (Bidirectional Encoder Representations from Transformers) model to generate similarity scores (0-1) between the generated and official answers, providing a dual validation mechanism.
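
The abstract does not detail how the BERT-based similarity scores were computed; a common instantiation is cosine similarity between sentence embeddings. The sketch below shows the cosine computation itself with made-up vectors standing in for BERT embeddings (real embeddings would come from a pretrained encoder, and scores may be rescaled to 0-1):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; near 1 means the vectors point in nearly the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In the study, these would be BERT embeddings of a ChatGPT-generated
# answer and the corresponding official WHO answer; the numbers here
# are invented purely to illustrate the computation.
generated = np.array([0.2, 0.8, 0.1, 0.5])
official = np.array([0.25, 0.75, 0.05, 0.55])
print(round(cosine_similarity(generated, official), 3))
```

Semantically similar answers yield values close to 1, matching the 0.83-0.85 range reported in the results.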

RESULTS: The mean (SD) scores for ChatGPT 3.5-generated answers were 3.47 (0.725) for accuracy, 3.89 (0.719) for comprehensiveness, 4.09 (0.787) for relevance, and 3.49 (0.809) for clarity. For ChatGPT 4.0, the mean (SD) scores were 4.15 (0.780), 4.47 (0.641), 4.56 (0.600), and 4.09 (0.698), respectively. All differences were statistically significant (P<.001), with ChatGPT 4.0 outperforming ChatGPT 3.5. The BERT model verification showed mean (SD) similarity scores of 0.83 (0.07) for ChatGPT 3.5 and 0.85 (0.07) for ChatGPT 4.0 compared with the official WHO answers.
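
The abstract reports P<.001 for the 3.5-vs-4.0 differences without naming the test. For paired per-question scores, a Wilcoxon signed-rank test is one standard choice; the sketch below runs it on simulated scores whose means and SDs loosely mimic the accuracy dimension above (the data, and the choice of test, are assumptions for illustration):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
# Hypothetical per-question accuracy scores (0-5) for the two versions,
# simulated so that 4.0 tends to outscore 3.5 on the same questions.
gpt35 = np.clip(rng.normal(3.47, 0.725, 487), 0, 5)
gpt40 = np.clip(gpt35 + rng.normal(0.68, 0.5, 487), 0, 5)

stat, p = wilcoxon(gpt40, gpt35)
print(p < 0.001)
```

Pairing by question removes between-question difficulty variation, which is why a paired test is more sensitive here than comparing group means.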

CONCLUSIONS: ChatGPT 3.5 and 4.0 can generate accurate and relevant COVID-19 information to a certain extent. However, compared with official WHO responses, gaps and deficiencies exist. Thus, users of ChatGPT 3.5 and 4.0 should also reference other reliable information sources to mitigate potential misinformation risks. Notably, ChatGPT 4.0 outperformed ChatGPT 3.5 across all evaluated dimensions, a finding corroborated by BERT model validation.

PMID:38096014 | DOI:10.2196/49771

Development of Risk Prediction Models for Severe Periodontitis in a Thai Population: Statistical and Machine Learning Approaches

JMIR Form Res. 2023 Dec 14;7:e48351. doi: 10.2196/48351.

ABSTRACT

BACKGROUND: Severe periodontitis affects 26% of Thai adults and 11.2% of adults globally and is characterized by the loss of alveolar bone height. Full-mouth examination by periodontal probing is the gold standard for diagnosis but is time- and resource-intensive. A screening model to identify those at high risk of severe periodontitis would offer a targeted approach and aid in reducing the workload for dentists. While statistical modeling by logistic regression is commonly applied, optimal performance depends on feature selection and engineering. Machine learning has recently gained favor given its potential discriminatory power and its ability to handle multiway interactions without requiring linearity assumptions.

OBJECTIVE: We aim to compare the performance of screening models developed using statistical and machine learning approaches for the risk prediction of severe periodontitis.

METHODS: This study used data from the prospective Electricity Generating Authority of Thailand cohort. Dental examinations were performed for the 2008 and 2013 surveys. Oral examinations (ie, number of teeth, oral hygiene index, and plaque scores) and measurements of periodontal pocket depth and gingival recession were performed by dentists. The outcome of interest was severe periodontitis per the Centers for Disease Control and Prevention-American Academy of Periodontology case definition: 2 or more interproximal sites with a clinical attachment level ≥6 mm (on different teeth) and 1 or more interproximal sites with a periodontal pocket depth ≥5 mm. Risk prediction models were developed using mixed-effects logistic regression (MELR), recurrent neural network, mixed-effects support vector machine, and mixed-effects decision tree models. A total of 21 features were considered as predictive features, including 4 demographic characteristics, 2 physical examinations, 4 underlying diseases, 1 medication, 2 risk behaviors, 2 oral features, and 6 laboratory features.

RESULTS: A total of 3883 observations from 2086 participants were split into development (n=3112, 80.1%) and validation (n=771, 19.9%) sets with prevalences of periodontitis of 34.4% (n=1070) and 34.1% (n=263), respectively. The final MELR model contained 6 features (gender, education, smoking, diabetes mellitus, number of teeth, and plaque score) with an area under the curve (AUC) of 0.983 (95% CI 0.977-0.989) and positive likelihood ratio (LR+) of 11.9 (95% CI 8.8-16.3). Machine learning yielded lower performance than the MELR model, with AUC (95% CI) and LR+ (95% CI) values of 0.712 (0.669-0.754) and 2.1 (1.8-2.6), respectively, for the recurrent neural network model; 0.698 (0.681-0.734) and 2.1 (1.7-2.6), respectively, for the mixed-effects support vector machine model; and 0.662 (0.621-0.702) and 2.4 (1.9-3.0), respectively, for the mixed-effects decision tree model.
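
The positive likelihood ratio reported above follows directly from a model's operating point: LR+ = sensitivity / (1 - specificity). The sketch below shows the computation; the operating point (sensitivity 0.95, specificity 0.92) is illustrative, chosen because it approximately reproduces the MELR model's LR+ of 11.9, and is not taken from the study:

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = P(test positive | disease) / P(test positive | no disease)."""
    return sensitivity / (1.0 - specificity)

# Illustrative operating point only; the abstract reports LR+ but not
# the sensitivity/specificity pair behind it.
print(round(positive_likelihood_ratio(0.95, 0.92), 1))  # → 11.9
```

An LR+ near 12 means a positive screen makes severe periodontitis roughly 12 times more likely than a positive screen in someone without it, which is why LR+ complements AUC for screening decisions.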

CONCLUSIONS: The MELR model might be more useful than machine learning for large-scale screening to identify those at high risk of severe periodontitis for periodontal evaluation. External validation using data from other centers is required to evaluate the generalizability of the model.

PMID:38096008 | DOI:10.2196/48351

Examining the structure of personality dysfunction

Personal Disord. 2023 Dec 14. doi: 10.1037/per0000648. Online ahead of print.

ABSTRACT

Personality impairment is a core feature of personality disorders in both current (i.e., Diagnostic and Statistical Manual of Mental Disorders, fifth edition [DSM-5] personality disorders, International Classification of Diseases, 11th revision personality disorders) and emerging (i.e., DSM-5's alternative model of personality disorders) models of psychopathology. Yet, despite its importance within clinical nosology, attempts to identify its optimal lower-order structure have yielded inconsistent findings. Given its presence in diagnostic models, it is important to better understand its empirical structure across a variety of instantiations. To the degree that impairment is multifaceted, various factors may have different nomological networks and varied implications for assessment, diagnosis, and treatment. Therefore, in the present preregistered study, participants (N = 574) were recruited from two large public universities to explore the construct's structure with exploratory “bass-ackward” factor analyses at the item level. Participants completed over 250 items from six commonly used measures of personality dysfunction. Criterion variables in its nomological network were also collected (e.g., general and pathological personality traits, internalizing/externalizing behavior, and personality disorders) using both self- and informant-reports. These factor analyses identified four lower-order facets of impairment (i.e., negative self-regard, disagreeableness, intimacy problems, and lack of direction), all of which showed moderate to strong overlap with traits from both general and pathological models of personality. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

PMID:38095995 | DOI:10.1037/per0000648

Simulation-based design optimization for statistical power: Utilizing machine learning

Psychol Methods. 2023 Dec 14. doi: 10.1037/met0000611. Online ahead of print.

ABSTRACT

The planning of adequately powered research designs increasingly goes beyond determining a suitable sample size. More challenging scenarios demand simultaneous tuning of multiple design parameter dimensions and can only be addressed using Monte Carlo simulation if no analytical approach is available. In addition, cost considerations (e.g., monetary costs) are a relevant target for optimization. In this context, optimal design parameters can imply a desired level of power at minimum cost or maximum power at a cost threshold. We introduce a surrogate modeling framework based on machine learning predictions to solve these optimization tasks. In a simulation study, we demonstrate the framework's efficiency for a wide range of hypothesis testing scenarios with single- and multidimensional design parameters, including t tests, analysis of variance, item response theory models, multilevel models, and multiple imputations. Our framework provides an algorithmic solution for optimizing study designs when no analytic power analysis is available, handling multiple design dimensions and cost considerations. Our implementation is publicly available in the R package mlpwr. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
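
The core idea above, estimating power by Monte Carlo simulation at a few design points and then letting a fitted surrogate suggest the cheapest design reaching a power target, can be sketched in a few lines. The example below uses a two-sample t test with an assumed effect size of 0.5 and linear interpolation as a deliberately simple surrogate (mlpwr itself fits learned regression surrogates in R, so everything here is an illustrative stand-in):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_power(n: int, effect: float = 0.5,
                    reps: int = 2000, alpha: float = 0.05) -> float:
    """Monte Carlo power of a two-sample t test at per-group size n."""
    rejections = 0
    for _ in range(reps):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        rejections += stats.ttest_ind(a, b).pvalue < alpha
    return rejections / reps

# Simulate power on a coarse grid, then invert a smooth surrogate
# (here: linear interpolation) to find the smallest n reaching 80% power.
grid = [20, 40, 60, 80, 100]
powers = [simulated_power(n) for n in grid]
n_star = int(np.interp(0.8, powers, grid))
print(n_star)
```

The surrogate lets the search reuse a handful of expensive simulations instead of simulating every candidate design, which is what makes the approach scale to multidimensional design spaces and cost constraints.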

PMID:38095992 | DOI:10.1037/met0000611

Using Bayesian item response theory for multicohort repeated measure design to estimate individual latent change scores

Psychol Methods. 2023 Dec 14. doi: 10.1037/met0000635. Online ahead of print.

ABSTRACT

Repeated measure data designs have been used extensively in a wide range of fields, such as brain aging or developmental psychology, to answer important research questions exploring relationships between trajectories of change and external variables. In many cases, such data may be collected from multiple study cohorts and harmonized, with the intention of gaining higher statistical power and enhanced external validity. When psychological constructs are measured using survey scales, a fundamental psychometric challenge for data harmonization is to create commensurate measures for the constructs of interest across studies. Traditional analysis may fit a unidimensional item response theory model to data from one time point and one cohort to obtain item parameters and fix the same parameters in subsequent analyses. Such a simplified approach ignores item residual dependencies in the repeated measure design and fails to exploit accumulated information from different cohorts. Instead, two alternative approaches should serve such data designs much better: an integrative approach using a multiple-group two-tier model via concurrent calibration, and, if such calibration fails to converge, a Bayesian sequential calibration approach that uses informative priors on common items to establish the scale. Both approaches use a Markov chain Monte Carlo algorithm that handles computational complexity well. Through a simulation study and an empirical study using Alzheimer’s Disease Neuroimaging Initiative cognitive battery data (i.e., language and executive functioning), we conclude that latent change scores obtained from these two alternative approaches are more precisely recovered. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

PMID:38095987 | DOI:10.1037/met0000635

Multimodal aspects of sentence comprehension: Do facial and color cues interact with processing negated and affirmative sentences?

J Exp Psychol Learn Mem Cogn. 2023 Dec 14. doi: 10.1037/xlm0001302. Online ahead of print.

ABSTRACT

Negation is usually considered a linguistic operator reversing the truth value of a proposition. However, there are various ways to express negation in a multimodal manner. It remains unresolved whether nonverbal expressions of negation can influence linguistic negation comprehension. Based on extensive evidence demonstrating that language comprehenders are able to instantly integrate extralinguistic information such as a speaker’s identity, we expected that nonverbal cues of negation and affirmation might similarly affect sentence comprehension. In three preregistered experiments, we examined the extent to which nonverbal markers of negation and affirmation, specifically the so-called “not face” (see Benitez-Quiroz et al., 2016) and red or green color (see Dudschig et al., 2023), interact with comprehending negation and affirmation at the sentential level. Participants were presented with photos (“not face” vs. positive control; Experiments 1 and 2) or color patches (red vs. green; Experiment 3). They then read negated and affirmative sentences in a self-paced manner or judged the sensibility of negated and affirmative sentences (e.g., “No, I do not want to sing” vs. “Yes, I would like to buy a sofa”). Both frequentist statistics and Bayes factors resulting from linear mixed-effects analyses showed that processing times for negated and affirmative sentences were not significantly modulated by the nonverbal features under investigation. This indicates that their influence might not extend to sentential negation or affirmation comprehension. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

PMID:38095955 | DOI:10.1037/xlm0001302

Correction to Timulak et al. (2022)

Psychotherapy (Chic). 2023 Dec;60(4):547. doi: 10.1037/pst0000504.

ABSTRACT

Reports an error in “A comparison of emotion-focused therapy and cognitive-behavioral therapy in the treatment of generalized anxiety disorder: Results of a feasibility randomized controlled trial” by Ladislav Timulak, Daragh Keogh, Craig Chigwedere, Charlotte Wilson, Fiona Ward, David Hevey, Patrick Griffin, Louise Jacobs, Suzanne Hughes, Christina Vaughan, Kea Beckham and Shona Mahon (Psychotherapy, 2022[Mar], Vol 59[1], 84-95). In the article, the third n and percentage values in the second sentence in the second paragraph of the Treatment Drop Out, Number of Sessions, Research Attrition section should appear as n = 6 (20.6%) at 6-month follow-up. All versions of this article have been corrected. (The following abstract of the original article appeared in record 2022-26657-001.) Generalized anxiety disorder (GAD) is a chronic mental health difficulty typically present in primary care settings. Cognitive-behavioral therapy (CBT) is the psychological intervention with the best evidence for its efficacy for GAD. The development of other psychological interventions can increase client choice. This feasibility trial provided an initial assessment of the efficacy of emotion-focused therapy (EFT) in comparison to CBT in the treatment of GAD in the context of an Irish public health service. The trial provided information on recruitment, therapist training/adherence, and client retention relevant for a potential noninferiority trial. A randomized controlled trial compared the efficacy of EFT versus CBT for GAD. Both therapies were offered in a 16- to 20-session format. Therapists (n = 8) were trained in both conditions and offered both therapies. Clients were randomly assigned to EFT (n = 29) or CBT (n = 29). Outcomes were assessed using several measures, with the Generalized Anxiety Disorder-7 (GAD-7) being the primary outcome. Clients were assessed at baseline, week 16, end of therapy, and at 6-month follow-up. Therapists were able to learn the two models after a short training and showed moderate levels of adherence. Although the difference was not statistically significant, treatment dropout was 10% for EFT and 27% for CBT. The two therapies showed large pre-post change and similar outcomes across all measures, with these benefits retained at 6-month follow-up. Results suggest that EFT is a potentially promising treatment for GAD. Further investigation is indicated to establish its potential to expand the available psychological therapies for GAD. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

PMID:38095932 | DOI:10.1037/pst0000504

Clinical Characteristics of Primary Snoring vs Mild Obstructive Sleep Apnea in Children: Analysis of the Pediatric Adenotonsillectomy for Snoring (PATS) Randomized Clinical Trial

JAMA Otolaryngol Head Neck Surg. 2023 Dec 14. doi: 10.1001/jamaoto.2023.3816. Online ahead of print.

ABSTRACT

IMPORTANCE: It is unknown whether children with primary snoring and children with mild obstructive sleep apnea (OSA) represent populations with substantially different clinical characteristics. Nonetheless, an obstructive apnea-hypopnea index (AHI) of 1 or greater is often used to define OSA and plan for adenotonsillectomy (AT).

OBJECTIVE: To assess whether a combination of clinical characteristics differentiates children with primary snoring from children with mild OSA.

DESIGN, SETTING, AND PARTICIPANTS: Baseline data from the Pediatric Adenotonsillectomy Trial for Snoring (PATS) study, a multicenter, single-blind, randomized clinical trial conducted at 6 academic sleep centers from June 2016 to January 2021, were analyzed. Children aged 3.0 to 12.9 years with polysomnography-diagnosed (AHI <3) mild obstructive sleep-disordered breathing who were considered candidates for AT were included. Data analysis was performed from July 2022 to October 2023.

MAIN OUTCOMES AND MEASURES: Logistic regression models were fitted to identify which demographic characteristics, clinical examination findings, and caregiver reports distinguished children with primary snoring (AHI <1; 311 patients [67.8%]) from children with mild OSA (AHI 1-3; 148 patients [32.2%]).

RESULTS: A total of 459 children were included. The median (IQR) age was 6.0 (4.0-7.5) years, 230 (50.1%) were female, and 88 (19.2%) had obesity. A total of 121 (26.4%) were Black, 75 (16.4%) were Hispanic, 236 (51.5%) were White, and 26 (5.7%) were of other race or ethnicity. Black race (odds ratio [OR], 2.08; 95% CI, 1.32-3.30), obesity (OR, 1.80; 95% CI, 1.12-2.91), and high urinary cotinine levels (>5 µg/L) (OR, 1.88; 95% CI, 1.15-3.06) were associated with greater odds of mild OSA rather than primary snoring. Other demographic characteristics, clinical examination findings, and questionnaire reports did not distinguish between primary snoring and mild OSA. A weighted combination of the statistically significant clinical predictors had limited ability to differentiate children with mild OSA from children with primary snoring.
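
Odds ratios like those reported above come from exponentiating logistic-regression coefficients, with the 95% CI obtained on the log-odds scale. The sketch below shows the conversion; the coefficient and standard error are hypothetical values chosen to approximately reproduce the Black-race association (OR 2.08; 95% CI, 1.32-3.30), since the fitted coefficients are not given in the abstract:

```python
import math

def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Convert a logistic-regression coefficient and SE to (OR, CI low, CI high)."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))

# Hypothetical beta/SE for illustration only.
or_, lo, hi = odds_ratio_ci(beta=0.732, se=0.234)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

Because the CI is symmetric on the log scale, it is asymmetric around the OR itself, which is why published intervals like 1.32-3.30 are not centered on 2.08.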

CONCLUSIONS AND RELEVANCE: In this analysis of baseline data from the PATS randomized clinical trial, primary snoring and mild OSA were difficult to distinguish without polysomnography. Mild OSA vs snoring alone did not identify a clinical group of children who may stand to benefit from AT for obstructive sleep-disordered breathing.

TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT02562040.

PMID:38095903 | DOI:10.1001/jamaoto.2023.3816

Evidence-Based Checklist to Delay Cardiac Arrest in Brain-Dead Potential Organ Donors: The DONORS Cluster Randomized Clinical Trial

JAMA Netw Open. 2023 Dec 1;6(12):e2346901. doi: 10.1001/jamanetworkopen.2023.46901.

ABSTRACT

IMPORTANCE: The effectiveness of goal-directed care to reduce loss of brain-dead potential donors to cardiac arrest is unclear.

OBJECTIVE: To evaluate the effectiveness of an evidence-based, goal-directed checklist in the clinical management of brain-dead potential donors in the intensive care unit (ICU).

DESIGN, SETTING, AND PARTICIPANTS: The Donation Network to Optimize Organ Recovery Study (DONORS) was an open-label, parallel-group cluster randomized clinical trial in Brazil. Enrollment and follow-up were conducted from June 20, 2017, to November 30, 2019. Hospital ICUs that reported 10 or more brain deaths in the previous 2 years were included. Consecutive brain-dead potential donors in the ICU aged 14 to 90 years with a condition consistent with brain death after the first clinical examination were enrolled. Participants were randomized to either the intervention group or the control group. The intention-to-treat data analysis was conducted from June 15 to August 30, 2020.

INTERVENTIONS: Hospital staff in the intervention group were instructed to apply an evidence-based checklist with 13 clinical goals and 14 corresponding actions to guide the care of brain-dead potential donors every 6 hours, from study enrollment to organ retrieval. Hospitals in the control group provided usual care.

MAIN OUTCOMES AND MEASURES: The primary outcome was loss of brain-dead potential donors to cardiac arrest at the individual level. A prespecified sensitivity analysis assessed the effect of adherence to the checklist in the intervention group.

RESULTS: Among the 1771 brain-dead potential donors screened in 63 hospitals, 1535 were included. These patients included 673 males (59.2%) and had a median (IQR) age of 51 (36.3-62.0) years. The main cause of brain injury was stroke (877 [57.1%]), followed by trauma (485 [31.6%]). Of the 63 hospitals, 31 (49.2%) were assigned to the intervention group (743 [48.4%] brain-dead potential donors) and 32 (50.8%) to the control group (792 [51.6%] brain-dead potential donors). Seventy potential donors (9.4%) at intervention hospitals and 117 (14.8%) at control hospitals met the primary outcome (risk ratio [RR], 0.70; 95% CI, 0.46-1.08; P = .11). The primary outcome rate was lower in those with adherence higher than 79.0% than in the control group (5.3% vs 14.8%; RR, 0.41; 95% CI, 0.22-0.78; P = .006).

CONCLUSIONS AND RELEVANCE: This cluster randomized clinical trial was inconclusive in determining whether the overall use of an evidence-based, goal-directed checklist reduced brain-dead potential donor loss to cardiac arrest. The findings suggest that use of such a checklist has limited effectiveness without adherence to the actions recommended in this checklist.

TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT03179020.

PMID:38095899 | DOI:10.1001/jamanetworkopen.2023.46901