Categories
Nevin Manimala Statistics

Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis

JMIR Med Educ. 2024 Jan 5;10:e51148. doi: 10.2196/51148.

ABSTRACT

BACKGROUND: The United States Medical Licensing Examination (USMLE) has been critical in medical education since 1992, testing various aspects of a medical student’s knowledge and skills through different steps, based on their training level. Artificial intelligence (AI) tools, including chatbots like ChatGPT, are emerging technologies with potential applications in medicine. However, comprehensive studies analyzing ChatGPT’s performance on USMLE Step 3 in large-scale scenarios and comparing different versions of ChatGPT are limited.

OBJECTIVE: This paper aimed to analyze ChatGPT’s performance on USMLE Step 3 practice test questions to better elucidate the strengths and weaknesses of AI use in medical education and deduce evidence-based strategies to counteract AI cheating.

METHODS: A total of 2069 USMLE Step 3 practice questions were extracted from the AMBOSS study platform. After excluding 229 image-based questions, a total of 1840 text-based questions were further categorized and entered into ChatGPT 3.5, while a subset of 229 questions was entered into ChatGPT 4. Responses were recorded, and the accuracy of ChatGPT answers as well as its performance in different test question categories and for different difficulty levels were compared between both versions.

RESULTS: Overall, ChatGPT 4 demonstrated a statistically significant superior performance compared to ChatGPT 3.5, achieving an accuracy of 84.7% (194/229) and 56.9% (1047/1840), respectively. A noteworthy correlation was observed between the length of test questions and the performance of ChatGPT 3.5 (ρ=-0.069; P=.003), which was absent in ChatGPT 4 (P=.87). Additionally, the difficulty of test questions, as categorized by AMBOSS hammer ratings, showed a statistically significant correlation with performance for both ChatGPT versions, with ρ=-0.289 for ChatGPT 3.5 and ρ=-0.344 for ChatGPT 4. ChatGPT 4 surpassed ChatGPT 3.5 in all levels of test question difficulty, except for the 2 highest difficulty tiers (4 and 5 hammers), where statistical significance was not reached.

CONCLUSIONS: In this study, ChatGPT 4 demonstrated remarkable proficiency in taking the USMLE Step 3, with an accuracy rate of 84.7% (194/229), outshining ChatGPT 3.5 with an accuracy rate of 56.9% (1047/1840). Although ChatGPT 4 performed exceptionally, it encountered difficulties in questions requiring the application of theoretical concepts, particularly in cardiology and neurology. These insights are pivotal for the development of examination strategies that are resilient to AI and underline the promising role of AI in the realm of medical education and diagnostics.

PMID:38180782 | DOI:10.2196/51148
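
The accuracy figures in RESULTS can be reproduced from the reported counts. The sketch below recomputes both percentages and, purely as an illustration, applies a pooled two-proportion z-test. The abstract does not say which test the authors used, and because the 229 ChatGPT 4 questions are a subset of the 1840, the independence assumption behind this test is not strictly met. The function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumption: the two samples are independent, which is only
    approximately true for the counts reported in the abstract.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the abstract: correct answers / questions asked.
acc_gpt4 = 194 / 229
acc_gpt35 = 1047 / 1840
z, p_value = two_proportion_z(194, 229, 1047, 1840)

print(f"ChatGPT 4: {acc_gpt4:.1%}, ChatGPT 3.5: {acc_gpt35:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

A paired analysis on the 229 shared questions would be the stricter comparison, but the abstract does not report per-question results.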

Prevalence and the factors associated with microalbuminuria among patients with type 2 diabetes mellitus and/or hypertension in the urban areas of Puducherry district: a cross-sectional study

Fam Pract. 2024 Jan 5:cmad124. doi: 10.1093/fampra/cmad124. Online ahead of print.

ABSTRACT

BACKGROUND: Microalbuminuria is an early indicator for renal and cardiovascular diseases, especially among patients with diabetes mellitus (DM) and hypertension (HTN). We determined the prevalence and the factors associated with microalbuminuria among patients with type 2 DM and/or HTN in the urban areas of the Puducherry district in India.

METHODS: We included 225 patients aged 40-69 years with DM and/or HTN from a non-communicable diseases (NCDs) survey conducted during 2019-2020 in the urban areas of Puducherry district. The prevalence of microalbuminuria and various biological risk factors of NCDs were assessed as per the WHO STEPS methodology. The prevalence of microalbuminuria was presented as proportions (95% CI), and the adjusted prevalence ratio (aPR) was estimated using weighted forward stepwise generalized linear modelling. P-value ≤0.05 was considered statistically significant.

RESULTS: The mean (SD) age of the patients was 54 (11) years. Over one-third (38.2%; 95% CI: 31.6-44.4) of patients with DM and/or HTN had microalbuminuria. The prevalence was highest among those having both DM and HTN (48%; 95% CI: 37-59), followed by those having only DM (40.6%; 95% CI: 29-52.2) and only HTN (27.7%; 95% CI: 18.1-38.6). The prevalence of microalbuminuria was about twice as high among women (aPR = 2.1, 95% CI: 1.1-3.9) and 2.4 times as high (95% CI: 1.12-5.1) among those having both DM and HTN compared with those having only HTN.

CONCLUSION: The prevalence of microalbuminuria among patients with DM and/or HTN is concerningly high. Population-based screening for microalbuminuria, especially among women and those having both DM and HTN, needs to be undertaken in the urban areas of Puducherry district.

PMID:38180781 | DOI:10.1093/fampra/cmad124

Applying Resampling and Visualization Methods in Factor Analysis to Model Human Spatial Vision

Invest Ophthalmol Vis Sci. 2024 Jan 2;65(1):17. doi: 10.1167/iovs.65.1.17.

ABSTRACT

PURPOSE: Studies have reported different numbers of spatial frequency channels for chromatic and achromatic vision. To resolve the difference, we performed factor analysis, a multivariate modeling technique, on population data of achromatic and chromatic sensitivity. In addition, we included resampling and visualization methods to evaluate models from factor analysis. These routines are complex but widely useful. Therefore we have archived our analysis routines by building smCSF, an open-source software package in R (https://smin95.github.io/dataviz/).

METHODS: Data of 103 normally-sighted adults were analyzed. They included blue-yellow, red-green, and achromatic contrast sensitivity. To obtain the confidence interval of relevant statistical parameters, factor analysis was performed using a resampling method. Then exploratory models were developed. We then performed model selections by fitting them against the empirical data and quantifying the quality of the fits.

RESULTS: During the exploratory factor analysis, different statistical tests supported different factor models, which may partly explain why there have been conflicting reports. However, after the confirmatory analysis, we found that a model with two spatial channels was adequate to approximate the chromatic sensitivity data, whereas a model with three channels was adequate for the achromatic sensitivity data.

CONCLUSIONS: Our findings provide novel insights about the spatial channels for chromatic and achromatic contrast sensitivity from population data. Also, the analysis and visualization routines have been archived in a computational package to boost the transparency and replicability of science.

PMID:38180771 | DOI:10.1167/iovs.65.1.17
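
The resampling step in METHODS can be illustrated with a percentile bootstrap. The sketch below is not the smCSF implementation (which is an R package). It uses synthetic data and a deliberately simple statistic, the Pearson correlation between sensitivities at two spatial frequencies, in place of factor-analysis parameters. All function names and the simulated values are assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(xs, ys, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample observers with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([xs[i] for i in idx], [ys[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic log contrast sensitivities for 103 observers (not study data).
rng = random.Random(1)
shared = [rng.gauss(0, 1) for _ in range(103)]
low_sf = [2.0 + 0.3 * s + rng.gauss(0, 0.2) for s in shared]
high_sf = [1.0 + 0.3 * s + rng.gauss(0, 0.2) for s in shared]

r = pearson(low_sf, high_sf)
ci_low, ci_high = bootstrap_ci(low_sf, high_sf, pearson)
print(f"r = {r:.2f}, 95% bootstrap CI = ({ci_low:.2f}, {ci_high:.2f})")
```

The same loop applies to any statistic that can be recomputed on a resampled set of observers, such as an eigenvalue or a factor loading.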

History of Infertility and Midlife Cardiovascular Health in Female Individuals

JAMA Netw Open. 2024 Jan 2;7(1):e2350424. doi: 10.1001/jamanetworkopen.2023.50424.

ABSTRACT

IMPORTANCE: Fertility status is a marker for future health, and infertility has been associated with risk for later cancer and diabetes, but associations with midlife cardiovascular health (CVH) in female individuals remain understudied.

OBJECTIVE: To evaluate the association of infertility history with CVH at midlife (approximately age 50 years) among parous individuals.

DESIGN, SETTING, AND PARTICIPANTS: Project Viva is a prospective cohort study of pregnant participants enrolled between 1999 and 2002 who delivered a singleton live birth in the greater Boston, Massachusetts, area. Infertility history was collected at a midlife visit between 2017 and 2021, approximately 18 years after enrollment. Data analysis was performed from January to June 2023.

EXPOSURES: The primary exposure was any lifetime history of infertility identified by self-report, medical record, diagnosis, or claims for infertility treatment.

MAIN OUTCOMES AND MEASURES: The American Heart Association’s Life’s Essential 8 (LE8) is a construct for ranking CVH that includes scores from 0 to 100 (higher scores denote better health status) in 4 behavioral (diet, physical activity, sleep, and smoking status) and 4 biomedical (body mass index, blood pressure, blood lipids, and glycemia) domains to form an overall assessment of CVH. Associations of a history of infertility (yes or no) with mean LE8 total, behavioral, biomedical, and blood biomarker (lipids and glycemia) scores were examined, adjusting for age at outcome (midlife visit), race and ethnicity, education, household income, age at menarche, and perceived body size at age 10 years.

RESULTS: Of 468 included participants (mean [SD] age at the midlife visit, 50.6 [5.3] years) with exposure and outcome data, 160 (34.2%) experienced any infertility. Mean (SD) LE8 scores were 76.3 (12.2) overall, 76.5 (13.4) for the behavioral domain, 76.0 (17.5) for the biomedical domain, and 78.9 (19.2) for the blood biomarkers subdomain. In adjusted models, the estimated overall LE8 score at midlife was 2.94 points lower (95% CI, -5.13 to -0.74 points), the biomedical score was 4.07 points lower (95% CI, -7.33 to -0.78 points), and the blood subdomain score was 5.98 points lower (95% CI, -9.71 to -2.26 points) among those with vs without history of infertility. The point estimate also was lower for the behavioral domain score (β = -1.81; 95% CI, -4.28 to 0.66), although the result was not statistically significant.

CONCLUSIONS AND RELEVANCE: This cohort study of parous individuals found evidence for an association between a history of infertility and lower overall and biomedical CVH scores. Future study of enhanced cardiovascular preventive strategies among those who experience infertility is warranted.

PMID:38180761 | DOI:10.1001/jamanetworkopen.2023.50424
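
The LE8 structure described in MAIN OUTCOMES AND MEASURES can be sketched as simple averaging. The sketch assumes that each of the 8 components is already scored from 0 to 100 and that the domain and overall scores are unweighted means of their components. The component values are invented for illustration and are not study data, and the function name `le8_scores` is hypothetical.

```python
from statistics import mean

BEHAVIORAL = ("diet", "physical_activity", "sleep", "smoking")
BIOMEDICAL = ("bmi", "blood_pressure", "lipids", "glycemia")
BLOOD = ("lipids", "glycemia")


def le8_scores(components):
    """Return domain and overall scores from eight 0-100 component scores.

    Assumption: every score is an unweighted mean of its components.
    """
    for name in BEHAVIORAL + BIOMEDICAL:
        if not 0 <= components[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return {
        "behavioral": mean([components[k] for k in BEHAVIORAL]),
        "biomedical": mean([components[k] for k in BIOMEDICAL]),
        "blood": mean([components[k] for k in BLOOD]),
        "overall": mean([components[k] for k in BEHAVIORAL + BIOMEDICAL]),
    }


# Hypothetical participant (illustrative values only).
example = {
    "diet": 60, "physical_activity": 90, "sleep": 80, "smoking": 100,
    "bmi": 70, "blood_pressure": 75, "lipids": 60, "glycemia": 100,
}
scores = le8_scores(example)
print(scores)
```

Under this assumption the overall score is also the mean of the behavioral and biomedical domain scores, which is consistent with the reported means of 76.5, 76.0, and 76.3.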

Reporting Requirements, Confidentiality, and Legal Immunity for Physicians Who Report Medically Impaired Drivers

JAMA Netw Open. 2024 Jan 2;7(1):e2350495. doi: 10.1001/jamanetworkopen.2023.50495.

ABSTRACT

IMPORTANCE: Physicians play an important role in assessing patients’ ability to drive. There is a dearth of peer-reviewed information on policies regarding physician reporting of medically impaired drivers.

OBJECTIVE: To investigate state reporting requirements and the availability of confidentiality and legal immunity for physicians who report medically impaired drivers.

DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study was conducted from November 1 to 30, 2022, in 3 rounds. First, all 50 US states’ Department of Motor Vehicle (DMV) websites were systematically reviewed. Second, DMV staff from each state were surveyed via telephone. Third, each state’s legal codes for driver licensing were reviewed.

MAIN OUTCOMES AND MEASURES: Outcome measures included the percentage of states with mandatory and voluntary reporting policies, reporting instructions on DMV websites, anonymous reporting options, and legal immunity for reporting physicians, in addition to characteristics of states’ mandatory reporting policies (ie, types of medical conditions that require reporting) and policies surrounding the confidentiality of reports. The data were analyzed using descriptive statistics.

RESULTS: One-third of state DMV websites (17 [34%]) lacked instructions regarding physician reporting. Six states had mandatory reporting requirements; 4 of these states only required reporting of conditions characterized by lapses of consciousness. Only 3 states (6%) accepted anonymous reports, and 7 states (14%) deemed physician reports of medically impaired drivers confidential without exception. Nearly one-third of states (15 [30%]) deemed reports by physicians confidential, with the exception that reported drivers could find out who reported them if they asked for a copy of the reporting form. Most states (37 [74%]) had statutes that protected physicians from liability related to reporting medically impaired drivers.

CONCLUSIONS AND RELEVANCE: This cross-sectional study of state reporting requirements regarding medically impaired drivers found many differences in state policies regarding mandatory reporting and the conditions that require reporting. There was also limited availability of online reporting instructions, anonymous reporting options, and legal protections for reporting physicians.

PMID:38180760 | DOI:10.1001/jamanetworkopen.2023.50495

Xylazine in Overdose Deaths and Forensic Drug Reports in US States, 2019-2022

JAMA Netw Open. 2024 Jan 2;7(1):e2350630. doi: 10.1001/jamanetworkopen.2023.50630.

ABSTRACT

IMPORTANCE: Xylazine is increasingly reported in street drugs and fatal overdoses in the US, yet state-level data are limited, hampering local public health responses.

OBJECTIVE: To gather available state-level data on xylazine involvement in overdose deaths and in forensic drug reports.

DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study was a secondary analysis of 2019 to 2022 data from the National Forensic Laboratory Information System (NFLIS), National Center for Health Statistics, and individual states’ medical examiner or public health agency reports. Data were analyzed from August to October 2023.

EXPOSURE: State.

MAIN OUTCOMES AND MEASURES: Yearly xylazine-related overdose deaths per 100 000 residents; xylazine NFLIS drug reports, both per 100 000 residents and as a percentage of all NFLIS drug reports (from samples of drugs seized by law enforcement and analyzed by NFLIS-participating laboratories).

RESULTS: A total of 63 state-years were included in analyses of mortality rates, while 204 state-years were included in analyses of NFLIS reports. According to the publicly available data compiled in this study, at least 43 states reported at least 1 xylazine-related overdose death from 2019 to 2022, yet yearly totals of xylazine-related deaths were available for only 21 states. Of states with data available, xylazine-involved overdose death rates were highest in Vermont (10.5 per 100 000 residents) and Connecticut (9.8 per 100 000 residents) in 2022. In 2019, 16 states had zero xylazine reports included in NFLIS reports; in 2022, only 2 states had zero xylazine reports and all but 3 states had recorded an increase in xylazine’s representation in NFLIS reports. In 2022, xylazine represented 16.17% of all NFLIS reports in Delaware and between 5.95% and 7.00% of NFLIS reports in Connecticut, Maryland, District of Columbia, New Jersey, and Rhode Island, yet less than 1.0% of NFLIS reports in 35 different states.

CONCLUSIONS AND RELEVANCE: In this cross-sectional study of publicly available data on fatal overdoses and drugs analyzed by forensic laboratories, xylazine’s reported presence in overdose deaths and forensic reports was concentrated in the eastern US yet extended across the country to encompass nearly all states. In spite of xylazine’s geographic reach, yearly state-level numbers of xylazine-related overdose deaths were publicly available for less than half of all states.

PMID:38180756 | DOI:10.1001/jamanetworkopen.2023.50630
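
The two outcome measures above, a rate per 100 000 residents and a percentage of all NFLIS reports, are simple ratios. The sketch below shows the arithmetic with hypothetical counts; the function names and the input numbers are assumptions and do not come from the study.

```python
def rate_per_100k(events, population):
    """Events per 100 000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * 100_000


def percent_share(part, total):
    """Share of `part` in `total`, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total * 100


# Hypothetical state-year: 50 deaths among 1 000 000 residents,
# and 120 xylazine reports among 4000 drug reports.
death_rate = rate_per_100k(50, 1_000_000)
report_share = percent_share(120, 4000)
print(f"{death_rate:.1f} deaths per 100 000 residents")
print(f"{report_share:.2f}% of drug reports")
```

Expressing both measures per state-year is what allows states of very different sizes to be compared.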

Estimation of Improvements in Mortality in Spectrum Among Adults With HIV Receiving Antiretroviral Therapy in High-Income Countries

J Acquir Immune Defic Syndr. 2024 Jan 1;95(1S):e89-e96. doi: 10.1097/QAI.0000000000003326. Epub 2024 Jan 4.

ABSTRACT

INTRODUCTION: Mortality rates for people living with HIV (PLHIV) on antiretroviral therapy (ART) in high-income countries continue to decline. We compared mortality rates among PLHIV on ART in Europe for 2016-2020 with Spectrum’s estimates.

METHODS: The AIDS Impact Module in Spectrum is a compartmental HIV epidemic model coupled with a demographic population projection model. We used national Spectrum projections developed for the 2022 HIV estimates round to calculate mortality rates among PLHIV on ART, adjusting to the age/country distribution of PLHIV starting ART from 1996 to 2020 in the Antiretroviral Therapy Cohort Collaboration (ART-CC)’s European cohorts.

RESULTS: In the ART-CC, 11,504 of 162,835 PLHIV died. Between 1996-1999 and 2016-2020, AIDS-related mortality in the ART-CC decreased from 8.8 (95% CI: 7.6 to 10.1) to 1.0 (0.9-1.2) and from 5.9 (4.4-8.1) to 1.1 (0.9-1.4) deaths per 1000 person-years among men and women, respectively. Non-AIDS-related mortality decreased from 9.1 (7.9-10.5) to 6.1 (5.8-6.5) and from 7.0 (5.2-9.3) to 4.8 (4.3-5.2) deaths per 1000 person-years among men and women, respectively. Adjusted all-cause mortality rates in Spectrum among men were near ART-CC estimates for 2016-2020 (Spectrum: 7.02-7.47 deaths per 1000 person-years) but approximately 20% lower in women (Spectrum: 4.66-4.70). Adjusted excess mortality rates in Spectrum were 2.5-fold higher in women and 3.1-3.4-fold higher in men in comparison to the ART-CC’s AIDS-specific mortality rates.

DISCUSSION: Spectrum’s all-cause mortality estimates among PLHIV are consistent with age/country-controlled mortality observed in ART-CC, with some underestimation of mortality among women. Comparing the results suggests that 60%-70% of excess deaths among PLHIV on ART in Spectrum are from non-AIDS causes.

PMID:38180742 | DOI:10.1097/QAI.0000000000003326
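
The figures in RESULTS and DISCUSSION are linked by simple arithmetic, sketched below. Two assumptions are made that the abstract does not state explicitly: all-cause mortality in the ART-CC is taken as the sum of the AIDS-related and non-AIDS-related rates, and the AIDS-related part of Spectrum's excess mortality is taken to equal the ART-CC's AIDS-specific rate, so that a k-fold ratio implies a non-AIDS share of 1 - 1/k. The names used are hypothetical.

```python
def non_aids_share(excess_to_aids_ratio):
    """Share of excess deaths not attributable to AIDS, given that
    excess mortality is `excess_to_aids_ratio` times AIDS-specific mortality."""
    return 1 - 1 / excess_to_aids_ratio


# ART-CC rates for 2016-2020, deaths per 1000 person-years (from the abstract).
artcc_all_cause = {"men": 1.0 + 6.1, "women": 1.1 + 4.8}

# Spectrum's adjusted all-cause range for women (from the abstract).
spectrum_women = (4.66, 4.70)
women_gap = [1 - s / artcc_all_cause["women"] for s in spectrum_women]

# Fold differences reported for women (2.5) and men (3.1 to 3.4).
shares = {k: non_aids_share(k) for k in (2.5, 3.1, 3.4)}

print(f"ART-CC all-cause: {artcc_all_cause}")
print(f"Spectrum lower for women by {min(women_gap):.0%} to {max(women_gap):.0%}")
for ratio, share in shares.items():
    print(f"{ratio}-fold excess implies a {share:.0%} non-AIDS share")
```

Under these assumptions the ratios of 2.5 and 3.1 to 3.4 correspond to the 60%-70% range quoted in the DISCUSSION, and the summed rate for men falls within the reported Spectrum range of 7.02-7.47.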

New HIV Infections Among Key Populations and Their Partners in 2010 and 2022, by World Region: A Multisources Estimation

J Acquir Immune Defic Syndr. 2024 Jan 1;95(1S):e34-e45. doi: 10.1097/QAI.0000000000003340. Epub 2024 Jan 4.

ABSTRACT

BACKGROUND: Previously, the Joint United Nations Programme on HIV/AIDS estimated proportions of adult new HIV infections among key populations (KPs) in the last calendar year, globally and in 8 regions. We refined and updated these, for 2010 and 2022, using country-level trend models informed by national data.

METHODS: Infections among 15-49 year olds were estimated for sex workers (SWs), male clients of female SW, men who have sex with men (MSM), people who inject drugs (PWID), transgender women (TGW), and non-KP sex partners of these groups. Transmission models used were Goals (71 countries), AIDS Epidemic Model (13 Asian countries), Optima (9 European and Central Asian countries), and Thembisa (South Africa). Statistical Estimation and Projection Package fits were used for 15 countries. For 40 countries, new infections in 1 or more KPs were approximated from first-time diagnoses by the mode of transmission. Infection proportions among nonclient partners came from Goals, Optima, AIDS Epidemic Model, and Thembisa. For remaining countries and groups not represented in models, median proportions by KP were extrapolated from countries modeled within the same region.

RESULTS: Across 172 countries, estimated proportions of new adult infections in 2010 and 2022 were both 7.7% for SW, 11% and 20% for MSM, 0.72% and 1.1% for TGW, 6.8% and 8.0% for PWID, 12% and 10% for clients, and 5.3% and 8.2% for nonclient partners. In sub-Saharan Africa, proportions of new HIV infections decreased among SW, clients, and non-KP partners but increased for PWID; elsewhere these groups’ 2010-to-2022 differences were opposite. For MSM and TGW, the proportions increased across all regions.

CONCLUSIONS: KPs continue to have disproportionately high HIV incidence.

PMID:38180737 | DOI:10.1097/QAI.0000000000003340

Design and Validation of a Clinical Outcome Measure for Adolescents and Adult Patients with Spinal Muscular Atrophy: SMA Life Study Protocol

Neurol Ther. 2024 Jan 5. doi: 10.1007/s40120-023-00571-9. Online ahead of print.

ABSTRACT

INTRODUCTION: The objective of this study is to develop a clinical tool for the evaluation and follow-up of adolescent and adult patients with 5q spinal muscular atrophy (SMA) and to design its validation.

METHODS: This prospective, non-interventional study will be carried out at five centres in Spain and will include patients aged 16 years or older with a confirmed diagnosis of 5q SMA (biallelic mutation of the survival motor neuron 1 [SMN1] gene). A panel of experts made up of neurologists, physiatrists, and the Spanish patients’ association (FundAME) participated in the design of the clinical tool. Physicians will administer the tool at three time points (baseline, 12 months and 24 months). Additionally, data from other questionnaires and scales will be collected. Once recruitment is complete, an interim statistical analysis will be performed to assess the tool’s psychometric properties by applying Rasch analysis and classical statistical tests.

RESULTS: The tool will consist of up to 53 items to assess functional status from a clinical perspective in seven key dimensions (bulbar, respiratory, axial, lower, upper, fatigability and other symptoms), which will be collected together with objective clinical measures (body mass index, forced vital capacity, pinch strength and 6-minute walk test).

CONCLUSIONS: The validation of this tool will facilitate the clinical evaluation of adult and adolescent patients with SMA and the quantification of their response to new treatments in both clinical practice and research.

PMID:38180726 | DOI:10.1007/s40120-023-00571-9

H-Wave® Device Stimulation for Chronic Low Back Pain: A Patient-Reported Outcome Measures (PROMs) Study

Pain Ther. 2024 Jan 5. doi: 10.1007/s40122-023-00570-6. Online ahead of print.

ABSTRACT

INTRODUCTION: Chronic low back pain (cLBP) is a problem globally, creating a tremendous economic burden. Since conventional treatments often fail, various forms of electrical stimulation have been proposed to improve function and decrease pain. Patient-reported outcome measures (PROMs) have not been adequately reported in the electrical stimulation literature.

METHODS: A retrospective independent statistical analysis was conducted on PROMs data for users of H-Wave® device stimulation (HWDS) collected by the device manufacturer over a period of 4 years. Final surveys for 34,192 pain management patients were filtered for pain chronicity limited to 3-24 months and device use of 22-365 days, resulting in 11,503 patients with “all diagnoses”; this number was then reduced to 2711 patients with nonspecific cLBP, sprain, or strain.

RESULTS: Reported pain was reduced by 3.12 points (0-10 pain scale), with significant (≥ 20%) relief in 85.28%. Function/activities of daily living (ADL) improved in 96.36%, while improved work performance was reported in 81.61%. Medication use decreased or stopped in 64.41% and sleep improved in 59.76%. Over 96% reported having expectations met or exceeded, service satisfaction, and confidence in device use, while no adverse events were reported. Subgroup analyses found positive associations with longer duration of device use, home exercise participation, and working, whereas older age and longer pain chronicity resulted in reduced benefit. Similar analysis of the larger all-diagnoses cohort demonstrated near-equivalent positive outcomes.

CONCLUSION: Treatment outcomes directly reported by cLBP HWDS patients demonstrated profound positive effects on function and ADL, robust improvement in pain perception, and additional benefits like decreased medication use, better sleep, and improved work performance, representing compelling new evidence of treatment efficacy.

PMID:38180725 | DOI:10.1007/s40122-023-00570-6