Categories
Nevin Manimala Statistics

Utility-based Analysis of Statistical Approaches and Deep Learning Models for Synthetic Data Generation With Focus on Correlation Structures: Algorithm Development and Validation

JMIR AI. 2025 Mar 20;4:e65729. doi: 10.2196/65729.

ABSTRACT

BACKGROUND: Recent advancements in Generative Adversarial Networks and large language models (LLMs) have significantly advanced the synthesis and augmentation of medical data. These and other deep learning-based methods offer promising potential for generating high-quality, realistic datasets crucial for improving machine learning applications in health care, particularly in contexts where data privacy and availability are limiting factors. However, challenges remain in accurately capturing the complex associations inherent in medical datasets.

OBJECTIVE: This study evaluates the effectiveness of various Synthetic Data Generation (SDG) methods in replicating the correlation structures inherent in real medical datasets. In addition, it examines their performance in downstream tasks using Random Forests (RFs) as the benchmark model. To provide a comprehensive analysis, alternative models such as eXtreme Gradient Boosting and Gated Additive Tree Ensembles are also considered. We compare the following SDG approaches: Synthetic Populations in R (synthpop), copula, copulagan, Conditional Tabular Generative Adversarial Network (ctgan), tabular variational autoencoder (tvae), and tabula for LLMs.

METHODS: We evaluated synthetic data generation methods using both real-world and simulated datasets. Simulated data consist of 10 Gaussian variables and one binary target variable with varying correlation structures, generated via Cholesky decomposition. Real-world datasets include the body performance dataset with 13,393 samples for fitness classification, the Wisconsin Breast Cancer dataset with 569 samples for tumor diagnosis, and the diabetes dataset with 768 samples for diabetes prediction. Data quality is evaluated by comparing correlation matrices, the propensity score mean-squared error (pMSE) for general utility, and F1-scores for downstream tasks as a specific utility metric, using training on synthetic data and testing on real data.
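The simulation design (correlated Gaussian variables generated via Cholesky decomposition) and the pMSE utility metric can be illustrated with a minimal pure-Python sketch. This is not the authors' code, and the function names and the 3-variable correlation matrix are hypothetical. The abstract does not say how the binary target variable was derived, so it is omitted. `pmse` assumes propensity scores have already been estimated by some classifier (logistic regression is a common choice) trained to separate real from synthetic records in the pooled data.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L such that L @ L^T == a.

    `a` must be symmetric positive definite (e.g. a correlation matrix), as a list of lists.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)  # diagonal entry
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]  # below-diagonal entry
    return L


def correlated_gaussians(corr, n_samples, seed=0):
    """Draw rows x = L z with z ~ N(0, I), so that Cov(x) = L L^T = corr."""
    rng = random.Random(seed)
    L = cholesky(corr)
    d = len(corr)
    rows = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        rows.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return rows


def pmse(propensity_scores, n_synthetic):
    """Propensity score mean-squared error: mean of (p_i - c)^2 over the pooled data.

    Assumes `propensity_scores` are a classifier's predicted probabilities (classifier
    not shown) that each pooled record is synthetic; c is the synthetic share.
    """
    n = len(propensity_scores)
    c = n_synthetic / n
    return sum((p - c) ** 2 for p in propensity_scores) / n


# Hypothetical target correlation matrix (not one used in the study).
corr = [[1.0, 0.5, 0.2],
        [0.5, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
sample = correlated_gaussians(corr, n_samples=5000, seed=42)
```

A pMSE near 0 means the propensity scores stay close to c, so the classifier cannot reliably tell real from synthetic records apart; larger values indicate lower general utility.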

RESULTS: Our simulation study, supplemented with real-world data analyses, shows that the statistical methods copula and synthpop consistently outperform deep learning approaches across various sample sizes and correlation complexities, with synthpop being the most effective. Deep learning methods, including LLMs, show mixed performance, particularly with smaller datasets or limited training epochs. LLMs often struggle to replicate numerical dependencies effectively. In contrast, methods like tvae with 10,000 epochs perform comparably well. On the body performance dataset, copulagan achieves the best performance in terms of pMSE. The results also highlight that model utility depends more on the relative correlations between features and the target variable than on the absolute magnitude of correlation matrix differences.

CONCLUSIONS: Statistical methods, particularly synthpop, demonstrate superior robustness and utility preservation for synthetic tabular data compared with deep learning approaches. Copula methods show potential but face limitations with integer variables. Deep learning methods underperform in this context. Overall, these findings underscore the dominance of statistical methods for synthetic data generation for tabular data, while highlighting the niche potential of deep learning approaches for highly complex datasets, provided adequate resources and tuning.

PMID:40112290 | DOI:10.2196/65729

Profiling Generalized Anxiety Disorder on Social Networks: Content and Behavior Analysis

J Med Internet Res. 2025 Mar 20;27:e53399. doi: 10.2196/53399.

ABSTRACT

BACKGROUND: Despite a dramatic increase in the number of people with generalized anxiety disorder (GAD), a substantial number still do not seek help from health professionals, resulting in reduced quality of life. With the growth in popularity of social media platforms, individuals have become more willing to express their emotions through these channels. Therefore, social media data have become valuable for identifying mental health status.

OBJECTIVE: This study investigated the social media posts and behavioral patterns of people with GAD, focusing on language use, emotional expression, topics discussed, and engagement to identify digital markers of GAD, such as anxious patterns and behaviors. These insights could help reveal mental health indicators, aiding in digital intervention development.

METHODS: Data were first collected from Twitter (subsequently rebranded as X) for the GAD and control groups. Several preprocessing steps were performed. Three measurements were defined based on Linguistic Inquiry and Word Count for linguistic analysis. GuidedLDA was also used to identify the themes present in the tweets. Additionally, users’ behaviors were analyzed using Twitter metadata. Finally, we studied the correlation between the GuidedLDA-based themes and users’ behaviors.

RESULTS: The linguistic analysis indicated differences in cognitive style, personal needs, and emotional expressiveness between people with and without GAD. Regarding cognitive style, there were significant differences (P<.001) for all features, such as insight (Cohen d=1.13), causation (Cohen d=1.03), and discrepancy (Cohen d=1.16). Regarding personal needs, there were significant differences (P<.001) in most personal needs categories, such as curiosity (Cohen d=1.05) and communication (Cohen d=0.64). Regarding emotional expressiveness, there were significant differences (P<.001) for most features, including anxiety (Cohen d=0.62), anger (Cohen d=0.72), sadness (Cohen d=0.48), and swear words (Cohen d=2.61). Additionally, topic modeling identified 4 primary themes (ie, symptoms, relationships, life problems, and feelings). We found that all themes were significantly more prevalent for people with GAD than for those without GAD (P<.001), along with significant effect sizes (Cohen d>0.50; P<.001) for most themes. Moreover, studying users’ behaviors, including hashtag participation, volume, interaction pattern, social engagement, and reactive behaviors, revealed some digital markers of GAD, with most behavior-based features, such as the hashtag (Cohen d=0.49) and retweet (Cohen d=0.69) ratios, being statistically significant (P<.001). Furthermore, correlations between the GuidedLDA-based themes and users’ behaviors were also identified.
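The Cohen d values reported above are standardized mean differences between the GAD and control groups. The abstract does not state which variant of d was computed, so the sketch below assumes the common pooled-standard-deviation form, d = (mean_a - mean_b) / s_pooled; the function name and input values are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical per-user feature scores, not data from the study.
print(cohens_d([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))  # → 1.0
```

By Cohen's conventional benchmarks, d of about 0.2, 0.5, and 0.8 correspond to small, medium, and large effects.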

CONCLUSIONS: Our findings revealed several digital markers of GAD on social media. These findings are significant and could contribute to developing an assessment tool that clinicians could use for the initial diagnosis of GAD or the detection of an early signal of worsening in people with GAD via social media posts. This tool could provide ongoing support and personalized coping strategies. However, one limitation of using social media for mental health assessment is the lack of a demographic representativeness analysis.

PMID:40112289 | DOI:10.2196/53399

Exploring the Use of Social Media for Medical Problem Solving by Analyzing the Subreddit r/medical_advice: Quantitative Analysis

JMIR Infodemiology. 2025 Mar 20;5:e56116. doi: 10.2196/56116.

ABSTRACT

BACKGROUND: The advent of the internet has transformed the landscape of health information acquisition and sharing. Reddit, through communities such as the subreddit r/medical_advice, has become a hub for these activities, affecting patients’ knowledge and decision-making. While the popularity of these platforms is recognized, research into the interactions and content within these communities remains sparse. Understanding the dynamics of these platforms is crucial for improving online health information quality.

OBJECTIVE: This study aims to quantitatively analyze the subreddit r/medical_advice to characterize the medical questions posed and the demographics of individuals providing answers. Insights into the subreddit’s user engagement, information-seeking behavior, and the quality of shared information will contribute to the existing body of literature on health information seeking in the digital era.

METHODS: A cross-sectional study was conducted, examining all posts and top comments from r/medical_advice since its creation on October 1, 2011. Data were collected on March 2, 2023, from pushshift.io, and the analysis included post and author flairs, scores, and engagement metrics. Statistical analyses were performed using RStudio and GraphPad Prism 9.0.

RESULTS: From October 2011 to March 2023, a total of 201,680 posts and 721,882 comments were analyzed. After excluding autogenerated posts and comments, 194,678 posts and 528,383 comments remained for analysis. A total of 41% (77,529/194,678) of posts had no user flairs, while only 0.1% (108/194,678) of posts were made by verified medical professionals. On average, each post received a score of 2 (SD 7.03) and 3.32 (SD 4.89) comments. In period 2, urgent questions and those with level-10 pain reported higher engagement, with significant differences in scores and comments based on flair type (P<.001). Period 3 saw the highest engagement in posts related to pregnancy and the lowest in posts about bones, joints, or ligaments. Media inclusion significantly increased engagement, with video posts receiving the highest interaction (P<.001).

CONCLUSIONS: The study reveals substantial engagement with r/medical_advice, with user interactions influenced by the type of query and the inclusion of visual media. High engagement with posts about pregnancy and urgent medical queries reflects a focused public interest and the subreddit’s role as a preliminary health information resource. The predominance of nonverified medical professionals providing information highlights a shift toward community-based knowledge exchange, though it raises questions about the reliability of the information. Future research should explore cross-platform behaviors and the impact of misinformation on public health. Effective moderation and the involvement of verified medical professionals are recommended to enhance the subreddit’s role as a reliable health information resource.

PMID:40112288 | DOI:10.2196/56116

Assessment of utilization of automated systems and laboratory information management systems in clinical microbiology laboratories in Thailand

PLoS One. 2025 Mar 20;20(3):e0320074. doi: 10.1371/journal.pone.0320074. eCollection 2025.

ABSTRACT

INTRODUCTION: Clinical microbiology laboratories are essential for diagnosing and monitoring antimicrobial resistance (AMR). Here, we assessed the systems involved in generating, managing and analyzing blood culture data in these laboratories in an upper-middle-income country.

METHODS: From October 2023 to February 2024, we conducted a survey on the utilization of automated systems and laboratory information management systems (LIMS) for blood culture specimens in 2022 across 127 clinical microbiology laboratories (one each from 127 public referral hospitals) in Thailand. We categorized automated systems for blood culture processing into three steps: incubation, bacterial identification, and antimicrobial susceptibility testing (AST).

RESULTS: Of the 81 laboratories that completed the questionnaires, the median hospital bed count was 450 (range, 150-1,387), and the median number of blood culture bottles processed was 17,351 (range, 2,900-80,330). All laboratories (100%) had an automated blood culture incubation system. Three-quarters of the laboratories (75%, n = 61) had at least one automated system for both bacterial identification and AST, about a quarter (22%, n = 18) had no automated systems for either step, and two laboratories (3%) outsourced both steps. The systems varied and were associated with the hospital level. Many laboratories utilized both automated systems and conventional methods for bacterial identification (n = 54) and AST (n = 61). For daily data management, 71 laboratories (88%) used commercial microbiology LIMS, three (4%) WHONET, three (4%) an in-house database software and four (5%) did not use any software. Many laboratories manually entered incubation data (73%, n = 59), bacterial identification results (27%, n = 22) and AST results (25%, n = 20) from their automated systems into their commercial microbiology LIMS. The most common barrier to data analysis was ‘lack of time’, followed by ‘lack of staff with statistical skills’ and ‘difficulty in using analytical software’.

CONCLUSION: In Thailand, various automated systems for blood culture and LIMS are utilized. However, barriers to data management and analysis are common. These challenges are likely present in other upper-middle-income countries. We propose that guidance and technical support for automated systems, LIMS and data analysis are needed.

PMID:40112277 | DOI:10.1371/journal.pone.0320074

Racial and Ethnic Differences in Advance Care Planning and End-of-Life Care in Older Adults With Stroke: A Cohort Study

Neurology. 2025 Apr 22;104(8):e213486. doi: 10.1212/WNL.0000000000213486. Epub 2025 Mar 19.

ABSTRACT

BACKGROUND AND OBJECTIVES: Stroke is a leading cause of death and disability in the United States and may result in cognitive impairment and the inability to participate in treatment decisions, attesting to the importance of advance care planning (ACP). Although racial and ethnic differences have been shown for ACP in the general population, little is known about these differences specific to patients with stroke. The aim of this study was to examine the presence of ACP and receipt of life-prolonging care by race and ethnicity among decedents who had suffered a stroke.

METHODS: We used the Health and Retirement Study, a nationally representative longitudinal survey. We conducted a cohort study of decedents who died between 2000 and 2018 using multivariable logistic regression models to explore the association between self-reported ethnicity and race and completion of ACP (including a living will [LW] and durable power of attorney for healthcare [DPOAH]) and receipt of life-prolonging care at end of life, controlling for covariates. Stratified models for each race and ethnicity also were conducted.

RESULTS: This study included 3,491 decedents with a reported history of stroke; 57.4% were women, and the mean age was 81.5 years (SD = 10.2). Decedents who identified as non-Hispanic White had the highest end-of-life planning rates (LW: 57%, DPOAH: 72%, and ACP conversation: 63%) compared with those identifying as non-Hispanic Black (LW: 20%, DPOAH: 40%, and ACP conversation: 41%) and Hispanic (LW: 20%, DPOAH: 36%, and ACP conversation: 42%; p < 0.001). The presence of ACP discussions, LW, and DPOAH was associated with lower odds of receiving life-prolonging care at end of life among non-Hispanic White decedents (OR = 0.64, CI = 0.447-0.904; OR = 0.30, CI = 0.206-0.445; OR = 0.61, CI = 0.386-0.948) but not among those who identified as Hispanic or non-Hispanic Black.

CONCLUSIONS: Hispanic or non-Hispanic Black decedents with stroke had significantly lower rates of ACP discussions, LWs, and naming a DPOAH compared with those who identified as non-Hispanic White. In addition, ACP activities were inversely associated with receipt of life-prolonging care among non-Hispanic White decedents, but not among those who identified as non-Hispanic Black and Hispanic. Small ethnic/racial subgroup sizes limit the generalizability of this study.

PMID:40112272 | DOI:10.1212/WNL.0000000000213486

Racial Disparities in Endometrial Cancer Incidence and Outcomes in Brazil: Insights From Population-Based Registries

JCO Glob Oncol. 2025 Mar;11:e2400604. doi: 10.1200/GO-24-00604. Epub 2025 Mar 20.

ABSTRACT

PURPOSE: This study aimed to examine trends in the incidence and mortality rates of endometrial cancer (EC) across ethnic groups in Brazil and to analyze the demographic and clinicopathological characteristics associated with these trends.

METHODS: The incidence of EC was analyzed from 2010 to 2015 using data from Brazilian Population-Based Cancer Registries (PBCRs), including crude rates and annual percentage changes (APCs). Clinical and sociodemographic information from 2000 to 2019 was gathered from Hospital-Based Cancer Registries. Mortality data between 2000 and 2021 were obtained from the National Mortality Information System, allowing for comparisons between White women and Black women.
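Annual percentage change (APC) is conventionally estimated by fitting a log-linear regression of the rate on calendar year, ln(rate) = b0 + b1 × year, and reporting APC = 100 × (e^b1 - 1). The abstract does not describe the fitting procedure, so the following is a minimal ordinary-least-squares sketch of that standard definition; the function name and the example rates are hypothetical.

```python
import math


def annual_percent_change(years, rates):
    """APC = 100 * (exp(b1) - 1), where b1 is the OLS slope of ln(rate) on year."""
    logs = [math.log(r) for r in rates]  # rates must be positive
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    b1 = sxy / sxx
    return 100.0 * (math.exp(b1) - 1.0)


# Hypothetical rates per 100,000 that grow by exactly 5% per year (2010-2015).
years = list(range(2010, 2016))
rates = [10.0 * 1.05 ** (y - 2010) for y in years]
print(round(annual_percent_change(years, rates), 6))  # → 5.0
```

Registry analyses typically also report a confidence interval for the slope; that step is omitted here for brevity.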

RESULTS: From 2010 to 2015, a total of 32,831 new cases of EC were reported across 13 PBCRs, with Black patients accounting for 35.7% of these cases. The median age at diagnosis was 63 years, with Black women diagnosed at a younger age than White women. Black patients experienced a significant increase in incidence rate (APC +6.7% v +3.0%). A greater proportion of Black patients lived without partners (54.0%), had higher rates of alcohol consumption (15%) and smoking (25.8%), and resided in less developed regions (54.6%) with lower education levels (77.5%). From 2000 to 2021, Brazil recorded 72,189 EC-related deaths, showing higher mortality rates among White women (3.8 per 100,000) than Black women (2.4 per 100,000), although the downward trend was steeper among White women (-1.2%) than Black women (-0.6%).

CONCLUSION: Racial disparities in EC incidence and mortality in Brazil may be closely linked to unfavorable sociodemographic factors faced by Black women. Targeted public health initiatives are critical for improving early detection and access to equitable care for Black women.

PMID:40112259 | DOI:10.1200/GO-24-00604

Going Back Home: Understanding the Challenges and Discrimination of Early and Mid-Career International and Puerto Rican Medical Graduates in Oncology Fields in the United States

JCO Glob Oncol. 2025 Mar;11:e2400513. doi: 10.1200/GO-24-00513. Epub 2025 Mar 20.

ABSTRACT

PURPOSE: Although international medical graduates (IMGs) and Puerto Rican Medical Graduates (PRMGs) comprise an integral part of the health care workforce, these individuals, particularly women, frequently face numerous types of discrimination throughout medical training and independent practice. To our knowledge, we conducted the first cross-sectional study to understand the journeys and consequences of migration faced by IMGs and PRMGs in the US oncology workforce.

METHODS: We developed a cross-sectional, online survey consisting of 51 multiple choice and open-ended questions that captured demographic information, professional status, period of migration to the United States, location within the United States that participants migrated to, reasons for migration, cultural adaptation, experiences of discrimination during training, and overall professional experiences in the United States.

RESULTS: The majority of participants cited better education, professional gains, and a lack of opportunities in participants’ home country as primary reasons for migration to the United States. However, most participants, particularly women, experienced staunch assimilation to fit the mold of professional American standards; women were also particularly likely to report experiences of racial/ethnic, language, and gender discrimination during oncology training in the United States, which only marginally improved during independent practice. Despite such discrimination, most participants reported excellent professional satisfaction during training and independent practice, although only moderate personal satisfaction. Most participants decided to stay in the United States, citing reasons pertaining to enhanced professional opportunities, whereas those who returned home valued reasons relating to family and quality of life.

CONCLUSION: Our sobering findings underscore the need for institutional enforcement of an inclusive environment encompassing cultural humility, enactment of programs addressing barriers to socialization, immigration laws, and financial support, creation of IMG-specific support networks, and the sponsorship and promotion of minority women physicians.

PMID:40112258 | DOI:10.1200/GO-24-00513

Effects of exercise on inflammation in female survivors of nonmetastatic breast cancer: a systematic review and meta-analysis

J Natl Cancer Inst. 2025 Mar 20:djaf062. doi: 10.1093/jnci/djaf062. Online ahead of print.

ABSTRACT

BACKGROUND: Despite advances in breast cancer treatment, recurrence remains common and contributes to higher mortality risk. Among the potential mechanisms, inflammation plays a key role in recurrence by promoting tumor progression. Exercise provides a wide array of health benefits and may reduce inflammation, potentially reducing mortality risk. However, the effects of exercise, including mode (ie, resistance training [RT], aerobic training [AT], and combined RT and AT) and program duration, on inflammatory biomarkers in breast cancer survivors remain to be elucidated.

METHODS: A systematic search was undertaken in PubMed, CINAHL, Embase, SPORTDiscus and CENTRAL in August 2024. Randomized controlled trials examining the effects of exercise on IL-1β, IL-6, IL-8, IL-10, TNF-α, and CRP were included. A random-effects meta-analysis was undertaken to quantify the magnitude of change.
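The abstract specifies a random-effects meta-analysis but not the estimator of between-study variance (tau²). As an illustration only, the sketch below uses the DerSimonian-Laird method-of-moments estimator, one common choice, which may differ from the one the review used. Inputs are per-study standardized mean differences and their sampling variances; the function name and the example values are hypothetical.

```python
import math


def dersimonian_laird(effects, variances, z=1.96):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2.

    effects: per-study effect sizes (e.g. SMDs); variances: their sampling variances.
    Returns (pooled effect, (ci_low, ci_high), tau^2).
    """
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    se = math.sqrt(1.0 / sw_re)
    return pooled, (pooled - z * se, pooled + z * se), tau2


# Hypothetical SMDs and variances for three trials (not data from the review).
pooled, ci, tau2 = dersimonian_laird([-1.5, 0.5, -0.2], [0.01, 0.01, 0.01])
print(round(pooled, 3), round(tau2, 3))  # → -0.4 1.02
```

When the studies are homogeneous (Q no larger than its degrees of freedom), tau² is truncated to 0 and the result coincides with a fixed-effect analysis.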

RESULTS: Twenty-two studies were included (n = 968). Exercise induced small to large significant reductions in IL-6 (SMD = -0.85; 95% CI = -1.68 to -0.02; p = .05) and TNF-α (SMD = -0.40; 95% CI = -0.81 to 0.01; p = .05) and a trend for a decrease in CRP. When stratifying by exercise mode, trends toward reduction in IL-6 and TNF-α were observed for combined exercise, whilst changes were not generally affected by exercise program duration.

CONCLUSION: Exercise, especially combined RT and AT, can reduce pro-inflammatory biomarkers, and may be a suitable strategy to reduce inflammation in breast cancer survivors. However, further research is needed to investigate the effects of exercise mode and program duration on markers of inflammation in this survivor group.

PMID:40112254 | DOI:10.1093/jnci/djaf062

Urban-rural differences in the successful aging among older adults in China

PLoS One. 2025 Mar 20;20(3):e0319105. doi: 10.1371/journal.pone.0319105. eCollection 2025.

ABSTRACT

This study aimed to reveal urban-rural disparities in successful aging among Chinese older adults and the impact of gender and age on aging outcomes. We utilized the Successful Aging Index (SAI), a multidimensional measure encompassing social, economic, bio-clinical, psychological, and lifestyle factors. Scores on the SAI range from 0 to 10, with higher scores signifying better aging. Data were sourced from the 2018 Chinese Longitudinal Healthy Longevity Survey, comprising 7,315 participants. Urban older adults (OU) had significantly higher SAI scores than rural older adults (OR), with averages of 4.32 ± 1.44 and 3.85 ± 1.24, respectively (p < 0.001). Men showed more successful aging than women, regardless of their residence (p < 0.001). OU had better financial and educational status and higher social activity scores, except for friend interaction (p < 0.001). They were more physically active (p < 0.001), more adherent to the Mediterranean diet (p < 0.001), and less likely to smoke (p = 0.018). However, OU had a higher prevalence of cardiovascular disease risk factors compared to OR (p < 0.001). Notably, depression scores were similar between OU and OR (p = 0.129). In summary, significant urban-rural differences in successful aging are evident among Chinese older adults, with urban-dwelling older adults aging more successfully than their rural peers. Men, irrespective of their place of residence, experience more successful aging outcomes than women.

PMID:40112253 | DOI:10.1371/journal.pone.0319105

Associations with HIV preexposure prophylaxis use by cisgender female sex workers in two Ugandan cities

PLoS One. 2025 Mar 20;20(3):e0320065. doi: 10.1371/journal.pone.0320065. eCollection 2025.

ABSTRACT

BACKGROUND: Sex workers of all genders have a high risk of HIV acquisition and are a priority population for HIV pre-exposure prophylaxis (PrEP). We aimed to assess current oral PrEP use and associated factors among cisgender female sex workers (FSW) in two Ugandan cities.

METHODS: We administered a survey questionnaire to 236 HIV-negative FSW in the cities of Mbale and Mbarara from January to March 2020. The survey was nested in a quasi-experimental study to assess the effect of peer education and text message reminders on the uptake of regular sexually transmitted infection (STI) and HIV testing. Using interviewer-administered questionnaires, we obtained data on current self-reported tenofovir-based oral PrEP use. We used modified Poisson regression with robust standard errors to evaluate the factors associated with current oral PrEP usage.
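The study estimated adjusted prevalence ratios with modified Poisson regression (a Poisson model with robust, or sandwich, standard errors). A full multivariable fit is beyond a short example, so the sketch below covers only the special case of a single binary covariate (unadjusted). In that case the modified Poisson estimate equals the ratio of the two proportions, and the robust variance of log(PR) reduces to a closed form; small-sample corrections are omitted. The function name and the counts are hypothetical.

```python
import math


def prevalence_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Unadjusted prevalence ratio with a robust-variance 95% CI (single binary covariate).

    Var(log PR) = (1 - p1) / (n1 * p1) + (1 - p0) / (n0 * p0), the sandwich variance
    of a saturated log-link Poisson model with one binary exposure.
    """
    p1 = events_exposed / n_exposed
    p0 = events_unexposed / n_unexposed
    pr = p1 / p0
    se_log = math.sqrt((1 - p1) / (n_exposed * p1) + (1 - p0) / (n_unexposed * p0))
    return pr, math.exp(math.log(pr) - z * se_log), math.exp(math.log(pr) + z * se_log)


# Hypothetical counts: 20/100 exposed vs 10/100 unexposed report current PrEP use.
pr, low, high = prevalence_ratio(20, 100, 10, 100)
print(f"PR {pr:.2f} (95% CI {low:.2f}-{high:.2f})")  # → PR 2.00 (95% CI 0.99-4.05)
```

Poisson regression is used here instead of logistic regression because, with a common outcome such as awareness of PrEP, odds ratios overstate the prevalence ratio, while the robust variance corrects the overdispersed Poisson variance for a binary outcome.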

RESULTS: Nearly 70% of FSWs reported taking an HIV test during the past three months. Among the respondents, 33% (33/100) in Mbale and 67% (91/136) in Mbarara reported having ever heard of PrEP. However, only 9.7% (23/236) self-reported currently taking oral PrEP. In Mbarara, FSWs were twice as likely to be aware of or use oral PrEP as those in Mbale (adjusted prevalence ratio [aPR] 2.33; 95% confidence interval (CI) 1.19-3.97; p = 0.01). Additionally, current use was positively associated with attainment of secondary (aPR 2.50; 95% CI: 1.14-5.45; p = 0.02) or tertiary education (aPR 3.12; 95% CI: 1.09-8.96; p = 0.03).

CONCLUSION: PrEP use in this cohort of FSWs was low and was associated with location and level of education. To increase PrEP uptake among FSWs, targeted educational campaigns and implementation studies are needed, particularly for those with lower levels of education.

PMID:40112250 | DOI:10.1371/journal.pone.0320065