Categories
Nevin Manimala Statistics

Reliability of digital algometer-based pressure pain threshold measurement in patients with end-stage knee osteoarthritis: a single-center reliability study

Physiother Theory Pract. 2026 Jun 29:1-11. doi: 10.1080/09593985.2026.2695769. Online ahead of print.

ABSTRACT

OBJECTIVE: To evaluate the reliability and measurement error of digital algometer-based pressure pain threshold (PPT) assessment in patients with end-stage knee osteoarthritis (KOA).

METHODS: This prospective observational reliability study included 60 patients with end-stage KOA scheduled for total knee arthroplasty. Pressure pain threshold was assessed by two raters with different levels of experience at baseline and 7 days later at the medial aspect of the affected knee (local site) and the dorsal aspect of the contralateral forearm (remote site), following standardized training and measurement procedures. Three repeated measurements were obtained at each site per session and averaged. Relative reliability was assessed using intraclass correlation coefficients (ICC), with the ICC2,k model used for intra-session reliability based on the mean of three repeated measurements and the ICC2,1 model used for inter-session and inter-rater reliability. Absolute measurement error was quantified using the standard error of measurement (SEM), relative SEM (SEM%), and the minimal detectable change at the 95% confidence level (MDC95). Inter-rater agreement was examined using Bland-Altman analysis.
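The abstract does not state which formulas were used for the error metrics. As a minimal sketch, assuming the commonly used definitions SEM = SD × √(1 − ICC), SEM% = 100 × SEM / mean, and MDC95 = 1.96 × √2 × SEM, the three quantities could be computed as follows. The input values are hypothetical and are not taken from the study.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def sem_percent(sem_value: float, mean_value: float) -> float:
    """Relative SEM, expressed as a percentage of the sample mean."""
    return 100.0 * sem_value / mean_value


def mdc95(sem_value: float) -> float:
    """Minimal detectable change at the 95% level, assuming 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem_value


# Hypothetical inputs (kg/cm2); NOT values reported in the abstract.
sd_ppt, icc_value, mean_ppt = 1.0, 0.91, 4.0

s = sem(sd_ppt, icc_value)
print(f"SEM   = {s:.3f} kg/cm2")
print(f"SEM%  = {sem_percent(s, mean_ppt):.1f} %")
print(f"MDC95 = {mdc95(s):.3f} kg/cm2")
```

Under these definitions MDC95 is a fixed multiple (about 2.77) of the SEM, so an observed change smaller than MDC95 cannot be distinguished from measurement error at the 95% level.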

RESULTS: Pressure pain threshold measurements demonstrated excellent reliability across all conditions. Intra-session, inter-session, and inter-rater ICC values all exceeded 0.87. Intra-session ICC2,k values ranged from 0.874 to 0.965, inter-session ICC2,1 values from 0.870 to 0.984, and inter-rater ICC2,1 values from 0.921 to 0.953. Relative SEM values remained below 7%, and inter-session MDC95 ranged from 0.33 to 0.72 kg/cm², representing measurement-error thresholds for detecting change beyond random variability. Bland-Altman analysis showed that most differences lay within the 95% limits of agreement, with no apparent proportional bias. Mean inter-rater differences ranged from 0.09 to 0.19 kg/cm².

CONCLUSIONS: Under standardized conditions, digital algometer-based PPT assessment showed high reliability with low measurement error in patients with end-stage KOA. Clinical utility requires further validation.

PMID:42371695 | DOI:10.1080/09593985.2026.2695769

Evaluating the Impact of Transcendental Meditation on Trauma Symptoms, Depression, Anxiety, and Sleep Problems Among Israeli Civilians Post-October 7, 2023: A Pilot Study

J Clin Psychol. 2026 Jun 29. doi: 10.1002/jclp.70172. Online ahead of print.

ABSTRACT

OBJECTIVE: The mass evacuation of Israeli residents from conflict zones after the events of October 7, 2023, coupled with ongoing security threats, has taken a substantial psychological toll, with many individuals exhibiting symptoms of post-traumatic stress disorder (PTSD), anxiety, and sleep problems. This pilot study examined the feasibility and preliminary within-group changes associated with participation in transcendental meditation (TM), a non-pharmacological program, in relation to PTSD symptoms, depression, anxiety, and sleep problems among 39 Israeli civilians evacuated after October 7.

METHOD: In an 8-week intervention, we examined changes in psychological well-being using the PTSD Checklist for DSM-5 (PCL-5), Patient Health Questionnaire (PHQ-9), Generalized Anxiety Disorder scale (GAD-7), and Insomnia Severity Index (ISI). Changes in PTSD symptoms, depression symptoms, anxiety, and sleep problems were analyzed using dependent t-tests. Additional analyses of baseline, 4-week, and 8-week post-test data used repeated-measures analysis of variance.
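For readers unfamiliar with the dependent (paired) t-test named above, a minimal sketch follows. It computes the test statistic as the mean of the within-person differences divided by its standard error. The scores are hypothetical illustrations, not study data, and the function name is an assumption made for this example.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for a dependent-samples t-test.

    t = mean(d) / (sd(d) / sqrt(n)), where d = post - pre for each participant.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("pre and post must have equal length of at least 2")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical symptom scores for three participants (NOT study data).
baseline = [40.0, 45.0, 50.0]
week_8 = [38.0, 41.0, 44.0]

t_value, df = paired_t_statistic(baseline, week_8)
print(f"t({df}) = {t_value:.3f}")
```

A negative t here indicates lower scores at follow-up. In an uncontrolled design such as this pilot, the statistic describes within-group change only.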

RESULTS: Participants showed statistically significant within-group decreases from baseline to post-test in PTSD symptoms, depression, anxiety, and sleep problems.

CONCLUSION: Findings provide preliminary support for the feasibility and acceptability of TM in trauma-exposed civilians and suggest that participation in the program was associated with improvements in psychological symptoms over time. Given the uncontrolled pilot design, these results should be interpreted cautiously and require confirmation in randomized controlled trials.

PMID:42371680 | DOI:10.1002/jclp.70172

Inferior Vena Cava Ultrasound for Decongestion Assessment in Acute Heart Failure: Systematic Review

Echocardiography. 2026 Jul;43(7):e70495. doi: 10.1111/echo.70495.

ABSTRACT

BACKGROUND: Residual congestion at discharge in acute heart failure (AHF) is a primary driver of readmission and mortality. Inferior vena cava (IVC) ultrasound provides a noninvasive bedside assessment of volume status, yet its clinical impact on guiding therapy remains underdefined. This systematic review evaluated the efficacy of IVC ultrasound-guided therapy compared to standard clinical assessment in AHF decongestion.

METHODS: Following PRISMA guidelines (PROSPERO: CRD420251171323), a systematic search was conducted across PubMed, EMBASE, and other major databases through October 2025. We included randomized controlled trials (RCTs) and nonrandomized studies focusing on IVC-guided management in adults with AHF. Outcomes included congestion markers, NT-proBNP levels, hospitalization duration, and mortality.

RESULTS: Four studies involving 629 patients met the inclusion criteria. Most studies showed improved decongestion with IVC ultrasound guidance, evidenced by lower residual congestion and improved IVC metrics (diameter/collapsibility). While NT-proBNP levels decreased in all cohorts, between-group differences were not statistically significant. Clinical outcomes improved in 50% of studies, showing shorter hospital stays and reduced mortality. Notably, one trial reported a significant mortality benefit (3.3% vs. 33.3%; p = 0.003). Adverse events were either similar or significantly fewer (p < 0.05) in the ultrasound-guided groups.

CONCLUSION: IVC ultrasound is an effective bedside tool for individualized volume management in AHF, potentially enhancing treatment precision and clinical outcomes. While current evidence is promising, larger multicenter trials are necessary to standardize its implementation in routine heart failure care.

PMID:42371669 | DOI:10.1111/echo.70495

Asynchronous Electronic Screening for Unhealthy Alcohol Use Among Veterans in Primary Care: A Cluster Randomized Quality Improvement Trial

JAMA Intern Med. 2026 Jun 29. doi: 10.1001/jamainternmed.2026.1517. Online ahead of print.

ABSTRACT

IMPORTANCE: Screening for unhealthy alcohol use is recommended in primary care; however, completion and quality are inconsistent, especially during telemedicine visits. Little is known about optimal workflows incorporating electronic screening (e-screening).

OBJECTIVE: To evaluate whether use of previsit asynchronous e-screening is associated with improved completion and detection of unhealthy alcohol use via the Alcohol Use Disorders Identification Test (AUDIT-C) questionnaire compared with usual staff-administered screening during telemedicine primary care visits.

DESIGN, SETTING, AND PARTICIPANTS: Pragmatic cluster randomized quality improvement trial conducted at 2 primary care clinics in the Veterans Health Administration (VHA) from June 24 to August 1, 2024. Primary care clinicians (PCCs) were randomized 1:1, stratified by site, to intervention or control.

INTERVENTION: For PCCs in the control arm, patients received usual care including staff-administered AUDIT-C at telemedicine visits. For PCCs in the intervention arm, 24 to 48 hours before visits patients additionally received an invitation to asynchronous self-administered e-screening. Veterans who did not complete e-screening were still eligible for staff completion of screening during their clinic visits.

MAIN OUTCOMES AND MEASURES: The primary outcome was completion of AUDIT-C; secondary outcome was positive screen result (AUDIT-C ≥5). The exploratory outcome was brief intervention after positive screen result. All statistical models were clustered by PCC and adjusted for patient age, sex, race and ethnicity, comorbidity, prior primary care use, and site.

RESULTS: Among 848 veterans in the primary analysis (mean [SD] age, 55.4 [16.1] years; 729 [86.0%] male), use of e-screening was associated with increased telemedicine visit screening completion rates by 30.5 percentage points (74.4% [95% CI, 68.5%-80.3%] for e-screening vs 43.9% [95% CI, 26.6%-61.2%] for usual care; P < .001) and with increased likelihood of a positive screen result (10.6% [95% CI, 8.0%-13.2%] for e-screening vs 2.7% [95% CI, 0.7%-4.7%] for usual care; P < .001). Exploratory analysis identified the proportion of veterans receiving a brief intervention after a positive screen result (2.3% [10 of 442] for usual care vs 5.9% [24 of 406] for e-screening; P = .01).
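The headline figures above can be reproduced with simple arithmetic. The sketch below, using only numbers stated in the abstract, checks the 30.5 percentage-point difference in completion and the brief-intervention proportions. The helper names are assumptions made for this example.

```python
def percentage_point_difference(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two rates expressed in percent."""
    return round(rate_a - rate_b, 1)


def proportion_percent(events: int, total: int) -> float:
    """Events divided by total, as a percentage rounded to one decimal."""
    return round(100.0 * events / total, 1)


# Screening completion: 74.4% (e-screening) vs 43.9% (usual care).
completion_gain = percentage_point_difference(74.4, 43.9)

# Brief intervention after a positive screen: 10 of 442 vs 24 of 406.
usual_care_bi = proportion_percent(10, 442)
e_screening_bi = proportion_percent(24, 406)

print(completion_gain, usual_care_bi, e_screening_bi)
```

The two arm sizes (442 and 406) also sum to the 848 veterans in the primary analysis.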

CONCLUSIONS AND RELEVANCE: In this study, use of asynchronous e-screening was associated with improved completion and screen-positive results for unhealthy alcohol use in primary care, with the greatest gains for telemedicine encounters. Overall, this approach may close the implementation gap for population-based screening, improve disclosure, and reduce staff burden, particularly in hybrid care models.

TRIAL REGISTRATION: isrctn.org Identifier: ISRCTN16316660.

PMID:42371662 | DOI:10.1001/jamainternmed.2026.1517

Genomic Insights Into Antimicrobial Resistance and Virulence of Enterococcus avium Strains From Bovine Mastitis in Some Selected Dairy Farms of Bangladesh

Vet Med Sci. 2026 Jul;12(4):e71060. doi: 10.1002/vms3.71060.

ABSTRACT

BACKGROUND: Bovine mastitis remains a major global threat to dairy production and animal welfare, with the increasing emergence of multidrug-resistant (MDR) pathogens severely undermining the efficacy of conventional antimicrobial therapies and complicating disease control strategies. Enterococcus avium, traditionally understudied in livestock, has been occasionally associated with mastitis and poses potential zoonotic and antimicrobial resistance (AMR) risks.

OBJECTIVES: This study aimed to investigate the prevalence, AMR, virulence repertoire and genomic features of E. avium isolates from milk, faeces and soil in some selected dairy farms with mastitis in Bangladesh.

METHODS: A total of 110 samples (milk, faecal and soil) were collected and screened for E. avium using selective culture and polymerase chain reaction (PCR). Antimicrobial susceptibility was determined against 15 antibiotics. Four MDR E. avium isolates (4M1, 4S1, 4F1 and 4F2) were selected for whole-genome sequencing (WGS) to characterize their genomic diversity, functional potential, resistome and virulome in dairy cows and associated environments.

RESULTS: E. avium was detected in 56.36% of samples, with the highest prevalence in milk (47.8%). MDR was highly prevalent (93.5%), with frequent resistance to sulphonamides, nitrofurantoin and oxacillin, whereas gentamicin retained activity. Genomic analyses revealed conserved core genomes alongside variable accessory elements, indicating both evolutionary stability and adaptive potential. Phylogenetic proximity to human-derived strains highlights zoonotic risk. Functional profiling demonstrated robust metabolism, environmental sensing, adhesion-related virulence factors and multiple bacteriocin clusters, supporting persistence and microbial competition. ARGs conferring multidrug efflux, β-lactam and fluoroquinolone resistance were conserved across isolates.

CONCLUSIONS: E. avium is a prevalent mastitis pathogen, highly conserved among the studied isolates, with significant virulence and AMR profiles, highlighting its zoonotic potential and need for One Health-based surveillance strategies.

PMID:42371642 | DOI:10.1002/vms3.71060

Mindfulness-Based Group Medical Visits for Persons With Chronic Low Back Pain: A Randomized Clinical Trial

JAMA Intern Med. 2026 Jun 29. doi: 10.1001/jamainternmed.2026.2186. Online ahead of print.

ABSTRACT

IMPORTANCE: Back pain is among the most common, disabling, and costly conditions managed in primary care in the US, but current treatment options often do not provide adequate relief. Mindfulness-based interventions have demonstrated effectiveness in individuals with chronic low back pain (CLBP); however, mindfulness remains underused in part because it is not integrated into most outpatient care models.

OBJECTIVE: To assess whether persons with CLBP participating in a mindfulness group medical visit intervention experience significantly improved pain intensity and interference compared with those receiving usual care.

DESIGN, SETTING, AND PARTICIPANTS: This randomized clinical trial, Optimizing Pain Treatment in Medical Settings Using Mindfulness (OPTIMUM), using a pragmatic approach (designed to evaluate interventions under typical conditions of care) was conducted from May 7, 2021, to November 6, 2024. Adults with CLBP attending primary care clinics in Massachusetts, Pennsylvania, and North Carolina were included.

INTERVENTION: Participants were randomized 1:1 to the OPTIMUM intervention, an 8-week telehealth-delivered mindfulness group medical visit program delivered as part of primary care (intervention), or usual care (controls).

MAIN OUTCOMES AND MEASURES: The primary analysis assessed the between-group difference in the primary outcome of change from baseline to month 6 in the Pain, Enjoyment of Life and General Activity (PEG) scale score. A mean minimal clinically important difference (MCID) in PEG score of at least 1 was considered. Secondary analyses evaluated the between-group differences in change from baseline to week 8 and month 12 in PEG score.

RESULTS: Of 451 participants (mean [SD] age, 52.1 [14.7] years; 318 [70.5%] female), 224 were randomized to the intervention group and 227 to the control group. All reported moderate pain interference at baseline. In intention-to-treat analyses, the intervention participants had a statistically significant improvement in PEG score from baseline compared with controls at the 6-month primary time point (mean change, -1.21 [95% CI, -1.50 to -0.92] vs -0.59 [95% CI, -0.86 to -0.31]; between-group difference, -0.62 [95% CI, -1.02 to -0.23]; P = .002) and at 8 weeks (mean change, -1.16 [95% CI, -1.44 to -0.88] vs -0.27 [95% CI, -0.53 to -0.003]; between-group difference, -0.89 [95% CI, -1.27 to -0.51]; P < .001) and 12 months (mean change, -1.52 [95% CI, -1.81 to -1.23] vs -0.78 [95% CI, -1.05 to -0.50]; between-group difference, -0.74 [95% CI, -1.14 to -0.34]; P < .001). The MCID was not met at any time point.
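The relationship between the reported mean changes, the between-group differences, and the 1-point MCID can be checked directly. The sketch below uses only the point estimates given in the abstract. The function names are assumptions made for this example, and the check ignores the confidence intervals.

```python
MCID = 1.0  # prespecified between-group threshold on the PEG scale

# Mean change from baseline: (intervention, control), as reported.
mean_changes = {
    "8 weeks": (-1.16, -0.27),
    "6 months": (-1.21, -0.59),
    "12 months": (-1.52, -0.78),
}


def between_group_difference(intervention: float, control: float) -> float:
    """Intervention change minus control change, rounded to two decimals."""
    return round(intervention - control, 2)


def meets_mcid(difference: float, mcid: float = MCID) -> bool:
    """True when the magnitude of the between-group difference reaches the MCID."""
    return abs(difference) >= mcid


for time_point, (intervention, control) in mean_changes.items():
    diff = between_group_difference(intervention, control)
    print(f"{time_point}: difference = {diff:.2f}, meets MCID = {meets_mcid(diff)}")
```

Each difference falls short of 1 point in magnitude even though the intervention arm's own mean change exceeds 1 point at every time point, which is why statistical significance and the between-group MCID lead to different conclusions here.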

CONCLUSIONS AND RELEVANCE: In this randomized clinical trial, a telehealth-delivered mindfulness group medical visit program for persons with CLBP resulted in significant improvements in pain intensity and interference compared with usual care; however, these changes did not meet the prespecified mean 1-point MCID between groups. The program incorporated primary care clinicians, was accessible, and is potentially scalable as a nonpharmacologic treatment for CLBP.

TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT04129450.

PMID:42371634 | DOI:10.1001/jamainternmed.2026.2186

Readmission and Late Mortality Among Children With Congenital Diaphragmatic Hernia

JAMA Netw Open. 2026 Jun 1;9(6):e2620290. doi: 10.1001/jamanetworkopen.2026.20290.

ABSTRACT

IMPORTANCE: Congenital diaphragmatic hernia (CDH) is a rare malformation with high neonatal mortality. Although advances in management have improved survival rates, long-term morbidity remains substantial, and its impact on the health care system, particularly hospital readmissions, remains poorly described.

OBJECTIVE: To describe the incidence, causes, and factors associated with hospital readmission and late mortality after discharge from the primary stay.

DESIGN, SETTING, AND PARTICIPANTS: This nationwide retrospective cohort study was conducted from 2012 to 2024 and used data from the French National Health Data System, capturing nationwide health insurance claims and hospital discharge records in France. Participants were children with CDH who underwent surgical repair within the first 6 months of life, and were discharged alive from the primary stay. Data were analyzed from January to October 2025.

MAIN OUTCOMES AND MEASURES: The main outcomes were readmission to an acute care facility within 3 years after discharge and death during follow-up. Factors associated with readmission were identified using multivariable analysis.

RESULTS: Of the 1028 included infants (median [IQR] birth weight, 3050 [2720-3410] g; 849 [82.6%] with full-term birth; 630 [61.3%] male infants), 753 had at least 3 years of follow-up (median [IQR] time of follow-up, 6.2 [2.6-9.1] years), constituting the overall sample size for the primary analysis. Of them, 546 children (72.5%) were readmitted at least once, and 182 (24.2%) required intensive care. At 3 years, 208 (38.0%), 112 (20.5%), and 127 (23.3%) children had experienced at least 1 readmission for respiratory causes, gastrointestinal and/or nutritional issues, and CDH-related surgical complications, respectively. Preterm birth (incidence rate ratio [IRR], 1.32; 95% CI, 1.10-1.60), associated congenital anomalies (IRR, 1.31; 95% CI, 1.13-1.53), a primary stay longer than 1 month (IRR, 1.50; 95% CI, 1.27-1.76), oxygen therapy at discharge (IRR, 2.14; 95% CI, 1.55-2.99), and enteral feeding at discharge (IRR, 2.21; 95% CI, 1.83-2.68) were independently associated with readmission. Fourteen late deaths (14 of 1028 infants [1.4%]) were recorded, attributable to CDH-related complications or associated comorbidities in half of cases. Enteral feeding at discharge was also independently associated with late mortality (hazard ratio, 5.09; 95% CI, 1.33-19.48).

CONCLUSIONS AND RELEVANCE: In this cohort study of 1028 children with CDH, nearly three-quarters were readmitted within 3 years, but late mortality was low. Although enteral feeding at discharge likely reflected CDH severity, it may also represent a potentially modifiable target that warrants further investigation to improve outcomes.

PMID:42371627 | DOI:10.1001/jamanetworkopen.2026.20290

Pediatric Reference and Optimal Curves for Hemoglobin

JAMA Netw Open. 2026 Jun 1;9(6):e2620863. doi: 10.1001/jamanetworkopen.2026.20863.

ABSTRACT

IMPORTANCE: Clinicians rely on reference intervals (RIs) to interpret laboratory test results. In pediatric populations, estimating RIs typically requires partitioning data by age, sex, and other relevant factors, which can lead to limited sample size and imprecise estimates; these limitations are addressed by using curve estimation, modeling hemoglobin level as a continuous function of age.

OBJECTIVE: To establish hemoglobin reference curves (RCs) for children and to complement recently published World Health Organization (WHO) thresholds by estimating hemoglobin optimal curves (OCs) that may inform more appropriate reporting standards.

DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study included healthy Canadian children aged 2 weeks through 10 years attending scheduled primary care visits from June 3, 2008, to February 26, 2020, in Toronto, Ontario, Canada. Data were analyzed from October 16, 2024, to February 1, 2026.

EXPOSURE: Blood samples were collected and analyzed for hemoglobin, ferritin, and C-reactive protein levels.

MAIN OUTCOMES AND MEASURES: Parents completed a questionnaire to collect variables used as optimality criteria. Sex-specific RCs and OCs were estimated using nonparametric quantile regression with restricted cubic splines. RCs were based on the full sample, whereas OCs excluded children with indicators of suboptimal iron status. A web-based platform was developed to visualize these curves and calculate sex-specific reference and optimal limits by age. Findings were examined in relation to WHO hemoglobin thresholds.

RESULTS: Blood samples from 4597 children (2451 males [53%]; median age, 38 months [IQR, 18-63 months]) were used to estimate hemoglobin RCs, and samples from a subgroup of 3426 children (1798 males [52%]; median age, 45 months [IQR, 24-68 months]) were used to estimate OCs. For females, lower OC hemoglobin limits were slightly below the lower RC limits up until age 2 years and became higher after age 6 years (eg, at 6 months, the OC lower limit was 9.91 g/dL [90% CI, 9.70-10.13 g/dL] vs 10.00 g/dL [90% CI, 9.78-10.23 g/dL] for RC). For males, lower OC limits were higher than lower RC limits until age 20 months (eg, at 6 months, the OC lower limit was 9.74 g/dL [90% CI, 9.46-10.02 g/dL] vs 9.28 g/dL [90% CI, 8.94-9.63 g/dL] for RC) and were similar afterwards. Differences in the upper limits were minimal for both sexes. WHO hemoglobin thresholds were consistently higher than lower limits of OCs across all ages but exceeded the 5th percentile curve only among children aged 5 through 10 years for both sexes (eg, for males aged 1 year, the OC lower limit was 10.06 g/dL [90% CI, 9.92-10.21 g/dL] vs the 10.5 g/dL WHO threshold).

CONCLUSIONS AND RELEVANCE: This cross-sectional study estimated sex-specific pediatric RCs for hemoglobin, modeled as a continuous function of age, to eliminate the need for age partitioning and overcome the associated sample size limitations. OCs, developed using health-based criteria, offered additional clinical context beyond traditional RIs. The findings highlight potential misalignments with existing WHO thresholds, particularly at younger ages.

PMID:42371625 | DOI:10.1001/jamanetworkopen.2026.20863

Screening for Missed Opportunities for Diagnosis in the ED Using eTriggers and Large Language Models

JAMA Netw Open. 2026 Jun 1;9(6):e2620939. doi: 10.1001/jamanetworkopen.2026.20939.

ABSTRACT

IMPORTANCE: Emergency department (ED) quality review often uses administrative electronic triggers (eTriggers), but yields on detecting missed opportunities for diagnosis (MODs) are low. A commercial large language model (LLM) may help screen for MODs, yet evaluation data in real-world cohorts remain limited.

OBJECTIVE: To evaluate LLMs for identifying MODs in ED eTrigger cohorts.

DESIGN, SETTING, AND PARTICIPANTS: This retrospective diagnostic study of 2 eTrigger cohorts, ED discharge with return hospital admission within 72 hours and ED admission to the floor with intensive care unit (ICU) escalation within 24 hours, was conducted from April 2015 through March 2025 across 9 EDs (2 academic and 7 community) in 1 US health system. Samples included 200 encounters from the 72-hour return cohort and 100 encounters from the floor-to-ICU cohort; each case was adjudicated by 2 emergency physicians using a review process based on the Safer Dx framework.

EXPOSURES: Cases were evaluated by Claude Sonnet 4, Claude Sonnet 4.6, Claude Opus 4.6, Gemini 3 Pro, GPT-5, and GPT-5 mini.

MAIN OUTCOMES AND MEASURES: Main outcomes were sensitivity, specificity, positive predictive value, negative predictive value, area under the receiver operating characteristic curve (AUC), and reviewer-reviewer and reviewer-model concordance.

RESULTS: Among 300 sampled encounters, 12 were excluded, leaving 288 analyzed encounters (median [IQR] age, 69 [54-79] years; 135 female [46.9%]) with 39 MODs (13.5%), including 21 of 191 (11.0%) in the 72-hour return cohort and 18 of 97 (18.6%) in the floor-to-ICU cohort. Interrater agreement was 81.9% (95% CI, 77.4%-86.1%), with Gwet AC1 of 0.77 (95% CI, 0.70-0.83). In the 72-hour return cohort, model sensitivity ranged from 42.9% (95% CI, 24.5%-63.5%) for GPT-5 mini to 85.7% (95% CI, 65.4%-95.0%) for Claude Sonnet 4, specificity from 55.9% (95% CI, 48.4%-63.1%) for Claude Sonnet 4 to 82.9% (95% CI, 76.6%-87.9%) for GPT-5 mini, and AUC from 0.65 (95% CI, 0.53-0.77) for GPT-5 mini to 0.73 (95% CI, 0.61-0.85) for Claude Sonnet 4. In the floor-to-ICU cohort, sensitivity ranged from 5.6% (95% CI, 1.0%-25.8%) for GPT-5 mini to 55.6% (95% CI, 33.7%-75.4%) for Claude Sonnet 4, specificity from 64.6% (95% CI, 53.6%-74.2%) for Claude Sonnet 4 to 97.5% (95% CI, 91.2%-99.3%) for GPT-5 mini, and AUC from 0.57 (95% CI, 0.46-0.67) for GPT-5 mini to 0.82 (95% CI, 0.73-0.91) for GPT-5. Across cohorts, LLMs showed similar discrimination but different sensitivity-specificity tradeoffs; Claude Sonnet 4 generally favored higher sensitivity, whereas GPT-5 mini favored higher specificity.
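The sensitivity-specificity tradeoff in the 72-hour return cohort can be illustrated from confusion-matrix counts. The abstract reports only percentages, so the counts below are an assumption: they are back-calculated from the reported rates and the cohort composition (21 MODs and 170 non-MODs among 191 encounters) and may not match the study's actual tables.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positives over all adjudicated MODs, in percent."""
    return round(100.0 * tp / (tp + fn), 1)


def specificity(tn: int, fp: int) -> float:
    """True negatives over all adjudicated non-MODs, in percent."""
    return round(100.0 * tn / (tn + fp), 1)


# ASSUMED counts, back-calculated from reported percentages (72-hour return cohort).
counts_72h = {
    "Claude Sonnet 4": {"tp": 18, "fn": 3, "tn": 95, "fp": 75},
    "GPT-5 mini": {"tp": 9, "fn": 12, "tn": 141, "fp": 29},
}

for model, c in counts_72h.items():
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["tn"], c["fp"])
    print(f"{model}: sensitivity {sens}%, specificity {spec}%")
```

With the same 191 encounters, the model that flags more cases catches more MODs at the cost of more false alarms for reviewers to clear, which is the tradeoff the abstract describes.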

CONCLUSIONS AND RELEVANCE: In this diagnostic study of 2 ED eTrigger cohorts, model performance varied by cohort, with LLMs showing similar discrimination but different binary thresholds. These findings suggest that evaluation within the review workflow is needed before implementation and that reviewer-like concordance captures a distinct dimension of model behavior from discrimination.

PMID:42371624 | DOI:10.1001/jamanetworkopen.2026.20939

State Cost Growth Benchmark Programs and Total Medical Expenditures, 2010 to 2020

JAMA Netw Open. 2026 Jun 1;9(6):e2620963. doi: 10.1001/jamanetworkopen.2026.20963.

ABSTRACT

IMPORTANCE: US health care spending continues to outpace economic growth, prompting states to implement cost-growth benchmark programs aimed at constraining expenditure growth. However, empirical evidence evaluating their associations with overall spending growth remains limited.

OBJECTIVE: To evaluate whether adoption of statewide cost-growth benchmark programs is associated with changes in per capita total medical expenditure (TME) growth.

DESIGN, SETTING, AND PARTICIPANTS: This cohort study used a quasi-experimental, difference-in-differences analysis with 2-way fixed effects to examine data from the Centers for Medicare & Medicaid Services State Health Expenditure Accounts from January 1, 2010, to December 31, 2020. A total of 561 state- and year-level observations across 50 states and Washington, DC, were analyzed. Statistical analysis was performed from January 2025 to April 2026.
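The abstract does not give the estimation details. As a minimal sketch of a difference-in-differences estimate with 2-way (unit and period) fixed effects, the example below applies the within transformation to a balanced panel and regresses the demeaned outcome on the demeaned treatment indicator. The panel is synthetic and noise-free, the unit names, adoption years, and the -0.02 effect are hypothetical, and no covariates or clustered standard errors are included.

```python
from statistics import mean


def twfe_did(panel: dict[tuple[str, int], tuple[float, int]]) -> float:
    """Two-way fixed-effects DiD coefficient for a balanced panel.

    panel maps (unit, period) -> (outcome, treated indicator).
    """
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})
    if len(panel) != len(units) * len(periods):
        raise ValueError("panel must be balanced")

    def double_demean(idx: int) -> dict[tuple[str, int], float]:
        grand = mean(v[idx] for v in panel.values())
        unit_mean = {u: mean(panel[(u, t)][idx] for t in periods) for u in units}
        period_mean = {t: mean(panel[(u, t)][idx] for u in units) for t in periods}
        return {
            (u, t): panel[(u, t)][idx] - unit_mean[u] - period_mean[t] + grand
            for (u, t) in panel
        }

    y_tilde, d_tilde = double_demean(0), double_demean(1)
    numerator = sum(d_tilde[k] * y_tilde[k] for k in panel)
    denominator = sum(d_tilde[k] ** 2 for k in panel)
    return numerator / denominator


# Synthetic panel: 4 hypothetical units observed 2010-2020, two adopt a program.
adoption_year = {"A": 2013, "B": 2019}  # hypothetical; "C" and "D" never adopt
unit_effect = {"A": 0.010, "B": 0.020, "C": 0.015, "D": 0.005}
true_effect = -0.02

panel = {}
for unit, alpha in unit_effect.items():
    for year in range(2010, 2021):
        treated = int(unit in adoption_year and year >= adoption_year[unit])
        year_effect = 0.001 * (year - 2010)
        panel[(unit, year)] = (alpha + year_effect + true_effect * treated, treated)

print(f"estimated effect: {twfe_did(panel):.4f}")
```

The same panel shape explains the observation count in the study: 51 jurisdictions observed over 11 years gives 51 × 11 = 561 state-year observations.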

EXPOSURES: Adoption of statewide cost-growth benchmark programs in Massachusetts (2013), Maryland (2014), Vermont (2018), Rhode Island (2019), and Delaware (2019); all states paired their benchmark programs with enforcement mechanisms and/or payment reforms except for Delaware, which relied solely on public reporting.

MAIN OUTCOMES AND MEASURES: The primary outcome was log-transformed per capita TME growth. Secondary outcomes included changes in payer-specific and spending category-specific expenditures.

RESULTS: Across all 50 states and Washington, DC, the mean annual per capita TME increased by 3.7% during the study period. Implementation of cost-growth benchmark programs was associated with a 2.0% reduction in TME growth (95% CI, -3.3% to -0.7%; P = .004). Reductions were observed in all treatment states except Delaware. Medicare spending growth decreased across all treatment states (-2.4%; 95% CI, -4.2% to -0.6%; P = .009), whereas reductions in commercial spending growth were concentrated in Maryland (-2.2%; 95% CI, -3.6% to -0.8%; P = .003) and Rhode Island (-18.3%; 95% CI, -20.3% to -16.2%; P < .001). Spending reductions were primarily driven by decreases in hospital (-5.3%; 95% CI, -7.3% to -3.3%; P < .001) and skilled nursing facility expenditures (-7.7%; 95% CI, -10.5% to -4.9%; P < .001), alongside concomitant increases in home health spending (8.9%; 95% CI, 3.2% to 14.8%; P = .002). Findings were robust to multiple sensitivity analyses.

CONCLUSIONS AND RELEVANCE: This cohort study found that state cost-growth benchmark programs were associated with modest reductions in health care spending growth. These findings suggest that benchmark programs, particularly those paired with enforcement mechanisms or payment reforms, may contribute to slowing expenditure growth and shifting care toward lower-cost settings.

PMID:42371623 | DOI:10.1001/jamanetworkopen.2026.20963