Hepatology. 2021 Sep 8. doi: 10.1002/hep.32142. Online ahead of print.
ABSTRACT
BACKGROUND & AIMS: Chronic hepatitis B (CHB) affects over 290 million people globally, yet only 10% have been diagnosed, a severe gap in detection that must be addressed. We developed logistic regression and machine learning (random forest) models to accurately identify patients with HBV infection, using only easily obtained demographic data from a population-based data set.
APPROACH & RESULTS: We identified participants with data on hepatitis B surface antigen (HBsAg), birth year, sex, race/ethnicity, and birthplace from 10 cycles of the National Health and Nutrition Examination Survey (NHANES, 1999-2018) and divided them into two cohorts: training (cycles 2, 3, 5, 6, 8, 10; n = 39,119) and validation (cycles 1, 4, 7, 9; n = 21,569). We then developed and tested our two models. The overall cohort was 49.2% male, 39.7% White, 23.2% Black, 29.6% Hispanic, and 7.5% Asian/Other, with a median birth year of 1973. In multivariable logistic regression, the following factors were associated with HBV infection: birth year 1991 or after (adjusted OR [aOR] 0.28, P < 0.001), male sex (aOR 1.49, P = 0.0080), Black and Asian/Other race/ethnicity vs. White (aOR 5.23 and 9.13, respectively; P < 0.001 for both), and being United States-born vs. foreign-born (aOR 0.14, P < 0.001). The machine learning model consistently outperformed the logistic regression model, with higher AUROC values (0.83 vs. 0.75 in the validation cohort, P < 0.001) and better differentiation of high- and low-risk individuals.
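The cycle-based cohort split and the AUROC comparison described above can be illustrated with a minimal Python sketch. This is not the authors' code: the record layout (a dict with a `cycle` key) and the function names `split_by_cycle` and `auroc` are hypothetical, and the scores in the usage note are made-up toy values, not study data. Only the cycle assignments are taken from the abstract.

```python
# Illustrative sketch only; record layout and function names are assumptions.

# Cycle assignments as reported in the abstract.
TRAIN_CYCLES = {2, 3, 5, 6, 8, 10}
VALID_CYCLES = {1, 4, 7, 9}


def split_by_cycle(records):
    """Split records (dicts with a hypothetical 'cycle' key) into two cohorts."""
    train = [r for r in records if r["cycle"] in TRAIN_CYCLES]
    valid = [r for r in records if r["cycle"] in VALID_CYCLES]
    return train, valid


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win. Runs in O(n_pos * n_neg) time."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

As a usage example with toy values, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` counts three of the four positive-negative pairs as correctly ordered. An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.83 and 0.75 are compared.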
CONCLUSIONS: Our machine learning model provides a simple, targeted approach to HBV screening, using only easily obtained demographic data.
PMID:34496066 | DOI:10.1002/hep.32142