J Chem Inf Model. 2026 Jun 11. doi: 10.1021/acs.jcim.6c00675. Online ahead of print.
ABSTRACT
Two-dimensional (2D) inorganic crystals are a class of materials that are gaining significant attention for use in electronic and optoelectronic devices. Among many other exciting applications, 2D materials offer such devices a range of beneficial properties owing to charge carrier confinement, high carrier mobility, tunable band gaps, strong light-matter interactions, and atomically thin geometries that enable excellent electrostatic control and mechanical flexibility. In parallel, data-driven approaches to predicting inorganic material properties have gained considerable attention as computationally lightweight surrogate models for properties of interest. This is particularly important when screening candidate materials for particular sets of structure-property relationships. Many of these approaches have targeted three-dimensional bulk crystalline materials. In this work, we develop a set of data-driven models for predicting three properties of 2D layered, van der Waals, and ultrathin film materials: thermodynamic stability, metallicity, and electronic band gap. We train the models on materials sourced from open-source computational databases of 2D materials (Alexandria_2D, C2DB, MC2D, and 2DMatpedia) and use chemically relevant elemental, physical, and compositional features as input. The large feature space is reduced to a subset of critical features by a statistical and gradient-boosted feature selection strategy. The models are fully interpretable, with feature relevance scores and SHapley Additive exPlanations analysis assessing the global and local influence of feature values. The classifiers for thermodynamic stability and metallicity achieve F1-scores of 0.832 and 0.870 and accuracies of 89.7% and 89.7%, respectively. The regressor model for the band gap achieves an R2 of 0.883, a mean-absolute error (MAE) of 0.317 eV, and a root-mean-squared error (RMSE) of 0.485 eV on the in-distribution test set.
We assess the band gap regressor against a 2D material band gap data set (N = 177) manually extracted from the academic literature to quantify the model’s ability to predict outside of the training distribution, achieving an R2 of 0.334, an MAE of 0.675 eV, and an RMSE of 0.961 eV. These results demonstrate the efficacy of feature selection in producing fully explainable machine learning surrogate models for high-throughput property prediction for 2D materials.
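For readers less familiar with the regression metrics quoted above, the following is a minimal sketch of how R2, MAE, and RMSE are conventionally computed from true and predicted band gaps. It is not the authors' evaluation code; the function name `regression_metrics` and the example values are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, and RMSE for paired true and predicted values.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    mean_true = sum(y_true) / n
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Made-up band gaps in eV, purely illustrative (not data from the study).
example = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.5, 1.0, 1.5, 3.0])
```

Because R2 compares the model against a predict-the-mean baseline on the evaluated set, it can fall sharply on an out-of-distribution set even when MAE and RMSE grow more modestly, which is consistent with the pattern reported above.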
PMID:42273839 | DOI:10.1021/acs.jcim.6c00675