Commun Chem. 2026 Jan 3. doi: 10.1038/s42004-025-01866-8. Online ahead of print.
ABSTRACT
Machine learning (ML) and artificial intelligence (AI) techniques are transforming the way chemical reactions are studied. Datasets from high-throughput experimentation (HTE) are generated to better understand which reaction conditions are crucial for outcomes such as yields and selectivities. However, it is often overlooked that datasets from such designed experiments possess a specific structure, which can be captured by a statistical model. Ignoring this structure when applying ML/AI algorithms can result in misleading conclusions. In contrast, leveraging knowledge about the data-generating process provides reliable, interpretable, and comprehensive insights into reaction mechanisms. A particularly complex dataset is available for the Buchwald-Hartwig amination. Using this dataset, a statistical model for such HTE-generated chemical data is introduced, and a parameter estimation algorithm is developed. Based on the estimated model, new insights into the Buchwald-Hartwig amination are discussed. Our approach is applicable to a wide range of HTE-generated data for chemical reactions and beyond.
PMID:41484279 | DOI:10.1038/s42004-025-01866-8