Nevin Manimala Statistics

A Benchmark Evaluation of Chemical Structure Extraction from Patents: Insights and Challenges in Chemical Structure Recognition

Chem Res Toxicol. 2026 May 26. doi: 10.1021/acs.chemrestox.6c00057. Online ahead of print.

ABSTRACT

Various authorities are developing early warning systems (EWSs) to identify potentially hazardous chemicals before they become a threat to the environment and human health. In this context, patents are an excellent data source for exploring novel chemistry and the use of chemicals in materials and products. However, analyzing patents is challenging, not least because molecular structures are presented as graphics depicting elements, functional groups, and bonds that must be deciphered. Our study aims to improve EWSs by using automated, artificial intelligence-based molecular structure recognition methods to encode these structures for further hazard analysis. Current structure extraction tools are primarily trained on chemical structures collected from publicly available data sets, and their application to patent-specific chemical data has received little attention. This paper presents a field study that uses three tools, Decimer, Molscribe, and Mathpix, and assesses their performance in recognizing chemical structures in patents. Two data sets were compiled and curated: (1) diverse organic chemicals and (2) per- and polyfluoroalkyl substances (PFAS). The tools perform well on simpler molecular structures but struggle with more complex structural features, including repetitive units, cross-bonding, and Markush structures. They are also extremely sensitive to image artifacts such as noise from lines and dots or distortions. Overcoming these challenges will be critical before the tools can be implemented in automated EWSs to enable rapid and effective screening of patents for potentially hazardous emerging chemicals.
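The abstract does not say how recognition performance was scored. As a rough illustration only, one common way to compare such tools is exact-match accuracy: the fraction of patent images for which a tool's predicted structure identifier equals a curated reference. The sketch below assumes identifiers have already been canonicalized (for example as InChIKeys) by a cheminformatics toolkit; the function name, tool names, image IDs, and identifiers are all hypothetical placeholders, not data or methods from the study.

```python
def exact_match_accuracy(reference, predictions):
    """Return, per tool, the fraction of reference images predicted exactly.

    reference:   {image_id: canonical identifier}   (assumed pre-canonicalized)
    predictions: {tool_name: {image_id: predicted identifier or None}}
    A missing or None prediction counts as a miss.
    """
    scores = {}
    for tool, preds in predictions.items():
        hits = sum(
            1 for image_id, truth in reference.items()
            if preds.get(image_id) == truth
        )
        scores[tool] = hits / len(reference) if reference else 0.0
    return scores


# Hypothetical placeholder data, not results from the paper.
reference = {"img1": "KEY-A", "img2": "KEY-B", "img3": "KEY-C", "img4": "KEY-D"}
predictions = {
    "tool_x": {"img1": "KEY-A", "img2": "KEY-B", "img3": "KEY-X", "img4": None},
    "tool_y": {"img1": "KEY-A"},  # no output for the other three images
}
scores = exact_match_accuracy(reference, predictions)
```

Exact matching is deliberately strict: a structure that is wrong by a single bond scores the same as no output at all. Similarity-based metrics would give partial credit, at the cost of a less clear-cut pass/fail signal for hazard screening.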

PMID:42186716 | DOI:10.1021/acs.chemrestox.6c00057

By Nevin Manimala
