Digital twins and virtual cohorts: How is synthetic data used for real-world evidence?

Epidemiol Prev. 2026 Jan-Feb;50(1):94-99. doi: 10.19191/EP26.1.A902.022.

ABSTRACT

Synthetic data are artificially generated data intended to imitate real data. They are designed to preserve the statistical characteristics of the original population while ensuring a high level of privacy, which makes them particularly useful where confidentiality is crucial. Assessing the value of synthetic data means evaluating their similarity to the original data, their ability to produce results comparable to those obtained with real data, and the potential risk of privacy breaches. Some risks nevertheless remain, including the possible re-identification of individuals, the amplification of biases already present in the original data, and the difficulty of validating the quality of synthetically generated data. At present, synthetic data are an emerging and promising technology in various fields; however, their use in epidemiology, particularly in observational settings, is still debated and requires further investigation and evaluation.
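
As an illustration only, the three assessment dimensions named in the abstract (similarity, comparability of results, privacy risk) could be sketched as follows. The abstract does not specify any metrics, so the choices here are assumptions: a standardized mean difference for similarity, a risk difference computed on both datasets for comparability, and the share of synthetic records that exactly duplicate a real record as a crude privacy proxy. All function names, the record layout (age, exposed, outcome), and the toy values are hypothetical and are not taken from the article.

```python
import math
import statistics


def standardized_mean_difference(real_values, synth_values):
    """Similarity proxy (assumed metric): mean difference in pooled-SD units."""
    pooled_sd = math.sqrt(
        (statistics.variance(real_values) + statistics.variance(synth_values)) / 2
    )
    return (statistics.mean(synth_values) - statistics.mean(real_values)) / pooled_sd


def risk_difference(rows):
    """Comparability proxy (assumed metric): outcome risk in exposed minus unexposed."""
    exposed = [outcome for _, is_exposed, outcome in rows if is_exposed]
    unexposed = [outcome for _, is_exposed, outcome in rows if not is_exposed]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def exact_match_rate(real_rows, synth_rows):
    """Privacy proxy (assumed metric): share of synthetic rows identical to a real row."""
    real_set = set(real_rows)
    return sum(row in real_set for row in synth_rows) / len(synth_rows)


# Hypothetical toy records: (age, exposed, outcome). Not data from the article.
real = [(34, True, True), (51, True, False), (47, False, False),
        (62, False, True), (29, True, True), (58, False, False)]
synth = [(36, True, True), (49, True, False), (47, False, False),
         (60, False, False), (31, True, True), (55, False, True)]

smd_age = standardized_mean_difference([r[0] for r in real], [s[0] for s in synth])
rd_real = risk_difference(real)
rd_synth = risk_difference(synth)
match_rate = exact_match_rate(real, synth)

print(f"SMD (age): {smd_age:.3f}")
print(f"Risk difference, real vs synthetic: {rd_real:.3f} vs {rd_synth:.3f}")
print(f"Exact-match rate: {match_rate:.3f}")
```

A low exact-match rate does not by itself rule out re-identification, and agreement on one estimate does not establish validity for other analyses; the sketch only shows how the three dimensions might be made operational.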

PMID:41854006 | DOI:10.19191/EP26.1.A902.022

By Nevin Manimala