
Maximal Local Privacy Loss – A New Method for Privacy Evaluation of Synthetic Datasets

Stat Med. 2026 Jan;45(1-2):e70376. doi: 10.1002/sim.70376.

ABSTRACT

Synthetic patient data has the potential to advance medical research by providing privacy-preserving access to data that resemble sensitive personal data. Assessing the level of privacy offered is essential to ensure privacy compliance, but it is challenging in practice. Many common methods either fail to capture central aspects of privacy or result in excessive caution based on unrealistic worst-case scenarios. We present a new approach, based on the maximal local privacy loss, for evaluating the privacy of synthetic datasets generated from known probability distributions. The strategy measures each individual's contribution to the likelihood of generating a specific synthetic dataset, in order to detect whether records in the original data could be reconstructed. To demonstrate the method, we generate synthetic time-to-event data based on pancreatic and colon cancer data from the Cancer Registry of Norway using sequential regressions, including a flexible parametric survival model. This illustrates the method's ability to measure information leakage at an individual level, which can be used to ensure acceptable privacy risks for every patient in the data.
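The idea of measuring individual contributions to the likelihood of a synthetic dataset can be illustrated with a toy leave-one-out calculation. The sketch below is a simplified illustration under stated assumptions, not the paper's definition or implementation: it assumes a single numeric variable, a normal model fitted by maximum likelihood, and i.i.d. synthetic draws, and it takes the "local privacy loss" of record i to be the absolute change in the synthetic data's log-likelihood when record i is removed before fitting. The function names (fit_normal, local_privacy_losses, maximal_local_privacy_loss) are hypothetical.

```python
import math
from typing import Sequence


def fit_normal(data: Sequence[float]) -> tuple[float, float]:
    """Maximum-likelihood (mean, variance) of a normal model.

    Assumption: the generating model is a single normal distribution.
    Requires at least two distinct values so the variance is positive.
    """
    n = len(data)
    mu = sum(data) / n
    var = sum((v - mu) ** 2 for v in data) / n  # MLE variance (divide by n)
    return mu, var


def log_likelihood(synthetic: Sequence[float], mu: float, var: float) -> float:
    """Log-likelihood of the synthetic dataset under N(mu, var), i.i.d. draws."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (y - mu) ** 2 / (2 * var)
        for y in synthetic
    )


def local_privacy_losses(original: Sequence[float],
                         synthetic: Sequence[float]) -> list[float]:
    """Hypothetical per-record loss: |log L(synthetic | all) - log L(synthetic | all but i)|."""
    records = list(original)
    full = log_likelihood(synthetic, *fit_normal(records))
    losses = []
    for i in range(len(records)):
        # Refit the model without record i (leave-one-out).
        without_i = records[:i] + records[i + 1:]
        loo = log_likelihood(synthetic, *fit_normal(without_i))
        losses.append(abs(full - loo))
    return losses


def maximal_local_privacy_loss(original: Sequence[float],
                               synthetic: Sequence[float]) -> float:
    """Largest per-record loss: the worst-case individual in this toy setting."""
    return max(local_privacy_losses(original, synthetic))


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0, 4.0, 50.0]  # toy data with one outlying record
    synthetic = [2.0, 3.0, 45.0]           # toy synthetic draw
    print(local_privacy_losses(original, synthetic))
    print(maximal_local_privacy_loss(original, synthetic))
```

In this toy example the outlying record 50.0 should dominate: removing it shrinks the fitted variance sharply, so the synthetic value 45.0 becomes far less likely. This mirrors the abstract's point that per-individual measures can flag specific patients at elevated risk, which an aggregate worst-case bound would not distinguish.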

PMID:41569604 | DOI:10.1002/sim.70376

By Nevin Manimala
