Stud Health Technol Inform. 2024 Aug 22;316:705-709. doi: 10.3233/SHTI240511.
ABSTRACT
To address privacy and ethical concerns in using health data for machine learning, we evaluate the scalability of advanced synthetic data generation methods, including GANs, VAEs, CopulaGAN, and transformer models, on patient service utilization data. Our study examines five models on data from a Canadian health authority, focusing on training and generation efficiency, data resemblance, and practical utility. Our findings indicate that statistical models excel in efficiency, while most models produce synthetic data that closely mirrors the real data and is useful for real-world applications.
PMID:39176892 | DOI:10.3233/SHTI240511