synthetic-data-generation privacy confidentiality machine-learning privacy-enhancing-technologies secure-learning


Using synthetic data generation for secure learning that preserves data privacy.

Problem Context

Hospitals can use patient data to provide more personalised healthcare that is tailored to the patient's specific situation. Medical specialists can benefit from this data as well, for example to better understand the factors that contribute to diseases. Law enforcement agencies and banks could share personal data to combat fraud and money laundering more effectively. Yet when research requires personal data, privacy concerns and regulations can stand in the way.


Synthetic data generation (SDG) has emerged as a way to learn from sensitive information and to make analysis by third parties possible. SDG methods use AI techniques to analyse personal data and produce synthetic data based on it. The synthetic data resembles the original data without containing real, sensitive personal information. As a result, it forms an attractive substitute for the original data in analyses and model development.
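To make the idea concrete, here is a minimal sketch in Python of the "analyse, then generate" pattern. It is not the method used in the Visi demonstrator. It assumes a toy table with two numeric columns (age and blood pressure, both made up here), and it swaps the AI techniques for a simple correlated Gaussian model. The names `fit_model` and `sample_synthetic` are hypothetical. A model this simple offers no formal privacy guarantee; it only illustrates how synthetic records can follow the statistics of the original data without copying any original record.

```python
import math
import random
import statistics


def fit_model(rows):
    """Learn the means, standard deviations and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mean": (mx, my), "std": (sx, sy), "corr": cov / (sx * sy)}


def sample_synthetic(model, n, rng):
    """Draw n new records from the fitted model; no original record is reused."""
    mx, my = model["mean"]
    sx, sy = model["std"]
    rho = model["corr"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + residual * z2)  # keeps the learned correlation
        records.append((x, y))
    return records


# Made-up stand-in for sensitive data: (age, blood pressure) pairs.
rng = random.Random(0)
real = []
for _ in range(500):
    age = rng.uniform(20, 80)
    real.append((age, 90 + 0.5 * age + rng.gauss(0, 8)))

model = fit_model(real)                       # analyse the original data
synthetic = sample_synthetic(model, 2000, rng)  # produce synthetic data
refit = fit_model(synthetic)                  # statistics of the synthetic data
```

Comparing `refit` with `model` shows how closely the synthetic records follow the original means and correlation. Realistic SDG methods replace the Gaussian model with richer generative models and are evaluated on both utility and privacy.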


Within the Visi project, we created an online demonstrator that showcases the use of synthetic data generation.


  • Madelon Molhoek, Consultant DataScience, TNO, e-mail: