SeCure learning tO aNalyse VertIcally partitioNed CancEr Data (CONVINCED)
Enabling survival analyses on privacy-sensitive distributed cancer data using Multi-Party Computation
The Netherlands Comprehensive Cancer Organisation (IKNL) maintains the Netherlands Cancer Registry (NCR). The growing complexity of cancer diagnosis and treatment requires data sets that are larger and richer than those currently available in a single cancer registry. For instance, relevant patient characteristics may be recorded at different organizations. However, sharing patient data remains challenging because it must comply with privacy regulations. Therefore, to study the effects of these characteristics, we need new approaches that allow us to analyse data in a distributed manner. New technologies like federated learning and Multi-Party Computation (MPC) offer opportunities to train AI algorithms on such distributed data while preserving privacy.
The setting where the data is vertically partitioned is particularly relevant for the Netherlands: in this case, each party owns a portion of the attributes (e.g. clinical, demographic and psycho-social variables) of the same group of patients. While several federated learning methods exist that learn from horizontally partitioned privacy-sensitive data sets, the literature contains only a few methods for training machine learning models on vertically partitioned data sets, and the security properties of such methods are unclear.
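To make the vertically partitioned setting concrete, the following minimal Python sketch shows two parties holding different attributes for the same patients. All names and values are hypothetical toy data, not taken from the NCR, and the plain join shown here is exactly the step that privacy regulations prevent and that a privacy-preserving method has to replace.

```python
# Hypothetical toy data: two organisations hold different attributes
# for the same patients, linked by a shared pseudonymous identifier.
registry = {  # assumed clinical attributes held by party A
    "p1": {"stage": 2, "treatment": "surgery"},
    "p2": {"stage": 3, "treatment": "chemo"},
    "p3": {"stage": 1, "treatment": "surgery"},
}
other_party = {  # assumed psycho-social attributes held by party B
    "p1": {"lives_alone": True},
    "p2": {"lives_alone": False},
    "p3": {"lives_alone": True},
}


def join_in_the_clear(a, b):
    """Join both tables on patient id.

    This is the non-private baseline: it requires one party to see
    the other party's columns, which is what must be avoided.
    """
    shared_ids = sorted(a.keys() & b.keys())
    return {pid: {**a[pid], **b[pid]} for pid in shared_ids}


combined = join_in_the_clear(registry, other_party)
```

In a horizontally partitioned setting, by contrast, each party would hold all columns but for a different set of patient identifiers.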
Survival analysis studies the expected time until an event occurs. In cancer research it can tell us how likely someone is to be alive a few years after diagnosis. Additionally, it can give insight into which characteristics might affect the chances of survival, e.g. the patient’s fitness, the treatment method, and the hospital of diagnosis. In the joint research project CONVINCED, IKNL and TNO combined clinical data and analysis knowledge with knowledge of privacy-preserving analytics and cryptography to develop new solutions that enable privacy-preserving survival analyses on vertically partitioned data sets, making use of MPC. The focus is on the Kaplan-Meier estimator (i.e., survival curves) and the Cox proportional hazards model, two commonly used techniques in survival analysis for estimating the impact of specific features on the survival rate of (groups of) patients. Both are widely used in the medical domain, not only in oncology, so the developed solutions have broader applicability.
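For readers unfamiliar with the Kaplan-Meier estimator, here is a minimal plaintext sketch in Python. It is only a reference for what the estimator computes on data held in one place; it is not the MPC protocol developed in the project, and the function name and toy data are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return a list of (time, survival probability) pairs.

    durations: follow-up time per patient
    observed:  True if the event (e.g. death) occurred at that time,
               False if the patient was censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    curve = []
    survival = 1.0
    for t in sorted(events):
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Multiply by the fraction of at-risk patients surviving time t.
        survival *= 1.0 - events[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy cohort of five patients.
durations = [1, 2, 2, 3, 4]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

For this toy cohort the estimated survival probability drops to roughly 0.8, 0.6 and 0.3 at times 1, 2 and 3. In the vertically partitioned setting, the durations and the grouping attribute may sit with different parties, which is why the counts above cannot simply be computed by one party.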
In this project, new solutions were developed as a proof of concept, enabling the privacy-preserving analyses mentioned above using MPC. In addition, generic secure components were developed that can serve as building blocks for other privacy-preserving AI models. The aim is to publish the developed software as open source. Future research will focus on improving performance and scalability and on implementing other relevant algorithms in a privacy-preserving manner.
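As an illustration of what a generic MPC building block can look like, the sketch below shows additive secret sharing in Python, a standard textbook technique. Whether the project's components use this particular scheme is not stated here; the modulus, function names and values are assumptions chosen for the example.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus for this sketch


def share(value, n_parties=2):
    """Split value into n_parties random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine all shares; any strict subset reveals nothing about the value."""
    return sum(shares) % MODULUS


# Two private inputs are shared, then added share by share.
a_shares = share(12)
b_shares = share(30)
# Each party adds the two shares it holds locally, without communication.
sum_shares = [(x + y) % MODULUS for x, y in zip(a_shares, b_shares)]
total = reconstruct(sum_shares)  # equals 12 + 30
```

Addition works on shares locally; multiplications and comparisons need interaction between the parties, which is one reason performance and scalability are natural topics for follow-up work.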
- Daniël Worm, Sr consultant, TNO, e-mail: firstname.lastname@example.org