PySyft: Data Science on data you are not allowed to see
- Track: PyData: Machine Learning, Stats
- Type: Talk
- Level: Intermediate
- Room: North Hall
- Start: 14:00 on 11 July 2024
- Duration: 30 minutes
Abstract
In today’s data-driven world, privacy is an essential requirement for the ethical and effective practice of data science. Robust privacy guarantees in data analysis not only protect sensitive information, but also unlock the potential for unprecedented democratisation of models and datasets.
PySyft is a stack of open source tools designed to help organisations collaborate securely with external (untrusted) individuals. With PySyft, organisations can enable external auditors (e.g. data scientists) to use their assets, such as datasets or models, to conduct studies with a specific, known purpose. Data scientists can run their analyses on those assets through PySyft without seeing or obtaining a copy of the assets themselves. We call this process Remote Data Science, and PySyft is a framework for it.
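To make the idea concrete, here is a minimal toy sketch of such a workflow in plain Python. It does not use PySyft and does not reflect PySyft's API: the `DataOwnerServer` class, its methods, and the sample values are all hypothetical and chosen only for illustration. It assumes the data owner exposes mock data for prototyping and manually approves each analysis before it runs on the private data.

```python
from typing import Callable, Dict, List


class DataOwnerServer:
    """Hypothetical stand-in for a data owner's server (not PySyft's API)."""

    def __init__(self, private_data: List[float], mock_data: List[float]) -> None:
        self._private_data = private_data  # never handed to the data scientist
        self.mock_data = mock_data         # fake data, safe to share for prototyping
        self._pending: Dict[str, Callable[[List[float]], float]] = {}
        self._approved: Dict[str, Callable[[List[float]], float]] = {}

    def submit(self, name: str, func: Callable[[List[float]], float]) -> None:
        """Data scientist: submit an analysis for review."""
        self._pending[name] = func

    def approve(self, name: str) -> None:
        """Data owner: approve a pending analysis after inspecting it."""
        self._approved[name] = self._pending.pop(name)

    def run(self, name: str) -> float:
        """Run an approved analysis on the private data; return only its result."""
        if name not in self._approved:
            raise PermissionError(f"analysis {name!r} has not been approved")
        return self._approved[name](self._private_data)


def average(values: List[float]) -> float:
    return sum(values) / len(values)


# Illustrative values only.
server = DataOwnerServer(
    private_data=[52.0, 61.0, 58.0, 45.0],
    mock_data=[50.0, 60.0],
)

prototype = average(server.mock_data)  # the data scientist prototypes on mock data
server.submit("average", average)      # then requests a run on the real data
server.approve("average")              # the data owner reviews and approves
result = server.run("average")         # only the result leaves the server
```

In this sketch the owner's review is the only safeguard; deciding whether a given result is safe to release remains the owner's responsibility.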
In the first part of my talk I will introduce the problem of privacy in data science, PETs (Privacy Enhancing Technologies), and OpenMined's mission to democratise access to data and information. Afterwards, I will demonstrate how PySyft works and how it can be used to run machine learning experiments with privacy guarantees.