EuroPython logo

PySyft: Data Science on data you are not allowed to see

Track:
PyData: Machine Learning, Stats
Type:
Talk
Level:
intermediate
Room:
North Hall
Start:
14:00 on 11 July 2024
Duration:
30 minutes

Abstract

In today’s data-driven world, privacy is an essential requirement for the ethical and effective practice of data science. Moreover, implementing robust privacy guarantees in data analysis not only protects sensitive information but also unlocks the potential for unprecedented democratisation of models and datasets.

PySyft is a stack of open source tools designed to help organisations collaborate securely with external (untrusted) individuals. By using PySyft, organisations can enable external auditors (e.g. data scientists) to use their assets, such as datasets or models, to conduct studies with a specific, known purpose. Data scientists can run their analyses on those assets through PySyft without seeing or obtaining a copy of the assets themselves. We call this process Remote Data Science, and PySyft is a framework for it.
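
To make the Remote Data Science workflow concrete, here is a minimal toy sketch in plain Python. It does not use PySyft and does not reflect its API: `ToyDataOwner`, `submit`, `approve`, `run`, and `average_age` are hypothetical names invented for illustration, and the mock dataset is an assumption about how a data scientist might prototype without access to the real data. The sketch only shows the shape of the idea: the analysis is reviewed by the data owner, runs where the data lives, and only the result is returned.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical type alias: an analysis maps a dataset to a single number.
Analysis = Callable[[List[float]], float]


@dataclass
class ToyDataOwner:
    """Toy stand-in for a data-owning organisation (not a PySyft class)."""

    private_data: List[float]  # the real asset; run() never returns it
    mock_data: List[float]     # fake data of the same shape, safe to share
    pending: Dict[str, Analysis] = field(default_factory=dict)
    approved: Dict[str, Analysis] = field(default_factory=dict)

    def submit(self, name: str, analysis: Analysis) -> None:
        """The data scientist proposes an analysis for review."""
        self.pending[name] = analysis

    def approve(self, name: str) -> None:
        """The data owner reviews the proposed analysis and approves it."""
        self.approved[name] = self.pending.pop(name)

    def run(self, name: str) -> float:
        """Run an approved analysis on the private data; return only the result."""
        if name not in self.approved:
            raise PermissionError(f"analysis {name!r} has not been approved")
        return self.approved[name](self.private_data)


def average_age(ages: List[float]) -> float:
    """Example study with a specific, known purpose."""
    return mean(ages)


owner = ToyDataOwner(
    private_data=[34.0, 51.0, 29.0, 46.0],
    mock_data=[20.0, 30.0, 40.0],
)

# 1. The data scientist prototypes against the mock data only.
prototype = average_age(owner.mock_data)

# 2. The analysis is submitted; running it before approval is refused.
owner.submit("average_age", average_age)
try:
    owner.run("average_age")
    ran_before_approval = True
except PermissionError:
    ran_before_approval = False

# 3. After approval, only the aggregate result comes back.
owner.approve("average_age")
result = owner.run("average_age")
print(result)
```

In this toy version everything lives in one process, so nothing actually stops the caller from reading `owner.private_data`. A real deployment has to enforce that boundary between separate machines and accounts, which is the part a framework such as PySyft is meant to handle.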

In the first part of my talk I will introduce the problem of privacy in Data Science, PETs (Privacy Enhancing Technologies), and OpenMined's mission to democratise access to data and information. Afterwards, I will demonstrate how PySyft works and how it can be used to run machine learning experiments with privacy guarantees.