PySyft decouples private data from model training, using Federated Learning, Differential Privacy, and Encrypted Computation (like Multi-Party Computation (MPC) and Homomorphic Encryption (HE)) within the main Deep Learning frameworks like PyTorch and TensorFlow. Join the movement on Slack.
Most software libraries let you compute over the information you own and see inside of machines you control. However, this means that you cannot compute on information without first obtaining (at least partial) ownership of that information. It also means that you cannot compute using machines without first obtaining control over those machines. This is very limiting to human collaboration and systematically drives the centralization of data, because you cannot work with a bunch of data without first putting it all in one (central) place.
The Syft ecosystem seeks to change this system, allowing you to write software which can compute over information you do not own on machines you do not have (total) control over. This includes not only servers in the cloud, but also personal desktops, laptops, mobile phones, websites, and edge devices. Wherever you want your data to live under your ownership, the Syft ecosystem exists to help keep it there while allowing it to be used privately for computation.
This repo contains multiple projects which work together, namely PySyft and PyGrid. PyGrid will be added soon; in the meantime, this is the directory structure:
OpenMined/PySyft
├── README.md     <-- You are here 📌
└── packages
    ├── grid      <-- Coming to this Mono repo 🔜
    └── syft      <-- The Syft droids you are looking for 👋🏽
NOTE: Changing the entire folder structure will likely result in some minor issues. If you spot one, please let us know or open a PR.
PySyft is the centerpiece of the Syft ecosystem. You can use PySyft to perform two types of computation:
- Dynamic: Directly compute over data you cannot see.
- Static: Create static graphs of computation which can be deployed/scaled at a later date on different compute.
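The difference between the two modes can be illustrated with a toy, plain-Python sketch. It does not use the PySyft API: `RemoteWorker`, `Pointer`, and `Plan` are hypothetical names invented here for illustration, and `reveal` stands in for a download that a real system would only allow with the data owner's permission. In the dynamic mode each operation runs on the worker as soon as it is issued; in the static mode the operations are recorded first and executed later, possibly on a different worker.

```python
from typing import Any, Callable, Dict, List


class Pointer:
    """Handle to a value that stays on a worker; it carries no data itself."""

    def __init__(self, obj_id: int) -> None:
        self.obj_id = obj_id


class RemoteWorker:
    """Hypothetical stand-in for a machine that holds data you cannot see."""

    def __init__(self) -> None:
        self._store: Dict[int, Any] = {}

    def put(self, value: Any) -> Pointer:
        # Store the value locally and hand back only a pointer to it.
        ptr = Pointer(len(self._store))
        self._store[ptr.obj_id] = value
        return ptr

    def run(self, fn: Callable[[Any], Any], ptr: Pointer) -> Pointer:
        # Execute fn where the data lives; the result also stays on the worker.
        return self.put(fn(self._store[ptr.obj_id]))

    def reveal(self, ptr: Pointer) -> Any:
        # Illustration only: a real system would gate this behind owner approval.
        return self._store[ptr.obj_id]


class Plan:
    """Static mode: record operations now, execute them later on any worker."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Any], Any]] = []

    def then(self, fn: Callable[[Any], Any]) -> "Plan":
        self.steps.append(fn)
        return self

    def execute(self, worker: RemoteWorker, ptr: Pointer) -> Pointer:
        for fn in self.steps:
            ptr = worker.run(fn, ptr)
        return ptr


def double(values: List[int]) -> List[int]:
    return [2 * v for v in values]


# Dynamic: the operation is sent to the worker immediately.
worker = RemoteWorker()
x = worker.put([1, 2, 3])
dynamic_result = worker.reveal(worker.run(sum, x))

# Static: build the graph once, then deploy it on a different worker later.
plan = Plan().then(double).then(sum)
other = RemoteWorker()
y = other.put([10, 20])
static_result = other.reveal(plan.execute(other, y))

print(dynamic_result, static_result)
```

The same `plan` object can be executed against any number of workers, which is what makes the static form convenient to deploy and scale after it has been written.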
The PyGrid library serves as an API for the management and deployment of PySyft at scale. It also lets you extend PySyft for Federated Learning on web, mobile, and edge devices using the following Syft worker libraries:
- KotlinSyft (Android)
- SwiftSyft (iOS)
- syft.js (JavaScript)
- PySyft (Python, you can use PySyft itself as one of these "FL worker libraries")
However, the Syft ecosystem only focuses on consistent object serialization/deserialization, core abstractions, and algorithm design/execution across these languages. These libraries alone will not connect you with data in the real world. The Syft ecosystem is supported by the Grid ecosystem, which focuses on the deployment, scalability, and other additional concerns around running real-world systems to compute over and process data (such as data compliance web applications).
- PySyft is the library that defines objects, abstractions, and algorithms.
- PyGrid is the platform which lets you deploy them within a real institution.
- PyGrid Admin is a UI which allows a data owner to manage their PyGrid deployment.
A more detailed explanation of PySyft can be found in the white paper on arXiv.
PySyft has also been explained in videos on YouTube.
PySyft is available on PyPI and Conda.
We recommend that you install PySyft within a virtual environment like Conda, due to its ease of use. If you are using Windows, we suggest installing Anaconda and using the Anaconda Prompt to work from the command line.
$ conda create -n pysyft python=3.9
$ conda activate pysyft
$ conda install jupyter notebook
We support Linux, macOS, and Windows with the following Python and Torch versions. Older versions may work, but we have stopped testing and supporting them.
Py / Torch | 1.6 | 1.7 | 1.8 |
---|---|---|---|
3.7 | ✅ | ✅ | ✅ |
3.8 | ✅ | ✅ | ✅ |
3.9 | ➖ | ✅ | ✅ |
$ pip install syft
This will auto-install PyTorch and other dependencies as required to run the examples and tutorials. For more information on building from source see the contribution guide here.
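If you want to check an environment against the support table above before installing, a small helper along these lines may be useful. This is a minimal sketch using only the standard library; `SUPPORTED`, `parse_major_minor`, and `is_supported` are hypothetical names that are not part of PySyft, and the ➖ cell (Python 3.9 with Torch 1.6) is treated here as unsupported.

```python
import sys
from typing import Dict, Sequence, Set, Tuple

# Tested combinations copied from the table above:
# Python (major, minor) -> set of Torch (major, minor).
SUPPORTED: Dict[Tuple[int, int], Set[Tuple[int, int]]] = {
    (3, 7): {(1, 6), (1, 7), (1, 8)},
    (3, 8): {(1, 6), (1, 7), (1, 8)},
    (3, 9): {(1, 7), (1, 8)},
}


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn a version string such as '1.8.1+cpu' into (1, 8)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(python_version: Sequence[int], torch_version: str) -> bool:
    """Return True if the Python/Torch pair appears as tested in the table."""
    python_key = (python_version[0], python_version[1])
    return parse_major_minor(torch_version) in SUPPORTED.get(python_key, set())


# Check the running interpreter against a Torch version you plan to use.
current_python = sys.version_info[:2]
print(current_python, is_supported(current_python, "1.8.0"))
```

With PyTorch already installed, you could pass `torch.__version__` as the second argument instead of a hard-coded string.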
Coming soon! Until then, please view the Examples below.
A comprehensive list of examples can be found here.
These tutorials cover a variety of Python libraries for data science and machine learning.
You can run all of the examples by launching Jupyter Notebook and navigating to the examples folder.
$ jupyter notebook
Duet is a peer-to-peer tool within PySyft that provides a research-friendly API for a Data Owner to privately expose their data, while a Data Scientist can access or manipulate the data on the owner's side through a zero-knowledge access control mechanism. It's designed to lower the barrier between research and privacy-preserving mechanisms, so that scientific progress can be made on data that is currently inaccessible or tightly controlled. The main benefit of using Duet is that it allows you to get started using PySyft without needing to manage a full PyGrid deployment. It is the simplest path to using Syft, without needing to install anything (except Syft 😉).
You can find all Duet examples in the examples/duet folder.
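To make the Data Owner / Data Scientist split more concrete, here is a toy, plain-Python model of the request-and-approve flow. It is not the Duet API and involves no networking or cryptography; `OwnerSession`, `AccessDenied`, and all method names are hypothetical and exist only for this illustration. The point it tries to show is that computation runs on the owner's side and nothing leaves until the owner approves a request.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


class AccessDenied(Exception):
    """Raised when a download is attempted without the owner's approval."""


class OwnerSession:
    """Hypothetical owner side: holds the data and decides what may leave."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}
        self._approved: Set[str] = set()
        self.pending: List[Tuple[str, str]] = []

    def share(self, name: str, value: Any) -> None:
        # Make an object addressable by name without revealing its value.
        self._objects[name] = value

    def compute(self, fn: Callable[[Any], Any], source: str, target: str) -> None:
        # Run fn on the owner's machine; the result is stored there as `target`.
        self._objects[target] = fn(self._objects[source])

    def request(self, name: str, reason: str) -> None:
        # The data scientist asks to download an object and states why.
        self.pending.append((name, reason))

    def approve(self, name: str) -> None:
        # The owner reviews the pending request and grants it.
        self.pending = [(n, r) for n, r in self.pending if n != name]
        self._approved.add(name)

    def download(self, name: str) -> Any:
        if name not in self._approved:
            raise AccessDenied(name)
        return self._objects[name]


# Data Owner: expose a dataset by name only.
owner = OwnerSession()
owner.share("ages", [34, 51, 29, 46])

# Data Scientist: compute remotely; the result stays on the owner's side.
owner.compute(lambda xs: sum(xs) / len(xs), "ages", "mean_age")

try:
    owner.download("mean_age")
    leaked = True
except AccessDenied:
    leaked = False

# Request access, wait for the owner's approval, then download the result.
owner.request("mean_age", reason="aggregate statistic for a study")
owner.approve("mean_age")
mean_age = owner.download("mean_age")
print(leaked, mean_age)
```

Note that the raw `ages` list is never approved in this sketch, so only the aggregate can be downloaded.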
The guide for contributors can be found here. It covers all that you need to know to start contributing code to PySyft today.
Also, join the rapidly growing community of 12,000+ on Slack. The Slack community is very friendly and great about quickly answering questions about the use and development of PySyft!
This software is in beta. Use at your own risk.
The PySyft 0.2.x codebase is now in its own branch here, but OpenMined will not offer official support for this version range. We have compiled a list of FAQs relating to this version.
For support in using this library, please join the #support Slack channel. Click here to join our Slack community!
We are very grateful for contributions to PySyft from the following organizations!