This is the result of a student project for the course on Blockchain and Cryptocurrencies, Master's degree in Artificial Intelligence, University of Bologna, taught by Prof. Stefano Ferretti.
Authors:
- G. Cialone
- F. Imboccioli
Original project link: https://github.com/Imbo9/fl_blockchain
This project implements federated learning on top of a blockchain to build a collaborative model for classifying MRI images of Alzheimer's patients. The primary objective is to improve the model's performance by ensembling models in the weight space of the neural networks, rather than simply averaging the output scores of different model instances.
The main advantages of this approach are twofold:
- Reduced Variance and Bias: By employing ensemble techniques in the weight space, the aggregated model achieves a more balanced trade-off between variance and bias. This leads to improved generalization and better performance on unseen data.
- Privacy and Security: Addressing privacy concerns, hospitals do not share or upload the raw datasets to the blockchain. Instead, they only share the model weights obtained from each federated learning round. This ensures that sensitive patient data remains protected.
Additionally, the approach addresses storage capacity issues as the weights are stored on IPFS (InterPlanetary File System), and only the hash of the weights is loaded onto the blockchain for aggregation.
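The idea of keeping the weights off-chain and recording only their hash can be illustrated with a minimal sketch. It is not the project's code: the two in-memory containers are hypothetical stand-ins for IPFS and for the on-chain record, and a SHA-256 hex digest plays the role of the IPFS content identifier (a real IPFS CID has a different encoding).

```python
import hashlib
import json

# Hypothetical in-memory stand-ins (assumptions, not the project's storage):
off_chain_store = {}   # hash -> serialized weights, plays the role of IPFS
on_chain_hashes = []   # only the short hashes are kept "on chain"


def serialize_weights(weights):
    """Serialize nested lists of floats to bytes in a reproducible way."""
    return json.dumps(weights, sort_keys=True).encode("utf-8")


def content_hash(payload: bytes) -> str:
    """SHA-256 hex digest, used here as a stand-in for an IPFS CID."""
    return hashlib.sha256(payload).hexdigest()


def publish(weights):
    """Store the weights off-chain and record only their hash."""
    payload = serialize_weights(weights)
    digest = content_hash(payload)
    off_chain_store[digest] = payload
    on_chain_hashes.append(digest)
    return digest


def fetch(digest):
    """Load weights by hash and check that they match the recorded hash."""
    payload = off_chain_store[digest]
    if content_hash(payload) != digest:
        raise ValueError("stored weights do not match the recorded hash")
    return json.loads(payload.decode("utf-8"))


weights = [[0.1, -0.2], [0.3]]   # toy weights: two "layers"
digest = publish(weights)
restored = fetch(digest)
```

However large the weights are, the record kept "on chain" stays a fixed-size string, and anyone who downloads the weights can recompute the hash to check that they were not altered.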
The key steps of the process are as follows:
- Hospitals participate in federated learning rounds and train their respective models locally on their datasets.
- Model weights, not raw data, are shared by the hospitals after each round.
- The shared model weights are uploaded to IPFS, and only the hash of the weights is recorded on the blockchain.
- The blockchain aggregates the weights, leading to the creation of an improved, collaborative model.
- The process is iterated over multiple rounds to continuously improve the model's performance.
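The aggregation step in the list above can be sketched as an element-wise mean of the weights received from the hospitals. This is an assumption for illustration: the text does not state the exact aggregation rule, and the function name, the optional weighting by dataset size, and the toy values are all hypothetical. Plain Python lists stand in for the network tensors so that the example has no dependencies.

```python
def average_weights(client_weights, client_sizes=None):
    """Element-wise (optionally weighted) mean of per-hospital weights.

    client_weights: one entry per hospital; each entry is a list of layers,
    and each layer is a flat list of floats.
    client_sizes: optional number of training samples per hospital, used as
    averaging weights (an assumption, not taken from the project).
    """
    if not client_weights:
        raise ValueError("need at least one set of weights")
    if client_sizes is None:
        client_sizes = [1] * len(client_weights)
    if len(client_sizes) != len(client_weights):
        raise ValueError("one size per hospital is required")

    total = float(sum(client_sizes))
    aggregated = []
    for layer_idx in range(len(client_weights[0])):
        layer_len = len(client_weights[0][layer_idx])
        layer = [
            sum(w[layer_idx][i] * s
                for w, s in zip(client_weights, client_sizes)) / total
            for i in range(layer_len)
        ]
        aggregated.append(layer)
    return aggregated


# Toy weights from two hospitals with identical architectures.
hospital_a = [[1.0, 2.0], [0.0]]
hospital_b = [[3.0, 4.0], [2.0]]

global_weights = average_weights([hospital_a, hospital_b])
weighted = average_weights([hospital_a, hospital_b], client_sizes=[3, 1])
```

The result has the same shape as each hospital's weights, so it can be loaded back into the same architecture as the starting point for the next round.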
By adopting this federated learning approach on a blockchain, hospitals can collectively benefit from a more powerful and privacy-preserving model without directly sharing sensitive data. This contributes to a better understanding and classification of Alzheimer's disease, even when dealing with diverse datasets from different hospital sources.
This setup is intended for a local simulation only.
- Ganache
- IPFS
- Miniconda
- eth-brownie
- cuda
- tensorflow
- opencv-python
- pandas
- scikit-learn
conda deactivate
conda create --name blockchain_project python=3.9
conda activate blockchain_project
python -m pip install --upgrade pip
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
pip install "tensorflow<2.11"
pip install opencv-python
pip install pandas==1.5.3
pip install eth-brownie
pip install scikit-learn
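After the installation, a quick check that the Python packages are importable in the active environment may save time later. The script below is only a convenience sketch and not part of the project; it relies on the usual import names of the packages listed above (`cv2` for opencv-python, `brownie` for eth-brownie, `sklearn` for scikit-learn).

```python
import importlib.util

# pip package name -> module name used in "import ..."
REQUIRED = {
    "tensorflow": "tensorflow",
    "opencv-python": "cv2",
    "pandas": "pandas",
    "eth-brownie": "brownie",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up without importing them, so it does not verify that TensorFlow can actually see the GPU.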
Ganache download: https://trufflesuite.com/ganache/
IPFS Desktop releases: https://github.com/ipfs/ipfs-desktop/releases
brownie networks add Ethereum fl-local host=http://127.0.0.1:7545 chainid=5777 timeout=3600
brownie networks list
brownie run .\scripts\setup.py main --network fl-local
brownie run .\scripts\setup.py --network fl-local
This is just a simulation. Because of concurrency problems when training on the same GPU, the collaborator.py script contains a loop that trains the hospital model instances one at a time, in sequence. In a real-world scenario with more than one peer, the local trainings can run at the same time and the process works in the same way.
brownie run .\scripts\collaborator.py --network fl-local
brownie run .\scripts\manager.py --network fl-local
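The remark about sequential versus simultaneous training can be illustrated with a toy sketch. It is not the content of collaborator.py: `train_local`, the one-parameter "model", and the hospital datasets are hypothetical. The point it shows is that every hospital starts the round from the same global weights, so training them one after another or at the same time yields the same local results.

```python
from concurrent.futures import ThreadPoolExecutor


def train_local(global_weight, data, lr=0.1, epochs=5):
    """Toy local training: gradient descent on the squared error
    between a single weight and the hospital's data points."""
    w = global_weight
    for _ in range(epochs):
        grad = sum(2 * (w - x) for x in data) / len(data)
        w -= lr * grad
    return w


# Hypothetical local datasets, one per hospital.
hospital_data = {
    "hospital_a": [1.0, 2.0, 3.0],
    "hospital_b": [10.0, 12.0],
    "hospital_c": [5.0],
}


def run_round_sequential(global_weight, datasets):
    """Train the hospitals one at a time, as in the simulation."""
    results = {}
    for name, data in datasets.items():
        results[name] = train_local(global_weight, data)
    return results


def run_round_parallel(global_weight, datasets):
    """Train the hospitals at the same time, as independent peers would."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(train_local, global_weight, data)
            for name, data in datasets.items()
        }
        return {name: future.result() for name, future in futures.items()}


sequential = run_round_sequential(0.0, hospital_data)
parallel = run_round_parallel(0.0, hospital_data)
```

Both dictionaries contain the same local weights, because no hospital's training depends on another hospital's result within a round.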