remove case study discussed in previous chapter and introduce one new federated case study
jasonjabbour committed Nov 18, 2024
1 parent 000bbc2 commit 12f3dd0
Showing 2 changed files with 19 additions and 15 deletions.
13 changes: 13 additions & 0 deletions contents/core/privacy_security/privacy_security.bib
@@ -683,3 +683,16 @@ @inproceedings{zhao2018fpga
year = {2018},
month = may,
}


@article{heyndrickx2023melloddy,
title = {{MELLODDY}: Cross-pharma federated learning at unprecedented scale unlocks benefits in {QSAR} without compromising proprietary information},
author = {Heyndrickx, Wouter and Mervin, Lewis and Morawietz, Tobias and Sturm, No{\'e} and Friedrich, Lukas and Zalewski, Adam and Pentina, Anastasia and Humbeck, Lina and Oldenhof, Martijn and Niwayama, Ritsuya and others},
journal = {Journal of Chemical Information and Modeling},
volume = {64},
number = {7},
pages = {2331--2344},
year = {2023},
publisher = {ACS Publications},
url = {https://pubs.acs.org/doi/10.1021/acs.jcim.3c00799},
}
21 changes: 6 additions & 15 deletions contents/core/privacy_security/privacy_security.qmd
@@ -720,9 +720,9 @@ Data minimization can be broken down into [3 categories](https://dl.acm.org/doi/

2. The data collected from users must be _relevant_ to the purpose of the data collection.

3. Users' data should be limited to only the necessary data to fulfill the purpose of the initial data collection. If similarly robust and accurate results can be obtained from a smaller dataset, any additional data beyond this smaller dataset should not be collected.
3. Users' data should be limited to only the _necessary_ data to fulfill the purpose of the initial data collection. If similarly robust and accurate results can be obtained from a smaller dataset, any additional data beyond this smaller dataset should not be collected.
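
To make the third principle concrete, here is a minimal sketch, assuming a hypothetical record schema and purpose statement, of dropping every field that is not necessary for the declared purpose before anything leaves the device. The field names and the `NECESSARY_FIELDS` set are illustrative, not drawn from the chapter.

```python
# Hypothetical data-minimization filter: keep only the fields required for
# the declared purpose (here, training a simple next-word predictor).

raw_record = {
    "user_id": "u123",                       # identifying, not needed
    "gps_location": (37.77, -122.41),        # irrelevant to the purpose
    "contact_list": ["alice", "bob"],        # irrelevant to the purpose
    "ngram_counts": {"meet me": {"at": 3}},  # the only field actually required
}

NECESSARY_FIELDS = {"ngram_counts"}  # assumed to come from the purpose statement

def minimize(record, necessary_fields=NECESSARY_FIELDS):
    """Drop every field that is not necessary for the stated purpose."""
    return {k: v for k, v in record.items() if k in necessary_fields}

print(minimize(raw_record))  # {'ngram_counts': {'meet me': {'at': 3}}}
```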

Emerging techniques like differential Privacy, federated learning, and synthetic data generation allow useful insights derived from less raw user data. Performing data flow mapping and impact assessments helps identify opportunities to minimize raw data usage.
Emerging techniques like differential privacy, federated learning, and synthetic data generation allow useful insights to be derived from less raw user data. Performing data flow mapping and impact assessments helps identify opportunities to minimize raw data usage.
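
As a brief illustration of how one of these techniques limits reliance on raw data, the sketch below applies the Laplace mechanism, a basic differential privacy primitive, to a counting query. The query, the epsilon value, and the sample data are assumptions for illustration only.

```python
import numpy as np

def dp_count(flags, epsilon=1.0):
    """Differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one user changes
    the count by at most 1), so Laplace noise with scale 1/epsilon bounds
    how much the released number can reveal about any single record.
    """
    true_count = sum(bool(f) for f in flags)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Release how many users enabled a feature without exposing who did.
opted_in = [True, False, True, True, False]
print(dp_count(opted_in, epsilon=0.5))
```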

Methodologies like Privacy by Design [@cavoukian2009privacy] consider such minimization early in system architecture. Regulations like GDPR also mandate data minimization principles. With a multilayered approach across legal, technical, and process realms, data minimization limits risks in embedded ML products.

@@ -884,23 +884,14 @@ There are several system performance-related aspects of FL in machine learning s

**Energy Consumption:** The energy consumption of client devices is a critical factor in FL, particularly for battery-powered devices like smartphones and other TinyML/IoT devices. The computational demands of local training can cause significant battery drain, which may discourage continued participation in the FL process. Balancing the computational requirements of model training with energy efficiency is therefore essential: algorithms and training processes must be optimized to reduce energy consumption while still achieving effective learning outcomes. Energy-efficient operation is key to user acceptance and the long-term sustainability of FL systems.
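
One common way to strike this balance is to let each client gate its own participation on device state. The sketch below is a hypothetical eligibility check, with illustrative thresholds and field names, not a prescribed policy from any particular FL framework.

```python
def should_join_round(battery_level, is_charging, is_idle, min_battery=0.4):
    """Hypothetical client-side check for joining a federated training round:
    train only when the device is charging, or idle with enough battery
    headroom, so local training does not degrade the user experience."""
    return is_charging or (is_idle and battery_level >= min_battery)

# A phone at 30% battery, unplugged and in active use, sits this round out.
print(should_join_round(battery_level=0.30, is_charging=False, is_idle=False))  # False
# The same phone charging overnight would participate.
print(should_join_round(battery_level=0.30, is_charging=True, is_idle=True))    # True
```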

#### Case Studies
#### Case Study: Federated Learning for Collaborative Healthcare Datasets

Here are a couple of real-world case studies that can illustrate the use of federated learning:
In healthcare and pharmaceuticals, organizations often hold vast amounts of valuable data, but sharing it directly is fraught with challenges. Strict regulations like GDPR and HIPAA, along with concerns about protecting intellectual property, make combining datasets across companies nearly impossible. Yet collaboration remains essential for advancing fields like drug discovery and patient care. Federated learning offers a practical solution: companies collaboratively train machine learning models without ever sharing their raw data, so each organization retains full control of its data while still benefiting from the collective insights of the group.

##### Google Gboard
The MELLODDY project, a landmark initiative in Europe, exemplifies how federated learning can overcome these barriers [@heyndrickx2023melloddy]. MELLODDY brought together ten pharmaceutical companies to create the largest shared chemical compound library ever assembled, encompassing over 21 million molecules and 2.6 billion experimental data points. Despite working with sensitive and proprietary data, the companies securely collaborated to improve predictive models for drug development.

Google uses federated learning to improve predictions on its Gboard mobile keyboard app. The app runs a federated learning algorithm on users' devices to learn from their local usage patterns and text predictions while keeping user data private. The model updates are aggregated in the cloud to produce an enhanced global model. This allows for providing next-word predictions personalized to each user's typing style while avoiding directly collecting sensitive typing data. Google reported that the federated learning approach reduced prediction errors by 25% compared to the baseline while preserving Privacy.
The results were remarkable. By pooling insights through federated learning, each company significantly enhanced its ability to identify promising drug candidates. Predictive accuracy improved while the models also gained broader applicability to diverse datasets. MELLODDY demonstrated that federated learning not only preserves privacy but also unlocks new opportunities for innovation by enabling large-scale, data-driven collaboration. This approach highlights a future where companies can work together to solve complex problems without sacrificing data security or ownership.
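
To ground the pattern, the sketch below shows federated averaging with three parties fitting a toy linear model: each site trains on its own private data and shares only model weights, which a coordinator averages into a new global model. This is a deliberately simplified illustration and does not reflect MELLODDY's actual platform, model architectures, or secure aggregation protocol.

```python
import numpy as np

def local_update(global_w, local_data, lr=0.1):
    """One local training step on a participant's private data.
    The 'model' is a linear weight vector and the update is a single gradient
    step on a least-squares loss -- a toy stand-in for far larger QSAR models."""
    X, y = local_data
    grad = X.T @ (X @ global_w - y) / len(y)
    return global_w - lr * grad

def federated_average(client_weights):
    """Coordinator-side aggregation: average model weights, never raw data."""
    return np.mean(client_weights, axis=0)

# Three 'companies' with private synthetic datasets; only weights leave each site.
rng = np.random.default_rng(0)
datasets = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
global_w = np.zeros(3)
for _ in range(10):  # ten federated rounds
    client_weights = [local_update(global_w, data) for data in datasets]
    global_w = federated_average(client_weights)
print(global_w)  # weights informed by all three datasets, none of them shared
```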

##### Healthcare Research

The UK Biobank and American College of Cardiology combined datasets to train a model for heart arrhythmia detection using federated learning. The datasets could not be combined directly due to legal and Privacy restrictions. Federated learning allowed collaborative model development without sharing protected health data, with only model updates exchanged between the parties. This improved model accuracy as it could leverage a wider diversity of training data while meeting regulatory requirements.

##### Financial Services

Banks are exploring using federated learning for anti-money laundering (AML) detection models. Multiple banks could jointly improve AML Models without sharing confidential customer transaction data with competitors or third parties. Only the model updates need to be aggregated rather than raw transaction data. This allows access to richer training data from diverse sources while avoiding regulatory and confidentiality issues around sharing sensitive financial customer data.

These examples demonstrate how federated learning provides tangible privacy benefits and enables collaborative ML in settings where direct data sharing is impossible.

### Machine Unlearning

