add security and privacy chapter initial draft #66
Conversation
Co-Authored-By: eurashin <[email protected]>
Co-Authored-By: ELSuitorHarvard <[email protected]>
Co-Authored-By: Elias Nuwara <[email protected]>
Looks great overall, just a few comments. I stopped flagging the combined-word fixes once I realized it was a systemic problem. I wonder if someone's editor handled line breaks incorrectly; someone may need to go through this more thoroughly to fix these. Not a deal breaker though, it's still very clear. Good job!
privacy_security.qmd
Outdated
In this chapter, we will be talking about security and privacy together, so there are key terms that we need to be clear about.

- **Privacy:** For instance, when a fitness tracker collects data about your daily activities, privacy concerns revolve around who else can access this data---whether it's just the company, the user, or unwanted third parties as well.
Since the other three examples are all about the same security scenario, it makes sense for this one to use the same example, or else to make them all different. This one just feels a bit inconsistent with the rest!
privacy_security.qmd
Outdated
### Mirai Botnet

The Mirai botnet involved the infection of networked devices such as digital cameras and DVR players [@antonakakis2017understanding]. In October 2016, the botnet was used to conduct one of the largest DDoS attacks ever, disrupting internet access across the United States. The attack was possible because many devices used default usernames and passwords, which were easily exploited by the Mirai malware to control the devices.
Maybe add a link to more context on what a DDoS attack is. Perhaps https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/ ?
privacy_security.qmd
Outdated
The methodology of model inversion typically involves the following steps:

- **Accessing Model Outputs:** The attacker queries the ML model withinput data and observes the outputs. This is often done through alegitimate interface, like a public API.
Suggested change:
From: - **Accessing Model Outputs:** The attacker queries the ML model withinput data and observes the outputs. This is often done through alegitimate interface, like a public API.
To: - **Accessing Model Outputs:** The attacker queries the ML model with input data and observes the outputs. This is often done through a legitimate interface, such as a public API.
Yea, this was me - oops. I did a big search and replace for some line breaks and messed it up and thought I had fixed it all. Thanks for catching these and noticing them. Will update.
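To make the quoted step concrete, here is a minimal sketch of what "accessing model outputs" can look like from the attacker's side. The `query_api` wrapper and the scikit-learn victim model are hypothetical stand-ins I'm assuming for illustration, not anything from the chapter:

```python
# Minimal sketch of the "accessing model outputs" step of model inversion.
# `query_api` stands in for any public endpoint that returns confidence scores.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
victim = LogisticRegression(max_iter=1000).fit(X, y)

def query_api(inputs):
    # Stand-in for a legitimate public API: returns per-class confidence
    # scores, which is the signal a model-inversion attacker collects.
    return victim.predict_proba(inputs)

# The attacker probes the model with chosen inputs and records the outputs,
# which can later be mined for information about the training data.
probes = np.random.default_rng(0).uniform(X.min(0), X.max(0), size=(100, 4))
confidences = query_api(probes)
print(confidences[:3])
```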
privacy_security.qmd
Outdated
In these attacks, the objective is to extract information about concrete metrics, such as the learned parameters of a network, the fine-tuned hyperparameters, and the model's internal layer architecture [@oliynyk2023know].

- **Learned Parameters:** adversaries aim to steal the learnedknowledge (weights and biases) of a model in order to replicateit. Parameter theft is generally used in conjunction with otherattacks, such as architecture theft, which lacks parameterknowledge.
Suggested change:
From: - **Learned Parameters:** adversaries aim to steal the learnedknowledge (weights and biases) of a model in order to replicateit. Parameter theft is generally used in conjunction with otherattacks, such as architecture theft, which lacks parameterknowledge.
To: - **Learned Parameters:** adversaries aim to steal the learned knowledge (weights and biases) of a model in order to replicate it. Parameter theft is generally used in conjunction with other attacks, such as architecture theft, which lacks parameter knowledge.
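For anyone who wants to see what this kind of theft looks like end to end, here is a hedged sketch of extraction through query access. The victim and surrogate models are toy stand-ins I'm assuming for illustration; the point is that the attacker never sees the victim's weights, only its outputs:

```python
# Sketch of model extraction via query access: the attacker trains a
# surrogate on (input, stolen label) pairs obtained from the victim.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
victim = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                       random_state=0).fit(X, y)

# The attacker queries the victim on its own unlabeled inputs...
attacker_X = np.random.default_rng(1).normal(size=(2000, 10))
stolen_labels = victim.predict(attacker_X)

# ...and fits a surrogate that approximates the stolen knowledge.
surrogate = LogisticRegression(max_iter=1000).fit(attacker_X, stolen_labels)
agreement = (surrogate.predict(X) == victim.predict(X)).mean()
print(f"surrogate agrees with victim on {agreement:.0%} of inputs")
```

Note this recovers the victim's behavior rather than its exact weights; exact parameter theft generally needs architecture knowledge as well, as the quoted bullet says.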
privacy_security.qmd
Outdated
Data poisoning can degrade the accuracy of a model, force it to make incorrect predictions or cause it to behave unpredictably. In critical applications like healthcare, such alterations can lead to significant trust and safety issues.

There are four main categories of data poisoning [@oprea2022poisoning]:
Looks like six categories below?
Thank you good sir! 🙏 Fixed.
The researchers added synthetically generated toxic comments with slight misspellings and grammatical errors to the model's training data. This slowly corrupted the model, causing it to misclassify increasing numbers of severely toxic inputs as non-toxic over time.

After retraining on the poisoned data, the model's false negative rate increased from 1.4% to 27% - allowing extremely toxic comments to bypass detection. The researchers warned this stealthy data poisoning could enable the spread of hate speech, harassment, and abuse if deployed against real moderation systems.
Wow this is super cool/effective/scary!
Indeed, it is nuts! Some neat works out there.
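For intuition, here is a toy label-flipping simulation of the same failure mode. This is not the researchers' actual setup (they used synthetically generated toxic comments), and the numbers it prints will differ from the cited 1.4% to 27% jump, but it shows how poisoned training labels inflate the false negative rate:

```python
# Toy simulation: relabel a slice of "toxic" (class 1) training examples as
# benign and measure the effect on the false negative rate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=4000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_tr, y_tr, X_te, y_te = X[:3000], y[:3000], X[3000:], y[3000:]

def false_negative_rate(model):
    # Fraction of truly toxic test inputs the model lets through as benign.
    pred = model.predict(X_te)
    toxic = y_te == 1
    return ((pred == 0) & toxic).sum() / toxic.sum()

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Poison: flip the labels of a third of the toxic training points.
y_poisoned = y_tr.copy()
toxic_idx = np.flatnonzero(y_tr == 1)
y_poisoned[toxic_idx[: len(toxic_idx) // 3]] = 0
poisoned = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)

print(f"clean FNR:    {false_negative_rate(clean):.1%}")
print(f"poisoned FNR: {false_negative_rate(poisoned):.1%}")
```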
- **Benefit:** The end result is a machine learning model that has learned from a wide range of patient data without any of that sensitive data having to be shared or leave its original location.

#### Trade-offs
Might be worth calling out that FL is still vulnerable to some types of attacks like data poisoning?
Good point! Thanks @arbass22 I've added it. Will merge.
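For readers new to federated learning, here is a minimal FedAvg sketch under an assumed toy linear-regression setup: clients fit locally and only parameter updates leave the device. It also makes the vulnerability flagged above easy to see, since a malicious client could return arbitrary local weights and skew the average:

```python
# Minimal FedAvg sketch (toy linear regression, NumPy only): each client
# runs local gradient steps on its private data, and the server averages
# the resulting weights. Raw data never leaves a client, yet a poisoned
# client update would still flow into the averaged global model.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def client_data(n=100):
    X = rng.normal(size=(n, 2))
    return X, X @ true_w + rng.normal(scale=0.1, size=n)

def local_train(w, X, y, lr=0.05, steps=20):
    # Full-batch gradient descent on mean squared error.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

clients = [client_data() for _ in range(5)]
w_global = np.zeros(2)
for _ in range(10):
    # Each client improves the current global model on its private data...
    local_ws = [local_train(w_global, X, y) for X, y in clients]
    # ...and the server averages the returned weights (FedAvg).
    w_global = np.mean(local_ws, axis=0)

print("federated estimate:", w_global.round(2), "true:", true_w)
```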
Co-Authored-By: ELSuitorHarvard <[email protected]>