
add security and privacy chapter initial draft #66

Merged: 43 commits into harvard-edge:main on Nov 30, 2023

Conversation

@eliasab16 (Contributor) commented Nov 21, 2023

Before submitting your Pull Request, please ensure that you have carefully reviewed and completed all items on this checklist.

  1. Content

    • The chapter content is complete and covers the topic in detail.
    • All technical terms are well-defined and explained.
    • Any code snippets or algorithms are well-documented and tested.
    • The chapter follows a logical flow and structure.
  2. References & Citations

    • All references are correctly listed at the end of the chapter.
    • In-text citations are used appropriately and match the references.
    • All figures, tables, and images have proper sources and are cited correctly.
  3. Quarto Website Rendering

    • The chapter has been locally built and tested using Quarto.
    • All images, figures, and tables render properly without any glitches.
    • All images have a source or they are properly linked to external sites.
    • Any interactive elements or widgets work as intended.
    • The chapter's formatting is consistent with the rest of the book.
  4. Grammar & Style

    • The chapter has been proofread for grammar and spelling errors.
    • The writing style is consistent with the rest of the book.
    • Any jargon is clearly explained or avoided where possible.
  5. Collaboration

    • All group members have reviewed and approved the chapter.
    • Any feedback from previous reviews or discussions has been addressed.
  6. Miscellaneous

    • All external links (if any) are working and lead to the intended destinations.
    • If datasets or external resources are used, they are properly credited and linked.
    • Any necessary permissions for reused content have been obtained.
  7. Final Steps

    • The chapter is pushed to the correct branch on the repository.
    • The Pull Request is made with a clear title and description.
    • The Pull Request includes any necessary labels or tags.
    • The Pull Request mentions any stakeholders or reviewers who should take a look.

@profvjreddi profvjreddi marked this pull request as draft November 21, 2023 02:54
@arbass22 (Contributor) left a comment:
Looks great overall, just a few comments. I stopped noting the combined-word fixes once I realized it was a systemic problem. I wonder if someone's editor handled line breaks incorrectly; someone might need to go through this more thoroughly to fix these. Not a deal breaker though, it's still very clear. Good job!


In this chapter, we will be talking about security and privacy together, so there are key terms that we need to be clear about.

- **Privacy:** For instance, when a fitness tracker collects data about your daily activities, privacy concerns revolve around who else can access this data---whether it's just the company, the user, or unwanted third parties as well.
Contributor:

Since the other three examples are about the same security scenario, it makes sense to have this one use the same example, or to make them all different examples. This one just feels a bit inconsistent with the rest!


### Mirai Botnet

The Mirai botnet involved the infection of networked devices such as digital cameras and DVR players [@antonakakis2017understanding]. In October 2016, the botnet was used to conduct one of the largest DDoS attacks ever, disrupting internet access across the United States. The attack was possible because many devices used default usernames and passwords, which were easily exploited by the Mirai malware to control the devices.
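The default-credential weakness Mirai exploited can be illustrated with a minimal audit sketch. The credential pairs and the `audit_device` helper below are hypothetical examples of common factory defaults, not Mirai's actual dictionary:

```python
# Hypothetical sketch: Mirai-style malware tried a short dictionary of
# factory-default username/password pairs, and any device still using
# one was trivially taken over. Auditing for that weakness is simple.
DEFAULT_CREDENTIALS = [
    ("admin", "admin"),
    ("root", "root"),
    ("admin", "1234"),
    ("user", "user"),
]

def audit_device(username: str, password: str) -> bool:
    """Return True if the device still uses a known factory default."""
    return (username, password) in DEFAULT_CREDENTIALS

# A DVR shipped with admin/admin and never reconfigured is flagged.
print(audit_device("admin", "admin"))         # True
print(audit_device("admin", "x9!k2-unique"))  # False
```

The same check, run across a fleet of devices before deployment, is the kind of hygiene step that would have blunted the attack.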
Contributor:

Maybe add a link to more context on what a DDoS attack is. Perhaps https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/ ?


The methodology of model inversion typically involves the following steps:

- **Accessing Model Outputs:** The attacker queries the ML model withinput data and observes the outputs. This is often done through alegitimate interface, like a public API.
Contributor:

Suggested change
- **Accessing Model Outputs:** The attacker queries the ML model withinput data and observes the outputs. This is often done through alegitimate interface, like a public API.
- **Accessing Model Outputs:** The attacker queries the ML model with input data and observes the outputs. This is often done through a legitimate interface, such as a public API.
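The query step described in that bullet can be sketched as below. `target_model` is a local stand-in, since a real attacker would be calling a deployed model's public prediction API instead:

```python
import random

# Sketch of the "accessing model outputs" step of model inversion:
# the attacker sends inputs and records the model's outputs.
def target_model(x: float) -> float:
    # Stand-in for a deployed model's API: returns the confidence
    # score for a "positive" classification (a logistic curve here).
    return 1.0 / (1.0 + 2.718281828 ** (-x))

random.seed(0)  # reproducible query set
queries = [random.uniform(-3, 3) for _ in range(100)]
observations = [(x, target_model(x)) for x in queries]

# The attacker now holds input/output pairs to analyze offline.
print(len(observations))  # 100
```

Nothing in this step looks abnormal to the service; each query is an ordinary, legitimate-looking API call, which is what makes this phase hard to detect.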

Contributor:

Yea, this was me - oops. I did a big search and replace for some line breaks and messed it up and thought I had fixed it all. Thanks for catching these and noticing them. Will update.


In these attacks, the objective is to extract information about concrete metrics, such as the learned parameters of a network, the fine-tuned hyperparameters, and the model's internal layer architecture [@oliynyk2023know].

- **Learned Parameters:** adversaries aim to steal the learnedknowledge (weights and biases) of a model in order to replicateit. Parameter theft is generally used in conjunction with otherattacks, such as architecture theft, which lacks parameterknowledge.
Contributor:

Suggested change
- **Learned Parameters:** adversaries aim to steal the learnedknowledge (weights and biases) of a model in order to replicateit. Parameter theft is generally used in conjunction with otherattacks, such as architecture theft, which lacks parameterknowledge.
- **Learned Parameters:** adversaries aim to steal the learned knowledge (weights and biases) of a model in order to replicate it. Parameter theft is generally used in conjunction with other attacks, such as architecture theft, which lacks parameter knowledge.
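Parameter theft can be sketched as fitting a surrogate to the target's query responses. The `target` below is a toy one-dimensional linear model, not a real network; the point is only that its learned parameters are recoverable from input/output pairs alone:

```python
# Sketch of parameter theft by surrogate fitting: query the target,
# then fit a local model that replicates its learned mapping.
def target(x: float) -> float:
    return 2.0 * x + 1.0  # the "learned knowledge": weight 2.0, bias 1.0

xs = [i / 10 for i in range(-50, 51)]
ys = [target(x) for x in xs]

# Least-squares fit of a surrogate w*x + b to the observed pairs.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b = mean_y - w * mean_x
print(round(w, 3), round(b, 3))  # recovers ~2.0 and ~1.0
```

Against a real network the surrogate would itself be a network trained on the stolen input/output pairs, but the principle is the same: enough queries pin down the parameters without ever touching the original weights.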


Data poisoning can degrade the accuracy of a model, force it to make incorrect predictions, or cause it to behave unpredictably. In critical applications like healthcare, such alterations can lead to significant trust and safety issues.

There are four main categories of data poisoning [@oprea2022poisoning]:
Contributor:

Looks like six categories below?

Contributor:

Thank you good sir! 🙏 Fixed.


The researchers added synthetically generated toxic comments with slight misspellings and grammatical errors to the model's training data. This slowly corrupted the model, causing it to misclassify increasing numbers of severely toxic inputs as non-toxic over time.

After retraining on the poisoned data, the model's false negative rate increased from 1.4% to 27%, allowing extremely toxic comments to bypass detection. The researchers warned that this stealthy data poisoning could enable the spread of hate speech, harassment, and abuse if deployed against real moderation systems.
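The mechanism can be shown with a toy sketch. The scores and the threshold classifier below are synthetic, not the study's actual setup; they only illustrate how mislabeled training points shift a decision boundary so toxic inputs start passing as non-toxic:

```python
# Toy sketch of label-flipping data poisoning against a 1-D
# toxicity-score classifier with a midpoint decision threshold.
def fit_threshold(samples):
    toxic = [s for s, label in samples if label == "toxic"]
    clean = [s for s, label in samples if label == "non-toxic"]
    # Flag as "toxic" anything above the midpoint of the class means.
    return (sum(toxic) / len(toxic) + sum(clean) / len(clean)) / 2

clean_data = [(0.9, "toxic")] * 10 + [(0.1, "non-toxic")] * 10
t = fit_threshold(clean_data)  # 0.5: a comment scoring 0.6 is flagged

# Poison: toxic-looking samples (score 0.8) mislabeled as non-toxic.
poison = [(0.8, "non-toxic")] * 10
t_poisoned = fit_threshold(clean_data + poison)  # ~0.675

# A toxic comment scoring 0.6 evaded detection after poisoning.
print(t < 0.6 < t_poisoned)  # True
```

Each poisoned point looks individually plausible, which is exactly why this kind of slow corruption is hard to spot during retraining.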
Contributor:

Wow this is super cool/effective/scary!

Contributor:

Indeed, it is nuts! Some neat works out there.


- **Benefit:** The end result is a machine learning model that has learned from a wide range of patient data without any of that sensitive data having to be shared or leave its original location.
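The aggregation step at the heart of this can be sketched as simple federated averaging. The weight vectors below are synthetic toy values standing in for each hospital's locally trained model:

```python
# Minimal federated-averaging sketch: each client trains locally and
# shares only its model weights; the server averages them element-wise.
def federated_average(client_weights):
    n_clients = len(client_weights)
    n_params = len(client_weights[0])
    return [
        sum(w[i] for w in client_weights) / n_clients
        for i in range(n_params)
    ]

# Three hospitals report locally trained weight vectors;
# the raw patient records never leave each hospital.
updates = [[0.2, 1.0], [0.4, 0.8], [0.6, 1.2]]
global_model = federated_average(updates)
print([round(w, 3) for w in global_model])  # [0.4, 1.0]
```

The averaged global model is then sent back to the clients for the next round of local training, so only parameters ever cross the network.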

#### Trade-offs
Contributor:

Might be worth calling out that FL is still vulnerable to some types of attacks like data poisoning?

Contributor:

Good point! Thanks @arbass22 I've added it. Will merge.

@mpstewart1 mpstewart1 marked this pull request as ready for review November 28, 2023 22:51
privacy_security.qmd: 10 outdated review threads, all resolved
@mpstewart1 mpstewart1 merged commit 9df6390 into harvard-edge:main Nov 30, 2023
2 checks passed
5 participants