Adversarial attacks exploit the way ML models learn and make decisions during inference.

Adversarial attacks fall under different scenarios:

* **Whitebox Attacks:** the attacker possesses full knowledge of the target model's internal workings, including the training data, parameters, and architecture [@ye2021thundernna]. This comprehensive access creates favorable conditions for an attacker to exploit the model's vulnerabilities. The attacker can take advantage of specific and subtle weaknesses to craft effective adversarial examples (a minimal gradient-based sketch follows this list).

* **Blackbox Attacks:** in contrast to whitebox attacks, the attacker has little to no knowledge of the target model [@guo2019simple]. To carry out the attack, the adversarial actor must rely on careful observation of the model's output behavior.

* **Greybox Attacks:** these attacks fall in between blackbox and whitebox attacks. The attacker has only partial knowledge about the target model's internal design [@xu2021grey]. For example, the attacker could have knowledge about the training data but not the architecture or parameters. In the real world, practical attacks fall under both blackbox and greybox scenarios.

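To make the whitebox setting concrete, the sketch below uses the fast gradient sign method (FGSM), a canonical whitebox technique that is not covered in this section: with access to the model's gradients, the input is nudged in the direction that most increases the loss. This is a minimal sketch; the PyTorch model, input tensor, label, and epsilon budget are placeholder assumptions rather than details from the text.

```python
# Illustrative whitebox attack (FGSM-style): requires gradient access to the model.
# The model, input, and label below are toy placeholders.
import torch
import torch.nn as nn

def fgsm_attack(model, x, label, epsilon=0.03):
    """Perturb x one step in the direction of the loss gradient's sign."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), label)
    loss.backward()
    # Single L-infinity-bounded step that increases the classification loss.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

# Toy usage with a placeholder linear classifier and a random "image".
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)    # stand-in input scaled to [0, 1]
label = torch.tensor([3])       # stand-in ground-truth class
x_adv = fgsm_attack(model, x, label)
print((x_adv - x).abs().max())  # perturbation magnitude stays within epsilon
```
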
The landscape of machine learning models is both complex and broad, especially given their relatively recent integration into commercial applications. This rapid adoption, while transformative, has brought to light numerous vulnerabilities within these models. Consequently, a diverse array of adversarial attack methods has emerged, each strategically exploiting different aspects of different models. Below, we highlight a subset of these methods, showcasing the multifaceted nature of adversarial attacks on machine learning models:

* **Generative Adversarial Networks (GANs)** are deep learning models that consist of two networks competing against each other: a generator and a discriminator [@goodfellow2020generative]. The generator tries to synthesize realistic data, while the discriminator evaluates whether it is real or fake. GANs can be used to craft adversarial examples. The generator network is trained to produce inputs that are misclassified by the target model. These GAN-generated images can then be used to attack a target classifier or detection model. The generator and the target model are engaged in a competitive process, with the generator continually improving its ability to create deceptive examples and the target model enhancing its resistance to such examples. GANs provide a powerful framework for crafting complex and diverse adversarial inputs, illustrating the adaptability of generative models in the adversarial landscape (a brief code sketch of this dynamic follows this list).

* **Transfer Learning Adversarial Attacks** exploit the knowledge transferred from a pre-trained model to a target model, enabling the creation of adversarial examples that can deceive both models. These attacks pose a growing concern, particularly when adversaries have knowledge of the feature extractor but lack access to the classification head (the part or layer that is responsible for making the final classifications). Referred to as "headless attacks," these transferable adversarial strategies leverage the expressive capabilities of feature extractors to craft perturbations while being oblivious to the label space or training data. The existence of such attacks underscores the importance of developing robust defenses for transfer learning applications, especially since pre-trained models are commonly used [@ahmed2020headless].

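As a hedged illustration of the competitive generator-versus-target dynamic described above, the sketch below trains a small generator to emit bounded perturbations that a frozen target classifier misclassifies; the discriminator that would enforce realism in a full GAN is omitted for brevity. The networks, data, perturbation bound, and loop length are illustrative assumptions, not a specific published method.

```python
# Sketch: a generator learns perturbations that a frozen target model misclassifies.
# All networks, data, and hyperparameters are toy placeholders.
import torch
import torch.nn as nn

target = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # frozen "victim" model
for p in target.parameters():
    p.requires_grad_(False)

generator = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 28 * 28), nn.Tanh())
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

x = torch.rand(64, 1, 28, 28)    # stand-in clean images
y = torch.randint(0, 10, (64,))  # stand-in true labels

for step in range(100):
    delta = 0.05 * generator(x).view_as(x)  # small, bounded perturbation
    logits = target(x + delta)
    # Minimizing the negative loss pushes the target toward misclassification.
    loss = -nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```
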
#### Mechanisms of Adversarial Attacks

Data poisoning is an attack where the training data is tampered with, leading to a compromised model.

The process usually involves the following steps:

* **Injection:** The attacker adds incorrect or misleading examples into the training set. These examples are often designed to look normal under cursory inspection but have been carefully crafted to disrupt the learning process (the full inject-train-deploy flow is sketched in code after this list).

* **Training:** The ML model trains on this manipulated dataset and develops a skewed understanding of the underlying data patterns.

* **Deployment:** Once the model is deployed, the corrupted training leads to flawed decision-making or predictable vulnerabilities the attacker can exploit.

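As referenced in the injection step above, the sketch below walks through the inject-train-deploy flow using simple label flipping on a toy dataset; the dataset, model, poisoned class, and flip fraction are illustrative assumptions rather than details from the text.

```python
# Sketch of the inject -> train -> deploy flow using label flipping.
# Dataset, model, and the poisoned class are illustrative choices.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Injection: relabel half of the digit-7 training samples as digit 1.
y_poisoned = y_train.copy()
sevens = np.where(y_train == 7)[0]
flipped = np.random.default_rng(0).choice(sevens, size=len(sevens) // 2, replace=False)
y_poisoned[flipped] = 1

# 2. Training: the model learns from the tampered labels.
clean_model = LogisticRegression(max_iter=2000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=2000).fit(X_train, y_poisoned)

# 3. Deployment: accuracy on the targeted class typically degrades.
mask = y_test == 7
print("clean accuracy on 7s:   ", clean_model.score(X_test[mask], y_test[mask]))
print("poisoned accuracy on 7s:", poisoned_model.score(X_test[mask], y_test[mask]))
```
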
The impacts of data poisoning extend beyond just classification errors or accuracy drops. In critical applications like healthcare, such alterations can lead to significant trust and safety issues [@marulli2022sensitivity]. Later on, we will discuss a few case studies of these issues.

There are six main categories of data poisoning [@oprea2022poisoning]:

* **Availability Attacks:** these attacks aim to compromise the overall functionality of a model. They cause it to misclassify most testing samples, rendering the model unusable for practical applications. An example is label flipping, where labels of a specific, targeted class are replaced with labels from a different one.

* **Targeted Attacks:** in contrast to availability attacks, targeted attacks aim to compromise only a small number of the testing samples. The effect is localized to a limited number of classes, while the model maintains its original level of accuracy for the majority of the classes. The targeted nature of the attack requires the attacker to possess knowledge of the model's classes. It also makes detecting these attacks more challenging.

* **Backdoor Attacks:** in these attacks, an adversary targets specific patterns in the data. The attacker introduces a backdoor (a malicious, hidden trigger or pattern) into the training data, for example by manipulating certain features in structured data or a pattern of pixels at a fixed position in images. This causes the model to associate the malicious pattern with specific labels. As a result, when the model encounters test samples that contain the malicious pattern, it makes false predictions (a minimal sketch follows this list).

* **Subpopulation Attacks:** here, attackers selectively compromise a subset of the testing samples while maintaining accuracy on the rest of the samples. You can think of these attacks as a combination of availability and targeted attacks: performing availability attacks (performance degradation) within the scope of a targeted subset. Although subpopulation attacks may seem very similar to targeted attacks, the two have clear differences:

* **Scope:** while targeted attacks affect a selected set of samples, subpopulation attacks target a general subpopulation with similar feature representations. For example, in a targeted attack, an actor inserts manipulated images of a 'speed bump' warning sign (with carefully crafted perturbations or patterns), which causes an autonomous car to fail to recognize the sign and therefore fail to slow down. On the other hand, manipulating all samples of people with a British accent so that a speech recognition model misclassifies a British person's speech is an example of a subpopulation attack.

* **Knowledge:** while targeted attacks require a high degree of familiarity with the data, subpopulation attacks require less intimate knowledge to be effective.

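The backdoor idea above can be made concrete with a short sketch that stamps a small pixel trigger onto a fraction of training images and relabels them with the attacker's chosen class; the image shapes, trigger placement, poison fraction, and target label are all assumptions for illustration.

```python
# Sketch of planting a pixel-pattern backdoor in image training data.
# Shapes, trigger position, poison fraction, and the target label are illustrative.
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 28, 28))      # stand-in training images in [0, 1]
labels = rng.integers(0, 10, size=1000)  # stand-in labels

def stamp_trigger(img, size=3, value=1.0):
    """Place a small bright square (the hidden trigger) in the bottom-right corner."""
    img = img.copy()
    img[-size:, -size:] = value
    return img

poison_frac, target_label = 0.05, 0      # attacker's choices
poison_idx = rng.choice(len(images), size=int(poison_frac * len(images)), replace=False)
for i in poison_idx:
    images[i] = stamp_trigger(images[i])  # embed the trigger pattern
    labels[i] = target_label              # tie the trigger to the attacker's label

# A model trained on (images, labels) behaves normally on clean inputs but tends to
# predict target_label for any test image stamped with the same trigger patch.
```
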
The characteristics of data poisoning include:
