Tutorials

moo_hax edited this page Apr 30, 2021 · 8 revisions

MNIST

Counterfit works to keep the target in focus for the user and tries to provide a uniform interface to the underlying frameworks. However, understanding how to build a target class is important for successful use. As a warmup, we will build a target for everyone’s favorite ML example, an MNIST digit classifier.

  1. Start Counterfit and execute the new command. Enter a name and select images as the data type.
  2. Find the new target folder in counterfit/targets, and open the new target python file in your preferred code editor.
  3. Fill out the required target properties.
  • model_name was taken care of by the new command.
  • model_data_type was taken care of by the new command.
  • model_endpoint is where Counterfit will collect outputs from the target model. We will use the "mnist_sklearn_pipeline.pkl" pre-trained model found in the tutorial folder.
  • model_input_shape is the input shape of the target model, which is a known "(1, 28, 28)"
  • model_output_classes are the output classes of the model, which are a known "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]"

After filling in the blanks with the information above, the Tutorial target should look like the following:

from counterfit.base import AbstractTarget

class Tutorial(AbstractTarget):
    model_name = 'tutorial'
    model_data_type = 'numpy'
    model_endpoint = 'counterfit/targets/tutorial/mnist_sklearn_pipeline.pkl'
    model_input_shape = (1, 28, 28)
    model_output_classes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

With the required properties in place, we can start implementing functionality, starting with loading resources.

3. Paste the __init__ function below in the Tutorial class.

The __init__ function is responsible for loading the resources required for attacking a target model. This includes sample data (X), models, or anything else you may need. At a minimum, Counterfit expects X to be set.

def __init__(self):
  with open(self.model_endpoint, "rb") as f:
    self.model = pickle.load(f)

  self.data_file = 'counterfit/targets/tutorial/mnist_784.npz'
  self.sample_data = np.load(self.data_file, allow_pickle=True)

  self.X = self.sample_data['X']

Because this is a local model, we first load the model to expose the predict function that Counterfit will use to interact with the target model. Next, we load sample data. X is a list of lists, where each list is an array containing a processed sample. The data for the tutorial is in a nice tidy numpy zip file, but most targets will require additional processing to get X.
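To make the "nice tidy numpy zip file" concrete, the sketch below writes a small stand-in archive and reads it back with the same loading pattern as __init__. The file name, sample count, and random pixel values are assumptions for illustration only; just the 'X' key mirrors the tutorial, and the real mnist_784.npz may be laid out differently.

```python
import os
import tempfile

import numpy as np

# Stand-in data: 5 fake "images" flattened to 784 values each.
# This is an assumption; the real tutorial file may use another layout.
rng = np.random.default_rng(0)
fake_X = rng.random((5, 784))

with tempfile.TemporaryDirectory() as tmp_dir:
    data_file = os.path.join(tmp_dir, "example_samples.npz")  # hypothetical file name
    np.savez(data_file, X=fake_X)

    # Same loading pattern as the __init__ function above.
    with np.load(data_file, allow_pickle=True) as sample_data:
        X = sample_data["X"]

print(X.shape)  # (5, 784)
```

Checking X.shape right after loading is a cheap way to confirm the samples match what the target expects before any attack is run.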

Excellent, we now have samples and a model to attack. Next, we will build the function Counterfit will use to submit samples to the target model.

4. Paste the following code below the __init__ function.

def __call__(self, x):
  scores = self.model.predict_proba(x)
  return (scores.tolist())

During attack runtime an attack algorithm will use the __call__ function to submit samples to the target model via x. x is a perturbed sample of shape (Batch, Channels, Height, Width). Channels, Height, and Width are derived from the model_input_shape that was defined earlier.

What is "Batch" and why only 1? A batch is the number of samples submitted to the target model at once. Conventionally, attack algorithms work on multiple samples at the same time. Counterfit, however, was designed for penetration testing and red teaming, where submitting large amounts of data by default is not good practice. So for now, Counterfit uses a batch of 1.

Functionally, x is a list of lists, where each list is a sample of shape (1, 28, 28). This should sound familiar, as it is the same shape as X and model_input_shape!

Inside the __call__ function, x is passed to the model for prediction and the resulting output is collected. However, there is currently a problem with our __call__ function. Do you see it? The pre-trained MNIST model expects an input shape of (1, 784), not (1, 28, 28). Before submitting our input to the model, we need to reshape the array. Alternatively, you could define model_input_shape as (1, 784).

5. Update the __call__ function to reshape x to (1, 784).

def __call__(self, x):
  scores = self.model.predict_proba(x.reshape(x.shape[0], -1))
  return (scores.tolist())
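The reshape in the updated __call__ can be checked on its own. The sketch below uses zero-filled arrays as placeholder samples (an assumption, not real MNIST data) to show that keeping the batch axis and flattening the rest turns (Batch, Channels, Height, Width) into (Batch, 784).

```python
import numpy as np

# One placeholder sample in (Batch, Channels, Height, Width) layout.
x = np.zeros((1, 1, 28, 28))

# Keep the batch axis and flatten everything else.
flat = x.reshape(x.shape[0], -1)
print(flat.shape)  # (1, 784)

# The same expression also handles larger batches.
batch = np.zeros((4, 1, 28, 28))
flat_batch = batch.reshape(batch.shape[0], -1)
print(flat_batch.shape)  # (4, 784)
```

Using x.shape[0] with -1 rather than hard-coding (1, 784) means the function would keep working if the batch size ever changed.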

A crucial piece of the __call__ function is how we return scores to the attack algorithm. An attack algorithm uses the returned scores to decide how to change the sample for the next iteration of the attack. In this tutorial, the pre-trained MNIST model returns exactly what is needed, which is an array of probabilities for each label. When attacking models in the wild, understanding how to construct these scores yourself is important, and it is covered in the next tutorial.
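As a rough illustration of what constructing scores yourself can mean, suppose a model returned only a predicted label instead of probabilities. One simple option is to turn that label into a one-hot score vector. The helper name label_to_scores is hypothetical and not part of Counterfit, and the next tutorial may take a different approach.

```python
import numpy as np


def label_to_scores(label, classes):
    """Hypothetical helper: build one-hot scores for a label-only response."""
    scores = np.zeros(len(classes))
    scores[classes.index(label)] = 1.0
    # Wrap in an outer list to mirror the batch-of-1 output of __call__.
    return [scores.tolist()]


classes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(label_to_scores(3, classes))
# [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
```

One-hot scores carry less information than real probabilities, so this kind of stand-in is best suited to attacks that only need the predicted label.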

6. Paste the following code into the Tutorial class.

def outputs_to_labels(self, output):
  output = np.atleast_2d(output)
  return [self.model_output_classes[i] for i in np.argmax(output, axis=1)]

The last function is outputs_to_labels. It matches the outputs from the __call__ function to the model_output_classes defined in the class properties. Counterfit uses it to check for success, especially in targeted attacks, so you need to ensure that the right output is matched to the right label.
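To see the mapping in action, here is a standalone copy of outputs_to_labels, written without self so it can run outside the class. The score values are made up for illustration.

```python
import numpy as np

model_output_classes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def outputs_to_labels(output):
    # Standalone copy of the method above, without self.
    output = np.atleast_2d(output)
    return [model_output_classes[i] for i in np.argmax(output, axis=1)]


# Made-up scores for one sample: class 7 has the highest probability.
scores = [[0.01, 0.01, 0.02, 0.01, 0.05, 0.0, 0.0, 0.85, 0.03, 0.02]]
print(outputs_to_labels(scores))     # [7]

# A flat list works too, because np.atleast_2d adds the batch axis.
print(outputs_to_labels(scores[0]))  # [7]
```

The order of model_output_classes must match the column order of the scores; otherwise the highest score is mapped to the wrong label.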

Alright, we're ready to attack our MNIST model. The final Tutorial target should look like this:

import pickle

import numpy as np
from core.state import AbstractTarget

class Tutorial(AbstractTarget):
    model_name = 'tutorial'
    model_data_type = 'numpy'
    model_endpoint = 'counterfit/targets/tutorial/mnist_sklearn_pipeline.pkl'
    model_input_shape = (1, 28, 28)
    model_output_classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

    def __init__(self):
        with open(self.model_endpoint, "rb") as f:
            self.model = pickle.load(f)

        self.data_file = 'counterfit/targets/tutorial/mnist_784.npz'
        self.sample_data = np.load(self.data_file, allow_pickle=True)

        self.X = self.sample_data['X']

    def __call__(self, x):
        scores = self.model.predict_proba(x.reshape(x.shape[0], -1))
        return (scores.tolist())

    def outputs_to_labels(self, output):
        output = np.atleast_2d(output)
        return [self.model_output_classes[i] for i in np.argmax(output, axis=1)]

7. Start Counterfit and list targets with list targets.

If this is your first time running Counterfit, it might take a minute to start. Hopefully tutorial is in the list of targets. Along with the target name, other properties from the Tutorial class are highlighted.

8. Explore and load available frameworks.

Counterfit wraps existing adversarial ML frameworks, which can be heavy to load. To keep startup time reasonable, frameworks are not loaded on start. After a framework has been loaded, Counterfit can use the algorithms in the framework.

9. Prepare the tutorial model to attack.

Start by interacting with a target. You will see a bunch of warnings that we're not suppressing in order to keep the target code clean. If you list the attacks while interacting with a target, you will notice a smaller list of available attacks. Attacks are filtered based on the model_data_type defined in the Tutorial class.


`tutorial> scan --verbose --iterations 10 --log --attack hop_skip_jump`

11. Run a targeted attack.

The previous example showed how to run an untargeted attack. An untargeted attack attempts to change the label to anything other than the original label. A targeted attack attempts to change a label to a specific label. 
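The difference between the two attack types comes down to the success condition. The sketch below states it in code. The function attack_succeeded is a hypothetical helper written for this explanation and is not how Counterfit implements its own check.

```python
def attack_succeeded(original_label, final_label, target_label=None):
    """Hypothetical success check, not Counterfit's implementation."""
    if target_label is None:
        # Untargeted: any label other than the original counts.
        return final_label != original_label
    # Targeted: only the chosen label counts.
    return final_label == target_label


print(attack_succeeded(7, 2))                  # True
print(attack_succeeded(7, 2, target_label=3))  # False
print(attack_succeeded(7, 3, target_label=3))  # True
```

The second call shows why targeted attacks are harder: the label changed from 7 to 2, which would count as an untargeted success, but it is not the requested 3.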

12. Review and save the results.

After all that hard work it is time to view the results.