moo_hax edited this page May 3, 2021 · 8 revisions


Counterfit works to keep the target in focus for the user and tries to provide a uniform interface from which to use the underlying frameworks. However, understanding how to build a target class is important for successful use. As a warm-up, we will build a target for a model trained on everyone’s favorite ML dataset, MNIST.

  1. Start Counterfit and execute the new command. Enter a name, then select art as the framework and image as the data type.
[[email protected]] -> python .\

                          __            _____ __
  _________  __  ______  / /____  _____/ __(_) /_
 / ___/ __ \/ / / / __ \/ __/ _ \/ ___/ /_/ / __/
/ /__/ /_/ / /_/ / / / / /_/  __/ /  / __/ / /
\___/\____/\__,_/_/ /_/\__/\___/_/  /_/ /_/\__/



        [+] 18 attacks
        [+] 4 targets

counterfit> new
? Target name: mnist
? Which framework? art
? What data type? image

  1. Find the new target folder in counterfit/targets and open the new target Python file in your preferred code editor. The code file is generated from a template in counterfit/core/commands/
# Generated by counterfit #

from counterfit.core.targets import ArtTarget

class Mnist(ArtTarget):
    model_name = "mnist"
    model_data_type = "image"
    model_endpoint = ""
    model_input_shape = ()
    model_output_classes = []
    X = []

    def __init__(self):
        self.X = []

    def __call__(self, x):
        return x
  1. In your code editor, fill out the required target properties.
  • model_name and model_data_type were set when the target was created.
  • model_endpoint is where Counterfit will collect outputs from the target model. We will use the pre-trained model mnist_sklearn_pipeline.pkl found in the tutorial folder.
  • model_input_shape is the input shape of the target model, which is known to be (1, 28, 28).
  • model_output_classes are the output classes of the model, which are known to be ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"].

After filling in the blanks with the information above, the target class should look like the following:

# Generated by counterfit #

from counterfit.core.targets import ArtTarget

class Mnist(ArtTarget):
    model_name = "mnist"
    model_data_type = "image"
    model_endpoint = "counterfit/targets/tutorial/mnist_sklearn_pipeline.pkl"
    model_input_shape = (1, 28, 28)
    model_output_classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
    X = []

    def __init__(self):
        self.X = []

    def __call__(self, x):
        return x
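
Before moving on, it can help to sanity-check the two properties that were just filled in. The snippet below is a minimal sketch that only uses the values copied from the class above; it is not part of Counterfit. It computes how many features one sample has once it is flattened and confirms that there is one label per digit.

```python
import numpy as np

# Values copied from the target class above.
model_input_shape = (1, 28, 28)
model_output_classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

# Number of values in one sample once it is flattened (1 * 28 * 28).
n_features = int(np.prod(model_input_shape))

print(n_features)                 # 784
print(len(model_output_classes))  # 10
```
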
  1. Interact with the target via interact. Then reload the target and fix any errors that show up. Now, when you list targets, some of the information should be filled out.
counterfit> interact mnist


mnist> reload

mnist> list targets

Name             Type             Input Shape      Location
creditfraud      numpy            (30,)            counterfit/targets/creditfraud/creditfraud_sklearn_pipeline.pkl
mnist            image            (1, 28, 28)      counterfit/targets/tutorial/mnist_sklearn_pipeline.pkl
moviereviews     text             (1,)             counterfit/targets/moviereviews/
satelliteimages  image            (3, 256, 256)    counterfit/targets/satelliteimages/satellite-image-params-airplane-stadium.h5
tutorial         image            (1, 28, 28)      counterfit/targets/tutorial/mnist_sklearn_pipeline.pkl

  1. With the required properties in place, we can start loading resources and implementing functionality.
  • This model is an image data type. A user can override clip_values in the target, which ensures that perturbed image values remain valid pixel values.
  • Because this is a local model, we first load the model and expose the predict function that Counterfit will use to interact with the target model.
  • Next, load sample data X. The sample data is a list of lists, where each inner list is an array containing a processed sample. The data for the tutorial is in a nice, tidy NumPy zip file; however, most targets will require additional processing to get X.

Paste the __init__ function below into the target class.

    def __init__(self):
        self.clip_values = (0, 255)
        with open(self.model_endpoint, "rb") as f:
            self.model = pickle.load(f)

        self.data_file = "counterfit/targets/tutorial/mnist_784.npz"
        self.sample_data = np.load(self.data_file, allow_pickle=True)

        self.X = self.sample_data["X"]
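
To make the role of clip_values and the sample file more concrete, here is a minimal sketch that does not touch the tutorial files. It writes a small stand-in .npz archive with an "X" key, loads it the same way the __init__ function does, and then clips a perturbed sample back into the (0, 255) range. The file name fake_mnist.npz, the random data, and the noise level are all assumptions made for illustration; how Counterfit and the attack frameworks apply clip_values internally is not shown here.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the tutorial data: 5 random "images" under the key "X".
fake_X = rng.integers(0, 256, size=(5, 1, 28, 28)).astype(np.float64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fake_mnist.npz")
    np.savez(path, X=fake_X)
    # Same loading pattern as in __init__: open the archive, read the "X" array.
    with np.load(path) as sample_data:
        X = sample_data["X"]

clip_values = (0, 255)

# Adding noise can push pixel values outside the valid range...
perturbed = X[0] + rng.normal(0, 50, size=X[0].shape)
# ...so clipping brings every value back into [0, 255].
clipped = np.clip(perturbed, clip_values[0], clip_values[1])

print(X.shape)
print(clipped.min() >= 0, clipped.max() <= 255)
```
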
  1. Excellent, we now have samples and a model to attack. Next, we will build the __call__ function. Counterfit will use this function to submit samples to the target model via x, a batch of perturbed samples of shape (Batch, Channels, Height, Width). Channels, Height, and Width are derived from the model_input_shape that was defined earlier. Functionally, x is a list of lists, where each list is a sample of shape (1, 28, 28). This should sound familiar, as it is the same shape as the samples in X and as model_input_shape. Paste the following code below the __init__ function.
    def __call__(self, x):
        scores = self.model.predict_proba(x.reshape(x.shape[0], -1))
        return scores.tolist()

Note: A crucial piece of the __call__ function is how scores are returned to the attack algorithm. The function must return a list of probabilities for each sample. An attack algorithm uses the returned scores to decide how to change the sample for the next iteration of the attack. In this tutorial, the pre-trained MNIST model returns exactly what is needed, which is a list of probabilities, one per label.
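
The return contract can be checked without the pickled model. The sketch below is an illustration under stated assumptions: FakeModel is a hypothetical stand-in for the tutorial pipeline (it only mimics a predict_proba method), and call repeats the body of __call__ as a plain function. It shows that a batch of shape (3, 1, 28, 28) is flattened to (3, 784) and that the result is a plain list with one list of ten probabilities per sample.

```python
import numpy as np


class FakeModel:
    """Hypothetical stand-in for the pickled pipeline (not the tutorial model)."""

    def predict_proba(self, flat):
        # flat has shape (batch, 784); take 10 values as made-up logits
        # and turn them into probabilities with a softmax.
        logits = flat[:, :10]
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)


def call(model, x):
    # Same body as __call__: flatten each sample, score it, return plain lists.
    scores = model.predict_proba(x.reshape(x.shape[0], -1))
    return scores.tolist()


x = np.random.default_rng(0).random((3, 1, 28, 28))
scores = call(FakeModel(), x)

print(len(scores), len(scores[0]))  # 3 10
```
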

  1. Alright, the new target is almost ready. Add the following imports to the top of the file: import pickle and import numpy as np. Next, execute the reload command to load the updated target into the session. The __init__ function is called on reload or interact. Some warnings will appear; they are not suppressed in order to keep the target code clean, and you can safely ignore them. The final target should look like the following:
# Generated by counterfit #

import pickle
import numpy as np
from counterfit.core.targets import ArtTarget

class Mnist(ArtTarget):
    model_name = "mnist"
    model_data_type = "image"
    model_endpoint = "counterfit/targets/tutorial/mnist_sklearn_pipeline.pkl"
    model_input_shape = (1, 28, 28)
    model_output_classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
    X = []

    def __init__(self):
        self.clip_values = (0, 255)
        with open(self.model_endpoint, "rb") as f:
            self.model = pickle.load(f)

        self.data_file = "counterfit/targets/tutorial/mnist_784.npz"
        self.sample_data = np.load(self.data_file, allow_pickle=True)

        self.X = self.sample_data["X"]

    def __call__(self, x):
        scores = self.model.predict_proba(x.reshape(x.shape[0], -1))
        return scores.tolist()
  1. To test the functionality of the target, execute the predict command.
mnist> predict

 [!] No index sample, setting random index.

                                                                                Output Scores
Sample                                                                  ['0' '1' '2' '3' '4' '5' '6'
Index                                Sample                                      '7' '8' '9']
   65923                                     mnist-sample-46600446.png  [0.000 0.000 0.000 0.000 1.000
                                                                        0.000 0.000 0.000 0.000 0.000]
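
To read the output scores, pair each probability with the label at the same position in model_output_classes. The following sketch uses the scores printed above and picks the most likely label with np.argmax; it is a reading aid, not how Counterfit renders the table.

```python
import numpy as np

model_output_classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

# Scores copied from the predict output above.
scores = [0.000, 0.000, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.000, 0.000]

# Index of the highest probability, then the label at that index.
best = int(np.argmax(scores))
label = model_output_classes[best]

print(label, scores[best])  # 4 1.0
```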

  1. Excellent. We are ready to run attacks on the MNIST target. List the frameworks, load art, and then list the available attacks. Attacks are filtered based on the model_data_type defined in the target class.
counterfit> list frameworks

Framework             # of Attacks
art                   7
textattack            11

counterfit> load art

[+] Framework loaded successfully!

counterfit> list attacks

Name                       Type             Category         Tags             Framework
boundary                   evasion          blackbox         image, numpy     art
hop_skip_jump              evasion          blackbox         image, numpy     art
pixel                      evasion          blackbox         image            art
spatial_transformation     evasion          blackbox         image            art
square                     evasion          blackbox         image            art
threshold                  evasion          blackbox         image            art
zoo                        evasion          blackbox         image, numpy     art
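
The filtering can be pictured as a simple tag lookup. The sketch below is only an illustration built from the Tags column of the table above; the attacks dictionary and the attacks_for helper are hypothetical names, and Counterfit's actual filtering code is not shown here.

```python
# Attack names and tags copied from the table above.
attacks = {
    "boundary": ["image", "numpy"],
    "hop_skip_jump": ["image", "numpy"],
    "pixel": ["image"],
    "spatial_transformation": ["image"],
    "square": ["image"],
    "threshold": ["image"],
    "zoo": ["image", "numpy"],
}


def attacks_for(data_type):
    """Return the attacks whose tags include the given data type."""
    return [name for name, tags in attacks.items() if data_type in tags]


print(len(attacks_for("image")))  # 7
print(attacks_for("numpy"))       # ['boundary', 'hop_skip_jump', 'zoo']
```

Under this reading, an image target such as mnist sees all seven attacks, while a numpy target would see only three of them.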

  1. Add an attack to the pipeline by executing use hop_skip_jump.
mnist> use hop_skip_jump

[+] Using hop_skip_jump c596a8f3

  1. Finally, start the attack with run.
mnist>hop_skip_jump> run

[+] Running hop_skip_jump on mnist

HopSkipJump: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:32<00:00, 32.74s/it]

[+] 1/1 succeeded

     Sample Index      Label (conf)     Attack Label (conf)  % Eucl. dist.  Elapsed Time [sec]    Queries (rate)    Attack Input
1.               0          5 (0.9990)           3 (0.6320)       0.02039%                32.8       24548 (749.4   counterfit/ta
                                                                                                        query/sec)  rgets/mnist/r

  1. Alternatively, run multiple attacks with scan. Issue the back command to exit the active attack, then use scan:
mnist>hop_skip_jump> scan --iterations 2 --attack hop_skip_jump

[+] Running these attacks 2x each:

[+] Using hop_skip_jump f36ede50

HopSkipJump: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.07s/it]

[+] Using hop_skip_jump 196a6995

HopSkipJump: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 16.58it/s]


                                            Time[sec]        Queries             Best Score
Attack Name      Total Runs  Successes (%)  (min/avg/max)    (min/avg/max)       (attack_id)                Best Parameters
  hop_skip_jump           2      1 (50.0%)    0.1/ 2.1/ 4.1      51/ 1746/ 3441   1.0 (f36ede50)               init_eval=78
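
The summary row can be reproduced by hand from the per-run numbers. This sketch assumes the two runs used 51 and 3441 queries (the min and max shown above) and that one of the two runs succeeded; it recomputes the Queries and Successes columns the way the table appears to present them.

```python
# Query counts for the two runs, taken from the min and max in the summary row.
queries = [51, 3441]
successes = 1
total_runs = len(queries)

avg_queries = sum(queries) / total_runs
success_rate = 100 * successes / total_runs

queries_column = f"{min(queries)}/ {avg_queries:.0f}/ {max(queries)}"
successes_column = f"{successes} ({success_rate:.1f}%)"

print(queries_column)    # 51/ 1746/ 3441
print(successes_column)  # 1 (50.0%)
```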

  1. Save the results with save.
mnist>hop_skip_jump> save

[+] Successfully wrote counterfit/targets/mnist/results/mnist_9f806eec.json
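
The saved file is plain JSON, so it can be inspected with the standard library. The layout of the results file is not documented on this page, so the sketch below only lists the top-level keys. The summarize_results helper and the stand-in file with its made-up fields are hypothetical; point the helper at the path printed by save to inspect a real run.

```python
import json
import os
import tempfile


def summarize_results(path):
    """Load a JSON file and return its sorted top-level keys (or its type name)."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return sorted(data.keys())
    return type(data).__name__


# Stand-in file with made-up fields, used only to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_results.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"hypothetical_field": 1, "another_field": [1, 2]}, f)
    keys = summarize_results(demo_path)

print(keys)  # ['another_field', 'hypothetical_field']
```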
