Tutorials
Counterfit works to keep the target in focus for the user and tries to provide a uniform interface from which to use the underlying frameworks. However, understanding how to build a target class is important for successful use. As a warm-up, we will build a target for everyone's favorite ML model, MNIST.
1. Start Counterfit and execute the `new` command. Enter a name and select images as the data type.
2. Find the new target folder in counterfit/targets, and open the new target Python file in your preferred code editor. Fill out the required target properties.
   - `model_name` was taken care of by the `new` command.
   - `model_data_type` was taken care of by the `new` command.
   - `model_endpoint` is where Counterfit will collect outputs from the target model. We will use the "mnist_sklearn_pipeline.pkl" pre-trained model found in the tutorial folder.
   - `model_input_shape` is the input shape of the target model, which is known to be "(1, 28, 28)".
   - `model_output_classes` are the output classes of the model, which are known to be "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]".
After filling in the blanks with the information above, the `Tutorial` target should look like the following:
from counterfit.base import AbstractTarget

class Tutorial(AbstractTarget):
    model_name = 'tutorial'
    model_data_type = 'numpy'
    model_endpoint = 'counterfit/targets/tutorial/mnist_sklearn_pipeline.pkl'
    model_input_shape = (1, 28, 28)
    model_output_classes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
With the required properties in place, we can start implementing functionality, starting with loading resources.
3. Paste the `__init__` function below into the `Tutorial` class.
The `__init__` function is responsible for loading the resources required for attacking a target model. This includes sample data (`X`), models, or anything else you may need. At a minimum, Counterfit expects `X` to be set.
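    # Requires `import pickle` and `import numpy as np` at the top of the file.
    def __init__(self):
        # Load the pre-trained model so its prediction function is available.
        with open(self.model_endpoint, "rb") as f:
            self.model = pickle.load(f)

        # Load the sample data that attacks will start from.
        self.data_file = 'counterfit/targets/tutorial/mnist_784.npz'
        self.sample_data = np.load(self.data_file, allow_pickle=True)
        self.X = self.sample_data['X']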
Because this is a local model, we first load the model to expose the prediction function (`predict_proba` in this tutorial) that Counterfit will use to interact with the target model. Next, we load the sample data. `X` is a list of lists, where each list is an array containing a processed sample. The data for the tutorial is in a nice, tidy NumPy zip file; most targets, however, will require additional processing to get `X`.
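As a rough illustration of the data format, the sketch below writes a small synthetic `.npz` archive with an `X` key and loads it back the same way `__init__` does. The file name `demo_samples.npz` and the random pixel values are assumptions made for the example; the real `mnist_784.npz` may hold more keys or a different layout.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for the tutorial data: 5 flattened 28x28 "images".
# The values are random and only illustrate the file layout.
rng = np.random.default_rng(0)
fake_X = rng.random((5, 784), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_samples.npz")  # hypothetical file name
    np.savez(path, X=fake_X)

    # Load the archive the same way the target's __init__ does.
    with np.load(path, allow_pickle=True) as sample_data:
        keys = list(sample_data.files)
        X = sample_data["X"]

print(keys)
print(X.shape)
```

Inspecting `sample_data.files` like this is a quick way to find out which arrays an unfamiliar archive contains before assigning `self.X`.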
Excellent, we now have samples and a model to attack. Next, we will build the function Counterfit will use to submit samples to the target model.
4. Paste the following code below the `__init__` function.
    def __call__(self, x):
        scores = self.model.predict_proba(x)
        return scores.tolist()
During attack runtime, an attack algorithm will use the `__call__` function to submit samples to the target model via `x`. `x` is a perturbed sample of shape `(Batch, Channels, Height, Width)`. Channels, Height, and Width are derived from the `model_input_shape` that was defined earlier.
What is "Batch" and why only 1? A batch is the number of samples being submitted to the target model. Conventionally, attack algorithms work on multiple samples at the same time. However, Counterfit was designed for penetration testing and red teaming, where submitting large amounts of data by default is not good practice. So for now, Counterfit uses a batch of 1.
Functionally, `x` is a list of lists, where each list is a sample of shape `(1, 28, 28)`. This should sound familiar, as it is the same shape as `X` and `model_input_shape`!
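To make the shapes concrete, here is a minimal sketch of how a single sample becomes a batch of one. It uses only NumPy, and the zero-filled sample is a placeholder rather than real MNIST data.

```python
import numpy as np

model_input_shape = (1, 28, 28)  # (Channels, Height, Width), as in the target

# Placeholder sample with the declared input shape.
sample = np.zeros(model_input_shape, dtype=np.float32)

# Adding a leading axis gives (Batch, Channels, Height, Width) with Batch = 1.
x = sample[np.newaxis, ...]

print(x.shape)     # (1, 1, 28, 28)
print(x[0].shape)  # (1, 28, 28)
```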
Inside the `__call__` function, `x` is passed to the model for prediction and the resulting output is collected. However, there is currently a problem with our `__call__` function. Do you see it? The pre-trained MNIST model uses an input shape of `(1, 784)` and not `(1, 28, 28)`. Before submitting our input to the model, we need to reshape the array. Alternatively, you could define `model_input_shape` as `(1, 784)`.
5. Update the `__call__` function to reshape `x` to `(1, 784)`.
    def __call__(self, x):
        scores = self.model.predict_proba(x.reshape(x.shape[0], -1))
        return scores.tolist()
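The reshape can be checked in isolation. The sketch below, which assumes nothing beyond NumPy, shows that `x.reshape(x.shape[0], -1)` keeps the batch axis, flattens the remaining axes into 784 features, and preserves pixel order (row-major).

```python
import numpy as np

# A batch of one sample with shape (Batch, Channels, Height, Width).
# Each pixel holds its own flat index so the ordering is easy to verify.
x = np.arange(28 * 28, dtype=np.float32).reshape(1, 1, 28, 28)

# Keep the batch axis and flatten everything else.
flat = x.reshape(x.shape[0], -1)

print(flat.shape)  # (1, 784)

# Row-major flattening: pixel (row r, column c) lands at index r * 28 + c.
r, c = 3, 5
assert flat[0, r * 28 + c] == x[0, 0, r, c]
```

Using `-1` instead of a hard-coded `784` lets the same line work for any batch size and any input shape whose product matches what the model expects.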
A crucial piece of the `__call__` function is how we return scores to the attack algorithm. An attack algorithm uses the returned scores to decide how to change the sample for the next iteration of the attack. In this tutorial, the pre-trained MNIST model returns exactly what is needed: an array of probabilities, one per label. When attacking models in the wild, understanding how to construct these scores yourself is important; this is covered in the next tutorial.
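As one possible illustration of building scores by hand, suppose a target returned only a predicted label instead of probabilities. A simple workaround is to emit a one-hot vector over the output classes. The helper name `label_to_scores` is hypothetical, and this is a sketch under that assumption, not necessarily the approach the next tutorial takes.

```python
import numpy as np

model_output_classes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def label_to_scores(label, classes):
    """Return a one-hot score list for one predicted label (hypothetical helper)."""
    scores = np.zeros(len(classes), dtype=float)
    scores[classes.index(label)] = 1.0
    return scores.tolist()


# A label-only model that answered "7" becomes a 10-element score vector.
scores = label_to_scores(7, model_output_classes)
print(scores)
```

Because `__call__` returns one score list per sample in the batch, a result like this would be wrapped in an outer list, for example `[scores]` for a batch of 1.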
6. Paste the following code into the `Tutorial` class.
    def outputs_to_labels(self, output):
        output = np.atleast_2d(output)
        return [self.model_output_classes[i] for i in np.argmax(output, axis=1)]
Finally, the last function: `outputs_to_labels`. It matches the outputs from the `__call__` function to the `model_output_classes` defined in the class properties, and it is used to check for success, especially in targeted attacks. You need to ensure that the right output is matched to the right label.
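To see the mapping at work, the sketch below runs the same logic as a standalone function on made-up score rows. The scores are invented for illustration and are not real model output.

```python
import numpy as np

model_output_classes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def outputs_to_labels(output, classes):
    # atleast_2d lets a single score row be handled like a batch of one.
    output = np.atleast_2d(output)
    # The index of the highest score in each row selects the label.
    return [classes[i] for i in np.argmax(output, axis=1)]


one_row = [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
two_rows = [one_row, [0.0] * 9 + [1.0]]

single = outputs_to_labels(one_row, model_output_classes)
batch = outputs_to_labels(two_rows, model_output_classes)
print(single)  # [2]
print(batch)   # [2, 9]
```

The position of each entry in `model_output_classes` must match the column order of the model's scores; if the two are out of step, every label reported by an attack will be wrong.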
Alright, we're ready to attack our MNIST model. The final `Tutorial` target should look like this:
import pickle
import numpy as np
from core.state import AbstractTarget

class Tutorial(AbstractTarget):
    model_name = 'tutorial'
    model_data_type = 'numpy'
    model_endpoint = 'counterfit/targets/tutorial/mnist_sklearn_pipeline.pkl'
    model_input_shape = (1, 28, 28)
    model_output_classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

    def __init__(self):
        with open(self.model_endpoint, "rb") as f:
            self.model = pickle.load(f)
        self.data_file = 'counterfit/targets/tutorial/mnist_784.npz'
        self.sample_data = np.load(self.data_file, allow_pickle=True)
        self.X = self.sample_data['X']

    def __call__(self, x):
        scores = self.model.predict_proba(x.reshape(x.shape[0], -1))
        return scores.tolist()

    def outputs_to_labels(self, output):
        output = np.atleast_2d(output)
        return [self.model_output_classes[i] for i in np.argmax(output, axis=1)]
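Before starting Counterfit, it can help to exercise the method bodies end to end without the pickle file or the framework. The sketch below does that with a stub in place of the real model. `StubModel` and `TutorialSketch` are hypothetical names, and the stub's fixed probabilities are invented; only the `__call__` and `outputs_to_labels` bodies mirror the target above.

```python
import numpy as np


class StubModel:
    """Hypothetical stand-in for the pickled scikit-learn pipeline."""

    def predict_proba(self, x):
        x = np.asarray(x)
        # Mirror the tutorial's statement that the model expects 784 features.
        if x.ndim != 2 or x.shape[1] != 784:
            raise ValueError(f"expected (n, 784), got {x.shape}")
        probs = np.full((x.shape[0], 10), 0.05)
        probs[:, 3] = 0.55  # always favor class 3, purely for the demo
        return probs


class TutorialSketch:
    """Same method bodies as Tutorial, minus Counterfit and file loading."""

    model_input_shape = (1, 28, 28)
    model_output_classes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

    def __init__(self):
        self.model = StubModel()
        self.X = np.zeros((2, 784), dtype=np.float32)  # placeholder samples

    def __call__(self, x):
        scores = self.model.predict_proba(x.reshape(x.shape[0], -1))
        return scores.tolist()

    def outputs_to_labels(self, output):
        output = np.atleast_2d(output)
        return [self.model_output_classes[i] for i in np.argmax(output, axis=1)]


target = TutorialSketch()
x = target.X[0].reshape((1,) + target.model_input_shape)  # batch of one
scores = target(x)
labels = target.outputs_to_labels(scores)
print(len(scores[0]), labels)
```

If the reshape in `__call__` were missing, the stub would raise a `ValueError` here, which is the same class of mistake the tutorial points out in step 4.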
7. Start Counterfit and list the targets with `list targets`.
If this is your first time running Counterfit, it might take a minute to start. `tutorial` should appear in the list of targets. Along with the target name, other properties from the `Tutorial` class are highlighted.
8. Explore and load available frameworks.
Counterfit wraps existing adversarial frameworks, which can be heavy to load. To keep startup time reasonable, frameworks are not loaded on start. After a framework has been loaded, Counterfit can use the algorithms in that framework.
9. Prepare the tutorial model to attack.
Start by interacting with a target. You will see a number of warnings, which we do not suppress in order to keep the target code clean. After listing the attacks while interacting with a target, you will notice a smaller list of available attacks. Attacks are filtered based on the `model_data_type` defined in the `Tutorial` class.
`tutorial> scan --verbose --iterations 10 --log --attack hop_skip_jump`
11. Run a targeted attack.
The previous example showed how to run an untargeted attack. An untargeted attack attempts to change the label to anything other than the original label. A targeted attack attempts to change a label to a specific label.
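The difference between the two success criteria can be written down in a few lines. The function names below are hypothetical and are not part of Counterfit; they only restate the definitions above in code.

```python
def untargeted_success(original_label, final_label):
    """Succeeds when the label changes to anything other than the original."""
    return final_label != original_label


def targeted_success(target_label, final_label):
    """Succeeds only when the label becomes the chosen target label."""
    return final_label == target_label


original, wanted = 7, 2

# The model now says 1: the label changed, but not to the label we wanted.
print(untargeted_success(original, 1))  # True
print(targeted_success(wanted, 1))      # False

# The model now says 2: both criteria are met.
print(untargeted_success(original, 2))  # True
print(targeted_success(wanted, 2))      # True
```

This is why `outputs_to_labels` matters most for targeted attacks: success hinges on the reported label being exactly the intended one.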
12. Review and save the results.
After all that hard work it is time to view the results.