
[ACM FAccT 2024] Benchmarking the Fairness of Image Upsampling Methods

This repository contains the code for reproducing the experiments in the paper Benchmarking the Fairness of Image Upsampling Methods by Mike Laszkiewicz, Imant Daunhawer, Julia E. Vogt, Asja Fischer, and Johannes Lederer.

This paper has been accepted at ACM FAccT 2024.

Recent years have witnessed a rapid development of deep generative models for creating synthetic media, such as images and videos. While the practical applications of these models in everyday tasks are enticing, it is crucial to assess the inherent risks regarding their fairness. In this work, we introduce a comprehensive framework for benchmarking the performance and fairness of conditional generative models. We develop a set of metrics—inspired by their supervised fairness counterparts—to evaluate the models on their fairness and diversity. Focusing on the specific application of image upsampling, we create a benchmark covering a wide variety of modern upsampling methods. As part of the benchmark, we introduce UnfairFace, a subset of FairFace that replicates the racial distribution of common large-scale face datasets. Our empirical study highlights the importance of using an unbiased training set and reveals variations in how the algorithms respond to dataset imbalances. Alarmingly, we find that none of the considered methods produces statistically fair and diverse results.

Installation

All experiments were conducted using Python 3.9.18 on an Ubuntu 20.04 machine. The environment can be installed using the provided requirements.txt:

pip install -r requirements.txt

Setup

To set up the experiments, we must store:

  1. the FairFace test dataset and the corresponding labels;
  2. the race classifiers trained on FairFace;
  3. the outputs of the image upsampling algorithms, i.e., the upsampled reconstructions.

1. FairFace Dataset

Based on the original FairFace dataset, we select the test dataset as described in Section 4 of the paper. Furthermore, we require the corresponding labels and the average faces for evaluating diversity. The processed data can be downloaded here and should be stored according to the following structure:

data
└── fairface
    ├── avg_faces
    ├── avg_noisy_faces
    ├── test_correct_prediction
    └── fairface_label_val.csv
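
To verify that the data landed in the right place, a quick check like the following can help (a minimal sketch in Python; the paths simply mirror the tree above):

from pathlib import Path

# Sanity check: confirm the FairFace data is stored as expected.
# The paths mirror the directory tree above.
data_root = Path("data/fairface")
expected = [
    data_root / "avg_faces",
    data_root / "avg_noisy_faces",
    data_root / "test_correct_prediction",
    data_root / "fairface_label_val.csv",
]
for path in expected:
    print(("ok      " if path.exists() else "MISSING ") + str(path))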

2. Race Classifier

In our experiments, we employ the pre-trained race classifier from the FairFace repository. It should be stored as models/race_prediction/res34_fair_align_multi_7_20190809.pt.
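
As a reference, the classifier can be loaded following the recipe in the FairFace repository; the sketch below assumes its standard setup (a torchvision ResNet-34 whose final layer outputs 18 logits: 7 races, 2 genders, and 9 age groups):

import torch
import torch.nn as nn
import torchvision

# Load the pre-trained race classifier (assumption: ResNet-34 with an
# 18-way head, as in the FairFace repository's prediction script).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torchvision.models.resnet34()
model.fc = nn.Linear(model.fc.in_features, 18)
state_dict = torch.load(
    "models/race_prediction/res34_fair_align_multi_7_20190809.pt",
    map_location=device,
)
model.load_state_dict(state_dict)
model = model.to(device).eval()
# The first 7 logits correspond to the race classes used in the benchmark:
# race_logits = model(batch)[:, :7]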

3. Upsampled Reconstructions

The upsampled images can be downloaded here. Alternatively, you can recompute the reconstructions on your machine. Below, we provide the upsampling models pre-trained on FairFace and UnfairFace and an explanation of how to deploy these models.

The images must be stored in upsampled_imgs following the given structure:

upsampled_imgs
├── fairface
│   ├── 4_to_128
│   │   ├── ddrm
│   │   ├── fairpsp
│   │   ...
│   ├── 4noise_to_128
│   │   ├── ddrm
│   │   ├── fairpsp
│   │   ...
│   └── 16_to_128
│       ├── ddrm
│       ├── fairpsp
│       ...
└── unfairface
    ├── 4_to_128
    │   ├── ddrm
    │   ├── fairpsp
    │   ...
    ├── 4noise_to_128
    │   ├── ddrm
    │   ├── fairpsp
    │   ...
    └── 16_to_128
        ├── ddrm
        ├── fairpsp
        ...
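
The subdirectory names encode the three upsampling tasks: 32x upsampling from 4x4 inputs (4_to_128), the same task with noisy inputs (4noise_to_128), and 8x upsampling from 16x16 inputs (16_to_128). The sketch below only illustrates this naming convention; the interpolation mode and noise level are assumptions, and the actual degradation operators are defined in the paper:

import torch
import torch.nn.functional as F

# Illustration of the three tasks encoded in the folder names. The bicubic
# interpolation and the noise level are assumptions, not the paper's setup.
def make_lowres(imgs, size, noise_std=0.0):
    low = F.interpolate(imgs, size=(size, size), mode="bicubic", align_corners=False)
    if noise_std > 0:
        low = low + noise_std * torch.randn_like(low)  # hypothetical noise level
    return low.clamp(0.0, 1.0)

batch = torch.rand(2, 3, 128, 128)          # stand-in for real face images
x4 = make_lowres(batch, 4)                  # inputs for 4_to_128
x4n = make_lowres(batch, 4, noise_std=0.1)  # inputs for 4noise_to_128
x16 = make_lowres(batch, 16)                # inputs for 16_to_128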

Running the Experiments

All experiments can be reproduced by running the empirical_evaluation.ipynb notebook.
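
The notebook implements the metrics from the paper. As a rough illustration of what "statistically fair" means in the abstract, one can test whether the predicted-race histogram of the reconstructions deviates from a uniform reference using a chi-squared test; the sketch below uses hypothetical counts and is not the notebook's exact procedure:

import numpy as np
from scipy.stats import chisquare

# Hypothetical predicted-race counts over the 7 FairFace races; the real
# counts come from running the race classifier on the reconstructions.
race_counts = np.array([180, 150, 120, 90, 160, 140, 160])
expected = np.full(7, race_counts.sum() / 7)  # uniform reference
stat, p_value = chisquare(race_counts, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject uniformity: the reconstructions are not statistically fair.")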

Pretrained Models

In our experiments, we evaluate PULSE, pSp, Fair-pSp, Posterior Sampling, and DDRM trained on FairFace and UnfairFace. Specifically, these methods require

  • StyleGAN2 backbones for applying PULSE;
  • the encoders for pSp and Fair-pSp;
  • the NCSNv2 models for running Posterior Sampling;
  • and a DDPM backbone for implementing DDRM.

All models can be downloaded here. To deploy these models, we use the methods' official repositories.

UnfairFace

To evaluate your own model using the proposed benchmark, you need to train the upsampling model on both FairFace and UnfairFace. UnfairFace is a processed and filtered version of FairFace and can be accessed here. For further details on UnfairFace, we refer to Section 3 of the paper.
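
For intuition only, the following sketch mimics the general idea behind UnfairFace by subsampling FairFace labels to match a skewed racial distribution; the target proportions, the dataset size, and the use of the validation labels are all hypothetical, and the actual construction is described in Section 3 of the paper:

import pandas as pd

# Hypothetical target proportions; Section 3 of the paper defines the actual
# distribution (matching common large-scale face datasets). UnfairFace is
# built from the training split, not from the validation labels used here.
target = {
    "White": 0.70, "Black": 0.05, "East Asian": 0.05, "Indian": 0.05,
    "Latino_Hispanic": 0.05, "Middle Eastern": 0.05, "Southeast Asian": 0.05,
}
labels = pd.read_csv("data/fairface/fairface_label_val.csv")
n_total = 1_000  # hypothetical dataset size
subsets = [
    labels[labels["race"] == race].sample(int(frac * n_total), random_state=0)
    for race, frac in target.items()
]
unfair_labels = pd.concat(subsets).reset_index(drop=True)
print(unfair_labels["race"].value_counts(normalize=True))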

Citation

If you find this repository useful, please consider citing our paper:

@inproceedings{10.1145/3630106.3658921,
author = {Laszkiewicz, Mike and Daunhawer, Imant and Vogt, Julia E. and Fischer, Asja and Lederer, Johannes},
title = {Benchmarking the Fairness of Image Upsampling Methods},
year = {2024},
isbn = {9798400704505},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3630106.3658921},
doi = {10.1145/3630106.3658921},
booktitle = {Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency},
pages = {489–517},
numpages = {29},
keywords = {Computer Vision, Conditional Generative Models, Fairness, Image Upsampling},
location = {Rio de Janeiro, Brazil},
series = {FAccT '24}
}
