Official PyTorch implementation of the paper Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness by Qi Zhang, Yifei Wang*, Jingyi Cui, Xiang Pan, Qi Lei, Stefanie Jegelka, Yisen Wang
This work challenges the common "accuracy-interpretability" tradeoff by demonstrating that feature monosemanticity can bring clear gains in model accuracy. These gains manifest across many aspects of learning robustness: input noise, label noise, out-of-domain data, few-shot image data, and few-shot language data. Together, this diverse evidence strongly indicates that monosemantic features are more robust in a general sense than polysemantic features.
The codebase is built upon a previous version of solo-learn (the version of Sep 27, 2022). To avoid unexpected errors, first create a Python 3.8 environment and then install the repository as below.
# clone the repository
git clone https://github.com/PKU-ML/non_neg
# create environment
conda create -n non_neg python=3.8
conda activate non_neg
# install dependencies
cd non_neg
pip3 install .[dali,umap,h5] --extra-index-url https://developer.download.nvidia.com/compute/redist --extra-index-url https://download.pytorch.org/whl/cu113
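Optionally, a quick sanity check (a standalone snippet, not part of the repo) verifies that the CUDA build of PyTorch is importable:

import torch

# Should print the installed version and True if CUDA is usable.
print(torch.__version__, torch.cuda.is_available())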
To attain feature monosemanticity, we consider an intrinsic method, non-negative contrastive learning (NCL), and a post-hoc method, sparse autoencoders (SAE). Pretrain with the default configuration files using the following commands, first on CIFAR and then on ImageNet-100 (a conceptual sketch of both methods follows the commands):
# SimCLR (Poly)
python3 main_pretrain.py \
--config-path scripts/pretrain/cifar \
--config-name simclr.yaml
# NCL (Mono)
python3 main_pretrain.py \
--config-path scripts/pretrain/cifar \
--config-name ncl.yaml
# SAE (Mono)
python3 main_sparse.py \
--config-path scripts/pretrain/cifar \
--config-name sae.yaml
# SimCLR (Poly)
python3 main_pretrain.py \
--config-path scripts/pretrain/imagenet-100 \
--config-name simclr.yaml
# NCL (Mono)
python3 main_pretrain.py \
--config-path scripts/pretrain/imagenet-100 \
--config-name ncl.yaml
# SAE (Mono)
python3 main_sparse.py \
--config-path scripts/pretrain/imagenet-100 \
--config-name sae.yaml
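For intuition, here is a minimal, hypothetical sketch of the two routes to monosemantic features; it is not the repo's implementation, whose actual architectures and hyperparameters live in the yaml configs. NCL constrains the contrastive embeddings to be non-negative (e.g., via a ReLU), while the SAE is trained post hoc on frozen pretrained features with a sparsity penalty.

import torch
import torch.nn.functional as F

def ncl_embed(backbone, projector, x):
    # Intrinsic route: non-negative contrastive features, so each
    # dimension can only activate for a concept, never against it.
    return F.relu(projector(backbone(x)))

class SparseAutoencoder(torch.nn.Module):
    # Post-hoc route: learn sparse non-negative codes for frozen features z.
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.encoder = torch.nn.Linear(dim, hidden_dim)
        self.decoder = torch.nn.Linear(hidden_dim, dim)

    def forward(self, z):
        code = F.relu(self.encoder(z))  # sparse, non-negative code
        return self.decoder(code), code

def sae_loss(recon, z, code, l1_coeff=1e-3):
    # Reconstruction plus L1 sparsity; the coefficient here is hypothetical.
    return F.mse_loss(recon, z) + l1_coeff * code.abs().mean()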
After that, for linear evaluation under the different noise settings, run the following commands:
# SimCLR (Poly); alternative configs: simclr_label_noise.yaml, simclr_gaussian_noise.yaml, simclr_uniform_noise.yaml
python3 main_linear.py \
--config-path scripts/linear/{dataset} \
--config-name simclr_clean.yaml \
pretrained_feature_extractor=path/to/pretrained/feature/extractor
# NCL (Mono); alternative configs: ncl_label_noise.yaml, ncl_gaussian_noise.yaml, ncl_uniform_noise.yaml
python3 main_linear.py \
--config-path scripts/linear/{dataset} \
--config-name ncl_clean.yaml \
pretrained_feature_extractor=path/to/pretrained/feature/extractor
# SAE (Mono); alternative configs: sae_label_noise.yaml, sae_gaussian_noise.yaml, sae_uniform_noise.yaml
python3 main_linear.py \
--config-path scripts/linear/{dataset} \
--config-name sae_clean.yaml \
pretrained_feature_extractor=path/to/pretrained/feature/extractor
Here dataset={cifar,imagenet100}. The argument pretrained_feature_extractor configures the path of the pretrained checkpoint, and the different scripts apply the different noise types: label noise, and uniform or Gaussian input noise.
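The snippet below is only an illustration of these three corruption types (the actual noise levels are set in the yaml configs; the rates and magnitudes here are hypothetical):

import torch

def add_label_noise(labels, num_classes, rate=0.2):
    # Symmetric label noise: a `rate` fraction of labels is replaced
    # by classes drawn uniformly at random.
    noisy = labels.clone()
    flip = torch.rand(len(labels)) < rate
    noisy[flip] = torch.randint(num_classes, (int(flip.sum()),))
    return noisy

def add_gaussian_noise(images, std=0.1):
    # Additive Gaussian input noise.
    return (images + std * torch.randn_like(images)).clamp(0.0, 1.0)

def add_uniform_noise(images, eps=0.1):
    # Additive uniform input noise in [-eps, eps].
    return (images + eps * (2.0 * torch.rand_like(images) - 1.0)).clamp(0.0, 1.0)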
For the fine-tuning evaluations (clean, few-shot, and label-noise settings), run the following commands (a sketch of few-shot subsampling follows them):
# SimCLR (Poly); alternative configs: simclr_few_shot.yaml, simclr_label_noise.yaml
python3 main_linear.py \
--config-path scripts/finetuning/{dataset} \
--config-name simclr_clean.yaml \
pretrained_feature_extractor=path/to/pretrained/feature/extractor
# NCL (Mono); alternative configs: ncl_label_noise.yaml, ncl_few_shot.yaml
python3 main_linear.py \
--config-path scripts/finetuning/{dataset} \
--config-name ncl_clean.yaml \
pretrained_feature_extractor=path/to/pretrained/feature/extractor
# SAE (Mono); alternative configs: sae_label_noise.yaml, sae_few_shot.yaml
python3 main_linear.py \
--config-path scripts/finetuning/{dataset} \
--config-name sae_clean.yaml \
pretrained_feature_extractor=path/to/pretrained/feature/extractor
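The few-shot configs restrict the number of labeled examples per class. As a minimal sketch (the function name and the Subset usage are illustrative, not the repo's API):

import torch
from torch.utils.data import Subset

def few_shot_indices(labels, shots_per_class):
    # Keep only `shots_per_class` examples of each class,
    # mimicking the few-shot fine-tuning setting.
    labels = torch.as_tensor(labels)
    keep = []
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        keep.append(idx[torch.randperm(len(idx))[:shots_per_class]])
    return torch.cat(keep)

# e.g., train only on: Subset(train_set, few_shot_indices(train_set.targets, 10).tolist())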
Our code borrows the implementation of SimCLR from the solo-learn repository: https://github.com/vturrisi/solo-learn