Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models
We report the bias correlations that we find for five cutting-edge LLMs (GPT-4, GPT-3.5, Mistral-7B, Llama2-13B, Google PaLM). This dataset can be used as a benchmark to evaluate progress on more generalized biases, and the templating technique (sketched below) can be used to expand the benchmark with minimal additional human annotation. The benchmark can also be applied to other models, as described below.
- This benchmark can be run against models from API providers or models running locally on your machine
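The templating technique works by filling attribute slots in sentence templates, so every combination of slot values yields a new prompt at no annotation cost. Here is a minimal sketch; the template string and attribute lists are illustrative placeholders, not the paper's actual templates:

```python
# Minimal sketch of template-based benchmark expansion.
# TEMPLATE, AGES, and ADJECTIVES are hypothetical placeholders,
# not the actual templates or attribute lists used in the paper.
TEMPLATE = "The {age} applicant seemed {adjective} to the interviewer."

AGES = ["young", "middle-aged", "elderly"]
ADJECTIVES = ["competent", "incompetent"]

# Every combination of slot values yields one benchmark prompt,
# so expanding the benchmark needs no extra human annotation.
prompts = [
    TEMPLATE.format(age=age, adjective=adj)
    for age in AGES
    for adj in ADJECTIVES
]

for prompt in prompts:
    print(prompt)
```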
🔗 Paper
You can find the paper for this work here
- See dataset.csv in the data folder
- Dataset Description
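To take a quick look at the data before running anything, a short snippet like the following should suffice (a sketch, assuming pandas is available; it prints the header rather than assuming column names, since the schema should be read from the CSV itself):

```python
import pandas as pd

# Load the benchmark CSV shipped with the repository.
df = pd.read_csv("data/dataset.csv")

# Inspect the schema and a few rows instead of assuming column names.
print(df.columns.tolist())
print(df.head())
print(f"{len(df)} rows")
```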
Remote: API Providers
This method is based on litellm, a library that makes it easy to call any LLM API through one interface. Read the docs at litellm.ai.
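In essence, the remote path reduces to one litellm completion call per prompt; a minimal sketch (not the exact code in main.py, and the prompt text is a placeholder):

```python
import os
from litellm import completion  # pip install litellm

# litellm routes the request to the right provider based on the model
# prefix (e.g. "groq/"), reading the provider's API key from the
# environment (e.g. GROQ_API_KEY).
response = completion(
    model=os.environ.get("MODEL", "groq/llama2-70b-4096"),
    messages=[{"role": "user", "content": "Complete the sentence: ..."}],
)
print(response.choices[0].message.content)
```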
1. Create a virtual environment and install the requirements
macOS and Linux
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Windows
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
2. Choose your API provider and a supported model here
Get your API key from the provider and export it as an environment variable
Example: Groq and llama2-70b-4096
export GROQ_API_KEY=xxx
export MODEL=groq/llama2-70b-4096
Note: each record in the dataset is around 140 tokens, and there are 12,000 records in total, i.e. roughly 1.7M tokens per run. A full run costs around $0.20 with a 7B model and around $2 with a 70B model.
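The figures follow from simple token arithmetic; a back-of-the-envelope sketch, where the per-million-token prices are placeholder assumptions (actual pricing varies by provider):

```python
# Rough cost estimate: total tokens = records x tokens per record.
num_records = 12_000
tokens_per_record = 140
total_tokens = num_records * tokens_per_record  # ~1.68M tokens

# Placeholder prices per million tokens; substitute your provider's rates.
price_per_million = {"7B": 0.10, "70B": 1.20}

for size, price in price_per_million.items():
    print(f"{size} model: ~${total_tokens / 1e6 * price:.2f}")
```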
3. Run the benchmark and check your report in the reports folder
python main.py --model remote
Local: Ollama
This method is based on Ollama, a tool that gets you up and running with large language models locally. Read the docs at ollama.ai and see the supported models here.
- Install Ollama
- Run a model from the list of supported models
Examples:
ollama run llama3 # llama3 8B
ollama run gemma # gemma 7B
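Since litellm can also talk to a local Ollama server, the local path can presumably be exercised with a call like the following (a sketch under that assumption; it is not confirmed from main.py, and the prompt text is a placeholder):

```python
from litellm import completion  # pip install litellm

# The "ollama/" prefix tells litellm to send the request to a locally
# running Ollama server (http://localhost:11434 by default).
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Complete the sentence: ..."}],
    api_base="http://localhost:11434",
)
print(response.choices[0].message.content)
```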
Then create a virtual environment, install the requirements, and run the benchmark:
macOS and Linux
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Windows
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
export MODEL=llama3
python main.py --model local
- To serve multiple requests in parallel, follow the guide here to configure the Ollama server.
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
python main.py --model local
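With OLLAMA_NUM_PARALLEL set, the server accepts concurrent requests, so the client side can fan prompts out in parallel. A hypothetical sketch of that pattern (not necessarily how main.py is structured; model name and prompts are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

from litellm import completion  # pip install litellm

def query(prompt: str) -> str:
    # Each call is an independent request to the local Ollama server,
    # so up to OLLAMA_NUM_PARALLEL requests are served simultaneously.
    response = completion(
        model="ollama/llama3",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

prompts = ["Complete the sentence: ...", "Complete the sentence: ..."]
with ThreadPoolExecutor(max_workers=4) as pool:  # match OLLAMA_NUM_PARALLEL
    results = list(pool.map(query, prompts))
```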