Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models
We report the bias correlations that we find for five cutting-edge LLMs (GPT-4, GPT-3.5, Mistral-7B, Llama2-13B, Google PaLM). This dataset can be used as a benchmark to evaluate progress on more generalized biases, and the templating technique (sketched below) can be used to expand the benchmark with minimal additional human annotation. The benchmark can also be applied to other models, as described below.
- This benchmark can be run against models from API providers or models running locally on your machine
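The templating technique works by filling attribute slots in sentence templates, so every combination of slot values yields a new prompt at no annotation cost. Here is a minimal sketch; the template string and attribute lists are illustrative placeholders, not the paper's actual templates:

```python
# Minimal sketch of template-based benchmark expansion.
# TEMPLATE, AGES, and ADJECTIVES are hypothetical placeholders,
# not the actual templates or attribute lists used in the paper.
TEMPLATE = "The {age} applicant seemed {adjective} to the interviewer."

AGES = ["young", "middle-aged", "elderly"]
ADJECTIVES = ["competent", "incompetent"]

# Every combination of slot values yields one benchmark prompt,
# so expanding the benchmark needs no extra human annotation.
prompts = [
    TEMPLATE.format(age=age, adjective=adj)
    for age in AGES
    for adj in ADJECTIVES
]

for prompt in prompts:
    print(prompt)
```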
🔗 Paper
You can find the paper for this work here
- See dataset.csv in the data folder
- Dataset Description
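To take a quick look at the data before running anything, a short snippet like the following should suffice (a sketch, assuming pandas is available; it prints the header rather than assuming column names, since the schema should be read from the CSV itself):

```python
import pandas as pd

# Load the benchmark CSV shipped with the repository.
df = pd.read_csv("data/dataset.csv")

# Inspect the schema and a few rows instead of assuming column names.
print(df.columns.tolist())
print(df.head())
print(f"{len(df)} rows")
```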
Remote: API Providers
This method is based on litellm, a library that makes it easy to call any LLM API through one interface. Read the docs at litellm.ai.
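In essence, the remote path reduces to one litellm completion call per prompt; a minimal sketch (not the exact code in main.py, and the prompt text is a placeholder):

```python
import os
from litellm import completion  # pip install litellm

# litellm routes the request to the right provider based on the model
# prefix (e.g. "groq/"), reading the provider's API key from the
# environment (e.g. GROQ_API_KEY).
response = completion(
    model=os.environ.get("MODEL", "groq/llama2-70b-4096"),
    messages=[{"role": "user", "content": "Complete the sentence: ..."}],
)
print(response.choices[0].message.content)
```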
1. Create a virtual environment and install the requirements
macOS and Linux
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Windows
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
2. Choose your API provider and a supported model here
Get your API key from the provider and export it as an environment variable
Example: Groq and llama2-70b-4096
export GROQ_API_KEY=xxx
export MODEL=groq/llama2-70b-4096
Note: each record in the dataset is around 140 tokens, and there are 12,000 records in total, i.e. roughly 1.7M tokens per run. A full run costs around $0.20 with a 7B model and around $2 with a 70B model.
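The figures follow from simple token arithmetic; a back-of-the-envelope sketch, where the per-million-token prices are placeholder assumptions (actual pricing varies by provider):

```python
# Rough cost estimate: total tokens = records x tokens per record.
num_records = 12_000
tokens_per_record = 140
total_tokens = num_records * tokens_per_record  # ~1.68M tokens

# Placeholder prices per million tokens; substitute your provider's rates.
price_per_million = {"7B": 0.10, "70B": 1.20}

for size, price in price_per_million.items():
    print(f"{size} model: ~${total_tokens / 1e6 * price:.2f}")
```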
3. Run the benchmark and check your report in the reports folder
python main.py --model remote
Local: Ollama
This method is based on Ollama, a tool that gets you up and running with large language models locally. Read the docs at ollama.ai and see the supported models here.
- Install Ollama
- Run a model from the list of supported models
Examples:
ollama run llama3 # llama3 8B
ollama run gemma # gemma 7B
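Since litellm can also talk to a local Ollama server, the local path can presumably be exercised with a call like the following (a sketch under that assumption; it is not confirmed from main.py, and the prompt text is a placeholder):

```python
from litellm import completion  # pip install litellm

# The "ollama/" prefix tells litellm to send the request to a locally
# running Ollama server (http://localhost:11434 by default).
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Complete the sentence: ..."}],
    api_base="http://localhost:11434",
)
print(response.choices[0].message.content)
```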
Then create a virtual environment, install the requirements, and run the benchmark:
macOS and Linux
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Windows
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
export MODEL=llama3
python main.py --model local
- To serve multiple requests in parallel, follow the guide here to configure the Ollama server.
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
python main.py --model local
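With OLLAMA_NUM_PARALLEL set, the server accepts concurrent requests, so the client side can fan prompts out in parallel. A hypothetical sketch of that pattern (not necessarily how main.py is structured; model name and prompts are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

from litellm import completion  # pip install litellm

def query(prompt: str) -> str:
    # Each call is an independent request to the local Ollama server,
    # so up to OLLAMA_NUM_PARALLEL requests are served simultaneously.
    response = completion(
        model="ollama/llama3",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

prompts = ["Complete the sentence: ...", "Complete the sentence: ..."]
with ThreadPoolExecutor(max_workers=4) as pool:  # match OLLAMA_NUM_PARALLEL
    results = list(pool.map(query, prompts))
```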