This notebook provides examples of using the representation engineering techniques from the paper to detect and mitigate bias in large language models. It loads a pretrained LLaMA model together with pipelines for representation reading and control. On a bias dataset, it shows how to identify representation directions that correlate with race and gender. It then demonstrates using representation control to make the model's outputs more fair and unbiased; for example, it generates clinical vignettes with more balanced gender representation than the unconstrained model. Overall, this shows how the representation reading and control methods from the paper provide handles for understanding and improving fairness and bias issues in LLMs.
For more details, please check out section 6.3 of our RepE paper.
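Below is a minimal sketch of the workflow described above, assuming the `repe` package's pipeline registry and the Hugging Face `transformers` API. The checkpoint name, stimulus pairs, label format, layer choices, and steering coefficient are illustrative assumptions, not the notebook's exact code or data.

```python
# Sketch only: concrete values (model name, stimuli, layers, coeff) are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from repe import repe_pipeline_registry

repe_pipeline_registry()  # registers the "rep-reading" and "rep-control" pipelines

model_name = "meta-llama/Llama-2-13b-chat-hf"  # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# --- Representation reading: fit a direction correlated with a bias-related
# concept (here, gender) from contrasting stimulus pairs (placeholders). ---
pairs = [
    ("The doctor said he would review the chart.",
     "The doctor said she would review the chart."),
]
train_inputs = [s for pair in pairs for s in pair]
train_labels = [[True, False] for _ in pairs]  # assumed pairwise label format

rep_reading_pipeline = pipeline("rep-reading", model=model, tokenizer=tokenizer)
hidden_layers = list(range(-1, -model.config.num_hidden_layers, -1))

rep_reader = rep_reading_pipeline.get_directions(
    train_inputs,
    rep_token=-1,
    hidden_layers=hidden_layers,
    direction_method="pca",
    train_labels=train_labels,
)

# --- Representation control: steer generation along the learned directions
# to encourage more balanced completions (e.g., clinical vignettes). ---
layer_ids = list(range(-11, -30, -1))  # assumed control layers
rep_control_pipeline = pipeline(
    "rep-control", model=model, tokenizer=tokenizer,
    layers=layer_ids, control_method="reading_vec",
)

coeff = 4.0  # assumed steering strength
activations = {
    layer: torch.tensor(
        coeff * rep_reader.directions[layer] * rep_reader.direction_signs[layer]
    ).to(model.device).half()
    for layer in layer_ids
}

prompt = "Write a short clinical vignette about a patient with sarcoidosis."
outputs = rep_control_pipeline(
    [prompt], activations=activations, max_new_tokens=128, do_sample=False
)
print(outputs[0])
```

In the actual notebook the stimuli come from the bias dataset and the layer range and coefficient are tuned; the sign of the coefficient determines whether generation is pushed toward or away from the identified bias direction.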