This notebook provides examples of using the representation engineering techniques from the paper to detect and mitigate bias in large language models. It loads a pretrained LLaMA model and sets up pipelines for representation reading and control. On a bias dataset, it shows how to identify representation directions that correlate with race and gender. It then demonstrates how representation control can steer the model toward fairer, less biased outputs; for example, it generates clinical vignettes with more balanced gender representation than the unconstrained model. Overall, this shows how the representation analysis and control methods from the paper provide handles for understanding and improving fairness and bias issues in LLMs. A minimal sketch of the two stages is shown below.
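The sketch below is illustrative only and assumes the repo's `repe_pipeline_registry()` has registered the `rep-reading` and `rep-control` tasks. The model checkpoint, the placeholder contrastive statements and label format, the layer ranges, and the steering coefficient are assumptions for illustration; the notebook builds the actual bias dataset and settings itself.

```python
# Illustrative sketch: checkpoint, data, layers, and coefficient are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from repe import repe_pipeline_registry

repe_pipeline_registry()  # registers the "rep-reading" and "rep-control" pipeline tasks

model_name = "meta-llama/Llama-2-13b-chat-hf"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder contrastive pairs (adjacent statements differ only in the bias
# attribute); the notebook constructs these from its bias dataset, and the
# exact label format follows that dataset's preparation.
train_data = [
    "The nurse said that she would check on the patient.",
    "The nurse said that he would check on the patient.",
    "The engineer explained that he had fixed the bug.",
    "The engineer explained that she had fixed the bug.",
]
train_labels = [[1, 0], [1, 0]]  # which statement in each pair carries the stereotype

# 1) Representation reading: find directions that correlate with the bias attribute.
rep_reading_pipeline = pipeline("rep-reading", model=model, tokenizer=tokenizer)
hidden_layers = list(range(-1, -model.config.num_hidden_layers, -1))

rep_reader = rep_reading_pipeline.get_directions(
    train_data,
    rep_token=-1,                 # read the hidden state at the last token
    hidden_layers=hidden_layers,
    n_difference=1,               # take differences of paired examples
    train_labels=train_labels,
    direction_method="pca",       # PCA over the difference vectors
)

# 2) Representation control: add the scaled directions back into selected layers
#    at generation time to steer outputs along the identified direction.
layer_ids = list(range(-11, -30, -1))   # middle layers, chosen for illustration
coeff = 4.0                              # steering strength (tune per model/task)

rep_control_pipeline = pipeline(
    "rep-control",
    model=model,
    tokenizer=tokenizer,
    layers=layer_ids,
    control_method="reading_vec",
)

activations = {
    layer: torch.tensor(
        coeff * rep_reader.directions[layer] * rep_reader.direction_signs[layer]
    ).to(model.device).half()
    for layer in layer_ids
}

prompt = "Write a short clinical vignette about a patient with sarcoidosis."
baseline = rep_control_pipeline([prompt], max_new_tokens=128, do_sample=False)
controlled = rep_control_pipeline(
    [prompt], activations=activations, max_new_tokens=128, do_sample=False
)
print(baseline)     # unconstrained generation
print(controlled)   # generation with the fairness direction applied
```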

For more details, please check out section 6.3 of our RepE paper.