Reproduce Constitutional AI Steps #2

Mistobaan · 2023-04-13T18:39:26Z

Overview

This issue captures the key steps required to reproduce the Constitutional AI paper: fine-tuning an RLHF model using AI-generated feedback (RLAIF).

Phase One

Gather a dataset of harmful prompts
Create a base script to compose prompts using a base constitution
Generate a new dataset of prompts and responses, using Carper's GPT-J RLHF model to review and critique the output
Fine-tune the original model on revised responses using supervised learning
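The Phase One critique-and-revision loop could look roughly like this. It is a minimal sketch, not a reference implementation: the two principles, the prompt templates, and the `generate` callable are hypothetical placeholders. In practice `generate` would wrap a model such as Carper's GPT-J RLHF checkpoint, and the paper uses its own longer list of principles plus few-shot examples. The output is a list of (prompt, revised response) records for the supervised fine-tuning step.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Principle:
    """One constitutional principle: how to critique and how to revise."""
    critique_request: str
    revision_request: str


# Hypothetical example principles; the paper's constitution is longer.
CONSTITUTION: List[Principle] = [
    Principle(
        "Identify specific ways in which the assistant's last response is harmful, unethical, or dangerous.",
        "Rewrite the assistant's response to remove any harmful, unethical, or dangerous content.",
    ),
    Principle(
        "Point out anything in the assistant's last response that could help someone cause harm.",
        "Rewrite the assistant's response so that it no longer helps anyone cause harm.",
    ),
]


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    n_rounds: int = 1,
    rng: Optional[random.Random] = None,
) -> str:
    """Sample a response, then critique and revise it n_rounds times.

    `generate` is a hypothetical wrapper that maps a text prompt to a completion.
    """
    rng = rng if rng is not None else random.Random(0)
    transcript = f"Human: {prompt}\n\nAssistant:"
    response = generate(transcript)
    for _ in range(n_rounds):
        # Each round draws one principle at random from the constitution.
        principle = rng.choice(CONSTITUTION)
        critique_prompt = (
            f"{transcript} {response}\n\n"
            f"Critique Request: {principle.critique_request}\n\nCritique:"
        )
        critique = generate(critique_prompt)
        # The revision prompt keeps the critique in context.
        revision_prompt = (
            f"{critique_prompt} {critique}\n\n"
            f"Revision Request: {principle.revision_request}\n\nRevision:"
        )
        response = generate(revision_prompt)
    return response


def build_sft_dataset(
    prompts: List[str], generate: Callable[[str], str], n_rounds: int = 1
) -> List[Dict[str, str]]:
    """Pair each harmful prompt with its final revised response."""
    return [
        {"prompt": p, "response": critique_and_revise(p, generate, n_rounds)}
        for p in prompts
    ]
```

To smoke-test the loop without a model, pass a stub `generate` that returns a fixed string for each stage (initial answer, critique, revision). Only the final revision is kept, which is what the supervised fine-tuning step trains on.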

Phase Two

Sample the fine-tuned model on the dataset of harmful prompts to create a new dataset with multiple outputs per prompt
Train a "reward model" (e.g. https://github.com/Dahoas/reward-modeling) to select the best result (a fine-tuned preference model)
Use RLAIF training to fine-tune the RLHF model
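Turning the multiple samples per prompt into preference-model training data could be sketched as follows. Assumptions: `prefer` is a hypothetical call to the feedback model that returns the probability that response `a` beats response `b`, and that probability is used as a soft label in a Bradley-Terry style pairwise loss. This does not reflect the API of the Dahoas reward-modeling repo; it only shows the shape of the data and the loss.

```python
import itertools
import math
from typing import Callable, Dict, List, Union


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def make_comparisons(
    prompt: str,
    samples: List[str],
    prefer: Callable[[str, str, str], float],
) -> List[Dict[str, Union[str, float]]]:
    """Turn several samples for one prompt into AI-labelled pairwise comparisons.

    `prefer(prompt, a, b)` is a hypothetical feedback-model call returning
    P(a is better than b).
    """
    return [
        {"prompt": prompt, "a": a, "b": b, "p_a": prefer(prompt, a, b)}
        for a, b in itertools.combinations(samples, 2)
    ]


def preference_loss(r_a: float, r_b: float, p_a: float) -> float:
    """Soft-label Bradley-Terry loss for one comparison.

    r_a, r_b are reward-model scores; p_a is the soft probability that `a` is
    preferred. With p_a = 1 this reduces to -log(sigmoid(r_a - r_b)).
    """
    x = r_a - r_b
    # -log sigmoid(x) = softplus(-x) and -log sigmoid(-x) = softplus(x)
    return p_a * softplus(-x) + (1.0 - p_a) * softplus(x)
```

With k samples per prompt this yields k*(k-1)/2 comparisons. Minimising `preference_loss` over them pushes the reward model to score the AI-preferred response higher.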
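For the final RL step, a common formulation in PPO-based RLHF is to reward each sampled response with the preference-model score while penalising drift from the frozen supervised model via a per-token KL estimate. The sketch below assumes that setup; the function name and the default `beta` are hypothetical and would need tuning.

```python
from typing import List


def kl_shaped_rewards(
    pm_score: float,
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.1,
) -> List[float]:
    """Per-token rewards for one sampled response.

    Each token is penalised by beta * (log pi(token) - log pi_ref(token)), a
    per-token estimate of the KL divergence from the frozen reference (SFT)
    model. The preference-model score is added on the final token only.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("policy and reference log-probs must have the same length")
    if not policy_logprobs:
        raise ValueError("response must contain at least one token")
    rewards = [-beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += pm_score
    return rewards
```

A larger `beta` keeps the policy closer to the supervised model at the cost of optimising the preference score less aggressively.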
