While most of the finetuning scripts were written for instruction datasets, it is possible to finetune on any plain-text dataset. This is useful for experimentation while not being as expensive as training a full model from scratch.
This guide only covers preparing the finetuning; note that only the LoRA and Adapter-v1 methods currently support this dataset type.
- Gather your text into an input file named `input.txt`
- Divide the data into training and validation sets using the following script: `python scripts/prepare_any_text.py`
- Modify the relevant scripts for your finetuning method under `finetune/` and `evaluate/`, setting the `instruction_tuning` variable to `False`
And then you're set! Proceed to run the LoRA guide or Adapter v1 guide.