While most of the finetuning scripts were written for instruction datasets, it is possible to finetune on any plain-text dataset. This is useful for experimentation while not being as expensive as training a full model from scratch.
This guide only covers preparing the finetuning; note that only the LoRA and Adapter-v1 methods currently support this dataset type.
- Gather your text into an input file named `input.txt`
- Divide the data into training and validation sets using the following script: `python scripts/prepare_any_text.py`
- Modify the relevant scripts for your finetuning method under `finetune/` and `evaluate/`, setting the `instruction_tuning` variable to `False`
And then you're set! Proceed to run the LoRA guide or Adapter v1 guide.