Update mimicit_format.md
Luodian authored Nov 9, 2023
1 parent 9304944 commit 131a92b
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/mimicit_format.md
@@ -4,7 +4,7 @@

We mainly use one integrate dataset format and we refer it to MIMIC-IT format since. You can convert any of your datasets into the same format like the following mentioned (two files for each dataset).

- The mimic-it format contains the following data yaml file. Within this data yaml file, you could assign the path of the instruction json file and the image parquet file, and also the number of samples you want to use. The number of samples within each group will be uniformly sampled, and the `number_samples / total_numbers`` will decide sampling ratio of each dataset.
+ We use the following data yaml file to indicate the data group and dataset we used in training. Within this data yaml file, for each dataset, you could assign the path of the instruction json file and the image parquet file, and also the number of samples you want to use. The number of samples within each group will be uniformly sampled, and the `number_samples / total_numbers`` will decide sampling ratio of each dataset.
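
Reviewer note: the changed paragraph says that `number_samples / total_numbers` decides each dataset's sampling ratio, but it does not define `total_numbers`. The sketch below shows one possible reading, assuming `total_numbers` is the sum of `number_samples` over all datasets in the same group. The function name `sampling_ratios`, the dataset names, and the counts are hypothetical and are not taken from the repository.

```python
def sampling_ratios(requested):
    """Return each dataset's share of its group.

    `requested` maps a dataset name to the number of samples asked for.
    Assumption: total_numbers is the sum of the requested counts in the group.
    """
    total = sum(requested.values())
    if total <= 0:
        raise ValueError("the group must request a positive number of samples")
    return {name: count / total for name, count in requested.items()}


# Hypothetical datasets and counts, for illustration only.
group = {"dataset_a": 3000, "dataset_b": 1000}
ratios = sampling_ratios(group)
print(ratios)
```

Under this reading, a dataset that requests three times as many samples as another is drawn three times as often within the group. If `total_numbers` instead means the size of each dataset, the ratio would be the fraction of that dataset that is used, and the sketch does not apply.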

```yaml
IMAGE_TEXT: # Group name should be in [IMAGE_TEXT, TEXT_ONLY, IMAGE_TEXT_IN_CONTEXT]
