niizam/4chan-datasets

4chan-datasets

Example usage:

  1. Clone the Hugging Face (HF) repo.
  2. Use merge.py to merge the text files into a single file, for example:
     python merge.py -d repo/pol -o pol-merged.txt
  3. Then convert pol-merged.txt to JSON or CSV format with tokenizer.py:
     python tokenizer.py pol-merged.txt pol-dataset.json
     or
     python tokenizer.py pol-merged.txt pol-dataset.csv
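
For readers who want a rough idea of what the merge step does, here is a minimal sketch of concatenating a directory of text files into one file. It is not the code in merge.py: the function name `merge_text_files`, the `.txt` extension filter, the recursive search, and the sorted file order are all assumptions made for illustration.

```python
from pathlib import Path


def merge_text_files(src_dir: str, out_path: str) -> int:
    """Concatenate every .txt file under src_dir into out_path.

    Hypothetical helper, not the repository's merge.py.
    Returns the number of files that were merged.
    """
    out_resolved = Path(out_path).resolve()
    # Collect inputs first, in sorted order, skipping the output file itself
    # in case it already exists inside src_dir.
    files = sorted(
        p for p in Path(src_dir).rglob("*.txt") if p.resolve() != out_resolved
    )
    with open(out_path, "w", encoding="utf-8") as out:
        for path in files:
            # Replace undecodable bytes instead of failing on a bad file.
            text = path.read_text(encoding="utf-8", errors="replace")
            out.write(text)
            # Keep files separated by a newline so lines do not run together.
            if not text.endswith("\n"):
                out.write("\n")
    return len(files)
```

Under these assumptions, `merge_text_files("repo/pol", "pol-merged.txt")` would write a file comparable in role to the one produced in step 2, though the real script may order or filter files differently.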
