
Tokenizing Dataset Fails with newline or index error #241

Open
leetfin opened this issue Nov 9, 2021 · 0 comments
leetfin commented Nov 9, 2021

When trying to tokenize a dataset, it fails with one of two errors: either
"Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?"
or a "list index out of range" error.
I'm running the newest version of the Colab notebook, and this happens with both GPT-2 and GPT-Neo.
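
In case it's useful, here is a minimal sketch of a workaround I could try (not something from the project docs): it assumes the first error comes from Python's csv module hitting stray carriage returns in the dataset, and simply re-saves the file with universal-newline handling before tokenizing. The file names below are placeholders.

```python
# Sketch of a possible workaround, assuming the error comes from the dataset's
# line endings. "dataset.csv" / "dataset_fixed.csv" are placeholder names.
with open("dataset.csv", "r", newline=None, encoding="utf-8") as src:
    text = src.read()  # newline=None enables universal-newline translation on read

with open("dataset_fixed.csv", "w", newline="\n", encoding="utf-8") as dst:
    dst.write(text)  # write back with plain \n line endings only
```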

Please let me know what info is needed or what I can try to fix this.

Thanks!
