
Tokenizing Dataset Fails with newline or index error #241

Open
leetfin opened this issue Nov 9, 2021 · 0 comments
leetfin commented Nov 9, 2021

When trying to tokenize a dataset, it fails with one of two errors: either
"Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?"
or a "list index out of range" error.
I'm running the newest version of the Colab notebook, and this happens with both GPT-2 and GPT-Neo.
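
In case it's useful, here is a minimal sketch of a workaround I could try (not something from the project docs): it assumes the first error comes from Python's csv module hitting stray carriage returns in the dataset, and simply re-saves the file with universal-newline handling before tokenizing. The file names below are placeholders.

```python
# Sketch of a possible workaround, assuming the error comes from the dataset's
# line endings. "dataset.csv" / "dataset_fixed.csv" are placeholder names.
with open("dataset.csv", "r", newline=None, encoding="utf-8") as src:
    text = src.read()  # newline=None enables universal-newline translation on read

with open("dataset_fixed.csv", "w", newline="\n", encoding="utf-8") as dst:
    dst.write(text)  # write back with plain \n line endings only
```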

Please let me know what info is needed or what I can try to fix this.

Thanks!
