There are a few folks seeding the entirety of Reddit, but the Reddit Corpus project provides archives of individual subreddits. That makes it possible to train on a particular domain. Here is a small example: dadjokes2.corpus.zip

The only problem is that they are not in the format your reddit_parse.py expects. Each archive is a .zip bundle of five JSON/JSONL files:
users.json
conversations.json
corpus.json
index.json
utterances.jsonl
What is the shortest path for converting this to usable training data?
Wes