Question about input data format #1

Open
refreshalways opened this issue Jun 29, 2018 · 3 comments

Comments

@refreshalways

  1. What is the input format for prepare_data.py?
  2. What is the difference between vocabulary.txt (prepare_data.py) and rg_vocab.txt (HRED.py)?

Thanks.

@shawnspace
Owner

There are some mismatches between the files I uploaded because I modified them while running my experiments.

I have uploaded my latest version and a lot of lines have been changed. Please check them again.

The input file "dialog.txt" read by prepare_context_RG_data.py looks like this:

q1\ta1\tq2\ta2\n
q1\ta1\tq2\ta2\n
...

Each line contains several utterances (such as q1 and a1 here), separated by '\t'. Each utterance has already been word-tokenized, so splitting it on whitespace gives the individual word tokens.
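As a rough illustration of that format, here is a minimal sketch of how one line could be split into utterances and tokens. The function name `parse_dialog_line` and the sample sentence are made up for this example; this is not the code in prepare_context_RG_data.py.

```python
def parse_dialog_line(line):
    """Split one line of dialog.txt into utterances, then into word tokens.

    Hypothetical helper for illustration only; not taken from the repository.
    """
    # Strip only the line ending so the tab separators are left untouched.
    utterances = line.rstrip("\r\n").split("\t")
    # Each utterance is assumed to be pre-tokenized, so whitespace separates tokens.
    return [utterance.split() for utterance in utterances]


# Made-up sample line with four utterances (q1, a1, q2, a2).
sample = "how are you\ti am fine\twhat about you\tgood , thanks\n"
dialog = parse_dialog_line(sample)
for turn, tokens in enumerate(dialog):
    print(turn, tokens)
```

Each element of `dialog` is one utterance as a list of tokens, in the order the utterances appear on the line.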

To train the model, first run prepare_context_RG_data.py to generate the .tfrecords files, then run train.py.

Hope this helps.

@refreshalways
Author

Thank you for the prompt reply.
It does not seem to accept a line with an odd number of utterances, such as "q1\ta1\tq2\n". Any suggestions?

@shawnspace
Owner

I am not sure what the issue is with your code. Could you post more information about what "does not allow" means, for example the error message you get? This format works for me on my machine.
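If it helps to collect that information, a small check like the one below could be run on a problematic line. It is only a sketch: `describe_line` is a hypothetical helper, not part of this repository, and the literal-backslash check is just one guess at what might go wrong (a file containing the two characters `\` and `t` instead of a real tab).

```python
def describe_line(line):
    """Summarize how a dialog.txt line splits on tabs (hypothetical helper)."""
    utterances = line.rstrip("\r\n").split("\t")
    return {
        # Number of tab-separated fields found on the line.
        "num_utterances": len(utterances),
        # Positions of fields that are empty or whitespace-only.
        "empty_utterances": [i for i, u in enumerate(utterances) if not u.strip()],
        # True if the line contains a backslash character rather than real tabs.
        "has_literal_backslash": "\\" in line,
    }


# A line with real tab characters between three utterances.
print(describe_line("q1\ta1\tq2\n"))
# A line where the separators were typed as backslash + 't' instead of a tab.
print(describe_line("q1\\ta1\\tq2\n"))
```

Posting this summary together with the full error message would make it easier to tell whether the data or the script is at fault.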
