Question about input data format #1

refreshalways · 2018-06-29T02:58:38Z

What's the format for the input of prepare_data.py?
What's the difference between vocabulary.txt (prepare_data.py) and rg_vocab.txt (HRED.py)?

Thanks.

shawnspace · 2018-06-29T03:18:23Z

There is some mismatch between the files I uploaded because I modified them while conducting my experiments.

I have uploaded my latest version and a lot of lines have been changed. Please check them again.

The input file "dialog.txt" in prepare_context_RG_data.py looks like this:

q1\ta1\tq2\ta2\n
q1\ta1\tq2\ta2\n
...

Each line contains several utterances (q1, a1, and so on) separated by '\t'. Each utterance is already word-tokenized, so you can split it on whitespace to get its word tokens.
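As a minimal sketch of the format described above, the following Python reads such a file into nested lists: one list per line, one inner list of tokens per utterance. `parse_line` and `load_dialogs` are hypothetical helpers, not functions from the repository, and UTF-8 encoding is an assumption; prepare_context_RG_data.py may read the file differently.

```python
from typing import List

Utterance = List[str]      # one utterance = list of word tokens
Dialog = List[Utterance]   # one line of the file = list of utterances


def parse_line(line: str) -> Dialog:
    """Split one line into utterances on tabs, then each utterance into tokens on whitespace."""
    line = line.rstrip("\r\n")  # drop the trailing newline (and '\r' from Windows files)
    return [utterance.split() for utterance in line.split("\t")]


def load_dialogs(path: str) -> List[Dialog]:
    """Read every non-blank line of a dialog file such as dialog.txt (UTF-8 assumed)."""
    dialogs: List[Dialog] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                dialogs.append(parse_line(line))
    return dialogs


if __name__ == "__main__":
    print(parse_line("how are you ?\tfine , thanks\n"))
    # → [['how', 'are', 'you', '?'], ['fine', ',', 'thanks']]
```

Splitting on tabs first and whitespace second mirrors the two levels of the format, so a tab inside an utterance would wrongly start a new utterance; the file is assumed not to contain any.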

To train the model, first use prepare_context_RG_data.py to generate the .tfrecords files, then use train.py to train the model.

Hope this helps.

refreshalways · 2018-06-29T05:55:48Z

Thank you for the prompt reply.
It does not seem to allow a format like "q1\ta1\tq2\n". Any suggestions?

shawnspace · 2018-06-30T07:25:20Z

I am not sure what the issue with your code is. Could you post more information about why "it doesn't allow" this format? It works on my machine.
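Before posting more details, a quick check of dialog.txt can show whether any lines deviate from the tab-separated format. This is a hypothetical sketch: `check_line` and `check_dialog_file` are not part of the repository, and which conditions actually make prepare_context_RG_data.py fail is an assumption. It flags blank lines, lines with fewer than two utterances, and empty utterances caused by doubled or trailing tabs, and it tallies how many utterances each line has.

```python
from collections import Counter
from typing import List, Tuple


def check_line(line: str) -> List[str]:
    """Return a list of possible problems with one line of a tab-separated dialog file."""
    line = line.rstrip("\r\n")
    if not line.strip():
        return ["blank line"]
    problems: List[str] = []
    utterances = line.split("\t")
    # Assumption: a context/response example needs at least two utterances.
    if len(utterances) < 2:
        problems.append("fewer than 2 utterances")
    # A doubled or trailing tab produces an empty utterance.
    for i, utterance in enumerate(utterances):
        if not utterance.strip():
            problems.append(f"empty utterance at position {i}")
    return problems


def check_dialog_file(path: str) -> Tuple[Counter, List[Tuple[int, str]]]:
    """Tally utterances per line and collect (line_number, problem) pairs for a whole file."""
    counts: Counter = Counter()
    problems: List[Tuple[int, str]] = []
    with open(path, encoding="utf-8") as f:  # UTF-8 is an assumption
        for lineno, line in enumerate(f, start=1):
            counts[len(line.rstrip("\r\n").split("\t"))] += 1
            problems.extend((lineno, p) for p in check_line(line))
    return counts, problems
```

Calling `counts, problems = check_dialog_file("dialog.txt")` and printing `dict(counts)` together with the first few entries of `problems` gives the kind of concrete information that is useful to post in this thread.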
