Error with reddit-parser #59
Comments
I personally could never get it to work, so I ended up just programming my own. |
Could you send me the code? I have no idea how to do parsing. |
Let me take a look at your dataset and I might be able to help you out. My previous parser was for the Cornell corpus, but I can look into making one for your case. I emailed the person on the link you provided so I can find out how that data is structured. |
Ok. Thanks a lot. You're awesome. |
Thanks, we will keep in touch. |
Thanks, that might help. The data seems to have a similar layout. |
Yep, got that. I just downloaded some testing data to look at and start coding. This parser shouldn't be too complicated. |
Thanks SO much. |
No problem. |
Some of the datasets on that website are corrupted after extraction; I didn't find any corrupted ones in the "daily" directory. I'm done with the parser, so if you run into any errors with it, do tell. It skips over deleted comments and some posts with no comments. I also included my Cornell corpus parser, in case you are interested, and a link to a pre-parsed reddit dataset.
Parser: https://drive.google.com/file/d/1YgDZrQGJXZybXAo_5_4SZXycBFUJ3jCo/view?usp=sharing
Pre-parsed data: https://drive.google.com/uc?id=1s77S7COjrb3lOnfqvXYfn7sW_x5U1_l9&export=download
Cornell corpus: https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html |
You're the best. What data did you pre-parse? |
It was the pre-parsed data in the README of this project. It seemed to be decent data so I sent it in case you weren't aware of it👍. |
I'm sorry, I'm getting an error while parsing. It seems to just be a simple unicode decode error; Traceback (most recent call last): |
I'm parsing RC_2017-08 (around 15 gigs) |
let me take a look |
Strange, I used "RC_2005-12" (because it was the smallest in size and I wanted to get data quickly to test) and I didn't have any problems with it. |
Huh. I'm going to check out the difference between the files. :\ |
Never mind. I got it to work by changing the encoding to latin1. I'm not sure if it will cause problems in the future, though; some people say the wrong encoding makes the output gibberish. Anyway, new error: Traceback (most recent call last): |
add print(f) just above that line to see what data is in the variable. |
['BZh91AY&SYåØ;\x90\x06Úv߀\x7f\x90\x7fÿÿúÿÿÿÿÿÿÿÿb³Gß[ì0õ\xa0\x1a\x00h\x1a\x00ÑX\x14}\x06ûX\x01è\x00¢|Ã)õ÷O\x1dÜ\x0fT¥\x00\x02¨RJÐ\x00\x00Á4\x03Az,[\x00:\x1e¤\x0e@\x02€h:\x00\x1d"h\x01vtª«°\n'] I need to get some sleep. See you tomorrow. |
I think those characters might be a consequence of changing the encoding. I'm downloading your dataset (RC_2017-08) to see. I will try to fix the problem. |
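For what it's worth, here is a minimal sketch of reading the extracted dump with an explicit UTF-8 encoding instead of the platform default; the file name and the one-JSON-object-per-line layout are assumptions based on this thread, not the parser's actual code:

```python
# Hedged sketch: read the extracted RC_2017-08 dump as UTF-8 JSON lines.
# "RC_2017-08" and the "body" field are assumptions taken from this thread.
import json

with open("RC_2017-08", encoding="utf-8", errors="replace") as dump:
    for line in dump:
        comment = json.loads(line)
        print(comment.get("body", ""))
```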
I downloaded the file (RC_2017-08) https://files.pushshift.io/reddit/comments/RC_2017-08.bz2 |
Yes, we have the same file. Did you change anything in the code from the zip you sent me? |
No changes. |
What did you enter for file name? |
Oh, and also: what Python version are you using? |
I can't find a reason why it doesn't work; it should work. How big is your file, extracted and unextracted? |
file name : "RC_2017-08" |
I think running it overnight is a good idea. I can't think of any reason why it wouldn't be working; I think it is just going to take a while to first process your datasets. I am interested to hear what it is doing in the morning. This has been a long issue, and I am glad I've been able to help so far. |
You are the reason I didn't just search for some pre parsed data ( ͡° ͜ʖ ͡°) |
Ha. I am happy to help. python is a VERY amazing language. |
Friend thinks JavaScript is better; writes an alphazero RL bot |
Javascript is a good language BUT python is way better. |
Loading vocab file...
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'gradients/rnnlm_1/while/rnnlm/partitioned_multi_rnn_cell/cell_2_0/gru_cell/MatMul_1_grad/MatMul_1', defined at: ...
...which was originally created as op 'rnnlm_1/while/rnnlm/partitioned_multi_rnn_cell/cell_2_0/gru_cell/MatMul_1', defined at: ...
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. |
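As an aside, here is a minimal sketch of what that hint is asking for in TensorFlow 1.x; the graph built here is a throwaway placeholder, not this project's actual training loop:

```python
# Hedged sketch (TensorFlow 1.x): enable the allocation report mentioned in the OOM hint.
import tensorflow as tf

run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    x = tf.random_normal([1024, 1024])   # stand-in computation, not the project's graph
    y = tf.matmul(x, x)
    sess.run(y, options=run_options)     # pass options to the run() call that OOMs
```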
I put output.txt and output 1.bz2-output 5.bz2 in a reddit folder inside data. Is that correct? |
Yes you had them in the right spot. This OOM error might be an easy fix. |
Show me lines 30 and 24 in train.py. |
Line 30: parser.add_argument('--batch_size', type=int, default=40,
Line 24: parser.add_argument('--num_blocks', type=int, default=2, |
Change line 30 to: parser.add_argument('--batch_size', type=int, default=10, |
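In context, the change would look something like this; the help strings are placeholders, and only the argument names and defaults come from the lines quoted above:

```python
# Hedged sketch of the relevant train.py arguments; help texts are assumptions.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--num_blocks', type=int, default=2,
                    help='number of stacked RNN blocks')  # line 24, unchanged
parser.add_argument('--batch_size', type=int, default=10,
                    help='minibatch size, lowered from 40 to reduce GPU memory use')  # line 30
```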
Loading vocab file... |
In the directory models/new_save, delete all the data in that folder. And in the data folder, delete all .npz files and all .pkl files. Then run it again (it is going to take a while to run, like before). |
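Roughly, that cleanup could look like the sketch below, assuming models/new_save and data/ are relative to the repo root and nothing else in those folders needs to be kept:

```python
# Hedged cleanup sketch: remove cached model state and preprocessed data files.
# Paths come from the comment above; adjust if the repo layout differs.
import glob
import os

for path in glob.glob("models/new_save/*"):
    if os.path.isfile(path):
        os.remove(path)

for pattern in ("data/**/*.npz", "data/**/*.pkl"):
    for path in glob.glob(pattern, recursive=True):
        os.remove(path)
```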
This time |
No vocab file found. Preprocessing...
|
No, it's normal. |
And you got rid of all the files that the program made previously? |
yes. |
I didn't delete anything and tried again and it works now! :/ |
I'm gonna keep this open until the model is fully trained, k? |
Ok, sounds good. I guess it resolved itself. If it is resolved, then don't forget to mark the issue closed. |
Glad I could help! |
How much loss is acceptable? |
Depends on how well it does in inference mode. The lower, the better. |
Glad I could help! Enjoy. |
I think it deserves a new issue, but error with training: |
I am getting an error while training on my own reddit data (2017-08) from this website: https://files.pushshift.io/reddit/comments/
Trying it the first time:
Traceback (most recent call last):
File "reddit_parse.py", line 258, in
main()
File "reddit_parse.py", line 37, in main
parse_main(args)
File "reddit_parse.py", line 91, in parse_main
args.print_subreddit, args.min_conversation_length)
File "reddit_parse.py", line 242, in write_comment_cache
output_file.write(output_string + '\n')
File "reddit_parse.py", line 151, in write
self.file_reference.write(data)
File "C:\Users\16175\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f602' in position 404: character maps to <undefined>
Second time, I added encoding='utf8' to the write() call on line 151:
Traceback (most recent call last):
File "reddit_parse.py", line 258, in
main()
File "reddit_parse.py", line 37, in main
parse_main(args)
File "reddit_parse.py", line 91, in parse_main
args.print_subreddit, args.min_conversation_length)
File "reddit_parse.py", line 242, in write_comment_cache
output_file.write(output_string + '\n')
File "reddit_parse.py", line 151, in write
self.file_reference.write(data, encoding='utf8')
TypeError: write() takes no keyword arguments
Python 3.6.8
Tensorflow 1.9.0
Could someone please help me?
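For reference, the usual fix for this pair of errors is to set the encoding when the output file is opened rather than passing it to write(); a minimal standalone sketch, using a placeholder path rather than the parser's actual output file:

```python
# Hedged sketch: the encoding belongs on open(), not on write().
# "output.txt" is a placeholder path, not the parser's real output file.
output_file = open("output.txt", "w", encoding="utf-8")
output_file.write("parsed comment \U0001f602\n")  # no UnicodeEncodeError even when the platform default is cp1252
output_file.close()
```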