ReviewId,SentenceId,TurkerId,Text,Label1,Label2
r102,s0,t31,"1. The paper misses some more recent reference, e.g. [a,b].",actionable,suggestion
r102,s1,t31,"2. Indeed, AlexNet is a good seedbed to test binary methods.",actionable,suggestion
r102,s2,t31,"So, I wish to see a section on testing with Resnet and GoogleNet.",actionable,suggestion
r102,s3,t31,"Indeed, the authors have commented: ""AlexNet with batch-normalization (AlexNet-BN) is the standard model ... acceptance that improvements made to accuracy transfer well to more modern architectures.""",non_actionable,fact
r102,s4,t31,3. The paper wants to find a good trade-off on speed and accuracy.,non_actionable,fact
r102,s5,t31,"The authors have plotted such trade-off on space v.s. accuracy in Figure 3(b), then how about speed v.s. accuracy?",actionable,suggestion
r102,s6,t31,My concern is that one-bit system is already complicated to implement.,actionable,shortcoming
r102,s7,t31,"Indeed, the authors have discussed their implementation in Section 3.3, so, how their method works in practice?",non_actionable,question
r102,s8,t31,4. Is trade-off between 1 to 2 bits really important?,non_actionable,question
r102,s9,t31,"Compared with 2bits or ternary network, the proposed method at most achieving (1.4/2) compression ratio and (2/1.4) speedup (based on their Table 1).",non_actionable,fact
r102,s0,t23,"1. The paper misses some more recent reference, e.g. [a,b].",actionable,suggestion
r102,s1,t23,"2. Indeed, AlexNet is a good seedbed to test binary methods.",non_actionable,agreement
r102,s2,t23,"So, I wish to see a section on testing with Resnet and GoogleNet.",actionable,suggestion
r102,s3,t23,"Indeed, the authors have commented: ""AlexNet with batch-normalization (AlexNet-BN) is the standard model ... acceptance that improvements made to accuracy transfer well to more modern architectures.""",non_actionable,agreement
r102,s4,t23,3. The paper wants to find a good trade-off on speed and accuracy.,actionable,suggestion
r102,s5,t23,"The authors have plotted such trade-off on space v.s. accuracy in Figure 3(b), then how about speed v.s. accuracy?",actionable,question
r102,s6,t23,My concern is that one-bit system is already complicated to implement.,non_actionable,shortcoming
r102,s7,t23,"Indeed, the authors have discussed their implementation in Section 3.3, so, how their method works in practice?",actionable,question
r102,s8,t23,4. Is trade-off between 1 to 2 bits really important?,actionable,question
r102,s9,t23,"Compared with 2bits or ternary network, the proposed method at most achieving (1.4/2) compression ratio and (2/1.4) speedup (based on their Table 1).",non_actionable,fact
r102,s0,t20,"1. The paper misses some more recent reference, e.g. [a,b].",actionable,shortcoming
r102,s1,t20,"2. Indeed, AlexNet is a good seedbed to test binary methods.",non_actionable,fact
r102,s2,t20,"So, I wish to see a section on testing with Resnet and GoogleNet.",actionable,suggestion
r102,s3,t20,"Indeed, the authors have commented: ""AlexNet with batch-normalization (AlexNet-BN) is the standard model ... acceptance that improvements made to accuracy transfer well to more modern architectures.""",non_actionable,fact
r102,s4,t20,3. The paper wants to find a good trade-off on speed and accuracy.,non_actionable,fact
r102,s5,t20,"The authors have plotted such trade-off on space v.s. accuracy in Figure 3(b), then how about speed v.s. accuracy?",actionable,question
r102,s6,t20,My concern is that one-bit system is already complicated to implement.,non_actionable,shortcoming
r102,s7,t20,"Indeed, the authors have discussed their implementation in Section 3.3, so, how their method works in practice?",actionable,question
r102,s8,t20,4. Is trade-off between 1 to 2 bits really important?,actionable,question
r102,s9,t20,"Compared with 2bits or ternary network, the proposed method at most achieving (1.4/2) compression ratio and (2/1.4) speedup (based on their Table 1).",non_actionable,fact
r102,s0,t9,"1. The paper misses some more recent reference, e.g. [a,b].",actionable,shortcoming
r102,s1,t9,"2. Indeed, AlexNet is a good seedbed to test binary methods.",non_actionable,agreement
r102,s2,t9,"So, I wish to see a section on testing with Resnet and GoogleNet.",actionable,suggestion
r102,s3,t9,"Indeed, the authors have commented: ""AlexNet with batch-normalization (AlexNet-BN) is the standard model ... acceptance that improvements made to accuracy transfer well to more modern architectures.""",non_actionable,agreement
r102,s4,t9,3. The paper wants to find a good trade-off on speed and accuracy.,non_actionable,fact
r102,s5,t9,"The authors have plotted such trade-off on space v.s. accuracy in Figure 3(b), then how about speed v.s. accuracy?",non_actionable,question
r102,s6,t9,My concern is that one-bit system is already complicated to implement.,non_actionable,shortcoming
r102,s7,t9,"Indeed, the authors have discussed their implementation in Section 3.3, so, how their method works in practice?",actionable,question
r102,s8,t9,4. Is trade-off between 1 to 2 bits really important?,actionable,question
r102,s9,t9,"Compared with 2bits or ternary network, the proposed method at most achieving (1.4/2) compression ratio and (2/1.4) speedup (based on their Table 1).",non_actionable,fact
r102,s0,t10,"1. The paper misses some more recent reference, e.g. [a,b].",non_actionable,disagreement
r102,s1,t10,"2. Indeed, AlexNet is a good seedbed to test binary methods.",actionable,agreement
r102,s2,t10,"So, I wish to see a section on testing with Resnet and GoogleNet.",actionable,suggestion
r102,s3,t10,"Indeed, the authors have commented: ""AlexNet with batch-normalization (AlexNet-BN) is the standard model ... acceptance that improvements made to accuracy transfer well to more modern architectures.""",non_actionable,other
r102,s4,t10,3. The paper wants to find a good trade-off on speed and accuracy.,non_actionable,other
r102,s5,t10,"The authors have plotted such trade-off on space v.s. accuracy in Figure 3(b), then how about speed v.s. accuracy?",actionable,suggestion
r102,s6,t10,My concern is that one-bit system is already complicated to implement.,actionable,fact
r102,s7,t10,"Indeed, the authors have discussed their implementation in Section 3.3, so, how their method works in practice?",actionable,question
r102,s8,t10,4. Is trade-off between 1 to 2 bits really important?,actionable,question
r102,s9,t10,"Compared with 2bits or ternary network, the proposed method at most achieving (1.4/2) compression ratio and (2/1.4) speedup (based on their Table 1).",non_actionable,other
r1,s0,t20,"A decent paper with some issues This paper proposes a new output layer in neural networks, which allows them to use logged contextual bandit feedback for training.",non_actionable,agreement
r1,s1,t20,"Others: - The baseline in REINFORCE (Williams'92), which is equivalent to introduced Lagrange multiplier, is well known and well defined as control variate in Monte Carlo simulation, certainly not an ""ad-hoc heuristic"" as claimed in the paper [see Greensmith et al. (2004). Variance Reduction for Gradient Estimates in Reinforcement Learning, JMLR 5.] - Bandit to supervised conversion: please add a supervised baseline system trained just on instances with top feedbacks -- this should be a much more interesting and relevant strong baseline.",actionable,suggestion
r1,s2,t20,There are multiple indications that this bandit-to-supervised baseline is hard to outperform in a number of important applications.,non_actionable,fact
r1,s3,t20,"The control variate of the SNIPS objective can be seen as defining a probability distribution over the log, thus ensuring that for each sample that sample’s delta is multiplied by a value in [0,1] and not by a large importance sampling ratio.",non_actionable,fact
r1,s4,t20,1. IPS for losses<0 and risk minimization: raise the probability of every sample in the log irrespective of the loss itself,actionable,suggestion
r1,s5,t20,- What is the feedback in the CIFAR-10 experiments?,actionable,question
r1,s6,t20,- The claim of Theorem 2 in appendix B does not follow from its proof: what is proven is that the value of S(w) lies in an interval [1-e..1+e] with a certain probability for all w.,non_actionable,fact
r1,s7,t20,"Actually, the proof never makes any connection to optimization.",actionable,shortcoming
r1,s8,t20,"This would contradict some previously established convergence results for this type of problems: Reddi et al. (2016) Stochastic Variance Reduction for Nonconvex Optimization, ICML and Wang et al. 2013.",actionable,shortcoming
r1,s9,t20,"Variance Reduction for Stochastic Gradient Optimization, NIPS.",non_actionable,fact
r1,s0,t31,"A decent paper with some issues This paper proposes a new output layer in neural networks, which allows them to use logged contextual bandit feedback for training.",non_actionable,agreement
r1,s1,t31,"Others: - The baseline in REINFORCE (Williams'92), which is equivalent to introduced Lagrange multiplier, is well known and well defined as control variate in Monte Carlo simulation, certainly not an ""ad-hoc heuristic"" as claimed in the paper [see Greensmith et al. (2004). Variance Reduction for Gradient Estimates in Reinforcement Learning, JMLR 5.] - Bandit to supervised conversion: please add a supervised baseline system trained just on instances with top feedbacks -- this should be a much more interesting and relevant strong baseline.",actionable,shortcoming
r1,s2,t31,There are multiple indications that this bandit-to-supervised baseline is hard to outperform in a number of important applications.,non_actionable,fact
r1,s3,t31,"The control variate of the SNIPS objective can be seen as defining a probability distribution over the log, thus ensuring that for each sample that sample’s delta is multiplied by a value in [0,1] and not by a large importance sampling ratio.",non_actionable,fact
r1,s4,t31,1. IPS for losses<0 and risk minimization: raise the probability of every sample in the log irrespective of the loss itself,non_actionable,fact
r1,s5,t31,- What is the feedback in the CIFAR-10 experiments?,non_actionable,question
r1,s6,t31,- The claim of Theorem 2 in appendix B does not follow from its proof: what is proven is that the value of S(w) lies in an interval [1-e..1+e] with a certain probability for all w.,actionable,shortcoming
r1,s7,t31,"Actually, the proof never makes any connection to optimization.",actionable,shortcoming
r1,s8,t31,"This would contradict some previously established convergence results for this type of problems: Reddi et al. (2016) Stochastic Variance Reduction for Nonconvex Optimization, ICML and Wang et al. 2013.",non_actionable,fact
r1,s9,t31,"Variance Reduction for Stochastic Gradient Optimization, NIPS.",non_actionable,fact
r1,s0,t8,"A decent paper with some issues This paper proposes a new output layer in neural networks, which allows them to use logged contextual bandit feedback for training.",actionable,shortcoming
r1,s1,t8,"Others: - The baseline in REINFORCE (Williams'92), which is equivalent to introduced Lagrange multiplier, is well known and well defined as control variate in Monte Carlo simulation, certainly not an ""ad-hoc heuristic"" as claimed in the paper [see Greensmith et al. (2004). Variance Reduction for Gradient Estimates in Reinforcement Learning, JMLR 5.] - Bandit to supervised conversion: please add a supervised baseline system trained just on instances with top feedbacks -- this should be a much more interesting and relevant strong baseline.",actionable,suggestion
r1,s2,t8,There are multiple indications that this bandit-to-supervised baseline is hard to outperform in a number of important applications.,non_actionable,agreement
r1,s3,t8,"The control variate of the SNIPS objective can be seen as defining a probability distribution over the log, thus ensuring that for each sample that sample’s delta is multiplied by a value in [0,1] and not by a large importance sampling ratio.",non_actionable,fact
r1,s4,t8,1. IPS for losses<0 and risk minimization: raise the probability of every sample in the log irrespective of the loss itself,non_actionable,fact
r1,s5,t8,- What is the feedback in the CIFAR-10 experiments?,non_actionable,question
r1,s6,t8,- The claim of Theorem 2 in appendix B does not follow from its proof: what is proven is that the value of S(w) lies in an interval [1-e..1+e] with a certain probability for all w.,actionable,shortcoming
r1,s7,t8,"Actually, the proof never makes any connection to optimization.",actionable,shortcoming
r1,s8,t8,"This would contradict some previously established convergence results for this type of problems: Reddi et al. (2016) Stochastic Variance Reduction for Nonconvex Optimization, ICML and Wang et al. 2013.",non_actionable,fact
r1,s9,t8,"Variance Reduction for Stochastic Gradient Optimization, NIPS.",non_actionable,other
r1,s0,t16,"A decent paper with some issues This paper proposes a new output layer in neural networks, which allows them to use logged contextual bandit feedback for training.",non_actionable,agreement
r1,s1,t16,"Others: - The baseline in REINFORCE (Williams'92), which is equivalent to introduced Lagrange multiplier, is well known and well defined as control variate in Monte Carlo simulation, certainly not an ""ad-hoc heuristic"" as claimed in the paper [see Greensmith et al. (2004). Variance Reduction for Gradient Estimates in Reinforcement Learning, JMLR 5.] - Bandit to supervised conversion: please add a supervised baseline system trained just on instances with top feedbacks -- this should be a much more interesting and relevant strong baseline.",actionable,suggestion
r1,s2,t16,There are multiple indications that this bandit-to-supervised baseline is hard to outperform in a number of important applications.,non_actionable,fact
r1,s3,t16,"The control variate of the SNIPS objective can be seen as defining a probability distribution over the log, thus ensuring that for each sample that sample’s delta is multiplied by a value in [0,1] and not by a large importance sampling ratio.",non_actionable,fact
r1,s4,t16,1. IPS for losses<0 and risk minimization: raise the probability of every sample in the log irrespective of the loss itself,actionable,suggestion
r1,s5,t16,- What is the feedback in the CIFAR-10 experiments?,actionable,question
r1,s6,t16,- The claim of Theorem 2 in appendix B does not follow from its proof: what is proven is that the value of S(w) lies in an interval [1-e..1+e] with a certain probability for all w.,actionable,shortcoming
r1,s7,t16,"Actually, the proof never makes any connection to optimization.",actionable,shortcoming
r1,s8,t16,"This would contradict some previously established convergence results for this type of problems: Reddi et al. (2016) Stochastic Variance Reduction for Nonconvex Optimization, ICML and Wang et al. 2013.",actionable,disagreement
r1,s9,t16,"Variance Reduction for Stochastic Gradient Optimization, NIPS.",non_actionable,fact
r1,s0,t2,"A decent paper with some issues This paper proposes a new output layer in neural networks, which allows them to use logged contextual bandit feedback for training.",non_actionable,fact
r1,s1,t2,"Others: - The baseline in REINFORCE (Williams'92), which is equivalent to introduced Lagrange multiplier, is well known and well defined as control variate in Monte Carlo simulation, certainly not an ""ad-hoc heuristic"" as claimed in the paper [see Greensmith et al. (2004). Variance Reduction for Gradient Estimates in Reinforcement Learning, JMLR 5.] - Bandit to supervised conversion: please add a supervised baseline system trained just on instances with top feedbacks -- this should be a much more interesting and relevant strong baseline.",actionable,suggestion
r1,s2,t2,There are multiple indications that this bandit-to-supervised baseline is hard to outperform in a number of important applications.,non_actionable,fact
r1,s3,t2,"The control variate of the SNIPS objective can be seen as defining a probability distribution over the log, thus ensuring that for each sample that sample’s delta is multiplied by a value in [0,1] and not by a large importance sampling ratio.",non_actionable,fact
r1,s4,t2,1. IPS for losses<0 and risk minimization: raise the probability of every sample in the log irrespective of the loss itself,non_actionable,other
r1,s5,t2,- What is the feedback in the CIFAR-10 experiments?,actionable,question
r1,s6,t2,- The claim of Theorem 2 in appendix B does not follow from its proof: what is proven is that the value of S(w) lies in an interval [1-e..1+e] with a certain probability for all w.,non_actionable,disagreement
r1,s7,t2,"Actually, the proof never makes any connection to optimization.",non_actionable,shortcoming
r1,s8,t2,"This would contradict some previously established convergence results for this type of problems: Reddi et al. (2016) Stochastic Variance Reduction for Nonconvex Optimization, ICML and Wang et al. 2013.",non_actionable,shortcoming
r1,s9,t2,"Variance Reduction for Stochastic Gradient Optimization, NIPS.",non_actionable,other
r89,s0,t10,A full implementation of binary CNN with code This paper builds on Binary-NET [Hubara et al. 2016] and expands it to CNN architectures.,non_actionable,other
r89,s1,t10,"It also provides optimizations that substantially improve the speed of the forward pass: packing layer bits along the channel dimension, pre-allocation of CUDA resources and binary-optimized CUDA kernels for matrix multiplications.",actionable,agreement
r89,s2,t10,The authors compare their framework to BinaryNET and Nervana/Neon and show a 8x speedup for 8092 matrix-matrix multiplication and a 68x speedup for MLP networks.,non_actionable,other
r89,s3,t10,"For CNN, they a speedup of 5x is obtained from the GPU to binary-optimizimed-GPU.",non_actionable,other
r89,s4,t10,A gain in memory size of 32x is also achieved by using binary weight and activation during the forward pass.,non_actionable,other
r89,s5,t10,The main contribution of this paper is an optimized code for Binary CNN.,actionable,agreement
r89,s6,t10,The authors provide the code with permissive licensing.,non_actionable,other
r89,s7,t10,"As is often the case with such comparisons, it is hard to disentangle from where exactly come the speedups.",actionable,shortcoming
r89,s8,t10,"Overall, i think it makes a good contribution to a field that is gaining importance for mobile and embedded applications of deep convnets.",actionable,agreement
r89,s9,t10,I think it is a good fit for a poster.,actionable,agreement
r89,s0,t9,A full implementation of binary CNN with code This paper builds on Binary-NET [Hubara et al. 2016] and expands it to CNN architectures.,non_actionable,agreement
r89,s1,t9,"It also provides optimizations that substantially improve the speed of the forward pass: packing layer bits along the channel dimension, pre-allocation of CUDA resources and binary-optimized CUDA kernels for matrix multiplications.",non_actionable,agreement
r89,s2,t9,The authors compare their framework to BinaryNET and Nervana/Neon and show a 8x speedup for 8092 matrix-matrix multiplication and a 68x speedup for MLP networks.,non_actionable,fact
r89,s3,t9,"For CNN, they a speedup of 5x is obtained from the GPU to binary-optimizimed-GPU.",non_actionable,fact
r89,s4,t9,A gain in memory size of 32x is also achieved by using binary weight and activation during the forward pass.,non_actionable,fact
r89,s5,t9,The main contribution of this paper is an optimized code for Binary CNN.,non_actionable,agreement
r89,s6,t9,The authors provide the code with permissive licensing.,non_actionable,fact
r89,s7,t9,"As is often the case with such comparisons, it is hard to disentangle from where exactly come the speedups.",non_actionable,fact
r89,s8,t9,"Overall, i think it makes a good contribution to a field that is gaining importance for mobile and embedded applications of deep convnets.",non_actionable,agreement
r89,s9,t9,I think it is a good fit for a poster.,non_actionable,agreement
r89,s0,t31,A full implementation of binary CNN with code This paper builds on Binary-NET [Hubara et al. 2016] and expands it to CNN architectures.,non_actionable,fact
r89,s1,t31,"It also provides optimizations that substantially improve the speed of the forward pass: packing layer bits along the channel dimension, pre-allocation of CUDA resources and binary-optimized CUDA kernels for matrix multiplications.",non_actionable,fact
r89,s2,t31,The authors compare their framework to BinaryNET and Nervana/Neon and show a 8x speedup for 8092 matrix-matrix multiplication and a 68x speedup for MLP networks.,non_actionable,fact
r89,s3,t31,"For CNN, they a speedup of 5x is obtained from the GPU to binary-optimizimed-GPU.",non_actionable,fact
r89,s4,t31,A gain in memory size of 32x is also achieved by using binary weight and activation during the forward pass.,non_actionable,fact
r89,s5,t31,The main contribution of this paper is an optimized code for Binary CNN.,non_actionable,fact
r89,s6,t31,The authors provide the code with permissive licensing.,non_actionable,fact
r89,s7,t31,"As is often the case with such comparisons, it is hard to disentangle from where exactly come the speedups.",non_actionable,fact
r89,s8,t31,"Overall, i think it makes a good contribution to a field that is gaining importance for mobile and embedded applications of deep convnets.",non_actionable,agreement
r89,s9,t31,I think it is a good fit for a poster.,non_actionable,agreement
r89,s0,t16,A full implementation of binary CNN with code This paper builds on Binary-NET [Hubara et al. 2016] and expands it to CNN architectures.,non_actionable,fact
r89,s1,t16,"It also provides optimizations that substantially improve the speed of the forward pass: packing layer bits along the channel dimension, pre-allocation of CUDA resources and binary-optimized CUDA kernels for matrix multiplications.",non_actionable,fact
r89,s2,t16,The authors compare their framework to BinaryNET and Nervana/Neon and show a 8x speedup for 8092 matrix-matrix multiplication and a 68x speedup for MLP networks.,non_actionable,fact
r89,s3,t16,"For CNN, they a speedup of 5x is obtained from the GPU to binary-optimizimed-GPU.",non_actionable,fact
r89,s4,t16,A gain in memory size of 32x is also achieved by using binary weight and activation during the forward pass.,non_actionable,fact
r89,s5,t16,The main contribution of this paper is an optimized code for Binary CNN.,non_actionable,fact
r89,s6,t16,The authors provide the code with permissive licensing.,non_actionable,fact
r89,s7,t16,"As is often the case with such comparisons, it is hard to disentangle from where exactly come the speedups.",actionable,shortcoming
r89,s8,t16,"Overall, i think it makes a good contribution to a field that is gaining importance for mobile and embedded applications of deep convnets.",non_actionable,agreement
r89,s9,t16,I think it is a good fit for a poster.,non_actionable,agreement
r89,s0,t20,A full implementation of binary CNN with code This paper builds on Binary-NET [Hubara et al. 2016] and expands it to CNN architectures.,non_actionable,fact
r89,s1,t20,"It also provides optimizations that substantially improve the speed of the forward pass: packing layer bits along the channel dimension, pre-allocation of CUDA resources and binary-optimized CUDA kernels for matrix multiplications.",non_actionable,fact
r89,s2,t20,The authors compare their framework to BinaryNET and Nervana/Neon and show a 8x speedup for 8092 matrix-matrix multiplication and a 68x speedup for MLP networks.,non_actionable,fact
r89,s3,t20,"For CNN, they a speedup of 5x is obtained from the GPU to binary-optimizimed-GPU.",non_actionable,fact
r89,s4,t20,A gain in memory size of 32x is also achieved by using binary weight and activation during the forward pass.,non_actionable,fact
r89,s5,t20,The main contribution of this paper is an optimized code for Binary CNN.,non_actionable,fact
r89,s6,t20,The authors provide the code with permissive licensing.,non_actionable,fact
r89,s7,t20,"As is often the case with such comparisons, it is hard to disentangle from where exactly come the speedups.",non_actionable,fact
r89,s8,t20,"Overall, i think it makes a good contribution to a field that is gaining importance for mobile and embedded applications of deep convnets.",non_actionable,agreement
r89,s9,t20,I think it is a good fit for a poster.,non_actionable,agreement
r39,s0,t10,"A good approach with some open questions on related work, scalability, and robustness The authors propose an approach for zero-shot visual learning.",actionable,agreement
r39,s1,t10,The robot then uses the learned parametric skill functions to reach goal states (images) provided by the demonstrator.,non_actionable,other
r39,s2,t10,"These topics seem sufficiently related to the proposed approach that the authors should include them in their related work section, and explain the similarities and differences.",actionable,agreement
r39,s3,t10,How much can change between the goal images and the environment before the system fails?,actionable,question
r39,s4,t10,"In the videos, it seems that the people and chairs are always in the same place.",non_actionable,other
r39,s5,t10,I could imagine a network learning to ignore features of objects that tend to wander over time.,actionable,suggestion
r39,s6,t10,The authors should consider exploring and discussing the effects of adding/moving/removing objects on the performance.,actionable,suggestion
r39,s7,t10,The evaluation with the sequence of checkpoints was created by using every fifth image.,non_actionable,other
r39,s8,t10,"In the videos, it seems like the robot could get a slightly better view if it took another couple of steps.",actionable,suggestion
r39,s9,t10,I assume this is an artifact of the way the goal recognizer is trained.,actionable,fact
r39,s0,t31,"A good approach with some open questions on related work, scalability, and robustness The authors propose an approach for zero-shot visual learning.",non_actionable,agreement
r39,s1,t31,The robot then uses the learned parametric skill functions to reach goal states (images) provided by the demonstrator.,non_actionable,fact
r39,s2,t31,"These topics seem sufficiently related to the proposed approach that the authors should include them in their related work section, and explain the similarities and differences.",actionable,suggestion
r39,s3,t31,How much can change between the goal images and the environment before the system fails?,non_actionable,question
r39,s4,t31,"In the videos, it seems that the people and chairs are always in the same place.",non_actionable,fact
r39,s5,t31,I could imagine a network learning to ignore features of objects that tend to wander over time.,non_actionable,fact
r39,s6,t31,The authors should consider exploring and discussing the effects of adding/moving/removing objects on the performance.,actionable,suggestion
r39,s7,t31,The evaluation with the sequence of checkpoints was created by using every fifth image.,non_actionable,fact
r39,s8,t31,"In the videos, it seems like the robot could get a slightly better view if it took another couple of steps.",actionable,suggestion
r39,s9,t31,I assume this is an artifact of the way the goal recognizer is trained.,non_actionable,fact
r39,s0,t2,"A good approach with some open questions on related work, scalability, and robustness The authors propose an approach for zero-shot visual learning.",non_actionable,fact
r39,s1,t2,The robot then uses the learned parametric skill functions to reach goal states (images) provided by the demonstrator.,non_actionable,fact
r39,s2,t2,"These topics seem sufficiently related to the proposed approach that the authors should include them in their related work section, and explain the similarities and differences.",actionable,suggestion
r39,s3,t2,How much can change between the goal images and the environment before the system fails?,actionable,question
r39,s4,t2,"In the videos, it seems that the people and chairs are always in the same place.",non_actionable,fact
r39,s5,t2,I could imagine a network learning to ignore features of objects that tend to wander over time.,non_actionable,fact
r39,s6,t2,The authors should consider exploring and discussing the effects of adding/moving/removing objects on the performance.,actionable,suggestion
r39,s7,t2,The evaluation with the sequence of checkpoints was created by using every fifth image.,non_actionable,fact
r39,s8,t2,"In the videos, it seems like the robot could get a slightly better view if it took another couple of steps.",non_actionable,fact
r39,s9,t2,I assume this is an artifact of the way the goal recognizer is trained.,actionable,shortcoming
r39,s0,t16,"A good approach with some open questions on related work, scalability, and robustness The authors propose an approach for zero-shot visual learning.",non_actionable,agreement
r39,s1,t16,The robot then uses the learned parametric skill functions to reach goal states (images) provided by the demonstrator.,non_actionable,fact
r39,s2,t16,"These topics seem sufficiently related to the proposed approach that the authors should include them in their related work section, and explain the similarities and differences.",non_actionable,agreement
r39,s3,t16,How much can change between the goal images and the environment before the system fails?,non_actionable,question
r39,s4,t16,"In the videos, it seems that the people and chairs are always in the same place.",non_actionable,fact
r39,s5,t16,I could imagine a network learning to ignore features of objects that tend to wander over time.,non_actionable,fact
r39,s6,t16,The authors should consider exploring and discussing the effects of adding/moving/removing objects on the performance.,actionable,suggestion
r39,s7,t16,The evaluation with the sequence of checkpoints was created by using every fifth image.,non_actionable,fact
r39,s8,t16,"In the videos, it seems like the robot could get a slightly better view if it took another couple of steps.",actionable,suggestion
r39,s9,t16,I assume this is an artifact of the way the goal recognizer is trained.,non_actionable,fact
r39,s0,t20,"A good approach with some open questions on related work, scalability, and robustness The authors propose an approach for zero-shot visual learning.",non_actionable,fact
r39,s1,t20,The robot then uses the learned parametric skill functions to reach goal states (images) provided by the demonstrator.,non_actionable,fact
r39,s2,t20,"These topics seem sufficiently related to the proposed approach that the authors should include them in their related work section, and explain the similarities and differences.",non_actionable,agreement
r39,s3,t20,How much can change between the goal images and the environment before the system fails?,actionable,question
r39,s4,t20,"In the videos, it seems that the people and chairs are always in the same place.",non_actionable,shortcoming
r39,s5,t20,I could imagine a network learning to ignore features of objects that tend to wander over time.,actionable,suggestion
r39,s6,t20,The authors should consider exploring and discussing the effects of adding/moving/removing objects on the performance.,actionable,suggestion
r39,s7,t20,The evaluation with the sequence of checkpoints was created by using every fifth image.,non_actionable,fact
r39,s8,t20,"In the videos, it seems like the robot could get a slightly better view if it took another couple of steps.",actionable,suggestion
r39,s9,t20,I assume this is an artifact of the way the goal recognizer is trained.,non_actionable,fact
r90,s0,t2,"A good paper, but it could be better for writing and baseline comparisons This paper studies the problem of text-to-speech synthesis (TTS) ""in the wild"" and proposes to use the shifting buffer memory.",non_actionable,fact
r90,s1,t2,"Specifically, an input text is transformed to phoneme encoding and then context vector is created with attention mechanism.",non_actionable,fact
r90,s2,t2,A novel speaker can be adapted just by fitting it with SGD while fixing all other components.,non_actionable,fact
r90,s3,t2,"In experiments, authors try single-speaker TTS and multi-speaker TTS along with speaker identification (ID), and show that the proposed approach outperforms baselines, namely, Tacotron and Char2wav.",non_actionable,fact
r90,s4,t2,"Finally, they use the challenging Youtube data to train the model and show promising results.",non_actionable,agreement
r90,s5,t2,"3. The proposed approach outperforms baselines in several tasks, and the ability to fit to a novel speaker is nice.",non_actionable,agreement
r90,s6,t2,But there are some issues as well (see Cons.) Cons:,non_actionable,shortcoming
r90,s7,t2,Some notations were not clearly described in the text even though it was in the table.,actionable,shortcoming
r90,s8,t2,"The paper says Deep Voice 2 (Arik et al., 2017a) is only prior work for multi-speaker TTS.",non_actionable,fact
r90,s9,t2,"3. Why do you think your model is better than VCTK test split, and even VCTK85 is better than VCTK101?",actionable,question
r90,s0,t20,"A good paper, but it could be better for writing and baseline comparisons This paper studies the problem of text-to-speech synthesis (TTS) ""in the wild"" and proposes to use the shifting buffer memory.",non_actionable,agreement
r90,s1,t20,"Specifically, an input text is transformed to phoneme encoding and then context vector is created with attention mechanism.",non_actionable,fact
r90,s2,t20,A novel speaker can be adapted just by fitting it with SGD while fixing all other components.,actionable,suggestion
r90,s3,t20,"In experiments, authors try single-speaker TTS and multi-speaker TTS along with speaker identification (ID), and show that the proposed approach outperforms baselines, namely, Tacotron and Char2wav.",non_actionable,fact
r90,s4,t20,"Finally, they use the challenging Youtube data to train the model and show promising results.",non_actionable,fact
r90,s5,t20,"3. The proposed approach outperforms baselines in several tasks, and the ability to fit to a novel speaker is nice.",non_actionable,agreement
r90,s6,t20,But there are some issues as well (see Cons.) Cons:,non_actionable,shortcoming
r90,s7,t20,Some notations were not clearly described in the text even though it was in the table.,actionable,shortcoming
r90,s8,t20,"The paper says Deep Voice 2 (Arik et al., 2017a) is only prior work for multi-speaker TTS.",non_actionable,fact
r90,s9,t20,"3. Why do you think your model is better than VCTK test split, and even VCTK85 is better than VCTK101?",actionable,question
r90,s0,t31,"A good paper, but it could be better for writing and baseline comparisons This paper studies the problem of text-to-speech synthesis (TTS) ""in the wild"" and proposes to use the shifting buffer memory.",actionable,shortcoming
r90,s1,t31,"Specifically, an input text is transformed to phoneme encoding and then context vector is created with attention mechanism.",non_actionable,fact
r90,s2,t31,A novel speaker can be adapted just by fitting it with SGD while fixing all other components.,non_actionable,fact
r90,s3,t31,"In experiments, authors try single-speaker TTS and multi-speaker TTS along with speaker identification (ID), and show that the proposed approach outperforms baselines, namely, Tacotron and Char2wav.",non_actionable,fact
r90,s4,t31,"Finally, they use the challenging Youtube data to train the model and show promising results.",non_actionable,fact
r90,s5,t31,"3. The proposed approach outperforms baselines in several tasks, and the ability to fit to a novel speaker is nice.",non_actionable,agreement
r90,s6,t31,But there are some issues as well (see Cons.) Cons:,actionable,shortcoming
r90,s7,t31,Some notations were not clearly described in the text even though it was in the table.,actionable,shortcoming
r90,s8,t31,"The paper says Deep Voice 2 (Arik et al., 2017a) is only prior work for multi-speaker TTS.",non_actionable,fact
r90,s9,t31,"3. Why do you think your model is better than VCTK test split, and even VCTK85 is better than VCTK101?",non_actionable,question
r90,s0,t10,"A good paper, but it could be better for writing and baseline comparisons This paper studies the problem of text-to-speech synthesis (TTS) ""in the wild"" and proposes to use the shifting buffer memory.",actionable,suggestion
r90,s1,t10,"Specifically, an input text is transformed to phoneme encoding and then context vector is created with attention mechanism.",non_actionable,other
r90,s2,t10,A novel speaker can be adapted just by fitting it with SGD while fixing all other components.,non_actionable,other
r90,s3,t10,"In experiments, authors try single-speaker TTS and multi-speaker TTS along with speaker identification (ID), and show that the proposed approach outperforms baselines, namely, Tacotron and Char2wav.",non_actionable,other
r90,s4,t10,"Finally, they use the challenging Youtube data to train the model and show promising results.",non_actionable,other
r90,s5,t10,"3. The proposed approach outperforms baselines in several tasks, and the ability to fit to a novel speaker is nice.",non_actionable,other
r90,s6,t10,But there are some issues as well (see Cons.) Cons:,actionable,disagreement
r90,s7,t10,Some notations were not clearly described in the text even though it was in the table.,actionable,shortcoming
r90,s8,t10,"The paper says Deep Voice 2 (Arik et al., 2017a) is only prior work for multi-speaker TTS.",non_actionable,other
r90,s9,t10,"3. Why do you think your model is better than VCTK test split, and even VCTK85 is better than VCTK101?",actionable,question
r90,s0,t8,"A good paper, but it could be better for writing and baseline comparisons This paper studies the problem of text-to-speech synthesis (TTS) ""in the wild"" and proposes to use the shifting buffer memory.",actionable,suggestion
r90,s1,t8,"Specifically, an input text is transformed to phoneme encoding and then context vector is created with attention mechanism.",non_actionable,fact
r90,s2,t8,A novel speaker can be adapted just by fitting it with SGD while fixing all other components.,non_actionable,fact
r90,s3,t8,"In experiments, authors try single-speaker TTS and multi-speaker TTS along with speaker identification (ID), and show that the proposed approach outperforms baselines, namely, Tacotron and Char2wav.",non_actionable,fact
r90,s4,t8,"Finally, they use the challenging Youtube data to train the model and show promising results.",non_actionable,agreement
r90,s5,t8,"3. The proposed approach outperforms baselines in several tasks, and the ability to fit to a novel speaker is nice.",non_actionable,agreement
r90,s6,t8,But there are some issues as well (see Cons.) Cons:,actionable,shortcoming
r90,s7,t8,Some notations were not clearly described in the text even though it was in the table.,actionable,shortcoming
r90,s8,t8,"The paper says Deep Voice 2 (Arik et al., 2017a) is only prior work for multi-speaker TTS.",non_actionable,fact
r90,s9,t8,"3. Why do you think your model is better than VCTK test split, and even VCTK85 is better than VCTK101?",non_actionable,question
r97,s0,t2,A new method for weight quantization.,non_actionable,fact
r97,s1,t2,"A step in the right direction, with interesting results, but not a huge level of novelty.",non_actionable,fact
r97,s2,t2,"This paper proposes a new method to train DNNs with quantized weights, by including the quantization as a constraint in a proximal quasi-Newton algorithm, which simultaneously learns a scaling for the quantized values (possibly different for positive and negative weights).",non_actionable,fact
r97,s3,t2,"The paper is very clearly written, and the proposal is very well placed in the context of previous methods for the same purpose.",non_actionable,agreement
r97,s4,t2,The experiments are very clearly presented and solidly designed.,non_actionable,agreement
r97,s5,t2,"In fact, the paper is a somewhat simple extension of the method proposed by Hou, Yao, and Kwok (2017), which is where the novelty resides.",non_actionable,fact
r97,s6,t2,"Consequently, there is not a great degree of novelty in terms of the proposed method, and the results are only slightly better than those of previous methods.",non_actionable,shortcoming
r97,s7,t2,"Finally, in terms of analysis of the algorithm, the authors simply invoke a theorem from Hou, Yao, and Kwok (2017), which claims convergence of the proposed algorithm.",non_actionable,shortcoming
r97,s8,t2,"However, what is shown in that paper is that the sequence of loss function values converges, which does not imply that the sequence of weight estimates also converges, because of the presence of a non-convex constraint ($b_j^t \in Q^{n_l}$).",non_actionable,shortcoming
r97,s9,t2,"This may not be relevant for the practical results, but to be accurate, it can't be simply stated that the algorithm converges, without a more careful analysis.",actionable,shortcoming
r97,s0,t8,A new method for weight quantization.,non_actionable,fact
r97,s1,t8,"A step in the right direction, with interesting results, but not a huge level of novelty.",non_actionable,agreement
r97,s2,t8,"This paper proposes a new method to train DNNs with quantized weights, by including the quantization as a constraint in a proximal quasi-Newton algorithm, which simultaneously learns a scaling for the quantized values (possibly different for positive and negative weights).",non_actionable,fact
r97,s3,t8,"The paper is very clearly written, and the proposal is very well placed in the context of previous methods for the same purpose.",non_actionable,agreement
r97,s4,t8,The experiments are very clearly presented and solidly designed.,non_actionable,agreement
r97,s5,t8,"In fact, the paper is a somewhat simple extension of the method proposed by Hou, Yao, and Kwok (2017), which is where the novelty resides.",non_actionable,fact
r97,s6,t8,"Consequently, there is not a great degree of novelty in terms of the proposed method, and the results are only slightly better than those of previous methods.",actionable,shortcoming
r97,s7,t8,"Finally, in terms of analysis of the algorithm, the authors simply invoke a theorem from Hou, Yao, and Kwok (2017), which claims convergence of the proposed algorithm.",non_actionable,fact
r97,s8,t8,"However, what is shown in that paper is that the sequence of loss function values converges, which does not imply that the sequence of weight estimates also converges, because of the presence of a non-convex constraint ($b_j^t \in Q^{n_l}$).",non_actionable,fact
r97,s9,t8,"This may not be relevant for the practical results, but to be accurate, it can't be simply stated that the algorithm converges, without a more careful analysis.",actionable,shortcoming
r97,s0,t16,A new method for weight quantization.,non_actionable,fact
r97,s1,t16,"A step in the right direction, with interesting results, but not a huge level of novelty.",non_actionable,fact
r97,s2,t16,"This paper proposes a new method to train DNNs with quantized weights, by including the quantization as a constraint in a proximal quasi-Newton algorithm, which simultaneously learns a scaling for the quantized values (possibly different for positive and negative weights).",non_actionable,fact
r97,s3,t16,"The paper is very clearly written, and the proposal is very well placed in the context of previous methods for the same purpose.",non_actionable,agreement
r97,s4,t16,The experiments are very clearly presented and solidly designed.,non_actionable,agreement
r97,s5,t16,"In fact, the paper is a somewhat simple extension of the method proposed by Hou, Yao, and Kwok (2017), which is where the novelty resides.",non_actionable,fact
r97,s6,t16,"Consequently, there is not a great degree of novelty in terms of the proposed method, and the results are only slightly better than those of previous methods.",actionable,shortcoming
r97,s7,t16,"Finally, in terms of analysis of the algorithm, the authors simply invoke a theorem from Hou, Yao, and Kwok (2017), which claims convergence of the proposed algorithm.",actionable,shortcoming
r97,s8,t16,"However, what is shown in that paper is that the sequence of loss function values converges, which does not imply that the sequence of weight estimates also converges, because of the presence of a non-convex constraint ($b_j^t \in Q^{n_l}$).",actionable,shortcoming
r97,s9,t16,"This may not be relevant for the practical results, but to be accurate, it can't be simply stated that the algorithm converges, without a more careful analysis.",actionable,suggestion
r97,s0,t23,A new method for weight quantization.,non_actionable,fact
r97,s1,t23,"A step in the right direction, with interesting results, but not a huge level of novelty.",non_actionable,agreement
r97,s2,t23,"This paper proposes a new method to train DNNs with quantized weights, by including the quantization as a constraint in a proximal quasi-Newton algorithm, which simultaneously learns a scaling for the quantized values (possibly different for positive and negative weights).",non_actionable,fact
r97,s3,t23,"The paper is very clearly written, and the proposal is very well placed in the context of previous methods for the same purpose.",non_actionable,agreement
r97,s4,t23,The experiments are very clearly presented and solidly designed.,non_actionable,agreement
r97,s5,t23,"In fact, the paper is a somewhat simple extension of the method proposed by Hou, Yao, and Kwok (2017), which is where the novelty resides.",non_actionable,fact
r97,s6,t23,"Consequently, there is not a great degree of novelty in terms of the proposed method, and the results are only slightly better than those of previous methods.",non_actionable,shortcoming
r97,s7,t23,"Finally, in terms of analysis of the algorithm, the authors simply invoke a theorem from Hou, Yao, and Kwok (2017), which claims convergence of the proposed algorithm.",actionable,fact
r97,s8,t23,"However, what is shown in that paper is that the sequence of loss function values converges, which does not imply that the sequence of weight estimates also converges, because of the presence of a non-convex constraint ($b_j^t \in Q^{n_l}$).",non_actionable,shortcoming
r97,s9,t23,"This may not be relevant for the practical results, but to be accurate, it can't be simply stated that the algorithm converges, without a more careful analysis.",actionable,suggestion
r97,s0,t20,A new method for weight quantization.,non_actionable,fact
r97,s1,t20,"A step in the right direction, with interesting results, but not a huge level of novelty.",non_actionable,fact
r97,s2,t20,"This paper proposes a new method to train DNNs with quantized weights, by including the quantization as a constraint in a proximal quasi-Newton algorithm, which simultaneously learns a scaling for the quantized values (possibly different for positive and negative weights).",non_actionable,fact
r97,s3,t20,"The paper is very clearly written, and the proposal is very well placed in the context of previous methods for the same purpose.",non_actionable,agreement
r97,s4,t20,The experiments are very clearly presented and solidly designed.,non_actionable,agreement
r97,s5,t20,"In fact, the paper is a somewhat simple extension of the method proposed by Hou, Yao, and Kwok (2017), which is where the novelty resides.",non_actionable,fact
r97,s6,t20,"Consequently, there is not a great degree of novelty in terms of the proposed method, and the results are only slightly better than those of previous methods.",non_actionable,shortcoming
r97,s7,t20,"Finally, in terms of analysis of the algorithm, the authors simply invoke a theorem from Hou, Yao, and Kwok (2017), which claims convergence of the proposed algorithm.",non_actionable,fact
r97,s8,t20,"However, what is shown in that paper is that the sequence of loss function values converges, which does not imply that the sequence of weight estimates also converges, because of the presence of a non-convex constraint ($b_j^t \in Q^{n_l}$).",non_actionable,shortcoming
r97,s9,t20,"This may not be relevant for the practical results, but to be accurate, it can't be simply stated that the algorithm converges, without a more careful analysis.",non_actionable,shortcoming
r14,s0,t31,A paper with interesting ideas but lacking convincing evidence Summary: The authors proposed an unsupervised time series clustering methods built with deep neural networks.,actionable,shortcoming
r14,s1,t31,"First, the encoder employs CNN to shorten the time series and extract local temporal features, and the CNN is followed by bidirectional LSTMs to get the encoded representations.",non_actionable,fact
r14,s2,t31,A temporal clustering model and a DCNN decoder are applied on the encoded representations and jointly trained.,non_actionable,fact
r14,s3,t31,An additional heatmap generator component can be further included in the clustering model.,actionable,suggestion
r14,s4,t31,Detailed comments: The problem of unsupervised time series clustering is important and challenging.,non_actionable,fact
r14,s5,t31,The idea of utilizing deep learning models to learn encoded representations for clustering is interesting and could be a promising solution.,actionable,agreement
r14,s6,t31,"For example, what is the size of each layer and the dimension of the encoded space?",actionable,shortcoming
r14,s7,t31,How does the model combine the heatmap output (which is a sequence of the same length as the time series) and the clustering output (which is a vector of size K) in Figure 1?,actionable,shortcoming
r14,s8,t31,How do we interpret the generated heatmap?,actionable,shortcoming
r14,s9,t31,"For example, in Figure 4, all 4 DTC-methods achieved the best performance on one or two datasets.",actionable,shortcoming
r14,s0,t10,A paper with interesting ideas but lacking convincing evidence Summary: The authors proposed an unsupervised time series clustering methods built with deep neural networks.,actionable,shortcoming
r14,s1,t10,"First, the encoder employs CNN to shorten the time series and extract local temporal features, and the CNN is followed by bidirectional LSTMs to get the encoded representations.",non_actionable,other
r14,s2,t10,A temporal clustering model and a DCNN decoder are applied on the encoded representations and jointly trained.,non_actionable,other
r14,s3,t10,An additional heatmap generator component can be further included in the clustering model.,non_actionable,other
r14,s4,t10,Detailed comments: The problem of unsupervised time series clustering is important and challenging.,actionable,shortcoming
r14,s5,t10,The idea of utilizing deep learning models to learn encoded representations for clustering is interesting and could be a promising solution.,actionable,agreement
r14,s6,t10,"For example, what is the size of each layer and the dimension of the encoded space?",actionable,question
r14,s7,t10,How does the model combine the heatmap output (which is a sequence of the same length as the time series) and the clustering output (which is a vector of size K) in Figure 1?,actionable,question
r14,s8,t10,How do we interpret the generated heatmap?,actionable,question
r14,s9,t10,"For example, in Figure 4, all 4 DTC-methods achieved the best performance on one or two datasets.",non_actionable,other
r14,s0,t2,A paper with interesting ideas but lacking convincing evidence Summary: The authors proposed an unsupervised time series clustering methods built with deep neural networks.,non_actionable,fact
r14,s1,t2,"First, the encoder employs CNN to shorten the time series and extract local temporal features, and the CNN is followed by bidirectional LSTMs to get the encoded representations.",non_actionable,fact
r14,s2,t2,A temporal clustering model and a DCNN decoder are applied on the encoded representations and jointly trained.,non_actionable,fact
r14,s3,t2,An additional heatmap generator component can be further included in the clustering model.,actionable,suggestion
r14,s4,t2,Detailed comments: The problem of unsupervised time series clustering is important and challenging.,non_actionable,fact
r14,s5,t2,The idea of utilizing deep learning models to learn encoded representations for clustering is interesting and could be a promising solution.,non_actionable,agreement
r14,s6,t2,"For example, what is the size of each layer and the dimension of the encoded space?",actionable,question
r14,s7,t2,How does the model combine the heatmap output (which is a sequence of the same length as the time series) and the clustering output (which is a vector of size K) in Figure 1?,actionable,question
r14,s8,t2,How do we interpret the generated heatmap?,actionable,question
r14,s9,t2,"For example, in Figure 4, all 4 DTC-methods achieved the best performance on one or two datasets.",non_actionable,fact
r14,s0,t16,A paper with interesting ideas but lacking convincing evidence Summary: The authors proposed an unsupervised time series clustering methods built with deep neural networks.,actionable,shortcoming
r14,s1,t16,"First, the encoder employs CNN to shorten the time series and extract local temporal features, and the CNN is followed by bidirectional LSTMs to get the encoded representations.",non_actionable,fact
r14,s2,t16,A temporal clustering model and a DCNN decoder are applied on the encoded representations and jointly trained.,non_actionable,fact
r14,s3,t16,An additional heatmap generator component can be further included in the clustering model.,actionable,suggestion
r14,s4,t16,Detailed comments: The problem of unsupervised time series clustering is important and challenging.,non_actionable,fact
r14,s5,t16,The idea of utilizing deep learning models to learn encoded representations for clustering is interesting and could be a promising solution.,non_actionable,agreement
r14,s6,t16,"For example, what is the size of each layer and the dimension of the encoded space?",actionable,question
r14,s7,t16,How does the model combine the heatmap output (which is a sequence of the same length as the time series) and the clustering output (which is a vector of size K) in Figure 1?,actionable,question
r14,s8,t16,How do we interpret the generated heatmap?,actionable,question
r14,s9,t16,"For example, in Figure 4, all 4 DTC-methods achieved the best performance on one or two datasets.",non_actionable,fact
r14,s0,t20,A paper with interesting ideas but lacking convincing evidence Summary: The authors proposed an unsupervised time series clustering methods built with deep neural networks.,non_actionable,fact
r14,s1,t20,"First, the encoder employs CNN to shorten the time series and extract local temporal features, and the CNN is followed by bidirectional LSTMs to get the encoded representations.",non_actionable,fact
r14,s2,t20,A temporal clustering model and a DCNN decoder are applied on the encoded representations and jointly trained.,non_actionable,fact
r14,s3,t20,An additional heatmap generator component can be further included in the clustering model.,actionable,suggestion
r14,s4,t20,Detailed comments: The problem of unsupervised time series clustering is important and challenging.,non_actionable,fact
r14,s5,t20,The idea of utilizing deep learning models to learn encoded representations for clustering is interesting and could be a promising solution.,non_actionable,fact
r14,s6,t20,"For example, what is the size of each layer and the dimension of the encoded space?",actionable,question
r14,s7,t20,How does the model combine the heatmap output (which is a sequence of the same length as the time series) and the clustering output (which is a vector of size K) in Figure 1?,actionable,question
r14,s8,t20,How do we interpret the generated heatmap?,actionable,question
r14,s9,t20,"For example, in Figure 4, all 4 DTC-methods achieved the best performance on one or two datasets.",non_actionable,fact
r112,s0,t8,A promising approach on nonparametric modelling of partial differential equations with deep architectures that requires more details.,actionable,shortcoming
r112,s1,t8,This paper addresses complex dynamical systems modelling through nonparametric Partial Differential Equations using neural architectures.,non_actionable,fact
r112,s2,t8,The most important idea of the papier (PDE-net) is to learn both differential operators and the function that governs the PDE.,non_actionable,fact
r112,s3,t8,"To achieve this goal, the approach relies on the approximation of differential operators by convolution of filters of appropriate order.",non_actionable,fact
r112,s4,t8,This is really the strongest point of the paper.,non_actionable,agreement
r112,s5,t8,"Moreover, a basic system called delta t block implements one level of full approximation and is stoked several times.",non_actionable,fact
r112,s6,t8,"Comments: The paper is badly structured and is sometimes hard to read because it does not present in a linear way the classic ingredients of Machine Learning, expression of the full function to be estimated, equations of each layer, description of the set of parameters to be learned and the loss function.",actionable,shortcoming
r112,s7,t8,"About the loss function, I was surprised not to see a sparsity constraint on the different filters in order to select the order of the differential operators themselves.",actionable,shortcoming
r112,s8,t8,I also found difficult to measure the degree of novelty of the approach considering the recent works and the related work section should have been much more precise in terms of comparison.,actionable,shortcoming
r112,s9,t8,"Finally, I’ve found the paper very interesting and promising but regarding the standard of scientific publication, it requires additional attention to provide a better description the model and discuss the learning scheme to get a strongest and reproducible approach.",actionable,suggestion
r112,s0,t2,A promising approach on nonparametric modelling of partial differential equations with deep architectures that requires more details.,non_actionable,fact
r112,s1,t2,This paper addresses complex dynamical systems modelling through nonparametric Partial Differential Equations using neural architectures.,non_actionable,fact
r112,s2,t2,The most important idea of the papier (PDE-net) is to learn both differential operators and the function that governs the PDE.,non_actionable,fact
r112,s3,t2,"To achieve this goal, the approach relies on the approximation of differential operators by convolution of filters of appropriate order.",non_actionable,fact
r112,s4,t2,This is really the strongest point of the paper.,non_actionable,fact
r112,s5,t2,"Moreover, a basic system called delta t block implements one level of full approximation and is stoked several times.",non_actionable,fact
r112,s6,t2,"Comments: The paper is badly structured and is sometimes hard to read because it does not present in a linear way the classic ingredients of Machine Learning, expression of the full function to be estimated, equations of each layer, description of the set of parameters to be learned and the loss function.",actionable,shortcoming
r112,s7,t2,"About the loss function, I was surprised not to see a sparsity constraint on the different filters in order to select the order of the differential operators themselves.",actionable,shortcoming
r112,s8,t2,I also found difficult to measure the degree of novelty of the approach considering the recent works and the related work section should have been much more precise in terms of comparison.,actionable,shortcoming
r112,s9,t2,"Finally, I’ve found the paper very interesting and promising but regarding the standard of scientific publication, it requires additional attention to provide a better description the model and discuss the learning scheme to get a strongest and reproducible approach.",actionable,suggestion
r112,s0,t10,A promising approach on nonparametric modelling of partial differential equations with deep architectures that requires more details.,actionable,shortcoming
r112,s1,t10,This paper addresses complex dynamical systems modelling through nonparametric Partial Differential Equations using neural architectures.,non_actionable,other
r112,s2,t10,The most important idea of the papier (PDE-net) is to learn both differential operators and the function that governs the PDE.,non_actionable,other
r112,s3,t10,"To achieve this goal, the approach relies on the approximation of differential operators by convolution of filters of appropriate order.",non_actionable,other
r112,s4,t10,This is really the strongest point of the paper.,actionable,agreement
r112,s5,t10,"Moreover, a basic system called delta t block implements one level of full approximation and is stoked several times.",non_actionable,other
r112,s6,t10,"Comments: The paper is badly structured and is sometimes hard to read because it does not present in a linear way the classic ingredients of Machine Learning, expression of the full function to be estimated, equations of each layer, description of the set of parameters to be learned and the loss function.",actionable,shortcoming
r112,s7,t10,"About the loss function, I was surprised not to see a sparsity constraint on the different filters in order to select the order of the differential operators themselves.",actionable,fact
r112,s8,t10,I also found difficult to measure the degree of novelty of the approach considering the recent works and the related work section should have been much more precise in terms of comparison.,actionable,fact
r112,s9,t10,"Finally, I’ve found the paper very interesting and promising but regarding the standard of scientific publication, it requires additional attention to provide a better description the model and discuss the learning scheme to get a strongest and reproducible approach.",actionable,suggestion
r112,s0,t20,A promising approach on nonparametric modelling of partial differential equations with deep architectures that requires more details.,non_actionable,agreement
r112,s1,t20,This paper addresses complex dynamical systems modelling through nonparametric Partial Differential Equations using neural architectures.,non_actionable,fact
r112,s2,t20,The most important idea of the papier (PDE-net) is to learn both differential operators and the function that governs the PDE.,non_actionable,fact
r112,s3,t20,"To achieve this goal, the approach relies on the approximation of differential operators by convolution of filters of appropriate order.",non_actionable,fact
r112,s4,t20,This is really the strongest point of the paper.,non_actionable,agreement
r112,s5,t20,"Moreover, a basic system called delta t block implements one level of full approximation and is stoked several times.",non_actionable,fact
r112,s6,t20,"Comments: The paper is badly structured and is sometimes hard to read because it does not present in a linear way the classic ingredients of Machine Learning, expression of the full function to be estimated, equations of each layer, description of the set of parameters to be learned and the loss function.",actionable,shortcoming
r112,s7,t20,"About the loss function, I was surprised not to see a sparsity constraint on the different filters in order to select the order of the differential operators themselves.",actionable,shortcoming
r112,s8,t20,I also found difficult to measure the degree of novelty of the approach considering the recent works and the related work section should have been much more precise in terms of comparison.,actionable,suggestion
r112,s9,t20,"Finally, I’ve found the paper very interesting and promising but regarding the standard of scientific publication, it requires additional attention to provide a better description the model and discuss the learning scheme to get a strongest and reproducible approach.",actionable,suggestion
r112,s0,t16,A promising approach on nonparametric modelling of partial differential equations with deep architectures that requires more details.,non_actionable,agreement
r112,s1,t16,This paper addresses complex dynamical systems modelling through nonparametric Partial Differential Equations using neural architectures.,non_actionable,fact
r112,s2,t16,The most important idea of the papier (PDE-net) is to learn both differential operators and the function that governs the PDE.,non_actionable,fact
r112,s3,t16,"To achieve this goal, the approach relies on the approximation of differential operators by convolution of filters of appropriate order.",non_actionable,fact
r112,s4,t16,This is really the strongest point of the paper.,non_actionable,agreement
r112,s5,t16,"Moreover, a basic system called delta t block implements one level of full approximation and is stoked several times.",non_actionable,fact
r112,s6,t16,"Comments: The paper is badly structured and is sometimes hard to read because it does not present in a linear way the classic ingredients of Machine Learning, expression of the full function to be estimated, equations of each layer, description of the set of parameters to be learned and the loss function.",actionable,shortcoming
r112,s7,t16,"About the loss function, I was surprised not to see a sparsity constraint on the different filters in order to select the order of the differential operators themselves.",actionable,shortcoming
r112,s8,t16,I also found difficult to measure the degree of novelty of the approach considering the recent works and the related work section should have been much more precise in terms of comparison.,actionable,shortcoming
r112,s9,t16,"Finally, I’ve found the paper very interesting and promising but regarding the standard of scientific publication, it requires additional attention to provide a better description the model and discuss the learning scheme to get a strongest and reproducible approach.",actionable,shortcoming
r65,s0,t10,"A thorough exploration of techniques for unsupervised translation, a very strong start for this problem This paper describes an approach to train a neural machine translation system without parallel data.",actionable,agreement
r65,s1,t10,"Starting from a word-to-word translation lexicon, which was also learned with unsupervised methods, this approach combines a denoising auto-encoder objective with a back-translation objective, both in two translation directions, with an adversarial objective that attempts to fool a discriminator that detects the source language of an encoded sentence.",non_actionable,other
r65,s2,t10,"These five objectives together are sufficient to achieve impressive English <-> German and Engish <-> French results in Multi30k, a bilingual image caption scenario with short simple sentences, and to achieve a strong start for a standard WMT scenario.",actionable,agreement
r65,s3,t10,And it is genuinely impressive to see all these pieces come together into something that translates substantially better than a word-to-word baseline.,actionable,agreement
r65,s4,t10,But the aspect I like most about this paper is the experimental analysis.,actionable,fact
r65,s5,t10,"Considering that this is a big, complicated system, it is crucial that the authors included both an ablation experiment to see which pieces were most important, and an experiment that indicates the amount of labeled data that would be required to achieve the same results with a supervised system.",actionable,fact
r65,s6,t10,"I am glad you take the time to give your model selection criterion it's own section in 3.2, as it does seem to be an important part of this puzzle.",actionable,fact
r65,s7,t10,"In the first paragraph of Section 4.5, I disagree with the sentence, ""Similar observations can be made for the other language pairs we considered.""",actionable,disagreement
r65,s8,t10,"In fact, I would go so far as to say that the English to French scenario described in that paragraph is a notable outlier, in that it is the other language pair where you beat the oracle re-ordering baseline in both Multi30k and WMT.",actionable,disagreement
r65,s9,t10,"When citing Shen et al., 2017, consider also mentioning the following: Controllable Invariance through Adversarial Feature Learning; Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig; NIPS 2017; https://arxiv.org/abs/1705.11122 Response read -- thanks.",actionable,suggestion
r65,s0,t20,"A thorough exploration of techniques for unsupervised translation, a very strong start for this problem This paper describes an approach to train a neural machine translation system without parallel data.",non_actionable,agreement
r65,s1,t20,"Starting from a word-to-word translation lexicon, which was also learned with unsupervised methods, this approach combines a denoising auto-encoder objective with a back-translation objective, both in two translation directions, with an adversarial objective that attempts to fool a discriminator that detects the source language of an encoded sentence.",non_actionable,fact
r65,s2,t20,"These five objectives together are sufficient to achieve impressive English <-> German and Engish <-> French results in Multi30k, a bilingual image caption scenario with short simple sentences, and to achieve a strong start for a standard WMT scenario.",non_actionable,fact
r65,s3,t20,And it is genuinely impressive to see all these pieces come together into something that translates substantially better than a word-to-word baseline.,non_actionable,agreement
r65,s4,t20,But the aspect I like most about this paper is the experimental analysis.,non_actionable,agreement
r65,s5,t20,"Considering that this is a big, complicated system, it is crucial that the authors included both an ablation experiment to see which pieces were most important, and an experiment that indicates the amount of labeled data that would be required to achieve the same results with a supervised system.",actionable,suggestion
r65,s6,t20,"I am glad you take the time to give your model selection criterion it's own section in 3.2, as it does seem to be an important part of this puzzle.",non_actionable,agreement
r65,s7,t20,"In the first paragraph of Section 4.5, I disagree with the sentence, ""Similar observations can be made for the other language pairs we considered.""",actionable,disagreement
r65,s8,t20,"In fact, I would go so far as to say that the English to French scenario described in that paragraph is a notable outlier, in that it is the other language pair where you beat the oracle re-ordering baseline in both Multi30k and WMT.",actionable,shortcoming
r65,s9,t20,"When citing Shen et al., 2017, consider also mentioning the following: Controllable Invariance through Adversarial Feature Learning; Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig; NIPS 2017; https://arxiv.org/abs/1705.11122 Response read -- thanks.",actionable,suggestion
r65,s0,t31,"A thorough exploration of techniques for unsupervised translation, a very strong start for this problem This paper describes an approach to train a neural machine translation system without parallel data.",non_actionable,fact
r65,s1,t31,"Starting from a word-to-word translation lexicon, which was also learned with unsupervised methods, this approach combines a denoising auto-encoder objective with a back-translation objective, both in two translation directions, with an adversarial objective that attempts to fool a discriminator that detects the source language of an encoded sentence.",non_actionable,fact
r65,s2,t31,"These five objectives together are sufficient to achieve impressive English <-> German and Engish <-> French results in Multi30k, a bilingual image caption scenario with short simple sentences, and to achieve a strong start for a standard WMT scenario.",non_actionable,fact
r65,s3,t31,And it is genuinely impressive to see all these pieces come together into something that translates substantially better than a word-to-word baseline.,non_actionable,agreement
r65,s4,t31,But the aspect I like most about this paper is the experimental analysis.,non_actionable,agreement
r65,s5,t31,"Considering that this is a big, complicated system, it is crucial that the authors included both an ablation experiment to see which pieces were most important, and an experiment that indicates the amount of labeled data that would be required to achieve the same results with a supervised system.",non_actionable,agreement
r65,s6,t31,"I am glad you take the time to give your model selection criterion it's own section in 3.2, as it does seem to be an important part of this puzzle.",non_actionable,agreement
r65,s7,t31,"In the first paragraph of Section 4.5, I disagree with the sentence, ""Similar observations can be made for the other language pairs we considered.""",non_actionable,fact
r65,s8,t31,"In fact, I would go so far as to say that the English to French scenario described in that paragraph is a notable outlier, in that it is the other language pair where you beat the oracle re-ordering baseline in both Multi30k and WMT.",non_actionable,fact
r65,s9,t31,"When citing Shen et al., 2017, consider also mentioning the following: Controllable Invariance through Adversarial Feature Learning; Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig; NIPS 2017; https://arxiv.org/abs/1705.11122 Response read -- thanks.",actionable,suggestion
r65,s0,t30,"A thorough exploration of techniques for unsupervised translation, a very strong start for this problem This paper describes an approach to train a neural machine translation system without parallel data.",non_actionable,agreement
r65,s1,t30,"Starting from a word-to-word translation lexicon, which was also learned with unsupervised methods, this approach combines a denoising auto-encoder objective with a back-translation objective, both in two translation directions, with an adversarial objective that attempts to fool a discriminator that detects the source language of an encoded sentence.",non_actionable,fact
r65,s2,t30,"These five objectives together are sufficient to achieve impressive English <-> German and Engish <-> French results in Multi30k, a bilingual image caption scenario with short simple sentences, and to achieve a strong start for a standard WMT scenario.",non_actionable,fact
r65,s3,t30,And it is genuinely impressive to see all these pieces come together into something that translates substantially better than a word-to-word baseline.,non_actionable,agreement
r65,s4,t30,But the aspect I like most about this paper is the experimental analysis.,non_actionable,agreement
r65,s5,t30,"Considering that this is a big, complicated system, it is crucial that the authors included both an ablation experiment to see which pieces were most important, and an experiment that indicates the amount of labeled data that would be required to achieve the same results with a supervised system.",actionable,suggestion
r65,s6,t30,"I am glad you take the time to give your model selection criterion it's own section in 3.2, as it does seem to be an important part of this puzzle.",non_actionable,agreement
r65,s7,t30,"In the first paragraph of Section 4.5, I disagree with the sentence, ""Similar observations can be made for the other language pairs we considered.""",actionable,disagreement
r65,s8,t30,"In fact, I would go so far as to say that the English to French scenario described in that paragraph is a notable outlier, in that it is the other language pair where you beat the oracle re-ordering baseline in both Multi30k and WMT.",actionable,suggestion
r65,s9,t30,"When citing Shen et al., 2017, consider also mentioning the following: Controllable Invariance through Adversarial Feature Learning; Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig; NIPS 2017; https://arxiv.org/abs/1705.11122 Response read -- thanks.",actionable,suggestion
r65,s0,t2,"A thorough exploration of techniques for unsupervised translation, a very strong start for this problem This paper describes an approach to train a neural machine translation system without parallel data.",non_actionable,agreement
r65,s1,t2,"Starting from a word-to-word translation lexicon, which was also learned with unsupervised methods, this approach combines a denoising auto-encoder objective with a back-translation objective, both in two translation directions, with an adversarial objective that attempts to fool a discriminator that detects the source language of an encoded sentence.",non_actionable,fact
r65,s2,t2,"These five objectives together are sufficient to achieve impressive English <-> German and Engish <-> French results in Multi30k, a bilingual image caption scenario with short simple sentences, and to achieve a strong start for a standard WMT scenario.",non_actionable,agreement
r65,s3,t2,And it is genuinely impressive to see all these pieces come together into something that translates substantially better than a word-to-word baseline.,non_actionable,agreement
r65,s4,t2,But the aspect I like most about this paper is the experimental analysis.,non_actionable,agreement
r65,s5,t2,"Considering that this is a big, complicated system, it is crucial that the authors included both an ablation experiment to see which pieces were most important, and an experiment that indicates the amount of labeled data that would be required to achieve the same results with a supervised system.",non_actionable,agreement
r65,s6,t2,"I am glad you take the time to give your model selection criterion it's own section in 3.2, as it does seem to be an important part of this puzzle.",non_actionable,agreement
r65,s7,t2,"In the first paragraph of Section 4.5, I disagree with the sentence, ""Similar observations can be made for the other language pairs we considered.""",non_actionable,disagreement
r65,s8,t2,"In fact, I would go so far as to say that the English to French scenario described in that paragraph is a notable outlier, in that it is the other language pair where you beat the oracle re-ordering baseline in both Multi30k and WMT.",non_actionable,disagreement
r65,s9,t2,"When citing Shen et al., 2017, consider also mentioning the following: Controllable Invariance through Adversarial Feature Learning; Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig; NIPS 2017; https://arxiv.org/abs/1705.11122 Response read -- thanks.",actionable,suggestion
r73,s0,t20,"A well written, clear paper presenting a novel representation of graphs as multi-channel image-like structures from their node embeddings.",non_actionable,agreement
r73,s1,t20,The paper presents a novel representation of graphs as multi-channel image-like structures.,non_actionable,agreement
r73,s2,t20,These structures are extrapolated by,non_actionable,fact
r73,s3,t20,1) mapping the graph nodes into an embedding using an algorithm like node2vec,non_actionable,fact
r73,s4,t20,2) compressing the embedding space using pca,non_actionable,fact
r73,s5,t20,3) and extracting 2D slices from the compressed space and computing 2D histograms per slice.,non_actionable,fact
r73,s6,t20,he resulting multi-channel image-like structures are then feed into vanilla 2D CNN.,non_actionable,fact
r73,s7,t20,"The papers is well written and clear, and proposes an interesting idea of representing graphs as multi-channel image-like structures.",non_actionable,agreement
r73,s8,t20,"Furthermore, the authors perform experiments with real graph datasets from the social science domain and a comparison with the SoA method both graph kernels and deep learning architectures.",non_actionable,agreement
r73,s9,t20,"The proposed algorithm in 3 out of 5 datasets, two of theme with statistical significant.",non_actionable,fact
r73,s0,t31,"A well written, clear paper presenting a novel representation of graphs as multi-channel image-like structures from their node embeddings.",non_actionable,fact
r73,s1,t31,The paper presents a novel representation of graphs as multi-channel image-like structures.,non_actionable,fact
r73,s2,t31,These structures are extrapolated by,non_actionable,fact
r73,s3,t31,1) mapping the graph nodes into an embedding using an algorithm like node2vec,non_actionable,fact
r73,s4,t31,2) compressing the embedding space using pca,non_actionable,fact
r73,s5,t31,3) and extracting 2D slices from the compressed space and computing 2D histograms per slice.,non_actionable,fact
r73,s6,t31,he resulting multi-channel image-like structures are then feed into vanilla 2D CNN.,non_actionable,fact
r73,s7,t31,"The papers is well written and clear, and proposes an interesting idea of representing graphs as multi-channel image-like structures.",non_actionable,agreement
r73,s8,t31,"Furthermore, the authors perform experiments with real graph datasets from the social science domain and a comparison with the SoA method both graph kernels and deep learning architectures.",non_actionable,fact
r73,s9,t31,"The proposed algorithm in 3 out of 5 datasets, two of theme with statistical significant.",non_actionable,fact
r73,s0,t16,"A well written, clear paper presenting a novel representation of graphs as multi-channel image-like structures from their node embeddings.",non_actionable,agreement
r73,s1,t16,The paper presents a novel representation of graphs as multi-channel image-like structures.,non_actionable,fact
r73,s2,t16,These structures are extrapolated by,non_actionable,fact
r73,s3,t16,1) mapping the graph nodes into an embedding using an algorithm like node2vec,non_actionable,fact
r73,s4,t16,2) compressing the embedding space using pca,non_actionable,fact
r73,s5,t16,3) and extracting 2D slices from the compressed space and computing 2D histograms per slice.,non_actionable,fact
r73,s6,t16,he resulting multi-channel image-like structures are then feed into vanilla 2D CNN.,non_actionable,fact
r73,s7,t16,"The papers is well written and clear, and proposes an interesting idea of representing graphs as multi-channel image-like structures.",non_actionable,agreement
r73,s8,t16,"Furthermore, the authors perform experiments with real graph datasets from the social science domain and a comparison with the SoA method both graph kernels and deep learning architectures.",non_actionable,fact
r73,s9,t16,"The proposed algorithm in 3 out of 5 datasets, two of theme with statistical significant.",non_actionable,fact
r73,s0,t10,"A well written, clear paper presenting a novel representation of graphs as multi-channel image-like structures from their node embeddings.",actionable,agreement
r73,s1,t10,The paper presents a novel representation of graphs as multi-channel image-like structures.,non_actionable,other
r73,s2,t10,These structures are extrapolated by,non_actionable,other
r73,s3,t10,1) mapping the graph nodes into an embedding using an algorithm like node2vec,non_actionable,other
r73,s4,t10,2) compressing the embedding space using pca,non_actionable,other
r73,s5,t10,3) and extracting 2D slices from the compressed space and computing 2D histograms per slice.,non_actionable,other
r73,s6,t10,he resulting multi-channel image-like structures are then feed into vanilla 2D CNN.,non_actionable,other
r73,s7,t10,"The papers is well written and clear, and proposes an interesting idea of representing graphs as multi-channel image-like structures.",actionable,agreement
r73,s8,t10,"Furthermore, the authors perform experiments with real graph datasets from the social science domain and a comparison with the SoA method both graph kernels and deep learning architectures.",non_actionable,other
r73,s9,t10,"The proposed algorithm in 3 out of 5 datasets, two of theme with statistical significant.",non_actionable,other
r73,s0,t2,"A well written, clear paper presenting a novel representation of graphs as multi-channel image-like structures from their node embeddings.",non_actionable,agreement
r73,s1,t2,The paper presents a novel representation of graphs as multi-channel image-like structures.,non_actionable,agreement
r73,s2,t2,These structures are extrapolated by,non_actionable,fact
r73,s3,t2,1) mapping the graph nodes into an embedding using an algorithm like node2vec,non_actionable,fact
r73,s4,t2,2) compressing the embedding space using pca,non_actionable,shortcoming
r73,s5,t2,3) and extracting 2D slices from the compressed space and computing 2D histograms per slice.,non_actionable,fact
r73,s6,t2,he resulting multi-channel image-like structures are then feed into vanilla 2D CNN.,non_actionable,fact
r73,s7,t2,"The papers is well written and clear, and proposes an interesting idea of representing graphs as multi-channel image-like structures.",non_actionable,agreement
r73,s8,t2,"Furthermore, the authors perform experiments with real graph datasets from the social science domain and a comparison with the SoA method both graph kernels and deep learning architectures.",non_actionable,fact
r73,s9,t2,"The proposed algorithm in 3 out of 5 datasets, two of theme with statistical significant.",non_actionable,fact
r76,s0,t31,"Also, BF is an odd language to target for program synthesis.",actionable,shortcoming
r76,s1,t31,This paper introduces a method for regularizing the REINFORCE algorithm by keeping around a small set of known high-quality samples as part of the sample set when performing stochastic gradient estimation.,non_actionable,fact
r76,s2,t31,I question the value of program synthesis in a language which is not human-readable.,actionable,shortcoming
r76,s3,t31,"There are no program synthesis examples demonstrating types of functions which perform complex tasks involving e.g. recursion, such as sorting operations.",actionable,shortcoming
r76,s4,t31,"All this said, the priority queue training presented here for reinforcement learning with sparse rewards is interesting, and appears to significantly improve the quality of results from a naive policy gradient approach.",non_actionable,agreement
r76,s5,t31,"It would be nice to provide some sort of analysis of it, even an empirical one.",actionable,suggestion
r76,s6,t31,"For example, how frequently are the entries in the queue updated?",actionable,suggestion
r76,s7,t31,Is this consistent over training time?,actionable,suggestion
r76,s8,t31,"While the paper does demonstrate that PQT is helpful on this very particular task, it makes very little effort to investigate *why* it is helpful, or whether it will usefully generalize to other domains.",actionable,shortcoming
r76,s9,t31,It would also help clarify under what situations one should or should not use this.,actionable,suggestion
r76,s0,t20,"Also, BF is an odd language to target for program synthesis.",non_actionable,shortcoming
r76,s1,t20,This paper introduces a method for regularizing the REINFORCE algorithm by keeping around a small set of known high-quality samples as part of the sample set when performing stochastic gradient estimation.,non_actionable,fact
r76,s2,t20,I question the value of program synthesis in a language which is not human-readable.,non_actionable,shortcoming
r76,s3,t20,"There are no program synthesis examples demonstrating types of functions which perform complex tasks involving e.g. recursion, such as sorting operations.",actionable,shortcoming
r76,s4,t20,"All this said, the priority queue training presented here for reinforcement learning with sparse rewards is interesting, and appears to significantly improve the quality of results from a naive policy gradient approach.",non_actionable,agreement
r76,s5,t20,"It would be nice to provide some sort of analysis of it, even an empirical one.",actionable,suggestion
r76,s6,t20,"For example, how frequently are the entries in the queue updated?",actionable,question
r76,s7,t20,Is this consistent over training time?,actionable,question
r76,s8,t20,"While the paper does demonstrate that PQT is helpful on this very particular task, it makes very little effort to investigate *why* it is helpful, or whether it will usefully generalize to other domains.",actionable,shortcoming
r76,s9,t20,It would also help clarify under what situations one should or should not use this.,actionable,suggestion
r76,s0,t16,"Also, BF is an odd language to target for program synthesis.",actionable,disagreement
r76,s1,t16,This paper introduces a method for regularizing the REINFORCE algorithm by keeping around a small set of known high-quality samples as part of the sample set when performing stochastic gradient estimation.,non_actionable,fact
r76,s2,t16,I question the value of program synthesis in a language which is not human-readable.,actionable,disagreement
r76,s3,t16,"There are no program synthesis examples demonstrating types of functions which perform complex tasks involving e.g. recursion, such as sorting operations.",non_actionable,fact
r76,s4,t16,"All this said, the priority queue training presented here for reinforcement learning with sparse rewards is interesting, and appears to significantly improve the quality of results from a naive policy gradient approach.",non_actionable,agreement
r76,s5,t16,"It would be nice to provide some sort of analysis of it, even an empirical one.",actionable,suggestion
r76,s6,t16,"For example, how frequently are the entries in the queue updated?",actionable,question
r76,s7,t16,Is this consistent over training time?,actionable,question
r76,s8,t16,"While the paper does demonstrate that PQT is helpful on this very particular task, it makes very little effort to investigate *why* it is helpful, or whether it will usefully generalize to other domains.",actionable,shortcoming
r76,s9,t16,It would also help clarify under what situations one should or should not use this.,actionable,suggestion
r76,s0,t2,"Also, BF is an odd language to target for program synthesis.",non_actionable,fact
r76,s1,t2,This paper introduces a method for regularizing the REINFORCE algorithm by keeping around a small set of known high-quality samples as part of the sample set when performing stochastic gradient estimation.,non_actionable,fact
r76,s2,t2,I question the value of program synthesis in a language which is not human-readable.,non_actionable,fact
r76,s3,t2,"There are no program synthesis examples demonstrating types of functions which perform complex tasks involving e.g. recursion, such as sorting operations.",actionable,shortcoming
r76,s4,t2,"All this said, the priority queue training presented here for reinforcement learning with sparse rewards is interesting, and appears to significantly improve the quality of results from a naive policy gradient approach.",non_actionable,agreement
r76,s5,t2,"It would be nice to provide some sort of analysis of it, even an empirical one.",actionable,suggestion
r76,s6,t2,"For example, how frequently are the entries in the queue updated?",actionable,question
r76,s7,t2,Is this consistent over training time?,actionable,question
r76,s8,t2,"While the paper does demonstrate that PQT is helpful on this very particular task, it makes very little effort to investigate *why* it is helpful, or whether it will usefully generalize to other domains.",actionable,shortcoming
r76,s9,t2,It would also help clarify under what situations one should or should not use this.,actionable,shortcoming
r76,s0,t8,"Also, BF is an odd language to target for program synthesis.",actionable,shortcoming
r76,s1,t8,This paper introduces a method for regularizing the REINFORCE algorithm by keeping around a small set of known high-quality samples as part of the sample set when performing stochastic gradient estimation.,non_actionable,fact
r76,s2,t8,I question the value of program synthesis in a language which is not human-readable.,actionable,disagreement
r76,s3,t8,"There are no program synthesis examples demonstrating types of functions which perform complex tasks involving e.g. recursion, such as sorting operations.",actionable,shortcoming
r76,s4,t8,"All this said, the priority queue training presented here for reinforcement learning with sparse rewards is interesting, and appears to significantly improve the quality of results from a naive policy gradient approach.",non_actionable,agreement
r76,s5,t8,"It would be nice to provide some sort of analysis of it, even an empirical one.",actionable,suggestion
r76,s6,t8,"For example, how frequently are the entries in the queue updated?",actionable,question
r76,s7,t8,Is this consistent over training time?,actionable,question
r76,s8,t8,"While the paper does demonstrate that PQT is helpful on this very particular task, it makes very little effort to investigate *why* it is helpful, or whether it will usefully generalize to other domains.",actionable,shortcoming
r76,s9,t8,It would also help clarify under what situations one should or should not use this.,non_actionable,fact
r63,s0,t8,An encoder reads in the document to condition the generator which outputs a summary.,non_actionable,fact
r63,s1,t8,An additional GAN loss is used on the generator output to encourage the output to look like summaries -- this procedure only requires unpaired summaries.,non_actionable,fact
r63,s2,t8,The results are that this procedure improves upon the trivial baseline but still significantly underperforms supervised training.,non_actionable,fact
r63,s3,t8,The idea is a simple but useful extension of these previous works.,non_actionable,agreement
r63,s4,t8,"The problem set-up of unpaired summarization is not particularly compelling, since summaries are typically found paired with their original documents.",actionable,shortcoming
r63,s5,t8,"It would be more interesting to see how well it can be used for other textual domains such as translation, where a lot of unpaired data exists (some other submissions to ICLR tackle this problem).",actionable,suggestion
r63,s6,t8,A key baseline that is missing is pretraining the generator as a language model over summaries.,actionable,shortcoming
r63,s7,t8,"Without this baseline, it is hard to tell whether GAN training is even useful.",actionable,shortcoming
r63,s8,t8,Another experiment missing is seeing whether joint supervised-GAN-reconstruction training can outperform purely supervised training.,actionable,shortcoming
r63,s9,t8,"This paper has numerous grammatical and spelling errors throughout the paper (worse, the same errors are copy-pasted everywhere).",actionable,shortcoming
r63,s0,t20,An encoder reads in the document to condition the generator which outputs a summary.,non_actionable,fact
r63,s1,t20,An additional GAN loss is used on the generator output to encourage the output to look like summaries -- this procedure only requires unpaired summaries.,non_actionable,fact
r63,s2,t20,The results are that this procedure improves upon the trivial baseline but still significantly underperforms supervised training.,non_actionable,fact
r63,s3,t20,The idea is a simple but useful extension of these previous works.,non_actionable,fact
r63,s4,t20,"The problem set-up of unpaired summarization is not particularly compelling, since summaries are typically found paired with their original documents.",actionable,shortcoming
r63,s5,t20,"It would be more interesting to see how well it can be used for other textual domains such as translation, where a lot of unpaired data exists (some other submissions to ICLR tackle this problem).",actionable,suggestion
r63,s6,t20,A key baseline that is missing is pretraining the generator as a language model over summaries.,actionable,shortcoming
r63,s7,t20,"Without this baseline, it is hard to tell whether GAN training is even useful.",actionable,shortcoming
r63,s8,t20,Another experiment missing is seeing whether joint supervised-GAN-reconstruction training can outperform purely supervised training.,actionable,shortcoming
r63,s9,t20,"This paper has numerous grammatical and spelling errors throughout the paper (worse, the same errors are copy-pasted everywhere).",actionable,shortcoming
r63,s0,t10,An encoder reads in the document to condition the generator which outputs a summary.,non_actionable,other
r63,s1,t10,An additional GAN loss is used on the generator output to encourage the output to look like summaries -- this procedure only requires unpaired summaries.,non_actionable,other
r63,s2,t10,The results are that this procedure improves upon the trivial baseline but still significantly underperforms supervised training.,non_actionable,other
r63,s3,t10,The idea is a simple but useful extension of these previous works.,actionable,agreement
r63,s4,t10,"The problem set-up of unpaired summarization is not particularly compelling, since summaries are typically found paired with their original documents.",actionable,shortcoming
r63,s5,t10,"It would be more interesting to see how well it can be used for other textual domains such as translation, where a lot of unpaired data exists (some other submissions to ICLR tackle this problem).",actionable,suggestion
r63,s6,t10,A key baseline that is missing is pretraining the generator as a language model over summaries.,actionable,shortcoming
r63,s7,t10,"Without this baseline, it is hard to tell whether GAN training is even useful.",actionable,shortcoming
r63,s8,t10,Another experiment missing is seeing whether joint supervised-GAN-reconstruction training can outperform purely supervised training.,actionable,shortcoming
r63,s9,t10,"This paper has numerous grammatical and spelling errors throughout the paper (worse, the same errors are copy-pasted everywhere).",actionable,shortcoming
r63,s0,t16,An encoder reads in the document to condition the generator which outputs a summary.,non_actionable,fact
r63,s1,t16,An additional GAN loss is used on the generator output to encourage the output to look like summaries -- this procedure only requires unpaired summaries.,non_actionable,fact
r63,s2,t16,The results are that this procedure improves upon the trivial baseline but still significantly underperforms supervised training.,actionable,shortcoming
r63,s3,t16,The idea is a simple but useful extension of these previous works.,non_actionable,agreement
r63,s4,t16,"The problem set-up of unpaired summarization is not particularly compelling, since summaries are typically found paired with their original documents.",non_actionable,fact
r63,s5,t16,"It would be more interesting to see how well it can be used for other textual domains such as translation, where a lot of unpaired data exists (some other submissions to ICLR tackle this problem).",actionable,suggestion
r63,s6,t16,A key baseline that is missing is pretraining the generator as a language model over summaries.,actionable,shortcoming
r63,s7,t16,"Without this baseline, it is hard to tell whether GAN training is even useful.",actionable,shortcoming
r63,s8,t16,Another experiment missing is seeing whether joint supervised-GAN-reconstruction training can outperform purely supervised training.,actionable,shortcoming
r63,s9,t16,"This paper has numerous grammatical and spelling errors throughout the paper (worse, the same errors are copy-pasted everywhere).",actionable,shortcoming
r63,s0,t2,An encoder reads in the document to condition the generator which outputs a summary.,non_actionable,fact
r63,s1,t2,An additional GAN loss is used on the generator output to encourage the output to look like summaries -- this procedure only requires unpaired summaries.,non_actionable,fact
r63,s2,t2,The results are that this procedure improves upon the trivial baseline but still significantly underperforms supervised training.,non_actionable,fact
r63,s3,t2,The idea is a simple but useful extension of these previous works.,non_actionable,agreement
r63,s4,t2,"The problem set-up of unpaired summarization is not particularly compelling, since summaries are typically found paired with their original documents.",non_actionable,shortcoming
r63,s5,t2,"It would be more interesting to see how well it can be used for other textual domains such as translation, where a lot of unpaired data exists (some other submissions to ICLR tackle this problem).",actionable,shortcoming
r63,s6,t2,A key baseline that is missing is pretraining the generator as a language model over summaries.,actionable,shortcoming
r63,s7,t2,"Without this baseline, it is hard to tell whether GAN training is even useful.",actionable,shortcoming
r63,s8,t2,Another experiment missing is seeing whether joint supervised-GAN-reconstruction training can outperform purely supervised training.,actionable,shortcoming
r63,s9,t2,"This paper has numerous grammatical and spelling errors throughout the paper (worse, the same errors are copy-pasted everywhere).",actionable,shortcoming
r100,s0,t20,An interesting paper which is marginally above acceptance threshold The authors of the paper propose a framework to generate natural adversarial examples by searching adversaries in a latent space of dense and continuous data representation (instead of in the original input data space).,actionable,agreement
r100,s1,t20,"The details of their proposed method are covered in Algorithm 1 on Page 12, where an additional GAN (generative adversarial network) I_{\gamma}, which can be regarded as the inverse function of the original GAN G_{\theta}, is trained to learn a map from the original input data space to the latent z-space.",non_actionable,fact
r100,s2,t20,The intuition of the proposed approach is clearly explained and it seems very reasonable to me.,non_actionable,agreement
r100,s3,t20,"My main concern, however, is in the current sampling-based search algorithm in the latent z-space, which the authors have already admitted in the paper.",non_actionable,shortcoming
r100,s4,t20,The efficiency of such a search method decreases very fast when the dimensions of the z-space increases.,non_actionable,fact
r100,s5,t20,Another concern is that the authors have not provided sufficient number of examples to show the advantages of their proposed method over the other method (such as FGSM) in generating the adversaries.,actionable,shortcoming
r100,s6,t20,Could you explicitly specify the dimension of the latent z-space in each example in image and text domain in Section 3?,actionable,question
r100,s7,t20,"In Tables 7 and 8, the human beings agree with the LeNet in >= 58% of cases.",non_actionable,fact
r100,s8,t20,Could you still say that your generated “adversaries” leading to the wrong decision from LeNet?,actionable,question
r100,s9,t20,How do you choose the parameter \lambda in Equation (2)?,actionable,question
r100,s0,t25,An interesting paper which is marginally above acceptance threshold The authors of the paper propose a framework to generate natural adversarial examples by searching adversaries in a latent space of dense and continuous data representation (instead of in the original input data space).,actionable,shortcoming
r100,s1,t25,"The details of their proposed method are covered in Algorithm 1 on Page 12, where an additional GAN (generative adversarial network) I_{\gamma}, which can be regarded as the inverse function of the original GAN G_{\theta}, is trained to learn a map from the original input data space to the latent z-space.",non_actionable,shortcoming
r100,s2,t25,The intuition of the proposed approach is clearly explained and it seems very reasonable to me.,non_actionable,agreement
r100,s3,t25,"My main concern, however, is in the current sampling-based search algorithm in the latent z-space, which the authors have already admitted in the paper.",non_actionable,shortcoming
r100,s4,t25,The efficiency of such a search method decreases very fast when the dimensions of the z-space increases.,non_actionable,disagreement
r100,s5,t25,Another concern is that the authors have not provided sufficient number of examples to show the advantages of their proposed method over the other method (such as FGSM) in generating the adversaries.,actionable,shortcoming
r100,s6,t25,Could you explicitly specify the dimension of the latent z-space in each example in image and text domain in Section 3?,actionable,question
r100,s7,t25,"In Tables 7 and 8, the human beings agree with the LeNet in >= 58% of cases.",non_actionable,fact
r100,s8,t25,Could you still say that your generated “adversaries” leading to the wrong decision from LeNet?,actionable,question
r100,s9,t25,How do you choose the parameter \lambda in Equation (2)?,actionable,question
r100,s0,t31,An interesting paper which is marginally above acceptance threshold The authors of the paper propose a framework to generate natural adversarial examples by searching adversaries in a latent space of dense and continuous data representation (instead of in the original input data space).,non_actionable,fact
r100,s1,t31,"The details of their proposed method are covered in Algorithm 1 on Page 12, where an additional GAN (generative adversarial network) I_{\gamma}, which can be regarded as the inverse function of the original GAN G_{\theta}, is trained to learn a map from the original input data space to the latent z-space.",non_actionable,fact
r100,s2,t31,The intuition of the proposed approach is clearly explained and it seems very reasonable to me.,non_actionable,agreement
r100,s3,t31,"My main concern, however, is in the current sampling-based search algorithm in the latent z-space, which the authors have already admitted in the paper.",actionable,shortcoming
r100,s4,t31,The efficiency of such a search method decreases very fast when the dimensions of the z-space increases.,actionable,shortcoming
r100,s5,t31,Another concern is that the authors have not provided sufficient number of examples to show the advantages of their proposed method over the other method (such as FGSM) in generating the adversaries.,actionable,shortcoming
r100,s6,t31,Could you explicitly specify the dimension of the latent z-space in each example in image and text domain in Section 3?,actionable,shortcoming
r100,s7,t31,"In Tables 7 and 8, the human beings agree with the LeNet in >= 58% of cases.",non_actionable,fact
r100,s8,t31,Could you still say that your generated “adversaries” leading to the wrong decision from LeNet?,non_actionable,question
r100,s9,t31,How do you choose the parameter \lambda in Equation (2)?,non_actionable,question
r100,s0,t10,An interesting paper which is marginally above acceptance threshold The authors of the paper propose a framework to generate natural adversarial examples by searching adversaries in a latent space of dense and continuous data representation (instead of in the original input data space).,non_actionable,other
r100,s1,t10,"The details of their proposed method are covered in Algorithm 1 on Page 12, where an additional GAN (generative adversarial network) I_{\gamma}, which can be regarded as the inverse function of the original GAN G_{\theta}, is trained to learn a map from the original input data space to the latent z-space.",non_actionable,other
r100,s2,t10,The intuition of the proposed approach is clearly explained and it seems very reasonable to me.,actionable,agreement
r100,s3,t10,"My main concern, however, is in the current sampling-based search algorithm in the latent z-space, which the authors have already admitted in the paper.",actionable,shortcoming
r100,s4,t10,The efficiency of such a search method decreases very fast when the dimensions of the z-space increases.,actionable,shortcoming
r100,s5,t10,Another concern is that the authors have not provided sufficient number of examples to show the advantages of their proposed method over the other method (such as FGSM) in generating the adversaries.,actionable,shortcoming
r100,s6,t10,Could you explicitly specify the dimension of the latent z-space in each example in image and text domain in Section 3?,actionable,suggestion
r100,s7,t10,"In Tables 7 and 8, the human beings agree with the LeNet in >= 58% of cases.",non_actionable,other
r100,s8,t10,Could you still say that your generated “adversaries” leading to the wrong decision from LeNet?,actionable,suggestion
r100,s9,t10,How do you choose the parameter \lambda in Equation (2)?,actionable,suggestion
r100,s0,t2,An interesting paper which is marginally above acceptance threshold The authors of the paper propose a framework to generate natural adversarial examples by searching adversaries in a latent space of dense and continuous data representation (instead of in the original input data space).,non_actionable,agreement
r100,s1,t2,"The details of their proposed method are covered in Algorithm 1 on Page 12, where an additional GAN (generative adversarial network) I_{\gamma}, which can be regarded as the inverse function of the original GAN G_{\theta}, is trained to learn a map from the original input data space to the latent z-space.",non_actionable,fact
r100,s2,t2,The intuition of the proposed approach is clearly explained and it seems very reasonable to me.,non_actionable,agreement
r100,s3,t2,"My main concern, however, is in the current sampling-based search algorithm in the latent z-space, which the authors have already admitted in the paper.",non_actionable,shortcoming
r100,s4,t2,The efficiency of such a search method decreases very fast when the dimensions of the z-space increases.,non_actionable,fact
r100,s5,t2,Another concern is that the authors have not provided sufficient number of examples to show the advantages of their proposed method over the other method (such as FGSM) in generating the adversaries.,actionable,shortcoming
r100,s6,t2,Could you explicitly specify the dimension of the latent z-space in each example in image and text domain in Section 3?,actionable,suggestion
r100,s7,t2,"In Tables 7 and 8, the human beings agree with the LeNet in >= 58% of cases.",non_actionable,fact
r100,s8,t2,Could you still say that your generated “adversaries” leading to the wrong decision from LeNet?,actionable,question
r100,s9,t2,How do you choose the parameter \lambda in Equation (2)?,actionable,question
r118,s0,t31,"An interesting paper, but not the clearest presentation.",actionable,shortcoming
r118,s1,t31,Additional regularization terms are also added to encourage the model to encode longer term dependencies in its latent distributions.,actionable,suggestion
r118,s2,t31,My first concern with this paper is that the derivation in Eq.,actionable,shortcoming
r118,s3,t31,There is a p(z_1:T) term that should appear in the integrand.,actionable,suggestion
r118,s4,t31,It is not clear to me why h_t should depend on \tilde{b}_t.,actionable,suggestion
r118,s5,t31,"It may add capacity to the decoder in the form of extra weights, but the same could be achieved by making z_t larger.",non_actionable,fact
r118,s6,t31,In the no reconstruction loss experiments do you still sample \tilde{b}_t in the generative part?,non_actionable,question
r118,s7,t31,It seems the Blizzard results in Figure 2 are missing no reconstruction loss + full backprop.,actionable,shortcoming
r118,s8,t31,Exactly which gradients are you skipping at random?,non_actionable,question
r118,s9,t31,Do you have any intuition for why it is sometimes necessary to set beta=0?,non_actionable,question
r118,s0,t10,"An interesting paper, but not the clearest presentation.",actionable,disagreement
r118,s1,t10,Additional regularization terms are also added to encourage the model to encode longer term dependencies in its latent distributions.,non_actionable,other
r118,s2,t10,My first concern with this paper is that the derivation in Eq.,actionable,disagreement
r118,s3,t10,There is a p(z_1:T) term that should appear in the integrand.,actionable,shortcoming
r118,s4,t10,It is not clear to me why h_t should depend on \tilde{b}_t.,actionable,fact
r118,s5,t10,"It may add capacity to the decoder in the form of extra weights, but the same could be achieved by making z_t larger.",actionable,suggestion
r118,s6,t10,In the no reconstruction loss experiments do you still sample \tilde{b}_t in the generative part?,actionable,question
r118,s7,t10,It seems the Blizzard results in Figure 2 are missing no reconstruction loss + full backprop.,actionable,shortcoming
r118,s8,t10,Exactly which gradients are you skipping at random?,actionable,question
r118,s9,t10,Do you have any intuition for why it is sometimes necessary to set beta=0?,actionable,question
r118,s0,t8,"An interesting paper, but not the clearest presentation.",actionable,shortcoming
r118,s1,t8,Additional regularization terms are also added to encourage the model to encode longer term dependencies in its latent distributions.,non_actionable,fact
r118,s2,t8,My first concern with this paper is that the derivation in Eq.,actionable,shortcoming
r118,s3,t8,There is a p(z_1:T) term that should appear in the integrand.,actionable,shortcoming
r118,s4,t8,It is not clear to me why h_t should depend on \tilde{b}_t.,actionable,disagreement
r118,s5,t8,"It may add capacity to the decoder in the form of extra weights, but the same could be achieved by making z_t larger.",actionable,disagreement
r118,s6,t8,In the no reconstruction loss experiments do you still sample \tilde{b}_t in the generative part?,non_actionable,question
r118,s7,t8,It seems the Blizzard results in Figure 2 are missing no reconstruction loss + full backprop.,non_actionable,agreement
r118,s8,t8,Exactly which gradients are you skipping at random?,non_actionable,question
r118,s9,t8,Do you have any intuition for why it is sometimes necessary to set beta=0?,non_actionable,question
r118,s0,t2,"An interesting paper, but not the clearest presentation.",non_actionable,shortcoming
r118,s1,t2,Additional regularization terms are also added to encourage the model to encode longer term dependencies in its latent distributions.,non_actionable,fact
r118,s2,t2,My first concern with this paper is that the derivation in Eq.,non_actionable,shortcoming
r118,s3,t2,There is a p(z_1:T) term that should appear in the integrand.,actionable,suggestion
r118,s4,t2,It is not clear to me why h_t should depend on \tilde{b}_t.,actionable,shortcoming
r118,s5,t2,"It may add capacity to the decoder in the form of extra weights, but the same could be achieved by making z_t larger.",actionable,suggestion
r118,s6,t2,In the no reconstruction loss experiments do you still sample \tilde{b}_t in the generative part?,actionable,question
r118,s7,t2,It seems the Blizzard results in Figure 2 are missing no reconstruction loss + full backprop.,actionable,shortcoming
r118,s8,t2,Exactly which gradients are you skipping at random?,actionable,question
r118,s9,t2,Do you have any intuition for why it is sometimes necessary to set beta=0?,actionable,question
r118,s0,t20,"An interesting paper, but not the clearest presentation.",non_actionable,fact
r118,s1,t20,Additional regularization terms are also added to encourage the model to encode longer term dependencies in its latent distributions.,non_actionable,fact
r118,s2,t20,My first concern with this paper is that the derivation in Eq.,actionable,shortcoming
r118,s3,t20,There is a p(z_1:T) term that should appear in the integrand.,actionable,suggestion
r118,s4,t20,It is not clear to me why h_t should depend on \tilde{b}_t.,actionable,shortcoming
r118,s5,t20,"It may add capacity to the decoder in the form of extra weights, but the same could be achieved by making z_t larger.",actionable,shortcoming
r118,s6,t20,In the no reconstruction loss experiments do you still sample \tilde{b}_t in the generative part?,actionable,question
r118,s7,t20,It seems the Blizzard results in Figure 2 are missing no reconstruction loss + full backprop.,actionable,shortcoming
r118,s8,t20,Exactly which gradients are you skipping at random?,actionable,question
r118,s9,t20,Do you have any intuition for why it is sometimes necessary to set beta=0?,actionable,question
r85,s0,t18,"But, apparently, this is not the problem the authors actually solve, according to eq.",non_actionable,fact
r85,s1,t18,I am not sure how this greedy action should result in maximizing the total discounted reward along a trajectory.,non_actionable,shortcoming
r85,s2,t18,Equation 3 seems to be a cost function penalizing differences between predicted and observed states.,non_actionable,agreement
r85,s3,t18,"Similarly, equation 4 penalizes differences between predicted and observed state transitions.",actionable,suggestion
r85,s4,t18,"Essentially, the current manuscript does not learn the reward function of an MDP in the RL setting, but it learns some sort of a shaping reward function to do policy imitation, i.e. copy the behavior of the demonstrator as closely as possible.",non_actionable,agreement
r85,s5,t18,"So, in my view, the manuscript does a nice job at policy fitting, but this is not reward estimation.",non_actionable,other
r85,s6,t18,The manuscript has to be rewritten that way.,non_actionable,suggestion
r85,s7,t18,"One could also argue that the manuscript would profit from a better theoretical analysis of the IRL problem, say: C. A. Rothkopf, C. Dimitrakakis.",non_actionable,suggestion
r85,s8,t18,Preference elicitation and inverse reinforcement learning.,actionable,other
r85,s9,t18,"ECML 2011 Overall the manuscript leverages on deep learning’s power of function approximation and the simulation results are nice, but in terms of the soundness of the underlying RL and IRL theory there is some work to do.",actionable,fact
r85,s0,t2,"But, apparently, this is not the problem the authors actually solve, according to eq.",non_actionable,shortcoming
r85,s1,t2,I am not sure how this greedy action should result in maximizing the total discounted reward along a trajectory.,non_actionable,shortcoming
r85,s2,t2,Equation 3 seems to be a cost function penalizing differences between predicted and observed states.,non_actionable,fact
r85,s3,t2,"Similarly, equation 4 penalizes differences between predicted and observed state transitions.",non_actionable,fact
r85,s4,t2,"Essentially, the current manuscript does not learn the reward function of an MDP in the RL setting, but it learns some sort of a shaping reward function to do policy imitation, i.e. copy the behavior of the demonstrator as closely as possible.",non_actionable,fact
r85,s5,t2,"So, in my view, the manuscript does a nice job at policy fitting, but this is not reward estimation.",non_actionable,disagreement
r85,s6,t2,The manuscript has to be rewritten that way.,actionable,suggestion
r85,s7,t2,"One could also argue that the manuscript would profit from a better theoretical analysis of the IRL problem, say: C. A. Rothkopf, C. Dimitrakakis.",actionable,suggestion
r85,s8,t2,Preference elicitation and inverse reinforcement learning.,non_actionable,other
r85,s9,t2,"ECML 2011 Overall the manuscript leverages on deep learning’s power of function approximation and the simulation results are nice, but in terms of the soundness of the underlying RL and IRL theory there is some work to do.",actionable,shortcoming
r85,s0,t31,"But, apparently, this is not the problem the authors actually solve, according to eq.",actionable,disagreement
r85,s1,t31,I am not sure how this greedy action should result in maximizing the total discounted reward along a trajectory.,actionable,shortcoming
r85,s2,t31,Equation 3 seems to be a cost function penalizing differences between predicted and observed states.,non_actionable,fact
r85,s3,t31,"Similarly, equation 4 penalizes differences between predicted and observed state transitions.",non_actionable,fact
r85,s4,t31,"Essentially, the current manuscript does not learn the reward function of an MDP in the RL setting, but it learns some sort of a shaping reward function to do policy imitation, i.e. copy the behavior of the demonstrator as closely as possible.",non_actionable,fact
r85,s5,t31,"So, in my view, the manuscript does a nice job at policy fitting, but this is not reward estimation.",actionable,shortcoming
r85,s6,t31,The manuscript has to be rewritten that way.,actionable,suggestion
r85,s7,t31,"One could also argue that the manuscript would profit from a better theoretical analysis of the IRL problem, say: C. A. Rothkopf, C. Dimitrakakis.",actionable,suggestion
r85,s8,t31,Preference elicitation and inverse reinforcement learning.,non_actionable,fact
r85,s9,t31,"ECML 2011 Overall the manuscript leverages on deep learning’s power of function approximation and the simulation results are nice, but in terms of the soundness of the underlying RL and IRL theory there is some work to do.",actionable,shortcoming
r85,s0,t20,"But, apparently, this is not the problem the authors actually solve, according to eq.",non_actionable,shortcoming
r85,s1,t20,I am not sure how this greedy action should result in maximizing the total discounted reward along a trajectory.,non_actionable,shortcoming
r85,s2,t20,Equation 3 seems to be a cost function penalizing differences between predicted and observed states.,non_actionable,fact
r85,s3,t20,"Similarly, equation 4 penalizes differences between predicted and observed state transitions.",non_actionable,fact
r85,s4,t20,"Essentially, the current manuscript does not learn the reward function of an MDP in the RL setting, but it learns some sort of a shaping reward function to do policy imitation, i.e. copy the behavior of the demonstrator as closely as possible.",non_actionable,fact
r85,s5,t20,"So, in my view, the manuscript does a nice job at policy fitting, but this is not reward estimation.",actionable,shortcoming
r85,s6,t20,The manuscript has to be rewritten that way.,actionable,suggestion
r85,s7,t20,"One could also argue that the manuscript would profit from a better theoretical analysis of the IRL problem, say: C. A. Rothkopf, C. Dimitrakakis.",actionable,suggestion
r85,s8,t20,Preference elicitation and inverse reinforcement learning.,non_actionable,fact
r85,s9,t20,"ECML 2011 Overall the manuscript leverages on deep learning’s power of function approximation and the simulation results are nice, but in terms of the soundness of the underlying RL and IRL theory there is some work to do.",actionable,shortcoming
r85,s0,t10,"But, apparently, this is not the problem the authors actually solve, according to eq.",actionable,shortcoming
r85,s1,t10,I am not sure how this greedy action should result in maximizing the total discounted reward along a trajectory.,actionable,disagreement
r85,s2,t10,Equation 3 seems to be a cost function penalizing differences between predicted and observed states.,non_actionable,other
r85,s3,t10,"Similarly, equation 4 penalizes differences between predicted and observed state transitions.",non_actionable,other
r85,s4,t10,"Essentially, the current manuscript does not learn the reward function of an MDP in the RL setting, but it learns some sort of a shaping reward function to do policy imitation, i.e. copy the behavior of the demonstrator as closely as possible.",non_actionable,other
r85,s5,t10,"So, in my view, the manuscript does a nice job at policy fitting, but this is not reward estimation.",actionable,shortcoming
r85,s6,t10,The manuscript has to be rewritten that way.,actionable,suggestion
r85,s7,t10,"One could also argue that the manuscript would profit from a better theoretical analysis of the IRL problem, say: C. A. Rothkopf, C. Dimitrakakis.",actionable,disagreement
r85,s8,t10,Preference elicitation and inverse reinforcement learning.,non_actionable,other
r85,s9,t10,"ECML 2011 Overall the manuscript leverages on deep learning’s power of function approximation and the simulation results are nice, but in terms of the soundness of the underlying RL and IRL theory there is some work to do.",actionable,suggestion
r25,s0,t8,Claiming much of common intuition around tricks for avoiding gradient issues are incorrect.,actionable,disagreement
r25,s1,t8,The paper makes some bold claims.,non_actionable,fact
r25,s2,t8,"It's possible some of the issues arise from the particular architectures they choose to investigate and demonstrate on (eg I have mostly seen ResNets in the context of CNNs but they analyze on FC topologies, the form of the loss, etc) but that's a guess and there are some further analysis in the supp material for these networks which I haven't looked at in detail.",actionable,fact
r25,s3,t8,"Regardless - an important note to the authors is that it's a particularly long and verbose paper, coming in at 16 pages of the main paper(!) with nearly 50 (!) pages of supplementary material where the heart and meat of the proofs and experiments reside.",actionable,shortcoming
r25,s4,t8,As such it's not even clear if this is proper for a conference.,actionable,shortcoming
r25,s5,t8,The authors have already provided several pages worth of additional comments on the website on further related work.,non_actionable,fact
r25,s6,t8,I view this as an issue in and of itself.,actionable,fact
r25,s7,t8,I've seen many papers that need to go through much more complicated derivations and theory and remain within a 8-10 page limit by being precise and strictly to the point.,actionable,suggestion
r25,s8,t8,"Perhaps Godel could be a good inspiration here, with a 21 page PhD thesis that fundamentally changed mathematics.",actionable,suggestion
r25,s9,t8,"So, while I cannot vouch for the correctness, I think it can and should go through a serious revision to make it succinct and that will likely considerably help in making it accessible to a wider readership and aligned to the expectations from a conference paper in the field.",actionable,suggestion
r25,s0,t20,Claiming much of common intuition around tricks for avoiding gradient issues are incorrect.,actionable,shortcoming
r25,s1,t20,The paper makes some bold claims.,non_actionable,fact
r25,s2,t20,"It's possible some of the issues arise from the particular architectures they choose to investigate and demonstrate on (eg I have mostly seen ResNets in the context of CNNs but they analyze on FC topologies, the form of the loss, etc) but that's a guess and there are some further analysis in the supp material for these networks which I haven't looked at in detail.",non_actionable,fact
r25,s3,t20,"Regardless - an important note to the authors is that it's a particularly long and verbose paper, coming in at 16 pages of the main paper(!) with nearly 50 (!) pages of supplementary material where the heart and meat of the proofs and experiments reside.",non_actionable,fact
r25,s4,t20,As such it's not even clear if this is proper for a conference.,non_actionable,shortcoming
r25,s5,t20,The authors have already provided several pages worth of additional comments on the website on further related work.,non_actionable,fact
r25,s6,t20,I view this as an issue in and of itself.,non_actionable,shortcoming
r25,s7,t20,I've seen many papers that need to go through much more complicated derivations and theory and remain within a 8-10 page limit by being precise and strictly to the point.,non_actionable,fact
r25,s8,t20,"Perhaps Godel could be a good inspiration here, with a 21 page PhD thesis that fundamentally changed mathematics.",non_actionable,fact
r25,s9,t20,"So, while I cannot vouch for the correctness, I think it can and should go through a serious revision to make it succinct and that will likely considerably help in making it accessible to a wider readership and aligned to the expectations from a conference paper in the field.",actionable,suggestion
r25,s0,t10,Claiming much of common intuition around tricks for avoiding gradient issues are incorrect.,actionable,shortcoming
r25,s1,t10,The paper makes some bold claims.,actionable,other
r25,s2,t10,"It's possible some of the issues arise from the particular architectures they choose to investigate and demonstrate on (eg I have mostly seen ResNets in the context of CNNs but they analyze on FC topologies, the form of the loss, etc) but that's a guess and there are some further analysis in the supp material for these networks which I haven't looked at in detail.",actionable,fact
r25,s3,t10,"Regardless - an important note to the authors is that it's a particularly long and verbose paper, coming in at 16 pages of the main paper(!) with nearly 50 (!) pages of supplementary material where the heart and meat of the proofs and experiments reside.",actionable,shortcoming
r25,s4,t10,As such it's not even clear if this is proper for a conference.,actionable,shortcoming
r25,s5,t10,The authors have already provided several pages worth of additional comments on the website on further related work.,actionable,shortcoming
r25,s6,t10,I view this as an issue in and of itself.,actionable,shortcoming
r25,s7,t10,I've seen many papers that need to go through much more complicated derivations and theory and remain within a 8-10 page limit by being precise and strictly to the point.,actionable,fact
r25,s8,t10,"Perhaps Godel could be a good inspiration here, with a 21 page PhD thesis that fundamentally changed mathematics.",actionable,suggestion
r25,s9,t10,"So, while I cannot vouch for the correctness, I think it can and should go through a serious revision to make it succinct and that will likely considerably help in making it accessible to a wider readership and aligned to the expectations from a conference paper in the field.",actionable,suggestion
r25,s0,t31,Claiming much of common intuition around tricks for avoiding gradient issues are incorrect.,actionable,shortcoming
r25,s1,t31,The paper makes some bold claims.,non_actionable,fact
r25,s2,t31,"It's possible some of the issues arise from the particular architectures they choose to investigate and demonstrate on (eg I have mostly seen ResNets in the context of CNNs but they analyze on FC topologies, the form of the loss, etc) but that's a guess and there are some further analysis in the supp material for these networks which I haven't looked at in detail.",non_actionable,fact
r25,s3,t31,"Regardless - an important note to the authors is that it's a particularly long and verbose paper, coming in at 16 pages of the main paper(!) with nearly 50 (!) pages of supplementary material where the heart and meat of the proofs and experiments reside.",actionable,shortcoming
r25,s4,t31,As such it's not even clear if this is proper for a conference.,actionable,shortcoming
r25,s5,t31,The authors have already provided several pages worth of additional comments on the website on further related work.,non_actionable,fact
r25,s6,t31,I view this as an issue in and of itself.,actionable,shortcoming
r25,s7,t31,I've seen many papers that need to go through much more complicated derivations and theory and remain within a 8-10 page limit by being precise and strictly to the point.,actionable,shortcoming
r25,s8,t31,"Perhaps Godel could be a good inspiration here, with a 21 page PhD thesis that fundamentally changed mathematics.",actionable,suggestion
r25,s9,t31,"So, while I cannot vouch for the correctness, I think it can and should go through a serious revision to make it succinct and that will likely considerably help in making it accessible to a wider readership and aligned to the expectations from a conference paper in the field.",actionable,shortcoming
r25,s0,t2,Claiming much of common intuition around tricks for avoiding gradient issues are incorrect.,non_actionable,disagreement
r25,s1,t2,The paper makes some bold claims.,non_actionable,fact
r25,s2,t2,"It's possible some of the issues arise from the particular architectures they choose to investigate and demonstrate on (eg I have mostly seen ResNets in the context of CNNs but they analyze on FC topologies, the form of the loss, etc) but that's a guess and there are some further analysis in the supp material for these networks which I haven't looked at in detail.",non_actionable,fact
r25,s3,t2,"Regardless - an important note to the authors is that it's a particularly long and verbose paper, coming in at 16 pages of the main paper(!) with nearly 50 (!) pages of supplementary material where the heart and meat of the proofs and experiments reside.",actionable,shortcoming
r25,s4,t2,As such it's not even clear if this is proper for a conference.,non_actionable,shortcoming
r25,s5,t2,The authors have already provided several pages worth of additional comments on the website on further related work.,non_actionable,fact
r25,s6,t2,I view this as an issue in and of itself.,non_actionable,shortcoming
r25,s7,t2,I've seen many papers that need to go through much more complicated derivations and theory and remain within a 8-10 page limit by being precise and strictly to the point.,actionable,shortcoming
r25,s8,t2,"Perhaps Godel could be a good inspiration here, with a 21 page PhD thesis that fundamentally changed mathematics.",actionable,suggestion
r25,s9,t2,"So, while I cannot vouch for the correctness, I think it can and should go through a serious revision to make it succinct and that will likely considerably help in making it accessible to a wider readership and aligned to the expectations from a conference paper in the field.",actionable,suggestion
r122,s0,t20,"Creative and interesting The paper introduces an application of Graph Neural Networks (Li's Gated Graph Neural Nets, GGNNs, specifically) for reasoning about programs and programming.",non_actionable,agreement
r122,s1,t20,"The core idea is to represent a program as a graph that a GGNN can take as input, and train the GGNN to make token-level predictions that depend on the semantic context.",non_actionable,fact
r122,s2,t20,"identifying bugs in programs where the wrong variable is used, and",non_actionable,fact
r122,s3,t20,2) predicting a variable's name by consider its semantic context.,non_actionable,fact
r122,s4,t20,"The paper is generally well written, easy to read and understand, and the results are compelling.",non_actionable,agreement
r122,s5,t20,The proposed GGNN approach outperforms (bi-)LSTMs on both tasks.,non_actionable,agreement
r122,s6,t20,"Because the tasks are not widely explored in the literature, it could be difficult to know how crucial exploiting graphically structured information is, so the authors performed several ablation studies to analyze this out.",non_actionable,fact
r122,s7,t20,"Those results show that as structural information is removed, the GGNN's performance diminishes, as expected.",non_actionable,fact
r122,s8,t20,"As a demonstration of the usefulness of their approach, the authors ran their model on an unnamed open-source project and claimed to find several bugs, at least one of which potentially reduced memory performance.",non_actionable,fact
r122,s9,t20,"Overall the work is important, original, well-executed, and should open new directions for deep learning in program analysis.",non_actionable,agreement
r122,s0,t10,"Creative and interesting The paper introduces an application of Graph Neural Networks (Li's Gated Graph Neural Nets, GGNNs, specifically) for reasoning about programs and programming.",non_actionable,other
r122,s1,t10,"The core idea is to represent a program as a graph that a GGNN can take as input, and train the GGNN to make token-level predictions that depend on the semantic context.",non_actionable,other
r122,s2,t10,"identifying bugs in programs where the wrong variable is used, and",non_actionable,other
r122,s3,t10,2) predicting a variable's name by consider its semantic context.,non_actionable,other
r122,s4,t10,"The paper is generally well written, easy to read and understand, and the results are compelling.",actionable,agreement
r122,s5,t10,The proposed GGNN approach outperforms (bi-)LSTMs on both tasks.,actionable,agreement
r122,s6,t10,"Because the tasks are not widely explored in the literature, it could be difficult to know how crucial exploiting graphically structured information is, so the authors performed several ablation studies to analyze this out.",non_actionable,other
r122,s7,t10,"Those results show that as structural information is removed, the GGNN's performance diminishes, as expected.",non_actionable,other
r122,s8,t10,"As a demonstration of the usefulness of their approach, the authors ran their model on an unnamed open-source project and claimed to find several bugs, at least one of which potentially reduced memory performance.",non_actionable,other
r122,s9,t10,"Overall the work is important, original, well-executed, and should open new directions for deep learning in program analysis.",actionable,agreement
r122,s0,t25,"Creative and interesting The paper introduces an application of Graph Neural Networks (Li's Gated Graph Neural Nets, GGNNs, specifically) for reasoning about programs and programming.",non_actionable,agreement
r122,s1,t25,"The core idea is to represent a program as a graph that a GGNN can take as input, and train the GGNN to make token-level predictions that depend on the semantic context.",non_actionable,fact
r122,s2,t25,"identifying bugs in programs where the wrong variable is used, and",actionable,shortcoming
r122,s3,t25,2) predicting a variable's name by consider its semantic context.,non_actionable,fact
r122,s4,t25,"The paper is generally well written, easy to read and understand, and the results are compelling.",non_actionable,agreement
r122,s5,t25,The proposed GGNN approach outperforms (bi-)LSTMs on both tasks.,actionable,shortcoming
r122,s6,t25,"Because the tasks are not widely explored in the literature, it could be difficult to know how crucial exploiting graphically structured information is, so the authors performed several ablation studies to analyze this out.",non_actionable,agreement
r122,s7,t25,"Those results show that as structural information is removed, the GGNN's performance diminishes, as expected.",non_actionable,agreement
r122,s8,t25,"As a demonstration of the usefulness of their approach, the authors ran their model on an unnamed open-source project and claimed to find several bugs, at least one of which potentially reduced memory performance.",non_actionable,agreement
r122,s9,t25,"Overall the work is important, original, well-executed, and should open new directions for deep learning in program analysis.",non_actionable,agreement
r122,s0,t31,"Creative and interesting The paper introduces an application of Graph Neural Networks (Li's Gated Graph Neural Nets, GGNNs, specifically) for reasoning about programs and programming.",non_actionable,agreement
r122,s1,t31,"The core idea is to represent a program as a graph that a GGNN can take as input, and train the GGNN to make token-level predictions that depend on the semantic context.",non_actionable,fact
r122,s2,t31,"identifying bugs in programs where the wrong variable is used, and",non_actionable,fact
r122,s3,t31,2) predicting a variable's name by consider its semantic context.,non_actionable,fact
r122,s4,t31,"The paper is generally well written, easy to read and understand, and the results are compelling.",non_actionable,agreement
r122,s5,t31,The proposed GGNN approach outperforms (bi-)LSTMs on both tasks.,non_actionable,fact
r122,s6,t31,"Because the tasks are not widely explored in the literature, it could be difficult to know how crucial exploiting graphically structured information is, so the authors performed several ablation studies to analyze this out.",non_actionable,fact
r122,s7,t31,"Those results show that as structural information is removed, the GGNN's performance diminishes, as expected.",non_actionable,fact
r122,s8,t31,"As a demonstration of the usefulness of their approach, the authors ran their model on an unnamed open-source project and claimed to find several bugs, at least one of which potentially reduced memory performance.",non_actionable,fact
r122,s9,t31,"Overall the work is important, original, well-executed, and should open new directions for deep learning in program analysis.",non_actionable,agreement
r122,s0,t12,"Creative and interesting The paper introduces an application of Graph Neural Networks (Li's Gated Graph Neural Nets, GGNNs, specifically) for reasoning about programs and programming.",non_actionable,fact
r122,s1,t12,"The core idea is to represent a program as a graph that a GGNN can take as input, and train the GGNN to make token-level predictions that depend on the semantic context.",non_actionable,fact
r122,s2,t12,"identifying bugs in programs where the wrong variable is used, and",non_actionable,other
r122,s3,t12,2) predicting a variable's name by consider its semantic context.,non_actionable,fact
r122,s4,t12,"The paper is generally well written, easy to read and understand, and the results are compelling.",non_actionable,agreement
r122,s5,t12,The proposed GGNN approach outperforms (bi-)LSTMs on both tasks.,non_actionable,fact
r122,s6,t12,"Because the tasks are not widely explored in the literature, it could be difficult to know how crucial exploiting graphically structured information is, so the authors performed several ablation studies to analyze this out.",non_actionable,agreement
r122,s7,t12,"Those results show that as structural information is removed, the GGNN's performance diminishes, as expected.",non_actionable,fact
r122,s8,t12,"As a demonstration of the usefulness of their approach, the authors ran their model on an unnamed open-source project and claimed to find several bugs, at least one of which potentially reduced memory performance.",non_actionable,fact
r122,s9,t12,"Overall the work is important, original, well-executed, and should open new directions for deep learning in program analysis.",non_actionable,agreement
r43,s0,t31,deep learning with the boosting trick This paper applies the boosting trick to deep learning.,non_actionable,fact
r43,s1,t31,The proposed algorithm is validated on several image classification datasets.,non_actionable,fact
r43,s2,t31,The paper is its current form has the following issues:,actionable,shortcoming
r43,s3,t31,1. There is hardly any baseline compared in the paper.,actionable,shortcoming
r43,s4,t31,"The proposed algorithm is essentially an ensemble algorithm, there exist several works on deep model ensemble (e.g., Boosted convolutional neural networks, and Snapshot Ensemble) should be compared against.",actionable,shortcoming
r43,s5,t31,"2. I did not carefully check all the proofs, but seems most of the proof can be moved to supplementary to keep the paper more concise.",actionable,suggestion
r43,s6,t31,Why the error rate reported here is higher than that in the original paper?,non_actionable,question
r43,s7,t31,"Typo: In Session 3 Line 7, there is a missing reference.",actionable,shortcoming
r43,s8,t31,"In Session 3 Line 10, “1,00 object classes” should be “100 object classes”.",actionable,shortcoming
r43,s9,t31,"In Line 3 of the paragraph below Equation 5, “classe” should be “class”.",actionable,shortcoming
r43,s0,t10,deep learning with the boosting trick This paper applies the boosting trick to deep learning.,non_actionable,other
r43,s1,t10,The proposed algorithm is validated on several image classification datasets.,non_actionable,other
r43,s2,t10,The paper is its current form has the following issues:,actionable,shortcoming
r43,s3,t10,1. There is hardly any baseline compared in the paper.,actionable,shortcoming
r43,s4,t10,"The proposed algorithm is essentially an ensemble algorithm, there exist several works on deep model ensemble (e.g., Boosted convolutional neural networks, and Snapshot Ensemble) should be compared against.",actionable,shortcoming
r43,s5,t10,"2. I did not carefully check all the proofs, but seems most of the proof can be moved to supplementary to keep the paper more concise.",actionable,fact
r43,s6,t10,Why the error rate reported here is higher than that in the original paper?,actionable,question
r43,s7,t10,"Typo: In Session 3 Line 7, there is a missing reference.",actionable,shortcoming
r43,s8,t10,"In Session 3 Line 10, “1,00 object classes” should be “100 object classes”.",actionable,shortcoming
r43,s9,t10,"In Line 3 of the paragraph below Equation 5, “classe” should be “class”.",actionable,shortcoming
r43,s0,t20,deep learning with the boosting trick This paper applies the boosting trick to deep learning.,non_actionable,fact
r43,s1,t20,The proposed algorithm is validated on several image classification datasets.,non_actionable,fact
r43,s2,t20,The paper is its current form has the following issues:,non_actionable,shortcoming
r43,s3,t20,1. There is hardly any baseline compared in the paper.,non_actionable,shortcoming
r43,s4,t20,"The proposed algorithm is essentially an ensemble algorithm, there exist several works on deep model ensemble (e.g., Boosted convolutional neural networks, and Snapshot Ensemble) should be compared against.",actionable,suggestion
r43,s5,t20,"2. I did not carefully check all the proofs, but seems most of the proof can be moved to supplementary to keep the paper more concise.",actionable,suggestion
r43,s6,t20,Why the error rate reported here is higher than that in the original paper?,actionable,question
r43,s7,t20,"Typo: In Session 3 Line 7, there is a missing reference.",actionable,shortcoming
r43,s8,t20,"In Session 3 Line 10, “1,00 object classes” should be “100 object classes”.",actionable,shortcoming
r43,s9,t20,"In Line 3 of the paragraph below Equation 5, “classe” should be “class”.",actionable,shortcoming
r43,s0,t8,deep learning with the boosting trick This paper applies the boosting trick to deep learning.,non_actionable,fact
r43,s1,t8,The proposed algorithm is validated on several image classification datasets.,non_actionable,fact
r43,s2,t8,The paper is its current form has the following issues:,actionable,shortcoming
r43,s3,t8,1. There is hardly any baseline compared in the paper.,actionable,shortcoming
r43,s4,t8,"The proposed algorithm is essentially an ensemble algorithm, there exist several works on deep model ensemble (e.g., Boosted convolutional neural networks, and Snapshot Ensemble) should be compared against.",actionable,suggestion
r43,s5,t8,"2. I did not carefully check all the proofs, but seems most of the proof can be moved to supplementary to keep the paper more concise.",actionable,suggestion
r43,s6,t8,Why the error rate reported here is higher than that in the original paper?,non_actionable,question
r43,s7,t8,"Typo: In Session 3 Line 7, there is a missing reference.",actionable,shortcoming
r43,s8,t8,"In Session 3 Line 10, “1,00 object classes” should be “100 object classes”.",actionable,suggestion
r43,s9,t8,"In Line 3 of the paragraph below Equation 5, “classe” should be “class”.",actionable,suggestion
r43,s0,t16,deep learning with the boosting trick This paper applies the boosting trick to deep learning.,non_actionable,fact
r43,s1,t16,The proposed algorithm is validated on several image classification datasets.,non_actionable,fact
r43,s2,t16,The paper is its current form has the following issues:,actionable,shortcoming
r43,s3,t16,1. There is hardly any baseline compared in the paper.,actionable,shortcoming
r43,s4,t16,"The proposed algorithm is essentially an ensemble algorithm, there exist several works on deep model ensemble (e.g., Boosted convolutional neural networks, and Snapshot Ensemble) should be compared against.",non_actionable,fact
r43,s5,t16,"2. I did not carefully check all the proofs, but seems most of the proof can be moved to supplementary to keep the paper more concise.",actionable,suggestion
r43,s6,t16,Why the error rate reported here is higher than that in the original paper?,actionable,question
r43,s7,t16,"Typo: In Session 3 Line 7, there is a missing reference.",actionable,shortcoming
r43,s8,t16,"In Session 3 Line 10, “1,00 object classes” should be “100 object classes”.",actionable,shortcoming
r43,s9,t16,"In Line 3 of the paragraph below Equation 5, “classe” should be “class”.",actionable,shortcoming
r49,s0,t2,Deep Temporal Clustering This paper proposes an algorithm for jointly performing dimensionality reduction and temporal clustering in a deep learning context.,non_actionable,fact
r49,s1,t2,"An autoencoder is utilized for dimensionality reduction alongside a clustering objective - that is the autoencoder optimizes the mse (using LSTM layers are utilized in the autoencoder for modelling temporal information), while the latent space is fed into the temporal clustering layer.",non_actionable,fact
r49,s2,t2,The clustering/autoencoder objectives are optimized in an alternating optimization fashion.,non_actionable,fact
r49,s3,t2,"The main con lies in this work being very closely related to t-sne, i.e. compare the the temporal clustering loss based on kl-div (eq 6) to t-sne.",non_actionable,shortcoming
r49,s4,t2,"If we consider e.g., a linear 1-layer autoencoder to be equivalent to PCA (without the rnn layers), in essence this formulation is closely related to applying pca to reduce the initial dimensionality and then t-sne.",non_actionable,fact
r49,s5,t2,"Also, do the cluster centroids appear to be roughly stable over many runs of the algorithm?",actionable,question
r49,s6,t2,"As the averaged results over 5 runs are shown, the standard deviation would be helpful towards showing this empirically.",actionable,suggestion
r49,s7,t2,"On the positive side, it is likely that richer representations can be obtained via this architecture, and results appear to be good with comparison to other metrics The section of the paper that discusses heat-maps should be written more clearly.",actionable,suggestion
r49,s8,t2,Figure 3 is commented with respect to detecting an event - non-event but the process itself is not clearly described as far as I can see.,actionable,shortcoming
r49,s9,t2,minor note: the dynamic time warping is formally not a metric,actionable,disagreement
r49,s0,t10,Deep Temporal Clustering This paper proposes an algorithm for jointly performing dimensionality reduction and temporal clustering in a deep learning context.,non_actionable,other
r49,s1,t10,"An autoencoder is utilized for dimensionality reduction alongside a clustering objective - that is the autoencoder optimizes the mse (using LSTM layers are utilized in the autoencoder for modelling temporal information), while the latent space is fed into the temporal clustering layer.",non_actionable,other
r49,s2,t10,The clustering/autoencoder objectives are optimized in an alternating optimization fashion.,non_actionable,other
r49,s3,t10,"The main con lies in this work being very closely related to t-sne, i.e. compare the the temporal clustering loss based on kl-div (eq 6) to t-sne.",actionable,shortcoming
r49,s4,t10,"If we consider e.g., a linear 1-layer autoencoder to be equivalent to PCA (without the rnn layers), in essence this formulation is closely related to applying pca to reduce the initial dimensionality and then t-sne.",actionable,shortcoming
r49,s5,t10,"Also, do the cluster centroids appear to be roughly stable over many runs of the algorithm?",actionable,question
r49,s6,t10,"As the averaged results over 5 runs are shown, the standard deviation would be helpful towards showing this empirically.",actionable,suggestion
r49,s7,t10,"On the positive side, it is likely that richer representations can be obtained via this architecture, and results appear to be good with comparison to other metrics The section of the paper that discusses heat-maps should be written more clearly.",actionable,suggestion
r49,s8,t10,Figure 3 is commented with respect to detecting an event - non-event but the process itself is not clearly described as far as I can see.,actionable,fact
r49,s9,t10,minor note: the dynamic time warping is formally not a metric,actionable,shortcoming
r49,s0,t16,Deep Temporal Clustering This paper proposes an algorithm for jointly performing dimensionality reduction and temporal clustering in a deep learning context.,non_actionable,fact
r49,s1,t16,"An autoencoder is utilized for dimensionality reduction alongside a clustering objective - that is the autoencoder optimizes the mse (using LSTM layers are utilized in the autoencoder for modelling temporal information), while the latent space is fed into the temporal clustering layer.",non_actionable,fact
r49,s2,t16,The clustering/autoencoder objectives are optimized in an alternating optimization fashion.,non_actionable,fact
r49,s3,t16,"The main con lies in this work being very closely related to t-sne, i.e. compare the the temporal clustering loss based on kl-div (eq 6) to t-sne.",actionable,fact
r49,s4,t16,"If we consider e.g., a linear 1-layer autoencoder to be equivalent to PCA (without the rnn layers), in essence this formulation is closely related to applying pca to reduce the initial dimensionality and then t-sne.",non_actionable,fact
r49,s5,t16,"Also, do the cluster centroids appear to be roughly stable over many runs of the algorithm?",non_actionable,question
r49,s6,t16,"As the averaged results over 5 runs are shown, the standard deviation would be helpful towards showing this empirically.",actionable,suggestion
r49,s7,t16,"On the positive side, it is likely that richer representations can be obtained via this architecture, and results appear to be good with comparison to other metrics The section of the paper that discusses heat-maps should be written more clearly.",non_actionable,agreement
r49,s8,t16,Figure 3 is commented with respect to detecting an event - non-event but the process itself is not clearly described as far as I can see.,actionable,shortcoming
r49,s9,t16,minor note: the dynamic time warping is formally not a metric,actionable,fact
r49,s0,t31,Deep Temporal Clustering This paper proposes an algorithm for jointly performing dimensionality reduction and temporal clustering in a deep learning context.,non_actionable,fact
r49,s1,t31,"An autoencoder is utilized for dimensionality reduction alongside a clustering objective - that is the autoencoder optimizes the mse (using LSTM layers are utilized in the autoencoder for modelling temporal information), while the latent space is fed into the temporal clustering layer.",non_actionable,fact
r49,s2,t31,The clustering/autoencoder objectives are optimized in an alternating optimization fashion.,non_actionable,agreement
r49,s3,t31,"The main con lies in this work being very closely related to t-sne, i.e. compare the the temporal clustering loss based on kl-div (eq 6) to t-sne.",actionable,shortcoming
r49,s4,t31,"If we consider e.g., a linear 1-layer autoencoder to be equivalent to PCA (without the rnn layers), in essence this formulation is closely related to applying pca to reduce the initial dimensionality and then t-sne.",non_actionable,fact
r49,s5,t31,"Also, do the cluster centroids appear to be roughly stable over many runs of the algorithm?",non_actionable,question
r49,s6,t31,"As the averaged results over 5 runs are shown, the standard deviation would be helpful towards showing this empirically.",actionable,suggestion
r49,s7,t31,"On the positive side, it is likely that richer representations can be obtained via this architecture, and results appear to be good with comparison to other metrics The section of the paper that discusses heat-maps should be written more clearly.",actionable,shortcoming
r49,s8,t31,Figure 3 is commented with respect to detecting an event - non-event but the process itself is not clearly described as far as I can see.,actionable,shortcoming
r49,s9,t31,minor note: the dynamic time warping is formally not a metric,actionable,shortcoming
r49,s0,t20,Deep Temporal Clustering This paper proposes an algorithm for jointly performing dimensionality reduction and temporal clustering in a deep learning context.,non_actionable,fact
r49,s1,t20,"An autoencoder is utilized for dimensionality reduction alongside a clustering objective - that is the autoencoder optimizes the mse (using LSTM layers are utilized in the autoencoder for modelling temporal information), while the latent space is fed into the temporal clustering layer.",non_actionable,fact
r49,s2,t20,The clustering/autoencoder objectives are optimized in an alternating optimization fashion.,non_actionable,fact
r49,s3,t20,"The main con lies in this work being very closely related to t-sne, i.e. compare the the temporal clustering loss based on kl-div (eq 6) to t-sne.",non_actionable,fact
r49,s4,t20,"If we consider e.g., a linear 1-layer autoencoder to be equivalent to PCA (without the rnn layers), in essence this formulation is closely related to applying pca to reduce the initial dimensionality and then t-sne.",non_actionable,fact
r49,s5,t20,"Also, do the cluster centroids appear to be roughly stable over many runs of the algorithm?",actionable,question
r49,s6,t20,"As the averaged results over 5 runs are shown, the standard deviation would be helpful towards showing this empirically.",actionable,suggestion
r49,s7,t20,"On the positive side, it is likely that richer representations can be obtained via this architecture, and results appear to be good with comparison to other metrics The section of the paper that discusses heat-maps should be written more clearly.",actionable,suggestion
r49,s8,t20,Figure 3 is commented with respect to detecting an event - non-event but the process itself is not clearly described as far as I can see.,actionable,shortcoming
r49,s9,t20,minor note: the dynamic time warping is formally not a metric,actionable,disagreement
r47,s0,t31,Each subtask execution is represented by a (non-learned) option.,non_actionable,fact
r47,s1,t31,"Alternatively, if the subtask graphs were learned instead of given, that would open the door to scaling an general learning.",non_actionable,fact
r47,s2,t31,"Yet, this is not discussed in the paper.",actionable,shortcoming
r47,s3,t31,"The proposed algorithm relies on fairly involved reward shaping, in that it is a very strong signal of supervision on what the next action should be.",non_actionable,fact
r47,s4,t31,"Additionaly, it's not clear why learning seems to completely ""fail"" without the pre-trained policy.",actionable,shortcoming
r47,s5,t31,"The justification given is that it is ""to address the difficulty of training due to the complex nature of the problem"" but this is not really satisfying as the problems are not that hard.",actionable,shortcoming
r47,s6,t31,It it thus hard to properly evaluate your method against other proposed methods.,actionable,shortcoming
r47,s7,t31,- It seems weird that the smoothed logical AND/OR functions do not depend on the number of inputs; that is unless there are always 3 inputs (but it is not explained why; logical functions are usually formalised as functions of 2 inputs) as suggested by Fig 3.,actionable,shortcoming
r47,s8,t31,Is the time budget different for each new generated environment?,non_actionable,question
r47,s9,t31,- why wait until exactly 120 epochs for NTS-RProp before fine-tuning with actor-critic?,non_actionable,question
r47,s0,t10,Each subtask execution is represented by a (non-learned) option.,non_actionable,other
r47,s1,t10,"Alternatively, if the subtask graphs were learned instead of given, that would open the door to scaling an general learning.",actionable,suggestion
r47,s2,t10,"Yet, this is not discussed in the paper.",actionable,shortcoming
r47,s3,t10,"The proposed algorithm relies on fairly involved reward shaping, in that it is a very strong signal of supervision on what the next action should be.",actionable,fact
r47,s4,t10,"Additionaly, it's not clear why learning seems to completely ""fail"" without the pre-trained policy.",actionable,shortcoming
r47,s5,t10,"The justification given is that it is ""to address the difficulty of training due to the complex nature of the problem"" but this is not really satisfying as the problems are not that hard.",actionable,disagreement
r47,s6,t10,It it thus hard to properly evaluate your method against other proposed methods.,actionable,shortcoming
r47,s7,t10,- It seems weird that the smoothed logical AND/OR functions do not depend on the number of inputs; that is unless there are always 3 inputs (but it is not explained why; logical functions are usually formalised as functions of 2 inputs) as suggested by Fig 3.,actionable,fact
r47,s8,t10,Is the time budget different for each new generated environment?,actionable,question
r47,s9,t10,- why wait until exactly 120 epochs for NTS-RProp before fine-tuning with actor-critic?,actionable,question
r47,s0,t20,Each subtask execution is represented by a (non-learned) option.,non_actionable,fact
r47,s1,t20,"Alternatively, if the subtask graphs were learned instead of given, that would open the door to scaling an general learning.",non_actionable,fact
r47,s2,t20,"Yet, this is not discussed in the paper.",actionable,shortcoming
r47,s3,t20,"The proposed algorithm relies on fairly involved reward shaping, in that it is a very strong signal of supervision on what the next action should be.",non_actionable,fact
r47,s4,t20,"Additionaly, it's not clear why learning seems to completely ""fail"" without the pre-trained policy.",actionable,shortcoming
r47,s5,t20,"The justification given is that it is ""to address the difficulty of training due to the complex nature of the problem"" but this is not really satisfying as the problems are not that hard.",actionable,shortcoming
r47,s6,t20,It it thus hard to properly evaluate your method against other proposed methods.,non_actionable,fact
r47,s7,t20,- It seems weird that the smoothed logical AND/OR functions do not depend on the number of inputs; that is unless there are always 3 inputs (but it is not explained why; logical functions are usually formalised as functions of 2 inputs) as suggested by Fig 3.,actionable,shortcoming
r47,s8,t20,Is the time budget different for each new generated environment?,actionable,question
r47,s9,t20,- why wait until exactly 120 epochs for NTS-RProp before fine-tuning with actor-critic?,actionable,question
r47,s0,t8,Each subtask execution is represented by a (non-learned) option.,non_actionable,fact
r47,s1,t8,"Alternatively, if the subtask graphs were learned instead of given, that would open the door to scaling an general learning.",actionable,suggestion
r47,s2,t8,"Yet, this is not discussed in the paper.",actionable,shortcoming
r47,s3,t8,"The proposed algorithm relies on fairly involved reward shaping, in that it is a very strong signal of supervision on what the next action should be.",non_actionable,fact
r47,s4,t8,"Additionaly, it's not clear why learning seems to completely ""fail"" without the pre-trained policy.",actionable,shortcoming
r47,s5,t8,"The justification given is that it is ""to address the difficulty of training due to the complex nature of the problem"" but this is not really satisfying as the problems are not that hard.",actionable,shortcoming
r47,s6,t8,It it thus hard to properly evaluate your method against other proposed methods.,actionable,shortcoming
r47,s7,t8,- It seems weird that the smoothed logical AND/OR functions do not depend on the number of inputs; that is unless there are always 3 inputs (but it is not explained why; logical functions are usually formalised as functions of 2 inputs) as suggested by Fig 3.,actionable,shortcoming
r47,s8,t8,Is the time budget different for each new generated environment?,non_actionable,question
r47,s9,t8,- why wait until exactly 120 epochs for NTS-RProp before fine-tuning with actor-critic?,non_actionable,question
r47,s0,t2,Each subtask execution is represented by a (non-learned) option.,non_actionable,fact
r47,s1,t2,"Alternatively, if the subtask graphs were learned instead of given, that would open the door to scaling an general learning.",non_actionable,fact
r47,s2,t2,"Yet, this is not discussed in the paper.",actionable,shortcoming
r47,s3,t2,"The proposed algorithm relies on fairly involved reward shaping, in that it is a very strong signal of supervision on what the next action should be.",non_actionable,fact
r47,s4,t2,"Additionaly, it's not clear why learning seems to completely ""fail"" without the pre-trained policy.",non_actionable,shortcoming
r47,s5,t2,"The justification given is that it is ""to address the difficulty of training due to the complex nature of the problem"" but this is not really satisfying as the problems are not that hard.",non_actionable,disagreement
r47,s6,t2,It it thus hard to properly evaluate your method against other proposed methods.,actionable,shortcoming
r47,s7,t2,- It seems weird that the smoothed logical AND/OR functions do not depend on the number of inputs; that is unless there are always 3 inputs (but it is not explained why; logical functions are usually formalised as functions of 2 inputs) as suggested by Fig 3.,actionable,shortcoming
r47,s8,t2,Is the time budget different for each new generated environment?,actionable,question