CURRENNT is written in C++/CUDA and runs on Windows and Linux. It requires a
CUDA-capable graphics card with a compute capability of 1.3 or higher (i.e. at
least a GeForce 210, GT 220/40, FX380 LP, 1800M, 370/380M or NVS 2/3100M).
This is free software licensed under the GPL; see LICENSE for details.
If you find CURRENNT useful for your research, we would appreciate it if you
cited the following paper:
Felix Weninger, Johannes Bergmann, and Bjoern Schuller,
"Introducing CURRENNT - the Munich Open-Source CUDA RecurREnt Neural Network
Toolkit", Journal of Machine Learning Research, 2014.
+=============================================================================+
| Structure |
+=============================================================================+
1. Building on Linux
2. Building on Windows
3. Command line options
3.1 Common options
3.2 Forward pass options
3.3 Training options
3.4 Autosave options
3.5 Data file options
3.6 Weight initialization options
4. Network configuration
5. NetCDF data files
5.1 General structure
5.2 Regression tasks
5.3 Classification tasks
6. Tests
7. Examples
+=============================================================================+
| 1. Building on Linux |
+=============================================================================+
Building on Linux requires the following:
* CUDA Toolkit 5.0
* GCC 4.6 or higher
* NetCDF scientific data library
* Boost library 1.48 or higher
To build CURRENNT execute the following commands:
#> cd currennt
#> mkdir build && cd build
#> cmake ..
#> make
This produces the executable file 'currennt'.
You might get many warnings like 'Cannot tell what pointer points to, assuming
global memory space'. These are totally OK and there seems to be no way to
suppress them.
+=============================================================================+
| 2. Building on Windows |
+=============================================================================+
Building on Windows requires the following:
* Visual Studio 2010
* CUDA Toolkit 5.0
* Boost library 1.52.0 or higher
The NetCDF sources and required DLLs are already there, so you do not have to
download them yourself.
To build CURRENNT do the following:
1) Open 'currennt.sln' in Visual Studio 2010
2) Ensure that you installed the Boost library in 'C:/boost_1_52_0'.
NOTE: If you build Boost from source, make sure you build the x64 version
(default is x86). This is done via
bjam msvc architecture=x86 address-model=64 stage
on the command line from the Boost source directory.
If you want to use another version or another installation path, you need
to modify the project files for the 'currennt_lib' and 'currennt' projects.
You can do this by changing the include and library search paths under the
project settings or you can edit the project files directly which is much
simpler. To do this, open the files 'currennt_lib/currennt_lib.vcxproj'
and 'currennt/currennt.vcxproj' and replace every occurrence of
'C:/boost_1_52_0' with the path (may be a relative path) to your Boost
installation.
3) Ensure that the solution configuration is set to 'Release' on 'x64' (in the
toolbar right below the main menu)
4) Compile the whole solution via 'Build -> Build Solution' (F7)
You might get many warnings like 'Cannot tell what pointer points to, assuming
global memory space'. These are totally OK and there seems to be no way to
suppress them.
The build process creates currennt.exe in the Release directory.
If you have not installed the NetCDF library and its dependencies in your
global system path, you have to copy them from the CURRENNT source directory
into the Release directory.
+=============================================================================+
| 3. Command line options |
+=============================================================================+
CURRENNT offers many command line options. You can get a short description of
them by running the program with the '--help' switch. A more detailed
description of the options can be found in the following subsections.
All command line options can also be specified in a parameter file
(cf. --options_file below).
+-----------------------------------------------------------------------------+
| 3.1 Common options |
+-----------------------------------------------------------------------------+
--help
Shows a brief description of every available option.
--options_file <file.cfg>
This option reads the command line options from <file.cfg> in which every
option is set in a separate line in the form of 'option = value'. This is a
positional option, i.e. you can omit the name of the option and just run the
trainer as 'currennt mynet.cfg'. Go into the directory 'examples' for an
example of this feature. Options set on the command line have priority, i.e.
if you set 'max_epochs = 5' in <file.cfg> and set 'max_epochs = 10' on the
command line, the final value of this option will be 10.
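For example, a minimal options file (option names are taken from the sections
below; the file names are illustrative) could contain:

```
network = mynet.jsn
train = true
train_file = train.nc
val_file = val.nc
max_epochs = 5
```

Running 'currennt mynet.cfg --max_epochs 10' would then train with
max_epochs = 10, since command line values take priority.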
--network <file.jsn>
Sets the file which defines the structure of the neural network. If the file
contains weights, these weights are used instead of initializing them using a
random distribution. The default value of this option is 'network.jsn'. The
structure of this file is explained in section 4.
--cuda <true/false>
Enables CUDA. If you explicitly set this option to 'false' then the CPU will
be used for all computations. This only makes sense for debugging or for
training very small networks (<< 100,000 weights). CUDA is enabled by
default. The device that is being used is specified by the CURRENNT_CUDA_DEVICE
environment variable (default: 0).
--list_devices <true/false>
If set to true, instead of doing any processing, the list of available CUDA devices
along with their IDs is printed and the program exits. To change the CUDA
device being used, set the CURRENNT_CUDA_DEVICE environment variable, e.g.,
on bash: $ export CURRENNT_CUDA_DEVICE=<device_id>
--parallel_sequences <value>
In order to speed up computations, the trainer processes a set of multiple
training sequences, which we call a fraction, in parallel. I.e. if you set
this value to 50, the trainer splits the whole data set into fractions of
50 sequences each and calculates all those sequences concurrently. This is
the most effective option for boosting training speed, but be careful:
choosing a value != 1 means that you no longer do true online learning, if
that is important for your task. If you can afford it, choose a value as
high as possible (if you go too high, the program tells you that there is
not enough GPU memory available). This can speed up training by over 20x
compared to true online learning! Section 3.3 contains more information
about this option.
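The way the data set is divided can be sketched as follows (illustrative
Python only, not CURRENNT's actual CUDA implementation):

```python
def make_fractions(sequences, parallel_sequences):
    """Split the data set into fractions of at most
    `parallel_sequences` sequences each; CURRENNT processes the
    sequences of one fraction concurrently on the GPU."""
    return [sequences[i:i + parallel_sequences]
            for i in range(0, len(sequences), parallel_sequences)]

# 7 sequences with --parallel_sequences 3 -> fractions of 3, 3 and 1
fractions = make_fractions(list("ABCDEFG"), 3)
```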
--random_seed <value>
Sets the seed for the random number generators. This option allows you to
reproduce training results on the same machine. Useful for debugging, but
usually you should leave it at its default value of 0 which causes the
program to create the seed itself.
+-----------------------------------------------------------------------------+
| 3.2 Forward pass options |
+-----------------------------------------------------------------------------+
In forward pass mode, the trainer does not train a network but uses an
already trained network (specified by the --network option) and computes only
the forward pass to obtain the network outputs for certain input sequences.
--ff_input_file <file.nc>
Sets the name of the data file (in NetCDF format, see section 5) that
contains the network input sequences for the forward pass. Data files
used in forward pass must have the same structure as training data files
(see section 5), but the vectors of outputs or target classes may be missing.
If you set static noise (via '--input_noise_sigma', see below) != 0, then the
noise is also applied to the input sequences of this file. If you don't want
that, ensure that static noise is set to 0.
--ff_output_file <file>
Sets the name of the output file that is produced in forward pass.
The interpretation of this option depends on the chosen output format (cf.
--ff_output_format)
If the output format is "single CSV", it specifies the name of a CSV file;
otherwise, it specifies the name of a directory where for each sequence,
a file with the output activations will be created.
The default output file name is 'ff_output.csv'.
--ff_output_format <format>
Sets the output format for the forward pass (htk, csv, or single_csv).
In "single_csv" mode, a single CSV file is written
in which the first column contains the sequence tag from
the input data file and the other columns contain the activations of the
output neurons in temporally ascending order. I.e. if your network has an
output layer with 2 neurons in it and the input data file contains two
sequences with tags A and B, where A has a length of 2 and B a length of
3 timesteps, then the resulting output file could look like
A;0.1;0.2;-0.05;0.9723
B;0.5;0.5;0.111;-0.22;0.82;0.3
In "csv" mode, every output sequence is written to a CSV file whose name
corresponds to the sequence tag plus ".csv".
If the sequence tag can be interpreted as a
path name on your system, subdirectories will be created accordingly.
For example, if there are two sequences called "speaker1/sentence1" and
"speaker2/sentence3", the directories speaker1 and speaker2 will be created,
and the CSV output will be written to the sentence1.csv and sentence3.csv files
within these directories.
In "htk" mode, the logic is similar, but instead of CSV files, binary files
in the Hidden Markov Toolkit (HTK) feature file format will be output.
If you don't know what HTK is, you will probably not need this option.
The default output format is "single_csv".
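A single_csv output file can be parsed back into per-timestep output vectors
like this (an illustrative sketch; it assumes you know the size of your
output layer):

```python
def parse_single_csv_line(line, output_size):
    """Split one line of a single_csv output file into the sequence
    tag and a list of per-timestep output activation vectors."""
    fields = line.strip().split(';')
    tag, values = fields[0], [float(v) for v in fields[1:]]
    # group the flat value list into vectors of output_size activations
    timesteps = [values[i:i + output_size]
                 for i in range(0, len(values), output_size)]
    return tag, timesteps

tag, steps = parse_single_csv_line("A;0.1;0.2;-0.05;0.9723", 2)
# tag == "A", steps == [[0.1, 0.2], [-0.05, 0.9723]]
```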
--ff_output_kind <number>
Sets the feature type "magic number" in the HTK feature files.
Only relevant if ff_output_format = htk.
Default is 9 ("user defined").
--feature_period <number>
Sets the sampling period of the features in the HTK feature files, given in
milliseconds. Default is 10.
--revert_std <true/false>
If set to true, CURRENNT attempts to read the outputMeans and outputStdevs
fields from the forward pass input file, which are interpreted as the
original means and standard deviations of the output features, while it is
assumed that the output features are standardized. Then, the output features
are multiplied by their standard deviations and the means are added. If the
fields cannot be read, no operation is performed. This is only valid for
regression tasks. Default is "true". See the nc-standardize tool in the
tools folder for creating standardized NetCDF files.
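Per feature dimension, what --revert_std does can be sketched as follows
(illustrative Python; outputMeans and outputStdevs come from the input file):

```python
def revert_standardization(outputs, means, stdevs):
    """Undo output standardization: multiply each standardized output
    feature by its original standard deviation, then add back the
    original mean."""
    return [[y * s + m for y, m, s in zip(frame, means, stdevs)]
            for frame in outputs]

# one timestep with two output features
restored = revert_standardization([[0.5, -1.0]], [10.0, 3.0], [2.0, 0.5])
# restored == [[11.0, 2.5]]
```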
+-----------------------------------------------------------------------------+
| 3.3 Training options |
+-----------------------------------------------------------------------------+
--train <true/false>
Enables training mode. If you want to train a network, you have to set
this option to 'true'. The default value is 'false', and hence the program
only performs a forward pass by default.
--stochastic <true/false>
Enables stochastic gradient descent. If you set this value to 'true'
then the trainer updates the network weights after every processed fraction
(which consists of PS sequences, with PS being the number of parallel
sequences set via the --parallel_sequences option). If you set this value to
'false' which is the default value, then you're doing batch learning. If you
want true online learning (weight updates after every sequence) then you have
to set this value to 'true' and set '--parallel_sequences' to 1.
--hybrid_online_batch <true/false>
Does the same as '--stochastic'; provided for backwards compatibility.
--shuffle_fractions <true/false>
Enables shuffling of fractions during network training. Since only the
fractions are shuffled, they always consist of the same sequences. The
advantage of fraction shuffling over sequence shuffling (see below) is
that it can be considerably faster if the lengths of your input sequences
differ greatly. If you have roughly equally long sequences, prefer sequence
shuffling. The default value is 'false'.
--shuffle_sequences <true/false>
Enables shuffling of sequences within and across fractions. As opposed to
fraction shuffling (see above), this mode truly randomizes the order of all
sequences in each training epoch. If the lengths of your input sequences
differ greatly, you can use fraction shuffling instead which should speed up
the training at the cost of less randomized sequences.
--max_epochs <value>
Sets the maximum number of training epochs. If the network does not converge
after <value> epochs, the training is stopped, no matter what. The default
value of this option is infinity.
--max_epochs_no_best <value>
Causes the program to stop training if no new best error on the validation
set could be achieved within the last <value> epochs. Be careful if you also
use the '--validate_every' option (see below). If you choose to only
calculate the validation error every 5 epochs, then the trainer might miss
new best errors in epochs in which this error is not evaluated. The default
value of this option is 20.
--validate_every <value>
Causes the trainer to evaluate the validation error every <value> epochs.
Choosing a value other than 1 speeds up training times but you might miss new
best errors. The default value is 1.
--test_every <value>
Causes the trainer to evaluate the error on the test set every <value>
epochs. The default value is 1.
--optimizer steepest_descent
Sets the type of optimizer to use. The default (and currently only) optimizer
is a steepest descent optimizer with momentum ('steepest_descent'). Its
parameters can be set via the options '--learning_rate' and '--momentum' (see
below).
--learning_rate <value>
Sets the learning rate for the steepest descent optimizer. The default value
is 1e-5.
--momentum <value>
Sets the momentum for the steepest descent optimizer. The default value is
0.9.
--weight_noise_sigma <value>
Sets the standard deviation of the Gaussian noise that is applied to weights
before calculating the gradient of each batch. This may provide more
robustness against overfitting. The default value is 0.0 (no noise).
--save_network <file.jsn>
Sets the file name of the network file that is created when training is
finished. The file will contain the network structure (the same as the one
loaded from the file provided by '--network') and the trained weights. You
can use this file to re-train the network by simply using it as value for
the '--network' option. The trainer will then use these weights as initial
weights instead of initializing them randomly.
+-----------------------------------------------------------------------------+
| 3.4 Autosave options |
+-----------------------------------------------------------------------------+
CURRENNT offers options to save the current training status after every
training epoch. Autosave is disabled by default, but if you're training on a
large amount of data, you should really switch it on. If you don't and your
computer crashes during training, you have to start all over again.
The produced autosave files are in JSON format and can be used as network file
for the '--network' option. This means you can first train the network a few
epochs with configuration A and then continue training with a completely
different configuration B by restarting the program with the autosave file.
WARNING: If an autosave file already exists, it will be overwritten!
--autosave <true/false>
Enables autosave after every training epoch. The resulting files are named
'<prefix>epoch0123.autosave' with <prefix> being the prefix configured via
'--autosave_prefix'. Default is 'false'. Note that using this option is
time-consuming (as networks have to be serialized after every epoch) and
requires a lot of disk space if you have large networks!
--autosave_best <true/false>
Enables autosave once a new best validation error is reached.
The resulting files will be written to <autosave_prefix>.best.jsn.
Using this option is recommended if training your network takes significant
time. Default is 'false'.
--autosave_prefix <value>
Sets the filename prefix for autosave files. If you want your autosave files
to be put in the directory 'mydir' and have filenames like
'mynn-epoch012.autosave', then you should set <value> to 'mydir/mynn-'.
The default prefix is empty, so the resulting files are put in the current
working directory with names like 'epoch012.autosave'.
--continue <file.autosave>
Continues training from the provided autosave file. If you continue from an
autosave file, all options set at the command line or in the options file are
ignored as they are stored in the autosave file itself. If you really want to
change these options, you can edit the autosave file since it is an ordinary
JSON file. WARNING: If you do this (a) you have to be careful not to break
anything (special characters, delimiters, ...) and (b) you lose the
information about which options were used to obtain the autosave file in the
first place. To continue training, you only need the autosave file and the
training/validation/test data files.
+-----------------------------------------------------------------------------+
| 3.5 Data file options |
+-----------------------------------------------------------------------------+
--train_file <file.nc[,file2.nc...]>
Sets the NetCDF file(s) which contain(s) the sequences used as the training
set during network training. Separate multiple files by semicolons or
commas. The required structure of the data file is described in section 5.
Does not have a default value and must be set in training mode.
--val_file <file.nc[,file2.nc...]>
Sets the NetCDF file(s) which contain(s) the sequences used as the validation
set during network training. Separate multiple files by semicolons or
commas. The required structure of the data file is described in section 5.
Does not have a default value. If you don't provide a validation file, then
training is done without it.
--test_file <file.nc[,file2.nc...]>
Sets the NetCDF file(s) which contain(s) the sequences used as the test set
during network training. Separate multiple files by semicolons or commas.
The required structure of the data file is described in section 5. Does not
have a default value. If you don't provide a test file, then training is done
without it.
--train_fraction <value>
Sets the fraction of the training set that will be used. I.e. if you set this
value to 0.5, then only the first half of the sequences from the file set via
'--train_file' will be used.
--val_fraction <value>
Sets the fraction of the validation set that will be used. I.e. if you set
this value to 0.5, then only the first half of the sequences from the file
set via '--val_file' will be used.
--test_fraction <value>
Sets the fraction of the test set that will be used. I.e. if you set this
value to 0.5, then only the first half of the sequences from the file set via
'--test_file' will be used.
--truncate_seq <value>
Truncates over-long training sequences to a maximum length.
This can greatly speed up training if your sequences vary widely in length,
and enables training with very long sequences that might not fit into the GPU
memory. If a sequence is longer than <truncate_seq> * 1.5 time steps, the
remaining time steps will be assigned to a new sequence, and this process
will be continued iteratively until no sequence is longer than <truncate_seq>
* 1.5 time steps. As a result, no sequence will be shorter than
0.5 * <truncate_seq> time steps. Default is 0 (no truncation).
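The truncation rule can be sketched as follows (illustrative Python
mirroring the description above):

```python
def truncated_lengths(length, truncate_seq):
    """Iteratively split off chunks of `truncate_seq` time steps while
    the remainder exceeds 1.5 * truncate_seq. Any chunk created by a
    split is therefore at least 0.5 * truncate_seq time steps long."""
    if truncate_seq == 0:            # 0 disables truncation
        return [length]
    parts = []
    while length > 1.5 * truncate_seq:
        parts.append(truncate_seq)
        length -= truncate_seq
    parts.append(length)
    return parts

# a 350-step sequence with --truncate_seq 100 -> chunks of 100, 100, 150
```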
--input_noise_sigma <value>
Sets the standard deviation of the static noise that is applied to the input
sequences of the training and feed forward input set. Static noise is not
applied to validation and test sets. The default value is 0.0 (no static
noise).
--input_left_context <value>
--input_right_context <value>
Enables frame splicing, i.e., concatenating multiple feature frames from a
sliding window. The features which are effectively used at time step t span
the features from timesteps t-<input_left_context> through
t+<input_right_context>. The first and last frames are duplicated as
necessary. Default is no frame splicing, i.e., input_left_context =
input_right_context = 0.
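Frame splicing can be pictured like this (an illustrative sketch; each frame
is one feature vector):

```python
def splice(frames, left, right):
    """Concatenate, for every time step t, the feature frames from
    t - left through t + right; the first and last frames are
    duplicated at the sequence boundaries."""
    padded = [frames[0]] * left + frames + [frames[-1]] * right
    window = left + right + 1
    # sum(..., []) concatenates the feature vectors in the window
    return [sum((padded[t + k] for k in range(window)), [])
            for t in range(len(frames))]

# left = right = 1 triples the effective input pattern size
spliced = splice([[1], [2], [3]], 1, 1)
# spliced == [[1, 1, 2], [1, 2, 3], [2, 3, 3]]
```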
--output_time_lag <value>
Enables training with output time lag. This is mainly useful for having
unidirectional RNNs with lookahead. For example, if this is set to 5, the
network will be trained to predict the given targets 5 time steps in advance. If
this is given in forward pass mode, the outputs are shifted to the "left"
by <output_time_lag> frames, and the last output time step is duplicated as
necessary. Default is no time lag.
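The forward pass behaviour of this option can be sketched as (illustrative
Python; outputs is the sequence of network output vectors):

```python
def shift_outputs(outputs, lag):
    """Shift the outputs `lag` time steps to the 'left', duplicating
    the last output as necessary so the sequence keeps its length."""
    if lag == 0:
        return outputs
    return outputs[lag:] + [outputs[-1]] * min(lag, len(outputs))

# with --output_time_lag 2 the outputs [1, 2, 3, 4] become [3, 4, 4, 4]
```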
--cache_path <string>
Sets the path for caching data (default /tmp).
+-----------------------------------------------------------------------------+
| 3.6 Weight initialization options |
+-----------------------------------------------------------------------------+
The network weights are initialized randomly unless specified in the network
file set via the '--network' option. The random weights are drawn from either
a uniform or a normal distribution.
--weights_dist <uniform/normal>
Sets the distribution to use for initializing the weights. Default is
'uniform'.
--weights_uniform_min <value>
Sets the minimum value for weights when using the uniform distribution.
Default is -0.1.
--weights_uniform_max <value>
Sets the maximum value for weights when using the uniform distribution.
Default is 0.1.
--weights_normal_sigma <value>
Sets the standard deviation for the normal distribution used to initialize
the weights. Default is 0.1.
--weights_normal_mean <value>
Sets the mean for the normal distribution used to initialize the weights.
Default is 0.
+=============================================================================+
| 4. Network configuration |
+=============================================================================+
The structure of a neural network is defined in a JSON file and passed to the
CURRENNT executable via the '--network' option. The structure of such files
is described in this chapter.
As an example, we will create a neural network for multiclass classification
tasks. Our training data consists of input sequences with patterns of 39 values
and target sequences with integers denoting 1 out of 51 target classes for each
timestep. Hence, we need a neural network with 39 input neurons and 51 output
neurons. Each of the output neurons shall represent the probability of one
timestep belonging to the corresponding target class. Hence, the sum over all
output neuron activations shall equal 1.
Our network shall contain a single hidden LSTM layer that is trained in both
positive and negative time directions (-> bidirectional LSTM). The hidden layer
shall contain 100 biased LSTM units. The final network file has the following
content:
{
"layers": [
{
"size": 39,
"name": "input_layer",
"type": "input"
},
{
"size": 100,
"name": "hidden_layer",
"bias": 1.0,
"type": "blstm"
},
{
"size": 51,
"name": "output_layer",
"bias": 1.0,
"type": "softmax"
},
{
"size": 51,
"name": "postoutput_layer",
"type": "multiclass_classification"
}
]
}
It is very important to use the right braces and commas in the right places. If
the syntax is not 100% correct, CURRENNT will not accept the file. It is also
crucial to define the layers in the right order. The first layer must always be
the input layer (with type 'input') and the last layer must always be a post
output layer.
The so-called post output layer is always the very last layer in the network.
It does not have any trainable weights and its purpose is to evaluate the
objective function during the forward pass and induce the error into the output
layer during the backward pass.
In this example, our network consists of an input layer with 39 neurons, a
hidden bidirectional LSTM layer, a feed forward output layer with 51 neurons
which use the softmax activation function and the post output layer suitable
for our multiclass classification task.
You can define an arbitrary number of hidden layers with different types and
sizes. See the directory 'examples' for more example network files.
Some important points:
* Input and post output layers do not require a bias value.
* You must provide a bias value for hidden and output layers. If you do not
want to have bias connections, set the bias to 0.
* Layer names must be unique
* The post output layer has the same size as the output layer, except
for the sse_mask type (cf. below)
* The size of the post output layer must match the number of targets in the
NetCDF file in case of regression tasks, and must be equal to the number of
classes for multi-way classification tasks.
* It is possible to specify learning rates per layer, by adding
"learningRate": <learning_rate>
as key-value pair in the JSON element corresponding to the desired layer.
For example, this can be used to manually define transformations of features
as "deterministic", i.e. non-trainable, layers, or to prevent pre-trained
layers from being updated, by setting the learning rate to zero.
Available hidden and output layer types:
* feedforward_tanh = Feed forward layer with tanh as activation function
* feedforward_logistic = Feed forward layer with a logistic activation function
* feedforward_identity = Feed forward layer with f(x)=x activation function
* softmax = Feed forward layer with softmax activation function
* lstm = Unidir. LSTM layer with forget gates and peepholes
* blstm = Bidir. LSTM layer with forget gates and peepholes
Available post output layers:
("N targets" refers to the targetPatternSize in the NetCDF file for regression
and "N classes" to the numLabels field for classification)
* sse: Sum of Squared Error objective function
To be used with any output layer. Requires N targets for output layer of size
N
* weighted_sse: Weighted Sum of Squared Error objective function
To be used with any output layer. Requires 2N targets for output layer of
size N. The targets and the weights are expected to be interleaved, i.e., any
vector of targets should be of the form (t1, w1, t2, w2, ..., tN, wN).
* rmse: Root Mean Squared Error objective function
To be used with any output layer. Requires N targets for output layer of size
N
* ce: Cross entropy objective function
To be used with a softmax output layer. Requires N targets for output layer
of size N. Note: If you want to train multi-way classification, it is much
more efficient to use multiclass_classification, which avoids explicit
storage of all training targets (most of which are zero); this is only useful
if you want to train non-trivial discrete PDFs.
* binary_classification: Cross entropy for binary classification
Requires 1 class and a feedforward_logistic output layer of size 1
* multiclass_classification: Cross entropy objective function for 1-of-C coding
of C-way classification targets
Requires C classes and a softmax output layer of size C.
* sse_mask: Sum of Squared Error objective function for masking output
Requires 2N targets, and a feedforward_logistic output layer of size N. The
output layer activations are interpreted as [0,1] mask for a sequence of data
vectors (e.g., a noisy speech spectrogram) so that the product of the mask
computed by the network and the masking input should yield the desired
masking output (e.g., a clean speech spectrogram). This can be used, e.g.,
to train audio source separation. The masking outputs and masking inputs are
given alternatingly, i.e., the target vector in the NetCDF file is supposed
to be of the form (o1, i1, o2, i2, ..., oN, iN) where o denotes masking
output and i denotes masking input. Note that the "masking inputs" need not
correspond to the features at the input layer, e.g., the input layer can read
log-Mel spectra while the input data is a magnitude spectrogram. This is the
configuration used for the experiments in (Weninger et al., Discriminatively
trained recurrent neural networks for single-channel speech separation, IEEE
GlobalSIP, 2014).
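Two of the target layouts above can be illustrated in plain Python (a sketch
only; CURRENNT itself implements these objectives in CUDA): unpacking the
interleaved weighted_sse target vector, and the cross entropy computed by a
softmax output layer together with a multiclass_classification post output
layer:

```python
import math

def split_targets_and_weights(interleaved):
    """weighted_sse: separate an interleaved (t1, w1, ..., tN, wN)
    target vector into targets and weights."""
    return interleaved[0::2], interleaved[1::2]

def softmax(z):
    m = max(z)                       # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def multiclass_ce(z, target_class):
    """multiclass_classification: only the probability of the target
    class enters the loss, which is why storing the class index rather
    than a full 1-of-C target vector is sufficient."""
    return -math.log(softmax(z)[target_class])
```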
+=============================================================================+
| 5. NetCDF data files |
+=============================================================================+
The data files used for CURRENNT are NetCDF (*.nc) files. An example is
provided in the directory 'examples' and its structure can be investigated by
running 'ncdump nc_file.nc'.
The following subsections describe the general structure of these files and the
required extensions for regression and classification tasks respectively.
+-----------------------------------------------------------------------------+
| 5.1 General structure |
+-----------------------------------------------------------------------------+
The data files need to contain the following dimensions:
* numSeqs = Number of sequences
* numTimesteps = Total number of timesteps
* inputPattSize = Size of each input pattern (= number of input neurons)
* maxSeqTagLength = Maximum length of a sequence tag
The required variables are:
* char seqTags(numSeqs, maxSeqTagLength) = Tag (name) for each sequence
* int seqLengths(numSeqs) = Length of each sequence
* float inputs(numTimesteps, inputPattSize) = Input Patterns
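The relationships between these dimensions and variables can be illustrated
with plain Python (a sketch of the bookkeeping only; use a NetCDF library or
the bundled tools to actually write .nc files):

```python
def check_dataset(seq_tags, seq_lengths, inputs, input_patt_size):
    """Validate the dimension/variable relationships required by
    CURRENNT data files: one tag and one length per sequence, and one
    input pattern of inputPattSize values per timestep."""
    num_seqs = len(seq_tags)
    assert len(seq_lengths) == num_seqs
    num_timesteps = sum(seq_lengths)           # total over all sequences
    assert len(inputs) == num_timesteps
    assert all(len(p) == input_patt_size for p in inputs)
    return num_seqs, num_timesteps

# two sequences of lengths 2 and 1, input patterns of size 3
dims = check_dataset(["seq_a", "seq_b"], [2, 1], [[0.0, 0.0, 0.0]] * 3, 3)
# dims == (2, 3)
```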
+-----------------------------------------------------------------------------+
| 5.2 Regression tasks |
+-----------------------------------------------------------------------------+
Additional dimensions:
* targetPattSize = Size of each output pattern (= number of output neurons)
Additional variables:
* float targetPatterns(numTimesteps, targetPattSize) = Target patterns
Optional variables:
* float outputMeans(targetPattSize) = estimated means of outputs
* float outputStdevs(targetPattSize) = estimated standard deviations of outputs
(used to revert standardization of outputs in forward pass mode)
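Reverting the standardization amounts to scaling each output feature by its
standard deviation and adding back its mean, as in the following sketch
(the function name is illustrative, not a CURRENNT API):

```python
def revert_standardization(outputs, output_means, output_stdevs):
    """Undo per-feature output standardization: y = y_std * stdev + mean."""
    return [[y * s + m for y, m, s in zip(frame, output_means, output_stdevs)]
            for frame in outputs]

# One timestep, targetPattSize = 2
print(revert_standardization([[1.0, -1.0]], [5.0, 3.0], [2.0, 0.5]))
# -> [[7.0, 2.5]]
```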
+-----------------------------------------------------------------------------+
| 5.3 Classification tasks |
+-----------------------------------------------------------------------------+
Additional dimensions:
* numLabels = Number of target classes
Additional variables:
* int targetClasses(numTimesteps) = Target classes (one for each timestep)
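A classification data file thus contains the general structure from section
5.1 plus these additions; in 'ncdump -h' notation, the extra header entries
would look like this (the value of numLabels is illustrative):

```
dimensions:
        numLabels = 51 ;
variables:
        int targetClasses(numTimesteps) ;
```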
+=============================================================================+
| 6. Tests |
+=============================================================================+
The directory 'tests' contains scripts that check whether the library produces
the correct output for a predefined input. Currently there is only one test,
'test1', which trains a small deep RNN for one epoch and compares the trained
weights with reference weights obtained with RNNLIB. To run the test, execute
the 'run.py' Python script from within the directory 'test1'. The result will
be printed on the screen. Make sure that you have built CURRENNT as described
above before you run a test.
+=============================================================================+
| 7. Examples |
+=============================================================================+
The directory 'examples' contains example configurations that show how to use
CURRENNT for classification and regression.
Each example uses an options file 'config.cfg' that contains the configuration
for CURRENNT and a JSON file 'network.jsn' that specifies the network topology.
Make sure that you have built CURRENNT as described above before you run the
examples.
The following examples are provided:
+-----------------------------------------------------------------------------+
| 7.1 Multi-class classification / Speech recognition |
+-----------------------------------------------------------------------------+
The directory 'speech_recognition_chime' contains a noisy small-vocabulary
speech recognition task (51 words) from the 2nd CHiME Speech Separation and
Recognition Challenge.
The two *.nc files contain the training and validation sequences as 39
Mel-frequency cepstral coefficient (MFCC) features per 10ms time step. Only the
data for speaker 1 is provided in order to keep the download size reasonable.
The two directories 'no_subsampling' and 'subsampling' contain different
network topologies (with and without activation subsampling layers) that are
trained by executing the shell script 'run.sh' in each of these folders.
On Windows, the batch file 'run.bat' does the same.
For a detailed description of the task and learning procedure, refer to
Martin Wöllmer, Felix Weninger, Jürgen Geiger, Björn Schuller, Gerhard Rigoll:
"Noise Robust ASR in Reverberated Multisource Environments Applying Convolutive
NMF and Long Short-Term Memory", Computer Speech and Language, Special Issue on
Speech Separation and Recognition in Multisource Environments, Elsevier, 17
pages, 2013.
+-----------------------------------------------------------------------------+
| 7.2 Regression / Autoencoding |
+-----------------------------------------------------------------------------+
The directory 'speech_autoencoding_chime' contains an example for a regression
task, mapping noisy speech MFCC features to clean speech MFCC features, similar
to autoencoding. The input data is the same as for the speech recognition
task; only the targets differ. You can run the example via 'run.sh' or
'run.bat' as above.
For a detailed description of the task and learning procedure, refer to
Felix Weninger, Jürgen Geiger, Martin Wöllmer, Björn Schuller, Gerhard Rigoll:
"The Munich Feature Enhancement Approach to the 2013 CHiME Challenge Using
BLSTM Recurrent Neural Networks", Proc. 2nd CHiME Speech Separation and
Recognition Challenge held in conjunction with ICASSP 2013, IEEE, Vancouver,
Canada, 01.06.2013.