preprocessing not working correctly #3
Thanks Andre @AndreLamurias. Once preprocessing completed, I verified that the dditrain numpy arrays contained data, which I then used to train the model. As you noted above, training the full model produced low performance (the model converges at around 0.45 F1 on the test set after 40 epochs). I'll try out your pre-processed dataset above, re-train, and let you know.
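For anyone doing the same sanity check, here is a minimal sketch of verifying that the preprocessed arrays actually contain data; the file names under temp/ are assumptions based on this thread, not the repository's documented layout.

```python
import glob
import numpy as np

# Hypothetical file layout: anything matching temp/dditrain* is assumed to be
# a preprocessed numpy array produced by the preprocessing step.
for path in sorted(glob.glob("temp/dditrain*")):
    try:
        arr = np.load(path, allow_pickle=True)
        print(path, getattr(arr, "shape", None), getattr(arr, "dtype", None))
    except Exception as exc:
        # The file may use a different serialization, so just report it.
        print(path, "could not be loaded as a numpy array:", exc)
```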
I tried the provided preprocessed files by placing them in temp/ (after moving my previously generated files out of the way). After invoking the training process, it fails as follows:
Traceback (most recent call last):
I ran it using the following command:
This is due to different versions of the ChEBI ontology: the ID of that compound was updated since we generated those files. I will open another issue so that the "alt_id" field is also considered. For future reference, we used this version of the ChEBI ontology: ftp://ftp.ebi.ac.uk/pub/databases/chebi/archive/rel158/
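A minimal sketch of what an alt_id lookup could look like when parsing that ChEBI OBO release; the file path and the way the mapping is applied downstream are assumptions, not the project's actual code.

```python
def build_alt_id_map(obo_path):
    """Map each ChEBI alt_id to its current primary id by scanning the OBO file."""
    alt_to_primary = {}
    primary_id = None
    with open(obo_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line == "[Term]":
                primary_id = None          # start of a new term stanza
            elif line.startswith("id: "):
                primary_id = line[len("id: "):]
            elif line.startswith("alt_id: ") and primary_id:
                alt_to_primary[line[len("alt_id: "):]] = primary_id
    return alt_to_primary

# Usage sketch: fall back to the alt_id map when an id from the preprocessed
# files is no longer a primary id in the loaded ontology release.
# alt_map = build_alt_id_map("chebi.obo")
# chebi_id = alt_map.get(chebi_id, chebi_id)
```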
Thanks @AndreLamurias. I observed some improvement in model performance, with val_f1 at 0.60, but it is still not as high as expected after 100 epochs; convergence occurs at around 30 epochs. Any thoughts or ideas on what other parameter tuning is required? Thanks, Mario. Here is the summary for the 100th epoch:
Epoch 100/100
predicted not false: 1372/1537
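For context on how such a score could be computed, below is a sketch that evaluates F1 over the positive DDI classes only (i.e. ignoring the negative "false" pairs, which is how the "predicted not false" count reads); the label names and the micro-averaging choice are assumptions.

```python
from sklearn.metrics import classification_report, f1_score

# Assumed label scheme: "false" marks a non-interacting pair, while the
# remaining DDI types are the positive classes to be scored.
POSITIVE_LABELS = ["mechanism", "effect", "advise", "int"]

def ddi_f1(y_true, y_pred):
    """Micro-averaged F1 over the positive DDI classes only."""
    return f1_score(y_true, y_pred, labels=POSITIVE_LABELS, average="micro")

# Per-class breakdown, useful when deciding which class drags the score down:
# print(classification_report(y_true, y_pred, labels=POSITIVE_LABELS))
```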
Following up on my last comment:
@mjlorenzo305 Yes, those scores are for the DDI classification task.
At the moment the preprocessing step is not generating the correct output, and the trained model obtains low performance. Meanwhile, I have uploaded the dditrain and dditest files, which you can move to the temp/ directory to train the model: https://drive.google.com/drive/folders/1wKfdeLGm9x4PbmfkYj9Iz8S7jZZz8PUJ?usp=sharing