train_model for fingerprint data #1

chepyle · 2017-09-12T11:44:36Z

Thanks Connor for publishing this project; it is a fascinating take on QSAR approaches.
I noticed that train_model in core.py assumes that all inputs are molecular tensors, so fingerprint-based models fail because they are single arrays. For example, the command

python conv_qsar_fast/main/main_cv.py conv_qsar_fast/inputs/tox21_Morgan/tox21_ahr.cfg

fails with an error message along the lines of:

Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 arrays but instead got the following list of 3 arrays: [array([[1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
        0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0...


connorcoley · 2017-09-12T17:40:08Z

Hi @chepyle ,

When using a fingerprint representation, the use_fp keyword in the architecture specification should make its way to main.core.build_model() and main.data.get_data_full(), which will then use a fixed FP input instead of the learned FP. The problem comes from train_model(), which feeds in all three input arrays regardless of use_fp (this was a late change to the code, made after running the baseline models).

I've created a branch chepyle and modified main.core.train_model() to add two dummy inputs to the model. That way, when training or testing, these inputs will be accepted but ignored. Because these will be disconnected from the computational graph, you will have to pass on_unused_input='warn' to Theano when running the script.
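One way to pass that setting, sketched here under the assumption that it is supplied through the THEANO_FLAGS environment variable (a comma-separated list of key=value pairs). The helper name with_theano_flag is hypothetical and not part of the repository:

```python
import os


def with_theano_flag(existing, key, value):
    """Return a THEANO_FLAGS string with key=value set, keeping other flags."""
    # Drop empty entries and any earlier setting of the same key.
    flags = [f for f in existing.split(",") if f and not f.startswith(key + "=")]
    flags.append("%s=%s" % (key, value))
    return ",".join(flags)


# This must run before `import theano`, because Theano reads
# THEANO_FLAGS when it is first imported.
os.environ["THEANO_FLAGS"] = with_theano_flag(
    os.environ.get("THEANO_FLAGS", ""), "on_unused_input", "warn"
)
```

Setting the variable in the shell before launching the script should have the same effect; the helper only avoids overwriting flags that are already set.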

An alternate solution would be to change all of the calls to .fit or .predict to be inside conditionals that look at use_fp and pass only the first element of the inputs if using a fixed FP representation.
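A minimal sketch of that alternative, assuming the three-element input list described above with the fingerprint array first. The names select_inputs and StubModel are hypothetical, for illustration only; the stub merely stands in for a compiled model that expects a single input array:

```python
import numpy as np


def select_inputs(inputs, use_fp):
    """Return what should be passed to .fit or .predict.

    Assumption: `inputs` is the list of three arrays built for the
    tensor-based model, and with use_fp=True only the first one
    (the fixed fingerprint) is a real model input.
    """
    if use_fp:
        return inputs[0]
    return inputs


class StubModel:
    """Hypothetical stand-in for a model that expects one input array."""

    def predict(self, x):
        if isinstance(x, list):
            raise ValueError("Expected 1 array but got a list of %d" % len(x))
        return x.sum(axis=1)


fingerprint = np.array([[1, 0, 1, 1], [0, 0, 1, 0]])
inputs = [fingerprint, np.zeros((2, 1)), np.zeros((2, 1))]

model = StubModel()
# With use_fp=True only the fingerprint array reaches the model.
preds = model.predict(select_inputs(inputs, use_fp=True))
```

Routing every .fit and .predict call through one such helper keeps the conditional in a single place, and no disconnected inputs are added to the graph.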

Let me know if this solves the issue; if so, I will merge it into the master branch.

-Connor

chepyle · 2017-09-16T17:38:52Z

Thanks Connor, the model now runs with the fingerprint data sets. However, the disconnected graph causes problems with save_model, similar to this issue:
keras-team/keras#2790

Jake
