train_model for fingerprint data #1

Open
chepyle opened this issue Sep 12, 2017 · 2 comments

Comments

@chepyle

chepyle commented Sep 12, 2017

Thanks, Connor, for publishing this project. It is a fascinating take on QSAR approaches.
I noticed that train_model in core.py assumes that all inputs are molecular tensors, so fingerprint-based models fail because their input is a single array. For example, the command

python conv_qsar_fast/main/main_cv.py conv_qsar_fast/inputs/tox21_Morgan/tox21_ahr.cfg

fails with an error message along the lines of:

Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 arrays but instead got the following list of 3 arrays: [array([[1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
        0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0...
@connorcoley
Owner

Hi @chepyle,

When using a fingerprint representation, the use_fp keyword in the architecture specification should make its way to main.core.build_model() and main.data.get_data_full(), which then use a fixed FP input instead of the learned FP. The problem comes from train_model(), which feeds in all three input arrays regardless of use_fp (this was a late change to the code, made after running the baseline models).

I've created a branch chepyle and modified main.core.train_model() to add two dummy inputs to the model. That way, when training or testing, these inputs will be accepted but ignored. Because these will be disconnected from the computational graph, you will have to pass on_unused_input='warn' to Theano when running the script.
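
One way to set that flag without editing a Theano configuration file is the THEANO_FLAGS environment variable, which Theano reads when it is first imported. The following is only a minimal sketch: the helper name add_theano_flag is hypothetical and not part of this repository, and it assumes the flag is set before anything imports Theano (or Keras with the Theano backend).

```python
import os


def add_theano_flag(flag, environ=None):
    """Append `flag` (e.g. "on_unused_input=warn") to THEANO_FLAGS.

    Hypothetical helper for illustration; `environ` defaults to os.environ
    and can be replaced by a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    existing = environ.get("THEANO_FLAGS", "")
    # THEANO_FLAGS is a comma-separated list of key=value pairs
    flags = [f for f in existing.split(",") if f]
    if flag not in flags:
        flags.append(flag)
    environ["THEANO_FLAGS"] = ",".join(flags)
    return environ["THEANO_FLAGS"]


# Call this before the first `import theano` / `import keras`,
# otherwise the setting is not picked up.
add_theano_flag("on_unused_input=warn")
```

Prefixing the command with THEANO_FLAGS=on_unused_input=warn in the shell should have the same effect.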

An alternate solution would be to change all of the calls to .fit or .predict to be inside conditionals that look at use_fp and pass only the first element of the inputs if using a fixed FP representation.
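
A rough sketch of that alternative, for illustration only: select_model_inputs is a hypothetical helper (it does not exist in the repository), and the placeholder lists stand in for the three input arrays that train_model() currently passes along.

```python
def select_model_inputs(inputs, use_fp):
    """Return what should be handed to model.fit / model.predict.

    Hypothetical helper: with a fixed fingerprint (use_fp=True) the model
    has a single input, so only the first array is passed; otherwise the
    full list of three molecular-tensor inputs is passed unchanged.
    """
    if use_fp:
        return inputs[0]
    return inputs


# Placeholder data standing in for the three arrays built per batch
batch_inputs = [[[1, 0, 1, 1]], [[0.0]], [[0.0]]]

x_fp = select_model_inputs(batch_inputs, use_fp=True)    # single array
x_mol = select_model_inputs(batch_inputs, use_fp=False)  # list of three arrays

# Each call site would then look roughly like:
#   model.fit(select_model_inputs(batch_inputs, use_fp), y, ...)
#   model.predict(select_model_inputs(batch_inputs, use_fp))
```

This keeps the computational graph fully connected, at the cost of touching every .fit and .predict call.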

Let me know if this solves the issue; if so, I will merge it into the master branch.

-Connor

@chepyle
Author

chepyle commented Sep 16, 2017

Thanks, Connor. The model now runs with the fingerprint data sets. However, the disconnected graph causes problems with save_model, similar to this:
keras-team/keras#2790

Jake
