Drug-Target predictions are broken #58

Open
CaseyTa opened this issue Aug 23, 2024 · 3 comments
Labels
api-query (Issue querying the API), bug (Something isn't working)

Comments

@CaseyTa
Contributor

CaseyTa commented Aug 23, 2024

Describe the problem

TRAPI queries for drug-target predictions do not return any results. This is reproducible in dev and in all ITRB environments, using the example query suggested in the documentation:

{
    "message": {
        "query_graph": {
            "edges": {"e01": {"object": "n1", "predicates": ["biolink:interacts_with"], "subject": "n0"}},
            "nodes": {
                "n0": {
                    "categories": ["biolink:Drug"],
                    "ids": ["PUBCHEM.COMPOUND:5329102", "PUBCHEM.COMPOUND:4039", "CHEMBL.COMPOUND:CHEMBL1431"]},
                "n1": {
                    "categories": ["biolink:Protein"],
                    "ids": ["UniProtKB:O75251"]
                }
            }
        }
    },
    "query_options": {"max_score": 1, "min_score": 0.1, "n_results": 10}
}
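For anyone reproducing this outside the UI, here is a minimal sketch that builds the same query and counts the results in the response. The endpoint URL and the helper names `build_query` and `count_results` are assumptions made for illustration, not part of the project; point `TRAPI_URL` at whichever deployment is being tested.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the /query URL of the deployment under test.
TRAPI_URL = "http://localhost:8000/query"


def build_query(drug_ids, target_ids, min_score=0.1, max_score=1, n_results=10):
    """Build the TRAPI drug-target query used in this report."""
    return {
        "message": {
            "query_graph": {
                "edges": {
                    "e01": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:interacts_with"],
                    }
                },
                "nodes": {
                    "n0": {"categories": ["biolink:Drug"], "ids": list(drug_ids)},
                    "n1": {"categories": ["biolink:Protein"], "ids": list(target_ids)},
                },
            }
        },
        "query_options": {
            "max_score": max_score,
            "min_score": min_score,
            "n_results": n_results,
        },
    }


def count_results(url, query, timeout=60):
    """POST the query and return how many results the response message contains."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # A missing or null "results" field is treated as zero results.
    return len(body.get("message", {}).get("results") or [])
```

Calling `count_results(TRAPI_URL, build_query([...], ["UniProtKB:O75251"]))` with the drug IDs above would return 0 while the bug is present.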

ITRB CloudWatch logs show the following:

[2024-08-23 22:32:26 +0000] [81] [INFO] 🔮⏳️ Getting predictions for: ['PUBCHEM.COMPOUND:5329102', 'PUBCHEM.COMPOUND:4039', 'CHEMBL.COMPOUND:CHEMBL1431'] | []
[2024-08-23 22:32:26 +0000] [81] [ERROR] Error getting the predictions: [Errno 2] No such file or directory: 'models/drug_target.pkl'
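Because the path in the error is relative, it is resolved against the process's working directory. A small sketch of a fail-fast check that reports the resolved location could look like this; `require_model_file` is a hypothetical helper, not something in the codebase.

```python
from pathlib import Path


def require_model_file(path="models/drug_target.pkl"):
    """Return the model path, or raise with the resolved location if it is missing."""
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {model_path.resolve()} "
            f"(working directory: {Path.cwd()})"
        )
    return model_path
```

Running a check like this at startup would surface the missing file before the first query arrives, rather than inside the prediction call.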

CaseyTa added the labels api-query (Issue querying the API) and bug (Something isn't working) on Aug 23, 2024

dagshub bot commented Aug 23, 2024

@CaseyTa
Contributor Author

CaseyTa commented Aug 24, 2024

I tried looking into this a bit.

In my dev environment, the /app/models directory does not contain drug_target.pkl. As a test, I manually downloaded the file from GitHub into my running container:
wget https://github.com/MaastrichtU-IDS/predict-drug-target/raw/2f2d9aa1591f1181ba07a5fff69aeb112e4ec371/models/drug_target.pkl

The error message then becomes:

Check drugs in Vector DB, or get SMILES: 100%|██████████| 1/1 [00:00<00:00,  3.17it/s]
2024-08-24 15:45:11,514 INFO: [embeddings:compute_target_embedding] Retrieved 4962 targets
2024-08-24 15:45:14,171 ERROR: [trapi_parser:resolve_trapi_query] Error getting the predictions: 'Booster' object has no attribute 'predict_proba'

I loaded the pickle file manually and saw that it's an xgboost.Booster object, which has a predict method but no predict_proba method. It also looks like the model was evaluated with predict during training, so I changed predict_proba to predict. Now I get:

Check drugs in Vector DB, or get SMILES: 100%|██████████| 1/1 [00:00<00:00,  3.19it/s]
2024-08-24 15:51:47,257 INFO: [embeddings:compute_target_embedding] Retrieved 4962 targets
2024-08-24 15:51:49,890 ERROR: [trapi_parser:resolve_trapi_query] Error getting the predictions: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)

I then tried converting the DataFrame to DMatrix and got:

Check drugs in Vector DB, or get SMILES: 100%|██████████| 1/1 [00:00<00:00,  3.17it/s]
2024-08-24 16:10:59,814 INFO: [embeddings:compute_target_embedding] Retrieved 4962 targets
2024-08-24 16:11:02,804 ERROR: [trapi_parser:resolve_trapi_query] Error getting the predictions: name 'predicted' is not defined
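The predict_proba and DMatrix errors above both come from the difference between a raw xgboost.Booster and a scikit-learn style wrapper such as xgboost.XGBClassifier. A minimal sketch of a helper that accepts either is below. `predict_scores` is a hypothetical name, and the Booster branch assumes the model was trained with a binary logistic objective, so that `predict` already returns probabilities; that has not been confirmed for this model.

```python
def predict_scores(model, features):
    """Return one positive-class score per row of `features`.

    Works with scikit-learn style models (which expose predict_proba) and
    with a raw xgboost.Booster (which only exposes predict on a DMatrix).
    """
    if hasattr(model, "predict_proba"):
        # predict_proba returns one [p_negative, p_positive] pair per row.
        return [float(row[1]) for row in model.predict_proba(features)]

    # Raw Booster: the input must be wrapped in a DMatrix first.
    # Imported here so the classifier branch works without xgboost installed.
    import xgboost as xgb

    # Assumption: a binary:logistic objective, so predict() yields probabilities.
    return [float(score) for score in model.predict(xgb.DMatrix(features))]
```

This only papers over the interface difference. It does not help if the feature columns passed in differ from those used in training.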

I'm clearly going down the wrong path here. @micheldumontier Is anyone else available to continue troubleshooting?

@micheldumontier
Collaborator

Hi Casey, I was also investigating this and found a couple of problems. First, the model object was saved improperly and doesn't comply with the expected interface (this relates to whether or not the booster's weights are saved). Second, even with that fixed, the input dimension used by the application doesn't match the training dataset. So I've resorted to rebuilding the prediction model and revising the code. Still working on this.
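A dimension mismatch like the one described here can be caught before prediction with a check along these lines. This is only a sketch: `expected_feature_count` and `check_feature_count` are hypothetical helpers. They rely on `Booster.num_features()` for a raw booster and on the `n_features_in_` attribute that scikit-learn style estimators set after fitting.

```python
def expected_feature_count(model):
    """Number of input features the model was trained on, or None if unknown."""
    if hasattr(model, "num_features"):
        # xgboost.Booster exposes the training dimension as a method.
        return model.num_features()
    # scikit-learn style estimators store it as an attribute after fit().
    return getattr(model, "n_features_in_", None)


def check_feature_count(model, n_columns):
    """Raise a descriptive error if the input width differs from training."""
    expected = expected_feature_count(model)
    if expected is not None and expected != n_columns:
        raise ValueError(
            f"Model expects {expected} features but the input has {n_columns}"
        )
```

With a pandas DataFrame `df` as input, the call would be `check_feature_count(model, df.shape[1])`.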
