Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Results differ unexpectedly between pipeline.predict and query_bed with ref score #38

Open
an1lam opened this issue Dec 22, 2019 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@an1lam
Copy link
Contributor

an1lam commented Dec 22, 2019

I'm using the DeepSEA/variantEffects model with MutationMap to try and find the most impactful mutations for a set of sequences. However, I've noticed a discrepancy between the predictions I get for the wild-type sequences when using pipeline.predict vs. query_bed with the ref score.

The pipeline.predict scores are generally high probabilities for CTCF in cell type A549, which is what I'd expect given that my bed file consists of ChIP-seq peaks for that TF/cell line pair. The ref scored predictions, on the other hand, tend to be really small. A few examples of the difference are:

Predictions from pipeline.predict: [0.96582216 0.03907571 0.09172686 0.19638198 0.13311383 0.64790106
 0.67463433 0.94372396 0.64945625 0.98188872]

		vs.

Predictions from `ref`-scored MutationMap.query_bed: [-9.167706593871117e-09, -3.85800376534462e-07, 2.0929292077198625e-08, 3.2794196158647537e-07, 3.2247044146060944e-08, 2.3366883397102356e-06, 0.0, 1.3096723705530167e-09, 1.1496013030409813e-09, 0.0]

I'm trying to understand whether this is expected, a result of something I'm doing wrong, or a bug. For context, my code is essentially the following (pared down to make it easier to see the essentials):

dl_kwargs = {'fasta_file': '../dat/hg19.fa'}
predict_dl_kwargs = {'fasta_file': '../dat/hg19.fa', 'intervals_file': random_seqs_fpath}
random_seqs_fpath = "../dat/ChIPseq.A549.CTCF.100.random.narrowPeak.gz"
deepsea = kipoi.get_model("DeepSEA/variantEffects", source="kipoi")

# Predictions from MutationMap
dl_kwargs = {'fasta_file': '../dat/hg19.fa'}
mm = MutationMap(deepsea, deepsea.default_dataloader, dataloader_args=dl_kwargs)
mmp = mm.query_bed(random_seqs_fpath, scores=['ref', 'diff'])
mutation_map_predictions = [
    mmp.mutation_map[i]['seq']['ref']['A549_CTCF_None_720']['mutation_map']
    for i in range(len(mmp.mutation_map)
]
# I also do some post-processing to just get one of the non-zero values from the 4 x 1000 mutation map.

# Predictions from pipeline.predict
pipeline_predictions = deepsea_predict.pipeline.predict(predict_dl_kwargs, batch_size=100)

Note that both of these use the exact same bed file and therefore should be looking at the same sequences.

Am I missing some key reason why I should expect these two prediction arrays to differ dramatically? I am using the default rc merge settings for both (and that wouldn't account for the order-of-magnitude differences anyway). The two best ideas I have for why I might be getting such different results are:

  1. Contrary to my understanding of the docs, MutationMap.query_bed is re-centering on each currently-being-tested variant.
  2. ref doesn't mean what I think it does.
@an1lam an1lam added the help wanted Extra attention is needed label Dec 22, 2019
@Avsecz
Copy link
Contributor

Avsecz commented Dec 22, 2019

@krrome Do you have an idea why the predictions are different?

@Avsecz
Copy link
Contributor

Avsecz commented Dec 22, 2019

MutationMap.query_bed is re-centering on each currently-being-tested variant.

I think that could be the potential hunch. @krrome do you know if that's true? ref should mean prediction for the reference allele as expected.

@an1lam
Copy link
Contributor Author

an1lam commented Dec 24, 2019

That was my suspicion, but the docs made it sound like they don't do that for bed files. I tried tracing through the code but frankly ran out of steam before I could figure it out.

@Avsecz
Copy link
Contributor

Avsecz commented Dec 25, 2019

@krrome wrote the code so he'll be better able to help here.

@an1lam
Copy link
Contributor Author

an1lam commented Dec 30, 2019

Thanks! As an FYI: yesterday, I ended up working around this by using kipoi-interpret's Mutation class instead (which gives correct-looking results), so I'm happy to keep this thread open or close it. Up to you!

@Avsecz
Copy link
Contributor

Avsecz commented Dec 30, 2019

Glad to hear the Mutation class worked well. Let's still keep this thread open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants