Results differ unexpectedly between `pipeline.predict` and `query_bed` with `ref` score #38

an1lam · 2019-12-22T20:11:35Z

I'm using the DeepSEA/variantEffects model with MutationMap to try and find the most impactful mutations for a set of sequences. However, I've noticed a discrepancy between the predictions I get for the wild-type sequences when using pipeline.predict vs. query_bed with the ref score.

The pipeline.predict scores are generally high probabilities for CTCF in cell type A549, which is what I'd expect given that my bed file consists of ChIP-seq peaks for that TF/cell line pair. The ref scored predictions, on the other hand, tend to be really small. A few examples of the difference are:

Predictions from pipeline.predict: [0.96582216 0.03907571 0.09172686 0.19638198 0.13311383 0.64790106
 0.67463433 0.94372396 0.64945625 0.98188872]

		vs.

Predictions from `ref`-scored MutationMap.query_bed: [-9.167706593871117e-09, -3.85800376534462e-07, 2.0929292077198625e-08, 3.2794196158647537e-07, 3.2247044146060944e-08, 2.3366883397102356e-06, 0.0, 1.3096723705530167e-09, 1.1496013030409813e-09, 0.0]

I'm trying to understand whether this is expected, a result of something I'm doing wrong, or a bug. For context, my code is essentially the following (pared down to make it easier to see the essentials):

dl_kwargs = {'fasta_file': '../dat/hg19.fa'}
predict_dl_kwargs = {'fasta_file': '../dat/hg19.fa', 'intervals_file': random_seqs_fpath}
random_seqs_fpath = "../dat/ChIPseq.A549.CTCF.100.random.narrowPeak.gz"
deepsea = kipoi.get_model("DeepSEA/variantEffects", source="kipoi")

# Predictions from MutationMap
dl_kwargs = {'fasta_file': '../dat/hg19.fa'}
mm = MutationMap(deepsea, deepsea.default_dataloader, dataloader_args=dl_kwargs)
mmp = mm.query_bed(random_seqs_fpath, scores=['ref', 'diff'])
mutation_map_predictions = [
    mmp.mutation_map[i]['seq']['ref']['A549_CTCF_None_720']['mutation_map']
    for i in range(len(mmp.mutation_map)
]
# I also do some post-processing to just get one of the non-zero values from the 4 x 1000 mutation map.

# Predictions from pipeline.predict
pipeline_predictions = deepsea_predict.pipeline.predict(predict_dl_kwargs, batch_size=100)

Note that both of these use the exact same bed file and therefore should be looking at the same sequences.

Am I missing some key reason why I should expect these two prediction arrays to differ dramatically? I am using the default rc merge settings for both (and that wouldn't account for the order-of-magnitude differences anyway). The two best ideas I have for why I might be getting such different results are:

Contrary to my understanding of the docs, MutationMap.query_bed is re-centering on each currently-being-tested variant.
ref doesn't mean what I think it does.

The text was updated successfully, but these errors were encountered:

Avsecz · 2019-12-22T20:44:39Z

@krrome Do you have an idea why the predictions are different?

Avsecz · 2019-12-22T20:46:12Z

MutationMap.query_bed is re-centering on each currently-being-tested variant.

I think that could be the potential hunch. @krrome do you know if that's true? ref should mean prediction for the reference allele as expected.

an1lam · 2019-12-24T23:33:48Z

That was my suspicion, but the docs made it sound like they don't do that for bed files. I tried tracing through the code but frankly ran out of steam before I could figure it out.

Avsecz · 2019-12-25T10:30:44Z

@krrome wrote the code so he'll be better able to help here.

an1lam · 2019-12-30T16:19:24Z

Thanks! As an FYI: yesterday, I ended up working around this by using kipoi-interpret's Mutation class instead (which gives correct-looking results), so I'm happy to keep this thread open or close it. Up to you!

Avsecz · 2019-12-30T16:58:18Z

Glad to hear the Mutation class worked well. Let's still keep this thread open.

an1lam added the help wanted Extra attention is needed label Dec 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Results differ unexpectedly between `pipeline.predict` and `query_bed` with `ref` score #38

Results differ unexpectedly between `pipeline.predict` and `query_bed` with `ref` score #38

an1lam commented Dec 22, 2019 •

edited

Loading

Avsecz commented Dec 22, 2019

Avsecz commented Dec 22, 2019 •

edited

Loading

an1lam commented Dec 24, 2019

Avsecz commented Dec 25, 2019

an1lam commented Dec 30, 2019

Avsecz commented Dec 30, 2019 •

edited

Loading

Results differ unexpectedly between pipeline.predict and query_bed with ref score #38

Results differ unexpectedly between pipeline.predict and query_bed with ref score #38

Comments

an1lam commented Dec 22, 2019 • edited Loading

Avsecz commented Dec 22, 2019

Avsecz commented Dec 22, 2019 • edited Loading

an1lam commented Dec 24, 2019

Avsecz commented Dec 25, 2019

an1lam commented Dec 30, 2019

Avsecz commented Dec 30, 2019 • edited Loading

Results differ unexpectedly between `pipeline.predict` and `query_bed` with `ref` score #38

Results differ unexpectedly between `pipeline.predict` and `query_bed` with `ref` score #38

an1lam commented Dec 22, 2019 •

edited

Loading

Avsecz commented Dec 22, 2019 •

edited

Loading

Avsecz commented Dec 30, 2019 •

edited

Loading