How do we count "times seen" with non-identical encodings and parameter transformation? #162

jgallowa07 · 2024-07-08T18:14:52Z

Adding this as a placeholder: in the final push we lost track of a discussion @Haddox and I were having about the correct way to calculate the "times seen" column of the mutations df. Note that this is only relevant to multi-condition training sets that include non-identical wildtype protein sequences.

background

Right now, we simply sum the columns of the transformed binary matrix to get the times seen, so "times seen" is essentially the number of times the model sees a "1" for a given mutation. As discussed with @Haddox, this may not be the correct approach, and we should rethink how this quantity is calculated.
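For concreteness, here is a minimal sketch of the column-sum counting described above. The matrix, the mutation labels, and the `times_seen` helper are all hypothetical and made up for illustration. This is not the package's actual code, and it does not reproduce the transformation applied to non-identical wildtype sequences.

```python
import numpy as np

# Hypothetical toy binary matrix: one row per variant, one column per mutation.
# A 1 means the model "sees" that mutation in that variant's encoding.
X = np.array(
    [
        [1, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [1, 0, 1],
    ]
)

# Hypothetical mutation labels, one per column of X.
mutations = ["M1A", "G2C", "K3R"]


def times_seen(binary_matrix):
    """Count the 1s in each column of a (variants x mutations) binary matrix."""
    return np.asarray(binary_matrix).sum(axis=0)


# Map each mutation label to its column sum.
counts = dict(zip(mutations, times_seen(X).tolist()))
print(counts)
```

Under this definition, anything that changes which entries are 1 in the encoding (such as a transformation applied for a condition with a different wildtype) also changes the counts, which is the concern raised here.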

How the binarymaps are encoded across non-identical proteins for joint modeling

To describe how we encode the variants into binarymaps, let's consider the example in the unit tests.

TODO: Finish the description, and add the discussion between Hugh and me (which currently exists mainly on Slack)
