How do we count "times seen" with non-identical encodings and parameter transformation? #162

Open
jgallowa07 opened this issue Jul 8, 2024 · 0 comments

Adding this as a placeholder: in the final push we lost track of a discussion @Haddox and I were having about the correct way to calculate the "times seen" column of the mutations df. Note that this is only relevant to multi-condition training sets that include non-identical wildtype protein sequences.

background

Right now, we simply sum the columns of the transformed binary matrix to get times seen, so times seen is essentially the number of times the model sees a "1" for a given mutation. As discussed, this may not be the correct way to do things, and we should rethink how this quantity is calculated.
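To make the current behavior concrete, here is a minimal sketch of the column-sum calculation on a toy matrix. The mutation names and matrix values are made up for illustration, and this is plain NumPy rather than the project's actual code path, so treat it as an assumption about the general idea and not the real implementation.

```python
import numpy as np

# Hypothetical toy data: each row is a variant, each column a mutation.
# A 1 means the (transformed) encoding of that variant contains the mutation.
mutations = ["M1A", "G2C", "K3R"]
X = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
])

# "Times seen" as currently described: the column sums of the binary matrix,
# i.e. how many variants carry a 1 for each mutation.
times_seen = dict(zip(mutations, X.sum(axis=0).tolist()))
print(times_seen)  # {'M1A': 3, 'G2C': 2, 'K3R': 2}
```

Because the sum is taken over the transformed encoding, a "1" here reflects what the model sees, which may not match how often the mutation appears in the original variants. That gap is the question this issue raises.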

How the binarymaps are encoded across non-identical proteins for joint modeling

To describe how we encode the variants into binarymaps, let's consider the example in the unit tests.

TODO: Finish the description, and add the discussion between Hugh and me (which currently exists mainly on Slack).
