Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jason dev #92

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Jason dev #92

wants to merge 4 commits into from

Conversation

biodataganache
Copy link
Collaborator

Updating the min_filter to allow for percentages as input. Also add a helpful error message that is printed and added to the log file if the user has filtered out all the kmers using min_filter.

@biodataganache biodataganache linked an issue Oct 10, 2022 that may be closed by this pull request
@christinehc
Copy link
Collaborator

Hmm, not sure if use is as intended, but I'm getting the same full_feature_matrix shape when min_filter=0.1 and min_filter=0.9:

image

image

vs. min_filter=10, where it seems kmers have been pared down:

image

@biodataganache
Copy link
Collaborator Author

Hmm, not sure if use is as intended, but I'm getting the same full_feature_matrix shape when min_filter=0.1 and min_filter=0.9:

image

image

vs. min_filter=10, where it seems kmers have been pared down:

image

That's weird. Is this for model?

@christinehc
Copy link
Collaborator

This is for cluster mode

@biodataganache
Copy link
Collaborator Author

biodataganache commented Oct 10, 2022

0.999 = (289, 61)
0.1 = (289, 0)
1 = (289, 15796)
For cluster in my hands? Weird.

Your results are consistent with the old behavior - in that it would still filter with 0.1 or 0.9 it would just not be any different because they're both less than 1

@christinehc
Copy link
Collaborator

Ah, I realized the issue. I was changing config['cluster']['min_rep'] and not config['min_filter'].

I do think it may be a bit confusing to have both of these parameters named as-is, though I realize they accomplish slightly different things

Changes the behavior of the `min_filter` parameter when the filter is set too high; previously, an error message was printed to console, but the snakemake workflow will continue running until the missing input files are detected in the next rule. This has been changed to terminate with an error immediately instead. Thus, the workflow will not attempt to run subsequent rules that are invalid past the `min_filter` step.
@christinehc
Copy link
Collaborator

Made changes to raise ValueError to terminate the workflow in case not enough kmers pass the filter.

I do still think the min_filter vs. min_rep may warrant further discussion. I also find it a bit confusing to use the min_filter parameter since it is currently a "hidden parameter" and we don't have good documentation for it.

Also, with the current behavior, I think something like include would be a better name for the parameter, referring to the fact that (1 - min_filter)% of kmers are captured. Otherwise, filter seems to imply the reverse to me. Thoughts?

@biodataganache
Copy link
Collaborator Author

OK - I've changed filtering so it now uses a single global parameter "kmer_include_filter" - but has backwards compatibility with "min_filter" since I realized I just set that in stone in the manuscript (whoops). The filtering for model and cluster are all done in the kmerize rule and I removed the (partially) redundant filtering that was specific to cluster. So no more cluster/min_rep and I removed cluster/max_rep (but we should probably replace it in the future).
Finally - I commented the config.yaml a bunch to be more explanatory about what the parameters mean.

@christinehc
Copy link
Collaborator

I'm not seeing any changes to the jason-dev branch-- have those been pushed?

@biodataganache
Copy link
Collaborator Author

Weird. I don't know what happened to that change :) - will find it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

min_filter as percentage
2 participants