Overlap output #8

Merged: 16 commits into master on Oct 2, 2023

@tbrekalo (Owner) commented Oct 2, 2023

Description

Logistic regression proved to be a solid tool for classifying reverse-complement pairs, as seen in the overhang pull request. After comparing logistic regression with other classification models, we decided to make sniff's output more detailed so that it can be piped into more complex models for better classification.

Model metrics

image

Detailed look at the LightGBM results on the yeast training dataset

image

image

Feature changes

  • more detailed sniff output
    • sniff now outputs overlaps, which can be further processed with a script
  • added a classification script using LightGBM
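
To illustrate the "pipe overlaps into a script" idea, here is a minimal sketch of a downstream filter. The column layout (`query_name`, `target_name`, `overlap_length`, `identity`) is hypothetical, since sniff's real output format is not shown in this PR, and `toy_predict` is a stand-in for a trained model rather than the classification script added here.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# Hypothetical column layout: sniff's actual overlap format is not shown in this PR.
COLUMNS = ("query_name", "target_name", "overlap_length", "identity")


def read_overlaps(stream: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per tab-separated overlap line."""
    reader = csv.DictReader(stream, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        yield {
            "query_name": row["query_name"],
            "target_name": row["target_name"],
            "overlap_length": int(row["overlap_length"]),
            "identity": float(row["identity"]),
        }


def classify(
    records: Iterable[dict],
    predict: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> Iterator[Tuple[str, str]]:
    """Keep the pairs whose predicted probability reaches the threshold."""
    for rec in records:
        features = [rec["overlap_length"], rec["identity"]]
        if predict(features) >= threshold:
            yield rec["query_name"], rec["target_name"]


def toy_predict(features: Sequence[float]) -> float:
    # Stand-in scoring rule (an assumption): a trained model's
    # probability output would be plugged in here instead.
    length, identity = features
    return identity if length >= 100 else 0.0


# In a real pipeline the stream would be sys.stdin; a small in-memory
# sample keeps this sketch self-contained.
sample = io.StringIO(
    "read1\tread2\t1500\t0.93\n"
    "read3\tread4\t80\t0.99\n"
    "read5\tread6\t900\t0.41\n"
)
pairs = list(classify(read_overlaps(sample), toy_predict))
print(pairs)
```

Keeping the parser separate from the `predict` callable means the model can be swapped without touching the I/O code.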

Side changes

  • added Python virtual environment targets to the Makefile
  • moved eval scripts into the scripts directory

Comparing pipeline results to master

image

image

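| branch         | threads | alpha | beta | window_length | kmer_length | runtime_s | peak_memory_gib |
|----------------|---------|-------|------|---------------|-------------|-----------|-----------------|
| master         | 32      | 0.1   | 0.9  | 5             | 15          | 328       | 7.47564         |
| overlap-output | 32      | 0.1   | 0.9  | 5             | 15          | 402       | 7.46596         |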
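
For a quick read of the numbers above, the relative change between the two runs can be computed as follows. This is only a small sketch using the values reported in this comparison; the `relative_change` helper is introduced here for illustration.

```python
# Values copied from the comparison of master and overlap-output.
runs = {
    "master": {"runtime_s": 328, "peak_memory_gib": 7.47564},
    "overlap-output": {"runtime_s": 402, "peak_memory_gib": 7.46596},
}


def relative_change(new: float, old: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


base, branch = runs["master"], runs["overlap-output"]
runtime_change = relative_change(branch["runtime_s"], base["runtime_s"])
memory_change = relative_change(branch["peak_memory_gib"], base["peak_memory_gib"])

print(f"runtime: {runtime_change:+.1f}%")       # runtime: +22.6%
print(f"peak memory: {memory_change:+.2f}%")    # peak memory: -0.13%
```

So the more detailed output costs roughly a fifth more runtime on this configuration, while peak memory is essentially unchanged.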

image

@tbrekalo merged commit be82300 into master Oct 2, 2023
2 checks passed