
Sequence Read Archive Classification


MartinBoleSlo/BMDSRA

 
 


BMD-SRA: A Boosting Model for Differentiating Sequence Read Archive Files Based on the Context.

The volume of deposited sequence files is increasing dramatically, and the submitter of a sequence file is mainly responsible for annotating it. Although submitters and public repositories take care to produce accurate metadata, mistakes can happen, and they can cause trouble in downstream analysis. BMD-SRA tries to classify a given sequence file into one of four categories:

  1. Metagenomes
  2. Amplicons
  3. Single Amplified Genomes (SAGs)
  4. Isolated Genomes

The model was developed in the following stages:

  1. Preparing Metadata
  2. Downloading Sequence Files
  3. Feature Extraction
  4. Outlier Detection
  5. Model Development
  6. Model Evaluation

How can you use it?

There are two ways to use the outcomes of this study: generating your own model, or applying the generated model in your project.

Generating your own model

Well-formed documentation on preparing the training data is available. You can use the extracted features to generate your own model.

Applying the generated model

The generated model is accessible here. To create an object of the BMDSRA class, pass just two parameters:

  1. The path of the model.
  2. The path of the scaler.

After creating the object, call its predict function and pass the path of the sequence file.

Note that BMD-SRA needs access to two files, FeatureExtraction and Preprocessing. The xgboost package is also required.
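
Before loading the model, it can help to confirm that these dependencies are importable. The following is a minimal sketch, not part of BMD-SRA itself: the helper name missing_modules is hypothetical, and the dotted module paths Codes.FeatureExtraction and Codes.Preprocessing are assumptions based on the Codes.BMDSRA import used in the example; adjust them to match your checkout.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be located for import."""
    missing = []
    for name in names:
        try:
            # find_spec returns None when a top-level module is not found;
            # for dotted names it imports the parent package first and
            # raises ModuleNotFoundError if that parent is absent.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed module locations; only "xgboost" is named as a package in this README.
    required = ["xgboost", "Codes.FeatureExtraction", "Codes.Preprocessing"]
    print("Missing:", missing_modules(required))
```

An empty list means everything was found. If xgboost is reported missing, installing it from PyPI (pip install xgboost) is the usual fix.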

Example:

from Codes.BMDSRA import BMDSRA

# Paths to the trained model and its scaler (Windows-style relative paths).
model_path = "..\\..\\resource\\4-model\\model.json"
scaler_path = "..\\..\\resource\\4-model\\scaler.gz"
model = BMDSRA(model_path, scaler_path)

# Classify a single sequence file and print the result.
seq_path = "..\\..\\resource\\2-subsra\\SRR1588386.fastq"
res = model.predict(seq_path)
print(res)

For more examples of running the model, see here.
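
To classify several sequence files at once, the same predict call can be applied in a loop. This is a minimal sketch under stated assumptions: predict_many is a hypothetical helper, not part of the BMDSRA class, and it relies only on what is described above, namely that predict accepts the path of a sequence file. Whatever predict returns is stored unchanged.

```python
from pathlib import Path


def predict_many(predict, seq_dir, pattern="*.fastq"):
    """Call predict(path) for each file in seq_dir matching pattern.

    Returns a dict mapping each file name to the value predict returned.
    Files are processed in sorted order so results are reproducible.
    """
    results = {}
    for seq_file in sorted(Path(seq_dir).glob(pattern)):
        # predict is given a plain string path, as in the single-file example.
        results[seq_file.name] = predict(str(seq_file))
    return results
```

With a model object created as in the example, a call such as predict_many(model.predict, "../../resource/2-subsra") would classify every .fastq file in that folder. Using pathlib also avoids hard-coding Windows-style backslashes in the paths.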
