Seq2Feature

Seq2Feature is a pipeline for annotating plasmid sequences with pLannotate and generating machine learning features from them. It automates annotation and feature extraction to support genomic research and machine learning model development.

This project is primarily inspired by pLannotate from the Barrick Lab, which provides the foundation for the sequence annotation used in Seq2Feature. We build on that work to offer additional features and automation.

Features

Automated Annotation: Uses databases such as GenoLIB, FPbase, and Swiss-Prot to annotate plasmid sequences.
Machine Learning Integration: Extracts features from sequences to facilitate the development and training of ML models.
Scalable Processing: Annotates large datasets efficiently with progress tracking.
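
As a rough illustration of what "extracting features" from annotations can mean, the sketch below turns a list of annotated features into a fixed-length count vector. It is not taken from the Seq2Feature code: the `Type` and `Feature` keys, the `FEATURE_TYPES` vocabulary, and the `type_count_vector` helper are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical vocabulary of feature types; the real pipeline may use a different set.
FEATURE_TYPES = ["CDS", "promoter", "rep_origin", "terminator"]


def type_count_vector(annotations, vocabulary=FEATURE_TYPES):
    """Count how many annotated features of each type a plasmid carries."""
    counts = Counter(row["Type"] for row in annotations)
    # One slot per vocabulary entry, in a fixed order, so vectors are comparable.
    return [counts.get(feature_type, 0) for feature_type in vocabulary]


# Assumed annotation rows: one dict per annotated feature.
example = [
    {"Feature": "AmpR", "Type": "CDS"},
    {"Feature": "AmpR promoter", "Type": "promoter"},
    {"Feature": "ori", "Type": "rep_origin"},
    {"Feature": "EGFP", "Type": "CDS"},
]

print(type_count_vector(example))  # [2, 1, 1, 0]
```

A fixed vocabulary keeps every plasmid's vector the same length, which is what most ML libraries expect as input.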

Installation

Seq2Feature can be installed using Conda or Docker. Choose the method that best fits your needs.

Using Conda

Clone the Repository

git clone https://github.com/yourusername/Seq2Feature.git
cd Seq2Feature

Create and Activate the Conda Environment

conda env create -f environment.yml
conda activate seq2feature

Using Docker

Build the Docker Image
```
docker build -t seq2feature .
```
Run the Docker Container
```
docker run -it --rm seq2feature
```
By default, this command will run the seqannotate.main module. Adjust the Docker run command if you need to execute a different script or pass additional arguments.

Usage

To use Seq2Feature, run the annotation script, which draws on the annotation databases for feature extraction.

Script Overview

Seq2Feature/seqannotate/main.py: Runs the annotation using databases such as GenoLIB, FPbase, and Swiss-Prot. For more details, refer to this paper.
Seq2Feature/seqannotate/resources.py: Contains annotated resources used for feature extraction and annotation.
Seq2Feature/gene_main.py: Demonstrates how to use the seqannotate package to process and annotate gene data.

Example

To process and annotate your gene data, follow these steps:

Update read_loc and save_loc in Seq2Feature/gene_main.py with the paths to your input CSV file and where you want to save the annotated files.
Run the script:
```
python Seq2Feature/gene_main.py
```
This script reads your input CSV, annotates each sequence, and saves the results in the specified location.
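
The read, annotate, save loop described above can be sketched as follows. This is a minimal stand-in, not the contents of gene_main.py: the `sequence` column name, the `annotation_<n>.csv` output naming, and the `annotate_csv` and `toy_annotate` helpers are assumptions, and `toy_annotate` only reports length and GC fraction where the real script would call the seqannotate package.

```python
import csv
import tempfile
from pathlib import Path


def annotate_csv(read_loc, save_loc, annotate_fn, seq_column="sequence"):
    """Annotate every sequence in a CSV and write one result file per row."""
    out_dir = Path(save_loc)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(read_loc, newline="") as handle:
        rows = list(csv.DictReader(handle))
    written = []
    for index, row in enumerate(rows, start=1):
        hits = annotate_fn(row[seq_column])  # list of dicts, one per feature
        out_path = out_dir / f"annotation_{index}.csv"
        with open(out_path, "w", newline="") as out:
            fieldnames = sorted({key for hit in hits for key in hit})
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(hits)
        written.append(out_path)
        print(f"[{index}/{len(rows)}] wrote {out_path.name}")  # simple progress line
    return written


def toy_annotate(sequence):
    """Stand-in annotator (assumption): reports length and GC fraction only."""
    gc = sum(base in "GCgc" for base in sequence)
    return [{"length": len(sequence), "gc_fraction": gc / max(len(sequence), 1)}]


# Demo in a throwaway directory so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    read_loc = Path(tmp) / "genes.csv"
    read_loc.write_text("name,sequence\np1,ATGC\np2,GGCC\n")
    paths = annotate_csv(read_loc, Path(tmp) / "annotated", toy_annotate)
    second_file_text = paths[1].read_text()
```

Passing the annotator in as a function keeps the loop independent of any particular annotation backend, so the same skeleton works with a real annotator once its call signature is known.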

Configuration

read_loc: Path to the CSV file containing the sequences to be annotated.
save_loc: Directory where annotated files will be saved.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any enhancements or bug fixes.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for details.

Acknowledgements

This project is inspired by pLannotate. Special thanks to the authors of pLannotate for their foundational work in sequence annotation.

Contact

For questions or support, please contact [email protected].
