ylab-hi/DeepChopper

🧬 DeepChopper uses a genomic language model to detect and chop artificial adapter sequences that can cause chimeric reads, yielding higher-quality, more reliable sequencing results. It integrates with existing workflows and is intended for researchers and bioinformaticians working with Nanopore direct-RNA sequencing data.

🚀 Quick Start: Try DeepChopper Online

Experience DeepChopper instantly through our user-friendly web interface. No installation required! Simply click the button below to launch the web application and start exploring DeepChopper's capabilities:

Open in Hugging Face Spaces

What you can do online:

  • 📤 Upload your sequencing data
  • 🔬 Run DeepChopper's analysis
  • 📊 Visualize results
  • 🎛️ Experiment with different parameters

Perfect for quick tests or demonstrations! However, for extensive analyses or custom workflows, we recommend installing DeepChopper locally.

⚠️ Note: The online version is limited to one FASTQ record at a time and may not be suitable for large-scale projects.
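
Because the online version accepts a single FASTQ record, it can help to pull one record out of a larger file before pasting it in. The sketch below is one way to do that in plain Python. It assumes the common four-line-per-record FASTQ layout (no wrapped sequence lines), and the helper name `first_fastq_record` is hypothetical, not part of DeepChopper.

```python
import io
from typing import TextIO


def first_fastq_record(handle: TextIO) -> str:
    """Return the first record of a 4-line-per-record FASTQ stream as text.

    Assumption: each record is exactly four lines (header, sequence,
    separator, quality). Wrapped multi-line FASTQ is not handled.
    """
    lines = [handle.readline().rstrip("\r\n") for _ in range(4)]
    header, sequence, separator, quality = lines
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("input does not look like 4-line FASTQ")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return "\n".join(lines) + "\n"
```

For a file on disk, something like `with open("raw.fq") as fh: print(first_fastq_record(fh), end="")` would print the record to copy into the web interface (`raw.fq` is a placeholder path).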

📦 Installation

DeepChopper can be installed using pip, the Python package installer. Follow these steps to install:

  1. Ensure you have Python 3.10 or later installed on your system.

  2. Create a virtual environment (recommended):

    python -m venv deepchopper_env
    source deepchopper_env/bin/activate  # On Windows use `deepchopper_env\Scripts\activate`
  3. Install DeepChopper:

    pip install deepchopper
  4. Verify the installation:

    deepchopper --help

Compatibility and Support

DeepChopper is designed to work across various platforms and Python versions. The compatibility matrix for PyPI installations is shown below:

Python Version | Linux x86_64 | macOS Intel | macOS Apple Silicon | Windows x86_64
3.10           | ✅           | ✅          | ✅                  | ✅
3.11           | ✅           | ✅          | ✅                  | ✅
3.12           | ✅           | ✅          | ✅                  | ✅
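
If you want to check an interpreter against this matrix before installing, a small script along these lines may help. It is only a sketch: the function name `check_python` and the three status labels are made up for illustration, and treating versions newer than 3.12 as "untested" is an assumption based on the matrix listing only 3.10 to 3.12.

```python
import sys
from typing import Optional, Sequence

# Python versions listed in the compatibility matrix above.
TESTED_VERSIONS = {(3, 10), (3, 11), (3, 12)}
# Minimum version stated in the installation steps.
MINIMUM_VERSION = (3, 10)


def check_python(version: Optional[Sequence[int]] = None) -> str:
    """Classify a Python version as 'unsupported', 'tested', or 'untested'."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor < MINIMUM_VERSION:
        return "unsupported"
    if major_minor in TESTED_VERSIONS:
        return "tested"
    # Assumption: newer than anything in the matrix, so not listed there.
    return "untested"
```

Calling `check_python()` with no argument inspects the running interpreter.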

🆘 Trouble installing? Check our Troubleshooting Guide or open an issue.

🛠️ Usage

For a comprehensive guide, check out our full tutorial. Here's a quick overview:

Command-Line Interface

DeepChopper offers three main commands: encode, predict, and chop.

  1. Encode your input data:

    deepchopper encode <input.fq>
  2. Predict chimera artifacts:

    deepchopper predict <input.parquet> --output predictions

    Using GPUs? Add the --gpus flag:

    deepchopper predict <input.parquet> --output predictions --gpus 2
  3. Chop chimera artifacts:

    deepchopper chop <predictions> raw.fq
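
The three steps above can also be chained from a Python script by shelling out to the CLI. The following is a minimal sketch rather than an official wrapper: `build_commands` and `run_pipeline` are hypothetical helper names, only the subcommands and flags shown above are used, and every path is supplied by the caller because this README does not state where `encode` writes its Parquet file or which path under the prediction output `chop` expects.

```python
import subprocess
from typing import List, Optional


def build_commands(fastq: str, parquet: str, output: str,
                   predictions: str, gpus: Optional[int] = None) -> List[List[str]]:
    """Build argv lists for the encode, predict, and chop steps.

    Assumptions: `parquet` is the file produced by `encode`, `output` is
    the value passed to `--output`, and `predictions` is the path that
    `chop` should read. All three must be provided by the caller.
    """
    encode = ["deepchopper", "encode", fastq]
    predict = ["deepchopper", "predict", parquet, "--output", output]
    if gpus is not None:
        predict += ["--gpus", str(gpus)]
    chop = ["deepchopper", "chop", predictions, fastq]
    return [encode, predict, chop]


def run_pipeline(fastq: str, parquet: str, output: str,
                 predictions: str, gpus: Optional[int] = None) -> None:
    """Run the three steps in order, stopping at the first failure."""
    for command in build_commands(fastq, parquet, output, predictions, gpus):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it easy to print the commands for a dry run before anything is launched.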

Want a GUI? Launch the web interface (note: limited to one FASTQ record at a time):

deepchopper web

Python Library

Integrate DeepChopper into your Python scripts:

import deepchopper

model = deepchopper.DeepChopper.from_pretrained("yangliz5/deepchopper")
# Your analysis code here

📚 Cite

If DeepChopper aids your research, please cite our paper:

@article {Li2024.10.23.619929,
        author = {Li, Yangyang and Wang, Ting-You and Guo, Qingxiang and Ren, Yanan and Lu, Xiaotong and Cao, Qi and Yang, Rendong},
        title = {A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing},
        elocation-id = {2024.10.23.619929},
        year = {2024},
        doi = {10.1101/2024.10.23.619929},
        publisher = {Cold Spring Harbor Laboratory},
        abstract = {Chimera artifacts in nanopore direct RNA sequencing (dRNA-seq) data can confound transcriptome analyses, yet no existing tools are capable of detecting and removing them due to limitations in basecalling models. We present DeepChopper, a genomic language model that accurately identifies and eliminates adapter sequences within base-called dRNA-seq reads, effectively removing chimeric read artifacts. DeepChopper significantly improves critical downstream analyses, including transcript annotation and gene fusion detection, enhancing the reliability and utility of nanopore dRNA-seq for transcriptomics research. Competing Interest Statement: The authors have declared no competing interest.},
        URL = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929},
        eprint = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929.full.pdf},
        journal = {bioRxiv}
}

🀝 Contribution

We welcome contributions! Here's how to set up your development environment:

Build Environment

git clone https://github.com/ylab-hi/DeepChopper.git
cd DeepChopper
conda env create -f environment.yaml
conda activate deepchopper

Install Dependencies

pip install pipx
pipx install --suffix @master git+https://github.com/python-poetry/poetry.git@master
poetry@master install

🎉 Ready to contribute? Check out our Contribution Guidelines to get started!

📬 Support

Need help? Have questions?


DeepChopper is developed with ❤️ by the YLab team. Happy sequencing! 🧬🔬