nch-igm/CNVoyant

CNVoyant

A collection of tools to annotate Copy Number Variants (CNVs), predict their clinical significance, and explain those predictions. Models were trained with the January 2023 version of ClinVar. Separate models were trained for deletions and duplications. To read more about features and benchmarking results, please see our recent publication in JOURNAL_LINK. Here is the graphical abstract of the project:

[image: graphical abstract]

Dependencies

Python dependencies are managed with conda. The recommended way to create an environment with all required dependencies is with conda or mamba (a faster drop-in replacement for conda). Create a new environment containing CNVoyant with this command:

mamba create -n CNVoyant -c conda-forge -c bioconda python=3.10 schuetz.12::cnvoyant
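After creating and activating the environment, it can be useful to confirm that the package is visible to Python before running the steps below. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and the import name `CNVoyant` is taken from the examples in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "CNVoyant" is the import name used throughout this README.
    if is_installed("CNVoyant"):
        print("CNVoyant is importable in this environment.")
    else:
        print("CNVoyant was not found; check that the conda environment is active.")
```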

Download Databases

CNVoyant requires ClinVar, conservation scores, functional region boundaries, gnomAD SV, and a GRCh38 reference genome to annotate input CNVs. To download these resources, specify a dependency directory, pass it to a DependencyBuilder object, and call its build_all method.

from CNVoyant import DependencyBuilder

data_dir = '/path/to/cnvoyant_dependencies'
db = DependencyBuilder(data_dir)
db.build_all()
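Because the download writes several databases to disk, it may help to create the dependency directory and check free space first. The sketch below is an assumption-laden helper, not part of CNVoyant: `prepare_data_dir` and its `min_free_gb` parameter are hypothetical, and this README does not state how much space the resources need, so the threshold defaults to zero and is left to the user.

```python
import shutil
from pathlib import Path


def prepare_data_dir(path, min_free_gb=0.0):
    """Create the directory if needed and check free disk space.

    min_free_gb is a user-chosen threshold (hypothetical); the README
    does not document the actual size of the downloaded resources.
    """
    data_path = Path(path).expanduser()
    data_path.mkdir(parents=True, exist_ok=True)

    # shutil.disk_usage reports total, used, and free space in bytes.
    free_gb = shutil.disk_usage(data_path).free / 1e9
    if free_gb < min_free_gb:
        raise OSError(
            f"Only {free_gb:.1f} GB free in {data_path}, "
            f"but {min_free_gb:.1f} GB were requested."
        )
    return data_path
```

The returned path can then be converted with `str(...)` and used as `data_dir` in the examples in this README.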

Build Features

Features must be generated before predictions can be made. To generate them, call the get_features method of a FeatureBuilder object.

import pandas as pd
from CNVoyant import FeatureBuilder

# Create sample data
cnv_df = pd.DataFrame({
  'CHROMOSOME': ['chr1','chr2','chr3','chr4','chr3'],
  'START': [100000, 100000, 100000, 100000, 179197182],
  'END': [200000, 200000, 200000, 200000, 179236784],
  'CHANGE': ['DEL','DEL','DUP','DUP','DEL']
})

# Initialize CNVoyant FeatureBuilder instance
fb = FeatureBuilder(variant_df = cnv_df, data_dir = data_dir)

# Generate features
fb.get_features()
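Malformed input is easier to catch before feature generation than after. The sketch below checks the column dictionary used to build the DataFrame. The rules it applies (a `chr` prefix, START less than END, and CHANGE being either `DEL` or `DUP`) are inferred from the sample data above and are assumptions, not documented CNVoyant requirements; `find_cnv_problems` is a hypothetical helper.

```python
REQUIRED_COLUMNS = ("CHROMOSOME", "START", "END", "CHANGE")
ALLOWED_CHANGES = {"DEL", "DUP"}  # assumption: only values seen in the examples


def find_cnv_problems(columns):
    """Return a list of human-readable problems found in a column dict."""
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    lengths = {len(columns[c]) for c in REQUIRED_COLUMNS}
    if len(lengths) != 1:
        return ["columns have different lengths"]

    problems = []
    rows = zip(*(columns[c] for c in REQUIRED_COLUMNS))
    for i, (chrom, start, end, change) in enumerate(rows):
        if not str(chrom).startswith("chr"):
            problems.append(f"row {i}: chromosome {chrom!r} lacks 'chr' prefix")
        if start >= end:
            problems.append(f"row {i}: START {start} is not less than END {end}")
        if change not in ALLOWED_CHANGES:
            problems.append(f"row {i}: unexpected CHANGE value {change!r}")
    return problems


if __name__ == "__main__":
    sample = {
        "CHROMOSOME": ["chr1", "chr2", "chr3", "chr4", "chr3"],
        "START": [100000, 100000, 100000, 100000, 179197182],
        "END": [200000, 200000, 200000, 200000, 179236784],
        "CHANGE": ["DEL", "DEL", "DUP", "DUP", "DEL"],
    }
    for problem in find_cnv_problems(sample):
        print(problem)
```

An empty list means no problems were detected under these assumed rules.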

Generate Predictions

Pretrained models are provided. To generate predictions, call the predict method of a Classifier object and pass it the feature table built in the previous step.

from CNVoyant import Classifier

# Initialize CNVoyant Classifier instance
cl = Classifier(data_dir)

# Generate predictions
cnvoyant_preds = cl.predict(fb.feature_df)

Retrain CNVoyant Classifier

The CNVoyant models can be retrained on a user-specified set of variants, provided that each variant has a label. Label values must be 'Benign', 'VUS', or 'Pathogenic'. The name of the label column must be passed to the train method of the Classifier object.

import pandas as pd
from CNVoyant import FeatureBuilder, Classifier

# Sample data
cnv_train_df = pd.DataFrame({
  'CHROMOSOME': ['chr1','chr2','chr3','chr4','chr3','chr8','chr8','chr8'],
  'START': [100000,100000,100000,100000,179197182,60680919,38458191,37878455],
  'END': [200000,200000,200000,200000,179236784,60738964,38470707,38884501],
  'CHANGE': ['DUP','DEL','DUP','DUP','DEL','DEL','DUP','DUP'],
  'LABEL': ['Benign','Benign','Benign','Benign','Pathogenic','VUS','VUS','Pathogenic']
})

# Initialize CNVoyant FeatureBuilder instance
fb_train = FeatureBuilder(variant_df = cnv_train_df, data_dir = data_dir)

# Generate features
fb_train.get_features()

# Initialize CNVoyant Classifier instance
cl_retrained = Classifier(data_dir)

# Retrain models
cl_retrained.train(fb_train.feature_df, label = 'LABEL')

# Generate predictions
cnvoyant_retrained_preds = cl_retrained.predict(fb.feature_df)
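Since label values are restricted to 'Benign', 'VUS', and 'Pathogenic', a quick check of the label column before retraining can catch typos and reveal class imbalance. This is a minimal standard-library sketch; `summarize_labels` is a hypothetical helper and not part of the CNVoyant API.

```python
from collections import Counter

ALLOWED_LABELS = ("Benign", "VUS", "Pathogenic")  # values stated in this README


def summarize_labels(labels):
    """Validate labels and return a count for each allowed class."""
    unexpected = sorted({label for label in labels if label not in ALLOWED_LABELS}, key=str)
    if unexpected:
        raise ValueError(f"unexpected label values: {unexpected}")
    counts = Counter(labels)
    return {label: counts.get(label, 0) for label in ALLOWED_LABELS}


if __name__ == "__main__":
    labels = ["Benign", "Benign", "Benign", "Benign",
              "Pathogenic", "VUS", "VUS", "Pathogenic"]
    print(summarize_labels(labels))
    # {'Benign': 4, 'VUS': 2, 'Pathogenic': 2}
```

With a pandas DataFrame, the same check can be applied to a label column converted to a list, for example `cnv_train_df['LABEL'].tolist()`.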

Generate CNVoyant Explanations

A key feature of CNVoyant is its ability to explain its clinical significance predictions. Explanations are given as SHAP force plots, which show which features drove the prediction of each class for the given CNV.

from CNVoyant import Explainer

cnv_coordinates = {
    'CHROMOSOME': 'chr3',
    'START': 179197182,
    'END': 179236784,
    'CHANGE': 'DEL'
}

expl = Explainer(
    cnv_coordinates = cnv_coordinates,
    output_dir = '/path/to/output',
    classifier = cl
    )

expl.explain()

The output looks like this:
[image: example explanation output]
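The Explainer example above takes one CNV as a dictionary, while the earlier examples hold CNVs in columns. The sketch below converts a column dictionary into one dictionary per CNV, which is the shape `cnv_coordinates` has in the example. The helper `columns_to_records` is hypothetical, and the commented loop is only an assumption about how several CNVs might be explained in turn; this README does not say whether output files from successive calls would overwrite each other.

```python
def columns_to_records(columns):
    """Turn {'COL': [v1, v2, ...], ...} into [{'COL': v1, ...}, {'COL': v2, ...}, ...]."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


if __name__ == "__main__":
    cnvs = {
        "CHROMOSOME": ["chr1", "chr3"],
        "START": [100000, 179197182],
        "END": [200000, 179236784],
        "CHANGE": ["DEL", "DEL"],
    }
    for record in columns_to_records(cnvs):
        print(record)
        # Assumed usage, mirroring the single-CNV example above:
        # expl = Explainer(cnv_coordinates=record,
        #                  output_dir='/path/to/output',
        #                  classifier=cl)
        # expl.explain()
```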
