A collection of tools to annotate Copy Number Variants (CNVs), predict their clinical significance, and explain those predictions. Models were trained with the January 2023 version of ClinVar. Separate models were trained for deletion and duplication CNVs. To read more about features and benchmarking results, please see our recent publication in JOURNAL_LINK.
[Figure: graphical abstract of the project]
Python dependencies are handled via the conda package manager. The easiest way to create an environment with all required dependencies is with conda or mamba (a faster drop-in replacement for conda). Create a new environment containing CNVoyant with this command:
mamba create -n CNVoyant -c conda-forge -c bioconda python=3.10 schuetz.12::cnvoyant
CNVoyant requires ClinVar, conservation scores, functional region boundaries, gnomAD SV, and a GRCh38 reference genome to annotate input CNVs. To download these resources, pass a dependency directory to the DependencyBuilder object and call its build_all method.
from CNVoyant import DependencyBuilder
data_dir = '/path/to/cnvoyant_dependencies'
db = DependencyBuilder(data_dir)
db.build_all()
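Since build_all writes several genomic resources into the dependency directory, it may be worth confirming that the directory exists and is writable before starting the download. The following is a minimal sketch using only the standard library; `prepare_data_dir` is a hypothetical helper and not part of CNVoyant.

```python
import os
from pathlib import Path


def prepare_data_dir(path):
    """Create the dependency directory if needed and confirm it is writable.

    Hypothetical helper (not part of CNVoyant). Returns the absolute path
    as a string, matching the string path used in the example above.
    """
    data_dir = Path(path).expanduser().resolve()
    # Create the directory and any missing parents; no error if it exists.
    data_dir.mkdir(parents=True, exist_ok=True)
    if not os.access(data_dir, os.W_OK):
        raise PermissionError(f"Cannot write to {data_dir}")
    return str(data_dir)
```

The returned string could then be used as data_dir when constructing DependencyBuilder.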
CNVoyant features must be generated before predictions can be made. Generate them by calling the get_features method of the FeatureBuilder object.
import pandas as pd
from CNVoyant import FeatureBuilder
# Create sample data
cnv_df = pd.DataFrame({
    'CHROMOSOME': ['chr1','chr2','chr3','chr4','chr3'],
    'START': [100000, 100000, 100000, 100000, 179197182],
    'END': [200000, 200000, 200000, 200000, 179236784],
    'CHANGE': ['DEL','DEL','DUP','DUP','DEL']
})
# Initialize CNVoyant FeatureBuilder instance
fb = FeatureBuilder(variant_df = cnv_df, data_dir = data_dir)
# Generate features
fb.get_features()
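Before calling get_features, it can help to check that the input table has the shape used in the example above. The sketch below works on plain dictionaries and is not part of CNVoyant: `find_invalid_cnvs` is a hypothetical helper, and the rules it enforces (a 'chr' prefix, START less than END, CHANGE limited to 'DEL' or 'DUP') are assumptions inferred from the sample data rather than documented requirements.

```python
REQUIRED_COLUMNS = ("CHROMOSOME", "START", "END", "CHANGE")
ALLOWED_CHANGES = {"DEL", "DUP"}  # the two CNV types used in this README


def find_invalid_cnvs(records):
    """Return (index, reason) pairs for records that look malformed."""
    problems = []
    for i, rec in enumerate(records):
        missing = [col for col in REQUIRED_COLUMNS if col not in rec]
        if missing:
            problems.append((i, "missing columns: " + ", ".join(missing)))
            continue
        # Assumption: chromosomes use the 'chr' prefix, as in the examples.
        if not str(rec["CHROMOSOME"]).startswith("chr"):
            problems.append((i, "chromosome lacks 'chr' prefix"))
        # Assumption: coordinates are positive and START precedes END.
        if not 0 < int(rec["START"]) < int(rec["END"]):
            problems.append((i, "START must be positive and less than END"))
        if rec["CHANGE"] not in ALLOWED_CHANGES:
            problems.append((i, "CHANGE must be 'DEL' or 'DUP'"))
    return problems


sample = [
    {"CHROMOSOME": "chr3", "START": 179197182, "END": 179236784, "CHANGE": "DEL"},
    {"CHROMOSOME": "3", "START": 200000, "END": 100000, "CHANGE": "INS"},
]
for index, reason in find_invalid_cnvs(sample):
    print(f"row {index}: {reason}")
```

For a pandas DataFrame such as cnv_df, the records could be obtained with cnv_df.to_dict('records').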
Pretrained models are available to generate predictions. Generate them by passing the feature table to the predict method of the Classifier object.
from CNVoyant import Classifier
# Initialize CNVoyant Classifier instance
cl = Classifier(data_dir)
# Generate predictions
cnvoyant_preds = cl.predict(fb.feature_df)
The CNVoyant models can be retrained on a specified set of variants, provided that a label is available for each variant. Label values must be 'Benign', 'VUS', or 'Pathogenic'. The name of the label column must be passed to the train method of the Classifier object.
import pandas as pd
from CNVoyant import FeatureBuilder, Classifier
# Sample data
cnv_train_df = pd.DataFrame({
    'CHROMOSOME': ['chr1','chr2','chr3','chr4','chr3','chr8','chr8','chr8'],
    'START': [100000,100000,100000,100000,179197182,60680919,38458191,37878455],
    'END': [200000,200000,200000,200000,179236784,60738964,38470707,38884501],
    'CHANGE': ['DUP','DEL','DUP','DUP','DEL','DEL','DUP','DUP'],
    'LABEL': ['Benign','Benign','Benign','Benign','Pathogenic','VUS','VUS','Pathogenic']
})
# Initialize CNVoyant FeatureBuilder instance
fb_train = FeatureBuilder(variant_df = cnv_train_df, data_dir = data_dir)
# Generate features
fb_train.get_features()
# Initialize CNVoyant Classifier instance
cl_retrained = Classifier(data_dir)
# Retrain models
cl_retrained.train(fb_train.feature_df, label = 'LABEL')
# Generate predictions
cnvoyant_retrained_preds = cl_retrained.predict(fb.feature_df)
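Because labels are restricted to three values and separate models are trained for deletions and duplications, it may be useful to inspect the label distribution per CNV type before retraining. The sketch below is one possible check using only the standard library; `summarize_labels` is a hypothetical helper, not part of CNVoyant, and whether retraining needs every class represented in each CNV type is not stated here.

```python
from collections import Counter

ALLOWED_LABELS = {"Benign", "VUS", "Pathogenic"}


def summarize_labels(changes, labels):
    """Count labels per CNV type, rejecting values outside the allowed set."""
    unexpected = sorted(set(labels) - ALLOWED_LABELS, key=str)
    if unexpected:
        raise ValueError(f"Unexpected label values: {unexpected}")
    counts = {}
    for change, label in zip(changes, labels):
        # One Counter per CNV type ('DEL' or 'DUP' in the sample data).
        counts.setdefault(change, Counter())[label] += 1
    return counts


# Same CHANGE and LABEL values as the retraining sample data.
changes = ['DUP', 'DEL', 'DUP', 'DUP', 'DEL', 'DEL', 'DUP', 'DUP']
labels = ['Benign', 'Benign', 'Benign', 'Benign',
          'Pathogenic', 'VUS', 'VUS', 'Pathogenic']

for change, counter in summarize_labels(changes, labels).items():
    print(change, dict(counter))
```

With a DataFrame, the two arguments could be the CHANGE and LABEL columns, for example cnv_train_df['CHANGE'] and cnv_train_df['LABEL'].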
A key feature of CNVoyant is the ability to provide reasoning behind the provided clinical significance predictions. Explanations are provided via SHAP force plots, which indicate which features drove the prediction of each class for the provided CNV.
from CNVoyant import Explainer
cnv_coordinates = {
    'CHROMOSOME': 'chr3',
    'START': 179197182,
    'END': 179236784,
    'CHANGE': 'DEL'
}
expl = Explainer(
    cnv_coordinates = cnv_coordinates,
    output_dir = '/path/to/output',
    classifier = cl
)
expl.explain()