FAQs
How can I use hg38?
Specify the reference genome in make_matrix() by setting ref_genome_name to 'hg19' or 'hg38'.
The COSMIC signature catalog version
The MVA models are trained using COSMIC catalog v2, and that version should be used to get accurate MVA predictions. However, apart from the MVA predictions, the other functionalities of SigMA can be used with the v3 catalog by setting cosmic_version = 'v3' in run().
Can I tune a new model?
Yes, see here.
What is the meaning of different signature columns?
Several measures are produced for each signature; they are discussed here. Some more details are given below.
Signature_3_l_rat is the ratio: (probability of the decomposition with Sig3) / (probability of the decomposition with Sig3 + probability of the decomposition without Sig3)
A value of 0.5 indicates that the mutations can be decomposed equally well without Sig3, using only the other signatures in the catalog. A value above 0.5 indicates that the decomposed spectrum explains the mutations better when Sig3 is included in the decomposition.
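Written as a formula, the ratio and a worked example look as follows. The symbols P_with and P_without are shorthand introduced here for the two probabilities named above, and the numbers are invented purely for illustration.

```latex
\[
\texttt{Signature\_3\_l\_rat}
  = \frac{P_{\text{with}}}{P_{\text{with}} + P_{\text{without}}}
\]
% Illustrative numbers only, not taken from any SigMA output:
\[
P_{\text{with}} = 0.03,\quad P_{\text{without}} = 0.01
\;\Rightarrow\;
\frac{0.03}{0.03 + 0.01} = 0.75
\]
```

When the two probabilities are equal the ratio is exactly 0.5, which is why 0.5 marks the point where including Sig3 neither helps nor hurts.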
To determine a threshold, you can run the tuning example code for your dataset, which creates a file <input_file_name>simulation_predictions.csv. Using this file, you can check the false positive rate for different thresholds on each variable. The file has an is_sig3 column, which gives the Sig3 status in WGS. You can then count how many Sig3+ samples and how many Sig3- samples have Signature_3_l_rat values above your threshold.
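The check described above can be sketched in base R. This is a minimal illustration, not part of SigMA: the data frame is an invented stand-in for the simulation predictions file, the helper rates_at() is hypothetical, and the encoding of is_sig3 as 1/0 is an assumption.

```r
# Toy stand-in for <input_file_name>simulation_predictions.csv.
# All values are invented; is_sig3 encoded as 1/0 is an assumption.
sim <- data.frame(
  is_sig3 = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  Signature_3_l_rat = c(0.95, 0.80, 0.70, 0.40,
                        0.65, 0.50, 0.30, 0.20, 0.10, 0.05)
)

# Hypothetical helper: fraction of Sig3+ and Sig3- samples whose value
# lies strictly above the threshold.
rates_at <- function(df, var, threshold, truth = "is_sig3") {
  positive <- as.logical(df[[truth]])
  above <- df[[var]] > threshold
  c(sensitivity = mean(above[positive]),  # Sig3+ samples above threshold
    fpr = mean(above[!positive]))         # Sig3- samples above threshold
}

# Compare a few candidate thresholds side by side.
sapply(c(0.5, 0.6, 0.7), function(t) rates_at(sim, "Signature_3_l_rat", t))
```

With a real file you would replace the toy data frame with the result of read.csv() on your simulation predictions file and pass any other measure column as var.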
Signature_3_c*_ml columns indicate the likelihood of Sig3+ clusters in WGS data. If you run in the lite mode you will get a Signature_3_ml column, which is the sum of all the Signature_3_c*_ml values. It indicates how likely the sample is to match a Sig3+ cluster in WGS. This value is independent of the NNLS calculation, so it can also be helpful.
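If you ran in the full format and want the summed value yourself, the sum over the cluster columns can be sketched in base R as below. The column names and numbers in the toy data frame are hypothetical, chosen only to match the Signature_3_c*_ml pattern; the result column Signature_3_ml_sum is a name introduced here.

```r
# Toy output table; cluster column names and values are hypothetical.
out <- data.frame(
  tumor = c("s1", "s2"),
  Signature_3_c1_ml = c(0.10, 0.01),
  Signature_3_c2_ml = c(0.25, 0.02),
  Signature_3_c3_ml = c(0.05, 0.00),
  other_measure = c(0.60, 0.97)
)

# Select every column matching Signature_3_c*_ml and sum them per sample.
cluster_cols <- grep("^Signature_3_c.*_ml$", names(out), value = TRUE)
out$Signature_3_ml_sum <- rowSums(out[, cluster_cols, drop = FALSE])

out[, c("tumor", "Signature_3_ml_sum")]
```

The drop = FALSE argument keeps the selection a data frame even when only one cluster column matches, so rowSums() still works.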
How to interpret a given value of the measures? Which samples are Signature 3 positive?
- You can use the lite_format = T setting and look at the categ column in the output file.
- You can use the lite_format = F setting and use the pass_mva_strict or pass_mva columns for 10% and < 5% FPR settings. Note: if you have trained your own model the FPR values may be different.
- You can also determine the FPR for any other signature measure (e.g. likelihood columns ending with _ml) using the get_threshold() function and simulated data. For generating simulated data see the example macro and wiki documentation. E.g. for Sig3 exposure calculated with NNLS:
thresh <- get_threshold(df, limits = c(0.05, 0.1), var = 'exp_sig3', signal = 'is_true', cut_var = 'fpr')
cutoffs <- thresh$cutoff # cutoff values for that variable, e.g. exp_sig3 > cutoffs[1] corresponds to 0.05 FPR
thresh$sen # sensitivity at the corresponding cutoffs
thresh$fpr # false positive rate at the corresponding cutoffs
thresh$fdr # false discovery rate at the corresponding cutoffs
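To make clear what an FPR-based cutoff means, here is a small base R sketch that finds the cutoff by hand on invented data. It is not SigMA's get_threshold() implementation and may differ from it in details such as tie handling; the helper cutoff_for_fpr() is hypothetical, and only the column names exp_sig3 and is_true are taken from the call above.

```r
# Toy simulated data; all values are invented.
df <- data.frame(
  exp_sig3 = c(0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.25, 0.40,
               0.22, 0.35, 0.45, 0.50, 0.60),
  is_true = c(rep(FALSE, 10), rep(TRUE, 5))
)

# Hypothetical helper: smallest observed value that, used as a cutoff
# (value > cutoff counts as positive), keeps the FPR at or below the target.
cutoff_for_fpr <- function(values, is_signal, target_fpr) {
  negatives <- values[!as.logical(is_signal)]
  candidates <- sort(unique(values))
  # FPR is non-increasing as the cutoff grows, so the first hit is the smallest.
  fpr <- vapply(candidates, function(cut) mean(negatives > cut), numeric(1))
  candidates[which(fpr <= target_fpr)[1]]
}

limits <- c(0.05, 0.1)
cutoffs <- sapply(limits, function(t) cutoff_for_fpr(df$exp_sig3, df$is_true, t))

# Sensitivity reached at each cutoff.
sens <- sapply(cutoffs, function(cut) mean(df$exp_sig3[df$is_true] > cut))
data.frame(target_fpr = limits, cutoff = cutoffs, sensitivity = sens)
```

A stricter FPR target pushes the cutoff higher and lowers the sensitivity, which is the trade-off the sen and fpr values help you inspect.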