Skip to content

This is a collection of single-molecule, next-generation proteomics (NGP) simulation and analysis algorithms.

License

Notifications You must be signed in to change notification settings

magnuspalmblad/NGP

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NGP

This repository is a collection of simulation and analysis algorithms for single-molecule, next-generation proteomics (NGP). Specifically, the scripts can be used to study the influence of choice of labeled amino acids on the number of unique and the number of proteotypic reads for a given proteome.

The "simulations.R" script is has no inputs, but these parameters are defined in the beginning of the script:

Parameter Type Meaning
N_SIMULATIONS integer The number of simulations of 1-20 randomly chosen amino acids
MAX_LENGTH integer The maximum peptide read length considered
MISSED_CLEAVAGES integer The maximum number of missed cleavage sites
ENZYME string The proteolytic enzyme simulated (e.g. "trypsin")
N_PROTEINS integer The number of proteins from the database considered (-1 for all)
IDEAL boolean whether the sequencing is assumed to be ideal or containing errors

The outputs are matrices containing the selection of amino acids, the number of proteins lacking a proteotypic partial read and the number of unique possible reads.

The seed for the random number generator is set by set.seed(). Running simulations with the same seed but different parameters will repeat the simulations for the exact same selection of amino acids.

The "plots.R" makes pretty visualizations from the simulations using ggplot2. It also uses multidimensional linear regression to determine the relative discrimination power of the different amino acids. This R script was used to produce the figures in the JPR technical note describing the generic simulations of single-molecule/next-generation proteomics.

The "plot_cleavage_comparisons.R" analyzes and plots results from six matching 100X matching simulations (same 100 selections of 4, 5 and 6 amino acids), comparing the relative discrimination power of the 20 amino acids for trypsin (cleavage C-termially or R and K, not before P, and with the additional Keil rules) and CNBr (C-terminally of M) with 0 or 1 missed cleavage allowed.

The scripts are written in R for maximum readability. For deeper simulations, refactoring in C/C++ will likely be beneficial.

About

This is a collection of single-molecule, next-generation proteomics (NGP) simulation and analysis algorithms.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages