Quality control of BBS data #16

mdavy86 · 2014-12-16T02:18:02Z

This is a placeholder to discuss what we are doing for quality control of GBS data.

Plant and Food Research

We have some Perl scripts, knitr R Markdown scripts, and a Shiny application that look at quality control aspects of GBS restriction sites in BAM alignments.

The Shiny application does exploratory analysis in real time, summarising 96 wells * 2 BAM files across ~1.5 million restriction sites/tags. It checks the sampled yield distributions against the known population of restriction sites for the samples, and investigates coverage depth and fragment distribution before SNP discovery is considered.

The Perl script sanity-checks restriction fragments (probably unnecessary) and summarises sites in the following form:

$ perl gbsSites.pl
NAME
    gbsSites.pl - BAM to location terminal ends

DESCRIPTION
    Process a bam file for GBS restriction sites

SYNOPSIS
     gbsSites.pl [options]

    Where options and [defaults] are:

     -bam <BAM file>    Path to a bam file. Multiple options allowed      []

     -enzyme <Enzyme name> Which restriction enzyme? BamHI, ApeKI etc     [BamHI]

     -format < narrow|wide > Options: 'wide' or 'narrow' formats          [wide]

     -out <output file> Filename for tab delimited report                 [report.txt]

## Example output
Sample  Chromosome      cutSite Count   fwdCount        revCompCount
[BAMFile]   1       8312    1       0       1
[BAMFile]   1       17201   340     340     0
[BAMFile]   1       33026   2       0       2
[BAMFile]   1       35031   1       1       0
[BAMFile]   1       50458   54      0       54
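
For anyone wanting to poke at a report like this outside the Shiny app, here is a minimal Python sketch that reads the wide format and gives a few per-sample numbers. It is not part of gbsSites.pl; the function name `summarise_sites` and the `min_depth` threshold are made up for illustration, and the rows are simply the example output above, assumed to be tab delimited.

```python
import csv
import io

# Rows copied from the example output above; the report is assumed tab delimited.
REPORT = """Sample\tChromosome\tcutSite\tCount\tfwdCount\trevCompCount
[BAMFile]\t1\t8312\t1\t0\t1
[BAMFile]\t1\t17201\t340\t340\t0
[BAMFile]\t1\t33026\t2\t0\t2
[BAMFile]\t1\t35031\t1\t1\t0
[BAMFile]\t1\t50458\t54\t0\t54
"""

def summarise_sites(handle, min_depth=2):
    """Return (total reads, sites seen, sites with depth >= min_depth,
    share of all reads held by the deepest site)."""
    counts = [int(row["Count"]) for row in csv.DictReader(handle, delimiter="\t")]
    total = sum(counts)
    covered = sum(1 for c in counts if c >= min_depth)
    top_share = max(counts) / total if total else 0.0
    return total, len(counts), covered, top_share

total, n_sites, covered, top_share = summarise_sites(io.StringIO(REPORT))
print(f"{total} reads over {n_sites} sites; {covered} sites with depth >= 2; "
      f"deepest site holds {top_share:.1%} of reads")
```

The share held by the deepest site is a quick way to spot cut sites that soak up most of the reads.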


rbrauning · 2015-01-27T01:20:57Z

To enable biologists and lab staff to contribute to QC efforts, I've put together questions of interest to ask of a GBS run. Technical details are left out to draw non-bifos in.

Fastq
- Did each lane deliver the promised output?
- What does the sequence quality look like?
- How pure is the data (adapters, other species)? What are the contaminants?
Barcodes
- How many reads have recognizable barcodes?
- What are the reads without barcodes?
- Are all barcodes represented equally?
- Are negative controls blank?
Mapping
- How many reads can get mapped to a reference?
- What does the mapping quality look like?
- How much of the genome gets covered by reads?
- What does the coverage depth distribution look like?
- What does the theoretical fragment size distribution look like? Contrast to observed fragment size distribution.
- How many reads do we see per fragment? Are there fragments that absorb most of the reads?
- Do the reads map within 100bp of the fragment ends?
- What do the start and end sequences of fragments look like in theory, and what is actually observed?
SNPs
- How many SNPs do we see per sample?
- Do GBS SNP calls agree with SNP chip data / WGS data?
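
As a rough illustration of the "theoretical fragment size distribution" question under Mapping, an in-silico digest can be sketched in a few lines of Python. This is only a sketch: the `ENZYMES` table and the function names are hypothetical, only internal fragments (cut at both ends) are counted, and a real run would read the reference from FASTA rather than a toy string.

```python
import re
from collections import Counter

# Recognition motif (as a regex) and offset of the cut from the motif start
# on the top strand. Hypothetical lookup table for illustration.
ENZYMES = {
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "ApeKI": ("GC[AT]GC", 1),  # G^CWGC
}

def cut_positions(seq, enzyme):
    """0-based positions where the enzyme cuts the top strand."""
    motif, offset = ENZYMES[enzyme]
    # Lookahead so that overlapping recognition sites are all reported.
    return [m.start() + offset for m in re.finditer(f"(?=({motif}))", seq.upper())]

def fragment_lengths(seq, enzyme):
    """Lengths of internal fragments, i.e. those bounded by a cut on both sides."""
    cuts = cut_positions(seq, enzyme)
    return [b - a for a, b in zip(cuts, cuts[1:])]

def size_histogram(lengths, bin_width=100):
    """Count fragments per size bin, keyed by the lower bound of each bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)

# Toy sequence with three BamHI sites.
toy = "AAA" + "GGATCC" + "T" * 5 + "GGATCC" + "A" * 8 + "GGATCC" + "TT"
print(cut_positions(toy, "BamHI"))
print(fragment_lengths(toy, "BamHI"))
```

The same histogram built from observed fragments (for example from mapped read starts) can then be laid next to the theoretical one.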

mdavy86 · 2015-01-27T04:04:19Z

That's good; many of the questions cover more detail than the last meeting minutes did.

We have some code investigating post-alignment QC: fragment distributions modelled as an exponential decay (where applicable), size-selection bias relative to the population of known tag sites, depth distribution, and a REML mixed-model analysis of 96 technical samples for 6 genotypes.
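
To make the exponential decay idea concrete, here is a minimal Python sketch (not the code referred to above, and the function names are hypothetical). It assumes fragment lengths in the population of known sites are roughly exponential, fits the rate by maximum likelihood (1 / mean), and compares the observed share of fragments in a size window with the share the fit predicts. The lengths in the usage lines are toy numbers, not real data.

```python
import math

def fit_exponential_rate(lengths):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean."""
    return len(lengths) / sum(lengths)

def expected_fraction(rate, lo, hi):
    """P(lo <= X < hi) for X ~ Exponential(rate)."""
    return math.exp(-rate * lo) - math.exp(-rate * hi)

def enrichment(observed_lengths, population_lengths, lo, hi):
    """Observed share of fragments in [lo, hi) divided by the share expected
    from an exponential fit to the population; values above 1 suggest the
    window is over-represented, e.g. through size selection."""
    rate = fit_exponential_rate(population_lengths)
    observed = sum(lo <= n < hi for n in observed_lengths) / len(observed_lengths)
    return observed / expected_fraction(rate, lo, hi)

# Toy numbers only: a made-up population and a sample skewed to 100-200 bp.
population = [100, 200, 300]
sample = [150, 160]
print(enrichment(sample, population, 100, 200))
```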

lranjard · 2015-02-03T02:07:08Z

Link to fastq_screen, a utility that subsamples reads in FASTQ files to check for contamination against a configurable set of Bowtie2 genome indexes:
http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/

mdavy86 added the question label Dec 16, 2014
