
[Step 2] Background selection

Mikhail Dozmorov edited this page Feb 24, 2016 · 1 revision

GenomeRunner uses a background of genomic regions to estimate associations between SNPs and regulatory features. The background may consist of all currently reported SNPs (useful for analyzing SNP sets from genome-wide association studies), or of all SNPs on a microarray chip (e.g., ImmunoChip, MetaboChip). Think of the background as all SNPs tested in a GWA study, and of the SNP sets (FOIs) as subsets of the background that are significantly associated with a disease/phenotype.

Why do we need to care about the background? Because neither regulatory features nor SNPs are located randomly in the genome. Therefore, we need to know the genomic locations of all SNPs assessed in the study (the background) to properly estimate whether the SNPs of interest are enriched or depleted in regulatory features compared with SNPs randomly selected from the background.
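To make the role of the background concrete, here is a minimal sketch of one common way to score enrichment against a background: a one-sided hypergeometric test. This is an illustration only and is not necessarily the test GenomeRunner runs. The function name, parameter names, and the counts in the usage lines are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_background_in_feature,
                           n_foi, n_foi_in_feature):
    """One-sided enrichment p-value (illustrative sketch, hypothetical names).

    Returns the probability of seeing at least `n_foi_in_feature` overlaps
    when `n_foi` SNPs are drawn without replacement from a background of
    `n_background` SNPs, of which `n_background_in_feature` overlap the
    regulatory feature.
    """
    total = comb(n_background, n_foi)
    upper = min(n_foi, n_background_in_feature)
    tail = sum(
        comb(n_background_in_feature, k)
        * comb(n_background - n_background_in_feature, n_foi - k)
        for k in range(n_foi_in_feature, upper + 1)
    )
    return tail / total


# Made-up counts: a background of 1000 SNPs, 100 of which overlap the feature,
# and 20 SNPs of interest, 8 of which overlap the feature.
p_value = hypergeom_enrichment_p(1000, 100, 20, 8)
print(p_value)
```

Note that the result depends on both background counts: the same 8 of 20 overlaps would score differently against a background in which the feature is more or less common, which is why the choice of background matters.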

By default, all common SNPs from the latest organism-specific database are used as the background, which is suitable when a genome-wide study was performed. The SNPs of interest should be a subset of the background; otherwise the p-values may be incorrect. When a microarray was used for SNP profiling, it is advisable to upload all SNPs on that array as the background.
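Before uploading, it may be worth checking the subset requirement yourself. The sketch below assumes SNPs are identified by ID strings (such as rsIDs) that are spelled the same way in both lists; the function name and the example IDs are hypothetical, and this is not part of GenomeRunner.

```python
def snps_missing_from_background(foi_ids, background_ids):
    """Return the SNPs of interest that are absent from the background.

    Both arguments are iterables of SNP ID strings (hypothetical input format).
    An empty result means the SNPs of interest are a subset of the background.
    """
    background = set(background_ids)
    return sorted(snp for snp in set(foi_ids) if snp not in background)


# Made-up IDs for illustration.
missing = snps_missing_from_background(
    ["rs1", "rs2", "rs3"],
    ["rs1", "rs3", "rs4"],
)
if missing:
    print("Not in background:", ", ".join(missing))
```

If any SNPs are reported, either add them to the background or remove them from the set of interest before running the analysis.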