wingolab-org/GEMMA

Genetic associations identified in CFW mice using GEMMA (Parker et al, Nat. Genet., 2016)

GEMMA: Genome-wide Efficient Mixed Model Association

GEMMA is a software toolkit for fast application of linear mixed models (LMMs) and related models to genome-wide association studies (GWAS) and other large-scale data sets.

Check out NEWS.md to see what's new in each GEMMA release.

Please post feature requests or suspected bugs to GitHub issues. For questions or other discussion, please post to the GEMMA Google Group. We also encourage contributions, for example, by forking the repository, making your changes to the code, and issuing a pull request.

Currently, GEMMA is supported on 64-bit Mac OS X and Linux platforms. Windows is not currently supported, though you can run GEMMA in a Linux VM or container. If you are interested in helping to make GEMMA available on Windows platforms (e.g., by providing installation instructions for Windows, or by contributing Windows binaries), please post a note in the GitHub issues.

(The above image depicts physiological and behavioral trait loci identified in CFW mice using GEMMA, from Parker et al., Nature Genetics, 2016.)

Key features

  1. Fast association tests implemented using the univariate linear mixed model (LMM). In GWAS, this can correct for population structure and sample nonexchangeability. It also provides estimates of the proportion of variance in phenotypes explained by available genotypes (PVE), often called "chip heritability" or "SNP heritability".

  2. Fast association tests for multiple phenotypes implemented using a multivariate linear mixed model (mvLMM). In GWAS, this can correct for population structure and sample nonexchangeability jointly in multiple complex phenotypes.

  3. Bayesian sparse linear mixed model (BSLMM) for estimating PVE, phenotype prediction, and multi-marker modeling in GWAS.

  4. Estimation of variance components ("chip heritability") partitioned by different SNP functional categories, from either raw (individual-level) data or summary data. For raw data, GEMMA can estimate variance components with HE regression or the REML AI algorithm; for summary data, it uses the MQS algorithm.
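As a point of reference for features 1 and 4, the univariate LMM for a single tested SNP can be written as below, together with the PVE under the null model (no SNP effect). The notation is assumed here, following the GEMMA user manual rather than this README, so treat it as a hedged summary and not a specification of GEMMA's internals.

```latex
\begin{aligned}
\mathbf{y} &= W\boldsymbol{\alpha} + \mathbf{x}\beta + \mathbf{u} + \boldsymbol{\epsilon},
  \quad \mathbf{u} \sim \mathrm{MVN}_n(\mathbf{0}, \lambda\tau^{-1}K),
  \quad \boldsymbol{\epsilon} \sim \mathrm{MVN}_n(\mathbf{0}, \tau^{-1}I_n) \\
\mathrm{PVE} &= \frac{\lambda\,\mathrm{tr}(K)/n}{\lambda\,\mathrm{tr}(K)/n + 1}
\end{aligned}
```

Here y holds the phenotypes of the n individuals, W the covariates (including an intercept column), x the genotypes of the tested SNP with effect β, K the relatedness matrix, λ the ratio of the two variance components and τ^{-1} the residual variance. The association test asks whether β = 0, while the random effect u absorbs population structure.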

Installation

To install GEMMA you can

  1. Download the precompiled binaries (64-bit Linux and Mac only)

  2. Use existing package managers, see INSTALL.md.

  3. Compile GEMMA from source, see INSTALL.md.

Compiling from source takes more work, but can potentially boost performance of GEMMA when using specialized C++ compilers and numerical libraries.

Precompiled binaries

  1. Fetch the latest stable release and download the file appropriate for your platform.

  2. For .tar.bz2 files, unpack the tarball

         tar xvjf gemma-$version-installer.tar.bz2

     run the installer

         ./install.sh ~/gemma

     and run gemma

         ~/gemma/bin/gemma

  3. For .gz files, unpack the file with gunzip, e.g. gunzip gemma.linux.gz.

Run GEMMA

GEMMA is run from the command line. To list the available options, run

gemma -h

A typical example is

# compute Kinship matrix
gemma -g ../example/mouse_hs1940.geno.txt.gz -p ../example/mouse_hs1940.pheno.txt \
    -gk -o mouse_hs1940
# run univariate LMM
gemma -g ../example/mouse_hs1940.geno.txt.gz \
    -p ../example/mouse_hs1940.pheno.txt -n 1 -a ../example/mouse_hs1940.anno.txt \
    -k ./output/mouse_hs1940.cXX.txt -lmm -o mouse_hs1940_CD8_lmm
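The first command (`-gk`) writes a relatedness (kinship) matrix, which the second command reads back with `-k ./output/mouse_hs1940.cXX.txt`. The GEMMA manual describes this default "centered" matrix as K = (1/p) Σ_i (x_i − x̄_i)(x_i − x̄_i)^T, where x_i holds the genotypes of SNP i across the n individuals and p is the number of SNPs. The C++ below is a minimal sketch of that formula, assuming no missing genotypes and that everything fits in memory; the `Matrix` layout and `centered_relatedness` name are hypothetical, and this is not GEMMA's own implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Genotypes stored SNP-major: geno[i][j] is the mean genotype (0..2) of
// SNP i in individual j. Hypothetical layout chosen for this sketch.
using Matrix = std::vector<std::vector<double>>;

// Sketch of the centered relatedness matrix
//   K = (1/p) * sum_i (x_i - mean(x_i)) (x_i - mean(x_i))^T.
// Assumes no missing genotypes and that all data fit in memory.
Matrix centered_relatedness(const Matrix& geno) {
    if (geno.empty() || geno[0].empty()) {
        throw std::invalid_argument("need at least one SNP and one individual");
    }
    const std::size_t p = geno.size();
    const std::size_t n = geno[0].size();
    Matrix K(n, std::vector<double>(n, 0.0));
    std::vector<double> centered(n);
    for (const auto& snp : geno) {
        if (snp.size() != n) {
            throw std::invalid_argument("every SNP needs n genotypes");
        }
        // Center this SNP's genotypes on their mean across individuals.
        double mean = 0.0;
        for (double g : snp) mean += g;
        mean /= static_cast<double>(n);
        for (std::size_t j = 0; j < n; ++j) centered[j] = snp[j] - mean;
        // Add the outer product of the centered genotype vector.
        for (std::size_t a = 0; a < n; ++a) {
            for (std::size_t b = 0; b < n; ++b) {
                K[a][b] += centered[a] * centered[b];
            }
        }
    }
    // Average over the p SNPs.
    for (auto& row : K) {
        for (double& v : row) v /= static_cast<double>(p);
    }
    return K;
}
```

For two SNPs and two individuals, `centered_relatedness({{0, 2}, {1, 1}})` gives [[0.5, -0.5], [-0.5, 0.5]]: the second SNP is monomorphic, so its centered vector is zero and it contributes nothing to K.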

The mouse_hs1940 example files used above can be downloaded from the GEMMA GitHub repository.
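The `-n 1` flag in the second command selects which phenotype column to analyze. According to the GEMMA manual, with BIMBAM input the phenotype file has one row per individual (in the same order as the genotype file), one column per phenotype, and NA for missing values. As a hedged illustration of what column selection means, here is a C++ sketch that assumes whitespace-separated columns; `read_phenotype_column` is a hypothetical name, not part of GEMMA.

```cpp
#include <cassert>
#include <cmath>
#include <istream>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return the values in 1-based column `n` (mirroring
// the meaning of -n) of a phenotype file with one row per individual and
// whitespace-separated columns. "NA" becomes NaN to mark a missing value.
std::vector<double> read_phenotype_column(std::istream& in, int n) {
    if (n < 1) throw std::invalid_argument("column index is 1-based");
    std::vector<double> values;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string field;
        // Read up to the n-th column of this row.
        for (int col = 1; col <= n; ++col) {
            if (!(fields >> field)) {
                throw std::runtime_error("row has fewer columns than -n");
            }
        }
        values.push_back(field == "NA"
                             ? std::numeric_limits<double>::quiet_NaN()
                             : std::stod(field));
    }
    return values;
}
```

Reading the text "1.2 3.4\nNA 5.6\n" with `n = 1` yields {1.2, NaN}; with `n = 2` it yields {3.4, 5.6}.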

Debugging and optimization

GEMMA has a wide range of debugging options which can be viewed with

gemma -h 14

 DEBUG OPTIONS
 -check                   enable checks (slower)
 -no-fpe-check            disable hardware floating point checking
 -strict                  strict mode will stop when there is a problem
 -silence                 silent terminal display
 -debug                   debug output
 -debug-data              debug data output
 -legacy                  run gemma in legacy mode

Typically, when running GEMMA you should use -debug, which includes the relevant checks.

For performance, you may want to use the -no-check option instead. Also check the build optimization notes in INSTALL.md.

Help

Citing GEMMA

If you use GEMMA for published work, please cite our paper:

If you use the multivariate linear mixed model (mvLMM) in your research, please cite:

If you use the Bayesian sparse linear mixed model (BSLMM), please cite:

And if you use the variance component estimation using summary statistics, please cite:

License

Copyright (C) 2012–2018, Xiang Zhou and team.

The GEMMA source code repository is free software: you can redistribute it under the terms of the GNU General Public License. All the files in this project are part of GEMMA. This project is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See file LICENSE for the full text of the license.

Both the source code for the gzstream zlib wrapper and shUnit2 unit testing framework included in GEMMA are distributed under the GNU Lesser General Public License, either version 2.1 of the License, or (at your option) any later revision.

The source code for the included Catch unit testing framework is distributed under the Boost Software Licence version 1.

Optimizing performance

Precompiled binaries and libraries may not be optimal for your particular hardware. See INSTALL.md for tips on speeding GEMMA up.

Building from source

More information on source code, dependencies and installation can be found in INSTALL.md.

Input data formats

GEMMA currently accepts two input formats:

  1. BIMBAM format (preferred)
  2. PLINK format
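In the BIMBAM mean genotype file, as described in the GEMMA manual, each row is one SNP: an identifier, the minor allele, the major allele, and then one mean genotype per individual (a value between 0 and 2, or NA when missing), for example `rs1, A, T, 0.02, 0.80, 1.50`. Columns may be separated by commas, spaces or tabs. The C++ below is a minimal sketch that parses one such row; `BimbamSnp` and `parse_bimbam_line` are hypothetical names, and GEMMA's own reader handles more cases.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one row of a BIMBAM mean genotype file.
struct BimbamSnp {
    std::string id;                 // column 1: SNP identifier
    std::string minor_allele;       // column 2
    std::string major_allele;       // column 3
    std::vector<double> mean_geno;  // remaining columns, NaN where "NA"
};

// Sketch of a parser for one row; commas, spaces and tabs all act as
// separators. Not GEMMA's own reader.
BimbamSnp parse_bimbam_line(const std::string& line) {
    std::string cleaned = line;
    for (char& c : cleaned) {
        if (c == ',') c = ' ';  // operator>> already splits on spaces/tabs
    }
    std::istringstream in(cleaned);
    BimbamSnp snp;
    if (!(in >> snp.id >> snp.minor_allele >> snp.major_allele)) {
        throw std::invalid_argument("expected SNP id and two alleles");
    }
    std::string field;
    while (in >> field) {
        if (field == "NA") {
            snp.mean_geno.push_back(std::numeric_limits<double>::quiet_NaN());
        } else {
            snp.mean_geno.push_back(std::stod(field));
        }
    }
    return snp;
}
```

Replacing commas with spaces up front lets a single whitespace-based tokenizer handle comma-, space- and tab-separated files alike.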

See this example where we convert some spreadsheets for use in GEMMA.

Reporting a GEMMA bug or issue

For bugs, GEMMA has an issue tracker on GitHub. For general support, GEMMA has a mailing list at gemma-discussion.

Before posting an issue, search the issue tracker and mailing list first; it is likely that someone has encountered something similar. Also try running the latest version of GEMMA to make sure the problem has not already been fixed. Support and installation questions should be directed to the mailing list, which is the best place to get answers.

The issue tracker is specifically meant for development issues around the software itself. When reporting an issue include the output of the program and the contents of the .log.txt file in the output directory.

Check list:

  1. I have found an issue with GEMMA
  2. I have searched for it on the issue tracker (incl. closed issues)
  3. I have searched for it on the mailing list
  4. I have tried the latest release of GEMMA
  5. I have read and agreed to the code of conduct below
  6. If it is a support/install question I have posted it to the mailing list
  7. If it is software development related I have posted a new issue on the issue tracker or added to an existing one
  8. In the message I have included the output of my GEMMA run
  9. In the message I have included the relevant .log.txt file in the output directory
  10. I have made available the data to reproduce the problem (optional)

To find bugs, the GEMMA developers may ask you to install a development version of the software. They may also ask for your data, which they will treat confidentially. Please always remember that GEMMA is written and maintained by volunteers with good intentions, and our time is valuable too. The more you help us, the better we can provide this tool for everyone to use.

Code of conduct

By using GEMMA and communicating with its community, you implicitly agree to abide by the code of conduct published by the Software Carpentry initiative.

Credits

The GEMMA software was developed by:

Xiang Zhou
Dept. of Biostatistics
University of Michigan

Peter Carbonetto, Tim Flutre, Matthew Stephens, Pjotr Prins and others have also contributed to the development of this software.
