ht-diva/MR_instruments_selection

MR instruments: CHRIS-INTERVAL META-ANALYSIS

20240917 Giulia Pontali, Solène Cadiou, Claudia Giambartolomei @HUMANTECHNOPOLE

GWAS analyses were performed independently in our two cohorts using Regenie (INTERVAL study, N = 9,251; CHRIS study, N = 4,194; total N = 13,445). A standard-error-weighted meta-analysis was then performed with METAL. Only variants present in both studies were retained in the meta-analysis summary statistics.
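A minimal sketch of the meta-analysis step follows. It assumes that "standard-error weighted" corresponds to METAL's SCHEME STDERR, which uses fixed-effect inverse-variance weighting. The function name `ivw_meta` and the dictionary input format are hypothetical and are not taken from this repository. Only variants present in both cohorts are combined, matching the filter described above.

```python
import math


def ivw_meta(cohort_a, cohort_b):
    """Fixed-effect inverse-variance (standard-error weighted) meta-analysis.

    Hypothetical input format: each cohort is a dict mapping a variant ID to
    a (beta, se) tuple. Returns {variant: (beta_meta, se_meta, p)}.
    """
    results = {}
    # Keep only variants present in both cohorts.
    for snp in sorted(cohort_a.keys() & cohort_b.keys()):
        weights = []
        betas = []
        for beta, se in (cohort_a[snp], cohort_b[snp]):
            weights.append(1.0 / se ** 2)  # inverse-variance weight
            betas.append(beta)
        w_sum = sum(weights)
        beta_meta = sum(w * b for w, b in zip(weights, betas)) / w_sum
        se_meta = math.sqrt(1.0 / w_sum)
        z = beta_meta / se_meta
        # Two-sided normal p-value; underflows to 0.0 for |z| above ~37.
        p = math.erfc(abs(z) / math.sqrt(2.0))
        results[snp] = (beta_meta, se_meta, p)
    return results
```

Suppose two cohorts report the same variant with equal standard errors. In that case the combined BETA is the plain average of the two estimates, and the combined SE shrinks by a factor of sqrt(2).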

We used the locus breaker function to identify associated regions (https://www.nature.com/articles/s41467-023-38389-6).
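The next sketch is a simplified reading of how locus breaker defines regions; it is not the published implementation. It first keeps SNPs below a suggestive threshold. It then splits each chromosome wherever two consecutive SNPs are more than a "hole size" apart. Finally, it keeps only the segments that contain at least one genome-wide significant SNP. The defaults (`p_sig=5e-8`, `p_limit=1e-5`, `hole_size=250_000`) and the function names are assumptions for illustration. They are not necessarily the settings used in this analysis.

```python
from itertools import groupby


def locus_breaker(snps, p_sig=5e-8, p_limit=1e-5, hole_size=250_000):
    """Simplified locus-breaker sketch (illustrative defaults).

    snps: iterable of (chrom, pos, p) with integer chromosome codes.
    Returns a list of (chrom, start, end, best_pos, best_p), one per locus.
    """
    # Keep suggestive SNPs only, sorted by chromosome then position.
    kept = sorted((s for s in snps if s[2] < p_limit), key=lambda s: (s[0], s[1]))
    loci = []
    for chrom, group in groupby(kept, key=lambda s: s[0]):
        group = list(group)
        segment = [group[0]]
        for prev, cur in zip(group, group[1:]):
            # A gap larger than hole_size closes the current segment.
            if cur[1] - prev[1] > hole_size:
                loci.extend(_close_segment(chrom, segment, p_sig))
                segment = []
            segment.append(cur)
        loci.extend(_close_segment(chrom, segment, p_sig))
    return loci


def _close_segment(chrom, segment, p_sig):
    """Emit the segment as a locus only if its best SNP is significant."""
    best = min(segment, key=lambda s: s[2])
    if best[2] >= p_sig:
        return []
    positions = [s[1] for s in segment]
    return [(chrom, min(positions), max(positions), best[1], best[2])]
```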

All analyses, including instrument selection, are based on genome build 37. As a final step, we converted the positions of the selected instruments to genome build 38 using the bcftools +liftover tool (https://academic.oup.com/bioinformatics/article/40/2/btae038/7585532).

NOTE 1: (Not yet included in the data.) Once the MR instruments have been selected for the COLOC step, we will provide the positions of the cis regions lifted over to build 38.

NOTE 2: The same SeqID may have multiple cis instruments mapped to different loci. Please treat these as separate signals rather than collapsing them into a single MR model.
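One way to follow NOTE 2 is to key each MR model on the pair (SeqID, locus) instead of on SeqID alone. The sketch below does this using the column names documented in this README. The function names are hypothetical. It also assumes that the released .txt files are tab-delimited and have a header row.

```python
import csv
from collections import defaultdict


def read_instruments(path):
    """Read an instrument file (assumed tab-delimited, with a header row)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def group_instruments(rows):
    """Group rows so that each (SeqID, locus) pair is its own MR model.

    Rows sharing a SeqID but mapped to different loci end up in separate
    groups, as NOTE 2 requests.
    """
    models = defaultdict(list)
    for row in rows:
        models[(row["SeqID"], row["locus_START_END_37"])].append(row)
    return dict(models)
```

For example, `group_instruments(read_instruments("MR_instruments_conditional_analysis_MVP.txt"))` returns one entry for each signal to be analysed.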

We provide three files of cis instruments for MR:

MR_instruments_best_snps_from_LB_MVP.txt: As the instrument for each cis locus breaker region, we selected the best cis SNP (i.e., the SNP with the smallest p-value in the region) that passes the Bonferroni significance threshold. The approach was as follows. First, loci from the locus breaker results were mapped to cis and trans regions using a cis-trans mapping file. The build 37 transcription start site (TSS) from this file was used to define the boundaries of the cis region (TSS +/- 500 kb). Any locus overlapping the cis region of the associated protein was defined as cis. Next, we selected as the instrument the SNP with the smallest significant p-value (i.e., lower than 5×10^-8 divided by our Bonferroni number of effective tests), provided its F-statistic was greater than or equal to 10. All top significant SNPs in cis loci passed the F-statistic filter, yielding 1,799 SNPs in 1,799 cis regions.
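The next sketch restates the selection rule above in code. It covers two steps: the interval-overlap test against TSS +/- 500 kb, and the Bonferroni plus F-statistic filter applied to the top SNP. It makes several assumptions that do not come from the repository. Loci and SNPs are represented as plain values and dicts. The caller supplies the number of effective tests as `n_eff`. F-statistics are assumed to be precomputed. All function names are hypothetical.

```python
CIS_WINDOW = 500_000  # +/- 500 kb around the TSS (build 37)


def locus_is_cis(locus_chr, locus_start, locus_end, tss_chr, tss_pos):
    """True if the locus overlaps the window TSS +/- 500 kb."""
    if locus_chr != tss_chr:
        return False
    win_start, win_end = tss_pos - CIS_WINDOW, tss_pos + CIS_WINDOW
    # Two closed intervals overlap if each starts before the other ends.
    return locus_start <= win_end and locus_end >= win_start


def best_cis_snp(snps, n_eff, alpha=5e-8, min_f=10.0):
    """Return the top SNP of a cis locus if it is a valid instrument.

    snps: list of dicts with keys 'p' and 'F'. The SNP with the smallest
    p-value is kept only if p < alpha / n_eff and F >= min_f; otherwise
    the locus contributes no instrument (None).
    """
    if not snps:
        return None
    best = min(snps, key=lambda s: s["p"])
    if best["p"] < alpha / n_eff and best["F"] >= min_f:
        return best
    return None
```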
MR_instruments_unconditional_analysis_MVP.txt: We used GCTA-COJO to select the conditionally independent SNPs within each locus (extended by +/- 100 kb) associated with a protein. SNPs were mapped to cis and trans regions using the same cis-trans mapping file as above. Any SNP falling within +/- 500 kb of the TSS of the associated protein was defined as cis. For the unconditional analysis, we report the unconditional BETA, SE and MinusLog10PVAL, and we used the unconditional BETA to compute the F-statistic (see the formula below).

We identified 4,607 independent SNPs in cis regions, all with an F-statistic greater than or equal to 10.
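The per-SNP cis/trans assignment and the +/- 100 kb locus extension can be sketched as follows. GCTA-COJO itself is not reproduced here. The sketch makes three assumptions. First, a TSS value may hold several positions joined by "|", as the TSS_37 column allows, and a SNP counts as cis if it lies within 500 kb of any of them. Second, the window boundaries are inclusive. Third, the extended start is clipped at position 1. The function names are hypothetical.

```python
CIS_WINDOW = 500_000      # +/- 500 kb around the TSS
LOCUS_EXTENSION = 100_000  # +/- 100 kb added to each locus before COJO


def extend_locus(start, end, ext=LOCUS_EXTENSION):
    """Extended region for conditional analysis (start clipped at 1, an assumption)."""
    return max(1, start - ext), end + ext


def snp_is_cis(snp_chr, snp_pos, tss_chr, tss_field):
    """True if the SNP lies within 500 kb of any TSS in tss_field.

    tss_field: a single position or several positions joined by '|'.
    """
    if snp_chr != tss_chr:
        return False
    tss_positions = [int(x) for x in str(tss_field).split("|")]
    return any(abs(snp_pos - tss) <= CIS_WINDOW for tss in tss_positions)
```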

MR_instruments_conditional_analysis_MVP.txt: As above, we used GCTA-COJO to select the conditionally independent SNPs within each locus (extended by +/- 100 kb) associated with a protein. SNPs were mapped to cis and trans regions using the cis-trans mapping file, in the same way as for MR_instruments_unconditional_analysis_MVP.txt. For the conditional analysis, we report the conditional BETA, SE and MinusLog10PVAL (conditional on all other instruments for that locus), and we used the conditional BETA to compute the F-statistic (see the formula below). We identified 4,606 independent SNPs in cis regions with an F-statistic greater than or equal to 10.

Explanation of column names in the files:

DATASET = INTERVAL_CHRIS_META
TISSUE = WholeBlood
SNP = chr:pos:allele1:allele2 (alleles arranged in alphabetical order)
CHR
POS_37 = Position in build 37
POS_38 = Position in build 38
locus_START_END_37 = locus region (start and end, build 37) from the locus breaker output
locus_extended_START_END_37 = region used in fine-mapping: the locus breaker region extended by +/- 100,000 bp (applied to all regions)
BETA
SE
MinusLog10PVAL
EFFECT_ALLELE
OTHER_ALLELE
MAF
EAF
SAMPLESIZE
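PVE = $\frac{2\,\mathrm{BETA}^2\,\mathrm{MAF}\,(1-\mathrm{MAF})}{2\,\mathrm{BETA}^2\,\mathrm{MAF}\,(1-\mathrm{MAF}) + \mathrm{SE}^2\cdot 2\,\mathrm{SAMPLESIZE}\cdot\mathrm{MAF}\,(1-\mathrm{MAF})}$ (proportion of variance explained)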
k = number of cis conditionally independent SNPs within each locus
Fstats = $\frac{\mathrm{PVE}\,(\mathrm{SAMPLESIZE}-1-k)}{(1-\mathrm{PVE})\,k}$ with k = 1
Fstats_multipleMR = $\frac{\mathrm{PVE}\,(\mathrm{SAMPLESIZE}-1-k)}{(1-\mathrm{PVE})\,k}$ with k as defined above (reported only in MR_instruments_unconditional_analysis_MVP.txt and MR_instruments_conditional_analysis_MVP.txt; not used for any filtering)
GENE_NAME = gene name; when multiple genes map to the same cis region, their names are collapsed into one field (separated by |)
GENE_ENSEMBL
TSS_37 = transcription start site in build 37; when multiple genes map to the same cis region, their TSS_37 positions are collapsed into one field (separated by |)
SeqID
UNIPROT
PROTEIN_NAME
PROTEIN_LONG_NAME
FILENAME = NA
Gene.type = protein_coding
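The PVE and F-statistic columns can be reproduced from BETA, SE, MAF and SAMPLESIZE with the small sketch below. The function names are hypothetical. The F-statistic is written in the standard form PVE·(N − 1 − k) / ((1 − PVE)·k), as in the column definitions above.

```python
def pve(beta, se, maf, n):
    """Proportion of variance explained, as defined for the PVE column."""
    var_g = 2.0 * beta ** 2 * maf * (1.0 - maf)
    return var_g / (var_g + se ** 2 * 2.0 * n * maf * (1.0 - maf))


def f_statistic(pve_value, n, k=1):
    """F = PVE * (N - 1 - k) / ((1 - PVE) * k); k = 1 gives Fstats."""
    return pve_value * (n - 1 - k) / ((1.0 - pve_value) * k)
```

The factor 2·MAF·(1 − MAF) cancels, so PVE simplifies to BETA² / (BETA² + SAMPLESIZE·SE²). For k = 1, Fstats therefore equals (BETA/SE)²·(SAMPLESIZE − 2)/SAMPLESIZE, which is close to the squared z-score for samples of this size.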
