Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mixed-model in Limma #325

Open
KamilMaliszArdigen opened this issue Oct 28, 2024 · 4 comments · May be fixed by #339
Open

Mixed-model in Limma #325

KamilMaliszArdigen opened this issue Oct 28, 2024 · 4 comments · May be fixed by #339
Assignees
Labels
enhancement New feature or request

Comments

@KamilMaliszArdigen
Copy link

Description of feature

We would like to extend this pipeline logic to allow processing of data and model described bellow. Can you suggest how we should approach this topic?

Consider the following experiment frame:


Patient Condition Tissue
1 Diseased A
1 Diseased B
2 Diseased A
2 Diseased B
3 Diseased A
3 Diseased B
4 Normal A
4 Normal B
5 Normal A
5 Normal B
6 Normal A
6 Normal B

Mixed-model experiments involve comparisons both within and between subjects. The experiment has six subjects, three with a disease and three normal subjects. Two tissue types, A and B, are examined from each subject. The goal is to compare tissue types and disease states.

  • Within-subject comparison: Comparing tissue types A and B can be done within subjects since measurements for both tissues are available from each subject.

  • Between-subject comparison: Comparing diseased and normal subjects requires a between-subject comparison, as these measurements come from different individuals.

Key Concepts for Analyzing Mixed-model Experiments

  • Random Effects: To account for the variability within and between subjects, the "Patient" variable is treated as a random effect. This means each patient has a unique baseline effect on the expression levels.

  • RNAseq data with voom: For RNAseq data, the voom function converts read counts to log2-counts per million (log2-cpm) with associated weights, preparing the data for linear modeling in limma. The voom function accounts for the mean-variance relationship in RNAseq data, making it suitable for linear modeling. 

  • duplicateCorrelation Function: This function is used to estimate the correlation between measurements taken from the same subject. 

  • Linear Mixed Model: A linear mixed model is fitted using lmFit, incorporating the estimated inter-subject correlation. This model allows for the simultaneous estimation of fixed effects (tissue type and disease state) and random effects (patient-specific variations).

Steps for Analyzing Mixed-model Experiments in Limma

  1. Combine Factors: Combine experimental factors into a single factor. For example, combine "Condition" (Diseased or Normal) and "Tissue" (A or B) into a combined factor.

  2. Create Design Matrix: Create a design matrix based on the combined factor using model.matrix.

  3. Apply voom Transformation: Use the voom function from limma to transform raw read counts to log2-cpm values.

  4. Estimate Inter-subject Correlation: Use duplicateCorrelation to estimate the correlation between measurements from the same subject.

  5. Fit Linear Mixed Model: Use lmFit to fit the linear model, incorporating the estimated correlation and specifying the "block" argument as the variable representing the subjects (e.g., "Patient").

  6. Define Contrasts: Define contrasts to compare the different experimental conditions, such as diseased versus normal for each tissue type and tissue A versus tissue B for each condition.

  7. Compute Contrasts and Moderated t-tests: Calculate the contrasts and perform moderated t-tests using contrasts.fit and eBayes.

  8. Identify Differentially Expressed Genes: Use topTable to identify differentially expressed genes for each contrast of interest.

Understanding Variability in Mixed-model Experiments

Mixed-model experiments have two levels of variability:

  1. Between-subject variability: The variation between individuals, accounting for inherent differences between subjects. 

  2. Within-subject variability: The variation between measurements within the same individual, adjusted for individual baseline differences.

The use of random effects and the duplicateCorrelation function in limma allows for the proper analysis of multi-level experiments, leading to more accurate and reliable results by accounting for both within- and between-subject variability.


References:

  1. https://bioconductor.org/packages/devel/bioc/vignettes/limma/inst/doc/usersguide.pdf

  2. https://pmc.ncbi.nlm.nih.gov/articles/PMC4402510/

  3. https://pmc.ncbi.nlm.nih.gov/articles/PMC4937821/

  4. https://pubmed.ncbi.nlm.nih.gov/34605806/



@KamilMaliszArdigen KamilMaliszArdigen added the enhancement New feature or request label Oct 28, 2024
@KamilMaliszArdigen KamilMaliszArdigen self-assigned this Oct 28, 2024
@pinin4fjords
Copy link
Member

Just to be super clear, can you illustrate the differences in how you would construct the models for this please?

@mzenczak
Copy link

Just to be super clear, can you illustrate the differences in how you would construct the models for this please?

@pinin4fjords if I understood you correctly here is the answer to your question:

Following the steps for analyzing mixed models, we first combine the experimental factors into a single factor, as shown in the table below (serving as the sample.sheet). This newly created factor becomes our contrast variable, which we then use in the model construction. See the code snippet below for details:

Patient Condition Tissue Condition.Tissue
1 Diseased A Diseased.A
1 Diseased B Diseased.B
2 Diseased A Diseased.A
2 Diseased B Diseased.B
3 Diseased A Diseased.A
3 Diseased B Diseased.B
4 Normal A Normal.A
4 Normal B Normal.B
5 Normal A Normal.A
5 Normal B Normal.B
6 Normal A Normal.A
6 Normal B Normal.B
DGE <- DGEList(count.table)
DGE <- calcNormFactors(DGE)
contrast_variable <- "Condition.Tissue"
model <- paste("~ 0 +", contrast_variable)
design <- model.matrix(as.formula(model), data = sample.sheet)
voom <- voom(counts = DGE, design = design)
corfit <- duplicateCorrelation(object = voom, design = design, block = Patient)
fit <- lmFit(object = voom, design = design, block = Patient, correlation = corfit$consensus.correlation)

For example, to define a contrast that tests for deferentially expressed genes between diseased tissue A and normal tissue A, you could construct a contrast like this:

contrast <- "Condition.Tissue_Diseased.A_vs_Normal.A = Diseased.A - Normal.A"
contrast_matrix <- makeContrasts(contrasts = contrast, levels = design)
fit <- contrasts.fit(fit, contrast_matrix)
fit <- eBayes(fit)
results <- topTable(fit)

Hope this answers your question. :)

@pinin4fjords
Copy link
Member

IIUC the contrast can be dealt with by supplying the workflow with a sample sheet containing the composite variable, and building contrasts on that variable in the normal way.

After that the only missing thing is the duplicateCorrelation treatment, which could be PR'd to the limma module as a new feature.

@grst
Copy link
Member

grst commented Oct 31, 2024

There's also the DREAM method from the bioconductor package variancePartition which is an extension of limma for linear mixed effects models:

https://gabrielhoffman.github.io/variancePartition/articles/dream.html

It allows to specify random effects using an extension of Wilkinson formulas that is also adopted by other R packages and might be more convenient than what's described in the limma vignette:

form <- ~ Disease + (1 | Individual)

@KamilMaliszArdigen KamilMaliszArdigen linked a pull request Nov 5, 2024 that will close this issue
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants