\chapter*{Summary}\label{ch:summary}
\addcontentsline{toc}{chapter}{Summary}
\vspace{-1.2cm}
\begin{singlespace}
%\textbf{Investigating Normal Human Gene Expression in Tissues
%with High-throughput Transcriptomic and Proteomic Data.}
{\small With the improvement of high-throughput technologies
during the last decade,
several studies exploring the normal gene expression in human tissues
have been published.
Many studies examine the transcriptome with RNA sequencing (RNA-Seq),
and others probe the proteome with label-free bottom-up Mass Spectrometry.
As the sampling of undiseased tissues is difficult,
the community often refers to expression atlases,
which collate these studies,
to support or validate new findings.
Despite many overlapping tissues between the studies,
few atlases attempt to integrate all the data.\mybr\
\vspace{-1mm}
In this thesis, I investigate the consistency of gene expression
across tissues and studies in humans
with the help of transcriptomics
captured with high-throughput sequencing (RNA-Seq)
and proteomics generated with label-free bottom-up
Mass Spectrometry (MS).\mybr\
\vspace{-1mm}
After describing the transcriptomic and proteomic data
and their state-of-the-art processing (\Cref{ch:datasets}),
I review several identified sources of bias
and my approaches to limit their effects (\Cref{ch:expression}).\mybr\
\vspace{-1mm}
The integration of the various transcriptomic datasets
(\Cref{ch:Transcriptomics})
shows that
the biological signal dominates the technical noise for RNA-Seq data.
Tissue samples correlate more strongly
with samples of the same tissue from other studies than
with samples of other tissues from the same dataset.
In other words, interstudy correlations for identical tissues
are higher than correlations between different tissues within the same study.
Globally, genes show similar expression profiles across studies
for a given set of tissues.
All gene categories are involved, including the tissue-specific genes
and the ubiquitously expressed ones.\mybr\
\vspace{-1mm}
After briefly discussing comparisons of proteomic data,
I introduce a new proteomic quantification method,
\PPKM\ (\Cref{ch:proteomics}).
%on which I base the integration of the proteomics and transcriptomics.
The \PPKM\ method allows me to quantify about twice as many proteins
as the usual methods.\mybr\
\vspace{-1mm}
A limited number of previous studies
combining high-throughput transcriptomics and proteomics
have reported varying levels of correlation
between protein and mRNA expression.
I show that, for most tissues,
we can observe quite good correlation levels
(\ie\ significantly better than expected by chance),
even though the samples have different biological and technical backgrounds
because they were independently sourced.
Many genes share similar patterns of expression
between the two biological layers,
\eg\ genes that have a protein detected in a single tissue
are more likely to have their mRNA showing specificity for the same tissue.
Additionally, three groups of genes present functional enrichments
of biological processes.
Genes having highly correlated protein and mRNA expressions across tissues
are enriched in catabolic processes.
Genes having the most anticorrelated expressions are enriched
for ribosomes and ncRNA regulation.
Genes with a protein detected in a single tissue are enriched
in signalling processes.\mybr\
\vspace{-1mm}
Overall, this thesis describes a global picture
of the current consolidated knowledge
we can extract from the joint study
of public transcriptomic and proteomic data.
Beyond confirming or improving observations reported in the literature,
this work provides new insights
into the ubiquitous and tissue-specific genes.
To the best of my knowledge,
this work has also established the most extensive list of genes
with robust transcriptomic and proteomic expression across tissues and studies.
Furthermore, it shows that joint study approaches can help the development
of new methods, like the new proteomic \PPKM\ quantification method.
Finally, the highlighting of distinct functional enrichment profiles
for groups of genes across tissues and studies
lays a framework for further research.\mybr\
}
\end{singlespace}