\clearpage
\phantomsection
\addcontentsline{toc}{part}{APPENDIX}
\part*{\Huge\bfseries\addfontfeature{LetterSpace=10}APPENDIX}
\label{sec:appendix}
\setcounter{chapter}{0}
\renewcommand\thechapter{\Alph{chapter}}
\begingroup
\renewcommand*\floatpos{H}
\chapter{Supplementary material for \texorpdfstring{\cref*{sec:trna}}{chapter 2}}
The material in this section has been taken from the supplementary figures \&
methods of Schmitt, Rudolph,\andothersdelim\& al. [\cite*{Schmitt:2014}] with
minimal changes to the figure legends. The figures and their captions have been
created jointly by Bianca Schmitt, Claudia Kutter and me.
\section{Code}
The code used in the analysis of the data for this chapter can be found at
\url{https://github.com/klmr/trna} and
\url{https://github.com/klmr/trna-chip-pipeline}.
\section{Supplementary figures and tables}
\newpage
\thispagestyle{empty}
\textfig{trna-workflow}{spill}{\textwidth}
{Workflow of the genome-wide identification and analysis of protein-coding
and \abbr{trna} genes.}
{(A) \rnaseq analysis of protein-coding gene expression, differential
expression analysis and codon usage analysis. (B) \chipseq analysis of \pol3
occupancy at \trna gene loci, differential expression analysis of \trna
genes, and anticodon isoacceptor abundance analysis.}
\input{./data/expressed-trnas.tex}
\textfig{mrna-heatmap}{spill}{\textwidth}
{Hierarchical clustering of \abbr{mrna} gene expression correlations.}
{The heatmap shows the Spearman correlations of \mrna gene expression values,
representing the same data as \cref{fig:mrna-pca}. The samples cluster
hierarchically by tissue, followed by developmental stage.}
\textfig{trna-heatmap}{spill}{\textwidth}
{Hierarchical clustering of \abbr{trna} gene expression correlations.}
{The heatmap shows the Spearman correlations of \trna gene expression values,
representing the same data as \cref{fig:trna-pca}. The samples cluster
hierarchically by tissue, followed by developmental stage, with few
exceptions.}
\textfig{correlation-plots-rnaseq-pol3}{text}{0.7\textwidth}
{Correlation of \abbr{rnaseq} and \abbr{pol3} \abbr{chipseq} data during
mouse liver and brain development.}
{Correlation of (A) protein-coding gene expression across developmental
stages, (B) \trna gene expression as measured by \pol3 occupancy, (C)
triplet codon usage in protein-coding genes, (D) \trna anticodon
isoacceptor, (E) amino acid usage of protein-coding genes and (F) \trna
amino acid isotype.}
\textfig{pca-all-stages}{spill}{\textwidth}
{Early developmental stage-specific \abbr{trna} genes are lowly expressed.}
{(A) Factorial map of the \pca of \pol3 occupied \trna gene expression
levels in liver (red), brain (yellow), embryonic body without head (light
red) and head (light yellow) of stage E12.5, as well as whole E9.5 embryo
(grey). The proportion of variance explained by each \abbrsc{PC} is indicated
in parentheses.\\
(B) Violin plots represent normalized enrichment of \pol3 at \trna genes
identified in E9.5 whole embryo (top), E12.5 head (middle) and E12.5 body
without head (bottom) tissue. In parentheses are the numbers of \trna genes
transcribed in the particular embryonic stage (“total \(>10\)”), which are
subdivided into \trna genes that can be found in the \num{12} developmental
stages according to \cref{fig:trna-counts} (“all tissues”) and those that
are specific for the embryonic stage (“specific”).}
\textfig{codon-usage-mrna}{spill}{\textwidth}
{Observed codon usage in \abbr{mrna} transcriptomes of developing mouse
liver.}
{Proportional frequencies (\rcu) weighted by transcript expression are shown
for triplet codons ordered by amino acid as a bar plot, where grey shading
is by triplet codon. Data is obtained from liver \rnaseq data of all \num{6}
developmental stages.}
\textfig{anticodon-abundance-trna}{spill}{\textwidth}
{Observed anticodon abundance of \abbr{trna} isoacceptors of developing
mouse liver.}
{Proportional frequencies weighted by \trna gene expression (\raa) are shown
for anticodon isoacceptors ordered by amino acid isotype as a bar plot,
where grey shading is by anticodon. Data is obtained from liver \pol3
\chipseq data of all \num{6} developmental stages.}
\thispagestyle{empty}
\textfig{rnaseq-pol3-aa-usage-liver}{spill}{\textwidth}
{Observed and simulated amino acid and isotype usage in transcriptomes
across mouse liver development.}
{Each panel (A–C) consists of three columns: experimentally observed data
(left), simulated patterns of transcription randomized among either the
expressed genes (middle) or all genomically encoded genes (right).
Transcriptomes of each developmental stage were simulated \num{100} times.
Proportional frequencies weighted by transcript expression are shown for (A)
\num{20} amino acids as a radial plot, where data lines are coloured by
developmental stage and the background of all genomically annotated \mrna
genes is in grey. Labels within grid of radial plot describe ratios.
Proportional frequencies weighted by \pol3 binding are shown for (B)
\num{20} isotypes as a radial plot, both coloured as above (grey: background
of all genomically annotated \trna genes). (C) The right panel shows
Spearman’s rank correlation coefficients (\(\rho\)) and \(p\)-values (\(p\))
of \pol3 binding to \trna isotypes (\(x\)-axis) and transcriptomic amino
acid frequencies weighted by expression obtained from \rnaseq data
(\(y\)-axis) in E15.5 liver (experimentally observed data) and all six
developmental stages (simulated data). Amino acid isotypes outside the
\num{99} per cent confidence interval (grey area in the right panel of C) are
named. Observed Spearman’s rank correlation coefficients across all stages
(coloured as above) are indicated by black diamonds in the middle and left
panels of C.}
\textfig{rnaseq-pol3-aa-usage-brain}{spill}{\textwidth}
{\abbr{mrna} codon usage and \abbr{pol3} occupancy of \abbr{trna} isotypes
in developing mouse brain tissue.}
{Proportional frequency weighted by transcript expression of (A) arginine
triplet codons, (B) amino acids, (C) \pol3 binding of arginine isoacceptors
and (D) \pol3 binding of amino acid isotypes. Grey shading is by triplet
codon (A) or \trna anticodon (C). Labels within grid of radial plot describe
proportions.}
\textfig{codon-usage-low-vs-high-expressed-genes}{spill}{\textwidth}
{Highly versus lowly expressed protein-coding genes show no differential
codon usage.}
{Proportional frequencies weighted by transcript expression are shown for
arginine triplet codons as a bar plot of (A) highly (\nth{90}–\nth{95}
percentile) and (B) lowly expressed (\nth{25}–\nth{50} percentile)
protein-coding genes during liver development, where grey shading is by
triplet codon. Plots show Spearman’s rank correlation coefficients
(\(\rho\)) and \(p\)-values (\(p\)) of \pol3 binding to \trna isoacceptors
(\(x\)-axis) and transcriptomic codon frequencies weighted by expression
obtained from \rnaseq data (\(y\)-axis) in E15.5 liver of (C) highly and (D)
lowly expressed protein-coding genes. Anticodon isoacceptors that are not
encoded in the mouse genome (grey dots in plots) were excluded from
calculating the correlation coefficients. (E) Variances of correlation
values over all stages in liver for (i) all expressed protein-coding genes,
(ii) highly and (iii) lowly expressed protein-coding gene sets.}
\textfig{transcriptomic-pol3-codon-usage}{spill}{\textwidth}
{Transcriptomic \abbr{mrna} codon usage and \abbr{pol3} binding to
\abbr{trna} isoacceptors correlate in developing mouse liver and brain.}
{Plots show correlation of proportional \pol3 binding to \trna isoacceptors
(\(x\)-axis) and transcriptomic codon frequencies weighted by expression
obtained from \rnaseq data (\(y\)-axis). Correlation plots for developing
liver (A–F) and brain (G–L) are shown. Indexed box in top left indicates
developmental stage. Grey dots represent degenerated codons. Spearman’s rank
correlation coefficients (\(\rho\)) are reported along with their
\(p\)-values (\(p\)) in bottom right of each panel.}
\thispagestyle{empty}
\textfig{codon-anticodon-correlation-with-wobble-only-missing}{spill}{\textwidth}
{Transcriptomic \abbr{mrna} codon usage and wobble corrected \abbr{pol3}
binding to \abbr{trna} isoacceptors correlate in developing mouse liver and
brain.}
{Plots show correlation of proportional \pol3 binding to \trna isoacceptors
corrected according to wobble pairing (\(x\)-axis) and transcriptomic codon
frequencies weighted by expression obtained from \rnaseq data (\(y\)-axis).
Correlation plots for developing liver (A–F) and brain (G–L) are shown.
Indexed box in top left indicates developmental stage. Spearman’s rank
correlation coefficients (\(\rho\)) are reported along with their
\(p\)-values (\(p\)) in bottom right of each panel.}
\thispagestyle{empty}
\textfig{transcriptomic-pol3-aa}{spill}{\textwidth}
{Transcriptomic \abbr{mrna} amino acid usage and \abbr{pol3} binding to
\abbr{trna} isotypes correlate in developing mouse liver and brain.}
{Plots show correlation of \pol3 binding to \trna isotypes (\(x\)-axis) and
transcriptomic amino acid frequencies weighted by expression obtained from
\rnaseq data (\(y\)-axis). Correlation plots for developing liver (A–F) and
brain (G–L) are shown. Indexed box in top left indicates developmental
stage. Grey area represents \num{99} per cent confidence interval.
Spearman’s rank correlation coefficients (\(\rho\)) and the corresponding
\(p\)-values (\(p\)) are reported in top left and bottom right, respectively,
of each panel. Amino acid isotypes outside the \num{99} per cent confidence
interval (grey area) are named.}
\input{./data/meme-hits.tex}
\textfig{colocalisation-e155-p22}{spill}{\textwidth}
{Differentially expressed \abbr{trna} genes show no colocalisation with
differentially expressed protein-coding genes.}
{In each plot, the blue line is the cumulative distribution of the ratio of
the number of upregulated \mrna genes to the number of all \mrna genes in
the neighbourhood of each upregulated \trna gene. The green line is the
cumulative distribution of the ratios of the number of upregulated \mrna
genes (\fdr cutoff \num{0.01}) to the number of all \mrna genes, in the
neighbourhood of each \trna gene that is not differentially expressed.
Significant differences between these two distributions reveal situations
where upregulated \trna genes are significantly (by Kolmogorov–Smirnov test)
associated with upregulated protein-coding genes. Different window sizes
were used (\SIlist{10;50;100}{kb} around \trna genes). Pairwise comparisons
comparison of (A–C) E15.5 and P22 in liver as well as (D–F) P4 and P29 in
brain are shown. This analysis was repeated using two additional \fdr
cutoffs (\num{0.05} and \num{0.}, data for liver in
\cref{tab:colocalisation-liver}, not shown for brain). Under the assumption
that there was an observable colocalisation effect, we would expect there to
be a robust signal, i.e.\ consistent significance across different tested
parameters. However, of the \num{18} tests, only one was significant after
Bonferroni correction for multiple testing (corrected \(p<0.013\)),
indicating the absence of any strong colocalisation effect.}
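
As a brief sketch of the two statistics named in the colocalisation legend
above, the following may help in reading the reported values. The symbols
\(F_{\mathrm{up}}\), \(F_{\mathrm{non}}\), \(m\) and \(\alpha\) are introduced
here purely for illustration and are not taken from the original analysis. The
two-sample Kolmogorov–Smirnov statistic is the largest vertical distance
between two empirical cumulative distributions,
\[
    D = \sup_x \left| F_{\mathrm{up}}(x) - F_{\mathrm{non}}(x) \right| ,
\]
where \(F_{\mathrm{up}}\) would correspond to the distribution for upregulated
\trna genes (blue line) and \(F_{\mathrm{non}}\) to that for \trna genes that
are not differentially expressed (green line). The Bonferroni correction
multiplies each raw \(p\)-value \(p_i\) by the number of tests \(m\), capped at
one:
\[
    p_i^{\mathrm{adj}} = \min(1,\; m \, p_i), \qquad m = 18 .
\]
Equivalently, test \(i\) is significant at level \(\alpha\) only if
\(p_i < \alpha / m\). Assuming a conventional \(\alpha = 0.05\) (the level is
not stated in the legend), this gives a per-test threshold of
\(0.05 / 18 \approx 0.0028\).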
\chapter{Supplementary material for \texorpdfstring{\cref*{sec:codons}}{chapter 3}}
\section{Code}
The code used in the analysis of the data for this chapter can be found at
\url{https://github.com/klmr/codons}.
\section{Supplementary tables}
\newpage
\input{./data/GO_m_phase.tex}
\input{./data/GO_psp.tex}
\chapter{Supplementary material for \texorpdfstring{\cref*{sec:pol3}}{chapter 4}}
\section{Code}
The code used in the analysis of the data for this chapter can be found at
\url{https://github.com/klmr/pol3-seq}.
\section{Supplementary figures}
\newpage
\textfig{pol3-inputs}{spill}{\textwidth}
{Input library coverage}
{of different features in six stages of development in liver. The analysis
was performed under the assumption that different features have a similar
amount of input binding (normalised for feature length). As the figure
shows, this assumption does not fully hold.}
\textfig{sine-summary-1}{spill}{\textwidth}
{\abbr{transsine} binding by \abbr{pol3} across development in liver and
brain.}
{Raw counts of \abbr{pol3} binding to different \abbr{transsine} classes,
including those classes where no binding occurs.}
\endgroup