% !TEX root = MasterThesis.tex
\chapter{Modern Operating System Concepts}\label{ch:modern-os-concepts}
This thesis' introduction already picked up the discussion about which operating system architecture is superior by referring to the \textit{Tanenbaum-Torvalds debate}\cite{linux-is-obsolete} of 1992, a discussion that is still quite interesting from today's point of view on different operating systems and their future.
Both \textsc{Tanenbaum} and \textsc{Torvalds} underpin their arguments with the origins of their implementations, \textit{MINIX} and \textit{Linux}, in different problem spaces.
Roughly speaking, exactly these different problem spaces an operating system has to address are one reason for this diversity.
Over the years, operating systems had to solve very different kinds of problems on very different kinds of hardware, which resulted in very different implementations and architectures.
As with the debate, it is impossible to find one architectural concept or implementation which is clearly superior to the others in every use case.
Nevertheless, it is a reasonable question why a majority of the operating system kernels developed from scratch in the last few years are based on a microkernel concept.
In 2009, the official statement of the Linux kernel developer \textsc{Richard Gooch} was that monolithic kernels are superior for performance reasons\cite{why-linux-monolith}.
So the question remains what changed during the last ten years to promote this shift, and specifically in the context of this work, how this affects device driver development.
For this reason, this chapter is dedicated to the basic components and functionalities of operating system kernels and how they are implemented in the real world examples \textit{Linux} and \textit{Zircon}.
\section{Operating System Architectures}\label{sec:kernel-arch-concepts}
As already pointed out, architectural decisions for operating systems are commonly influenced by the issues they are intended to solve.
Depending on which design objectives pertinent to the underlying issue are given priority, different concepts and architectures are the most appropriate choice.
According to \textsc{Glatz}\cite{glatz2015betriebssysteme}, some of them are:
\begin{itemize}
\item Providing a reliable, crash-proof environment.
\item Providing a portable operating system.
\item Providing a scalable operating system, e.g.\ in terms of processing cores.
\item Providing an extensible operating system, e.g.\ in terms of adding additional functionality to the kernel.
\item Providing real-time capabilities.
\item Providing an efficient design in terms of resources and performance.
\item Providing a secure environment for user applications.
\item Providing a maintainable operating system, e.g.\ by the division of policy and mechanism.
\end{itemize}\ \\
%
In addition, operating systems should also pay attention to the common software design issue \textit{mechanism vs.\ policy}, i.e.\ an operating system design should provide a clear distinction between the \textit{mechanism}, the capabilities that can be performed (how is something done) and the \textit{policy}, which controls how the available capabilities are used (what is done)\cite{lfd430},~\cite{silberschatz2009operating}.
An example from driver development could be controlling the number of processes that can use a device at once.
In this case, the driver should provide the mechanism, \textit{how} such a limitation can be enforced, but not \textit{what} the limit is, i.e.\ the actual number of allowed processes.
The idea behind this is that requirements may change over time, and such a distinction makes it easy to adjust the \textit{policy} via parameters without touching the underlying \textit{mechanism}\cite{silberschatz2009operating}.
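To make this distinction more tangible, the following minimal sketch shows how it could look in a Linux-style character driver; the driver name \texttt{sample} and the parameter \texttt{max\_users} are hypothetical, but \texttt{module\_param()} and the atomic helpers are the kernel's actual facilities for this pattern:
\begin{verbatim}
/* Sketch: mechanism vs. policy in a (hypothetical) character driver.
 * The mechanism is the user counting in open(); the policy, i.e. the
 * concrete limit, stays adjustable as a module parameter. */
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/atomic.h>

static int max_users = 1;             /* policy: set at load time */
module_param(max_users, int, 0444);
MODULE_PARM_DESC(max_users, "maximum number of concurrent users");

static atomic_t users = ATOMIC_INIT(0);

static int sample_open(struct inode *inode, struct file *file)
{
        /* mechanism: enforce whatever limit the policy dictates */
        if (atomic_inc_return(&users) > max_users) {
                atomic_dec(&users);
                return -EBUSY;
        }
        return 0;
}

static int sample_release(struct inode *inode, struct file *file)
{
        atomic_dec(&users);
        return 0;
}

static const struct file_operations sample_fops = {
        .owner   = THIS_MODULE,
        .open    = sample_open,
        .release = sample_release,
};
\end{verbatim}
Loading the module with \texttt{insmod sample.ko max\_users=4} changes the policy without touching the mechanism.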
How these design principles fit into the known operating system architectures will be considered in the following sections.
But first, the terms \textit{kernel mode} and \textit{user mode} will be explained as they are fundamental for this work.
\subsubsection*{Dual-Mode Execution}
Modern general purpose \acp{cpu} provide a ring based, hardware enabled security model which had its origin in the Intel x86 processor architecture\cite{tanenbaum-modern-operating-systems}.
It usually consists of four different security levels, the rings 0 to 3, which are illustrated in Figure~\ref{pic:x86rings}.
In this theoretical model, ring 3 is the least secure level, used for common user applications (even if started with extended privileges, e.g.\ \textit{root} in the UNIX-like world), while ring 2 is used for libraries shared between user applications and ring 1 is for system calls\cite{glatz2015betriebssysteme}.
Actual operating system implementations may not use all of them.
Linux for example utilizes ring 3 for user-space applications and shared libraries and ring 0 for system calls and the kernel.
System calls provide the transition to ring 0, the one with the topmost security level, which is used for the operating system kernel.
As a crucial part of an operating system, system calls will be discussed in more detail later in this work.
Directly related to this model is the \textit{dual-mode} execution of modern \acp{cpu}.
It is a hardware enabled security concept to provide a distinction between the user applications in ring 3 and the actual operating system kernel in ring 0.
Only the kernel in ring 0, running in the \textit{kernel mode} (or \textit{privileged mode, supervisor mode or system mode}), has direct and privileged access to memory, hardware, timers or interrupts, e.g.\ for performing \ac{io} operations or memory mappings\cite{lfd430}.
%
\begin{figure} [t]
\centering
\includegraphics[scale=0.3]{CPUPrivilegeRings}
\caption{The Rings of the x86's security concept according to~\cite{glatz2015betriebssysteme}}\label{pic:x86rings}
\end{figure}
%
\begin{figure} [t]
\centering
\includegraphics[width=\linewidth]{ModeSwitchCallSequence}
\caption{A system call sequence including the mode switches according to~\cite{glatz2015betriebssysteme}}\label{pic:mode-switches}
\end{figure} %\ \\
%
User applications in ring 3, running in \textit{user mode}, are not allowed to use these resources directly; they have limited privileges and a limited instruction set.
As mentioned above, they need to use a mechanism called \textit{system calls} to transfer the execution to the \textit{kernel mode} where the privileged actions are performed.
Finally, the execution is transferred back to the calling user process and with this, the mode changes back to \textit{user mode}.
Figure~\ref{pic:mode-switches} pictures the operating flow of a system call including the mode switches between \textit{user} and \textit{kernel mode}.
The \ac{cpu}'s operating mode is usually controlled by a specific bit in the \ac{psw}\cite{tanenbaum-modern-operating-systems}.
\ac{psw} is the term used on x86 \ac{cpu} architectures.
It corresponds to the \ac{cpsr} on ARM architectures\footnote{http://infocenter.arm.com, visited on 08.03.2019\\ \url{http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473f/CHDFAEID.html}}.
But since the literature consistently uses the term \ac{psw} regardless of the concrete \ac{cpu} architecture, this work does so as well.
This bit reflects the state of each individual \ac{cpu} core in a multi-processor system, not of the operating system kernel as a whole.
As a result, different \ac{cpu} cores may be in a different execution mode\cite{lfd430}.
With this separation, any privileged instruction is forbidden in \textit{user mode} and will not be executed.
Based on the dual mode execution on the \ac{cpu}, different architectural concepts for operating systems evolved.
They differ, e.g., in how large a share of the operating system, or rather of its kernel, actually runs within the \ac{cpu}'s \textit{kernel mode}.
Thus, they influence the whole system, including device driver development but also performance and security issues.
With this basic knowledge about the \ac{cpu}'s operating modes, the next section examines a selection of different operating system architectures.
Special attention is paid to the most common ones, the \textit{monolithic} and the \textit{microkernel} architectures, and their implementation in Linux and Zircon.
Importantly, this work will not discuss special purpose operating system architectures such as Integrity OS for real-time applications or the Cray Linux Environment, a Linux adaptation for supercomputers.
Today, even the majority of general purpose computing systems are driven by more than one \ac{cpu} core and most common modern operating systems are designed to provide support for the de facto standard for tightly coupled systems, \ac{smp}.
\subsection{Monolithic Architectures}\label{sec:monolithic-archs}
Some sources, such as \textsc{Glatz}\cite{glatz2015betriebssysteme} or \textsc{Silberschatz}\cite{silberschatz2009operating}, suggest monolithic operating systems do not have a well-defined structure at all.
As they are indeed most commonly grown structures, started in a completely different scope (MS-DOS, the original UNIX), it is not an incorrect claim.
But it does not necessarily have to be the case.
Above all, monolithic operating system (kernel) architectures have in common that they form one single binary program which is running entirely in kernel mode.
User programs, running in user mode, interact with the kernel only through a well-defined set of \textit{system calls}\cite{lfd430}.
Within the kernel itself, all parts are free to use and access each other and even the hardware without any limitation, e.g.\ regarding access to the kernel functionality of another component.
So a function or procedure initially developed for scheduling processes could be used in a completely different context if its functionality is useful to solve another issue.
In fact, there is no information hiding between kernel functions or procedures.
Any function in this kernel context has full access to the hardware, such as \ac{io} devices, timers, interrupts and even to the memory.
There is no memory protection or validation between different components of a monolithic kernel.
Of course, this leads to some serious disadvantages of this architecture: for example, a crash in one single function or procedure can crash the entire kernel, and the resulting system may become difficult to understand and maintain\cite{tanenbaum-modern-operating-systems},~\cite{silberschatz2009operating}.
The missing memory protection within the kernel could also be a source for crashes or attacks.
But in contrast, this architecture also enables a very efficient kernel design without any unneeded communication overhead or hardware inefficiencies\cite{lfd430}.
An extension of the monolithic architecture are the so-called \textbf{modular operating systems}.
They provide additional, defined interfaces for (usually) dynamically loadable and unloadable extensions, e.g.\ for device drivers or filesystems.
Sometimes, such extensions or modules are just allowed to use a limited function set of the operating system, but they are still running as a part of the kernel in kernel mode\cite{lfd430},~\cite{tanenbaum-modern-operating-systems}.
Just like ordinary kernel functions or procedures, (malicious) programming errors in extensions may lead to a kernel crash, manipulation or damage to other components.
On the other hand, the modular concept provides some advantages over regular monolithic kernels.
It allows slimming down the actual kernel by providing the ability to dynamically load only the functionality actually needed, and e.g.\ security patches within such an extension are possible without restarting the entire system\cite{brause2017betriebssysteme}.
As the extensions become a part of the operating system running in kernel mode, no additional communication effort between the actual kernel and the modules is required.
Figure~\ref{pic:monolith} pictures an exemplary architecture for a monolithic kernel using optional modules.
Such a concept of modules is quite popular for monolithic operating systems like \textit{Linux} or \textit{Solaris}\cite{brause2017betriebssysteme},~\cite{silberschatz2009operating}.
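As an illustration, the following minimal sketch shows the skeleton of such a dynamically loadable Linux kernel module; the module name is hypothetical, but the \texttt{module\_init()}/\texttt{module\_exit()} macros are the actual Linux interface for loading and unloading:
\begin{verbatim}
/* Minimal sketch of a dynamically loadable Linux kernel module. */
#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

static int __init sample_init(void)
{
        pr_info("sample: loaded into the running kernel\n");
        return 0;        /* 0 signals a successful load */
}

static void __exit sample_exit(void)
{
        pr_info("sample: unloaded\n");
}

module_init(sample_init);   /* called on insmod/modprobe */
module_exit(sample_exit);   /* called on rmmod */
\end{verbatim}
Once loaded via \texttt{insmod}, this code runs in kernel mode as part of the monolith, with all the consequences described above.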
%
\begin{figure} [t]
\centering
\includegraphics[scale=0.3]{MonolithicKernel}
\caption{A Monolithic Kernel Architecture according to~\cite{lfd430}}\label{pic:monolith}
\end{figure}
%
%
\subsection{Microkernel Architectures}\label{sec:microkernel-archs}
The microkernel architecture focuses on very different design goals compared to the monolithic one.
Among them is coping with the complexity, the rather poor maintainability and the susceptibility to errors of monolithic kernels by a massively modular approach.
To achieve this, the core idea behind microkernels is a very small kernel running in kernel mode which provides only the core functionalities, while all the other important functions of an operating system run in user mode.
Thereby, the microkernel architecture is excellently suited to implement a proper division of mechanism and policy.
The kernel provides just the most basic mechanisms needed for an operating system, while the userspace modules implement the policy.
This decoupling makes it easier to change the policy in userspace for altering requirements without touching the actual kernel\cite{tanenbaum-modern-operating-systems}.
%
\begin{figure} [t]
\centering
\includegraphics[scale=0.3]{Microkernel}
\caption{A Microkernel Architecture according to~\cite{lfd430}}\label{pic:microkernel}
\end{figure}
%
What is part of this core functionality differs between the various sources, but all considered ones agree that a simple mechanism for process scheduling is just as much a core functionality as providing an \ac{ipc} mechanism\cite{lfd430},~\cite{silberschatz2009operating},~\cite{glatz2015betriebssysteme}.
In contrast, the sources disagree whether memory management and virtualization, device drivers or synchronization facilities are a part of the actual kernel.
The \textit{Mach} microkernel, which formed the first generation of microkernels in 1985, named process and thread administration, an extensible and secure \ac{ipc} mechanism, virtual memory management and scheduling as its core tasks, while everything else has to run in user mode\cite{rashidMach}.
Functional enhancements of the system do not require changes to the kernel itself either.
This concerns, depending on the exact realization, device drivers, memory management, system call handlers and even more system components\cite{lfd430},~\cite{silberschatz2009operating}.
Figure~\ref{pic:microkernel} illustrates this situation.
In academic microkernel approaches, all components in user mode run within their own userspace process as small, well-defined modules, while the communication is done through copious message passing via the actual kernel\cite{tanenbaum-modern-operating-systems},~\cite{lfd430}.
Since the restrictions of the \ac{cpu}'s dual mode still apply to microkernel based operating systems, device drivers running in user mode are consequently not allowed to access physical \ac{io} ports directly.
A device driver has to invoke the actual kernel to perform the needed action on its behalf.
In turn, the kernel is able to check the actions and whether the driver is authorized to execute them.
As a result, the microkernel design is more reliable and secure as such a division enables the kernel to intercept erroneous actions such as accidental memory writes to important regions\cite{tanenbaum-modern-operating-systems}.
Equally, a crash in a userspace system component like a driver is not able to crash the entire kernel in such an approach.
And as an additional advantage, the microkernel architecture facilitates porting the operating system kernel to another target architecture as most hardware dependencies are part of the small kernel\cite{silberschatz2009operating},~\cite{lfd430}.
With all the described advantages of microkernels, the question remains why microkernels are primarily used in real-time environments, avionics or military, but not for desktop operating systems.
One reason is that all these advantages are paid for with the high price of microkernel message passing costs.
For the mentioned application areas, especially the reliability that comes with the microkernel architecture is more desirable than the performance costs of the more frequent context switches in comparison to monolithic architectures\cite{tanenbaum-modern-operating-systems}.
Since a lot of the operating system's functionality has been moved to userspace, microkernel architectures need to perform noticeably more context switches to invoke the actual kernel for privileged actions.
The performance losses are not only caused by the large amount of context switches themselves, but also by the fact that modern \acp{cpu}, particularly their caches, are not designed for them.
Every context switch causes cache misses, which means that the required data has to be loaded from the slower main memory and cached again.
The data of the previous context (e.g.\ the user mode context) will be dismissed from the cache and the \ac{cpu} is largely blocked in the meantime.
By rapidly switching back to the previous context, as it is usual for e.g.\ a short kernel invocation to perform an \ac{io} operation on microkernel architectures, the cache is no longer suitable for the new context and has to be replaced\cite{lfd430}.
Only with the \textit{L4} kernel, a second generation microkernel, was it possible to get close to the performance of a monolithic kernel like \textit{Linux}\cite{Hrtig1997}.
This has been achieved by improving the \acf{ipc} mechanism, a fundamental component of microkernel architectures on which other communication mechanisms are based.
% by improving ipc performance, only synchronous ipc
Nevertheless, pure microkernels are mainly used for systems with high reliability requirements and remain unusual for desktop applications.
Some industry examples are \textit{Integrity}, \textit{QNX} and \textit{seL4}, a mathematically verified version of the \textit{L4} kernel\cite{tanenbaum-modern-operating-systems}.
%TODO ? reincarnation server : check if all modules are up running and work correctly -> if not it replaces them without user interaction \cite{tanenbaum}
\subsection{Layered Architectures}
Layered operating system architectures are usually organized in hierarchical layers, but sometimes the chosen model is described as a series of concentric rings.
Each layer or ring provides a group of functionality while it is only allowed to use the functions of the one directly below.
The cooperation between the layers or rings is regulated by clearly defined interfaces\cite{brause2017betriebssysteme}.
This is usually accompanied by the fact that the lower layers or inner rings are more privileged than the outer ones.
However, there is no uniform and universally accepted approach for the division into layers and their number according to this pattern\cite{glatz2015betriebssysteme},~\cite{tanenbaum-modern-operating-systems}.
In fact, a meaningful division is not that easy.
Functionalities may have to be divided artificially, and the harmonious arrangement can have its pitfalls caused by the access requirements this architecture is based on.
Only if the layered access model is applied properly to obtain a clean architecture is it able to unfold its advantages.
These are, for example, the interchangeability of the layers, provided they and their interfaces were properly designed, or the resulting concept for debugging.
As the layers are constructed on top of each other, it is possible to debug and verify each one on its own, starting at the lowest layer up to the topmost one\cite{silberschatz2009operating}.
But the costs for system calls are also comparatively high, because they have to be passed through all layers while each one adds overhead to such a call\cite{silberschatz2009operating}.
In general, layered operating system architectures are related to monolithic ones, but it is conceivable to adopt the idea for microkernel approaches, e.g.\ the MINIX userspace is divided into layers.
Examples for this architecture are \textit{OS/2} or newer \textit{Unix} variants, while \textit{Multics} is one for a concentric ring based model\cite{glatz2015betriebssysteme},~\cite{tanenbaum-modern-operating-systems}.
% \subsection{Distributed Architectures}
\subsection{Hybrid Architectures}
Hybrid operating system architectures based on monolithic concepts, microkernels and maybe layers are a common approach to combine the advantages of these concepts.
They try to pair the performance of the monolithic design with the modularity and reliability of microkernels\cite{microkernels},~\cite{silberschatz2009operating}.
How both worlds interact is very different depending on the exact implementation.
\textsc{Silberschatz} explains one of them in his book \textit{Operating System Concepts}\cite{silberschatz2009operating} using Apple's \textit{OS X} (today named \textit{macOS}) as an example.
Depending on the exact implementation and the share of the architectures, most disadvantages of monolithic architectures still apply for the hybrid systems.
Further examples for hybrid architectures are \textit{Windows NT} and \textit{BeOS}\cite{microkernels}.
\subsection{The Linux Kernel's Monolithic Architecture}
Linux is the perfect example of an operating system that has grown enormously over time.
Starting as a pure hobby project to learn about a specific \ac{cpu} and to connect to the Unix computers at the university of its initial author \textsc{Linus Torvalds}, it became strongly related to its archetype, \textit{Unix}\cite{DiamondTorvalds2002}.
They share fundamental design goals, such as supporting multiple processors and users at the same time, while Linux is not based on the original Unix source code\cite{tanenbaum-modern-operating-systems},~\cite{lfd430},~\cite{DiamondTorvalds2002}.
It runs entirely in kernel mode and all built-in layers have full access to the internal kernel \ac{api} using ordinary C function calls.
A sophisticated concept of kernel modules which can be dynamically loaded into a running kernel makes a limited number of microkernel advantages available for Linux.
Modules in Linux are only allowed to use a restricted (exported) set of functions, but once loaded into the running kernel, they become a part of the monolith running in kernel mode\cite{lfd430}.
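The following minimal sketch illustrates this restriction; the function name is hypothetical, but \texttt{EXPORT\_SYMBOL\_GPL()} is the actual Linux macro that makes a kernel symbol visible to loadable modules:
\begin{verbatim}
/* Sketch: only symbols explicitly exported by the kernel or by other
 * modules are usable from a loadable module. */
#include <linux/module.h>

MODULE_LICENSE("GPL");

int sample_compute(int x)
{
        return x * 2;
}
/* Without this line, other modules could not call sample_compute(),
 * even though it is linked into the running kernel: */
EXPORT_SYMBOL_GPL(sample_compute);
\end{verbatim}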
Linux is largely compatible with the \ac{posix} standard which was initially created for Unix.
Initially, this was because \textsc{Torvalds} could not get hold of a copy of the standard, while today it is rather a conscious decision\cite{DiamondTorvalds2002},~\cite{tanenbaum-modern-operating-systems}.
Also, the decision for the monolithic architecture is today consciously supported by the kernel community and justified with its performance and efficiency over microkernels due to the \textit{privilege barrier} between user and kernel mode which has to be passed quite often in microkernel architectures\cite{why-linux-monolith},~\cite{lfd430}.
The Linux kernel itself is divided into five essential tasks which are also reflected in its source code.
They are:
\newpage
\begin{itemize}
\item Process Management,
\item Memory Management,
\item Filesystems,
\item Device Management and
\item Networking\cite{lfd430}.
\end{itemize}
By structuring these tasks and further components into \textit{subsystems} like \textit{drivers/}, \textit{fs/} (filesystems), \textit{net/} or \textit{kernel/}, the Linux kernel remains comprehensible and in some ways modular.
A closer look at most of them in general but also their implementation in Linux is done in the following sections.
The Linux kernel is mainly written in C, but some very hardware dependent parts are written in Assembly.
Additionally, especially the Assembly parts were strongly dependent on the \ac{gcc}.
Today, there are efforts to reduce the share of Assembly for maintenance and readability reasons, since modern compilers do not generate less efficient code than hand-written Assembly\cite{programming-religion}.
This also reduces the dependency on \ac{gcc} and enables the use of alternative compilers, especially Clang\cite{linux-clang},~\cite{fosdem-linux-llvm}.
Nevertheless, the Linux kernel's principal language is C and it only provides support for C drivers.
\subsection{The Zircon Kernel's Microkernel Architecture}
% design goals, architecture, interface posix, parts, language/compiler
In contrast to the Linux kernel, Zircon is not a grown structure.
Started in 2015, it was largely developed from scratch by Google for a so far undisclosed field of application\cite{chat-zircon-arch}.
Nevertheless, Zircon emerged from a branch of \textit{Little Kernel} (LK) by \textsc{Travis Geiselbrecht} who is also a part of the Zircon Team at Google\cite{zircon-vs-lk}.
Despite its origin, Zircon is very different from Little Kernel.
It targets powerful devices such as modern computers and phones and for this reason provides only 64-bit support, first class user-mode support and a capability-based security model.
In contrast, Little Kernel is designed for embedded applications and is used, amongst others, as a bootloader for \textit{Android} and as the \textit{Android Trusted Execution Environment (Trusty TEE)}\cite{lk-intro}.
It has 32-bit support, but none of the more sophisticated features Zircon has\cite{zircon-vs-lk}.
The microkernel architecture is justified by having security, safety, reliability and modularity as major design goals for Zircon.
According to \textsc{Travis Geiselbrecht}, the architecture was a conscious tradeoff between the named goals and performance\cite{chat-zircon-arch}.
The developers try to avoid costly context switches as much as possible, speed up the remaining ones and take advantage of \acf{smp}, but performance is not the focus of Zircon.
Likewise, Zircon does not focus on performing \ac{io} operations or process management, which are the key tasks \ac{posix} was designed for\cite{chat-zircon-arch}.
As a result, Zircon does not claim to be or to become \ac{posix} compatible, it just supports a very basic subset of the standard\cite{zircon-libc-posix}.
% For \ac{ipc} Zircon provides a mechansim based on an \ac{idl} which is widly used within the Zircon kernel but also in Fuchsia as the whole operating system.
The Zircon kernel itself is split up into the actual microkernel running in kernel mode (\textit{kernel/}) and services, drivers and core libraries running in user mode (\textit{system/})\cite{zircon-intro}.
The kernel part provides the basic operating system mechanisms:
\begin{itemize}
\item Process Management,
\item (virtual) Memory Management,
\item \acl{ipc} and
\item Synchronization Mechanisms\cite{zircon-intro}.
\end{itemize}
The part running in user mode contains core services for, amongst others, booting, device management and networking, as well as device drivers and other hardware related code, and user libraries.
Zircon is for the most part written in C++ and to a lesser extent in C.
It provides native support for device drivers in both languages but due to the fact that Zircon provides an \ac{idl} which defines a contract for in-process drivers, other languages are conceivable as well.
In fact, support for Rust drivers is currently being worked on\cite{chat-zircon-arch}.
Unlike Linux, Zircon provided support for both the \ac{gcc} and the Clang compiler from the beginning, owing to the sophisticated tooling around Clang and LLVM\@.
\section{System Calls}\label{sec:system-calls}
System calls were already marginally mentioned in this work as the mechanism to switch the program execution between user and kernel mode, because applications running in user mode have only restricted rights.
Thus, these special calls are needed for the interaction with basic hardware devices like the \ac{cpu}, the memory, peripherals or filesystems and for invoking the actual operating system's kernel for management operations like process management\cite{lfd430}.
System calls are to a high degree hardware dependent and differ between various operating system implementations.
%
A system call has its origin in an application running in user mode.
If the application has to invoke the operating system kernel, e.g.\ to perform an action on memory on its behalf, it has to use a system call to switch the operating mode\cite{glatz2015betriebssysteme},~\cite{tanenbaum-modern-operating-systems}.
As switching the \ac{cpu}'s execution mode also means a new context, a system call to the kernel differs from a common procedure call.
In user mode, the so-called entry code stores the system call's parameters in a defined way.
One is to store the parameters and the call's number in defined registers (see Figure~\ref{pic:syscalls}), another one is to store them on the stack, according to the C/C++ calling convention in reverse order\cite{silberschatz2009operating},~\cite{glatz2015betriebssysteme}.
The exact variant depends on the actual \ac{cpu} architecture but also on the operating system kernel itself.
Linux, for example, generally has a different convention for system calls than Windows.
The next instruction triggers a special software interrupt containing the order to switch the context.
It is also named the \textit{trap instruction}\cite{glatz2015betriebssysteme},~\cite{tanenbaum-modern-operating-systems}.
To be exact, it is the interrupt vector number of the trap instruction which is responsible for the switch.
It is \texttt{0x80} on Linux, as pictured in Figure~\ref{pic:syscalls}, and \texttt{0x2e} on Windows systems\cite{glatz2015betriebssysteme}.
But right before switching, the \acf{psw}, which contains the actual processor state including the mode bit, has to be saved to the stack.
The same number is used in kernel mode as an index into the interrupt vector table (or \acf{idt}) which contains the start address of the system service dispatcher routine (compare Figure~\ref{pic:syscalls}).
This table entry, the address of the system service dispatcher routine, is loaded as the next instruction into the new \ac{psw}\cite{brause2017betriebssysteme}.
Jumping to this routine, the system call's parameters and its number are restored to examine the actual call and invoke the matching service routine from the system service dispatching table, which finally fulfills the requested action as pictured in simplified form in Figure~\ref{pic:syscalls}\cite{glatz2015betriebssysteme}.
Finally, the control flow jumps back to the system service dispatcher, which hands control back to user space, including the switch back to user mode, in the common way a procedure returns\cite{glatz2015betriebssysteme}.
The previous \ac{psw} is restored from the stack containing the bit for the \ac{cpu}'s user mode execution.
At the very end, the user application has to clean up the stack just like for any procedure call\cite{tanenbaum-modern-operating-systems}.
% Figure~\ref{pic:syscalls} shows a simplified version of the system call implementation on x86\_64 for Linux\cite{decade-linux-syscalls}.
More modern versions use the special instructions \texttt{SYSENTER} and \texttt{SYSEXIT} (Intel) or \texttt{SYSCALL} and \texttt{SYSRET} (AMD) instead of the slower trap interrupts\cite{decade-linux-syscalls}.
Figure~\ref{pic:syscalls} uses x86\_64-like assembly and thus this modern syntax for invoking the system call, while the \ac{idt} also pictures the older interrupt number \texttt{0x80}.
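To make the register convention tangible, the following minimal C sketch performs a raw \texttt{write} system call on an x86\_64 Linux system via inline assembly; the function name \texttt{sys\_write} is a local illustration, not the kernel's entry point, but the register assignments and clobbers follow the documented x86\_64 Linux calling convention:
\begin{verbatim}
/* Sketch: a raw system call on x86_64 Linux. The call number goes
 * into rax, the arguments into rdi, rsi and rdx, and the SYSCALL
 * instruction switches the CPU to kernel mode. */
#include <stddef.h>

static long sys_write(int fd, const char *buf, size_t len)
{
        long ret;
        asm volatile("syscall"
                     : "=a"(ret)                   /* result in rax    */
                     : "0"(1),                     /* __NR_write == 1  */
                       "D"(fd), "S"(buf), "d"(len) /* rdi, rsi, rdx    */
                     : "rcx", "r11", "memory");    /* SYSCALL clobbers */
        return ret;
}

int main(void)
{
        sys_write(1, "hello from a raw syscall\n", 25);
        return 0;
}
\end{verbatim}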
%
\begin{figure} [t]
\centering
\includegraphics[width=\linewidth]{Systemcalls}
\caption{System Call Implementation inspired by~\cite{lfd430}}\label{pic:syscalls}
\end{figure}
%
%
\subsubsection{POSIX}
The basic idea behind the \acf{posix} standard is to define a stable interface between the user-space and the operating system kernel to achieve portability for applications on systems meeting this standard.
But even if an operating system complies, it may differ in the type and number of available system calls.
\ac{posix} is only an \ac{api} definition but not one for system calls.
An operating system may implement the \ac{posix} standard functions within a library, which often involves system calls, but it does not have to do so\cite{lfd430},~\cite{glatz2015betriebssysteme}.
System calls are rarely used directly by user applications without an abstraction layer such as libraries. An example is the \textit{libc} on Unix-like systems\cite{lfd430},~\cite{tanenbaum-modern-operating-systems}.
Equally, a system is not constrained to a single \ac{api} and may implement different ones using its existing set of system calls\cite{glatz2015betriebssysteme}.
\ac{posix} is a very common standard which is for example implemented in UNIX, macOS, MACH and partly in Linux\cite{tanenbaum-modern-operating-systems},~\cite{glatz2015betriebssysteme}.
Another example for an \ac{api} is Win32 used by Windows to abstract their system calls, but as a result, applications targeting the Windows Win32 \ac{api} are not portable to systems implementing the \ac{posix} standard.
\subsection{System Calls in Linux}
The way system calls work in Linux was already described in the general section above.
The exact mechanism, the calling convention but also the number of system calls is highly dependent on the \ac{cpu}'s architecture.
While a 32-bit Linux kernel in version 4.8 for the x86 architecture offers 379 calls, the 64-bit version for x86\_64 offers only 328\cite{lfd430}.
The \textit{man-pages} project documents give an overview (\texttt{man 2 syscall}) of architectural differences and the calling conventions.
How far Linux is actually compatible with the \ac{posix} standard is not only related to the kernel and the number of system calls itself, but for the most part to its abstraction layer, the \ac{posix}/C standard library used.
One of the most widespread ones, the \textit{glibc} (GNU C Library), aims to follow POSIX.1\-2008 amongst other standards\footnote{gnu.org, visited on 15.06.2019 \url{https://www.gnu.org/software/libc/}}, while the \textit{musl} library does not implement it completely\footnote{repo.or.cz, visited on 15.06.2019 \url{https://repo.or.cz/w/musl-tools.git/blob_plain/HEAD:/tab_posix.html}}.
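This abstraction layer can be made visible in a short C sketch: glibc provides a dedicated wrapper for most calls, and the generic \texttt{syscall()} function for invoking a system call by its number, e.g.\ when no wrapper exists:
\begin{verbatim}
/* Sketch: the same kernel service invoked through the glibc wrapper
 * and through the generic syscall() interface. */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        printf("pid via wrapper:   %ld\n", (long) getpid());
        printf("pid via syscall(): %ld\n", (long) syscall(SYS_getpid));
        return 0;
}
\end{verbatim}
Both lines print the same process id; only the path into the kernel differs.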
\subsection{System Calls in Zircon}
% FIDL, core libs,..
% http://research.cs.queensu.ca/~cordy/Papers/BKBHDC_ESE_Linux.pdf
% for vdso
In Zircon, system calls are bound to the concept of \textit{handles}, a construct which allows applications running in user mode to reference an object in kernel mode\cite{zircon-handle}.
Interactions between user applications and kernel objects are still done using system calls but most of them are using a handle which describes the kernel object to work on\cite{zircon-concepts}.
Handles are checked by the kernel each time a system call is triggered.
For additional security, the kernel checks whether
\begin{itemize}
\item a handle has the correct type for the system call,
\item a handle parameter refers to an existing entry within the calling process's handle table and
\item a handle has the necessary rights for the triggered action\cite{zircon-concepts}.
\end{itemize}
%
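A minimal sketch of this handle-based interface, using Zircon's channel system calls roughly as documented for the kernel version considered in this work, could look as follows; error handling is reduced to the bare minimum:
\begin{verbatim}
/* Sketch: Zircon system calls operate on handles. On every call the
 * kernel checks the handle's validity, type and rights. */
#include <zircon/syscalls.h>
#include <stdio.h>

int main(void)
{
        zx_handle_t ends[2];
        if (zx_channel_create(0, &ends[0], &ends[1]) != ZX_OK)
                return 1;

        const char msg[] = "ping";
        /* checked: valid handle, channel type, write right */
        zx_status_t status =
                zx_channel_write(ends[0], 0, msg, sizeof(msg), NULL, 0);
        printf("write status: %d\n", status);

        zx_handle_close(ends[0]);
        zx_handle_close(ends[1]);
        return 0;
}
\end{verbatim}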
In contrast to Linux, Zircon provides just one library for system calls and the standard C implementation, the \textit{libzircon.so}.
It is a \acf{vdso} directly provided by the kernel and not stored as a physical \ac{elf} file on disk.
Because \acp{vdso} are accessible from both kernel and user mode without switching the context, they are a perfect concept to implement system calls in a very performant way\cite{vdso-linuxjournal}.
Thus, the Zircon \ac{vdso} is the only way to perform system calls\cite{zircon-vdso}, which is a very elegant solution to cope with performance issues in a microkernel architecture.
The system calls are defined by using an abstract definition syntax and the matching tool \textit{abigen} which generates header files and code for the libzircon's and the kernel's system call implementation\cite{zircon-concepts}.
Also in contrast to Linux, Zircon, respectively Fuchsia, does not aim for \ac{posix} compatibility.
It implements only a very limited subset of \ac{posix}, consisting of basic \ac{io} operations and pthreads.
Zircon does not support Unix-like signals, symbolic links and much more\cite{zircon-libc-posix}.
The libzircon.so does not directly support \ac{io} operations.
They are performed by the \textit{fdio.so} library, which overrides weak symbols of libzircon\cite{zircon-libc-posix}.
All available system calls in Zircon from the version this thesis is working on are documented at \url{https://github.com/Allegra42/zircon/tree/i2c-grove-lcd/docs/syscalls}.
\section{Processes and Threads}\label{sec:processes-threads}
Modern operating systems are characterized by their ability to perform various tasks at the same time.
So far, this fact has simply been accepted within this work, but it was not questioned what is special about it and how this multitasking is achieved by an operating system.
In general, parallelization of tasks can take place on different levels, e.g.\ as hardware or software parallelism.
While the physical resources for the first case are actually available to execute the tasks really simultaneously, it only seems to be so for software parallelism\cite{glatz2015betriebssysteme}.
Hardware parallelism is realized by independent, possibly specialized execution units such as multiple processing cores or controllers which are able to perform particular tasks in parallel to the main \ac{cpu}.
These could be USB, network or graphics controllers, for example\cite{glatz2015betriebssysteme}.
In order to create the impression of parallelism on a single \ac{cpu} and to use its available computing power as efficiently as possible, an abstraction providing pseudo concurrency for the execution of several tasks is required.
According to \textsc{Tanenbaum}, this concept, called the \textit{process model}, is the most central one in operating systems\cite{tanenbaum-modern-operating-systems}.
The term \textit{process} is often defined as a program in execution\cite{achilles2006betriebssysteme}.
The process model is complemented by the thread model, which provides a lightweight form of parallelism within a single process or application\cite{glatz2015betriebssysteme}.
The following sections take a closer look at these concepts and answer the remaining questions, in particular about the process and thread model, synchronization mechanisms, inter-process communication and their implementation in Linux and Zircon.
\subsection{Processes}
The term \textit{process} describes an instance of a program in execution, including all its required resources, enabling a model for the pseudo-parallel execution of multiple programs based thereon\cite{silberschatz2009operating},~\cite{tanenbaum-modern-operating-systems}.
Each program in execution is modeled as a separate process which is assigned a \ac{cpu} that is private from its point of view, including the \ac{cpu} registers, above all the program counter (PC), and a virtual private address space\cite{tanenbaum-modern-operating-systems},~\cite{glatz2015betriebssysteme}.
Especially by using virtual private memory, processes are not only modeled individually, but are also isolated from each other in reality in order to prevent errors or deliberate attacks between programs\cite{brause2017betriebssysteme}.
But since in reality more processes, respectively programs, have to be executed than \acp{cpu} exist, a mechanism is needed to switch between processes and allow the execution of all of them.
The mechanism behind this, the \textit{multiprogramming model} (formerly also known as \textit{time-sharing model}), and the policies, called \textit{process scheduling}, will be considered hereafter.
However, for a change between the execution of programs to succeed, a process must contain information about its state at the moment it is interrupted to run another one.
Which information needs to be stored and the way it is done depends slightly on the actual system's design and its implementation.
In general, this information is stored in a structure called \textit{process control block} in literature\cite{tanenbaum-modern-operating-systems}.
Depending on the implementation, it contains for example (see the C sketch after this list)
\begin{itemize}
\item a process id,
\item the process's state in the process lifecycle (described in the following section),
\item the \ac{cpu}'s register contents, especially the program counter (PC), the stack pointer (SP) and the processor status word (PSW),
\item the process's address space including the program code, its stack and heap and data segments,
\item information about resources allocated by the process like open files, or \ac{io} devices,
\item \ac{cpu} scheduling and accounting information like a priority, the maximum and actual computing time for this process,
\item a reference to the parent process and
\item the process's rights\cite{tanenbaum-modern-operating-systems},~\cite{glatz2015betriebssysteme},~\cite{achilles2006betriebssysteme},~\cite{silberschatz2009operating}.
\end{itemize}
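The following sketch renders this list as a hypothetical C structure; real kernels hold far more fields, e.g.\ Linux keeps this information in \texttt{struct task\_struct}:
\begin{verbatim}
/* Hypothetical sketch of a process control block. */
#include <stdint.h>

enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct pcb {
        int             pid;           /* process id                  */
        enum proc_state state;         /* position in the lifecycle   */
        uint64_t        pc, sp, psw;   /* saved CPU register contents */
        void           *address_space; /* code, data, stack, heap     */
        int            *open_files;    /* allocated resources         */
        int             priority;      /* scheduling information      */
        uint64_t        cpu_time_used; /* accounting information      */
        struct pcb     *parent;        /* parent process              */
};
\end{verbatim}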
\subsubsection*{Process Lifecycle}
The state in the list above refers to a process's lifecycle.
It is quite simple and in most cases described as a state chart like the one pictured in Figure~\ref{pic:process-lifecycle}.
Once a process has been created, it is managed by the operating system and changes its state to \textit{ready}.
The process is ready to run, but has to wait until the system allocates a \ac{cpu} to it.
If that is the case, the process changes its state to \textit{running} and performs its calculations.
If a process is interrupted (preempted) by the operating system without having fulfilled its task completely, there can be two reasons for this, which result in different subsequent states.
If the process was interrupted just to run another one, its state changes back to \textit{ready}.
It only lacks \ac{cpu} time to run again.
If the reason was instead that the process is waiting for an external event or for resources, like required \ac{io} devices, that are used by another process, it changes its state to \textit{waiting} or \textit{blocked}.
Only when the blocking resource needed by the process is available again does the operating system change the status of the process back to \textit{ready}.
The process is prepared to get reassigned to a \ac{cpu}.
A running process finishing its task during its computing time has reached the end of its life; its state changes to \textit{terminated}.
The operating system destroys the process and frees related resources\cite{silberschatz2009operating},~\cite{mandl2014Grundkurs}.
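The transitions described above can be summarized as a small state machine; the following C sketch is purely illustrative, with hypothetical event names:
\begin{verbatim}
/* Sketch: lifecycle transitions of a process as a state machine. */
enum state { NEW, READY, RUNNING, WAITING, TERMINATED };
enum event { ADMIT, DISPATCH, PREEMPT, BLOCK, UNBLOCK, EXIT };

enum state next_state(enum state s, enum event e)
{
        switch (s) {
        case NEW:     return e == ADMIT    ? READY      : s;
        case READY:   return e == DISPATCH ? RUNNING    : s;
        case RUNNING: return e == PREEMPT  ? READY
                           : e == BLOCK    ? WAITING
                           : e == EXIT     ? TERMINATED : s;
        case WAITING: return e == UNBLOCK  ? READY      : s;
        default:      return s;  /* TERMINATED is final */
        }
}
\end{verbatim}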
%
\begin{figure} [t]
\centering
\includegraphics[width=\linewidth]{ProcessLifecycle}
\caption{Lifecycle of a process according to~\cite{silberschatz2009operating}}\label{pic:process-lifecycle}
\end{figure}
%
The model described applies to each individual process.
The scheduler, a component of the operating system kernel, is responsible for the coordination of the entirety of processes.
It is a good example for the separation of mechanism and policy.
The mechanism is based on the \textit{multiprogramming} model, as the switching of the \ac{cpu} between programs is called.
The policy, that is the strategy to decide which process should run next, should be specified apart from the mechanism.
It not only allows the impression of parallelism on a single processing core, but also increases its utilization, as many processes spend a lot of time waiting for external events such as keyboard inputs\cite{tanenbaum-modern-operating-systems}.
Principally, the mechanism behind scheduling does not differ fundamentally from the treatment of an ordinary interrupt.
First, the current \acf{pc} is saved before the new \ac{pc} is loaded from the interrupt vector, and the \ac{cpu}'s registers are saved.
After a new stack has been set up, the actual interrupt service runs before the scheduling policy decides which process is to run next and the selected one is started\cite{tanenbaum-modern-operating-systems}.
\subsubsection*{Process Creation}
There are several reasons why a new process should be started and scheduled at all.
The most basic one is the systems start-up.
To initialize an operating system, numerous processes are created to execute parts of the system itself.
The user perceives very few of them directly, since most of them are \textit{background processes} performing specific tasks like accepting incoming mails.
Often long running processes without user interaction are referred to as \textit{daemon processes}\cite{glatz2015betriebssysteme},~\cite{tanenbaum-modern-operating-systems}.
They are mostly generated from the \textit{init process}, the first one running and bringing up the system.
Foreground processes, the ones a user interacts with, are more often started on behalf of the user.
Besides, a process can also be started by a system service call from an existing process.
The last option, that a process is started to execute a batch job, is rare today apart from scientific high performance computing\cite{tanenbaum-modern-operating-systems},~\cite{glatz2015betriebssysteme}.
There are also several ways in which a new process can be started.
Using \textit{process chaining}, a running process starts an independent new one, a new \ac{pcb}, and destroys itself\cite{achilles2006betriebssysteme}.
This option is pictured in Figure~\ref{pic:processcreation},~a.
An example in Unix-like systems is calling \texttt{exec()} on an already running process to replace the current program with another one\cite{glatz2015betriebssysteme},~\cite{tanenbaum-modern-operating-systems}.
By \textit{forking a process} (Figure~\ref{pic:processcreation},~b), a second one is created that is, at least in the beginning, a copy of the original one.
Both keep existing and share the same environment such as program code, address space and resources\cite{tanenbaum-modern-operating-systems}.
As a result, both processes have access to exactly the same resources as opened files or \ac{io} devices\cite{achilles2006betriebssysteme}.
The Unix implementation of \texttt{fork()} is an example for this way of process creation.
Process \textit{forking} and \textit{chaining} are usually used combined in Unix-like systems to create a new, independent process which is running in parallel to the first one\cite{tanenbaum-modern-operating-systems}.
In contrast, Windows offers a third way of \textit{process creation} (figure~\ref{pic:processcreation}, c) which combines them in a single instruction to start a second, independent process (\texttt{CreateProcess()})\cite{glatz2015betriebssysteme}.
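The combination of \textit{forking} and \textit{chaining} can be illustrated with a minimal C sketch for Unix-like systems; the child created by \texttt{fork()} immediately chains to a new program via one of the \texttt{exec()} family calls:
\begin{verbatim}
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork() duplicates the calling process; exec*() then replaces
 * the child's program image, yielding an independent process. */
int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: chain to a new program. execlp() only
         * returns if an error occurred. */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");
        return 1;
    } else if (pid > 0) {
        /* Parent: fork() returned the child's PID. */
        waitpid(pid, NULL, 0);
    } else {
        perror("fork");
        return 1;
    }
    return 0;
}
\end{verbatim}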
%
\begin{figure} [t]
\centering
\includegraphics[width=\linewidth]{ProcessCreation}
\caption{Ways of Process Creation according to~\cite{glatz2015betriebssysteme}}\label{pic:processcreation}
\end{figure}
%
As the newly created process and its parent share the same program code, the new one can simply inherit all needed data from its parent, as is done by \textit{process forking}\cite{achilles2006betriebssysteme}.
Each process has its own virtual address space if it is not shared through inheritance.
As a result, data cannot simply be copied between processes.
This separation is not only necessary for the multiprogramming model, but also a security mechanism that encapsulates applications in execution and protects them against each other, called \textit{process isolation}.
Hence, for a process that does not inherit its parent's data, i.e.\ one created via \textit{process creation}, an operating system needs a mechanism to pass data to the created process.
One conceivable solution is a mechanism for communication between processes with which the parent communicates the new data to the child via a buffer; another is the use of initial parameters transmitted during process creation, e.g.\ via a system call\cite{glatz2015betriebssysteme}.
The first option is used by the \texttt{popen()} call of Unix-like systems; the mechanism behind \textit{\ac{ipc}} plays a major role in modern operating systems and is therefore treated in a separate section.
The second one is more often used for \textit{threads}, which are the topic of the following section.
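As an illustration of the buffer-based option, the following minimal C sketch uses \texttt{popen()} to start a child process and read its output through a buffer provided by the operating system:
\begin{verbatim}
#include <stdio.h>

/* popen() forks a child running the given command and connects
 * its standard output to a readable buffer (a pipe). */
int main(void) {
    char line[256];
    FILE *child = popen("ls -l", "r");
    if (child == NULL) {
        perror("popen");
        return 1;
    }
    while (fgets(line, sizeof(line), child) != NULL)
        printf("received: %s", line);
    pclose(child); /* waits for the child, releases the pipe */
    return 0;
}
\end{verbatim}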
\subsubsection*{Process Termination}
Reaching the end of its lifecycle (see figure~\ref{pic:process-lifecycle}), a process usually terminates itself regularly and voluntarily (\textit{normal exit})\cite{tanenbaum-modern-operating-systems}.
In addition, a process can terminate itself prematurely in various scenarios or be terminated by the operating system or other processes.
For example, a process may detect an internal error and voluntarily terminate itself ahead of time, or the operating system detects such an error and terminates the process involuntarily to avoid major damage\cite{tanenbaum-modern-operating-systems}.
Besides, a process may get involuntarily terminated or killed by another process.
Killing another process is usually a privileged task which requires an authorization or extended rights\cite{tanenbaum-modern-operating-systems}.
\subsection{Threads}\label{sec:threads}
The current process concept, as previously introduced, provides only a single thread of execution.
However, this becomes an issue as soon as a resource needs to be processed in parallel.
If a user edits a file for example, the process has to wait for the user's input repeatedly.
It is nearly impossible to check this input for errors at the same time, since a second process is not allowed to access resources allocated by another process without an explicit action of programmers (like explicit \acl{ipc})\cite{tanenbaum-modern-operating-systems},~\cite{brause2017betriebssysteme}.
The \textit{thread model} addresses this issue.
Its basic idea is to equip a process with several execution threads that work on the same resources running in quasi-parallel\cite{tanenbaum-modern-operating-systems}.
This means processes only group the resources together while \textit{threads} represent the actual execution units on the \ac{cpu}\cite{tanenbaum-modern-operating-systems}.
A program is still abstracted as a process while (parallel) sequences within the program correspond to threads.
As a result, \textit{threads} can be considered slimmed-down processes sharing the same address space and physical resources.
Each thread has its own \ac{pc}, register set, stack and state, but the address space, global variables, open files and accounting information are shared\cite{tanenbaum-modern-operating-systems}.
A thread's lifecycle is equivalent to a process's, but in contrast to processes, threads are not isolated from and protected against each other\cite{glatz2015betriebssysteme}.
As they share the same address space, one thread can read, write or even destroy another thread's private stack\cite{tanenbaum-modern-operating-systems}.
Another issue is competing threads trying to write the same global data.
They cause a \textit{race condition} which possibly results in inconsistent or wrong data\cite{brause2017betriebssysteme}.
Nevertheless, the advantages of using \textit{threads} (multithreaded programming) predominate.
The most important one is resource sharing.
The fact that multiple threads use the same resources, e.g.\ files, in quasi-parallel by default is the biggest advantage but at the same time the biggest problem of this concept.
Threads enable responsive, interactive applications and increase performance especially on multiprocessor architectures, where they can run truly in parallel\cite{silberschatz2009operating}.
Another reason is an economic one.
While allocating an address space and physical resources to create a process is an expensive operation, creating a thread is not, as threads inherit exactly these components from their corresponding process\cite{silberschatz2009operating},~\cite{mandl2014Grundkurs}.
Threads only need to set up their own \ac{pc}, registers, stack and state within a process.
Therefore, they are also known as \acf{lwp}\cite{mandl2014Grundkurs}.
How \textit{light-weight} they actually are depends largely on their implementation in an operating system.
Conceivable options are thread implementations in user space or kernel space, but also hybrid ones.
Implementing them as a user library requires a dedicated mechanism to schedule threads within a process, which is often realized as a kind of runtime environment including a \ac{tcb} (corresponding to the \ac{pcb}) for each single thread\cite{mandl2014Grundkurs},~\cite{tanenbaum-modern-operating-systems}.
There is no need to invoke the operating system to switch a thread, but as a result, all threads within a process become blocked if a single one waits for a resource\cite{tanenbaum-modern-operating-systems},~\cite{brause2017betriebssysteme}.
The operating system's scheduler simply does not know about the runtime environment and possibly other runnable threads within the process.
The implementation in kernel space is not that light-weight, but multicore architectures benefit more from this variant.
If the operating system's kernel has control over threads, they are managed just like processes.
Instead of one \ac{tcb} within each process, the kernel collects the \acp{tcb} for all threads in the entire system in a thread table corresponding to the process table\cite{tanenbaum-modern-operating-systems},~\cite{mandl2014Grundkurs}.
But this also allows the kernel to recognize a blocked thread and schedule another one of the same process instead of blocking the whole process\cite{tanenbaum-modern-operating-systems}.
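A minimal C sketch using the \ac{posix} thread library illustrates the shared address space: both threads work on the same global data without any explicit sharing mechanism, while each has its own stack, registers and \ac{pc}:
\begin{verbatim}
#include <pthread.h>
#include <stdio.h>

/* data[] and sum[] live in the shared address space and are
 * visible to all threads of the process. */
static int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
static long sum[2];

static void *worker(void *arg) {
    long id = (long)arg;    /* 0: first half, 1: second half */
    for (int i = 0; i < 4; i++)
        sum[id] += data[id * 4 + i];
    return NULL;
}

int main(void) {
    pthread_t t[2];
    for (long id = 0; id < 2; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (int id = 0; id < 2; id++)
        pthread_join(t[id], NULL);
    printf("total = %ld\n", sum[0] + sum[1]);
    return 0;
}
\end{verbatim}
Since both threads write disjoint elements of \texttt{sum}, no synchronization is needed here; as soon as they write the same data, the problems discussed in section~\ref{sec:synchro} arise.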
\subsection{Processes and Threads in Linux}
As in the general model, a process in Linux corresponds to a program in execution.
Initially, a process contains exactly one thread of execution but further ones can be created as soon as the process has started\cite{tanenbaum-modern-operating-systems}.
Processes are organized in a tree-like structure with the so-called \textit{init} process at its top.
Starting from init, each new process to be created is split off from its parent using the \textit{fork()} system call.
When forking a process, the newly created child process inherits the whole environment of its parent, including environment variables, opened files and network connections.
Furthermore, it gets a copy of its parent's address space including the data and code sections.
As both processes share the same data and resources at this time, synchronization is needed if both try to access the same one\cite{mandl2014Grundkurs}.
This issue is considered in the following section~\ref{sec:synchro}.
The distinction between parent and child is made via their \ac{pid}.
The fork call returns 0 to the child and the child's (non-zero) \ac{pid} to its parent.
Based on this, the execution paths can be split up\cite{tanenbaum-modern-operating-systems},~\cite{mandl2014Grundkurs}.
Even if parent and child differ only by the \ac{pid} directly after the fork, changes in the parent are not visible to the child and vice versa.
To actually use the newly created process independently of the parent, in Unix-like systems the fork call is in most cases directly followed by a call to \textit{exec()}, which replaces the complete process environment with the one of another, new program.
As copying the process data is quite costly and needless if an exec follows directly, modern Linux systems use a \textit{copy-on-write} strategy for process forking.
The child gets its own page tables, which initially point to the parent's pages.
The child can still read the parent's environment data without any copy operation.
Only when the child or the parent attempts a write access to this data is a \textit{protection fault} raised, and the page to be modified is copied to the child process\cite{tanenbaum-modern-operating-systems}.
Such a fork variant is known as \textit{vfork()} and is used especially in situations where an \textit{exec()} follows directly, to avoid the expensive and needless copy of the parent process's environment\cite{mandl2014Grundkurs}.
Within the Linux kernel, the difference between processes and threads (or heavy-weight and light-weight processes) is not as important as described in the general section about processes.
Instead, one speaks of \textit{tasks}.
The reason for this is that the Linux kernel offers a fine-granular way of creating processes respectively threads via the \textit{clone()} call at kernel level.
In addition, Linux supports \ac{posix} threads\cite{lfd430}.
But as the clone call is a unique feature of Linux and not generally available on other Unix-like systems, applications using it directly are not portable\cite{tanenbaum-modern-operating-systems}.
Nevertheless, \textit{clone()} is a very interesting concept in Linux.
Whenever a process or thread is created with clone, it gets an empty private stack and directly executes a new function, which is given as an argument to the call.
The decisions whether a process or a thread should be created, which resources should be shared or copied and which process is actually the parent are based on flags such as the following:
\begin{itemize}
\item If \texttt{CLONE\_VM} is set, a new thread is created; if not, the call results in a new process.
\item If \texttt{CLONE\_FS} is set, the newly created thread or process shares \textit{umask}, \textit{root} and \textit{working directories} with its parent. They are not shared at all if the flag is not set.
\item If \texttt{CLONE\_FILES} is set, the parent shares file descriptors with its child. If not, they are copied to the child.
\item If \texttt{CLONE\_PARENT} is set, the child's parent is the same as the calling process's. If the flag is not set, the calling process becomes the parent\cite{tanenbaum-modern-operating-systems}.
\end{itemize}
The flags listed show only a subset of the possibilities \textit{clone()} offers.
A full list for the glibc wrapper of the corresponding system call is available as a man page via \texttt{man 2 clone}.
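The following minimal C sketch uses the glibc wrapper documented in \texttt{man 2 clone} to create a thread-like task that shares the caller's address space via \texttt{CLONE\_VM}; passing the top of the allocated stack assumes an architecture where stacks grow downwards:
\begin{verbatim}
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

static int child_fn(void *arg) {
    printf("child task got: %s\n", (char *)arg);
    return 0;
}

int main(void) {
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return 1;
    /* CLONE_VM shares the address space (thread-like task);
     * SIGCHLD lets the parent wait for the child. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_VM | SIGCHLD, "hello");
    if (pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}
\end{verbatim}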
Both processes (tasks) and threads based on the clone call are kernel constructs.
The \ac{posix} thread implementation on Linux internally also uses the clone call with the special flag \texttt{CLONE\_THREAD}.
In contrast to the process lifecycle presented in the generic section, Linux tasks have one extra state called \textit{zombie}, which is entered on process termination and kept until the parent process has been informed about it\cite{mandl2014Grundkurs}.
\subsection{Processes and Threads in Zircon}\label{sec:processes-zircon}
Similar to Linux, runnable entities of the Zircon kernel are called \textit{tasks}.
This term includes the kernel objects \textit{jobs}, \textit{processes} and \textit{threads}.
As the whole Zircon kernel is object-based, the user interacts with kernel objects like the ones mentioned above via handles.
The Zircon documentation describes \textit{handles} as ``kernel constructs that allows user-mode programs to reference a kernel object''\cite{zircon-process}, containing the reference to this object, the corresponding rights and the user-space process it is bound to.
Thus, a handle can reference each object type listed in the \textit{Zircon Kernel objects} reference\cite{zircon-objects}, including kernel objects for drivers like \textit{interrupts}, \textit{resource} objects and \textit{Log} objects.
The Zircon system call \texttt{zx\_status\_t zx\_task\_kill(zx\_handle\_t handle)} is an example of such an interaction between a user application and the kernel.
It refers to a \textit{task} object handle and thus to all objects for which \textit{task} is a generic term\cite{zircon-task}.
% All objects included in the definition of the term \textit{task} have in common that they can be suspended, resumed and killed.
\textit{Jobs} as an organizational unit are Zircon-specific and not known from the general or the Linux-specific section on processes and threads.
A \textit{job} manages a group of processes and possibly other (child) jobs, too.
Similar to processes on Linux, jobs form a tree structure, as every process must belong to exactly one job but jobs can be nested.
Except for the root job, which corresponds to the init process on Linux, each job has exactly one parent.
A job object consists of a reference to a single parent job, a number of child jobs and a set of member processes.
In future\footnote{Unless otherwise noted, the status of the Zircon documentation cited in this thesis corresponds to the forked and frozen source code repository on which basis the driver development takes place. See \url{https://github.com/Allegra42/zircon/blob/i2c-grove-lcd/docs/}.}, they will also contain a set of policies\cite{zircon-job}.
Jobs are used to track the privileges needed to perform kernel-related operations like system calls, but also to track and limit the consumption of basic computing resources such as memory, \ac{cpu} time or \ac{io} devices\cite{zircon-job}.
This concept makes it possible to manage applications composed of more than one process as a single entity, both from a resource and permission perspective as well as in terms of lifetime control\cite{zircon-process}.
A Zircon \textit{process} itself is an instance of a program as defined in the general section about processes~\ref{sec:processes-threads}.
It consists of the program code as a set of instructions to be executed by one or more \textit{threads} and a collection of resources.
Threads are just as much a part of a process' resources as \textit{handles} and \textit{\acp{vmar}} are\cite{zircon-process}.
Strictly speaking, it is not necessary to mention \acp{vmar} as a separate point, since they are kernel objects themselves.
But as the documentation mentions them explicitly, they also receive special attention at this point.
A \ac{vmar} represents a contiguous part of virtual memory address space used by the kernel as well as by the user-space.
Each process starts with its own \ac{vmar} from which it builds up its address space.
\acp{vmar} have a hierarchical permission model, so a process with a read-only address space cannot create a readable and writable one, and they are randomized by default\cite{zircon-vmar}.
The lifetime of Zircon processes differs from the general but also from the Linux model.
In Zircon, a new process is created via \texttt{zx\_process\_create()}, but its execution does not begin before \texttt{zx\_process\_start()} is called; a minimal sketch of this two-step sequence follows the list below.
Once a process has been started and its last running thread exits, it is impossible to create a new thread within this process.
Processes grouped in a job are treated as a single entity from a lifetime control point of view.
The lifetime of Zircon processes (or jobs) ends if
\begin{itemize}
\item the last thread within the process or job exits or is terminated,
\item the process or job itself calls \texttt{zx\_process\_exit()},
\item the parent job terminates the process or
\item the parent job is destroyed\cite{zircon-process}.
\end{itemize}
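The following minimal C sketch illustrates the two-step creation described above; the signatures follow the Zircon documentation revision cited in this thesis and should be read as assumptions in detail, and the entry point and initial stack, which normally require mapping executable code into the new process's \ac{vmar}, are taken as given:
\begin{verbatim}
#include <zircon/syscalls.h>

/* Sketch: create a process within a job and start its first
 * thread. Details (e.g. the handle passed as arg1 of
 * zx_process_start()) are simplified. */
zx_status_t start_task(zx_handle_t job,
                       zx_vaddr_t entry, zx_vaddr_t stack) {
    zx_handle_t proc, vmar, thread;
    zx_status_t st;

    st = zx_process_create(job, "task", 4, 0, &proc, &vmar);
    if (st != ZX_OK)
        return st;
    st = zx_thread_create(proc, "main", 4, 0, &thread);
    if (st != ZX_OK)
        return st;
    /* Execution begins only now; creating alone runs no code. */
    return zx_process_start(proc, thread, entry, stack,
                            ZX_HANDLE_INVALID, 0);
}
\end{verbatim}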
Just as described in the general section, Zircon \textit{threads} are the actual runnable computation entities living within a process.
They are created within the \textit{process} context via \texttt{zx\_thread\_create()} but, as with Zircon processes, the actual execution is not started until \texttt{zx\_thread\_start()} \textbf{or} \texttt{zx\_process\_start()} is called.
The \texttt{main()} function or entry point of an application should be the first one started via \texttt{zx\_process\_start()}.
But returning from such an entry point does not terminate a thread's execution.
This must be done manually by voluntarily calling
\begin{itemize}
\item \texttt{zx\_thread\_exit()},
\item \texttt{zx\_vmar\_unmap\_handle\_close\_thread\_exit()},
\item \texttt{zx\_futex\_wake\_handle\_close\_thread\_exit()}
\end{itemize}
on the thread itself\cite{zircon-thread}.
Even if the last handle to a thread is closed, the thread is not terminated automatically.
Instead, the thread must be killed explicitly after the handle has been restored by calling \texttt{zx\_object\_get\_child()} on the parent.
On the other hand, a thread can be terminated involuntarily
\begin{itemize}
\item if the parent process is terminated,
\item if someone calls \texttt{zx\_task\_kill()} on the thread's handle or
\item if an exception was generated for which there is no handler or the handler terminated the thread\cite{zircon-thread}.
\end{itemize}
In contrast to Linux threads and some library thread implementations, Zircon threads are always \textit{detached} from each other.
An operation like \texttt{join()} which waits for an undetached thread to complete and allows a clean termination is not needed in Zircon itself.
Libraries and runtime environments on top of Zircon may require such an operation to reach e.g.\ \ac{posix} compatibility\cite{zircon-thread}.
\section{Synchronization and Inter Process Communication}\label{sec:ipc-and-synchro}
The term \acf{ipc} is not only used for the pure communication between different processes but also for the synchronization between processes, threads and data shared between them.
As already mentioned in the previous~section~\ref{sec:threads}, the ability of threads to access shared data has not only advantages in this context.
As they share storage, e.g.\ main memory, two threads can read and write the same value.
But what happens if both of them try to access the same, shared value?
Consider, for example, two threads that try to read and modify a shared value.
Both read the value and, based on what they read, calculate a new one to store.
There is no problem as long as they just read the same value, but one occurs on updating it.
Suppose the first thread is scheduled and updates the value; directly afterwards, the second one is scheduled and updates the value based on what it read before the first thread was executed.
In this case, the update of the first thread is lost.
The data possibly becomes inconsistent.
Such a situation is called \textit{lost update problem}\cite{glatz2015betriebssysteme}.
Unless the scheduling of the threads is predictable in each situation, it remains impossible to predict the exact outcome.
Therefore, one speaks of a \textit{race condition}\cite{tanenbaum-modern-operating-systems}.
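A minimal C sketch, assuming \ac{posix} threads, makes the lost update concrete:
\begin{verbatim}
#include <pthread.h>
#include <stdio.h>

/* Two threads perform unsynchronized read-modify-write cycles
 * on the same shared variable. */
static long balance = 0;

static void *deposit(void *arg) {
    for (int i = 0; i < 100000; i++) {
        long read = balance;  /* read the shared value   */
        balance = read + 1;   /* update based on the read */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, deposit, NULL);
    pthread_create(&b, NULL, deposit, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Expected 200000, but updates are regularly lost. */
    printf("balance = %ld\n", balance);
    return 0;
}
\end{verbatim}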
The section~\ref{sec:synchro} discusses corresponding problems, especially for threads, and indicates ways to deal with them.
Since it is not sufficient to allow only the exchange of data respectively communication between threads of a single process, section~\ref{sec:ipc} shows mechanisms that overcome the process isolation and enable the communication between multiple ones.
\subsection{Synchronization}\label{sec:synchro}
In the context of this section, it is assumed that \textit{threads} are implemented as part of the operating system's kernel rather than based on a runtime system.
According to the definition, the term \textit{thread} refers in this section to both execution threads within a single process and those within independent processes.
A competitive situation can occur in both variants.
Threads of one process rather rival over a shared variable, while threads of different processes rather compete for resources such as a printer.
%
Race conditions are an issue whenever several processes or threads work on the same data in any way.
This also applies to an operating system's kernel, except for non-preemptive kernels which can guarantee that only one thread is active at a time on a single-core system\cite{silberschatz2009operating}.
In contrast to preemptive kernels, non-preemptive ones do not allow a thread running in kernel mode to be interrupted.
The thread runs until it exits kernel mode, blocks or yields the control of the \ac{cpu} voluntarily\cite{silberschatz2009operating}.
In such a case it can be guaranteed that a kernel and its data structures are free of race conditions.
However, preemptive kernels are more common because they are more responsive and better suited for real-time tasks due to their interruptibility\cite{silberschatz2009operating}.
Furthermore, with today's multicore systems it is rather unlikely to fulfill the requirement of only one active thread in kernel space.
%
For this reason, a mechanism to prevent race conditions in both user and kernel space is needed.
The basic idea behind it is to rule out the possibility that two threads try to change a shared resource at the same time\cite{tanenbaum-modern-operating-systems}.
But only certain areas in the program code are critical, namely the ones where a shared resource is processed.
They are called \textit{critical regions} or \textit{critical sections}\cite{tanenbaum-modern-operating-systems}.
It does no harm if other regions are interrupted.
If only one thread at a time is allowed in critical sections and another one is not allowed to enter the region until the competing access is completed, this is called \textit{mutual exclusion}\cite{tanenbaum-modern-operating-systems},~\cite{glatz2015betriebssysteme}.
The easiest way to achieve \textit{mutual exclusion} is to completely disable the system's interrupts immediately after entering a critical region and re-enable them just before leaving it.
In the meantime, all incoming interrupts are collected and processed as soon as they become re-enabled\cite{achilles2006betriebssysteme}.
However, this method is not suitable for modern multicore systems since the interrupts can only be locked for the current \ac{cpu} core.
The competing thread can still modify the resource if executed on a different core.
This solution is not ideal for single-core systems either, because even the clock interrupt is disabled and, with it, process scheduling.
If the currently running thread does not re-enable the interrupts, the whole system is blocked\cite{tanenbaum-modern-operating-systems}.
While there are some approaches to pure software solutions such as the algorithms of \textsc{Peterson} or \textsc{Dekker} (see \cite{tanenbaum-modern-operating-systems} or \cite{silberschatz2009operating} for further information) hardware enabled solutions are common today\cite{tanenbaum-modern-operating-systems}.
Modern multicore \acp{cpu} usually offer an instruction which is referred to as \ac{tsl} or \ac{tas} in literature.
It is an atomic, not interruptible operation, usually used to modify a shared variable which controls the access to a shared memory region\cite{tanenbaum-modern-operating-systems}.
The atomicity of a \ac{tsl} instruction is commonly achieved by locking the memory bus to prevent other \acp{cpu} from accessing the memory until the operation is done.
A big advantage of this solution is that the \ac{cpu} cores are not halted: ordinary calculations are not impeded, only memory accesses are prevented\cite{tanenbaum-modern-operating-systems}.
A variant is the \ac{xchg} instruction which exchanges the content of two memory locations in one atomic operation\cite{silberschatz2009operating}.
All mechanisms mentioned so far have one problem: they require busy waiting.
The waiting thread is still active and wastes \ac{cpu} time.
For short waits, this is perfectly fine.
Switching to another thread and back would be more expensive in such a situation\cite{glatz2015betriebssysteme}.
Thus, there are locking mechanisms that implement busy waiting very efficiently, called \textit{spinlocks}\cite{tanenbaum-modern-operating-systems}.
But longer active waits are very inefficient.
In this case, it is better to use blocking waits and bring another thread in execution until the reason for the blocking is removed, e.g.\ the desired resource is freed again.
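A minimal spinlock can be sketched with the C11 atomics, where \texttt{atomic\_flag\_test\_and\_set()} corresponds to the \ac{tsl} instruction described above:
\begin{verbatim}
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

static void spin_lock(void) {
    /* Atomically read the old value and set the flag; only one
     * thread at a time can observe "was clear" and proceed. */
    while (atomic_flag_test_and_set(&lock))
        ;  /* busy waiting until the holder clears the flag */
}

static void spin_unlock(void) {
    atomic_flag_clear(&lock);
}
\end{verbatim}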
\subsubsection*{Semaphores and Mutexes}
In 1965, \textsc{Dijkstra} suggested a possibly blocking lock mechanism called \textit{semaphores}, based on a \ac{cpu}'s \ac{tsl} instruction and easier to use for application developers.
A \textit{semaphore} is a new integer type with two related operations called \textit{P} and \textit{V} in the original paper or \textit{down} and \textit{up} in some literature and implementations\cite{glatz2015betriebssysteme}.
The semaphore is initialized with a value greater than 0; the exact one depends on a system's implementation or can be defined by a developer.
If the \textit{P} or \textit{down} operation is executed on a semaphore, it checks whether the value is greater than 0, decrements the counter if this is the case and continues.
If not, the thread is put to sleep.
The operation remains unfinished until a \textit{V} or \textit{up} operation has been executed and incremented the semaphore's value.
One of the possibly multiple threads sleeping on a semaphore is then chosen by the operating system, randomly or by a certain rule, and gets the clearance to complete the \textit{P} or \textit{down} operation\cite{tanenbaum-modern-operating-systems}.
For a semaphore, the fact that updates on its value must be performed in an atomic operation is essential\cite{silberschatz2009operating}.
Neither the decrementing of the value performed in the \textit{P} operation nor the incrementing in \textit{V} may be interrupted.
Semaphores are a generic approach to control access to a \textit{critical section} for a number of threads, but the often required mutual exclusion is only given if the semaphore is initialized with the value 1.
This particular case of a binary semaphore is also referred to as \textit{mutex}.
A \textit{mutex} guarantees mutual exclusion for a critical section protected with its related operations as \textit{lock} (\textit{P, down}) and \textit{unlock} (\textit{V, up})\cite{tanenbaum-modern-operating-systems}.
Except for the initial value and the operations' names, mutexes do not differ from semaphores.
Just like semaphores, they can be implemented using the \ac{tsl} or \ac{xchg} instruction.
Whether semaphores actually block depends on the implementation.
As mentioned earlier, a blocking mechanism is not always advantageous, e.g.\ for very short waits or on multicore systems with real parallelism.
In these cases, it is potentially more efficient to use busy waiting with spinlocks\cite{glatz2015betriebssysteme}.
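A minimal C sketch using a \ac{posix} semaphore initialized with the value 1, i.e.\ a binary semaphore acting as a mutex, shows how a critical section is protected:
\begin{verbatim}
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t mutex;
static long balance = 0;

static void *deposit(void *arg) {
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);  /* P/down: enter critical section */
        balance++;         /* protected read-modify-write    */
        sem_post(&mutex);  /* V/up: leave critical section   */
    }
    return NULL;
}

int main(void) {
    sem_init(&mutex, 0, 1);  /* initial value 1 -> mutex */
    pthread_t a, b;
    pthread_create(&a, NULL, deposit, NULL);
    pthread_create(&b, NULL, deposit, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("balance = %ld\n", balance);  /* reliably 200000 */
    sem_destroy(&mutex);
    return 0;
}
\end{verbatim}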
\subsubsection*{Futexes}
Another variant of semaphores is the \ac{futex}.
It targets the issue that neither busy waiting nor blocking and rescheduling another thread is very efficient on modern multicore \acp{cpu}.
On a highly parallel system, there are many contentions for resources which would require frequent switching of the active threads, but scheduling another thread respectively process likewise requires expensive switches to the kernel\cite{tanenbaum-modern-operating-systems}.
Futexes are not a pure userspace construct, even if the name suggests so.
They still need a wait queue in the kernel to manage the contending threads waiting on a lock and to schedule another thread (futexes are blocking by design); putting a thread into the queue requires a system call as well.
A user library tests a lock variable using a \ac{tsl} instruction.
If the lock is already held by another thread, the library has to put the thread into the queue.
But if there is no contention, there is also no need to involve the kernel\cite{tanenbaum-modern-operating-systems}.
The fewer actual competitive situations occur, the more efficient futexes are compared to common mutexes.
Futexes are, for example, used as part of the Linux kernel.
In this context, \textsc{Ingo Molnar} published an article\footnote{git.kernel.org, visited on 15.06.2019 \url{https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/robust-futexes.txt}} as part of the Linux kernel documentation covering the implementation of robust futexes and their impact on system performance.
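The principle can be sketched in C on Linux as follows; this is a deliberately simplified lock in which the fast path is a pure userspace atomic operation and the \texttt{futex} system call is only issued under contention (real implementations, e.g.\ as described by \textsc{Drepper} in \textit{Futexes Are Tricky}, avoid the unconditional wake-up on unlock):
\begin{verbatim}
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static _Atomic int word = 0;  /* 0 = free, 1 = locked */

static void futex_lock(void) {
    int expected = 0;
    while (!atomic_compare_exchange_strong(&word, &expected, 1)) {
        /* Contended: sleep in the kernel until the word changes.
         * The kernel re-checks the value atomically, so no
         * wake-up can be lost. */
        syscall(SYS_futex, &word, FUTEX_WAIT, 1, NULL, NULL, 0);
        expected = 0;
    }
}

static void futex_unlock(void) {
    atomic_store(&word, 0);
    /* Simplification: wake one waiter even if nobody sleeps. */
    syscall(SYS_futex, &word, FUTEX_WAKE, 1, NULL, NULL, 0);
}
\end{verbatim}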
\subsubsection*{Barriers}
So far, the synchronization of resource accesses has been discussed.
Sometimes, however, the issue is not resource access but coordinating and synchronizing the sequence flow of two or more threads.
If threads within an application are divided into self-contained phases, it must be ensured that the next phase is not started until the current one has been completed by all threads involved\cite{tanenbaum-modern-operating-systems}.
The synchronization points where the threads have to wait for each other to enter the next phase are called \textit{barriers}.
An example for such a situation could be cooking.
The stove must not be switched on until all ingredients have been purchased.
In contrast to the general implementation of barriers, a common real-world stove does not block itself if this is not the case.
In most instances, barriers are implemented on the basis of blocking semaphores\cite{glatz2015betriebssysteme}.
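A minimal C sketch using \ac{posix} barriers illustrates this phase synchronization:
\begin{verbatim}
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 3

static pthread_barrier_t barrier;

static void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld: shopping done\n", id);
    /* Block until all NUM_THREADS threads have arrived. */
    pthread_barrier_wait(&barrier);
    printf("thread %ld: cooking starts\n", id);
    return NULL;
}

int main(void) {
    pthread_barrier_init(&barrier, NULL, NUM_THREADS);
    pthread_t t[NUM_THREADS];
    for (long id = 0; id < NUM_THREADS; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (int id = 0; id < NUM_THREADS; id++)
        pthread_join(t[id], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
\end{verbatim}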
\subsection{Inter Process Communication}\label{sec:ipc}
While the previous section~\ref{sec:synchro} focuses on threads, this section explicitly deals with the communication between different processes and their synchronization.
\ac{ipc} is a central element in operating systems, especially in microkernel architectures, and significantly influences a system's behavior and its performance.
Both are crucial aspects of every operating system, and thus the efficiency of \ac{ipc} is a decisive factor for a system's acceptance.
\subsubsection{Memory Based Communication}
Since the communication of threads of one process via its shared address space is already known, it is obvious to think about a comparable technique between processes as well.
But in contrast to threads within a process, the concept of process isolation must be considered here.
As a security feature, it is not possible for processes to share their address space.
The basic idea to implement a comparable technique nevertheless is to use an \textit{external shared memory region} which is requested from the operating system.
The operating system allocates the memory and maps it into the address space of both processes\cite{brause2017betriebssysteme}.
The shared memory region is the responsibility of the processes themselves; they must manage it and secure it against incorrect accesses\cite{brause2017betriebssysteme},~\cite{glatz2015betriebssysteme}.
As known from thread communication, the advantages of this solution are the performance and transparency of raw memory access, but it also needs additional (manual) synchronization to obtain and maintain consistent data\cite{glatz2015betriebssysteme}.
Nevertheless, this solution breaks the isolation between processes.
A variant of this idea is using \textit{files} or similar resources as a communication medium.
In this case, the process isolation remains, but manual synchronization, as described in the previous section, is still necessary\cite{brause2017betriebssysteme}.
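A minimal C sketch of the shared memory approach, using the \ac{posix} calls \texttt{shm\_open()} and \texttt{mmap()} (the region name \texttt{/demo\_shm} is arbitrary), could look as follows; a second process opening the same name gets the region mapped into its own address space:
\begin{verbatim}
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    if (fd == -1) { perror("shm_open"); return 1; }
    if (ftruncate(fd, 4096) == -1) { perror("ftruncate"); return 1; }

    char *region = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    /* Raw memory access -- fast, but any synchronization is
     * the responsibility of the processes themselves. */
    strcpy(region, "hello from process A");

    munmap(region, 4096);
    close(fd);
    shm_unlink("/demo_shm");  /* remove the name when done */
    return 0;
}
\end{verbatim}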
\subsubsection{Message Based Communication}
Message based communication methods are more widespread.
In contrast to the memory based ones, the synchronization usually does not have to be done manually.
The message exchange and with it the synchronization are provided by the operating system respectively its kernel via two primitives: \textit{send()} and \textit{receive()}\cite{tanenbaum-modern-operating-systems}.
Basically, there are three types of messages:
\begin{itemize}
\item A \textit{Message} is characterized by a delimited amount of data within a single communication.
\item \textit{Streams} can theoretically transport an unlimited amount of data. In practice, there is a limitation not visible to the sender and the receiver.
\item \textit{Packets} organize the data to transport in standardized formats, the communication protocols. They allow the fragmented transfer of a large quantity of data, which is defragmented on the receiver side. The system implementation hides this fact, so the packets themselves are not visible to an application\cite{glatz2015betriebssysteme}.
\end{itemize}
Regardless of the type of message, there are two basic operations available for message-based communication: \textit{send()} and \textit{receive()}.
These operations are commonly supported by the operating system and enable not only the system-wide but also the cross-system exchange of messages and data\cite{glatz2015betriebssysteme}.
Especially the opportunity of cross-system communication offers a significant advantage over memory-based communication, as this is the foundation for network communication.
But also the synchronization between the involved processes is simplified.
The concept of messages avoids race conditions and makes the manual use of semaphores for application developers obsolete, as the operating system realizes the transmission of the messages and their synchronization.
However, this concept is not as efficient as memory-based \ac{ipc} concepts in terms of resource consumption and performance\cite{glatz2015betriebssysteme}.
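As an illustration, the following minimal C sketch shows the sender side of a \ac{posix} message queue, one possible implementation of this concept; the queue name \texttt{/demo\_mq} is arbitrary, and a receiver would open the same name with \texttt{O\_RDONLY} and call \texttt{mq\_receive()}:
\begin{verbatim}
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 128 };
    mqd_t mq = mq_open("/demo_mq", O_CREAT | O_WRONLY, 0600, &attr);
    if (mq == (mqd_t)-1) { perror("mq_open"); return 1; }

    const char *msg = "hello receiver";
    /* Blocks if the queue is full (unless O_NONBLOCK is set);
     * the last argument is the message priority. */
    if (mq_send(mq, msg, strlen(msg) + 1, 0) == -1)
        perror("mq_send");

    mq_close(mq);
    return 0;
}
\end{verbatim}
The kernel buffers the message and synchronizes sender and receiver; the priority argument of \texttt{mq\_send()} also enables the prioritized exchange discussed below.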
\paragraph{Synchronous and Asynchronous Communication}
One of the most basic design decisions within message-based communication is whether to send or receive messages synchronously or asynchronously.
When transmitting messages \textit{synchronously}, both sender and receiver must be ready for the exchange at the same time\cite{glatz2015betriebssysteme}.
If this is not the case for one of them, e.g.\ the receiver, the other one (the sender) must block until the first one, the receiver, becomes ready.
An additional, manual synchronization is not necessary.
Even if the literature does not describe a general way to transfer messages between processes, race conditions cannot occur as long as a data item is only sent from a writing process to the address space of an initially reading process and can only be modified there\cite{brause2017betriebssysteme},~\cite{glatz2015betriebssysteme}.
However, within the individual processes, the synchronization corresponds to the one known from threads (see~\ref{sec:synchro}).
However, the close coupling of the processes in synchronous communication means that the program flows of both are no longer independent of each other.
Accordingly, the parallelism for processes in such a relationship is also limited\cite{brause2017betriebssysteme}.
The \textit{asynchronous} approach tries to decouple this by inserting a buffer in between, which is also known as a \textit{mailbox} or \textit{message queue}.
Thus, sending is possible even if the receiver is not available at that moment, as long as the buffer is not completely filled.
The receiver can likewise read while the sender is not ready, as long as the buffer still contains messages.
Even different processing speeds between the processes involved can be compensated in this way\cite{glatz2015betriebssysteme},~\cite{brause2017betriebssysteme}.
Only if the buffer is completely filled does the sender have to be blocked, and vice versa the receiver if the buffer is empty.
But an operation only blocks when running in a \textit{blocking operation mode}.
In a \textit{non-blocking operation mode}, a call returns immediately in such a situation but reports a full respectively empty buffer which has to be handled on application level\cite{glatz2015betriebssysteme}.
\textit{Asynchronous} communication allows an independent program flow and simplifies parallelism thereby.
Nevertheless, an additional buffer of limited size is needed.
The exact size depends on the implementation.
If this buffer is filled and the receiver is blocked, the situation is the same as before with synchronous communication.
But the main problem with this approach is that it is not possible to predict how long a message transfer takes and when a response can be expected\cite{glatz2015betriebssysteme}.
The issue is to decide whether a message is lost, the communication only takes a long time or the receiving process has crashed\cite{tanenbaum-modern-operating-systems}.
\paragraph{Connection-Oriented and Connectionless Communication}
Especially when the reliability of data transmission has a great significance, a \textit{connection-oriented} approach for message-based \ac{ipc} is chosen.
This allows to guarantee the message sequence, monitor the transmission time of them and to intervene if necessary\cite{glatz2015betriebssysteme}.
Before a (logical) connection or \textit{channel} between the communication partners has been established, the initial message receiver must be unambiguously known.
As soon as this is the case, the identification is no longer of relevance.
The established channel provides a reliable end-to-end transmission which may have the following properties:
%
\begin{itemize}
\item A \textit{unidirectional} connection allows message transfers only in one direction at a time.
\begin{itemize}
\item If the roles of sender and receiver never change, one says the transmission is \textit{simplex}.
\item If the participants act alternatingly as sender or receiver, the transmission is also called \textit{half duplex}.
\end{itemize}
\item A \textit{bidirectional} connection allows message transfers in both directions. Both participants can act as sender and receiver at the same time. The transmission is \textit{full duplex}\cite{brause2017betriebssysteme},~\cite{glatz2015betriebssysteme}.
\end{itemize}
%
If a connection is no longer needed, it must be disconnected and destroyed.
The setup and teardown of a connection mean an additional and noticeable overhead, especially for a small amount of data to transmit.
It is only worthwhile if the reliable transmission between designated instances is more significant than the connection costs\cite{glatz2015betriebssysteme}.
One example for connection-oriented communication is a telephone call.
But there are also good reasons for \textit{connectionless} communication.
This is the case, for example, if the overhead of the connection setup is too costly for the little data to be transferred, or if rather little data is sent to a large number of receivers.
A real world example for \textit{connectionless} communication is radio.
The order of messages cannot be guaranteed for this connection type and messages may even be lost, but this does not necessarily have further impact on the application.
For a multimedia application, a retry could cause greater damage than the packet loss.
For other application purposes, however, a retry may be needed and must then be triggered by the application itself\cite{glatz2015betriebssysteme},~\cite{brause2017betriebssysteme}.
Nevertheless, the recipient must be addressed anew in every single message.
\paragraph{Receiver Addressing}
Message receivers respectively their message queues are commonly addressed by a symbolic name which is managed in a system-wide or cross-system directory.
When addressing them, one speaks, depending on the number of receivers and the delivery guarantees of the transmission, of:
\begin{itemize}
\item a \textit{unicast}, if there is a 1:1 relationship between the sender and a certain receiver addressed.
\item a \textit{multicast}, if there is a 1:m relationship between sender and a defined group of receivers addressed.
\item an \textit{anycast}, if there is a 1:1..n relationship between sender and group of receivers. At least one recipient from the group must receive the message.
\item a \textit{broadcast}, if there is a 1:n relationship between the sender and all detectable receivers; whether the message actually arrives does not matter\cite{glatz2015betriebssysteme}.
\end{itemize}
\paragraph{Priority}
In literature, asynchronous message-based \ac{ipc} mechanisms are usually modeled without prioritizing messages.
A buffer, message queue or mailbox that works according to the \ac{fifo} principle is suitable for this purpose.
In order to enable a prioritized message exchange, it is, inter alia, conceivable to use separate buffers or queues for each priority\cite{glatz2015betriebssysteme},~\cite{tanenbaum-modern-operating-systems}.