\documentclass[abstracton]{scrartcl}
\usepackage[english]{babel}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage{relsize}
\usepackage{lmodern}
\PreloadUnicodePage{0}
\usepackage[pdfborder={0 0 0}]{hyperref}
\usepackage{noweb}
\def\cmd{\textsc}
\def\pack{\emph}
\def\term{\emph}
\def\param{\texttt}
\def\str{\texttt}
\def\bs{\textbackslash}
\def\key{\texttt}
\newcommand\ctrl[1]{\key{\char`\^#1}}
\titlehead{University of Mannheim\\Laboratory for Dependable Distributed Systems}
\subject{\vspace{3cm}Bachelor Thesis}
\title{Design~and~Implementation~of a~Forensic~Documentation~Tool for~Interactive~Command\mbox{-}line~Sessions}
\author{Tim Weber}
\date{February 23, 2010}
\publishers{
\vspace{5cm}
\begin{tabular}{ll}
Primary examiner: & Prof. Dr. Felix C. Freiling \\
Secondary examiner: & Dipl.-Inf. Andreas Dewald \\
Supervisor: & Prof. Dr. Felix C. Freiling
\end{tabular}
}
\begin{document}
\maketitle
\begin{center}Bachelorstudiengang Software- und Internettechnologie\end{center}
\thispagestyle{empty}
\pagebreak
\pagenumbering{roman}
\setcounter{page}{1}
{\phantom{.}\vspace{5cm}}
\abstract{
In computer forensics, it is important to document examination of a computer system with as much detail as possible.
Many experts use the software \cmd{script} to record their whole terminal session while analyzing the target system.
This thesis shows why \cmd{script}’s features are not sufficient for documentation that is to be used in court.
A new system, \cmd{forscript}, providing additional capabilities and mechanisms will be designed and developed in this thesis.
}
\pagebreak
\tableofcontents
\pagebreak
\section*{Acknowledgements}
First of all, I would like to thank Prof. Dr. Freiling for the opportunity to write a thesis about this interesting subject and for supporting me during the process of writing.
I would also like to thank Andreas Dewald for being available as secondary examiner and Prof. Dr.-Ing. Effelsberg for postponing my thesis deadline.
Thanks to Alexander Brock for testing and fuzzing \cmd{forscript} as well as proofreading the thesis.
Michael Stapelberg, thank you for testing \cmd{forscript} and for giving me some hints about Unix system calls and why \cmd{script} does some things the way it does.
Many thanks go to the free software community:
The people who created C, GCC, Git, \LaTeX, Linux, make, noweb, script and Vim, but especially those who create comprehensive documentation.
I would like to explicitly mention the BSD \emph{termios(4)} manual page as an example of what good documentation should look like.
Thanks to my father for giving me the time to study at my own pace, and thanks to the hacker community for inspiring me every day.
Finally, I would like to thank Nathalie for the love, support and understanding before, during and after this thesis.
\pagebreak
\pagenumbering{arabic}
\section{Introduction}\label{intro}
\subsection{Background: Computer Forensics}
Computer forensics is a branch of forensic science.~\cite{casey}
In the digital age we live in, an increasing number of crimes are committed using, or at least aided by, digital devices and computer systems.
To analyze the evidence that may be present on these devices, specially trained experts are required.
Having knowledge about the technology behind the systems, these forensic investigators are able to search for evidence without destroying traces, modifying data or accidentally introducing misleading data.
Principles and techniques of computer forensics are, among others, employed to
\begin{itemize}
\item analyze computers, mobile phones and other electronic devices a suspected criminal has used,
\item recover data after a hardware or software failure,
\item gain information during or after attacks or break-in attempts on a computer system.
\end{itemize}
\subsubsection*{Documentation of Terminal Sessions}
A forensic investigator has to keep a detailed record of his or her actions while analyzing a system.
That way, in case of dispute about a piece of evidence, another forensic investigator can review the steps that led to certain conclusions.
This \term{forensic log} improves the credibility of the investigator and protects a possible defendant from false accusations.
Additionally, the log protects the investigator from forgetting how the evidence was found and what additional details (which may have seemed unimportant at the time) were present.
The protocol consists of, depending on the type of analysis, notes on paper, images, videos and data files on the investigator’s computer.
For example, to perform a \term{static analysis} of a suspect’s computer’s hard disk drive, i.e. searching the drive for suspicious data without modifying it, an investigator normally uses his own computer, which is equipped with software that records every action the investigator performs.
Often a Unix-based operating system like Linux or Mac OS~X and command-line based software (also called \term{CLI software} for its command-line user interface) are used to perform such an analysis, for example \cmd{dd} to create a snapshot of the suspect’s hard drive, \cmd{sha1sum} to verify its integrity and other tools like \cmd{foremost} to find evidence in the snapshot.
All interaction with the forensic software takes place in a text-based interface; the investigator uses his keyboard to perform commands, his workstation responds by displaying\footnote{also called “printing”, even though the output appears on the screen, not on paper} text and data.
A text-based interface cannot display graphics or use the mouse\footnote{Using the mouse is possible via several extensions, but mouse commands are simply translated to special control characters and can be read by the application just like any other keyboard input.}.
In principle, CLI sessions can be documented quite easily by creating a piece of software that records everything typed on the keyboard and everything sent to the screen.
The \cmd{script} utility is often used to accomplish this; however, it has several limitations described in section~\ref{scriptissues} which greatly limit its usefulness as a forensic tool.
\subsection{Tasks}
Several tasks have to be solved in this bachelor thesis:
\begin{itemize}
\item Analyze \cmd{script} with regard to weaknesses concerning its usage as a forensic tool.
\item Describe \cmd{script}’s output format and its disadvantages.
\item Describe in detail an output format suitable for forensic usage.
\item Implement software for Linux that is used like \cmd{script}, but creates output in the new forensic output format. In order to minimize the requirements a target system has to meet to be able to run the software, it has to be implemented in the \term{C} programming language.
\item Document the software according to the methods of \term{literate programming}.
\end{itemize}
\term{Literate programming}~\cite{literate} is a technique invented by Donald~E. Knuth, the author of the \TeX{} typesetting system.
Instead of writing more or less commented source code, it advocates writing a continuous text with embedded code fragments.
These do not necessarily appear in the order they are executed, but where they are didactically useful.
The software \cmd{noweb}~\cite{noweb} is used to generate the typeset thesis as well as the final program’s source code from a single file.
\subsection{Results}
It is apparent that \cmd{script} is not suited for forensic usage, especially because it does not record the user’s input and data about the environment it is running in.
A successor, \cmd{forscript}, has been designed and developed in this thesis.
Its output format is portable, extensible and contains detailed information about the environment.
The disadvantages of \cmd{script} are eliminated.
Following the paradigm of literate programming, this thesis is \cmd{forscript} and vice versa.
\subsection{Outlook on the Thesis}
Section~\ref{intro}, which you are currently reading, contains the introduction into the topic of computer forensics.
It explains why detailed documentation of forensic analyses is an important task, what a command-line interface is, which subjects will be presented in this thesis and also provides an overview of the tasks and results.
In section~\ref{script}, one of the most popular tools for recording interactive terminal sessions, \cmd{script}, will be presented and the format of the files it generates will be described.
Afterwards, several issues regarding its usage as a forensic tool are presented, leading to the conclusion that it should be replaced with more suitable software.
This new software, called \cmd{forscript}, will be drafted in section~\ref{design}, focusing on its file format and the resulting properties.
The invocation syntax of \cmd{forscript}, which is based on that of \cmd{script}, and the differences in behavior compared to \cmd{script} are also described.
Section~\ref{implementation}, by far the longest section, contains a detailed step-by-step description of \cmd{forscript}’s source code.
It describes how to write \cmd{forscript}’s data format, how to parse the command line, what a pseudo terminal is and how to create one to access the input and output streams of an application, and how to deal with subprocesses and signals, among other things.
The resulting application will be evaluated in section~\ref{evaluation}, which includes an example transcript file and a description of \cmd{forscript}’s known limitations.
Finally, section~\ref{summary} summarizes the work that has been done.
It talks about the future of \cmd{forscript} and describes the next steps that should probably be taken to make it even more useful.
\section{\cmd{script}}\label{script}
\pack{util-linux} is the name of a collection of command-line utilities for Linux systems.
It includes essential software like \cmd{dmesg}, \cmd{fdisk}, \cmd{mkswap}, \cmd{mount} and \cmd{shutdown} as well as the \cmd{script} and \cmd{scriptreplay} utilities.
The original \pack{util-linux} package~\cite{util} was abandoned in 2006.
Today, it has been replaced by its successor \pack{util-linux-ng}~\cite{utilng}, a \term{fork} based on the last available \pack{util-linux} version.
\pack{util-linux-ng} is under active development.
The analysis of the original \cmd{script} utility in this thesis is based on the most recent \pack{util-linux-ng} release as of the time of writing, version 2.17.
\subsection{Purpose}
The purpose of \cmd{script} is to record everything printed to the user’s terminal into a file.
According to its manual, “[i]t is useful for students who need a hardcopy record of an interactive session as proof of an assignment”.
It can also record timing data, specifying the chronological progress of the terminal session, into a second file.
Using both of these files, the accompanying utility \cmd{scriptreplay} can display the recorded data in a video-like way.
\subsection{Mode of Operation}
In order to record the terminal session, \cmd{script} creates a new \term{pseudo terminal} (PTY), which is a virtual, software-based representation of a terminal line, and attaches itself to the “master” side of it, thereby being able to send and receive data to and from an application connected to the “slave” side of the PTY.
It launches a subprocess (also known as \term{child}), which launches the actual client application as its own subchild and then records the client application’s output stream.
The parent process forwards the user’s input to the client application.
Recording terminates as soon as the child process exits.
\subsection{Invocation}
\cmd{script} takes one optional argument, the file name of the output file (also called \term{typescript} file) to generate.
If the argument is omitted, the file will be named \str{typescript}, except when the file already exists and is a symbolic or hard link:
\cmd{script} then refuses to overwrite the file, apparently for safety reasons.
This check can be avoided by explicitly providing the file name on the command line.
There are several command-line switches that modify \cmd{script}’s behavior.
The \param{-a} switch will pass the \str{a} flag instead of \str{w} to [[fopen()]]’s [[mode]] parameter.
If a typescript file does already exist, it will then not be overwritten; instead, the new content will be appended to the existing file.
By default, \cmd{script} will launch the shell specified by the environment variable \str{\$SHELL}.
If \str{\$SHELL} is not set, a default shell selected at compile time (usually \str{/bin/sh}) is used.
The shell will be called with \param{-i} as its first parameter, making it an interactive shell.
However, if \cmd{script} is called with the \param{-c} option, followed by a command, it will launch the shell with \param{-c} and the command instead of \param{-i}.
The shell will then be non-interactive and only run the specified command, then exit.
For example, called with the parameters \param{-c 'last -5'}, \cmd{script} will launch \str{/bin/sh -c 'last -5'} (or whatever shell is defined in \str{\$SHELL}).
Note that all POSIX-compatible shells have to support the \param{-i} and \param{-c} parameters.
If the \param{-f} switch is used, \cmd{script} will call [[fflush()]] on the typescript file after new data has been written to it, resulting in instant updates to the typescript file, at the expense of performance.
This is for example useful for letting another user watch the actions recorded by \cmd{script} in real time.
If the \param{-q} switch is not specified, \cmd{script} will display a message when it starts or quits and also record its startup and termination in the typescript file.
With \param{-q}, all of these messages will not appear, with one exception:
Since \cmd{scriptreplay} will unconditionally discard the first line in a typescript file, writing the startup message (\str{"Script started on …"}) cannot be disabled.
The \param{-t} switch will make \cmd{script} output timing information to \term{stderr}.
Its format is described in section~\ref{scripttiming}.
If \cmd{script} is called with \param{-V} or \param{--version} as only parameter, it will print its version and exit.
Any other parameter will make \cmd{script} display an error message and exit.
\subsection{File Formats}
\subsubsection{Typescript}
The current implementation of \cmd{script} uses a very simple typescript file format:
Everything the client application sends to the terminal, i.e. everything printed on screen, will be written to the file, byte by byte, including control characters that are used for various tasks like setting colors, positioning the cursor etc.
Additionally, a header \str{"Script started on XXX\bs{}n"} is written, where \str{XXX} is the human-readable date and time when \cmd{script} was invoked.
If \cmd{script} was invoked without the \param{-q} flag, an additional footer \str{"Script done on YYY\bs{}n"}, where \str{YYY} is the human-readable date and time when \cmd{script} terminated, is written.
\subsubsection{Timing}
\label{scripttiming}
Since this typescript format completely lacks timing information, the \param{-t} flag will output timing data to stderr.
The user has to capture this output to a file by calling \cmd{script} like this: \str{script -t 2>timingfile}.
The timing file consists of tuples of delay and byte count (space-separated), one per line:
\begin{verbatim}
0.725168 56
0.006549 126
0.040017 1
4.727988 1
0.047972 1
\end{verbatim}
Each line can be read like \emph{“\emph{x} seconds after the previous output, \emph{n} more bytes were sent to the terminal”}.
If there was no previous output (because it is the first line of timing information), the delay specifies the time between \cmd{script} invocation and the first chunk of output.
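To illustrate the timing format, the following C sketch parses a single line of a timing file into its delay and byte count.
It is not taken from \cmd{script} or \cmd{scriptreplay}; the function name [[parse_timing_line()]] and its return convention are assumptions made for this example only.

```c
#include <stdio.h>

/* Hypothetical helper, not part of script or scriptreplay:
 * parse one "delay bytecount" line of a timing file.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_timing_line(const char *line, double *delay, unsigned long *bytes) {
    /* Both fields must be present, separated by whitespace. */
    if (sscanf(line, "%lf %lu", delay, bytes) != 2)
        return -1;
    /* A delay can never be negative. */
    if (*delay < 0.0)
        return -1;
    return 0;
}
```

A replaying tool could call such a function once per line, wait for the returned delay and then copy the given number of bytes from the typescript to the terminal.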
\subsection{Disadvantages}\label{scriptissues}
The two file formats produced by \cmd{script}, typescript and timing, show several shortcomings with regard to forensic usage:
\begin{itemize}
\item Input coming from the user’s keyboard is not logged at all. A common example is the user entering a command in the shell but then pressing \ctrl{C} instead of return. The shell will move to the next line and display the prompt again; there is no visible distinction whether the command was run or not.\footnote{With more recent versions of Linux and Bash, terminals which have the ECHOCTL bit set (for example via stty) will show \ctrl{C} at the end of an interrupted line, which fixes this problem to some degree. Similar issues, like finding out whether the user entered or tab-completed some text, still persist.}
\item Metadata about the environment \cmd{script} runs in is not logged. This leads to a high level of uncertainty when interpreting the resulting typescript, because even important information like the character set and encoding or the terminal size and type is missing.
\item Typescript and timing are separate files, but one logical entity. They should reside in one file to protect the user from confusion and mistakes.
\item Appending to a typescript file is possible, but ambiguous, since the beginning of a new part is determined only by the string \str{"Script started on~…"}. Also, appending to a typescript and recording timing information are incompatible, because \cmd{scriptreplay} will only ignore the first header line in a typescript file. Subsequent ones will disturb the timing’s byte counter.
\end{itemize}
\subsection*{Summary}
This section has presented the background, purpose and operation of \cmd{script}.
We have learned that because of several lacking features, using it in computer forensics is problematic.
The next section will introduce software without these disadvantages.
\section{Design of \cmd{forscript}}\label{design}
In this section, the new file format as used by \cmd{forscript} will be specified.
You will learn about how input, output and metadata are combined into a single output file.
After describing the format’s characteristics, the invocation syntax, which is designed to be compatible with \cmd{script}, will be presented.
\subsection{File Format}
A \cmd{forscript} data file (called a \emph{transcript file}) consists of the mostly unaltered output stream of the client application, but includes blocks of additional data (called \emph{control chunks}) at arbitrary positions.
A control chunk is started by a \emph{shift out} byte (\str{0x0e}) and terminated by a \emph{shift in} byte (\str{0x0f}).
Each control chunk is either an input chunk or a metadata chunk.
\subsubsection{Input Chunks}
Input chunks contain the data that is sent to the client application’s input stream, which is usually identical to the user’s keyboard input.
They are of arbitrary length and terminate at the \emph{shift in} byte.
If a literal \emph{shift out} or \emph{shift in} byte needs to appear in an input chunk’s data, it is escaped by prepending a \emph{data link escape} byte (\str{0x10}).
If a literal \emph{data link escape} byte needs to appear in an input chunk’s data, it has to be doubled (i.e., \str{0x10 0x10}).
For example, if the user sends the byte sequence \str{0x4e 0x0f 0x00 0x61 0x74 0x10}, the complete input chunk written to the transcript file is \str{0x0e 0x4e 0x10 0x0f 0x00 0x61 0x74 0x10 0x10 0x0f}.
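The escaping rules for input chunks can be sketched in a few lines of C.
This is an illustration only and not the code \cmd{forscript} uses; the function [[encode_input_chunk()]] and the [[FS_]] constants are hypothetical names chosen for this example.

```c
#include <stddef.h>

#define FS_SO  0x0e  /* shift out: starts a control chunk */
#define FS_SI  0x0f  /* shift in: ends a control chunk */
#define FS_DLE 0x10  /* data link escape: escapes SO, SI and DLE */

/* Hypothetical helper: wrap len bytes of input in an input chunk.
 * out must have room for 2 * len + 2 bytes (worst case: every
 * byte needs escaping). Returns the number of bytes written. */
size_t encode_input_chunk(const unsigned char *in, size_t len,
                          unsigned char *out) {
    size_t o = 0;
    out[o++] = FS_SO;
    for (size_t i = 0; i < len; i++) {
        /* The three special bytes are prefixed with an escape byte. */
        if (in[i] == FS_SO || in[i] == FS_SI || in[i] == FS_DLE)
            out[o++] = FS_DLE;
        out[o++] = in[i];
    }
    out[o++] = FS_SI;
    return o;
}
```

Applied to the six input bytes of the example above, this sketch produces the ten-byte chunk shown there.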
\subsubsection{Metadata Chunks}
Metadata chunks, also called meta chunks, contain additional information about the file or the application’s status, for example environment variables, terminal settings or time stamps.
They contain an additional \emph{shift out} byte at the beginning, followed by a byte that determines the type of metadata that follows.
The available types are described below.
Meta chunks are of arbitrary length and terminate at the \emph{shift in} byte.
The same escaping of \emph{shift out}, \emph{shift in} and \emph{data link escape} that is used for input chunks is also used for meta chunks.
For example, the “terminal size” meta type is introduced by its type byte \str{0x11}, followed by width and height of the terminal, represented as two unsigned big-endian 16-bit integers.
The information “terminal size is 80×16 characters” would be written to the transcript file as \str{0x0e 0x0e 0x11 0x00 0x50 0x00 0x10 0x10 0x0f}.
Note that the least significant byte of the number 16 has to be written as \str{0x10 0x10}; a single \str{0x10} would be interpreted as an escape for the following \str{0x0f}.
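From a reader’s point of view, decoding a metadata chunk means checking the two leading \emph{shift out} bytes, taking the type byte and undoing the escaping until an unescaped \emph{shift in} byte is found.
The following C sketch shows one possible way to do this for a chunk held in memory.
The function [[decode_meta_chunk()]] is hypothetical, and the sketch assumes that the type byte itself is stored unescaped.

```c
#include <stddef.h>

#define FS_SO  0x0e  /* shift out */
#define FS_SI  0x0f  /* shift in */
#define FS_DLE 0x10  /* data link escape */

/* Hypothetical reader: decode the metadata chunk starting at buf[0].
 * Stores the type byte in *type and the unescaped payload in payload
 * (which must hold at least len bytes). Returns the payload length,
 * or -1 if buf does not hold a complete metadata chunk. */
long decode_meta_chunk(const unsigned char *buf, size_t len,
                       unsigned char *type, unsigned char *payload) {
    size_t i = 3;
    long n = 0;
    if (len < 4 || buf[0] != FS_SO || buf[1] != FS_SO)
        return -1;
    *type = buf[2];
    while (i < len) {
        if (buf[i] == FS_SI)
            return n;           /* unescaped shift in: end of chunk */
        if (buf[i] == FS_DLE) {
            if (++i >= len)
                return -1;      /* escape byte at end of buffer */
        }
        payload[n++] = buf[i++];
    }
    return -1;                  /* no terminating shift in found */
}
```

For the “terminal size” example above, this yields the type \str{0x11} and the four payload bytes \str{0x00 0x50 0x00 0x10}, i.e. a width of 80 and a height of 16.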
\subsubsection{Properties of the File Format}
This basic file format design has several advantages:
\begin{itemize}
\item New meta chunk types can be introduced while still allowing older tools to read the file, because the escaping rules are simple and the parsing application need not know a fixed length of each type.
\item Since switching between input and output data occurs very often in a usual terminal session, the format is designed to require very little storage overhead for these operations.
\item The format is very compact and easy to implement. Using a format like XML would decrease performance and require sophisticated libraries on the machine \cmd{forscript} is run on. However, for forensic usage it is best to be able to use a small statically linked executable.
\item Converting a \cmd{forscript} file to a \cmd{script} file is basically as easy as removing everything between \emph{shift out} and \emph{shift in} bytes (while respecting escaping rules, of course).
\end{itemize}
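The last property in the list above can be illustrated with a short C sketch that removes all control chunks from a transcript held in memory.
It is not part of \cmd{forscript}; the name [[strip_control_chunks()]] is made up for this example, and the sketch assumes that the output stream outside of control chunks contains no literal \emph{shift out} byte.

```c
#include <stddef.h>

#define FS_SO  0x0e  /* shift out */
#define FS_SI  0x0f  /* shift in */
#define FS_DLE 0x10  /* data link escape */

/* Hypothetical converter: copy only the output stream of a transcript,
 * dropping every control chunk. out must hold at least len bytes.
 * Returns the number of bytes copied. */
size_t strip_control_chunks(const unsigned char *in, size_t len,
                            unsigned char *out) {
    size_t o = 0;
    int in_chunk = 0;
    for (size_t i = 0; i < len; i++) {
        if (!in_chunk) {
            if (in[i] == FS_SO)
                in_chunk = 1;       /* control chunk begins */
            else
                out[o++] = in[i];   /* plain output byte */
        } else if (in[i] == FS_DLE) {
            i++;                    /* skip the escaped byte */
        } else if (in[i] == FS_SI) {
            in_chunk = 0;           /* control chunk ends */
        }
    }
    return o;
}
```

Because escaped bytes inside a chunk are skipped as a pair, an escaped \emph{shift in} byte does not end the chunk prematurely.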
\subsection{Metadata Chunk Types}
The next sections will describe the available metadata chunk types.
Integers are unsigned and big endian, except where noted otherwise.
In the resulting file, numbers are represented in binary form, not as ASCII digits.
For better understanding, the code \cmd{forscript} uses to write each meta chunk appears after the chunk’s explanation.
The three functions [[chunkwh()]], [[chunkwf()]] and [[chunkwd()]] that are used for actually writing the data to disk will be explained in section~\ref{def:chunkwriters}.
To be able to understand the code, it is sufficient to know that [[chunkwh()]] takes one parameter (the chunk’s type) and writes the header bytes.
[[chunkwf()]] writes the footer byte and takes no parameters, while [[chunkwd()]] writes the payload data, escaping it on the fly, and requires a pointer and byte count.
There is an additional convenience function [[chunkwm()]] that takes all three parameters and will write a complete metadata chunk.
All chunk functions return a negative value if an error occurred, for example if an environment setting could not be retrieved or if writing to the transcript file failed.
Since only a partial metadata chunk may have been written to the transcript, the file is no longer in a consistent state.
Therefore, \cmd{forscript} should terminate whenever a chunk function returns a negative value.
A transcript file needs to begin with a \emph{file version} meta chunk, followed directly by the first \emph{start of session} chunk.
\subsubsection*{\str{0x01} File Version (1 byte)}
The transcript file must start with a meta chunk of this type; there may be no other data before it.
Denotes the version of the \cmd{forscript} file format that is being used for this file.
In order to guarantee a length of exactly one byte, the version numbers 0, 14, 15 and 16 are not allowed, therefore no escaping takes place.
This document describes version 1 of the format, therefore currently the only valid value is \str{0x01}.
<<chunks>>=
int chunk01() {
    unsigned char ver = 0x01;
    return chunkwm(0x01, &ver, sizeof(ver));
}
@
\subsubsection*{\str{0x02} Begin of Session (10 bytes)}
Denotes the start of a new \cmd{forscript} session.
The first four data bytes represent the start time as the number of seconds since the Unix Epoch.
The next four bytes contain a signed representation of the nanosecond offset to the number of seconds.
If these four bytes are set to \str{0xffffffff}, there was an error retrieving the nanoseconds.
The last two bytes specify the machine’s (signed) time zone offset to UTC in minutes.
If these two bytes are set to \str{0xffff}, the machine’s timezone is unknown.
<<chunks>>=
int chunk02() {
    struct timespec now;
    extern long timezone;
    int ret;
    unsigned char data[10];
    uint32_t secs;
    int32_t nanos = ~0;   /* 0xffffffff: nanoseconds unavailable */
    int16_t tzone = ~0;   /* 0xffff: time zone unknown */
    if ((ret = clock_gettime(CLOCK_REALTIME, &now)) < 0)
        return ret;
    secs = htonl(now.tv_sec);
    if (now.tv_nsec < 1000000000L && now.tv_nsec > -1000000000L)
        nanos = htonl(now.tv_nsec);
    tzset();
    /* timezone holds seconds west of UTC; store minutes east of UTC */
    tzone = htons((uint16_t)(timezone / -60));
    memcpy(&data[0], &secs, sizeof(secs));
    memcpy(&data[4], &nanos, sizeof(nanos));
    memcpy(&data[8], &tzone, sizeof(tzone));
    return chunkwm(0x02, data, sizeof(data));
}
@
This chunk requires the headers \str{time.h} for [[clock_gettime()]], \str{arpa/inet.h} for [[htonl()]] and [[htons()]], and \str{string.h} for [[memcpy()]]:
<<includes>>=
#include <time.h>
#include <arpa/inet.h>
#include <string.h>
@
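For a tool that reads transcript files, the ten payload bytes of this chunk have to be converted back from network byte order.
The following C sketch shows how that could look once the payload has been unescaped; the structure [[session_start]] and the function [[decode_session_start()]] are hypothetical and only serve as an illustration of the layout described above.

```c
#include <stdint.h>

/* Hypothetical reader-side structure, not part of forscript itself. */
struct session_start {
    uint32_t secs;       /* seconds since the Unix Epoch */
    int32_t  nanos;      /* nanosecond offset, -1 (0xffffffff) on error */
    int16_t  tz_minutes; /* offset to UTC in minutes, -1 (0xffff) if unknown */
};

/* Read an unsigned big-endian 32-bit integer from four bytes. */
static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the unescaped 10-byte payload of a "begin of session" chunk. */
void decode_session_start(const unsigned char payload[10],
                          struct session_start *s) {
    s->secs = be32(&payload[0]);
    s->nanos = (int32_t)be32(&payload[4]);
    s->tz_minutes = (int16_t)(((uint16_t)payload[8] << 8) | payload[9]);
}
```

Reading the fields byte by byte avoids any dependency on the byte order or alignment rules of the machine that analyzes the transcript.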
\subsubsection*{\str{0x03} End of Session (1 byte)}
Denotes the end of a \cmd{forscript} session.
The data byte contains the return value of the child process.
The usual exit code convention applies:
If the child exited normally, use its return value.
If the child was terminated as a result of a signal (like \str{SIGSEGV}), use the number of the signal plus $128$.
The parameter [[status]] should contain the raw status value returned by [[wait()]], not only the child’s return value.
If the exit code of the child could not be determined, \str{0xff} is used instead.
<<chunks>>=
int chunk03(int status) {
    unsigned char data = ~0;   /* 0xff: exit code unknown */
    if (WIFEXITED(status))
        data = WEXITSTATUS(status);
    else if (WIFSIGNALED(status))
        data = 128 + WTERMSIG(status);
    return chunkwm(0x03, &data, sizeof(data));
}
@
\subsubsection*{\str{0x11} Terminal Size (two 2-byte values)}
Is written at session start and when the size of the terminal window changes.
The first data word contains the number of columns, the second one the number of rows.
Since the terminal size has to be passed to the running client application, the chunk itself does not request the values, but receives them as a parameter.
<<chunks>>=
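int chunk11(struct winsize *size) {
    uint32_t be;
    /* columns in the upper 16 bits, rows in the lower 16 bits;
       the cast avoids shifting into the sign bit of an int */
    be = htonl(((uint32_t)size->ws_col << 16) | size->ws_row);
    return chunkwm(0x11, (unsigned char *)&be, sizeof(be));
}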
@
\subsubsection*{\str{0x12} Environment Variables (arbitrary number of C strings)}
Is written at session start.
Contains the environment variables and their values as \str{NAME=value} pairs; each pair is terminated by a null byte (\str{0x00}).
Since variable names may not contain the \str{=} character and neither variable names nor values may include a null byte, the list needs no special escaping.
<<chunks>>=
int chunk12() {
    extern char **environ;
    int i = 0;
    int ret;
    while (environ[i] != NULL) {
        if (i == 0) {
            if ((ret = chunkwh(0x12)) < 0)
                return ret;
        }
        if ((ret = chunkwd((unsigned char *)environ[i],
            strlen(environ[i]) + 1)) < 0)
            return ret;
        i++;
    }
    if (i != 0) {
        if ((ret = chunkwf()) < 0)
            return ret;
    }
    return 1;
}
@
\subsubsection*{\str{0x13} Locale Settings (seven C strings)}
Is written at session start.
Contains the string values of several locale settings, namely \str{LC\_ALL}, \str{LC\_COLLATE}, \str{LC\_CTYPE}, \str{LC\_MESSAGES}, \str{LC\_MONETARY}, \str{LC\_NUMERIC} and \str{LC\_TIME}, in that order, each terminated by a null byte.
<<chunks>>=
int chunk13() {
    int cat[7] = { LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES,
                   LC_MONETARY, LC_NUMERIC, LC_TIME };
    char *loc;
    int ret;
    if ((ret = chunkwh(0x13)) < 0)
        return ret;
    for (int i = 0; i < 7; i++) {
        if ((loc = setlocale(cat[i], "")) == NULL)
            return -1;
        if ((ret = chunkwd((unsigned char *)loc,
            strlen(loc) + 1)) < 0)
            return ret;
    }
    if ((ret = chunkwf()) < 0)
        return ret;
    return 0;
}
@
[[setlocale()]] requires \str{locale.h}:
<<includes>>=
#include <locale.h>
@
\subsubsection*{\str{0x16} Delay (two 4-byte values)}
Contains the number of seconds and nanoseconds that have passed since the last delay chunk (or, if this is the first one, since the session started).
A replaying application should wait for the time specified in this chunk before advancing further in the transcript file.
Since the seconds and nanoseconds are stored as integers, converting them to a single floating-point number would lose precision.
Therefore both parts are subtracted independently.
If the nanoseconds part of [[now]] is less than that of [[ts]], one second is borrowed: the seconds part is decreased by one and $10^9$ nanoseconds are added to the nanoseconds difference.
<<chunks>>=
int chunk16(struct timespec *ts) {
unsigned char buf[2 * sizeof(uint32_t)];
uint32_t secs, nanos;
struct timespec now;
if (clock_gettime(CLOCK_MONOTONIC, &now) < 0)
return -1;
secs = now.tv_sec - ts->tv_sec;
if (now.tv_nsec >= ts->tv_nsec) {
nanos = now.tv_nsec - ts->tv_nsec;
} else {
nanos = 1000000000L - (ts->tv_nsec - now.tv_nsec);
secs--;
}
*ts = now;
secs = htonl(secs);
nanos = htonl(nanos);
memcpy(&buf[0], &secs, sizeof(secs));
memcpy(&buf[sizeof(secs)], &nanos, sizeof(nanos));
return chunkwm(0x16, buf, sizeof(buf));
}
@
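A replaying application has to reverse this encoding before it can wait.
The following is a minimal sketch of that step, not code from \cmd{forscript}; the function name [[decode_delay()]] is hypothetical, and the eight payload bytes are assumed to be already unescaped.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical reader-side helper, not part of forscript.
 * Converts the eight unescaped payload bytes of a 0x16 chunk (seconds,
 * then nanoseconds, both in network byte order) into a struct timespec. */
struct timespec decode_delay(const unsigned char payload[8]) {
    uint32_t secs, nanos;
    struct timespec delay;
    /* memcpy avoids unaligned access to the byte buffer */
    memcpy(&secs, payload, sizeof(secs));
    memcpy(&nanos, payload + sizeof(secs), sizeof(nanos));
    delay.tv_sec = (time_t)ntohl(secs);
    delay.tv_nsec = (long)ntohl(nanos);
    return delay;
}
```

The resulting structure could, for example, be passed to [[nanosleep()]] before the replay continues.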
\subsection{Magic Number}
Since a \cmd{forscript} file has to start with a file version chunk followed by a begin of session chunk, there is a distinctive eight-byte signature at the beginning of each file:
\begin{verbatim}
0x0e 0x0e 0x01 0x?? 0x0f 0x0e 0x0e 0x02
\end{verbatim}
The first two bytes start a metadata chunk, the third one identifies it as a file version chunk.
The fourth byte contains the version number, which is currently \str{0x01} but may change in the future.
Byte 5 closes the version chunk, 6 to 8 start a begin of session chunk.
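A tool that wants to recognize transcript files could test for this signature as sketched below.
The function name [[has_forscript_signature()]] is hypothetical and not part of \cmd{forscript}.
The sketch also assumes that the version byte is not one of the three special characters, since such a value would presumably be escaped and shift the following bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of forscript.
 * Returns 1 if `buf` starts with the eight-byte signature and stores the
 * version byte in *version (if version is not NULL); returns 0 otherwise. */
int has_forscript_signature(const unsigned char *buf, size_t len,
                            unsigned char *version) {
    static const unsigned char head[3] = { 0x0e, 0x0e, 0x01 };
    static const unsigned char tail[4] = { 0x0f, 0x0e, 0x0e, 0x02 };
    if (len < 8)
        return 0;
    if (memcmp(buf, head, sizeof(head)) != 0)
        return 0;                 /* no file version chunk header */
    if (memcmp(buf + 4, tail, sizeof(tail)) != 0)
        return 0;                 /* version chunk not followed by 0x02 chunk */
    if (version != NULL)
        *version = buf[3];        /* byte 4 carries the version number */
    return 1;
}
```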
\subsection{Invocation}
\cmd{forscript}’s invocation syntax has been designed to be compatible with \cmd{script}; most parameters result in the same behavior.
The following list contains additional notes and describes the differences to \cmd{script}:
\begin{itemize}
\item \param{-a} (append): If the target transcript file already exists and is non-empty, it has to start with a valid and supported \emph{file version} header.
\item \param{-c} (command) and \param{-f} (flush): Identical to \cmd{script}.
\item \param{-q} (quiet): In contrast to \cmd{script}, no startup message will be written to the transcript file.
\item \param{-t} (timing): This parameter will be accepted, but ignored. \cmd{forscript} always records timing information.
\item \param{-V} and \param{--version}: Identical to \cmd{script}, both will make \cmd{forscript} output its version information and terminate. The parameter has to be the only one specified on the command line; otherwise an error message will be printed.
\end{itemize}
If unsupported parameters are passed, \cmd{forscript} will print a short usage summary to \emph{stderr} and exit.
While running, the client application’s output will be printed to \emph{stdout}.
Error messages will be printed to \emph{stderr}.
\subsection*{Summary}
Now you know how \cmd{forscript} stores the recorded terminal session and how it will be called by the user.
You have seen the code that writes the various metadata chunks.
After this gentle introduction to \cmd{forscript}’s implementation, the next section contains the rest of the code and explains in detail how the software works.
\pagebreak
\section{Implementation of \cmd{forscript}}\label{implementation}
This section will describe the code of \cmd{forscript} in detail.
You will learn how the software hooks into the input and output stream of the client application and how it reacts to things like window size changes or the child terminating.
Other interesting topics include how to launch a subprocess and change its controlling terminal as well as how to read from multiple data streams at once without having to run separate processes.
\subsection{Constants}
For improved readability, we define the special characters introduced in the previous section as constants:
<<constants>>=
const unsigned char SO = 0x0e;
const unsigned char SI = 0x0f;
const unsigned char DLE = 0x10;
@
It is by design that the three special characters have consecutive byte values.
This allows us to define a minimum and maximum byte value that requires special escape handling:
<<constants>>=
const unsigned char ESCMIN = 0x0e;
const unsigned char ESCMAX = 0x10;
@
\subsection{Writing Metadata Chunks to Disk}
\label{def:chunkwriters}
The function \emph{chunkwd()} takes a pointer and a byte count as arguments and writes chunk data to the transcript file, applying required escapes on the fly.
To improve performance, it does not write byte-by-byte, but instead scans the input data until it finds a special character.
When it does, it writes everything up to, but not including, the special character to the file and then adds a DLE character.
The search then goes on.
If another special character is found, everything from the last special character (inclusive) to the current one (exclusive) plus a DLE is written.
Eventually the whole input data will have been scanned and the function terminates after writing everything from the last special character (inclusive) or the beginning of the data (if there were no special characters) to the end of the input data.
This is the code:
<<chunkw>>=
int chunkwd(unsigned char *data, int count) {
int escaped = 0;
int pos = 0;
int start = 0;
while (pos < count) {
if (data[pos] <= ESCMAX && data[pos] >= ESCMIN) {
if (pos > start) {
if (!swrite(&data[start], sizeof(char),
pos - start, OUTF))
return -1;
}
if (!swrite(&DLE, sizeof(DLE), 1, OUTF))
return -2;
start = pos;
escaped++;
}
pos++;
}
if (!swrite(&data[start], sizeof(char),
pos - start, OUTF))
return -3;
return escaped;
}
@
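The result of this escaping can be cross-checked against a much simpler byte-by-byte variant that works purely in memory.
The following sketch is such a reference variant, not the author's implementation; the name [[escape_bytes()]] is hypothetical, and the caller is assumed to provide an output buffer of at least twice the input size.

```c
#include <stddef.h>

/* Hypothetical in-memory counterpart of chunkwd(), not part of forscript.
 * Copies `count` bytes from `in` to `out` and prefixes every special
 * character (0x0e to 0x10) with a DLE (0x10). `out` must have room for
 * 2 * count bytes. Returns the number of bytes stored in `out`. */
size_t escape_bytes(const unsigned char *in, size_t count,
                    unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i < count; i++) {
        if (in[i] >= 0x0e && in[i] <= 0x10)
            out[o++] = 0x10;      /* DLE goes before the special byte */
        out[o++] = in[i];
    }
    return o;
}
```

Unlike [[chunkwd()]], this variant handles one byte at a time and therefore trades speed for simplicity.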
\emph{OUTF} is the already opened transcript file and a global variable:
<<globals>>=
FILE *OUTF;
@
The \emph{swrite()} function (“safe write”) that is being used here will return zero if the number of items written is not equal to the number of items that \emph{should} have been written:
<<swrite>>=
int swrite(const void *ptr, size_t size,
size_t nmemb, FILE *stream) {
return (fwrite(ptr, size, nmemb, stream) == nmemb);
}
@
To be able to use [[fwrite()]], \str{stdio.h} has to be included:
<<includes>>=
#include <stdio.h>
@
There are functions to write chunk headers and footers:
<<chunkwhf>>=
int chunkwh(unsigned char id) {
int ret;
for (int i = 0; i < 2; i++) {
ret = swrite(&SO, sizeof(SO), 1, OUTF);
if (!ret)
return -1;
}
return (swrite(&id, sizeof(unsigned char),
1, OUTF)) ? 1 : -1;
}
int chunkwf() {
return (swrite(&SI, sizeof(SI), 1, OUTF)) ? 1 : -1;
}
@
There is also a convenience function that writes a meta chunk’s header and footer as well as the actual data:
<<chunkwm>>=
int chunkwm(unsigned char id, unsigned char *data, int count) {
int ret;
/* chunkwh() and chunkwf() signal errors with a negative value, never 0 */
if (chunkwh(id) < 0)
return -11;
if ((ret = chunkwd(data, count)) < 0)
return ret;
if (chunkwf() < 0)
return -12;
return 1;
}
@
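For illustration, the reverse direction might look like the following sketch, which extracts the payload of one chunk from a buffer.
It is not part of \cmd{forscript}; the name [[read_chunk_payload()]] is hypothetical, and the sketch assumes that a DLE makes the following byte literal and that the first unescaped SI ends the chunk.

```c
#include <stddef.h>

/* Hypothetical reader-side helper, not part of forscript.
 * `in` points to the first byte after a chunk header (SO SO id).
 * Copies the unescaped payload to `out`, which must have room for `len`
 * bytes. Returns the payload length, or -1 if no unescaped SI closes
 * the chunk within `len` bytes. */
long read_chunk_payload(const unsigned char *in, size_t len,
                        unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == 0x10) {            /* DLE: next byte is literal */
            if (++i >= len)
                return -1;              /* input ends after the DLE */
            out[o++] = in[i];
        } else if (in[i] == 0x0f) {     /* unescaped SI: chunk footer */
            return (long)o;
        } else {
            out[o++] = in[i];
        }
    }
    return -1;
}
```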
\subsection{Error Handling}
If the program has to terminate abnormally, the function [[die()]] will be called.
After resetting the terminal attributes and telling a possible child process to exit, it prints an error message and terminates the program.
<<die>>=
void die(char *message, int chunk) {
if (TTSET)
tcsetattr(STDERR_FILENO, TCSADRAIN, &TT);
if (CHILD > 0)
kill(CHILD, SIGTERM);
fprintf(stderr, "%s: ", MYNAME);
if (chunk != 0) {
fprintf(stderr, "metadata chunk %02x failed", chunk);
if (message != NULL)
fprintf(stderr, ": ");
} else {
if (message == NULL)
fprintf(stderr, "unknown error");
}
if (message != NULL)
fprintf(stderr, "%s", message);
fprintf(stderr, "; exiting.\n");
exit(EXIT_FAILURE);
}
@
[[exit()]] requires \str{stdlib.h}:
<<includes>>=
#include <stdlib.h>
@
The global variable [[MYNAME]] contains a pointer to the name the binary was called as and is set in [[main()]].
<<globals>>=
char *MYNAME;
@
\subsection{Startup and Shutdown Messages}
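The [[statusmsg()]] function writes a status message to both the terminal and the transcript.
Its [[msg]] argument is used as a [[printf()]] format string that receives two string arguments, the current date and time and the name of the output file, in that order: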
<<statusmsg>>=
void statusmsg(const char *msg) {
char date[BUFSIZ];
time_t t = time(NULL);
struct tm *lt = localtime(&t);
if (lt == NULL)
die("localtime failed", 0);
if (strftime(date, sizeof(date), "%c", lt) < 1)
die("strftime failed", 0);
if (printf(msg, date, OUTN) < 0) {
perror("status stdout");
die("statusmsg stdout failed", 0);
}
if (fprintf(OUTF, msg, date, OUTN) < 0) {
perror("status transcript");
die("statusmsg transcript failed", 0);
}
}
@
\subsection{Initialization}
\subsubsection{Determining the Binary’s Name}
To be able to output its own name (e.g. in error messages), \cmd{forscript} determines the name of the binary that has been called by the user.
This value is stored in [[argv[0]]].
The global variable [[MYNAME]] will be used to reference that value from every function that needs it.
<<setmyname>>=
MYNAME = argv[0];
@
If \cmd{forscript} was called using a path name (e.g. \str{/usr/bin/forscript}), everything up to the final slash needs to be cut off.
This is done by moving the pointer to the character immediately following the final slash.
<<setmyname>>=
{ char *name;
if ((name = basename(MYNAME)) != NULL)
MYNAME = name;
}
@
[[basename()]] requires \str{libgen.h}:
<<includes>>=
#include <libgen.h>
@
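The pointer movement described above can also be written by hand.
The following sketch shows one way to do so with [[strrchr()]]; the name [[name_after_slash()]] is hypothetical, and this is an illustration rather than what [[basename()]] does internally.

```c
#include <string.h>

/* Hypothetical illustration, not part of forscript.
 * Returns a pointer to the character following the final slash in
 * `path`, or `path` itself if it contains no slash. */
const char *name_after_slash(const char *path) {
    const char *slash = strrchr(path, '/');
    return (slash != NULL) ? slash + 1 : path;
}
```

For a path that ends in a slash this sketch returns an empty string, whereas [[basename()]] treats trailing slashes specially.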
\subsubsection{Command Line Arguments}
Since \cmd{forscript}’s invocation tries to mimic \cmd{script}’s as far as possible, command line argument handling is designed to closely resemble \cmd{script}’s behavior.
Therefore, like in \cmd{script}, the command line switches \str{--version} and \str{-V} are treated separately.
If there is exactly one command line argument and it is one of these, \cmd{forscript} will print its version and terminate.
<<getopt>>=
if ((argc == 2) &&
(!strcmp(argv[1], "-V") || !strcmp(argv[1], "--version"))) {
printf("%s %s\n", MYNAME, MYVERSION);
return 0;
}
@
[[MYVERSION]] is defined as a global constant:
<<globals>>=
const char *MYVERSION = "1.0.0-git";
@
The other options are parsed using the standard [[getopt()]] function, which requires \str{unistd.h}:
<<includes>>=
#include <unistd.h>
@
[[getopt()]] returns the next option character each time it is called, and $-1$ if there are none left.
The option characters are handled in a [[switch]] statement.
As in \cmd{script}, flags that turn on some behavior cause a respective global [[int]] variable to be increased by one.
These flags are:
<<globals>>=
int aflg = 0, fflg = 0, qflg = 0;
@
The value of the \str{-c} parameter is stored in a global string pointer:
<<globals>>=
char *cflg = NULL;
@
The \str{-t} flag is accepted for compatibility reasons, but has no effect in \cmd{forscript} because timing information is always written.
After the loop terminates, [[optind]] holds the index of the first argument that was not consumed as an option or option value.
[[argc]] and [[argv]] are then adjusted by that amount so that only the non-option arguments remain (in \cmd{forscript} this is only the file name).
The parsing loop therefore looks like this:
<<getopt>>=
{ int c; extern char *optarg; extern int optind;
while ((c = getopt(argc, argv, "ac:fqt")) != -1)
switch ((char)c) {
case 'a':
aflg++; break;
case 'c':
cflg = optarg; break;
case 'f':
fflg++; break;
case 'q':
qflg++; break;
case 't':
break;
case '?':
default:
fprintf(stderr,
"usage: %s [-afqt] [-c command] [file]\n",
MYNAME);
exit(EXIT_FAILURE);
break;
}
argc -= optind;
argv += optind;
}
@
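To experiment with this parsing logic outside of [[main()]], it can be wrapped into a function that fills a structure instead of global variables.
The following is a sketch under that assumption; [[struct options]] and [[parse_options()]] are hypothetical names, and an unknown option is reported through the return value instead of printing the usage summary.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stddef.h>
#include <unistd.h>

/* Hypothetical container for the flags that forscript keeps in globals. */
struct options {
    int aflg, fflg, qflg;
    char *cflg;   /* value of -c, or NULL */
    char *file;   /* first non-option argument, or NULL */
};

/* Parses argv with the same option string as forscript.
 * Returns 0 on success and -1 if an unsupported option was found. */
int parse_options(int argc, char **argv, struct options *o) {
    int c;
    o->aflg = o->fflg = o->qflg = 0;
    o->cflg = NULL;
    o->file = NULL;
    optind = 1;   /* restart scanning so the function can be called again */
    opterr = 0;   /* let the caller report errors */
    while ((c = getopt(argc, argv, "ac:fqt")) != -1) {
        switch (c) {
        case 'a': o->aflg++; break;
        case 'c': o->cflg = optarg; break;
        case 'f': o->fflg++; break;
        case 'q': o->qflg++; break;
        case 't': break;              /* accepted, but ignored */
        default:  return -1;          /* '?' for unknown options */
        }
    }
    if (optind < argc)
        o->file = argv[optind];       /* first non-option argument */
    return 0;
}
```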
After the options have been parsed, the output file name will be determined and stored in the global string [[OUTN]]:
<<globals>>=
char *OUTN = "transcript";
@
If there was no file name supplied on the command line, the default name is \str{transcript}.
This intentionally differs from \cmd{script}’s default name \str{typescript}, because the file format is different and cannot, for example, be displayed directly using \cmd{cat}.
Scripts or constructs that assume the default output file name to be \str{typescript} would most likely break anyway when \cmd{script} is replaced with \cmd{forscript}.
\subsubsection{Opening the Output File}
As in \cmd{script}, there is a safety warning if no file name was supplied and \str{transcript} exists and is a (hard or soft) link.
<<openoutfile>>=
if (argc > 0) {
OUTN = argv[0];
} else {
struct stat s;
if (lstat(OUTN, &s) == 0 &&
(S_ISLNK(s.st_mode) || s.st_nlink > 1)) {
fprintf(stderr, "Warning: `%s' is a link.\n"
"Use `%s [options] %s' if you really "
"want to use it.\n"
"%s not started.\n",
OUTN, MYNAME, OUTN, MYNAME);
exit(EXIT_FAILURE);
}
}
@
[[lstat()]] needs \str{sys/types.h} and \str{sys/stat.h} as well as the feature test macro \str{\_XOPEN\_SOURCE}:
<<includes>>=
#include <sys/types.h>
#include <sys/stat.h>
@
<<featuretest>>=
#define _XOPEN_SOURCE 500
@
The file will now be opened, either for writing or for appending, depending on [[aflg]].
Note that if appending, the file will be opened for reading as well.
This is because \cmd{forscript} checks the file version header before appending to a file.
<<openoutfile>>=
if ((OUTF = fopen(OUTN, (aflg ? "a+" : "w"))) == NULL) {
perror(OUTN);
die("the output file could not be opened", 0);
}
@
If the file has been opened for appending, check whether it starts with a compatible file format.
Currently, the only supported format version is \str{0x01}.
If the file is empty, appending is possible, but the \emph{file version} chunk has to be written.
This is done by setting [[aflg]] to $0$, which will cause [[doio()]] to write the chunk.
<<openoutfile>>=
if (aflg) {
char buf[5];
size_t count;
count = fread(&buf, sizeof(char), 5, OUTF);
if (count == 0)
aflg = 0;
else if (count != 5 ||
strncmp(buf, "\x0e\x0e\x01\x01\x0f", 5) != 0)
die("output file is not in forscript format v1, "
"cannot append", 0);
}
@
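The three possible outcomes of this check (empty file, valid header, anything else) can be made explicit in a small function.
The following sketch is an illustration, not the code used by \cmd{forscript}; the name [[check_header()]] and its return values are assumptions, and it uses [[memcmp()]] because the header is binary data rather than a C string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of forscript.
 * Inspects the first five bytes of an open, readable stream.
 * Returns 0 if the file is empty, 1 if it starts with a version 1
 * file version chunk, and -1 in all other cases. */
int check_header(FILE *f) {
    static const unsigned char want[5] = { 0x0e, 0x0e, 0x01, 0x01, 0x0f };
    unsigned char buf[5];
    size_t count;
    rewind(f);                                /* read from the beginning */
    count = fread(buf, 1, sizeof(buf), f);
    if (count == 0)
        return 0;                             /* empty: header still needed */
    if (count != sizeof(buf) || memcmp(buf, want, sizeof(want)) != 0)
        return -1;                            /* too short or wrong format */
    return 1;
}
```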
\subsection{Preparing a New Pseudo Terminal}
While \cmd{script} uses manual PTY allocation (by trying out device names) or BSD’s [[openpty()]] where available, \cmd{forscript} has been designed to use the Unix~98 PTY multiplexer (\str{/dev/ptmx}) standardized in POSIX.1-2001 to create a new PTY.
This method requires \str{fcntl.h} and a sufficiently high feature test macro value for POSIX code.
<<includes>>=
#include <fcntl.h>
@
<<featuretest>>=
#define _POSIX_C_SOURCE 200112L
@
The PTY’s master and slave file descriptors will be stored in these global variables:
<<globals>>=
int PTM = 0, PTS = 0;
@
Additionally, the settings of the terminal \cmd{forscript} runs in will be saved in the global variable [[TT]].
This variable is used to duplicate the terminal’s settings to the newly created PTY as well as to restore the terminal settings as soon as \cmd{forscript} terminates.
There is also a variable [[TTSET]] which stores whether the settings have been written to [[TT]].
This is important when restoring the terminal settings after a failure:
If the settings have not yet been written to [[TT]], applying them will lead to undefined behavior.
<<globals>>=
struct termios TT;
int TTSET = 0;
@
<<openpt>>=
if (tcgetattr(STDIN_FILENO, &TT) < 0) {
perror("tcgetattr");
die("tcgetattr failed", 0);
}
TTSET = 1;
@
The \str{termios} structure is defined in \str{termios.h}.
<<includes>>=
#include <termios.h>
@
A new PTY master is requested like this:
<<openpt>>=
if ((PTM = posix_openpt(O_RDWR)) < 0) {
perror("openpt");
die("openpt failed", 0);
}
@
Then, access to the slave is granted.
<<openpt>>=
if (grantpt(PTM) < 0) {
perror("grantpt");
die("grantpt failed", 0);
}
if (unlockpt(PTM) < 0) {
perror("unlockpt");
die("unlockpt failed", 0);
}
@
The slave’s device file name is requested using [[ptsname()]].
Since the name is not needed during further execution, the slave will be opened and its file descriptor stored.
<<openpt>>=
{ char *pts = NULL;
if ((pts = ptsname(PTM)) != NULL) {
if ((PTS = open(pts, O_RDWR)) < 0) {
perror(pts);
die("pts open failed", 0);
}
} else {
perror("ptsname");
die("ptsname failed", 0);
}
}
@
The “parent” terminal will be configured into a “raw” mode of operation.
\cmd{script} does this by calling [[cfmakeraw()]], which is a nonstandard BSD function.
For portability reasons \cmd{forscript} sets the corresponding bits manually, thereby emulating [[cfmakeraw()]].
The list of settings is taken from the \emph{termios(3)} Linux man page~\cite{linuxman} and should be equivalent.
Afterwards, the settings of the terminal \cmd{forscript} was started in will be copied to the new terminal.
This means that in the eyes of the user the terminal’s behavior will not change, but \cmd{forscript} can now document the terminal’s data stream with maximum accuracy.
<<openpt>>=
{
struct termios rtt = TT;
rtt.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP
| INLCR | IGNCR | ICRNL | IXON);
rtt.c_oflag &= ~OPOST;
rtt.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
rtt.c_cflag &= ~(CSIZE | PARENB);
rtt.c_cflag |= CS8;
if (tcsetattr(STDIN_FILENO, TCSANOW, &rtt) < 0) {
perror("tcsetattr stdin");
die("tcsetattr stdin failed", 0);
}
if (tcsetattr(PTS, TCSANOW, &TT) < 0) {
perror("tcsetattr pts");
die("tcsetattr pts failed", 0);
}
}
@
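The flag manipulation itself does not depend on an actual terminal and can therefore be isolated for testing.
The following sketch wraps the same bit operations in a function; the name [[make_raw()]] is hypothetical, and the sketch only modifies the structure without applying it to any device.

```c
#include <termios.h>

/* Hypothetical helper, not part of forscript.
 * Modifies *t so that it describes a "raw" terminal, using the flag
 * list shown above. The caller still has to apply the result with
 * tcsetattr(). */
void make_raw(struct termios *t) {
    /* no special input processing */
    t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP
                              | INLCR | IGNCR | ICRNL | IXON);
    /* no output post-processing */
    t->c_oflag &= ~(tcflag_t)OPOST;
    /* no echo, no line editing, no signal characters */
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    /* eight data bits, no parity */
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB);
    t->c_cflag |= CS8;
}
```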
\subsubsection{Managing Window Size}
If the size of a terminal window changes, the foreground process group of that terminal receives a \str{SIGWINCH} signal and should act accordingly.
\cmd{forscript} handles this signal in the [[resized()]] function by writing the new size to the transcript and forwarding it to the client terminal.
<<resized>>=
void resized(int signal) {
UNUSED(signal);
winsize(3);
}
@
The actual reading and writing of the window size is done by [[winsize()]], which takes a [[mode]] parameter.
If the mode is $1$, the client application’s terminal size will be set.
If the mode is $2$, the terminal size will be written to the transcript.
If the mode is $3$, both operations will be done, which is the usual case.
<<winsize>>=
void winsize(unsigned int mode) {
struct winsize size;
ioctl(STDIN_FILENO, TIOCGWINSZ, &size);
if (mode & 2)
if (chunk11(&size) < 0)
die("writing window size", 0x11);
if ((mode & 1) && PTM)
ioctl(PTM, TIOCSWINSZ, &size);
}
@
Retrieving the window size requires \str{sys/ioctl.h} for [[ioctl()]]:
<<includes>>=
#include <sys/ioctl.h>
@
The client PTY’s window size will be initialized now.
This needs to take place before the client application is launched, because it probably requires an already configured terminal size when starting up.
However, writing the size to the transcript at this point would put the window size meta chunk before the start of session chunk; therefore [[winsize()]]’s mode $1$ is used.
<<openpt>>=
winsize(1);
@
\subsection{Launching Subprocesses}
The original \cmd{script} uses one process to listen for input, one to listen for output and one to initialize and [[execl()]] the command to be recorded.
\cmd{forscript}, in contrast, uses only the [[select()]] function to be notified of pending input and output and therefore needs only two processes: itself and the subcommand.
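The basic idea of waiting on two data streams with a single [[select()]] call can be sketched as follows.
This is an isolated illustration, not the loop used by \cmd{forscript}; the function name [[wait_readable()]] and its bit mask return value are assumptions made for this example.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical illustration, not part of forscript.
 * Blocks until at least one of the two descriptors has data to read.
 * Returns a bit mask (1: fd_a readable, 2: fd_b readable) or -1 if
 * select() failed. */
int wait_readable(int fd_a, int fd_b) {
    fd_set readfds;
    int maxfd = (fd_a > fd_b) ? fd_a : fd_b;
    int mask = 0;
    FD_ZERO(&readfds);
    FD_SET(fd_a, &readfds);
    FD_SET(fd_b, &readfds);
    /* no write set, no exception set, no timeout */
    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
        return -1;
    if (FD_ISSET(fd_a, &readfds))
        mask |= 1;
    if (FD_ISSET(fd_b, &readfds))
        mask |= 2;
    return mask;
}
```

In a recorder, one descriptor would be the user's terminal and the other the PTY master, so a single process can serve both directions.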
\subsubsection*{Registering Signal Handlers}
To be notified of an exiting subprocess, a handler for the \str{SIGCHLD} signal needs to be defined.
This signal is usually sent by the operating system whenever a child process’s run status changes, i.e. when it is stopped (\str{SIGSTOP}), continued (\str{SIGCONT}) or exits.
\cmd{script} terminates if the child is stopped, but \cmd{forscript} does not, because it uses the \str{SA\_NOCLDSTOP} flag to specify that it wishes not to be notified about the child stopping or resuming.
The function [[finish()]] handles the child’s termination.
The second signal handler, [[resized()]], handles window size changes.
<<sigchld>>=
{ struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_NOCLDSTOP;
sa.sa_handler = finish;
sigaction(SIGCHLD, &sa, NULL);
sa.sa_handler = resized;
sigaction(SIGWINCH, &sa, NULL);
}
@
These functions and constants require \str{signal.h}.
<<includes>>=
#include <signal.h>
@
\subsubsection*{Forking}
When a program calls the [[fork()]] function, the operating system basically clones the program into a new process that is a subprocess of the caller.
Both processes continue to run at the next command after the [[fork()]] call, but the value [[fork()]] returned will be different:
The child will see a return value of [[0]], while the parent will receive the process ID of the child.
A negative value will be returned if the fork did not succeed.
<<fork>>=
if ((CHILD = fork()) < 0) {
perror("fork");
die("fork failed", 0);
}
@
[[CHILD]] is used in several places when dealing with the subprocess, therefore it is a global variable.
<<globals>>=
int CHILD = 0;
@
After forking, the child launches (or, to be exact, becomes) the process that should be logged, while the parent does the actual input/output logging.
<<fork>>=
if (CHILD == 0)
doshell();
else
doio();
@
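The three possible return values of [[fork()]] can be observed in isolation with the following sketch.
It is a self-contained illustration, not part of \cmd{forscript}; the function name [[fork_and_wait()]] is hypothetical, and the child simply exits instead of launching a command.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical illustration, not part of forscript.
 * Forks a child that exits immediately with `code`, waits for it in
 * the parent and returns the child's exit status, or -1 on failure. */
int fork_and_wait(int code) {
    int status;
    pid_t child = fork();
    if (child < 0)
        return -1;                 /* fork failed */
    if (child == 0)
        _exit(code);               /* child: fork() returned 0 */
    /* parent: fork() returned the child's process ID */
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```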
\subsection{Running the Target Application}
The [[doshell()]] function is run in the child process, whose only task is to set up all required PTY redirections and then execute the client command.
Therefore, open file descriptors from the parent process which are no longer needed are closed early.