@node Theme 1, Theme 2, Acknowledgements, Top
@chapter Theme 1: ``/bin''
@tie{}@image{img/E}very project has to start somewhere, so let us
begin at the beginning.
Guile can be used as a scripting language. Programs can be written as
plain text files, and then run from the command line by using the
Guile interpreter. As such, most scripts run on Unix-like shells will
begin with a sha-bang @code{#!} invocation. And most scripts must
start off doing the same chores: parsing the command line, acting on
the options, and finding the files whose names appeared in the
command-line arguments.
To introduce these mundane concepts, our first theme is @emph{/bin}:
re-implementing some common Unix tools. This will get us warmed up.
These examples should demonstrate
@itemize
@item
How to set up the sha-bang invocation for Guile scripts run from Unix
shells.
@item
How to handle command line arguments
@item
How to map file names given as command line arguments to their files
@item
How to search for files and directories
@item
How to open files, both as binary data and as encoded text data
@end itemize
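
The first two items can be sketched as follows. This is a minimal
example and is not taken from the volunteers' scripts; the interpreter
path @file{/usr/bin/guile} is an assumption and may differ on your
system.

@example
#!/usr/bin/guile -s
!#
;; Guile reads the first two lines as a block comment, so the
;; invocation line is skipped when the script is loaded.

;; (command-line) returns the script name followed by its arguments.
(define arguments (cdr (command-line)))

;; Print each argument on its own line.
(for-each (lambda (argument)
            (display argument)
            (newline))
          arguments)
@end example
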
And so, without further ado, here are the examples.
@menu
* Problem 1:: echo and cat
* Problem 2:: ls
* Problem 3:: LZW Compression
* Problem 4:: tar file archives
@end menu
@node Problem 1, Problem 2, Theme 1, Theme 1
@section Problem 1: Echo and Cat
In this problem, two venerable Unix commands are re-implemented in
Scheme: @command{echo} and @command{cat}. @command{echo} prints out
the command-line arguments, and @command{cat} prints a file to the
terminal.
@subsection The Requirements for `echo' and `cat'
In this problem, as in many of the problems, we'll lay out the
requirements for a program, and then see how our volunteer implemented
the requirements. For the purpose of this exercise, the requirements
for @command{echo} and @command{cat} will be drawn from the Posix
standard@footnote{@ref{IEEE 2004}}, with a couple of minor
modifications.
@heading Echo
The @command{echo} script writes its arguments to the standard output,
followed by a <newline>. If there are no arguments, it just prints a
<newline>.
@command{echo} has no command-line options. Even @option{--help} and
@option{--version} are not treated as command-line options.
If any of the arguments contain the backslash character (@code{\}),
the argument is modified. Backslash introduces an escape. These
escapes are parsed from logical left to right.
@table @code
@item \a
Write an <alert> in place of @code{\a}.
@item \b
Write a <backspace> in place of @code{\b}.
@item \c
Suppress the <newline> that would otherwise be written after the
command-line arguments. The @code{\c} is not written, any remaining
characters in this argument are not written, and any remaining
arguments are not written.
@item \f
Write a <form-feed> in place of @code{\f}.
@item \n
Write a <newline> in place of @code{\n}.
@item \r
Write a <carriage-return> in place of @code{\r}.
@item \t
Write a <tab> in place of @code{\t}
@item \v
Write a <vertical-tab> in place of @code{\v}.
@item \\
Write a single backslash character in place of the pair of backslash characters.
@item \0@i{num}
Write an 8-bit character corresponding to @i{num}, an octal number
between octal 0 and octal 377 (decimal 255) inclusive.
@end table
A backslash at the end of a command line argument will not be escaped.
The backslash will be written. However, the exit value will be 1 in
this case.
A backslash followed by any other character not listed in the table
will not be escaped. The backslash will be written, and the
character that follows it will be written. However, in this
case, the exit value will be 1.
For the octal escape @code{\0}, it is important to note that this
value is not an ISO-8859-x position or a Unicode code point, but,
rather a raw 8-bit byte to be sent unencoded to the standard output.
It is up to the operator, not @command{echo}, to ensure that a
character sequence that is valid for the environment's locale is being
sent.
If a @code{\0} escape is present, but is not followed by a
number, the raw byte zero is written.
If a @code{\0} escape is present and is followed by an octal
number of greater than 3 digits, only the first 3 digits will be
interpreted as being part of the escape.
If a @code{\0} escape is present and its octal value is greater than
377, print nothing. In this case, the exit value will be 1.
An octal escape may not have unnecessary initial zeros. For example:
@itemize
@item
@code{\01} should output raw byte 1
@item
@code{\001} should output raw byte zero followed by the string ``01''
@item
@code{\0001} should output raw byte zero followed by the string ``001''
@end itemize
The digits 8 and 9 are not part of an octal escape. For example, the
string @code{\018} shall be output as the raw byte 1 followed by the
character for the numeral 8.
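
The octal rules above can be sketched as a small parser. This is only
an illustration under the stated rules, not the volunteer's code; the
names @code{octal-digit?} and @code{parse-octal-escape} are
hypothetical. The procedure returns two values: the byte to write (or
@code{#f} when the value exceeds octal 377) and the index of the first
character after the escape.

@example
(define (octal-digit? ch)
  (and (char>=? ch #\0) (char<=? ch #\7)))

;; STR is one command-line argument. START is the index just past
;; the backslash and the zero that introduce the escape.
(define (parse-octal-escape str start)
  (let ((len (string-length str)))
    (if (or (>= start len)
            (not (octal-digit? (string-ref str start)))
            ;; A leading zero is not part of the number.
            (char=? (string-ref str start) #\0))
        (values 0 start)
        ;; Consume at most three octal digits.
        (let loop ((i start) (value 0))
          (if (and (< i len)
                   (< (- i start) 3)
                   (octal-digit? (string-ref str i)))
              (loop (+ i 1)
                    (+ (* value 8)
                       (- (char->integer (string-ref str i))
                          (char->integer #\0))))
              (values (if (> value #o377) #f value) i))))))
@end example
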
Remember that command-line arguments and file names may contain any
character allowed by the current locale.
In all other cases, the exit value will be zero.
@heading Cat
@command{cat} [OPTION]... [FILE]...
@command{cat} concatenates files or standard input and prints it to
the standard output.
This version of @command{cat} supports three command-line options,
each with a short and a long form.
@table @option
@item -u --unbuffered
Do no buffering. Write bytes from the input to the standard output
without delay as each character is read.
@item -h --help
Print out command help.
@item -v --version
Print out the program name and version number.
@end table
After the command-line options, a list of file names is expected. The
contents of the files are printed to standard output. No character
encoding or decoding of the contents of the files should be performed:
they should be transmitted unmodified.
If the special file name @file{-} (hyphen) is given, at that point the
contents of the standard input will be transmitted to the standard
output.
If one of the files does not exist, or if it cannot be opened, the
program will print a descriptive error message to the standard error
and will return the exit code 1.
Otherwise, the exit code is zero.
@heading Rules and Suggestions for the Volunteer
For this exercise, the volunteer was requested to use only Guile's
library functions. No external libraries were allowed.
The volunteer was also requested to attempt to use one of Guile's
two sets of functions to help parse command line
options: the @code{ice-9 getopt-long} module and the @code{srfi
srfi-37} module. As you shall see, the volunteer did manage to
use @code{ice-9 getopt-long} for @command{cat}.
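
As a rough sketch of how the three @command{cat} options might be
declared with @code{(ice-9 getopt-long)}: the names @code{option-spec}
and @code{parse-cat-args} are hypothetical, and the volunteer's script
may organize this differently. The procedure returns the three flags
followed by the list of file names.

@example
(use-modules (ice-9 getopt-long))

;; Each option has a long name, a short name, and takes no value.
(define option-spec
  '((unbuffered (single-char #\u) (value #f))
    (help       (single-char #\h) (value #f))
    (version    (single-char #\v) (value #f))))

;; ARGS is the full command line, program name first.
(define (parse-cat-args args)
  (let ((options (getopt-long args option-spec)))
    (list (option-ref options 'unbuffered #f)
          (option-ref options 'help #f)
          (option-ref options 'version #f)
          ;; The key '() holds the non-option arguments.
          (option-ref options '() '()))))
@end example
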
@subsection An Implementation of `echo' and `cat'
Chris K Jester-Young was the volunteer for this section, and here are
his solutions, with some annotations by the editor.
@page
@heading `cat'
First we have @command{cat}. One interesting thing to note in this
example is the use of @code{catch} to catch system errors that may
arise if files do not exist or cannot be opened.
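
A minimal sketch of that technique, assuming a hypothetical helper
named @code{try-open-input-file}; the listing that follows may
structure its error handling differently. The helper returns a binary
input port, or @code{#f} after reporting the problem on the standard
error.

@example
(define (try-open-input-file name)
  (catch 'system-error
    ;; Open the file in binary mode so no decoding is performed.
    (lambda () (open-file name "rb"))
    ;; On failure, report the reason given by the operating system.
    (lambda (key . args)
      (format (current-error-port) "cat: ~a: ~a~%"
              name
              (strerror (system-error-errno (cons key args))))
      #f)))
@end example
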
@verbatiminclude code/cat.scm
@page
@heading `echo'
Next up is @command{echo}.
@verbatiminclude code/echo.scm
@page
@node Problem 2, Problem 3, Problem 1, Theme 1
@section Problem 2: `ls'
In this section, we investigate the most famous Unix command of all
time: @command{ls}. @command{ls} lists files or directories, and
displays their properties.
However, @command{ls} has accumulated dozens of options over the past
decades. A feature-complete @command{ls} would be too long to make a
usable example. So, this script is constrained to the most important
command-line options.
The command @command{ls} lists information about files, directories,
and the contents of directories. Basically, for this challenge, the
script should operate like a limited functionality version of
Posix
@command{ls}@footnote{@url{http://pubs.opengroup.org/onlinepubs/009695399/utilities/ls.html,
The Posix spec for @command{ls}}}.
@subsection The Requirements for a Limited @command{ls}
This script only recognizes a limited set of command-line options:
@itemize
@item
@option{-a} - display all matching files, including those whose name
begins with a period
@item
@option{-l} - use the long output format
@item
@option{-R} - recursively descend into subdirectories
@end itemize
Any other command-line arguments that begin with a hyphen should cause
an ``invalid option'' error, and the program will be terminated with a
non-zero exit code.
The command-line option @option{-R} will recursively print the
contents of any subdirectory encountered.
The command-line option @option{-l} has two effects. One, information
about the files will be printed in the long format. Two, when given a
symbolic link to a directory, the command will print information about
the symbolic link itself and not the file or directory to which it
points.
@heading Operands
If a command-line argument does not begin with a hyphen, it is treated
as an operand.
When called without operands, the contents of the current directory
are printed.
Operands must be either the names of files, directories, or symbolic
links. When an operand that is not one of the above is encountered,
the script should print a descriptive error and exit with a non-zero
return code.
If an operand is a file, @command{ls} will print the name of the file.
If an operand is a symbolic link to a file, the command will print the
name of the link. If an operand is a directory, @command{ls} will
print out the contents of that directory. If an operand is a symbolic
link to a directory, @command{ls} will print the contents of that
directory, unless @option{-l} is given.
When printing the contents of a directory, files and directories
that begin with <period> are usually not printed. If the command-line
option @option{-a} is given, files and directories that begin with
<period> are printed.
@heading Output
There are two output formats: the default format and the long format.
Within each directory, the files are sorted in case-insensitive
alphabetical order according to the current locale.
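
One possible way to get a filtered, sorted listing uses @code{scandir}
from @code{(ice-9 ftw)} together with @code{string-locale-ci<?} from
@code{(ice-9 i18n)}. This is a sketch only; the names @code{hidden?}
and @code{list-directory} are hypothetical and are not taken from the
contributed script. Note that with @var{show-all?} true, the entries
@file{.} and @file{..} are included.

@example
(use-modules (ice-9 ftw) (ice-9 i18n))

;; A name is hidden when it begins with a period.
(define (hidden? name)
  (string-prefix? "." name))

;; Return the sorted entry names of DIR, or #f if DIR cannot be read
;; as a directory.
(define (list-directory dir show-all?)
  (scandir dir
           (lambda (name) (or show-all? (not (hidden? name))))
           string-locale-ci<?))
@end example
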
In the default format, the filenames are output one per line. You can
print them out in a columnar format if you like, though.
In the long format, the file information will be printed as follows
@multitable @columnfractions 0.2 0.1 0.68
@headitem
Field @tab Length @tab Description
@item
Type @tab 1 @tab
`d' for directory@*
`-' for regular file@*
`b' for block special file@*
`l' for symbolic link@*
`c' for character special file@*
`p' for fifo
@item
User Read @tab 1 @tab
`r' if readable by the owner@*
`-' otherwise
@item
User Write @tab 1 @tab
`w' if writable by the owner@*
`-' otherwise
@item
User Execute @tab 1 @tab
`S' if the file is not executable and the set-user-ID
mode is set@*
`s' if the file is executable and the set-user-ID mode is
set@*
`x' if the file is executable or the directory is searchable by
the owner@*
`-' otherwise
@item
Group Read @tab 1 @tab
`r' if readable by the group@*
`-' otherwise
@item
Group Write @tab 1 @tab
`w' if writable by the group@*
`-' otherwise
@item
Group Execute @tab 1 @tab
`S' if the file is not executable and
the set-group-ID mode is set@*
`s' if the file is executable and the
set-group-ID mode is set@*
`x' if the file is executable or the
directory is searchable by members of this group@*
`-' otherwise
@item
Other Read @tab 1 @tab
`r' if readable by others@*
`-' otherwise
@item
Other Write @tab 1 @tab
`w' if writable by others@*
`-' otherwise
@item
Other Execute @tab 1 + space @tab
`T' if the file is a directory and the
search permission is not granted to others and the restricted
deletion flag is set@*
`t' if the file is a directory and the search
permission is granted to others and the restricted deletion flag is
set@*
`x' if the file is executable or the directory is searchable by
others@*
`-' otherwise
@item
Link Count @tab @tab
For a directory, number of immediate
subdirectories it has plus one for itself plus one for its parent.
The link count for a file is one.
@item
Owner Name @tab @tab
@item
Group Name @tab @tab
@item
File Size @tab @tab in bytes
@item
Date & Time @tab @tab
``month day hour:minute'' format if the file has
been modified in the last six months, or ``month day year'' format
otherwise
@item
Pathname @tab @tab
For non-links, the path@*
For links, ``<link name> -> <path to linked file or directory>''
@end multitable
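
The nine permission characters in the table can be derived from the
permission bits, such as those returned by @code{stat:perms}. The
following is a sketch under that assumption; the names
@code{bit-set?}, @code{triplet}, and @code{permission-string} are
hypothetical. For brevity it does not check that the file is a
directory before printing `t' or `T'.

@example
(define (bit-set? perms mask)
  (not (zero? (logand perms mask))))

;; Build one group of three characters. SPECIAL? is the set-ID or
;; restricted deletion flag for this group. SET-CHAR is used when the
;; flag and the execute bit are both on, UNSET-CHAR when only the
;; flag is on.
(define (triplet perms r w x special? set-char unset-char)
  (string (if (bit-set? perms r) #\r #\-)
          (if (bit-set? perms w) #\w #\-)
          (cond ((and special? (bit-set? perms x)) set-char)
                (special? unset-char)
                ((bit-set? perms x) #\x)
                (else #\-))))

(define (permission-string perms)
  (string-append
   (triplet perms #o400 #o200 #o100 (bit-set? perms #o4000) #\s #\S)
   (triplet perms #o040 #o020 #o010 (bit-set? perms #o2000) #\s #\S)
   (triplet perms #o004 #o002 #o001 (bit-set? perms #o1000) #\t #\T)))
@end example
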
The exit code should be zero except in those error cases described
above.
For more information about @command{ls}, you can consult The Open
Group Base Specifications Issue 6, or the documentation of any BSD or
GNU version of @command{ls}.
@heading Rules and Suggestions for the Volunteer
For this challenge, only Guile's library functions have been used.
@subsection An Implementation of `ls'
Jez Ng contributed a script to these specifications. It is an
interesting solution.
One thing to note is how he has decided to truly minimize the scope of
the procedures by declaring procedures within procedures.
Unsurprisingly, the majority of the script involves getting the format
right for long output.
@page
@verbatiminclude code/ls.scm
@page
@node Problem 3, Problem 4, Problem 2, Theme 1
@section Problem 3: LZW Compression
Lempel-Ziv-Welch compression is the basis of both the UNIX Compress
program and of GIF encoding. Today's challenge has two parts.
@itemize
@item
Write `compress' and `uncompress' procedures for LZW compression.
@item
Use them to make `compress' and `uncompress' scripts.
@end itemize
@subsection The Requirements for Compression Procedures and Scripts
First up are the compression procedures. Good old LZW compression: a
staple of undergraduate computer science courses.
@heading @code{lzw-compress} and @code{lzw-uncompress}
@deffn {Guile Procedure} lzw-compress input-bv #:key table-size dictionary
This procedure should take a bytevector presumed to contain 8-bit
unsigned integers, and it should return a bytevector containing 16-bit
unsigned integers in little-endian format.
@var{input-bv} is the input bytevector.
@var{table-size} is an optional parameter that indicates the maximum
number of entries in the dictionary. This parameter is limited to the
range 258 - 65536. The default value of @var{table-size} is 65536.
@var{dictionary} is an optional parameter that modifies the output.
When true, the procedure should return both the output 16-bit
bytevector as well as the dictionary or hash table created by the
compression routine. Since the formation of the dictionary is up to
the implementer, the output format of the dictionary is unspecified.
@end deffn
Probably the best write-up on LZW compression is the one by Mark Nelson
over at @uref{http://marknelson.us/2011/11/08/lzw-revisited/}. Refer
to that article for details on LZW compression.
It is possible to fill up the dictionary. In that case, one continues
to use the dictionary as it is, without adding new entries.
As is common, the first 256 entries in the dictionary -- entries #0 to
#255 -- are initialized to 0 to 255. Entry #256 is not to be
used. Entries #257 to #(table-size - 1) will contain the multi-byte
entries in the dictionary.
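
To make the dictionary rules concrete, here is a sketch of the core
compression loop. It is not the contributed implementation, and it
omits the @var{dictionary} return value. The names @code{lzw-codes}
and @code{codes->bytevector} are hypothetical. Dictionary keys are
pairs of an existing code and the next byte; new codes start at 257,
and no entries are added once the table is full.

@example
(use-modules (rnrs bytevectors))

;; Return the list of LZW codes for INPUT-BV.
(define* (lzw-codes input-bv #:key (table-size 65536))
  (let ((dict (make-hash-table))
        (len (bytevector-length input-bv)))
    (if (zero? len)
        '()
        (let loop ((i 1)
                   (code (bytevector-u8-ref input-bv 0))
                   (next-code 257)
                   (out '()))
          (if (= i len)
              (reverse (cons code out))
              (let* ((byte (bytevector-u8-ref input-bv i))
                     (key (cons code byte))
                     (found (hash-ref dict key)))
                (cond
                 ;; The longer sequence is known: keep extending it.
                 (found (loop (+ i 1) found next-code out))
                 ;; Unknown and there is room: emit and add an entry.
                 ((< next-code table-size)
                  (hash-set! dict key next-code)
                  (loop (+ i 1) byte (+ next-code 1) (cons code out)))
                 ;; Unknown but the table is full: emit only.
                 (else
                  (loop (+ i 1) byte next-code (cons code out))))))))))

;; Pack the codes as 16-bit little-endian unsigned integers.
(define (codes->bytevector codes)
  (uint-list->bytevector codes (endianness little) 2))
@end example
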
@deffn {Guile Procedure} lzw-uncompress input-bv #:key table-size dictionary
Similarly, this procedure takes @var{input-bv}, the bytevector created
by @code{lzw-compress}, and an optional table size, and returns the
8-bit unsigned bytevector of uncompressed data. @var{dictionary},
when true, causes the procedure to also return its dictionary or hash
table.
@end deffn
Daniel Hartwig contributed an implementation of these compression
routines.
There are a couple of interesting techniques of which to take note.
First, if you C programmers have ever wondered how to create a static
variable in a function, @code{make-serial-number-generator} shows the
Scheme analog of that technique.
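
The general idea is a closure over a private variable. The sketch
below illustrates it with a hypothetical @code{make-counter}
procedure; it is not copied from the listing that follows.

@example
;; Each call to make-counter creates a fresh private variable.
;; The returned procedure is the only code that can reach it, much
;; as a C function is the only code that can reach its static
;; variable.
(define (make-counter)
  (let ((current 0))
    (lambda ()
      (set! current (+ current 1))
      current)))

(define next-id (make-counter))
(next-id)   ; first call returns 1
(next-id)   ; second call returns 2
@end example
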
@page
@verbatiminclude code/lzw.scm
@page
@heading The `compress' and `uncompress' scripts
Once the procedures are working, it is a simple task to write scripts
that use them. So we'll write scripts that are simplified versions of
the Unix commands `compress' and `uncompress'. These scripts will
manipulate files with the following format.
Each file will begin with a 3 byte header.
@itemize
@item
Byte 1: @code{#x1F}
@item
Byte 2: @code{#x9D}
@item
Byte 3: Dictionary size, given as an 8-bit unsigned number between 9
and 16 inclusive. The number indicates a dictionary size between
2^9 and 2^16.
@end itemize
The rest of the file is the LZW-compressed 16-bit binary data stored
in little-endian format.
(Note that this may not be compatible with your operating system's
version of @command{compress}. The @command{compress} file format is
not 100% consistent across platforms.)
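
Checking the 3-byte header might look like the following sketch. The
name @code{header-table-size} is hypothetical. It returns the
dictionary size encoded in the header, or @code{#f} if the header is
too short, has the wrong magic bytes, or has a bits value out of
range.

@example
(use-modules (rnrs bytevectors))

(define (header-table-size bv)
  (and (>= (bytevector-length bv) 3)
       (= (bytevector-u8-ref bv 0) #x1F)
       (= (bytevector-u8-ref bv 1) #x9D)
       (let ((bits (bytevector-u8-ref bv 2)))
         ;; A bits value of 9 to 16 means 2^bits dictionary entries.
         (and (<= 9 bits 16)
              (expt 2 bits)))))
@end example
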
@example
compress [-v] [-b bits] [name ...]
@end example
For each filename, @command{compress} will create an LZW-compressed
version of an input file. The compressed file will have the same
filename as the input file with the ".Z" extension appended to it. If
the compression is successful and the output file is successfully
written, the input file will be deleted.
If no filenames are given, @command{compress} will take the contents
of stdin and send the compressed data to stdout.
The optional @option{-b} @code{bits} parameter will indicate the
maximum size of the dictionary. If @code{bits} is given, it must be
between 9 and 16, indicating maximum dictionary sizes of
@code{2^bits}.
If the optional @option{-v} parameter is given, the script should
print to stdout the compression ratio for each file processed. If no
file was specified and this program is thus compressing stdin to
stdout, this flag is ignored.
Compress should fail with appropriate error messages if any of the
following problems occur
@itemize
@item
The command-line has unknown options or is otherwise incorrect
@item
The command line argument after a @option{-b} is out of range, non-numeric,
or missing.
@item
The file associated with an input filename does not exist or is
unreadable
@item
An input filename has a ".Z" suffix
@item
Writing the output file would overwrite a file that already exists
@item
Writing to disk fails for any reason
@item
Erasing the input file on completion fails for any reason
@end itemize
If an error occurs, the script should return the error code 1.
Otherwise it returns the error code 0.
@example
uncompress [-v] [name ...]
@end example
@command{uncompress} will create an uncompressed version of a file
generated by @command{compress}. The uncompressed file will have the
same filename as the input file with the ".Z" extension removed. If
the uncompression is successful and the output file is successfully
written, the input file will be deleted.
Also, like @command{compress}, if no filenames are given,
@command{uncompress} takes the contents of stdin and uncompresses them
to stdout.
If the optional @option{-v} parameter is given, the script should
print to stdout the compression ratio for each file processed. If no
file was specified and thus this program is uncompressing stdin to
stdout, this flag is ignored.
Uncompress should fail with appropriate error messages if any of the
following problems occur
@itemize
@item
The command-line has unknown options or is otherwise incorrect
@item
The file header is incorrect
@item
The bits parameter in the file header is out of range
@item
The file associated with the input filename does not exist or is
unreadable
@item
The input compressed data is incorrect or corrupt, which can be
detected by receiving an index that is not yet in the dictionary, or
if an index value exceeds the number of entries in the dictionary as
specified in the header, or if the last entry in the file is not a
complete 16-bit integer
@item
The input file does not end in ".Z"
@item
The output file would overwrite a file that already exists
@item
Writing to disk fails for any reason.
@item
Erasing the input file on completion fails for any reason
@end itemize
If an error occurs, the script should return the error code 1.
Otherwise it returns the error code 0.
@heading @code{compress} and @code{uncompress}
Daniel Hartwig contributed @code{compress} and @code{uncompress}
scripts. As you can imagine, the majority of the scripts do
unglamorous tasks such as checking options, filenames and the like.
@page
Here's @code{compress}
@verbatiminclude code/compress
@page
Here's @code{uncompress}
@verbatiminclude code/uncompress
@node Problem 4, , Problem 3, Theme 1
@section Problem 4: tar file archives
This challenge is to create a script that takes a list of filenames
and that generates an @emph{ustar}-format archive file. This archive
file format is compatible with common POSIX tools.
The @emph{ustar} interchange format is one of the simpler formats used
for archive files that contain multiple files along with their
metadata.
We are going to create a script that creates @emph{ustar}-format
files. But, to keep things simple, we are only going to use a small
subset of the functionality that @emph{ustar} files can provide. The
result should be readable by common @command{tar} and @command{pax}
tools.
@subsection @command{ustar} Script
The @command{ustar} script will have a simple calling structure.
@command{ustar} @code{archive file1 .. filen}
It will create a new archive containing the files indicated on the
command line.
The script will have to handle many error conditions, including but
not limited to
@itemize
@item
filename contains characters not in the ustar-string's character set
@item
file part of filename is longer than 100 characters
@item
path part of filename is longer than 155 characters
@item
file is a symbolic link, fifo, directory or any other type of
non-normal file
@item
file's uname and gname contain characters not in ustar-string's
character set
@item
file's uname or gname are longer than 31 characters
@item
file length is greater than 8,589,934,591 bytes, (octal 77777777777)
@item
file's UID or GID is greater than 2,097,151 (octal 7777777)
@item
system errors about inability to open, write, or close files.
@end itemize
@subsection The @emph{rustar} File Format
First, I will describe our restricted @emph{ustar} file format, which
I'm going to dub @emph{rustar} for @emph{restricted ustar}, just so
that we're clear that I'm talking about something more specific than
the @emph{ustar} format.
@heading File Structure
A @emph{rustar} file contains a set of @emph{logical records}. Each
logical record represents the contents of a file plus its metadata.
The logical records appear sequentially in the file, one after
another, and there is no global header in the file. At the end of the
file is a footer.
@heading Logical Records
Each logical record consists of two parts, a @emph{header} segment,
and the contents of the file, a.k.a.@: the @emph{data} segment. Of these,
only the @emph{header} requires a detailed explanation.
@heading Header
The header segment is a 512 byte block that contains metadata for a
file. The block is broken up into 17 fields of fixed length. Each
field contains data in one of three types.
@heading Header Types
Here we describe the three types that can appear in a header. Each
type has the annotation @code{[N]}. The @emph{N} indicates that the
field has a fixed size of @emph{N} bytes.
@enumerate
@item
@code{rustar-string[N]} is a fixed-width string that contains only the
codepoints listed below. It is stored in the ASCII encoding, and, if
necessary, is right padded with NULL bytes to ensure it occupies the
whole of its @emph{N} bytes. NULL bytes can only appear at the end of
the string. The string need not end with NULL bytes if it fills the
whole of its fixed width.
The list of allowed codepoints is
@itemize
@item
U+20 to U+22
@item
U+25 to U+3F
@item
U+41 to U+5A
@item
U+5F
@item
U+61 to U+7A
@item
and U+00, but, U+00 can only be followed by more U+00.
@end itemize
@item
@code{rustar-0string[N]} --- note the `0' --- is a fixed-width string
with the same format and restrictions as a @code{rustar-string[N]} but
with an additional restriction. It must end with at least one NULL byte.
@item
@code{rustar-number[N]} is an unsigned integer stored as a fixed-width
string. The string contains the text representation of the
integer in octal format. The last byte (and only the last byte) of
the string must be NULL. The string is left-padded with the `0'
character to ensure the number occupies the whole of its fixed width
buffer.
For example, a @code{rustar-number[8]} field for the integer 10 will
be the string ``0000012'' followed by one byte of NULL. 12 octal
equals 10 decimal.
@end enumerate
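
Two of these rules are easy to sketch in code: the set of allowed
codepoints and the layout of a @code{rustar-number[N]} field. The
names @code{rustar-char?} and @code{rustar-number} below are
hypothetical and are not taken from the contributed script.

@example
;; True if CH is one of the codepoints allowed before the padding.
(define (rustar-char? ch)
  (let ((cp (char->integer ch)))
    (or (<= #x20 cp #x22)
        (<= #x25 cp #x3F)
        (<= #x41 cp #x5A)
        (= cp #x5F)
        (<= #x61 cp #x7A))))

;; Format the unsigned integer N as octal digits, left-padded with
;; zeros to WIDTH - 1 characters and followed by one NULL.
(define (rustar-number n width)
  (let ((digits (number->string n 8)))
    (if (> (string-length digits) (- width 1))
        (error "number too large for field" n width)
        (string-append
         (make-string (- width 1 (string-length digits)) #\0)
         digits
         (string #\nul)))))
@end example
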
@heading Header Fields
The 17 fields in the 512 byte header block of a logical record are
@multitable @columnfractions .25 .25 .50
@headitem
Field @tab Format @tab
Description
@item
Name @tab string[100] @tab
The filename by itself, with no directory information. The path
separator character (U+2F), is not allowed.
@item
Mode @tab number[8] @tab
A bitfield of the permissions. See below.
@item
UID @tab number[8] @tab
The User ID of the file
@item
GID @tab number[8] @tab
The Group ID of the file
@item
Size @tab number[12] @tab
The length of the file in bytes
@item
mtime @tab number[12] @tab
The 32-bit integer modification time of the file.
@item
Checksum @tab number[8] @tab
256 + the sum of all the bytes in this header except the checksum
field.
@item
Typeflag @tab string[1] @tab
Always ``0''.
@item
Link name @tab string[100] @tab
Always 100 bytes of NULL.
@item
Magic @tab 0string[6] @tab
The string ``ustar'' plus a NULL.
@item
Version @tab string[2] @tab
The string ``00''.
@item
uname @tab 0string[32] @tab
The uname of the file.
@item
gname @tab 0string[32] @tab
The gname of the file
@item
Dev-Major @tab number[8] @tab
Always zero.
@item
Dev-Minor @tab number[8] @tab
Always zero.
@item
Prefix @tab string[155] @tab
Path information for this file. If this file has no additional path
information, this is all NULL. Directory separation is represented by
`/' forward slash. The slash at the end is assumed, and should not be
included explicitly.@footnote{For example: prefix ``foo'' + name
``bar'' forms ``foo/bar''. Prefix ``foo/'' + name ``bar'' forms
``foo//bar''. Don't do that.}
@item
Padding @tab 0string[12] @tab
12 bytes of NULL.
@end multitable
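
The checksum rule in the table can be sketched as follows. The name
@code{header-checksum} is hypothetical. The checksum field starts at
byte offset 148, which is the total width of the six fields before it
(100 + 8 + 8 + 8 + 12 + 12), and it is 8 bytes long.

@example
(use-modules (rnrs bytevectors))

;; HEADER is the 512-byte header block as a bytevector. Start from
;; 256 and add every byte outside the checksum field.
(define (header-checksum header)
  (let loop ((i 0) (sum 256))
    (cond ((= i 512) sum)
          ((and (>= i 148) (< i 156)) (loop (+ i 1) sum))
          (else
           (loop (+ i 1) (+ sum (bytevector-u8-ref header i)))))))
@end example
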
The mode bitfield is a standard permissions bitfield:
@itemize
@item
0x001 execute permission for 'other'
@item
0x002 write permission for 'other'
@item
0x004 read permission for 'other'
@item
0x008 execute permission for 'group'
@item
0x010 write permission for 'group'
@item
0x020 read permission for 'group'
@item
0x040 execute permission for 'owner'
@item
0x080 write permission for 'owner'
@item
0x100 read permission for 'owner'
@item
0x200 (unused)
@item
0x400 if is setgid
@item
0x800 if is setuid
@end itemize
@heading Data
After the 512-byte header block, the binary contents of the file are
stored. The data segment is NULL-padded so that it ends on a 512-byte
block boundary.
@heading Footer
The footer is 1024 bytes of NULL that appears at the end of the file.
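
The block arithmetic can be sketched as follows. The names
@code{padding-length} and @code{archive-length} are hypothetical. The
second procedure computes the expected size of an archive from the
sizes of the files in it, which can be a handy sanity check.

@example
;; Number of NULL bytes needed after SIZE bytes of data to reach the
;; next 512-byte boundary.
(define (padding-length size)
  (modulo (- size) 512))

;; Total archive length: one header block plus padded data for each
;; file, followed by the 1024-byte footer.
(define (archive-length sizes)
  (+ 1024
     (apply + (map (lambda (size)
                     (+ 512 size (padding-length size)))
                   sizes))))
@end example
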
@page
@heading The Archive Script
Jez Ng contributed a script that meets the above requirements quite
nicely. One thing to note here is the use of the procedures @code{cut}
and @code{cute}. These let you, in effect, pass a subset of the
required parameters to a procedure. In a later call, you can add the
remaining parameters to the procedure and then truly call it.
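
Here is a small illustration of @code{cut} from @code{(srfi srfi-26)}.
The procedure names are made up for this example and do not come from
the script below. Each @code{<>} marks a parameter that is supplied
later. @code{cute} is written the same way, but evaluates the
non-slot expressions once, when the procedure is created.

@example
(use-modules (srfi srfi-26))

;; Fix the first argument of + and leave the second open.
(define add-ten (cut + 10 <>))

;; Slots may appear anywhere in the argument list.
(define fill-gaps (cut list 1 <> 3 <>))

;; cut is convenient for building one-off procedures for map.
(define (add-prefix names)
  (map (cut string-append "img/" <>) names))
@end example
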
@verbatiminclude code/tar.scm
Later, Mark Weaver contributed a more featureful script that handles
almost all of the capabilities of the @code{ustar} archive format. It
handles directories and links as well as files. He also uses a very
common hack to allow longer path names: he puts as much of the path as
will fit into the 100-character field for the filename.
You can find his script in the appendix; see @ref{ustar Archives}.