% -*- coding: utf-8 -*-
% !TeX program = xelatex
% This file is part of TeX by Topic
% Copyright 2007-2014 Victor Eijkhout
% Translated by [email protected]
% Translated by [email protected]
% Translated by [email protected]
% Date of translated: 2018-5-5
\documentclass{book}
\input{preamble}
\setcounter{chapter}{1}
\begin{document}
%\chapter{Category Codes and Internal States}\label{mouth}
\chapter{Category Codes and Internal States}\label{mouth}
%When characters are read,
%\TeX\ assigns them
%category codes. The reading mechanism has three internal
%states, and transitions between these states are affected
%by category codes of characters in the input.
%This chapter describes how \TeX\ reads its input and
%how the category codes of characters influence the
%reading behaviour. Spaces and line ends are discussed.
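When characters are read, \TeX's input processor assigns them category codes.
Depending on the category codes of the characters it reads, the input processor moves between three internal states.
This chapter describes how \TeX\ reads its input and how the category codes of characters influence the reading behaviour.
Spaces and line ends\liamfnote{Terminology used here: \emph{line ends} refers collectively to the end of a line and the issues around it; the \emph{end-of-line character} is the character that \TeX's input processor itself appends to an input line; a \emph{line terminator} is the character or characters that the operating system uses to mark the end of a line, such as carriage return and line feed.} are also discussed.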
\label{cschap:endlinechar}\label{cschap:ignorespaces}\label{cschap:catcode}
\label{cschap:char32}\label{cschap:obeylines}\label{cschap:obeyspaces}
\begin{inventory}
%\item [\cs{endlinechar}]
% The character code of the end-of-line character
% appended to input lines.
% \IniTeX\ default:~13.
\item [\cs{endlinechar}]
  The character code of the end-of-line character that the input processor appends to input lines. \IniTeX\ default:~13.
%\item [\cs{par}]
% Command to close off a paragraph and go into vertical mode.
% Is generated by empty lines.
\item [\cs{par}]
  Command to close off a paragraph and go into vertical mode. It is generated by empty lines.
%\item [\cs{ignorespaces}]
% Command that reads and expands until something is
% encountered that is not a \gr{space token}.
\item [\cs{ignorespaces}]
  Command that reads and expands until something is encountered that is not a \gr{space token}.
%\item [\cs{catcode}]
% Query or set category codes.
\item [\cs{catcode}]
  Query or set category codes.
%\item [\cs{ifcat}]
% Test whether two characters have the same category code.
\item [\cs{ifcat}]
  Test whether two characters have the same category code.
%\item [\cs{\char32}]
% Control space.
% Insert the same amount of space that a space token would
% when \cs{spacefactor}${}=1000$.
\item [\cs{\textvisiblespace}]
  Control space.
  Insert the same amount of space that a space token would when \cs{spacefactor}${}=1000$.
%\item [\cs{obeylines}]
% Macro in plain \TeX\ to make line ends significant.
\item [\cs{obeylines}]
  Macro in plain \TeX\ to make line ends significant.
%\item [\cs{obeyspaces}]
% Macro in plain \TeX\ to make (most) spaces significant.
\item [\cs{obeyspaces}]
  Macro in plain \TeX\ to make (most) spaces significant.
\end{inventory}
%\section{Introduction}
\section{Introduction}
%\TeX's input processor scans input lines from a file or terminal, and
%makes tokens out of the characters.
%The input processor can be viewed as
%a simple finite state automaton with three internal states;
%depending on the state its scanning behaviour may differ.
%This automaton will be treated here both from the point of view of the
%internal states and of the category codes governing the
%transitions.
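\TeX's input processor scans input lines from a file or terminal, and makes tokens out of the characters. The input processor can be viewed as a simple finite state automaton with three internal states; depending on the state its scanning behaviour may differ. This automaton will be treated here both from the point of view of the internal states and of the category codes governing the transitions.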
%\section{Initial processing}
\section{Initial processing}
%Input from a file (or from the user terminal, but this
%will not be mentioned specifically
%most of the time) is handled one line at a time.
%Here follows a discussion of what exactly is an input line
%for \TeX.
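Input from a file (or from the user terminal, but this will not be mentioned specifically most of the time) is handled one line at a time. Here follows a discussion of what exactly is an input line for \TeX.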
%Computer systems differ with respect to
%\index{line! input}\index{line! end}\index{machine independence}
%the exact definition of an input
%\mdqon
%line. The carriage return/""line feed
%\mdqoff
%\message{slash-dash}%
%sequence terminating a line is most common,
%but some systems use just a line feed, and
%some systems with fixed record length (block) storage do not have
%a line terminator at all. Therefore \TeX\ has its
%own way of terminating an input line.
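Computer systems differ with respect to
\index{line!input}\index{line!end}\index{machine independence}%
the exact definition of an input line. The carriage return/line feed sequence terminating a line is most common, but some systems use just a line feed, and some systems with fixed record length (block) storage do not have a line terminator at all. Therefore \TeX\ has its own way of terminating an input line.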
\begin{enumerate}
%\item An input line is read from an input file (minus the
%line terminator, if any).
\item An input line is read from an input file (minus the line terminator, if any).
%\item Trailing spaces are removed (this is for the systems
%with block storage, and it prevents confusion because these
%spaces are hard to see in an editor).
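\item Trailing spaces are removed (this is for the systems with block storage, and it prevents confusion because these spaces are hard to see in an editor).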
%\item The \csterm endlinechar\par, by default \gram{return}
%(code~13) is appended.
%If the value of \cs{endlinechar} is negative
%\label{append:elc}%
%or more than~255 (this was 127 in versions of \TeX\ older
%than version~3; see page~\pageref{2vs3} for more differences),
%no character is appended.
%The effect then is the same as
%if the line were to end with a comment character.
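\item The \csterm endlinechar\par, by default \gram{return} (code~13), is appended. If the value of \cs{endlinechar} is negative\label{append:elc} or more than~255 (this was 127 in versions of \TeX\ older than version~3; see page~\pageref{2vs3} for more differences), no character is appended. The effect then is the same as if the line were to end with a comment character.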
\end{enumerate}
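As a rough illustration of the last step, the following plain \TeX\ fragment switches off the end-of-line character inside a group; the sample text is made up for this sketch. Since a line receives its end-of-line character at the moment it is read, the new value only applies from the following line on.
\begin{verbatim}
\begingroup
\endlinechar=-1 % applies from the next input line onwards
ab
cd
\endgroup
\end{verbatim}
Assuming the plain \TeX\ defaults otherwise, this should typeset `abcd' without any interword space, because no \gram{return} is appended to the lines \n{ab} and \n{cd}.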
%Computers may also differ in the character encoding
%(the most common schemes are \ascii{} and \ebcdic{}), so \TeX\
%converts the characters that are read from the file to its
%own character codes. These codes are then used exclusively,
%so that \TeX\ will perform the same on any system.
%For more on this, see Chapter~\ref{char}.
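Computers may also differ in the character encoding (the most common schemes are \ascii{} and \ebcdic{}), so \TeX\ converts the characters that are read from the file to its own character codes. These codes are then used exclusively, so that \TeX\ will perform the same on any system. For more on this, see Chapter~\ref{char}.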
%\section{Category codes}
\section{Category codes}
%Each of the 256 character codes (0--255) has an
%associated \indexterm{category code}, though not necessarily always the same one.
%There are 16 categories, numbered 0--15.
%When scanning the input, \TeX\
%thus forms character-code--category-code pairs.
%The input processor sees only these pairs; from them are formed
%character tokens, control sequence tokens, and parameter tokens.
%These tokens are then passed to \TeX's expansion and execution
%processes.
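Each of the 256 character codes (0--255) has an associated \indexterm{category code}, though not necessarily always the same one. There are 16 categories, numbered 0--15. When scanning the input, \TeX\ thus forms character-code--category-code pairs. The input processor sees only these pairs; from them are formed character tokens, control sequence tokens, and parameter tokens. These tokens are then passed to \TeX's expansion and execution processes.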
%A~character token is a character-code--category-code
%pair that is passed unchanged.
%A~control sequence token consists of one or more characters
%preceded by an escape character; see below.
%Parameter tokens are also explained below.
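A~character token is a character-code--category-code pair that is passed on unchanged. A~control sequence token consists of one or more characters preceded by an escape character; see below. Parameter tokens are also explained below.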
%This is the list of the categories, together with a brief
%description. More elaborate explanations follow in this and
%later chapters.
This is the list of the categories, together with a brief description. More elaborate explanations follow in this and later chapters.
\begin{enumerate}
\message{set counter}%\SetCounter:item=-1
\setcounter{enumi}{-1}
%\item\label{ini:esc}\index{category!0} Escape character; this signals
% the start of a control sequence. \IniTeX\ makes the backslash
% \verb-\- (code~92) an escape character.
\item\label{ini:esc}\index{category!0}
  Escape character; this signals the start of a control sequence. \IniTeX\ makes the backslash \verb-\- (code~92) an escape character.
%\item\index{category!1} Beginning of group; such a character causes
% \TeX\ to enter a new level of grouping. The plain format makes the
% open brace \verb-{- \mdqon a beginning"-of-group character. \mdqoff
\item\index{category!1}Beginning of group;
  such a character causes \TeX\ to enter a new level of grouping.
  The plain format makes the open brace \verb-{- a beginning-of-group character.
%\item\index{category!2} End of group; \TeX\ closes the current level
% of grouping. plain \TeX\ has the closing brace \verb-}- as
% end-of-group character.
\item\index{category!2}End of group;
  \TeX\ closes the current level of grouping.
  Plain \TeX\ has the closing brace \verb-}- as end-of-group character.
%\item\index{category!3} Math shift; this is the opening and closing
% delimiter for math formulas. plain \TeX\ uses the dollar
% sign~\verb-$- for this.
\item\index{category!3}Math shift;
  this is the opening and closing delimiter for math formulas.
  Plain \TeX\ uses the dollar sign~\verb-$- for this.
%\item\index{category!4} Alignment tab; the column (row) separator in
% tables made with \cs{halign} (\cs{valign}). In plain \TeX\ this is
% the ampersand~\verb-&-.
\item\index{category!4}Alignment tab;
  the column (row) separator in tables made with \cs{halign} (\cs{valign}).
  In plain \TeX\ this is the ampersand~\verb-&-.
%\item\index{category!5}\label{ini:eol} End of line; a character that
% \TeX\ considers to signal the end of an input line.
% \IniTeX\ assigns this code to the \gram{return}, that is, code~13.
% Not coincidentally, 13~is also the value that \IniTeX\ assigns to
% the \cs{endlinechar} parameter; see above.
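\item\index{category!5}\label{ini:eol}End of line;
  a character that \TeX\ considers to signal the end of an input line.
  \IniTeX\ assigns this code to the \gram{return}, that is, code~13.
  Not coincidentally, 13~is also the value that \IniTeX\ assigns to the \cs{endlinechar} parameter; see above.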
%\item\index{category!6} Parameter character; this indicates parameters
% for macros. In plain \TeX\ this is the hash sign~\verb-#-.
\item\index{category!6}Parameter character;
  this indicates parameters for macros.
  In plain \TeX\ this is the hash sign~\verb-#-.
%\item\index{category!7} Superscript; this precedes superscript
% expressions in math mode. It is also used to denote character codes
% that cannot be entered in an input file; see below. In plain
% \TeX\ this is the circumflex~\verb-^-.
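\item\index{category!7}Superscript;
  this precedes superscript expressions in math mode. It is also used to denote character codes that cannot be entered in an input file; see below.
  In plain \TeX\ this is the circumflex~\verb_^_.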
%\item\index{category!8} Subscript; this precedes subscript expressions
% in math mode. In plain \TeX\ the underscore~\verb-_- is used for
% this.
\item\index{category!8}Subscript;
  this precedes subscript expressions in math mode.
  In plain \TeX\ the underscore~\verb-_- is used for this.
%\item\index{category!9} Ignored; characters of this category are
% removed from the input, and have therefore no influence on further
% \TeX\ processing. In plain \TeX\ this is the \gr{null} character,
% that is, code~0.
\item\index{category!9}Ignored;
  characters of this category are removed from the input, and have therefore no influence on further \TeX\ processing.
  In plain \TeX\ this is the \gr{null} character, that is, code~0.
%\item\index{category!10}\label{ini:sp} Space; space characters receive
% special treatment. \IniTeX\ assigns this category to the \ascii{}
% \gr{space} character, code~32.
\item\index{category!10}\label{ini:sp}Space;
  space characters receive special treatment.
  \IniTeX\ assigns this category to the \ascii{} \gr{space} character, code~32.
%\item\index{category!11}\label{ini:let} Letter; in \IniTeX\ only the
% characters \n{a..z}, \n{A..Z} are in this category. Often, macro
% packages make some `secret' character (for instance~\n@) into a
% letter.
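\item\index{category!11}\label{ini:let}Letter;
  in \IniTeX\ only the characters \n{a..z}, \n{A..Z} are in this category.
  Often, macro packages make some `secret' character (for instance~\n{@}) into a letter%
  \liamfnote{The category code of such a character differs between ordinary user documents and package code. Inside a package it is temporarily made a letter, which signals that the control sequences using it are internal.}.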
%\item\index{category!12}\label{ini:other} Other; \IniTeX\ puts
% everything that is not in the other categories into this
% category. Thus it includes, for instance, digits and punctuation.
\item\index{category!12}\label{ini:other}Other;
  \IniTeX\ puts everything that is not in the other categories into this category.
  Thus it includes, for instance, digits and punctuation.
%\item\index{category!13} Active; active characters function as a
% \TeX\ command, without being preceded by an escape character. In
% plain \TeX\ this is only the tie character~\verb-~-, which is
% defined to produce an unbreakable space; see page~\pageref{tie}.
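\item\index{category!13}Active;
  active characters function as a \TeX\ command, without being preceded by an escape character.
  In plain \TeX\ this is only the tie character~\verb_~_, which is defined to produce an unbreakable space; see page~\pageref{tie}.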
%\item\index{category!14}\label{ini:comm} Comment character; from a
% comment character onwards, \TeX\ considers the rest of an input line
% to be comment and ignores it. In \IniTeX\ the per cent sign \verb-%-
% is made a comment character.
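\item\index{category!14}\label{ini:comm}Comment character;
  from a comment character onwards, \TeX\ considers the rest of an input line to be comment and ignores it.
  In \IniTeX\ the per cent sign \verb-%- is made a comment character.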
%\item\index{category!15}\label{ini:invalid} Invalid character; this
% category is for characters that should not appear in the
% input. \IniTeX\ assigns the \ascii\ \gr{delete} character, code~127,
% to this category.
\item\index{category!15}\label{ini:invalid}Invalid character;
  this category is for characters that should not appear in the input.
  \IniTeX\ assigns the \ascii\ \gr{delete} character, code~127, to this category.
\end{enumerate}
%The user can change the mapping
%of character codes to category codes
%with the \csterm catcode\par\ command (see Chapter~\ref{gramm}
%for the explanation of concepts such as~\gr{equals}):
%\begin{disp}\cs{catcode}\gram{number}\gr{equals}\gram{number}.\end{disp}
%In such a statement, the first number is often given in the form
%\begin{disp}\verb>`>\gr{character}\quad or\quad \verb>`\>\gr{character}\end{disp}
%both of which denote the character code of the character
%(see pages \pageref{char:code} and~\pageref{int:denotation}).
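The user can change the mapping of character codes to category codes with the \csterm catcode\par\ command (see Chapter~\ref{gramm} for the explanation of concepts such as~\gr{equals}):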
\begin{disp}\cs{catcode}\gram{number}\gr{equals}\gram{number}.\end{disp}
In such a statement, the first number is often given in the form
\begin{disp}\verb>`>\gr{character}\quad or\quad \verb>`\>\gr{character}\end{disp}
both of which denote the character code of the character (see pages \pageref{char:code} and~\pageref{int:denotation}).
%The plain format defines
%\csterm active\par
%\begin{verbatim}
%\chardef\active=13
%\end{verbatim}
%so that one can write statements such as
%\begin{verbatim}
%\catcode`\{=\active
%\end{verbatim}
%The \cs{chardef} command is treated
%on pages \pageref{chardef} and~\pageref{num:chardef}.
The plain format uses the \cs{chardef} command (treated on pages \pageref{chardef} and~\pageref{num:chardef}) to define \csterm active\par\ as
\begin{verbatim}
\chardef\active=13
\end{verbatim}
so that one can write statements such as
\begin{verbatim}
\catcode`\{=\active
\end{verbatim}
%The \LaTeX\ format has the control sequences
%\begin{verbatim}
%\def\makeatletter{\catcode`@=11 }
%\def\makeatother{\catcode`@=12 }
%\end{verbatim}
%in order to switch on and off the `secret' character~\n@
%(see below).
The \LaTeX\ format has the following two control sequences in order to switch on and off the `secret' character~\n{@} (see below):
\begin{verbatim}
\def\makeatletter{\catcode`@=11 }
\def\makeatother{\catcode`@=12 }
\end{verbatim}
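A minimal plain \TeX\ sketch of the same idea; the macro names \cs{my@helper} and \cs{usehelper} are hypothetical and serve only as illustration. While \n{@} has category~11 it can be part of a control word; once it is an `other' character again, the name can no longer be typed directly.
\begin{verbatim}
\catcode`\@=11 % @ is now a letter
\def\my@helper{internal text}
\def\usehelper{\my@helper}% \my@helper is stored as one token
\catcode`\@=12 % @ is an `other' character again
\usehelper % still works: the stored token is unaffected
\end{verbatim}
The design point is that category codes matter only when characters are converted to tokens; a token that has already been formed keeps its identity.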
%The \cs{catcode} command can also be used to query category
%codes: in
%\begin{verbatim}
%\count255=\catcode`\{
%\end{verbatim}
%it yields a number, which can be assigned.
The \cs{catcode} command can also be used to query category codes; it then yields a number, which can be assigned:
\begin{verbatim}
\count255=\catcode`\{
\end{verbatim}
Here the category code of~\n{\{} is stored in \cs{count} register~255.
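To inspect a category code without storing it, the value can also be written to the terminal. This one-line sketch assumes the plain \TeX\ format, in which the open brace has category~1:
\begin{verbatim}
\message{\the\catcode`\{}% writes 1 to the terminal and the log file
\end{verbatim}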
%Category codes can be tested by
%\begin{disp}\cs{ifcat}\gr{token$_1$}\gr{token$_2$}\end{disp}
%\TeX\ expands whatever is after \cs{ifcat} until two
%unexpandable tokens are found; these are then compared
%with respect to their category codes. Control sequence
%tokens are considered to have category code~16\index{category!16},
%which makes them all equal to each other, and unequal to
%all character tokens.
%Conditionals are treated further in Chapter~\ref{if}.
Category codes can be tested by
\begin{disp}\cs{ifcat}\gr{token$_1$}\gr{token$_2$}\end{disp}
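\TeX\ expands whatever is after \cs{ifcat} until two unexpandable tokens are found; these are then compared with respect to their category codes. Control sequence tokens are considered to have category code~16\index{category!16}, which makes them all equal to each other, and unequal to all character tokens. Conditionals are treated further in Chapter~\ref{if}.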
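Three tests, assuming the plain \TeX\ category codes, may make the comparison concrete; the words `same' and `different' are arbitrary filler text.
\begin{verbatim}
\ifcat ab same\else different\fi         % same: both are letters (11)
\ifcat a1 same\else different\fi         % different: 11 versus 12
\ifcat\relax\par same\else different\fi  % same: both count as 16
\end{verbatim}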
%\section{From characters to tokens}
\section{From characters to tokens}
%The input processor
%of \TeX\ scans input lines from a file or from the
%user terminal, and converts the characters in the input
%to tokens. There are three types of tokens.
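The input processor of \TeX\ scans input lines from a file or from the user terminal, and converts the characters in the input to tokens. There are three types of tokens.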
\begin{itemize}
%\item Character tokens: any character that is
% passed on its own to \TeX's
%further levels of processing with an appropriate
%category code attached.
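\item Character tokens: any character that is passed on its own to \TeX's further levels of processing with an appropriate category code attached.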
%\item Control sequence tokens, of which there are two kinds:
% an escape character
%\ldash that is,\message{ldash nobreak?}
%a character of category~0\index{category!0} \rdash followed
%by a string of `letters' is
%lumped together into a \emph{control word}, which is a single token.
%An escape character followed by a single character that is not of
%category~11\index{category!11}, letter, is made into a
%\indextermsub{control}{symbol}.
%If the distinction between control word and control symbol is
%irrelevant, both are called
%\indextermsub{control}{sequence}.
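\item Control sequence tokens, of which there are two kinds. An escape character \ldash that is, a character of category~0\index{category!0} \rdash followed by a string of `letters' (category~11\index{category!11}) is lumped together into a \emph{control word}, which is a single token. An escape character followed by a single character that is not of category~11 is made into a \emph{control symbol}\index{control!symbol}. If the distinction between control word and control symbol is irrelevant, both are called \indextermsub{control}{sequence}.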
%The control symbol that results from an escape character followed
%\csterm \char32\par
%by a space character is called
%\indextermbus{control}{space}.
The control symbol that results from an escape character followed by a space character, \cstoidx \char32\par\cs{}\textvisiblespace, is called \indextermbus{control}{space}.
%\item Parameter tokens: a parameter character \ldash that is, a
% character of category~6\index{category!6}, by default~\verb=#=
% \rdash followed by a digit \n{1..9} is replaced by a parameter
% token. Parameter tokens are allowed only in the context of macros
% (see Chapter~\ref{macro}).
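\item Parameter tokens: a parameter character \ldash that is, a character of category~6\index{category!6}, by default~\verb-#- \rdash followed by a digit \n{1..9} is replaced by a parameter token. Parameter tokens are allowed only in the context of macros (see Chapter~\ref{macro}).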
%A macro parameter character followed by another macro parameter
%character (not necessarily with the same character code)
%is replaced by a single character token.
%This token has category~6 (macro parameter), and the character
%code of the second parameter character.
%The most common instance is of this is
%replacing \n{\#\#} by~\n{\#$_6$}, where the subscript
%denotes the category code.
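A macro parameter character followed by another macro parameter character (not necessarily with the same character code) is replaced by a single character token. This token has category~6 (macro parameter), and the character code of the second parameter character. The most common instance of this is replacing \n{\#\#} by~\n{$\text{\#}_6$}, where the subscript denotes the category code.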
\end{itemize}
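The doubling rule for parameter characters is what makes nested definitions possible. In this sketch, with the hypothetical macro names \cs{definer} and \cs{inner}, the doubled parameter character in the replacement text of \cs{definer} becomes the single parameter character of the inner definition:
\begin{verbatim}
\def\definer#1{\def\inner##1{(#1,##1)}}
\definer{a}% now \inner is defined as by \def\inner#1{(a,#1)}
\inner{b}% typesets (a,b)
\end{verbatim}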
%\section{The input processor as a finite state automaton}
\section{The input processor as a finite state automaton}
\label{input:states}
%\TeX's input processor can be considered to be a finite state
%automaton with three \indextermbus{internal}{states},
%that is, at any moment in time it is in one of three states,
%and after transition to another state there is no memory of the
%previous states.
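\TeX's input processor can be considered to be a finite state automaton with three \indextermbus{internal}{states}, that is, at any moment in time it is in exactly one of three states, and after transition to another state there is no memory of the previous states.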
%\subsection{State {\itshape N}: new line}
\subsection{State {\itshape N}: new line}
%State {\itshape N} is entered at the beginning of each new input line,
%and that is the only time \TeX\ is in this state. In state~{\itshape
% N} all space tokens (that is, characters of
%category~10\index{category!10}) are ignored; an end-of-line character
%is converted into a \cs{par} token. All other tokens bring \TeX\ into
%state~{\itshape M}.
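State {\itshape N} is entered at the beginning of each new input line, and that is the only time \TeX\ is in this state. In state~{\itshape N} all space tokens (that is, characters of category~10\index{category!10}) are ignored; an end-of-line character is converted into a \cs{par} token. All other tokens bring \TeX\ into state~{\itshape M}.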
%\subsection{State {\itshape S}: skipping spaces}
\subsection{State {\itshape S}: skipping spaces}
%State {\itshape S} is entered in any mode after a control word or
%control space (but after no other control symbol),
%or, when in state~{\itshape M}, after a space.
%In this state all subsequent spaces or end-of-line characters
%in this input line are discarded.
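State {\itshape S} is entered in any mode after a control word or control space (but after no other control symbol), or, when in state~{\itshape M}, after a space. In this state all subsequent spaces or end-of-line characters in this input line are discarded.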
%\subsection{State {\itshape M}: middle of line}
\subsection{State {\itshape M}: middle of line}
%By far the most common state is~{\itshape M}, `middle of line'.
%It is entered after characters of categories
%1--4, 6--8, and 11--13, and after control symbols
%other than control space.
%An end-of-line character encountered in this state
%results in a space token.
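By far the most common state is~{\itshape M}, `middle of line'.
It is entered after characters of categories 1--4, 6--8, and 11--13, and after control symbols other than control space. An end-of-line character encountered in this state results in a space token.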
\input figs1
\begin{quotation}
\figmouth
\end{quotation}
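The three states can be observed on a short piece of input; the text itself is arbitrary and the default category codes are assumed.
\begin{verbatim}
first line
    second line,   indented

new paragraph
\end{verbatim}
The end of the first line is read in state~{\itshape M} and yields a space token. The leading spaces on the second line are ignored in state~{\itshape N}. The run of three spaces gives a single space token, since the second and third space are met in state~{\itshape S}. The empty line is read entirely in state~{\itshape N}, so its end-of-line character becomes a \cs{par} token.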
%%\point[hathat] Accessing the full character set
%\section{Accessing the full character set}
%\label{hathat}
%\point[hathat] Accessing the full character set
\section{Accessing the full character set}
\label{hathat}
%Strictly speaking, \TeX's input processor
%is not a finite state automaton.
%This is because during the scanning of the input line
%all trios consisting of two {\sl equal\/} superscript characters
%\index{\char94\char94\ replacement}
%(category code~7\index{category!7}) and a subsequent character
%(with character code~$<128$)
%are replaced by a single character with a character
%code in the range 0--127,
%differing by 64 from that of the original character.
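Strictly speaking, \TeX's input processor is not a finite state automaton.
This is because during the scanning of the input line
all trios consisting of two {\slshape equal\/} superscript characters\index{\char94\char94\ replacement} (category code~7\index{category!7}) and a subsequent character (with character code~$<128$) are replaced by a single character with a character code in the range 0--127, differing by 64 from that of the original character.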
%This mechanism can be used, for instance, to access positions in a font
%corresponding to character codes that cannot
%be input, for instance because they are \ascii{} control characters.
%The most obvious examples are the \ascii{} \gr{return}
%and \gr{delete} characters; the corresponding
%positions 13 and 127 in a font are
%accessible as \verb>^^M> and~\verb>^^?>.
%However, since the category of \verb>^^?> is 15\index{category!15}, invalid,
%that has to be changed before character 127 can be accessed.
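This mechanism can be used, for instance, to access positions in a font corresponding to character codes that cannot be input, for instance because they are \ascii{} control characters.
The most obvious examples are the \ascii{} \gr{return} and \gr{delete} characters; the corresponding positions 13 and 127 in a font are accessible as \verb>^^M> and~\verb>^^?>.
However, since the category of \verb>^^?> is 15\index{category!15}, invalid, that has to be changed before character 127 can be accessed.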
%In \TeX3 this mechanism has been
%modified and extended to access 256 characters:
%any quadruplet \verb-^^xy- where both \n x and \n y are lowercase
%hexadecimal digits \n0--\n9, \n a--\n f,
%is replaced by a character in the
%range 0--255, namely the character the number of which is
%represented hexadecimally as~\n{xy}.
%This imposes a slight restriction on the applicability
%of the earlier mechanism: if, for instance, \verb>^^a>
%is typed to produce character~33, then a following
%\n0--\n9, \n{a}--\n{f} will be misunderstood.
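In \TeX3 this mechanism has been modified and extended to access 256 characters:
any quadruplet \verb-^^xy- where both \n{x} and \n{y} are lowercase hexadecimal digits \n{0}--\n{9}, \n{a}--\n{f}, is replaced by a character in the range 0--255, namely the character the number of which is represented hexadecimally as~\n{xy}.
This imposes a slight restriction on the applicability of the earlier mechanism: for instance, \verb>^^7a> is replaced by \verb>z>, not by \verb>wa>\liamfnote{The \ascii{} codes of \n{w} and \n{7} differ by 64. Since \n{7a} can be read as a hexadecimal number, \TeX\ greedily takes the quadruplet as a whole and replaces it by~\n{z}.}.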
%While this process makes \TeX's input processor
%somewhat more powerful
%than a true finite state automaton,
%it does not interfere with the rest of
%the scanning. Therefore it is conceptually simpler to pretend that
%such a replacement of triplets or quadruplets
%of characters, starting with~\verb>^^>, is performed in advance.
%In actual practice this is not possible,
%because an
%input line may assign category code~7\index{category!7} to some
%character other than the circumflex, thereby
%influencing its further processing.
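While this process makes \TeX's input processor somewhat more powerful than a true finite state automaton, it does not interfere with the rest of the scanning. Therefore it is conceptually simpler to pretend that such a replacement of triplets or quadruplets of characters, starting with~\verb>^^>, is performed in advance.
In actual practice this is not possible, because an input line may assign category code~7\index{category!7} to some character other than the circumflex, thereby influencing its further processing\liamfnote{In other words, as long as no other character is assigned category~7, the pretence also holds in practice.}.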
%\section{Transitions between internal states}
\section{Transitions between internal states}
%Let us now discuss the effects on the internal state
%of \TeX's input processor when
%certain category codes are encountered in the input.
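Let us now discuss the effects on the internal state of \TeX's input processor when certain category codes are encountered in the input.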
%\subsection{0: escape character}
%\index{escape!character|see{character, escape}}
\subsection{0: escape character}
\index{escape!character|see{character, escape}}
%When an \indextermbus{escape}{character} is encountered,
%\TeX\ starts forming a control sequence token.
%Three different types of control sequence can result,
%depending on the category code of the character that
%follows the escape character.
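When an \indextermbus{escape}{character} is encountered, \TeX\ starts forming a control sequence token. Three different types of control sequence can result, depending on the category code of the character that follows the escape character.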
\begin{itemize}
%\item
%If the character following the escape is of category~11\index{category!11},
%letter, then \TeX\ combines the escape,
%that character and all following
%characters of category~11, into a control word.
%After that \TeX\
%goes into state~{\itshape S}, skipping spaces.
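\item If the character following the escape is of category~11\index{category!11}, letter, then \TeX\ combines the escape, that character and all following characters of category~11, into a control word. After that \TeX\ goes into state~{\itshape S}, skipping spaces.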
%\item
%With a character of category~10\index{category!10}, space, a control
%symbol called control space results, and \TeX\ goes into
%state~{\itshape S}.
\item With a character of category~10\index{category!10}, space, a control symbol called control space results, and \TeX\ goes into state~{\itshape S}.
%\item
%With a character of any other category code
%a control symbol results, and \TeX\ goes into state~{\itshape M},
%middle of line.
\item With a character of any other category code a control symbol results, and \TeX\ goes into state~{\itshape M}, middle of line.
\end{itemize}
%The letters of a control sequence name have to be all on one line;
%a control sequence name is not continued on the next line
%if the current line ends with a comment sign, or if (by letting
%\cs{endlinechar} be outside the range~0--255)
%there is no terminating character.
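The letters of a control sequence name have to be all on one line; a control sequence name is not continued on the next line if the current line ends with a comment sign, or if (by letting \cs{endlinechar} be outside the range~0--255) there is no terminating character.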
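The difference between the three cases shows up in how following spaces are treated. A sketch in plain \TeX, with arbitrary sample text:
\begin{verbatim}
\TeX   book  % control word, then state S: spaces skipped, TeXbook
\TeX\ book   % control space adds a space, then state S again
\$ 5, \$5    % control symbol, then state M: the first space is kept
\end{verbatim}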
%\subsection{1--4, 7--8, 11--13: non-blank characters}
\subsection{1--4, 7--8, 11--13: non-blank characters}
%Characters of category codes 1--4, 7--8, and 11--13 are made
%into tokens, and \TeX\ goes into state~{\itshape M}.
Characters of category codes 1--4, 7--8, and 11--13 are made into tokens, and \TeX\ goes into state~{\itshape M}.
%\subsection{5: end of line}
\subsection{5: end of line}
%Upon encountering an end-of-line character,
%\TeX\ discards the rest of the
%line, and starts processing the next line,
%in state~{\itshape N}. If the current state was~{\itshape N},
%that is, if the
%line so far contained at most spaces, a~\cs{par} token
%is inserted; if the state was~{\itshape M}, a~space token is inserted,
%and in state~{\itshape S} nothing is inserted.
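Upon encountering an end-of-line character, \TeX\ discards the rest of the line\liamfnote{That is, of the current line in the source file.} and starts processing the next line in state~{\itshape N}. What is inserted depends on the state the input processor was in:
\begin{itemize}
\item in state~{\itshape N}, that is, if the line so far contained at most spaces, a~\cs{par} token is inserted;
\item in state~{\itshape M}, a~space token is inserted;
\item in state~{\itshape S}, nothing is inserted.
\end{itemize}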
%Note that by `end-of-line character' a character with category
%code~5 is meant. This is not necessarily the \cs{endlinechar},
%nor need it appear at the end of the line.
%See below for further remarks on line ends.
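Note that by `end-of-line character' a character with category code~5 is meant. This is not necessarily the \cs{endlinechar}, nor need it appear at the end of the line. See below for further remarks on line ends.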
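That a character of category~5 need not stand at the end of a line can be tried out as follows. This is only a sketch; the choice of the vertical bar and the sample text are assumptions made for illustration.
\begin{verbatim}
\begingroup
\catcode`\|=5 % the bar now acts as an end-of-line character
one|this text is discarded
two
\endgroup
\end{verbatim}
The bar is met in state~{\itshape M}, so it should produce a space token, after which the remainder of that line is thrown away; the expected result is `one two'.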
%\subsection{6: parameter}
\subsection{6: parameter}
%A \indextermbus{parameter}{character} \ldash usually~\verb=#= \rdash can be
%followed by either a digit \n{1..9}
%in the context of macro definitions
%\altt
%or by another parameter character.
%In the first case a `parameter token' results,
%in the second case only a single parameter character
%is passed on as a character token for further processing.
%In either case \TeX\ goes into state~{\itshape M}.
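A \emph{parameter character} \ldash usually~\verb-#- \rdash can be followed by either a digit \n{1..9} in the context of macro definitions or by another parameter character. In the first case a `parameter token' results, in the second case only a single parameter character is passed on as a character token for further processing. In either case \TeX\ goes into state~{\itshape M}.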
%A parameter character can also appear on its own in an
%alignment preamble (see Chapter~\ref{align}).
A parameter character can also appear on its own in an alignment preamble (see Chapter~\ref{align}).
%\subsection{7: superscript}
\subsection{7: superscript}
%A superscript character is handled like most non-blank
%characters, except in the case where it is followed
%by a superscript character of the same character code.
%The process
%that replaces these two characters plus the following character
%(possibly two characters in \TeX3) by another character
%was described above.
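A superscript character is handled like most non-blank characters, except in the case where it is followed by a superscript character of the same character code. The process that replaces these two characters plus the following character (possibly two characters in \TeX3) by another character was described above.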
%\subsection{9: ignored character}
\subsection{9: ignored character}
%Characters of category 9 are ignored; \TeX\ remains in the same state.
Characters of category 9 are ignored; \TeX\ remains in the same state.
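A small sketch of this behaviour in plain \TeX; making the asterisk an ignored character is an arbitrary choice for the example.
\begin{verbatim}
\begingroup
\catcode`\*=9 % asterisks are now ignored
a*b*c
\endgroup
\end{verbatim}
Inside the group the input \n{a*b*c} should be read exactly as \n{abc}.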
%\subsection{10: space}
\subsection{10: space}
%A token with category code 10 \ldash this is called a \gr{space token},
%irrespective of the character code \rdash
%is ignored in states {\itshape N} and~{\itshape S}
%(and the state does not change);
%in state~{\itshape M} \TeX\ goes into state~{\itshape S}, inserting
%a token that has category~10 and character code~32
%(\ascii{} space).
%This implies that the character code of the space token may change
%from the character that was actually input.
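A token with category code 10 \ldash this is called a \gr{space token}, irrespective of the character code \rdash is ignored in states {\itshape N} and~{\itshape S} (and the state does not change); in state~{\itshape M} \TeX\ goes into state~{\itshape S}, inserting a token that has category~10 and character code~32 (\ascii{} space), written $ \text{\textvisiblespace}_{10} $. This implies that the character code of the space token may change from the character that was actually input\liamfnote{Whichever character of category~10 was input, the input processor replaces it by the \ascii{} space with character code~32.}.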
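The following sketch gives category~10 to a character other than the \ascii{} space; the underscore is chosen only for illustration.
\begin{verbatim}
\begingroup
\catcode`\_=10 % the underscore now acts as a space
a_b___c
\endgroup
\end{verbatim}
The first underscore after each letter is met in state~{\itshape M} and yields a space token with character code~32; the further underscores are met in state~{\itshape S} and ignored, so the result should be `a b c'.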
%\subsection{14: comment}
\subsection{14: comment}
%A comment character causes \TeX\ to discard
%the rest of the line, including the comment character.
%In particular, the end-of-line character is not seen,
%so even if the comment was encountered in state~{\itshape M}, no space
%token is inserted.
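A comment character causes \TeX\ to discard the rest of the line, including the comment character. In particular, the end-of-line character is not seen, so even if the comment was encountered in state~{\itshape M}, no space token is inserted.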
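This is commonly exploited to break a definition over several lines without introducing spaces. A sketch, with the hypothetical macro name \cs{greeting}:
\begin{verbatim}
\def\greeting{%
  Hello, world%
}
[\greeting]
\end{verbatim}
The comment characters hide both line ends inside the definition, and the indentation of the second line is skipped in state~{\itshape N}, so this should typeset `[Hello, world]' with no space next to either bracket.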
%\subsection{15: invalid}
\subsection{15: invalid}
%Invalid characters cause an error message. \TeX\ remains in
%the state it was in.
%However, in the context of a control symbol an invalid character
%is acceptable. Thus \verb>\^^?> does not cause any error messages.
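Invalid characters cause an error message; \TeX\ remains in the state it was in. However, in the context of a control symbol an invalid character is acceptable. Thus \verb>\^^?> does not cause any error message.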
%%\point[cat12] Letters and other characters
%\section{Letters and other characters}
%\label{cat12}
%\point[cat12] Letters and other characters
\section{Letters and other characters}
\label{cat12}
%In most programming languages identifiers can consist
%of both letters and digits (and possibly some other
%character such as the underscore), but control sequences in \TeX\
%are only allowed to be formed out of characters of category~11,
%letter. Ordinarily, the digits and punctuation symbols have
%category~12, other character.
%However, there are contexts where \TeX\ itself
%generates a string of characters, all of which have
%category code~12, even if that is not their usual
%category code.
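In most programming languages identifiers can consist of both letters and digits (and possibly some other character such as the underscore), but control sequence names in \TeX\ can only be formed out of characters of category~11, letter. Ordinarily, the digits and punctuation symbols have category~12, other character.
However, there are contexts where \TeX\ itself generates a string of characters, all of which have category code~12, even if that is not their usual category code.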
%This happens when the operations
%\cs{string},
%\cs{number},
%\cs{romannumeral},
%\cs{jobname},
%\cs{fontname},
%\cs{meaning},
%and \cs{the}
%are used to generate a stream of character tokens.
%If any of the characters delivered by such a command
%is a space character (that is, character code~32),
%it receives category code~10, space.
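This happens when the operations \cs{string}, \cs{number}, \cs{romannumeral}, \cs{jobname}, \cs{fontname}, \cs{meaning}, and \cs{the} are used to generate a stream of character tokens. If any of the characters delivered by such a command is a space character (that is, character code~32)\liamfnote{Note that this refers to the space character, not to \TeX's space token: the former is a matter of character code, the latter of category code.}, it receives category code~10, space.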
%For the extremely rare case where a hexadecimal digit has been
%hidden in a control sequence, \TeX\ allows \n A$_{12}$--\n F$_{12}$
%to be hexadecimal digits, in addition to the ordinary
%\n A$_{11}$--\n F$_{11}$ (here
%the subscripts denote the category codes).
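For the extremely rare case where a hexadecimal digit has been hidden in a control sequence, \TeX\ allows $ \text{\n{A}}_{12} $--$ \text{\n{F}}_{12} $ to be hexadecimal digits, in addition to the ordinary $ \text{\n{A}}_{11} $--$ \text{\n{F}}_{11} $ (here the subscripts denote the category codes).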
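The following sketch shows one way such a digit can arise; it uses only \TeX\ primitives, and the scratch register \cs{count}\n{255} is an arbitrary choice. Here \verb>\string F> expands to \n{F$_{12}$} while the number is being scanned, and the space after it ends the number:
\begin{verbatim}
\count255="\string F \showthe\count255
\end{verbatim}
The constant is read as the hexadecimal digit \n{F}, so \cs{showthe} should report~15.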
%For example,
%\begin{disp}\verb>\string\end>\quad gives four character tokens\quad
%\n{\char92$_{12}$e$_{12}$n$_{12}$d$_{12}$} \end{disp}
%Note that the \indextermbus{escape}{character}~\texttt{\char`\\}$_{12}$\label{use:escape}
%is used in the output only because the
%value of \cs{escapechar} is the character code for the
%backslash. Another value of \cs{escapechar} leads to another
%character in the output of \cs{string}.
%The \cs{string} command is treated further in Chapter~\ref{char}.
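For example,
\begin{disp}\verb>\string\end>\quad gives four character tokens\quad\n{\char92$_{12}$e$_{12}$n$_{12}$d$_{12}$} \end{disp}
Note that the \emph{escape character}\index{character!escape}~\texttt{\char`\\}$_{12}$\label{use:escape} is used in the output only because the value of the integer parameter \cs{escapechar} is the character code of the backslash. Another value of \cs{escapechar} leads to another character in the output of \cs{string}. The \cs{string} command is treated further in Chapter~\ref{char}.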
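For instance (a sketch; the choice of \verb>/> is arbitrary), setting \cs{escapechar} locally to the character code of the slash changes what \cs{string} delivers:
\begin{verbatim}
{\escapechar=`\/ \message{\string\end}}
\end{verbatim}
This should write \n{/end} to the terminal instead of \verb>\end>.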
%Spaces can wind up in control sequences:
%\begin{disp}\verb>\csname a b\endcsname>\end{disp} gives a control sequence
%token in which one of the three characters is a space.
%Turning this control sequence token into a string of characters
%\begin{disp}\verb>\expandafter\string\csname a b\endcsname>\end{disp}
%gives \n{\char92$_{12}$a$_{12}$\char32$_{10}$b$_{12}$}.
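Spaces can wind up in control sequence names:
\begin{disp}\verb>\csname a b\endcsname>\end{disp}
gives a control sequence token whose name consists of three characters, one of which is a space. Turning this control sequence token into a string of characters
\begin{disp}\verb>\expandafter\string\csname a b\endcsname>\end{disp}
gives \n{\char92$_{12}$a$_{12}$\textvisiblespace$_{10}$b$_{12}$}.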
%As a more practical example, suppose there exists a sequence
%of input files \n{file1.tex}, \n{file2.tex}\label{ex:jobnumber},
%and we want to
%write a macro that finds the number of the input file
%that is being processed. One approach would be to write
%\begin{verbatim}
%\newcount\filenumber \def\getfilenumber file#1.{\filenumber=#1 }
%\expandafter\getfilenumber\jobname.
%\end{verbatim}
%where the letters \n{file} in the parameter text of the
%macro (see Section~\ref{param:text}) absorb that part of the
%jobname, leaving the number as the sole parameter.
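As a more practical example, suppose there exists a sequence of input files \n{file1.tex}, \n{file2.tex}\label{ex:jobnumber}, and we want to write a macro that finds the number of the input file that is being processed. One approach would be to write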
\begin{verbatim}
\newcount\filenumber
\def\getfilenumber file#1.{\filenumber=#1 }
\expandafter\getfilenumber\jobname.
\end{verbatim}
where the letters \n{file} in the parameter text of the macro (see Section~\ref{param:text}) absorb that part of the jobname, leaving the number as the sole parameter.
%However, this is slightly incorrect: the letters \n{file} resulting
%from the \cs{jobname} command have category code~12, instead of
%11 for the ones in the definition of \cs{getfilenumber}.
%This can be repaired as follows:
%\begin{verbatim}
%{\escapechar=-1
% \expandafter\gdef\expandafter\getfilenumber
% \string\file#1.{\filenumber=#1 }
%}
%\end{verbatim}
%Now the sequence \verb>\string\file> gives the four
%letters \n{f$_{12}$i$_{12}$l$_{12}$e$_{12}$};
%the \cs{expandafter} commands let this be executed prior to
%the macro definition;
%the backslash is omitted because we put\handbreak \verb>\escapechar=-1>.
%Confining this value to a group makes it necessary to use~\cs{gdef}.
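However, this is slightly incorrect: the characters \n{file} resulting from the \cs{jobname} command have category code~12, whereas the ones in the definition of \cs{getfilenumber} have category code~11. This can be repaired as follows: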
\begin{verbatim}
{\escapechar=-1
\expandafter\gdef\expandafter\getfilenumber
\string\file#1.{\filenumber=#1 }
}
\end{verbatim}
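Now the sequence \verb>\string\file> gives the four character tokens \n{f$_{12}$i$_{12}$l$_{12}$e$_{12}$}; the backslash is omitted because we put \verb>\escapechar=-1>. The \cs{expandafter} commands let \verb>\string\file> be expanded prior to the macro definition, so that the parameter text contains characters of category~12. Confining the value of \cs{escapechar} to a group makes it necessary to use~\cs{gdef}.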
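Put together, a usage sketch could look as follows. It assumes that the job is started on a file named like \n{file3.tex} (otherwise the call will not match the parameter text); the \cs{message} line is only there for illustration.
\begin{verbatim}
\newcount\filenumber
{\escapechar=-1
 \expandafter\gdef\expandafter\getfilenumber
    \string\file#1.{\filenumber=#1 }
}
\expandafter\getfilenumber\jobname.
\message{Processing file number \the\filenumber}
\end{verbatim}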
%\section{The \lowercase{\n{\char92par}} token}
\def\cspar{\cs{par}}
\section{The \n{\protect\cspar} token}
%\TeX\ inserts a \csterm par\par\ token into the input after
%an \indextermbus{empty}{line}, that is, when
%encountering a character with category code~5,
%end of line, in state~{\itshape N}.
%It is good to realize when exactly this happens:
%since \TeX\ leaves state~{\itshape N}
%when it encounters any token but a space,
%a~line giving a \cs{par} can only contain characters
%of category~10. In particular, it cannot end with a comment
%character. Quite often this fact is used the other way around:
%if an empty line is wanted for the layout of the input
%one can put a comment sign on that line.
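\TeX\ inserts a \csterm par\par\ token into the input after an empty line\index{line!empty}, that is, when encountering a character with category code~5, end of line, in state~{\itshape N}\liamfnote{This refers to the end-of-line character appended by \TeX, not to the line end in the input file; the latter has already been removed and replaced by the end-of-line character during initial processing.}. It is good to realize when exactly this happens: since \TeX\ leaves state~{\itshape N} when it encounters any token but a space, a line giving a \cs{par} can only contain characters of category~10. In particular, it cannot end with a comment character\liamfnote{In that case the end-of-line character appended by \TeX\ comes after the comment character, so it is discarded with the rest of the line.}. Quite often this fact is used the other way around: if an empty line is wanted for the layout of the input, one can put a comment character on that line.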
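A sketch of this idiom: the line containing only a comment character is not empty in the above sense, so the first three lines form a single paragraph, while the truly empty line before the last sentence ends it.
\begin{verbatim}
The first sentence of a paragraph.
%
This sentence belongs to the same paragraph.

This sentence starts a new paragraph.
\end{verbatim}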
%Two consecutive empty lines generate two \cs{par} tokens.
%For all practical purposes this is equivalent to one \cs{par},
%because after the first one \TeX\ enters vertical mode, and
%in vertical mode a \cs{par} only
%exercises the page builder,
%and clears the paragraph shape parameters.
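Two consecutive empty lines generate two \cs{par} tokens. For all practical purposes this is equivalent to one \cs{par}, because after the first one \TeX\ enters vertical mode, and in vertical mode a \cs{par} only exercises the page builder and clears the paragraph shape parameters.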
%A \cs{par} is also inserted into the input when \TeX\ sees a
%\gram{vertical command} in unrestricted horizontal mode.
%After the \cs{par} has been read and expanded, the
%vertical command is examined anew (see Chapters~\ref{hvmode}
%and~\ref{par:end}).
A \cs{par} is also inserted into the input when \TeX\ sees a \gram{vertical command} in unrestricted horizontal mode. After the \cs{par} has been read and expanded, the vertical command is examined anew (see Chapters~\ref{hvmode} and~\ref{par:end}).
%The \cs{par} token may also be inserted by the \cs{end}
%command that finishes off the run of \TeX; see Chapter~\ref{output}.
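The \cs{par} token may also be inserted by the \cs{end} command\liamfnote{This is not the \LaTeX\ command \cs{end}\marg{\meta{env-name}} that closes an environment, but the plain \TeX\ command \cs{end}.} that finishes off the run of \TeX; see Chapter~\ref{output}.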
%It is important to realize that \TeX\ does what it normally does
%when encountering an empty line
%(which is ending a paragraph)
%only because of the default definition of the \cs{par} token.
%By redefining \cs{par} the behaviour
%caused by empty lines and vertical commands can be changed completely,
%and interesting special effects can be achieved.
%In order to continue to be able to cause the actions normally
%associated with \cs{par}, the synonym \cs{endgraf} is
%available in the plain format. See further Chapter~\ref{par:end}.
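It is important to realize that \TeX\ does what it normally does when encountering an empty line (which is ending a paragraph) only because of the default definition of the \cs{par} token. By redefining \cs{par} the behaviour caused by empty lines and vertical commands can be changed completely, and interesting special effects can be achieved. In order to continue to be able to cause the actions normally associated with \cs{par}, the synonym \cs{endgraf} is available in the plain format. See further Chapter~\ref{par:end}.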
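As a hedged sketch of such a redefinition (the extra \cs{medskip} is an arbitrary choice for illustration; \cs{endgraf} and \cs{medskip} are taken from plain \TeX), the following makes every \cs{par}, including the ones generated by empty lines, add some vertical space. The group keeps the change local.
\begin{verbatim}
\begingroup
\def\par{\endgraf\medskip}
First paragraph.

Second paragraph.

\endgroup
\end{verbatim}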
%The \cs{par} token is not allowed to be part of a macro
%argument, unless the macro has been declared to be \cs{long}.
%A \cs{par} in the argument of a non-\cs{long} macro
%prompts \TeX\ to give a `runaway argument' message.
%Control sequences that have been \cs{let} to \cs{par}
%(such as \cs{endgraf}) are allowed, however.
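The \cs{par} token is not allowed to be part of a macro argument, unless the macro has been declared to be \cs{long}. A \cs{par} in the argument of a non-\cs{long} macro prompts \TeX\ to give a `runaway argument' message. Control sequences that have been \cs{let} to \cs{par} (such as \cs{endgraf}) are allowed, however.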
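The difference can be sketched as follows; the macro names \cs{short} and \cs{tall} are made up for this illustration.
\begin{verbatim}
\def\short#1{(#1)}
\long\def\tall#1{(#1)}
\tall{a\par b}       % accepted: \tall is \long
\short{a\endgraf b}  % accepted: \endgraf is not the token \par
%\short{a\par b}     % would give a `Runaway argument?' error
\end{verbatim}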
%\section{Spaces}
\section{Spaces}
%This section treats some of the aspects of the
%\indextermbus{space}{character} and \indextermbus{space}{token} in the
%initial processing stages of \TeX. The topic of spacing in text
%typesetting is treated in Chapter~\ref{space}.
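This section treats some aspects of the space character\index{character!space} and the space token\index{token!space} in the initial processing stages of \TeX. The topic of spacing in text typesetting is treated in Chapter~\ref{space}.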
%\subsection{Skipped spaces}
\subsection{Skipped spaces}
%From the discussion of the internal states of \TeX's
%input processor
%it is clear that some spaces in the input never reach the
%output; in fact they never get past the input processor.
%These are for instance the spaces at the beginning
%of an input line, and the spaces following the one
%that lets \TeX\ switch to state~{\itshape S}.
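From the discussion of the internal states of \TeX's input processor it is clear that some spaces in the input never reach the output; in fact they never get past the input processor. These are, for instance, the spaces at the beginning of an input line, and the spaces following the one that lets \TeX\ switch to state~{\itshape S}.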
%
%On the other hand, line ends can generate spaces (which are not
%in the input) that may wind up in the output.
%There is a third kind of space: the spaces that get past the
%input processor,
%or are even generated there, but still do not wind up in the
%output. These are the \gram{optional spaces} that the
%syntax of \TeX\ allows in various places.
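On the other hand, line ends can generate spaces (which are not in the input, but stem from the end-of-line character added by \TeX) that may wind up in the output. There is a third kind of space: the spaces that get past the input processor, or are even generated there, but still do not wind up in the output. These are the \gram{optional spaces} that the syntax of \TeX\ allows in various places.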
%\subsection{Optional spaces}
\subsection{Optional spaces}
%The syntax of \TeX\ has the concepts of \indextermbus{optional}{spaces}
%and `one optional space':
The syntax of \TeX\ has the concepts of optional spaces\index{spaces!optional} and `one optional space':
\begin{disp}\gr{one optional space} $\longrightarrow$
\gr{space token} $|$ \gr{empty}\nl
\gr{optional spaces} $\longrightarrow$
\gr{empty} $|$ \gr{space token}\gr{optional spaces}\end{disp}
%In general, \gr{one optional space} is allowed after
%numbers and glue specifications, while \gr{optional spaces} are
%allowed whenever a space can occur inside a number
%(for example, between a minus sign and the digits of the number)
%or glue specification (for example, between \n{plus} and \n{1fil}).
%Also, the definition of \gr{equals} allows \gr{optional spaces}
%before the \n= sign.
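In general, \gr{one optional space} is allowed after numbers and glue specifications, while \gr{optional spaces} are allowed wherever a space can occur inside a number (for example, between a minus sign and the digits of the number) or inside a glue specification (for example, between \n{plus} and \n{1fil}). Also, the definition of \gr{equals} allows \gr{optional spaces} before the \n{=} sign.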
%Here are some examples of optional spaces.
Here are some examples of optional spaces.
\begin{itemize}
%\item A number can be delimited by \gr{one optional space}.
%This prevents accidents (see Chapter~\ref{number}),
%and it speeds up processing, as \TeX\ can
%detect more easily where the \gram{number} being read ends.
%Note, however, that not every `number' is a \gram{number}:
%for instance the {\tt 2} in \cs{magstep2} is not a number,
%but the single token that is the parameter of the
%\cs{magstep} macro. Thus a space or line end after this
%is significant. Another example is a parameter number,
%for example~\n{\#1}: since at most nine parameters are allowed, scanning
%one digit after the parameter character suffices.
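\item A number can be delimited by \gr{one optional space}. This prevents accidents (see Chapter~\ref{number}), and it speeds up processing, as \TeX\ can detect more easily where the \gram{number} being read ends.
Note, however, that not every `number' is a \gram{number}: for instance, the {\ttfamily 2} in \cs{magstep2} is not a number, but the single character token that is the parameter of the \cs{magstep} macro. Thus a space or line end after it is significant. Another example is a parameter number, for example~\n{\#1}: since at most nine parameters are allowed, scanning one digit after the parameter character suffices\liamfnote{That is, no optional space is needed to help delimit the number.}.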
%\item From the grammar of \TeX\
%it follows that the
%keywords \n{fill} and \n{filll}
%consist of \n{fil} and
%separate {\tt l}$\,$s, each of which is a keyword
%(see page~\pageref{keywords} for a more elaborate discussion),
%and hence can be followed by optional spaces.
%Therefore forms such as \hbox{\n{fil L l}} are also valid.
%This is a potential source of strange accidents.
%In most cases, appending a \cs{relax} token prevents
%such mishaps.
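\item From the grammar of \TeX\ it follows that the keywords \n{fill} and \n{filll} consist of \n{fil} and separate {\ttfamily l}s, each of which is a keyword (see page~\pageref{keywords} for a more elaborate discussion), and hence can be preceded by optional spaces. Therefore forms such as \hbox{\n{fil\textvisiblespace L\textvisiblespace l}} are also valid\liamfnote{\TeX's keywords are case-insensitive, and optional spaces are allowed before a keyword.}. This is a potential source of strange accidents. In most cases, appending a \cs{relax} token after the keyword prevents such mishaps.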
%\item The primitive command \csterm ignorespaces\par\
%may come in handy as the final command in a macro definition.
%As it gobbles up
%optional spaces, it can be used to prevent spaces following the
%closing brace of an argument from winding up in the output
%inadvertently. For example, in
%\begin{verbatim}
%\def\item#1{\par\leavevmode
% \llap{#1\enspace}\ignorespaces}
%\item{a/}one line \item{b/} another line \item{c/}
%yet another
%\end{verbatim}
%the \cs{ignorespaces} prevents spurious
%spaces in the second and third item.
%An empty line
%after \cs{ignorespaces} will still insert a \cs{par}, however.
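\item The primitive command \csterm ignorespaces\par\ may come in handy as the final command in a macro definition. As it gobbles up the optional spaces that follow it, it can be used to prevent spaces following the closing brace of an argument from winding up in the output inadvertently. For example: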
\begin{verbatim}
\def\item#1{\par\leavevmode
\llap{#1\enspace}\ignorespaces}
\item{a/}one line \item{b/} another line \item{c/}
yet another
\end{verbatim}
Here the \cs{ignorespaces} prevents spurious spaces in the second and third item. An empty line after \cs{ignorespaces} will still insert a \cs{par}, however.
\end{itemize}
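The accident mentioned in the second item can be sketched as follows; the example relies only on the \cs{hskip} syntax and the sample text is arbitrary. In the first line the \n{L} of `Lines' is absorbed as part of the keyword, so the glue should become \n{1fill} and only `ines of text' is typeset; in the second line \cs{relax} ends the glue specification first.
\begin{verbatim}
\hskip 0pt plus 1fil Lines of text
\hskip 0pt plus 1fil\relax Lines of text
\end{verbatim}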
%\subsection{Ignored and obeyed spaces}
\subsection{Ignored and obeyed spaces}
%After control words spaces are ignored. This is not an
%instance of optional spaces, but it is due to the fact that
%\TeX\ goes into state~{\itshape S}, skipping spaces, after control
%words. Similarly an end-of-line character is skipped
%after a control word.
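After control words spaces are ignored. This is not an instance of optional spaces, but it is due to the fact that \TeX\ goes into state~{\itshape S}, skipping spaces, after control words. Similarly, an end-of-line character is skipped after a control word.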
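A familiar consequence, sketched here with the plain \TeX\ macro \cs{TeX}: the space after a control word never reaches the output, so it has to be supplied by other means, for instance a control space or an empty group.
\begin{verbatim}
\TeX is fun     % typeset without a space after the logo
\TeX\ is fun    % control space
\TeX{} is fun   % empty group ends the control word
\end{verbatim}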
%Numbers are delimited by only \gr{one optional space},
%but still
%\begin{disp}\n{a\char92 count0=3\char32\char32b}\quad gives\quad `ab',\end{disp}
%because \TeX\ goes into state~{\itshape S} after the first
%space token. The second space is therefore skipped
%in the input processor of \TeX; it never becomes a space token.
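Numbers are delimited by only \gr{one optional space}, but still
\begin{disp}\n{a\char92 count0=3\char32\char32b}\quad gives\quad `ab',\end{disp}
because \TeX\ goes into state~{\itshape S} after the first space token\liamfnote{The first space token is the one optional space that delimits the number.}. The second space is therefore skipped in the input processor of \TeX; it never becomes a space token.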
%Spaces are skipped furthermore when \TeX\ is in state~{\itshape N},
%newline. When \TeX\ is processing in vertical mode
%space tokens (that is, spaces that were not skipped)
%are ignored. For example, the space inserted (because of the line end)
%after the first box in
%\begin{verbatim}
%\par
%\hbox{a}
%\hbox{b}
%\end{verbatim}
%has no effect.
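Spaces are furthermore skipped when \TeX\ is in state~{\itshape N}, new line. In addition, when \TeX\ is processing in vertical mode, space tokens (that is, spaces that were not skipped) are ignored. For example, the space token generated by the line end after the first box below has no effect\liamfnote{Here \cs{hbox}\marg{\meta{contents}} does not make \TeX\ switch from the vertical mode introduced by the \cs{par} token back to horizontal mode.}: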
\begin{verbatim}
\par
\hbox{a}
\hbox{b}
\end{verbatim}
% Both plain \TeX\ and \LaTeX\ define a command \cs{obeyspaces}
% \altt
% that makes spaces significant: after one space other spaces are no
% longer ignored. In both cases the basis is
% \altt
Both plain \TeX\ and \LaTeX\ define a command \cs{obeyspaces} that makes spaces significant: after one space, further spaces are no longer ignored. In both cases the basis is
\begin{verbatim}
\catcode`\ =13 \def {\space}
\end{verbatim}
% However, there is a difference between the two cases:
% in plain \TeX\
However, there is a difference between the two cases: in plain \TeX
\begin{verbatim}
\def\space{ }
\end{verbatim}
while in \LaTeX, where the macros bear other names, the corresponding definition amounts to
% while in \LaTeX\
\begin{verbatim}
\def\space{\leavevmode{} }
\end{verbatim}
% although the macros bear other names there.
% The difference between the two macros becomes
% apparent in the context of \cs{obeylines}:
% each line end is then a \cs{par} command, implying that
% each next line is started in vertical mode.