% -*- coding: utf-8 -*-
% !TeX program = xelatex
% This file is part of TeX by Topic
% Copyright 2007-2014 Victor Eijkhout
% Translated by LiYanrui@bbs.ctex.org
% Translated by zoho@bbs.ctex.org
% Translated by Liam0205@bbs.ctex.org
% Date of translated: 2018-5-5
\documentclass{book}

\input{preamble}
\setcounter{chapter}{1}

\begin{document}

%\chapter{Category Codes and Internal States}\label{mouth}
\chapter{Category Codes and Internal States}\label{mouth}

%When characters are read,
%\TeX\ assigns them
%category codes. The reading mechanism has three internal
%states, and transitions between these states are affected
%by category codes of characters in the input.
%This chapter describes how \TeX\ reads its input and
%how the category codes of characters influence the
%reading behaviour. Spaces and line ends are discussed.
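When characters are read, \TeX's input processor assigns them category codes.
The reading mechanism has three internal states, and it moves between them according to the category codes of the characters it reads.
This chapter describes how \TeX\ reads its input and how the category codes of characters influence the reading behaviour.
Spaces and line ends\liamfnote{In this translation, \emph{line ends} is the umbrella term for the end of a line and the issues around it; the \emph{end-of-line character} is the character that \TeX's input processor itself appends to an input line; and the \emph{line terminator} is the character the operating system uses to mark the end of a line, for example carriage return or line feed.} are also discussed.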

\label{cschap:endlinechar}\label{cschap:ignorespaces}\label{cschap:catcode}
\label{cschap:char32}\label{cschap:obeylines}\label{cschap:obeyspaces}
\begin{inventory}
%\item [\cs{endlinechar}]
%      The character code of the end-of-line character
%      appended to input lines.
%      \IniTeX\ default:~13.
\item [\cs{endlinechar}]
      The character code of the end-of-line character that the input processor appends to input lines. \IniTeX\ default:~13.
%\item [\cs{par}]
%      Command to close off a paragraph and go into vertical mode.
%      Is generated by empty lines.
\item [\cs{par}]
      Command to close off a paragraph and go into vertical mode. The input processor generates it from empty lines.
%\item [\cs{ignorespaces}]
%      Command that reads and expands until something is
%      encountered that is not a \gr{space token}.
\item [\cs{ignorespaces}]
      Command that reads and expands tokens until something is encountered that is not a \gr{space token}.
%\item [\cs{catcode}]
%      Query or set category codes.
\item [\cs{catcode}]
      Query or set category codes.
%\item [\cs{ifcat}]
%      Test whether two characters have the same category code.
\item [\cs{ifcat}]
      Test whether two characters have the same category code.
%\item [\cs{\char32}]
%      Control space.
%      Insert the same amount of space that a space token would
%      when \cs{spacefactor}${}=1000$.
\item [\cs{\textvisiblespace}]
      Control space.
      Insert the same amount of space that a space token would when \cs{spacefactor}${}=1000$.
%\item [\cs{obeylines}]
%      Macro in plain \TeX\ to make line ends significant.
\item [\cs{obeylines}]
      Macro in plain \TeX\ to make line ends significant, so that line breaks in the source show up in the output.
%\item [\cs{obeyspaces}]
%      Macro in plain \TeX\ to make (most) spaces significant.
\item [\cs{obeyspaces}]
      Macro in plain \TeX\ to make (most) spaces significant.
\end{inventory}

%\section{Introduction}
\section{Introduction}

%\TeX's input processor scans input lines from a file or terminal, and
%makes tokens out of the characters.
%The input processor can be viewed as
%a simple finite state automaton with three internal states;
%depending on the state its scanning behaviour may differ.
%This automaton will be treated here both from the point of view of the
%internal states and of the category codes governing the
%transitions.
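\TeX's input processor scans input lines from a file or terminal and makes tokens out of the characters. It can be viewed as a simple finite state automaton with three internal states; its scanning behaviour depends on the state it is in. This chapter treats the automaton from two points of view: the internal states themselves, and the category codes that govern the transitions between them.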

%\section{Initial processing}
\section{Initial processing}

%Input from a file (or from the user terminal, but this
%will not be mentioned specifically
%most of the time) is handled one line at a time.
%Here follows a discussion of what exactly is an input line
%for \TeX.
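Input from a file is handled one line at a time. (The same holds for input from the user terminal, which will mostly not be mentioned separately.) This section first discusses what exactly counts as an input line for \TeX.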

%Computer systems differ with respect to
%\index{line! input}\index{line! end}\index{machine independence}
%the exact definition of an input
%\mdqon
%line. The carriage return/""line feed
%\mdqoff
%\message{slash-dash}%
%sequence terminating a line is most common,
%but some systems use just a line feed, and
%some systems with fixed record length (block) storage do not have
%a line terminator at all. Therefore \TeX\ has its
%own way of terminating an input line.
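\index{line!input}\index{line!end}\index{machine independence}%
Computer systems differ with respect to the exact definition of an input line. Most commonly a carriage return followed by a line feed terminates a line; some systems use just a line feed, and some systems with fixed record length (block) storage have no line terminator at all. Therefore \TeX\ has its own way of terminating an input line.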

\begin{enumerate}
%\item An input line is read from an input file  (minus the
%line terminator, if any).
\item An input line is read from the input file (minus the line terminator, if any).
%\item Trailing spaces are removed (this is for the systems
%with block storage, and it prevents confusion because these
%spaces are hard to see in an editor).
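\item Trailing spaces are removed (this is for the systems with block storage, and it also prevents confusion, because trailing spaces are hard to see in an editor).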
%\item The \csterm endlinechar\par, by default \gram{return}
%(code~13) is appended.
%If the value of \cs{endlinechar} is negative
%\label{append:elc}%
%or more than~255 (this was 127 in versions of \TeX\ older
%than version~3; see page~\pageref{2vs3} for more differences),
%no character is appended.
%The effect then is the same as
%if the line were to end with a comment character.
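\item The \csterm endlinechar\par, by default \gram{return} (\ascii\ code~13), is appended. If the value of \cs{endlinechar} is negative\label{append:elc} or more than~255 (this was 127 in versions of \TeX\ older than version~3; see page~\pageref{2vs3} for more differences), no character is appended. The effect is then the same as if the line ended with a comment character.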
\end{enumerate}
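
The third step can be observed directly. The following plain \TeX\ fragment is a minimal sketch, assuming the default category codes; the sample text is arbitrary. Inside the group no end-of-line character is appended, so the two lines of text are joined without an intervening space:
\begin{verbatim}
{\endlinechar=-1 %
abc
def
}% typesets `abcdef'
\end{verbatim}
The new value of \cs{endlinechar} only affects lines that are read after the assignment has been executed. The first line was read earlier, so it still carries an end-of-line character, which is hidden here by a comment character.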


%Computers may also differ in the character encoding
%(the most common schemes are \ascii{} and \ebcdic{}), so \TeX\
%converts the characters that are read from the file to its
%own character codes. These codes are then used exclusively,
%so that \TeX\ will perform the same on any system.
%For more on this, see Chapter~\ref{char}.
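Computers may also differ in character encoding (the most common schemes are \ascii\ and \ebcdic{}), so \TeX\ converts the characters it reads from a file to its own internal character codes. From then on only these codes are used, so that \TeX\ behaves the same on any system. For more on this, see Chapter~\ref{char}.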

%\section{Category codes}
\section{Category codes}

%Each of the 256 character codes (0--255) has an
%associated \indexterm{category code}, though not necessarily always the same one.
%There are 16 categories, numbered 0--15.
%When scanning the input, \TeX\
%thus forms character-code--category-code pairs.
%The input processor sees only these pairs; from them are formed
%character tokens, control sequence tokens, and parameter tokens.
%These tokens are then passed to \TeX's expansion and execution
%processes.
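Each of the 256 character codes (0--255) has an associated \indexterm{category code}, though not necessarily always the same one. There are 16 categories, numbered 0--15. When scanning the input, \TeX\ thus forms character-code--category-code pairs. The input processor sees only these pairs; from them it forms character tokens, control sequence tokens, and parameter tokens, which are then passed to \TeX's expansion and execution processes.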

%A~character token is a character-code--category-code
%pair that is passed unchanged.
%A~control sequence token consists of one or more characters
%preceded by an escape character; see below.
%Parameter tokens are also explained below.
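A~character token is a character-code--category-code pair that is passed on unchanged. A~control sequence token consists of one or more characters preceded by an escape character. Control sequence tokens and parameter tokens are explained below.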

%This is the list of the categories, together with a brief
%description. More elaborate explanations follow in this and
%later chapters.
Here is the list of the categories, with a brief description of each. More elaborate explanations follow in this and later chapters.
\begin{enumerate}
\message{set counter}%\SetCounter:item=-1
\setcounter{enumi}{-1}
%\item\label{ini:esc}\index{category!0} Escape character; this signals
%  the start of a control sequence. \IniTeX\ makes the backslash
%  \verb-\- (code~92) an escape character.
\item\label{ini:esc}\index{category!0}
        Escape character; this signals the start of a control sequence. \IniTeX\ makes the backslash \verb-\- (\ascii\ code~92) an escape character.
%\item\index{category!1} Beginning of group; such a character causes
%  \TeX\ to enter a new level of grouping. The plain format makes the
%  open brace \verb-{- \mdqon a beginning"-of-group character.  \mdqoff
\item\index{category!1}Beginning of group;
        such a character causes \TeX\ to enter a new level of grouping.
        In plain \TeX\ the beginning-of-group character is the open brace \verb-{-.
%\item\index{category!2} End of group; \TeX\ closes the current level
%  of grouping.  plain \TeX\ has the closing brace \verb-}- as
%  end-of-group character.
\item\index{category!2}End of group;
        such a character causes \TeX\ to close the current level of grouping.
        In plain \TeX\ the end-of-group character is the closing brace \verb-}-.
%\item\index{category!3} Math shift; this is the opening and closing
%  delimiter for math formulas. plain \TeX\ uses the dollar
%  sign~\verb-$- for this.
\item\index{category!3}Math shift;
        this is the opening and closing delimiter for math formulas.
        Plain \TeX\ uses the dollar sign \verb-$- for this.
%\item\index{category!4} Alignment tab; the column (row) separator in
%  tables made with \cs{halign} (\cs{valign}). In plain \TeX\ this is
%  the ampersand~\verb-&-.
\item\index{category!4}Alignment tab;
        the column (row) separator in tables made with \cs{halign} (\cs{valign}).
        In plain \TeX\ this is the ampersand \verb-&-.
%\item\index{category!5}\label{ini:eol} End of line; a character that
%  \TeX\ considers to signal the end of an input line.
%  \IniTeX\ assigns this code to the \gram{return}, that is, code~13.
%  Not coincidentally, 13~is also the value that \IniTeX\ assigns to
%  the \cs{endlinechar} parameter; see above.
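\item\index{category!5}\label{ini:eol}End of line;
        a character that \TeX\ considers to signal the end of an input line.
        \IniTeX\ assigns this category to the \gram{return} character, that is, \ascii\ code~13.
        Not coincidentally, 13~is also the value that \IniTeX\ assigns to the \cs{endlinechar} parameter; see above.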
%\item\index{category!6} Parameter character; this indicates parameters
%  for macros.  In plain \TeX\ this is the hash sign~\verb-#-.
\item\index{category!6}Parameter character;
        this indicates parameters for macros.
        In plain \TeX\ this is the hash sign \verb-#-.
%\item\index{category!7} Superscript; this precedes superscript
%  expressions in math mode. It is also used to denote character codes
%  that cannot be entered in an input file; see below.  In plain
%  \TeX\ this is the circumflex~\verb-^-.
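\item\index{category!7}Superscript;
        this precedes superscript expressions in math mode. It is also used to denote character codes that cannot be entered directly in an input file; see below.
        In plain \TeX\ this is the circumflex \verb_^_.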
%\item\index{category!8} Subscript; this precedes subscript expressions
%  in math mode.  In plain \TeX\ the underscore~\verb-_- is used for
%  this.
\item\index{category!8}Subscript;
        this precedes subscript expressions in math mode.
        In plain \TeX\ the underscore \verb-_- is used for this.
%\item\index{category!9} Ignored; characters of this category are
%  removed from the input, and have therefore no influence on further
%  \TeX\ processing. In plain \TeX\ this is the \gr{null} character,
%  that is, code~0.
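\item\index{category!9}Ignored;
        characters of this category are removed from the input and therefore have no influence on further \TeX\ processing.
        In plain \TeX\ this is the \gr{null} character, that is, \ascii\ code~0.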
%\item\index{category!10}\label{ini:sp} Space; space characters receive
%  special treatment.  \IniTeX\ assigns this category to the \ascii{}
%  \gr{space} character, code~32.
\item\index{category!10}\label{ini:sp}Space;
        space characters receive special treatment.
        \IniTeX\ assigns this category to the \ascii\ \gr{space} character, code~32.
%\item\index{category!11}\label{ini:let} Letter; in \IniTeX\ only the
%  characters \n{a..z}, \n{A..Z} are in this category. Often, macro
%  packages make some `secret' character (for instance~\n@) into a
%  letter.
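\item\index{category!11}\label{ini:let}Letter;
        in \IniTeX\ only the characters \n{a ... z} and \n{A ... Z} are in this category.
        Macro packages often make some `secret' character (for instance \n{@}) temporarily into a letter%
        \liamfnote{The category code of such a character differs between ordinary user documents and macro packages. Inside a package it is temporarily made a letter, which signals that the control sequences using it are internal.}.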
%\item\index{category!12}\label{ini:other} Other; \IniTeX\ puts
%  everything that is not in the other categories into this
%  category. Thus it includes, for instance, digits and punctuation.
\item\index{category!12}\label{ini:other}Other;
        \IniTeX\ puts everything that is not in the other categories into this category.
        Thus it includes, for instance, digits and punctuation.
%\item\index{category!13} Active; active characters function as a
%  \TeX\ command, without being preceded by an escape character.  In
%  plain \TeX\ this is only the tie character~\verb-~-, which is
%  defined to produce an unbreakable space; see page~\pageref{tie}.
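\item\index{category!13}Active;
        active characters function as a \TeX\ command without being preceded by an escape character.
        In plain \TeX\ this is only the tie character \verb_~_, which is defined to produce an unbreakable space; see page~\pageref{tie}.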
%\item\index{category!14}\label{ini:comm} Comment character; from a
%  comment character onwards, \TeX\ considers the rest of an input line
%  to be comment and ignores it. In \IniTeX\ the per cent sign \verb-%-
%  is made a comment character.
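\item\index{category!14}\label{ini:comm}Comment character;
        from a comment character onwards, \TeX\ considers the rest of an input line to be comment and ignores it.
        In \IniTeX\ the per cent sign \verb-%- is made a comment character.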
%\item\index{category!15}\label{ini:invalid} Invalid character; this
%  category is for characters that should not appear in the
%  input. \IniTeX\ assigns the \ascii\ \gr{delete} character, code~127,
%  to this category.
\item\index{category!15}\label{ini:invalid}Invalid character;
        this category is for characters that should not appear in the input.
        \IniTeX\ assigns the \ascii\ \gr{delete} character, code~127, to this category.
\end{enumerate}

%The user can change the mapping
%of character codes to category codes
%with the \csterm catcode\par\ command (see Chapter~\ref{gramm}
%for the explanation of concepts such as~\gr{equals}):
%\begin{disp}\cs{catcode}\gram{number}\gr{equals}\gram{number}.\end{disp}
%In such a statement, the first number is often given in the form
%\begin{disp}\verb>`>\gr{character}\quad or\quad \verb>`\>\gr{character}\end{disp}
%both of which denote the character code of the character
%(see pages \pageref{char:code} and~\pageref{int:denotation}).
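The user can change the mapping of character codes to category codes with the \csterm catcode\par\ command (see Chapter~\ref{gramm} for the explanation of concepts such as \gr{equals}):
\begin{disp}\cs{catcode}\gram{number}\gr{equals}\gram{number}.\end{disp}
In such a statement, the first number is often given in the form
\begin{disp}\verb>`>\gr{character}\quad or\quad \verb>`\>\gr{character}\end{disp}
both of which denote the character code of the character (see pages \pageref{char:code} and~\pageref{int:denotation}).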

%The plain format defines
%\csterm active\par
%\begin{verbatim}
%\chardef\active=13
%\end{verbatim}
%so that one can write statements such as
%\begin{verbatim}
%\catcode`\{=\active
%\end{verbatim}
%The \cs{chardef} command is  treated
%on pages \pageref{chardef} and~\pageref{num:chardef}.
The plain format uses the \cs{chardef} command (treated on pages \pageref{chardef} and~\pageref{num:chardef}) to define \csterm active\par\ as
\begin{verbatim}
\chardef\active=13
\end{verbatim}
so that one can write statements such as
\begin{verbatim}
\catcode`\{=\active
\end{verbatim}
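
As a hedged illustration of what an active character is good for, the sketch below makes the vertical bar active and gives it a meaning. The choice of character and of the replacement text is an assumption made for this example only; plain \TeX\ does not do this:
\begin{verbatim}
% make | active; from now on it behaves like a command
\catcode`\|=\active
% give the active character a meaning
\def|{\quad}
% typesets `left', a quad of space, and `right'
left|right
\end{verbatim}
The definition has to come after the category code change, because a character token keeps the category code that was in force when it was read.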

%The \LaTeX\ format has the control sequences
%\begin{verbatim}
%\def\makeatletter{\catcode`@=11 }
%\def\makeatother{\catcode`@=12 }
%\end{verbatim}
%in order to switch on and off the `secret' character~\n@
%(see below).
The \LaTeX\ format has the following two control sequences, which switch the `secret' character \n{@} on and off (see below):
\begin{verbatim}
\def\makeatletter{\catcode`@=11 }
\def\makeatother{\catcode`@=12 }
\end{verbatim}
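
A typical use of such a switch is to define helper macros whose names cannot be typed by accident in a document. The following is a sketch for plain \TeX\ using raw \cs{catcode} assignments; the macro names \cs{my@helper} and \cs{usehelper} are hypothetical:
\begin{verbatim}
\catcode`@=11  % @ is a letter: \my@helper is one control word
\def\my@helper{internal text}
\def\usehelper{\my@helper}
\catcode`@=12  % @ is `other' again
\usehelper     % still works: its tokens were formed earlier
\end{verbatim}
After the second assignment, the input \verb>\my@helper> would be read as the control word \cs{my} followed by the characters \n{@helper}.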

%The \cs{catcode} command can also be used to query category
%codes: in
%\begin{verbatim}
%\count255=\catcode`\{
%\end{verbatim}
%it yields a number, which can be assigned.
The \cs{catcode} command can also be used to query category codes, in which case it yields a number:
\begin{verbatim}
\count255=\catcode`\{
\end{verbatim}
Here the category code of \n{\{} is stored in \cs{count} register~255.
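
Such a query can also be inspected directly. The lines below are a small sketch, assuming the plain \TeX\ category codes; they write the codes to the terminal and log file, and should report 14, 11 and~3 respectively:
\begin{verbatim}
\message{catcode of percent: \the\catcode`\%}
\message{catcode of a: \the\catcode`a}
\count255=\catcode`\$ \message{catcode of dollar: \the\count255}
\end{verbatim}
Writing the characters as control symbols, as in \verb>`\%>, keeps them from acting according to their category while the line is being read.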

%Category codes can be tested by
%\begin{disp}\cs{ifcat}\gr{token$_1$}\gr{token$_2$}\end{disp}
%\TeX\ expands whatever is after \cs{ifcat} until two
%unexpandable tokens are found; these are then compared
%with respect to their category codes. Control sequence
%tokens are considered to have category code~16\index{category!16},
%which makes them all equal to each other, and unequal to
%all character tokens.
%Conditionals are treated further in Chapter~\ref{if}.
Category codes can be tested by
\begin{disp}\cs{ifcat}\gr{token$_1$}\gr{token$_2$}\end{disp}
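\TeX\ expands whatever comes after \cs{ifcat} until two unexpandable tokens are found; these are then compared with respect to their category codes. Control sequence tokens are considered to have category code~16\index{category!16}, which makes them all equal to each other and unequal to all character tokens. Conditionals are treated further in Chapter~\ref{if}.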
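
The following three tests sketch this behaviour, assuming the plain \TeX\ category codes (letters are 11, digits are 12) and that \cs{relax} and \cs{par} have their primitive meanings:
\begin{verbatim}
\ifcat ab\message{same}\else\message{different}\fi
\ifcat a1\message{same}\else\message{different}\fi
\ifcat\relax\par\message{same}\else\message{different}\fi
\end{verbatim}
The first test reports `same' (two letters), the second `different' (a letter and a digit), and the third `same' (two unexpandable control sequences, both counted as category~16).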

%\section{From characters to tokens}
\section{From characters to tokens}

%The input processor
%of \TeX\ scans input lines from a file or from the
%user terminal, and converts the characters in the input
%to tokens. There are three types of tokens.
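The input processor of \TeX\ scans input lines from a file or from the user terminal and converts the characters in the input to tokens. There are three types of tokens.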
\begin{itemize}
%\item Character tokens: any character that is
%   passed on its own to \TeX's
%further levels of processing with an appropriate
%category code attached.
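\item Character tokens: any character that is passed on its own to \TeX's further levels of processing, with the appropriate category code attached.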
%\item Control sequence tokens, of which there are two kinds:
%   an escape character
%\ldash that is,\message{ldash nobreak?}
%a character of category~0\index{category!0} \rdash  followed
%by a string of `letters' is
%lumped together into a \emph{control word}, which is a single token.
%An escape character followed by a single character that is not of
%category~11\index{category!11}, letter, is made into a
%\indextermsub{control}{symbol}.
%If the distinction between control word and control symbol is
%irrelevant, both are called
%\indextermsub{control}{sequence}.
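\item Control sequence tokens, of which there are two kinds. An escape character (that is, a character of category~0\index{category!0}) followed by a string of letters (category~11\index{category!11}) is lumped together into a \emph{control word}, which is a single token. An escape character followed by a single character that is not a letter is made into a \emph{control symbol}\index{control!symbol}. If the distinction between control word and control symbol is irrelevant, both are called \emph{control sequence}\index{control!sequence}.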

%The control symbol that results from an escape character followed
%\csterm \char32\par
%by a space character is called
%\indextermbus{control}{space}.
The control symbol that results from an escape character followed by a space character, \cstoidx \char32\par\cs{}\textvisiblespace, is called \emph{control space}\index{control!space}.
%\item Parameter tokens: a parameter character \ldash that is, a
%  character of category~6\index{category!6}, by default~\verb=#=
%  \rdash followed by a digit \n{1..9} is replaced by a parameter
%  token.  Parameter tokens are allowed only in the context of macros
%  (see Chapter~\ref{macro}).
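\item Parameter tokens: a parameter character (that is, a character of category~6\index{category!6}, by default \verb-#-) followed by a digit \n{1..9} is replaced by a parameter token. Parameter tokens are allowed only in the context of macros (see Chapter~\ref{macro}).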

%A macro parameter character followed by another macro parameter
%character (not necessarily with the same character code)
%is replaced by a single character token.
%This token has category~6 (macro parameter), and the character
%code of the second parameter character.
%The most common instance is of this is
%replacing \n{\#\#} by~\n{\#$_6$}, where the subscript
%denotes the category code.
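A macro parameter character followed by another macro parameter character (not necessarily with the same character code) is replaced by a single character token. This token has category~6 (macro parameter) and the character code of the second parameter character. The most common instance is the replacement of \n{\#\#} by \n{$\text{\#}_6$}, where the subscript denotes the category code.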
\end{itemize}
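
The doubled parameter character is what makes nested definitions possible. In the sketch below the macro names \cs{makepair} and \cs{pair} are hypothetical; the single hash refers to the argument of the outer macro, and the doubled hash becomes the parameter of the inner one:
\begin{verbatim}
\def\makepair#1{\def\pair##1{(#1,##1)}}
\makepair{x}  % defines \pair with replacement text (x,#1)
\pair{y}      % typesets (x,y)
\end{verbatim}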

%\section{The input processor as a finite state automaton}
\section{The input processor as a finite state automaton}
\label{input:states}

%\TeX's input processor can be considered to be a finite state
%automaton with three \indextermbus{internal}{states},
%that is, at any moment in time it is in one of three states,
%and after transition to another state there is no memory of the
%previous states.
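\TeX's input processor can be considered to be a finite state automaton with three \emph{internal states}\index{internal states}: at any moment it is in exactly one of the three states, and after a transition to another state it has no memory of the previous states.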

%\subsection{State {\itshape N}: new line}
\subsection{State {\itshape N}: new line}

%State {\itshape N} is entered at the beginning of each new input line,
%and that is the only time \TeX\ is in this state. In state~{\itshape
% N} all space tokens (that is, characters of
%category~10\index{category!10}) are ignored; an end-of-line character
%is converted into a \cs{par} token.  All other tokens bring \TeX\ into
%state~{\itshape M}.
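State~{\itshape N} is entered at the beginning of each new input line, and that is the only time \TeX\ is in this state. In state~{\itshape N} all space tokens (that is, characters of category~10\index{category!10}) are ignored, and an end-of-line character is converted into a \cs{par} token. All other tokens bring \TeX\ into state~{\itshape M}.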

%\subsection{State {\itshape S}: skipping spaces}
\subsection{State {\itshape S}: skipping spaces}

%State {\itshape S} is entered in any mode after a control word or
%control space (but after no other control symbol),
%or, when in state~{\itshape M}, after a space.
%In this state all subsequent spaces or end-of-line characters
%in this input line are discarded.
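State~{\itshape S} is entered in any state after a control word or control space (but after no other control symbol), or, when in state~{\itshape M}, after a space. In this state all subsequent spaces and end-of-line characters in the input line are discarded.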

%\subsection{State {\itshape M}: middle of line}
\subsection{State {\itshape M}: middle of line}

%By far the most common state is~{\itshape M}, `middle of line'.
%It is entered after characters of categories
%1--4, 6--8, and 11--13, and after control symbols
%other than control space.
%An end-of-line character encountered in this state
%results in a space token.
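By far the most common state is~{\itshape M}, `middle of line'.
It is entered after characters of categories 1--4, 6--8, and 11--13, and after control symbols other than control space. An end-of-line character encountered in this state results in a space token.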

\input figs1
\begin{quotation}
  \figmouth
\end{quotation}
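
The effect of the three states on spaces can be seen in a few input lines. This is a sketch under the plain \TeX\ category codes, with arbitrary sample text:
\begin{verbatim}
   Leading spaces are skipped (state N).
Several    spaces   act as one (state M, then S).
\TeX    book: no space survives after a control word.
\$   5: one space survives after a control symbol.
\end{verbatim}
The third line starts with `\TeX book' set without a space, because the control word puts the input processor in state~{\itshape S}. The fourth line starts with a dollar sign, one space, and the digit~5, because \verb>\$> is a control symbol and leaves the processor in state~{\itshape M}.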

%%\point[hathat] Accessing the full character set
%\section{Accessing the full character set}
%\label{hathat}
%\point[hathat] Accessing the full character set
\section{Accessing the full character set}
\label{hathat}

%Strictly speaking, \TeX's input processor
%is not a finite state automaton.
%This is because during the scanning of the input line
%all trios consisting of two {\sl equal\/} superscript characters
%\index{\char94\char94\ replacement}
%(category code~7\index{category!7}) and a subsequent character
%(with character code~$<128$)
%are replaced by a single character with a character
%code in the range 0--127,
%differing by 64 from that of the original character.
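Strictly speaking, \TeX's input processor is not a finite state automaton, although it can largely be treated as one.
The reason is a mechanism that lets the user enter characters that are otherwise hard to type.
During the scanning of an input line, every trio consisting of two {\slshape equal\/} superscript characters (category code~7\index{\char94\char94\ replacement}) and a subsequent character with character code~$<128$ is replaced by a single character with a code in the range 0--127, differing by~64 from that of the original character: 64~is subtracted from codes 64--127 and added to codes 0--63.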

%This mechanism can be used, for instance, to access positions in a font
%corresponding to character codes that cannot
%be input, for instance because they are \ascii{} control characters.
%The most obvious examples are the \ascii{} \gr{return}
%and \gr{delete} characters; the corresponding
%positions 13 and 127 in a font are
%accessible as \verb>^^M> and~\verb>^^?>.
%However, since the category of \verb>^^?> is 15\index{category!15}, invalid,
%that has to be changed before character 127 can be accessed.
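This mechanism can be used, for instance, to access positions in a font corresponding to character codes that are hard to input, for example because they are \ascii\ control characters.
The most obvious examples are the \ascii\ \gr{return} (code~13) and \gr{delete} (code~127) characters, which are accessible as \verb>^^M> and \verb>^^?>.
However, since the category of \verb>^^?> is 15\index{category!15}, invalid, that has to be changed before character~127 can be accessed.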

%In \TeX3 this mechanism has been
%modified and extended to access 256 characters:
%any quadruplet \verb-^^xy- where both \n x and \n y are lowercase
%hexadecimal digits \n0--\n9, \n a--\n f,
%is replaced by a character in the
%range 0--255, namely the character the number of which is
%represented hexadecimally as~\n{xy}.
%This imposes a slight restriction on the applicability
%of the earlier mechanism: if, for instance, \verb>^^a>
%is typed to produce character~33, then a following
%\n0--\n9, \n{a}--\n{f} will be misunderstood.
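In \TeX3 this mechanism has been extended to access 256 characters:
any quadruplet \verb-^^xy-, where both \n{x} and \n{y} are lowercase hexadecimal digits \n{0}--\n{9}, \n{a}--\n{f}, is replaced by a character in the range 0--255, namely the character whose code is represented hexadecimally as \n{xy}.
This imposes a slight restriction on the earlier mechanism: for instance, \verb>^^7a> is replaced by \verb>z>, not by \verb>wa>\liamfnote{The \ascii\ codes of \n{w} and \n{7} differ by~64. Because \n{7a} can be read as a hexadecimal number, \TeX\ greedily replaces the whole quadruplet by \n{z}.}.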

%While this process makes \TeX's input processor
%somewhat more powerful
%than a true finite state automaton,
%it does not interfere with the rest of
%the scanning. Therefore it is conceptually simpler to pretend that
%such a replacement of triplets or quadruplets
%of characters, starting with~\verb>^^>, is performed in advance.
%In actual practice this is not possible,
%because an
%input line may assign category code~7\index{category!7} to some
%character other than the circumflex, thereby
%influencing its further processing.
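While this process makes \TeX's input processor somewhat more powerful than a true finite state automaton, it does not interfere with the rest of the scanning. It is therefore conceptually simpler to pretend that the replacement of triplets or quadruplets of characters starting with \verb>^^> is performed in advance.
In actual practice this is not possible, because an input line may assign category code~7\index{category!7} to some character other than the circumflex, thereby influencing its further processing\liamfnote{In other words, as long as no other character is assigned category~7, the assumption also holds in practice.}.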
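
Both forms of the notation can be tried out with \cs{message}. The lines below are a sketch assuming \TeX3 and the plain \TeX\ category codes:
\begin{verbatim}
\message{^^41^^42}  % hexadecimal codes 41 and 42: prints AB
\message{^^7a}      % hexadecimal code 7a: prints z
\message{^^a.}      % no second hex digit: 97-64=33, prints !.
\end{verbatim}
In the last line the full stop is not a hexadecimal digit, so the older three-character rule applies to \verb>^^a>.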

%\section{Transitions between internal states}
\section{Transitions between internal states}

%Let us now discuss the effects on the internal state
%of \TeX's input processor when
%certain category codes are encountered in the input.
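Let us now discuss the effects on the internal state of \TeX's input processor when characters of certain category codes are encountered in the input.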

%\subsection{0: escape character}
%\index{escape!character|see{character, escape}}
\subsection{0: escape character}
\index{escape!character|see{character, escape}}

%When an \indextermbus{escape}{character} is encountered,
%\TeX\ starts forming a control sequence token.
%Three different types of control sequence can result,
%depending on the category code of the character that
%follows the escape character.
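When an \emph{escape character}\index{character!escape} is encountered, \TeX\ starts forming a control sequence token. Three different types of control sequence can result, depending on the category code of the character that follows the escape character.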

\begin{itemize}
%\item
%If the character following the escape is of category~11\index{category!11},
%letter, then \TeX\ combines the escape,
%that character and all following
%characters of category~11, into a control word.
%After that \TeX\
%goes into state~{\itshape S}, skipping spaces.
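\item If the character following the escape is of category~11\index{category!11}, letter, then \TeX\ combines the escape, that character, and all following characters of category~11 into a control word. After that \TeX\ goes into state~{\itshape S}, skipping spaces.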
%\item
%With a character of category~10\index{category!10}, space, a control
%symbol called control space results, and \TeX\ goes into
%state~{\itshape S}.
\item With a character of category~10\index{category!10}, space, a control symbol called control space results, and \TeX\ goes into state~{\itshape S}.
%\item
%With a character of any other category code
%a control symbol results, and \TeX\ goes into state~{\itshape M},
%middle of line.
\item With a character of any other category code a control symbol results, and \TeX\ goes into state~{\itshape M}, middle of line.
\end{itemize}

%The letters of a control sequence name have to be all on one line;
%a control sequence name is not continued on the next line
%if the current line ends with a comment sign, or if (by letting
%\cs{endlinechar} be outside the range~0--255)
%there is no terminating character.
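The letters of a control sequence name have to be all on one line; a control sequence name is not continued on the next line, even if the current line ends with a comment sign or if (by letting \cs{endlinechar} be outside the range 0--255) there is no terminating character.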
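
A short sketch makes the last point concrete. The macro names \cs{foo} and \cs{foobar} are hypothetical and are defined here only for the demonstration:
\begin{verbatim}
\def\foo{[foo]}\def\foobar{[foobar]}
\foo%
bar
\end{verbatim}
This typesets `[foo]bar': the control word \cs{foo} is complete when the comment character is reached, and the letters on the next line are ordinary character tokens.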

%\subsection{1--4, 7--8, 11--13: non-blank characters}
\subsection{1--4, 7--8, 11--13: non-blank characters}

%Characters of category codes 1--4, 7--8, and 11--13 are made
%into tokens, and \TeX\ goes into state~{\itshape M}.
Characters of category codes 1--4, 7--8, and 11--13 are made into character tokens, and \TeX\ goes into state~{\itshape M}.

%\subsection{5: end of line}
\subsection{5: end of line}

%Upon encountering an end-of-line character,
%\TeX\ discards the rest of the
%line, and starts processing the next line,
%in state~{\itshape N}. If the current state was~{\itshape N},
%that is, if the
%line so far contained at most spaces, a~\cs{par} token
%is inserted; if the state was~{\itshape M}, a~space token is inserted,
%and in state~{\itshape S} nothing is inserted.
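Upon encountering an end-of-line character, \TeX\ discards the rest of the line\liamfnote{That is, the rest of the current line in the source file.} and starts processing the next line in state~{\itshape N}. What is inserted depends on the state the input processor was in: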
\begin{itemize}
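  \item in state~{\itshape N}, that is, if the line so far contained at most spaces, a \cs{par} token is inserted;
  \item in state~{\itshape M}, a space token is inserted;
  \item in state~{\itshape S}, nothing is inserted.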
\end{itemize}
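
The three cases can be seen together in the following input, a sketch with arbitrary text under the plain \TeX\ category codes:
\begin{verbatim}
first line
second line\relax
third line

new paragraph
\end{verbatim}
The first line ends in state~{\itshape M}, so a space separates it from the second. The second line ends in state~{\itshape S} because of the control word \cs{relax}, so `third line' follows without a space. The empty line is read entirely in state~{\itshape N} and yields a \cs{par} token.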

%Note that by `end-of-line character' a character with category
%code~5 is meant. This is not necessarily the \cs{endlinechar},
%nor need it appear at the end of the line.
%See below for further remarks on line ends.
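Note that by `end-of-line character' a character with category code~5 is meant. This is not necessarily the character whose code is \cs{endlinechar}, nor need it appear at the end of the line. See below for further remarks on line ends.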

%\subsection{6: parameter}
\subsection{6: parameter}

%A \indextermbus{parameter}{character} \ldash usually~\verb=#= \rdash  can be
%followed by either a digit \n{1..9}
%in the context of macro definitions
%\altt
%or by another parameter character.
%In the first case a `parameter token' results,
%in the second case only a single parameter character
%is passed on as a character token for further processing.
%In either case \TeX\ goes into state~{\itshape M}.
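A \emph{parameter character} (usually \verb-#-) can be followed either by a digit \n{1..9}, in the context of macro definitions, or by another parameter character. In the first case a parameter token results; in the second case a single parameter character is passed on as a character token for further processing. In either case \TeX\ goes into state~{\itshape M}.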

%A parameter character can also appear on its own in an
%alignment preamble (see Chapter~\ref{align}).
A parameter character can also appear on its own in an alignment preamble (see Chapter~\ref{align}).

%\subsection{7: superscript}
\subsection{7: superscript}

%A superscript character is handled like most non-blank
%characters, except in the case where it is followed
%by a  superscript character of the same character code.
%The process
%that replaces these two characters plus the following character
%(possibly two characters in \TeX3) by another character
%was described above.
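A superscript character is handled like most non-blank characters, except when it is followed by a superscript character with the same character code. The process that replaces these two characters plus the following character (possibly two characters in \TeX3) by another character was described above.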

%\subsection{9: ignored character}
\subsection{9: ignored character}

%Characters of category 9 are ignored; \TeX\ remains in the same state.
Characters of category~9 are ignored; \TeX\ remains in the same state.

%\subsection{10: space}
\subsection{10: space}

%A token with category code 10 \ldash this is called a \gr{space token},
%irrespective of the character code \rdash
%is ignored in states {\itshape N} and~{\itshape S}
%(and the state does not change);
%in state~{\itshape M} \TeX\ goes into state~{\itshape S}, inserting
%a token that has category~10 and character code~32
%(\ascii{} space).
%This implies that the character code of the space token may change
%from the character that was actually input.
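A token with category code~10 (this is called a \gr{space token}, irrespective of its character code) is ignored in states {\itshape N} and~{\itshape S}, and the state does not change. In state~{\itshape M} \TeX\ goes into state~{\itshape S}, inserting the token $ \text{\textvisiblespace}_{10} $, which has category~10 and character code~32 (\ascii\ space). This implies that the character code of the space token may differ from that of the character that was actually input\liamfnote{Whichever character of category~10 was input, the input processor replaces it by the \ascii\ space with character code~32.}.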

%\subsection{14: comment}
\subsection{14: comment}

%A comment character causes \TeX\ to discard
%the rest of the line, including the comment character.
%In particular, the end-of-line character is not seen,
%so even if the comment was encountered in state~{\itshape M}, no space
%token is inserted.
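A comment character causes \TeX\ to discard the rest of the line, including the comment character itself. In particular, the end-of-line character is not seen, so even if the comment was encountered in state~{\itshape M}, no space token is inserted.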
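
This is why macro definitions that span several lines often end their lines with a comment character. In the sketch below the macro name \cs{greet} is hypothetical:
\begin{verbatim}
\def\greet#1{%
  Hello, #1!%
}
[\greet{world}]   % typesets [Hello, world!]
\end{verbatim}
Without the two comment characters, the line ends after the opening brace and after the exclamation mark would each put a space token into the replacement text.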

%\subsection{15: invalid}
\subsection{15: invalid}

%Invalid characters cause an error message. \TeX\ remains in
%the state it was in.
%However, in the context of a control symbol an invalid character
%is acceptable. Thus \verb>\^^?> does not cause any error messages.
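Invalid characters cause an error message, and \TeX\ remains in the state it was in. However, in the context of a control symbol an invalid character is acceptable. Thus \verb>\^^?> does not cause any error messages.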

%%\point[cat12] Letters and other characters
%\section{Letters and other characters}
%\label{cat12}
%\point[cat12] Letters and other characters
\section{Letters and other characters}
\label{cat12}

%In most programming languages identifiers can consist
%of both letters and digits (and possibly some other
%character such as the underscore), but control sequences in \TeX\
%are only allowed to be formed out of characters of category~11,
%letter. Ordinarily, the digits and punctuation symbols have
%category~12, other character.
%However, there are contexts where \TeX\ itself
%generates a string of characters, all of which have
%category code~12, even if that is not their usual
%category code.
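In most programming languages identifiers can consist of both letters and digits (and possibly some other character such as the underscore), but control words in \TeX\ can only be formed out of characters of category~11, letter. Ordinarily, the digits and punctuation symbols have category~12, other character.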

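However, there are contexts where \TeX\ itself generates a string of characters, all of which have category code~12, even if that is not their usual category code.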
%This happens when the operations
%\cs{string},
%\cs{number},
%\cs{romannumeral},
%\cs{jobname},
%\cs{fontname},
%\cs{meaning},
%and \cs{the}
%are used to generate a stream of character tokens.
%If any of the characters delivered by such a command
%is a space character (that is, character code~32),
%it receives category code~10, space.
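This happens when the operations \cs{string}, \cs{number}, \cs{romannumeral}, \cs{jobname}, \cs{fontname}, \cs{meaning}, and \cs{the} are used to generate a stream of character tokens. If any of the characters delivered by such a command is a space character (that is, character code~32)\liamfnote{Note that this refers to the space character, not to a \TeX\ space token: the former is a matter of character code, the latter of category code.}, it receives category code~10, space.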

%For the extremely rare case where a hexadecimal digit has been
%hidden in a control sequence, \TeX\ allows \n A$_{12}$--\n F$_{12}$
%to be hexadecimal digits, in addition to the ordinary
%\n A$_{11}$--\n F$_{11}$ (here
%the subscripts denote the category codes).
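For the extremely rare case where a hexadecimal digit has been hidden in a control sequence, \TeX\ allows $ \text{\n{A}}_{12} $--$ \text{\n{F}}_{12} $ to be hexadecimal digits, in addition to the ordinary $ \text{\n{A}}_{11} $--$ \text{\n{F}}_{11} $ (here the subscripts denote the category codes).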

%For example,
%\begin{disp}\verb>\string\end>\quad gives four character tokens\quad
%\n{\char92$_{12}$e$_{12}$n$_{12}$d$_{12}$} \end{disp}
%Note that the \indextermbus{escape}{character}~\texttt{\char`\\}$_{12}$\label{use:escape}
%is used in the output only because the
%value of \cs{escapechar} is the character code for the
%backslash. Another value of \cs{escapechar} leads to another
%character in the output of \cs{string}.
%The \cs{string} command is treated further in Chapter~\ref{char}.
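For example,
\begin{disp}\verb>\string\end>\quad gives four character tokens\quad \n{\char92$_{12}$e$_{12}$n$_{12}$d$_{12}$} \end{disp}
Note that the \emph{escape character}\index{character!escape}~\texttt{\char`\\}$_{12}$\label{use:escape} appears in the output only because the value of the parameter \cs{escapechar} is the character code of the backslash. Another value of \cs{escapechar} leads to another character in the output of \cs{string}. The \cs{string} command is treated further in Chapter~\ref{char}.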
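
The dependence on \cs{escapechar} can be checked with a few lines. This is a sketch; the slash is an arbitrary choice of replacement character, and the groups keep the changes local:
\begin{verbatim}
\message{\string\end}                    % prints \end
{\escapechar=`\/ \message{\string\end}}  % prints /end
{\escapechar=-1 \message{\string\end}}   % prints end
\end{verbatim}
With a value outside the range 0--255, as in the last line, no escape character is output at all.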

%Spaces can wind up in control sequences:
%\begin{disp}\verb>\csname a b\endcsname>\end{disp} gives a control sequence
%token in which one of the three characters is a space.
%Turning this control sequence token into a string of characters
%\begin{disp}\verb>\expandafter\string\csname a b\endcsname>\end{disp}
%gives \n{\char92$_{12}$a$_{12}$\char32$_{10}$b$_{12}$}.
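Spaces can wind up in control sequence names:
\begin{disp}\verb>\csname a b\endcsname>\end{disp}
gives a control sequence token whose name consists of three characters, one of which is a space. Turning this control sequence token into a string of characters
\begin{disp}\verb>\expandafter\string\csname a b\endcsname>\end{disp}
gives \n{\char92$_{12}$a$_{12}$\textvisiblespace$_{10}$b$_{12}$}.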

%As a more practical example, suppose there exists a sequence
%of input files \n{file1.tex}, \n{file2.tex}\label{ex:jobnumber},
%and we want to
%write a macro that finds the number of the input file
%that is being processed. One approach would be to write
%\begin{verbatim}
%\newcount\filenumber  \def\getfilenumber file#1.{\filenumber=#1 }
%\expandafter\getfilenumber\jobname.
%\end{verbatim}
%where the letters \n{file} in the parameter text of the
%macro (see Section~\ref{param:text}) absorb that part of the
%jobname, leaving the number as the sole parameter.
举个更加实用的例子。假设有一系列输入文件：\n{file1.tex}、\n{file2.tex}\label{ex:jobnumber}，而我们希望写一个宏来获取当前正在处理的文件的序号。第一种解法是：
\begin{verbatim}
\newcount\filenumber
\def\getfilenumber file#1.{\filenumber=#1 }
\expandafter\getfilenumber\jobname.
\end{verbatim}
宏定义中，参数文本中的 \n{file}（见第~\ref{param:text}~节）会吸走 \cs{jobname} 中的 \n{file} 部分，从而留下文件编号作为宏的参数。

%However, this is slightly incorrect: the letters \n{file} resulting
%from the \cs{jobname} command have category code~12, instead of
%11 for the ones in the definition of \cs{getfilenumber}.
%This can be repaired as follows:
%\begin{verbatim}
%{\escapechar=-1
% \expandafter\gdef\expandafter\getfilenumber
%       \string\file#1.{\filenumber=#1 }
%}
%\end{verbatim}
%Now the sequence \verb>\string\file> gives the four
%letters \n{f$_{12}$i$_{12}$l$_{12}$e$_{12}$};
%the \cs{expandafter} commands let this be executed prior to
%the macro definition;
%the backslash is omitted because we put\handbreak \verb>\escapechar=-1>.
%Confining this value to a group makes it necessary to use~\cs{gdef}.
但这段代码有些小问题。\cs{jobname} 输出的 \n{file} 四个字符，其分类码为 12。但在 \cs{getfilenumber} 的定义中，\n{file} 四个字符的分类码是 11。为此，需要对上述代码进行以下修正：
\begin{verbatim}
{\escapechar=-1
 \expandafter\gdef\expandafter\getfilenumber
       \string\file#1.{\filenumber=#1 }
}
\end{verbatim}
此处，\verb>\escapechar=-1> 让 \cs{string} 忽略反斜线；因此 \verb>\string\file> 的结果会是 \n{f$_{12}$i$_{12}$l$_{12}$e$_{12}$} 四个字符。为了在宏定义时得到分类码为 12 的四个字符，我们使用 \cs{expandafter} 命令让 \verb>\string\file> 在宏定义之前先行展开；而由于 \cs{escapechar} 的设定被放在分组内部，所以我们需要使用 \cs{gdef} 进行宏定义。

%\section{The \lowercase{\n{\char92par}} token}
\def\cspar{\cs{par}}
\section{\n{\protect\cspar} 记号}

%\TeX\ inserts a \csterm par\par\ token into the input after
%an \indextermbus{empty}{line}, that is, when
%encountering a character with category code~5,
%end of line, in state~{\itshape N}.
%It is good to realize when exactly this happens:
%since \TeX\ leaves state~{\itshape N}
%when it encounters any token but a space,
%a~line giving a \cs{par} can only contain characters
%of category~10. In particular, it cannot end with a comment
%character. Quite often this fact is used the other way around:
%if an empty line is wanted for the layout of the input
%one can put a comment sign on that line.
在遇到\cindextermbus{空}{行}之后，也就是在\cstate{N} 遇到行终止符（分类码为 5）之后\liamfnote{此处说的是 \TeX 添加的行终止符，而不是输入文件中的行尾符。输入文件中的行尾符已在初始化处理中被移除并替换成了行终止符。}，\TeX 会向输入中插入一个 \csterm par\par 记号。具体来说，由于 \TeX 遇到任何非空格字符，都会从\cstate{N} 转移走，因此空行只能包含分类码为 10 的字符。特别地，空行不能以注释符结尾\liamfnote{此时，\TeX 添加的行终止符位于注释之后，故而该行终止符会被 \TeX 忽略。}。因此，若输入文件中因格式美观需要保留空行，则可以在该行中放一个注释符。这算是 \TeX 这一特性的常见用法。
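
上述规则可以用下面的输入片段来体会（仅为示意，文字内容是随意选取的）：
\begin{verbatim}
first line
%
second line   % 与 first line 同属一个段落：上一行只含注释符，不是空行

third line    % 上方是真正的空行，产生 \par，故另起一段
\end{verbatim}
只含一个注释符的行不会产生 \cs{par}，因为该行的行终止符位于注释之内，输入处理器根本看不到它。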

%Two consecutive empty lines generate two \cs{par} tokens.
%For all practical purposes this is equivalent to one \cs{par},
%because after the first one \TeX\ enters vertical mode, and
%in vertical mode a \cs{par} only
%exercises the page builder,
%and clears the paragraph shape parameters.
两个连续的空行产生两个连续的 \cs{par} 记号，而实际上它们等同于一个 \cs{par} 记号：在遇见第一个 \cs{par} 记号之后，\TeX 会进入竖直模式，而在竖直模式中，\cs{par} 只会触发页面构建器，并清空段落形状参数。

%A \cs{par} is also inserted into the input when \TeX\ sees a
%\gram{vertical command} in unrestricted horizontal mode.
%After the \cs{par} has been read and expanded, the
%vertical command is examined anew (see Chapters~\ref{hvmode}
%and~\ref{par:end}).
\TeX 于非受限水平模式（unrestricted horizontal mode）遇到竖直命令（\gram{vertical command}）时，也会向输入插入一个 \cs{par} 记号。当该 \cs{par} 被读取和展开后，上述竖直命令会被重新处理（详见第~\ref{hvmode}~章和~\ref{par:end}~章）。

%The \cs{par} token may also be inserted by the \cs{end}
%command that finishes off the run of \TeX; see Chapter~\ref{output}.
\cs{end} 命令\liamfnote{注意这里说的不是 \LaTeX 中结束环境的 \cs{end}\marg{\meta{env-name}} 命令，而就是 \cs{end} 这个 plain \TeX 命令。}也可能向输入插入 \cs{par} 记号，而后结束 \TeX\ 的运行；见第~\ref{output}~章。

%It is important to realize that \TeX\ does what it normally does
%when encountering an empty line
%(which is ending a paragraph)
%only because of the default definition of the \cs{par} token.
%By redefining \cs{par} the behaviour
%caused by empty lines and vertical commands can be changed completely,
%and interesting special effects can be achieved.
%In order to continue to be able to cause the actions normally
%associated with \cs{par}, the synonym \cs{endgraf} is
%available in the plain format. See further Chapter~\ref{par:end}.
值得注意的是，遇到空行时 \TeX 通常的行为（结束当前自然段）完全取决于 \cs{par} 记号的默认定义。重定义 \cs{par} 后，空行和竖直命令的行为可能就完全两样了；因此，我们可以借此实现一些特别的效果。在这种情况下，为了使用正常的 \cs{par} 的功能，plain \TeX 提供了其同义词 \cs{endgraf}。详见第~\ref{par:end}~章。
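
作为一个粗略的示意（假设使用 plain \TeX；\cs{message} 的内容是随意选取的），可以在分组内临时重定义 \cs{par}，让每个空行在结束段落之余再向终端报告一次：
\begin{verbatim}
\begingroup
\def\par{\endgraf\message{[par]}}% 先用 \endgraf 完成 \par 的本职工作
first paragraph

second paragraph

\endgroup
\end{verbatim}
在这个片段中，每遇到一个空行，终端上应当出现一次 \n{[par]}。分组结束后，\cs{par} 恢复原有含义。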

%The \cs{par} token is not allowed to be part of a macro
%argument, unless the macro has been declared to be \cs{long}.
%A \cs{par} in the argument of a non-\cs{long} macro
%prompts \TeX\ to give a `runaway argument' message.
%Control sequences that have been \cs{let} to \cs{par}
%(such as \cs{endgraf}) are allowed, however.
除非宏被声明为 \cs{long} 的，不然 \cs{par} 记号不能出现在宏的参数当中。对于非 \cs{long} 声明的宏，若其参数中包含 \cs{par} 记号，则 \TeX 会给出「runaway argument」的报错。不过，使用 \cs{let} 定义的与 \cs{par} 同义的控制序列（例如 \cs{endgraf}）是允许出现在这些宏的参数之中的。
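
下面的片段对比了这几种情形。其中 \cs{short} 与 \cs{tall} 是为演示而假设的宏名：
\begin{verbatim}
\def\short#1{(#1)}
\long\def\tall#1{(#1)}
\tall{a\par b}       % 合法：\tall 是 \long 宏
\short{a\endgraf b}  % 合法：\endgraf 不是 \par 记号本身
% \short{a\par b}    % 报错：Runaway argument? ...
%                    %       Paragraph ended before \short was complete.
\end{verbatim}
可见 \TeX 检查的是名为 \cs{par} 的记号本身，而不是记号的含义。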

%\section{Spaces}
\section{空格}

%This section treats some of the aspects of the
%\indextermbus{space}{character} and \indextermbus{space}{token} in the
%initial processing stages of \TeX. The topic of spacing in text
%typesetting is treated in Chapter~\ref{space}.
这一节讨论输入处理器中有关\cindextermbus{空格}{字符}和\cindextermbus{空格}{记号}的一些内容。有关文本排版中的空格，留待第~\ref{space}~章讨论。

%\subsection{Skipped spaces}
\subsection{被忽略的空格}

%From the discussion of the internal states of \TeX's
%input processor
%it is clear that some spaces in the input never reach the
%output; in fact they never get past the input processor.
%These are for instance the spaces at the beginning
%of an input line, and the spaces following the one
%that lets \TeX\ switch to state~{\itshape S}.
在上述有关输入处理器内部状态的讨论中，我们不难发现，有些空格在输入处理器中就被抛弃了，因此永远不会被输出：输入行开头的空格以及在让 \TeX 进入\cstate{S} 的字符之后的空格。
%
%On the other hand, line ends can generate spaces (which are not
%in the input) that may wind up in the output.
%There is a third kind of space: the spaces that get past the
%input processor,
%or are even generated there, but still do not wind up in the
%output. These are the \gram{optional spaces} that the
%syntax of \TeX\ allows in various places.
另一方面，行终止符尽管不在输入中（而是由 \TeX 添加的），但能产生可输出的空格。除此之外，还有第三种空格：它们可以通过输入处理器，甚至干脆由输入处理器产生，但也不会被输出。那便是非强制空格（\gram{optional spaces}）。在 \TeX 的语法中，很多地方都会出现此类空格。

%\subsection{Optional spaces}
\subsection{非强制空格}

%The syntax of \TeX\ has the concepts of \indextermbus{optional}{spaces}
%and `one optional space':
\TeX 语法中有所谓\cindextermbus{非强制}{空格}与\emph{单个非强制空格}的概念：
\begin{disp}\gr{one optional space} $\longrightarrow$
\gr{space token} $|$ \gr{empty}\nl
\gr{optional spaces} $\longrightarrow$
\gr{empty} $|$ \gr{space token}\gr{optional spaces}\end{disp}
%In general, \gr{one optional space} is allowed after
%numbers and glue specifications, while \gr{optional spaces} are
%allowed whenever a space can occur inside a number
%(for example, between a minus sign and the digits of the number)
%or glue specification (for example, between \n{plus} and \n{1fil}).
%Also, the definition of \gr{equals} allows \gr{optional spaces}
%before the \n= sign.
通常单个非强制空格（\gr{one optional space}）允许出现在数字和粘连说明之后；而非强制空格（\gr{optional spaces}）允许出现在数字或粘连中任意允许出现空格的地方（比如负号与数字之间，又比如 \n{plus} 和 \n{1fil} 之间）。此外，根据 \gr{equals} 的定义，非强制空格允许出现在 \n{=} 之前。
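
以下赋值语句示意了这些位置上的空格（假设使用 plain \TeX；\cs{count255} 与 \cs{skip0} 只是随手选用的临时寄存器）：
\begin{verbatim}
\count255 = - 5      % 等号之前、负号与数字之间都可以有空格
\skip0 = 0pt plus 1fil
\message{\the\count255, \the\skip0}% 终端显示 -5, 0.0pt plus 1.0fil
\end{verbatim}
第一行中紧跟在 \n{5} 之后的那个空格，就是界定数字的单个非强制空格。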

%Here are some examples of optional spaces.
以下是有关非强制空格的一些例子：

\begin{itemize}
%\item A number can be delimited by \gr{one optional space}.
%This prevents accidents (see Chapter~\ref{number}),
%and it speeds up processing, as \TeX\ can
%detect more easily where the \gram{number} being read ends.
%Note, however, that not every `number' is a \gram{number}:
%for instance the {\tt 2} in \cs{magstep2} is not a number,
%but the  single token that is the parameter of the
%\cs{magstep} macro. Thus a space or line end after this
%is significant. Another example is a parameter number,
%for example~\n{\#1}: since at most nine parameters are allowed, scanning
%one digit after the parameter character suffices.
\item \gr{one optional space} 可用于界定数字的范围。这有助于避免一些意外情况（见第~\ref{number}~章），同时能加速 \TeX 的处理过程——这是因为借助单个非强制空格，\TeX 能更容易地界定当前正在读入的 \gram{number} 于何时结束。
注意，并非每个「数值」都是 \gram{number}。例如，\cs{magstep2} 中的 {\ttfamily 2} 就不是数字，而是作为 \cs{magstep} 宏的参数的单个记号。因此，其后的空格或行终止符不会被当作非强制空格吃掉，而是有实际作用的。另一个例子是参数编号，例如 \n{\#1}：因为一个宏最多允许有 9 个参数，故而只需在参数符后扫描一位数字即可\liamfnote{而不需要单个非强制空格来辅助界定数字的范围。}。

%\item From the grammar of \TeX\
%it follows that the
%keywords \n{fill} and \n{filll}
%consist of \n{fil} and
%separate {\tt l}$\,$s, each of which is a keyword
%(see page~\pageref{keywords} for a more elaborate discussion),
%and hence can be followed by optional spaces.
%Therefore forms such as \hbox{\n{fil L l}} are also valid.
%This is a potential source of strange accidents.
%In most cases, appending a \cs{relax} token prevents
%such mishaps.
\item 根据 \TeX 的语法，关键字 \n{fill} 及 \n{filll} 由 \n{fil} 与若干单独的 {\ttfamily l} 字符记号组成（详见第~\pageref{keywords}~页）；因此此处允许非强制空格。据此，例如 \hbox{\n{fil\textvisiblespace L\textvisiblespace l}} 是合法的关键字\liamfnote{\TeX 的关键字不区分大小写，并且在关键字前允许有非强制空格。}。这里有一些潜在的问题，可能导致莫名其妙的情况。大多数情况下，在关键字后面加上一个 \cs{relax} 即可避免此类问题。

%\item The primitive command \csterm ignorespaces\par\
%may come in handy as the final command in a macro definition.
%As it gobbles up
%optional spaces, it can be used to prevent spaces following the
%closing brace of an argument from winding up in the output
%inadvertently. For example, in
%\begin{verbatim}
%\def\item#1{\par\leavevmode
%    \llap{#1\enspace}\ignorespaces}
%\item{a/}one line \item{b/} another line \item{c/}
%yet another
%\end{verbatim}
%the \cs{ignorespaces} prevents spurious
%spaces in the second and third item.
%An empty line
%after \cs{ignorespaces} will still insert a \cs{par}, however.
\item \TeX 原语 \csterm ignorespaces\par 会吃掉其后的非强制空格；故此可将其插入宏定义的末尾，以避免将参量右花括号后的空格无意带入输出当中。例如说：
\begin{verbatim}
\def\item#1{\par\leavevmode
    \llap{#1\enspace}\ignorespaces}
\item{a/}one line \item{b/} another line \item{c/}
yet another
\end{verbatim}
此处，\cs{ignorespaces} 吃掉了第二、第三两次调用之后的空格，而这些空格是不希望被排版输出的。不过，在 \cs{ignorespaces} 之后的空行仍然会插入 \cs{par} 记号。
\end{itemize}
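
上面第二条提到的关键字陷阱，可以用下面的片段来体会（仅为示意，盒子宽度与文字都是随意选取的）：
\begin{verbatim}
% 字母 L 被当作关键字 l 并入胶的说明，胶变成 1fill，排出的是 ook out
\hbox to 5cm{A\hskip 0pt plus 1fil Look out}
% \relax 终止了对胶的扫描，排出的是完整的 Look out
\hbox to 5cm{A\hskip 0pt plus 1fil\relax Look out}
\end{verbatim}
第一行中，\n{fil} 之后的空格是关键字之前的非强制空格，并不能阻止 \TeX 继续寻找关键字 \n{l}。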

%\subsection{Ignored and obeyed spaces}
\subsection{被忽略和被保留的空格}

%After control words spaces are ignored. This is not an
%instance of optional spaces, but it is due to the fact that
%\TeX\ goes into state~{\itshape S}, skipping spaces, after control
%words. Similarly an end-of-line character is skipped
%after a control word.
\TeX 会忽略控制词之后的空格。不过这不是因为控制词之后的空格是非强制空格，而是因为 \TeX 在遇到控制词之后会进入\cstate{S}，从而忽略空格。类似地，控制词之后的行终止符也会被忽略。

%Numbers are delimited by only \gr{one optional space},
%but still
%\begin{disp}\n{a\char92 count0=3\char32\char32b}\quad gives\quad `ab',\end{disp}
%because \TeX\ goes into state~{\itshape S} after the first
%space token. The second space is therefore skipped
%in the input processor of \TeX; it never becomes a space token.
数字由单个非强制空格界定，但是
\begin{disp}\n{a\char92 count0=3\char32\char32b}\end{disp}
的输出是 `ab'。这是因为 \TeX 在第一个空格记号\liamfnote{第一个空格记号是单个非强制空格，界定了单个数字。}之后会进入\cstate{S}，从而第二个空格会被 \TeX 的输入处理器忽略，永远不会变成空格记号。

%Spaces are skipped furthermore when \TeX\ is in state~{\itshape N},
%newline. When \TeX\ is processing in vertical mode
%space tokens (that is, spaces that were not skipped)
%are ignored. For example, the space inserted (because of the line end)
%after the first box in
%\begin{verbatim}
%\par
%\hbox{a}
%\hbox{b}
%\end{verbatim}
%has no effect.
当 \TeX 处于新行\cstate{N} 时，空格也会被忽略。另一方面，当 \TeX 处于竖直模式工作时，空格记号（也就是在一开始未被忽略的空格）会被忽略。例如说，下例第一个盒子之后由行终止符生成的空格记号会被忽略\liamfnote{此处 \cs{hbox}\marg{\meta{内容物}} 并不会使 \TeX 从由 \cs{par} 记号引入的竖直模式中切换回水平模式。}。
\begin{verbatim}
\par
\hbox{a}
\hbox{b}
\end{verbatim}

% Both plain \TeX\ and \LaTeX\ define a command \cs{obeyspaces}
% \altt
% that makes spaces significant: after one space other spaces are no
% longer ignored. In both cases the basis is
% \altt
plain \TeX 和 \LaTeX 格式都定义了名为 \cs{obeyspaces} 的宏。该宏能使每个空格都是有意义的：在一个空格之后，连续的空格会被保留。两种格式中，\cs{obeyspaces} 的基本形式是一致的。
\begin{verbatim}
\catcode`\ =13 \def {\space}
\end{verbatim}
% However, there is a difference between the two cases:
% in plain \TeX\
不过，对于 \cs{space} 的定义，两种格式有所区别。在 plain \TeX 中，\cs{space} 的定义如下
\begin{verbatim}
\def\space{ }
\end{verbatim}
而在 \LaTeX 中，相应的定义相当于（不过在 \LaTeX 中，这些宏另有其名）
% while in \LaTeX\
\begin{verbatim}
\def\space{\leavevmode{} }
\end{verbatim}
% although the macros bear other names there.

% The difference between the two macros becomes
% apparent in the context of \cs{obeylines}:
% each line end is then a \cs{par} command, implying that
% each next line is started in vertical mode.
% An active space is expanded by the plain macro to a space token,
% which is ignored in vertical mode.
% The active spaces in \LaTeX\ will immediately switch to horizontal
% mode, so that each space is significant.
在 \cs{obeylines} 的上下文中，比较容易看出这两种定义的区别。使用 \cs{obeylines} 后，每个行终止符都会被转换成一个 \cs{par} 命令。因此 \TeX 开始处理每一行时，都处于竖直模式。在 plain \TeX 中，活动空格被展开为空格记号，因此在竖直模式中会被忽略。但在 \LaTeX 中，活动空格会首先使 \TeX 离开竖直模式并进入水平模式，因此每个空格就都是有意义的了。
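
下面的草稿在 plain \TeX 中演示这一差别。其中 \cs{keepspaces} 是为演示而假设的宏名，它仿照上文所述 \LaTeX 的思路，让活动空格先执行 \cs{leavevmode}；这只是一种可能的写法，并非任何格式的实际实现。
\begin{verbatim}
% plain TeX 的默认行为：行首的活动空格在竖直模式中被忽略
{\obeylines\obeyspaces\tt%
  two leading spaces
    four leading spaces
}
% 假设的补救：在空格为活动符的环境中定义 \keepspaces
{\obeyspaces\gdef\keepspaces{\obeyspaces\def {\leavevmode\space}}}
{\obeylines\keepspaces\tt%
  two leading spaces
    four leading spaces
}
\end{verbatim}
按照上文的分析，第一组的两行应当左端对齐，而第二组中行首的空格应当得以保留。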

%\subsection{More ignored spaces}
\subsection{空格被忽略的其他情形}

%There are three further places where \TeX\ will ignore space tokens.
还有三种情形下，\TeX 会忽略空格记号：
% \alt
\begin{enumerate}
%\item When \TeX\ is looking for
%an undelimited macro argument it will accept the
%first token (or group) that is not a space. This is treated
%in Chapter~\ref{macro}.
\item 在寻找无定界的宏参数（undelimited macro argument）时，\TeX 会跳过空格记号，而将第一个非空格记号（或分组）作为参数。详见第~\ref{macro}~章。

%\item In math mode space tokens are ignored (see Chapter~\ref{math}).
\item 在数学模式中，所有的空格记号会被忽略（详见第~\ref{math}~章）。

%\item After an alignment tab character spaces are ignored
%(see Chapter~\ref{align}).
\item 在阵列制表符之后，空格记号会被忽略（见第~\ref{align}~章）。
\end{enumerate}
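
这三种情形可以分别用下面的片段来示意（假设使用 plain \TeX；\cs{pair} 是为演示而假设的宏名）：
\begin{verbatim}
\def\pair#1#2{[#1|#2]}
\pair a b          % #1 为 a，#2 为 b：寻找 #2 时跳过了空格记号，排出 [a|b]
$x y$              % 排版结果与 $xy$ 相同
\halign{#\hfil&#\hfil\cr
  one&   two\cr}   % & 之后的空格记号被忽略
\end{verbatim}
注意 \cs{pair} 之后的空格属于另一种情况：它在输入处理器中就已被跳过，从未成为空格记号。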

%\subsection{\gr{space token}}
\subsection{空格记号：\gr{space token}}

%Spaces are anomalous in \TeX.
%For instance, the \cs{string} operation
%assigns category code~12\index{category!12} to all
%characters except spaces; they receive category~10\index{category!10}.
%Also, as was said above, \TeX's input processor converts (when in
%state~{\itshape M}) all tokens with category code~10 into real spaces:
%they get character code~32.
%Any character token with category~10 is called
%\gram{space token}\indexterm{space! token}.
%Space tokens with character
%code not equal to 32 are called \indextermbus{funny}{spaces}.
在 \TeX 中，空格总是表现得与众不同。举例来说，\cs{string} 会将所有字符的分类码设置为 12\index{category!12}，唯独空格的分类码是 10 \index{category!10}。此外，如前文所述，在\cstate{M} 中，\TeX 的输入处理器会将所有分类码为 10 的字符转换为真正的空格：字符编码会被设置为 32。于是，任何分类码为 10 的字符记号称为\cindextermsub{空格}{记号}（\gram{space token}）。字符编码不是 32 的空格记号称为\cindextermbus{滑稽}{空格}。

%\begin{example} After giving the character \n Q
%the category code of a space character,
%and using it in a definition
%\begin{verbatim}
%\catcode`Q=10 \def\q{aQb}
%\end{verbatim}
%we get
%\begin{verbatim}
%\show\q
%macro:-> a b
%\end{verbatim}
%because the input processor
%changes the character code of the funny space
%in the definition.
%\end{example}
\begin{example}
将字符 \n{Q} 的分类码设置为空格字符的分类码之后，如下定义
\begin{verbatim}
\catcode`Q=10 \def\q{aQb}
\end{verbatim}
可得
\begin{verbatim}
\show\q
macro:-> a b
\end{verbatim}
这是因为输入处理器改变了宏定义中滑稽空格的字符编码。
\end{example}

%Space tokens with character codes other than 32 can be
%created using, for instance, \cs{uppercase}.
%However, `since the various forms of
%space tokens are almost identical in behaviour, there's no
%point dwelling on the details'; see~\cite{Knuth:TeXbook}~p.~377.
字符编码不为 32 的空格记号可以用 \cs{uppercase} 等命令生成。不过，「由于各种形式的空格记号的行为几乎完全一致，所以纠缠于这类细节是没有意义的」。详见~\cite{Knuth:TeXbook} 第~377~页。
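
下面是用 \cs{uppercase} 制造滑稽空格的一个简单草稿。宏名 \cs{fspace} 以及选用星号作为目标字符都只是演示用的假设：
\begin{verbatim}
\uccode`\ =`\* % 仅为演示：把空格的大写对应字符设为星号
\uppercase{\def\fspace{ }}% 空格记号的字符编码变为 42，分类码仍为 10
\show\fspace % 应当显示 macro:->*
\uccode`\ =0 % 恢复默认值
\end{verbatim}
\cs{uppercase} 只改变字符编码而不改变分类码，所以 \cs{fspace} 中存放的是一个字符编码为 42、分类码为 10 的记号。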

%\subsection{Control space}
\subsection{控制空格}

%The `control space' command \verb-\-\n{\char32}
%\cstoidx\char32\par\
%contributes the amount of space that a \gr{space token} would
%when the \verb=\spacefactor= is~1000.
%A~control space
%is not treated like a space token, or like a macro
%expanding to one (which is how \cs{space} is defined in plain \TeX).
%For instance, \TeX\ ignores spaces
%at the beginning of an input line, but
%control space is a \gr{horizontal command}, so it
%makes \TeX\ switch from vertical to horizontal mode
%(and insert an indentation box).
%See  Chapter~\ref{space} for the space factor, and
%chapter~\ref{hvmode} for horizontal and vertical modes.
控制空格命令 \n{\cs{\textvisiblespace}} \cstoidx\char32\par 给出一个与 \verb=\spacefactor= 等于 1000 时空格记号宽度一样的空格。控制空格不能被当做是空格记号，也不能理解为会展开成一个空格记号的宏（例如 plain \TeX 中的 \cs{space}）。举例来说，\TeX 会忽略所有输入行开头的空格，但是控制空格是一个水平命令（\gr{horizontal command}），故而 \TeX 在遇到它之后会从竖直模式切换到水平模式（并插入一个缩进盒子）。有关 \verb=\spacefactor= 的介绍，详见第~\ref{space}~章；有关水平模式和竖直模式的介绍，详见第~\ref{hvmode}~章。

%\subsection{`\n{\char32}'}
\subsection{可见空格：\textvisiblespace}

%The explicit symbol `\n{\char32}' for a space
%is character~32 in the Computer Modern typewriter typeface.
%However, switching to \cs{tt} is not sufficient to get
%spaces denoted this way, because spaces will still
%receive special treatment in the input processor.
在 Computer Modern 的打字机字体中，字符编码为 32 的字符是显式空格符号 `\textvisiblespace'。不过，简单地使用 \cs{tt} 命令\liamfnote{在 \LaTeX 中是 \cs{ttfamily} 命令。}是无法将其打印出来的。这是因为空格在输入处理器中有特别的处理。

%One way to
%let spaces be typeset by \n{\char32}
%is to set
%\begin{verbatim}
%\catcode`\ =12
%\end{verbatim}
%\TeX\ will then take a space as the instruction to
%typeset character number~32. Moreover, subsequent spaces
%are not skipped, but also typeset this way: state~{\itshape S}
%is only entered after a character with category code~10.
%Similarly, spaces after a control sequence are made
%visible by changing the category code of the space character.
使空格字符 \textvisiblespace 显形的一种方法是将空格字符的分类码设置为 12：
\begin{verbatim}
\catcode`\ =12
\end{verbatim}
此时，\TeX 会将空格字符作为编码为 32 的字符排版出来。此外，连续的空格不会被忽略。这是因为\cstate{S} 只是在遇到分类码为 10 的字符后才会进入。类似地，控制序列之后的空格也会因分类码的改变而显形。
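
例如，在如下片段中（假设使用 plain \TeX 及其默认的 Computer Modern 打字机字体），每个空格都应当被排版为 \textvisiblespace，包括控制词 \cs{relax} 之后的空格：
\begin{verbatim}
{\tt\catcode`\ =12 a  b\relax  c}
\end{verbatim}
紧跟在 \n{12} 之后的那个空格是例外：\TeX 在扫描数字时就读入了它，当时赋值尚未执行，所以它仍是界定数字的非强制空格。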

%\section{More about line ends}
\section{有关行尾的更多知识}

%\TeX\ accepts lines from an input file, excluding any line
%terminator that may be used.
%Because of this, \TeX's behaviour here is not dependent
%on the operating system and the \indextermsub{line}{terminator}
%it uses (\key{CR}-\key{LF},
%\key{LF}, or none at all for block storage).
%From the input line any trailing spaces are removed.
%The reason for this is historic; it has to do with
%the block storage mode on \key{IBM} mainframe computers.
%For some computer-specific problems with end-of-line
%characters, see~\cite{B:ctrl-M}.
\TeX 从输入文件中获取文本行，但不包括输入行中的行尾符。因此，\TeX 的行为不依赖操作系统以及\cindextermsub{行}{尾符}究竟是什么（\key{CR}-\key{LF}、\key{LF}抑或是在块存储系统里根本就不存在行尾符）。而后，\TeX 会移除输入行末尾的空格。这样处理是有历史原因的：它与 \key{IBM} 大型计算机的块存储模式有关。对于由计算机的不同而造成的有关行尾符的问题，详见~\cite{B:ctrl-M}。

%A~terminator character is then appended
%with a character code of \cs{endlinechar},
%unless this parameter has a value that
%is negative or more than~255.
%Note that this terminator character
%need not have category code~5\index{category!5}, end of line.
此后，字符编码为 \cs{endlinechar} 的行终止符会被追加在文本行的末尾；除非 \cs{endlinechar} 中保存的数值为负数或大于 255。注意，该行终止符的分类码不一定非得是 5\index{category!5}。
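
下面的片段示意了这一追加过程（假设使用 plain \TeX；选用星号只是为了让行终止符显形）：
\begin{verbatim}
\endlinechar=`\* %
abc
\endlinechar=13 %
\end{verbatim}
第二行在读入时被追加了一个星号，其分类码为 12，因此排版结果是 \n{abc*}。第一、三两行末尾的注释符分别屏蔽了已经追加到这两行末尾的行终止符。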

%\subsection{Obeylines}
\subsection{保持各行}

%Every once in a while it is desirable that the line ends in
%\message{Check spurious space obeylines+1}%
%\cstoidx obeylines\par\howto Change the meaning of the line end\par
%the input correspond to those in the output.
%The following piece of code does the trick:
%\begin{verbatim}
%\catcode`\^^M=13 %
%\def^^M{\par}%
%\end{verbatim}
%The \cs{endlinechar} character is here made active,
%and its meaning becomes \cs{par}.
%The comment signs prevent \TeX\ from seeing the terminator of the
%\alt
%lines of this definition, and expanding it since it is active.
有时候会希望输入文本中的行尾能与排版输出中的行尾一一对应。
\message{Check spurious space obeylines+1}%
\cstoidx obeylines\par\howto Change the meaning of the line end\par
下面的代码可以解决这一问题：
\begin{verbatim}
\catcode`\^^M=13 %
\def^^M{\par}%
\end{verbatim}
这里，字符编码为 \cs{endlinechar} 的字符（即 \verb>^^M>）成为活动符，其含义变为 \cs{par}。上述代码中的注释符用于阻止 \TeX\ 看到这两行定义各自末尾的行终止符，以防它将其作为活动字符而展开。

%However, it takes some care to embed this code in a macro.
%The definition
%\begin{verbatim}
%\def\obeylines{\catcode`\^^M=13 \def^^M{\par}}
%\end{verbatim}
%will be misunderstood:
%\TeX\ will discard everything
%after the second \verb>^^M>, because this has category code~5.
%Effectively, this line is then
%\begin{verbatim}
%\def\obeylines{\catcode`\^^M=13 \def
%\end{verbatim}
%To remedy this,
%the definition itself has to be
%performed in a context where \verb>^^M> is an active
%character:
%\begin{verbatim}
%{\catcode`\^^M=13 %
% \gdef\obeylines{\catcode`\^^M=13 \def^^M{\par}}%
%}
%\end{verbatim}
%Empty lines in the  input are not taken into account
%in this definition: these disappear, because two consecutive \cs{par}
%tokens are (in this case) equivalent to one.
%A slightly modified definition for the line end as
%\begin{verbatim}
%\def^^M{\par\leavevmode}
%\end{verbatim}
%remedies this:
%now every line end forces \TeX\ to start a paragraph. For empty
%lines this will then be an empty paragraph.
需要注意的是，在将上述代码嵌入宏的展开文本中需要特别小心。例如说下列代码会让 \TeX 误解：
\begin{verbatim}
\def\obeylines{\catcode`\^^M=13 \def^^M{\par}}
\end{verbatim}
具体来说，\TeX\ 将丢弃第二个 \verb>^^M> 之后的所有字符。这是因为，在读入该宏定义时，定义体中的 \cs{catcode} 命令尚未执行，因而此时 \verb>^^M> 的分类码为 5，而非 13。也就是说，这一行实际上变成了：
\begin{verbatim}
\def\obeylines{\catcode`\^^M=13 \def
\end{verbatim}
要修正上述问题，需要为 \verb>^^M> 营造一个可作为活动字符使用的环境：
\begin{verbatim}
{\catcode`\^^M=13 %
 \gdef\obeylines{\catcode`\^^M=13 \def^^M{\par}}%
}
\end{verbatim}
这样解决了上面提到的问题，但仍有缺陷。这是因为，该 \cs{obeylines} 仍然不能保留输入文本中的空行——连续两个 \cs{par} 记号会被当成是一个。为此，我们需要对上述定义稍作改进：
\begin{verbatim}
\def^^M{\par\leavevmode}
\end{verbatim}
这样，输入文本中的每一行都会开启一个新段落，空行则开启一个空段落。

%%\spoint Changing the \cs{\endlinechar}
%\subsection{Changing the \cs{endlinechar}}
%\spoint Changing the \cs{\endlinechar}
\subsection{改变 \cs{endlinechar}}

%Occasionally you may want to change the \cs{endlinechar}, or
%the \cs{catcode} of the ordinary line terminator \verb.^^M.,
%for instance to obtain special effects such as macros where
%the argument is terminated by the line end.
%See page~\pageref{pick:eol} for a worked-out example.
某些情况下，你会希望改变 \cs{endlinechar} 的值或者 \verb.^^M. 的分类码，以达成一些特殊效果。例如说，可以用行终止符作为宏的参数的定界符。具体可参考第~\pageref{pick:eol}~页给出的例子。

%There are a couple of traps. Consider the following:
%\begin{verbatim}
%{\catcode`\^^M=12 \endlinechar=`\^^J \catcode`\^^J=5
%...
%... }
%\end{verbatim}
%This causes unintended output of both character~13 (\verb-^^M-)
%and~10 (\verb-^^J-), caused by the line terminators of the
%first and last line.
在这些尝试中，往往会遇到一些陷阱。我们来看以下写法：
\begin{verbatim}
{\catcode`\^^M=12 \endlinechar=`\^^J \catcode`\^^J=5
...
... }
\end{verbatim}
这段代码的输出不符合预期：由于第一行和最后一行的行终止符，\TeX 将输出字符码为 13（\verb-^^M-）和 10（\verb-^^J-）的字符。\liamfnote{在第一行执行之前，输入处理器的初始化处理将会添加 \texttt{\textasciicircum\textasciicircum M} 作为行终止符，但由于其分类码被修改为 12，即其他字符，故而会被输出。最后一行执行之前，输入处理器的初始化处理将会添加 \texttt{\textasciicircum\textasciicircum J} 作为行终止符，因为此时 \cs{endlinechar} 的值是它的字符码。但由于分组结束，\texttt{\textasciicircum\textasciicircum J} 的分类码已恢复，故此时添加在输入行尾的行终止符  \texttt{\textasciicircum\textasciicircum J} 也会输出。}

%Terminating the first and last line with a comment works,
%but replacing the first line by the two lines
%is also a solution.
在第一行和最后一行末尾加上注释符可以解决此问题，但还有另一种方法是将第一行拆成下面两行\liamfnote{这样可以避免第一行的行终止符带来的问题，但无法避免最后一行的行终止符带来的问题。}：
\begin{verbatim}
{\endlinechar=`\^^J \catcode`\^^J=5
\catcode`\^^M=12
\end{verbatim}

%Of course, in many cases it is not necessary to substitute
%another end-of-line character; a~much simpler solution
%is then to put
%\begin{verbatim}
%\endlinechar=-1
%\end{verbatim}
%which treats all lines as if they end with a comment.
当然，在多数情况下没必要将行终止符替换为另一个字符；设置
\begin{verbatim}
\endlinechar=-1
\end{verbatim}
就等同于各行都以注释符结尾。
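
例如，下面的草稿在分组内关闭行终止符，从而可以把一个宏的定义体拆成多行书写，而不会引入多余的空格（\cs{abc} 是为演示而假设的宏名）：
\begin{verbatim}
\begingroup \endlinechar=-1
\gdef\abc{a
b
c}
\endgroup
\message{\meaning\abc}% 终端显示 macro:->abc
\end{verbatim}
第一行自身的行终止符在赋值之前就已追加，它恰好充当了界定数字 \n{-1} 的非强制空格。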

%%\spoint More remarks about the end-of-line character
%\subsection{More remarks about the end-of-line character}
%\spoint More remarks about the end-of-line character
\subsection{行终止符的更多注记}

%The character that \TeX\ appends at the end of an input line
%is treated like any other character. Usually one is not aware
%of this, as its category code is special, but there are a few
%ways to let it be processed in an unusual way.
\TeX 对所有字符一视同仁，包括追加到输入行末尾的行终止符。考虑到它特别的分类码，通常大家都不会注意行终止符。但是有一些方法可以特别地处理行终止符。

%\begin{example} Terminating an input line with \verb>^^> will
%(ordinarily, when \cs{endlinechar} is~13) give `M' in the output,
%which is the
%\ascii{} character with code~13+64.
%\end{example}
\begin{example}
假定 \cs{endlinechar} 保持默认值为 13，那么，把 \verb>^^> 置于文本行的末尾，将输出字符 `M'。因为它是编码为~13+64~的 \ascii\ 字符。
\end{example}

%\begin{example} If \verb>\^^M> has been defined,
%terminating an input line with a backslash will execute this command.
%The plain format defines
%\begin{verbatim}
%\def\^^M{\ }
%\end{verbatim}
%which makes a `control return' equivalent to a control space.
%\end{example}
\begin{example}
如果 \verb|\^^M| 已有定义，那么以反斜线结尾的输入行将执行该命令，不妨称之为「控制换行」。例如，plain \TeX 格式中的定义
\begin{verbatim}
\def\^^M{\ }
\end{verbatim}
将使得控制换行与控制空格等价。
\end{example}

%%\point More about the input processor
%\section{More about the input processor}
%\point More about the input processor
\section{输入处理器的更多知识}

%%\spoint The input processor as a separate process
%\subsection{The input processor as a separate process}
%\spoint The input processor as a separate process
\subsection{输入处理器作为独立过程}

%\TeX's levels of processing are all working at the
%same time and incrementally, but conceptually they can often be
%considered to be separate processes that each accept the
%completed output of the previous stage. The juggling with
%spaces provides a nice illustration for this.
\TeX\ 处理器的各个阶段都是同时运行的，但是在概念上它们常被视为依次独立运行，前者的输出是后者的输入。关于空格的小戏法很好地展现了这一点。

%Consider the definition
考虑以下宏定义：
\begin{verbatim}
\def\DoAssign{\count42=800}
\end{verbatim}
%and the call
及其调用：
\begin{verbatim}
\DoAssign 0
\end{verbatim}
%The input processor, the part
%of \TeX\ that builds tokens, in scanning this call
%skips the space before the zero, so the expansion of this
%call is
作为构建记号的部分，\TeX 的输入处理器在扫描此次调用时，会忽略控制序列之后和零之前的所有空格。因此，此次调用的展开为：
\begin{verbatim}
\count42=8000
\end{verbatim}
%It would be incorrect to reason
%`\cs{DoAssign} is read, then expanded, the space delimits the
%number 800, so 800 is assigned and the zero is printed'.
%Note that the same would happen if the zero appeared on the next line.
不要认为：「\cs{DoAssign} 首先被读入，而后展开，而后空格作为分隔符分割了 800，于是 800 被赋值给计数器，并打印出数字零。」这种推理是错误的。注意，即使数字零出现在下一行，结果也是一样的。
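
如果本意确实是赋值 800，就需要在定义体之内终止数字。下面给出两种常见写法，仅作示意：
\begin{verbatim}
\def\DoAssign{\count42=800 }% 定义体内的空格是空格记号，用于界定数字
\DoAssign 0                  % 赋值 800，并排出 0
% 另一种写法：用 \relax 终止数字
% \def\DoAssign{\count42=800\relax}
\end{verbatim}
定义体内的空格在定义时就已成为空格记号，不受调用处输入处理器跳过空格的影响。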

%Another illustration shows that optional spaces appear in a different
%stage of processing from that for skipped spaces:
下面的例子则表明，非强制空格与被忽略的空格是在不同的处理阶段起作用的：
\begin{disp}\verb>\def\c.{\relax}>\nl
    \verb>a\c.>{\tt\char32 b}\end{disp}
%expands to
会被展开为：
\begin{disp}\n{a\cs{relax}\char32 b}\end{disp}
%which gives as output
其输出是：
\begin{disp} `a b'\end{disp}
%because spaces after the \cs{relax} control sequence are only
%skipped when the line is first read, not when it is expanded.
这是因为，「输入处理器忽略控制序列 \cs{relax} 之后的空格」这一现象仅出现在该行被首次读取之时，而非在其被展开之时。
%The fragment
另一方面，这个例子：
\begin{disp} \verb-\def\c.{\ignorespaces}-\nl \verb-a\c. b-\end{disp}
%on the other hand, expands to
则会被展开为：
\begin{disp}\n{a\cs{ignorespaces}\textvisiblespace b}\end{disp}
%Executing the \cs{ignorespaces} command removes the subsequent
%space token, so the output is
执行 \cs{ignorespaces} 时会删除所有接续其后的连续空格记号。因此，输出是：
\begin{disp} `ab'.\end{disp}
%In both definitions
%the period after \cs{c} is a delimiting token; it is used here
%to prevent spaces from being skipped.
在上述两个例子中，\cs{c} 之后的西文句号均为定界符，用于保护控制序列之后的空格不被输入处理器吃掉。

%%\spoint The input processor not as a separate process
%\subsection{The input processor not as a separate process}
%\spoint The input processor not as a separate process
\subsection{输入处理器不作为独立过程}

%Considering the tokenizing of \TeX\ to be a separate process
%is a convenient view, but sometimes it leads to confusion.
将 \TeX 的记号化的过程视作独立过程是一个便利的做法，但有时这种做法会引起困惑。
%The line
例如
\begin{verbatim}
\catcode`\^^M=13{}
\end{verbatim}
%makes the line end active,
%and subsequently gives an `undefined control sequence' error
%for the line end of this line itself. Execution of the commands
%on the line thus influences the scanning process of that
%same line.
将行终止符设为活动字符；因此，该行自身的行终止符将报错：「未定义的控制序列（undefined control sequence）」。这表明，执行行内的命令有时会影响对同一行的扫描过程。

%By contrast,
另一方面，下面例子则不会报错：
\begin{verbatim}
\catcode`\^^M=13
\end{verbatim}
%does not give an error.
%The reason for this is that \TeX\ reads the line end while it is still
%scanning the number~13; that is, at a time when the assignment
%has not been performed yet.
%The line end is then converted to the optional space character
%delimiting the number to be assigned.
这是因为，在 \TeX 扫描数字 13 时就读入了行终止符，此时，分类码的赋值过程尚未执行；而此时，行终止符被转换成了非强制空格，作为数字的定界符。

%%\spoint Recursive invocation of the input processor
%\subsection{Recursive invocation of the input processor}
%\spoint Recursive invocation of the input processor
\subsection{输入处理器的递归调用}

%Above, the activity of replacing a parameter
%character plus a digit by a parameter token was described
%as something similar to the lumping together of letters
%into a control sequence token. Reality is somewhat more
%complicated than this. \TeX's token scanning mechanism
%is invoked both for input from file and for input from
%lists of tokens such as the macro definition. Only in the
%first case is the terminology of internal states applicable.
前文中，将参数符和数字替换为参数记号的过程被描述得与将字母捆绑成控制序列记号类似。但实际情况要复杂得多。\TeX 的记号扫描机制不仅在扫描文件输入时起作用，在扫描记号列表输入时同样会起作用：例如在处理宏定义时。前文提到的内部状态变化的机制，仅仅适用于前一种情况。

%Macro parameter characters are treated the same in both
%cases, however. If this were not the case it would
%not be possible to write things such as
%\begin{verbatim}
%\def\a{\def\b{\def\c####1{####1}}}
%\end{verbatim}
%See page \pageref{nest:def} for an explanation of such
%nested definitions.
在两种情况下，输入处理器对参数符的处理方式都是相同的。否则 \TeX 便无法处理下面这样的宏定义：
\begin{verbatim}
\def\a{\def\b{\def\c####1{####1}}}
\end{verbatim}
见第~\pageref{nest:def}~页对这种嵌套定义的解释。
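
可以用 \cs{meaning} 逐层观察参数符数目的变化。下面的片段仅作示意，并用分组把对 \cs{b}、\cs{c} 的重定义限制在局部：
\begin{verbatim}
\begingroup
\def\a{\def\b{\def\c####1{####1}}}
\a % 执行后 \b 被定义
\b % 执行后 \c 被定义
\message{\meaning\b}% 终端显示 macro:->\def \c ##1{##1}
\message{\meaning\c}% 终端显示 macro:#1->#1
\endgroup
\end{verbatim}
每经过一层定义，成对的参数符就减半一次，直到最内层得到真正的参数记号。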

%%\point The \verb@- convention
%\section{The \n{@} convention}
%\point The \verb@- convention
\section{\n{@} 约定}

%Anyone who has ever browsed through either the plain format or
%the \LaTeX\ format will have noticed that a lot of control sequences
%contain an `at' sign:~\verb-@-. These are control sequences that
%are meant to be inaccessible to the ordinary user.
读过 plain \TeX 或是 \LaTeX 格式的源码就会发现其中有很多包含符号 \verb-@- 的控制序列。这种包含 \n@ 的控制序列是有意不让普通用户直接使用的。

%Near the beginning of the format files the instruction
%\begin{verbatim}
%\catcode`@=11
%\end{verbatim}
%occurs, making the at sign into a letter,
%meaning that it can be used in control sequences. Somewhere near the
%end of the format definition the at sign is made `other' again:
%\begin{verbatim}
%\catcode`@=12
%\end{verbatim}
格式文件的起始处附近有命令
\begin{verbatim}
\catcode`@=11
\end{verbatim}
它将 \verb-@- 的分类从「其他字符」变为「字母」，从而可以用于组成控制序列。而在格式文件的末尾处附近有命令
\begin{verbatim}
\catcode`@=12
\end{verbatim}
它将 \verb-@- 的分类恢复为其他字符。

%Now why is it that users cannot
%call a control sequence with an at sign
%directly, although they can call macros that contain lots of those
%`at-definitions'? The reason is that the control sequences
%containing an \n@ are internalized by \TeX\ at definition time,
%after which they are a token, not a string of characters.
%Macro expansion then
%just inserts such tokens, and at that time the category codes
%of the constituent characters do not matter any more.
那么，为什么用户不能直接调用带有 \verb-@- 字符的控制序列，却能调用定义中包含此类控制序列的宏呢？原因在于，在宏定义时，带有 \n@ 的控制序列已被 \TeX 内部处理过了，此后，这些控制序列就变成了记号而不是字符串了。在宏展开的过程中，\TeX 只需要操作记号，因此，彼时记号内字符的分类码就不影响宏展开的过程了。
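
下面的片段示意了这一机制。其中 \cs{my@helper} 与 \cs{public} 都是为演示而假设的宏名：
\begin{verbatim}
\catcode`@=11
\def\my@helper{internal}
\def\public{\my@helper}
\catcode`@=12
\public       % 正常工作，排出 internal
% \my@helper  % 此时会被读成控制序列 \my 后接字符 @helper
\end{verbatim}
\cs{public} 的定义体中保存的是定义时已经形成的记号，所以之后 \n@ 的分类码如何变化都不影响它的展开。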

%\endofchapter
%%%%% end of input file [mouth]

\end{document}