GenericLexer
Olivier Duhart edited this page Feb 21, 2018 · 22 revisions
The generic lexer aims to solve the performance issues of the Regex Lexer. The idea is to start from a limited set of classical lexemes and refine this set to fit your needs. These lexemes are recognized through a finite state machine, which is far more efficient than looping through a set of regexes.
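To make the finite state machine idea concrete, here is a minimal sketch that recognizes an Int or a Double at the start of an input in a single pass. It is an illustration only, not the library's implementation: the names `NumberFsm`, `NumberKind` and `State` are hypothetical, and digits are assumed to be ASCII '0' to '9'.

```csharp
using System;

public enum NumberKind { None, Int, Double }

public static class NumberFsm
{
    // Hypothetical states; Dead means no further transition is possible.
    private enum State { Start, InInt, AfterDot, InFraction, Dead }

    // One transition per input character: no backtracking, no regex.
    private static State Next(State s, char c)
    {
        bool digit = c >= '0' && c <= '9';
        switch (s)
        {
            case State.Start:
                return digit ? State.InInt : State.Dead;
            case State.InInt:
                return digit ? State.InInt : (c == '.' ? State.AfterDot : State.Dead);
            case State.AfterDot:
            case State.InFraction:
                return digit ? State.InFraction : State.Dead;
            default:
                return State.Dead;
        }
    }

    // Returns the kind and length of the longest numeric lexeme
    // found at the start of the input (None and 0 if there is none).
    public static (NumberKind Kind, int Length) Scan(string input)
    {
        var state = State.Start;
        var kind = NumberKind.None;
        int length = 0;
        for (int i = 0; i < input.Length; i++)
        {
            state = Next(state, input[i]);
            if (state == State.Dead) break;
            // InInt and InFraction are the accepting states.
            if (state == State.InInt) { kind = NumberKind.Int; length = i + 1; }
            else if (state == State.InFraction) { kind = NumberKind.Double; length = i + 1; }
        }
        return (kind, length);
    }
}
```

Each character is examined once and drives exactly one transition, whereas a regex-based lexer typically tries each pattern in turn at every position.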
The basic lexemes are:
- GenericToken.Identifier: an identifier. From version 2.0.3, Identifier accepts an extra parameter to specify an identifier pattern:
  - IdentifierType.Alpha: only alpha characters (default value, and the only pattern available before version 2.0.3)
  - IdentifierType.AlphaNum: starts with an alpha char, followed by alpha or numeric chars
  - IdentifierType.AlphaNumDash: starts with an alpha or '_' (underscore) char, followed by alphanumeric, '-' (minus) or '_' (underscore) chars
- GenericToken.String: a classical string delimited by double quotes (")
- GenericToken.Int: an int (i.e. a series of one or more digits)
- GenericToken.Double: a float number (the decimal separator is a dot '.')
- GenericToken.KeyWord: a keyword is an identifier with a special meaning (it comes with the same constraints as GenericToken.Identifier). Here again, performance comes at the price of less flexibility. This lexeme is configurable.
- GenericToken.SugarToken: a general purpose lexeme with no special constraint except that it must not start with an alpha char. This lexeme is configurable.
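The three identifier patterns can be restated as simple character checks. The sketch below is only a reading of the descriptions above, not the library's code: `IdPattern` and `IdentifierCheck` are hypothetical names, and "alpha" is assumed here to mean ASCII letters, which the library may define differently.

```csharp
using System;

// Hypothetical mirror of the three patterns described above.
public enum IdPattern { Alpha, AlphaNum, AlphaNumDash }

public static class IdentifierCheck
{
    // Assumption: "alpha" means ASCII letters only.
    private static bool IsAlpha(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    private static bool IsDigit(char c) => c >= '0' && c <= '9';

    // True when the whole text is a valid identifier for the pattern.
    public static bool Matches(string text, IdPattern pattern)
    {
        if (string.IsNullOrEmpty(text)) return false;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            bool ok;
            switch (pattern)
            {
                case IdPattern.Alpha:
                    ok = IsAlpha(c);
                    break;
                case IdPattern.AlphaNum:
                    // Digits are allowed everywhere except the first position.
                    ok = IsAlpha(c) || (i > 0 && IsDigit(c));
                    break;
                default: // AlphaNumDash
                    // '_' is allowed anywhere; digits and '-' only after the first char.
                    ok = IsAlpha(c) || c == '_' || (i > 0 && (IsDigit(c) || c == '-'));
                    break;
            }
            if (!ok) return false;
        }
        return true;
    }
}
```

Under these assumptions, `count1` is rejected by Alpha but accepted by AlphaNum, and `_my-var2` is accepted only by AlphaNumDash.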
The Lexeme attribute has 2 different constructors for a generic lexer:
- static generic lexeme: this constructor does a 1 to 1 mapping between a generic token and your lexer token. It takes a single parameter, the mapped generic token (static lexemes are String, Int, Double and Identifier):

    [Lexeme(GenericToken.String)]

- configurable lexemes (KeyWord and SugarToken): this constructor takes 2 parameters:
  - the mapped GenericToken
  - the value of the keyword or sugar token.
public enum WhileTokenGeneric
{
    #region keywords 0 -> 19

    [Lexeme(GenericToken.KeyWord, "if")]
    IF = 1,
    [Lexeme(GenericToken.KeyWord, "then")]
    THEN = 2,
    [Lexeme(GenericToken.KeyWord, "else")]
    ELSE = 3,
    [Lexeme(GenericToken.KeyWord, "while")]
    WHILE = 4,
    [Lexeme(GenericToken.KeyWord, "do")]
    DO = 5,
    [Lexeme(GenericToken.KeyWord, "skip")]
    SKIP = 6,
    [Lexeme(GenericToken.KeyWord, "true")]
    TRUE = 7,
    [Lexeme(GenericToken.KeyWord, "false")]
    FALSE = 8,
    [Lexeme(GenericToken.KeyWord, "not")]
    NOT = 9,
    [Lexeme(GenericToken.KeyWord, "and")]
    AND = 10,
    [Lexeme(GenericToken.KeyWord, "or")]
    OR = 11,
    [Lexeme(GenericToken.KeyWord, "(print)")]
    PRINT = 12,

    #endregion

    #region literals 20 -> 29

    // identifier with the IdentifierType.AlphaNumDash pattern
    [Lexeme(GenericToken.Identifier, IdentifierType.AlphaNumDash)]
    IDENTIFIER = 20,
    [Lexeme(GenericToken.String)]
    STRING = 21,
    [Lexeme(GenericToken.Int)]
    INT = 22,

    #endregion

    #region operators 30 -> 49

    [Lexeme(GenericToken.SugarToken, ">")]
    GREATER = 30,
    [Lexeme(GenericToken.SugarToken, "<")]
    LESSER = 31,
    [Lexeme(GenericToken.SugarToken, "==")]
    EQUALS = 32,
    [Lexeme(GenericToken.SugarToken, "!=")]
    DIFFERENT = 33,
    [Lexeme(GenericToken.SugarToken, ".")]
    CONCAT = 34,
    [Lexeme(GenericToken.SugarToken, ":=")]
    ASSIGN = 35,
    [Lexeme(GenericToken.SugarToken, "+")]
    PLUS = 36,
    [Lexeme(GenericToken.SugarToken, "-")]
    MINUS = 37,
    [Lexeme(GenericToken.SugarToken, "*")]
    TIMES = 38,
    [Lexeme(GenericToken.SugarToken, "/")]
    DIVIDE = 39,

    #endregion

    #region sugar 50 ->

    [Lexeme(GenericToken.SugarToken, "(")]
    LPAREN = 50,
    [Lexeme(GenericToken.SugarToken, ")")]
    RPAREN = 51,
    [Lexeme(GenericToken.SugarToken, ";")]
    SEMICOLON = 52,

    EOF = 0

    #endregion
}
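
To show how an attribute with these two constructor shapes can carry the mapping, here is a self-contained sketch built on stand-in types. `DemoGenericToken`, `DemoLexemeAttribute`, `DemoToken` and `LexemeReader` are all hypothetical and are not the library's classes; how the library itself reads its attributes is not described on this page. The sketch only uses standard .NET reflection.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for the generic token kinds.
public enum DemoGenericToken { Identifier, String, Int, Double, KeyWord, SugarToken }

// Hypothetical stand-in attribute with the two constructor shapes.
[AttributeUsage(AttributeTargets.Field)]
public sealed class DemoLexemeAttribute : Attribute
{
    public DemoGenericToken Generic { get; }
    public string Value { get; } // stays null for static lexemes

    // Static lexeme: 1 to 1 mapping, no extra value.
    public DemoLexemeAttribute(DemoGenericToken generic)
    {
        Generic = generic;
    }

    // Configurable lexeme: keyword or sugar token value.
    public DemoLexemeAttribute(DemoGenericToken generic, string value)
    {
        Generic = generic;
        Value = value;
    }
}

public enum DemoToken
{
    EOF = 0,
    [DemoLexeme(DemoGenericToken.KeyWord, "if")]
    IF = 1,
    [DemoLexeme(DemoGenericToken.String)]
    STRING = 21,
    [DemoLexeme(DemoGenericToken.SugarToken, ":=")]
    ASSIGN = 35
}

public static class LexemeReader
{
    // Returns the attribute attached to an enum member, or null if none.
    public static DemoLexemeAttribute Read(DemoToken token)
    {
        FieldInfo field = typeof(DemoToken).GetField(token.ToString());
        return field?.GetCustomAttribute<DemoLexemeAttribute>();
    }
}
```

With these stand-ins, `LexemeReader.Read(DemoToken.IF).Value` yields the configured keyword text, while a static lexeme such as `DemoToken.STRING` carries only its generic kind.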