b3b00 edited this page Nov 4, 2024 · 37 revisions

preliminary

An EBNF parser is an extension of a BNF parser. For a better understanding, please read the BNF parser page first, as it contains information shared by both BNF and EBNF parsers.

EBNF notation

repeater modifiers

You can use the following EBNF repetition modifiers:

  • '*' to repeat the same terminal or non-terminal zero or more times
  • '+' to repeat the same terminal or non-terminal one or more times

For repeated elements, the values passed to [Production] methods are:

  • List<TOut> for a repeated non terminal
  • List<Token<TIn>> for a repeated terminal

       [Production("listElements: value additionalValue*")]
       public JSon listElements(JSon head, List<JSon> tail)
       {
           JList values = new JList(head);
           values.AddRange(tail);
           return values;
       }

See EBNFJsonParser.cs for a complete EBNF json parser.
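
The example above covers a repeated non-terminal. As a complement, here is a minimal end-to-end sketch for a repeated terminal, where the visitor receives a List<Token<TIn>>. The names SumToken, SumParser and RepeaterDemo are hypothetical and only serve this illustration. The sketch assumes the sly NuGet package is referenced and that the parser is built with ParserType.EBNF_LL_RECURSIVE_DESCENT.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using sly.lexer;
using sly.parser.generator;

// Hypothetical lexer for this sketch: it only knows integers.
public enum SumToken
{
    [Lexeme(GenericToken.Int)]
    INT = 1
}

public class SumParser
{
    // INT+ : one or more INT tokens, received as a List<Token<SumToken>>
    [Production("sum : INT+")]
    public int Sum(List<Token<SumToken>> numbers)
    {
        return numbers.Sum(t => t.IntValue);
    }
}

public static class RepeaterDemo
{
    public static void Main()
    {
        var builder = new ParserBuilder<SumToken, int>();
        // the EBNF parser type is required for the '+' modifier to be understood
        var build = builder.BuildParser(new SumParser(), ParserType.EBNF_LL_RECURSIVE_DESCENT, "sum");
        if (build.IsError)
        {
            Console.WriteLine($"{build.Errors.Count} error(s) while building the parser");
            return;
        }

        var result = build.Result.Parse("1 2 3");
        if (result.IsError)
        {
            Console.WriteLine($"{result.Errors.Count} parse error(s)");
            return;
        }

        // prints the sum of all INT tokens found in the input
        Console.WriteLine(result.Result);
    }
}
```

Replacing INT+ with INT* should also accept an empty input, in which case the list is presumably empty.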

option modifier

The '?' modifier marks a token or non-terminal as optional.

  • for tokens, the Token<TIn> parameter has an IsEmpty property that is set to true when the matching token is absent.
  • for non-terminals, the visitor method gets a ValueOption<TOut> instead of a TOut. The parameter can then be tested for emptiness with its IsNone property.

// optional token

   [Production("block: A B? C")]
   public AST listElements(Token<TIn> a, Token<TIn> b, Token<TIn> c)
   {
       if (b.IsEmpty) {
           // B is absent: do something useful
       }
       else {
           string bValue = b.Value;
           // B is present: do something else, still useful
       }
       // build and return the AST node here
   }

// optional non-terminal (here a and c are tokens, B is a non-terminal)

   [Production("root2 : a B? c ")]
   public string root2(Token<OptionTestToken> a, ValueOption<string> b, Token<OptionTestToken> c)
   {
       StringBuilder r = new StringBuilder();
       r.Append($"R(");
       r.Append(a.Value);
       r.Append(b.Match(v => $",{v}", () => ",<none>"));
       r.Append($",{c.Value}");
       r.Append($")");
       return r.ToString();
   }

groups / sub-rules

You can define groups (also known as sub-rules) in a production rule. A group is a sequence of terminals or non-terminals. Groups only accept the following items:

  • terminals : TERM
  • discarded terminals : TERM[d]
  • non terminals : nonterm
  • choices : [ SOME | TERM | OR | OTHER ] (see alternate choice)

Modifiers are not allowed within a group (except discard on terminals). The matching method parameter for a group is a Group<TIn,TOut>. A Group<TIn,TOut> is a list of Token<TIn> or TOut items. Values in the group are listed in the same order as their corresponding clauses. Group<TIn,TOut> exposes methods that ease access to values, such as Token(i) and Value(i).

Groups can be repeated using the * or + modifier. In this case the value passed to the method is a List<Group<TIn,TOut>>.

Groups can also be made optional using the ? modifier. The value passed is then a ValueOption<Group<TIn,TOut>>.

        [Production("listElements: value (COMMA [d] value)* ")]
        public JSon listElements(JSon head, List<Group<JsonToken,JSon>> tail)
        {
            JList values = new JList(head);
            values.AddRange(tail.Select((Group<JsonToken,JSon> group) => group.Value(0)).ToList<JSon>());
            return values;
        }

        [Production("rootOption : A ( SEMICOLON [d] A )? ")]
        public string rootOption(Token<GroupTestToken> a, ValueOption<Group<GroupTestToken, string>> option)
        {
            StringBuilder builder = new StringBuilder();
            builder.Append("R(");
            builder.Append(a.Value);
            var gg = option.Match(
                (Group<GroupTestToken, string> group) =>
                {
                    var aToken = group.Token(0).Value;
                    builder.Append($";{aToken}");
                    return group;
                },
                () =>
                {
                    builder.Append(";");
                    builder.Append("a");
                    var g = new Group<GroupTestToken, string>();
                    g.Add("<none>", "<none>");
                    return g;
                });
            builder.Append(")");
            return builder.ToString();
        }

alternate choice

In some cases you do not want to write many production rules that differ only by a single terminal or non-terminal clause. For these cases you can use the | operator. Alternate choices are grouped together between brackets [ ... ], and a pipe | separates each choice:

public class AlternateChoiceTestTerminal
{
    [Production("choice : [ a | b | c]")]
    public string Choice(Token<OptionTestToken> c)
    {
        return c.Value;
    }
}

⚠️ Warning! A choice group can contain either terminals or non-terminals, but the two cannot be mixed.
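
The example above uses a choice between terminals. For a choice between non-terminals, a sketch could look like the following. It assumes that the visitor method then receives the TOut value of whichever non-terminal matched. ChoiceToken, ChoiceParser and ChoiceDemo are hypothetical names introduced for this illustration only.

```csharp
using System;
using sly.lexer;
using sly.parser.generator;

// Hypothetical lexer for this sketch: identifiers and integers.
public enum ChoiceToken
{
    [Lexeme(GenericToken.Identifier)]
    ID = 1,

    [Lexeme(GenericToken.Int)]
    INT = 2
}

public class ChoiceParser
{
    // non-terminal choice: 'chosen' is assumed to hold the value
    // produced by the alternative that matched (name or number)
    [Production("value : [ name | number ]")]
    public string Value(string chosen)
    {
        return chosen;
    }

    [Production("name : ID")]
    public string Name(Token<ChoiceToken> id)
    {
        return $"name({id.Value})";
    }

    [Production("number : INT")]
    public string Number(Token<ChoiceToken> number)
    {
        return $"number({number.Value})";
    }
}

public static class ChoiceDemo
{
    public static void Main()
    {
        var builder = new ParserBuilder<ChoiceToken, string>();
        var build = builder.BuildParser(new ChoiceParser(), ParserType.EBNF_LL_RECURSIVE_DESCENT, "value");
        if (build.IsError)
        {
            Console.WriteLine($"{build.Errors.Count} error(s) while building the parser");
            return;
        }

        foreach (var source in new[] { "hello", "42" })
        {
            var result = build.Result.Parse(source);
            Console.WriteLine(result.IsError ? "parse error" : result.Result);
        }
    }
}
```

Without the choice group, the same grammar would need two separate rules, value : name and value : number, each with its own visitor method.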

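The ?, + and * modifiers are allowed on a choice group:

public class AlternateChoiceTestTerminal
{
    [Production("choice : [ a | b | c]*")]
    public string Choice(List<Token<OptionTestToken>> tokens)
    {
        // a repeated choice yields a list of tokens; join their values (requires System.Linq)
        return string.Join(",", tokens.Select(t => t.Value));
    }
}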

A terminal (and only a terminal) choice group can be discarded with the [d] specifier (see ignoring syntax sugar tokens):

public class AlternateChoiceTestTerminal
{
    [Production("choice : a [ a | b | c] [d]")]
    public string Choice(Token<OptionTestToken> firstTokenOnly)
    {
        // the discarded choice group is not passed to the method
        return firstTokenOnly.Value;
    }
}

ignoring syntax sugar tokens

Sometimes tokens do not carry any semantic value. Their only purpose is to denote syntactic structure.

For example, in C-like languages, curly brackets (such as '{') only mark block boundaries and do not add any other information. Their only use is to guide the syntax parser. CSLY therefore provides a way to leave these tokens out of the visitor methods.

The [d] (d for discard) modifier marks a token as ignored. The [d] modifier only makes sense when applied to a token. If applied to a non-terminal, it is simply ignored.

Here is an example for a C block statement:

        [Production("block: LBRACKET [d] statement* RBRACKET [d]")]
        public AST listElements( List<AST> statements)
        {
            // any useful code
        }

explicit tokens

Sometimes it is easier to define lexemes directly in production rules instead of having to define a lexer.

⚠️ Warning! This feature only works with a Generic Lexer. The Regex Lexer is not supported.

The EBNF syntax allows you to define a token explicitly inside a production rule. Such tokens are surrounded by single quotes (').

A lexer enum must still be defined for compatibility reasons.

Explicit tokens must be:

  • either a keyword token (in this case the identifier pattern of the lexer is used, alpha by default)
  • or a sugar token

⚠️ Explicit tokens implicitly belong to the default Generic Lexer mode. They cannot be used in another mode.

Here is a really simple parser demonstrating the use of explicit tokens:

    // the lexer only defines an ID pattern and a double token
    // other tokens will be defined explicitly in grammar rules
    public enum Lex
    {
        [AlphaId]
        Id,

        [Double]
        Dbl
    }

    public class Parse
    {
        [Production("program : statement*")]
        public string Program(List<string> statements)
        {
            StringBuilder builder = new StringBuilder();
            foreach (var statement in statements)
            {
                builder.AppendLine(statement);
            }

            return builder.ToString();
        }

        [Production("statement : Id '='[d] Parse_expressions ")]
        public string Assignment(Token<Lex> id, string expression)
        {
            return $"{id.Value} = {expression}";
        }
        
        [Production("condition : Id '=='[d] Parse_expressions ")]
        public string Condition(Token<Lex> id, string expression)
        {
            return $"{id.Value} == {expression}";
        }

        [Production("statement : 'if'[d] condition 'then'[d] statement 'else'[d] statement")]
        public string IfThenElse(string condition, string thenStatement, string elseStatement)
        {
            StringBuilder builder = new StringBuilder();
            builder.AppendLine($"{condition} :");
            builder.AppendLine($"    - {thenStatement}");
            builder.AppendLine($"    - {elseStatement}");
            return builder.ToString();
        }

        #region expressions

        [Operand]
        [Production("operand : Id")]
        [Production("operand : Dbl")]
        public string Operand(Token<Lex> oper)
        {
            return oper.Value;
        }

        [Infix("'+'", Associativity.Left, 10)]
        public string Plus(string left, Token<Lex> oper, string right)
        {
            return $"( {left} + {right} )";
        }
        
        [Infix("'*'", Associativity.Left, 20)]
        public string Times(string left, Token<Lex> oper, string right)
        {
            return $"( {left} * {right} )";
        }

        #endregion
        
    }
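
To try this parser, it has to be built with the EBNF parser type. The following is a minimal sketch that reuses the Lex enum and the Parse class shown above. ExplicitTokensDemo and the sample input are hypothetical, and the sketch assumes the sly NuGet package is referenced.

```csharp
using System;
using sly.parser.generator;

public static class ExplicitTokensDemo
{
    public static void Main()
    {
        // Lex and Parse are the enum and the class defined above
        var builder = new ParserBuilder<Lex, string>();
        var build = builder.BuildParser(new Parse(), ParserType.EBNF_LL_RECURSIVE_DESCENT, "program");
        if (build.IsError)
        {
            Console.WriteLine($"{build.Errors.Count} error(s) while building the parser");
            return;
        }

        // 'if', 'then', 'else', '==', '=' and '*' are never declared in Lex:
        // they are the explicit tokens taken from the production rules
        var result = build.Result.Parse("if a == 1.0 then b = 2.0 else b = 3.0 * a");
        if (result.IsError)
        {
            Console.WriteLine($"{result.Errors.Count} parse error(s)");
            return;
        }

        Console.WriteLine(result.Result);
    }
}
```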

under the hood, meta consideration on EBNF parsers

The EBNF notation has been implemented in CSLY using the BNF notation: the EBNF parser builder is itself built with the BNF parser builder. Incidentally, this makes the EBNF rule parser a good and complete example of a BNF parser: see RuleParser.cs.

The full grammar for an EBNF rule is described in EBNF rules grammar.