diff --git a/docs/antlr.rst b/docs/antlr.rst index 52846491..082d535b 100644 --- a/docs/antlr.rst +++ b/docs/antlr.rst @@ -1,8 +1,8 @@ .. include:: links.rst -Using ANTLR Grammars --------------------- +ANTLR Grammars +-------------- .. _grammars: https://github.com/antlr/grammars-v4 diff --git a/docs/asjson.rst b/docs/asjson.rst deleted file mode 100644 index 7edd772c..00000000 --- a/docs/asjson.rst +++ /dev/null @@ -1,25 +0,0 @@ -.. include:: links.rst - -Viewing Models as JSON ----------------------- - - -Models generated by |TatSu| can be viewed by converting them to a JSON-compatible structure -with the help of ``tatsu.util.asjson()``. The protocol tries to provide the best -representation for common types, and can handle any type using ``repr()``. There are provisions for structures with back-references, so there's no infinite recursion. - -.. code:: python - - import json - - print(json.dumps(asjson(model), indent=2)) - -The ``model``, with richer semantics, remains unaltered. - -Conversion to a JSON-compatible structure relies on the protocol defined by -``tatsu.utils.AsJSONMixin``. The mixin defines a ``__json__(seen=None)`` -method that allows classes to define their best translation. You can use ``AsJSONMixin`` -as a base class in your own models to take advantage of ``asjson()``, and you can -specialize the conversion by overriding ``AsJSONMixin.__json__()``. - -You can also write your own version of ``asjson()`` to handle special cases that are recurrent in your context. diff --git a/docs/grako.rst b/docs/grako.rst deleted file mode 100644 index 2b7ecc5f..00000000 --- a/docs/grako.rst +++ /dev/null @@ -1,19 +0,0 @@ -.. include:: links.rst - -Grako Compatibility -------------------- - -|TatSu| is routinely tested over major projects developed with Grako_. The -backwards-compatibility suite includes (at least) translators for COBOL_, Java_, and (Oracle) SQL_. 
- -Grako_ grammars and projects can be used with |TatSu|, with these caveats: - -* The `AST`_ type retuned when a sequence of elements is matched is now ``tuple`` (instead of a descendant of ``list``). This change improves efficiency and avoids unwanted manipulations of a value that should be inmutable. - -* The Python_ module name changed to ``tatsu``. - -* ``ignorecase`` no longer applies to regular expressions in grammars. Use ``(?i)`` in the pattern to enable ``re.IGNORECASE`` - -* Left recursion is enabled by default because it works and has zero impact on non-recursive grammars. - -* Deprecated grammar syntax is no longer documented. It's best not to use it, as it will be removed in a future version of |TatSu|. diff --git a/docs/index.rst b/docs/index.rst index e50f9bf5..3163e585 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -46,13 +46,10 @@ input, much like the `re`_ module does with regular expressions, or it can gener ast semantics models - asjson - print_translation translation left_recursion mini-tutorial traces - grako antlr examples support diff --git a/docs/install.rst b/docs/install.rst index b359b7e1..c7c2d931 100644 --- a/docs/install.rst +++ b/docs/install.rst @@ -9,5 +9,6 @@ Installation $ pip install tatsu .. warning:: - Versions of |TatSu| since 5.0.0 may require Python>=3.8. Python 2.7 is no longer supported + Modern versions of |TatSu| require an actively supported version of Python (if the Python + version in use is more than one and a half years old, |TatSu| may not work). diff --git a/docs/models.rst b/docs/models.rst index 21867e33..b54dcb6d 100644 --- a/docs/models.rst +++ b/docs/models.rst @@ -1,8 +1,12 @@ ..
include:: links.rst +Models +------ + + Building Models ---------------- +~~~~~~~~~~~~~~~ Naming elements in grammar rules makes the parser discard uninteresting parts of the input, like punctuation, to produce an *Abstract Syntax @@ -41,6 +45,32 @@ You can also use `Python`_'s built-in types as node types, and default behavior can be overidden by defining a method to handle the result of any particular grammar rule. + + +Viewing Models as JSON +~~~~~~~~~~~~~~~~~~~~~~ + + +Models generated by |TatSu| can be viewed by converting them to a JSON-compatible structure +with the help of ``tatsu.util.asjson()``. The protocol tries to provide the best +representation for common types, and can handle any type using ``repr()``. There are provisions for structures with back-references, so there's no infinite recursion. + +.. code:: python + + import json + from tatsu.util import asjson + + print(json.dumps(asjson(model), indent=2)) + +The ``model``, with richer semantics, remains unaltered. + +Conversion to a JSON-compatible structure relies on the protocol defined by +``tatsu.util.AsJSONMixin``. The mixin defines a ``__json__(seen=None)`` +method that allows classes to define their best translation. You can use ``AsJSONMixin`` +as a base class in your own models to take advantage of ``asjson()``, and you can +specialize the conversion by overriding ``AsJSONMixin.__json__()``. + +You can also write your own version of ``asjson()`` to handle special cases that are recurrent in your context. + Walking Models ~~~~~~~~~~~~~~ @@ -82,19 +112,18 @@ methods such as: return s def walk_object(self, o): - raise Exception('Unexpected tyle %s walked', type(o).__name__) + raise Exception(f'Unexpected type {type(o).__name__} walked') -Predeclared classes can be passed to ``ModelBuilderSemantics`` instances -through the ``types=`` parameter: - -.. code:: python +Which nodes get *walked* is up to the ``NodeWalker`` implementation.
Some +strategies for walking *all* or *most* nodes are implemented as classes +in ``tatsu.walkers``, such as ``PreOrderWalker`` and ``DepthFirstWalker``. - from mymodel import AddOperator, MulOperator +Sometimes nodes must be walked more than once for the purpose at hand, and it's +up to the walker how and when to do that. - semantics=ModelBuilderSemantics(types=[AddOperator, MulOperator]) +Take a look at ``tatsu.ngcodegen.PythonCodeGenerator`` for the walker that generates +a parser in Python from the model of a parsed grammar. -``ModelBuilderSemantics`` assumes nothing about ``types=``, so any -constructor (a function, or a partial function) can be used. Model Class Hierarchies ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/print_translation.rst deleted file mode 100644 index da477b88..00000000 --- a/docs/print_translation.rst +++ /dev/null @@ -1,41 +0,0 @@ -.. include:: links.rst - -Print Translation ------------------ - - -|TatSu| doesn't impose a way to create translators, but it -exposes the facilities it uses to generate the `Python`_ source code for -parsers. - -Translation in |TatSu| is based on subclasses of ``Walker`` and on classes that -inherit from ``IndentPrintMixin``, a strategy copied from the new PEG_ parser -in Python_ (see `PEP 617`_). - -``IndentPrintMixin`` provides an ``indent()`` method, which is a context manager, -and should be used thus: - -.. code:: python - - class MyTranslationWalker(NodeWalker, IndentPrintMixin): - - def walk_SomeNode(self, node): - with self.indent(): - # ccontinue walking the tree - - -The ``self.print()`` method takes note of the current level of indentation, so -output will be indented by the ``indent`` passed to -the ``IndentPrintConstructor``: - -.. code:: python - - def walk_SomeNode(self, node): - with self.indent(): - self.print(walk_expression(node.exp)) - -The printed code can be retrieved using the ``printed_text()`` method.
Other -posibilities are available by assigning a text-like object to -``self.output_stream`` in the ``__init__()`` method. - -.. _PEP 617: https://peps.python.org/pep-0617/ diff --git a/docs/translation.rst b/docs/translation.rst index 192ac503..9a8277be 100644 --- a/docs/translation.rst +++ b/docs/translation.rst @@ -5,12 +5,68 @@ .. _pegen: https://github.com/we-like-parsers/pegen .. _PEG parser: https://peps.python.org/pep-0617/ -Declarative Translation ------------------------ +Translation +----------- Translation is one of the most common tasks in language processing. Analysis often sumarizes the parsed input, and *walkers* are good for that. -In translation, the output can often be as verbose as the input, so a systematic approach that avoids bookkeeping as much as possible is convenient. + + +|TatSu| doesn't impose a way to create translators, but it +exposes the facilities it uses to generate the `Python`_ source code for +parsers. + + +Print Translation +~~~~~~~~~~~~~~~~~ + +Translation in |TatSu| is based on subclasses of ``NodeWalker``. Print-based translation +relies on classes that inherit from ``IndentPrintMixin``, a strategy copied from +the new PEG_ parser in Python_ (see `PEP 617`_). + +``IndentPrintMixin`` provides an ``indent()`` method, which is a context manager, +and should be used thus: + +.. code:: python + + class MyTranslationWalker(NodeWalker, IndentPrintMixin): + + def walk_SomeNodeType(self, node: NodeType): + self.print('some preamble') + with self.indent(): + # continue walking the tree + self.print('something else') + + +The ``self.print()`` method takes note of the current level of indentation, so +output will be indented by the ``indent`` passed to +the ``IndentPrintMixin`` constructor, or to the ``indent(amount: int)`` method. +The mixin keeps a stack of the indent amounts so it can go back to where it +was after each ``with indent(amount=n):`` statement: + + ..
code:: python + + def walk_SomeNodeType(self, node: NodeType): + with self.indent(amount=2): + self.print(node.exp) + +The printed code can be retrieved using the ``printed_text()`` method, but other +possibilities are available by assigning a stream-like object to +``self.output_stream`` in the ``__init__()`` method. + +A good example of how to do code generation with a ``NodeWalker`` and ``IndentPrintMixin`` +is |TatSu|'s own code generator, which can be +found in ``tatsu/ngcodegen/python.py``, or the model +generation found in ``tatsu/ngcodegen/objectmodel.py``. + + +.. _PEP 617: https://peps.python.org/pep-0617/ + + +Declarative Translation (deprecated) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + |TatSu| provides support for template-based code generation ("translation", see below) in the ``tatsu.codegen`` module. @@ -26,8 +82,6 @@ breadth or depth first, using only standard Python_. The procedural code must kn to navigate it, although other strategies are available with ``PreOrderWalker``, ``DepthFirstWalker``, and ``ContextWalker``. -**deprecated** - |TatSu| doesn't impose a way to create translators with it, but it exposes the facilities it uses to generate the `Python`_ source code for parsers.
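(Reviewer note, outside the patch hunks.) The indent-printing strategy that the rewritten ``translation.rst`` describes can be sketched in plain Python. The ``IndentPrintMixin`` below is an illustrative stand-in written for this note, not TatSu's actual class; only the method names (``indent()``, ``print()``, ``printed_text()``) follow the documented protocol, and ``BlockWalker`` is a hypothetical walker.

```python
from contextlib import contextmanager


class IndentPrintMixin:
    """Toy approximation of an indent-printing mixin (illustrative only)."""

    def __init__(self, indent=4):
        self.default_indent = indent
        self._levels = [0]   # stack of cumulative indent amounts
        self._lines = []

    @contextmanager
    def indent(self, amount=None):
        # push the new indent level; pop it when the `with` block exits,
        # which is the "go back to where it was" behavior the docs describe
        amount = self.default_indent if amount is None else amount
        self._levels.append(self._levels[-1] + amount)
        try:
            yield
        finally:
            self._levels.pop()

    def print(self, text=''):
        # every printed line is padded by the current indent level
        pad = ' ' * self._levels[-1]
        self._lines.append(pad + text if text else '')

    def printed_text(self):
        return '\n'.join(self._lines)


class BlockWalker(IndentPrintMixin):
    """Hypothetical walker: emits a named block with indented statements."""

    def walk_block(self, name, statements):
        self.print(f'{name}:')
        with self.indent():
            for stmt in statements:
                self.print(stmt)


walker = BlockWalker(indent=2)
walker.walk_block('setup', ['x = 1', 'y = 2'])
print(walker.printed_text())
```

Because indentation lives in the context manager, the translation logic itself never does indentation bookkeeping, which is the point of the mixin-based design.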
diff --git a/grammar/tatsu.ebnf b/grammar/tatsu.ebnf index 340b4a14..a70e42fc 100644 --- a/grammar/tatsu.ebnf +++ b/grammar/tatsu.ebnf @@ -1,4 +1,4 @@ -@@grammar :: Tatsu +@@grammar :: TatSu @@whitespace :: /\s+/ @@comments :: ?"(?sm)[(][*](?:.|\n)*?[*][)]" @@eol_comments :: ?"#[^\n]*$" diff --git a/ruff.toml b/ruff.toml index 04687d63..28ca6f1e 100644 --- a/ruff.toml +++ b/ruff.toml @@ -20,6 +20,7 @@ ignore = [ "PLR0904", # too-many-public-methods "PLR0913", # too-many-arguments "PLR0915", # too-many-statements + "PLR0917", # too-many-positional-arguments "PLR2004", # magic-value-comparison "PLW1514", # unspecified-encoding # "PLW0603", # global-statement diff --git a/tatsu/_version.py b/tatsu/_version.py index a19f16d9..6f0b6fd4 100644 --- a/tatsu/_version.py +++ b/tatsu/_version.py @@ -1 +1 @@ -__version__ = '5.10.7b1' +__version__ = '5.11.0b1' diff --git a/tatsu/bootstrap.py b/tatsu/bootstrap.py index 6c3be580..ce462d20 100644 --- a/tatsu/bootstrap.py +++ b/tatsu/bootstrap.py @@ -1,15 +1,15 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 -# CAVEAT UTILITOR +# WARNING: CAVEAT UTILITOR # -# This file was automatically generated by TatSu. +# This file was automatically generated by TatSu. # -# https://pypi.python.org/pypi/tatsu/ +# https://pypi.python.org/pypi/tatsu/ # -# Any changes you make to it will be overwritten the next time -# the file is generated. +# Any changes you make to it will be overwritten the next time +# the file is generated.
-# ruff: noqa: C405, I001, F401, SIM117 +# ruff: noqa: C405, COM812, I001, F401, SIM117 import sys from pathlib import Path @@ -30,7 +30,7 @@ def __init__(self, text, /, config: ParserConfig | None = None, **settings): config = ParserConfig.new( config, owner=self, - whitespace=re.compile(r"\s+"), + whitespace='\\s+', nameguard=None, ignorecase=False, namechars='', @@ -41,38 +41,39 @@ def __init__(self, text, /, config: ParserConfig | None = None, **settings): start='start', ) config = config.replace(**settings) - super().__init__(text, config=config) + super().__init__(text, config=config) class EBNFBootstrapParser(Parser): def __init__(self, /, config: ParserConfig | None = None, **settings): config = ParserConfig.new( config, owner=self, - whitespace=re.compile(r"\s+"), + whitespace='\\s+', nameguard=None, ignorecase=False, namechars='', parseinfo=True, comments_re='(?sm)[(][*](?:.|\\n)*?[*][)]', eol_comments_re='#[^\\n]*$', - left_recursion=False, keywords=KEYWORDS, start='start', ) config = config.replace(**settings) + super().__init__(config=config) @tatsumasu() def _start_(self): self._grammar_() + @tatsumasu('Grammar') def _grammar_(self): self._constant('TATSU') self.name_last_node('title') - def block1(): + def block0(): with self._choice(): with self._option(): self._directive_() @@ -82,13 +83,13 @@ def block1(): self.add_last_node_to_name('keywords') self._error( 'expecting one of: ' - ' ' # noqa: COM812 + ' ' ) - self._closure(block1) + self._closure(block0) self._rule_() self.add_last_node_to_name('rules') - def block6(): + def block1(): with self._choice(): with self._option(): self._rule_() @@ -98,10 +99,14 @@ def block6(): self.add_last_node_to_name('keywords') self._error( 'expecting one of: ' - ' ' # noqa: COM812 + ' ' ) - self._closure(block6) + self._closure(block1) self._check_eof() + self._define( + ['title'], + ['directives', 'keywords', 'rules'], + ) self._define( ['title'], @@ -125,7 +130,7 @@ def _directive_(self): 
self._token('eol_comments') self._error( 'expecting one of: ' - "'comments' 'eol_comments'" # noqa: COM812 + "'comments' 'eol_comments'" ) self.name_last_node('name') self._cut() @@ -134,11 +139,7 @@ def _directive_(self): self._cut() self._regex_() self.name_last_node('value') - - self._define( - ['name', 'value'], - [], - ) + self._define(['name', 'value'], []) with self._option(): with self._group(): self._token('whitespace') @@ -159,14 +160,10 @@ def _directive_(self): self._constant('None') self._error( 'expecting one of: ' - "'False' 'None' " # noqa: COM812 + "'False' 'None' " ) self.name_last_node('value') - - self._define( - ['name', 'value'], - [], - ) + self._define(['name', 'value'], []) with self._option(): with self._group(): with self._choice(): @@ -181,7 +178,7 @@ def _directive_(self): self._error( 'expecting one of: ' "'ignorecase' 'left_recursion'" - "'nameguard' 'parseinfo'" # noqa: COM812 + "'nameguard' 'parseinfo'" ) self.name_last_node('name') self._cut() @@ -192,23 +189,15 @@ def _directive_(self): self._cut() self._boolean_() self.name_last_node('value') - - self._define( - ['value'], - [], - ) + self._define(['value'], []) with self._option(): self._constant(True) self.name_last_node('value') self._error( 'expecting one of: ' - "'::'" # noqa: COM812 + "'::'" ) - - self._define( - ['name', 'value'], - [], - ) + self._define(['name', 'value'], []) with self._option(): with self._group(): self._token('grammar') @@ -218,11 +207,7 @@ def _directive_(self): self._cut() self._word_() self.name_last_node('value') - - self._define( - ['name', 'value'], - [], - ) + self._define(['name', 'value'], []) with self._option(): with self._group(): self._token('namechars') @@ -232,24 +217,18 @@ def _directive_(self): self._cut() self._string_() self.name_last_node('value') - - self._define( - ['name', 'value'], - [], - ) + self._define(['name', 'value'], []) self._error( 'expecting one of: ' "'comments' 'eol_comments' 'grammar'" "'ignorecase' 
'left_recursion'" "'namechars' 'nameguard' 'parseinfo'" - "'whitespace'" # noqa: COM812 + "'whitespace'" ) self._cut() + self._define(['name', 'value'], []) - self._define( - ['name', 'value'], - [], - ) + self._define(['name', 'value'], []) @tatsumasu() def _keywords_(self): @@ -258,6 +237,7 @@ def block0(): self._keywords_() self._positive_closure(block0) + @tatsumasu() def _keyword_(self): self._token('@@keyword') @@ -277,10 +257,11 @@ def block0(): self._token('=') self._error( 'expecting one of: ' - "':' '='" # noqa: COM812 + "':' '='" ) self._closure(block0) + @tatsumasu() def _paramdef_(self): with self._choice(): @@ -289,11 +270,7 @@ def _paramdef_(self): self._cut() self._params_() self.name_last_node('params') - - self._define( - ['params'], - [], - ) + self._define(['params'], []) with self._option(): self._token('(') self._cut() @@ -309,35 +286,28 @@ def _paramdef_(self): self._cut() self._kwparams_() self.name_last_node('kwparams') - - self._define( - ['kwparams', 'params'], - [], - ) + self._define(['kwparams', 'params'], []) with self._option(): self._params_() self.name_last_node('params') self._error( 'expecting one of: ' - ' ' # noqa: COM812 + ' ' ) self._token(')') - - self._define( - ['kwparams', 'params'], - [], - ) + self._define(['kwparams', 'params'], []) self._error( 'expecting one of: ' - "'(' '::'" # noqa: COM812 + "'(' '::'" ) + @tatsumasu('Rule') def _rule_(self): - def block1(): + def block0(): self._decorator_() - self._closure(block1) + self._closure(block0) self.name_last_node('decorators') self._name_() self.name_last_node('name') @@ -349,11 +319,7 @@ def block1(): self._cut() self._params_() self.name_last_node('params') - - self._define( - ['params'], - [], - ) + self._define(['params'], []) with self._option(): self._token('(') self._cut() @@ -369,49 +335,35 @@ def block1(): self._cut() self._kwparams_() self.name_last_node('kwparams') - - self._define( - ['kwparams', 'params'], - [], - ) + self._define(['kwparams', 'params'], 
[]) with self._option(): self._params_() self.name_last_node('params') self._error( 'expecting one of: ' - ' ' # noqa: COM812 + ' ' ) self._token(')') - - self._define( - ['kwparams', 'params'], - [], - ) + self._define(['kwparams', 'params'], []) self._error( 'expecting one of: ' - "'(' '::'" # noqa: COM812 + "'(' '::'" ) with self._optional(): self._token('<') self._cut() self._known_name_() self.name_last_node('base') - - self._define( - ['base'], - [], - ) + self._define(['base'], []) self._token('=') self._cut() self._expre_() self.name_last_node('exp') self._token(';') self._cut() + self._define(['base', 'decorators', 'exp', 'kwparams', 'name', 'params'], []) - self._define( - ['base', 'decorators', 'exp', 'kwparams', 'name', 'params'], - [], - ) + self._define(['base', 'decorators', 'exp', 'kwparams', 'name', 'params'], []) @tatsumasu() def _decorator_(self): @@ -429,23 +381,25 @@ def _decorator_(self): self._token('nomemo') self._error( 'expecting one of: ' - "'name' 'nomemo' 'override'" # noqa: COM812 + "'name' 'nomemo' 'override'" ) self.name_last_node('@') + @tatsumasu() def _params_(self): self._first_param_() self.add_last_node_to_name('@') - def block1(): + def block0(): self._token(',') self._literal_() self.add_last_node_to_name('@') with self._ifnot(): self._token('=') self._cut() - self._closure(block1) + self._closure(block0) + @tatsumasu() def _first_param_(self): @@ -458,18 +412,20 @@ def _first_param_(self): 'expecting one of: ' '(?!\\d)\\w+(?:::(?!\\d)\\w+)+ ' ' ' - ' ' # noqa: COM812 + ' ' ) + @tatsumasu() def _kwparams_(self): def sep0(): self._token(',') - def block0(): + def block1(): self._pair_() - self._positive_gather(block0, sep0) + self._positive_gather(block1, sep0) + @tatsumasu() def _pair_(self): @@ -480,6 +436,7 @@ def _pair_(self): self._literal_() self.add_last_node_to_name('@') + @tatsumasu() def _expre_(self): with self._choice(): @@ -490,9 +447,10 @@ def _expre_(self): self._error( 'expecting one of: ' "'|'