Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Generated Code not equivalent to AST #341

Open
sk- opened this issue Jul 12, 2020 · 11 comments · May be fixed by #458 or grepdemos/LibCST#11
Open

[BUG] Generated Code not equivalent to AST #341

sk- opened this issue Jul 12, 2020 · 11 comments · May be fixed by #458 or grepdemos/LibCST#11

Comments

@sk-
Copy link
Contributor

sk- commented Jul 12, 2020

When doing replacements in a node, it's easy to end up with a tree, which won't get correctly converted to source code. The issue is that libcst is not adding the necessary parentheses.

For example, the code below

import libcst.matchers as m
import libcst

module = libcst.parse_module('x * y')

matcher = m.Name(value='x')
new_module = m.replace(module, matcher, libcst.parse_expression("1 + 2"))
print(new_module.code)

will output

1 + 2 * y

whereas the expected code is:

(1 + 2) * y

I think it's reasonable to expect that a module is equivalent to parse_module(module.code).

@sk- sk- changed the title [BUG] Generated Code different than AST [BUG] Generated Code not equivalent to AST Jul 13, 2020
@jimmylai
Copy link
Contributor

Good catch! The parentheses are added by attributes lpar/rpar and their values are a sequence.
For parentheses, we didn't introduce something like MaybeSentinel used on other whilespace attributes.
To fix this issue, we can automatically add the required parentheses in _visit_and_replace_children of BinaryOperation by checking the operator precedence. https://docs.python.org/3/reference/expressions.html#operator-precedence
That way, when any of the children of BinaryOperation is updated, the missing parentheses will be automatically added if needed.
Feel free to submit a PR to fix it and I'm more than happy to review it.

@sk-
Copy link
Contributor Author

sk- commented Jul 21, 2020

@jimmylai I'm not sure _visit_and_replace_children is the right place to add this logic, as one could programmatically create a Module without using the visitors and then calling module.code, which won't trigger a visit, so the nodes won't be updated.

Maybe an option would be to change _codegen_impl and _BaseParenthesizedNode._parenthesize to receive the parent node.

Then _parerenthesize could be something like:

@contextmanager
def _parenthesize(self, parent: CSTNode, state: CodegenState) -> Generator[None, None, None]:
    need_pars = need_parentheses(self, parent)                 # new code
    if need_pars:                                              # new code
        LeftParen()._codegen(state)                            # new code
    for lpar in self.lpar:
        lpar._codegen(state)
    with state.record_syntactic_position(self):
        yield
    for rpar in self.rpar:
        rpar._codegen(state)
    if need_pars:                                              # new code
        RightParen()._codegen(state)                           # new code

Note that this does not only affect BinaryOperation, but also UnaryOperation, Comparison, etc.
There are also two special cases:

  1. generator expressions, which need to be parenthesized when they are in a call with 2 or more arguments
  2. tuples when they become part of an expression (addition, call argument, comparison, etc).

I do have a custom implementation of replace which checks the precedence and the edge cases above when replacing the node, but it requires the ParentNodeProvider. This of course has its limitations as metadata is not recomputed. I'm missing though the same logic with a transformer. Would happily contribute it.

@jimmylai
Copy link
Contributor

@jimmylai I'm not sure _visit_and_replace_children is the right place to add this logic, as one could programmatically create a Module without using the visitors and then calling module.code, which won't trigger a visit, so the nodes won't be updated.

In our design, every LibCST node is an immutable dataclass instance and thus we only provide transformer patter for modifying the tree.

E.g. https://github.com/Instagram/LibCST/blob/master/libcst/_nodes/module.py#L33-L35

The matcher replace() API also uses a transformer to replace target nodes.

Do you still think adding the logic in _visit_and_replace_children won't work?

@sk-
Copy link
Contributor Author

sk- commented Jul 22, 2020

Hi @jimmylai, I still think that _visit_and_replace_children won't work, as one can generate an AST without using the visitor pattern.

For example, the code below also triggers the bug:

import libcst

module = libcst.Module(body=[
    libcst.BinaryOperation(
        left = libcst.parse_expression("1 + 2"),
        operator = libcst.Multiply(), 
        right = libcst.Integer("3")
    )
])
print(module.code)

producing the output 1 + 2 * 3, but it should have been (1 + 2) * 3.

In this case _visit_and_replace_children won't be called. I checked it by adding print statements to the method definition in both Module and BinaryOperation

@sk-
Copy link
Contributor Author

sk- commented Aug 5, 2020

@jimmylai WDYT?

@jimmylai
Copy link
Contributor

jimmylai commented Aug 5, 2020

Yeah, the example tree you provided is created from bottom up and it's possible to create a tree like that.
Use ParentProvider is not ideal. We don't use metadata in codegen and tree validation so far.
Maybe we can run the transformer you mentioned when user calls code_for_node? That is the entry point of code generation.

def code_for_node(self, node: CSTNode) -> str:

@sk-
Copy link
Contributor Author

sk- commented Aug 5, 2020

I'm not suggesting to use the metadata in codegen, just to change _codegen_impl and _parenthesize to receive the parent, which is already available as the traversal is top to bottom.

With your proposal you will be traversing the tree twice, once to fix the parentheses and a second time to generate the code.

@sk-
Copy link
Contributor Author

sk- commented Sep 21, 2020

@jimmylai Any chance to revisit this? It'd be great if we could discuss and agree on a proposal to fix this.

After looking deeper into how other syntactic elements are treated, I think we should also discuss whether parens should accept a MaybeSentinel as a default, or whether an empty sequence would always mean MaybeSentinel.

Also related to how some other elements are generated, another alternative is to check in the parent whether a child needs parentheses.

@cdonovick
Copy link
Contributor

I just ran into this bug as well.

It may not be pretty but a simple work around is:

class FixParens(cst.CSTTransformer):
    def _wrap_with_parens(self, node):
        return node.with_changes(
                lpar=[cst.LeftParen()],
                rpar=[cst.RightParen()],
            )

    def leave_BinaryOperation(self,
            original_node: cst.BinaryOperation,
            update_node: cst.BinaryOperation
            ) -> cst.BinaryOperation:
        if not update_node.lpar:
            assert not update_node.rpar
            return self._wrap_with_parens(update_node)
        return update_node
    
    # similar for leave_UnaryOperation

Now this has the downside that it inserts parens where they are not needed but does guarantee hand built expressions are properly grouped.

@sk-
Copy link
Contributor Author

sk- commented Oct 27, 2020

That solution would make it impracticable to use in a code refactoring tool, as lots of unneeded parentheses would be added and unfortunately code formatters do not remove extra parens.

I do have a more comprehensive transformer that only adds the parenthesis when needed but the problem is that it requires the parent and ParentNodeProvider does not recompute the metadata, that's why I'm asking for a baked in solution.

@cdonovick
Copy link
Contributor

@sk- agreed. I actually use an additional guard (updated_node is not original_node) to minimize the "damage" it does. Certainly not ideal though. Definitely something that should be handled in the code generator.

@cdonovick cdonovick linked a pull request Feb 17, 2021 that will close this issue
@lizard-boy lizard-boy linked a pull request Oct 6, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants