Allow extracting (deeply) nested calls in Python and Javascript #1127

Open
wants to merge 2 commits into
base: master

Contributor

@dylankiss dylankiss commented Sep 19, 2024

Currently, the Python extractor does not support deeply nested gettext calls, i.e. calls nested deeper than a direct argument of the top-level gettext call.

e.g.

_("Hello %s", _("Person"))
_("Hello %s",
  random_function(", ".join([_("Person 1"), _("Person 2")])))

The extraction code was refactored quite a bit to simplify the flow and support this use-case.

Currently, the JavaScript extractor does not support nested gettext calls at all.

The extraction code was refactored a bit to resemble the Python code as much as possible and support this use-case.

Fixes #1125 (meanwhile also fixes #1123)
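
To illustrate the general idea behind nested extraction, here is a minimal sketch; it is not the code in this PR. It keeps a stack of open calls while walking the token stream, so a gettext call found anywhere inside another call's arguments is recorded on its own. The names `Frame`, `KEYWORDS` and `extract_nested` are made up for this example, and translator comments, plural forms, implicit string concatenation and f-strings are deliberately ignored.

```python
import ast
import io
import tokenize
from dataclasses import dataclass, field

# Assumption: only these two function names count as gettext calls here.
KEYWORDS = {"_", "gettext"}


@dataclass
class Frame:
    """One open call (or bare parenthesis) on the stack."""
    name: object          # function name, or None for a bare "("
    lineno: int           # line of the opening parenthesis
    messages: list = field(default_factory=list)


def extract_nested(source):
    """Return (lineno, message) pairs; a call is emitted when it closes."""
    results = []
    stack = []
    last_name = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            last_name = tok.string
            continue
        if tok.type == tokenize.OP and tok.string == "(":
            # Every "(" opens a frame, whether or not it is a gettext call.
            stack.append(Frame(last_name, tok.start[0]))
        elif tok.type == tokenize.OP and tok.string == ")" and stack:
            frame = stack.pop()
            if frame.name in KEYWORDS and frame.messages:
                # Simplification: only the first string argument is kept.
                results.append((frame.lineno, frame.messages[0]))
        elif tok.type == tokenize.STRING and stack:
            # Strings only count for the innermost open call.
            if stack[-1].name in KEYWORDS:
                stack[-1].messages.append(ast.literal_eval(tok.string))
        last_name = None
    return results


sample = (
    '_("Hello %s", _("Person"))\n'
    '_("Hello %s",\n'
    '  random_function(", ".join([_("Person 1"), _("Person 2")])))\n'
)
print(extract_nested(sample))
```

Because a call is emitted when its closing parenthesis is reached, this sketch reports inner messages before the outer one. That is a property of the sketch only.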

@dylankiss
Contributor Author

During the refactor, the order of extraction was also changed, as you can see in this test:
https://github.com/python-babel/babel/pull/1127/files#diff-c74d633b5cd37350f5a10b2697475119ba1db4f541eeccec744f7d79ab99d6c1R437-R452

It is now the same as the extraction order of xgettext. Comment extraction was also fixed to behave like xgettext: a translator comment now applies to all gettext calls (including nested ones) on the same line.
Translator comments placed on nested gettext calls are now supported as well, just like with xgettext.

e.g.

# NOTE: Main Comment
_("Hello %s",
    # NOTE: Nested Comment
    _("Nested Gettext")
)

Each of the two messages would be extracted with its own comment.

@tomasr8
Member

tomasr8 commented Sep 23, 2024

Not saying this is not worth fixing, but out of curiosity, do nested gettext calls actually come up often? I don't think I've ever come across one..

@dylankiss
Contributor Author

@tomasr8 In our own codebase with lots of developers, people assume it works and it happens from time to time that they add in nested gettext calls. Even the deeply nested ones happen, like this example: https://github.com/odoo/odoo/pull/149921/files#diff-e073b7fa9d45d46ba8d7f011257b0e77e1f87bf47982abc63dd618ff05dddb1aL267-L268
I think it deserves a fix, meanwhile also fixing some other small issues 🤷

@dylankiss dylankiss changed the title Allow extracting deeply nested calls in Python Allow extracting (deeply) nested calls in Python and Javascript Oct 10, 2024
@dylankiss
Contributor Author

UPDATE: I added an extra commit to also allow nested calls in the JavaScript extractor. If it's better to open a separate PR for that, no problem.

Member

@akx akx left a comment

Some initial comments within, including some that would make this easier to review for me 😄

Comment on lines +550 to +556
function_stack.append({
    'function_line_no': line_no,
    'function_name': last_name,
    'message_line_no': None,
    'messages': [],
    'translator_comments': cur_translator_comments,
})
Member

I think a typing.NamedTuple or a dataclass would be more appropriate than a dict for this state.
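
A rough sketch of what the dataclass variant could look like, assuming the same five fields as the dict above. The class name `FunctionStackItem`, the `lineno` spelling and the shape of `translator_comments` (a list of `(lineno, text)` pairs) are assumptions made for this example, not something taken from the PR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FunctionStackItem:
    """State for one call that is currently open during extraction."""
    function_lineno: int
    function_name: str
    message_lineno: Optional[int] = None
    # default_factory gives every item its own list instead of a shared one.
    messages: list = field(default_factory=list)
    # Assumed shape: (lineno, comment text) pairs.
    translator_comments: list = field(default_factory=list)


function_stack = []
function_stack.append(
    FunctionStackItem(
        function_lineno=2,
        function_name="_",
        translator_comments=[(1, "NOTE: Main Comment")],
    )
)

# Attribute access replaces string keys, so a misspelled field name raises
# AttributeError instead of silently creating a new dict entry.
function_stack[-1].messages.append("Hello %s")
function_stack[-1].message_lineno = 2
```

A `typing.NamedTuple` would work too, but it is immutable, so fields such as `message_lineno` would have to be replaced with `_replace()` rather than assigned.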

# Keep track of the (split) strings encountered
message_buffer = []

for token, value, (line_no, _), _, _ in tokens:
Member

Tiny thing, but could the local line_no be renamed back to lineno? It would make reviewing easier since the diff is smaller 😅
(Similarly, line_no elsewhere should maybe be lineno for consistency and compat.)

Comment on lines +727 to +728
jsx=options.get('jsx', True),
template_string=options.get('template_string', True),
Member

Spurious changes, please revert?

Comment on lines -427 to +462
assert messages[0][2] == ('Hello, {name}!', None)
assert messages[0][2] == 'Foo Bar'
assert messages[0][3] == ['NOTE: First']
assert messages[1][2] == 'Foo Bar'
assert messages[1][3] == []
assert messages[2][2] == ('Hello, {name1} and {name2}!', None)
assert messages[1][2] == ('Hello, {name}!', None)
assert messages[1][3] == ['NOTE: First']
assert messages[2][2] == 'Heungsub'
assert messages[2][3] == ['NOTE: Second']
assert messages[3][2] == 'Heungsub'
assert messages[3][2] == 'Armin'
assert messages[3][3] == []
assert messages[4][2] == 'Armin'
assert messages[4][3] == []
assert messages[5][2] == ('Hello, {0} and {1}!', None)
assert messages[4][2] == ('Hello, {name1} and {name2}!', None, None)
assert messages[4][3] == ['NOTE: Second']
assert messages[5][2] == 'Heungsub'
assert messages[5][3] == ['NOTE: Third']
assert messages[6][2] == 'Heungsub'
assert messages[6][2] == 'Armin'
assert messages[6][3] == []
assert messages[7][2] == 'Armin'
assert messages[7][3] == []
assert messages[7][2] == ('Hello, {0} and {1}!', None, None)
assert messages[7][3] == ['NOTE: Third']
assert messages[8][2] == 'Person'
assert messages[8][3] == ['NOTE: Fourth']
assert messages[9][2] == ('Hello %(person)', None)
assert messages[9][3] == ['NOTE: Fourth']
assert messages[10][2] == 'Person 1'
assert messages[10][3] == []
assert messages[11][2] == 'Person 2'
assert messages[11][3] == []
assert messages[12][2] == ('Hello %(people)', None)
assert messages[12][3] == ['NOTE: Fifth']
Member

Could this test be rewritten in a... less verbose way? Looks like it's only looking at indices 2 and 3 of each message, so maybe redo it as something like

assert [(m[2], m[3]) for m in messages] == [
    (..., ...),
    (..., ...),
    (..., ...),
    ...
]

?

I reckon it would be easy to generate the ... segment by doing assert [(m[2], m[3]) for m in messages] == 8 or similar and copy-pasting the complaint pytest -vv would inevitably throw :)
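
A small sketch of that shape, using hand-written stand-in data rather than real extractor output. It assumes each extracted item is a `(lineno, funcname, message, comments)` tuple; the line numbers and the three sample entries are invented for illustration.

```python
# Stand-in for the list the extractor would return in the real test.
messages = [
    (1, "_", "Foo Bar", ["NOTE: First"]),
    (1, "_", ("Hello, {name}!", None), ["NOTE: First"]),
    (3, "_", "Heungsub", ["NOTE: Second"]),
]

# One comparison covers both the message (index 2) and the comments
# (index 3) of every item, and also catches missing or extra items.
assert [(m[2], m[3]) for m in messages] == [
    ("Foo Bar", ["NOTE: First"]),
    (("Hello, {name}!", None), ["NOTE: First"]),
    ("Heungsub", ["NOTE: Second"]),
]
```

On a mismatch, pytest prints a diff of the two lists, which points at the first differing item instead of stopping at a single index.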

@tomasr8
Member

tomasr8 commented Nov 15, 2024

I'm not a big fan of the token-based extractor getting even more complex. I'm thinking we might be able to replace the Python extractor with an ast.NodeVisitor, which would simplify the code and also solve the issues with nested calls, f-strings, etc. once and for all.
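
For reference, a minimal sketch of what such a visitor might look like. This is an illustration only, not the extractor discussed in this thread: `GettextVisitor` and `KEYWORDS` are hypothetical names, only the first positional string argument is collected, and translator comments are not handled at all, since the `ast` module discards comments.

```python
import ast

# Assumption: only these two function names count as gettext calls here.
KEYWORDS = {"_", "gettext"}


class GettextVisitor(ast.NodeVisitor):
    """Collect (lineno, message) for every gettext-like call, at any depth."""

    def __init__(self):
        self.found = []

    def visit_Call(self, node):
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id                      # _("...")
        else:
            name = getattr(func, "attr", None)  # obj.gettext("...")
        if name in KEYWORDS and node.args:
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                self.found.append((node.lineno, first.value))
        # Keep descending so calls nested in the arguments are visited too.
        self.generic_visit(node)


source = (
    '_("Hello %s",\n'
    '  random_function(", ".join([_("Person 1"), _("Person 2")])))\n'
)
visitor = GettextVisitor()
visitor.visit(ast.parse(source))
print(visitor.found)
```

Nesting depth needs no special handling here because `generic_visit` walks the whole argument tree. In this sketch the outer call is recorded before the calls nested inside it.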

@tomasr8
Member

tomasr8 commented Nov 16, 2024

So I did some investigation and an AST-based extractor cuts down the complexity quite a bit. However, it's about twice as slow compared to the current extractor. @akx Given the slowdown, is this something worth pursuing in your opinion?

@dylankiss
Contributor Author

@tomasr8 @akx Depending on what we want, I can adapt the PR accordingly. I agree it's not the nicest and most robust way of traversing through a code file, but if the performance is degraded that much by using an AST-based extractor it might still be best to continue this way 🤷‍♂️
