Use a separate lexer #836

bjorn3 · 2021-09-30T18:31:57Z

This will make it easier to recover from lexer errors like a newline in the middle of a quoted string. I think it will also make program repair easier: even when the program has a syntax error, program repair could work on a list of tokens instead of being handed a raw string and having to do any lexing itself. It could also make the grammar agnostic to whitespace, by handling all whitespace separation in the lexer.
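
To make these points more concrete, here is a minimal sketch of a separate tokenizer, written independently of Hedy's real grammar. The `Tok` class, the `tokenize` function and the token kinds (`WORD`, `STRING`, `BAD_STRING`, `NEWLINE`) are hypothetical, and closing an unterminated quote at the end of its line is only one possible recovery strategy. Whitespace never reaches the token list, and a missing closing quote becomes an error token instead of aborting lexing, so later lines are still tokenized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tok:
    kind: str   # hypothetical kinds: "WORD", "STRING", "BAD_STRING", "NEWLINE"
    value: str
    line: int


def tokenize(source: str) -> List[Tok]:
    """Hypothetical whitespace-agnostic tokenizer: spaces and tabs never become
    tokens, and an unterminated quote is closed at the end of its line."""
    tokens: List[Tok] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        i = 0
        while i < len(line):
            ch = line[i]
            if ch in " \t":
                i += 1  # whitespace separation is handled here, not in the grammar
            elif ch in "'\"":
                end = line.find(ch, i + 1)
                if end == -1:
                    # Newline inside a quoted string: record an error token for
                    # the rest of the line and keep lexing on the next line.
                    tokens.append(Tok("BAD_STRING", line[i + 1:], lineno))
                    i = len(line)
                else:
                    tokens.append(Tok("STRING", line[i + 1:end], lineno))
                    i = end + 1
            else:
                j = i
                while j < len(line) and line[j] not in " \t'\"":
                    j += 1
                tokens.append(Tok("WORD", line[i:j], lineno))
                i = j
        tokens.append(Tok("NEWLINE", "\n", lineno))
    return tokens


for tok in tokenize("print 'hello\nprint 'bye'"):
    print(tok)
```

Because `tokenize` always returns a full token list, program repair could look for `BAD_STRING` tokens (for example, to suggest adding the missing quote) instead of re-lexing the raw source.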

There is one complication, though: lexing for Hedy will need to be context-sensitive because of unquoted strings. AFAIK they can contain words that are keywords outside of those unquoted strings.
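
A rough illustration of what context-sensitive lexing could look like: once the lexer sees a keyword that starts unquoted text, it switches modes and emits the rest of the line as one `TEXT` token, so keywords inside that text are not tokenized as keywords. The keyword sets below (loosely modelled on Hedy's early levels) and the `lex_line` function are assumptions for illustration, not Hedy's actual rules.

```python
from typing import List, Tuple

# Assumed keyword sets, loosely modelled on Hedy's first levels (hypothetical).
KEYWORDS = {"print", "ask", "echo", "is"}
TEXT_KEYWORDS = {"print", "ask", "echo"}  # the rest of the line is unquoted text


def lex_line(line: str) -> List[Tuple[str, str]]:
    """Lex one line, switching to 'text mode' after a TEXT_KEYWORDS word."""
    tokens: List[Tuple[str, str]] = []
    rest = line.strip()
    while rest:
        word, _, tail = rest.partition(" ")
        tokens.append(("KEYWORD" if word in KEYWORDS else "WORD", word))
        if word in TEXT_KEYWORDS:
            text = tail.strip()
            if text:
                # Mode switch: keywords inside the text are not recognized.
                tokens.append(("TEXT", text))
            break
        rest = tail.lstrip()
    return tokens


print(lex_line("print ask me is fun"))
print(lex_line("name is Hedy"))
```

Here `print ask me is fun` lexes to the `print` keyword followed by the single text `ask me is fun`, while in `name is Hedy` the word `is` is lexed as a keyword.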

Lark supports custom lexers through the `lexer` argument when constructing the parser: you pass a class implementing `Lexer` whose `lex` method accepts the parser input and yields `Token`s.

https://lark-parser.readthedocs.io/en/latest/examples/advanced/custom_lexer.html

Felienne · 2021-10-01T07:19:58Z · Maintainer

Hi @bjorn3!

I think this is a lovely idea! But it fits a discussion better than an issue.

In our repo, issues are meant for things we are going to fix in a PR; discussions are a place to chat about things we don't know yet whether we want to do, or aren't sure how to do.

bjorn3 · Oct 1, 2021 · Author

I see, moved.
