Retrieving error correction tokens #455
Hello @stephe-ada-guru,

Really sorry about the time we took to answer. Somehow this got lost in the flux of internal & external issues we handle.

We don't (yet) have a lot of stored information wrt. error recovery, and there is no notion of inserted/deleted tokens, even though we do kind of insert and delete tokens.

"Deleted" tokens are stored in `ErrorDecl` nodes, one token per node, so those should be pretty easy for you to recover. OTOH, we don't keep track of "inserted" tokens directly, and we don't actually insert any tokens; instead we presume that the tokens were here even if they're not (sort of).

Now, I'm wondering if that would be a worthwhile addition to our parser, because it might make things easier to use. But that's a pretty big overhaul of the current API. We'll discuss this with the team and keep you updated!
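For example, something along these lines with the Python bindings should collect the deleted tokens (a rough, untested sketch; I'm assuming the node class is exposed as `lal.ErrorDecl` in Python):

```python
import libadalang as lal

ctx = lal.AnalysisContext()
unit = ctx.get_from_file("pkg_with_errors.adb")

# Every ErrorDecl holds one token that error recovery skipped ("deleted").
deleted = [n.text for n in unit.root.findall(lal.ErrorDecl)]
print(deleted)
```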
Raphaël AMIARD <[email protected]> writes:
> Hello @stephe-ada-guru,
> Really sorry about the time we took to answer. Somehow this got lost
> in the flux of internal & external issues we handle.
Ok.
I wrote code based on your current pretty-printer to output the token
sequence implied by a syntax tree. Tedious, but straightforward.
I'm now working on doing the same for the tree-sitter parser, which
claims to have good error recovery. First I have to port my Ada grammar
to their grammar file syntax, which is turning out to be more work than
I anticipated. It doesn't use BNF syntax; it's closer to your grammar
file syntax.
Once that's done, I'll get back to writing the paper that describes all
this. Writing Ada code is always more fun than writing LaTeX prose about
Ada code :), so it will take a while.
> We don't (yet) have a lot of stored information wrt. error recovery,
> and there is no notion of inserted/deleted tokens, even though we do
> kind of insert and delete tokens.
> "deleted" tokens are stored in `ErrorDecl` nodes, one token per node,
> so those should be pretty easy for you to recover.
> OTOH, we don't keep track of "inserted" tokens directly, and we don't
> actually insert any tokens, instead we presume that the tokens were
> here even if they're not (sort of).
> Now, I'm wondering if that would be a worthwhile addition to our
> parser, because it might make things easier to use. But that's a
> pretty big overhaul of the current API. We'll discuss this with the
> team and keep you updated!
Emacs ada-mode uses the insert/delete list to automatically correct any
errors in a parameter list before formatting it. However, I implemented
that when the formatting code was all in elisp; now it's in Ada, and
uses the syntax tree directly (it's an instance of "refactor"). So
that's actually not needed any more.
The error correction is also available on user request, but I've never
actually used it except when testing it. So I suspect this is not a very
useful feature.
The generalized LR parser does use the length of the insert/delete list
as a metric in deciding which parallel parser to terminate when two
parsers reach an identical state. I don't understand error correction in
a packrat parser, but there must be some sort of choice between possible
corrections; the insert/delete list could be useful there, though some
other measure of error severity could work as well.
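Schematically, the tie-breaking rule looks like this (illustrative pseudocode only, not the actual implementation):

```python
# Illustrative pseudocode: when two parallel parsers reach an identical
# state, keep the one whose error-correction edit list (inserted plus
# deleted tokens) is shorter.
def prune(parsers):
    best = {}
    for p in parsers:
        key = p.state                      # hypothetical state summary
        if key not in best or len(p.edits) < len(best[key].edits):
            best[key] = p
    return list(best.values())
```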
I do look at the error messages from the parser, to figure out why
indent or highlight is wrong, so good error messages are useful
(something my parser is _not_ very good at).
--
-- Stephe
I'm trying to compare the error correction in the libadalang parser with the parser in Emacs Ada mode (and other parsers). One way to do that is to retrieve the token list of the final parse, including "virtual tokens" inserted for error correction and excluding deleted tokens. Diffing that token list against the corresponding lists from the other parsers, and against the user-expected "correct" token list, gives a fairly objective measure of error-correction quality.
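The diff step itself is the easy part; a sketch, where `tokens_a` and `tokens_b` are plain lists of token strings:

```python
import difflib

def token_distance(tokens_a, tokens_b):
    """Count tokens inserted, deleted or replaced between two token lists."""
    matcher = difflib.SequenceMatcher(a=tokens_a, b=tokens_b)
    cost = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            cost += max(i2 - i1, j2 - j1)
    return cost
```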
In doing this for libadalang, I first tried iterating over the unit's token stream.
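Roughly like this, using the Python bindings (a sketch; the exact calls may differ):

```python
import libadalang as lal

ctx = lal.AnalysisContext()
unit = ctx.get_from_file("bad_syntax.adb")

# One token per line: kind and text, skipping trivia (whitespace, comments).
for tok in unit.iter_tokens():
    if not tok.is_trivia:
        print(tok.kind, tok.text)
```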
This only gives the "real" tokens, i.e. the ones present in the source text.
The diagnostics give some hints about inserted and deleted tokens, but they are not explicit enough for this use.
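That is, something along these lines (sketch, same caveats as above):

```python
import libadalang as lal

unit = lal.AnalysisContext().get_from_file("bad_syntax.adb")

# The diagnostics say where recovery happened, not which tokens it
# inserted or deleted.
for diag in unit.diagnostics:
    print(f"{diag.sloc_range}: {diag.message}")
```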
I don't see any mention of something like virtual tokens in the libadalang specs.
One way to output the list I'm looking for would be to traverse the AST, outputting the tokens implied by the structure. This is a lot of work, although I suspect I can copy code from gnatpp that does mostly the same thing. I can't just use gnatpp; it refuses to output anything when there are syntax errors in the source.
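A crude sketch of what that traversal might look like; the implied-token table is hypothetical, and the real work is filling it in and emitting the keywords and punctuation that sit between child nodes:

```python
# Hand-rolled unparsing sketch: emit the text of leaf nodes, and fall back
# to a hand-written table of tokens implied by each node kind when a node
# has no source text (e.g. it only exists because of error recovery).
IMPLIED_TOKENS = {"ErrorDecl": []}  # hypothetical kind -> token list table

def emit_tokens(node, out):
    if node is None:                 # absent optional fields show up as None
        return
    if node.children:
        for child in node.children:
            emit_tokens(child, out)
    elif node.text:
        out.append(node.text)
    else:
        out.extend(IMPLIED_TOKENS.get(type(node).__name__, []))
```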
An LSP language server should be able to provide a source edit script for each detected syntax error; if that functionality is in libadalang or ada_language_server somewhere, I could use that.
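For illustration, the sort of thing I mean, expressed as an LSP `WorkspaceEdit`/`TextEdit` (field names from the LSP spec; the values are made up):

```python
# Hypothetical LSP-style edit for one syntax error: insert a missing ";"
# at line 10, column 8 (LSP positions are 0-based).
edit = {
    "changes": {
        "file:///home/user/foo.adb": [
            {
                "range": {
                    "start": {"line": 9, "character": 7},
                    "end": {"line": 9, "character": 7},
                },
                "newText": ";",
            }
        ]
    }
}
```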
Is there another way?
NOTE: edited by @raph-amiard for style corrections