Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backtracking #75

Open
Kodiologist opened this issue Jul 31, 2022 · 3 comments
Open

Backtracking #75

Kodiologist opened this issue Jul 31, 2022 · 3 comments

Comments

@Kodiologist
Copy link

Kodiologist commented Jul 31, 2022

It looks like funcparserlib has no way to allow backtracking, in the manner of Parsec's try. #43 provides one example of where you might want this. I found myself wanting it in changing Hy to parse dfor forms like this (hylang/hy#2324):

(dfor  x (range 10)  y (range 3)  #(x y) (* x y))

It would be the equivalent of this Python:

{(x, y): x*y  for x in range(10)  for y in range(3)}

dfor and friends in Hy have a relatively complicated parser, so here's a simplified example:

from collections import namedtuple
from funcparserlib.parser import tok, many, finished
from funcparserlib.lexer import Token

FORM = tok('FORM')

loopers = many(FORM + FORM)

lfor = loopers + FORM + finished
dfor = loopers + FORM + FORM + finished

def f(parser, ts):
    return parser.parse([Token(t, 1) for t in ts.split()])

print(f(lfor, 'FORM FORM  FORM FORM  FORM'))
print(f(dfor, 'FORM FORM  FORM FORM  FORM FORM'))

Here the example with lfor returns ([(1, 1), (1, 1)], 1, None), as desired, but for dfor, I get funcparserlib.parser.NoParseError: got unexpected end of input, expected: FORM. The problem is that loopers consumes all the FORM FORM pairs and doesn't try moving back one iteration of many to give the final FORM + FORM + finished in dfor a chance to match.

@vlasovskikh
Copy link
Owner

See also the docs on try in Parsec.

@vlasovskikh
Copy link
Owner

Note: This Markdown reply is executable via python3 -m doctest reply-text.md.

Imports:

>>> from funcparserlib.parser import a, many, finished, forward_decl, Parser, pure

The alternative <|> combinator in Parsec always uses the look ahead of 1. Therefore you have to use try there if you need an arbitrary look ahead. In contrast to that, the alternative | combinator in funcparserlib always uses an arbitrary look ahead. Here is an example with the look ahead of 2, but it's really arbitrary:

>>> x = a("x")
>>> y = a("y")
>>> z = a("z")
>>> xxy_or_xxz = x + x + y | x + x + z
>>> xxy_or_xxz.parse("xxy")
('x', 'x', 'y')
>>> xxy_or_xxz.parse("xxz")
('x', 'x', 'z')

@Kodiologist Could you please share how exacly you would have fixed the problem in your example by using Parsec's try? Then you and I could think of either a workaround or a feature request for funcparserlib.

many() is greedy, like * in regexps. It consumes all matching tokens and always succeeds:

>>> x = a("x")
>>> many_x = many(x)
>>> many_x.parse("xxx")
['x', 'x', 'x']
>>> many_x.parse("y")
[]

The problem with the greedy * in x*xx implemented via many() is that it consumes all the tokens and the + combinator doesn't do any backtracking:

>>> x = a("x")
>>> many_x_then_x = many(x) + x + x
>>> many_x_then_x.parse("xxx")
Traceback (most recent call last):
    ...
funcparserlib.parser.NoParseError: got unexpected end of input, expected: 'x'

I guess that you, in fact, want a non-greedy version of many(). A sort of the *? operator in regexps in addition to *, right? A regexp equivalent of this would be x*?xx. Then you could have written something like this:

>>> x = a("x")
>>> def non_greedy_many(p):
...     # Fake implementation for a single case
...     return a("x") >> list
>>> many_x_then_x = non_greedy_many(x) + x + x
>>> many_x_then_x.parse("xxx")
(['x'], 'x', 'x')

The problem is that I currenlty have no idea how to implement it easily, since + in funcparserlib doesn't support any backtracking.

Workaround 1. Extract the last x, x at the tree transformation level:

>>> x = a("x")
>>> def to_results(args):
...     x1, x2, xs = args
...     combined = [x1] + [x2] + xs
...     *rest, last1, last2 = combined
...     return rest, last1, last2
>>> many_x_then_x = x + x + many(x) >> to_results
>>> many_x_then_x.parse("xxx")
(['x'], 'x', 'x')
>>> many_x_then_x.parse("xx")
([], 'x', 'x')
>>> many_x_then_x.parse("x")
Traceback (most recent call last):
    ...
funcparserlib.parser.NoParseError: got unexpected end of input, expected: 'x'
>>> many_x_then_x.parse("")
Traceback (most recent call last):
    ...
funcparserlib.parser.NoParseError: got unexpected end of input, expected: 'x'

@Kodiologist
Copy link
Author

It's been a long time (over a decade) since I touched Parsec and so perhaps I was wrong in thinking it can be done with try. But yes, a non-greedy repetition operator fits the bill here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants