Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: lookahead (?=) (?!) #11

Open
conartist6 opened this issue Jan 11, 2021 · 16 comments
Open

Feature: lookahead (?=) (?!) #11

conartist6 opened this issue Jan 11, 2021 · 16 comments

Comments

@conartist6
Copy link
Member

Support lookahead assertions. Given a lookahead expression and a contingent sequence, here's what has to happen:

Make a lookahead state at the engine level. It will essentially be a normal expression state of [lookahead, contingent] with a bit of special sauce in handling success and failure.

  • When lookahead succeeds it must fail the lookahead expression, causing contingent to become the best alternative
  • When lookahead fails the entire lookahead state must fail.
@conartist6
Copy link
Member Author

I'm going to build this. I hadn't looked at the code in long enough that I forgot how some of it works. I took it as an opportunity to improve the docs a bit as I rediscovered where I put the important logic. I'm confident once again that I know how to do what this issue describes, and I'll do it just as soon as the macrome generated artifacts are stable again (shifted from timestamps to hashes)

@conartist6
Copy link
Member Author

This will be significantly simpler once #35 lands I think.

@conartist6 conartist6 changed the title Support lookahead (?=) (?!) Feature: lookahead (?=) (?!) Mar 30, 2022
@conartist6
Copy link
Member Author

I'm ready to implement this, and I'm less likely to mess it up since I just spent so much time rewriting and debugging the engine.

@conartist6 conartist6 mentioned this issue Mar 30, 2022
@conartist6
Copy link
Member Author

conartist6 commented Mar 31, 2022

The architectural question is how to look at expressions when succeeding or failing. Currently success and failure are the responsibility of sequences, and expressions are assumed to be safe to prune when they no longer are needed to hold a list.

Currently there are two places in the code that are relevant.
In remove a failing sequence may replace its parent if it has no siblings, which would be incorrect if its parent is a lookahead.

parent.replaceWith(sibling);

A succeeding sequence is promoted, which may allow it to eliminate expressions:
promote(): Expression | null {
let expr: Expression = this;
// A single match may break out of multiple layers of nesting
while (expr.better === null && (expr === this || !(expr instanceof Match))) {
expr = expr.parent;
}
if (expr !== this && expr instanceof Match) {
return expr.replaceWith(this);
} else {
expr.removeWorse();
if (expr !== this) expr.replaceWith(this);
return expr;
}
}

@conartist6
Copy link
Member Author

conartist6 commented Mar 31, 2022

The question will be, do I want a recursive solution, which would allow me to program the behavior as a a method overload, or should I try to avoid recursion, as I am unsure how deep the state tree may be?

A middle ground may be for a lookahead to be a subclass of match, say SpeculativeMatch extends Match. This would make it easier to quickly check for lookaheads, as matches have their own linkages. I think that's probably the thing to do.

@conartist6
Copy link
Member Author

If I do use a SpeculativeMatch it will need some way to know that the lookahead has succeeded. This will require support in regex.ts which currently would only ever return a success result when the entire pattern is terminated. If lookahead is a kind of match though, it would need a success result to be returned or else it would not be possible for the lookahead to succeed!

@conartist6
Copy link
Member Author

conartist6 commented Mar 31, 2022

I've been assuming I need engine-level support for this, but is that really true?

It's not too hard to create an endLookahead matcher which will fail when its contents succeed. What I guess isn't possible (?) is to do the other thing, to fail the whole expression if the sequence fails. When a sequence fails the chain of matchers is broken, and regex.ts has no more control over what happens. So yes, I think it is true.

@conartist6
Copy link
Member Author

if I make implement a SpeculativeMatch type in engine how will that impact the chaining of matchers? What is next in the chain after a lookahead, and how is that going to be called?

@conartist6
Copy link
Member Author

conartist6 commented Apr 1, 2022

I'll have a next -- a reference to the thing that follows the sequence. But that will already be used immediately because it'll be one of the two sequences held (initially) by the speculative match. So it will indeed be safe to just ignore the next reference when causing a speculative match to succeed

@conartist6
Copy link
Member Author

conartist6 commented Apr 2, 2022

A failure of the contingent sequence will also need to erase the speculative sequence. How will we know when to do this? The conditions are slightly different -- the lookahead sequence may still be active, which is to say that a SpeculativeMatch fails when it has one remaining sequence vs a normal Match which fails when it has 0

@conartist6
Copy link
Member Author

I think I want a SpeculativeSequence and a SpeculativeExpression. This way when a speculative match fails it will be a sequence failing that can overload methods to do the correct work.

@conartist6
Copy link
Member Author

It's time! I need this feature myself, and am building it.

@conartist6
Copy link
Member Author

conartist6 commented May 22, 2022

Current problem: captures. First, I don't really remember entirely how what I built works, and I'm not 100% sure it's right.

Second: lookahead has some really weird cases. Evaluating something like /(a)(?=(b))/.exec('abc') I actually need to be able to insert the capture group 2 result (b) into the captures after the rest of the pattern has already succeeded!

@conartist6
Copy link
Member Author

For an execution like /(a)(?=.(c))(b)/.exec('abc') I need the captures to be a, c, b, which is tricky because the order we'll actually capture in is a, b, c.

@conartist6
Copy link
Member Author

I think I've duplicated part of the engine's data structure in the stack-of-lists data structure I use for captures.

@conartist6
Copy link
Member Author

OK, that's fixed. Captures themselves no longer contain parentList, which is now part of captureStack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant