Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Representation for Pattern Matches - Syntax #286

Closed
thtuerk opened this issue Aug 30, 2015 · 46 comments
Closed

New Representation for Pattern Matches - Syntax #286

thtuerk opened this issue Aug 30, 2015 · 46 comments

Comments

@thtuerk
Copy link
Member

thtuerk commented Aug 30, 2015

This is a subtask of issue #285.

A PMATCH expression for expresion MEM x l is

PMATCH l [
  PMATCH_ROW (\(uv:unit). []) (\uv. T) (\uv. F);
  PMATCH_ROW (\(y,ys). y::ys) (\(y,ys). x = y) (\(y,ys). T);
  PMATCH_ROW (\(_0,ys). _0::ys) (\(_0,ys). T) (\(_0,ys). MEM x ys)
]

The following syntax available for parsing and pretty-printing:

CASE l OF [ 
  ||. [] ~> F; 
  || (y,ys). y::ys when (x = y) ~> T;
  || ys. _::ys ~> MEM x ys
]

There are several issues with this

  • list syntax is reused. This makes parsing much simpler, but the ; at the end of each line looks odd
  • the || notation clashes with or for words
  • mentioning all bound vars of a pattern can be tedious

The last issue needs discussion. I personally like this notation. It makes everything clearer in my opinion. However, I have to admit that it might become tedious to use. As minor mitigation, the input syntax ||! can be used. It marks all variables occurring in the pattern as bound. However, this happens after typechecking. This means that unexpected effects might links bound variables to bound variables of other rows or free variables of the context. One might work on pre-terms instead, but this is much harder to implement.

It would be great if some ideas could be exchanged and perhaps once we decided on the goals, someone could perhaps help me getting the parser working, if needed.

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

@mn200
Copy link
Member

mn200 commented Aug 31, 2015

My suggestion is that we aim for

'case' <exp> 'of' <row>+

where a row is something like

  <row> ::= '|'? <pat> '=>' <exp>  | '|.' <bvtuple> '.' <pat> '=>' <exp>

(The optional leading '|' has to be present on rows after the first.)

I am happy to implement syntax along these lines, and think it should be possible.

I don't know what exactly should be done with instances of the case-syntax that occur before lists.

@thtuerk
Copy link
Member Author

thtuerk commented Aug 31, 2015

In principal I like your proposed syntax (provided we also add guards to your proposal). However, it would clash with the currently used syntax for case-expressions that get compiled to decision trees. I fear this might break quite a few existing developments. We should check carefully before deciding, I believe. As a test, we could just set the trace parse deep cases and see what happens. In principle, I have no problem replacing existing syntax, though. I believe, we would need to provide in this case a new syntax for the old case-expressions. Any suggestions? If we need a new syntax anyhow though, we could use it for the new expressions and avoid confusion?

Moreover, I'm uncertain, what the semantics of your proposal would be. If no list of bound variables is provided, I expect it would mean that no variables are bound by the pattern? This would match the current design of PMATCH. I like this approach of naming bound variables explicitly. However, it would be different than the existing case-expressions causing potential trouble because of similar notations with different semantics.

@mn200
Copy link
Member

mn200 commented Aug 31, 2015

No, I was thinking that use of | (rather than |.) would bind all variables; hopefully making the semantics the same as the existing. Because of that, I am not too bothered by the prospect of replacing the old with the new. I looked to see how many uses of case-of there were before lists in src and there don't seem to be too many.

@thtuerk
Copy link
Member Author

thtuerk commented Aug 31, 2015

I fear, changing the behavior, will break a lot. Especially the examples. I'm also not too happy about the automatic binding of variables. One of the points of the new representation was too make things very obvious. And automatic guessing of bound vars is violating this in my opinion. Using the old syntax is easy anyhow and I implemented that already. It can be activated with the trace parse deep cases. How about checking how much work it would for example be getting Anthony's ARM model working with that? It would just require some boilerplate like

open patternMatchesLib
set_trace "parse deep cases" 1;

And then see what happens. I see that there might be a desire to not maintain two representations of pattern matching. But currently, the old version is still heavily used by Define and my new one won't work with compute easily (only after compilation to a decision tree essentially). So, I think for now I don't feel confident getting rid of the old method. At least I would like some new syntax for the old one then.

@acjf3 How much hazzle would a change be for you? Can you estimate that without much experimenting?

@mn200
Copy link
Member

mn200 commented Sep 1, 2015

  • I don't think that "guessing bound vars" is a fair characterisation: people are used to this behaviour, and people are pretty good at figuring out what the bound variables will be. (You yourself suggested using |! to give us this behaviour.)
  • I am not sure I understand what the parse deep cases trace does. My guess: if this trace is on, and the user writes with the old syntax, this syntax is translated to patternMatches semantics. If this is available as I describe, I think it would be interesting to turn it on by default, and to then see what breaks.
  • Strictly, I don't think that the old-style syntax is heavily used by Define. It certainly uses pattern-matching resolution to figure out how to represent definitions, and it then translates those to use case-constants as necessary. We'd want to keep this code unchanged, so we would need to maintain the existing code to support this. But the syntax-to-semantics of case expressions could certainly change without affecting Define.
  • How much work does “compiling to decision tree” represent? If we did this for every theorem added to a compset, would it be very noticeable. I totally agree that we need to be able to execute new style case expressions with computeLib.

@konrad-slind
Copy link
Contributor

Creating the induction theorem is also an important consideration. The
main algorithm in that is proving pattern completeness, and that took
a lot of attention to details, especially to make it run fast. (It can
obviously
be farmed out to metis or something like that, but I would worry about
it being slow in that case.)

Konrad.

On Mon, Aug 31, 2015 at 7:08 PM, Michael Norrish [email protected]
wrote:

I don't think that "guessing bound vars" is a fair characterisation:
people are used to this behaviour, and people are pretty good at figuring
out what the bound variables will be. (You yourself suggested using |!
to give us this behaviour.)

I am not sure I understand what the parse deep cases trace does. My
guess: if this trace is on, and the user writes with the old syntax, this
syntax is translated to patternMatches semantics. If this is available
as I describe, I think it would be interesting to turn it on by default,
and to then see what breaks.

Strictly, I don't think that the old-style syntax is heavily used by
Define. It certainly uses pattern-matching resolution to figure out
how to represent definitions, and it then translates those to use
case-constants as necessary. We'd want to keep this code unchanged, so we
would need to maintain the existing code to support this. But the
syntax-to-semantics of case expressions could certainly change
without affecting Define.

How much work does “compiling to decision tree” represent? If we did
this for every theorem added to a compset, would it be very noticeable. I
totally agree that we need to be able to execute new style case expressions
with computeLib.


Reply to this email directly or view it on GitHub
#286 (comment)
.

@mn200
Copy link
Member

mn200 commented Sep 1, 2015

Sure; I don't think any code in TFL would change or need to change. The machinery there would still be using case-constants so that something like

f 0 = e1 /\
f 1 = e2 /\
f (SUC n) = ... f n ...

could still compile internally to something involving num_CASE or whatever else was required.

My proposal is to try to make case e of <patrows> map to something else as much as possible. If a definition used case, then its RHS would change, but TFL's analysis should still be fine because the new technology provides congruence rules as required.

@konrad-slind
Copy link
Contributor

Ah. So the proposal is to keep the lhs's of TFL definitions as-is, but
allow more flexibility for case-expressions in the body. That
should be eminently do-able.

Konrad.

On Mon, Aug 31, 2015 at 8:22 PM, Michael Norrish [email protected]
wrote:

Sure; I don't think any code in TFL would change or need to change. The
machinery there would still be using case-constants so that something like

f 0 = e1 /
f 1 = e2 /
f (SUC n) = ... f n ...

could still compile internally to something involving num_CASE or
whatever else was required.

My proposal is to try to make case e of map to something else
as much as possible. If a definition used case, then its RHS would
change, but TFL's analysis should still be fine because the new technology
provides congruence rules as required.


Reply to this email directly or view it on GitHub
#286 (comment)
.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 1, 2015

Michael wrote:

I don't think that "guessing bound vars" is a fair characterisation: people are used to this behaviour, and people are pretty good at figuring out what the bound variables will be. (You yourself suggested using |! to give us this behaviour.)

|! is not my idea, but Ramana talked me into providing it. I personally don't like it. However, I agree that it is not too troublesome, since the result is mostly predictable. I still don't like it too much, but I understand why people do. The main point is, we should provide an alternative and in my opinion the pretty printer should use the verbose version. A lengthy comment why I don't like it, will follow in a separate comment.

I am not sure I understand what the parse deep cases trace does. My guess: if this trace is on, and the user writes with the old syntax, this syntax is translated to patternMatches semantics. If this is available as I describe, I think it would be interesting to turn it on by default, and to then see what breaks.

Yes, that's what I suggested. As an example that sounds troublesome to me (not completely sure), I suggested Anthony's ARM model.

Strictly, I don't think that the old-style syntax is heavily used by Define. It certainly uses pattern-matching resolution to figure out how to represent definitions, and it then translates those to use case-constants as necessary. We'd want to keep this code unchanged, so we would need to maintain the existing code to support this. But the syntax-to-semantics of case expressions could certainly change without affecting Define.

Fair enough, but the main point was (as you seem to agree) that we need (unless we want to put in serious work and change a lot) the code in TFL and this one uses the pattern compilation used by the parser. What I am mainly arguing for is backward compatibility. The very least is in my opinion to provide new syntax for the old case-expressions, if we reuse their syntax for the new ones. However, I prefer to keep the old syntax and just provide new one for the new case-expressions. I got the impression (since you did not reply to my suggestions above), that you want to get rid of them altogether. The only reason for this would in my opinion be, if it allows us to get rid of some code, so we have a simpler system and less code to maintain. This however, is not possible, as I tried to point out with the Define comment. Therefore, I don't see why you don't want to provide input syntax for old-style case-expressions any more.

How much work does “compiling to decision tree” represent? If we did this for every theorem added to a compset, would it be very noticeable. I totally agree that we need to be able to execute new style case expressions with computeLib.

I disagree that we need to automatically execute new-style case expressions with computeLib. I never intended to say that. For me, a manual approach would be sufficient. Compiling to a decision tree and then adding manually sounds fine to me. However, I don't know computeLib well and perhaps I miss something here. Automatically compiling to a decision tree and adding the resulting theorem to the compset is out of the question, because it would be much too slow. For the red-black-tree example compilation takes about 10 s.

@mn200
Copy link
Member

mn200 commented Sep 1, 2015

I disagree that we need to automatically execute new-style case expressions with computeLib.

We do need to be able to execute new-style case-expressions with computeLib. It's part of the implicit contract with the user that I should be able to write typical functional programming language definitions with Define, and to then be able to EVAL them on concrete arguments.

If people are going to be making definitions with new-style case expressions, then we should strive to EVAL them. Now, I do agree that 10_s_ sounds too much. Is there any way of telling in advance that compiling the red-black tree example will take a long time? If so, it would be nice if we could not do it in those sorts of situations, emit a diagnostic message explaining this, and also explaining how to compile the decision tree if the user still wants it.

Of course, I also believe that the current handling of red-black trees with Define results in terrible run-time behaviour, suggesting that we would not be worsening the user-experience by simply doing it every time.

@mn200
Copy link
Member

mn200 commented Sep 1, 2015

I personally don't like ...[ |! ]... However, I agree that it is not too troublesome, since the result is mostly predictable. I still don't like it too much, but I understand why people do. The main point is, we should provide an alternative and in my opinion the pretty printer should use the verbose version. A lengthy comment why I don't like it, will follow in a separate comment.

I think that printing with the verbose style should certainly be an option, and forced if the bound variables are not equal to the free variables (just as with the printing of set comprehensions). Otherwise, I do not think it should be the default. Many, many SML, Haskell and OCaml programmers are perfectly comfortable with “implicit” bound variables, so I don't think we should confront them with verbose forms if they're not necessary.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 1, 2015

Why I don't like guessing bound variables

The main trouble I have with automatically mark all variables occurring in a pattern as bound is that it is rather tricky to implement. Before type-checking, the free variables need to be turned into bound ones. Otherwise variables with the same name in different rows or bound variables and context variables with the same name have to have the same type. This leads to unexpected problems. We had these issues for quite some time with Define I believe (solved now I think). However, I believe that we certainly don't want this situation, since it can very easily lead to unexpected typing errors or worse unexcepted types of the parsed term.

So, we need to start modifying preterms. This is a deep change and we hard to implement. Look at all the mess the current parsing (even without pattern compilation) of case-expressions is. The point of my library was getting rid of this mess and make parsing simpler. So, I don't want preterm hackery in there.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 1, 2015

We do need to be able to execute new-style case-expressions with computeLib. It's part of the implicit contract with the user that I should be able to write typical functional programming language definitions with Define, and to then be able to EVAL them on concrete arguments.

Fair enough, but then the new-style case-expressions can't be defined as typical functional programming style. They just won't work with computeLib. Not a chance.

If people are going to be making definitions with new-style case expressions, then we should strive to EVAL them. Now, I do agree that 10s sounds too much. Is there any way of telling in advance that compiling the red-black tree example will take a long time? If so, it would be nice if we could not do it in those sorts of situations, emit a diagnostic message explaining this, and also explaining how to compile the decision tree if the user still wants it.

There is nothing special about the red-black-tree example. In fact, it is not too complicated. There is just a lot of heavy machinery involved. This can still be optimised for speed. But it is always going to be slow on non-trivial inputs.

Of course, I also believe that the current handling of red-black trees with Define results in terrible run-time behaviour, suggesting that we would not be worsening the user-experience by simply doing it every time.

I don't believe there is horrible run-time behaviour. At least not for computeLib. Only proofs become looking verbose and the ML-code is bad. But decision trees are prefect for computeLib.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 1, 2015

I think that printing with the verbose style should certainly be an option, and forced if the bound variables are not equal to the free variables (just as with the printing of set comprehensions).

Set comprehensions are for me a perfect example, why not to do it. They are a nightmare in HOL and hardly usable, if the bound vars are not listed explicitly. It is very hard to predict, which variables are bound and which ones are free. It depends on the context of the term. This caused me serious headache while working with Lem. If it was me, I would get rid of the verison without an explicit list of bound variables.

Otherwise, I do not think it should be the default. Many, many SML, Haskell and OCaml programmers are perfectly comfortable with “implicit” bound variables, so I don't think we should confront them with verbose forms if they're not necessary.

But we are not doing functional programming here. This is reinforced by the computeLib discussion. So, trying to present things that are different in exactly the same form is for me more confusing than just acknowledge the difference.

I propose something more verbose, but for good reasons. It is more trustworthy (because it avoids some arcane parser magic), easier to read and the result of parsing it is more predictable.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 1, 2015

@mn200 I still have not understood, why you want to replace the old-style case-expressions. This is going to cause a lot of trouble as the discussions above already indicate. I never intended to replace the old-style. Just provide an alternative. I in fact like the old-style for certain problems. So, why get rid of them?

@thtuerk
Copy link
Member Author

thtuerk commented Sep 1, 2015

Another argument against using new-style case-expressions with computeLib

Michael argued

We do need to be able to execute new-style case-expressions with computeLib. It's part of the implicit contract with the user that I should be able to write typical functional programming language definitions with Define, and to then be able to EVAL them on concrete arguments.

However, not all new-style case expressions are easily computeable and certainly not all can be compiled to a decision tree easily. As a comparably easy example, look at the following definition of cardinality of sets.

val CARD2_defn = Defn.Hol_defn "CARD2" `
  CARD2 s = CASE s OF [
    ||. {} ~> 0;
    ||! x INSERT s' when (FINITE s' /\ ~(x IN s')) ~>
        SUC (CARD2 s');
    ||! s' ~> 0
  ]`

I don't see how this can be automatically compiled to something that works with computeLib.
Worse, in general, we can write

CASE T OF [
  || x. P x ~> SOME x;
  ||. _ ~> NONE
]

which defines some x. P x.

So, we have two choices. Either we acknowledge that definitions using new-style case-expressions won't automatically work with computeLib or we define some subsets that do. That later approach is confusing to the user and takes a lot of runtime trying to compile to decision trees. We can leave that job to the user in my opinion and don't try to do it automatically.

@mn200
Copy link
Member

mn200 commented Sep 1, 2015

Otherwise, I do not think it should be the default. Many, many SML, Haskell and OCaml programmers are perfectly comfortable with “implicit” bound variables, so I don't think we should confront them with verbose forms if they're not necessary.

But we are not doing functional programming here. This is reinforced by the computeLib discussion. So, trying to present things that are different in exactly the same form is for me more confusing than just acknowledge the difference.

I propose something more verbose, but for good reasons. It is more trustworthy (because it avoids some arcane parser magic), easier to read and the result of parsing it is more predictable.

Having variables implicitly bound inside case expression patterns is not confusing; as witnessed by many many functional programmers successfully using exactly this form of syntax. Moreover, users are not going to be confused even if they choose not to have the verbose form pretty-printed back to them, because bound variables will appear in green.

The only possible confusion when parsing a pattern is possible confusion over what is a constant and what is a variable. And again, this will be resolved instantly when terms are pretty-printed because the variables will be green and the constants will be black.

@mn200
Copy link
Member

mn200 commented Sep 1, 2015

The main trouble I have with automatically mark all variables occurring in a pattern as bound is that it is rather tricky to implement. Before type-checking, the free variables need to be turned into bound ones. Otherwise variables with the same name in different rows or bound variables and context variables with the same name have to have the same type. This leads to unexpected problems. We had these issues for quite some time with Define I believe (solved now I think). However, I believe that we certainly don't want this situation, since it can very easily lead to unexpected typing errors or worse unexcepted types of the parsed term.

So, we need to start modifying preterms. This is a deep change and we hard to implement. Look at all the mess the current parsing (even without pattern compilation) of case-expressions is. The point of my library was getting rid of this mess and make parsing simpler. So, I don't want preterm hackery in there.

This is not difficult, is not hard to implement, and has been done already. Moreover, I am happy to do it again (inasmuch this will require any change at all, which I doubt). The complexity of getting similar things to work with Define was because the syntactic structure of a series of conjuncts is much harder to work with than a nicely structured case expression.

@mn200
Copy link
Member

mn200 commented Sep 2, 2015

I still have not understood, why you want to replace the old-style case-expressions. This is going to cause a lot of trouble as the discussions above already indicate. I never intended to replace the old-style. Just provide an alternative. I in fact like the old-style for certain problems. So, why get rid of them?

I am not committed to "getting rid" of old style case expressions. I am committed to not having too many syntaxes. In particular, I absolutely want to avoid the situation where we have two flavours of case expression and have to document just why and how one would prefer one over the other. If that requires two implementations of pattern-resolution, that's totally fine.

(Let's ignore the situation with what happens before boss for the moment. It's irrelevant to the user experience, and as I have discovered, there are hardly any occurrences anyway. Try

grep -R --include=*Script.sml '\<case\>' bool relation one pair sum option combin num list pred_set

in src and see that there is one occurrence in list and one in option.)

I'd like to have the new semantics attached to the old syntax so that users can write code just as before, and can also add guards, non-linear patterns and the rest as the fancy takes them. The worst case scenario is that they can either have nice syntax and confusing semantics with respect to things like additional rows being added (as at the moment), or horrible syntax and cool features.

I'm sure you're right that there will be scenarios where people will want the old semantics. In this case, they should simply adjust the trace variable as necessary so that the parser will change back to the old behaviour. This should be an expert option, and one mostly provided for the sake of backwards compatibility.

I haven't made my point about computeLib very well. We do strive to make things work with EVAL as much as possible: when people write functional programs, we make an effort to make EVAL evaluate those programs well (see for example our handling of numerals, even for definitions involving SUC).

Of course, it has also always been possible to write definitions that do not evaluate well. (I can Define the function that returns true if a Turing machine terminates and false if it loops.) Given this, I am not too bothered by the occasional need to turn off computeLib translation/handling. Indeed, we already have this facility in the form of zDefine. I think the right interface would be to try to automatically generate the compilation of the decision tree when Define is used. If this results in poor performance, the user has a number of options:

  1. switch to zDefine
  2. lift the case expression into the clauses of the Define so that the old-style facilities take over
  3. switch to old style parsing via the trace variable

In sum, I do not think this will cause a "lot of trouble", and that it will result in a better system.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

I have completely differing opinions. For me colour is an aid, but I should not need to see the colour a term is printed in to be able to parse it. We are not doing functional programming and a comparison with functional programming does not make sense, since we are in a richer setting. Therefore, we need to present the features that are not available in functional programming clearly.

Setting up the parser might not be difficult for Michael, for me it unluckily is. I played around with it for 3 or 4 days when I started with the pattern matching example for a few days. It took me very long to understand, how this works and I'm still not sure, I really do. I then tried to copy the code for my PMATCH expressions and failed. For an expert like Michael, this might be not difficult, but at least I have serious trouble understanding the parser and there are a lot of special hooks and special code to make cases work.

I believe that Michael's proposal will cause a lot of trouble. Luckily, explicit case expressions are not that commonly used. However, their behaviour will change with the proposal a lot. The main difference is how to evaluate the resulting terms. Having Define do some magic in the background to get computeLib setup again, is making things worse, I believe. Even now, I have (as Michael and I discussed several times), serious issues with Define. I (and I consider myself after using HOL 4 for a decade now an expert user) find it hard to predict, what Define will do. Part of the motivation for work on pattern matches was in fact to use them in future work to provide an alternative to Define. That it uses currently pattern compilation in the background is part of the problem. Extending it to do something as time consuming and hard to predict as a second, this time verified pattern compilation in the background will make things worse. I believe it will be hard to predict, which patterns will work with computeLib and which ones won't.

I strongly believe that the old case-expressions are useful. I want to be able to use them in certain situations. They might be handy for code generation (depending on circumstances) as well as evaluation. They -- by virtue of their syntax -- are always exhaustive and have non-overlapping cases. Their semantics is really straightforward. I like them. That's why a design principle behind my new pattern match example was to have both representations and easily be able to switch between them. So, I don't like getting rid of them, which I still believe Michael's proposal essentially aims at even if they would be available as a fallback to expert users.

@mn200 Currently, we seem stuck. We seem to have different expectations how much the change would influence. And I fear you have much too much hopes in my new representation and don't understand what it can do and what it can't do well. However, we discussed 2 tests to check our expectations against reality. If you want to convince me, why not really test Anthony's example and if that goes through then move the example into bossLib and set the trace parse deep cases. Then see what happens.

Anyhow, I will write a compromise proposal in a separate message. Perhaps we can meet somewhere in the middle.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

Compromise proposal

I of course like my syntax (otherwise I would not have implemented it :-).

CASE l OF [ 
  ||. [] ~> F; 
  || (y,ys). y::ys when (x = y) ~> T;
  || ys. _::ys ~> MEM x ys
]

However, I appreciate that

  • people really don't want to write down bound variables
  • a syntax less verbose (e.g. list syntax for rows)
  • closer to existing case-expressions

How about using essentially the same syntax as for the existing case-expressions (with some extensions), but use capital keywords instead. So, the example would look like

CASE l OF 
     [] => F 
  | y::ys WHEN (x = y) => T
  | _::ys => MEM x ys

I would also like explicit bound variable syntax available. The user can choose what to use. If we do fancy, complicated parsing then perhaps also for the bound vars. So the example could as well be written like

CASE l OF 
  |. [] => F 
  | y ys. y::ys WHEN (x = y) => T
  | ys. _::ys => MEM x ys

The old-style case expressions I would keep as they are.

Discussion

With the proposal the user has the choice, which case-expressions to use. The pretty printed result would be clearly visible as either an old or a new-style case-expression. They would be similar enough in syntax though that the switch is very easy for users. Essentially, one can take an old-style case-expression, replace case ... of with CASE ... OF and get a new-style one. If we set up the pretty printer to avoid explicitly listing bound vars where possible, we can go the other way round as well for case-expressions in the supported subset.

Since we only add extra syntax, existing code should not be effected. We can step by step play around with changing existing case expressions and see what happens. We can then with more information available decide later, what to do about Define.

@konrad-slind
Copy link
Contributor

Hi Thomas,

This discussion is becoming obscure. On the one hand, case expressions in
the
body of a definition are always exhaustive and have non-overlapping cases.
On the
other hand that is not supposed to be true of LHS patterns as used in
Define. But
they both use the same pattern-match compiler.

Side note: I think it is a bad idea to have multiple case-translations and
a flag to
choose which one is in force. That might be OK for developers, but for
users it is
less obviously good. An alternative would be to have a highly parameterized
version of Define, (or term parser) that took the case parser/translator as
a parameter.

Konrad.

On Tue, Sep 1, 2015 at 10:10 PM, Thomas Tuerk [email protected]
wrote:

I have completely differing opinions. For me colour is an aid, but I
should not need to see the colour a term is printed in to be able to parse
it. We are not doing functional programming and a comparison with
functional programming does not make sense, since we are in a richer
setting. Therefore, we need to present the features that are not available
in functional programming clearly.

Setting up the parser might not be difficult for Michael, for me it
unluckily is. I played around with it for 3 or 4 days when I started with
the pattern matching example for a few days. It took me very long to
understand, how this works and I'm still not sure, I really do. I then
tried to copy the code for my PMATCH expressions and failed. For an expert
like Michael, this might be not difficult, but at least I have serious
trouble understanding the parser and there are a lot of special hooks and
special code to make cases work.

I believe that Michael's proposal will cause a lot of trouble. Luckily,
explicit case expressions are not that commonly used. However, their
behaviour will change with the proposal a lot. The main difference is how
to evaluate the resulting terms. Having Define do some magic in the
background to get computeLib setup again, is making things worse, I
believe. Even now, I have (as Michael and I discussed several times),
serious issues with Define. I (and I consider myself after using HOL 4
for a decade now an expert user) hard to predict, what Define will do.
Part of the motivation for work on pattern matches was in fact to use them
in future work to provide an alternative to Define. That it uses
currently pattern compilation in the background is part of the problem.
Extending it to do something as time consuming and hard to predict as a
second, this time verified pattern compilation in the background will make
things w orse. I believe it will be hard to predict, which patterns will
work with computeLib and which ones won't.

I strongly believe that the old case-expressions are useful. I want to be
able to use them in certain situations. They might be handy for code
generation (depending on circumstances) as well as evaluation. They -- by
virtue of their syntax -- are always exhaustive and have non-overlapping
cases. Their semantics is really straightforward. I like them. That's why a
design principle behind my new pattern match example was to have both
representations and easily be able to switch between them. So, I don't like
getting rid of them, which I still believe Michael's proposal essentially
aims at even if they would be available as a fallback to expert users.

@mn200 https://github.com/mn200 Currently, we seem stuck. We seem to
have different expectations how much the change would influence. And I fear
you have much too much hopes in my new representation and don't understand
what it can do and what it can't do well. However, we discussed 2 tests to
check our expectations against reality. If you want to convince me, why not
really test Anthony's example and if that goes through then move the
example into bossLib and set the trace parse deep cases. Then see what
happens.

Anyhow, I will write a compromise proposal in a separate message. Perhaps
we can meet somewhere in the middle.


Reply to this email directly or view it on GitHub
#286 (comment)
.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

Hi Konrad,

yes, the discussion is getting obscure and I need to elaborate. Sorry. I will start early and repeat thing you know for the sake of clarity.

Current case-expressions

Currently, we can write in the input case-expressions. These are compiled to a decision tree as part of parsing. The input expressions can be non-exhaustive and have overlapping rows. However, the resulting decision tree, so the representation inside the logic does not. For missing patterns, a non-exhaustive case-expression defaults to ARB. All the semantic complexity is compiled away. That's why these decision trees work well with computeLib. The decision trees are pretty-printed in the form of case-expressions again. The connection with the input case-expressions is not obvious. However, the printed case-expressions are always exhaustive and I believe in the current implementation there are no overlapping rows (there was talk about changing this though).

Define

Define uses case-expressions and pattern-compilation in two ways. For patterns on the LHS, i.e. the ones occurring at the top-level of definitions, the same pattern compilation as in the parser is used to compile them to a decision tree. From this decision tree, a exhaustive, non-overlapping set of patterns is extracted using techniques similar to the ones used by the pretty-printer. For the induction theorem, the exhaustiveness of this set of patterns is reproved. For the generated definition theorem, the ones not actually used are removed and the rest is used to generate equations. Therefore, the resulting conjunction of equations does not represent a exhaustive set despite being generated from an exhaustive set of patterns.

On the RHS case-expressions are removed by the parser and Define does not need to take care of them (except for well-foundedness checks working on the decision trees).

New Case-Expressions

The new case-expressions represent pattern matches directly in the logic. Therefore, they can represent overlapping rows and non-exhaustive matches. Of course, all functions in HOL are total. Therefore, a non-exhaustive pattern match is defaulting to ARB for missing cases. This is however implicit in the semantics and not represented by adding extra rows to the case-expression.

The new case-expressions are defined with existential quantification and Hilbert's choice. They are in general not executable. There is some heavy machinery to evaluate and simplify them. This works well, but providing general theorems for computeLib is impossible. So, to get functions defined with the new case-expressions working with computeLib, one needs to remove them. There are several options for this. One is compilation to a decision tree, the other is lifting to boolean expressions. Both are not always possible.

Clarification of discussion

The pattern compilation for old case-expression is used by Define to handle patterns on the LHS of definitions. As I understood Michael's proposal, we would use the pattern compilation for new case-expressions on the RHS of definitions to bring the result into a form that can be handled by computeLib.

@xrchz
Copy link
Member

xrchz commented Sep 2, 2015

(@konrad-slind In case Thomas's comment above wasn't clarification enough, you may need to be aware that this issue is about a new, better, representation for case expressions, which are not necessarily exhaustive or non-overlapping. See the paper in ITP'15.)

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

See also the main issue #285 and the ITP talk Pattern Matches in HOL: A New Representation and Improved Code Generation (slides, paper).

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

@xrchz I just believe that it is a new, different representation. It has advantages and disadvantages. It is not a "better" representation in my opinion. That's why I'm so reluctant to make the change Michael proposes. That's also why I made sure the user can easily switch between representations.

@mn200
Copy link
Member

mn200 commented Sep 2, 2015

I am happy to assume that big complicated examples like Anthony's will break given a new treatment of case-expressions. For this reason, I agree that we need to allow a fall-back to the old behaviour. Given that we already have code implementing the old behaviour, I don't see this as likely to cause a problem.

Of course, we should support explicit bound variable syntax. I am not suggesting the opposite. It is absolutely necessary for some examples, and it's a very nice feature. Naturally, that same syntax would get printed back to the user if it is necessary to make sense of the input. I am only additionally asserting that we should allow the bound variables to be omitted when specifying them is not required. If the additional precision with respect to bound variables is not required, there are good reasons to suppose that users will not be confused:

  1. there is no evidence to suggest that our existing users are confused by old-style case expressions' handling of bound variables (they may be confused by the way that case-expressions explode on them); and
  2. there is no evidence to suggest that functional programmers are confused by their languages' handling of bound variables

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

@mn200 I disagree, but this is a minor issue and perhaps we should take this offline. I think I have evidence that it is confusing. Anyhow, as already stated in my original comment I don't like this behaviour, but don't object too heavily. My compromise proposal admits that point and follows your view (at the price of a complicated parser). So perhaps we can take this discussion offline and concetrate on the main issues.

For me the real point of difference is whether to replace the old case-expression syntax. I don't like that. As proposed, I would keep it and add new syntax for the new case-expressions. At the very least I would provide new syntax for the old-case expressions, if we reuse the old syntax for the new ones. A flag to change parsing is not good enough. Then I would for example have trouble distinguishing the internal representations while pretty printing. Moreover, if we change the default behaviour, the tools need to be much more robust and issue like the one with computeLib need to be solved properly. So, for me the real core of the discussion is whether to replace the old case-expression syntax.

@mn200
Copy link
Member

mn200 commented Sep 2, 2015

How about this:

  1. Two traces: "parse old cases" and "prettyprint old cases". Both default to false
  2. Support for old style syntax using oldcase-of (or primcase-of, or ...). To be ultra-discouraging, terms using this pretty-print to raw forms (num_CASE, pair_CASE, etc) unless "prettyprint old cases" is true. (The ultra-discouragement might be going too far.)
  3. Existing uses that don't set "parse old cases" map to the new underlying semantics.

In other words, you never get case-of from old-style terms so it's always clear which representation you are seeing when terms are printed. If you want to write old-style terms in the new world, the preferred route is via primcase-of. The "parse old cases" is really only to support backwards compatibility.

I believe it's clear from our experience with things like quantHeuristics and others that new tools don't get used unless users are pushed into it (preferably without them even realising that this is happening).

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

So, to be clear. You would parse old case-expressions with something like

old-case ... of
   | ...
   | ...

If that is the case, I'm happy. The two traces provide all the backwards compatibility needed in my opinion. And unluckily you are right about the need to push users. How about dtree-case instead of old-case? I like the old-cases sometimes and there are good reasons for using them. So old-case might be a bit discouraging implying that you should always use the new ones. And the name primcase reminds me of primitive recursion. So users might not expect to parse fancy nested patterns.

I would even go so far as having prettyprint old cases turned off by default. If people complain heavily, we can with minimal effort change the default.

That however leaves the question what to do about computeLib in Define. I would leave it unsupported for now and see what happens.

@xrchz
Copy link
Member

xrchz commented Sep 2, 2015

Would it be possible to figure out some conversions that could be added to computeLib to get it to handle PMATCH-style case expressions efficiently? Or is that difficult even in theory?

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

Essentially evaluation has to get rid of existential quantification and Hilbert's choice. Even in the simple examples, we need to show the injectivity of patterns to blow them away. I have no idea how this could be achieved with the simple, fast rewriting mechanisms provided by computeLib. We can of course - as you point out - use arbitrary conversions in comp-sets. So, we could add a cut-down version of the ones used for simplifying the new case-expressions. We would only need one looking at the first row and either discarding it or notice that it matches and evaluate the case-expression to its right hand side. It might be worth adding this one. But it will still be heavy-weight for the usually very fast computeLib. Anyhow, it might be better than just stopping and not doing a thing.

@mn200
Copy link
Member

mn200 commented Sep 2, 2015

Yes, something like the syntax you suggest. The name dtree_case is fine (probably with an underscore rather than a hyphen: though the system would actually handle the hyphen, perhaps it might be confused with subtraction). The fact that both old and new syntaxes use of and | would not be an issue as far as parser implementation would be concerned.

I'm happy with the infix when keyword for guards; are there any competing suggestions for this?

Unfortunately, I don't think a simple . is easy to do for rows where you want to specify bound variables explicitly. I can see two options:

  1. Get to have . but have to use a second token after the bar. Something like

    case x of 
       | |. bvs . pat => e
    

    where there is a | followed by a separate |. (or whatever). This would effectively be making |. a binder (like ! and others), so the standard abstract syntax treatment of binders would kick in. I don't think it's possible to have the row separator also be a binder (and so we might as well use the | as a separator uniformly).

  2. Alternatively, another token apart from . to terminate the bound variables. Perhaps

    case x of 
      | bvs .| pat => e
    

    In this case, the | is still a top-level separator. I had thought we might have to put a different separator there, but that's not necessary. The handling of rows would look for the .| (treated as an infix), and then for the => (also handled as an infix).

In both cases, the separator token can remain optional when preceding the first case.

@mn200
Copy link
Member

mn200 commented Sep 2, 2015

As for computeLib, I've never understood why it can't even handle rewrites of the form

|- (?x. T) = T

let alone the somewhat more complicated ones that arise once injectivity of constructors has been dealt with. For the moment, we just have to deal with it.

However, it does seem as if we should be to automatically add the right conversions. The constructor injectivity theorems can be in there easily (and automatically as types are defined), so the question is handling what remains after they've fired.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

From the two syntax alternative, I prefer the second one | bvs .| pat => e. It's not looking too bad, I believe. dtree_case is fine with me. The traces should be called something corresponding then.

For computeLib: It's not that simple as just adding injectivity theorem. One needs to use it cleverly as well. Essentially we need something like PMATCH_CLEANUP_CONV. It is sufficient to look at just the first row instead of all rows, though.

@konrad-slind
Copy link
Contributor

I read the paper some time ago. It's nice technology. The question is
the right way to use it, given that the core of what it offers is already
present in
Define.

Since (I assert) one never uses case expressions except when making
definitions,
why not just implement an alternative to Define that uses the new case
expressions on
both LHS and RHS?

Konrad.

On Tue, Sep 1, 2015 at 11:43 PM, Ramana Kumar [email protected]
wrote:

(@konrad-slind https://github.com/konrad-slind In case Thomas's comment
above wasn't clarification enough, you may need to be aware that this issue
is about a new, better, representation for case expressions, which are not
necessarily exhaustive or non-overlapping. See the paper in ITP'15.)


Reply to this email directly or view it on GitHub
#286 (comment)
.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

Sorry, I'm confused. I don't see why the core of the pattern matching paper is already present in Define. So far, I don't see much of it present in Define. Can you elaborate, please?

What do you mean with "one never uses case expressions except when making definitions"? Just as with any other function or construct in the logic, they are used in definitions, but need dealing with, when the definiton is used. Clearly they are useful for implementing Define. In fact the idea is close to some internal concepts used by function definition packages of e.g. HOL Light. However, the point was here to present the case-expressions to the user. This has many additional benefits. And in fact these function definition package ideas are not present in HOL 4's Define. So, I fear I don't understand what you mean.

@konrad-slind
Copy link
Contributor

Sorry about the confusion, I was in a rush. I am still in a rush.

The simple point I was making is that, for someone
who is going to write an essentially ML-style function in HOL,
there is no real difference in input syntax (quibbles about making bound
variables explicit aside). That's the "core pattern language" that is
already available
in Define.

This is roughly your argument to Michael, right? I don't see any
disagreement between
us.

The other point is merely how/whether to merge function definition
facilities. I don't
see it as a problem: my proposal was to implement another Define-like
entrypoint,
which supports the new case-and-pattern syntax. Since you already have proof
tools for dealing with the explicit representation of patterns, these can
be made
available for people to use in proofs of formulas having explicit pattern
representations.

Konrad.

On Wed, Sep 2, 2015 at 11:58 AM, Thomas Tuerk [email protected]
wrote:

Sorry, I'm confused. I don't see why the core of the pattern matching
paper is already present in Define. So far, I don't see much of it
present in Define. Can you elaborate, please?

What do you mean with "one never uses case expressions except when making
definitions"? Just as with any other function or construct in the logic,
they are used in definitions, but need dealing with, when the definiton is
used. Clearly they are useful for implementing Define. In fact the idea
is close to some internal concepts used by function definition packages of
e.g. HOL Light. However, the point was here to present the case-expressions
to the user. This has many additional benefits. And in fact these function
definition package ideas are not present in HOL 4's Define. So, I fear I
don't understand what you mean.


Reply to this email directly or view it on GitHub
#286 (comment)
.

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

Sorry, I still don't get you.

The simple point I was making is that, for someone who is going to write an essentially ML-style function in HOL, there is no real difference in input syntax (quibbles about making bound variables explicit aside). That's the "core pattern language" that is already available in Define.

Ok.

This is roughly your argument to Michael, right? I don't see any disagreement between us.

No, I did not intend to say that. In fact I don't see how that fits in the discussion above. Sorry, confused.

The other point is merely how/whether to merge function definition facilities. I don't see it as a problem: my proposal was to implement another Define-like entrypoint, which supports the new case-and-pattern syntax.

Yes, that is sensible and actually I planned that from the very beginning when I started working on the new pattern representation. It was part of the motivation. See also issue #289. But this seems to me orthogonal to our discussion.

Since you already have proof tools for dealing with the explicit representation of patterns, these can be made available for people to use in proofs of formulas having explicit pattern representations.

These tools are available already. This was the main point of the library.

Sorry, I still don't get your drift. The main goals of the library were:

  • eliminate the need to trust unverified pattern compilation
  • eliminate need to trust fancy pretty printer
  • better code generation
  • more natural proof structure

As a side-effect, we ended also up with more expressive case expressions. The discussion with Define was in my opinion mainly about whether the new expressions need to be supported by computeLib. Sorry, I still don't get what you want to tell me.

@konrad-slind
Copy link
Contributor

konrad-slind commented Sep 2, 2015 via email

@thtuerk
Copy link
Member Author

thtuerk commented Sep 2, 2015

Ok. Thanks. Sorry, if I was stupid in not getting you. It was a long day for me and I'm quite tired. Sorry.

thtuerk added a commit that referenced this issue Sep 9, 2015
Use ||| instead of || everywhere. This avoids the conflict with
n-bit and therefore allows to build HOL 4 again.

This is a temporary measure till Michael implements the changes
discussed in issue #286. Since now HOL 4 builds again, it is
easier to test changes to e.g. EmitML.
@thtuerk
Copy link
Member Author

thtuerk commented Sep 9, 2015

@mn200 I just pushed commit b1bc3c9, which implements some syntactic checks for PMATCH. I need it for adding PMATCH support to EmitML. However, I wrote it more general than what I really need myself. I did this with parsing in mind. This functionality might be handy to print some warning messages when parsing certain PMATCH expressions or even enforce that (if a certain flag is set) every PMATCH parsed as an argument of Define is automatically extended to an exhaustive match. (No idea whether we want this though)

thtuerk added a commit that referenced this issue Sep 10, 2015
Use ||| instead of || everywhere. This avoids the conflict with
n-bit and therefore allows to build HOL 4 again.

This is a temporary measure till Michael implements the changes
discussed in issue #286. Since now HOL 4 builds again, it is
easier to test changes to e.g. EmitML.
thtuerk added a commit that referenced this issue Oct 1, 2015
Use ||| instead of || everywhere. This avoids the conflict with
n-bit and therefore allows to build HOL 4 again.

This is a temporary measure till Michael implements the changes
discussed in issue #286. Since now HOL 4 builds again, it is
easier to test changes to e.g. EmitML.
thtuerk added a commit that referenced this issue Oct 7, 2015
Use ||| instead of || everywhere. This avoids the conflict with
n-bit and therefore allows to build HOL 4 again.

This is a temporary measure till Michael implements the changes
discussed in issue #286. Since now HOL 4 builds again, it is
easier to test changes to e.g. EmitML.
thtuerk added a commit that referenced this issue Dec 25, 2015
Use ||| instead of || everywhere. This avoids the conflict with
n-bit and therefore allows to build HOL 4 again.

This is a temporary measure till Michael implements the changes
discussed in issue #286. Since now HOL 4 builds again, it is
easier to test changes to e.g. EmitML.
mn200 added a commit that referenced this issue Jan 18, 2016
Syntax as described in github issue #286 parses.

Examples:

    > ``case x of 0 => 3 | SUC n when n < 10 => n + 1``;
    val it =
       ``PMATCH x
        [PMATCH_ROW (λ_. 0) (K T) (λ_. 3);
         PMATCH_ROW (λn. SUC n) (λn. n < 10) (λn. n + 1)]``:
       term
    > Datatype `tree = Lf | Nd tree 'a tree`;
    <<HOL message: Defined type: "tree">>
    val it = (): unit
    > ``case t of Lf => 0 | Nd t1 x t2 => x + f t1``;
    val it =
       ``PMATCH t
        [PMATCH_ROW (λ_. Lf) (K T) (λ_. 0);
         PMATCH_ROW (λ(t1,x,t2). Nd t1 x t2) (K T) (λ(t1,x,t2). x + f t1)]``:
       term
    > ``x + case t of Lf => 0 | x .| Nd t1 x t2 => f t1``;
    <<HOL message: inventing new type variable names: 'a>>
    val it =
       ``x +
      PMATCH t
        [PMATCH_ROW (λ_. Lf) (K T) (λ_. 0);
         PMATCH_ROW (λx. Nd t1 x t2) (K T) (λx. f t1)]``:
       term
    > ``x + case t of Lf => 0 | t1,t2 .| Nd t1 x t2 => f t1``;
    val it =
       ``x +
      PMATCH t
        [PMATCH_ROW (λ_. Lf) (K T) (λ_. 0);
         PMATCH_ROW (λ(t1,t2). Nd t1 x t2) (K T) (λ(t1,t2). f t1)]``:
       term
    > ``x + case t of Lf => 0 | t1,t2 .| Nd t1 x t2 when g t2 > 6 => f t1``;
    val it =
       ``x +
      PMATCH t
        [PMATCH_ROW (λ_. Lf) (K T) (λ_. 0);
         PMATCH_ROW (λ(t1,t2). Nd t1 x t2) (λ(t1,t2). g t2 > 6)
           (λ(t1,t2). f t1)]``:
       term

I think we do still need a separate notation to indicate no variables
are to be bound, as

  | .| pat => rhs

won't work when .| is an infix.  Maybe

  | ..| pat => rhs

This would be needed in a situation where all of the variables in pat
have to be read as free (referring to variables occurring in a wider
context).
@mn200
Copy link
Member

mn200 commented Jan 27, 2016

I believe the bulk of this work is done on the pattern_matches branch. The syntax is not turned on by default, but could be done so with a call to patternMatchesLib.ENABLE_PMATCH_CASES. I'd like to have this called in bossLib or similar when all of our integration is complete.

@thtuerk
Copy link
Member Author

thtuerk commented Jan 27, 2016

Sounds great. Thanks.

@mn200
Copy link
Member

mn200 commented Jul 27, 2016

This work on pattern match syntax is now enabled on master. The syntax still needs to be properly documented.

@thtuerk
Copy link
Member Author

thtuerk commented Jan 24, 2017

I believe the new syntax is properly documented now. So I'm closing this issue.

@thtuerk thtuerk closed this as completed Jan 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants