New Representation for Pattern Matches - Syntax #286
My suggestion is that we aim for
where a row is something like
(The optional leading …) I am happy to implement syntax along these lines, and I think it should be possible. I don't know exactly what should be done with instances of the case syntax that occur before lists.
In principle, I like your proposed syntax (provided we also add guards to your proposal). However, it would clash with the currently used syntax for case-expressions that get compiled to decision trees. I fear this might break quite a few existing developments, so I believe we should check carefully before deciding. As a test, we could just set the trace … Moreover, I'm uncertain what the semantics of your proposal would be. If no list of bound variables is provided, I expect it would mean that no variables are bound by the pattern. Is that right? This would match the current design of …
No, I was thinking that use of …
I fear that changing the behavior will break a lot, especially the examples. I'm also not too happy about the automatic binding of variables. One of the points of the new representation was to make things very obvious, and automatically guessing the bound variables violates this, in my opinion. Using the old syntax is easy anyhow, and I have implemented that already. It can be activated with the trace …
And then see what happens. I see that there might be a desire not to maintain two representations of pattern matching. But currently, the old version is still heavily used by … @acjf3, how much hassle would a change be for you? Can you estimate that without much experimenting?
Creating the induction theorem is also an important consideration. The …

Konrad.
(In reply to Michael Norrish, Mon, Aug 31, 2015 at 7:08 PM.)
Sure; I don't think any code in TFL would change or need to change. The machinery there would still be using case-constants so that something like
could still compile internally to something involving … My proposal is to try to make …
Ah. So the proposal is to keep the lhs's of TFL definitions as-is, but …

Konrad.
(In reply to Michael Norrish, Mon, Aug 31, 2015 at 8:22 PM.)
Michael wrote:
Yes, that's what I suggested. As an example that sounds troublesome to me (not completely sure), I suggested Anthony's ARM model.
Fair enough, but the main point was (as you seem to agree) that we need the code in TFL (unless we want to put in serious work and change a lot), and this code uses the pattern compilation performed by the parser. What I am mainly arguing for is backward compatibility. At the very least, in my opinion, if we reuse the syntax of the old case-expressions for the new ones, we should provide new syntax for the old ones. However, I prefer to keep the old syntax and just provide new syntax for the new case-expressions. I got the impression (since you did not reply to my suggestions above) that you want to get rid of the old ones altogether. In my opinion, the only reason for this would be if it allowed us to get rid of some code, so that we have a simpler system and less code to maintain. This, however, is not possible, as I tried to point out with the …
I disagree that we need to automatically execute new-style case expressions with computeLib. I never intended to say that. For me, a manual approach would be sufficient: compiling to a decision tree and then adding the result manually sounds fine to me. However, I don't know computeLib well and perhaps I am missing something here. Automatically compiling to a decision tree and adding the resulting theorem to the compset is out of the question, because it would be much too slow. For the red-black-tree example, compilation takes about 10 s.
We do need to be able to execute new-style case-expressions with … If people are going to be making definitions with new-style case expressions, then we should strive to … Of course, I also believe that the current handling of red-black trees with …
I think that printing in the verbose style should certainly be an option, and it should be forced if the bound variables are not equal to the free variables (just as with the printing of set comprehensions). Otherwise, I do not think it should be the default. Many, many SML, Haskell and OCaml programmers are perfectly comfortable with “implicit” bound variables, so I don't think we should confront them with verbose forms if they're not necessary.
Why I don't like guessing bound variables

The main trouble I have with automatically marking all variables occurring in a pattern as bound is that it is rather tricky to implement. Before type-checking, the free variables need to be turned into bound ones. Otherwise, variables with the same name in different rows, or bound variables and context variables with the same name, are forced to have the same type. This leads to unexpected problems. We had these issues for quite some time with … So, we need to start modifying preterms. This is a deep change and hard to implement. Look at what a mess the current parsing of case-expressions is (even without pattern compilation). The point of my library was to get rid of this mess and make parsing simpler. So, I don't want preterm hackery in there.
Fair enough, but then the new-style case-expressions can't be presented as typical functional programming style. They just won't work with computeLib. Not a chance.
There is nothing special about the red-black-tree example. In fact, it is not too complicated. There is just a lot of heavy machinery involved. This can still be optimised for speed. But it is always going to be slow on non-trivial inputs.
I don't believe there is horrible run-time behaviour, at least not for computeLib. Only the proofs look verbose and the generated ML code is bad. But decision trees are perfect for computeLib.
Set comprehensions are, for me, a perfect example of why not to do it. They are a nightmare in HOL and hardly usable if the bound variables are not listed explicitly. It is very hard to predict which variables are bound and which ones are free; it depends on the context of the term. This caused me serious headaches while working with Lem. If it were up to me, I would get rid of the version without an explicit list of bound variables.
But we are not doing functional programming here. This is reinforced by the computeLib discussion. So, for me, presenting things that are different in exactly the same form is more confusing than just acknowledging the difference. I propose something more verbose, but for good reasons: it is more trustworthy (because it avoids some arcane parser magic), easier to read, and the result of parsing it is more predictable.
@mn200 I still have not understood why you want to replace the old-style case-expressions. This is going to cause a lot of trouble, as the discussions above already indicate. I never intended to replace the old style, just to provide an alternative. In fact, I like the old style for certain problems. So, why get rid of it?
Another argument against using new-style case-expressions with computeLib

Michael argued:
However, not all new-style case expressions are easily computable, and certainly not all can easily be compiled to a decision tree. As a comparatively easy example, look at the following definition of the cardinality of sets.
I don't see how this can be automatically compiled to something that works with computeLib.
which defines … So, we have two choices. Either we acknowledge that definitions using new-style case-expressions won't automatically work with computeLib, or we define some subsets that do. The latter approach is confusing to the user and takes a lot of runtime trying to compile to decision trees. In my opinion, we can leave that job to the user and not try to do it automatically.
Having variables implicitly bound inside case expression patterns is not confusing, as witnessed by many, many functional programmers successfully using exactly this form of syntax. Moreover, users are not going to be confused even if they choose not to have the verbose form pretty-printed back to them, because bound variables will appear in green. The only possible confusion when parsing a pattern is over what is a constant and what is a variable. And again, this will be resolved instantly when terms are pretty-printed, because the variables will be green and the constants will be black.
This is not difficult, is not hard to implement, and has been done already. Moreover, I am happy to do it again (inasmuch as this will require any change at all, which I doubt). The complexity of getting similar things to work with …
I am not committed to "getting rid" of old style case expressions. I am committed to not having too many syntaxes. In particular, I absolutely want to avoid the situation where we have two flavours of case expression and have to document just why and how one would prefer one over the other. If that requires two implementations of pattern-resolution, that's totally fine. (Let's ignore the situation with what happens before
in … I'd like to have the new semantics attached to the old syntax so that users can write code just as before, and can also add guards, non-linear patterns and the rest as the fancy takes them. The worst-case scenario is that they can either have nice syntax and confusing semantics with respect to things like additional rows being added (as at the moment), or horrible syntax and cool features. I'm sure you're right that there will be scenarios where people will want the old semantics. In this case, they should simply adjust the trace variable as necessary so that the parser will change back to the old behaviour. This should be an expert option, mostly provided for the sake of backwards compatibility. I haven't made my point about … Of course, it has also always been possible to write definitions that do not evaluate well. (I can …
In sum, I do not think this will cause a "lot of trouble", and I think it will result in a better system.
I have completely different opinions. For me, colour is an aid, but I should not need to see the colour a term is printed in to be able to parse it. We are not doing functional programming, and a comparison with functional programming does not make sense, since we are in a richer setting. Therefore, we need to present the features that are not available in functional programming clearly. Setting up the parser might not be difficult for Michael, but for me, unfortunately, it is. I played around with it for 3 or 4 days when I started with the pattern-matching example. It took me a long time to understand how it works, and I'm still not sure I really do. I then tried to copy the code for my PMATCH expressions and failed. For an expert like Michael, this might not be difficult, but I at least have serious trouble understanding the parser, and there are a lot of special hooks and special code to make cases work. I believe that Michael's proposal will cause a lot of trouble. Luckily, explicit case expressions are not that commonly used. However, their behaviour would change a lot with the proposal. The main difference is how to evaluate the resulting terms. Having … I strongly believe that the old case-expressions are useful. I want to be able to use them in certain situations. They might be handy for code generation (depending on circumstances) as well as for evaluation. By virtue of their syntax, they are always exhaustive and have non-overlapping cases. Their semantics is really straightforward. I like them. That's why a design principle behind my new pattern-match example was to have both representations and to be able to switch between them easily. So, I don't like getting rid of them, which I still believe Michael's proposal essentially aims at, even if they would remain available as a fallback for expert users. @mn200 Currently, we seem stuck. We seem to have different expectations about how much the change would affect.
And I fear you have much too high hopes for my new representation and don't understand what it can and can't do well. However, we discussed two tests to check our expectations against reality. If you want to convince me, why not actually test Anthony's example, and if that goes through, then move the example into … Anyhow, I will write a compromise proposal in a separate message. Perhaps we can meet somewhere in the middle.
Compromise proposal

I of course like my syntax (otherwise I would not have implemented it :-).
However, I appreciate that
How about using essentially the same syntax as for the existing case-expressions (with some extensions), but with capital keywords instead? So, the example would look like
I would also like explicit bound variable syntax available. The user can choose what to use. If we do fancy, complicated parsing then perhaps also for the bound vars. So the example could as well be written like
The old-style case expressions I would keep as they are.

Discussion

With the proposal, the user has the choice of which case-expressions to use. The pretty-printed result would be clearly visible as either an old-style or a new-style case-expression. They would be similar enough in syntax, though, that switching is very easy for users. Essentially, one can take an old-style case-expression, replace … Since we only add extra syntax, existing code should not be affected. We can play around step by step with changing existing case expressions and see what happens. We can then decide later, with more information available, what to do about …
Hi Thomas,

This discussion is becoming obscure. On the one hand, case expressions in …

Side note: I think it is a bad idea to have multiple case-translations and …

Konrad.
(In reply to Thomas Tuerk, Tue, Sep 1, 2015 at 10:10 PM.)
Hi Konrad, yes, the discussion is getting obscure and I need to elaborate. Sorry. I will start from the beginning and repeat things you know for the sake of clarity.

Current case-expressions

Currently, we can write case-expressions in the input. These are compiled to a decision tree as part of parsing. The input expressions can be non-exhaustive and have overlapping rows. However, the resulting decision tree, i.e. the representation inside the logic, does not. For missing patterns, a non-exhaustive case-expression defaults to …

Define …
On the RHS, case-expressions are removed by the parser and …

New Case-Expressions

The new case-expressions represent pattern matches directly in the logic. Therefore, they can represent overlapping rows and non-exhaustive matches. Of course, all functions in HOL are total. Therefore, a non-exhaustive pattern match defaults to … The new case-expressions are defined with existential quantification and Hilbert's choice. They are in general not executable. There is some heavy machinery to evaluate and simplify them. This works well, but providing general theorems for …

Clarification of discussion

The pattern compilation for old case-expressions is used by …
(@konrad-slind In case Thomas's comment above wasn't clarification enough, you may need to be aware that this issue is about a new, better representation for case expressions, which are not necessarily exhaustive or non-overlapping. See the paper in ITP'15.)
@xrchz I just believe that it is a new, different representation. It has advantages and disadvantages. It is not a "better" representation in my opinion. That's why I'm so reluctant to make the change Michael proposes, and also why I made sure the user can easily switch between representations.
I am happy to assume that big complicated examples like Anthony's will break given a new treatment of case-expressions. For this reason, I agree that we need to allow a fall-back to the old behaviour. Given that we already have code implementing the old behaviour, I don't see this as likely to cause a problem. Of course, we should support explicit bound variable syntax. I am not suggesting the opposite. It is absolutely necessary for some examples, and it's a very nice feature. Naturally, that same syntax would get printed back to the user if it is necessary to make sense of the input. I am only additionally asserting that we should allow the bound variables to be omitted when specifying them is not required. If the additional precision with respect to bound variables is not required, there are good reasons to suppose that users will not be confused:
@mn200 I disagree, but this is a minor issue and perhaps we should take it offline. I think I have evidence that it is confusing. Anyhow, as already stated in my original comment, I don't like this behaviour, but I don't object too strongly. My compromise proposal concedes that point and follows your view (at the price of a complicated parser). So perhaps we can take this discussion offline and concentrate on the main issues. For me, the real point of difference is whether to replace the old case-expression syntax. I don't like that. As proposed, I would keep it and add new syntax for the new case-expressions. At the very least, if we reuse the old syntax for the new ones, I would provide new syntax for the old case-expressions. A flag to change parsing is not good enough; I would then, for example, have trouble distinguishing the internal representations while pretty-printing. Moreover, if we change the default behaviour, the tools need to be much more robust, and issues like the one with …
How about this:
In other words, you never get … I believe it's clear from our experience with things like …
So, to be clear. You would parse old case-expressions with something like
If that is the case, I'm happy. In my opinion, the two traces provide all the backwards compatibility needed. And unfortunately, you are right about the need to push users. How about … I would even go so far as having … That, however, leaves the question of what to do about …
Would it be possible to figure out some conversions that could be added to …
Essentially, evaluation has to get rid of existential quantification and Hilbert's choice. Even in the simple examples, we need to show the injectivity of patterns to blow them away. I have no idea how this could be achieved with the simple, fast rewriting mechanisms provided by …
Yes, something like the syntax you suggest. The name I'm happy with the infix Unfortunately, I don't think a simple
In both cases, the separator token can remain optional when preceding the first case.
As for
let alone the somewhat more complicated ones that arise once injectivity of constructors has been dealt with. For the moment, we just have to deal with it. However, it does seem as if we should be able to add the right conversions automatically. The constructor injectivity theorems can be in there easily (and automatically as types are defined), so the question is handling what remains after they've fired.
Of the two syntax alternatives, I prefer the second one. For …
I read the paper some time ago. It's nice technology. The question is …

Since (I assert) one never uses case expressions except when making …

Konrad.
(In reply to Ramana Kumar, Tue, Sep 1, 2015 at 11:43 PM.)
Sorry, I'm confused. I don't see why the core of the pattern matching paper is already present in … What do you mean by "one never uses case expressions except when making definitions"? Just as with any other function or construct in the logic, they are used in definitions, but need to be dealt with when the definition is used. Clearly, they are useful for implementing …
Sorry about the confusion; I was in a rush. I am still in a rush. The simple point I was making is that, for someone …

This is roughly your argument to Michael, right? I don't see any …

The other point is merely how/whether to merge function definition …

Konrad.
(In reply to Thomas Tuerk, Wed, Sep 2, 2015 at 11:58 AM.)
Sorry, I still don't get you.
Ok.
No, I did not intend to say that. In fact I don't see how that fits in the discussion above. Sorry, confused.
Yes, that is sensible and actually I planned that from the very beginning when I started working on the new pattern representation. It was part of the motivation. See also issue #289. But this seems to me orthogonal to our discussion.
These tools are available already. This was the main point of the library. Sorry, I still don't get your drift. The main goals of the library were:
As a side-effect, we also ended up with more expressive case expressions. The discussion with …
Time for me to give this a rest. I will take it up again with Michael
(and Ramana, I hope!) in Sydney in a few weeks time.
Thanks,
Konrad.
Ok. Thanks. Sorry if I was being stupid in not getting you. It was a long day for me and I'm quite tired. Sorry.
Use ||| instead of || everywhere. This avoids the conflict with n-bit words and therefore allows HOL 4 to be built again. This is a temporary measure until Michael implements the changes discussed in issue #286. Since HOL 4 now builds again, it is easier to test changes to e.g. EmitML.
@mn200 I just pushed commit b1bc3c9, which implements some syntactic checks for …
Syntax as described in github issue #286 parses. Examples:

    > ``case x of 0 => 3 | SUC n when n < 10 => n + 1``;
    val it = ``PMATCH x [PMATCH_ROW (λ_. 0) (K T) (λ_. 3); PMATCH_ROW (λn. SUC n) (λn. n < 10) (λn. n + 1)]``: term
    > Datatype `tree = Lf | Nd tree 'a tree`;
    <<HOL message: Defined type: "tree">>
    val it = (): unit
    > ``case t of Lf => 0 | Nd t1 x t2 => x + f t1``;
    val it = ``PMATCH t [PMATCH_ROW (λ_. Lf) (K T) (λ_. 0); PMATCH_ROW (λ(t1,x,t2). Nd t1 x t2) (K T) (λ(t1,x,t2). x + f t1)]``: term
    > ``x + case t of Lf => 0 | x .| Nd t1 x t2 => f t1``;
    <<HOL message: inventing new type variable names: 'a>>
    val it = ``x + PMATCH t [PMATCH_ROW (λ_. Lf) (K T) (λ_. 0); PMATCH_ROW (λx. Nd t1 x t2) (K T) (λx. f t1)]``: term
    > ``x + case t of Lf => 0 | t1,t2 .| Nd t1 x t2 => f t1``;
    val it = ``x + PMATCH t [PMATCH_ROW (λ_. Lf) (K T) (λ_. 0); PMATCH_ROW (λ(t1,t2). Nd t1 x t2) (K T) (λ(t1,t2). f t1)]``: term
    > ``x + case t of Lf => 0 | t1,t2 .| Nd t1 x t2 when g t2 > 6 => f t1``;
    val it = ``x + PMATCH t [PMATCH_ROW (λ_. Lf) (K T) (λ_. 0); PMATCH_ROW (λ(t1,t2). Nd t1 x t2) (λ(t1,t2). g t2 > 6) (λ(t1,t2). f t1)]``: term

I think we do still need a separate notation to indicate that no variables are to be bound, as

    | .| pat => rhs

won't work when .| is an infix. Maybe

    | ..| pat => rhs

This would be needed in a situation where all of the variables in pat have to be read as free (referring to variables occurring in a wider context).
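As an illustration only, here is a small Python sketch of the first-match semantics these PMATCH terms are meant to have, as described in this thread: the first row whose pattern matches and whose guard holds gives the result, and if no row applies the value is unspecified (modelled here as None, standing in for HOL's ARB). It is not HOL code, and every name in it (pmatch, suc_example, tree_example, Lf, Nd) is hypothetical. In the logic, a row only asserts that some choice of its bound variables produces the matched value, and Hilbert choice picks that choice; Python cannot evaluate that over an infinite domain, so this sketch assumes each row comes with an explicit matcher that returns the bound variables or None.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple

ARB = None  # stand-in for HOL's unspecified value ARB (no row applies)

# A row models PMATCH_ROW pat guard rhs. Instead of a pattern lambda that HOL
# would have to invert, each row carries an explicit matcher returning the
# tuple of bound variables, or None if the value does not match (assumption).
Row = Tuple[Callable[[Any], Optional[tuple]], Callable[..., bool], Callable[..., Any]]


def pmatch(v: Any, rows: Sequence[Row]) -> Any:
    """First-match semantics: the first row whose pattern matches and whose
    guard holds determines the result; later, overlapping rows are ignored."""
    for matcher, guard, rhs in rows:
        bound = matcher(v)
        if bound is not None and guard(*bound):
            return rhs(*bound)
    return ARB


# case x of 0 => 3 | SUC n when n < 10 => n + 1   (naturals modelled as ints)
def suc_example(x: int) -> Any:
    return pmatch(x, [
        (lambda v: () if v == 0 else None, lambda: True, lambda: 3),
        (lambda v: (v - 1,) if v > 0 else None, lambda n: n < 10, lambda n: n + 1),
    ])


# Hypothetical model of: Datatype `tree = Lf | Nd tree 'a tree`
@dataclass(frozen=True)
class Lf:
    pass


@dataclass(frozen=True)
class Nd:
    left: Any
    value: Any
    right: Any


# Models only the case part of:
#   x + case t of Lf => 0 | t1,t2 .| Nd t1 x t2 when g t2 > 6 => f t1
# Only t1 and t2 are bound; x comes from the surrounding context, so the
# second row matches only nodes whose value equals that x.
def tree_example(x: Any, t: Any, f: Callable, g: Callable) -> Any:
    return pmatch(t, [
        (lambda v: () if isinstance(v, Lf) else None, lambda: True, lambda: 0),
        (lambda v: (v.left, v.right) if isinstance(v, Nd) and v.value == x else None,
         lambda t1, t2: g(t2) > 6,
         lambda t1, t2: f(t1)),
    ])
```

For example, suc_example(5) gives 5, while suc_example(20) falls through every row and yields the ARB stand-in. The explicit matchers are exactly the part HOL does not get for free: turning a pattern such as λ(t1,t2). Nd t1 x t2 into a matcher requires the constructor-injectivity reasoning discussed earlier in the thread.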
I believe the bulk of this work is done on the pattern_matches branch. The syntax is not turned on by default, but it can be turned on with a call to …
Sounds great. Thanks.
This work on pattern match syntax is now enabled on …
I believe the new syntax is properly documented now, so I'm closing this issue.
This is a subtask of issue #285.
A PMATCH expression for the expression MEM x l is

The following syntax is available for parsing and pretty-printing:

There are several issues with this:
- the ; at the end of each line looks odd
- the || notation clashes with or for words
- bound variables have to be listed explicitly

The last issue needs discussion. I personally like this notation. It makes everything clearer, in my opinion. However, I have to admit that it might become tedious to use. As a minor mitigation, the input syntax ||! can be used. It marks all variables occurring in the pattern as bound. However, this happens after type-checking, which can have unexpected effects: it might link bound variables to bound variables of other rows or to free variables of the context. One might work on pre-terms instead, but this is much harder to implement.

It would be great if some ideas could be exchanged, and perhaps, once we have decided on the goals, someone could help me get the parser working, if needed.