RFC-0051: EXCLUDE Clause #51

alancai98 · 2023-11-11T01:49:35Z

Issue #, if available: partiql/partiql-lang#27

RFC to define the EXCLUDE operator and its semantics in terms of existing PartiQL operators. The current reference implementation is in partiql-lang-kotlin's EvaluatingCompiler (though I'm currently working to port it to the PhysicalPlanCompilerImpl).

Rendered doc

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

RFCs/0051-exclude-operator.adoc

johnedquinn · 2023-11-17T17:58:05Z

RFCs/0051-exclude-operator.adoc

A discussion from the original issue revolves around replacing items rather than just excluding them. A major use-case of PartiQL is using PartiQL as a means of performing transformations on semi-structured, open-schema data. Mentioned in the issue are also customers who have 1000+ columns in their source tables.

From how I've been reading this RFC, we might be able to provide a useful work-around -- at least for top-level values. We can take advantage of the fact that LET evaluates before EXCLUDE. See below:

SELECT t.*, someItemThatHasBeenReplaced
EXCLUDE t.b
FROM t
LET t.b + 1 AS someItemThatHasBeenReplaced

For nested attributes, however, I couldn't immediately find an intuitive solution.
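A rough Python model of the top-level workaround, assuming a row is a plain dict and that LET bindings are computed before EXCLUDE prunes `t.b`. The helper name `project_with_replacement` is hypothetical and not part of any PartiQL implementation:

```python
def project_with_replacement(row: dict) -> dict:
    """Model of: SELECT t.*, replaced EXCLUDE t.b FROM t LET t.b + 1 AS replaced."""
    # LET runs first, so it still sees the original t.b.
    replaced = row["b"] + 1
    # EXCLUDE then prunes t.b from the binding before SELECT projects t.*.
    pruned = {k: v for k, v in row.items() if k != "b"}
    # SELECT t.*, someItemThatHasBeenReplaced
    return {**pruned, "someItemThatHasBeenReplaced": replaced}


print(project_with_replacement({"a": 1, "b": 2, "c": 3}))
# {'a': 1, 'c': 3, 'someItemThatHasBeenReplaced': 3}
```

The replaced value comes back under a new name rather than as `b`, which is why this is only a workaround and not a true REPLACE.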

With this RFC, do you expect any future necessary RFC's to add support for REPLACE? If so, in your opinion, does this RFC impede or allow for the addition of REPLACE?


RFCs/0051-exclude-operator.adoc

johnedquinn · 2023-11-20T19:50:13Z

RFCs/0051-exclude-operator.adoc

+
+Why is `EXCLUDE` modeled as a binding tuple operator as opposed to a value expression?::
+
+We had also considered modeling `EXCLUDE` as a value operation evaluated after the `<select clause>`. Evaluating `EXCLUDE` last could contradict the PartiQL specification's assertion that the `<select clause>` is evaluated last, which may add confusion. There were also some additional edge cases that complicated defining `EXCLUDE` as a value operator. For example, let's look at the following query:


Evaluating EXCLUDE last could contradict the PartiQL specification's assertion that the <select clause> is evaluated last, which may add confusion.

Did you explore EXCLUDE as a value operation (maybe just a special-form) evaluated during the <select clause>? Similar to how any other special-form function invocation would work in the projection list? For example:

SELECT
    t EXCLUDE "a",
    s EXCLUDE d[0]
FROM
    << { 'a': 1, 'b': 2 } >> AS t,
    << { 'c': 3, 'd': [ 0, 1, 2 ] } >> AS s

This would maintain the PartiQL specification's assertion that the <select clause> is evaluated last, and it gives end-users flexibility in when/where they can strip out attributes. Since it is a value expression, it could also potentially be combined with other EXCLUDEs or REPLACEs. Consider a simple example with nested attributes:

SELECT
    person
        REPLACE
            info.age WITH info.age + 1,
            info.ssn_encrypted WITH udf_encrypt(info.ssn) -- some user-defined function to encrypt the SSN
        EXCLUDE info.ssn -- let's exclude the raw SSN
FROM << { 'info': { 'age': 1, 'ssn': 10} } >> AS person

Assuming the expressions are evaluated left-to-right, the output could look like:

<< { 'info': { 'age': 2, 'ssn_encrypted': xy190d...sws8ch } } >>

Did you explore EXCLUDE as a value operation (maybe just a special-form) evaluated during the <select clause>? Similar to how any other special-form function invocation would work in the projection list?

Could you clarify what you mean by special-form function? Is this referring to non-standard function syntax (kinda like what substring, overlay do)?

Regarding modeling EXCLUDE as a value operation/function evaluated within the SELECT clause, we had also explored this option. These aren't quite functions though, since function arguments expect value expressions and exclude paths do not return values.

The way you modeled the expressions above, evaluated left-to-right, could be problematic/confusing. Consider SELECT EXCLUDE t.a[0], EXCLUDE t.a[1].field FROM .... Should the second exclude path remove field from a's original index 1, or from index 1 after evaluating the first exclude path (i.e. originally index 2)? We found it more intuitive to treat the paths as associative and evaluate them in parallel (i.e. apply all exclude paths to the same value/binding).

Yes, meaning non-standard function syntax.

Yes -- but that is the AST, which isn't the same as execution. We don't operate on the AST -- we convert TRIM to $trim_leading and CURRENT_USER to $current_user. The SQL:1999 Specification even talks about the conversion of the syntactic BETWEEN to comparison operators -- the theoretical between AST node disappears and gets replaced. We could just as easily say <lhs> EXCLUDE ( <exclude-arg-1>, <exclude-arg-2> ) gets converted to EXCLUDE(<lhs>, [ <exclude-arg-1-stringified>, <exclude-arg-2-stringified>]). Anyways, it could provide flexibility, and I believe it's possible to model.

We could probably look at BigQuery which expects the RHS to be parenthesized. Ex: SELECT * EXCEPT (a, b). I would imagine this query in BigQuery strips out both a and b simultaneously.

Given PartiQL's flexibility, I'm personally intrigued by the prospect of writing:

SELECT VALUE {
    'full_name': person.l_name || ', ' || person.f_name,
    'encrypted_ssn': udf_to_encrypt(person.ssn),
    'all_other_details': person EXCLUDE (f_name, l_name, ssn)
}
FROM Persons AS person

Especially since tuples are first-class citizens, I find it long overdue for some practical built-ins (beyond TUPLEUNION which PLK doesn't have implemented -- yet). It seems like a potential opportunity to allow for its introduction, especially since SQL's SELECT gets transformed to PartiQL's SELECT VALUE <tuple>.
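To make the value-operator idea concrete, here is a minimal Python sketch of `<lhs> EXCLUDE (a, b, ...)` as a plain function over a tuple, modeled as a dict. The name `exclude_attrs`, the case-insensitive matching, and the pass-through of non-tuple values are assumptions made for illustration; none of them is defined by the RFC or by any PartiQL implementation.

```python
def exclude_attrs(value, names):
    """Return a copy of a tuple (dict) without the named attributes.

    All names are matched against the original tuple at once, so the
    result does not depend on the order in which they are listed.
    """
    if not isinstance(value, dict):
        return value  # assumption: non-tuples pass through unchanged
    drop = {n.lower() for n in names}  # assumption: case-insensitive match
    return {k: v for k, v in value.items() if k.lower() not in drop}


person = {"f_name": "Ada", "l_name": "Lovelace", "ssn": "000-00-0000", "city": "London"}
row = {
    "full_name": person["l_name"] + ", " + person["f_name"],
    "all_other_details": exclude_attrs(person, ["f_name", "l_name", "ssn"]),
}
print(row)
# {'full_name': 'Lovelace, Ada', 'all_other_details': {'city': 'London'}}
```

Because every name is resolved against the same input, this matches the "strip out both simultaneously" reading suggested for BigQuery's EXCEPT.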


alancai98

Thanks for the review, John. Once I get back after the Thanksgiving holiday, I'll follow up on your comments and apply your feedback around variable definitions + IS TUPLE/IS ARRAY in the next revision.

RFCs/0051-exclude-operator.adoc

alancai98 · 2023-11-20T22:40:58Z

RFCs/0051-exclude-operator.adoc

With this RFC, do you expect any future necessary RFC's to add support for REPLACE?

That was my assumption, and the intent is to leave REPLACE out of scope for this PR. REPLACE is included in the "Future possibilities" section of the RFC.

If so, in your opinion, does this RFC impede or allow for the addition of REPLACE?

I need to think more about the relationship between EXCLUDE and REPLACE. I think the syntactic rewrite included in the RFC could be adapted to support REPLACE, so I don't believe this RFC impedes an addition of REPLACE. After I get back from the Thanksgiving holiday, I'll look more into whether the syntactic rewrite approach could be applied to nested attributes of REPLACE.

alancai98 · 2023-11-20T23:55:48Z

RFCs/0051-exclude-operator.adoc

Playing around a bit with the rewrite rules from the RFC, we could do something similar in the nested case branches for REPLACE of nested attributes. For example, using the query from example-tuple-attribute-as-final-step, if we had added the REPLACE clause: REPLACE t.b.field_x AS t.b.field_x * 42, the rewrite could add a WHEN branch like

WHEN LOWER(attr_1) = LOWER('b') THEN
    CASE
        WHEN v_1 IS STRUCT THEN (
            PIVOT (
                CASE
                    WHEN LOWER(attr_2) = LOWER('field_x') THEN v_2 * 42
                    ELSE v_2
                END
            ) AT attr_2
            FROM UNPIVOT v_1 AS v_2 AT attr_2
        )
        ELSE v_1
    END
ELSE v_1

alancai98 · 2023-11-20T23:57:55Z

RFCs/0051-exclude-operator.adoc

The full query could look something like:

-- EXCLUDE t.a.field_x
-- REPLACE t.b.field_x AS t.b.field_x * 42
SELECT t.*
FROM (
    SELECT VALUE {
        't': CASE
            WHEN t IS STRUCT THEN (
                PIVOT (
                    CASE
                        WHEN LOWER(attr_1) = LOWER('a') THEN
                            CASE
                                WHEN v_1 IS STRUCT THEN (
                                    PIVOT v_2 AT attr_2
                                    FROM UNPIVOT v_1 AS v_2 AT attr_2
                                    WHERE LOWER(attr_2) NOT IN [LOWER('field_x')]
                                )
                                ELSE v_1
                            END
                        WHEN LOWER(attr_1) = LOWER('b') THEN
                            CASE
                                WHEN v_1 IS STRUCT THEN (
                                    PIVOT (
                                        CASE
                                            WHEN LOWER(attr_2) = LOWER('field_x') THEN v_2 * 42
                                            ELSE v_2
                                        END
                                    ) AT attr_2
                                    FROM UNPIVOT v_1 AS v_2 AT attr_2
                                )
                                ELSE v_1
                            END
                        ELSE v_1
                    END
                ) AT attr_1
                FROM UNPIVOT t AS v_1 AT attr_1
            )
            ELSE t
        END
    }
    FROM <<
        {
            'a': { 'field_x': 0, 'field_y': 'zero' },  -- `field_x` excluded
            'b': { 'field_x': 1, 'field_y': 'one' },   -- `field_x` replaced with `field_x` * 42
            'c': { 'field_x': 2, 'field_y': 'two' }
        }
    >> AS t
)

which the Kotlin implementation will output as:

<<
    {
        'a': { 'field_y': 'zero' },
        'b': { 'field_x': 42, 'field_y': 'one' },
        'c': { 'field_x': 2, 'field_y': 'two' }
    }
>>
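For readers who want to check the shape of that rewrite without a PartiQL engine, the following is a loose Python analogue, assuming tuples are dicts and each PIVOT ... FROM UNPIVOT pair is a rebuild of the dict. The function name `rewrite_t` is hypothetical; this only mirrors the structure of the query and is not how the Kotlin implementation evaluates it.

```python
def rewrite_t(t):
    """Analogue of the rewrite for EXCLUDE t.a.field_x and
    REPLACE t.b.field_x AS t.b.field_x * 42."""
    if not isinstance(t, dict):  # CASE WHEN t IS STRUCT ... ELSE t
        return t
    out = {}
    for attr_1, v_1 in t.items():  # UNPIVOT t AS v_1 AT attr_1
        if attr_1.lower() == "a" and isinstance(v_1, dict):
            # WHERE LOWER(attr_2) NOT IN [LOWER('field_x')]
            out[attr_1] = {k: v for k, v in v_1.items() if k.lower() != "field_x"}
        elif attr_1.lower() == "b" and isinstance(v_1, dict):
            # CASE WHEN LOWER(attr_2) = LOWER('field_x') THEN v_2 * 42 ELSE v_2
            out[attr_1] = {
                k: (v * 42 if k.lower() == "field_x" else v) for k, v in v_1.items()
            }
        else:
            out[attr_1] = v_1  # ELSE v_1 (no matching branch or type mismatch)
    return out


t = {
    "a": {"field_x": 0, "field_y": "zero"},
    "b": {"field_x": 1, "field_y": "one"},
    "c": {"field_x": 2, "field_y": "two"},
}
print(rewrite_t(t))
```

Running it on the same input yields the same three tuples as the output shown above: `a` without `field_x`, `b` with `field_x` equal to 42, and `c` untouched.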

RFCs/0051-exclude-operator.adoc

alancai98 · 2023-11-21T00:35:00Z

RFCs/0051-exclude-operator.adoc

+
+Why is `EXCLUDE` modeled as a binding tuple operator as opposed to a value expression?::
+
+We had also considered modeling `EXCLUDE` as a value operation evaluated after the `<select clause>`. Evaluating `EXCLUDE` last could contradict the PartiQL specification's assertion that the `<select clause>` is evaluated last, which may add confusion. There were also some additional edge cases that complicated defining `EXCLUDE` as a value operator. For example, let's look at the following query:


Did you explore EXCLUDE as a value operation (maybe just a special-form) evaluated during the <select clause>? Similar to how any other special-form function invocation would work in the projection list?

Could you clarify what you mean by special-form function? Is this referring to non-standard function syntax (kinda like what substring, overlay do)?

Regarding modeling EXCLUDE as a value operation/function evaluated within the SELECT clause, we had also explored this option. These aren't quite functions though, since function arguments expect value expressions and exclude paths do not return values.

The way you modeled the expressions above, evaluated left-to-right, could be problematic/confusing. Consider SELECT EXCLUDE t.a[0], EXCLUDE t.a[1].field FROM .... Should the second exclude path remove field from a's original index 1, or from index 1 after evaluating the first exclude path (i.e. originally index 2)? We found it more intuitive to treat the paths as associative and evaluate them in parallel (i.e. apply all exclude paths to the same value/binding).
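The ambiguity in that example can be made concrete with a small Python sketch. It assumes tuples are dicts, arrays are lists, and an exclude path is a list of keys and indexes relative to `t`. The helper names `exclude_sequential` and `exclude_parallel` are illustrative only.

```python
import copy


def exclude_one(value, path):
    """Return a deep copy of value with the element at path removed."""
    result = copy.deepcopy(value)
    parent = result
    for step in path[:-1]:
        parent = parent[step]
    del parent[path[-1]]
    return result


def exclude_sequential(value, paths):
    """Left-to-right: each path sees the output of the previous one."""
    for path in paths:
        value = exclude_one(value, path)
    return value


def exclude_parallel(value, paths):
    """Every path is resolved against the same original value."""
    drop = {p[0] for p in paths if len(p) == 1}  # paths ending at this level
    nested = {}  # longer paths, grouped by the child they pass through
    for p in paths:
        if len(p) > 1:
            nested.setdefault(p[0], []).append(p[1:])
    if isinstance(value, dict):
        return {k: exclude_parallel(v, nested.get(k, []))
                for k, v in value.items() if k not in drop}
    if isinstance(value, list):
        return [exclude_parallel(v, nested.get(i, []))
                for i, v in enumerate(value) if i not in drop]
    return value


t = {"a": [{"field": 0, "id": "x"}, {"field": 1, "id": "y"}, {"field": 2, "id": "z"}]}
paths = [["a", 0], ["a", 1, "field"]]  # t.a[0] and t.a[1].field

print(exclude_sequential(t, paths))
# {'a': [{'field': 1, 'id': 'y'}, {'id': 'z'}]}
print(exclude_parallel(t, paths))
# {'a': [{'id': 'y'}, {'field': 2, 'id': 'z'}]}
```

The sequential reading strips `field` from the element that was originally at index 2, while the parallel reading strips it from the original index 1, which is the behavior described as more intuitive.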

alancai98

New revision addresses John's comments related to

IS STRUCT -> IS TUPLE, IS LIST -> IS ARRAY and related prose
more consistent variable definitions through the doc
change in multi-exclude path variable definitions
additional subsumption rule for empty steps
typo fixes

RFCs/0051-exclude-operator.adoc

- struct -> tuple
- list -> array
- consistent variable definitions
- empty subsumption rule
- other feedback

am357

I still need to review the rest of the RFC; adding the portion that I have reviewed so far.

RFCs/0051-exclude-operator.adoc

am357 · 2023-12-12T17:54:27Z

RFCs/0051-exclude-operator.adoc

+* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.
+* We require that every fully-qualified `<exclude path>` contain a root and at least one step. If a use case arises to exclude a binding tuple variable, then this functionality can be added.
+* S-expressions are part of the Ion type system.footnote:[https://amazon-ion.github.io/ion-docs/docs/spec.html#sexp]
+PartiQL should support s-expression types and values since PartiQL's type system is a superset over the Ion types. Because the current PartiQL specification does not formally define s-expressions operations, we consider the definition of collection index and wildcard steps on s-expressions as out-of-scope for this RFC.


Perhaps the statement can be less assertive; I know this is one of those hotly debated topics. The spec. says:

PartiQL’s data model extends SQL to Ion’s type system to cover schema-less and nested data. Such values can be directly quoted with `quotes.

So the text can just convey that s-expression semantics as a collection type are not fully defined yet and are hence out of scope.


am357 · 2023-12-12T21:16:14Z

RFCs/0051-exclude-operator.adoc

+
+NOTE: The following rules assume `root~p~=root~q~`.
+
+.Subsumption rules


I know we have Table 1 for the examples, but adding an example for each of the rules would also enhance readability.

am357 · 2023-12-12T21:19:42Z

RFCs/0051-exclude-operator.adoc

+Otherwise, there must be some step at which `p` and `q` diverge. Let's call this step's index `i`.
+
+[[anchor-1c]] Rule 1.c::
+    If `s~i~` is a tuple attribute and `t~i~` is a tuple wildcard and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`.


For this and others that apply: in order for the underlying subsumption to happen (i.e., t~i+1~...t~y~ subsumes s~i+1~...s~x~), the two step sequences should be considered independent <exclude path>s so that each has a root; is that so? If yes, the text may need to clarify that further.

am357 · 2023-12-12T21:26:57Z

RFCs/0051-exclude-operator.adoc

+[[anchor-1d]] Rule 1.d::
+    If `s~i~` is a collection index and `t~i~` is a collection wildcard and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`.
+[[anchor-1e]] Rule 1.e::
+    If `s~i~` is a case-sensitive tuple attribute and `t~i~` is a case-insensitive tuple attribute and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`.


Suggested change

If `s~i~` is a case-sensitive tuple attribute and `t~i~` is a case-insensitive tuple attribute and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`.

Given `i < y and i < x`, if `s~i~` is a case-sensitive tuple attribute and `t~i~` is a case-insensitive tuple attribute and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`.

am357 · 2023-12-12T21:29:29Z

RFCs/0051-exclude-operator.adoc

+|`t.a.b[1].c` |`t.a.b[*]`  |`q` subsumes `p` (by <<anchor-1d, 1.d>> then <<anchor-1a, 1.a>>)
+|`t.a.b[1].c` |`t.a.b[*].c`|`q` subsumes `p` (by <<anchor-1d, 1.d>> then <<anchor-1b, 1.b>>)
+|`t.a."b"`    |`t.a.b`     |`q` subsumes `p` (by <<anchor-1e, 1.e>> then <<anchor-1a, 1.a>>)
+|`t.a."b".c`  |`t.a.b.c`   |`q` subsumes `p` (by <<anchor-1e, 1.e>> then <<anchor-1b, 1.b>>)


It would be good to have a 'No subsumption rule applies' example for a case-sensitivity mismatch.
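As one way to picture the rules and the requested mismatch example, here is a hedged Python sketch of a subsumption check. Rules 1.c, 1.d and 1.e follow the text quoted in this review. The handling of identical steps and of `q` being a prefix of `p` is inferred from the examples table (presumably rules 1.a and 1.b) and is an assumption, as is the requirement in 1.e that the two attribute names match ignoring case. The step encoding and all function names are invented for illustration.

```python
def attr(name, sensitive=False):
    """Tuple attribute step; sensitive=True models a quoted identifier."""
    return ("attr", name, sensitive)


def index(i):
    return ("index", i)


TUPLE_WILDCARD = ("tuple_wildcard",)
COLL_WILDCARD = ("coll_wildcard",)


def step_covers(s, t):
    """True if step t (from q) covers step s (from p)."""
    if s == t:
        return True  # identical steps (assumed)
    if s[0] == "attr" and t[0] == "tuple_wildcard":
        return True  # rule 1.c
    if s[0] == "index" and t[0] == "coll_wildcard":
        return True  # rule 1.d
    if s[0] == "attr" and t[0] == "attr":
        # rule 1.e; name equality up to case is an assumption
        return s[2] and not t[2] and s[1].lower() == t[1].lower()
    return False


def subsumes(q, p):
    """True if path q subsumes path p; both are step lists with the same root."""
    if len(q) > len(p):
        return False  # q reaches deeper than p, so it cannot cover p
    return all(step_covers(s, t) for s, t in zip(p, q))


p = [attr("a"), attr("b"), index(1), attr("c")]          # t.a.b[1].c
print(subsumes([attr("a"), attr("b"), COLL_WILDCARD], p))  # t.a.b[*]  -> True
print(subsumes([attr("a"), attr("b")],                   # q = t.a.b
               [attr("a"), attr("b", sensitive=True)]))  # p = t.a."b" -> True
print(subsumes([attr("a"), attr("b", sensitive=True)],   # q = t.a."b"
               [attr("a"), attr("b")]))                  # p = t.a.b   -> False
```

The last call is the mismatch case: a case-sensitive `q` does not subsume a case-insensitive `p`, so under these assumptions no rule applies.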

am357 · 2023-12-12T22:15:36Z

RFCs/0051-exclude-operator.adoc

+<select clause>
+FROM (
+    SELECT VALUE {
+        'r': -- Apply below rewrite rules for steps `s~1~...s~n~`


Suggested change

'r': -- Apply below rewrite rules for steps `s~1~...s~n~`

'r': -- Apply rewrite rules explained in the following sections for steps `s~1~...s~n~`

am357 · 2023-12-12T22:15:57Z

RFCs/0051-exclude-operator.adoc

+FROM (
+    SELECT VALUE {
+        'r': -- Apply below rewrite rules for steps `s~1~...s~n~`
+        ...  -- Other vars created from the other clauses


Suggested change

... -- Other vars created from the other clauses

... -- Add other variables created from the other clauses using identity function

am357 · 2023-12-12T22:22:55Z

RFCs/0051-exclude-operator.adoc

+----
+
+
+The main idea for rewriting the `EXCLUDE` steps `s~1~,...,s~n~` is to create a nested `CASE` expression for each step, whereby the nested `CASE` expressions for `s~1~,...,s~n-1~` unnest the input binding tuple and the final `CASE` expression for `s~n~` (i.e. the final step) filters out the desired tuple field(s) or collection index(es). Every exclude step has an expected type to process during evaluation. Tuple attribute and wildcard exclude steps expect a tuple. Whereas a collection index expects an array and a collection wildcard expects an array or bag. The `CASE` expression at each level `i` recreates this expected type by including a `WHEN` branch based on the expected type. Each `CASE` expression will include an `ELSE` branch which outputs the previous level's identifier. This set of branches ensures that at evaluation time, if there is a type mismatch (e.g. evaluation value is an array while the exclude step is a tuple attribute), there is no evaluation error and the previous level's value is returned through the `ELSE` branch. This behavior applies to both the permissive and strict typing modes.


Suggested change

The main idea for rewriting the `EXCLUDE` steps `s~1~,...,s~n~` is to create a nested `CASE` expression for each step, whereby the nested `CASE` expressions for `s~1~,...,s~n-1~` unnest the input binding tuple and the final `CASE` expression for `s~n~` (i.e. the final step) filters out the desired tuple field(s) or collection index(es). Every exclude step has an expected type to process during evaluation. Tuple attribute and wildcard exclude steps expect a tuple. Whereas a collection index expects an array and a collection wildcard expects an array or bag. The `CASE` expression at each level `i` recreates this expected type by including a `WHEN` branch based on the expected type. Each `CASE` expression will include an `ELSE` branch which outputs the previous level's identifier. This set of branches ensures that at evaluation time, if there is a type mismatch (e.g. evaluation value is an array while the exclude step is a tuple attribute), there is no evaluation error and the previous level's value is returned through the `ELSE` branch. This behavior applies to both the permissive and strict typing modes.

The main idea for rewriting the `EXCLUDE` steps `s~1~,...,s~n~` is to create a nested `CASE` expression for each step, whereby the nested `CASE` expressions for `s~1~,...,s~n-1~` unnest the input binding tuple and the final `CASE` expression for `s~n~` (i.e. the final step) filters out the desired tuple field(s) or collection index(es). Every exclude step has an expected type to process during evaluation. Tuple attribute and wildcard exclude steps expect a `Tuple`, whereas a collection index expects `Array` and a collection wildcard expects `Array` or `Bag` types. The `CASE` expression at each level `i` recreates this expected type by including a `WHEN` branch based on the expected type. Each `CASE` expression will include an `ELSE` branch which outputs the previous level's identifier. This set of branches ensures that at evaluation time, if there is a type mismatch (e.g. evaluation value is an array while the exclude step is a tuple attribute), there is no evaluation error and the previous level's value is returned through the `ELSE` branch. This behavior applies to both the permissive and strict typing modes.
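A compact Python sketch of the behavior this paragraph describes, under simplifying assumptions: tuples are dicts, arrays and bags are both modeled as lists, attribute steps match case-insensitively, and a type mismatch returns the value unchanged in the way the `ELSE` branch does. The step encoding and the function name `exclude` are hypothetical, and this is a recursive model of the semantics, not the syntactic rewrite itself.

```python
def exclude(value, steps):
    """Remove what steps address inside value; return value unchanged on a type mismatch."""
    step, rest = steps[0], steps[1:]
    kind = step[0]

    if kind in ("attr", "tuple_wildcard"):  # both expect a tuple
        if not isinstance(value, dict):
            return value  # ELSE branch: previous level's value
        if kind == "attr":
            name = step[1].lower()
            matches = lambda k: k.lower() == name
        else:
            matches = lambda k: True
        if not rest:  # final step filters fields out
            return {k: v for k, v in value.items() if not matches(k)}
        return {k: (exclude(v, rest) if matches(k) else v) for k, v in value.items()}

    if kind in ("index", "coll_wildcard"):  # expect an array (or bag for wildcard)
        if not isinstance(value, list):
            return value  # ELSE branch: previous level's value
        if kind == "index":
            matches = lambda i: i == step[1]
        else:
            matches = lambda i: True
        if not rest:  # final step filters elements out
            return [v for i, v in enumerate(value) if not matches(i)]
        return [(exclude(v, rest) if matches(i) else v) for i, v in enumerate(value)]

    raise ValueError(f"unknown step kind: {kind}")


t = {"a": {"x": 1, "y": [{"z": 1, "w": 2}, 5]}}

# t.a.y[*].z : the integer 5 is not a tuple, so it is returned as-is
print(exclude(t, [("attr", "a"), ("attr", "y"), ("coll_wildcard",), ("attr", "z")]))
# {'a': {'x': 1, 'y': [{'w': 2}, 5]}}

# t.a[0] : t.a is a tuple, not an array, so nothing changes and no error is raised
print(exclude(t, [("attr", "a"), ("index", 0)]))
# {'a': {'x': 1, 'y': [{'z': 1, 'w': 2}, 5]}}
```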

RFCs/0051-exclude-operator.adoc

am357 · 2023-12-20T22:15:57Z

RFCs/0051-exclude-operator.adoc

+---
+We first illustrate the rewrite rule for a single `EXCLUDE` path and then explain the syntax rewrite for multiple exclude paths.
+
+==== Step 2 (single): rewrite a single `EXCLUDE` path


Considering that there are many rules, perhaps pseudocode accompanying the text explanation would also be helpful.

almann · 2023-12-20T22:12:28Z

RFCs/0051-exclude-operator.adoc

+=== Out of scope / assumptions
+
+* We restrict tuple attribute exclude steps to use string literals and collection index exclude steps to use int literals. Thus `<exclude paths>` are statically known. We can decide whether to add other exclude paths (e.g. expressions) if a use case arises.
+* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.


I think we might want to have an example of an attribute as a variable.

almann · 2023-12-20T22:14:02Z

RFCs/0051-exclude-operator.adoc

Top-level, let's format these lines to be like 80 or 120 characters wide.

almann · 2023-12-20T22:15:27Z

RFCs/0051-exclude-operator.adoc

+
+* We restrict tuple attribute exclude steps to use string literals and collection index exclude steps to use int literals. Thus `<exclude paths>` are statically known. We can decide whether to add other exclude paths (e.g. expressions) if a use case arises.
+* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.
+* We require that every fully-qualified `<exclude path>` contain a root and at least one step. If a use case arises to exclude a binding tuple variable, then this functionality can be added.


What is the rationale for this limitation? We should put that here.

almann · 2023-12-20T22:17:30Z

RFCs/0051-exclude-operator.adoc

+
+Why is `EXCLUDE` modeled as a binding tuple operator as opposed to a value expression?::
+
+We had also considered modeling `EXCLUDE` as a value operation evaluated after the `<select clause>`. Evaluating `EXCLUDE` last could contradict the PartiQL specification's assertion that the `<select clause>` is evaluated last, which may add confusion. There were also some additional edge cases that complicated defining `EXCLUDE` as a value operator. For example, let's look at the following query:


Cannot be a value operator if you want to ever prune binding tuple variables.

jpschorr · 2023-12-20T22:09:00Z

RFCs/0051-exclude-operator.adoc

+* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.
+* We require that every fully-qualified `<exclude path>` contain a root and at least one step. If a use case arises to exclude a binding tuple variable, then this functionality can be added.
+* S-expressions are part of the Ion type system.footnote:[https://amazon-ion.github.io/ion-docs/docs/spec.html#sexp]
+PartiQL should support s-expression types and values since PartiQL's type system is a superset over the Ion types. Because the current PartiQL specification does not formally define s-expressions operations, we consider the definition of collection index and wildcard steps on s-expressions as out-of-scope for this RFC.


Agreed. This statement makes more assertions about the PartiQL value system than does the spec.

jpschorr · 2023-12-20T22:17:22Z

RFCs/0051-exclude-operator.adoc

+=== Out of scope / assumptions
+
+* We restrict tuple attribute exclude steps to use string literals and collection index exclude steps to use int literals. Thus `<exclude paths>` are statically known. We can decide whether to add other exclude paths (e.g. expressions) if a use case arises.
+* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.
+* We require that every fully-qualified `<exclude path>` contain a root and at least one step. If a use case arises to exclude a binding tuple variable, then this functionality can be added.
+* S-expressions are part of the Ion type system.footnote:[https://amazon-ion.github.io/ion-docs/docs/spec.html#sexp]
+PartiQL should support s-expression types and values since PartiQL's type system is a superset over the Ion types. Because the current PartiQL specification does not formally define s-expressions operations, we consider the definition of collection index and wildcard steps on s-expressions as out-of-scope for this RFC.


I would reword this section. I would avoid usage of terms like 'we' and write it more formally and more tersely.

e.g.,

=== Limitations

* This RFC requires that every fully-qualified `<exclude path>` contain a root and at least one step.
* This RFC restricts tuple attribute exclude steps to use string literals and collection index exclude steps to use int literals. Thus `<exclude paths>` are statically known. This
* This RFC makes no changes to schema and name inference, and assumes that such inference is run as a prerequisite.
* This RFC defines `<exclude paths>` only over `list`, `bag`, and `tuple` value collections.

am357 · 2023-12-20T22:40:16Z

RFCs/0051-exclude-operator.adoc

+db.inventory.find( { status: "A" }, { status: 0, instock: 0 } )
+----
+
+== Unresolved questions


Wondering if only the partial-schema case could be out of scope, i.e. keeping SQL's schema-full case within scope as well?

johnedquinn · 2024-03-20T18:19:47Z

I have come across a scenario where I'd like to express caution. It has to do with the scoping of variables, specifically when EXCLUDE is present. Consider the following simple query:

SELECT
    t.a,
    t.b -- this won't produce anything!
EXCLUDE t.b
FROM t

This makes sense. However, I've been dealing with nested queries, and I'm curious what would happen in the following scenario:

SELECT
    (
        SELECT t2.c + t1.a -- this shouldn't work!
        EXCLUDE t1.a
        FROM t2
    ) AS t1_plus_t2
FROM t1 -- this has columns a and b

In the above, your RFC works great I believe. The query should fail. However, this may cause problems if EXCLUDE is allowed to remove bindings entirely. In my opinion, here is the query (again) with some more information regarding binding environments:

global env = E0 = < t1: <<{a, b}>>, t2: <<{c, d}>> >
SELECT
    (
        SELECT t2.c + t1.a AS x-- input env = E0 || E1 || E2 = < t1: {b}, t2: {c, d} >. Output env = < x >
        EXCLUDE t1.a -- input env = E0 || E1 || E2. Output Env = E0 || E1 || E2 (with some minor eliminations of attributes) = < t1: {b}, t2: {c, d} >
        FROM t2 -- input env = E0 || E1. Output env (E2) = E0 || E1 || < t2: {c, d} > = < t1: {a, b}, t2: {c, d} >
    ) AS t1_plus_t2 -- the whole SELECT (including this projection item subquery) has input env = E0 || E1
FROM t1 -- input env = E0, output env (E1) = E0 || < t1: { a, b } > = < t1: {a, b}, t2: <<{c, d}>> >

If we want to make sure that the inner select does NOT get access to what is being excluded, then we must not allow EXCLUDE to exclude whole bindings (rather than attributes of bindings). If we had a very similar query that excluded an entire binding, we might still allow the projection list to access the original t1.a. See below:

global env = E0 = < t1: <<{a, b}>>, t2: <<{c, d}>> >
SELECT
    (
        SELECT t2.c + t1.a AS x-- input env = E0 || E1 || E3 = < t1: {a, b}, t2: {c, d} >. Output env (E4) = < x >
        EXCLUDE t1 -- input env = E0 || E1 || E2. Output Env (E3) = E0 || E1 || E2 = < t2: {c, d} >
        FROM t2 -- input env = E0 || E1. Output env (E2) = E0 || E1 || < t2: {c, d} > = < t1: {a, b}, t2: {c, d} >
    ) AS t1_plus_t2 -- the whole SELECT (including this projection item subquery) has input env = E0 || E1
FROM t1 -- input env = E0, output env (E1) = E0 || < t1: { a, b } > = < t1: {a, b}, t2: <<{c, d}>> >
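The difference between the two annotated queries can be sketched in Python with `collections.ChainMap`, assuming environment concatenation means "look in the innermost environment first, then fall back to the outer ones" and binding one sample row per variable. The environment contents and helper names are illustrative, and a `KeyError` stands in for whatever failure or MISSING result an engine would produce.

```python
from collections import ChainMap

# Environments from the annotated queries, with one row bound per variable.
E0 = {"t1": [{"a": 1, "b": 2}], "t2": [{"c": 10, "d": 20}]}  # global bags
E1 = {"t1": {"a": 1, "b": 2}}                                # outer FROM t1
E2 = {"t1": {"a": 1, "b": 2}, "t2": {"c": 10, "d": 20}}      # inner FROM t2


def exclude_attribute(bindings, var, attr):
    """EXCLUDE var.attr: the variable stays bound, minus one attribute."""
    pruned = dict(bindings)
    pruned[var] = {k: v for k, v in bindings[var].items() if k != attr}
    return pruned


def exclude_binding(bindings, var):
    """EXCLUDE var: the variable disappears from this environment."""
    return {k: v for k, v in bindings.items() if k != var}


def select_sum(inner_bindings):
    """Evaluate t2.c + t1.a with innermost-first lookup (inner, then E1, then E0)."""
    env = ChainMap(inner_bindings, E1, E0)
    return env["t2"]["c"] + env["t1"]["a"]


# EXCLUDE t1.a: the pruned t1 shadows the outer one, so t1.a is gone.
try:
    select_sum(exclude_attribute(E2, "t1", "a"))
except KeyError as missing:
    print("t1.a not visible:", missing)

# EXCLUDE t1: the lookup falls through to the outer t1, bypassing the exclusion.
print(select_sum(exclude_binding(E2, "t1")))  # 11
```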

Notice that the inner select still received t1 in its entirety due to the concatenation of environments (effectively bypassing the exclusion). One might ask: "That wouldn't be the case. Bindings always come from their inputs. Why is this different?" We might look at another operator that strips out bindings, the aggregation. In other engines, we can still access the outer variables. See the following examples:

alancai98 self-assigned this Nov 11, 2023

alancai98 changed the title ~~[WIP] EXCLUDE clause RFC~~ [WIP] EXCLUDE clause RFC draft Nov 11, 2023

alancai98 force-pushed the exclude-rfc branch 2 times, most recently from 595268b to dc140e4 Compare November 11, 2023 01:52

alancai98 marked this pull request as draft November 11, 2023 01:55

alancai98 changed the title ~~[WIP] EXCLUDE clause RFC draft~~ EXCLUDE clause RFC draft Nov 15, 2023

alancai98 marked this pull request as ready for review November 15, 2023 23:57

alancai98 requested review from jpschorr, almann, am357 and johnedquinn November 15, 2023 23:58

alancai98 changed the title ~~EXCLUDE clause RFC draft~~ RFC-0051: EXCLUDE Clause Nov 16, 2023

alancai98 added the RFC label Nov 16, 2023

alancai98 commented Nov 16, 2023

johnedquinn reviewed Nov 20, 2023

alancai98 commented Nov 21, 2023

alancai98 force-pushed the exclude-rfc branch from 5c5d0da to 2386536 Compare November 27, 2023 23:46

alancai98 commented Nov 27, 2023

alancai98 requested a review from johnedquinn November 27, 2023 23:51

alancai98 added 8 commits December 4, 2023 15:37

[WIP] EXCLUDE clause RFC

219ea6e

Add anchors/refs, subsumption examples, and typo fixes

10eaba1

Add drawbacks, rationale & alternatives, unresolved questions

833dfda

Multiple path edge case, another multi path example, fix typos

4b394d7

Fix other typos

3dfbf94

Add parens around subclauses

efda425

Apply John's feedback

d379a6d

- struct -> tuple
- list -> array
- consistent variable definitions
- empty subsumption rule
- other feedback

Some typo fixes + rebase

cea4b7a

alancai98 force-pushed the exclude-rfc branch from 2386536 to cea4b7a Compare December 5, 2023 00:09

am357 reviewed Dec 19, 2023

View reviewed changes

alancai98 mentioned this pull request Dec 20, 2023

Add EXCLUDE to partiql-eval partiql/partiql-lang-kotlin#1320

Merged

jpschorr reviewed Dec 20, 2023

am357 reviewed Dec 20, 2023

almann reviewed Dec 20, 2023

jpschorr reviewed Dec 20, 2023

am357 reviewed Dec 20, 2023

jpschorr mentioned this pull request Jul 25, 2024

Add Parsing of EXCLUDE partiql/partiql-lang-rust#480

Merged

johnedquinn commented Mar 20, 2024


		Why is `EXCLUDE` modeled as a binding tuple operator as opposed to a value expression?::

		We had also considered modeling `EXCLUDE` as a value operation evaluated after the `<select clause>`. Evaluating `EXCLUDE` last could contradict the PartiQL specification's assertion that the `<select clause>` is evaluated last, which may add confusion. There were also some additional edge cases that complicated defining `EXCLUDE` as a value operator. For example, let's look at the following query:


		NOTE: The following rules assume `root~p~=root~q~`.

		.Subsumption rules

	If `s~i~` is a case-sensitive tuple attribute and `t~i~` is a case-insensitive tuple attribute and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsume the steps following `s~i~`), then `q` subsumes `p`.
	Given `i < y and i < x`, if `s~i~` is a case-sensitive tuple attribute and `t~i~` is a case-insensitive tuple attribute and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsume the steps following `s~i~`), then `q` subsumes `p`.

	'r': -- Apply below rewrite rules for steps `s~1~...s~n~`
	'r': -- Apply rewrite rules explained in the following sections for steps `s~1~...s~n~`

	... -- Other vars created from the other clauses
	... -- Add other variables created from the other clauses using identity function

		----


		The main idea for rewriting the `EXCLUDE` steps `s~1~,...,s~n~` is to create a nested `CASE` expression for each step, whereby the nested `CASE` expressions for `s~1~,...,s~n-1~` unnest the input binding tuple and the final `CASE` expression for `s~n~` (i.e. the final step) filters out the desired tuple field(s) or collection index(es). Every exclude step has an expected type to process during evaluation. Tuple attribute and tuple wildcard exclude steps expect a tuple, whereas a collection index expects an array and a collection wildcard expects an array or bag. The `CASE` expression at each level `i` recreates this expected type by including a `WHEN` branch based on the expected type. Each `CASE` expression also includes an `ELSE` branch that outputs the previous level's identifier. Together, these branches ensure that if there is a type mismatch at evaluation time (e.g. the evaluated value is an array while the exclude step is a tuple attribute), no evaluation error is raised and the previous level's value is returned through the `ELSE` branch. This behavior applies to both the permissive and strict typing modes.
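
The type-directed behavior described above can be illustrated with a small Python sketch. This is not the RFC's rewrite and not the reference implementation; it only mimics the observable effect of the nested `CASE` expressions. The `exclude` function, the `Bag` class, and the tuple encoding of steps are all hypothetical names chosen for this illustration, with Python dicts standing in for tuples and lists for arrays. The handling of a wildcard as the final step (emptying the tuple or collection) is likewise an assumption of the sketch.

```python
class Bag(list):
    """Hypothetical stand-in for a PartiQL bag (an unordered collection)."""


def exclude(value, steps):
    """Apply a list of exclude steps to a value bound to the path's root.

    Each step is a tuple (illustrative encoding, not from the RFC):
      ("attr", name, case_sensitive)  tuple attribute
      ("tuple_wildcard",)             all attributes of a tuple
      ("index", n)                    collection index
      ("coll_wildcard",)              all elements of a collection
    """
    if not steps:
        return value
    step, rest = steps[0], steps[1:]
    is_last = not rest
    kind = step[0]

    # WHEN branch: attribute and tuple wildcard steps expect a tuple (dict).
    if kind == "attr" and isinstance(value, dict):
        name, case_sensitive = step[1], step[2]

        def matches(key):
            return key == name if case_sensitive else key.lower() == name.lower()

        if is_last:
            return {k: v for k, v in value.items() if not matches(k)}
        return {k: exclude(v, rest) if matches(k) else v for k, v in value.items()}

    if kind == "tuple_wildcard" and isinstance(value, dict):
        if is_last:
            return {}
        return {k: exclude(v, rest) for k, v in value.items()}

    # WHEN branch: a collection index expects an array (a list that is not a Bag).
    if kind == "index" and isinstance(value, list) and not isinstance(value, Bag):
        idx = step[1]
        if is_last:
            return [v for i, v in enumerate(value) if i != idx]
        return [exclude(v, rest) if i == idx else v for i, v in enumerate(value)]

    # WHEN branch: a collection wildcard expects an array or a bag.
    if kind == "coll_wildcard" and isinstance(value, list):
        if is_last:
            return type(value)()
        return type(value)(exclude(v, rest) for v in value)

    # ELSE branch: type mismatch, so the value passes through unchanged.
    return value
```

For example, with `row = {"a": {"b": 1, "c": 2}, "d": [10, 20, 30]}`, the steps `[("attr", "a", True), ("attr", "b", True)]` drop only the nested `b`, while `[("attr", "d", True), ("attr", "x", True)]` leave `row` unchanged because `d` holds an array rather than a tuple.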
