RFC-0051: EXCLUDE Clause #51
595268b
to
dc140e4
Compare
A discussion from the original issue revolves around replacing items rather than just excluding them. A major use-case of PartiQL is using PartiQL as a means of performing transformations on semi-structured, open-schema data. Mentioned in the issue are also customers who have 1000+ columns in their source tables.
From how I've been reading this RFC, we might be able to provide a useful work-around -- at least for top-level values. We can take advantage of the fact that `LET` evaluates before `EXCLUDE`. See below:
SELECT t.*, someItemThatHasBeenReplaced
EXCLUDE t.b
FROM t
LET t.b + 1 AS someItemThatHasBeenReplaced
For nested attributes, however, I couldn't immediately find an intuitive solution.
With this RFC, do you expect any future necessary RFCs to add support for `REPLACE`? If so, in your opinion, does this RFC impede or allow for the addition of `REPLACE`?
> With this RFC, do you expect any future necessary RFC's to add support for `REPLACE`?
That was my assumption, and to leave `REPLACE` out of scope for this PR. `REPLACE` is included in the "Future possibilities" section of the RFC.
> If so, in your opinion, does this RFC impede or allow for the addition of REPLACE?
I need to think more about the relationship between `EXCLUDE` and `REPLACE`. I think the syntactic rewrite included in the RFC could be adapted to support `REPLACE`, so I don't believe this RFC impedes an addition of `REPLACE`. After I get back from the Thanksgiving holiday, I'll look further into whether the syntactic rewrite approach could be applied to nested attributes of `REPLACE`.
Playing around a bit with the rewrite rules from the RFC, we could do something similar in the nested case branches for `REPLACE` of nested attributes. For example, using the query from example-tuple-attribute-as-final-step, if we had added the `REPLACE` clause `REPLACE t.b.field_x AS t.b.field_x * 42`, the rewrite could add a `WHEN` branch like
WHEN LOWER(attr_1) = LOWER('b') THEN
    CASE
        WHEN v_1 IS STRUCT THEN (
            PIVOT (
                CASE
                    WHEN LOWER(attr_2) = LOWER('field_x') THEN v_2 * 42
                    ELSE v_2
                END
            ) AT attr_2
            FROM UNPIVOT v_1 AS v_2 AT attr_2
        )
        ELSE v_1
    END
ELSE v_1
The full query could look something like:
-- EXCLUDE t.a.field_x
-- REPLACE t.b.field_x AS t.b.field_x * 42
SELECT t.*
FROM (
    SELECT VALUE {
        't':
            CASE
                WHEN t IS STRUCT THEN (
                    PIVOT (
                        CASE
                            WHEN LOWER(attr_1) = LOWER('a') THEN
                                CASE
                                    WHEN v_1 IS STRUCT THEN (
                                        PIVOT v_2 AT attr_2
                                        FROM UNPIVOT v_1 AS v_2 AT attr_2
                                        WHERE LOWER(attr_2) NOT IN [LOWER('field_x')]
                                    )
                                    ELSE v_1
                                END
                            WHEN LOWER(attr_1) = LOWER('b') THEN
                                CASE
                                    WHEN v_1 IS STRUCT THEN (
                                        PIVOT (
                                            CASE
                                                WHEN LOWER(attr_2) = LOWER('field_x') THEN v_2 * 42
                                                ELSE v_2
                                            END
                                        ) AT attr_2
                                        FROM UNPIVOT v_1 AS v_2 AT attr_2
                                    )
                                    ELSE v_1
                                END
                            ELSE v_1
                        END
                    ) AT attr_1 FROM UNPIVOT t AS v_1 AT attr_1
                )
                ELSE t
            END
    }
    FROM <<
        {
            'a': { 'field_x': 0, 'field_y': 'zero' }, -- `field_x` excluded
            'b': { 'field_x': 1, 'field_y': 'one' }, -- `field_x` replaced with `field_x` * 42
            'c': { 'field_x': 2, 'field_y': 'two' }
        }
    >> AS t
)
, which the Kotlin implementation will output as:
<<
    {
        'a': {
            'field_y': 'zero'
        },
        'b': {
            'field_x': 42,
            'field_y': 'one'
        },
        'c': {
            'field_x': 2,
            'field_y': 'two'
        }
    }
>>
Why is `EXCLUDE` modeled as a binding tuple operator as opposed to a value expression?::

We had also considered modeling `EXCLUDE` as a value operation evaluated after the `<select clause>`. Evaluating `EXCLUDE` last could contradict the PartiQL specification's assertion that the `<select clause>` is evaluated last, which may add confusion. There were also some additional edge cases that complicated defining `EXCLUDE` as a value operator. For example, let's look at the following query:
> Evaluating `EXCLUDE` last could contradict the PartiQL specification's assertion that the `<select clause>` is evaluated last, which may add confusion.
Did you explore `EXCLUDE` as a value operation (maybe just a special-form) evaluated during the `<select clause>`? Similar to how any other special-form function invocation would work in the projection list? For example:
SELECT t EXCLUDE "a", s EXCLUDE d[0]
FROM <<
{ 'a': 1, 'b': 2 }
>> AS t, <<
{ 'c': 3, 'd': [ 0, 1, 2 ] }
>> AS s
This would maintain the PartiQL specification's assertion that the `<select clause>` is evaluated last, and it gives end-users flexibility in when/where they can strip out attributes. Since it is a value expression, it could also potentially be combined with other `EXCLUDE`s or `REPLACE`s. Consider a simple example with nested attributes:
SELECT
person
REPLACE
info.age WITH info.age + 1,
info.ssn_encrypted WITH udf_encrypt(info.ssn) -- some user-defined function to encrypt the SSN
EXCLUDE info.ssn -- let's exclude the raw SSN
FROM <<
{ 'info': { 'age': 1, 'ssn': 10} }
>> AS person
Assuming the expressions are evaluated left-to-right, the output could look like:
<<
{ 'info': { 'age': 2, 'ssn_encrypted': xy190d...sws8ch } }
>>
> Did you explore `EXCLUDE` as a value operation (maybe just a special-form) evaluated during the `<select clause>`? Similar to how any other special-form function invocation would work in the projection list?
Could you clarify what you mean by special-form function? Is this referring to non-standard function syntax (kinda like what `substring`, `overlay` do)?
Regarding modeling `EXCLUDE` as a value operation/function evaluated within the `SELECT` clause, we had also explored this option. These aren't quite functions though, since function arguments expect value expressions and exclude paths do not return values.
The above way you had modeled expressions evaluated left-to-right could be problematic/confusing. Consider `SELECT EXCLUDE t.a[0], EXCLUDE t.a[1].field FROM ...`. Should the second exclude path remove `field` from the element at `a`'s original index `1`, or from the element at index `1` after evaluating the first exclude path (i.e. originally index `2`)? We found it more intuitive to consider the paths as associative and evaluate them in parallel (i.e. applying all exclude paths on the same value/binding).
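To make the ambiguity concrete, here is a minimal Python sketch (not PartiQL, and not taken from the RFC) that applies the two paths `a[0]` and `a[1].field` both ways. The function names and the sample array are hypothetical; Python lists and dicts stand in for PartiQL arrays and tuples.

```python
import copy


def exclude_sequential(a):
    """Apply `a[0]`, then `a[1].field`, each against the result of the previous path."""
    a = copy.deepcopy(a)
    del a[0]                 # first path: drop index 0
    a[1].pop("field", None)  # second path now hits what was originally index 2
    return a


def exclude_parallel(a):
    """Resolve both paths against the original array, then build the result once."""
    result = []
    for i, elem in enumerate(a):
        if i == 0:           # path a[0]: drop the element
            continue
        if i == 1:           # path a[1].field: drop the attribute
            elem = {k: v for k, v in elem.items() if k != "field"}
        result.append(elem)
    return result


a = [{"id": 0, "field": "x"}, {"id": 1, "field": "y"}, {"id": 2, "field": "z"}]
print(exclude_sequential(a))  # [{'id': 1, 'field': 'y'}, {'id': 2}]
print(exclude_parallel(a))    # [{'id': 1}, {'id': 2, 'field': 'z'}]
```

The parallel result does not depend on the order in which the paths are written, which is the property argued for above.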
- Yes, meaning non-standard function syntax.
- Yes -- but that is the AST, which isn't the same as execution. We don't operate on the AST -- we convert `TRIM` to `$trim_leading` and `CURRENT_USER` to `$current_user`. The SQL:1999 Specification even talks about the conversion of the syntactic `BETWEEN` to comparison operators -- the theoretical `between` AST node disappears and gets replaced. We could just as easily say `<lhs> EXCLUDE ( <exclude-arg-1>, <exclude-arg-2> )` gets converted to `EXCLUDE(<lhs>, [ <exclude-arg-1-stringified>, <exclude-arg-2-stringified>])`. Anyways, it could provide flexibility, and I believe it's possible to model.
- We could probably look at BigQuery which expects the RHS to be parenthesized. Ex: `SELECT * EXCEPT (a, b)`. I would imagine this query in BigQuery strips out both `a` and `b` simultaneously.
Given PartiQL's flexibility, I'm personally intrigued by the prospect of writing:
SELECT VALUE {
    'full_name': person.l_name || ', ' || person.f_name,
    'encrypted_ssn': udf_to_encrypt(person.ssn),
    'all_other_details': person EXCLUDE (f_name, l_name, ssn)
} FROM Persons AS person
Especially since tuples are first-class citizens, I find it long overdue for some practical built-ins (beyond `TUPLEUNION` which PLK doesn't have implemented -- yet). It seems like a potential opportunity to allow for its introduction, especially since SQL's `SELECT` gets transformed to PartiQL's `SELECT VALUE <tuple>`.
Cannot be a value operator if you want to ever prune binding tuple variables.
Thanks for the review, John. Once I get back after the Thanksgiving holiday, I'll follow up on your comments and apply your feedback around variable definitions and `IS TUPLE`/`IS ARRAY` in the next revision.
5c5d0da
to
2386536
Compare
New revision addresses John's comments related to
- `IS STRUCT` -> `IS TUPLE`, `IS LIST` -> `IS ARRAY` and related prose
- more consistent variable definitions through the doc
- change in multi-exclude path variable definitions
- additional subsumption rule for empty steps
- typo fixes
- struct -> tuple - list -> array - consistent variable definitions - empty subsumption rule - other feedback
2386536
to
cea4b7a
Compare
I still need to review the rest of RFC; adding the portion that I have reviewed so far.
* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.
* We require that every fully-qualified `<exclude path>` contain a root and at least one step. If a use case arises to exclude a binding tuple variable, then this functionality can be added.
* S-expressions are part of the Ion type system.footnote:[https://amazon-ion.github.io/ion-docs/docs/spec.html#sexp]
PartiQL should support s-expression types and values since PartiQL's type system is a superset over the Ion types. Because the current PartiQL specification does not formally define s-expressions operations, we consider the definition of collection index and wildcard steps on s-expressions as out-of-scope for this RFC.
Perhaps the statement can be less assertive; I know this is one of those hotly debated topics. The spec says:
> PartiQL’s data model extends SQL to Ion’s type system to cover schema-less and nested data. Such values can be directly quoted with `quotes.
So the text can simply convey that the semantics of s-expressions as a collection type are not fully defined yet, and hence are out of scope.
Agreed. This statement makes more assertions about the PartiQL value system than does the spec.
NOTE: The following rules assume `root~p~=root~q~`.

.Subsumption rules
I know we have Table 1 for the examples, but adding an example for each of the rules would also enhance readability.
Otherwise, there must be some step at which `p` and `q` diverge. Let's call this step's index `i`.

[[anchor-1c]] Rule 1.c::
If `s~i~` is a tuple attribute and `t~i~` is a tuple wildcard and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`.
For this and others that apply: in order for the underlying subsumption to happen (i.e., `t~i+1~...t~y~` subsumes `s~i+1~...s~x~`), the two should be considered as independent `<exclude path>`s so that they can have a root; is that so? If yes, the text may need to clarify that further.
[[anchor-1d]] Rule 1.d::
If `s~i~` is a collection index and `t~i~` is a collection wildcard and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`.
[[anchor-1e]] Rule 1.e::
If `s~i~` is a case-sensitive tuple attribute and `t~i~` is a case-insensitive tuple attribute and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`.
Suggested change (replacing the line above):
Given `i < y and i < x`, if `s~i~` is a case-sensitive tuple attribute and `t~i~` is a case-insensitive tuple attribute and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`.
|`t.a.b[1].c` |`t.a.b[*]` |`q` subsumes `p` (by <<anchor-1d, 1.d>> then <<anchor-1a, 1.a>>)
|`t.a.b[1].c` |`t.a.b[*].c`|`q` subsumes `p` (by <<anchor-1d, 1.d>> then <<anchor-1b, 1.b>>)
|`t.a."b"` |`t.a.b` |`q` subsumes `p` (by <<anchor-1e, 1.e>> then <<anchor-1a, 1.a>>)
|`t.a."b".c` |`t.a.b.c` |`q` subsumes `p` (by <<anchor-1e, 1.e>> then <<anchor-1b, 1.b>>)
It would be good to have a "No subsumption rule applies" example for a case-sensitive mismatch.
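As a rough illustration of how rules 1.c, 1.d and 1.e compose, the following Python sketch checks subsumption step by step. It is not the RFC's definition. The `Step` type and function names are hypothetical, roots are assumed equal (per the NOTE above), rule 1.e is assumed to also require that the attribute names match ignoring case, and the two base cases stand in for rules 1.a and 1.b, which are not quoted in this thread.

```python
from typing import NamedTuple, Optional, Sequence


class Step(NamedTuple):
    kind: str                       # "attr", "tuple_wildcard", "index" or "coll_wildcard"
    value: Optional[object] = None  # attribute name or integer index
    case_sensitive: bool = False    # only meaningful for "attr" steps


def step_covers(t: Step, s: Step) -> bool:
    """True if step `t` (from q) selects everything that step `s` (from p) selects."""
    if t.kind == "tuple_wildcard":
        return s.kind in ("attr", "tuple_wildcard")  # rule 1.c, or identical steps
    if t.kind == "coll_wildcard":
        return s.kind in ("index", "coll_wildcard")  # rule 1.d, or identical steps
    if t.kind == "index":
        return s.kind == "index" and s.value == t.value
    if t.kind == "attr" and s.kind == "attr":
        if t.case_sensitive:
            return s.case_sensitive and s.value == t.value
        # rule 1.e, or two case-insensitive attributes with the same name
        return str(s.value).lower() == str(t.value).lower()
    return False


def subsumes(q: Sequence[Step], p: Sequence[Step]) -> bool:
    """True if exclude path q (steps only) subsumes exclude path p."""
    if not q:   # q has run out of steps: it excludes an ancestor of p's target
        return True
    if not p:   # p is shorter than q: q excludes strictly less
        return False
    return step_covers(q[0], p[0]) and subsumes(q[1:], p[1:])


def attr(name, case_sensitive=False):
    return Step("attr", name, case_sensitive)


# p = t.a.b[1].c, q = t.a.b[*]
p1 = [attr("a"), attr("b"), Step("index", 1), attr("c")]
q1 = [attr("a"), attr("b"), Step("coll_wildcard")]
print(subsumes(q1, p1))  # True

# p = t.a."b".c, q = t.a.b.c
p2 = [attr("a"), attr("b", case_sensitive=True), attr("c")]
q2 = [attr("a"), attr("b"), attr("c")]
print(subsumes(q2, p2))  # True
print(subsumes(p2, q2))  # False: a case-sensitive step does not cover a case-insensitive one
```

The last call is one candidate for the "no rule applies" row requested above.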
<select clause>
FROM (
    SELECT VALUE {
        'r': -- Apply below rewrite rules for steps `s~1~...s~n~`
Suggested change (replacing the line above):
'r': -- Apply rewrite rules explained in the following sections for steps `s~1~...s~n~`
FROM (
    SELECT VALUE {
        'r': -- Apply below rewrite rules for steps `s~1~...s~n~`
        ... -- Other vars created from the other clauses
Suggested change (replacing the line above):
... -- Add other variables created from the other clauses using identity function
----

The main idea for rewriting the `EXCLUDE` steps `s~1~,...,s~n~` is to create a nested `CASE` expression for each step, whereby the nested `CASE` expressions for `s~1~,...,s~n-1~` unnest the input binding tuple and the final `CASE` expression for `s~n~` (i.e. the final step) filters out the desired tuple field(s) or collection index(es). Every exclude step has an expected type to process during evaluation. Tuple attribute and wildcard exclude steps expect a tuple. Whereas a collection index expects an array and a collection wildcard expects an array or bag. The `CASE` expression at each level `i` recreates this expected type by including a `WHEN` branch based on the expected type. Each `CASE` expression will include an `ELSE` branch which outputs the previous level's identifier. This set of branches ensures that at evaluation time, if there is a type mismatch (e.g. evaluation value is an array while the exclude step is a tuple attribute), there is no evaluation error and the previous level's value is returned through the `ELSE` branch. This behavior applies to both the permissive and strict typing modes.
Suggested change (replacing the paragraph above):
The main idea for rewriting the `EXCLUDE` steps `s~1~,...,s~n~` is to create a nested `CASE` expression for each step, whereby the nested `CASE` expressions for `s~1~,...,s~n-1~` unnest the input binding tuple and the final `CASE` expression for `s~n~` (i.e. the final step) filters out the desired tuple field(s) or collection index(es). Every exclude step has an expected type to process during evaluation. Tuple attribute and wildcard exclude steps expect a `Tuple`, whereas a collection index expects `Array` and a collection wildcard expects `Array` or `Bag` types. The `CASE` expression at each level `i` recreates this expected type by including a `WHEN` branch based on the expected type. Each `CASE` expression will include an `ELSE` branch which outputs the previous level's identifier. This set of branches ensures that at evaluation time, if there is a type mismatch (e.g. evaluation value is an array while the exclude step is a tuple attribute), there is no evaluation error and the previous level's value is returned through the `ELSE` branch. This behavior applies to both the permissive and strict typing modes.
---
We first illustrate the rewrite rule for a single `EXCLUDE` path and then explain the syntax rewrite for multiple exclude paths.

==== Step 2 (single): rewrite a single `EXCLUDE` path
Considering that there are many rules, perhaps pseudocode accompanying the textual explanation would also be helpful.
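One possible shape for such pseudocode, written as runnable Python rather than PartiQL: a recursive function that mirrors the nested `CASE` structure for a single exclude path. This is only a sketch under stated assumptions, not the RFC's rewrite. The step encoding and the `exclude` name are hypothetical, Python dicts and lists stand in for tuples and arrays (bags are not modeled separately), and attribute steps are matched case-insensitively, as in the `LOWER(...)` comparisons earlier in this thread.

```python
def exclude(value, steps):
    """Return a copy of `value` with the target of a single exclude path removed.

    Each step is one of ("attr", name), ("tuple_wildcard",), ("index", i) or
    ("coll_wildcard",). Each recursion level plays the role of one CASE expression.
    """
    if not steps:
        return value
    step, rest = steps[0], steps[1:]
    kind = step[0]
    is_final = not rest

    if kind in ("attr", "tuple_wildcard"):
        if not isinstance(value, dict):
            return value  # ELSE branch: type mismatch, pass the value through
        out = {}
        for key, child in value.items():
            matches = kind == "tuple_wildcard" or key.lower() == step[1].lower()
            if matches and is_final:
                continue  # final step: filter the attribute out
            out[key] = exclude(child, rest) if matches else child
        return out

    if kind in ("index", "coll_wildcard"):
        if not isinstance(value, list):
            return value  # ELSE branch: type mismatch, pass the value through
        out = []
        for i, child in enumerate(value):
            matches = kind == "coll_wildcard" or i == step[1]
            if matches and is_final:
                continue  # final step: filter the element out
            out.append(exclude(child, rest) if matches else child)
        return out

    raise ValueError(f"unknown step kind: {kind!r}")


t = {
    "a": {"field_x": 0, "field_y": "zero"},
    "b": {"field_x": 1, "field_y": "one"},
}
# EXCLUDE t.a.field_x
print(exclude(t, [("attr", "a"), ("attr", "field_x")]))
# {'a': {'field_y': 'zero'}, 'b': {'field_x': 1, 'field_y': 'one'}}
```

A type mismatch, such as an attribute step applied to an array, returns the value unchanged instead of raising, matching the `ELSE` branch behavior described in the RFC text above.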
=== Out of scope / assumptions

* We restrict tuple attribute exclude steps to use string literals and collection index exclude steps to use int literals. Thus `<exclude paths>` are statically known. We can decide whether to add other exclude paths (e.g. expressions) if a use case arises.
* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.
I think we might want to have an example of attribute as a variable.
Top-level, let's format these lines to be like 80 or 120 characters wide.
* We restrict tuple attribute exclude steps to use string literals and collection index exclude steps to use int literals. Thus `<exclude paths>` are statically known. We can decide whether to add other exclude paths (e.g. expressions) if a use case arises.
* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.
* We require that every fully-qualified `<exclude path>` contain a root and at least one step. If a use case arises to exclude a binding tuple variable, then this functionality can be added.
What is the rationale for this limitation? We should put that here.
=== Out of scope / assumptions

* We restrict tuple attribute exclude steps to use string literals and collection index exclude steps to use int literals. Thus `<exclude paths>` are statically known. We can decide whether to add other exclude paths (e.g. expressions) if a use case arises.
* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.
* We require that every fully-qualified `<exclude path>` contain a root and at least one step. If a use case arises to exclude a binding tuple variable, then this functionality can be added.
* S-expressions are part of the Ion type system.footnote:[https://amazon-ion.github.io/ion-docs/docs/spec.html#sexp]
PartiQL should support s-expression types and values since PartiQL's type system is a superset over the Ion types. Because the current PartiQL specification does not formally define s-expressions operations, we consider the definition of collection index and wildcard steps on s-expressions as out-of-scope for this RFC.
I would reword this section. I would avoid usage of terms like 'we' and write it more formally and more tersely.
e.g.,
=== Limitations
* This RFC requires that every fully-qualified `<exclude path>` contain a root and at least one step.
* This RFC restricts tuple attribute exclude steps to use string literals and collection index exclude steps to use int literals. Thus `<exclude paths>` are statically known. This
* This RFC makes no changes to schema and name inference, and assumes that such inference is run as a prerequisite.
* This RFC defines `<exclude paths>` only over `list`, `bag`, and `tuple` value collections.
db.inventory.find( { status: "A" }, { status: 0, instock: 0 } )
----

== Unresolved questions
Wondering whether only the partial-schema case could be out of scope, i.e. keeping SQL's schema-full case within scope as well.
I have come across a scenario where I'd like to express caution. It has to do with scoping of variables specifically when

SELECT
    t.a,
    t.b -- this won't produce anything!
EXCLUDE t.b
FROM t

This makes sense. However, I've been dealing with nested queries, and I'm curious what would happen in the following scenario:

SELECT
    (
        SELECT t2.c + t1.a -- this shouldn't work!
        EXCLUDE t1.a
        FROM t2
    ) AS t1_plus_t2
FROM t1 -- this has columns a and b

In the above, your RFC works great I believe. The query should fail. However, this may cause problems if

global env = E0 = < t1: <<{a, b}>>, t2: <<{c, d}>> >
SELECT
    (
        SELECT t2.c + t1.a AS x -- input env = E0 || E1 || E2 = < t1: {b}, t2: {c, d} >. Output env = < x >
        EXCLUDE t1.a -- input env = E0 || E1 || E2. Output Env = E0 || E1 || E2 (with some minor eliminations of attributes) = < t1: {b}, t2: {c, d} >
        FROM t2 -- input env = E0 || E1. Output env (E2) = E0 || E1 || < t2: {c, d} > = < t1: {a, b}, t2: {c, d} >
    ) AS t1_plus_t2 -- the whole SELECT (including this projection item subquery) has input env = E0 || E1
FROM t1 -- input env = E0, output env (E1) = E0 || < t1: { a, b } > = < t1: {a, b}, t2: <<{c, d}>> >

If we want to make sure that the inner select does NOT get access to what is being excluded, then we must not allow EXCLUDE to exclude whole bindings (rather than attributes of bindings). If we had a very similar query that excluded an entire binding, we might still allow the projection list to access the original

global env = E0 = < t1: <<{a, b}>>, t2: <<{c, d}>> >
SELECT
    (
        SELECT t2.c + t1.a AS x -- input env = E0 || E1 || E3 = < t1: {a, b}, t2: {c, d} >. Output env (E4) = < x >
        EXCLUDE t1 -- input env = E0 || E1 || E2. Output Env (E3) = E0 || E1 || E2 = < t2: {c, d} >
        FROM t2 -- input env = E0 || E1. Output env (E2) = E0 || E1 || < t2: {c, d} > = < t1: {a, b}, t2: {c, d} >
    ) AS t1_plus_t2 -- the whole SELECT (including this projection item subquery) has input env = E0 || E1
FROM t1 -- input env = E0, output env (E1) = E0 || < t1: { a, b } > = < t1: {a, b}, t2: <<{c, d}>> >

Notice that the inner select still received
Issue #, if available: partiql/partiql-lang#27
RFC to define the `EXCLUDE` operator and its definition in terms of existing PartiQL operators. Current reference implementation is in partiql-lang-kotlin's `EvaluatingCompiler` (though I'm currently working to port it to the `PhysicalPlanCompilerImpl`).
Rendered doc
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.