From 8d40d10a348e862a18117fa02d761e23c545708e Mon Sep 17 00:00:00 2001 From: lucioleKi Date: Mon, 30 Sep 2024 10:13:04 +0200 Subject: [PATCH 1/4] add draft for eep-73 --- eeps/eep-0073.md | 212 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 212 insertions(+) create mode 100644 eeps/eep-0073.md diff --git a/eeps/eep-0073.md b/eeps/eep-0073.md new file mode 100644 index 0000000..d57646b --- /dev/null +++ b/eeps/eep-0073.md @@ -0,0 +1,212 @@ + Author: Isabell Huang + Status: Draft + Type: Standard Track + Created: 21-Sep-2024 + Erlang-Version: 28 + Post-History: + Replaces: 19 +**** +EEP 73: Zip generator +---- + +Abstract +======== + +This EEP proposes the addition of zip generators with a syntax of `&&` for +comprehensions in Erlang. The idea and syntax of zip generators (comprehension +multigenerators) was first brought up by EEP-19. Even if the syntax and +usages of zip generators proposed by this EEP is mostly the same with EEP-19, +the comprehension language of Erlang has undergone many changes since EEP-19 +was accepted. With an implementation that is compatible with all existing +comprehensions, this EEP defines the behavior of zip generators with more +clarification on the compiler's part. + +Rationale +========= + +List comprehension is a way to create a new list from existing list(s). Lists +are traversed in a dependent (nested) way. In the following example, the +resulting list has length 4 when the two input lists both have length 2. + + [{X,Y} || X <- [1,2], Y <- [3,4]] = [{1,3}, {1,4}, {2,3}, {2,4}]. + +In contrast, parallel list comprehension (also known as zip comprehension) +evaluates qualifiers (a generalization of lists) in parallel. Qualifiers are +first "zipped" together, and then evaluated. Many functional languages +([Haskell][2], [Racket][3], etc.) and non-functional languages (Python etc.) support +this variation. Suppose the two lists in the example above are evaluated as +a zip comprehension, the result would be `[{1,3}, {2,4}]`. + +Zip comprehensions allow the user to conveniently iterate over several lists +at once. Without it, the standard way to accomplish the same task in Erlang +is to use `lists:zip` to zip two lists into two-tuples, or to use `lists:zip3` +to zip three lists into three-tuples. The list module does not provide a +function to zip more than three lists. Functions like `lists:zip` always +create intermediate data structures when compiled. The compiler does not +perform deforestation to eliminate the unwanted tuples. + +Zip generators is a generalization of zip comprehensions. Every set of zipped +generators is treated as one generator. Instead of constraining a comprehension +to be either zipped or non-zipped, any generator can be either a zip generator +(containing at least two generators zipped together), or a non-zip generator +(all existing generators are non-zip generator). Therefore, zip generators +can be mixed freely with all existing generators and filters. Zip comprehension +then becomes a special case of comprehension where only zip generators are +used. + +In summary, zip generator removes the user's need to call the zip function +and allows for any number of lists to be zipped at once. It can be used in +list, binary, and map comprehensions, and mixed freely with all existing +generators and filters. Internally, the compiler does not create any +intermediate data structure, therefore also removing the need of deforestation. + +Specification +======================== + +Currently, Erlang supports three kinds of comprehensions, list comprehension, +binary comprehension, and map comprehension. Their names refer to the result +of the comprehension. List comprehension produces a list; binary comprehension +produces a binary, etc. + + [Expression || Qualifier(s)] %% List Comprehension + <> %% Binary Comprehension + #{Expression || Qualifier(s)} %% Map Comprehension + + +Qualifiers can have the following kind: filter, list generator, bitstring +generator, and map generator. Except for filters, the other three kinds of +qualifiers are generators. Their names refer to the type on the right hand +side of `<-` or `<=`. Generators have the following form: + + Pattern <- List %% List Generator + Pattern <= Bitstring %% Bitstring Generator + Pattern_1 := Pattern_2 <- Map %% Map Generator + +All qualifiers and filters can be freely used and mixed in all 3 kinds of +comprehensions. The following example shows a list comprehension with a +list generator and a bitstring generator. + + [{X,Y} || X <- [1,2,3], <> <= <<4,5,6>>]. + +This EEP proposes the addition of zip generators. A zip generator is two or +more generators connected by `&&`. Zip generators is constructed to connect +any number of the 3 kinds of generators above. Zip generators can be used +in list, binary, or map comprehensions in the same way. + +For example, if the two generators in the above example is combined together +as a zip generator, the comprehension would look like: + + [{X,Y} || X <- [1,2,3] && <> <= <<4,5,6>>]. + +For every zip generator of the form +`G1 && ... && Gn`, it is evaluated to have the same result as `zip/n` where + + zip([H1|T1], ..., [Hn|Tn]) -> + [{H1,...,Hn} | zip(T1, ..., Tn)]; + zip([], ..., []) -> + []. + +Therefore, the above comprehension evaluates to `[{1,4}, {2,5}, {3,6}]`, which +is the same as if using `lists:zip/2`. + +Zip generator can also be used when a comprehension contains other non-zip +generators and/or filters. The `&&` symbol has a higher precedence than `,`. + +The following example evaluates to `[{b,4}, {c,6}]`. The element `{a,2}` is +filtered out from the resulting list. + + [{X, Y} || X <- [a, b, c] && <> <= <<2, 4, 6>>, Y =/= 2]. + +Comparing to using helper functions, there is one advantage of using a zip +generator: The Erlang compiler does not generate any tuple when a zip +generator is translated into core Erlang. The generated code reflects the +programmer's intent, which is to collect one element from every list at a +time without creating a list of tuples. + +Error Behaviors +================ + +One would expect that when errors happen, a zip generator behaves the same +as `lists:zip/2`, `lists:zip3/3`, and also the `zip/n` function above when +more than 3 lists are zipped together. The design and implementation of +zip generators aim to achieve that both for compiled code and for comprehensions +evaluated in Erlang shell. + +Generators of Different Lengths +-------------- + +`lists:zip/2` and `lists:zip3/3` will fail if the given lists are not of the +same length, where `zip/n` will also crash. Therefore, a zip generator raises a +`bad generators` error when it discovers that the given generators are of +different lengths. + +When a zip generator crashes because the containing generators are of +different lengths, the error message is a tuple, where the first element +is the atom `bad_generators`, and the second element is a tuple that contains +the remaining data from all generators. + +For example, this comprehension will crash at runtime. + + [{X,Y} || X <- [1,2,3] && Y <- [1,2,3,4]]. + +The resulting error tuple is `{bad_generators,{[],[4]}}`. This is because +when the comprehension crashes, the first list in the zip generator has +only the empty list `[]` left, while the second list in the zip generator +has `[4]` left. + +On the compiler's side, it is rather difficult to return the original zip +generator in the error message, or to point out which generator is of +different length comparing to others. The proposed error message aims to +gives the most helpful information without imposing extra burden on the +compiler or runtime. + +Non-generator in a Zip Generator +----------------- + +As the idea of zipping only makes sense for generators, a zip generator cannot +contain filters or any expression that is not a generator. Whenever it is +possible to catch such an error at compile-time, this error is caught by +the Erlang linter. + +For example, the zip generator in the following comprehension contains a +non-generator. + + zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0]. + +When the function is compiled, the linter points out that only generators are +allowed in a zip generator, together with the position of the non-generator. + + t.erl:6:55: only generators are allowed in a zip generator. + % 6| zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0]. + % | ^ + + +Backwards Compatibility +======================== + +The operator `&&` is not used in Erlang. No existing code is affected by +this addition. + +Reference Implementation +======================== + +[Git Branch][1] contains the implementation for zip generators. + +[1]: https://github.com/lucioleKi/otp/tree/isabell/compiler/zip-comprehension/OTP-19184 +[2]: https://downloads.haskell.org/~ghc/5.00/docs/set/parallel-list-comprehensions.html +[3]: https://docs.racket-lang.org/reference/for.html + +Copyright +========= + +This document is placed in the public domain or under the CC0-1.0-Universal +license, whichever is more permissive. + +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" +[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: " From 71cf040a16ee2e844f3799cbaec81fa131f0baf7 Mon Sep 17 00:00:00 2001 From: lucioleKi Date: Thu, 10 Oct 2024 12:01:54 +0200 Subject: [PATCH 2/4] formatting and more explanation for error --- eeps/eep-0073.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/eeps/eep-0073.md b/eeps/eep-0073.md index d57646b..f05ec72 100644 --- a/eeps/eep-0073.md +++ b/eeps/eep-0073.md @@ -12,7 +12,7 @@ EEP 73: Zip generator Abstract ======== -This EEP proposes the addition of zip generators with a syntax of `&&` for +This EEP proposes the addition of zip generators with a syntax of `&&` for comprehensions in Erlang. The idea and syntax of zip generators (comprehension multigenerators) was first brought up by EEP-19. Even if the syntax and usages of zip generators proposed by this EEP is mostly the same with EEP-19, @@ -72,11 +72,10 @@ produces a binary, etc. <> %% Binary Comprehension #{Expression || Qualifier(s)} %% Map Comprehension - Qualifiers can have the following kind: filter, list generator, bitstring generator, and map generator. Except for filters, the other three kinds of qualifiers are generators. Their names refer to the type on the right hand -side of `<-` or `<=`. Generators have the following form: +side of `<-` or `<=`. Generators have the following form: Pattern <- List %% List Generator Pattern <= Bitstring %% Bitstring Generator @@ -138,14 +137,16 @@ Generators of Different Lengths `lists:zip/2` and `lists:zip3/3` will fail if the given lists are not of the same length, where `zip/n` will also crash. Therefore, a zip generator raises a `bad generators` error when it discovers that the given generators are of -different lengths. +different lengths. When a zip generator crashes because the containing generators are of -different lengths, the error message is a tuple, where the first element -is the atom `bad_generators`, and the second element is a tuple that contains -the remaining data from all generators. +different lengths, the internal error message is a tuple, where the first +element is the atom `bad_generators`, and the second element is a tuple that +contains the remaining data from all generators. The user-facing error message +is `bad generators:`, followed by the tuple containing remaining data from +all generators. -For example, this comprehension will crash at runtime. +For example, this comprehension will crash at runtime. [{X,Y} || X <- [1,2,3] && Y <- [1,2,3,4]]. @@ -158,7 +159,7 @@ On the compiler's side, it is rather difficult to return the original zip generator in the error message, or to point out which generator is of different length comparing to others. The proposed error message aims to gives the most helpful information without imposing extra burden on the -compiler or runtime. +compiler or runtime. Non-generator in a Zip Generator ----------------- @@ -169,7 +170,7 @@ possible to catch such an error at compile-time, this error is caught by the Erlang linter. For example, the zip generator in the following comprehension contains a -non-generator. +filter. zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0]. @@ -180,7 +181,6 @@ allowed in a zip generator, together with the position of the non-generator. % 6| zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0]. % | ^ - Backwards Compatibility ======================== @@ -190,7 +190,7 @@ this addition. Reference Implementation ======================== -[Git Branch][1] contains the implementation for zip generators. +[Git Branch][1] contains the implementation for zip generators. [1]: https://github.com/lucioleKi/otp/tree/isabell/compiler/zip-comprehension/OTP-19184 [2]: https://downloads.haskell.org/~ghc/5.00/docs/set/parallel-list-comprehensions.html From 93182c67365f20763a6069ccab3428e660e686ff Mon Sep 17 00:00:00 2001 From: lucioleKi Date: Thu, 10 Oct 2024 14:01:59 +0200 Subject: [PATCH 3/4] add example and fix typo --- eeps/eep-0073.md | 38 ++++++++++++++++++++++++++++++-------- 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/eeps/eep-0073.md b/eeps/eep-0073.md index f05ec72..96c458b 100644 --- a/eeps/eep-0073.md +++ b/eeps/eep-0073.md @@ -33,9 +33,9 @@ resulting list has length 4 when the two input lists both have length 2. In contrast, parallel list comprehension (also known as zip comprehension) evaluates qualifiers (a generalization of lists) in parallel. Qualifiers are first "zipped" together, and then evaluated. Many functional languages -([Haskell][2], [Racket][3], etc.) and non-functional languages (Python etc.) support -this variation. Suppose the two lists in the example above are evaluated as -a zip comprehension, the result would be `[{1,3}, {2,4}]`. +([Haskell][2], [Racket][3], etc.) and non-functional languages (Python etc.) +support this variation. Suppose the two lists in the example above are +evaluated as a zip comprehension, the result would be `[{1,3}, {2,4}]`. Zip comprehensions allow the user to conveniently iterate over several lists at once. Without it, the standard way to accomplish the same task in Erlang @@ -54,11 +54,33 @@ can be mixed freely with all existing generators and filters. Zip comprehension then becomes a special case of comprehension where only zip generators are used. -In summary, zip generator removes the user's need to call the zip function -and allows for any number of lists to be zipped at once. It can be used in -list, binary, and map comprehensions, and mixed freely with all existing -generators and filters. Internally, the compiler does not create any -intermediate data structure, therefore also removing the need of deforestation. +Within the OTP codebase, there are many uses of `lists:zip` within comprehensions. +All of them can be simplified by zip generators using `&&` syntax. For example, +The `yecc.erl` in parsetools contains the following comprehension (external +function calls and irrelevant fields redacted for readability): + + PartDataL = [#part_data{name = Nm, eq_state = Eqs, actions = P, states = S} + || {{Nm,P}, {Nm,S}, {Nm,EqS}} <- + lists:zip3(PartNameL, PartInStates, PartStates)]. + +When using zip generators, the comprehension is rewritten to: + + PartDataL = [#part_data{name = Nm, eq_state = Eqs, actions = P, states = S} + || {Nm,P} <- PartNameL && {Nm,S} <- PartInStates && {Nm,EqS} <- PartStates]. + +By using zip generators, the compiler avoids the need to build the intermediate +list of tuples. Variable bindings and pattern matching within a zip generator +works as expected, as `Nm` is supposed to bind to the same value in `{Nm,P}` +and `{Nm,S}`. If the binding fails, then one element from each of the 3 +generators is skipped. (If a strict generator is used, then the comprehension +fails with exception `badmatch`, as specified in EEP-70.) + +In summary, zip generators remove the user's need to call the zip function +within comprehensions and allows for any number of lists to be zipped at once. +It can be used in list, binary, and map comprehensions, and mixed freely with +all existing generators and filters. Internally, the compiler does not create +any intermediate data structure, therefore also removing the need of +deforestation. Specification ======================== From 4fa5b7245ef6cb59abfbe21496037ea4fdf9c67a Mon Sep 17 00:00:00 2001 From: lucioleKi Date: Thu, 10 Oct 2024 16:17:12 +0200 Subject: [PATCH 4/4] add link to PR --- eeps/eep-0073.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/eeps/eep-0073.md b/eeps/eep-0073.md index 96c458b..61de912 100644 --- a/eeps/eep-0073.md +++ b/eeps/eep-0073.md @@ -212,9 +212,10 @@ this addition. Reference Implementation ======================== -[Git Branch][1] contains the implementation for zip generators. +[compiler: Add zip generators for comprehensions][1] contains the implementation +for zip generators. -[1]: https://github.com/lucioleKi/otp/tree/isabell/compiler/zip-comprehension/OTP-19184 +[1]: https://github.com/erlang/otp/pull/8926 [2]: https://downloads.haskell.org/~ghc/5.00/docs/set/parallel-list-comprehensions.html [3]: https://docs.racket-lang.org/reference/for.html