From 3fc8c14632da5a61c33e7e3baaffb57e4654c0a8 Mon Sep 17 00:00:00 2001 From: Kiko Fernandez-Reyes Date: Wed, 7 Aug 2024 13:29:15 +0200 Subject: [PATCH] EEP 71: clarification of type specs & generics --- eeps/eep-0071.md | 158 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 158 insertions(+) create mode 100644 eeps/eep-0071.md diff --git a/eeps/eep-0071.md b/eeps/eep-0071.md new file mode 100644 index 0000000..edecba0 --- /dev/null +++ b/eeps/eep-0071.md @@ -0,0 +1,158 @@ + Author: John Högberg , Ilya Klyuchnikov + Status: Draft + Type: Standards Track + Created: 7-Aug-2024 + Erlang-Version: OTP-28 + Post-History: +**** +EEP 71: Clarify the type documentation, especially regarding type variables +---- + +Abstract +======== + +Type variables are an often-used yet very misunderstood part of the type +language. As `dialyzer` has been lax in enforcing their defined semantics owing +to a few bugs, many users have been led to believe that they are simple generics +as in most other languages. As a significant amount of code has been written +with this in mind, OTP cannot fix the bugs that made `dialyzer` lax by accident +without breaking the analysis of a lot of code. + +While that would be fine if `dialyzer` were the only tool out there, all the +other code analyzers we know of interpret type variables as generics which would +fragment the ecosystem should we elect to fix the bugs. One application might +have been written with `eqWAlizer` in mind and another with `dialyzer`, +preventing either tool from being applied to the whole release. + +As `dialyzer` is simply broken in this regard, this EEP aims to solve the +problem by changing the definition of type variables from _equality constraints_ +to _generics_ (parametric polymorphism). We also propose several other changes +to make the type language more useful and less ambiguous. + +Rationale +========= + +As currently documented, variables in `-spec`s and `-type`s act just like in +ordinary code. That is, if the variable `V` appears several times it has the +same _value_ in all places. In a language like C++ or Java where said value is a +type, this works pretty well: + + + template + V foo(V a, V b) { + return a + b; + } +``` + +For example, `foo(12, 34)` works because both arguments have the same type `int` +(yielding `int` for `T`), but `foo(12, 34.0)` doesn't because `int` is not +`double`. + +However, as the documentation for the Erlang type language says that repeated +variables refer to the same _value at runtime_, tools like `gradualizer` and +`eqWAlizer` are not free to treat them as generics. For example: + + + -spec my_add(T, T) -> T. + my_add(A, B) -> A + B. + + +Here, the only possible values of `T` that will work are the various +representations of zero: `foo(12, 34)` will fail because `12` is not `34`, and +`foo(1, 1)` will fail because the result `2` is not `1`. + +This makes it impossible to declare that a function takes a value of a certain +type and returns something of the same type (but not necessarily _same value_). +This makes sense in `dialyzer` as values and types are the same thing in its +paradigm[^1], but restricts what other type checkers can do; while the +documentation states that the "extra information" given by type variables may be +ignored, it leaves no room for other interpretations when they are taken into +account. + +[^1]: In an untyped language we cannot simply say that the type of `1` is + `integer()` since `1 | foo` is just as much of a superset of `1` as + `integer()` is. The type of `1` is simply `1`. + + +Changes +---------------- + +To solve this issue and make the type language less surprising, we propose the +following changes: + +1. Redefine `::` as alias instead of "subtype of." + + All currently known tools implement this interpretation[^2]. Should bounded + quantification be needed in the future, the "subtype of" operator can be + introduced later under the more widely used `<:` or `=<` syntaxes. + + Multiple aliases under the same name are rejected, making the following illegal: + + + -spec multiple(X :: integer(), X :: integer()) -> atom(). + -spec when_multiple(X) -> atom() when X :: integer(), X :: number(). + + To prevent breaking existing code as a result of this change, the compiler + will _not_ enforce the above, leaving that to the type checking tools (or + optionally a compiler flag). + +2. Normalize types in the parser, substituting aliases and treating "annotated + types" the same as `when`. This makes the following signatures equivalent: + + -spec xyzzy(A) -> term() when A :: number(). + -spec waldo(B :: number()) -> term(). + -spec fred(number()) -> term(). + + Where `B :: number()` is mere shorthand for `B ... when B :: number()`, and + all variants declare that the first argument is a `number()`. + +3. Treat type variables as generics rather than as equality constraints. + Example: + + The current `lists:append` type spec is: + + -spec append(List1, List2) -> List3 when List1 :: [T], List2 :: [T], List3 :: [T], T :: term(). + + + With the defined semantics, `T :: term()` makes the `append` function to + refer to a top type. Instead, the following is preferred and more in line + with current type systems: + + -spec append(List1, List2) -> List3 when List1 :: [T], List2 :: [U], List3 :: [T | U]. + +4. Have function contracts (`-spec`) describe argument types as they go in. + + A very surprising aspect of function contracts is that they describe the + types of the arguments _when the function returns successfully_, and do not + affect argument types _within_ the function. For example: + + -spec foo(integer()) -> integer(). + foo(X) -> + bar(X), %% X is not necessarily an integer here! + X + 1. + + Most people take the above to mean that the type passed on to `bar/1` is + `integer()`, yet it can be whatever is passed to `foo/1` including values + that cause `foo/1` to fail. This often hides the fact that there is + unreachable code in callees such as `bar/1` here, and sometimes results in + cryptic warnings describing something the user believes could never happen. + + This also makes it much easier for a user to see what the type of a variable + is at any point: just start with what it says in the `-spec`. At present, + the type of `X` depends entirely on how `X` got there which is not always + trivial to figure out. + +[^2]: `dialyzer` also does this as a consequence of the bugs mentioned in the abstract. + +Reference Implementation +------------------------ + +-- TBD, we'll need to make a PR with changes in the documentation to reflect +above once we're happy with the above. -- + +Copyright +========= + +This document is placed in the public domain or under the CC0-1.0-Universal +license, whichever is more permissive. +