Skip to content

Commit

Permalink
EEP 71: clarification of type specs & generics
Browse files Browse the repository at this point in the history
  • Loading branch information
kikofernandez committed Aug 7, 2024
1 parent 68a0a1b commit 3fc8c14
Showing 1 changed file with 158 additions and 0 deletions.
158 changes: 158 additions & 0 deletions eeps/eep-0071.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
Author: John Högberg <john(at)erlang(dot)org>, Ilya Klyuchnikov <ilya(dot)klyuchnikov(at)gmail(dot)com>
Status: Draft
Type: Standards Track
Created: 7-Aug-2024
Erlang-Version: OTP-28
Post-History:
****
EEP 71: Clarify the type documentation, especially regarding type variables
----

Abstract
========

Type variables are an often-used yet very misunderstood part of the type
language. As `dialyzer` has been lax in enforcing their defined semantics owing
to a few bugs, many users have been led to believe that they are simple generics
as in most other languages. As a significant amount of code has been written
with this in mind, OTP cannot fix the bugs that made `dialyzer` lax by accident
without breaking the analysis of a lot of code.

While that would be fine if `dialyzer` were the only tool out there, all the
other code analyzers we know of interpret type variables as generics which would
fragment the ecosystem should we elect to fix the bugs. One application might
have been written with `eqWAlizer` in mind and another with `dialyzer`,
preventing either tool from being applied to the whole release.

As `dialyzer` is simply broken in this regard, this EEP aims to solve the
problem by changing the definition of type variables from _equality constraints_
to _generics_ (parametric polymorphism). We also propose several other changes
to make the type language more useful and less ambiguous.

Rationale
=========

As currently documented, variables in `-spec`s and `-type`s act just like in
ordinary code. That is, if the variable `V` appears several times it has the
same _value_ in all places. In a language like C++ or Java where said value is a
type, this works pretty well:


Check failure on line 40 in eeps/eep-0071.md

View workflow job for this annotation

GitHub Actions / markdownlint

Multiple consecutive blank lines

eeps/eep-0071.md:40 MD012/no-multiple-blanks Multiple consecutive blank lines [Expected: 1; Actual: 2] https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md012.md

Check failure on line 40 in eeps/eep-0071.md

View workflow job for this annotation

GitHub Actions / markdownlint

Multiple consecutive blank lines

eeps/eep-0071.md:40 MD012/no-multiple-blanks Multiple consecutive blank lines [Expected: 1; Actual: 2] https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md012.md
template<typename V>
V foo(V a, V b) {
return a + b;
}
```

Check failure on line 45 in eeps/eep-0071.md

View workflow job for this annotation

GitHub Actions / markdownlint

Fenced code blocks should be surrounded by blank lines

eeps/eep-0071.md:45 MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```"] https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md031.md

Check failure on line 45 in eeps/eep-0071.md

View workflow job for this annotation

GitHub Actions / markdownlint

Fenced code blocks should have a language specified

eeps/eep-0071.md:45 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"] https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md040.md

Check failure on line 45 in eeps/eep-0071.md

View workflow job for this annotation

GitHub Actions / markdownlint

Code block style

eeps/eep-0071.md:45 MD046/code-block-style Code block style [Expected: indented; Actual: fenced] https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md046.md

Check failure on line 45 in eeps/eep-0071.md

View workflow job for this annotation

GitHub Actions / markdownlint

Fenced code blocks should be surrounded by blank lines

eeps/eep-0071.md:45 MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```"] https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md031.md

Check failure on line 45 in eeps/eep-0071.md

View workflow job for this annotation

GitHub Actions / markdownlint

Fenced code blocks should have a language specified

eeps/eep-0071.md:45 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"] https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md040.md

Check failure on line 45 in eeps/eep-0071.md

View workflow job for this annotation

GitHub Actions / markdownlint

Code block style

eeps/eep-0071.md:45 MD046/code-block-style Code block style [Expected: indented; Actual: fenced] https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md046.md
For example, `foo(12, 34)` works because both arguments have the same type `int`
(yielding `int` for `T`), but `foo(12, 34.0)` doesn't because `int` is not
`double`.
However, as the documentation for the Erlang type language says that repeated
variables refer to the same _value at runtime_, tools like `gradualizer` and
`eqWAlizer` are not free to treat them as generics. For example:
-spec my_add(T, T) -> T.
my_add(A, B) -> A + B.
Here, the only possible values of `T` that will work are the various
representations of zero: `foo(12, 34)` will fail because `12` is not `34`, and
`foo(1, 1)` will fail because the result `2` is not `1`.
This makes it impossible to declare that a function takes a value of a certain
type and returns something of the same type (but not necessarily _same value_).
This makes sense in `dialyzer` as values and types are the same thing in its
paradigm[^1], but restricts what other type checkers can do; while the
documentation states that the "extra information" given by type variables may be
ignored, it leaves no room for other interpretations when they are taken into
account.
[^1]: In an untyped language we cannot simply say that the type of `1` is
`integer()` since `1 | foo` is just as much of a superset of `1` as
`integer()` is. The type of `1` is simply `1`.
Changes
----------------
To solve this issue and make the type language less surprising, we propose the
following changes:
1. Redefine `::` as alias instead of "subtype of."
All currently known tools implement this interpretation[^2]. Should bounded
quantification be needed in the future, the "subtype of" operator can be
introduced later under the more widely used `<:` or `=<` syntaxes.
Multiple aliases under the same name are rejected, making the following illegal:
-spec multiple(X :: integer(), X :: integer()) -> atom().
-spec when_multiple(X) -> atom() when X :: integer(), X :: number().
To prevent breaking existing code as a result of this change, the compiler
will _not_ enforce the above, leaving that to the type checking tools (or
optionally a compiler flag).
2. Normalize types in the parser, substituting aliases and treating "annotated
types" the same as `when`. This makes the following signatures equivalent:
-spec xyzzy(A) -> term() when A :: number().
-spec waldo(B :: number()) -> term().
-spec fred(number()) -> term().
Where `B :: number()` is mere shorthand for `B ... when B :: number()`, and
all variants declare that the first argument is a `number()`.
3. Treat type variables as generics rather than as equality constraints.
Example:
The current `lists:append` type spec is:
-spec append(List1, List2) -> List3 when List1 :: [T], List2 :: [T], List3 :: [T], T :: term().
With the defined semantics, `T :: term()` makes the `append` function to
refer to a top type. Instead, the following is preferred and more in line
with current type systems:
-spec append(List1, List2) -> List3 when List1 :: [T], List2 :: [U], List3 :: [T | U].
4. Have function contracts (`-spec`) describe argument types as they go in.
A very surprising aspect of function contracts is that they describe the
types of the arguments _when the function returns successfully_, and do not
affect argument types _within_ the function. For example:
-spec foo(integer()) -> integer().
foo(X) ->
bar(X), %% X is not necessarily an integer here!
X + 1.
Most people take the above to mean that the type passed on to `bar/1` is
`integer()`, yet it can be whatever is passed to `foo/1` including values
that cause `foo/1` to fail. This often hides the fact that there is
unreachable code in callees such as `bar/1` here, and sometimes results in
cryptic warnings describing something the user believes could never happen.
This also makes it much easier for a user to see what the type of a variable
is at any point: just start with what it says in the `-spec`. At present,
the type of `X` depends entirely on how `X` got there which is not always
trivial to figure out.
[^2]: `dialyzer` also does this as a consequence of the bugs mentioned in the abstract.
Reference Implementation
------------------------
-- TBD, we'll need to make a PR with changes in the documentation to reflect
above once we're happy with the above. --
Copyright
=========
This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.

0 comments on commit 3fc8c14

Please sign in to comment.