
Typing #160

Open · wants to merge 13 commits into main
Conversation

@dekuenstle (Member) commented May 16, 2018

Fixes #33.

Changes proposed in this pull request:

  • Use type annotations in all modules.
  • Fail Travis CI if type checking fails.

Please note that the type annotations I added are far from perfect.
With this PR I would therefore love to hear your ideas and comments on how to improve them before merging.
There is still a lot of dirty stuff in it (type: ignore, ...) to get things running.

Some things we should think about:

  • Do we like type aliases like Cue = str? (See the small sketch after this list.)
  • Which level of abstraction? My rule of thumb was to introduce a higher-level type to avoid more than 2 layers (e.g. Iterable[Event] vs Iterable[Tuple[Cues, Outcomes]] vs Iterable[Tuple[List[str], List[str]]] vs ...).
  • Some of our coding patterns are hard to represent with nice types. It is probably worth refactoring the code itself instead of building fancy type constructs to work around it (e.g. the method assignment of JobFilter in preprocess.py).
  • Our API is inconsistent (see also events instead of event_path / event_list #126). It is probably worth streamlining it instead of defining a lot of different Union / TypeVar types.
  • Python 3.5 does not support all type annotation features. Do we keep 3.5 support? Then we have to work around these problems!
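
For illustration, a minimal sketch of the kind of aliases and layering meant above (hypothetical definitions, for illustration only, not necessarily what this branch contains):

from typing import Iterable, List, Tuple

# hypothetical aliases, for illustration only
Cue = str
Outcome = str
Cues = List[Cue]
Outcomes = List[Outcome]
Event = Tuple[Cues, Outcomes]

def count_cues(events: Iterable[Event]) -> int:
    # annotating against the high-level alias instead of
    # Iterable[Tuple[List[str], List[str]]] keeps signatures readable
    return sum(len(cues) for cues, _outcomes in events)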

As a rule of thumb: we should try to remove almost all # type: ignore and Union annotations and minimize the number of TypeVar usages. Then we not only get typing but also nicer code and a nicer API.

Feel free to improve directly in this branch; I probably cannot contribute for the next 20 days.
The PR uses a develop branch as its base, as suggested with the git-flow pattern in #104.

Best,
David

@coveralls commented May 16, 2018

Coverage Status

Coverage increased (+0.4%) to 83.667% when pulling d8e4e7b on typing into 57ebba9 on develop.

@dekuenstle (Member Author):

@Trybnetic @derNarr Finally the tests also pass for Python 3.5.
Please review this PR.

@derNarr (Member) left a comment

Sorry for replying so late. I had some very busy weeks. I really like the changes, and they gave me a first good impression of how "typed" Python code looks.

Additionally, some minor changes that should be merged into master directly are included in this pull request.

I added some questions for clarification and some comments in the diff. I noted where, imho, code changes should go directly into master.

I will address the questions from the pull request comment in a different comment.

Before merging:

  • the type imports should be discussed
  • questions in the pull request description should be discussed

Now some overall observations follow:

Overall, the code gets more specific and less general by typing it. This is also reflected in some of the renamed variables / labels, e.g. activations -> activation_dict, events -> cues_gen. This feels like the right naming scheme after typing the code, but at the same time it feels slightly unpythonic and reduces the readability for me a bit. I might only need to get used to this kind of code, and I do see the benefits of the typing. But the code still looks less clean and more cluttered to me right now. Maybe I simply need to get used to it.

if len(memory) > 0:
    _, total, used, *_ = memory[0].split()
else:
    total, used = '?', '?'
osinfo += "{} {}MiB/{}MiB\n".format(identifier, used, total)
Member:
This memory printing change should go into master as well (independent of the typing), IMHO.

@@ -63,8 +63,11 @@ def sysinfo():
    if uname.sysname == "Linux":
        _, *lines = os.popen("free -m").readlines()
        for identifier in ["Mem:", "Swap:"]:
Member:
Iterate over a tuple here? i.e. ("Mem:", "Swap:")


import numpy as np
import xarray as xr


from numpy import ndarray
from xarray.core.dataarray import DataArray
Member:
This type can also be accessed via xr.core.dataarray.DataArray, so it would be possible to define DataArray like:

DataArray = xr.core.dataarray.DataArray

instead of

from xarray.core.dataarray import DataArray

But the proposed version is probably cleaner. I am still a bit undecided. Any opinions on that? (The same applies to ndarray.)

Member Author:
I agree.

@@ -68,7 +69,7 @@ def read_clean_gzfile(gz_file_path, *, break_duration=2.0):
    text = word_tag.text
    if text in PUNCTUATION:
        words.append(text)
    else:
    elif text is not None:
Member:
Do we have enough test coverage here? Is this the right thing to do? If yes, this should be merged into master as soon as possible as well.

Member:
We agreed that it might be good to raise an exception here when text is None.

Member Author:
Now we raise an exception for null text.
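
For illustration, the check now has roughly this shape (the surrounding branches and the exact exception type and message are assumptions, not the literal diff):

text = word_tag.text
if text is None:
    # assumed behaviour: fail loudly instead of silently skipping the tag
    raise ValueError("word tag without text in {}".format(gz_file_path))
if text in PUNCTUATION:
    words.append(text)
else:
    ...  # unchanged handling of regular words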

pyndl/io.py Outdated

import pandas as pd
from pandas.core.frame import DataFrame
Member:
See comment above about how to define / import the DataFrame class / type.



def pytest_addoption(parser):
    """
    adds custom option to the pytest parser
    """
    parser.addoption("--runslow", action="store_true",
                     help="run slow tests")
                     default=False, help="run slow tests")
Member:
This change should go into master.

@@ -140,7 +138,7 @@ def test_ignore_missing_cues_dict():
    assert np.allclose(reference_activations[outcome], activation_list)


@slow
@pytest.mark.slow
Member:
This change should go into master.
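
For context, the --runslow option and this marker typically meet in a conftest.py hook roughly like the following (a sketch of the common pytest pattern; pyndl's actual conftest.py may differ):

import pytest

def pytest_addoption(parser):
    # mirrors the change above: an explicit default for --runslow
    parser.addoption("--runslow", action="store_true",
                     default=False, help="run slow tests")

def pytest_collection_modifyitems(config, items):
    # skip tests marked with @pytest.mark.slow unless --runslow was given
    if config.getoption("--runslow"):
        return
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)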

@@ -1,5 +1,5 @@
[tox]
envlist = py{35,36}-test, checkstyle, documentation
envlist = py{35,36}-test, checkstyle, checktypes, documentation
Member:
This change should go into master.

@@ -52,7 +52,6 @@ deps = mypy
setenv =
MYPYPATH=./stubs/
commands = mypy --ignore-missing-imports pyndl
ignore_outcome = True
Member:
This change should go into master.

@@ -351,7 +379,19 @@ def process_context(line):
process_words(words)


class JobFilter():
class JobFilterBase():
Member:
What is the reasoning behind defining this base class? Is it to define the interface?

Member Author:
Yes. The type checker requires method definitions, but doesn't accept assigning already existing methods.
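
To make that concrete, the pattern under discussion looks roughly like this (hypothetical names and methods; the real JobFilterBase in preprocess.py is more involved):

from typing import List

class JobFilterBase():
    # the base class declares the method signature explicitly,
    # instead of leaving it to be inferred from the assignments below
    def job(self, line: str) -> List[str]:
        raise NotImplementedError

class JobFilter(JobFilterBase):
    def __init__(self, lowercase: bool) -> None:
        # select one of two already existing bound methods at runtime;
        # this assignment pattern is what the type checker struggles with
        self.job = self._job_lower if lowercase else self._job_plain

    def _job_lower(self, line: str) -> List[str]:
        return line.lower().split()

    def _job_plain(self, line: str) -> List[str]:
        return line.split()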

@derNarr (Member) commented Jun 27, 2018

Addressing some of the questions posed by David:

  • Do we like type aliases like Cue = str?

I like these kinds of type aliases, as they improve readability imho.

  • Which level of abstraction? My rule of thumb was to introduce a higher-level type to avoid more than 2 layers (e.g. Iterable[Event] vs Iterable[Tuple[Cues, Outcomes]] vs Iterable[Tuple[List[str], List[str]]] vs ...).

IMHO sometimes three layers like Iterable[Tuple[Cues, Outcomes]] make sense. But trying to stick to mostly 2 and a maximum of 3 sounds reasonable.

  • Some of our coding patterns are hard to represent with nice types. It is probably worth refactoring the code itself instead of building fancy type constructs to work around it (e.g. the method assignment of JobFilter in preprocess.py).

If there is a good and nice and comparably fast way to implement the filtering, I would be really happy about any suggestions. For me this was the nicest way of writing it, but I was never really happy about this piece of code. At the same time all the other ways of implementing it -- that I explored -- felt way more obscure.

  • Our API is inconsistent (see also events instead of event_path / event_list #126). It is probably worth streamlining it instead of defining a lot of different Union / TypeVar types.

For me this is a mixed bag. On the one hand I would like to have a consistent, easy to understand API; on the other hand, especially for the events, it is really convenient to see them either as a pandas.DataFrame, a generator of tuples, or a file path. I would really like to keep the API and even stabilize it in the direction of having these three representations for an event. What are your opinions?

  • Python 3.5 does not support all type annotation features. Do we keep 3.5 support? Then we have to work around these problems!

I am fine with dropping Python 3.5 support from 2019 onwards. Therefore we can merge and create a develop branch with 3.6, 3.7 and soon 3.8 support only.

Union,
TypeVar,
Generic
)
Member:
see @derNarr's comment above about indentation.



from numpy import ndarray
from xarray.core.dataarray import DataArray
Member:
See @derNarr's comments above about how to declare types.

Member Author:
I don't think his suggestion can be applied here: in the other files, numpy / xarray is already imported.

@dekuenstle (Member Author):

@derNarr @Trybnetic thanks for your reviews. I'll try to implement your suggestions when I have some time left for this.

@derNarr
I don't completely get your comments about 'should go into master'. My suggestion would be to merge into develop, refine the code there with some more PRs, test it, and then merge it into master.
This is how we would do it with git-flow #104, which would hopefully result in fewer but more stable releases.
Do you think the annotated patches are so important that we should merge them directly into master and into a bugfix release?

If there is a good and nice and comparably fast way to implement the filtering, I would be really happy about any suggestions. For me this was the nicest way of writing it, but I was never really happy about this piece of code. At the same time all the other ways of implementing it -- that I explored -- felt way more obscure.

I'll think about it. But probably refactoring it is beyond the scope of this PR.

For me this is a mixed bag. On the one hand I would like to have a consistent, easy to understand API; on the other hand, especially for the events, it is really convenient to see them either as a pandas.DataFrame, a generator of tuples, or a file path. I would really like to keep the API and even stabilize it in the direction of having these three representations for an event. What are your opinions?

My suggestion is not to restrict to e.g. dicts. Instead, I would define a Union type Events which contains all of the types you describe and is accepted everywhere. NumPy does something similar: the input is usually converted at the beginning of a function to one consistently used type.
Currently we accept dicts + DataFrame in one function, xarrays + dicts in another, etc.
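
Roughly what I have in mind (the alias name Events and the helper are just a sketch, and the tab-separated, underscore-joined event format is an assumption about our file layout):

from typing import Iterator, List, Tuple, Union

import pandas as pd

# hypothetical union alias: every public function would accept any of these
Events = Union[str,                                     # path to an event file
               pd.DataFrame,                            # cues / outcomes columns
               Iterator[Tuple[List[str], List[str]]]]   # generator of (cues, outcomes)

def _events_to_generator(events: Events) -> Iterator[Tuple[List[str], List[str]]]:
    # numpy-style normalisation at the top of a function:
    # convert whatever we got into one consistently used representation
    if isinstance(events, str):
        events = pd.read_csv(events, sep="\t")
    if isinstance(events, pd.DataFrame):
        return ((cues.split("_"), outcomes.split("_"))
                for cues, outcomes in zip(events["cues"], events["outcomes"]))
    return events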

I am fine with dropping Python 3.5 support from 2019 onwards. Therefore we can merge and create a develop branch with 3.6, 3.7 and soon 3.8 support only.

I fixed the 3.5 issues, so there is no need to drop 3.5 support just for type checking.

@dekuenstle (Member Author) commented Jul 6, 2018

@derNarr @Trybnetic Please have a look at the refinements.
I would suggest merging this PR into develop as a first introduction of type checks.
Then, in one or more follow-up PRs, we can refactor the filtering and unify the accepted event types as described above, before merging back into master and releasing.

Base automatically changed from develop to master June 21, 2021 15:31