
Typing #160

Open · wants to merge 13 commits into main
Conversation

@dekuenstle (Member) commented May 16, 2018

Fixes #33.

Changes proposed in this pull request:

  • Use type annotations in all modules.
  • Fail Travis CI if type checking fails.

Please note that the type annotations I added are far from perfect.
With this PR I would therefore love to hear your ideas and comments on how to improve them before merging.
There is still a lot of dirty stuff in it (type: ignore, ...) to get things running.

Some things we should think about:

  • Do we like type aliases like Cue = str? (See the small sketch after this list.)
  • Which level of abstraction? My rule of thumb was to introduce a higher-level type to avoid more than 2 layers (e.g. Iterable[Event] vs Iterable[Tuple[Cues, Outcomes]] vs Iterable[Tuple[List[str], List[str]]] vs ...).
  • Some of our coding patterns are hard to represent with nice types. It is probably worth refactoring the code itself instead of building fancy type constructs to work around it (e.g. the method assignment of JobFilter in preprocess.py).
  • Our API is inconsistent (see also events instead of event_path / event_list #126). It is probably worth streamlining it instead of defining a lot of different Union / TypeVar types.
  • Python 3.5 does not support all type annotation features. Do we keep 3.5 support? Then we have to work around these problems!
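
For illustration, a minimal sketch of the kind of aliases and layering meant above (hypothetical definitions, for illustration only, not necessarily what this branch contains):

from typing import Iterable, List, Tuple

# hypothetical aliases, for illustration only
Cue = str
Outcome = str
Cues = List[Cue]
Outcomes = List[Outcome]
Event = Tuple[Cues, Outcomes]

def count_cues(events: Iterable[Event]) -> int:
    # annotating against the high-level alias instead of
    # Iterable[Tuple[List[str], List[str]]] keeps signatures readable
    return sum(len(cues) for cues, _outcomes in events)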

As a rule of thumb: we should try to remove almost all # type: ignore and Union annotations and minimize the number of TypeVar usages. Then we not only get typing but also nicer code and a nicer API.

Feel free to improve directly in this branch; I probably cannot contribute for the next 20 days.
The PR uses a develop branch as its base, as suggested with the git-flow pattern in #104.

Best,
David

@coveralls commented May 16, 2018

Coverage Status

Coverage increased (+0.4%) to 83.667% when pulling d8e4e7b on typing into 57ebba9 on develop.

@dekuenstle (Member Author):

@Trybnetic @derNarr Finally the tests also pass for Python 3.5.
Please review this PR.

@derNarr (Member) left a comment

Sorry for replying so late. I had some very busy weeks. I really like the changes, and they gave me a first good impression of how "typed" Python code looks.

Additionally, some minor changes that should be merged into master directly are included in this pull request.

I added some questions for clarification and some comments in the diff. I noted where, imho, code changes should go directly into master.

I will address the questions from the pull request comment in a different comment.

Before merging:

  • the type imports should be discussed
  • questions in the pull request description should be discussed

Now some overall observations follow:

Overall, the code gets more specific and less general by typing it. This is also reflected in some of the renamed variables / labels, e.g. activations -> activation_dict, events -> cues_gen. This feels like the right naming scheme after typing the code, but at the same time it feels slightly unpythonic and reduces the readability for me a bit. I might only need to get used to this kind of code, and I do see the benefits of the typing. But the code still looks less clean and more cluttered to me right now. Maybe I simply need to get used to it.

if len(memory) > 0:
    _, total, used, *_ = memory[0].split()
else:
    total, used = '?', '?'
osinfo += "{} {}MiB/{}MiB\n".format(identifier, used, total)
Member:
This memory printing change should go into master as well (independent of the typing), IMHO.

@@ -63,8 +63,11 @@ def sysinfo():
    if uname.sysname == "Linux":
        _, *lines = os.popen("free -m").readlines()
        for identifier in ["Mem:", "Swap:"]:
Member:
Iterate over a tuple here? i.e. ("Mem:", "Swap:")


import numpy as np
import xarray as xr


from numpy import ndarray
from xarray.core.dataarray import DataArray
Member:
This type can also be accessed via xr.core.dataarray.DataArray, so it would be possible to define DataArray like:

DataArray = xr.core.dataarray.DataArray

instead of

from xarray.core.dataarray import DataArray

But the proposed version is probably cleaner. I am still a bit undecided. Any opinions on that? (The same applies to ndarray.)

Member Author:
I agree.

@@ -68,7 +69,7 @@ def read_clean_gzfile(gz_file_path, *, break_duration=2.0):
    text = word_tag.text
    if text in PUNCTUATION:
        words.append(text)
    else:
    elif text is not None:
Member:
Do we have enough test coverage here? Is this the right thing to do? If yes, this should be merged into master as soon as possible as well.

Member:
We agreed that it might be good to raise an exception here when text is None.

Member Author:
Now we raise an exception for null text.
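
For illustration, the check now has roughly this shape (the surrounding branches and the exact exception type and message are assumptions, not the literal diff):

text = word_tag.text
if text is None:
    # assumed behaviour: fail loudly instead of silently skipping the tag
    raise ValueError("word tag without text in {}".format(gz_file_path))
if text in PUNCTUATION:
    words.append(text)
else:
    ...  # unchanged handling of regular words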

pyndl/io.py Outdated

import pandas as pd
from pandas.core.frame import DataFrame
Member:
See comment above about how to define / import the DataFrame class / type.



def pytest_addoption(parser):
    """
    adds custom option to the pytest parser
    """
    parser.addoption("--runslow", action="store_true",
                     help="run slow tests")
                     default=False, help="run slow tests")
Member:
This change should go into master.

@@ -140,7 +138,7 @@ def test_ignore_missing_cues_dict():
    assert np.allclose(reference_activations[outcome], activation_list)


@slow
@pytest.mark.slow
Member:
This change should go into master.
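
For context, the --runslow option and this marker typically meet in a conftest.py hook roughly like the following (a sketch of the common pytest pattern; pyndl's actual conftest.py may differ):

import pytest

def pytest_addoption(parser):
    # mirrors the change above: an explicit default for --runslow
    parser.addoption("--runslow", action="store_true",
                     default=False, help="run slow tests")

def pytest_collection_modifyitems(config, items):
    # skip tests marked with @pytest.mark.slow unless --runslow was given
    if config.getoption("--runslow"):
        return
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)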

@@ -1,5 +1,5 @@
[tox]
envlist = py{35,36}-test, checkstyle, documentation
envlist = py{35,36}-test, checkstyle, checktypes, documentation
Member:
This change should go into master.

@@ -52,7 +52,6 @@ deps = mypy
setenv =
MYPYPATH=./stubs/
commands = mypy --ignore-missing-imports pyndl
ignore_outcome = True
Member:
This change should go into master.

@@ -351,7 +379,19 @@ def process_context(line):
process_words(words)


class JobFilter():
class JobFilterBase():
Member:
What is the reasoning behind defining this base class? Is it to define the interface?

Member Author:
Yes. The type checker requires method definitions, but doesn't accept assigning already existing methods.
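
To make that concrete, the pattern under discussion looks roughly like this (hypothetical names and methods; the real JobFilterBase in preprocess.py is more involved):

from typing import List

class JobFilterBase():
    # the base class declares the method signature explicitly,
    # instead of leaving it to be inferred from the assignments below
    def job(self, line: str) -> List[str]:
        raise NotImplementedError

class JobFilter(JobFilterBase):
    def __init__(self, lowercase: bool) -> None:
        # select one of two already existing bound methods at runtime;
        # this assignment pattern is what the type checker struggles with
        self.job = self._job_lower if lowercase else self._job_plain

    def _job_lower(self, line: str) -> List[str]:
        return line.lower().split()

    def _job_plain(self, line: str) -> List[str]:
        return line.split()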

@derNarr (Member) commented Jun 27, 2018

Addressing some of the questions posed by David:

  • Do we like type aliases like Cue = str?

I like these kinds of type aliases, as they improve readability imho.

  • Which level of abstraction? My rule of thumb was to introduce a higher-level type to avoid more than 2 layers (e.g. Iterable[Event] vs Iterable[Tuple[Cues, Outcomes]] vs Iterable[Tuple[List[str], List[str]]] vs ...).

IMHO sometimes three layers like Iterable[Tuple[Cues, Outcomes]] make sense. But trying to stick to mostly 2 and a maximum of 3 sounds reasonable.

  • Some of our coding patterns are hard to represent with nice types. It is probably worth refactoring the code itself instead of building fancy type constructs to work around it (e.g. the method assignment of JobFilter in preprocess.py).

If there is a good and nice and comparably fast way to implement the filtering, I would be really happy about any suggestions. For me this was the nicest way of writing it, but I was never really happy about this piece of code. At the same time all the other ways of implementing it -- that I explored -- felt way more obscure.

  • Our API is inconsistent (see also events instead of event_path / event_list #126). It is probably worth streamlining it instead of defining a lot of different Union / TypeVar types.

For me this is a mixed bag. On the one hand I would like to have a consistent, easy to understand API; on the other hand, especially for the events, it is really convenient to see them either as a pandas.DataFrame, a generator of tuples, or a file path. I would really like to keep the API and even stabilize it in the direction of having these three representations for an event. What are your opinions?

  • Python 3.5 does not support all type annotation features. Do we keep 3.5 support? Then we have to work around these problems!

I am fine with dropping Python 3.5 support from 2019 onwards. Therefore we can merge and create a develop branch with 3.6, 3.7 and soon 3.8 support only.

Union,
TypeVar,
Generic
)
Member:
see @derNarr's comment above about indentation.



from numpy import ndarray
from xarray.core.dataarray import DataArray
Member:
See @derNarr's comments above about how to declare types.

Member Author:
I don't think his suggestion can be applied here: in the other files, numpy / xarray is already imported.

@dekuenstle (Member Author):

@derNarr @Trybnetic thanks for your reviews. I'll try to implement your suggestions when I have some time left for this.

@derNarr
I don't completely get your comments about 'should go into master'. My suggestion would be to merge into develop, refine the code there with some more PRs, test it, and then merge it into master.
This is how we would do it with git-flow #104, which would hopefully result in fewer but more stable releases.
Do you think the annotated patches are so important that we should merge them directly into master and into a bugfix release?

If there is a good and nice and comparably fast way to implement the filtering, I would be really happy about any suggestions. For me this was the nicest way of writing it, but I was never really happy about this piece of code. At the same time all the other ways of implementing it -- that I explored -- felt way more obscure.

I'll think about it. But probably refactoring it is beyond the scope of this PR.

For me this is a mixed bag. On the one hand I would like to have a consistent, easy to understand API; on the other hand, especially for the events, it is really convenient to see them either as a pandas.DataFrame, a generator of tuples, or a file path. I would really like to keep the API and even stabilize it in the direction of having these three representations for an event. What are your opinions?

My suggestion is not to restrict to e.g. dicts. Instead, I would define a Union type Events which contains all of the types you describe and is accepted everywhere. NumPy does something similar: the input is usually converted at the beginning of a function to one consistently used type.
Currently we accept dicts + DataFrame in one function, xarrays + dicts in another, etc.
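
Roughly what I have in mind (the alias name Events and the helper are just a sketch, and the tab-separated, underscore-joined event format is an assumption about our file layout):

from typing import Iterator, List, Tuple, Union

import pandas as pd

# hypothetical union alias: every public function would accept any of these
Events = Union[str,                                     # path to an event file
               pd.DataFrame,                            # cues / outcomes columns
               Iterator[Tuple[List[str], List[str]]]]   # generator of (cues, outcomes)

def _events_to_generator(events: Events) -> Iterator[Tuple[List[str], List[str]]]:
    # numpy-style normalisation at the top of a function:
    # convert whatever we got into one consistently used representation
    if isinstance(events, str):
        events = pd.read_csv(events, sep="\t")
    if isinstance(events, pd.DataFrame):
        return ((cues.split("_"), outcomes.split("_"))
                for cues, outcomes in zip(events["cues"], events["outcomes"]))
    return events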

I am fine with dropping Python 3.5 support from 2019 onwards. Therefore we can merge and create a develop branch with 3.6, 3.7 and soon 3.8 support only.

I fixed the 3.5 issues, so there is no need to drop 3.5 support just for type checking.

@dekuenstle (Member Author) commented Jul 6, 2018

@derNarr @Trybnetic Please have a look at the refinements.
I would suggest merging this PR into develop as a first introduction of type checks.
Then, in one or more follow-up PRs, we can refactor the filtering and unify the accepted event types as described above, before merging back into master and releasing.

Base automatically changed from develop to master June 21, 2021 15:31