-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Typing #160
base: main
Are you sure you want to change the base?
Typing #160
Conversation
For automatic generation we used MonkeyType. We run monkeytype on the pytests and generate the stubs. The information of the stubs was manually slightly simplified and applied. Then minor refactoring was necessary to pass typechecking. https://github.com/Instagram/MonkeyType
@Trybnetic @derNarr Finally tests also pass for python3.5 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for replying so late. I had some very busy weeks. I really like the changes and they gave me the first good impression how "typed" python code looks.
Additionally, some minor changes that should be merged into the master directly are included in this pull request.
I added some questions for clarification and some comments into the diff. I noted where imho code changes should go directly into master.
I will address the questions from the pull request comment in a different comment.
Before merging:
- the type imports should be discussed
- questions in the pull request description should be discussed
Now some overall observations follow:
Overall, the code get's more specific and less general by typing it. This is also reflected in some of the name changes of variables / labels, e. g. activations -> activation_dict
, events -> cues_gen
. This feels like the right naming scheme after typing the code. But at the same time it feels slightly unpythonic and reduces the readability for me a bit. I might only need to get used to this kind of written code and I see the benefits of the typing. But still the code looks less clean and more cluttered to me right now. Maybe I simply needs to get used to it.
if len(memory) > 0: | ||
_, total, used, *_ = memory[0].split() | ||
else: | ||
total, used = '?', '?' | ||
osinfo += "{} {}MiB/{}MiB\n".format(identifier, used, total) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This of the memory printing change should go into master as well (independent of the typing). IMHO.
pyndl/__init__.py
Outdated
@@ -63,8 +63,11 @@ def sysinfo(): | |||
if uname.sysname == "Linux": | |||
_, *lines = os.popen("free -m").readlines() | |||
for identifier in ["Mem:", "Swap:"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterate over Tuple here? i. e. ("Mem:", "Swap:")
pyndl/activation.py
Outdated
|
||
import numpy as np | ||
import xarray as xr | ||
|
||
|
||
from numpy import ndarray | ||
from xarray.core.dataarray import DataArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type can also be accessed via xr.core.dataarray.DataArray
therefore it would be possible to define the DataArray like:
DataArray = xr.core.dataarray.DataArray
instead of
from xarray.core.dataarray import DataArray
But the proposed version is probably cleaner. I am still a bit undecided. Any opinions on that? (The same applies to ndarray.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
@@ -68,7 +69,7 @@ def read_clean_gzfile(gz_file_path, *, break_duration=2.0): | |||
text = word_tag.text | |||
if text in PUNCTUATION: | |||
words.append(text) | |||
else: | |||
elif text is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have enough test coverage here? Is this the right thing to do? If yes, this should be merged into master as soon as possible as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed that it might be good to raise an exception here when text
is None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we raise a exception for null text.
pyndl/io.py
Outdated
|
||
import pandas as pd | ||
from pandas.core.frame import DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above about how to define / import the DataFrame class / type.
|
||
|
||
def pytest_addoption(parser): | ||
""" | ||
adds custom option to the pytest parser | ||
""" | ||
parser.addoption("--runslow", action="store_true", | ||
help="run slow tests") | ||
default=False, help="run slow tests") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should go into master.
@@ -140,7 +138,7 @@ def test_ignore_missing_cues_dict(): | |||
assert np.allclose(reference_activations[outcome], activation_list) | |||
|
|||
|
|||
@slow | |||
@pytest.mark.slow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should go into master.
@@ -1,5 +1,5 @@ | |||
[tox] | |||
envlist = py{35,36}-test, checkstyle, documentation | |||
envlist = py{35,36}-test, checkstyle, checktypes, documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should go into master.
@@ -52,7 +52,6 @@ deps = mypy | |||
setenv = | |||
MYPYPATH=./stubs/ | |||
commands = mypy --ignore-missing-imports pyndl | |||
ignore_outcome = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should go into master.
@@ -351,7 +379,19 @@ def process_context(line): | |||
process_words(words) | |||
|
|||
|
|||
class JobFilter(): | |||
class JobFilterBase(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning behind defining this base class? Is it to define the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The type checker requires method definitions, but doesn't accept assignment already existing methods.
Addressing some of the questions posed by David:
I like these kind of type aliases as they improve readability imho.
IMHO sometimes three layers like
If there is a good and nice and comparably fast way to implement the filtering, I would be really happy about any suggestions. For me this was the nicest way of writing it, but I was never really happy about this piece of code. At the same time all the other ways of implementing it -- that I explored -- felt way more obscure.
For me this is a mixed bag. On the one hand I would like to have a consistent easy to understand API, on the other hand especially for the events, it is really convenient, to either see them as a pandas.DataFrame, a generator of tuples, or a file path. I would really like to keep the API and even stabilize it in the direction of having these three representations for an event. What are your opinions?
I am fine with dropping Python 3.5 support from 2019 onwards. Therefore we can merge and create a |
Union, | ||
TypeVar, | ||
Generic | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see @derNarr's comment above about indentation.
|
||
|
||
from numpy import ndarray | ||
from xarray.core.dataarray import DataArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See @derNarr's comments above about how to declare types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think his suggestion can be applied here:
In the other files, numpy / xarray is already importet.
@derNarr @Trybnetic thanks for your reviews. I'll try to implement your suggestions when i've some time left for this. @derNarr
I'll think about it. But probably refactoring it is beyond the scope of this PR.
My suggestion is not to restrict to e.g. dicts. In contrast I would define a Union Type Events which contains all of the types you describe and is accepted everywhere. Numpy does similar, usually they convert them at the beginning of a function to one consistently used type.
I fixed the 3.5 issues, so no need to drop 3.5 support just for typechecking. |
@derNarr @Trybnetic Please have a look at the refinements. |
Fixes #33.
Changes proposed in this pull request:
Please note, that the type annotations which I added are fare from perfect.
Therefore with this PR I would love to hear your ideas and comments to improve them before merging.
There is still a lot of dirty stuff in it (
type: ignore
, ...) to get things running.Some things we should think about:
Cue = str
?Iterable[Event]
vsIterable[Tuple[Cues, Outcomes]]
vsIterable[Tuple[List[str],List[str]]] vs ...
)As a rule of thumb: We should try to remove almost all
# type: ignore
and Union and minimize the number of TypeVar usages. Then we not only get typing but also nices code + api.Feel free to improve directly in this branch, I probably cannot contribute for the next 20 days.
The PR uses a develop branch as base like suggested with git-flow pattern in #104 .
Best,
David