Here are some guidelines to effectively contribute to pypath
development.
Please make sure you follow them as much as possible.
- Create a new module in the `pypath.inputs` module, or a new function if a module for the resource already exists. Follow other input modules as examples and use the module infrastructure, especially `share.curl` for downloads and `share.settings` for configuration. Configuration is stored in `settings.yaml`.
- Add all the required information in `resources.json` and `descriptions.py`. This information will also be shown on the OmniPath web page after the next update of the server.
  - Most important is to add the license; for license short names see this directory
- Check the loaded data directly in the output of the new input method
- Create input definitions for the relevant core databases (network,
enzyme-substrate, complexes, annotations, intercell); this might not be
straightforward, so don't hesitate to ask for help. See the brief guidance
below:
  - For each core database, create a function in the `inputs` module that produces an output similar to existing functions for the same core database; e.g. for the annotations database it should be a dict with UniProt IDs as keys and sets of tuples as values. Name this function after the name of the resource and the name of the core database separated by an underscore, e.g. for a resource called Cool New Resource, the function name should be `coolnew_annotations` (avoid including redundant words such as "resource" or "database").
  - `network`: Create an input definition in `resources.data_formats`, make sure it is in the correct dataset dict
  - `annot`: Create a new resource specific class in `core.annot` following the example of existing classes; add the name of the class to the list of `protein_sources_default`
  - `complex`: Create a new resource specific class in `core.complex` following the example of existing classes; add the name of the class to the list of `complex_resources`
  - `enz_sub`: Create an input definition in `resources.json` under the name of the resource, under keys "inputs" → "enzyme_substrate" (see examples of such definitions under any existing enz-sub resource, e.g. SIGNOR)
  - `intercell`: Make sure the resource is already added to the `annotations` database. Add one or more `AnnotDef` class annotation definitions to `annot_combined_classes` in the `core.intercell_annot` module. The `args` dict of each definition is passed to `core.annot.AnnotationBase.select`. Using `resource = AnnotOp(...)` in the definitions provides a way to apply custom operations on multiple definitions. The tilde notation refers to all defined specific categories within a generic category, e.g. `"~ligand"` refers to the union of all ligand categories. You can add identifiers to the `exclude` dict which are misannotated in any resource (e.g. ESR1 is not secreted).
- Check the data loaded into the core databases
- Our daily builds might reveal issues; there we can also see whether the new resource is loaded correctly into the web service data frames.
- Follow the coding style recommendations below when writing the new class or function
- Write the docstrings: description of the object, inputs, outputs, examples, etc. (see more details about docstrings below).
- Rebuild the documentation.
- Include it in the `__all__` module variable if applicable.
- Write the necessary unit test(s).
- Add dependencies to `setup.py` if applicable.
`pypath` aims to follow the style standards most widely shared in the
Python community, such as PEP8 and the Google style guide. `pypath`
has undergone large refactoring efforts since 2019, and around 30%
of its working codebase still has a sometimes messy and ugly style. Please
be forgiving when you see these parts and feel free to improve them if you want.
To see a more consistent style characteristic of the module, we recommend
looking at some newer core modules, e.g. `pypath.core.network` or
`pypath.core.interaction`. Although, as mentioned, we follow the standards,
we make exceptions in a few points. In the document below we only discuss these
alterations and additional rules. For general guidelines you can refer to
PEP8 and the Google style guide.
- Class names should be `PascalCase` (camel case with the first letter capitalized); however, resource names should appear as a single word, e.g. a class of annotations from Protein Atlas is `ProteinatlasAnnotation`
- Function, method and variable names should follow `snake_case`, with resource names typically shown as a single word, e.g. a function yielding annotations from Protein Atlas should be `proteinatlas_annotations`
- The names of the resources should not contain any underscore as it is used to separate primary and secondary resources (e.g. to refer to PhosphoSite data retrieved from ProtMapper we use the label "PhosphoSite_ProtMapper")
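
The naming rules above can be illustrated with a short sketch. The class and function bodies are empty stand-ins, and `split_resource_label` is a hypothetical helper (not part of `pypath`) that shows why an underscore inside a resource name would be ambiguous; it assumes the primary resource comes first in the label.

```python
from typing import Optional, Tuple


class ProteinatlasAnnotation:
    """
    Stand-in for an annotation class: PascalCase, resource name as
    a single word.
    """

    pass


def proteinatlas_annotations() -> dict:
    """
    Stand-in for an input function: snake_case, resource name as
    a single word.
    """

    return {}


def split_resource_label(label: str) -> Tuple[str, Optional[str]]:
    """
    Hypothetical helper: splits a label at the first underscore.

    Args:
        label: A resource label, e.g. "PhosphoSite_ProtMapper".

    Returns:
        The primary resource and the secondary resource, the latter
        being None if the label contains no underscore.
    """

    primary, _, secondary = label.partition('_')

    return primary, secondary or None
```

With this helper, `split_resource_label('PhosphoSite_ProtMapper')` gives `('PhosphoSite', 'ProtMapper')`, while a label without an underscore yields `None` as the secondary resource.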
Vertical empty space does not cost anything and helps a lot with readability. We use it extensively. Between, before and after methods and classes we leave 2 empty lines. Before and after blocks we leave an empty line. Inside blocks we add blank lines between short segments of logically strongly connected elements (usually one or a few lines).
```python
def function1(a = 1, b = 2):

    a = a + 1
    b = b - 1

    return a * b


def function2(a, b):

    for i in a:

        if i // b:

            yield i
```
If the list of arguments or anything inside parentheses needs to be broken into
multiple lines, we start a new line after the opening parenthesis, and each element
should have its own line, indented one level deeper than the opening line.
We add commas after each element including the last one, except if it is
`**kwargs`, because a comma after `**kwargs` is a syntax error in Python 2.
If the elements are statements separated by operators (e.g. `+` or `and`),
we leave the operator at the end of the line and start the next statement
on a new line. The closing parenthesis returns to the upper indentation level
and is left hanging on its own line.
```python
a = function(
    foo = 'foo',
    bar = 'bar',
    baz = 'baz',
)

a = function(
    foo = 'foo',
    bar = 'bar',
    **kwargs
)

if (
    condition1 and
    condition2
):

    do_something()
```
Anywhere else, if vertical alignment helps readability, it is good to use line breaks, for example when similar statements are repeated.
```python
a = (
    record[i],
    record[i + 7],
)
```
If the contents of a list or other comprehension do not fit into one line,
or even if they fit but look better broken into multiple pieces, we start
the value and each `for` and `if` clause on a new line. The closing
parenthesis is on a new line at the original indentation level.
```python
foo = [
    bar
    for bar in baz
    if bar[0]
]
```
After the triple double quotes the docstring starts in a new line and also the closing quotes have their own line. For docstrings we follow the Napoleon (Google) standard.
Thanks to type hints we don't have to add the type of arguments and return
values in parentheses (it is enough to write `arg:` instead of `arg (int):`).
```python
from typing import List


def function(a: int) -> List[int]:
    """
    Does something.

    Args:
        a: A number. I just make this sentence longer, hopefully
            long enough to show how to make a line break: indent the rest
            of the lines, so the argument name itself sticks out.

    Returns:
        A list of integers. I just make this sentence longer,
        hopefully long enough to show how to make a line break.
    """

    return [a, a + 1, a + 2]
```
By the end of 2021 we stopped supporting Python 2. Since then we use Python 3 only features such as type hinting.
You can see many uses of `future.utils.iteritems()` and `past.builtins.xrange()`
instead of simply `.items()` and `range()`, or `map`, `filter`, `reduce`
(`past.builtins.reduce`), which all aimed to maintain Python 2 compatibility.
These are no longer necessary and should not be used in new code.
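
As a small sketch of what this means in practice, the snippet below shows Python 3 only equivalents of the compatibility calls named above. The variable names and data are illustrative assumptions; `functools.reduce` is the standard library location of `reduce` in Python 3.

```python
import functools

weights = {'a': 1, 'b': 2, 'c': 3}

# Instead of future.utils.iteritems(weights):
pairs = [
    (key, value)
    for key, value in weights.items()
]

# Instead of past.builtins.xrange(3):
squares = [i ** 2 for i in range(3)]

# Instead of past.builtins.reduce(...):
total = functools.reduce(
    lambda x, y: x + y,
    weights.values(),
)
```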
Operators never stick to the operands, always surrounded by spaces. Also there is a space after the comma when listing elements of tuples and lists within a line.
```python
def function(a = 1):

    a = a * 4 + 3

a = (1, 2, 3)
```
We use single quotes unless there is any strong reason to use double.
We group imports the following way:
- Compatibility imports (e.g. `future.utils.iteritems`; not used any more)
- Standard library imports
- Typing imports
- Imports from other third party dependencies (e.g. `numpy`)
- Imports from `pypath` itself

Between these sections we leave an empty line.
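
The grouping could look like the sketch below. The third party and `pypath` imports are left as comments so that the snippet runs without those packages installed, and `count_pairs` is a hypothetical function included only so the imports are used. The compatibility group is omitted because it is not used any more.

```python
import collections
import itertools

from typing import Dict, List

# Third party imports would form the next group, e.g.:
# import numpy as np

# Imports from pypath itself would come last, e.g.:
# import pypath.share.session as session


def count_pairs(items: List[str]) -> Dict[tuple, int]:
    """
    Counts the unordered pairs of items.

    Args:
        items: A list of labels.

    Returns:
        A dict with sorted pairs as keys and their counts as values.
    """

    return dict(
        collections.Counter(
            itertools.combinations(sorted(items), 2)
        )
    )
```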
`pypath` has its own session and log manager built in. A class should either
inherit from `pypath.share.session.Logger` and then use its `_log` method,
or a logger instance should be created at the module level and used by
methods and classes in that module. Each logger instance has a default label,
so in the log one can see which message comes from which part of the module.
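
The two patterns can be sketched as follows. To keep the example runnable without `pypath`, the `Logger` class here is a simplified stand-in built on the standard `logging` module, not the actual `pypath.share.session.Logger`; its constructor signature, the labels and the `Coolnew*` names are assumptions made for illustration.

```python
import logging


class Logger:
    """
    Simplified stand-in for a logger base class with a default label.
    This is NOT the implementation found in pypath.
    """

    def __init__(self, name: str):

        self._logger = logging.getLogger(name)


    def _log(self, msg: str) -> None:

        self._logger.info(msg)


# Pattern 1: the class inherits from the logger and uses its `_log` method.
class CoolnewReader(Logger):

    def __init__(self):

        Logger.__init__(self, name = 'coolnew')


    def read(self) -> list:

        self._log('Reading records.')

        return []


# Pattern 2: one logger instance at the module level, shared by the
# functions and classes of the module.
_logger = Logger(name = 'coolnew_input')


def coolnew_raw() -> list:

    _logger._log('Downloading data.')

    return []
```

In both cases the label passed to the logger identifies the origin of each message in the log.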