Coding guidelines for contributors (and pull request reviewers) are described in the Code Review Checklist.
Please record all changes and updates in `HISTORY.rst` under the upcoming section. It is especially important to log changes that break backwards compatibility so we can appropriately adjust the versioning.
- Only imports to `vcm`, `vcm.cubedsphere`, `vcm.safe`, `vcm.cloud`, and `vcm.testing` are allowed. No deeper nesting (e.g. `vcm.cloud.fsspec`) and no imports of other modules are allowed; see the sketch below.
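For instance, a minimal sketch of the difference (assuming the `vcm` package is installed; all module names are taken from the rule above):

```python
# OK: these imports stay within the allowed namespaces
import vcm
import vcm.cubedsphere
import vcm.testing

# Not OK: this reaches past the allowed namespaces into a deeper submodule
# import vcm.cloud.fsspec
```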
Excessive coupling between our workflows has several negative consequences, including:
- Changes to shared code can potentially break any workflow importing that code, making it hard to predict what can break for a given change.
- Depending on unneeded code can increase the number of dependencies required for a particular workflow. More dependencies result in more bugs related to upstream changes and higher setup costs (e.g. huge docker images or slow dataflow startup times).
To make each workflow as modular as possible, we need to make sure that workflow-specific code cannot affect other workflows. This is accomplished by following these rules:
- Any new workflow code is assumed workflow-specific unless actually used by multiple workflows.
- Workflow-specific code should be contained within the `workflows/<workflow name>/` folder.
Of course, some workflow-specific code will become useful in other workflows. In this case, the shared functionality should be moved to a relevant python "micropackage" in `external/`. If the new functionality does not seem like a good fit for an existing package (e.g. `external/vcm`), then a new package should be created; you can refer to existing packages as templates. Each of these micropackages should have a minimal set of dependencies.
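As a sketch, the resulting layout looks something like this (the specific workflow and package names are hypothetical):

```
workflows/
    <workflow name>/     # workflow-specific code stays here
external/
    vcm/                 # shared "micropackage" with minimal dependencies
    <new package>/       # created when functionality fits no existing package
```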
- The external interfaces are the modules `vcm`, `vcm.cubedsphere`, and `vcm.cloud`. All routines to be used externally should be imported into one of these namespaces. This rule could change pending future changes to the vcm API.
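For example, exposing a routine amounts to re-exporting it from one of those namespaces (a sketch; the internal module and function names here are hypothetical):

```python
# In external/vcm/vcm/__init__.py (illustrative path):
from .calc import some_routine  # hypothetical internal module and function

# Consumers then import it through the public namespace:
#     from vcm import some_routine
```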
Type checking with `mypy` can make the code more robust and can help catch type-related errors without the need to write unit tests. For example, type checking can discover if a function is passed the wrong number of arguments, a common mistake in long-running pipeline code. In general, type checks plus unit tests often very closely predict whether a long-running integration test will succeed.
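For instance, a minimal sketch (the exact wording of the error varies between mypy versions):

```python
def rescale(values: list, factor: float) -> list:
    """Multiply every value by a constant factor."""
    return [v * factor for v in values]


# mypy flags the missing "factor" argument without running the pipeline:
rescale([1.0, 2.0])
```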
By default, mypy will only analyze code that has type hints, which allows a gradual migration. Unfortunately, many of our functions use type hints incorrectly, and mypy will catch those errors. For this reason, type checking is only enabled for a subset of modules; see `.pre-commit-config.yaml` for an up-to-date list.
Contributors are encouraged to add type hints to their code and to add the modules they are updating to `.pre-commit-config.yaml`.
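Opting a module in might look like the following sketch (a hypothetical excerpt; the `rev` pin, file paths, and hook layout should match whatever the real `.pre-commit-config.yaml` already contains):

```yaml
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v0.782  # illustrative pin
  hooks:
    - id: mypy
      # "files" is a regex; extend it with the module you are updating
      files: ^external/vcm/vcm/safe\.py$
```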
Many python libraries do not have type hints, so mypy will complain when importing them. Add `# type: ignore` after the import to silence these mypy errors.
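For example, assuming `fsspec` here as a library without type stubs:

```python
import fsspec  # type: ignore  # library ships no type hints
```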
Xarray has type hints, but they are oftentimes incorrect and not very helpful. When they are incorrect, a quick `# type: ignore` will help.
Some xarray functions also return Union types, which makes life difficult. For example, mypy will fail on this code:
```python
import xarray as xr


def func(ds: xr.Dataset):
    pass


ds: xr.Dataset = ...
# error:
# this line gives a type error because mypy doesn't know
# whether ds[['a', 'b']] is a Dataset or a DataArray
func(ds[['a', 'b']])
```
You can fix this by explicitly casting `ds[['a', 'b']]`, as is done by `vcm.safe.get_variables`:
```python
from vcm import safe

# OK:
func(safe.get_variables(ds, ['a', 'b']))
```
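If you need such a cast outside of `vcm`, `typing.cast` makes the cast explicit yourself (a sketch; note that `cast` only informs the type checker and does nothing at runtime):

```python
from typing import cast

# OK: tell mypy explicitly that the selection is a Dataset
func(cast(xr.Dataset, ds[['a', 'b']]))
```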