-
Notifications
You must be signed in to change notification settings - Fork 3
/
guide.qmd
201 lines (146 loc) · 8.83 KB
/
guide.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
---
title: "openstatsguide"
subtitle: "Minimum Viable Good Practices for High Quality Statistical Software Packages"
author: "0.1-0"
author-title: "Version"
date: "16 Sep 2024"
bibliography: references.bib
nocite: |
@*
csl: computational-statistics.csl
---
# Scope
We encourage developers of statistical software packages to follow this minimum set of good practices around:
> "**D**ocumentation, **V**ignettes, **T**ests, **F**unctions, **S**tyle, **L**ife cycle"
These keywords can be easily remembered with the mnemonic bridge sentence:
> "**D**evelopers **V**alue **T**ests **F**or **S**oftware **L**ongevity"
While the recommendations are rather generic, we focus on functional programming languages and give links to implementations in R, Python and Julia.
This guide primarily addresses developers of statistical packages. Users interested in assessing the quality of existing statistical packages will find complementary "validation" focused resources valuable, as listed in [References](#references).
# Recommendations
```{r echo=FALSE}
glue_or_drop <- function(doc, img) {
if (identical(doc, "")) {
NULL
} else {
paste0(
"## ", htmltools::tags$img(src = img, width = 25, alt = "logo"), "\n",
paste(
paste0(
paste0("[", names(doc), "]"),
paste0("(", doc, ")")
),
collapse = "\n"
)
)
}
}
guide_tabset <- function(r = "", python = "", julia = "") {
contents <- paste(
"::: {.panel-tabset}",
glue_or_drop(r, "https://www.r-project.org/logo/Rlogo.svg"),
glue_or_drop(python, "https://s3.dualstack.us-east-2.amazonaws.com/pythondotorg-assets/media/files/python-logo-only.svg"),
glue_or_drop(julia, "https://raw.githubusercontent.com/JuliaLang/julia-logo-graphics/master/images/julia-dots.svg"),
":::",
sep = "\n\n"
)
cat(contents, sep = "\n\n")
}
```
## Documentation
Documentation is important for both users and developers to understand all objects in your package, without reading and interpreting the underlying source code.
1. Use **in-line comments** next to functions, classes and other objects to generate their corresponding documentation.
```{r results="asis", echo=FALSE}
guide_tabset(
r = c("roxygen2" = "https://roxygen2.r-lib.org/"),
python = c("docstrings" = "https://peps.python.org/pep-0257/"),
julia = c("docstrings" = "https://docs.julialang.org/en/v1/manual/documentation/")
)
```
2. Do also **document internal functions** and classes for maintenance by future developers.
3. Add **code comments** for ambiguous or complex pieces of internal code, but only after optimizing the code design as much as possible.
## Vignettes
Vignettes are documents that complement the object documentation by providing a comprehensive and long-form overview of your package's functionality from a user point of view, with particular emphasis on the connection to the statistical approaches.
1. Provide an **introduction vignette** that introduces the package to new users, such that they have an easy entry point for getting started. Make sure to include code examples and automatically compile the vignette to ensure reproducibility.
2. Include **deep dive vignettes** that go into depth on specific use cases, functionalities or underlying theory, in particular describing the underlying statistical methodology and how it is implemented in the package.
3. Host your vignettes on a **dedicated website**, which allows users to read the vignettes without installing the package, and simplifies citing the vignettes in other publications.
```{r results="asis", echo=FALSE}
guide_tabset(
r = c("pkgdown" = "https://pkgdown.r-lib.org/"),
python = c("Sphinx" = "https://www.sphinx-doc.org/en/master/"),
julia = c("Documenter" = "https://documenter.juliadocs.org/")
)
```
## Tests
Tests are a fundamental safety net and development tool to ensure that your package works as expected, both during development as well as on user systems.
1. Write **unit tests** for all functions and classes in your package, to ensure that all building blocks work correctly on their own ("white box" testing). Expect to rewrite tests for internal code when you refactor it.
2. Write **functional tests** for all user facing functionality ("black box" testing). These tests ensure that the user API is stable when refactoring internal code.
```{r results="asis", echo=FALSE}
guide_tabset(
r = c("testthat" = "https://testthat.r-lib.org/"),
python = c("pytest" = "https://pytest.org/"),
julia = c("Test" = "https://docs.julialang.org/en/v1/stdlib/Test/")
)
```
3. In addition, ensure a **good coverage** of your code with your tests as a final check, but only after having unit and functional tests on all levels of the code.
```{r results="asis", echo=FALSE}
guide_tabset(
r = c("covr" = "https://covr.r-lib.org/"),
python = c("Coverage.py" = "https://coverage.readthedocs.io/"),
julia = c("Coverage.jl" = "https://github.com/JuliaCI/Coverage.jl")
)
```
## Functions
Function definitions should be short, simple and enforce argument types with assertions.
1. Write **short functions** with less than 50 lines of code for a single and well-defined purpose, with **few arguments**, and low cyclomatic complexity, in order to reduce the risk of bugs, simplify writing tests and ensure that you can maintain them.
2. Use **type hints** or types to explain to the user which argument of the function expects which type of input.
```{r results="asis", echo=FALSE}
guide_tabset(
r = c("roxytypes" = "https://openpharma.github.io/roxytypes/"),
python = c("typing" = "https://docs.python.org/3/library/typing.html"),
julia = c("types" = "https://docs.julialang.org/en/v1/manual/types/")
)
```
3. Enforce types and other expected properties of function arguments with **assertions**, which give an early and readable error message to the user instead of failing function code downstream in a less explicable way.
```{r results="asis", echo=FALSE}
guide_tabset(
r = c("checkmate" = "https://mllg.github.io/checkmate/"),
python = c("assertpy" = "https://pypi.org/project/assertpy/"),
julia = c("ArgCheck.jl" = "https://github.com/jw3126/ArgCheck.jl")
)
```
## Style
A consistent and readable code style that is language idiomatic should be used and enforced by style checks.
1. Use **language idiomatic** code and follow the **"clean code" rules** (use descriptive and meaningful names, prefer simpler over more complex code, avoid duplication of code, regularly refactor code), while allowing for exceptions only where needed.
2. Use a **formatting tool** to automatically implement a consistent and readable code format.
```{r results="asis", echo=FALSE}
guide_tabset(
r = c("styler" = "https://styler.r-lib.org/"),
python = c("Autopep8" = "https://pypi.org/project/autopep8/"),
julia = c("JuliaFormatter.jl" = "https://domluna.github.io/JuliaFormatter.jl/")
)
```
3. Use a **style checking** tool to enforce a consistent and readable code style.
```{r results="asis", echo=FALSE}
guide_tabset(
r = c("lintr" = "https://lintr.r-lib.org/"),
python = c("Pylint" = "https://pypi.org/project/pylint/"),
julia = c("StaticLint.jl" = "https://github.com/julia-vscode/StaticLint.jl")
)
```
## Life cycle
Life cycle management is simplified by reducing dependencies, and should comprise a central code repository.
1. **Reduce dependencies** to simplify maintenance of your own package. Only depend on other packages that you trust and deem stable enough for the purpose, in order to avoid reinventing the wheel.
2. **Ensure that you track dependencies and pin their versions** so if another developer contributes then they can use the same environment to produce consistent results and behaviours. This could be tracked using more system level approaches like configuring a snapshot date in the package repository to language-specific tools that generate a file that tracks dependencies and versions that serves as a source of truth for all packages developers.
3. Give clear information to users about changes in the package API via maintaining the **change log** and first **deprecating functionality** before removing it.
```{r results="asis", echo=FALSE}
guide_tabset(
r = c(
"lifecycle" = "https://cran.r-project.org/web/packages/lifecycle/index.html",
"fledge" = "https://fledge.cynkra.com/"
),
python = c("deprecation" = "https://deprecation.readthedocs.io/"),
julia = c("workflow" = "https://invenia.github.io/blog/2022/06/17/deprecating-in-julia/")
)
```
4. Use a **central repository** for version control, collecting and resolving issues, and managing releases. Include the publication of a **contributing guide** to help onboard new developers and enable community contributions.
# References