# `DESCRIPTION` and `NAMESPACE` {#sec-metadata}
```{r, echo = FALSE}
source("common.R")
status("restructuring")
```
## Introduction
There are two important files that provide metadata about your package: `DESCRIPTION` and `NAMESPACE`.
The `DESCRIPTION` provides overall metadata about the package, and the `NAMESPACE` describes which functions you use from other packages and which you expose to the world.
In this chapter, you'll learn the basic structure of these files and some of their simple applications, such as recording the name and title of your package and who wrote it.
We'll continue in the next chapters to explain:
- The `License` field, which defines who can use your package.
  Licensing is a big enough topic that it has a dedicated chapter (@sec-license).
  If you have no plans to share your package, you may be able to ignore licensing.
  But if you plan to share, even if only by putting the code where others can see it, you really should specify a license.
- The dependencies of your package, which
## `DESCRIPTION` {#sec-description}
The job of the `DESCRIPTION` file is to store important metadata about your package.
When you first start writing packages, you'll mostly use these metadata to record what packages are needed to run your package.
However, as time goes by, other aspects of the metadata file will become useful to you, such as revealing what your package does (via the `Title` and `Description`) and whom to contact (you!) if there are any problems.
Every package must have a `DESCRIPTION`.
In fact, it's the defining feature of a package (RStudio and devtools consider any directory containing `DESCRIPTION` to be a package)[^metadata-1].
To get you started, `usethis::create_package("mypackage")` automatically adds a bare-bones `DESCRIPTION` file.
This will allow you to start writing the package without having to worry about the metadata until you need to.
This minimal `DESCRIPTION` will vary a bit depending on your settings, but should look something like this:
[^metadata-1]: The relationship between "has a `DESCRIPTION` file" and "is a package" is not quite this clear-cut.
Many non-package projects use a `DESCRIPTION` file to declare their dependencies, i.e. which packages they rely on.
In fact, the project for this book does exactly this!
This off-label use of `DESCRIPTION` makes it easy to piggy-back on package development tooling to install all the packages necessary to work with a non-package project.
```{r include = FALSE, cache = FALSE}
temp_pkg <- fs::path_temp("mypackage")
withr::with_options(
  list(usethis.description = NULL),
  usethis::create_package(
    temp_pkg, open = FALSE, rstudio = TRUE, check_name = FALSE
  )
)
```
```{cat, code = readLines(fs::path(temp_pkg, "DESCRIPTION")), class.source = "yaml"}
```
If you create a lot of packages, you can customize the default content of new `DESCRIPTION` files by setting the global option `usethis.description` to a named list.
You can pre-configure your preferred name, email, license, etc.
See the [article on usethis setup](https://usethis.r-lib.org/articles/articles/usethis-setup.html) for more details.
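As a rough sketch, such a setting might look like the following in your `.Rprofile`. The name, email address, and license are placeholders chosen for illustration, not recommendations:

```r
# Placeholder defaults for new DESCRIPTION files; adapt to your own details.
options(
  usethis.description = list(
    "Authors@R" = utils::person(
      "Jane", "Doe",
      email = "jane@example.com",
      role = c("aut", "cre")
    ),
    License = "MIT + file LICENSE"
  )
)

# Confirm which fields have been registered for this session.
names(getOption("usethis.description"))
```

Because this is an ordinary R option, it only affects packages created after it has been set.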
`DESCRIPTION` uses a simple file format called DCF, the Debian control format.
You can see most of the structure in the examples in this chapter.
Each line consists of a **field** name and a value, separated by a colon.
When values span multiple lines, they need to be indented:
``` yaml
Description: The description of a package is usually long,
    spanning multiple lines. The second and subsequent lines
    should be indented, usually with four spaces.
```
If you ever need to work with a `DESCRIPTION` file programmatically, take a look at the [desc package](https://www.r-pkg.org/pkg/desc), which usethis uses heavily under the hood.
This chapter will show you how to use the straightforward `DESCRIPTION` fields.
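For a quick, read-only look at the DCF structure, base R's `read.dcf()` may be enough. The sketch below writes a small, made-up `DESCRIPTION` to a temporary file and reads it back; the package name and field values are purely illustrative:

```r
# Write a tiny, made-up DESCRIPTION to a temporary file.
path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: mypackage",
  "Title: What the Package Does",
  "Description: The description of a package is usually long,",
  "    spanning multiple lines. The second and subsequent lines",
  "    should be indented, usually with four spaces."
), path)

# read.dcf() returns a character matrix: one row per record, one column per field.
dcf <- read.dcf(path)
colnames(dcf)
dcf[1, "Package"]

# The indented continuation lines are folded into the single Description value.
grepl("four spaces", dcf[1, "Description"], fixed = TRUE)
```

For anything beyond inspection, such as editing fields in place, the desc package is likely the better tool.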
## Title and description: What does your package do? {#description-title-description}
The title and description fields describe what the package does.
They differ only in length:
- `Title` is a one line description of the package, and is often shown in a package listing. It should be plain text (no markup), capitalised like a title, and NOT end in a period. Keep it short: listings will often truncate the title to 65 characters.
- `Description` is more detailed than the title. You can use multiple sentences, but you are limited to one paragraph. If your description spans multiple lines (and it should!), each line must be no more than 80 characters wide. Indent subsequent lines with 4 spaces.
The `Title` and `Description` for ggplot2 are:
``` yaml
Title: Create Elegant Data Visualisations Using the Grammar of Graphics
Description: A system for 'declaratively' creating graphics,
    based on "The Grammar of Graphics". You provide the data, tell 'ggplot2'
    how to map variables to aesthetics, what graphical primitives to use,
    and it takes care of the details.
```
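The basic constraints on `Title` and `Description` listed above are easy to check mechanically. Here is a minimal sketch; `check_title()` and `description_lines_ok()` are hypothetical helpers written for this illustration, not part of devtools or usethis:

```r
# Hypothetical helper: checks a Title against the rules of thumb listed above.
check_title <- function(title, max_chars = 65) {
  c(
    one_line         = !grepl("\n", title, fixed = TRUE),
    no_final_period  = !grepl("\\.$", title),
    fits_in_listings = nchar(title) <= max_chars
  )
}

# Hypothetical helper: TRUE if every line of a Description is at most 80 characters.
description_lines_ok <- function(lines, max_width = 80) {
  all(nchar(lines) <= max_width)
}

# The ggplot2 title passes all three checks; the second title ends in a period.
check_title("Create Elegant Data Visualisations Using the Grammar of Graphics")
check_title("A package for making plots.")
```

These checks cover only the mechanical rules; they say nothing about whether the wording itself is any good.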
A good title and description are important, especially if you plan to release your package to CRAN, because they appear on the CRAN download page as follows:
<!-- TODO: I know my hacky diagram might need replacement, but at least it's more up-to-date. -->
```{r}
#| label: fig-cran-package-page
#| echo: false
#| out-width: ~
#| fig.cap: >
#| The CRAN page for ggplot2, highlighting Title and Description.
knitr::include_graphics("diagrams/cran-package-ggplot2.png")
```
If you plan to submit your package to CRAN, both the `Title` and `Description` are a frequent source of rejections for reasons not covered by the automated `R CMD check`.
In addition to the basics above, here are a few more tips:
- Put the names of R packages, software, and APIs inside single quotes. This goes for both the `Title` and the `Description`. See the ggplot2 example above.
- If you need to use an acronym, try to do so in `Description`, not in `Title`. In either case, explain the acronym in `Description`, i.e. fully expand it.
- Don't include the package name, especially in `Title`, which is often prefixed with the package name.
- Do not start with "A package for ..." or "This package does ...". This rule makes sense once you look at [the list of CRAN packages by name](https://cran.r-project.org/web/packages/available_packages_by_name.html). The information density of such a listing is much higher without a universal prefix like "A package for ...".
If these constraints give you writer's block, it often helps to spend a few minutes reading `Title` and `Description` of packages already on CRAN.
Once you read a couple dozen, you can usually find a way to say what you want to say about your package that is also likely to pass CRAN's human-enforced checks.
You'll notice that `Description` only gives you a small amount of space to describe what your package does.
This is why it's so important to also include a `README.md` file that goes into much more depth and shows a few examples.
You'll learn about that in @sec-readme.
### Author: who are you? {#description-authors}
To identify the package's author, and whom to contact if something goes wrong, use the `Authors@R` field.
This field is unusual because it contains executable R code rather than plain text.
Here's an example:
``` yaml
Authors@R: person("Hadley", "Wickham", email = "[email protected]",
    role = c("aut", "cre"))
```
```{r}
person("Hadley", "Wickham", email = "[email protected]",
  role = c("aut", "cre"))
```
This command says that Hadley Wickham is both the maintainer (`cre`) and an author (`aut`) and that his email address is `[email protected]`.
The `person()` function has four main inputs:
- The name, specified by the first two arguments, `given` and `family` (these are normally supplied by position, not name).
In English cultures, `given` (first name) comes before `family` (last name).
In many cultures, this convention does not hold.
For a non-person entity, such as "R Core Team" or "RStudio", use the `given` argument (and omit `family`).
- The `email` address.
It's important to note that this is the address CRAN uses to let you know if your package needs to be fixed in order to stay on CRAN.
Make sure to use an email address that's likely to be around for a while.
CRAN policy requires that this be for a person, as opposed to, e.g., a mailing list.
- One or more three letter codes specifying the `role`.
  These are the most important roles to know about:

    - `cre`: the creator or maintainer, the person you should bother if you have problems.
      Despite being short for "creator", this is the correct role to use for the current maintainer, even if they are not the initial creator of the package.
    - `aut`: authors, those who have made significant contributions to the package.
    - `ctb`: contributors, those who have made smaller contributions, like patches.
    - `cph`: copyright holder.
      This is used if the copyright is held by someone other than the author, typically a company (i.e. the author's employer).
    - `fnd`: funder, the people or organizations that have provided financial support for the development of the package.

  (The [full list of roles](https://www.loc.gov/marc/relators/relaterm.html) is extremely comprehensive. Should your package have a woodcutter (`wdc`), lyricist (`lyr`) or costume designer (`cst`), rest comfortably that you can correctly describe their role in creating your package.
  However, note that packages destined for CRAN must limit themselves to the subset of MARC roles listed in the documentation for `person()`.)
- The optional `comment` argument has become more relevant, since `person()` and CRAN landing pages have gained some nice features around [ORCID identifiers](https://orcid.org).
Here's an example of such usage (note the auto-generated URI):
```{r}
person(
  "Jennifer", "Bryan",
  email = "[email protected]",
  role = c("aut", "cre"),
  comment = c(ORCID = "0000-0002-6983-2759")
)
```
```{r, eval = FALSE, include = FALSE}
db <- getNamespace("utils")$MARC_relator_db
db <- db[db$usage != "",]
tibble::as_tibble(db)
#> # A tibble: 11 × 4
#> term code description usage
#> <chr> <chr> <chr> <chr>
#> 1 Author aut A person, family, or organiz… Use for full authors wh…
#> 2 Compiler com A person, family, or organiz… Use for package maintai…
#> 3 Contractor ctr A person or organization rel… Use for authors who hav…
#> 4 Contributor ctb A person, family or organiza… Use for authors who hav…
#> 5 Copyright holder cph A person or organization to … Use for all copyright h…
#> 6 Creator cre A person or organization res… Use for the package mai…
#> 7 Data contributor dtc A person or organization tha… Use for persons who con…
#> 8 Funder fnd A person or organization tha… Use for persons or orga…
#> 9 Reviewer rev A person or organization res… Use for persons or orga…
#> 10 Thesis advisor ths A person under whose supervi… If the package is part …
#> 11 Translator trl A person or organization who… If the R code is merely…
```
You can list multiple authors with `c()`:
``` yaml
Authors@R: c(
    person("Hadley", "Wickham", email = "[email protected]", role = "cre"),
    person("Winston", "Chang", email = "[email protected]", role = "aut"),
    person("RStudio", role = c("cph", "fnd")))
```
Every package must have at least one author (`aut`) and one maintainer (`cre`) (they might be the same person).
The maintainer (`cre`) must have an email address.
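These two requirements can be checked on a `person` vector with base R alone. The following is a minimal sketch with invented names and addresses; it relies on the fact that a `person` object is, underneath, a list of records with `role` and `email` elements:

```r
# Invented authors, for illustration only.
authors <- c(
  person("Jane", "Doe", email = "jane@example.com", role = c("aut", "cre")),
  person("John", "Smith", role = "ctb")
)

# unclass() exposes the underlying list of per-person records.
records <- unclass(authors)
has_role <- function(code) {
  vapply(records, function(p) code %in% p$role, logical(1))
}

# At least one author, at least one maintainer, and every maintainer has an email.
has_author    <- any(has_role("aut"))
is_maintainer <- has_role("cre")
maintainer_ok <- any(is_maintainer) &&
  all(vapply(records[is_maintainer], function(p) !is.null(p$email), logical(1)))

c(has_author = has_author, maintainer_ok = maintainer_ok)
```

`R CMD check` performs its own, more thorough validation of `Authors@R`; this sketch only mirrors the two rules stated above.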
These fields are used to generate the basic citation for the package (e.g. `citation("pkgname")`).
Only people listed as authors will be included in the auto-generated citation.
There are a few extra details if you're including code that other people have written, which you can learn about in @sec-code-you-bundle.
An older, still valid approach is to have separate `Maintainer` and `Author` fields in `DESCRIPTION`.
However, we strongly recommend the more modern approach of `Authors@R` and the `person()` function, because it offers richer metadata for various downstream uses.
### `URL` and `BugReports`
As well as the maintainer's email address, it's a good idea to list other places people can learn more about your package.
The `URL` field is commonly used to advertise the package's website and to link to a public source repository, where development happens.
Multiple URLs are separated with a comma.
`BugReports` is the URL where bug reports should be submitted, e.g., as GitHub issues.
For example, devtools has:
``` yaml
URL: https://devtools.r-lib.org/, https://github.com/r-lib/devtools
BugReports: https://github.com/r-lib/devtools/issues
```
If you use `usethis::use_github()` to connect your local package to a remote GitHub repository, it will automatically populate `URL` and `BugReports` for you.
If a package is already connected to a remote GitHub repository, `usethis::use_github_links()` can be called to just add the relevant links to `DESCRIPTION`.
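Because multiple URLs share a single comma-separated field, code that consumes `URL` has to split it. A small sketch, using the devtools value shown above as input:

```r
# Field value copied from the devtools example above.
url_field <- "https://devtools.r-lib.org/, https://github.com/r-lib/devtools"

# Split on commas and drop the surrounding whitespace.
urls <- trimws(strsplit(url_field, ",", fixed = TRUE)[[1]])
urls
```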
### Other fields {#description-other-fields}
A few other `DESCRIPTION` fields are heavily used and worth knowing about:
- `Encoding` describes the character encoding of files throughout your package.
Our package development workflow always assumes that this is set to `Encoding: UTF-8` as this is now the most commonly used text encoding, and we are not aware of any reasons to use a different value.
- `Collate` controls the order in which R files are sourced.
This only matters if your code has side-effects; most commonly because you're using S4.
- `Version` is really important as a way of communicating where your package is in its lifecycle and how it is evolving over time.
Learn more in @sec-lifecycle.
- `LazyData` is relevant if your package makes data available to the user.
If you specify `LazyData: true`, the datasets are lazy-loaded, which makes them more immediately available, i.e. users don't have to use `data()`.
The addition of `LazyData: true` is handled automatically by `usethis::use_data()`.
More detail is given when we talk about external data in @sec-data.
There are many other fields that are rarely, if ever, used.
A complete list can be found in the "The DESCRIPTION file" section of the [R extensions manual](https://cran.r-project.org/doc/manuals/R-exts.html#The-DESCRIPTION-file).
### Custom fields
There is also some flexibility to create your own fields to add additional metadata.
In the narrowest sense, the only restriction is that you shouldn't re-purpose the official field names used by R.
You should also limit yourself to valid English words, so the field names aren't flagged by the spell-check.
In practice, if you plan to submit to CRAN, we recommend that any custom field name should start with `Config/`.
We featured an example of this earlier, where `Config/Needs/website` is used to record additional packages needed to build a package's website.
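Custom fields are read like any other field. The sketch below assumes a made-up `DESCRIPTION` in which `somepkg` stands in for whatever a real package would list, and `Config/Needs/coverage` is a field that is deliberately absent from the file:

```r
# A made-up DESCRIPTION with one custom field; 'somepkg' is a placeholder.
path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: mypackage",
  "Config/Needs/website: somepkg"
), path)

# Ask read.dcf() for specific fields; absent fields come back as NA.
meta <- read.dcf(
  path,
  fields = c("Package", "Config/Needs/website", "Config/Needs/coverage")
)
meta[1, "Config/Needs/website"]
is.na(meta[1, "Config/Needs/coverage"])
```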
You might notice that `create_package()` writes two more fields we haven't discussed yet, relating to the use of the roxygen2 package for documentation:
``` yaml
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.1
```
You will learn more about these in @sec-man.
The use of these specific field names is basically an accident of history and, if it were re-done today, they would follow the `Config/*` pattern recommended above.
## The `NAMESPACE` file {#sec-namespace}
The following code is an excerpt of the `NAMESPACE` file from the testthat package.

    # Generated by roxygen2 (4.0.2): do not edit by hand
    S3method(as.character,expectation)
    S3method(compare,character)
    export(auto_test)
    export(auto_test_package)
    export(colourise)
    export(context)
    exportClasses(ListReporter)
    exportClasses(MinimalReporter)
    importFrom(methods,setRefClass)
    useDynLib(testthat,duplicate_)
    useDynLib(testthat,reassign_function)

You can see that the `NAMESPACE` file looks a bit like R code.
Each line contains a **directive**: `S3method()`, `export()`, `exportClasses()`, and so on.
Each directive describes an R object, and says whether it's exported from this package to be used by others, or it's imported from another package to be used locally.
In total, there are eight namespace directives.
Four describe exports:
- `export()`: export functions (including S3 and S4 generics).
- `exportPattern()`: export all functions that match a pattern.
- `exportClasses()`, `exportMethods()`: export S4 classes and methods.
- `S3method()`: export S3 methods.
And four describe imports:
- `import()`: import all functions from a package.
- `importFrom()`: import selected functions (including S4 generics).
- `importClassesFrom()`, `importMethodsFrom()`: import S4 classes and methods.
- `useDynLib()`: import a function from C.
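To see the effect of these directives on an installed package, you can query its loaded namespace. A brief sketch using the stats package, chosen only because it ships with every R installation:

```r
# Names made available to users by the export directives of stats.
stats_exports <- getNamespaceExports("stats")
"median" %in% stats_exports

# Imports are recorded per source package; the names show where they come from.
names(getNamespaceImports("stats"))
```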
I don't recommend writing these directives by hand.
Instead, in this chapter you'll learn how to generate the `NAMESPACE` file with roxygen2.
There are three main advantages to using roxygen2:
- Namespace definitions live next to their associated functions, so when you read the code it's easier to see what's being imported and exported.
- Roxygen2 abstracts away some of the details of `NAMESPACE`.
You only need to learn one tag, `@export`, which will automatically generate the right directive for functions, S3 methods, S4 methods and S4 classes.
- Roxygen2 makes `NAMESPACE` tidy.
No matter how many times you use `@importFrom foo bar` you'll only get one `importFrom(foo, bar)` in your `NAMESPACE`.
This makes it easy to attach import directives to every function that needs them, rather than trying to manage them in one central place.
Note that you can choose to use roxygen2 to generate just `NAMESPACE`, just `man/*.Rd`, or both.
If you don't use any namespace related tags, roxygen2 won't touch `NAMESPACE`.
If you don't use any documentation related tags, roxygen2 won't touch `man/`.
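As a small illustration of the namespace tags, consider the following function. The name `center_on_median()` is invented for this sketch. When roxygen2 processes the file, the `@export` tag should yield `export(center_on_median)` and the `@importFrom` tag should yield `importFrom(stats,median)` in `NAMESPACE`:

```r
#' Center a numeric vector on its median
#'
#' @param x A numeric vector.
#' @return `x` with its median subtracted.
#' @importFrom stats median
#' @export
center_on_median <- function(x) {
  x - median(x)
}

center_on_median(c(1, 2, 3))
```

The roxygen comments are plain R comments, so the function also runs unchanged outside of a package.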