Git is a tool for tracking changes made to documents over time. On GitHub, Git lets us track the contributions of different users, collaborate, and manage how modifications are merged into the main project.
Everyone needs a GitHub account to contribute to this project. You can make one here. To keep your GitHub account secure, you should use a strong and unique password.
- Repository. A repository usually organizes a single project. Repositories can contain folders and files, images, videos, spreadsheets, and data sets: anything your project needs. This one is named `manuscript`!
- Branches. Each repository contains one or more branches, which let you keep different versions of a repository at the same time. The default branch of the `manuscript` repository is `main`. There are also the `gh-pages` and `output` branches, which contain the HTML and PDF versions that Manubot regenerates frequently. You will also see other branches that correspond to the working versions of other contributors. These personal branches can be merged into the `main` branch, so that their edits are passed to the `main` manuscript. Once in a while, branches that have been merged and are no longer modified will be safely deleted (to keep things tidy!).
- Making and committing changes. Changes to files are grouped together, at the contributor's discretion, and associated with a message through commits.
  - For instance, once you add or change a sentence of the manuscript, you can commit the change with a message such as `Added the discussion about X with its citation.`
  - Each commit has a unique identifier (called a SHA or hash). Together, the commit and its message let contributors identify who made the changes, when they were made, and what exactly changed.
  - When making a commit, you must include a commit message that briefly describes the changes.
  - In certain circumstances, commits serve as the reference point for deleting or reverting changes that are no longer necessary.
  - For commits to be passed to the online repository, they must be pushed. Commits that other contributors pushed to the online repository reach your local versions (either on your computer or in your GitHub profile) through pulls.
- Forking a repository. You can fork a repository to your GitHub profile, so you can fix merge conflicts, add or remove files, and push larger commits. Forks are used either to propose changes to someone else's project or to use someone else's project as a starting point for your own idea. One of the main ways to contribute changes to this manuscript is by:
  - Forking this repository;
  - Regularly committing changes to the files and pushing them to your local version of this repository; and,
  - Submitting a pull request to the `main` manuscript repository, so that others can review, merge, and incorporate your changes into the `main` version of the manuscript.
  - You must always keep your fork up-to-date with the `main` repository. You can do this by accessing your version of this repository in your GitHub profile and fetching upstream.
- Cloning a repository. Cloning is similar to forking; the main difference is that the copy of the repository lives on your local computer instead of in your GitHub profile. When you clone a repository, you copy it from GitHub.com to your local machine. You can then push your changes to the remote repository on GitHub.com, or pull other people's changes from GitHub.com.
  - You can clone a repository by following these instructions;
  - There are tools that can help you track changes in cloned repositories from your computer. You are free to choose whichever you prefer (e.g., RStudio, Atom, Visual Basic). Try GitHub Desktop!
  - You must always keep your cloned repository up-to-date with the online version. Do this by fetching and pulling from the origin. After you click Fetch origin, you will see how many commits you must pull from the origin.
- Making a pull request. Pull requests let you tell others about changes you have pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the `main` branch. This topic deserves its own section; see more below!
As mentioned above, changes to the content of the manuscript and to this repository require you to:
- Fork or clone this repository;
- Regularly commit changes to the files and push these commits to your local version of this repository; and,
- Submit a pull request to the `main` manuscript repository, so that others can review, merge, and incorporate your changes into the `main` version of the manuscript.
- Create your local branch and select it as your main working environment.
- Commit your changes with a relevant commit message and a description explaining the changes in that commit (e.g., whether they resolve an issue).
- Push your commits to your branch.
- Once you are done with the changes you intended to make, create a pull request.
- Create your pull request. Do not forget to briefly explain your changes, which will be reviewed by other contributors. You may also assign specific reviewers if you would like to.
- Once the pull request is created, Manubot will evaluate whether it passes all error checks. A green cue confirms that everything has worked fine.
- Your pull request will be reviewed by two other contributors, who will discuss it, provide or request new changes, or directly merge it into the `main` repository and close it. For example, when another contributor requests corrections, the assignee for the pull request can provide them by pushing commits to the same branch that the pull request concerns.
This repository uses Manubot to automatically produce a manuscript from the source in the `content` directory.
Check out the Manubot catalog for examples of what is possible when writing with Manubot.
Try editing the demo manuscript to quickly test Manubot formatting and citations.
Manuscript text should be written in markdown files located in the `content` directory.
Markdown files are identified by their `.md` extension and ordered according to their two-digit prefix (e.g. `01.`, `02.`, … `99.`).
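As a minimal sketch of that ordering, the Python snippet below sorts a list of file names the way the two-digit prefixes suggest. The file names and the `ordered_content_files` helper are hypothetical, and this is not Manubot's own implementation; it only illustrates that a plain string sort of zero-padded prefixes gives the intended order.

```python
def ordered_content_files(filenames):
    """Return the markdown files sorted by name, so two-digit prefixes set the order."""
    markdown_files = [name for name in filenames if name.endswith(".md")]
    # Zero-padded prefixes (01., 02., ... 99.) sort correctly as plain strings.
    return sorted(markdown_files)


# Hypothetical directory listing, not taken from a real repository.
example = ["10.discussion.md", "02.introduction.md", "01.abstract.md", "metadata.yaml"]
print(ordered_content_files(example))
```

Because the prefixes are zero-padded, a file starting with `02.` sorts before one starting with `10.`; a single-digit prefix such as `2.` would not.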
For basic formatting, check out the CommonMark Help page for an introduction to the formatting options provided by standard markdown. In addition, Manubot supports an extended version of markdown, tailored for scholarly writing, which includes Pandoc's Markdown and the extensions discussed below.
The `content/02.delete-me.md` file in the Rootstock repository shows many of the elements and formatting options supported by Manubot.
See the raw markdown in this file and compare it to the rendered manuscript.
Within a paragraph in markdown, single newlines are interpreted as whitespace (same as a space). A paragraph's source does not need to contain newlines. However, "one paragraph per line" makes the git diff less precise, leading to less granular review commenting, and makes conflicts more likely. Therefore, we recommend using semantic linefeeds — newlines between sentences. We have found that "one sentence per line" is preferable to "word wrap" or "one paragraph per line".
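For authors who want to convert an existing paragraph, the following Python sketch shows one naive way to rewrite text as one sentence per line. The `one_sentence_per_line` helper is hypothetical and its sentence detection is an assumption (a period, exclamation mark, or question mark followed by whitespace and an uppercase letter), so it will mis-split after abbreviations such as "e.g." when they precede a capitalized word.

```python
import re


def one_sentence_per_line(paragraph):
    """Naively put each sentence of a paragraph on its own line."""
    # Collapse existing newlines to spaces, since markdown treats them as whitespace.
    flat = " ".join(paragraph.split())
    # Break after ., ! or ? when followed by whitespace and an uppercase letter.
    return re.sub(r"(?<=[.!?])\s+(?=[A-Z])", "\n", flat)


text = "Manubot supports markdown. Single newlines act as spaces. Use one sentence per line."
print(one_sentence_per_line(text))
```

Review the result by hand before committing, since the rendered paragraph is unchanged but the diff will touch every rewrapped line.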
Manubot supports markdown tables.
| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| value_a | 1 | 47 |
| value_b | 2 | 56 |
Table: Caption for this example table. {#tbl:example-id}
Support for table numbering and citation is provided by `pandoc-tablenos`.
Above, `{#tbl:example-id}` sets the table ID, which creates an HTML anchor and allows citing the table like `@tbl:example-id`.
For easy creation of markdown tables, check out the Tables Generator webapp.
Figures can be included with the following markdown:
![Caption for the example figure.](url_or_path_to_figure){#fig:example-id}
The blank line before the figure is required.
Support for figure numbering and citation is provided by `pandoc-fignos`.
This figure can be cited in the text using `@fig:example-id`.
In context, a figure citation may look like: `Figure {@fig:example-id}B shows …`.
For images created by the manuscript authors that are hosted elsewhere on GitHub, we recommend using a versioned GitHub URL to embed figures, thereby preserving exact image provenance.
When embedding SVG images hosted on GitHub, it's necessary to append `?sanitize=true` to the `raw.githubusercontent.com` URL.
For example:
https://raw.githubusercontent.com/greenelab/scihub/572d6947cb958e797d1a07fdb273157ad9154273/figure/citescore.svg?sanitize=true
Figures placed in the `content/images` directory can be embedded using their relative path.
For example, we embed an ORCID icon inline using:
![ORCID icon](images/orcid.svg){height="13px"}
The bracketed text following the image declaration is interpreted by Pandoc's `link_attributes` extension.
For example, the following will override the figure number to be "S1" and set the image width to 5 inches:
{#fig:supplement tag="S1" width="5in"}
We recommend always specifying the width of SVG images (even if just `width="100%"`), since otherwise SVGs may not render properly in the WeasyPrint PDF export.
Manubot supports Pandoc citations, but with added support for citing persistent identifiers directly. Citations are processed in 3 stages:
- Pandoc parses the input Markdown to locate citation keys.
- The `pandoc-manubot-cite` filter automatically retrieves the bibliographic metadata for citation keys.
- The `pandoc-citeproc` filter renders in-text citations and generates styled references.
When citing persistent identifiers, citation keys should be formatted like `@prefix:accession`, where `prefix` is one of the options described below.
When choosing which source to use for a citation, we recommend the following order:
- DOI (Digital Object Identifier), cite like `@doi:10.15363/thinklab.4`. Shortened versions of DOIs can be created at shortdoi.org. shortDOIs begin with `10/` rather than `10.` and can also be cited. For example, Manubot will expand `@doi:10/993` to the DOI above. We suggest using shortDOIs to cite DOIs containing forbidden characters, such as `(` or `)`.
- PubMed Central ID, cite like `@pmc:PMC4497619`.
- PubMed ID, cite like `@pubmed:26158728`.
- arXiv ID, cite like `@arxiv:1508.06576v2`.
- ISBN (International Standard Book Number), cite like `@isbn:9781339919881`.
- URL / webpage, cite like `@https://nyti.ms/1QUgAt1`. URL citations can be helpful if the above methods return incorrect metadata. For example, `@doi:10.1038/ng.3834` incorrectly handles the consortium name resulting in a blank author, while `@https://doi.org/10.1038/ng.3834` succeeds. Similarly, `@https://doi.org/10.1101/142760` is a workaround to set the journal name of bioRxiv preprints to bioRxiv.
- Wikidata Items, cite like `@wikidata:Q50051684`. Note that anyone can edit or add records on Wikidata, so users are encouraged to contribute metadata for hard-to-cite works to Wikidata.
- Any other compact identifier supported by https://bioregistry.io. Manubot uses the Bioregistry to support hundreds of prefixes. For example, citing `@clinicaltrials:NCT04280705` will produce the same bibliographic metadata as `@https://bioregistry.io/clinicaltrials:NCT04280705` or `@https://clinicaltrials.gov/ct2/show/NCT04280705`.
- For references that do not have any of the above persistent identifiers, the citation key does not need to include a prefix. Citing `@old-manuscript` will work, but only if reference metadata is provided manually.
Manubot is able to infer certain prefixes, such that some citations can be formatted like `@accession` (without a prefix).
Examples include DOIs like `@10.15363/thinklab.4` or `@10/993`, PMC / PubMed identifiers like `@PMC4497619` or `@26158728`, arXiv identifiers like `@1508.06576v2`, and Wikidata identifiers like `@Q50051684`.
To disable citekey prefix inference, add the following to `metadata.yaml`:
pandoc:
  manubot-infer-citekey-prefixes: false
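To give a feel for what prefix inference involves, here is a rough Python sketch that guesses a prefix from the shape of an accession. The patterns and the `guess_prefix` helper are assumptions derived only from the example identifiers in this section; they are not the rules Manubot itself applies, and real identifiers may need stricter checks.

```python
import re

# Illustrative patterns only: assumptions based on the examples in this section,
# not the rules Manubot itself uses.
PREFIX_PATTERNS = [
    ("doi", re.compile(r"^10[./]\S+$")),  # regular DOIs and shortDOIs
    ("pmc", re.compile(r"^PMC\d+$")),
    ("pubmed", re.compile(r"^\d+$")),
    ("arxiv", re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")),
    ("wikidata", re.compile(r"^Q\d+$")),
]


def guess_prefix(accession):
    """Return the first prefix whose pattern matches, or None if nothing matches."""
    for prefix, pattern in PREFIX_PATTERNS:
        if pattern.match(accession):
            return prefix
    return None


for accession in ["10.15363/thinklab.4", "10/993", "PMC4497619", "26158728", "old-manuscript"]:
    print(accession, "->", guess_prefix(accession))
```

A key such as `old-manuscript` matches none of the patterns, which corresponds to the case where the reference metadata has to be supplied manually.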
Cite multiple items at once like:
Here is a sentence with several citations [@doi:10.15363/thinklab.4; @pubmed:26158728; @arxiv:1508.06576; @isbn:9780394603988].
Note that multiple citations must be semicolon separated.
Be careful not to cite the same study using identifiers from multiple sources.
For example, the following citations all refer to the same study, but will be treated as separate references: `[@doi:10.7717/peerj.705; @pmc:PMC4304851; @pubmed:25648772]`.
The citation key syntax is described in the Pandoc manual:
Unless a citation key starts with a letter, digit, or `_`, and contains only alphanumerics and internal punctuation characters (`:.#$%&-+?<>~/`), it must be surrounded by curly braces, which are not considered part of the key. In `@Foo_bar.baz.`, the key is `Foo_bar.baz`. The final period is not internal punctuation, so it is not included in the key. In `@{Foo_bar.baz.}`, the key is `Foo_bar.baz.`, including the final period. The curly braces are recommended if you use URLs as keys: `[@{https://example.com/bib?name=foobar&date=2000}, p. 33]`.
If a citation key does not fully match this online regex (for example, it contains characters such as `;` or `=` or ends with a non-alphanumeric character such as `/`), make sure to surround it with curly braces or use the citation aliases workaround below.
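The check described above can be approximated in a few lines of Python. The `BARE_KEY` pattern and `format_citation` helper below are hypothetical and deliberately conservative: they encode only the rule as stated in this section (allowed first character, allowed punctuation, alphanumeric last character) and are not the regex referenced above nor Pandoc's actual parser.

```python
import re

# Approximation of the rule quoted above (an assumption, not the referenced regex):
# start with a letter, digit, or underscore; continue with alphanumerics,
# underscores, or the listed punctuation; end on a letter or digit.
BARE_KEY = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_:.#$%&\-+?<>~/]*[A-Za-z0-9])?")


def format_citation(key):
    """Return '@key' when the key looks safe to use bare, otherwise '@{key}'."""
    if BARE_KEY.fullmatch(key):
        return "@" + key
    return "@{" + key + "}"


print(format_citation("doi:10.15363/thinklab.4"))
print(format_citation("https://openreview.net/forum?id=HkwoSDPgg"))
```

The first key is printed bare, while the second is wrapped in curly braces because it contains `=`. When in doubt, wrapping a key in braces or defining an alias is the safer choice.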
Prior to Rootstock commit `6636b91` on 2020-01-14, Manubot processed citations separately from Pandoc.
Switching to a Pandoc filter improved reliability on complex documents, but restricted the syntax of citation keys slightly.
Therefore, users upgrading Rootstock may find some citations become invalid.
By default, `pandoc-manubot-cite` does not fail upon invalid citations, although this can be changed by adding the following to `metadata.yaml`:
pandoc:
  manubot-fail-on-errors: true
The system also supports citation aliases, which map from one citation key (the "alias" or "tag") to another. Aliases are recommended for the following applications:
- A citation key contains forbidden characters.
- A single reference is cited many times. Therefore, it might make sense to define an alias, so if the citation updates (e.g. a newer version becomes available), only a single change is required.
Aliases can be defined using Markdown's link reference syntax as follows:
Citing a URL containing a `?` character [@my-url].
Citing a DOI containing parentheses [@my-doi].
[@my-url]: https://openreview.net/forum?id=HkwoSDPgg
[@my-doi]: doi:10.1016/S0022-2836(05)80360-2
This syntax is also used by `pandoc-url2cite`.
Make sure to place these link reference definitions in their own paragraphs.
These paragraphs can be in any of the content Markdown files.
Another method for defining aliases is to define `pandoc.citekey-aliases` in `metadata.yaml`:
pandoc:
  citekey-aliases:
    my-url: https://openreview.net/forum?id=HkwoSDPgg
    my-doi: doi:10.1016/S0022-2836(05)80360-2
Manubot stores the bibliographic details for references (the set of all cited works) as CSL JSON (Citation Style Language Items).
Manubot automatically generates CSL JSON for most persistent identifiers (as described in Citations above).
In some cases, automatic metadata retrieval fails or provides incorrect or incomplete information.
Errors are most common for references generated from scraping HTML metadata from websites.
This occurs most frequently for `https`/`http`/`url` citations as well as Bioregistry prefixes without explicit support listed above.
Therefore, Manubot supports user-provided metadata, which we refer to as "manual references".
When a manual reference is provided, Manubot uses the supplied metadata and does not attempt to generate it.
Manubot searches the `content` directory for files that match the glob pattern `manual-references*.*` and expects that these files contain manual references.
`content/manual-references.json` is the default file to specify custom CSL JSON metadata.
Manual references are matched to citations using their "id" field.
For example, to manually specify the metadata for the citation `@https://github.com/manubot/rootstock`, add a CSL JSON Item to `manual-references.json` that contains the following excerpt:
"id": "https://github.com/manubot/rootstock",
The metadata for unhandled citations (any citation key that is not a supported persistent ID) must be provided in a manual reference file (e.g. `manual-references.json`) or an error will occur.
For example, to cite `@private-message` in a manuscript, a corresponding CSL JSON Item is required, such as:
{
  "id": "private-message",
  "type": "personal_communication",
  "title": "Personal communication with Doctor X"
}
All manual references must provide values for the "id" and "type" fields. For guidance on what CSL JSON should be like for different document types, refer to these examples.
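A quick way to catch incomplete entries before a build is to check every item for the two required fields. The Python sketch below is only an illustration: the `missing_fields` helper is hypothetical, and the second item in the sample data is deliberately incomplete to show what gets reported.

```python
import json

REQUIRED_FIELDS = ("id", "type")


def missing_fields(csl_items):
    """Map each item's position to the required fields it lacks."""
    problems = {}
    for index, item in enumerate(csl_items):
        missing = [field for field in REQUIRED_FIELDS if not item.get(field)]
        if missing:
            problems[index] = missing
    return problems


# Sample data: the first item mirrors the example above, the second is
# intentionally missing its "type" field.
manual_references = json.loads("""
[
  {"id": "private-message", "type": "personal_communication",
   "title": "Personal communication with Doctor X"},
  {"id": "https://github.com/manubot/rootstock"}
]
""")
print(missing_fields(manual_references))
```

An empty result means every item carries both fields; it does not say whether the remaining metadata is appropriate for the document type.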
Manubot offers some support for other bibliographic metadata formats besides CSL JSON, by delegating conversion to the `pandoc-citeproc --bib2json` utility.
Formats are inferred from filename extensions.
So, for example, to provide metadata for `@https://github.com/manubot/rootstock` in BibTeX format, create the file `content/manual-references.bib` and create an item whose definition starts with the excerpt:
@misc{https://github.com/manubot/rootstock,
Processed reference metadata in CSL JSON format, either generated by Manubot or specified via manual references, is exported to `references.json`.
This file is located in the `output` branch on GitHub or in the `output` subdirectory of local builds.
The "id" field in `references.json` and in the final manuscript uses a shortened ID that is derived from the original ID.
For debugging information, see `citations.tsv`, which shows citation identifiers as they progress through the processing pipeline.
In order to freeze all references, rather than have Manubot regenerate them during future builds, copy the `references.json` output file to `content` with a filename matching the `manual-references*.json` pattern.
One tip is to embed the date `references.json` was generated into the frozen manual reference filename, like `content/manual-references-2019-06-21.json`.
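The freezing step can be scripted. The following Python sketch is one possible approach, assuming a local build with `output` and `content` directories; the `frozen_reference_name` and `freeze_references` helpers are hypothetical and not part of Manubot.

```python
import datetime
import fnmatch
import shutil
from pathlib import Path


def frozen_reference_name(generated_on):
    """Build a manual-reference filename that embeds the generation date."""
    return f"manual-references-{generated_on.isoformat()}.json"


def freeze_references(output_dir, content_dir, generated_on):
    """Copy output/references.json into content/ under a dated name."""
    target = Path(content_dir) / frozen_reference_name(generated_on)
    shutil.copyfile(Path(output_dir) / "references.json", target)
    return target


name = frozen_reference_name(datetime.date(2019, 6, 21))
# The dated name still matches the glob pattern that Manubot searches for.
assert fnmatch.fnmatch(name, "manual-references*.json")
print(name)
```

Calling `freeze_references("output", "content", datetime.date.today())` from the repository root would then place the dated copy next to the other manual reference files.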
`content/metadata.yaml` contains manuscript metadata that gets passed through to Pandoc, via a `yaml_metadata_block`.
`metadata.yaml` should contain the manuscript `title`, `authors` list, `keywords`, and `lang` (language tag).
Additional metadata, such as `date`, will automatically be created by the Manubot.
Manubot uses the timezone specified in `build.sh` for setting the manuscript's date.
For example, setting the `TZ` environment variable to `Etc/UTC` directs the Manubot to use Coordinated Universal Time.
We recommend authors add themselves to `metadata.yaml` via pull request (when requested by a maintainer), thereby signaling that they've read and approved the manuscript.
The following YAML shows the supported key–value pairs for an author:
github: dhimmel # strongly suggested
name: Daniel S. Himmelstein # mandatory
initials: DSH # optional
orcid: 0000-0002-3012-7446 # mandatory
twitter: dhimmel # optional
email: [email protected] # suggested
affiliations: # as a list, strongly suggested
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania
- Department of Biological & Medical Informatics, University of California, San Francisco
funders:
- GBMF4552 # optional list of author's funding
Note that `affiliations` should be a list to allow for multiple affiliations per author.
A thumbnail is an image used to visually represent the manuscript, such as when a manuscript is shared on social media or added to the Manubot catalog. Specify a thumbnail in any of the following ways:
- placing an image named `thumbnail.png` anywhere in the manuscript repository (for example, in the root directory).
- setting `thumbnail` in `metadata.yaml` to a path, relative to the repository root, where the image file is located. Example: `thumbnail: build/assets/thumbnail-1000x1000.png`
- setting `thumbnail` in `metadata.yaml` to an absolute URL where the image is located. Example: `thumbnail: https://github.com/greenelab/meta-review/raw/master/thumbnail.png`
The second and third methods take precedence over the first. View the guidelines here for suggestions on how to create a good thumbnail. Key points are that thumbnails should be 1000 × 1000 pixels, PNG formatted, and striking.
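The precedence between these options can be pictured with a short Python sketch. The `resolve_thumbnail` helper is hypothetical, `metadata` stands in for the already-parsed contents of `metadata.yaml`, and the fallback of taking the first `thumbnail.png` in sorted order is an assumption; Manubot's own lookup may differ.

```python
from pathlib import Path


def resolve_thumbnail(metadata, repo_root):
    """Pick a thumbnail, preferring metadata.yaml settings over a discovered file."""
    configured = metadata.get("thumbnail")
    if configured:
        if configured.startswith(("http://", "https://")):
            return configured  # absolute URL, used as-is
        return str(Path(repo_root) / configured)  # path relative to the repository root
    # Fall back to a thumbnail.png found anywhere in the repository.
    matches = sorted(Path(repo_root).rglob("thumbnail.png"))
    return str(matches[0]) if matches else None


print(resolve_thumbnail({"thumbnail": "build/assets/thumbnail-1000x1000.png"}, "."))
```

An explicit `thumbnail` entry always wins here, so a stray `thumbnail.png` elsewhere in the repository is only used when the metadata leaves the setting out.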
Modifying the manuscript formatting requires modifying the CSS in the file `build/themes/default.html`.
Common formatting changes, such as font size and double spacing, can be found by searching the Rootstock issues.
Open a new issue if you have a new formatting question.
Changing the citation style or which interactive HTML plugins are loaded requires editing the options specified by Pandoc defaults files in `build/pandoc/defaults`.
The citation style is determined by the Citation Style Language file specified in `common.yaml`:
metadata:
  csl: build/assets/style.csl
The value for `metadata.csl` can be a URL, allowing access to thousands of existing styles hosted by Zotero or the CSL GitHub.
For example, the following options replace the Manubot citation style with the PeerJ style:
metadata:
  csl: https://github.com/citation-style-language/styles/raw/906cd6d43d0c136190ecfbb12f6af0ca794e3c5b/peerj.csl
When the `SPELLCHECK` environment variable is `true`, the pandoc spellcheck filter is run.
Potential spelling errors will be printed in the continuous integration log along with the files and line numbers in which they appeared.
Words in `build/assets/custom-dictionary.txt` are ignored during spellchecking.
Spellchecking is currently only supported for English language manuscripts.
If you experience any issues with the Manubot or would like to contribute to its source code, please visit `manubot/manubot` or `manubot/rootstock`.
To cite the Manubot project or for more information on its design and history, see `@doi:10.1371/journal.pcbi.1007128`:
Open collaborative writing with Manubot
Daniel S. Himmelstein, Vincent Rubinetti, David R. Slochower, Dongbo Hu, Venkat S. Malladi, Casey S. Greene, Anthony Gitter
PLOS Computational Biology (2019-06-24) https://doi.org/c7np
DOI: 10.1371/journal.pcbi.1007128 · PMID: 31233491
The Manubot version of this manuscript is available at https://greenelab.github.io/meta-review/.
We would like to thank the contributors and funders whose support makes the Manubot project possible. Specifically, Manubot development has been financially supported by:
- the Alfred P. Sloan Foundation in Grant G-2018-11163 to @dhimmel.
- the Gordon & Betty Moore Foundation (@DDD-Moore) in Grant GBMF4552 to @cgreene.