This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Course Description¹
Scientific discovery is typically a collective process, as researchers build their work on the preceding efforts of other researchers. This is certainly the case for theory, empirical evidence, and methods, as empirical researchers use analytical techniques developed by methodologists, theoreticians build on up-to-date evidence, and data collection inspires new methods of analysis. The reality is that contemporary research is not possible in isolation. A key element of the web of research relationships is the basic unit of research output, which typically takes the form of a journal paper, book chapter, or report. This unit of output, however, represents only the face of a multilayered process, and by its very nature is limited in the amount of information that it can communicate.
Recent technologies make it increasingly easy and inexpensive to communicate research efficiently. From data repositories to supplementary e-content in journals, as well as data policy requirements of research funders, there is a strong incentive for research to become more open and reproducible. Reproducibility means that research results can be verified independently, including all relevant assumptions and decisions. Every figure, every table, and every result is open for inspection, including the processes used to generate them. Research reproducibility is essential to maintaining trust in the research process, and has numerous advantages, including accelerating discovery and reducing inequality in access to research tools and results. Furthermore, other researchers can more easily use methods and tools if they are open. Not surprisingly, as newer technologies facilitate the transfer of research findings (including open data, open software, and open publishing), there has been a growth of interest in ways of achieving openness and reproducibility.
The objective of this course is to equip students with the fundamental concepts and tools needed to develop a reproducible research workflow. The course should be of interest to new graduate students in the sciences and social sciences, and is relevant to research involving qualitative or quantitative data. The course is also appropriate for experienced researchers who would like to update their workflow to comply with reproducibility criteria.
The course covers the following topics:
- Fundamentals of reproducible research
- Basic tools for implementing a reproducible research workflow: GitHub and R
- Data Management Plans
- Creating basic units of shareable code
- Documenting the process of doing research
- Generating reproducible research documents
By the end of the course, students will produce a report with all the components necessary to make it a unit of reproducible research. In the spirit of the course, the materials will draw mostly on open resources.
| Antonio Paez | Professor |
|---|---|
| Office: GSB 236 | |
| Office Hours: TBD | |
| Phone: (905) 525-9140, ext. 26099 | |
| Email: [email protected] | |
The course will be organized in weekly 2-and-a-half-hour meetings. The format of the meetings will be a combination of seminar-style discussion, hands-on activities, and guest speakers. The topics and readings are found in the Course Schedule.
Students are responsible for completing the readings indicated in the Course Schedule. Any resources that are not open will be shared by the instructors.
Students are assessed based on the completion of a sequence of activities. Note that the activities are designed to combine towards one final deliverable, so it is not advisable to skip any of them.
| Activity | Weight |
|---|---|
| Activity 1: R Markdown Exercise | 5% |
| Activity 2: First project | 5% |
| Activity 3: Version Control Exercise | 10% |
| Activity 4: DMP | 10% |
| Activity 5: Data Package | 15% |
| Activity 6: Data Analysis Documentation | 15% |
| Activity 7: Peer Review Exercise | 20% |
| Final Deliverable | 20% |
McMaster’s graduate grading system will be used. Note that, according to section 2.5.3 of the Graduate Calendar, the only passing grades are A+, A, A-, B+, B, and B-.
Academic dishonesty consists of misrepresentation by deception or by other fraudulent means and can result in serious consequences, e.g. the grade of zero on an assignment, loss of credit with a notation on the transcript (notation reads: “Grade of F assigned for academic dishonesty”), and/or suspension or expulsion from the university.
It is your responsibility to understand what constitutes academic dishonesty. For information on the various kinds of academic dishonesty please refer to the Academic Integrity Policy, specifically Appendix 3.
The following illustrates only three forms of academic dishonesty:
- Plagiarism, e.g. the submission of work that is not one’s own or for which other credit has been obtained.
- Improper collaboration in group work.
- Copying or using unauthorized aids in tests and examinations.
Week 1 (Sept. 6, 10:00 am - 12:30 pm)
Topic: Course overview and introduction: Why reproducible research?
Readings: No readings this week
For discussion: Principles of open science, advantages, funding and policy environment, journal policies and the publication process, roadmap for course
Week 2 (Sept. 13, 10:00 am - 12:30 pm)
Topic: R + RStudio + markdown
Suggested Readings:
- What is R?
- R for Data Science
- What is Markdown
Activity 1: Use markdown to create a document with basic operations in R
Week 3 (Sept. 20, 10:00 am - 12:30 pm)
Topic: Projects and Reproducible Environments
Readings:
- Projects
- {here}: a package for project-oriented workflows
- {renv}: a package for reproducible environments in R
Activity 2: Create a project with your proposed directory structure, and initialize a reproducible environment
Week 4 (Sept. 27, 10:00 am - 12:30 pm)
Topic: Version Control and GitHub
Readings:
- What is version control?
- What is GitHub?
- {gitcreds}: a package to query git credentials from R
Activity 3: Post a README notice in GitHub and one document with basic operations in R
Week 5 (Oct. 4, 10:00 am - 12:30 pm)
Topic: Data Management Plans (DMP): Principles
Readings:
- 10 aspects of highly effective research data
Week 6 (Oct. 11, 10:00 am - 12:30 pm)
Topic: Data Management Plans (DMP): Tools
Readings: TBD
Activity 4: Write a DMP and post in GitHub
Week 7 (Oct. 18)
Topic: Reading week
Readings: N/A
Week 8 (Oct. 25, 10:00 am - 12:30 pm)
Topic: Creating packages in R and documenting datasets
Readings:
- Writing an R package from scratch
- R Package Primer - A minimal Example
- R Packages
- Building R Packages
Activity 5: Create a small package with a dataset
Week 9 (Nov. 1, 10:00 am - 12:30 pm)
Topic: Documenting data analysis and use of RMarkdown
Readings:
- Ten Simple Rules for Reproducible Computational Research
- Best Practices for Scientific Computing
Activity 6: Create an R Markdown file with documented data analysis (a vignette for your package)
Week 10 (Nov. 8, 10:00 am - 12:30 pm)
Topic: Peer review and collaboration
Readings: Review readings of Sessions 7 and 8
Activity 7: In-class activity peer reviewing packages, vignettes, and revisions due in GitHub
Week 11 (Nov. 15, 10:00 am - 12:30 pm)
Topic: {Rticles} and practical issues in preparing self-contained open research documents (math notation and figures)
Readings:
- LaTeX for Beginners
- {ggplot2}: A Package for a Grammar of Graphics
Activity: No activity this week
Week 12 (Nov. 22, 10:00 am - 12:30 pm)
We need to discuss dates for the last two seminars: Antonio will be in Brussels on November 22, and possibly in Yunnan on November 29.
Week 13 (Date TBD Nov. 29, 10:00 am - 12:30 pm)
Topic: {Rticles} and practical issues in preparing self-contained open research documents (tables and citations)
Readings:
- BibTeX
- KableExtra for HTML
- KableExtra for PDF
Activity: Final deliverable due on DATE TBD.
Topic: Package {macdown}: writing a thesis in R markdown
Readings: No readings assigned
Footnotes
1. The University reserves the right to change any aspect of this course outline.