GEOG 712 Reproducible Research Workflow with GitHub and `R`

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

School of Earth, Environment and Society, McMaster University

Course Description¹

Scientific discovery is typically a collective process, as researchers build their work on the preceding efforts of other researchers. This is certainly the case for theory, empirical evidence, and methods, as empirical researchers use analytical techniques developed by methodologists, theoreticians build on up-to-date evidence, and data collection inspires new methods of analysis. The reality is that contemporary research is not possible in isolation. A key element of the web of research relationships is the basic unit of research output, which typically takes the form of a journal paper, book chapter, or report. This unit of output, however, represents only the face of a multilayered process, and by its very nature is limited in the amount of information that it can communicate.

Increasingly, the development of recent technologies makes it easier and less expensive to communicate with greater efficiency. From data repositories to supplementary e-content in journals, as well as data policy requirements of research funders, there is a strong incentive for research to become more open and reproducible. Reproducibility means that research results can be verified independently, including all relevant assumptions and decisions. Every figure, every table, and every result is open for inspection, including the processes used to generate them. Research reproducibility is essential to maintain trust in the process, and has numerous advantages, including accelerating discovery and reducing inequality in access to research tools and results. Furthermore, other researchers can more easily use methods and tools if they are open. Not surprisingly, as newer technologies facilitate the transfer of research findings (including open data, open software, and open publishing), there has been a growth of interest in ways of achieving openness and reproducibility.

The objective of this course is to equip students with the fundamental concepts and tools needed to develop a reproducible research workflow. The course should be of interest to new graduate students in the sciences and social sciences, and is relevant to research involving qualitative or quantitative data. The course is also appropriate for experienced researchers who would like to update their workflow to comply with reproducibility criteria.

The course covers the following topics:

Fundamentals of reproducible research
Basic tools for implementing a reproducible research workflow: GitHub and R
Data Management Plans
Creating basic units of shareable code
Documenting the process of doing research
Generating reproducible research documents

By the end of the course, students will produce a report with all the components necessary to make it a unit of reproducible research. In the spirit of the course, the materials will draw mostly on open resources.

Instructors

Antonio Paez	Professor
	Office: GSB 236
	Office Hours: TBD
	Phone: (905) 525-9140, ext. 26099
	Email: [email protected]

Organization

The course will be organized in weekly 2-and-a-half-hour meetings. The format of the meetings will be a combination of seminar-style discussion, hands-on activities, and guest speakers. The topics and readings are found in the Course Schedule.

Readings and Resources

Students are responsible for completing the readings indicated in the Course Schedule. Any resources that are not open will be shared by the instructors.

Assessment

Students are assessed based on the completion of a sequence of activities. Note that the activities are designed to combine towards one final deliverable, so it is not advisable to skip any of them.


Activity 1: R Markdown Exercise	5%
Activity 2: First project	5%
Activity 3: Version Control Exercise	10%
Activity 4: DMP	10%
Activity 5: Data Package	15%
Activity 6: Data Analysis Documentation	15%
Activity 7: Peer Review Exercise	20%
Final Deliverable	20%

McMaster’s graduate grading system will be used. Note that, according to section 2.5.3 of the Graduate Calendar, the only passing grades are A+, A, A-, B+, B, and B-.

Academic Dishonesty

Academic dishonesty consists of misrepresentation by deception or by other fraudulent means and can result in serious consequences, e.g. the grade of zero on an assignment, loss of credit with a notation on the transcript (notation reads: “Grade of F assigned for academic dishonesty”), and/or suspension or expulsion from the university.

It is your responsibility to understand what constitutes academic dishonesty. For information on the various kinds of academic dishonesty please refer to the Academic Integrity Policy, specifically Appendix 3.

The following illustrates only three forms of academic dishonesty:

Plagiarism, e.g. the submission of work that is not one’s own or for which other credit has been obtained.
Improper collaboration in group work.
Copying or using unauthorized aids in tests and examinations.

Course Schedule (September-December 2024)

Week 1 (Sept. 6, 10:00 am - 12:30 pm)
Topic: Course overview and introduction: Why reproducible research?
Readings: No readings this week
For discussion: Principles of open science, advantages, funding and policy environment, journal policies and the publication process, roadmap for course

Week 2 (Sept. 13, 10:00 am - 12:30 pm)
Topic: R + RStudio + markdown
Suggested Readings:
What is R?
R for Data Science
What is Markdown
Activity 1: Use markdown to create a document with basic operations in R

Week 3 (Sept. 20, 10:00 am - 12:30 pm)
Topic: Projects and Reproducible Environments
Readings:
Projects
{here}: a package for project-oriented workflows
{renv}: a package for reproducible environments in R
Activity 2: Create a project with your proposed directory structure, and initialize a reproducible environment

Week 4 (Sept. 27, 10:00 am - 12:30 pm)
Topic: Version Control and GitHub
Readings:
What is version control?
What is GitHub?
{gitcreds}: a package to query git credentials from R
Activity 3: Post a README notice in GitHub and one document with basic operations in R

Week 5 (Oct. 4, 10:00 am - 12:30 pm)
Topic: Data Management Plans (DMP): Principles
Readings:
10 aspects of highly effective research data

Week 6 (Oct. 11, 10:00 am - 12:30 pm)
Topic: Data Management Plans (DMP): Tools
Readings: TBD
Activity 4: Write a DMP and post it in GitHub

Week 7 (Oct. 18)
Topic: Reading week
Readings: N/A

Week 8 (Oct. 25, 10:00 am - 12:30 pm)
Topic: Creating packages in R and documenting datasets
Readings:
Writing an R package from scratch
R Package Primer - A minimal Example
R Packages
Building R Packages
Activity 5: Create a small package with a dataset

Week 9 (Nov. 1, 10:00 am - 12:30 pm)
Topic: Documenting data analysis and use of RMarkdown
Readings:
Ten Simple Rules for Reproducible Computational Research
Best Practices for Scientific Computing
Activity 6: Create an R Markdown file with documented data analysis (a vignette for your package)

Week 10 (Nov. 8, 10:00 am - 12:30 pm)
Topic: Peer review and collaboration
Readings: Review readings of Sessions 7 and 8
Activity 7: In-class peer review of packages and vignettes; revisions due in GitHub

Week 11 (Nov. 15, 10:00 am - 12:30 pm)
Topic: {rticles} and practical issues preparing self-contained open research documents (math notation and figures)
Readings:
LaTeX for Beginners
{ggplot2}: A Package for a Grammar of Graphics
Activity: No activity this week

Week 12 (Nov. 22, 10:00 am - 12:30 pm)
Note: We need to discuss dates for the last two seminars: Antonio will be in Brussels on November 22, and possibly in Yunnan on November 29

Week 13 (Date TBD Nov. 29, 10:00 am - 12:30 pm)
Topic: {rticles} and practical issues preparing self-contained open research documents (tables and citations)
Readings:
BibTeX
KableExtra for HTML
KableExtra for PDF
Activity: Final deliverable due on DATE TBD.

{macdown}: writing a thesis in R markdown
Readings: No readings assigned

¹ The University reserves the right to change any aspect of this course outline.
