Notes and resources for graduate course GEOG 712 Reproducible Research Workflow with GitHub and R offered by the School of Earth, Environment and Society at McMaster University

paezha/Reproducible-Research-Workflow

GEOG 712 Reproducible Research Workflow with GitHub and R

CC BY-SA 4.0

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Course Description [1]

Scientific discovery is typically a collective process, as researchers build their work on the preceding efforts of other researchers. This is certainly the case for theory, empirical evidence, and methods, as empirical researchers use analytical techniques developed by methodologists, theoreticians build on up-to-date evidence, and data collection inspires new methods of analysis. The reality is that contemporary research is not possible in isolation. A key element of the web of research relationships is the basic unit of research output, which typically takes the form of a journal paper, book chapter, or report. This unit of output, however, represents only the face of a multilayered process, and by its very nature is limited in the amount of information that it can communicate.

Increasingly, the development of recent technologies makes it easier and less expensive to communicate with greater efficiency. From data repositories to supplementary e-content in journals, as well as data policy requirements of research funders, there is a strong incentive for research to become more open and reproducible. Reproducibility means that research results can be verified independently, including all relevant assumptions and decisions. Every figure, every table, and every result are open for inspection, including the processes used to generate them. Research reproducibility is essential to maintain trust in the process, and has numerous advantages, including accelerating discovery and reducing inequality in access to research tools and results. Furthermore, other researchers can more easily use methods and tools if they are open. Not surprisingly, as newer technologies facilitate the transfer of research findings (including open data, open software, and open publishing), there has been a growth of interest in ways of achieving openness and reproducibility.

The objective of this course is to equip students with the fundamental concepts and tools needed to develop a reproducible research workflow. The course should be of interest to new graduate students in the sciences and social sciences, and is relevant to research involving qualitative or quantitative data. The course is also appropriate for experienced researchers who would like to update their workflow to comply with reproducibility criteria.

The course covers the following topics:

  1. Fundamentals of reproducible research
  2. Basic tools for implementing a reproducible research workflow: GitHub and R
  3. Data Management Plans
  4. Creating basic units of shareable code
  5. Documenting the process of doing research
  6. Generating reproducible research documents

By the end of the course, students will produce a report with all the components needed to make it a unit of reproducible research. In the spirit of the course, materials will be drawn mostly from open resources.

Instructors

Antonio Paez, Professor
Office: GSB 236
Office Hours: TBD
Phone: (905) 525-9140, ext. 26099
Email: [email protected]

Organization

The course will be organized in weekly 2-and-a-half-hour meetings. The format of the meetings will be a combination of seminar-style discussion, hands-on activities, and guest speakers. The topics and readings are found in the Course Schedule.

Readings and Resources

Students are responsible for completing the readings indicated in the Course Schedule. Any resources that are not open will be shared by the instructors.

Assessment

Students are assessed based on the completion of a sequence of activities. Note that the activities are designed to combine towards one final deliverable, so it is not advisable to skip any of them.

Activity 1: R Markdown Exercise 5%
Activity 2: First project 5%
Activity 3: Version Control Exercise 10%
Activity 4: DMP 10%
Activity 5: Data Package 15%
Activity 6: Data Analysis Documentation 15%
Activity 7: Peer Review Exercise 20%
Final Deliverable 20%

McMaster’s graduate grading system will be used. Note that, according to section 2.5.3 of the Graduate Calendar, the only passing grades are A+, A, A-, B+, B, and B-.

Academic Dishonesty

Academic dishonesty consists of misrepresentation by deception or by other fraudulent means and can result in serious consequences, e.g. the grade of zero on an assignment, loss of credit with a notation on the transcript (notation reads: “Grade of F assigned for academic dishonesty”), and/or suspension or expulsion from the university.

It is your responsibility to understand what constitutes academic dishonesty. For information on the various kinds of academic dishonesty please refer to the Academic Integrity Policy, specifically Appendix 3.

The following illustrates only three forms of academic dishonesty:

  1. Plagiarism, e.g. the submission of work that is not one’s own or for which other credit has been obtained.

  2. Improper collaboration in group work.

  3. Copying or using unauthorized aids in tests and examinations.

Course Schedule (September-December 2024)

Week 1 (Sept. 6, 10:00 am - 12:30 pm)
Topic: Course overview and introduction: Why reproducible research?
Readings: No readings this week
For discussion: Principles of open science, advantages, funding and policy environment, journal policies and the publication process, roadmap for course

Week 2 (Sept. 13, 10:00 am - 12:30 pm)
Topic: R + RStudio + markdown
Suggested Readings:
What is R?
R for Data Science
What is Markdown
Activity 1: Use markdown to create a document with basic operations in R

Week 3 (Sept. 20, 10:00 am - 12:30 pm)
Topic: Projects and Reproducible Environments
Readings:
Projects
{here}: a package for project-oriented workflows
{renv}: a package for reproducible environments in R
Activity 2: Create a project with your proposed directory structure, and initialize a reproducible environment
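
As a starting point for Activity 2, the following is a minimal sketch in base R of how a project skeleton might be created. The folder names, the `create_project_skeleton()` helper, and the `my-project` name are illustrative assumptions, not a structure prescribed by the course. The `{renv}` and `{here}` calls appear only as comments, since they are meant to be run interactively from inside the project.

```r
# Hypothetical helper: creates a project folder with a few conventional
# sub-folders. The folder names are assumptions chosen for illustration.
create_project_skeleton <- function(root,
                                    folders = c("data-raw", "data", "R",
                                                "docs", "output")) {
  dir.create(root, recursive = TRUE, showWarnings = FALSE)
  for (folder in folders) {
    dir.create(file.path(root, folder), showWarnings = FALSE)
  }
  # A top-level README describes the project to future readers.
  readme <- file.path(root, "README.md")
  if (!file.exists(readme)) {
    writeLines(c("# My project", "", "Describe the project here."), readme)
  }
  invisible(root)
}

# Build the skeleton in a temporary location so nothing is overwritten.
project_root <- file.path(tempdir(), "my-project")
create_project_skeleton(project_root)
list.files(project_root)

# Next steps, run once and interactively from inside the project:
# renv::init()                       # records package versions in renv.lock
# here::here("data", "survey.csv")   # builds paths relative to the project root
```

Keeping raw data, derived data, and code in separate folders is one common convention; any structure works as long as it is documented in the README and used consistently.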

Week 4 (Sept. 27, 10:00 am - 12:30 pm)
Topic: Version Control and GitHub
Readings:
What is version control?
What is GitHub?
{gitcreds}: a package to query git credentials from R
Activity 3: Post a README notice in GitHub and one document with basic operations in R

Week 5 (Oct. 4, 10:00 am - 12:30 pm)
Topic: Data Management Plans (DMP): Principles
Readings:
10 aspects of highly effective research data

Week 6 (Oct. 11, 10:00 am - 12:30 pm)
Topic: Data Management Plans (DMP): Tools
Readings: TBD
Activity 4: Write a DMP and post it on GitHub

Week 7 (Oct. 18)
Topic: Reading week
Readings: N/A

Week 8 (Oct. 25, 10:00 am - 12:30 pm)
Topic: Creating packages in R and documenting datasets
Readings:
Writing an R package from scratch
R Package Primer - A minimal Example
R Packages
Building R Packages
Activity 5: Create a small package with a dataset

Week 9 (Nov. 1, 10:00 am - 12:30 pm)
Topic: Documenting data analysis and use of R Markdown
Readings:
Ten Simple Rules for Reproducible Computational Research
Best Practices for Scientific Computing
Activity 6: Create an R Markdown file with documented data analysis (a vignette for your package)

Week 10 (Nov. 8, 10:00 am - 12:30 pm)
Topic: Peer review and collaboration
Readings: Review readings of Sessions 7 and 8
Activity 7: In-class activity peer reviewing packages, vignettes, and revisions due in GitHub

Week 11 (Nov. 15, 10:00 am - 12:30 pm)
Topic: {Rticles} and practical issues preparing self-contained open research documents (math notation and figures)
Readings:
LaTeX for Beginners
{ggplot2}: A Package for a Grammar of Graphics
Activity: No activity this week

Week 12 (Nov. 22, 10:00 am - 12:30 pm)
We need to discuss dates for the last two seminars: Antonio will be in Brussels on November 22, and possibly in Yunnan on November 29.

Week 13 (Date TBD; Nov. 29, 10:00 am - 12:30 pm)
Topic: {Rticles} and practical issues preparing self-contained open research documents (tables and citations)
Readings:
BibTeX
KableExtra for HTML
KableExtra for PDF
Activity: Final deliverable due on DATE TBD.

{macdown}: writing a thesis in R markdown
Readings: No readings assigned

Footnotes

  1. The University reserves the right to change any aspect of this course outline.
