A tutorial on data-management basics using OpenRefine. Adapted from a 2017 Data Management workshop designed by Rachael Starry and from the tri-cods tidy-data workshop.
As scholars using digital methods in our research, we need to understand how to make both quantitative and qualitative data machine-readable and machine-usable. But what is data, anyway? What is a dataset? Where can we find raw data? How do we know whether our data is "clean" or "dirty"? How can we clean "dirty" data? These are a few of the questions that this workshop sets out to address.
Learning objectives:
- Understand the structure and appearance of datasets
- Discover where and how to find data online
- Apply these concepts to clean raw datasets in OpenRefine
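One common "dirty data" problem is the same value written several ways (different capitalization, stray spaces, reordered words). OpenRefine's clustering feature addresses this with a "fingerprint" key-collision method: values that reduce to the same normalized key are grouped as likely variants. The sketch below is a rough Python approximation of that idea, not OpenRefine's own code; the function names `fingerprint` and `cluster` and the sample values are hypothetical, chosen for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate OpenRefine-style fingerprint key (illustrative sketch)."""
    # Trim surrounding whitespace and lowercase everything.
    s = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "é" -> "e").
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation and control characters, keeping letters, digits and spaces.
    s = re.sub(r"[^\w\s]", "", s)
    # Split into tokens, remove duplicates, sort, and rejoin with single spaces.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep only groups with >1 distinct spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]


if __name__ == "__main__":
    # Hypothetical messy column of place names.
    messy = ["New York", "new york ", "York, New", "Boston"]
    print(cluster(messy))
```

Run on the hypothetical list above, the three spellings of "New York" share the key `new york` and come back as one cluster, while "Boston" is left alone. In OpenRefine you would then review such a cluster and merge it into a single canonical value.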
Further resources:
- OpenRefine documentation
- Cleaning Data with OpenRefine lesson from Programming Historian
- "Tidy Data" by Hadley Wickham
- The European Spreadsheet Risks Interest Group, a curious and comic collection of messy-data horror stories