Enhancement/add dmp resources #77

Merged
merged 6 commits into from
Dec 12, 2022
99 changes: 66 additions & 33 deletions data.Rmd
@@ -1,43 +1,43 @@
# Data Management

*Can the data be shared and published, and easily re-used in other analyses*?

- Create and maintain a [data management plan](https://dmptool.org/plans)
- Store data in simple, cross-compatible formats such as CSV files.
- Microsoft Excel can be a useful tool for data entry and organization, but
limit its use to that, and organize your data in a way that can be easily
exported.
- Metadata! Metadata! Document your data.
- For relational datasets you can create linked data on [Airtable](https://airtable.com/). For more information see \@ref(airtable)
- For data sets that cross multiple projects, create data-only project folders
for the master version. When these data sets are finalized, they can be
deposited in public or private data repositories such as
[figshare](https://figshare.com/) and [zenodo](https://zenodo.org/). In some
cases it makes sense for us to create data-only R packages for easily
distributing data internally and externally.
EcoHealth Alliance is committed to producing and promoting reliable and
reproducible research. To achieve this, we have to provide data
(and other research outputs) that non-team members can interpret and use, and
promote best practices for data management among collaborators. Ideally, the
framework for managing data laid out in this chapter will facilitate the creation
of high-quality, shareable research outputs. By focusing on [Data Management
Plans](https://datamanagement.hms.harvard.edu/plan-design/data-management-plans) and the [dmptool](https://dmptool.org/plans), we can build on
well-established workflows for producing high-quality research outputs.

We aim to generally work in a **tidy data** framework. This approach to
structuring data makes interoperability between tools easier.

## Data Management Plan

*Data Management Plans*, also called *Outputs Management Plans* or *Data Management and Sharing Plans*, are living documents that help structure the creation and management of data throughout the lifecycle of a project. DMPs are flexible and do not force researchers to choose a particular technology set but rather ask probing questions about the mechanics and ethics of data use in research projects. Organizing data management in this way provides a common framework to think about data without requiring specific technologies be used in the research workflow. Furthermore, DMPs use reliable identifiers (URIs) to connect components of the research workflow, making long-term data access more reliable.
*Data Management Plans*, also called *Outputs Management Plans* or *Data Management and Sharing Plans*, are living documents that help structure the creation and management of data throughout the lifecycle of a project. DMPs are flexible and do not force researchers to choose a particular technology set but rather ask probing questions about the mechanics and ethics of data use in research projects. Organizing data management in this way provides a common framework to think about data without requiring specific technologies be used in the research workflow. Furthermore, DMPs use stable identifiers (URIs) to connect components of the research workflow, making long-term data access more reliable.

The majority of funders require a DMP; however, each funder has specific expectations
about what, when, and how research outputs should be shared. It is important you
and your collaborators understand those expectations before submitting a DMP. It's
equally important that all collaborators understand and agree to the obligations
created when submitting a DMP. Early communication between collaborators
is key to navigating differing expectations about data sharing from researchers
in different contexts.

![](assets/data_mgmg_plan.png)
*Data management plan as hub in knowledge management system*

**Important note on budgeting**:
Data management activities, but not necessarily infrastructure, are an allowable cost for most funding agencies (NIH, NSF, NASA). Gray areas include paying for hosting services and other infrastructure-like components of the DMP.

**Benefits of using a DMP**:

1. They are a funder requirement and you want funding
- NIH, NSF, NASA, Wellcome Trust, etc. require a DMP be submitted with a proposal.
2. They provide a scaffold for you to conceptualize data management for your project
1. They provide a scaffold for you to conceptualize data management for your project
- What data do you need to answer your research question, where will it come from, what resources are needed throughout the project lifecycle, what are the mechanics of managing the data?
3. They make it easier to collaborate
2. They make it easier to collaborate
- Defining responsibilities, committing to using data standards, documenting how the project works
4. They make it easier for your data to be reused
3. They make it easier for your data to be reused
- You get more citations, your effort contributes to knowledge creation in unexpected ways, your results become more reproducible
4. They are a funder requirement and you want funding
- NIH, NSF, NASA, Wellcome Trust, etc. require a DMP be submitted with a proposal.

**Components of a DMP**:

@@ -53,28 +53,24 @@ Data management activities, but not necessarily infrastructure, are an allowable
1. It's never too late to write a DMP
2. Data Management Plans are living documents that change with a project
3. DMPs are created collaboratively and stored in DMPTool.org
4. We ensure our DMPs meet EHA best practices for FAIR data and Reproducible Science
4. We ensure our DMPs meet EHA best practices for [FAIR data](https://www.go-fair.org/fair-principles/) and Reproducible Science
5. Collaborators, especially those from outside institutions, are full participants in the DMP process

### DMP Process Overview

0. [Create an account](https://dmptool.org/quick_start_guide) on DMPTool.org associated with EcoHealth Alliance
1. Identify Funder DMP requirements and `r params$data_librarian_appt` with the `r params$data_librarian`
2. Create a DMP using appropriate template given your funder. If no template is available or the funder has no requirements, use the EHA Minimal Data Management Plan. Add collaborators and complete as much of the plan as you can
3. Request feedback from the `r params$data_librarian`
4. Work with the `r params$data_librarian` to incorporate feedback
5. Export DMP for inclusion in grant

### Expectations by project phase

**Proposal/Pre-Award Phase**

- Look for and use Funder Requirements for DMPs. If no template exists, use this one or create one based on funder requirements.
- Look for funder requirements and use funder specific templates for DMPs. If no template exists, use the EHA Minimal Data Management Plan or create one based on funder requirements.
- Think about how you might make data Findable, Accessible, Interoperable and Reusable (FAIR)
- Establish expectations for data sharing and outputs with collaborators and PIs. These discussions should begin early at the same time as discussing project responsibilities and budget.
- Consider what tools you will use throughout the lifecycle of your data 
- Consider how data collection, analysis and management tasks will be divided among collaborators
- Outline the ethical considerations for properly managing data in your project
- Ensure collaborators and PIs understand the commitments they are making via the DMP. Request and incorporate feedback from collaborators.
- `r params$data_librarian_appt` with the Data Librarian, create a timeline for proposal submission, and have a notion of tools and standards to use


**Post-Award/Early Phase**

- Review and update proposal DMP
@@ -112,9 +108,46 @@ Data management activities, but not necessarily infrastructure, are an allowable
- Use EHA institutional tags where possible e.g. [Zenodo Community](https://zenodo.org/communities/ecohealthalliance/?page=1&size=20)
- `r params$data_librarian_appt` with the `r params$data_librarian`

### Using DMPTool to prepare your proposal data management plan

0. [Create an account](https://dmptool.org/quick_start_guide) on DMPTool.org associated with EcoHealth Alliance
1. Identify Funder DMP requirements and `r params$data_librarian_appt` with the `r params$data_librarian`
2. Create a DMP using appropriate template given your funder. If no template is available or the funder has no requirements, use the EHA Minimal Data Management Plan. Add collaborators and complete as much of the plan as you can
3. Principal Investigators and Project Partners explicitly agree to abide by the DMP. All collaborators should fully understand and agree with the data sharing components of the plan before approving it.
4. Request feedback from the `r params$data_librarian`
5. Work with the `r params$data_librarian` to incorporate feedback
6. Export DMP for inclusion in grant

## Notes on data management
*Can the data be shared and published, and easily re-used in other analyses*?

- Create and maintain a [data management plan](https://dmptool.org/plans)
- Store data in simple, interoperable formats such as CSV files.
- Microsoft Excel can be a useful tool for data entry and organization, but
limit its use to that, and organize your data in a way that can be easily
exported.
- Metadata! Metadata! Document your data.
- For relational datasets you can create linked data on [Airtable](https://airtable.com/). For more information see \@ref(airtable)
- For data sets that cross multiple projects, create data-only project folders
for the master version. When these data sets are finalized, they can be
deposited in public or private data repositories such as
[figshare](https://figshare.com/) and [zenodo](https://zenodo.org/). In some
cases it makes sense for us to create data-only R packages for easily
distributing data internally and externally.

We aim to generally work in a **tidy data** framework. This approach to
structuring data makes interoperability between tools easier.



## Learn
- Watch M3 on [Data Management Plans](https://airtable.com/appwlxIzmQx5njRtQ/tbledVCO9MRKkK9MW/viwfFq11zdwCbBT83/recNVSuG2ApgfYkbl?blocks=hide)
- Read California Digital Library guidance on [Data Management Plans](https://dmptool.org/general_guidance)
- [Data Management Plan Skill Building](https://dataoneorg.github.io/Education/bp_step/plan/) from DataOne
- [NIH Data Sharing Guidance](https://sharing.nih.gov/data-management-and-sharing-policy)
- [NIH Data Sharing Learning Resources](https://sharing.nih.gov/about/learning)
- [Condensed NIH DMSP Guidance Resources](https://osf.io/uadxr/)
- [NSF Bio DMP Guidance](https://www.nsf.gov/bio/biodmp.jsp)
- Read Hadley Wickham's [tidy data
paper](http://vita.had.co.nz/papers/tidy-data.pdf) for the general concept.
Note the *packages* in this paper are out of date, but the structures and