Protistan plankton time series from the northern Salish Sea and Central Coast, British Columbia, Canada #93

Closed
23 of 27 tasks
timvdstap opened this issue Jul 9, 2024 · 16 comments

@timvdstap
Collaborator

timvdstap commented Jul 9, 2024

@JessyBarrette
@jdelbel

Hey Justin, I just noticed that this record is sitting in the 'submitted' phase -- I'm sorry I didn't have a look at it yet! I think a new issue wasn't created because it was a modification of an existing, published record. I should be able to review it this week and get back to you with any comments and/or suggestions!

Best Practices Checklist

In General

  • No previous versions of this metadata record exist (e.g. for earlier versions of the data; if so, update that record rather than creating a new one)

Data Identification

Dataset title:

  • No version information in the title
  • Frontloaded (with the most important information first)
  • Include the geographical region the data apply to
  • Short – aim for 60 characters including spaces
  • Does not include acronyms – put these in the keywords
  • Does not include the word “dataset”
  • Time series datasets should include “time series” at the end of the title
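
As a rough illustration, the title rules above could be checked automatically. This is only a sketch: the function name `check_title`, the regular expressions and the `is_time_series` flag are assumptions made for this example and are not part of the metadata form. The acronym check in particular is a crude heuristic (any all-caps word of two or more letters), and "frontloaded" and "includes the geographical region" still need a human reviewer.

```python
import re


def check_title(title: str, is_time_series: bool = False, max_len: int = 60) -> list[str]:
    """Return a list of checklist warnings for a dataset title (empty if none)."""
    problems = []
    # Short: aim for 60 characters including spaces.
    if len(title) > max_len:
        problems.append(f"longer than {max_len} characters ({len(title)})")
    # Does not include the word "dataset".
    if re.search(r"\bdatasets?\b", title, flags=re.IGNORECASE):
        problems.append('contains the word "dataset"')
    # No version information, e.g. "v1.4" or "version 2".
    if re.search(r"\bv(?:ersion)?\s*\d+(?:\.\d+)*\b", title, flags=re.IGNORECASE):
        problems.append("contains version information")
    # Heuristic only: treat any all-caps word of 2+ letters as an acronym.
    if re.search(r"\b[A-Z]{2,}\b", title):
        problems.append("may contain an acronym")
    # Time series datasets should end with "time series".
    if is_time_series and not title.rstrip().lower().endswith("time series"):
        problems.append('does not end with "time series"')
    return problems


# Hypothetical titles, used only to exercise the checks.
print(check_title("Salish Sea protistan plankton time series", is_time_series=True))
print(check_title("CTD dataset v1.4 from the northern Salish Sea", is_time_series=True))
```

An empty list means none of the automated checks fired; it does not mean the title passes the full checklist.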

Abstract

  • Abbreviations have been expanded upon at first mention
  • Abstract describes how, when, what, where, why of data collection and is limited to no more than 500 words

DOI

  • A DOI has been drafted for this record
  • DOI has been updated via the form after review and changes to record
  • DOI has been manually edited on datacite fabrica
  • DOI status has been changed from Draft to Findable

Spatial

  • Ensure that Depth or Height Positive is correctly selected

Contact

  • ROR and ORCID(s) are included and linked properly where applicable
  • For datasets where DFO is a partner, ensure 'parent' ROR is added (https://ror.org/02qa1x782). DFO 'child' organizations (i.e. CHS) and their ROR are optional.
  • Include Hakai Institute as Publisher and include [email protected] as email
  • Make sure email address is provided if the role is 'Metadata Custodian' or 'Point of Contact'
  • Add contact affiliation where known including ROR
  • If resource is (partially) generated by Hakai researchers, include 'Tula Foundation' (with associated ROR) with 'Funder' role. Be sure to uncheck 'include in citation' for Tula Foundation.

Resources

  • Resource links go to specific dataset download (not generic platform like waterproperties.ca)
  • Readme, changelog, data dictionary, protocols included in data-package (for tabular text based data)
  • An archive folder, or other means, for older data versions is included in the data package if the version is not 1.0
  • Links work
  • All files in the data package can be opened and are not corrupt
  • No executable files in the data package. Files should be open formats and standards (.csv, .txt for example)
@jdelbel

jdelbel commented Jul 9, 2024 via email

@timvdstap
Collaborator Author

That's good to know, thanks Justin! @JessyBarrette we have to check why no issue was generated following this record submission, and also whether we've missed out on any others in the meantime.

@JessyBarrette
Contributor

yeah good point

@timvdstap
Collaborator Author

timvdstap commented Jul 12, 2024

Hey @jdelbel -- nice record! Some thoughts, suggestions, comments:

  • If the underlying data are publicly available, you can include the date when the data were published. If this version (1.4) was the initial publication of the data, then I'd argue that your 'revision date' should instead be your 'publication date'.

  • Remove role "Publisher" from your contact details

  • Does Drew not have an ORCID? I'd strongly recommend that they create one.

  • Include 'Hakai Institute' separately as contact, with role "Publisher".

  • Include 'Tula Foundation' separately as contact, with role "Funder". Do not include in citation.

  • I think a lot of your Primary Resources should be included under Lineage - this section describes any documents, sources or processing steps used to achieve your processed data. Do you have (a) link(s) to e.g. a GitHub repository (or GitHub Release) that includes all the data? @JessyBarrette would you agree that methods papers and identification keys/guidelines are best included under Lineage (vs. Related Works)? This item has made me realize we need to update our guidance text for Primary Resources.

  • The currently empty field in 'Lineage' can be removed.

  • Do the records in OBIS/GBIF contain the full dataset? If not, I would recommend using 'Has Part Of' to indicate that only a subset of the data is standardized to DwC. If they contain the full dataset, perhaps a better relationship would be "Is Original Form Of". @JessyBarrette do you have any thoughts on this?

@jdelbel

jdelbel commented Jul 18, 2024

Thanks Tim.

I am confused on point #1 - The original dataset was published on OBIS on 2021-06-21. I recently revised the dataset, adding more stations and years of data, which is the current revision date (v1.4, 2024-06-17). Are you saying I should use the original publication date (2021-06-21) here instead?

I moved my primary resources to lineage, but am unclear if they are added correctly. Please advise. The "scope" description and options are a bit confusing as the methods and taxonomic key documents are not "datasets".

I do not have a GitHub repository that includes all of the data.

@timvdstap
Collaborator Author

timvdstap commented Jul 20, 2024

> I am confused on point #1 - The original dataset was published on OBIS on 2021-06-21. I recently revised the dataset, adding more stations and years of data, which is the current revision date (v1.4, 2024-06-17). Are you saying I should use the original publication date (2021-06-21) here instead?

My apologies - I had forgotten that the initial dataset was already published on OBIS. In that case, you can keep it the way it is, or include the date of the original dataset publication under 'date published'. The revised date should be the one showing up in the citation (which is currently the case).

Another note:

  • Currently Hakai Institute is included within the 'body' of your citation, because you have also selected the role of 'Data Owner'. This is not necessarily wrong, but I just want to check whether you want the institute included in the citation. For example, compare:

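[Screenshot 1: citation with Hakai Institute included in the body]

to

[Screenshot 2: citation without Hakai Institute in the body]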

In the second screenshot, I have created a separate entry for 'Hakai Institute' with role: Data Owner, without it appearing in the citation. It's a bit cumbersome, I agree, but that's the current workaround. Let me know if that's what you prefer.

  • You'll need to include a link to a primary resource. Right now you have 'Related Works' but don't specify the resource that they're in relation to (which would be the primary resource). If there is no 'full dataset' available, or if this metadata record is specific to the standardized data on OBIS, then I would include that URL as Primary Resource. The GBIF URL would then have as relationship type: 'Is Identical To'. Let me know if that makes sense.

I agree that the Lineage section is slightly confusing and is in need of improved documentation perhaps. I think selecting 'Dataset' as scope is not necessarily wrong, as applying the taxonomic keys in the processing results in the dataset. However I will create a separate ticket for this issue to discuss and get back to you. Feel free to chime in / keep track: https://github.com/HakaiInstitute/hakai-data/issues/186

@jdelbel

jdelbel commented Jul 22, 2024

Thanks Tim. That all sounds good.

I added the date published.

Removed Hakai from the reference.

Put the OBIS URL as the primary resource, removed it from the related works and put 'Is Identical To' for GBIF. Generally, I push to OBIS when I have completed years/projects and the OBIS standardization is the best output. As such, I think it's good/correct to use this as the primary resource.

Sounds good about lineage. I'm likely just not familiar with the accepted terminology around this.

I think I got everything, but let me know if there is anything else.

@JessyBarrette
Contributor

JessyBarrette commented Jul 22, 2024 via email

@timvdstap
Collaborator Author

timvdstap commented Jul 23, 2024

> Put the OBIS URL as the primary resource, removed it from the related works and put 'Is Identical To' for GBIF. Generally, I push to OBIS when I have completed years/projects and the OBIS standardization is the best output. As such, I think it's good/correct to use this as the primary resource.

Sounds good @jdelbel - I think that for now this will work just fine! Just as an FYI, and as an extension to Jessy's comment above: wherever possible we want to link external data/metadata in different repositories back to records in the Hakai Catalogue, so that we keep an accurate list of Hakai holdings even if, for whatever reason, the location of external data holdings changes. This way we can appropriately represent ownership and credit. In practice, the current preferred approach is for Hakai-owned data, along with any documentation and scripts, to be stored in an institutional-level repository (such as GitHub). So in the event that you do create a GitHub repository to store your code, data and documentation -- which would be recommended, especially given the ongoing time series nature of this record -- we can modify your Primary Resource to point to the GitHub repository/release, and add the OBIS URL as a Related Work.

> Sounds good about lineage. I'm likely just not familiar with the accepted terminology around this.

No worries, I myself think it's a tricky concept and something that has only recently been implemented in the metadata form. In your case, entries 2 through 5 appear to be related to the detection step in this workflow, so should be entered as the processing steps / methods under the first entry, rather than as separate Lineage entries.

@timvdstap
Collaborator Author

> No worries, I myself think it's a tricky concept and something that has only recently been implemented in the metadata form. In your case, entries 2 through 5 appear to be related to the detection step in this workflow, so should be entered as the processing steps / methods under the first entry, rather than as separate Lineage entries.

I can make this change for you btw if you like/agree with this approach @jdelbel -- I think it's the final piece of this puzzle before the record can be published :)

@timvdstap
Collaborator Author

Hey @jdelbel just pinging you to see if you're OK with the approach suggested above regarding the processing steps/methods. Link to your record submission is here.

@jdelbel

jdelbel commented Sep 10, 2024

@timvdstap Sorry Tim, I got pulled away with GEM work and then holiday. That sounds good to me, and if you can make the change that would be awesome. Let me know if this is still possible.

I can definitely publish my scripts via GitHub. I will need to do some work to clean everything up and provide proper documentation. I may have some questions around how to best set up the repository for public viewership. Unfortunately, this will need to wait until early October following the GEM workshop.

@timvdstap
Collaborator Author

timvdstap commented Oct 10, 2024

Hey @jdelbel I've made those changes to the Lineage section, let me know what you think: https://hakaiinstitute.github.io/hakai-metadata-entry-form/#/en/hakai/7U7b8oPpeTN6gjvXlUCTGJr5pga2/-Nh8VlPQ_m681tso87Gd.

I've also made 2 slight changes to the Contact section:

  • I created 2 separate entries for the 'Hakai Institute' as contact. For one, Hakai Institute is listed as 'Data Owner' but will not appear in the citation. For the other, Hakai Institute is listed as Publisher, and included in the citation. This is the workaround we currently have for ensuring Hakai Institute is listed as publishing organization without being included in the list of authors.
  • I added Drew's email as their role is a 'Point of Contact'.

If you agree with these changes then I think we can go ahead with publishing this record. Moving forward, ideally we're looking for the Primary Resource to link to the full data package (preferably on GitHub). Should any (subset of the) data also be published on a global repository, such as OBIS / GBIF in this case, then that would be added as a Related Work. This way we would always have a copy of the data package in our institutional data repository. Should you create a data package and include it in a GitHub repository down the road, we can update the record accordingly.

@jdelbel

jdelbel commented Oct 10, 2024

I changed the citation order so that Wiley is second. Otherwise, looks great to me.

100% on the GitHub data package. I will work on this.

@timvdstap
Collaborator Author

Sounds good - I have updated the record in DataCite and published the record to the catalogue. Should show up there within 24 hours, at which point I'll close this issue :)

@timvdstap
Collaborator Author

Record published in the catalogue: https://doi.org/10.21966/jv5k-3k59

3 participants