Suggestion: Update the CodeMeta Metadata Block to add some more structure for machine actionability #10859

Open
doigl opened this issue Sep 18, 2024 · 5 comments · May be fixed by #11087

@doigl
Contributor

doigl commented Sep 18, 2024

Overview of the Suggestion
Currently, the fields memoryRequirements, processorRequirements, and storageRequirements are just free-text fields, which makes it difficult to use them in an automated process that provides the right resources for running a Jupyter notebook or a container. Adding subfields with controlled vocabularies to these fields would make it easier to differentiate between different types and to identify the right amount of a resource such as memory.

Also, as @poikilotherm mentioned, the CodeMeta schema is now available in version 3, and it could be worth checking whether we also want to add some of the new fields (such as code reviews) to the metadata block.

What kind of user is the suggestion intended for?
(Example users roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)
User, Sysadmin

What inspired this idea?
Two different things:

  • We have a dataset where a user tried to add two different types of memory requirements (RAM and GPU memory) to a piece of research software, and we expect this to happen more often in the future.
  • We want to connect our Dataverse instance to a JupyterHub as an external tool to allow interactive exploration of published Jupyter notebooks. In this process, we have to decide which resources the machine that runs the notebook should provide.

What existing behavior do you want changed?
Add structured subfields and controlled vocabularies, at least for the fields memoryRequirements, processorRequirements, and storageRequirements. Allow multiple values for the memoryRequirements field so that different types of memory can be recorded. We are also open to discussing changes to other fields and to adding new version 3 fields to the block (do we need software reviews?).
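
To make the idea more concrete, here is a minimal sketch of what a structured memory requirement could look like on the consuming side (for example, a tool that provisions a notebook server). The subfield names (`memory_type`, `value`, `unit`) and both vocabularies are hypothetical and only meant as a starting point for discussion; they are not part of the current block.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies; the real terms would be defined in the TSV.
MEMORY_TYPES = {"RAM", "GPU memory"}
UNIT_FACTORS = {"MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


@dataclass(frozen=True)
class MemoryRequirement:
    memory_type: str  # one of MEMORY_TYPES
    value: float      # numeric amount, expressed in `unit`
    unit: str         # one of UNIT_FACTORS

    def __post_init__(self):
        # Reject anything outside the (assumed) vocabularies.
        if self.memory_type not in MEMORY_TYPES:
            raise ValueError(f"unknown memory type: {self.memory_type!r}")
        if self.unit not in UNIT_FACTORS:
            raise ValueError(f"unknown unit: {self.unit!r}")
        if self.value <= 0:
            raise ValueError("value must be positive")

    def to_bytes(self) -> int:
        return int(self.value * UNIT_FACTORS[self.unit])


def required_bytes(requirements, memory_type):
    """Return the largest requirement of the given type in bytes, or None."""
    sizes = [r.to_bytes() for r in requirements if r.memory_type == memory_type]
    return max(sizes) if sizes else None


# A dataset with two memory requirements, as in the use case above.
requirements = [
    MemoryRequirement("RAM", 16, "GiB"),
    MemoryRequirement("GPU memory", 8, "GiB"),
]
```

With values like these, an external tool could call `required_bytes(requirements, "RAM")` instead of trying to interpret free text.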

Any brand new behavior do you want to add to Dataverse?
A CodeMeta export that recombines the structured fields so that the output stays compatible with the CodeMeta standard would also be interesting. We would also have to adjust our GitHub Action for importing information from codemeta files in Git repos into Dataverse datasets.
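
As a rough sketch of the export direction: the structured entries could be joined back into a single text value, since CodeMeta expects plain text for memoryRequirements. The entry keys (`type`, `value`, `unit`), the separator, and the function names are assumptions for illustration and do not describe an existing Dataverse exporter; the `@context` URL is assumed to be the CodeMeta 3.0 context.

```python
import json


def to_codemeta_text(entries):
    """Join structured entries into one free-text value."""
    return "; ".join(f"{e['type']}: {e['value']} {e['unit']}" for e in entries)


def export_codemeta(name, memory_entries):
    """Build a small CodeMeta-style document from structured fields."""
    return {
        "@context": "https://w3id.org/codemeta/3.0",  # assumed v3 context URL
        "@type": "SoftwareSourceCode",
        "name": name,
        "memoryRequirements": to_codemeta_text(memory_entries),
    }


doc = export_codemeta(
    "demo-software",  # hypothetical dataset title
    [
        {"type": "RAM", "value": 16, "unit": "GiB"},
        {"type": "GPU memory", "value": 8, "unit": "GiB"},
    ],
)
print(json.dumps(doc, indent=2))
```

The import (for example in a GitHub Action) would need the reverse step, which is harder because existing codemeta files contain arbitrary free text.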

Any open or closed issues related to this suggestion?

Are you thinking about creating a pull request for this issue?
Help is always welcome, is this idea something you or your organization plan to implement?
We would be happy to provide a suggestion for an updated tsv of the codemeta block, but would also be very interested in the opinion and the requirements of the community, and perhaps especially from @jggautier and @pdurbin

doigl added the "Type: Suggestion" label Sep 18, 2024
@pdurbin
Member

pdurbin commented Sep 18, 2024

We would be happy to provide a suggestion for an updated tsv of the codemeta block

If you're willing to produce an updated tsv, I'd be happy to look at it!

On a related note, as of Dataverse 6.4, you'll be able to designate the "type" of a dataset as software. Please see:

@pdurbin
Member

pdurbin commented Oct 21, 2024

There's a task under IQSS/dataverse-pm#174 to support CodeMeta and I just added a subtask to look at this issue and consider upgrading to v3 of CodeMeta first. Pull requests welcome, of course! 😄 ❤️

@pdurbin
Member

pdurbin commented Oct 28, 2024

@doigl @poikilotherm and others, as I work on this issue...

... I'm wondering if I should promote codemeta.tsv as it exists now, in 6.4, in tests and explanations of the feature, or if I should use computational_workflow.tsv, which, as far as I know, doesn't have any planned updates.

Basically, I'll pick one or the other to explain the feature of associating a dataset type such as "software" with a metadata block such as CodeMeta or Computational Workflow.

I'm a little nervous about promoting CodeMeta much in its current form, since it sounds like it's likely to change. So maybe I'll go with Computational Workflow. 🤷

@doigl
Contributor Author

doigl commented Oct 29, 2024

@pdurbin: sorry for the late answer and the missing pull request so far (too many other things on my plate). Wouldn't Computational Workflow be a good metadata block for workflows and CodeMeta a good one for software? But I have to admit that I do not really have a clear understanding of the difference between the two types, workflow and software.

The main changes in version 3 are, as far as I know, the review/reviewBody/reviewAspect fields, a start and end date, the hasSourceCode/isSourceCodeOf relations, and the renaming of continuousIntegration and embargoEndDate.
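
For the two renames, an importer could map old keys to new ones along these lines. This is only a sketch: it assumes that the version 2 names were `contIntegration` and `embargoDate`, which should be checked against the CodeMeta release notes, and `upgrade_keys` is a hypothetical helper.

```python
# Assumed v2 -> v3 renames; verify against the CodeMeta 3.0 release notes.
V2_TO_V3_RENAMES = {
    "contIntegration": "continuousIntegration",
    "embargoDate": "embargoEndDate",
}


def upgrade_keys(doc):
    """Return a copy of a codemeta dict with renamed top-level keys."""
    return {V2_TO_V3_RENAMES.get(key, key): value for key, value in doc.items()}


old_doc = {
    "name": "demo-software",  # hypothetical example values
    "contIntegration": "https://example.org/ci",
    "embargoDate": "2025-01-01",
}
new_doc = upgrade_keys(old_doc)
```

Keys that are not in the mapping are copied unchanged, so the helper is safe to run on files that already use the version 3 names.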

While it would be really great to have the possibility to link to external reviews for software and for data (perhaps in the form of badges), I do not think this feature belongs in the software metadata/codemeta block, because it is important for datasets (and workflows?) as well.

The relations between source code and application could be implemented in "Related Materials" in the Citation block, if relation types were also available there.

And so far, we do not have a use case for a start and end date for software.

What do you think, @pdurbin and @poikilotherm?

@pdurbin
Member

pdurbin commented Oct 29, 2024

@doigl thanks, yes, CodeMeta for software makes sense, of course. I'm playing with the codeMeta20 block right now. One thing I observe about both codeMeta20 and computationalworkflow is that both have fields with displayoncreate set to TRUE, which other metadatablocks don't have.
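
For anyone who wants to check this in a block file, here is a small sketch that lists the fields with displayoncreate set to TRUE. The sample TSV is simplified and invented for illustration (a real block file has more columns and sections, and the values shown are not taken from codemeta.tsv); `fields_shown_on_create` is a hypothetical helper.

```python
import csv
import io

# Simplified, invented sample; not the real content of codemeta.tsv.
SAMPLE_TSV = (
    "#datasetField\tname\ttitle\tfieldType\tdisplayoncreate\trequired\n"
    "\tcodeVersion\tSoftware Version\ttext\tTRUE\tFALSE\n"
    "\tmemoryRequirements\tMemory Requirements\ttext\tFALSE\tFALSE\n"
)


def fields_shown_on_create(tsv_text):
    """Return names of #datasetField rows with displayoncreate == TRUE."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = None
    shown = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            # A new section starts; only the #datasetField section is of interest.
            header = row if row[0] == "#datasetField" else None
            continue
        if header is None:
            continue
        record = dict(zip(header, row))
        if record.get("displayoncreate", "").upper() == "TRUE":
            shown.append(record["name"])
    return shown


print(fields_shown_on_create(SAMPLE_TSV))
```

To run it against a real file, the text of the TSV can be read with `open(path, encoding="utf-8").read()` and passed to the same function.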
