From 83dc72b56ddd49de5c4f41d28b184aac1870b564 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 25 Apr 2024 23:47:07 +0100 Subject: [PATCH 01/18] Define how to reference an RO-Crate --- docs/1.2-DRAFT/data-entities.md | 46 +++++++++++++++++++++++++++++++-- 1 file changed, 44 insertions(+), 2 deletions(-) diff --git a/docs/1.2-DRAFT/data-entities.md b/docs/1.2-DRAFT/data-entities.md index 585e144b..4f169e1e 100644 --- a/docs/1.2-DRAFT/data-entities.md +++ b/docs/1.2-DRAFT/data-entities.md @@ -389,7 +389,31 @@ These can be included for File Data Entities as additional metadata, regardless ### Directories on the web; dataset distributions -A _Directory File Entry_ or [Dataset] identifier expressed as an absolute URL on the web can be harder to download than a [File] because it consists of multiple resources. It is RECOMMENDED that such directories have a complete listing of their content in [hasPart], enabling download traversal. +A _Directory File Entry_ or [Dataset] identifier expressed as an absolute URL on the web can be harder to download than a [File] because it consists of multiple resources. It is RECOMMENDED that such directories have a complete listing of their content in [hasPart], enabling download traversal, or are themselves RO-Crates. + +#### Referencing other RO-Crates + +A referenced RO-Crate is also a [Dataset], but where its [hasPart] do not need to be listed. 
Instead, its content and further metadata is available from its own RO-Crate Metadata File: + +```json +{ + "@id": "http://example.com/another-crate/", + "@type": "Dataset", + "conformsTo": { "@id": "https://w3id.org/ro/crate" }, + "subjectOf": { "@id": "http://example.com/another-crate/ro-crate-metadata.json" } +}, +{ + "@id": "http://example.com/another-crate/ro-crate-metadata.json", + "@type": "CreativeWork" +} +``` + +{.tip } +> The referenced RO-Crate metadata descriptor MUST NOT include its own `conformsTo` or be referenced with `about`, this is to avoid confusion with the referencing RO-Crate's own +[metadata descriptor](root-data-entity.md#ro-crate-metadata-descriptor). Likewise, the `conformsTo` on the referenced `Dataset` entity is version-less, as the referenced crate is free to use a different version of the RO-Crate specification. + +#### Downloadable dataset + Alternatively, a common mechanism to provide downloads of a reasonably sized directory is as an archive file in formats such as [`application/zip`](https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263) or [`application/gzip`](https://www.nationalarchives.gov.uk/PRONOM/x-fmt/266), described as a [DataDownload]. @@ -409,7 +433,25 @@ Alternatively, a common mechanism to provide downloads of a reasonably sized dir } ``` -Similarly, the _RO-Crate root_ entity may also provide a [distribution] URL, in which case the download SHOULD be an archive that contains the _RO-Crate Metadata Document_. 
+Similarly, the _RO-Crate root_ entity (or a reference to another RO-Crate as a `Dataset`) may provide a [distribution] URL, in which case the download SHOULD be an archive that contains the _RO-Crate Metadata Document_ (either directly in the archive's root, or within a single folder in the archive), indicated by a version-less `conformsTo`: + +```json + { + "@id": "./", + "@type": "Dataset", + "identifier": "https://doi.org/10.48546/workflowhub.workflow.775.1", + "url": "https://workflowhub.eu/workflows/775/ro_crate?version=1", + "name": "Research Object Crate for Jupyter Notebook Molecular Structure Checking", + "distribution": {"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1"}, + "…": "" + }, + { + "@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1", + "@type": "DataDownload", + "encodingFormat": ["application/zip", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"}], + "conformsTo": { "@id": "https://w3id.org/ro/crate" } + } +``` In all cases, consumers should be aware that a `DataDownload` is a snapshot that may not reflect the current state of the `Dataset` or RO-Crate. 
From 46de12fbf2d542b69d24fd949d220a760070de3c Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 25 Apr 2024 23:52:52 +0100 Subject: [PATCH 02/18] clarify encodingFormat --- docs/1.2-DRAFT/data-entities.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/1.2-DRAFT/data-entities.md b/docs/1.2-DRAFT/data-entities.md index 4f169e1e..f9ea77d0 100644 --- a/docs/1.2-DRAFT/data-entities.md +++ b/docs/1.2-DRAFT/data-entities.md @@ -404,13 +404,13 @@ A referenced RO-Crate is also a [Dataset], but where its [hasPart] do not need t }, { "@id": "http://example.com/another-crate/ro-crate-metadata.json", - "@type": "CreativeWork" + "@type": "CreativeWork", + "encodingFormat": "application/ld+json" } ``` {.tip } -> The referenced RO-Crate metadata descriptor MUST NOT include its own `conformsTo` or be referenced with `about`, this is to avoid confusion with the referencing RO-Crate's own -[metadata descriptor](root-data-entity.md#ro-crate-metadata-descriptor). Likewise, the `conformsTo` on the referenced `Dataset` entity is version-less, as the referenced crate is free to use a different version of the RO-Crate specification. +> The referenced RO-Crate metadata descriptor SHOULD NOT include its own `conformsTo` or reference the dataset with `about`; this is to avoid confusion with the referencing RO-Crate's own [metadata descriptor](root-data-entity.md#ro-crate-metadata-descriptor). Likewise, the `conformsTo` on the referenced `Dataset` entity is version-less, as the referenced crate is free to self-declare a different version of the RO-Crate specification. 
#### Downloadable dataset From d465ee0e8f0b6baa56dc03f8b5be90de90c4fe57 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Fri, 26 Apr 2024 00:29:34 +0100 Subject: [PATCH 03/18] Add Retrieving an RO-Crate section --- docs/1.2-DRAFT/data-entities.md | 58 +++++++++++++++++++++++++++++++-- 1 file changed, 56 insertions(+), 2 deletions(-) diff --git a/docs/1.2-DRAFT/data-entities.md b/docs/1.2-DRAFT/data-entities.md index f9ea77d0..46547dac 100644 --- a/docs/1.2-DRAFT/data-entities.md +++ b/docs/1.2-DRAFT/data-entities.md @@ -412,6 +412,27 @@ A referenced RO-Crate is also a [Dataset], but where its [hasPart] do not need t {.tip } > The referenced RO-Crate metadata descriptor SHOULD NOT include its own `conformsTo` or reference the dataset with `about`; this is to avoid confusion with the referencing RO-Crate's own [metadata descriptor](root-data-entity.md#ro-crate-metadata-descriptor). Likewise, the `conformsTo` on the referenced `Dataset` entity is version-less, as the referenced crate is free to self-declare a different version of the RO-Crate specification. 
+If the referenced crate conforms to a given [RO-Crate profile](profiles.md), this MAY be indicated by expanding `conformsTo` to an array: + +```json +{ + "@id": "https://doi.org/10.48546/workflowhub.workflow.26.1", + "@type": "Dataset", + "conformsTo": [ + { "@id": "https://w3id.org/ro/crate" }, + { "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"} + ], + "subjectOf": { "@id": "https://workflowhub.eu/ga4gh/trs/v2/tools/26/versions/1/PLAIN_CWL/descriptor/ro-crate-metadata.json" } +}, +{ "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0", + "@type": ["CreativeWork", "Profile"], + "name": "Workflow RO-Crate Profile", + "version": "1.0" +} +``` + + + #### Downloadable dataset @@ -440,7 +461,6 @@ Similarly, the _RO-Crate root_ entity (or a reference to another RO-Crate as a ` "@id": "./", "@type": "Dataset", "identifier": "https://doi.org/10.48546/workflowhub.workflow.775.1", - "url": "https://workflowhub.eu/workflows/775/ro_crate?version=1", "name": "Research Object Crate for Jupyter Notebook Molecular Structure Checking", "distribution": {"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1"}, "…": "" @@ -452,8 +472,42 @@ Similarly, the _RO-Crate root_ entity (or a reference to another RO-Crate as a ` "conformsTo": { "@id": "https://w3id.org/ro/crate" } } ``` - +profile="https://w3id.org/ro/crate" In all cases, consumers should be aware that a `DataDownload` is a snapshot that may not reflect the current state of the `Dataset` or RO-Crate. +#### Retrieving an RO-Crate + +To resolve a reference to an RO-Crate, but where `subjectOf` or `distribution` is unknown (e.g. an RO-Crate is cited from a journal article), the below approach is recommended to retrieve its [RO-Crate Metadata Document](root-data-entity.md#ro-crate-metadata-file-descriptor): + +1. 
Try [Signposting] after permalink redirects, looking for `Link` headers that reference `Link rel="describedby` for a _RO-Crate Metadata Document_, or `Link rel="item"` for a distribution archive -- in either case looking for a link with `profile="https://w3id.org/ro/crate"` for example: + +``` +curl --location --head https://doi.org/10.48546/workflowhub.workflow.120.5 + +HTTP/2 302 +Location: https://workflowhub.eu/workflows/120?version=5 + +HTTP/2 200 +Content-Type: text/html; charset=UTF-8 +Link: ; + rel="item" ; type="application/zip" ; + profile="https://w3id.org/ro/crate" +``` + +2. [HTTP Content-negotiation] for the [RO-Crate media type](appendix/jsonld.md#ro-crate-json-ld-media-type), for example: + +Requesting `https://w3id.org/workflowhub/workflow-ro-crate/1.0` with HTTP header + + `Accept: application/ld+json;profile=https://w3id.org/ro/crate` redirects to the _RO-Crate Metadata file_ + `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json` +3. The above approach may fail or returns a HTML page, e.g. for content-delivery networks that do not support content-negotiation. +4. An optional heuristic fallback is to try resolving the path `./ro-crate-metadata.json` from the _resolved_ URI (after permalink redirects). For example: +If permalink `https://w3id.org/workflowhub/workflow-ro-crate/1.0` redirects to `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html` (a HTML page), then +try retrieving `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json`. +5. If the returned document (possibly extracted from an archive) is valid JSON and have a [root data entity](root-data-entity.md#finding-the-root-data-entity), this is the RO-Crate Metadata File. + +{.tip } +Some PID providers such as DataCite may respond to content-negotiation and provide their own JSON-LD, which do not describe an RO-Crate (the `profile=` was ignored). The use of Signposting allows the repository to explicitly provide the RO-Crate. 
+ {% include references.liquid %} From 0969f4adc1e446a657a0a87bb1cf5757041c0a9f Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Fri, 26 Apr 2024 00:29:57 +0100 Subject: [PATCH 04/18] Reference generalized "How to retrieve a Profile Crate" --- docs/1.2-DRAFT/profiles.md | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/docs/1.2-DRAFT/profiles.md b/docs/1.2-DRAFT/profiles.md index a43e150a..adb21939 100644 --- a/docs/1.2-DRAFT/profiles.md +++ b/docs/1.2-DRAFT/profiles.md @@ -118,21 +118,10 @@ The rest of the requirements for being referenced as a contextual entity also ap ### How to retrieve a Profile Crate -To resolve a Profile URI to a machine-readable _Profile Crate_, two approaches are recommended to retrieve its [RO-Crate Metadata Document](root-data-entity.md#ro-crate-metadata-file-descriptor): +To resolve a Profile URI to a machine-readable _Profile Crate_, follow the approaches of [retrieving an RO-Crate](data-entities.md#retrieving-an-ro-crate). -1. [HTTP Content-negotiation] for the [RO-Crate media type](appendix/jsonld.md#ro-crate-json-ld-media-type), for example: +If none of these approaches worked, then this profile probably does not have a corresponding Profile Crate. For human display of conformed profiles, display a hyperlink to its `@id` Web page, described by its `name`. -Requesting `https://w3id.org/workflowhub/workflow-ro-crate/1.0` with HTTP header - - `Accept: application/ld+json;profile=https://w3id.org/ro/crate` redirects to the _RO-Crate Metadata file_ - `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json` - -2. The above approach may fail (or returns a HTML page), e.g. for content-delivery networks that do not support content-negotiation. The fallback is to try resolving the path `./ro-crate-metadata.json` from the _resolved_ URI (after permalink redirects). 
For example: -If permalink `https://w3id.org/workflowhub/workflow-ro-crate/1.0` redirects to `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html` (a HTML page), then -try retrieving `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json` -3. If none of these approaches worked, then this profile probably does not have a corresponding Profile Crate. For humans, display a hyperlink to its `@id` described by its `name`. - - ### What is included in the Profile Crate? From 1744a20fe1092aa16bb3345a7acad87a32357d88 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Fri, 26 Apr 2024 00:30:55 +0100 Subject: [PATCH 05/18] reference relocated how-to-retrieve section --- docs/1.2-DRAFT/root-data-entity.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/1.2-DRAFT/root-data-entity.md b/docs/1.2-DRAFT/root-data-entity.md index 90089317..3d7736b2 100644 --- a/docs/1.2-DRAFT/root-data-entity.md +++ b/docs/1.2-DRAFT/root-data-entity.md @@ -205,7 +205,7 @@ RO-Crates that have been assigned a _persistent identifier_ (e.g. a DOI) SHOULD #### Resolvable persistent identifiers and citation text -It is RECOMMENDED that resolving the `identifier` programmatically return the _RO-Crate Metadata Document_ or an archive (e.g. ZIP) that contain the _RO-Crate Metadata File_, using [content negotiation](profiles.md#how-to-retrieve-a-profile-crate) and/or [Signposting]. With an RO-Crate identifier that is persistant and resolvable in this way from a URI, the root data entity SHOULD indicate this using the `cite-as` property according to [RFC8574]. Likewise, an HTTP/HTTPS server of the resolved RO-Crate Metadata Document or archive (possibly after redirection) SHOULD indicate that persistent identifier in its [Signposting] headers using `Link rel="cite-as"`. +It is RECOMMENDED that resolving the `identifier` programmatically return the _RO-Crate Metadata Document_ or an archive (e.g. 
ZIP) that contain the _RO-Crate Metadata File_, using [content negotiation](data-entities.md#how-to-retrieve-a-profile-crate) and/or [Signposting]. With an RO-Crate identifier that is persistant and resolvable in this way from a URI, the root data entity SHOULD indicate this using the `cite-as` property according to [RFC8574]. Likewise, an HTTP/HTTPS server of the resolved RO-Crate Metadata Document or archive (possibly after redirection) SHOULD indicate that persistent identifier in its [Signposting] headers using `Link rel="cite-as"`. {: .tip} > The above `cite-as` MAY go to a repository landing page, and MAY require authentication, but MUST ultimately have the RO-Crate as a downloadable item, which SHOULD be programmatically accessible through content negotiation or [Signposting] (`Link rel="describedby"` for a _RO-Crate Metadata Document_, or `Link rel="item"` for an archive). To rather associate a textual scholarly citation for a crate (e.g. journal article), indicate instead a [publication via `citation` property](contextual-entities.md#publications-via-citation-property). From 394aeb7a9b84979fc897af0e8b78f7b768353635 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Fri, 26 Apr 2024 00:39:55 +0100 Subject: [PATCH 06/18] allow nested ro-crate-metadata.json --- docs/1.2-DRAFT/data-entities.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/1.2-DRAFT/data-entities.md b/docs/1.2-DRAFT/data-entities.md index 46547dac..c8317c79 100644 --- a/docs/1.2-DRAFT/data-entities.md +++ b/docs/1.2-DRAFT/data-entities.md @@ -501,11 +501,12 @@ Requesting `https://w3id.org/workflowhub/workflow-ro-crate/1.0` with HTTP header `Accept: application/ld+json;profile=https://w3id.org/ro/crate` redirects to the _RO-Crate Metadata file_ `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json` -3. The above approach may fail or returns a HTML page, e.g. for content-delivery networks that do not support content-negotiation. +3. 
The above approaches may fail or returns a HTML page, e.g. for content-delivery networks that do not support content-negotiation. 4. An optional heuristic fallback is to try resolving the path `./ro-crate-metadata.json` from the _resolved_ URI (after permalink redirects). For example: If permalink `https://w3id.org/workflowhub/workflow-ro-crate/1.0` redirects to `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html` (a HTML page), then try retrieving `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json`. -5. If the returned document (possibly extracted from an archive) is valid JSON and have a [root data entity](root-data-entity.md#finding-the-root-data-entity), this is the RO-Crate Metadata File. +5. If the retrieved resource is a ZIP file (`Content-Type: application/zip`), then extract `ro-crate-metadata.json`, or, if the archive root only contains a single folder (e.g. `folder1/`), extract `folder1/ro-crate-metadata.json` +6. If the returned/extracted document is valid JSON and have a [root data entity](root-data-entity.md#finding-the-root-data-entity), this is the RO-Crate Metadata File. {.tip } Some PID providers such as DataCite may respond to content-negotiation and provide their own JSON-LD, which do not describe an RO-Crate (the `profile=` was ignored). The use of Signposting allows the repository to explicitly provide the RO-Crate. From 783a4d8a0f9f5a403d5b40101bde8068c4f56671 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Fri, 26 Apr 2024 00:43:39 +0100 Subject: [PATCH 07/18] also BagIt! 
--- docs/1.2-DRAFT/data-entities.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/1.2-DRAFT/data-entities.md b/docs/1.2-DRAFT/data-entities.md index c8317c79..b9638721 100644 --- a/docs/1.2-DRAFT/data-entities.md +++ b/docs/1.2-DRAFT/data-entities.md @@ -506,7 +506,8 @@ Requesting `https://w3id.org/workflowhub/workflow-ro-crate/1.0` with HTTP header If permalink `https://w3id.org/workflowhub/workflow-ro-crate/1.0` redirects to `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html` (a HTML page), then try retrieving `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json`. 5. If the retrieved resource is a ZIP file (`Content-Type: application/zip`), then extract `ro-crate-metadata.json`, or, if the archive root only contains a single folder (e.g. `folder1/`), extract `folder1/ro-crate-metadata.json` -6. If the returned/extracted document is valid JSON and have a [root data entity](root-data-entity.md#finding-the-root-data-entity), this is the RO-Crate Metadata File. +6. If the retrieved resource is a [BagIt archive](appendix/implementation-notes.md#combining-with-other-packaging-schemes), e.g. containing a single folder `folder1` with `folder1/bagit.txt`, then extract and verify BagIt checksums before returning the bag's `data/ro-crate-metadata.json` +7. If the returned/extracted document is valid JSON and have a [root data entity](root-data-entity.md#finding-the-root-data-entity), this is the RO-Crate Metadata File. {.tip } Some PID providers such as DataCite may respond to content-negotiation and provide their own JSON-LD, which do not describe an RO-Crate (the `profile=` was ignored). The use of Signposting allows the repository to explicitly provide the RO-Crate. 
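The archive step of the retrieval algorithm (metadata file at the archive root, or inside a single top-level folder) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a reference implementation; the helper names `find_metadata_member` and `load_metadata_from_zip` are hypothetical, and the BagIt case (checksum verification and `data/ro-crate-metadata.json`) is deliberately left out.

```python
import io
import json
import zipfile
from typing import Optional

METADATA_NAME = "ro-crate-metadata.json"


def find_metadata_member(names: list[str]) -> Optional[str]:
    """Return the archive member holding the metadata document, or None.

    Hypothetical helper: checks the archive root first, then a single
    top-level folder, mirroring the ZIP step described in the patch.
    """
    if METADATA_NAME in names:
        return METADATA_NAME
    # Collect the distinct top-level entries of the archive.
    top_level = {name.split("/", 1)[0] for name in names if name}
    if len(top_level) == 1:
        folder = next(iter(top_level))
        candidate = f"{folder}/{METADATA_NAME}"
        if candidate in names:
            return candidate
    return None


def load_metadata_from_zip(data: bytes) -> Optional[dict]:
    """Parse the metadata document from ZIP bytes, or return None if absent."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        member = find_metadata_member(archive.namelist())
        if member is None:
            return None
        return json.loads(archive.read(member))


def demo_archive() -> bytes:
    """Build a small in-memory ZIP with the crate nested in `folder1/`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(f"folder1/{METADATA_NAME}", json.dumps({"@graph": []}))
        archive.writestr("folder1/data.csv", "a,b\n1,2\n")
    return buffer.getvalue()
```

A consumer would still need to apply the final step of the algorithm (checking that the parsed JSON has a root data entity) before treating the result as an RO-Crate Metadata File.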
From 445344be09e1978845761cf6923dd881eaf18798 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 11 Jul 2024 08:42:26 +0100 Subject: [PATCH 08/18] clarify where the metadata goes --- docs/_specification/1.2-DRAFT/data-entities.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 05c3350b..61aa3e6b 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -400,7 +400,7 @@ A _Directory File Entry_ or [Dataset] identifier expressed as an absolute URL on #### Referencing other RO-Crates -A referenced RO-Crate is also a [Dataset], but where its [hasPart] do not need to be listed. Instead, its content and further metadata is available from its own RO-Crate Metadata File: +A referenced RO-Crate is also a [Dataset], but where its [hasPart] do not need to be listed. Instead, its content and further metadata is available from its own RO-Crate Metadata File. An RO-Crate that is referencing another crate `http://example.com/another-crate/` and metadata file `http://example.com/another-crate/ro-crate-metadata.json` will declare it as: ```json { @@ -438,6 +438,8 @@ If the referenced crate conforms to a given [RO-Crate profile](profiles), this M } ``` +{.note} +> The profile declaration of a referenced crate is a hint. Consumers should check `conformsTo` of the retrieved RO-Crate as it may have been updated after this RO-Crate. 
#### Downloadable dataset From e79b073be42411a92310647a862b3ef76328648a Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 11 Jul 2024 08:46:39 +0100 Subject: [PATCH 09/18] about ZIP file --- docs/_specification/1.2-DRAFT/data-entities.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 61aa3e6b..38a0484d 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -435,12 +435,18 @@ If the referenced crate conforms to a given [RO-Crate profile](profiles), this M "@type": ["CreativeWork", "Profile"], "name": "Workflow RO-Crate Profile", "version": "1.0" +}, +{ + "@id": "https://workflowhub.eu/ga4gh/trs/v2/tools/26/versions/1/PLAIN_CWL/descriptor/ro-crate-metadata.json", + "@type": "CreativeWork", + "encodingFormat": "application/ld+json" } ``` {.note} > The profile declaration of a referenced crate is a hint. Consumers should check `conformsTo` of the retrieved RO-Crate as it may have been updated after this RO-Crate. +If the RO-Crate metadata file is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a _Downloadable dataset_ (see below). #### Downloadable dataset @@ -481,7 +487,7 @@ Similarly, the _RO-Crate root_ entity (or a reference to another RO-Crate as a ` "conformsTo": { "@id": "https://w3id.org/ro/crate" } } ``` -profile="https://w3id.org/ro/crate" + In all cases, consumers should be aware that a `DataDownload` is a snapshot that may not reflect the current state of the `Dataset` or RO-Crate. 
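As an illustration of how a consumer might recognise references to other RO-Crates in a `@graph`, the sketch below looks for `Dataset` entities whose `conformsTo` includes the version-less `https://w3id.org/ro/crate`, whether `conformsTo` is a single object or an array. The function names and the shape of the returned dictionaries are assumptions made for this example, not part of the specification.

```python
from typing import Any

RO_CRATE_PROFILE = "https://w3id.org/ro/crate"


def as_list(value: Any) -> list:
    """Normalise a JSON-LD value that may be absent, single, or an array."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def reference_ids(entity: dict, prop: str) -> list[str]:
    """Collect the `@id` of every object reference held in `prop`."""
    return [
        ref["@id"]
        for ref in as_list(entity.get(prop))
        if isinstance(ref, dict) and "@id" in ref
    ]


def referenced_crates(graph: list[dict]) -> list[dict]:
    """List Dataset entities that declare the generic RO-Crate profile."""
    result = []
    for entity in graph:
        profiles = reference_ids(entity, "conformsTo")
        if "Dataset" in as_list(entity.get("@type")) and RO_CRATE_PROFILE in profiles:
            result.append({
                "id": entity["@id"],
                # Additional profiles are only a hint, as the note above says.
                "profiles": [p for p in profiles if p != RO_CRATE_PROFILE],
                # May be empty; then the identifier has to be resolved instead.
                "metadata": reference_ids(entity, "subjectOf"),
            })
    return result
```

Entities typed `DataDownload` are intentionally not matched here; handling a `distribution` archive would be a separate step.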
From c08ee14bb94cf315013ba19a134f733ca5d8050f Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 11 Jul 2024 08:54:27 +0100 Subject: [PATCH 10/18] also about @id and identifier --- docs/_specification/1.2-DRAFT/data-entities.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 38a0484d..54d04954 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -446,7 +446,10 @@ If the referenced crate conforms to a given [RO-Crate profile](profiles), this M {.note} > The profile declaration of a referenced crate is a hint. Consumers should check `conformsTo` of the retrieved RO-Crate as it may have been updated after this RO-Crate. -If the RO-Crate metadata file is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a _Downloadable dataset_ (see below). +{.tip} +> The `@id` of the referenced RO-Crate entity SHOULD be equal to the persistent identifier within its own metadata file (as `identifier` on its root entity), see [Root Data Entity identifier](root-data-entity#root-data-entity-identifier). See [Retrieving an RO-Crate](#retrieving-an-ro-crate) for how to retrieve from a persistent identifier if the `subjectOf` RO-Crate Metadata file is not retrievable. + +If the RO-Crate metadata file is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a [Downloadable dataset](#downloadable-dataset). 
#### Downloadable dataset From 9d9c6414712c7ff288a890085c8374dbcd635e12 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 11 Jul 2024 16:42:47 +0100 Subject: [PATCH 11/18] fix internal anchor links --- docs/_specification/1.2-DRAFT/data-entities.md | 2 +- docs/_specification/1.2-DRAFT/root-data-entity.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 54d04954..1523d275 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -496,7 +496,7 @@ In all cases, consumers should be aware that a `DataDownload` is a snapshot that #### Retrieving an RO-Crate -To resolve a reference to an RO-Crate, but where `subjectOf` or `distribution` is unknown (e.g. an RO-Crate is cited from a journal article), the below approach is recommended to retrieve its [RO-Crate Metadata Document](root-data-entity#ro-crate-metadata-file-descriptor): +To resolve a reference to an RO-Crate, but where `subjectOf` or `distribution` is unknown (e.g. an RO-Crate is cited from a journal article), the below approach is recommended to retrieve its [RO-Crate Metadata Document](structure#ro-crate-metadata-document-ro-crate-metadatajson): 1. Try [Signposting] after permalink redirects, looking for `Link` headers that reference `Link rel="describedby` for a _RO-Crate Metadata Document_, or `Link rel="item"` for a distribution archive -- in either case looking for a link with `profile="https://w3id.org/ro/crate"` for example: diff --git a/docs/_specification/1.2-DRAFT/root-data-entity.md b/docs/_specification/1.2-DRAFT/root-data-entity.md index fb5a79b5..7c491b67 100644 --- a/docs/_specification/1.2-DRAFT/root-data-entity.md +++ b/docs/_specification/1.2-DRAFT/root-data-entity.md @@ -207,7 +207,7 @@ RO-Crates that have been assigned a _persistent identifier_ (e.g. 
a DOI) SHOULD #### Resolvable persistent identifiers and citation text -It is RECOMMENDED that resolving the `identifier` programmatically return the _RO-Crate Metadata Document_ or an archive (e.g. ZIP) that contain the _RO-Crate Metadata File_, using [content negotiation](data-entities#how-to-retrieve-a-profile-crate) and/or [Signposting]. With an RO-Crate identifier that is persistant and resolvable in this way from a URI, the root data entity SHOULD indicate this using the `cite-as` property according to [RFC8574]. Likewise, an HTTP/HTTPS server of the resolved RO-Crate Metadata Document or archive (possibly after redirection) SHOULD indicate that persistent identifier in its [Signposting] headers using `Link rel="cite-as"`. +It is RECOMMENDED that resolving the `identifier` programmatically return the _RO-Crate Metadata Document_ or an archive (e.g. ZIP) that contains the _RO-Crate Metadata File_, using [content negotiation](data-entities#retrieving-an-ro-crate) and/or [Signposting]. With an RO-Crate identifier that is persistent and resolvable in this way from a URI, the root data entity SHOULD indicate this using the `cite-as` property according to [RFC8574]. Likewise, an HTTP/HTTPS server of the resolved RO-Crate Metadata Document or archive (possibly after redirection) SHOULD indicate that persistent identifier in its [Signposting] headers using `Link rel="cite-as"`. {: .tip} > The above `cite-as` MAY go to a repository landing page, and MAY require authentication, but MUST ultimately have the RO-Crate as a downloadable item, which SHOULD be programmatically accessible through content negotiation or [Signposting] (`Link rel="describedby"` for a _RO-Crate Metadata Document_, or `Link rel="item"` for an archive). To rather associate a textual scholarly citation for a crate (e.g. journal article), indicate instead a [publication via `citation` property](contextual-entities#publications-via-citation-property). 
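To make the Signposting lookup concrete, here is a rough sketch of selecting a `Link` header entry with `rel="describedby"` or `rel="item"` and `profile="https://w3id.org/ro/crate"`. It uses a simplified regular-expression parser rather than a complete RFC 8288 implementation, the function names are hypothetical, and the `https://example.org/...` targets in the usage note are placeholders rather than real endpoints.

```python
import re
from typing import Optional

RO_CRATE_PROFILE = "https://w3id.org/ro/crate"

# One link-value: <target> followed by zero or more ;name=value parameters.
# Simplified from RFC 8288; escaped quotes inside values are not handled.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[^;,="\s]+\s*=\s*(?:"[^"]*"|[^;,\s]*))*)'
)
_PARAM_RE = re.compile(r';\s*([^;,="\s]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))')


def parse_link_header(value: str) -> list[dict]:
    """Split a Link header value into dicts of parameters plus `target`."""
    links = []
    for target, raw_params in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append({**params, "target": target})
    return links


def find_ro_crate_link(value: str) -> Optional[dict]:
    """Return the first describedby/item link that carries the RO-Crate profile."""
    for link in parse_link_header(value):
        rels = link.get("rel", "").split()
        profiles = link.get("profile", "").split()
        if RO_CRATE_PROFILE in profiles and ("describedby" in rels or "item" in rels):
            return link
    return None
```

For a header such as `<https://example.org/crate.zip> ; rel="item" ; type="application/zip" ; profile="https://w3id.org/ro/crate"`, `find_ro_crate_link` returns a dictionary whose `target` is the archive URL; the `type` parameter can then guide whether to parse JSON-LD directly or unpack an archive.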
From 0af8e563a69d2ab5d0cf1b8a69ad37c6f0214de9 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 22 Aug 2024 21:26:34 +0100 Subject: [PATCH 12/18] Update data-entities.md --- docs/_specification/1.2-DRAFT/data-entities.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 1523d275..8f30e0bd 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -400,7 +400,7 @@ A _Directory File Entry_ or [Dataset] identifier expressed as an absolute URL on #### Referencing other RO-Crates -A referenced RO-Crate is also a [Dataset], but where its [hasPart] do not need to be listed. Instead, its content and further metadata is available from its own RO-Crate Metadata File. An RO-Crate that is referencing another crate `http://example.com/another-crate/` and metadata file `http://example.com/another-crate/ro-crate-metadata.json` will declare it as: +A referenced RO-Crate is also a [Dataset], but where its [hasPart] do not need to be listed. Instead, its content and further metadata is available from its own RO-Crate Metadata Document. An RO-Crate that is referencing another crate `http://example.com/another-crate/` and metadata document `http://example.com/another-crate/ro-crate-metadata.json` will declare it as: ```json { @@ -447,9 +447,9 @@ If the referenced crate conforms to a given [RO-Crate profile](profiles), this M > The profile declaration of a referenced crate is a hint. Consumers should check `conformsTo` of the retrieved RO-Crate as it may have been updated after this RO-Crate. {.tip} -> The `@id` of the referenced RO-Crate entity SHOULD be equal to the persistent identifier within its own metadata file (as `identifier` on its root entity), see [Root Data Entity identifier](root-data-entity#root-data-entity-identifier). 
See [Retrieving an RO-Crate](#retrieving-an-ro-crate) for how to retrieve from a persistent identifier if the `subjectOf` RO-Crate Metadata file is not retrievable. +> The `@id` of the referenced RO-Crate entity SHOULD be equal to the persistent identifier within its own metadata file (as `identifier` on its root entity), see [Root Data Entity identifier](root-data-entity#root-data-entity-identifier). See [Retrieving an RO-Crate](#retrieving-an-ro-crate) for how to retrieve from a persistent identifier if the `subjectOf` RO-Crate Metadata Document is not retrievable. -If the RO-Crate metadata file is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a [Downloadable dataset](#downloadable-dataset). +If the RO-Crate Metadata Document is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a [Downloadable dataset](#downloadable-dataset). #### Downloadable dataset From 2574205f8bec60471ec0f8684a4faa04b6e4a2a6 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 22 Aug 2024 23:37:41 +0100 Subject: [PATCH 13/18] More algorithm! --- .../_specification/1.2-DRAFT/data-entities.md | 40 ++++++++++++++++--- 1 file changed, 34 insertions(+), 6 deletions(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 8f30e0bd..10a8772f 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -400,12 +400,41 @@ A _Directory File Entry_ or [Dataset] identifier expressed as an absolute URL on #### Referencing other RO-Crates -A referenced RO-Crate is also a [Dataset], but where its [hasPart] do not need to be listed. Instead, its content and further metadata is available from its own RO-Crate Metadata Document. 
An RO-Crate that is referencing another crate `http://example.com/another-crate/` and metadata document `http://example.com/another-crate/ro-crate-metadata.json` will declare it as: +A referenced RO-Crate is also a [Dataset] data entity, but where its [hasPart] do not need to be listed. Instead, its content and further metadata is available from its own RO-Crate Metadata Document, which may be retrieved or packaged within an archive. The referenced RO-Crate entity SHOULD have `conformsTo` pointing to the generic RO-Crate profile using the fixed URI `https://w3id.org/ro/crate`. + +This section defines how a _referencing_ RO-Crate ("A") can declare data entities within A's RO-Crate Metadata Document, in order to indicate a _referenced_ RO-Crate ("B"). There are different options on how to find the identifier to assign the referenced crate in A, and how a consumer of A finding such a reference can find the corresponding RO-Crate Metadata Document for B. + +If the referenced RO-Crate B has a `identifier` declared as B's [Root Data Entity identifier](root-data-entity#root-data-entity-identifier), then this is a _persistent identifier_ which SHOULD be used as the URI in the `@id` of the corresponding entity in RO-Crate A. For instance, if crate B had declared the identifier `http://example.com/another-crate/` then crate A can reference B as an entity: ```json { "@id": "http://example.com/another-crate/", "@type": "Dataset", + "conformsTo": { "@id": "https://w3id.org/ro/crate" } +} +``` + +{.tip } +> The `conformsTo` generic RO-Crate profile on a `Dataset` entity MUST be version-less. The referenced crate B is NOT required to conform to the same version of the RO-Crate specification as A's RO-Crate Metadata Document. 
+ +{.warning } +> It is NOT RECOMMENDED to declare the generic profile `https://w3id.org/ro/crate` on a referencing crate A's own [root data entity](root-data-entity.html#direct-properties-of-the-root-data-entity), see [metadata descriptor](root-data-entity.html#ro-crate-metadata-descriptor). + +Consumers that find a reference to a `Dataset` with the generic RO-Crate profile indicated MAY attempt to resolve the persistent identifier, but SHOULD NOT assume that the `@id` directly resolves to an RO-Crate Metadata Document. See section [Retrieving an RO-Crate](#retrieving-an-ro-crate) below for the recommended algorithm. + +In some cases, if the referenced RO-Crate B has not got a resolvable `identifier` declared, additional steps are needed to find the correct crate URI: + +1. If RO-Crate A is an [attached](structure.html#attached-ro-crate) crate, and RO-Crate B is a nested folder (e.g. `another-crate/`), then B SHOULD be treated as an attached crate (e.g. it has `another-crate/ro-crate-metadata.json`) and the relative path (`another-crate/`) used directly as `@id` as a [Directory File Entity](#directory-file-entity) within crate A, adding the `conformsTo` as above. +2. If B's root data entity has an `@id` that is an absolute URI indicating a [detached crate](structure.html#detached-ro-crate)), and that URI resolves according to [Retrieving an RO-Crate](#retrieving-an-ro-crate), then that can be used as the `@id` of the `Dataset` entity in A, equivalent to the `identifier` case above. However, as that URI was not declared as a persistent identifier, the timestamp property [sdDatePublished] SHOULD be included to indicate when the absolute URL was accessed. +2. If B's RO-Crate Metadata Document was located on the Web, but uses a relative URI reference for its root data entity (`./`), then its absolute URI can be determined from the [RFC3986] algorithm for [establishing a base URI](https://datatracker.ietf.org/doc/html/rfc3986#section-5). 
For example, if root `{"@id": "./" }` is in metadata document `http://example.com/another-crate/ro-crate-metadata.json`, then the absolute URI for the `Dataset` entity is `http://example.com/another-crate/` (with the trailing `/`). If that URI is resolvable as in point 1, it can be used as equivalent `@id`. It is NOT RECOMMENDED to resolve a relative root identifier if the metadata document was retrieved from a URI that does not end with `/ro-crate-metadata.json` or `/ro-crate-metadata.jsonld` -- these are not part of a valid [attached](structure.html#attached-ro-crate) or [detached](structure.html#detached-ro-crate) RO-Crate. +4. If the RO-Crate is not on the Web, and does not have a persistent identifier, e.g. is within a ZIP file or local file system, then a non-resolvable identifier could be established. See appendix [Establishing a base URI inside a ZIP file](appendix/relative-uris.html#establishing-a-base-uri-inside-a-zip-file), e.g. `arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/` if using a randomly generated UUID. This method may also be used if the above steps fail for an RO-Crate Metadata Document that is on the Web. + +If a RO-Crate Metadata Document is known at a given URI, but its corresponding RO-Crate identifier can't be determined as above (e.g. 
[Retrieving an RO-Crate](#retrieving-an-ro-crate) fails), for instance because , then its RO-Crate Metadata Document SHOULD + +```json +{ + "@id": "arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/", + "@type": "Dataset", "conformsTo": { "@id": "https://w3id.org/ro/crate" }, "subjectOf": { "@id": "http://example.com/another-crate/ro-crate-metadata.json" } }, @@ -415,9 +444,9 @@ A referenced RO-Crate is also a [Dataset], but where its [hasPart] do not need t "encodingFormat": "application/ld+json" } ``` - {.tip } -> The referenced RO-Crate metadata descriptor SHOULD NOT include its own `conformsTo` or reference the dataset with `about`; this is to avoid confusion with the referencing RO-Crate's own [metadata descriptor](root-data-entity#ro-crate-metadata-descriptor). Likewise, the `conformsTo` on the referenced `Dataset` entity is version-less, as the referenced crate is free to self-declare a different version of the RO-Crate specification. +> The referenced RO-Crate metadata descriptor SHOULD NOT include its own `conformsTo` or reference the dataset with `about`; this is to avoid confusion with the referencing RO-Crate's own [metadata descriptor](root-data-entity#ro-crate-metadata-descriptor). + If the referenced crate conforms to a given [RO-Crate profile](profiles), this MAY be indicated by expanding `conformsTo` to an array: @@ -446,10 +475,9 @@ If the referenced crate conforms to a given [RO-Crate profile](profiles), this M {.note} > The profile declaration of a referenced crate is a hint. Consumers should check `conformsTo` of the retrieved RO-Crate as it may have been updated after this RO-Crate. -{.tip} -> The `@id` of the referenced RO-Crate entity SHOULD be equal to the persistent identifier within its own metadata file (as `identifier` on its root entity), see [Root Data Entity identifier](root-data-entity#root-data-entity-identifier). 
See [Retrieving an RO-Crate](#retrieving-an-ro-crate) for how to retrieve from a persistent identifier if the `subjectOf` RO-Crate Metadata Document is not retrievable. -If the RO-Crate Mtadata Document is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a [Downloadable dataset](#downloadable-dataset). + +If the RO-Crate Metadata Document is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a [Downloadable dataset](#downloadable-dataset). #### Downloadable dataset From 782eb729994f95ef4d1fc43cc09ad5390cf136ad Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Fri, 23 Aug 2024 00:18:15 +0100 Subject: [PATCH 14/18] Way too many details --- .../_specification/1.2-DRAFT/data-entities.md | 51 +++++++++++-------- 1 file changed, 29 insertions(+), 22 deletions(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 10a8772f..44211329 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -404,11 +404,13 @@ A referenced RO-Crate is also a [Dataset] data entity, but where its [hasPart] d This section defines how a _referencing_ RO-Crate ("A") can declare data entities within A's RO-Crate Metadata Document, in order to indicate a _referenced_ RO-Crate ("B"). There are different options on how to find the identifier to assign the referenced crate in A, and how a consumer of A finding such a reference can find the corresponding RO-Crate Metadata Document for B. -If the referenced RO-Crate B has a `identifier` declared as B's [Root Data Entity identifier](root-data-entity#root-data-entity-identifier), then this is a _persistent identifier_ which SHOULD be used as the URI in the `@id` of the corresponding entity in RO-Crate A. 
For instance, if crate B had declared the identifier `http://example.com/another-crate/` then crate A can reference B as an entity: +##### Referencing RO-Crates that have a persistent identifier + +If the referenced RO-Crate B has a `identifier` declared as B's [Root Data Entity identifier](root-data-entity#root-data-entity-identifier), then this is a _persistent identifier_ which SHOULD be used as the URI in the `@id` of the corresponding entity in RO-Crate A. For instance, if crate B had declared the identifier `https://pid.example.com/another-crate/` then crate A can reference B as an entity: ```json { - "@id": "http://example.com/another-crate/", + "@id": "https://pid.example.com/another-crate/", "@type": "Dataset", "conformsTo": { "@id": "https://w3id.org/ro/crate" } } @@ -422,18 +424,27 @@ If the referenced RO-Crate B has a `identifier` declared as B's [Root Data Entit Consumers that find a reference to a `Dataset` with the generic RO-Crate profile indicated MAY attempt to resolve the persistent identifier, but SHOULD NOT assume that the `@id` directly resolves to an RO-Crate Metadata Document. See section [Retrieving an RO-Crate](#retrieving-an-ro-crate) below for the recommended algorithm. -In some cases, if the referenced RO-Crate B has not got a resolvable `identifier` declared, additional steps are needed to find the correct crate URI: +If an `identifier` is not declared in a referenced RO-Crate B, but the determined absolute URI has [Signposting] declared for a `Link:` with `rel=cite-as`, then that link MAY be considered as an equivalent permalink for B. + + +##### Determening entity identifier for a referenced RO-Crate + +In some cases, if the referenced RO-Crate B has not got a resolvable `identifier` declared, additional steps are needed to find the correct `@id` to use: -1. If RO-Crate A is an [attached](structure.html#attached-ro-crate) crate, and RO-Crate B is a nested folder (e.g. 
`another-crate/`), then B SHOULD be treated as an attached crate (e.g. it has `another-crate/ro-crate-metadata.json`) and the relative path (`another-crate/`) used directly as `@id` as a [Directory File Entity](#directory-file-entity) within crate A, adding the `conformsTo` as above. +1. If RO-Crate A is an [attached](structure.html#attached-ro-crate) crate, and RO-Crate B is a nested folder (e.g. `another-crate/`), then B SHOULD be treated as an attached crate (e.g. it has `another-crate/ro-crate-metadata.json`) and the relative path (`another-crate/`) used directly as `@id` as a [Directory File Entity](#directory-file-entity) within crate A. 2. If B's root data entity has an `@id` that is an absolute URI indicating a [detached crate](structure.html#detached-ro-crate)), and that URI resolves according to [Retrieving an RO-Crate](#retrieving-an-ro-crate), then that can be used as the `@id` of the `Dataset` entity in A, equivalent to the `identifier` case above. However, as that URI was not declared as a persistent identifier, the timestamp property [sdDatePublished] SHOULD be included to indicate when the absolute URL was accessed. 2. If B's RO-Crate Metadata Document was located on the Web, but uses a relative URI reference for its root data entity (`./`), then its absolute URI can be determined from the [RFC3986] algorithm for [establishing a base URI](https://datatracker.ietf.org/doc/html/rfc3986#section-5). For example, if root `{"@id": "./" }` is in metadata document `http://example.com/another-crate/ro-crate-metadata.json`, then the absolute URI for the `Dataset` entity is `http://example.com/another-crate/` (with the trailing `/`). If that URI is resolvable as in point 1, it can be used as equivalent `@id`. 
It is NOT RECOMMENDED to resolve a relative root identifier if the metadata document was retrieved from a URI that does not end with `/ro-crate-metadata.json` or `/ro-crate-metadata.jsonld` -- these are not part of a valid [attached](structure.html#attached-ro-crate) or [detached](structure.html#detached-ro-crate) RO-Crate. -4. If the RO-Crate is not on the Web, and does not have a persistent identifier, e.g. is within a ZIP file or local file system, then a non-resolvable identifier could be established. See appendix [Establishing a base URI inside a ZIP file](appendix/relative-uris.html#establishing-a-base-uri-inside-a-zip-file), e.g. `arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/` if using a randomly generated UUID. This method may also be used if the above steps fail for an RO-Crate Metadata Document that is on the Web. +4. If the RO-Crate is not on the Web, and does not have a persistent identifier, e.g. is within a ZIP file or local file system, then a non-resolvable identifier could be established. See appendix [Establishing a base URI inside a ZIP file](appendix/relative-uris.html#establishing-a-base-uri-inside-a-zip-file), e.g. `arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/` if using a randomly generated UUID. This method may also be used if the above steps fail for an RO-Crate Metadata Document that is on the Web. In this case, the referenced RO-Crate entity MUST either declare a [referenced metadata document](#referencing-another-metadata-document) or [distribution](downloadable-dataset). -If a RO-Crate Metadata Document is known at a given URI, but its corresponding RO-Crate identifier can't be determined as above (e.g. [Retrieving an RO-Crate](#retrieving-an-ro-crate) fails), for instance because , then its RO-Crate Metadata Document SHOULD +If the RO-Crate Metadata Document is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a [Downloadable dataset](#downloadable-dataset). 
+ +##### Referencing another metadata document + +If a referenced RO-Crate Metadata Document is known at a given URI or path, but its corresponding RO-Crate identifier can't be determined as above (e.g. [Retrieving an RO-Crate](#retrieving-an-ro-crate) fails or requires heuristics), then an referenced metadata descriptor entity SHOULD be added. For instance, if `http://example.com/another-crate/ro-crate-metadata.json` resolves to an RO-Crate Metadata Document describing root `./`, but `http://example.com/another-crate/` always return a HTML page without [Signposting] to the metadata document, then `subjectOf` SHOULD be added to an explicit metadata descriptor entity, which has `encodingFormat` declared for JSON-LD: ```json { - "@id": "arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/", + "@id": "http://example.com/another-crate/", "@type": "Dataset", "conformsTo": { "@id": "https://w3id.org/ro/crate" }, "subjectOf": { "@id": "http://example.com/another-crate/ro-crate-metadata.json" } @@ -441,14 +452,18 @@ If a RO-Crate Metadata Document is known at a given URI, but its corresponding R { "@id": "http://example.com/another-crate/ro-crate-metadata.json", "@type": "CreativeWork", - "encodingFormat": "application/ld+json" + "encodingFormat": "application/ld+json", + "sdDatePublished": "2024-08-22T23:57:03+01:00" } ``` + {.tip } -> The referenced RO-Crate metadata descriptor SHOULD NOT include its own `conformsTo` or reference the dataset with `about`; this is to avoid confusion with the referencing RO-Crate's own [metadata descriptor](root-data-entity#ro-crate-metadata-descriptor). +> Counter to [file format profile](data-entities.html#file-format-profiles) recommendations, the referenced RO-Crate metadata descriptor SHOULD NOT include its own `conformsTo` declarations to `https://w3id.org/ro/crate` or reference the dataset with `about`; this is to avoid confusion with the referencing RO-Crate's own [metadata descriptor](root-data-entity#ro-crate-metadata-descriptor). 
+ +##### Profiles of referenced crates -If the referenced crate conforms to a given [RO-Crate profile](profiles), this MAY be indicated by expanding `conformsTo` to an array: +If the referenced crate conforms to a given [RO-Crate profile](profiles), this MAY be indicated by expanding `conformsTo` on the `Dataset` to an array to reference the profile as a contextual entity: ```json { @@ -457,28 +472,20 @@ If the referenced crate conforms to a given [RO-Crate profile](profiles), this M "conformsTo": [ { "@id": "https://w3id.org/ro/crate" }, { "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"} - ], - "subjectOf": { "@id": "https://workflowhub.eu/ga4gh/trs/v2/tools/26/versions/1/PLAIN_CWL/descriptor/ro-crate-metadata.json" } + ] }, { "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0", "@type": ["CreativeWork", "Profile"], "name": "Workflow RO-Crate Profile", "version": "1.0" -}, -{ - "@id": "https://workflowhub.eu/ga4gh/trs/v2/tools/26/versions/1/PLAIN_CWL/descriptor/ro-crate-metadata.json", - "@type": "CreativeWork", - "encodingFormat": "application/ld+json" } ``` {.note} -> The profile declaration of a referenced crate is a hint. Consumers should check `conformsTo` of the retrieved RO-Crate as it may have been updated after this RO-Crate. +> The profile declaration of a referenced crate is a hint. Consumers should check `conformsTo` as declared in the retrieved RO-Crate, as it may have been updated after this RO-Crate. -If the RO-Crate Metadata Document is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a [Downloadable dataset](#downloadable-dataset). - #### Downloadable dataset @@ -526,7 +533,7 @@ In all cases, consumers should be aware that a `DataDownload` is a snapshot that To resolve a reference to an RO-Crate, but where `subjectOf` or `distribution` is unknown (e.g. 
an RO-Crate is cited from a journal article), the below approach is recommended to retrieve its [RO-Crate Metadata Document](structure#ro-crate-metadata-document-ro-crate-metadatajson): -1. Try [Signposting] after permalink redirects, looking for `Link` headers that reference `Link rel="describedby` for a _RO-Crate Metadata Document_, or `Link rel="item"` for a distribution archive -- in either case looking for a link with `profile="https://w3id.org/ro/crate"` for example: +1. Assuming the URI is a permalink, after following HTTP redirects without content negotiation, try [Signposting] to look for `Link` headers that reference `Link rel="describedby"` for a _RO-Crate Metadata Document_, or `Link rel="item"` for a distribution archive -- in either case prefer a link with `profile="https://w3id.org/ro/crate"` declared. For example, signposting for `https://doi.org/10.48546/workflowhub.workflow.120.5` leads to the archive `https://workflowhub.eu/workflows/120/ro_crate?version=5` as: ``` curl --location --head https://doi.org/10.48546/workflowhub.workflow.120.5 @@ -547,7 +554,7 @@ Requesting `https://w3id.org/workflowhub/workflow-ro-crate/1.0` with HTTP header `Accept: application/ld+json;profile=https://w3id.org/ro/crate` redirects to the _RO-Crate Metadata file_ `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json` -3. The above approaches may fail or returns a HTML page, e.g. for content-delivery networks that do not support content-negotiation. +3. The above approaches may fail or return a HTML page, e.g. for content-delivery networks that do not support content-negotiation. 4. An optional heuristic fallback is to try resolving the path `./ro-crate-metadata.json` from the _resolved_ URI (after permalink redirects). 
For example: If permalink `https://w3id.org/workflowhub/workflow-ro-crate/1.0` redirects to `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html` (a HTML page), then try retrieving `https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json`. From 3cc0c16a8d580a646284de06cb5cf006a8fb3cf3 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Fri, 23 Aug 2024 00:21:38 +0100 Subject: [PATCH 15/18] #downloadable-dataset --- docs/_specification/1.2-DRAFT/data-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 44211329..318440da 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -434,7 +434,7 @@ In some cases, if the referenced RO-Crate B has not got a resolvable `identifier 1. If RO-Crate A is an [attached](structure.html#attached-ro-crate) crate, and RO-Crate B is a nested folder (e.g. `another-crate/`), then B SHOULD be treated as an attached crate (e.g. it has `another-crate/ro-crate-metadata.json`) and the relative path (`another-crate/`) used directly as `@id` as a [Directory File Entity](#directory-file-entity) within crate A. 2. If B's root data entity has an `@id` that is an absolute URI indicating a [detached crate](structure.html#detached-ro-crate)), and that URI resolves according to [Retrieving an RO-Crate](#retrieving-an-ro-crate), then that can be used as the `@id` of the `Dataset` entity in A, equivalent to the `identifier` case above. However, as that URI was not declared as a persistent identifier, the timestamp property [sdDatePublished] SHOULD be included to indicate when the absolute URL was accessed. 2. 
If B's RO-Crate Metadata Document was located on the Web, but uses a relative URI reference for its root data entity (`./`), then its absolute URI can be determined from the [RFC3986] algorithm for [establishing a base URI](https://datatracker.ietf.org/doc/html/rfc3986#section-5). For example, if root `{"@id": "./" }` is in metadata document `http://example.com/another-crate/ro-crate-metadata.json`, then the absolute URI for the `Dataset` entity is `http://example.com/another-crate/` (with the trailing `/`). If that URI is resolvable as in point 1, it can be used as equivalent `@id`. It is NOT RECOMMENDED to resolve a relative root identifier if the metadata document was retrieved from a URI that does not end with `/ro-crate-metadata.json` or `/ro-crate-metadata.jsonld` -- these are not part of a valid [attached](structure.html#attached-ro-crate) or [detached](structure.html#detached-ro-crate) RO-Crate. -4. If the RO-Crate is not on the Web, and does not have a persistent identifier, e.g. is within a ZIP file or local file system, then a non-resolvable identifier could be established. See appendix [Establishing a base URI inside a ZIP file](appendix/relative-uris.html#establishing-a-base-uri-inside-a-zip-file), e.g. `arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/` if using a randomly generated UUID. This method may also be used if the above steps fail for an RO-Crate Metadata Document that is on the Web. In this case, the referenced RO-Crate entity MUST either declare a [referenced metadata document](#referencing-another-metadata-document) or [distribution](downloadable-dataset). +4. If the RO-Crate is not on the Web, and does not have a persistent identifier, e.g. is within a ZIP file or local file system, then a non-resolvable identifier could be established. See appendix [Establishing a base URI inside a ZIP file](appendix/relative-uris.html#establishing-a-base-uri-inside-a-zip-file), e.g. 
`arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/` if using a randomly generated UUID. This method may also be used if the above steps fail for an RO-Crate Metadata Document that is on the Web. In this case, the referenced RO-Crate entity MUST either declare a [referenced metadata document](#referencing-another-metadata-document) or [distribution](#downloadable-dataset). If the RO-Crate Metadata Document is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a [Downloadable dataset](#downloadable-dataset). From e51b1ea6851cc5e3946c2eab2a35da80f9b39031 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 29 Aug 2024 10:59:02 +0100 Subject: [PATCH 16/18] typo --- docs/_specification/1.2-DRAFT/data-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 318440da..b1cd0cda 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -406,7 +406,7 @@ This section defines how a _referencing_ RO-Crate ("A") can declare data entitie ##### Referencing RO-Crates that have a persistent identifier -If the referenced RO-Crate B has a `identifier` declared as B's [Root Data Entity identifier](root-data-entity#root-data-entity-identifier), then this is a _persistent identifier_ which SHOULD be used as the URI in the `@id` of the corresponding entity in RO-Crate A. For instance, if crate B had declared the identifier `https://pid.example.com/another-crate/` then crate A can reference B as an entity: +If the referenced RO-Crate B has an `identifier` declared as B's [Root Data Entity identifier](root-data-entity#root-data-entity-identifier), then this is a _persistent identifier_ which SHOULD be used as the URI in the `@id` of the corresponding entity in RO-Crate A. 
For instance, if crate B had declared the identifier `https://pid.example.com/another-crate/` then crate A can reference B as an entity: ```json { From ca3c1f5037ea4f287ff750ab9a2cf9b77521d792 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 29 Aug 2024 10:59:58 +0100 Subject: [PATCH 17/18] Typo --- docs/_specification/1.2-DRAFT/data-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index b1cd0cda..9e276f83 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -427,7 +427,7 @@ Consumers that find a reference to a `Dataset` with the generic RO-Crate profile If an `identifier` is not declared in a referenced RO-Crate B, but the determined absolute URI has [Signposting] declared for a `Link:` with `rel=cite-as`, then that link MAY be considered as an equivalent permalink for B. -##### Determening entity identifier for a referenced RO-Crate +##### Determining entity identifier for a referenced RO-Crate In some cases, if the referenced RO-Crate B has not got a resolvable `identifier` declared, additional steps are needed to find the correct `@id` to use: From fedb14335fed07ded0dfb68e48d58072220ade53 Mon Sep 17 00:00:00 2001 From: Stian Soiland-Reyes Date: Thu, 29 Aug 2024 11:00:34 +0100 Subject: [PATCH 18/18] Typo --- docs/_specification/1.2-DRAFT/data-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index 9e276f83..f89277c5 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -440,7 +440,7 @@ If the RO-Crate Metadata Document is not available as a web resource, but only w ##### Referencing another metadata document -If a referenced RO-Crate Metadata Document is known at a given URI or 
path, but its corresponding RO-Crate identifier can't be determined as above (e.g. [Retrieving an RO-Crate](#retrieving-an-ro-crate) fails or requires heuristics), then an referenced metadata descriptor entity SHOULD be added. For instance, if `http://example.com/another-crate/ro-crate-metadata.json` resolves to an RO-Crate Metadata Document describing root `./`, but `http://example.com/another-crate/` always return a HTML page without [Signposting] to the metadata document, then `subjectOf` SHOULD be added to an explicit metadata descriptor entity, which has `encodingFormat` declared for JSON-LD: +If a referenced RO-Crate Metadata Document is known at a given URI or path, but its corresponding RO-Crate identifier can't be determined as above (e.g. [Retrieving an RO-Crate](#retrieving-an-ro-crate) fails or requires heuristics), then a referenced metadata descriptor entity SHOULD be added. For instance, if `http://example.com/another-crate/ro-crate-metadata.json` resolves to an RO-Crate Metadata Document describing root `./`, but `http://example.com/another-crate/` always returns a HTML page without [Signposting] to the metadata document, then `subjectOf` SHOULD be added to an explicit metadata descriptor entity, which has `encodingFormat` declared for JSON-LD: ```json {