Merge branch 'develop' into 10517-dataset-types #10517
Conflicts:
src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java
tests/integration-tests.txt
pdurbin committed Aug 20, 2024
2 parents 67e9971 + cf174b2 commit 6be46c6
Showing 25 changed files with 1,021 additions and 133 deletions.
3 changes: 3 additions & 0 deletions doc/release-notes/10169-JSON-schema-validation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
### Improved JSON Schema validation for datasets

JSON Schema validation has been enhanced with checks for required and allowed child objects and with type checking for the field types `primitive`, `compound`, and `controlledVocabulary`. Error messages are now more user-friendly, making it easier to pinpoint issues in the dataset JSON. See [Retrieve a Dataset JSON Schema for a Collection](https://guides.dataverse.org/en/6.3/api/native-api.html#retrieve-a-dataset-json-schema-for-a-collection) in the API Guide and PR #10543.
3 changes: 3 additions & 0 deletions doc/release-notes/10726-dataverse-facets-api-extension.md
@@ -0,0 +1,3 @@
A new optional query parameter, "returnDetails", has been added to the "dataverses/{identifier}/facets/" endpoint to include detailed information about each DataverseFacet.

A new endpoint, "datasetfields/facetables", lists all facetable dataset fields defined in the installation.
9 changes: 9 additions & 0 deletions doc/release-notes/7068-reserve-file-pids.md
@@ -0,0 +1,9 @@
## Release Highlights

### Pre-Publish File DOI Reservation with DataCite

Dataverse installations using DataCite (or other persistent identifier (PID) providers that support reserving PIDs) will be able to reserve PIDs for files when they are uploaded, rather than at publication time. Note that reserving file DOIs can slow uploads with large numbers of files, so administrators may need to adjust timeouts (specifically any Apache "``ProxyPass / ajp://localhost:8009/ timeout=``" setting in the recommended Dataverse configuration).
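As a sketch only, a longer proxy timeout in the recommended Apache configuration might look like the following; the value of 600 seconds is purely illustrative and should be tuned per installation:

```apache
# Apache reverse-proxy configuration (e.g. httpd.conf or ssl.conf):
# proxy all traffic to the application server via AJP, with a longer
# timeout to accommodate file-PID reservation during large uploads.
ProxyPass / ajp://localhost:8009/ timeout=600
```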

## Major Use Cases

- Users will have DOIs/PIDs reserved for their files as part of file upload instead of at publication time. (Issue #7068, PR #7334)
62 changes: 56 additions & 6 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -224,6 +224,22 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/dataverses/root/facets"
By default, this endpoint will return an array including the facet names. If more detailed information is needed, you can set the query parameter ``returnDetails`` to ``true``, which will return the display name and id in addition to the name for each facet:

.. code-block:: bash
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=root
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/facets?returnDetails=true"
The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/dataverses/root/facets?returnDetails=true"
Set Facets for a Dataverse Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -566,9 +582,7 @@ The fully expanded example above (without environment variables) looks like this
Retrieve a Dataset JSON Schema for a Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Retrieves a JSON schema customized for a given collection in order to validate a dataset JSON file prior to creating the dataset. This
first version of the schema only includes required elements and fields. In the future we plan to improve the schema by adding controlled
vocabulary and more robust dataset field format testing:
Retrieves a JSON schema customized for a given collection in order to validate a dataset JSON file prior to creating the dataset:

.. code-block:: bash
@@ -593,8 +607,22 @@ While it is recommended to download a copy of the JSON Schema from the collection
Validate Dataset JSON File for a Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Validates a dataset JSON file customized for a given collection prior to creating the dataset. The validation only tests for json formatting
and the presence of required elements:
Validates a dataset JSON file customized for a given collection prior to creating the dataset.

The validation tests for:

- JSON formatting
- required fields
- typeClass must follow these rules:

- if multiple = true then value must be a list
- if typeClass = ``primitive`` the value object is a String or a List of Strings depending on the multiple flag
- if typeClass = ``compound`` the value object is a FieldDTO or a List of FieldDTOs depending on the multiple flag
- if typeClass = ``controlledVocabulary`` the values are checked against the list of allowed values stored in the database
- typeName validations (child objects with their required and allowed typeNames are configured automatically by the database schema). Examples include:

- dsDescription validation includes checks for typeName = ``dsDescriptionValue`` (required) and ``dsDescriptionDate`` (optional)
- datasetContact validation includes checks for typeName = ``datasetContactName`` (required), with ``datasetContactEmail`` and ``datasetContactAffiliation`` (optional)
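The typeClass rules above can be sketched in code. The following is an illustrative reimplementation under stated assumptions, not the actual ``JSONDataValidation`` class from the code base; the class and method names are invented for the example:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the typeClass rules listed above. This is NOT the
// real Dataverse JSONDataValidation implementation; names are hypothetical.
public class TypeClassRules {

    public static void validateField(Map<String, Object> field, Set<String> allowedVocabulary) {
        boolean multiple = Boolean.TRUE.equals(field.get("multiple"));
        Object value = field.get("value");
        String typeClass = (String) field.get("typeClass");

        // Rule: if multiple = true then value must be a list.
        if (multiple && !(value instanceof List)) {
            throw new IllegalArgumentException("multiple=true requires a list value");
        }
        List<?> values = multiple ? (List<?>) value : List.of(value);

        switch (typeClass) {
            case "primitive":
                // A String, or a List of Strings when multiple = true.
                for (Object v : values) {
                    if (!(v instanceof String)) {
                        throw new IllegalArgumentException("primitive values must be strings");
                    }
                }
                break;
            case "controlledVocabulary":
                // Values are checked against the allowed list (stored in the
                // database in a real installation; passed in here for the sketch).
                for (Object v : values) {
                    if (!allowedVocabulary.contains(v)) {
                        throw new IllegalArgumentException("not an allowed vocabulary value: " + v);
                    }
                }
                break;
            case "compound":
                // Each entry is a nested object of child fields (a FieldDTO in
                // Dataverse); a full implementation would recurse into each child.
                for (Object v : values) {
                    if (!(v instanceof Map)) {
                        throw new IllegalArgumentException("compound values must be child field objects");
                    }
                }
                break;
            default:
                throw new IllegalArgumentException("unknown typeClass: " + typeClass);
        }
    }
}
```

For instance, a ``title`` field with ``multiple: false`` and a string value passes, while a ``subject`` field declared ``multiple: true`` whose value is a bare string is rejected by the first rule.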

.. code-block:: bash
@@ -4826,6 +4854,28 @@ The fully expanded example above (without environment variables) looks like this
curl "https://demo.dataverse.org/api/metadatablocks/citation"
.. _dataset-fields-api:
Dataset Fields
--------------
List All Facetable Dataset Fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
List all facetable dataset fields defined in the installation.
.. code-block:: bash
export SERVER_URL=https://demo.dataverse.org
curl "$SERVER_URL/api/datasetfields/facetables"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl "https://demo.dataverse.org/api/datasetfields/facetables"
.. _Notifications:
Notifications
@@ -5242,7 +5292,7 @@ The fully expanded example above (without environment variables) looks like this
Reserve a PID
~~~~~~~~~~~~~
Reserved a PID for a dataset. A superuser API token is required.
Reserve a PID for a dataset if not yet registered, and, if FilePIDs are enabled, reserve any file PIDs that are not yet registered. A superuser API token is required.
.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.
102 changes: 102 additions & 0 deletions scripts/search/tests/data/dataset-finch3.json
@@ -0,0 +1,102 @@
{
"datasetVersion": {
"license": {
"name": "CC0 1.0",
"uri": "http://creativecommons.org/publicdomain/zero/1.0"
},
"metadataBlocks": {
"citation": {
"fields": [
{
"value": "HTML & More",
"typeClass": "primitive",
"multiple": false,
"typeName": "title"
},
{
"value": [
{
"authorName": {
"value": "Markup, Marty",
"typeClass": "primitive",
"multiple": false,
"typeName": "authorName"
},
"authorAffiliation": {
"value": "W4C",
"typeClass": "primitive",
"multiple": false,
"typeName": "authorAffiliation"
}
}
],
"typeClass": "compound",
"multiple": true,
"typeName": "author"
},
{
"value": [
{
"datasetContactEmail": {
"typeClass": "primitive",
"multiple": false,
"typeName": "datasetContactEmail",
"value": "[email protected]"
},
"datasetContactName": {
"typeClass": "primitive",
"multiple": false,
"typeName": "datasetContactName",
"value": "Markup, Marty"
}
}
],
"typeClass": "compound",
"multiple": true,
"typeName": "datasetContact"
},
{
"value": [
{
"dsDescriptionValue": {
"value": "BEGIN<br></br>END",
"multiple": false,
"typeClass": "primitive",
"typeName": "dsDescriptionValue"
},
"dsDescriptionDate": {
"typeName": "dsDescriptionDate",
"multiple": false,
"typeClass": "primitive",
"value": "2021-07-13"
}
}
],
"typeClass": "compound",
"multiple": true,
"typeName": "dsDescription"
},
{
"value": [
"Medicine, Health and Life Sciences"
],
"typeClass": "controlledVocabulary",
"multiple": true,
"typeName": "subject"
},
{
"typeName": "language",
"multiple": true,
"typeClass": "controlledVocabulary",
"value": [
"English",
"Afar",
"aar"
]
}
],
"displayName": "Citation Metadata"
}
}
}
}
@@ -1248,14 +1248,6 @@ public List<Long> selectFilesWithMissingOriginalSizes() {
}


/**
* Check that a identifier entered by the user is unique (not currently used
* for any other study in this Dataverse Network). Also check for duplicate
* in the remote PID service if needed
* @param datafileId
* @param storageLocation
* @return {@code true} iff the global identifier is unique.
*/
public void finalizeFileDelete(Long dataFileId, String storageLocation) throws IOException {
// Verify that the DataFile no longer exists:
if (find(dataFileId) != null) {
27 changes: 22 additions & 5 deletions src/main/java/edu/harvard/iq/dataverse/DataverseServiceBean.java
@@ -22,7 +22,7 @@
import edu.harvard.iq.dataverse.storageuse.StorageQuota;
import edu.harvard.iq.dataverse.util.StringUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;
import edu.harvard.iq.dataverse.util.json.JsonUtil;

import java.io.File;
import java.io.IOException;
import java.sql.Timestamp;
@@ -34,6 +34,7 @@
import java.util.logging.Logger;
import java.util.Properties;

import edu.harvard.iq.dataverse.validation.JSONDataValidation;
import jakarta.ejb.EJB;
import jakarta.ejb.Stateless;
import jakarta.inject.Inject;
@@ -888,14 +889,16 @@ public List<Object[]> getDatasetTitlesWithinDataverse(Long dataverseId) {
return em.createNativeQuery(cqString).getResultList();
}


public String getCollectionDatasetSchema(String dataverseAlias) {
return getCollectionDatasetSchema(dataverseAlias, null);
}
public String getCollectionDatasetSchema(String dataverseAlias, Map<String, Map<String,List<String>>> schemaChildMap) {

Dataverse testDV = this.findByAlias(dataverseAlias);

while (!testDV.isMetadataBlockRoot()) {
if (testDV.getOwner() == null) {
break; // we are at the root; which by defintion is metadata blcok root, regarldess of the value
break; // we are at the root; which by definition is metadata block root, regardless of the value
}
testDV = testDV.getOwner();
}
@@ -932,6 +935,8 @@ public String getCollectionDatasetSchema(String dataverseAlias) {
dsft.setRequiredDV(dsft.isRequired());
dsft.setInclude(true);
}
List<String> childrenRequired = new ArrayList<>();
List<String> childrenAllowed = new ArrayList<>();
if (dsft.isHasChildren()) {
for (DatasetFieldType child : dsft.getChildDatasetFieldTypes()) {
DataverseFieldTypeInputLevel dsfIlChild = dataverseFieldTypeInputLevelService.findByDataverseIdDatasetFieldTypeId(testDV.getId(), child.getId());
@@ -944,8 +949,18 @@ public String getCollectionDatasetSchema(String dataverseAlias) {
child.setRequiredDV(child.isRequired() && dsft.isRequired());
child.setInclude(true);
}
if (child.isRequired()) {
childrenRequired.add(child.getName());
}
childrenAllowed.add(child.getName());
}
}
if (schemaChildMap != null) {
Map<String, List<String>> map = new HashMap<>();
map.put("required", childrenRequired);
map.put("allowed", childrenAllowed);
schemaChildMap.put(dsft.getName(), map);
}
if(dsft.isRequiredDV()){
requiredDSFT.add(dsft);
}
@@ -1021,11 +1036,13 @@ private String getCustomMDBSchema (MetadataBlock mdb, List<DatasetFieldType> req
}

public String isDatasetJsonValid(String dataverseAlias, String jsonInput) {
JSONObject rawSchema = new JSONObject(new JSONTokener(getCollectionDatasetSchema(dataverseAlias)));
Map<String, Map<String,List<String>>> schemaChildMap = new HashMap<>();
JSONObject rawSchema = new JSONObject(new JSONTokener(getCollectionDatasetSchema(dataverseAlias, schemaChildMap)));

try {
try {
Schema schema = SchemaLoader.load(rawSchema);
schema.validate(new JSONObject(jsonInput)); // throws a ValidationException if this object is invalid
JSONDataValidation.validate(schema, schemaChildMap, jsonInput); // throws a ValidationException if any objects are invalid
} catch (ValidationException vx) {
logger.info(BundleUtil.getStringFromBundle("dataverses.api.validate.json.failed") + " " + vx.getErrorMessage());
String accumulatedexceptions = "";
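From the hunks above, the ``schemaChildMap`` threaded through ``getCollectionDatasetSchema`` and into validation maps each parent field name to its "required" and "allowed" child names. A minimal sketch of the resulting structure, using the child fields named in the docs above (the class name is invented for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative shape of the schemaChildMap built while generating the schema:
// parent field name -> {"required": [child names], "allowed": [child names]}.
public class SchemaChildMapExample {
    public static Map<String, Map<String, List<String>>> example() {
        Map<String, Map<String, List<String>>> schemaChildMap = new HashMap<>();
        schemaChildMap.put("dsDescription", Map.of(
                "required", List.of("dsDescriptionValue"),
                "allowed", List.of("dsDescriptionValue", "dsDescriptionDate")));
        schemaChildMap.put("datasetContact", Map.of(
                "required", List.of("datasetContactName"),
                "allowed", List.of("datasetContactName", "datasetContactEmail", "datasetContactAffiliation")));
        return schemaChildMap;
    }
}
```

During validation, a compound field's children can then be checked against this map: every "required" child must be present, and every present child must appear in "allowed".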
29 changes: 29 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/api/DatasetFields.java
@@ -0,0 +1,29 @@
package edu.harvard.iq.dataverse.api;

import edu.harvard.iq.dataverse.DatasetFieldServiceBean;
import edu.harvard.iq.dataverse.DatasetFieldType;
import jakarta.ejb.EJB;
import jakarta.ws.rs.*;
import jakarta.ws.rs.core.Response;

import java.util.List;

import static edu.harvard.iq.dataverse.util.json.JsonPrinter.jsonDatasetFieldTypes;

/**
* Api bean for managing dataset fields.
*/
@Path("datasetfields")
@Produces("application/json")
public class DatasetFields extends AbstractApiBean {

@EJB
DatasetFieldServiceBean datasetFieldService;

@GET
@Path("facetables")
public Response listAllFacetableDatasetFields() {
List<DatasetFieldType> datasetFieldTypes = datasetFieldService.findAllFacetableFieldTypes();
return ok(jsonDatasetFieldTypes(datasetFieldTypes));
}
}
1 change: 0 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -1,7 +1,6 @@
package edu.harvard.iq.dataverse.api;

import com.amazonaws.services.s3.model.PartETag;

import edu.harvard.iq.dataverse.*;
import edu.harvard.iq.dataverse.DatasetLock.Reason;
import edu.harvard.iq.dataverse.actionlogging.ActionLogRecord;
23 changes: 15 additions & 8 deletions src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java
@@ -855,22 +855,29 @@ public Response setMetadataRoot(@Context ContainerRequestContext crc, @PathParam
/**
* return list of facets for the dataverse with alias `dvIdtf`
*/
public Response listFacets(@Context ContainerRequestContext crc, @PathParam("identifier") String dvIdtf) {
public Response listFacets(@Context ContainerRequestContext crc,
@PathParam("identifier") String dvIdtf,
@QueryParam("returnDetails") boolean returnDetails) {
try {
User u = getRequestUser(crc);
DataverseRequest r = createDataverseRequest(u);
User user = getRequestUser(crc);
DataverseRequest request = createDataverseRequest(user);
Dataverse dataverse = findDataverseOrDie(dvIdtf);
JsonArrayBuilder fs = Json.createArrayBuilder();
for (DataverseFacet f : execCommand(new ListFacetsCommand(r, dataverse))) {
fs.add(f.getDatasetFieldType().getName());
List<DataverseFacet> dataverseFacets = execCommand(new ListFacetsCommand(request, dataverse));

if (returnDetails) {
return ok(jsonDataverseFacets(dataverseFacets));
} else {
JsonArrayBuilder facetsBuilder = Json.createArrayBuilder();
for (DataverseFacet facet : dataverseFacets) {
facetsBuilder.add(facet.getDatasetFieldType().getName());
}
return ok(facetsBuilder);
}
return ok(fs);
} catch (WrappedResponse e) {
return e.getResponse();
}
}


@GET
@AuthRequired
@Path("{identifier}/featured")
