From 33de8e5716362ac645a9d3d707fe31292e6d9f8b Mon Sep 17 00:00:00 2001 From: Danielle Groenen <89479247+daniellegroenen@users.noreply.github.com> Date: Thu, 5 Jan 2023 13:27:50 -0600 Subject: [PATCH 01/46] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 422fee35..ebb3bf62 100644 --- a/README.md +++ b/README.md @@ -30,7 +30,7 @@ The CMR is designed around its own metadata standard called the [Unified Metadat pyQuARC supports DIF10 (collection only), ECHO10 (collection and granule), UMM-C, and UMM-G standards. At this time, there are no plans to add ISO 19115 or UMM-S/T specific checks. **Additionally, the output messages pyQuARC currently displays should be taken with a grain of salt. There is still testing and clean-up work to be done.** -**For inquiries, please email: jeanne.leroux@nsstc.uah.edu** +**For inquiries, please email: jenny.wood@uah.edu** ## pyQuARC as a Service (QuARC) From 63371d2fb286f98a14a11c5cc5c834b8a7661385 Mon Sep 17 00:00:00 2001 From: Danielle Groenen <89479247+daniellegroenen@users.noreply.github.com> Date: Thu, 5 Jan 2023 13:51:10 -0600 Subject: [PATCH 02/46] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ebb3bf62..a4e392c3 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,7 @@ The `checks.json` file includes a comprehensive list of rules. Each rule is spec The `rule_mapping.json` file specifies which metadata element(s) each rule applies to. The `rule_mapping.json` also references the `messages.json` file which includes messages that can be displayed when a check passes or fails. -Furthermore, the `rule_mapping.json` file specifies the level of severity associated with a failure. 
If a check fails, it will be assigned a severity category of “error,” “warning,” or info.” These categories correspond to priority categorizations in [ARC’s priority matrix](https://wiki.earthdata.nasa.gov/display/CMR/ARC+Priority+Matrix) and communicate the importance of the failed check, with “error” being the most critical category, “warning” indicating a failure of medium priority, and “info” indicating a minor issue or inconsistency. Default severity values are assigned based on ARC’s metadata quality assessment framework, but can be customized to meet individual needs. +Furthermore, the `rule_mapping.json` file specifies the level of severity associated with a failure. If a check fails, it will be assigned a severity category of “error,” “warning,” or "info.” These categories correspond to priority categorizations in [ARC’s priority matrix](https://wiki.earthdata.nasa.gov/display/CMR/ARC+Priority+Matrix) and communicate the importance of the failed check, with “error” being the most critical category, “warning” indicating a failure of medium priority, and “info” indicating a minor issue or inconsistency. Default severity values are assigned based on ARC’s metadata quality assessment framework, but can be customized to meet individual needs. ## Customization pyQuARC is designed to be customizable. Output messages can be modified using the `messages_override.json` file - any messages added to `messages_override.json` will display over the default messages in the `message.json` file. Similarly, there is a `rule_mapping_override.json` file which can be used to override the default settings for which rules/checks are applied to which metadata elements. 
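The override behavior described in the Customization section is simple precedence: an entry found in the override file wins, otherwise the default entry is used — the same `override.get(rule_id) or default.get(rule_id)` lookup that `Checker.message` performs. A minimal self-contained sketch of that precedence, using inline dictionaries as stand-ins for the real `messages.json` and `messages_override.json` files:

```python
def resolve_message(rule_id, defaults, overrides):
    """Return the override entry for a rule if one exists,
    otherwise fall back to the default entry (or an empty dict)."""
    # Any entry present in the overrides takes precedence over the default.
    return overrides.get(rule_id) or defaults.get(rule_id, {})


# Inline stand-ins for messages.json and messages_override.json.
defaults = {"datetime_format_check": {"failure": "Default failure message."}}
overrides = {"datetime_format_check": {"failure": "Custom failure message."}}

print(resolve_message("datetime_format_check", defaults, overrides)["failure"])
# -> Custom failure message.
```

The same pattern applies to `rule_mapping_override.json`: a rule ID defined there shadows the default mapping entirely rather than being merged field by field.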
From 01cd68211a93b0b40a6e263c81291cd91316c092 Mon Sep 17 00:00:00 2001 From: Danielle Groenen <89479247+daniellegroenen@users.noreply.github.com> Date: Thu, 5 Jan 2023 13:53:34 -0600 Subject: [PATCH 03/46] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a4e392c3..849df5b3 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,7 @@ The `checks.json` file includes a comprehensive list of rules. Each rule is spec The `rule_mapping.json` file specifies which metadata element(s) each rule applies to. The `rule_mapping.json` also references the `messages.json` file which includes messages that can be displayed when a check passes or fails. -Furthermore, the `rule_mapping.json` file specifies the level of severity associated with a failure. If a check fails, it will be assigned a severity category of “error,” “warning,” or "info.” These categories correspond to priority categorizations in [ARC’s priority matrix](https://wiki.earthdata.nasa.gov/display/CMR/ARC+Priority+Matrix) and communicate the importance of the failed check, with “error” being the most critical category, “warning” indicating a failure of medium priority, and “info” indicating a minor issue or inconsistency. Default severity values are assigned based on ARC’s metadata quality assessment framework, but can be customized to meet individual needs. +Furthermore, the `rule_mapping.json` file specifies the level of severity associated with a failure. If a check fails, it will be assigned a severity category of “error”, “warning”, or "info.” These categories correspond to priority categorizations in [ARC’s priority matrix](https://wiki.earthdata.nasa.gov/display/CMR/ARC+Priority+Matrix) and communicate the importance of the failed check, with “error” being the most critical category, “warning” indicating a failure of medium priority, and “info” indicating a minor issue or inconsistency. 
Default severity values are assigned based on ARC’s metadata quality assessment framework, but can be customized to meet individual needs. ## Customization pyQuARC is designed to be customizable. Output messages can be modified using the `messages_override.json` file - any messages added to `messages_override.json` will display over the default messages in the `message.json` file. Similarly, there is a `rule_mapping_override.json` file which can be used to override the default settings for which rules/checks are applied to which metadata elements. From 88383d8ce721a5a53be769a18e031fb0254d1a1b Mon Sep 17 00:00:00 2001 From: Danielle Groenen <89479247+daniellegroenen@users.noreply.github.com> Date: Thu, 5 Jan 2023 14:09:01 -0600 Subject: [PATCH 04/46] Update README.md Changed "Use" to "Using" in the last section to match the other sections. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 849df5b3..a7eca8b3 100644 --- a/README.md +++ b/README.md @@ -317,7 +317,7 @@ Then, if the check function receives input `value1=0` and `value2=1`, the output The values 0 and 1 do not amount to a true value ``` -### Use as a package +### Using as a package *Note:* This program requires `Python 3.8` installed in your system. 
**Clone the repo:** [https://github.com/NASA-IMPACT/pyQuARC/](https://github.com/NASA-IMPACT/pyQuARC/) From 0d0820a8e66276174b356e1f0961ad392d728d00 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Mon, 17 Apr 2023 13:54:24 -0500 Subject: [PATCH 05/46] added license_url_description_check field to rule_mapping --- pyQuARC/schemas/rule_mapping.json | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 1199fe4f..5406d192 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -3715,6 +3715,34 @@ "severity": "error", "check_id": "one_item_presence_check" }, + "license_url_description_check": { + "rule_name": "License URL Description Check", + "fields_to_apply": { + "echo-c": [ + { + "fields": [ + "Collection/UseConstraints/LicenseURL/Description" + ] + } + ], + "dif10": [ + { + "fields": [ + "DIF/Use_Constraints/License_Text" + ] + } + ], + "umm-c": [ + { + "fields": [ + "UseConstraints/LicenseURL/Description" + ] + } + ] + }, + "severity": "warning", + "check_id": "one_item_presence_check" + }, "collection_citation_presence_check": { "rule_name": "Collection Citation Presence Check", "fields_to_apply": { From a904c5ab03c0d842b920890da560ab1a81766604 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Mon, 17 Apr 2023 13:59:33 -0500 Subject: [PATCH 06/46] Added check message for license url description check --- pyQuARC/schemas/check_messages.json | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index b5f4d0ff..b413fe41 100644 --- a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -663,6 +663,14 @@ }, "remediation": "Recommend providing information about the license applicable to the dataset, preferably as a URL. 
For EOSDIS records, recommend providing the following link (unless under different usage terms): https://earthdata.nasa.gov/earth-observation-data/data-use-policy" }, + "license_url_description_check": { + "failure": "Recommend providing a license URL description. For example: \n", + "help": { + "message": "", + "url": "" + }, + "remediation": "'The Earth Observing System Data and Information System (EOSDIS) data use policy for NASA data.'" + }, "collection_citation_presence_check": { "failure": "No citation information is provided.", "help": { From 0b8e731f14dd86bceefc6a8d92b99524b34e554c Mon Sep 17 00:00:00 2001 From: Jenny Wood Date: Tue, 18 Apr 2023 13:47:54 -0500 Subject: [PATCH 07/46] Added horizontal_resolution_presence_check to the rule_mapping and check_messages --- pyQuARC/schemas/check_messages.json | 10 +++++- pyQuARC/schemas/rule_mapping.json | 56 ++++++++++++++++++++++++++++- 2 files changed, 64 insertions(+), 2 deletions(-) diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index b5f4d0ff..063696ea 100644 --- a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -1022,5 +1022,13 @@ "url": "https://wiki.earthdata.nasa.gov/display/CMR/Instrument" }, "remediation": "Recommend updating the Number Of Instruments/Sensors value to match the numerical amount of instruments/sensors provided." + }, + "horizontal_resolution_presence_check": { + "failure": "No horizontal resolution information is provided.", + "help": { + "message": "", + "url": "https://wiki.earthdata.nasa.gov/display/CMR/Spatial+Extent" + }, + "remediation": "Recommend providing the horizontal pixel resolution, if applicable. If provided, this information will be indexed in the EDSC 'Horizontal Data Resolution' search facet which allows users to search by spatial resolution." 
} -} +} \ No newline at end of file diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 1199fe4f..081616a9 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -5301,5 +5301,59 @@ }, "severity": "warning", "check_id": "granule_data_format_presence_check" + }, + "horizontal_resolution_presence_check": { + "rule_name": "Horizontal Resolution Presence Check", + "fields_to_apply": { + "echo-c": [ + { + "fields": [ + "Collection/SpatialInfo/HorizontalCoordinateSystem/GeographicCoordinateSystem/LongitudeResolution", + "Collection/SpatialInfo/HorizontalCoordinateSystem/GeographicCoordinateSystem/LatitudeResolution", + "Collection/SpatialInfo/HorizontalCoordinateSystem/GeographicCoordinateSystem/GeographicCoordinateUnits" + ] + } + ], + "dif10": [ + { + "fields": [ + "DIF/Data_Resolution/Longitude_Resolution", + "DIF/Data_Resolution/Latitude_Resolution", + "DIF/Spatial_Coverage/Spatial_Info/Horizontal_Coordinate_System/Geographic_Coordinate_System/LongitudeResolution", + "DIF/Spatial_Coverage/Spatial_Info/Horizontal_Coordinate_System/Geographic_Coordinate_System/LatitudeResolution", + "DIF/Spatial_Coverage/Spatial_Info/Horizontal_Coordinate_System/Geographic_Coordinate_System/GeographicCoordinateUnits" + ] + } + ], + "umm-c": [ + { + "fields": [ + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/VariesResolution", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/PointResolution", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GriddedResolutions/XDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GriddedResolutions/YDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GriddedResolutions/Unit", + 
"SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GenericResolutions/XDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GenericResolutions/YDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GenericResolutions/Unit", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/NonGriddedResolutions/XDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/NonGriddedResolutions/YDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/NonGriddedResolutions/Unit", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/NonGriddedRangeResolutions/MinimumXDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/NonGriddedRangeResolutions/MaximumXDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/NonGriddedRangeResolutions/MinimumYDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/NonGriddedRangeResolutions/MaximumYDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/NonGriddedRangeResolutions/Unit", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GriddedRangeResolutions/MinimumXDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GriddedRangeResolutions/MaximumXDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GriddedRangeResolutions/MinimumYDimension", + 
"SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GriddedRangeResolutions/MaximumYDimension", + "SpatialExtent/HorizontalSpatialDomain/ResolutionAndCoordinateSystem/HorizontalDataResolution/GriddedRangeResolutions/Unit" + ] + } + ] + }, + "severity": "warning", + "check_id": "one_item_presence_check" } -} +} \ No newline at end of file From 9b6172f9b937543fbd90ea6cfc8069b30a6fe531 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Wed, 19 Apr 2023 10:01:41 -0500 Subject: [PATCH 08/46] fixed field path --- pyQuARC/schemas/rule_mapping.json | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 5406d192..3a0136ae 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -3699,7 +3699,7 @@ { "fields": [ "DIF/Use_Constraints/License_URL/URL", - "DIF/Use_Constraints/License_Text" + "DIF/Use_Constraints/License_URL/Description" ] } ], @@ -3728,7 +3728,7 @@ "dif10": [ { "fields": [ - "DIF/Use_Constraints/License_Text" + "DIF/Use_Constraints/License_URL/Description" ] } ], @@ -4618,12 +4618,6 @@ "DIF/Multimedia_Sample/Description", "DIF/Multimedia_Sample/URL" ] - }, - { - "fields": [ - "DIF/Use_Constraints/License_URL/Description", - "DIF/Use_Constraints/License_URL/URL" - ] } ], "umm-c": [ From c3bab5c07e1060facb8076245b8534a68bae8342 Mon Sep 17 00:00:00 2001 From: smk0033 Date: Wed, 19 Apr 2023 15:13:39 -0700 Subject: [PATCH 09/46] Standard Product Check added --- pyQuARC/schemas/check_messages.json | 8 ++++++++ pyQuARC/schemas/rule_mapping.json | 29 +++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+) diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index b5f4d0ff..d149391b 100644 --- a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -311,6 +311,14 @@ }, "remediation": "Recommend providing the instrument short name." 
}, + "standard_product_check": { + "failure": "The Standard Product is missing.", + "help": { + "message": "", + "url": "" + }, + "remediation": "Recommend indicating whether this is a StandardProduct. For information please see: https://wiki.earthdata.nasa.gov/display/CMR/StandardProduct" + }, "validate_granule_instrument_against_collection": { "failure": "The instrument short name listed in the granule metadata does not match the instrument short name listed in the collection metadata.", "help": { diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 1199fe4f..1c6fb17c 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -2565,6 +2565,35 @@ ] }, "severity": "info", + "check_id": "one_item_presence_check" + }, + "standard_product_check": { + "rule_name": "Standard Product Check", + "fields_to_apply": { + "echo-c": [ + { + "fields": [ + "Collection/StandardProduct" + ] + } + ], + "dif10": [ + { + "fields": [ + "DIF/Metadata/Value", + "DIF/Extended_Metadata/Metadata/Value" + ] + } + ], + "umm-c": [ + { + "fields": [ + "StandardProduct" + ] + } + ] + }, + "severity": "warning", "check_id": "one_item_presence_check" }, "validate_granule_instrument_against_collection": { From 8dee88d2d110dcca14f522af0ca89f6cc283b0b1 Mon Sep 17 00:00:00 2001 From: sydney-lybrand Date: Thu, 20 Apr 2023 15:14:31 -0500 Subject: [PATCH 10/46] rule_mapping file for FreeAndOpenData check --- pyQuARC/schemas/rule_mapping.json | 32 +++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 1199fe4f..10f199a3 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -1,4 +1,36 @@ { + + "free_and_open_data_presence_check": { + "rule_name": "Free and Open Data Presence Check", + "fields_to_apply": { + + "echo-c": [ + { + "fields": [ + "Collection/UseConstraints/FreeAndOpenData" + ] + } + ], + "dif10": 
[ + { + "fields": [ + "DIF/Use_Constraints/Free_And_Open_Data" + ] + } + ], + "umm-c": [ + { + "fields": [ + "UseConstraints/FreeAndOpenData" + ] + } + ] + + }, + "severity": "warning", + "check_id": "one_item_presence_check" + }, + "data_update_time_logic_check": { "rule_name": "Data Update Time Logic Check", "fields_to_apply": { From 90b18e6d04015ab977cbe342e782bf9ac00e81e1 Mon Sep 17 00:00:00 2001 From: sydney-lybrand Date: Thu, 20 Apr 2023 15:23:01 -0500 Subject: [PATCH 11/46] added FreeAndOpenData check to check_messages --- pyQuARC/schemas/check_messages.json | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index b5f4d0ff..fd2642a2 100644 --- a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -1,4 +1,13 @@ { + "free_and_open_data_presence_check":{ + "failure": "No FreeAndOpenData value was given.", + "help": { + "message": "", + "url": "https://wiki.earthdata.nasa.gov/display/CMR/Use+Constraints" + }, + "remediation": "Recommend providing a FreeAndOpenData value of 'true'." 
+ }, "datetime_format_check": { "failure": "`{}` does not adhere to the ISO 8601 standard.", "help": { From 1e331d6aa0cf167c604a647cb9ad69647c440f7a Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Fri, 21 Apr 2023 11:02:06 -0500 Subject: [PATCH 12/46] updated ends at present flag check to account for text values of False --- pyQuARC/code/custom_validator.py | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/pyQuARC/code/custom_validator.py b/pyQuARC/code/custom_validator.py index 726201b2..1281b67d 100644 --- a/pyQuARC/code/custom_validator.py +++ b/pyQuARC/code/custom_validator.py @@ -17,8 +17,13 @@ def ends_at_present_flag_logic_check( valid = ( ends_at_present_flag == True and not (ending_date_time) and collection_state == "ACTIVE" + or collection_state == "IN WORK" ) or ( - ends_at_present_flag == False + ends_at_present_flag == False or ends_at_present_flag == "false" + or ends_at_present_flag == "False" + and bool(ending_date_time) and collection_state == "COMPLETE" ) or ( + ends_at_present_flag == None and bool(ending_date_time) and collection_state == "COMPLETE" ) From a93a59584fdf75b043bc27b2862f184e8ee2db5c Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Mon, 24 Apr 2023 10:28:38 -0500 Subject: [PATCH 13/46] fixed bugs for ends at presence flag check --- pyQuARC/code/custom_validator.py | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/pyQuARC/code/custom_validator.py b/pyQuARC/code/custom_validator.py index 1281b67d..2f01d18c 100644 --- a/pyQuARC/code/custom_validator.py +++ b/pyQuARC/code/custom_validator.py @@ -15,15 +15,12 @@ def ends_at_present_flag_logic_check( ): collection_state = collection_state.upper() valid = ( - ends_at_present_flag == True - and not (ending_date_time) and collection_state == "ACTIVE" - or collection_state == "IN WORK" ) or ( + (bool(ends_at_present_flag) + and ends_at_present_flag not in ("False", "false")) + and not (ending_date_time) and collection_state in ("ACTIVE",
"IN WORK") ) or ( - ends_at_present_flag == False or ends_at_present_flag == "false" - or ends_at_present_flag == "False" - and bool(ending_date_time) and collection_state == "COMPLETE" - ) or ( - ends_at_present_flag == None + (bool(ends_at_present_flag) == False + or ends_at_present_flag in ("False", "false")) and bool(ending_date_time) and collection_state == "COMPLETE" ) From b8069e7f9809c998fcbaad0863c37c8c6ca93e66 Mon Sep 17 00:00:00 2001 From: Slesa Adhikari Date: Mon, 24 Apr 2023 16:53:12 -0500 Subject: [PATCH 14/46] Work around and log invalid token & cmr response --- pyQuARC/code/downloader.py | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/pyQuARC/code/downloader.py b/pyQuARC/code/downloader.py index 58364138..82cd8582 100644 --- a/pyQuARC/code/downloader.py +++ b/pyQuARC/code/downloader.py @@ -1,15 +1,15 @@ -import os +import json import re -from urllib.parse import urlparse - import requests +from urllib.parse import urlparse + from .utils import get_cmr_url, get_headers class Downloader: """ - Downloads data given a concept ID + Downloads data given a concept ID """ BASE_URL = "{cmr_host}/search/concepts/" @@ -26,7 +26,9 @@ class Downloader: "dif10": "dif10", } - def __init__(self, concept_id, metadata_format, version=None, cmr_host=get_cmr_url()): + def __init__( + self, concept_id, metadata_format, version=None, cmr_host=get_cmr_url() + ): """ Args: concept_id (str): The concept id of the metadata to download @@ -42,7 +44,7 @@ def __init__(self, concept_id, metadata_format, version=None, cmr_host=get_cmr_u self.downloaded_content = None parsed_url = urlparse(cmr_host) - self.cmr_host = f'{parsed_url.scheme}://{parsed_url.netloc}' + self.cmr_host = f"{parsed_url.scheme}://{parsed_url.netloc}" def _valid_concept_id(self): """ @@ -64,9 +66,8 @@ def _construct_url(self): """ extension = Downloader.FORMAT_MAP.get(self.metadata_format, "echo10") - concept_id_type = 
Downloader._concept_id_type(self.concept_id) base_url = Downloader.BASE_URL.format(cmr_host=self.cmr_host) - version = f'/{self.version}' if self.version else '' + version = f"/{self.version}" if self.version else "" constructed_url = f"{base_url}{self.concept_id}{version}.{extension}" return constructed_url @@ -91,24 +92,32 @@ def download(self): # is the concept id valid? if not, log error if not self._valid_concept_id(): - self.log_error( - "invalid_concept_id", - {"concept_id": self.concept_id} - ) + self.log_error("invalid_concept_id", {"concept_id": self.concept_id}) return # constructs url based on concept id url = self._construct_url() headers = get_headers() response = requests.get(url, headers=headers) + + # if the authorization token is invalid, even public metadata that doesn't require the token is inaccessible + # this works around that + if response.status_code == 401: # if token invalid, try without token + response = requests.get(url) + # gets the response, makes sure it's 200, puts it in an object variable if response.status_code != 200: + try: + message = json.loads(response.text).get("errors") + except (json.decoder.JSONDecodeError, KeyError): + message = "Something went wrong while downloading the requested metadata. Make sure all the inputs are correct." 
self.log_error( "request_failed", { "concept_id": self.concept_id, "url": url, "status_code": response.status_code, + "message": message, }, ) return From f3594d2cc8b53c1901672f757238d1fc9eee5ecc Mon Sep 17 00:00:00 2001 From: Slesa Adhikari Date: Mon, 24 Apr 2023 16:54:53 -0500 Subject: [PATCH 15/46] Format --- .gitignore | 1 + pyQuARC/__init__.py | 4 +- pyQuARC/code/base_validator.py | 2 +- pyQuARC/code/checker.py | 72 +- pyQuARC/code/constants.py | 16 +- pyQuARC/code/custom_checker.py | 52 +- pyQuARC/code/custom_validator.py | 84 +- pyQuARC/code/datetime_validator.py | 70 +- pyQuARC/code/gcmd_validator.py | 56 +- pyQuARC/code/scheduler.py | 18 +- pyQuARC/code/schema_validator.py | 44 +- pyQuARC/code/string_validator.py | 75 +- pyQuARC/code/tracker.py | 10 +- pyQuARC/code/url_validator.py | 34 +- pyQuARC/code/utils.py | 27 +- pyQuARC/main.py | 97 +-- pyQuARC/schemas/version.txt | 2 +- setup.py | 8 +- tests/common.py | 4 +- tests/fixtures/checker.py | 29 +- tests/fixtures/custom_checker.py | 23 +- tests/fixtures/downloader.py | 4 +- tests/fixtures/test_check_files.py | 1263 +++++++++++++++++++--------- tests/fixtures/validator.py | 12 +- tests/test_custom_checker.py | 60 +- tests/test_datetime_validator.py | 5 +- tests/test_downloader.py | 6 +- tests/test_schema_validator.py | 14 +- tests/test_string_validator.py | 2 +- 29 files changed, 1265 insertions(+), 829 deletions(-) diff --git a/.gitignore b/.gitignore index aacf55e3..fbbcb98d 100644 --- a/.gitignore +++ b/.gitignore @@ -7,3 +7,4 @@ build/* dist/* pyQuARC.egg-info/* env/* +.venv/* \ No newline at end of file diff --git a/pyQuARC/__init__.py b/pyQuARC/__init__.py index 6b817340..63d187c3 100644 --- a/pyQuARC/__init__.py +++ b/pyQuARC/__init__.py @@ -17,7 +17,7 @@ with open(f"{ABS_PATH}/version.txt") as version_file: __version__ = version_file.read().strip() + def version(): - """Returns the current version of pyQuARC. 
- """ + """Returns the current version of pyQuARC.""" return __version__ diff --git a/pyQuARC/code/base_validator.py b/pyQuARC/code/base_validator.py index 9fbb7475..16b9e500 100644 --- a/pyQuARC/code/base_validator.py +++ b/pyQuARC/code/base_validator.py @@ -40,7 +40,7 @@ def contains(list_of_values, value): @staticmethod def compare(first, second, relation): - if relation.startswith('not_'): + if relation.startswith("not_"): return not (BaseValidator.compare(first, second, relation[4:])) func = getattr(BaseValidator, relation) return func(first, second) diff --git a/pyQuARC/code/checker.py b/pyQuARC/code/checker.py index e2cebd77..2e603a75 100644 --- a/pyQuARC/code/checker.py +++ b/pyQuARC/code/checker.py @@ -26,7 +26,7 @@ def __init__( metadata_format=ECHO10_C, messages_override=None, checks_override=None, - rules_override=None + rules_override=None, ): """ Args: @@ -53,13 +53,13 @@ def __init__( self.rules_override, self.checks, self.checks_override, - metadata_format=metadata_format + metadata_format=metadata_format, + ) + self.schema_validator = SchemaValidator( + self.messages_override or self.messages, metadata_format ) - self.schema_validator = SchemaValidator(self.messages_override or self.messages, metadata_format) self.tracker = Tracker( - self.rule_mapping, - self.rules_override, - metadata_format=metadata_format + self.rule_mapping, self.rules_override, metadata_format=metadata_format ) @staticmethod @@ -76,15 +76,9 @@ def load_schemas(self): self.checks = Checker._json_load_schema("checks") self.rule_mapping = Checker._json_load_schema("rule_mapping") self.messages = Checker._json_load_schema("check_messages") - self.messages_override = Checker._json_load_schema( - self.msgs_override_file - ) - self.rules_override = Checker._json_load_schema( - self.rules_override_file - ) - self.checks_override = Checker._json_load_schema( - self.checks_override_file - ) + self.messages_override = Checker._json_load_schema(self.msgs_override_file) + 
self.rules_override = Checker._json_load_schema(self.rules_override_file) + self.checks_override = Checker._json_load_schema(self.checks_override_file) @staticmethod def map_to_function(data_type, function): @@ -112,19 +106,19 @@ def message(self, rule_id, msg_type): msg_type can be any one of 'failure', 'remediation' """ messages = self.messages_override.get(rule_id) or self.messages.get(rule_id) - return messages[msg_type] if messages else '' + return messages[msg_type] if messages else "" def build_message(self, result, rule_id): """ Formats the message for `rule_id` based on the result """ failure_message = self.message(rule_id, "failure") - rule_mapping = self.rules_override.get( + rule_mapping = self.rules_override.get(rule_id) or self.rule_mapping.get( rule_id - ) or self.rule_mapping.get(rule_id) + ) severity = rule_mapping.get("severity", "error") messages = [] - if not(result["valid"]) and result.get("value"): + if not (result["valid"]) and result.get("value"): for value in result["value"]: formatted_message = failure_message value = value if isinstance(value, tuple) else (value,) @@ -143,7 +137,9 @@ def _check_dependency_validity(self, dependency, field_dict): """ Checks if the dependent check called `dependency` is valid """ - dependency_fields = field_dict["fields"] if len(dependency) == 1 else [dependency[1]] + dependency_fields = ( + field_dict["fields"] if len(dependency) == 1 else [dependency[1]] + ) for field in dependency_fields: if not self.tracker.read_data(dependency[0], field).get("valid"): return False @@ -162,27 +158,26 @@ def _run_func(self, func, check, rule_id, metadata_content, result_dict): """ Run the check function for `rule_id` and update `result_dict` """ - rule_mapping = self.rules_override.get( + rule_mapping = self.rules_override.get(rule_id) or self.rule_mapping.get( rule_id - ) or self.rule_mapping.get(rule_id) + ) external_data = rule_mapping.get("data", []) relation = rule_mapping.get("relation") - list_of_fields_to_apply = 
\ - rule_mapping.get("fields_to_apply").get(self.metadata_format, {}) - + list_of_fields_to_apply = rule_mapping.get("fields_to_apply").get( + self.metadata_format, {} + ) + for field_dict in list_of_fields_to_apply: - dependencies = self.scheduler.get_all_dependencies(rule_mapping, check, field_dict) + dependencies = self.scheduler.get_all_dependencies( + rule_mapping, check, field_dict + ) main_field = field_dict["fields"][0] external_data = field_dict.get("data", external_data) result_dict.setdefault(main_field, {}) if not self._check_dependencies_validity(dependencies, field_dict): continue result = self.custom_checker.run( - func, - metadata_content, - field_dict, - external_data, - relation + func, metadata_content, field_dict, external_data, relation ) self.tracker.update_data(rule_id, main_field, result["valid"]) @@ -211,14 +206,16 @@ def perform_custom_checks(self, metadata_content): ) or self.rule_mapping.get(rule_id) check_id = rule_mapping.get("check_id", rule_id) check = self.checks_override.get(check_id) or self.checks.get(check_id) - func = Checker.map_to_function(check["data_type"], check["check_function"]) + func = Checker.map_to_function( + check["data_type"], check["check_function"] + ) if func: self._run_func(func, check, rule_id, metadata_content, result_dict) except Exception as e: pyquarc_errors.append( { "message": f"Running check for the rule: '{rule_id}' failed.", - "details": str(e) + "details": str(e), } ) return result_dict, pyquarc_errors @@ -233,6 +230,7 @@ def run(self, metadata_content): Returns: (dict): The results of the jsonschema check and all custom checks """ + def _xml_postprocessor(_, key, value): """ Sometimes the XML values contain attributes. 
@@ -259,11 +257,7 @@ def _xml_postprocessor(_, key, value): parser = parse kwargs = {"postprocessor": _xml_postprocessor} json_metadata = parser(metadata_content, **kwargs) - result_schema = self.perform_schema_check( - metadata_content - ) + result_schema = self.perform_schema_check(metadata_content) result_custom, pyquarc_errors = self.perform_custom_checks(json_metadata) - result = { - **result_schema, **result_custom - } + result = {**result_schema, **result_custom} return result, pyquarc_errors diff --git a/pyQuARC/code/constants.py b/pyQuARC/code/constants.py index 33403f98..30a63f5c 100644 --- a/pyQuARC/code/constants.py +++ b/pyQuARC/code/constants.py @@ -14,7 +14,7 @@ ROOT_DIR = ( # go up one directory - os.path.abspath(os.path.join(__file__, '../..')) + os.path.abspath(os.path.join(__file__, "../..")) ) SCHEMAS_BASE_PATH = f"{ROOT_DIR}/schemas" @@ -46,17 +46,17 @@ "rules_override", f"{UMM_C}-json-schema", "umm-cmn-json-schema", - f"{UMM_G}-json-schema" + f"{UMM_G}-json-schema", ], "csv": GCMD_KEYWORDS, - "xsd": [ f"{DIF}_schema", f"{ECHO10_C}_schema", f"{ECHO10_G}_schema" ], - "xml": [ "catalog" ] + "xsd": [f"{DIF}_schema", f"{ECHO10_C}_schema", f"{ECHO10_G}_schema"], + "xml": ["catalog"], } SCHEMA_PATHS = { - schema: f"{SCHEMAS_BASE_PATH}/{schema}.{filetype}" - for filetype, schemas in SCHEMAS.items() - for schema in schemas + schema: f"{SCHEMAS_BASE_PATH}/{schema}.{filetype}" + for filetype, schemas in SCHEMAS.items() + for schema in schemas } VERSION_FILE = f"{SCHEMAS_BASE_PATH}/version.txt" @@ -67,7 +67,7 @@ "error": Fore.RED, "warning": Fore.YELLOW, "reset": Style.RESET_ALL, - "bright": Style.BRIGHT + "bright": Style.BRIGHT, } GCMD_BASIC_URL = "https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/" diff --git a/pyQuARC/code/custom_checker.py b/pyQuARC/code/custom_checker.py index afe875f2..16006445 100644 --- a/pyQuARC/code/custom_checker.py +++ b/pyQuARC/code/custom_checker.py @@ -10,7 +10,9 @@ def __init__(self): pass @staticmethod - def 
_get_path_value_recursively(subset_of_metadata_content, path_list, container, query_params=None): + def _get_path_value_recursively( + subset_of_metadata_content, path_list, container, query_params=None + ): """ Gets the path values recursively while handling list or dictionary in `subset_of_metadata_content` Adds the values to `container` @@ -37,7 +39,11 @@ def _get_path_value_recursively(subset_of_metadata_content, path_list, container container.append(subset_of_metadata_content) return new_path = path_list[1:] - if isinstance(root_content, str) or isinstance(root_content, int) or isinstance(root_content, float): + if ( + isinstance(root_content, str) + or isinstance(root_content, int) + or isinstance(root_content, float) + ): container.append(root_content) return elif isinstance(root_content, list): @@ -46,7 +52,13 @@ def _get_path_value_recursively(subset_of_metadata_content, path_list, container return if len(new_path) == 1 and query_params: try: - root_content = next((x for x in root_content if x[query_params[0]] == query_params[1])) + root_content = next( + ( + x + for x in root_content + if x[query_params[0]] == query_params[1] + ) + ) root_content = root_content[new_path[0]] container.append(root_content) except: @@ -55,13 +67,15 @@ def _get_path_value_recursively(subset_of_metadata_content, path_list, container for each in root_content: try: CustomChecker._get_path_value_recursively( - each, new_path, container, query_params) + each, new_path, container, query_params + ) except KeyError: container.append(None) continue elif isinstance(root_content, dict): CustomChecker._get_path_value_recursively( - root_content, new_path, container, query_params) + root_content, new_path, container, query_params + ) @staticmethod def _get_path_value(content_to_validate, path_string): @@ -80,15 +94,18 @@ def _get_path_value(content_to_validate, path_string): query_params = None parsed = urlparse(path_string) - path = parsed.path.split('/') + path = parsed.path.split("/") 
if key_value := parsed.query: - query_params = key_value.split('=') + query_params = key_value.split("=") CustomChecker._get_path_value_recursively( - content_to_validate, path, container, query_params) + content_to_validate, path, container, query_params + ) return container - def run(self, func, content_to_validate, field_dict, external_data, external_relation): + def run( + self, func, content_to_validate, field_dict, external_data, external_relation + ): """ Runs the custom check based on `func` to the `content_to_validate`'s `field_dict` path @@ -112,12 +129,9 @@ def run(self, func, content_to_validate, field_dict, external_data, external_rel fields = field_dict["fields"] field_values = [] relation = field_dict.get("relation") - result = { - "valid": None - } + result = {"valid": None} for _field in fields: - value = CustomChecker._get_path_value( - content_to_validate, _field) + value = CustomChecker._get_path_value(content_to_validate, _field) field_values.append(value) args = zip(*field_values) @@ -125,9 +139,15 @@ def run(self, func, content_to_validate, field_dict, external_data, external_rel validity = None for arg in args: function_args = [*arg] - function_args.extend([extra_arg for extra_arg in [relation, *external_data, external_relation] if extra_arg]) + function_args.extend( + [ + extra_arg + for extra_arg in [relation, *external_data, external_relation] + if extra_arg + ] + ) func_return = func(*function_args) - valid = func_return["valid"] # can be True, False or None + valid = func_return["valid"] # can be True, False or None if valid is not None: if valid: validity = validity or (validity is None) diff --git a/pyQuARC/code/custom_validator.py b/pyQuARC/code/custom_validator.py index 726201b2..141e06b1 100644 --- a/pyQuARC/code/custom_validator.py +++ b/pyQuARC/code/custom_validator.py @@ -4,7 +4,6 @@ from .utils import cmr_request, if_arg, set_cmr_prms - class CustomValidator(BaseValidator): def __init__(self): super().__init__() @@ -16,10 
+15,12 @@ def ends_at_present_flag_logic_check( collection_state = collection_state.upper() valid = ( ends_at_present_flag == True - and not (ending_date_time) and collection_state == "ACTIVE" + and not (ending_date_time) + and collection_state == "ACTIVE" ) or ( ends_at_present_flag == False - and bool(ending_date_time) and collection_state == "COMPLETE" + and bool(ending_date_time) + and collection_state == "COMPLETE" ) return {"valid": valid, "value": ends_at_present_flag} @@ -37,9 +38,9 @@ def ends_at_present_flag_presence_check( @staticmethod def mime_type_check(mime_type, url_type, controlled_list): """ - Checks that if the value for url_type is "USE SERVICE API", - the mime_type should be one of the values from a controlled list - For all other cases, the check should be valid + Checks that if the value for url_type is "USE SERVICE API", + the mime_type should be one of the values from a controlled list + For all other cases, the check should be valid """ result = {"valid": True, "value": mime_type} if url_type: @@ -55,7 +56,10 @@ def mime_type_check(mime_type, url_type, controlled_list): @staticmethod def availability_check(field_value, parent_value): # If the parent is available, the child should be available too, else it is invalid - return {"valid": bool(field_value) if parent_value else True, "value": parent_value} + return { + "valid": bool(field_value) if parent_value else True, + "value": parent_value, + } @staticmethod @if_arg @@ -80,9 +84,9 @@ def bounding_coordinate_logic_check(west, north, east, south): @staticmethod def one_item_presence_check(*field_values): """ - Checks if one of the specified fields is populated - At least one of the `field_values` should not be null - It is basically a OneOf check + Checks if one of the specified fields is populated + At least one of the `field_values` should not be null + It is basically a OneOf check """ validity = False value = None @@ -96,7 +100,9 @@ def one_item_presence_check(*field_values): return 
{"valid": validity, "value": value} @staticmethod - def granule_sensor_presence_check(sensor_values, collection_shortname=None, version=None, dataset_id=None): + def granule_sensor_presence_check( + sensor_values, collection_shortname=None, version=None, dataset_id=None + ): """ Checks if sensor is provided at the granule level if provided at collection level @@ -110,14 +116,14 @@ def granule_sensor_presence_check(sensor_values, collection_shortname=None, vers } prms = set_cmr_prms(params, format="umm_json") collections = cmr_request(prms) - if collections := collections.get('items'): + if collections := collections.get("items"): collection = collections[0] - for platform in collection['umm'].get('Platforms', []): - instruments = platform.get('Instruments', []) + for platform in collection["umm"].get("Platforms", []): + instruments = platform.get("Instruments", []) for instrument in instruments: - if 'ComposedOf' in instrument.keys(): + if "ComposedOf" in instrument.keys(): return CustomValidator.presence_check(sensor_values) - + return { "valid": True, "value": sensor_values, @@ -128,9 +134,9 @@ def granule_sensor_presence_check(sensor_values, collection_shortname=None, vers def user_services_check(first_name, middle_name, last_name): return { "valid": ( - first_name.lower() != 'user' or - last_name.lower() != 'services' or - (middle_name and (middle_name.lower() != 'null')) + first_name.lower() != "user" + or last_name.lower() != "services" + or (middle_name and (middle_name.lower() != "null")) ), "value": f"{first_name} {middle_name} {last_name}", } @@ -138,10 +144,7 @@ def user_services_check(first_name, middle_name, last_name): @staticmethod def doi_missing_reason_explanation(explanation, missing_reason, doi): validity = bool(doi or ((not doi) and missing_reason and explanation)) - return { - "valid": validity, - "value": explanation - } + return {"valid": validity, "value": explanation} @staticmethod @if_arg @@ -157,22 +160,19 @@ def 
collection_progress_consistency_check( # Logic: https://github.com/NASA-IMPACT/pyQuARC/issues/61 validity = False collection_state = collection_state.upper() - ends_at_present_flag = str(ends_at_present_flag).lower() if ends_at_present_flag else None + ends_at_present_flag = ( + str(ends_at_present_flag).lower() if ends_at_present_flag else None + ) if collection_state in ["ACTIVE", "IN WORK"]: validity = (not ending_date_time) and (ends_at_present_flag == "true") elif collection_state == "COMPLETE": validity = ending_date_time and ( - not ends_at_present_flag or ( - ends_at_present_flag == "false" - ) + not ends_at_present_flag or (ends_at_present_flag == "false") ) - - return { - "valid": validity, - "value": collection_state - } - + + return {"valid": validity, "value": collection_state} + @staticmethod @if_arg def uniqueness_check(list_of_objects, key): @@ -183,10 +183,7 @@ def uniqueness_check(list_of_objects, key): duplicates.add(description) else: seen.add(description) - return { - "valid": not bool(duplicates), - "value": ', '.join(duplicates) - } + return {"valid": not bool(duplicates), "value": ", ".join(duplicates)} @staticmethod def get_data_url_check(related_urls, key): @@ -207,7 +204,7 @@ def get_data_url_check(related_urls, key): "Description": "The LP DAAC product page provides information on Science Data Set layers and links for user guides, ATBDs, data access, tools, customer support, etc.", "URL_Content_Type": { "Type": "GET DATA", - "Subtype>: "LAADS" + "Subtype>: "LAADS" }, "URL": "https://doi.org/10.5067/MODIS/MOD13Q1.061", ... 
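As context for the `uniqueness_check` hunk above: it uses a single-pass two-set pattern (`seen` / `duplicates`) so each duplicated value is reported once. The standalone sketch below mirrors that pattern and the check's `{"valid": ..., "value": ...}` return shape; the function name here is illustrative and not part of the patch.

```python
# Illustrative sketch of the two-set duplicate-detection pattern used by
# uniqueness_check: one pass, O(n) time, each duplicate reported once.
def find_duplicate_values(list_of_objects, key):
    seen, duplicates = set(), set()
    for obj in list_of_objects:
        value = obj.get(key)
        if value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    # Mirrors the check's return shape: valid iff no duplicates were found.
    return {"valid": not duplicates, "value": ", ".join(sorted(duplicates))}
```

Sorting the duplicates before joining (a small addition over the patched code, which joins the set directly) makes the reported value deterministic.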
@@ -218,14 +215,14 @@ def get_data_url_check(related_urls, key): or ["URL_Content_Type", "Type"] """ - return_obj = { 'valid': False, 'value': 'N/A' } + return_obj = {"valid": False, "value": "N/A"} for url_obj in related_urls: type = url_obj.get(key[0]) if len(key) == 2: type = (type or {}).get(key[1]) if (validity := type == "GET DATA") and (url := url_obj.get("URL")): - return_obj['valid'] = validity - return_obj['value'] = url + return_obj["valid"] = validity + return_obj["value"] = url break return return_obj @@ -236,7 +233,4 @@ def count_check(count, values, key): if not isinstance(items, list): items = [items] num_items = len(items) - return { - "valid": int(count) == num_items, - "value": (count, num_items) - } + return {"valid": int(count) == num_items, "value": (count, num_items)} diff --git a/pyQuARC/code/datetime_validator.py b/pyQuARC/code/datetime_validator.py index 2b4d6ced..34f67186 100644 --- a/pyQuARC/code/datetime_validator.py +++ b/pyQuARC/code/datetime_validator.py @@ -7,7 +7,6 @@ from .utils import cmr_request, if_arg, set_cmr_prms - class DatetimeValidator(BaseValidator): """ Validator class for datetime datatype @@ -50,7 +49,7 @@ def _iso_date(date_string): Returns: (datetime.datetime) If the string is valid iso string, False otherwise """ - + try: value = datetime.strptime(date_string, "%Y-%m-%d") return value @@ -89,7 +88,8 @@ def date_or_datetime_format_check(datetime_string): (dict) An object with the validity of the check and the instance """ return { - "valid": bool(DatetimeValidator._iso_datetime(datetime_string)) or bool(DatetimeValidator._iso_date(datetime_string)), + "valid": bool(DatetimeValidator._iso_datetime(datetime_string)) + or bool(DatetimeValidator._iso_date(datetime_string)), "value": datetime_string, } @@ -101,19 +101,24 @@ def compare(first, second, relation): Returns: (dict) An object with the validity of the check and the instance """ - first = (DatetimeValidator._iso_datetime(first) or 
DatetimeValidator._iso_date(first)).replace(tzinfo=pytz.utc) - second = DatetimeValidator._iso_datetime(second) or DatetimeValidator._iso_date(second) - if not(second): + first = ( + DatetimeValidator._iso_datetime(first) or DatetimeValidator._iso_date(first) + ).replace(tzinfo=pytz.utc) + second = DatetimeValidator._iso_datetime(second) or DatetimeValidator._iso_date( + second + ) + if not (second): second = datetime.now() - second = second.replace(tzinfo=pytz.UTC) # Making it UTC for comparison with other UTC times + second = second.replace( + tzinfo=pytz.UTC + ) # Making it UTC for comparison with other UTC times result = BaseValidator.compare(first, second, relation) - return { - "valid": result, - "value": (str(first), str(second)) - } + return {"valid": result, "value": (str(first), str(second))} @staticmethod - def validate_datetime_against_granules(datetime, collection_shortname, version, sort_key, time_key): + def validate_datetime_against_granules( + datetime, collection_shortname, version, sort_key, time_key + ): """ Validates the collection datetime against the datetime of the last granule in the collection @@ -125,29 +130,32 @@ def validate_datetime_against_granules(datetime, collection_shortname, version, Returns: (dict) An object with the validity of the check and the instance """ - cmr_prms = set_cmr_prms({ - "short_name": collection_shortname, - "version": version, - "sort_key[]": sort_key, - }, "json", "granules") + cmr_prms = set_cmr_prms( + { + "short_name": collection_shortname, + "version": version, + "sort_key[]": sort_key, + }, + "json", + "granules", + ) granules = cmr_request(cmr_prms) validity = True last_granule_datetime = None - if len(granules['feed']['entry']) > 0: - last_granule = granules['feed']['entry'][0] + if len(granules["feed"]["entry"]) > 0: + last_granule = granules["feed"]["entry"][0] last_granule_datetime = last_granule.get(time_key) validity = datetime == last_granule_datetime - return { - "valid": validity, - "value": 
(datetime, last_granule_datetime) - } + return {"valid": validity, "value": (datetime, last_granule_datetime)} @staticmethod @if_arg - def validate_ending_datetime_against_granules(ending_datetime, collection_shortname, version): + def validate_ending_datetime_against_granules( + ending_datetime, collection_shortname, version + ): """ Validates the collection EndingDatetime against the datetime of the last granule in the collection @@ -159,16 +167,14 @@ def validate_ending_datetime_against_granules(ending_datetime, collection_shortn (dict) An object with the validity of the check and the instance """ return DatetimeValidator.validate_datetime_against_granules( - ending_datetime, - collection_shortname, - version, - '-end_date', - 'time_end' + ending_datetime, collection_shortname, version, "-end_date", "time_end" ) @staticmethod @if_arg - def validate_beginning_datetime_against_granules(beginning_datetime, collection_shortname, version): + def validate_beginning_datetime_against_granules( + beginning_datetime, collection_shortname, version + ): """ Validates the collection BeginningDateTime against the datetime of the last granule in the collection @@ -183,6 +189,6 @@ def validate_beginning_datetime_against_granules(beginning_datetime, collection_ beginning_datetime, collection_shortname, version, - 'start_date', - 'time_start' + "start_date", + "time_start", ) diff --git a/pyQuARC/code/gcmd_validator.py b/pyQuARC/code/gcmd_validator.py index ae2048ac..7febd903 100644 --- a/pyQuARC/code/gcmd_validator.py +++ b/pyQuARC/code/gcmd_validator.py @@ -40,9 +40,7 @@ def __init__(self): ), "provider": GcmdValidator._create_hierarchy_dict( self._read_from_csv( - "providers", - columns=["Short_Name", "Long_Name"], - hierarchy=True + "providers", columns=["Short_Name", "Long_Name"], hierarchy=True ) ), "provider_short_name": self._read_from_csv( @@ -53,9 +51,7 @@ def __init__(self): ), "instrument": GcmdValidator._create_hierarchy_dict( self._read_from_csv( - "instruments", - 
columns=["Short_Name", "Long_Name"], - hierarchy=True + "instruments", columns=["Short_Name", "Long_Name"], hierarchy=True ) ), "instrument_short_name": self._read_from_csv( @@ -66,9 +62,7 @@ def __init__(self): ), "campaign": GcmdValidator._create_hierarchy_dict( self._read_from_csv( - "projects", - columns=["Short_Name", "Long_Name"], - hierarchy=True + "projects", columns=["Short_Name", "Long_Name"], hierarchy=True ) ), "campaign_short_name": self._read_from_csv( @@ -82,9 +76,7 @@ def __init__(self): ), "platform": GcmdValidator._create_hierarchy_dict( self._read_from_csv( - "platforms", - columns=["Short_Name", "Long_Name"], - hierarchy=True + "platforms", columns=["Short_Name", "Long_Name"], hierarchy=True ) ), "platform_short_name": self._read_from_csv( @@ -93,9 +85,7 @@ def __init__(self): "platform_long_name": self._read_from_csv( "platforms", columns=["Long_Name"] ), - "platform_type": self._read_from_csv( - "platforms", columns=["Category"] - ), + "platform_type": self._read_from_csv("platforms", columns=["Category"]), "rucontenttype": self._read_from_csv( "rucontenttype", columns=["Type", "Subtype"] ), @@ -111,12 +101,8 @@ def __init__(self): "temporalresolutionrange": self._read_from_csv( "temporalresolutionrange", columns=["Temporal_Resolution_Range"] ), - "mimetype": self._read_from_csv( - "MimeType", columns=["MimeType"] - ), - "idnnode_shortname": self._read_from_csv( - "idnnode", columns=["Short_Name"] - ) + "mimetype": self._read_from_csv("MimeType", columns=["MimeType"]), + "idnnode_shortname": self._read_from_csv("idnnode", columns=["Short_Name"]), } @staticmethod @@ -137,7 +123,9 @@ def _download_files(force=False): # Downloading updated gcmd keyword files response = requests.get(link, headers=headers) data = response.text - with open(SCHEMA_PATHS[keyword], "w", encoding="utf-8") as download_file: + with open( + SCHEMA_PATHS[keyword], "w", encoding="utf-8" + ) as download_file: download_file.write(data) with open(VERSION_FILE, "w") as 
version_file: version_file.write(current_datetime.strftime(DATE_FORMAT)) @@ -157,7 +145,9 @@ def _create_hierarchy_dict(rows): (dict): The lookup dictionary for GCMD hierarchy """ all_keywords = [ - [keyword.upper() for keyword in row if keyword.strip()] for row in rows if row + [keyword.upper() for keyword in row if keyword.strip()] + for row in rows + if row ] hierarchy_dict = {} for row in all_keywords: @@ -171,8 +161,8 @@ def _load_csvs(): for key, _ in GCMD_LINKS.items(): csvfile = open(SCHEMA_PATHS[key]) reader = csv.reader(csvfile) - next(reader) # Remove the metadata (1st column) - headers = next(reader) # Get the headers (2nd column) + next(reader) # Remove the metadata (1st column) + headers = next(reader) # Get the headers (2nd column) list_of_rows = list(reader) csvfile.close() content[key] = headers, list_of_rows @@ -207,10 +197,16 @@ def _read_from_csv(self, keyword_kind, columns=None, hierarchy=False): end = (headers.index(columns[-1]) + 1) if columns else None # handling cases when there are multiple entries for same shortname but the first entry has missing long name return_value = [ - [clean_keyword for keyword in useful_data if (clean_keyword := keyword.strip() or 'N/A')] + [ + clean_keyword + for keyword in useful_data + if (clean_keyword := keyword.strip() or "N/A") + ] for row in list_of_rows if ( - useful_data := row[start : end if end else (len(row) - 1)] # remove UUID (last column) + useful_data := row[ + start : end if end else (len(row) - 1) + ] # remove UUID (last column) ) ] return return_value @@ -235,7 +231,7 @@ def merge_dicts(parent, child): return parent, child else: for key in child: - if (parent.get(key) and not(parent.get(key) == LEAF)): + if parent.get(key) and not (parent.get(key) == LEAF): parent[key], _ = GcmdValidator.merge_dicts(parent[key], child[key]) else: parent[key] = child[key] @@ -392,13 +388,13 @@ def validate_horizontal_resolution_range(self, input_keyword): Validates the Horizontal Resolution Range against GCMD 
'horizontalresolutionrange' list """ return input_keyword in self.keywords["horizontalresolutionrange"] - + def validate_vertical_resolution_range(self, input_keyword): """ Validates the vertical Resolution Range against GCMD 'verticalresolutionrange' list """ return input_keyword in self.keywords["verticalresolutionrange"] - + def validate_temporal_resolution_range(self, input_keyword): """ Validates the temporal Resolution Range against GCMD 'temporalresolutionrange' list diff --git a/pyQuARC/code/scheduler.py b/pyQuARC/code/scheduler.py index f66e7e11..754e8ebc 100644 --- a/pyQuARC/code/scheduler.py +++ b/pyQuARC/code/scheduler.py @@ -3,7 +3,9 @@ class Scheduler: Schedules the rules based on the applicable ordering """ - def __init__(self, rule_mapping, rules_override, checks, checks_override, metadata_format): + def __init__( + self, rule_mapping, rules_override, checks, checks_override, metadata_format + ): self.check_list = {**checks, **checks_override} self.rule_mapping = {**rule_mapping, **rules_override} self.metadata_format = metadata_format @@ -50,11 +52,9 @@ def dependencies_ordering(self, dependencies, list): for dependency in dependencies: dependency_check = self.check_list.get(dependency[0]) if dependency_check.get("dependencies"): - self.dependencies_ordering( - dependency_check.get("dependencies"), list - ) + self.dependencies_ordering(dependency_check.get("dependencies"), list) Scheduler.append_if_not_exist(dependency[0], list) - + def _find_rule_ids_based_on_check_id(self, check_id): """ Returns all the rule_ids that are based on a check_id @@ -66,7 +66,9 @@ def _find_rule_ids_based_on_check_id(self, check_id): list: list of all the rule_ids that are based on the check_id """ return [ - rule_id for rule_id, rule in self.rule_mapping.items() if (rule.get("check_id") == check_id) or (rule_id == check_id) + rule_id + for rule_id, rule in self.rule_mapping.items() + if (rule.get("check_id") == check_id) or (rule_id == check_id) ] def 
order_rules(self): @@ -91,8 +93,6 @@ def order_rules(self): print(f"Missing entry for {check_id} in `checks.json`") for dependency in ordered_check_list: - ordered_rules.extend( - self._find_rule_ids_based_on_check_id(dependency) - ) + ordered_rules.extend(self._find_rule_ids_based_on_check_id(dependency)) return ordered_rules diff --git a/pyQuARC/code/schema_validator.py b/pyQuARC/code/schema_validator.py index ef3eac52..11b3f087 100644 --- a/pyQuARC/code/schema_validator.py +++ b/pyQuARC/code/schema_validator.py @@ -18,7 +18,9 @@ class SchemaValidator: PATH_SEPARATOR = "/" def __init__( - self, check_messages, metadata_format=ECHO10_C, + self, + check_messages, + metadata_format=ECHO10_C, ): """ Args: @@ -45,7 +47,9 @@ def read_xml_schema(self): # Path to catalog must be a url catalog_path = f"file:{pathname2url(str(SCHEMA_PATHS['catalog']))}" # Temporarily set the environment variable - os.environ['XML_CATALOG_FILES'] = os.environ.get('XML_CATALOG_FILES', catalog_path) + os.environ["XML_CATALOG_FILES"] = os.environ.get( + "XML_CATALOG_FILES", catalog_path + ) with open(SCHEMA_PATHS[f"{self.metadata_format}_schema"]) as schema_file: file_content = schema_file.read().encode() @@ -78,8 +82,8 @@ def run_json_validator(self, content_to_validate): # workaround to read local referenced schema file (only supports uri) schema_store = { - schema_base.get('$id','/umm-cmn-json-schema.json') : schema_base, - schema_base.get('$id','umm-cmn-json-schema.json') : schema_base, + schema_base.get("$id", "/umm-cmn-json-schema.json"): schema_base, + schema_base.get("$id", "umm-cmn-json-schema.json"): schema_base, } errors = {} @@ -90,18 +94,26 @@ def run_json_validator(self, content_to_validate): schema, format_checker=draft7_format_checker, resolver=resolver ) - for error in sorted(validator.iter_errors(json.loads(content_to_validate)), key=str): - field = SchemaValidator.PATH_SEPARATOR.join([str(x) for x in list(error.path)]) + for error in sorted( + 
validator.iter_errors(json.loads(content_to_validate)), key=str + ): + field = SchemaValidator.PATH_SEPARATOR.join( + [str(x) for x in list(error.path)] + ) message = error.message remediation = None - if error.validator == "oneOf" and (check_message := self.check_messages.get(error.validator)): - fields = [f'{field}/{obj["required"][0]}' for obj in error.validator_value] + if error.validator == "oneOf" and ( + check_message := self.check_messages.get(error.validator) + ): + fields = [ + f'{field}/{obj["required"][0]}' for obj in error.validator_value + ] message = check_message["failure"].format(fields) remediation = check_message["remediation"] errors.setdefault(field, {})["schema"] = { "message": [f"Error: {message}"], "remediation": remediation, - "valid": False + "valid": False, } return errors @@ -126,16 +138,14 @@ def _build_errors(error_log, paths): # the following 3 lines of code removes the namespace namespaces = re.findall("(\{http[^}]*\})", line) for namespace in namespaces: - line = line.replace(namespace, '') - field_name = re.search("Element\s\'(.*)\':", line)[1] - field_paths = [ - abs_path for abs_path in paths if field_name in abs_path - ] + line = line.replace(namespace, "") + field_name = re.search("Element\s'(.*)':", line)[1] + field_paths = [abs_path for abs_path in paths if field_name in abs_path] field_name = field_paths[0] if len(field_paths) == 1 else field_name - message = re.search("Element\s\'.+\':\s(\[.*\])?(.*)", line)[2].strip() + message = re.search("Element\s'.+':\s(\[.*\])?(.*)", line)[2].strip() errors.setdefault(field_name, {})["schema"] = { "message": [f"Error: {message}"], - "valid": False + "valid": False, } return errors @@ -159,7 +169,7 @@ def run_xml_validator(self, content_to_validate): # The validator only gives the field name, not full path # Getting this to map it to the full path later paths = [] - for node in doc.xpath('//*'): + for node in doc.xpath("//*"): if not node.getchildren() and node.text: 
paths.append(doc.getpath(node)[1:]) diff --git a/pyQuARC/code/string_validator.py b/pyQuARC/code/string_validator.py index e2993c15..809c8ca0 100644 --- a/pyQuARC/code/string_validator.py +++ b/pyQuARC/code/string_validator.py @@ -359,40 +359,43 @@ def idnnode_shortname_gcmd_check(resource_type): @staticmethod def _validate_against_collection(prm_value, entry_title, short_name, version, key): - cmr_prms = set_cmr_prms({ - "entry_title": entry_title, - "short_name": short_name, - "version": version - }, "umm_json") + cmr_prms = set_cmr_prms( + {"entry_title": entry_title, "short_name": short_name, "version": version}, + "umm_json", + ) if not (collection_in_cmr(cmr_prms)): return True - cmr_request_prms = f'{cmr_prms}&{key}={prm_value}' - hits = cmr_request(cmr_request_prms).get('hits', 0) + cmr_request_prms = f"{cmr_prms}&{key}={prm_value}" + hits = cmr_request(cmr_request_prms).get("hits", 0) return hits > 0 @staticmethod @if_arg - def granule_project_short_name_check(project_shortname, entry_title=None, short_name=None, version=None): - validity = StringValidator._validate_against_collection(project_shortname, entry_title, short_name, version, 'project') - return { - "valid": validity, - "value": project_shortname - } + def granule_project_short_name_check( + project_shortname, entry_title=None, short_name=None, version=None + ): + validity = StringValidator._validate_against_collection( + project_shortname, entry_title, short_name, version, "project" + ) + return {"valid": validity, "value": project_shortname} @staticmethod @if_arg - def granule_sensor_short_name_check(sensor_shortname, entry_title=None, short_name=None, version=None): - validity = StringValidator._validate_against_collection(sensor_shortname, entry_title, short_name, version, 'instrument') - return { - "valid": validity, - "value": sensor_shortname - } - + def granule_sensor_short_name_check( + sensor_shortname, entry_title=None, short_name=None, version=None + ): + validity = 
StringValidator._validate_against_collection( + sensor_shortname, entry_title, short_name, version, "instrument" + ) + return {"valid": validity, "value": sensor_shortname} + @staticmethod @if_arg - def validate_granule_instrument_against_collection(instrument_shortname, collection_shortname=None, version=None, dataset_id=None): + def validate_granule_instrument_against_collection( + instrument_shortname, collection_shortname=None, version=None, dataset_id=None + ): """ Validates the instrument shortname provided in the granule metadata against the instrument shortname provided at the collection level. @@ -406,15 +409,20 @@ def validate_granule_instrument_against_collection(instrument_shortname, collect Returns: (dict) An object with the validity of the check and the instance """ - validity = StringValidator._validate_against_collection(instrument_shortname, dataset_id, collection_shortname, version, "instrument") - return { - "valid": validity, - "value": instrument_shortname - } + validity = StringValidator._validate_against_collection( + instrument_shortname, + dataset_id, + collection_shortname, + version, + "instrument", + ) + return {"valid": validity, "value": instrument_shortname} @staticmethod @if_arg - def validate_granule_platform_against_collection(platform_shortname, collection_shortname=None, version=None, dataset_id=None): + def validate_granule_platform_against_collection( + platform_shortname, collection_shortname=None, version=None, dataset_id=None + ): """ Validates the platform shortname provided in the granule metadata against the platform shortname provided at the collection level. 
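The `_validate_against_collection` helper reformatted above builds a CMR query string from collection identifiers, appends the field being validated, and treats `hits > 0` as validity. A minimal sketch of that flow follows; `build_query` is a stand-in, since the real `set_cmr_prms` implementation is not shown in this patch and its exact behavior is assumed from the call sites.

```python
from urllib.parse import urlencode

def build_query(params, fmt="umm_json"):
    # Drop unset params and append the response format, mimicking the
    # set_cmr_prms(...) call sites visible in this patch (assumed behavior).
    filtered = {k: v for k, v in params.items() if v}
    return f"{urlencode(filtered)}&format={fmt}"

def is_valid_against_collection(hits):
    # The patched helper returns True when CMR reports at least one match.
    return hits > 0

query = build_query(
    {"short_name": "MOD13Q1", "version": "061", "entry_title": None}
)
```

`None`-valued parameters are filtered out so they never reach the query string, matching how the validators pass optional `entry_title` / `short_name` / `version` arguments.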
@@ -428,12 +436,11 @@ def validate_granule_platform_against_collection(platform_shortname, collection_ Returns: (dict) An object with the validity of the check and the instance """ - validity = StringValidator._validate_against_collection(platform_shortname, dataset_id, collection_shortname, version, "platform") - return { - "valid": validity, - "value": platform_shortname - } - + validity = StringValidator._validate_against_collection( + platform_shortname, dataset_id, collection_shortname, version, "platform" + ) + return {"valid": validity, "value": platform_shortname} + @if_arg def validate_granule_data_format_against_collection( granule_data_format, collection_shortname=None, version=None, dataset_id=None @@ -466,7 +473,7 @@ def validate_granule_data_format_against_collection( query_string = set_cmr_prms(params, "json") collection = cmr_request(query_string) - + if collection["feed"]["entry"]: return {"valid": True, "value": granule_data_format} return {"valid": False, "value": granule_data_format} diff --git a/pyQuARC/code/tracker.py b/pyQuARC/code/tracker.py index 8eed4816..201126d0 100644 --- a/pyQuARC/code/tracker.py +++ b/pyQuARC/code/tracker.py @@ -11,9 +11,7 @@ def __init__(self, rule_mapping, rules_override, metadata_format): metadata_format (str): The format of the metadata file (eg. 
echo10, dif10) """ self.data = Tracker.create_initial_track( - rule_mapping, - rules_override, - metadata_format + rule_mapping, rules_override, metadata_format ) @staticmethod @@ -49,11 +47,7 @@ def create_initial_track(rule_mapping, rules_override, metadata_format): rule = rules_override.get(rule_id) or rule_mapping.get(rule_id) for field in rule["fields_to_apply"].get(metadata_format, {}): data[rule_id].append( - { - "field": field["fields"][0], - "applied": False, - "valid": None - } + {"field": field["fields"][0], "applied": False, "valid": None} ) return data diff --git a/pyQuARC/code/url_validator.py b/pyQuARC/code/url_validator.py index 1ce194de..8e23b869 100644 --- a/pyQuARC/code/url_validator.py +++ b/pyQuARC/code/url_validator.py @@ -17,7 +17,7 @@ def __init__(self): @staticmethod def _extract_http_texts(text_with_urls): """ - Extracts anything that starts with 'http' from `text_with_urls`. + Extracts anything that starts with 'http' from `text_with_urls`. This is required for catching "wrong" urls that aren't extracted by `URLExtract.find_urls()` because they are not urls at all An example: https://randomurl Args: @@ -26,10 +26,10 @@ def _extract_http_texts(text_with_urls): Returns: (list) List of texts that start with 'http' from `text_with_urls` """ - texts = text_with_urls.split(' ') + texts = text_with_urls.split(" ") starts_with_http = set() for text in texts: - if text.startswith('http'): + if text.startswith("http"): starts_with_http.add(text) return starts_with_http @@ -47,8 +47,8 @@ def health_and_status_check(text_with_urls): def status_code_from_request(url): headers = get_headers() # timeout = 10 seconds, to allow for slow but not invalid connections - return requests.get(url, headers = headers, timeout=10).status_code - + return requests.get(url, headers=headers, timeout=10).status_code + results = [] validity = True @@ -56,9 +56,7 @@ def status_code_from_request(url): # extract URLs from text extractor = URLExtract() urls = 
extractor.find_urls(text_with_urls) - urls.extend( - UrlValidator._extract_http_texts(text_with_urls) - ) + urls.extend(UrlValidator._extract_http_texts(text_with_urls)) # remove dots at the end (The URLExtract library catches URLs, but sometimes appends a '.' at the end) # remove duplicated urls @@ -75,11 +73,14 @@ def status_code_from_request(url): if url.startswith("http://"): secure_url = url.replace("http://", "https://") if status_code_from_request(secure_url) == 200: - result = {"url": url, "error": "The URL is secure. Please use 'https' instead of 'http'."} + result = { + "url": url, + "error": "The URL is secure. Please use 'https' instead of 'http'.", + } else: continue else: - result = {"url": url, "error": f'Status code {response_code}'} + result = {"url": url, "error": f"Status code {response_code}"} except requests.ConnectionError: result = {"url": url, "error": "The URL does not exist on Internet."} except: @@ -102,21 +103,16 @@ def doi_check(doi): (dict) An object with the validity of the check and the instance/results """ valid = False - if doi.strip().startswith("10."): # doi always starts with "10." + if doi.strip().startswith("10."): # doi always starts with "10." 
            url = f"https://www.doi.org/{doi}"
-            valid = UrlValidator.health_and_status_check(url).get("valid") 
+            valid = UrlValidator.health_and_status_check(url).get("valid")
         return {"valid": valid, "value": doi}

     @staticmethod
     @if_arg
-    def doi_link_update(
-        value, bad_urls
-    ):
+    def doi_link_update(value, bad_urls):
         validity = True
         if value in bad_urls:
             validity = False
-        return {
-            "valid": validity,
-            "Value": value
-        }
+        return {"valid": validity, "value": value}
diff --git a/pyQuARC/code/utils.py b/pyQuARC/code/utils.py
index a6578c4e..80f5ab1a 100644
--- a/pyQuARC/code/utils.py
+++ b/pyQuARC/code/utils.py
@@ -13,12 +13,11 @@ def run_function_only_if_arg(*args):
         if args[0]:
             return func(*args)
         else:
-            return {
-                "valid": None,
-                "value": None
-            }
+            return {"valid": None, "value": None}
+
     return run_function_only_if_arg

+
 def get_headers():
     token = os.environ.get("AUTH_TOKEN")
     headers = None
@@ -26,34 +25,42 @@ def get_headers():
         headers = {"Authorization": f"Bearer {token}"}
     return headers

+
 def _add_protocol(url):
     if not url.startswith("http"):
         url = f"https://{url}"
     return url

+
 def is_valid_cmr_url(url):
     url = _add_protocol(url)
     valid = False
     headers = get_headers()
-    try: # some invalid url throw an exception
-        response = requests.get(url, headers=headers, timeout=5) # some invalid urls freeze
+    try:  # some invalid urls throw an exception
+        response = requests.get(
+            url, headers=headers, timeout=5
+        )  # some invalid urls freeze
     valid = response.status_code == 200 and response.headers.get("CMR-Request-Id")
     except:
         valid = False
     return valid

+
 def get_cmr_url():
     cmr_url = os.environ.get("CMR_URL", CMR_URL)
     return _add_protocol(cmr_url)

-def set_cmr_prms(params, format='json', type="collections"):
+
+def set_cmr_prms(params, format="json", type="collections"):
     base_url = f"{type}.{format}?" 
- params = { key:value for key, value in params.items() if value } + params = {key: value for key, value in params.items() if value} return f"{base_url}{urllib.parse.urlencode(params)}" + def cmr_request(cmr_prms): headers = get_headers() - return requests.get(f'{get_cmr_url()}/search/{cmr_prms}', headers=headers).json() + return requests.get(f"{get_cmr_url()}/search/{cmr_prms}", headers=headers).json() + def collection_in_cmr(cmr_prms): - return cmr_request(cmr_prms).get('hits', 0) > 0 + return cmr_request(cmr_prms).get("hits", 0) > 0 diff --git a/pyQuARC/main.py b/pyQuARC/main.py index 0bd0fd30..c0890e31 100644 --- a/pyQuARC/main.py +++ b/pyQuARC/main.py @@ -6,7 +6,7 @@ from tqdm import tqdm -if __name__ == '__main__': +if __name__ == "__main__": from code.checker import Checker from code.constants import COLOR, ECHO10_C, SUPPORTED_FORMATS from code.downloader import Downloader @@ -64,10 +64,11 @@ def __init__( self.errors = [] self.file_path = ( - file_path if file_path else os.path.join( - ABS_PATH, - f"../tests/fixtures/test_cmr_metadata.{metadata_format}" - ) + file_path + if file_path + else os.path.join( + ABS_PATH, f"../tests/fixtures/test_cmr_metadata.{metadata_format}" + ) ) self.metadata_format = metadata_format self.checks_override = checks_override @@ -93,7 +94,7 @@ def _cmr_query(self): # page_num query param already_selected = False - page_qparams = ['page_size', 'page_num', 'offset'] + page_qparams = ["page_size", "page_num", "offset"] for qparam in page_qparams: if qparam in self.query: already_selected = True @@ -103,10 +104,14 @@ def _cmr_query(self): collected = 0 page_num = 1 - orig_query = f"{self.query}&page_size={page_size}" if not already_selected else self.query + orig_query = ( + f"{self.query}&page_size={page_size}" + if not already_selected + else self.query + ) query = orig_query headers = get_headers() - + while True: response = requests.get(query, headers=headers) @@ -119,10 +124,7 @@ def _cmr_query(self): collected += 
len(collections) - concept_ids.extend([ - collection["id"] - for collection in collections - ]) + concept_ids.extend([collection["id"] for collection in collections]) if collected >= hits or already_selected: break @@ -143,34 +145,30 @@ def validate(self): metadata_format=self.metadata_format, checks_override=self.checks_override, rules_override=self.rules_override, - messages_override=self.messages_override + messages_override=self.messages_override, ) if self.concept_ids: for concept_id in tqdm(self.concept_ids): - downloader = Downloader(concept_id, self.metadata_format, self.version, self.cmr_host) + downloader = Downloader( + concept_id, self.metadata_format, self.version, self.cmr_host + ) if not (content := downloader.download()): self.errors.append( { "concept_id": concept_id, - "errors": [], - "pyquarc_errors": [ - { - "message": "No metadata content found. Please make sure the concept id is correct.", - "details": f"The request to CMR {self.cmr_host} failed for concept id {concept_id}", - } - ] + "errors": {}, + "pyquarc_errors": downloader.errors, } ) continue content = content.encode() - validation_errors, pyquarc_errors = checker.run(content) self.errors.append( { "concept_id": concept_id, "errors": validation_errors, - "pyquarc_errors": pyquarc_errors + "pyquarc_errors": pyquarc_errors, } ) @@ -183,7 +181,7 @@ def validate(self): { "file": self.file_path, "errors": validation_errors, - "pyquarc_errors": pyquarc_errors + "pyquarc_errors": pyquarc_errors, } ) return self.errors @@ -194,41 +192,46 @@ def _error_message(messages): result_string = "" for message in messages: colored_message = [ - message.replace( - text, - f"{COLOR[severity]}{text}{END}" - ) + message.replace(text, f"{COLOR[severity]}{text}{END}") for severity in severities if (text := severity.title()) and message.startswith(text) ][0] - result_string += (f"\t\t{colored_message}{END}\n") + result_string += f"\t\t{colored_message}{END}\n" return result_string def display_results(self): - 
result_string = ''' + result_string = """ ******************************** ** Metadata Validation Errors ** - ********************************\n''' + ********************************\n""" error_prompt = "" for error in self.errors: title = error.get("concept_id") or error.get("file") - error_prompt += (f"\n\t{COLOR['title']}{COLOR['bright']}METADATA: {title}{END}\n") + error_prompt += ( + f"\n\t{COLOR['title']}{COLOR['bright']}METADATA: {title}{END}\n" + ) validity = True for field, result in error["errors"].items(): for rule_type, value in result.items(): if not value.get("valid"): messages = value.get("message") - error_prompt += (f"\n\t>> {field}: {END}\n") + error_prompt += f"\n\t>> {field}: {END}\n" error_prompt += self._error_message(messages) - error_prompt += (f"\t\t{remedy}\n") if (remedy := value.get('remediation')) else "" + error_prompt += ( + (f"\t\t{remedy}\n") + if (remedy := value.get("remediation")) + else "" + ) validity = False if validity: error_prompt += "\n\tNo validation errors\n" - if (pyquarc_errors := error["pyquarc_errors"]): - error_prompt += (f"\n\t {COLOR['title']}{COLOR['bright']} pyQuARC ERRORS: {END}\n") + if pyquarc_errors := error["pyquarc_errors"]: + error_prompt += ( + f"\n\t {COLOR['title']}{COLOR['bright']} pyQuARC ERRORS: {END}\n" + ) for error in pyquarc_errors: - error_prompt += (f"\t\t ERROR: {error['message']}. Details: {error['details']} \n") + error_prompt += f"\t\t ERROR: {error['message']}. 
Details: {error['details']} \n" result_string += error_prompt print(result_string) @@ -236,14 +239,14 @@ def display_results(self): if __name__ == "__main__": """ - parse command line arguments (argparse) - --query - --concept_ids - --file - --fake - --format - --cmr_host - --version + parse command line arguments (argparse) + --query + --concept_ids + --file + --fake + --format + --cmr_host + --version """ parser = argparse.ArgumentParser() download_group = parser.add_mutually_exclusive_group() @@ -300,7 +303,7 @@ def display_results(self): "No metadata given, add --query or --concept_ids or --file or --fake" ) format = args.format or ECHO10_C - if (format not in SUPPORTED_FORMATS): + if format not in SUPPORTED_FORMATS: parser.error( f"The given format is not supported. Only {', '.join(SUPPORTED_FORMATS)} are supported." ) @@ -309,7 +312,7 @@ def display_results(self): if not is_valid_cmr_url(cmr_host): raise Exception(f"The given CMR host is not valid: {cmr_host}") os.environ["CMR_URL"] = cmr_host - + arc = ARC( query=args.query, input_concept_ids=args.concept_ids or [], diff --git a/pyQuARC/schemas/version.txt b/pyQuARC/schemas/version.txt index b1e54027..adcf29f0 100644 --- a/pyQuARC/schemas/version.txt +++ b/pyQuARC/schemas/version.txt @@ -1 +1 @@ -2022-09-15 +2023-04-24 \ No newline at end of file diff --git a/setup.py b/setup.py index 9472af12..9ab2b9b9 100644 --- a/setup.py +++ b/setup.py @@ -1,7 +1,7 @@ import setuptools from distutils.util import convert_path -version_path = convert_path('pyQuARC/version.txt') +version_path = convert_path("pyQuARC/version.txt") with open(version_path) as version_file: __version__ = version_file.read().strip() @@ -28,9 +28,9 @@ "License :: OSI Approved :: Apache License, Version 2.0", "Operating System :: OS Independent", ], - keywords='validation metadata cmr quality', - python_requires='>=3.8', + keywords="validation metadata cmr quality", + python_requires=">=3.8", install_requires=requirements, - 
package_data={'pyQuARC': ['schemas/*', '*.txt'], 'tests': ['fixtures/*']}, + package_data={"pyQuARC": ["schemas/*", "*.txt"], "tests": ["fixtures/*"]}, include_package_data=True, ) diff --git a/tests/common.py b/tests/common.py index ccc69c1f..7a3df4b7 100644 --- a/tests/common.py +++ b/tests/common.py @@ -4,7 +4,5 @@ def read_test_metadata(): - with open( - os.path.join(os.getcwd(), DUMMY_METADATA_FILE_PATH), "r" - ) as content_file: + with open(os.path.join(os.getcwd(), DUMMY_METADATA_FILE_PATH), "r") as content_file: return content_file.read().encode() diff --git a/tests/fixtures/checker.py b/tests/fixtures/checker.py index 19ab5543..e7fe09e6 100644 --- a/tests/fixtures/checker.py +++ b/tests/fixtures/checker.py @@ -8,9 +8,9 @@ "ContactPerson": { "FirstName": "SLESA", "LastName": "OSTRENGA", - "JobPosition": "METADATA AUTHOR" + "JobPosition": "METADATA AUTHOR", } - } + }, }, { "Role": "TECHNICAL CONTACT", @@ -19,19 +19,19 @@ { "FirstName": "DANA", "LastName": "OSTRENGA", - "JobPosition": "METADATA AUTHOR" + "JobPosition": "METADATA AUTHOR", }, { "FirstName": "MICHAEL", "LastName": "BOSILOVICH", - "JobPosition": "INVESTIGATOR" + "JobPosition": "INVESTIGATOR", }, { "blabla": "BOSILOVICH", - } + }, ] - } - } + }, + }, ] }, } @@ -39,17 +39,8 @@ FUNCTION_MAPPING = { "input": [ - { - "datatype": "datetime", - "function": "iso_format_check" - }, - { - "datatype": "datetime", - "function": "format_check" - } + {"datatype": "datetime", "function": "iso_format_check"}, + {"datatype": "datetime", "function": "format_check"}, ], - "output": [ - True, - False - ] + "output": [True, False], } diff --git a/tests/fixtures/custom_checker.py b/tests/fixtures/custom_checker.py index 41a395c7..08b63885 100644 --- a/tests/fixtures/custom_checker.py +++ b/tests/fixtures/custom_checker.py @@ -7,18 +7,25 @@ "Collection/ShortName", "Collection/DataSetId", "Collection/Contacts/Contact/Role", - "Collection/Platforms/Platform/Instruments/Instrument" + 
"Collection/Platforms/Platform/Instruments/Instrument", ], "output": [ ["ACOS_L2S"], - ["ACOS GOSAT/TANSO-FTS Level 2 Full Physics Standard Product V7.3 (ACOS_L2S) at GES DISC"], + [ + "ACOS GOSAT/TANSO-FTS Level 2 Full Physics Standard Product V7.3 (ACOS_L2S) at GES DISC" + ], ["ARCHIVER", "TECHNICAL CONTACT"], [ - OrderedDict([ - ("ShortName", "TANSO-FTS"), - ("LongName", "Thermal And Near Infrared Sensor For Carbon Observation") - ]) - ] - ] + OrderedDict( + [ + ("ShortName", "TANSO-FTS"), + ( + "LongName", + "Thermal And Near Infrared Sensor For Carbon Observation", + ), + ] + ) + ], + ], } } diff --git a/tests/fixtures/downloader.py b/tests/fixtures/downloader.py index 0db3279e..0967ef42 100644 --- a/tests/fixtures/downloader.py +++ b/tests/fixtures/downloader.py @@ -1,3 +1 @@ -{ - -} +{} diff --git a/tests/fixtures/test_check_files.py b/tests/fixtures/test_check_files.py index 2ce40b24..2e03cf52 100644 --- a/tests/fixtures/test_check_files.py +++ b/tests/fixtures/test_check_files.py @@ -10,16 +10,18 @@ import json import sys import os + sys.path.append(os.getcwd()) from pyQuARC.code.checker import Checker + # Opening JSON files -f = open('tests/fixtures/checks_dif10_master_test_file.json') -f2 = open('tests/fixtures/checks_echo-c_master_test_file.json') -f3 = open('tests/fixtures/checks_echo-g_master_test_file.json') -f4 = open('tests/fixtures/checks_umm-c_master_test_file.json') -f5 = open('tests/fixtures/checks_umm-g_master_test_file.json') -f6 = open('pyQuARC/schemas/rule_mapping.json') -f7 = open('pyQuARC/schemas/checks.json') +f = open("tests/fixtures/checks_dif10_master_test_file.json") +f2 = open("tests/fixtures/checks_echo-c_master_test_file.json") +f3 = open("tests/fixtures/checks_echo-g_master_test_file.json") +f4 = open("tests/fixtures/checks_umm-c_master_test_file.json") +f5 = open("tests/fixtures/checks_umm-g_master_test_file.json") +f6 = open("pyQuARC/schemas/rule_mapping.json") +f7 = open("pyQuARC/schemas/checks.json") # returning JSON objects 
as dictionaries dif10_checks = json.load(f) echo_c_checks = json.load(f2) @@ -30,164 +32,201 @@ checks = json.load(f7) # note: field / relation / data values should be added to test files in valid / invalid lists code_checker = Checker() -format_dict = {'dif10': dif10_checks, 'echo-c': echo_c_checks, 'echo-g': echo_g_checks, 'umm-c': umm_c_checks, 'umm-g': umm_g_checks} -format_in = '' -rule = '' -check_id = '' -data_type = '' -check_function = '' +format_dict = { + "dif10": dif10_checks, + "echo-c": echo_c_checks, + "echo-g": echo_g_checks, + "umm-c": umm_c_checks, + "umm-g": umm_g_checks, +} +format_in = "" +rule = "" +check_id = "" +data_type = "" +check_function = "" + # functions to call validator function and return what is returned from validator function when given valid or invalid values as arguments def DatetimeValidator_iso_format_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error (within DatetimeValidator_iso_format_check_test function)' + return "error (within DatetimeValidator_iso_format_check_test function)" + + def DatetimeValidator_compare_test(val_function, value): dependency = code_checker.map_to_function("datetime", "iso_format_check") dependency_bool = True - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 for x in value: temp = DatetimeValidator_iso_format_check_test(dependency, value[i]) - if temp['valid'] == False: - return 'Fails dependency' - if (isinstance(value[0],str)): + if temp["valid"] == False: + return "Fails dependency" + if isinstance(value[0], str): temp = DatetimeValidator_iso_format_check_test(dependency, value) - if temp['valid'] == False: - return 'Fails dependency' + if temp["valid"] == False: + return "Fails 
dependency" try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1], value[i][2])) i = i + 1 return return_list - if (isinstance(value[0],str)): - try: + if isinstance(value[0], str): + try: DatetimeValidator_iso_format_check_test(dependency, value[0]) except: - print('dependency error') + print("dependency error") return val_function(value[0], value[1], value[2]) except: - return 'error' + return "error" + + def DatetimeValidator_date_or_datetime_format_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + + def UrlValidator_health_and_status_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + + def DOI_update_check(val_function, value): try: - if (isinstance(value[0],list)): - i = 0 - return_list = [] - for x in value: - return_list.append(val_function(value[i][0], value[i][1])) - i = i +1 - return return_list - elif (isinstance(value[0],str)): - return val_function(value[0], value[1]) + if isinstance(value[0], list): + i = 0 + return_list = [] + for x in value: + return_list.append(val_function(value[i][0], value[i][1])) + i = i + 1 + return return_list + elif isinstance(value[0], str): + return val_function(value[0], value[1]) except: - return 'error' + return "error" + def CustomValidator_one_item_presence_check_test(val_function, value): try: - if 
(isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(*value[i])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(*value) except: - return 'error' + return "error" + + def StringValidator_compare_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1], value[i][2])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2]) except: - return 'error' + return "error" + + def CustomValidator_availability_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1]) except: - return 'error' + return "error" + + def StringValidator_science_keywords_gcmd_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: - return_list.append(val_function(value[i][0], value[i][1], value[i][2], value[i][3], value[i][4], value[i][5])) + return_list.append( + val_function( + value[i][0], + value[i][1], + value[i][2], + value[i][3], + value[i][4], + value[i][5], + ) + ) i = i + 1 return return_list - if (isinstance(value[0],str)): - return val_function(value[0], value[1], value[2], value[3], value[4], value[5]) + if isinstance(value[0], str): + return val_function( + value[0], value[1], value[2], value[3], value[4], value[5] + ) except: - return 'error' + return "error" + + def StringValidator_location_gcmd_check_test(val_function, value): try: - if 
(isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(*value[i])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(*value) except: - return 'error' + return "error" + + def CustomValidator_ends_at_present_flag_logic_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: @@ -199,892 +238,1296 @@ def CustomValidator_ends_at_present_flag_logic_check_test(val_function, value): print("error") i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2]) except: - return 'error' + return "error" + + def CustomValidator_ends_at_present_flag_presence_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: if value[i][0] == "": return_list.append(val_function(None, value[i][1], value[i][2])) else: - return_list.append(val_function(value[i][0], value[i][1], value[i][2])) + return_list.append( + val_function(value[i][0], value[i][1], value[i][2]) + ) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): if value[0] == "": return val_function(None, value[1], value[2]) else: return val_function(value[0], value[1], value[2]) except: - return 'error' + return "error" + + def CustomValidator_mime_type_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1], value[i][2])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2]) except: - return 'error' + return "error" + + def 
CustomValidator_bounding_coordinate_logic_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: - return_list.append(val_function(value[i][0], value[i][1], value[i][2], value[i][3])) + return_list.append( + val_function(value[i][0], value[i][1], value[i][2], value[i][3]) + ) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2], value[3]) except: - return 'error' + return "error" + + def CustomValidator_user_services_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1], value[i][2])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2]) except: - return 'error' + return "error" + + def CustomValidator_doi_missing_reason_explanation_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1], value[i][2])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2]) except: - return 'error' -def DatetimeValidator_validate_ending_datetime_against_granules_test(val_function, value): + return "error" + + +def DatetimeValidator_validate_ending_datetime_against_granules_test( + val_function, value +): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1], value[i][2])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2]) except: - return 'error' 
-def DatetimeValidator_validate_beginning_datetime_against_granules_test(val_function, value): + return "error" + + +def DatetimeValidator_validate_beginning_datetime_against_granules_test( + val_function, value +): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1], value[i][2])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2]) except: - return 'error' + return "error" + + def StringValidator_controlled_keywords_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1]) except: - return 'error' + return "error" def UrlValidator_Url_check(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) - i = i +1 + i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + + def CustomValidator_count_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1], value[i][2])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2]) except: - return 'error' + return "error" + + def CustomValidator_collection_progress_consistency_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 
return_list = [] for x in value: return_list.append(val_function(value[i][0], value[i][1], value[i][2])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0], value[1], value[2]) except: - return 'error' + return "error" + + def StringValidator_organization_short_name_gcmd_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + + def UrlValidator_doi_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) - i = i +1 + i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + + def StringValidator_organization_long_name_gcmd_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" def UrlValidator_get_data_url_check(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) - i = i +1 + i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + def StringValidator_instrument_short_name_gcmd_check_test(val_function, value): try: - if 
(isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + def CustomValidator_shortname_uniqueness(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) - i = i +1 + i = i + 1 return return_list - if (isinstance(value[0], str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + def StringValidator_instrument_long_name_gcmd_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + + def StringValidator_platform_short_name_gcmd_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + + def StringValidator_data_format_gcmd_check_test(val_function, value): try: - if (isinstance(value[0],list)): + if isinstance(value[0], list): i = 0 return_list = [] for x in value: return_list.append(val_function(value[i][0])) i = i + 1 return return_list - if (isinstance(value[0],str)): + if isinstance(value[0], str): return val_function(value[0]) except: - return 'error' + return "error" + def StringValidator_platform_long_name_gcmd_check_test(val_function, value): 
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_spatial_keyword_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_platform_type_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
 def StringValidator_abstract_length_check(val_function, value):
     try:
-        if(isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
-                i = i +1
+                i = i + 1
             return return_list
-        if (isinstance(value[0], str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'warning'
+        return "warning"
+
 def StringValidator_characteristic_name_length_check(val_function, value):
     try:
-        if(isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
-                i = i +1
+                i = i + 1
             return return_list
-        if (isinstance(value[0], str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
 def StringValidator_campaign_short_name_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_characteristic_desc_length_check(val_function, value):
     try:
-        if(isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0], value[i][1], value[i][2]))
-                i = i +1
+                i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0], value[1], value[2])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_Campaign_long_name_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_horizontal_range_res_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_vertical_range_res_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_temporal_range_res_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_mime_type_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_idnnode_shortname_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_chrono_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(*value[i]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(*value)
     except:
-        return 'error'
+        return "error"
 def StringValidator_characteristic_unit_length_check(val_function, value):
     try:
-        if(isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0], value[i][1], value[i][2]))
-                i = i +1
+                i = i + 1
             return return_list
-        if (isinstance(value[0], str)):
+        if isinstance(value[0], str):
             return val_function(value[0], value[1], value[2])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_length_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0], value[i][1], value[i][2]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0], value[1], value[2])
     except:
-        return 'error'
+        return "error"
+
 def StringValidator_characteristic_value_length_check(val_function, value):
     try:
-        if(isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
-                i = i +1
+                i = i + 1
             return return_list
-        if (isinstance(value[0], str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
-def StringValidator_validate_granule_instrument_against_collection_test(val_function, value):
+        return "error"
+
+
+def StringValidator_validate_granule_instrument_against_collection_test(
+    val_function, value
+):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
-                return_list.append(val_function(value[i][0], value[i][1], value[i][2], value[i][3]))
+                return_list.append(
+                    val_function(value[i][0], value[i][1], value[i][2], value[i][3])
+                )
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0], value[1], value[2], value[3])
     except:
-        return 'error'
+        return "error"
+
+
 def CustomValidator_boolean_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_online_resource_type_gcmd_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0])
     except:
-        return 'error'
-
+        return "error"
+
 def CustomValidator_uniqueness_check_test(val_function, value):
     try:
-        if (isinstance(value[0][0],list)):
+        if isinstance(value[0][0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0], value[i][1]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0][0],dict)):
+        if isinstance(value[0][0], dict):
             return val_function(value[0], value[1])
     except:
-        return 'error'
-def StringValidator_validate_granule_platform_against_collection_test(val_function, value):
+        return "error"
+
+
+def StringValidator_validate_granule_platform_against_collection_test(
+    val_function, value
+):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
-                return_list.append(val_function(value[i][0], value[i][1], value[i][2], value[i][3]))
+                return_list.append(
+                    val_function(value[i][0], value[i][1], value[i][2], value[i][3])
+                )
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0], value[1], value[2], value[3])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_granule_project_short_name_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
-                return_list.append(val_function(value[i][0], value[i][1], value[i][2], value[i][3]))
+                return_list.append(
+                    val_function(value[i][0], value[i][1], value[i][2], value[i][3])
+                )
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0], value[1], value[2], value[3])
     except:
-        return 'error'
+        return "error"
+
+
 def StringValidator_granule_sensor_short_name_check_test(val_function, value):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
-                return_list.append(val_function(value[i][0], value[i][1], value[i][2], value[i][3]))
+                return_list.append(
+                    val_function(value[i][0], value[i][1], value[i][2], value[i][3])
+                )
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0], value[1], value[2], value[3])
     except:
+        return "error"
-        return 'error'
-def StringValidator_validate_granule_data_format_against_collection_test(val_function, value):
+
+def StringValidator_validate_granule_data_format_against_collection_test(
+    val_function, value
+):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
-                return_list.append(val_function(value[i][0], value[i][1], value[i][2], value[i][3]))
+                return_list.append(
+                    val_function(value[i][0], value[i][1], value[i][2], value[i][3])
+                )
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(value[0], value[1], value[2], value[3])
     except:
-        return 'error'
-def StringValidator_organization_short_long_name_consistency_check_test(val_function, value):
+        return "error"
+
+
+def StringValidator_organization_short_long_name_consistency_check_test(
+    val_function, value
+):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(*value[i]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(*value)
     except:
-        return 'error'
-def StringValidator_instrument_short_long_name_consistency_check_test(val_function, value):
+        return "error"
+
+
+def StringValidator_instrument_short_long_name_consistency_check_test(
+    val_function, value
+):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(*value[i]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(*value)
     except:
-        return 'error'
-def StringValidator_platform_short_long_name_consistency_check_test(val_function, value):
+        return "error"
+
+
+def StringValidator_platform_short_long_name_consistency_check_test(
+    val_function, value
+):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(*value[i]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(*value)
     except:
-        return 'error'
-def StringValidator_campaign_short_long_name_consistency_check_test(val_function, value):
+        return "error"
+
+
+def StringValidator_campaign_short_long_name_consistency_check_test(
+    val_function, value
+):
     try:
-        if (isinstance(value[0],list)):
+        if isinstance(value[0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(*value[i]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0],str)):
+        if isinstance(value[0], str):
             return val_function(*value)
     except:
-        return 'error'
+        return "error"
+
+
 def CustomValidator_get_data_url_check_test(val_function, value):
     try:
-        if (isinstance(value[0][0],list)):
+        if isinstance(value[0][0], list):
             i = 0
             return_list = []
             for x in value:
                 return_list.append(val_function(value[i][0], value[i][1]))
                 i = i + 1
             return return_list
-        if (isinstance(value[0][0],dict)):
+        if isinstance(value[0][0], dict):
            return val_function(value[0], value[1])
     except:
-        return 'error'
+        return "error"
+
+
 # iterate through metadata formats
 for k in format_dict.keys():
     format_in = k
-    print('\n----------------------------------------------\n')
-    print(f'Test output for format {format_in}:')
-    print('\n----------------------------------------------\n')
+    print("\n----------------------------------------------\n")
+    print(f"Test output for format {format_in}:")
+    print("\n----------------------------------------------\n")
     format_choice = format_dict[format_in]
     # iterating through the json
     for i in format_choice:
         if i in rule_mapping:
             try:
-                rule = rule_mapping[i]['rule_name']
-                print(f'rule_name: {rule}')
+                rule = rule_mapping[i]["rule_name"]
+                print(f"rule_name: {rule}")
             except KeyError:
                 pass
             try:
-                check_id = rule_mapping[i]['check_id']
-                print(f'check_id: {check_id}')
+                check_id = rule_mapping[i]["check_id"]
+                print(f"check_id: {check_id}")
             except KeyError:
                 pass
             if check_id in checks:
-                data_type = checks[check_id]['data_type']
-                check_function = checks[check_id]['check_function']
+                data_type = checks[check_id]["data_type"]
+                check_function = checks[check_id]["check_function"]
                 val_function = code_checker.map_to_function(data_type, check_function)
                 val_function_name = f"{data_type.title()}Validator.{check_function}"
                 print(f"validator function: {val_function_name}")
-                valid = format_choice[i]['valid']
-                invalid = format_choice[i]['invalid']
+                valid = format_choice[i]["valid"]
+                invalid = format_choice[i]["invalid"]
                 print("test output:")
-                if val_function_name == 'DatetimeValidator.compare':
-                    print(f"with valid test input: {DatetimeValidator_compare_test(val_function, valid)}")
-                    print(f"with invalid test input: {DatetimeValidator_compare_test(val_function, invalid)}")
-                if val_function_name == 'UrlValidator.doi_link_update':
-                    print(f"with valid test input: {DOI_update_check(val_function, valid)}")
-                    print(f"with invalid test input: {DOI_update_check(val_function, invalid)}")
-                if val_function_name == 'DatetimeValidator.date_or_datetime_format_check':
-                    print(f"with valid test input: {DatetimeValidator_date_or_datetime_format_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {DatetimeValidator_date_or_datetime_format_check_test(val_function, invalid)}")
+                if val_function_name == "DatetimeValidator.compare":
+                    print(
+                        f"with valid test input: {DatetimeValidator_compare_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {DatetimeValidator_compare_test(val_function, invalid)}"
+                    )
+                if val_function_name == "UrlValidator.doi_link_update":
+                    print(
+                        f"with valid test input: {DOI_update_check(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {DOI_update_check(val_function, invalid)}"
+                    )
+                if (
+                    val_function_name
+                    == "DatetimeValidator.date_or_datetime_format_check"
+                ):
+                    print(
+                        f"with valid test input: {DatetimeValidator_date_or_datetime_format_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {DatetimeValidator_date_or_datetime_format_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, DatetimeValidator_date_or_datetime_format_check_test, valid, invalid)
-                if val_function_name == 'UrlValidator.doi_check':
-                    print(f"with valid test input: {UrlValidator_doi_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {UrlValidator_doi_check_test(val_function, invalid)}")
+                if val_function_name == "UrlValidator.doi_check":
+                    print(
+                        f"with valid test input: {UrlValidator_doi_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {UrlValidator_doi_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, DatetimeValidator_date_or_datetime_format_check_test, valid, invalid)
-                if val_function_name == 'UrlValidator.health_and_status_check':
-                    print(f"with valid test input: {UrlValidator_health_and_status_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {UrlValidator_health_and_status_check_test(val_function, invalid)}")
+                if val_function_name == "UrlValidator.health_and_status_check":
+                    print(
+                        f"with valid test input: {UrlValidator_health_and_status_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {UrlValidator_health_and_status_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, UrlValidator_health_and_status_check_test, valid, invalid)
-                if val_function_name == 'CustomValidator.one_item_presence_check':
-                    print(f"with valid test input: {CustomValidator_one_item_presence_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_one_item_presence_check_test(val_function, invalid)}")
+                if val_function_name == "CustomValidator.one_item_presence_check":
+                    print(
+                        f"with valid test input: {CustomValidator_one_item_presence_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {CustomValidator_one_item_presence_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, CustomValidator_one_item_presence_check_test, valid, invalid)
-                if val_function_name == 'StringValidator.compare':
-                    print(f"with valid test input: {StringValidator_compare_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_compare_test(val_function, invalid)}")
+                if val_function_name == "StringValidator.compare":
+                    print(
+                        f"with valid test input: {StringValidator_compare_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {StringValidator_compare_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, StringValidator_compare_test, valid, invalid)
-                if val_function_name == 'CustomValidator.availability_check':
-                    print(f"with valid test input: {CustomValidator_availability_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_availability_check_test(val_function, invalid)}")
+                if val_function_name == "CustomValidator.availability_check":
+                    print(
+                        f"with valid test input: {CustomValidator_availability_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {CustomValidator_availability_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, CustomValidator_availability_check_test, valid, invalid)
-                if val_function_name == 'StringValidator.science_keywords_gcmd_check':
-                    print(f"with valid test input: {StringValidator_science_keywords_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_science_keywords_gcmd_check_test(val_function, invalid)}")
+                if val_function_name == "StringValidator.science_keywords_gcmd_check":
+                    print(
+                        f"with valid test input: {StringValidator_science_keywords_gcmd_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {StringValidator_science_keywords_gcmd_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, StringValidator_science_keywords_gcmd_check_test, valid, invalid)
-                if val_function_name == 'StringValidator.location_gcmd_check':
-                    print(f"with valid test input: {StringValidator_location_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_location_gcmd_check_test(val_function, invalid)}")
+                if val_function_name == "StringValidator.location_gcmd_check":
+                    print(
+                        f"with valid test input: {StringValidator_location_gcmd_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {StringValidator_location_gcmd_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, StringValidator_location_gcmd_check_test, valid, invalid)
-                if val_function_name == 'CustomValidator.ends_at_present_flag_logic_check':
-                    print(f"with valid test input: {CustomValidator_ends_at_present_flag_logic_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_ends_at_present_flag_logic_check_test(val_function, invalid)}")
+                if (
+                    val_function_name
+                    == "CustomValidator.ends_at_present_flag_logic_check"
+                ):
+                    print(
+                        f"with valid test input: {CustomValidator_ends_at_present_flag_logic_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {CustomValidator_ends_at_present_flag_logic_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, CustomValidator_ends_at_present_flag_logic_check_test, valid, invalid)
-                if val_function_name == 'CustomValidator.ends_at_present_flag_presence_check':
-                    print(f"with valid test input: {CustomValidator_ends_at_present_flag_presence_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_ends_at_present_flag_presence_check_test(val_function, invalid)}")
+                if (
+                    val_function_name
+                    == "CustomValidator.ends_at_present_flag_presence_check"
+                ):
+                    print(
+                        f"with valid test input: {CustomValidator_ends_at_present_flag_presence_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {CustomValidator_ends_at_present_flag_presence_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, CustomValidator_ends_at_present_flag_presence_check_test, valid, invalid)
-                if val_function_name == 'CustomValidator.mime_type_check':
-                    print(f"with valid test input: {CustomValidator_mime_type_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_mime_type_check_test(val_function, invalid)}")
+                if val_function_name == "CustomValidator.mime_type_check":
+                    print(
+                        f"with valid test input: {CustomValidator_mime_type_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {CustomValidator_mime_type_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, CustomValidator_mime_type_check_test, valid, invalid)
-                if val_function_name == 'CustomValidator.bounding_coordinate_logic_check':
-                    print(f"with valid test input: {CustomValidator_bounding_coordinate_logic_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_bounding_coordinate_logic_check_test(val_function, invalid)}")
+                if (
+                    val_function_name
+                    == "CustomValidator.bounding_coordinate_logic_check"
+                ):
+                    print(
+                        f"with valid test input: {CustomValidator_bounding_coordinate_logic_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {CustomValidator_bounding_coordinate_logic_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, CustomValidator_bounding_coordinate_logic_check_test, valid, invalid)
-                if val_function_name == 'CustomValidator.user_services_check':
-                    print(f"with valid test input: {CustomValidator_user_services_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_user_services_check_test(val_function, invalid)}")
+                if val_function_name == "CustomValidator.user_services_check":
+                    print(
+                        f"with valid test input: {CustomValidator_user_services_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {CustomValidator_user_services_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, CustomValidator_user_services_check_test, valid, invalid)
-                if val_function_name == 'CustomValidator.doi_missing_reason_explanation':
-                    print(f"with valid test input: {CustomValidator_doi_missing_reason_explanation_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_doi_missing_reason_explanation_test(val_function, invalid)}")
+                if (
+                    val_function_name
+                    == "CustomValidator.doi_missing_reason_explanation"
+                ):
+                    print(
+                        f"with valid test input: {CustomValidator_doi_missing_reason_explanation_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {CustomValidator_doi_missing_reason_explanation_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, CustomValidator_doi_missing_reason_explanation_test, valid, invalid)
-                if val_function_name == 'DatetimeValidator.validate_ending_datetime_against_granules':
-                    print(f"with valid test input: {DatetimeValidator_validate_ending_datetime_against_granules_test(val_function, valid)}")
-                    print(f"with invalid test input: {DatetimeValidator_validate_ending_datetime_against_granules_test(val_function, invalid)}")
+                if (
+                    val_function_name
+                    == "DatetimeValidator.validate_ending_datetime_against_granules"
+                ):
+                    print(
+                        f"with valid test input: {DatetimeValidator_validate_ending_datetime_against_granules_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {DatetimeValidator_validate_ending_datetime_against_granules_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, DatetimeValidator_validate_ending_datetime_against_granules_test, valid, invalid)
-                if val_function_name == 'DatetimeValidator.validate_beginning_datetime_against_granules':
-                    print(f"with valid test input: {DatetimeValidator_validate_beginning_datetime_against_granules_test(val_function, valid)}")
-                    print(f"with invalid test input: {DatetimeValidator_validate_beginning_datetime_against_granules_test(val_function, invalid)}")
+                if (
+                    val_function_name
+                    == "DatetimeValidator.validate_beginning_datetime_against_granules"
+                ):
+                    print(
+                        f"with valid test input: {DatetimeValidator_validate_beginning_datetime_against_granules_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {DatetimeValidator_validate_beginning_datetime_against_granules_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, DatetimeValidator_validate_beginning_datetime_against_granules_test, valid, invalid)
-                if val_function_name == 'StringValidator.controlled_keywords_check':
-                    print(f"with valid test input: {StringValidator_controlled_keywords_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_controlled_keywords_check_test(val_function, invalid)}")
+                if val_function_name == "StringValidator.controlled_keywords_check":
+                    print(
+                        f"with valid test input: {StringValidator_controlled_keywords_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {StringValidator_controlled_keywords_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, StringValidator_controlled_keywords_check_test, valid, invalid)
-                if val_function_name == 'CustomValidator.count_check':
-                    print(f"with valid test input: {CustomValidator_count_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_count_check_test(val_function, invalid)}")
+                if val_function_name == "CustomValidator.count_check":
+                    print(
+                        f"with valid test input: {CustomValidator_count_check_test(val_function, valid)}"
+                    )
+                    print(
+                        f"with invalid test input: {CustomValidator_count_check_test(val_function, invalid)}"
+                    )
                 # assert_func(val_function, CustomValidator_count_check_test, valid, invalid)
-                if val_function_name == 'CustomValidator.collection_progress_consistency_check':
-                    print(f"with valid test input: {CustomValidator_collection_progress_consistency_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_collection_progress_consistency_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.organization_short_name_gcmd_check':
-                    print(f"with valid test input: {StringValidator_organization_short_name_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_organization_short_name_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.organization_long_name_gcmd_check':
-                    print(f"with valid test input: {StringValidator_organization_long_name_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_organization_long_name_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.instrument_short_name_gcmd_check':
-                    print(f"with valid test input: {StringValidator_instrument_short_name_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_instrument_short_name_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.instrument_long_name_gcmd_check':
-                    print(f"with valid test input: {StringValidator_instrument_long_name_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_instrument_long_name_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.platform_short_name_gcmd_check':
-                    print(f"with valid test input: {StringValidator_platform_short_name_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_platform_short_name_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.data_format_gcmd_check':
-                    print(f"with valid test input: {StringValidator_data_format_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_data_format_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.platform_long_name_gcmd_check':
-                    print(f"with valid test input: {StringValidator_platform_long_name_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_platform_long_name_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.spatial_keyword_gcmd_check':
-                    print(f"with valid test input: {StringValidator_spatial_keyword_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_spatial_keyword_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.platform_type_gcmd_check':
-                    print(f"with valid test input: {StringValidator_platform_type_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_platform_type_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.campaign_short_name_gcmd_check':
-                    print(f"with valid test input: {StringValidator_campaign_short_name_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_campaign_short_name_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.campaign_long_name_gcmd_check':
-                    print(f"with valid test input: {StringValidator_Campaign_long_name_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_Campaign_long_name_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.horizontal_range_res_gcmd_check':
-                    print(f"with valid test input: {StringValidator_horizontal_range_res_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_horizontal_range_res_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.vertical_range_res_gcmd_check':
-                    print(f"with valid test input: {StringValidator_vertical_range_res_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_vertical_range_res_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.temporal_range_res_gcmd_check':
-                    print(f"with valid test input: {StringValidator_temporal_range_res_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_temporal_range_res_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.mime_type_gcmd_check':
-                    print(f"with valid test input: {StringValidator_mime_type_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_mime_type_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.idnnode_shortname_gcmd_check':
-                    print(f"with valid test input: {StringValidator_idnnode_shortname_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_idnnode_shortname_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.chrono_gcmd_check':
-                    print(f"with valid test input: {StringValidator_chrono_gcmd_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_chrono_gcmd_check_test(val_function, invalid)}")
-                if val_function_name == 'DatetimeValidator.iso_format_check':
-                    print(f"with valid test input: {DatetimeValidator_iso_format_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {DatetimeValidator_iso_format_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.length_check':
-                    print(f"with valid test input: {StringValidator_length_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_length_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.validate_granule_instrument_against_collection':
-                    print(f"with valid test input: {StringValidator_validate_granule_instrument_against_collection_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_validate_granule_instrument_against_collection_test(val_function, invalid)}")
-                if val_function_name == 'CustomValidator.boolean_check':
-                    print(f"with valid test input: {CustomValidator_boolean_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_boolean_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.online_resource_type_gcmd_check':
-                    print(f"with valid test input: {CustomValidator_boolean_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_boolean_check_test(val_function, invalid)}")
-                if val_function_name == 'CustomValidator.uniqueness_check':
-                    print(f"with valid test input: {CustomValidator_uniqueness_check_test(val_function, valid)}")
-                    print(f"with invalid test input: {CustomValidator_uniqueness_check_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.validate_granule_platform_against_collection':
-                    print(f"with valid test input: {StringValidator_validate_granule_platform_against_collection_test(val_function, valid)}")
-                    print(f"with invalid test input: {StringValidator_validate_granule_platform_against_collection_test(val_function, invalid)}")
-                if val_function_name == 'StringValidator.granule_project_short_name_check':
-                    print(f"with valid test input: 
{StringValidator_granule_project_short_name_check_test(val_function, valid)}") - print(f"with invalid test input: {StringValidator_granule_project_short_name_check_test(val_function, invalid)}") - if val_function_name == 'StringValidator.granule_sensor_short_name_check': - print(f"with valid test input: {StringValidator_granule_sensor_short_name_check_test(val_function, valid)}") - print(f"with invalid test input: {StringValidator_granule_sensor_short_name_check_test(val_function, invalid)}") - if val_function_name == 'StringValidator.validate_granule_data_format_against_collection': - print(f"with valid test input: {StringValidator_validate_granule_data_format_against_collection_test(val_function, valid)}") - print(f"with invalid test input: {StringValidator_validate_granule_data_format_against_collection_test(val_function, invalid)}") - if val_function_name == 'StringValidator.organization_short_long_name_consistency_check': - print(f"with valid test input: {StringValidator_organization_short_long_name_consistency_check_test(val_function, valid)}") - print(f"with invalid test input: {StringValidator_organization_short_long_name_consistency_check_test(val_function, invalid)}") - if val_function_name == 'StringValidator.instrument_short_long_name_consistency_check': - print(f"with valid test input: {StringValidator_instrument_short_long_name_consistency_check_test(val_function, valid)}") - print(f"with invalid test input: {StringValidator_instrument_short_long_name_consistency_check_test(val_function, invalid)}") - if val_function_name == 'StringValidator.platform_short_long_name_consistency_check': - print(f"with valid test input: {StringValidator_platform_short_long_name_consistency_check_test(val_function, valid)}") - print(f"with invalid test input: {StringValidator_platform_short_long_name_consistency_check_test(val_function, invalid)}") - if val_function_name == 'StringValidator.campaign_short_long_name_consistency_check': - print(f"with valid test input: 
{StringValidator_campaign_short_long_name_consistency_check_test(val_function, valid)}") - print(f"with invalid test input: {StringValidator_campaign_short_long_name_consistency_check_test(val_function, invalid)}") - if val_function_name == 'CustomValidator.get_data_url_check': - print(f"with valid test input: {CustomValidator_get_data_url_check_test(val_function, valid)}") - print(f"with invalid test input: {CustomValidator_get_data_url_check_test(val_function, invalid)}") # possibly: - create a list of validator check test functions + if ( + val_function_name + == "CustomValidator.collection_progress_consistency_check" + ): + print( + f"with valid test input: {CustomValidator_collection_progress_consistency_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {CustomValidator_collection_progress_consistency_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.organization_short_name_gcmd_check" + ): + print( + f"with valid test input: {StringValidator_organization_short_name_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_organization_short_name_gcmd_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.organization_long_name_gcmd_check" + ): + print( + f"with valid test input: {StringValidator_organization_long_name_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_organization_long_name_gcmd_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.instrument_short_name_gcmd_check" + ): + print( + f"with valid test input: {StringValidator_instrument_short_name_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_instrument_short_name_gcmd_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.instrument_long_name_gcmd_check" + ): + print( + f"with valid test input: 
{StringValidator_instrument_long_name_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_instrument_long_name_gcmd_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.platform_short_name_gcmd_check" + ): + print( + f"with valid test input: {StringValidator_platform_short_name_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_platform_short_name_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.data_format_gcmd_check": + print( + f"with valid test input: {StringValidator_data_format_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_data_format_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.platform_long_name_gcmd_check": + print( + f"with valid test input: {StringValidator_platform_long_name_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_platform_long_name_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.spatial_keyword_gcmd_check": + print( + f"with valid test input: {StringValidator_spatial_keyword_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_spatial_keyword_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.platform_type_gcmd_check": + print( + f"with valid test input: {StringValidator_platform_type_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_platform_type_gcmd_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.campaign_short_name_gcmd_check" + ): + print( + f"with valid test input: {StringValidator_campaign_short_name_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: 
{StringValidator_campaign_short_name_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.campaign_long_name_gcmd_check": + print( + f"with valid test input: {StringValidator_Campaign_long_name_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_Campaign_long_name_gcmd_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.horizontal_range_res_gcmd_check" + ): + print( + f"with valid test input: {StringValidator_horizontal_range_res_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_horizontal_range_res_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.vertical_range_res_gcmd_check": + print( + f"with valid test input: {StringValidator_vertical_range_res_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_vertical_range_res_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.temporal_range_res_gcmd_check": + print( + f"with valid test input: {StringValidator_temporal_range_res_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_temporal_range_res_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.mime_type_gcmd_check": + print( + f"with valid test input: {StringValidator_mime_type_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_mime_type_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.idnnode_shortname_gcmd_check": + print( + f"with valid test input: {StringValidator_idnnode_shortname_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_idnnode_shortname_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.chrono_gcmd_check": + print( + 
f"with valid test input: {StringValidator_chrono_gcmd_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_chrono_gcmd_check_test(val_function, invalid)}" + ) + if val_function_name == "DatetimeValidator.iso_format_check": + print( + f"with valid test input: {DatetimeValidator_iso_format_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {DatetimeValidator_iso_format_check_test(val_function, invalid)}" + ) + if val_function_name == "StringValidator.length_check": + print( + f"with valid test input: {StringValidator_length_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_length_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.validate_granule_instrument_against_collection" + ): + print( + f"with valid test input: {StringValidator_validate_granule_instrument_against_collection_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_validate_granule_instrument_against_collection_test(val_function, invalid)}" + ) + if val_function_name == "CustomValidator.boolean_check": + print( + f"with valid test input: {CustomValidator_boolean_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {CustomValidator_boolean_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.online_resource_type_gcmd_check" + ): + print( + f"with valid test input: {CustomValidator_boolean_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {CustomValidator_boolean_check_test(val_function, invalid)}" + ) + if val_function_name == "CustomValidator.uniqueness_check": + print( + f"with valid test input: {CustomValidator_uniqueness_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {CustomValidator_uniqueness_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == 
"StringValidator.validate_granule_platform_against_collection" + ): + print( + f"with valid test input: {StringValidator_validate_granule_platform_against_collection_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_validate_granule_platform_against_collection_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.granule_project_short_name_check" + ): + print( + f"with valid test input: {StringValidator_granule_project_short_name_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_granule_project_short_name_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.granule_sensor_short_name_check" + ): + print( + f"with valid test input: {StringValidator_granule_sensor_short_name_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_granule_sensor_short_name_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.validate_granule_data_format_against_collection" + ): + print( + f"with valid test input: {StringValidator_validate_granule_data_format_against_collection_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_validate_granule_data_format_against_collection_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.organization_short_long_name_consistency_check" + ): + print( + f"with valid test input: {StringValidator_organization_short_long_name_consistency_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_organization_short_long_name_consistency_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.instrument_short_long_name_consistency_check" + ): + print( + f"with valid test input: {StringValidator_instrument_short_long_name_consistency_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: 
{StringValidator_instrument_short_long_name_consistency_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.platform_short_long_name_consistency_check" + ): + print( + f"with valid test input: {StringValidator_platform_short_long_name_consistency_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_platform_short_long_name_consistency_check_test(val_function, invalid)}" + ) + if ( + val_function_name + == "StringValidator.campaign_short_long_name_consistency_check" + ): + print( + f"with valid test input: {StringValidator_campaign_short_long_name_consistency_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {StringValidator_campaign_short_long_name_consistency_check_test(val_function, invalid)}" + ) + if val_function_name == "CustomValidator.get_data_url_check": + print( + f"with valid test input: {CustomValidator_get_data_url_check_test(val_function, valid)}" + ) + print( + f"with invalid test input: {CustomValidator_get_data_url_check_test(val_function, invalid)}" + ) # possibly: - create a list of validator check test functions # - see if modified val_function_name in function list (ex: f"{data_type.title()}Validator_{check_function}_test) # - call this func with valid and invalid values - print('----------------------------------------------') + print("----------------------------------------------") # close files f.close() f2.close() diff --git a/tests/fixtures/validator.py b/tests/fixtures/validator.py index 061bb21b..5c2f717b 100644 --- a/tests/fixtures/validator.py +++ b/tests/fixtures/validator.py @@ -12,20 +12,20 @@ { "input": "2016-06-1400:00:00.000", "output": False, - } + }, ], "get_path_value": [ { "input": "Contacts/Contact/ContactPersons/ContactPerson/glabb", - "output": set() + "output": set(), }, { "input": "Contacts/Contact/ContactPersons/ContactPerson/blabla", - "output": {"BOSILOVICH"} + "output": {"BOSILOVICH"}, }, { "input": 
"Contacts/Contact/ContactPersons/ContactPerson/FirstName", - "output": {"DANA", "SLESA", "MICHAEL"} - } - ] + "output": {"DANA", "SLESA", "MICHAEL"}, + }, + ], } diff --git a/tests/test_custom_checker.py b/tests/test_custom_checker.py index 9611406a..9a830191 100644 --- a/tests/test_custom_checker.py +++ b/tests/test_custom_checker.py @@ -16,10 +16,7 @@ def setup_method(self): def test_get_path_value(self): in_out = INPUT_OUTPUT["get_path_value"] for _in, _out in zip(in_out["input"], in_out["output"]): - assert CustomChecker._get_path_value( - self.dummy_metadata, - _in - ) == _out + assert CustomChecker._get_path_value(self.dummy_metadata, _in) == _out dummy_dif_metadata = { "CollectionCitations": [ @@ -27,29 +24,20 @@ def test_get_path_value(self): "Creator": "Kamel Didan", "OnlineResource": { "Linkage": "https://doi.org/10.5067/MODIS/MOD13Q1.061", - "Name": "DOI Landing Page" + "Name": "DOI Landing Page", }, "OtherCitationDetails": "The DOI landing page provides citations in APA and Chicago styles.", "Publisher": "NASA EOSDIS Land Processes DAAC", "ReleaseDate": "2021-02-16", "SeriesName": "MOD13Q1.061", - "Title": "MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061" + "Title": "MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061", } ], "MetadataDates": [ - { - "Type": "CREATE", - "Date": "2021-09-15T15:54:00.000Z" - }, - { - "Type": "UPDATE", - "Date": "2021-09-30T15:54:00.000Z" - } + {"Type": "CREATE", "Date": "2021-09-15T15:54:00.000Z"}, + {"Type": "UPDATE", "Date": "2021-09-30T15:54:00.000Z"}, ], - "DOI": { - "Authority": "https://doi.org", - "DOI": "10.5067/MODIS/MOD13Q1.061" - }, + "DOI": {"Authority": "https://doi.org", "DOI": "10.5067/MODIS/MOD13Q1.061"}, "SpatialExtent": { "GranuleSpatialRepresentation": "GEODETIC", "HorizontalSpatialDomain": { @@ -59,10 +47,10 @@ def test_get_path_value(self): "EastBoundingCoordinate": 180.0, "NorthBoundingCoordinate": 85, "SouthBoundingCoordinate": 89, - "WestBoundingCoordinate": -180.0 
+ "WestBoundingCoordinate": -180.0, } ], - "CoordinateSystem": "CARTESIAN" + "CoordinateSystem": "CARTESIAN", }, "ResolutionAndCoordinateSystem": { "HorizontalDataResolution": { @@ -70,48 +58,42 @@ def test_get_path_value(self): { "Unit": "Meters", "XDimension": 250.0, - "YDimension": 250.0 + "YDimension": 250.0, } ] } }, - "ZoneIdentifier": "MODIS Sinusoidal Tiling System" + "ZoneIdentifier": "MODIS Sinusoidal Tiling System", }, - "SpatialCoverageType": "HORIZONTAL" - } + "SpatialCoverageType": "HORIZONTAL", + }, } assert CustomChecker._get_path_value( - dummy_dif_metadata, - "CollectionCitations/Creator" + dummy_dif_metadata, "CollectionCitations/Creator" ) == ["Kamel Didan"] assert CustomChecker._get_path_value( - dummy_dif_metadata, - "CollectionCitations/OnlineResource/Name" + dummy_dif_metadata, "CollectionCitations/OnlineResource/Name" ) == ["DOI Landing Page"] assert CustomChecker._get_path_value( - dummy_dif_metadata, - "MetadataDates/Date?Type=UPDATE" + dummy_dif_metadata, "MetadataDates/Date?Type=UPDATE" ) == ["2021-09-30T15:54:00.000Z"] assert CustomChecker._get_path_value( - dummy_dif_metadata, - "MetadataDates/Date?Type=CREATE" + dummy_dif_metadata, "MetadataDates/Date?Type=CREATE" ) == ["2021-09-15T15:54:00.000Z"] - assert CustomChecker._get_path_value( - dummy_dif_metadata, - "DOI/DOI" - ) == ["10.5067/MODIS/MOD13Q1.061"] + assert CustomChecker._get_path_value(dummy_dif_metadata, "DOI/DOI") == [ + "10.5067/MODIS/MOD13Q1.061" + ] assert CustomChecker._get_path_value( dummy_dif_metadata, - "SpatialExtent/HorizontalSpatialDomain/Geometry/BoundingRectangles/WestBoundingCoordinate" + "SpatialExtent/HorizontalSpatialDomain/Geometry/BoundingRectangles/WestBoundingCoordinate", ) == [-180.0] assert CustomChecker._get_path_value( - dummy_dif_metadata, - "SpatialExtent/GranuleSpatialRepresentation" + dummy_dif_metadata, "SpatialExtent/GranuleSpatialRepresentation" ) == ["GEODETIC"] diff --git a/tests/test_datetime_validator.py 
b/tests/test_datetime_validator.py index cc13d1b5..9f9c0262 100644 --- a/tests/test_datetime_validator.py +++ b/tests/test_datetime_validator.py @@ -4,7 +4,7 @@ class TestValidator: """ - Test cases for the validator script in validator.py + Test cases for the validator script in validator.py """ def setup_method(self): @@ -13,8 +13,7 @@ def setup_method(self): def test_datetime_iso_format_check(self): for input_output in INPUT_OUTPUT["date_datetime_iso_format_check"]: assert ( - DatetimeValidator.iso_format_check( - input_output["input"])["valid"] + DatetimeValidator.iso_format_check(input_output["input"])["valid"] ) == input_output["output"] def test_datetime_compare(self): diff --git a/tests/test_downloader.py b/tests/test_downloader.py index e646c0ce..1e0689be 100644 --- a/tests/test_downloader.py +++ b/tests/test_downloader.py @@ -34,8 +34,7 @@ def test_download(self): def test_concept_id_type_collection(self): assert ( - Downloader._concept_id_type( - self.concept_ids["collection"]["dummy"]) + Downloader._concept_id_type(self.concept_ids["collection"]["dummy"]) == Downloader.COLLECTION ) @@ -73,8 +72,7 @@ def test_log_error(self): dummy_granule = self.concept_ids["granule"]["dummy"] downloader = Downloader(dummy_granule, "echo-g") - downloader.log_error("invalid_concept_id", { - "concept_id": dummy_granule}) + downloader.log_error("invalid_concept_id", {"concept_id": dummy_granule}) downloader.log_error( "request_failed", diff --git a/tests/test_schema_validator.py b/tests/test_schema_validator.py index 197ad843..3e919b40 100644 --- a/tests/test_schema_validator.py +++ b/tests/test_schema_validator.py @@ -2,9 +2,7 @@ from xmltodict import parse from pyQuARC.code.schema_validator import SchemaValidator -KEYS = [ - "no_error_metadata", "bad_syntax_metadata", "test_cmr_metadata" -] +KEYS = ["no_error_metadata", "bad_syntax_metadata", "test_cmr_metadata"] class TestSchemaValidator: @@ -17,17 +15,11 @@ def read_data(self): for data_key in KEYS: # 
os.path.join(os.getcwd(), DUMMY_METADATA_FILE_PATH) with open( - os.path.join( - os.getcwd(), - f"tests/fixtures/{data_key}.echo-c" - ), - "r" + os.path.join(os.getcwd(), f"tests/fixtures/{data_key}.echo-c"), "r" ) as myfile: result[data_key] = myfile.read().encode() return result def test_xml_validator(self): for data_key in KEYS: - assert self.schema_validator.run_xml_validator( - self.data[data_key] - ) + assert self.schema_validator.run_xml_validator(self.data[data_key]) diff --git a/tests/test_string_validator.py b/tests/test_string_validator.py index d228b8a0..b56b7874 100644 --- a/tests/test_string_validator.py +++ b/tests/test_string_validator.py @@ -3,7 +3,7 @@ class TestValidator: """ - Test cases for the validator script in validator.py + Test cases for the validator script in validator.py """ def setup_method(self): From 49fba9233f5b2368ac05a6d70c959fea0f306655 Mon Sep 17 00:00:00 2001 From: Slesa Adhikari Date: Mon, 24 Apr 2023 17:01:12 -0500 Subject: [PATCH 16/46] Update tests --- tests/test_downloader.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tests/test_downloader.py b/tests/test_downloader.py index 1e0689be..06dd9652 100644 --- a/tests/test_downloader.py +++ b/tests/test_downloader.py @@ -123,6 +123,7 @@ def test_download_dummy_collection_no_errors(self): "concept_id": dummy_collection, "url": f"https://cmr.earthdata.nasa.gov/search/concepts/{dummy_collection}.echo10", "status_code": 404, + "message": "Something went wrong while downloading the requested metadata. Make sure all the inputs are correct.", }, } ] @@ -150,6 +151,7 @@ def test_download_dummy_granule_no_errors(self): "concept_id": dummy_granule, "url": f"https://cmr.earthdata.nasa.gov/search/concepts/{dummy_granule}.echo10", "status_code": 404, + "message": "Something went wrong while downloading the requested metadata. 
Make sure all the inputs are correct.", }, } ] From b3c7ac4ac1fbd43bd5bceb6eec3d61762eea48bf Mon Sep 17 00:00:00 2001 From: Slesa Adhikari Date: Mon, 24 Apr 2023 17:11:20 -0500 Subject: [PATCH 17/46] Add `details` to downloader error --- pyQuARC/code/downloader.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/pyQuARC/code/downloader.py b/pyQuARC/code/downloader.py index 82cd8582..2bcb2975 100644 --- a/pyQuARC/code/downloader.py +++ b/pyQuARC/code/downloader.py @@ -107,10 +107,11 @@ def download(self): # gets the response, makes sure it's 200, puts it in an object variable if response.status_code != 200: + message = "Something went wrong while downloading the requested metadata. Make sure all the inputs are correct." try: - message = json.loads(response.text).get("errors") + details = json.loads(response.text).get("errors") except (json.decoder.JSONDecodeError, KeyError): - message = "Something went wrong while downloading the requested metadata. Make sure all the inputs are correct." + details = "N/A" self.log_error( "request_failed", { @@ -118,6 +119,7 @@ def download(self): "url": url, "status_code": response.status_code, "message": message, + "details": details, }, ) return From 3923bb901aa8774abe45100e1e353a91b934b47c Mon Sep 17 00:00:00 2001 From: Slesa Adhikari Date: Mon, 24 Apr 2023 17:14:04 -0500 Subject: [PATCH 18/46] Update tests --- tests/test_downloader.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tests/test_downloader.py b/tests/test_downloader.py index 06dd9652..ddd7d5db 100644 --- a/tests/test_downloader.py +++ b/tests/test_downloader.py @@ -124,6 +124,7 @@ def test_download_dummy_collection_no_errors(self): "url": f"https://cmr.earthdata.nasa.gov/search/concepts/{dummy_collection}.echo10", "status_code": 404, "message": "Something went wrong while downloading the requested metadata. 
Make sure all the inputs are correct.", + "details": "N/A", }, } ] From 083d32e905a0a075e63364c7e8de24f280f39278 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Mon, 1 May 2023 15:17:13 -0500 Subject: [PATCH 19/46] added specific DIF10 check for standard product --- pyQuARC/code/custom_validator.py | 22 ++++++++++++++++++++++ pyQuARC/schemas/check_messages.json | 8 ++++++++ pyQuARC/schemas/checks.json | 5 +++++ pyQuARC/schemas/rule_mapping.json | 22 ++++++++++++++-------- 4 files changed, 49 insertions(+), 8 deletions(-) diff --git a/pyQuARC/code/custom_validator.py b/pyQuARC/code/custom_validator.py index 726201b2..8fbe84d5 100644 --- a/pyQuARC/code/custom_validator.py +++ b/pyQuARC/code/custom_validator.py @@ -94,6 +94,28 @@ def one_item_presence_check(*field_values): break return {"valid": validity, "value": value} + + @staticmethod + def dif_standard_product_check(*field_values): + """ + Checks whether the Extended_Metadata field in the DIF schema is being + used to specify that the collection is a Standard Product. + This check is needed because the DIF schema does not have a dedicated + field for Standard Product, and the Extended_Metadata field is also + used for other purposes. 
+ """ + validity = False + value = None + + for field_value in field_values: + if field_value: + if 'StandardProduct' in field_value: + value = field_value + validity = True + break + return {"valid": validity, "value": value} @staticmethod def granule_sensor_presence_check(sensor_values, collection_shortname=None, version=None, dataset_id=None): diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index d149391b..e6c91768 100644 --- a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -319,6 +319,14 @@ }, "remediation": "Recommend indicating whether this is a StandardProduct. For information please see: https://wiki.earthdata.nasa.gov/display/CMR/StandardProduct" }, + "dif_standard_product_check": { + "failure": "The Standard Product is missing.", + "help": { + "message": "", + "url": "" + }, + "remediation": "Recommend indicating whether this is a StandardProduct. For information please see: https://wiki.earthdata.nasa.gov/display/CMR/StandardProduct" + }, "validate_granule_instrument_against_collection": { "failure": "The instrument short name listed in the granule metadata does not match the instrument short name listed in the collection metadata.", "help": { diff --git a/pyQuARC/schemas/checks.json b/pyQuARC/schemas/checks.json index bf83e4bb..30a68f48 100644 --- a/pyQuARC/schemas/checks.json +++ b/pyQuARC/schemas/checks.json @@ -194,6 +194,11 @@ "check_function": "one_item_presence_check", "available": true }, + "dif_standard_product_check": { + "data_type": "custom", + "check_function": "dif_standard_product_check", + "available": true + }, "doi_link_update": { "data_type": "url", "check_function": "doi_link_update", diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 1c6fb17c..866ed940 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -2577,14 +2577,6 @@ ] } ], - "dif10": [ { "fields": [ - 
"DIF/Metadata/Value", - "DIF/Extended_Metadata/Metadata/Value" - ] - } - ], "umm-c": [ { "fields": [ @@ -2596,6 +2588,20 @@ "severity": "warning", "check_id": "one_item_presence_check" }, + "dif_standard_product_check": { + "rule_name": "Standard Product Check", + "fields_to_apply": { + "dif10": [ + { + "fields": [ + "DIF/Extended_Metadata/Metadata/Name" + ] + } + ] + }, + "severity": "warning", + "check_id": "dif_standard_product_check" + }, "validate_granule_instrument_against_collection": { "rule_name": "Granule Instrument Short Name Check", "fields_to_apply": { From bae889c21733fc0bd893012a5a290a182fadc163 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Mon, 1 May 2023 15:39:56 -0500 Subject: [PATCH 20/46] updated output messages for standard product fails --- pyQuARC/schemas/check_messages.json | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index e6c91768..cbb7ca34 100644 --- a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -312,20 +312,20 @@ "remediation": "Recommend providing the instrument short name." }, "standard_product_check": { - "failure": "The Standard Product is missing.", + "failure": "The Standard Product flag is missing.", "help": { - "message": "", - "url": "" + "message": "For information please see:", + "url": "https://wiki.earthdata.nasa.gov/display/CMR/StandardProduct" }, - "remediation": "Recommend indicating whether this is a StandardProduct. For information please see: https://wiki.earthdata.nasa.gov/display/CMR/StandardProduct" + "remediation": "Recommend indicating whether this is a StandardProduct." 
}, "dif_standard_product_check": { - "failure": "The Standard Product is missing.", + "failure": "The Standard Product flag is missing.", "help": { - "message": "", - "url": "" + "message": "For information please see:", + "url": "https://wiki.earthdata.nasa.gov/display/CMR/StandardProduct" }, - "remediation": "Recommend indicating whether this is a StandardProduct. For information please see: https://wiki.earthdata.nasa.gov/display/CMR/StandardProduct" + "remediation": "Recommend indicating whether this is a StandardProduct." }, "validate_granule_instrument_against_collection": { "failure": "The instrument short name listed in the granule metadata does not match the instrument short name listed in the collection metadata.", From 214a27e679e226156a03d9b8c0d0ee4bacb68292 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Tue, 2 May 2023 12:33:38 -0500 Subject: [PATCH 21/46] changed description fields to licensetext fields for license information check --- pyQuARC/schemas/rule_mapping.json | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 3a0136ae..07b4b9f7 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -3691,7 +3691,7 @@ { "fields": [ "Collection/UseConstraints/LicenseURL/URL", - "Collection/UseConstraints/LicenseURL/Description" + "Collection/UseConstraints/LicenseText" ] } ], @@ -3699,7 +3699,7 @@ { "fields": [ "DIF/Use_Constraints/License_URL/URL", - "DIF/Use_Constraints/License_URL/Description" + "DIF/Use_Constraints/License_URL/License_Text" ] } ], @@ -3707,7 +3707,7 @@ { "fields": [ "UseConstraints/LicenseURL/Linkage", - "UseConstraints/LicenseURL/Description" + "UseConstraints/LicenseText" ] } ] From 75716a7d5549cfcfcc4200967002c7e38d7f19b9 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Tue, 2 May 2023 12:35:34 -0500 Subject: [PATCH 22/46] added URL fields to license URL description check --- 
pyQuARC/schemas/rule_mapping.json | 3 +++ 1 file changed, 3 insertions(+) diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 07b4b9f7..ad80c4e3 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -3721,6 +3721,7 @@ "echo-c": [ { "fields": [ + "Collection/UseConstraints/LicenseURL/URL", "Collection/UseConstraints/LicenseURL/Description" ] } @@ -3728,6 +3729,7 @@ "dif10": [ { "fields": [ + "DIF/Use_Constraints/License_URL/URL", "DIF/Use_Constraints/License_URL/Description" ] } @@ -3735,6 +3737,7 @@ "umm-c": [ { "fields": [ + "UseConstraints/LicenseURL/Linkage", "UseConstraints/LicenseURL/Description" ] } From df870c9ae0d238f4f76daf78fc69e44f4643dbb9 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Tue, 2 May 2023 13:03:38 -0500 Subject: [PATCH 23/46] added a check for license URL description --- pyQuARC/code/custom_validator.py | 20 ++++++++++++++++++++ pyQuARC/schemas/checks.json | 5 +++++ pyQuARC/schemas/rule_mapping.json | 2 +- 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/pyQuARC/code/custom_validator.py b/pyQuARC/code/custom_validator.py index 726201b2..fe17b1b0 100644 --- a/pyQuARC/code/custom_validator.py +++ b/pyQuARC/code/custom_validator.py @@ -94,6 +94,26 @@ def one_item_presence_check(*field_values): break return {"valid": validity, "value": value} + + @staticmethod + def license_url_description_check(url_field, description_field): + """ + Determines if a description has been provided for the License URL if a + License URL has been provided in the metadata. 
+ + Args: + url_field (string): license URL string + description_field (string): string describing the URL + """ + validity = True + value = description_field + + if not url_field: + return {"valid": validity, "value": value} + else: + if not description_field: + validity = False + return {"valid": validity, "value": value} @staticmethod def granule_sensor_presence_check(sensor_values, collection_shortname=None, version=None, dataset_id=None): diff --git a/pyQuARC/schemas/checks.json b/pyQuARC/schemas/checks.json index bf83e4bb..05737e1e 100644 --- a/pyQuARC/schemas/checks.json +++ b/pyQuARC/schemas/checks.json @@ -229,6 +229,11 @@ "check_function": "online_resource_type_gcmd_check", "available": true }, + "license_url_description_check": { + "data_type": "custom", + "check_function": "license_url_description_check", + "available": true + }, "uniqueness_check": { "data_type": "custom", "check_function": "uniqueness_check", diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index ad80c4e3..5132d526 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -3744,7 +3744,7 @@ ] }, "severity": "warning", - "check_id": "one_item_presence_check" + "check_id": "license_url_description_check" }, "collection_citation_presence_check": { "rule_name": "Collection Citation Presence Check", From d3177042789b107d2dd0559e4a494302fad5427f Mon Sep 17 00:00:00 2001 From: Shelby Bagwell <70609840+svbagwell@users.noreply.github.com> Date: Thu, 4 May 2023 12:24:24 -0500 Subject: [PATCH 24/46] Update pyQuARC/schemas/rule_mapping.json --- pyQuARC/schemas/rule_mapping.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 5132d526..6527e043 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -3699,7 +3699,7 @@ { "fields": [ "DIF/Use_Constraints/License_URL/URL", - 
"DIF/Use_Constraints/License_URL/License_Text" + "DIF/Use_Constraints/License_Text" ] } ], From ca0b2e4bad488f9332171b9854a140f3195cf3a4 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Thu, 4 May 2023 15:25:36 -0500 Subject: [PATCH 25/46] updated echo-10 schema --- pyQuARC/schemas/MetadataCommon.xsd | 63 +++++++++++++++- pyQuARC/schemas/echo-c_schema.xsd | 112 ++++++++++++++++++++++++----- 2 files changed, 156 insertions(+), 19 deletions(-) diff --git a/pyQuARC/schemas/MetadataCommon.xsd b/pyQuARC/schemas/MetadataCommon.xsd index 78caaaa2..b1639527 100644 --- a/pyQuARC/schemas/MetadataCommon.xsd +++ b/pyQuARC/schemas/MetadataCommon.xsd @@ -354,7 +354,7 @@ - + @@ -678,6 +678,7 @@ + @@ -695,4 +696,64 @@ + + + This element stores DOIs that are associated with the collection + such as from campaigns and other related sources. Note: The values + should start with the directory indicator which in ESDIS' case is 10. + If the DOI was registered through ESDIS, the beginning of the string + should be 10.5067. The DOI URL is not stored here; it should be + stored as a RelatedURL. The DOI organization that is responsible + for creating the DOI is described in the Authority element. For + ESDIS records the value of https://doi.org/ should be used. + + + + + + This element stores the DOI (Digital Object Identifier) that + identifies an associated collection. Note: The values should + start with the directory indicator which in ESDIS' case is 10. + If the DOI was registered through ESDIS, the beginning of the + string should be 10.5067. The DOI URL is not stored here; it + should be stored as an OnlineResource. + + + + + + + + + + + + The title of the DOI landing page. The title describes the + DOI object to a user, so they don't have to look it up themselves to + understand the association. + + + + + + + + + + + + The DOI organization that is responsible for creating the + associated DOI is described in the Authority element. 
+ For ESDIS records the value of https://doi.org/ should be used. + + + + + + + + + + + diff --git a/pyQuARC/schemas/echo-c_schema.xsd b/pyQuARC/schemas/echo-c_schema.xsd index edfa6a22..b61b2808 100644 --- a/pyQuARC/schemas/echo-c_schema.xsd +++ b/pyQuARC/schemas/echo-c_schema.xsd @@ -228,37 +228,48 @@ xmlns:xs="http://www.w3.org/2001/XMLSchema"> This element stores the DOI (Digital Object Identifier) that identifies the collection. Note: The values should start with the directory indicator which in ESDIS' case is 10. If the DOI was registered through ESDIS, the beginning of the string should be 10.5067. The DOI URL is not stored here; it should be stored as an OnlineResource. The DOI organization that is responsible for creating the DOI is described in the Authority element. For ESDIS records the value of https://doi.org/ should be used. While this element is not required, NASA metadata providers are strongly encouraged to include DOI and DOI Authority for their collections. For those that want to specify that a DOI is not applicable for their record use the second option. + + + This element stores DOIs that are associated with the collection + such as from campaigns and other related sources. Note: The values + should start with the directory indicator which in ESDIS' case is 10. + If the DOI was registered through ESDIS, the beginning of the string + should be 10.5067. The DOI URL is not stored here; it should be + stored as a RelatedURL. The DOI organization that is responsible + for creating the DOI is described in the Authority element. For + ESDIS records the value of https://doi.org/ should be used. + + + - An element to identify - non-science-quality products such as NRT data. If a - collection does not contain this field, it will be - assumed to be of science-quality. + This element is used to identify the collections ready for end user consumption latency from when the data was acquired by an instrument. 
NEAR_REAL_TIME is defined to be ready for end user consumption 1 to 3 hours after data acquisition. LOW_LATENCY is defined to be ready for consumption 3 to 24 hours after data acquisition. EXPEDITED is defined to be 1 to 4 days after data acquisition. SCIENCE_QUALITY is defined to mean that a collection has been fully and completely processed which usually takes between 2 to 3 weeks after data acquisition. OTHER is defined for collection where the latency is between EXPEDITED and SCIENCE_QUALITY. - All EOS and non-EOS data and - data products that are archived - by EOSDIS. + All data products that have been fully and completely processed which usually takes between 2 to 3 weeks after data acquisition. - Data from the source that are - available for use within a time that is short in - comparison to important time scales in the phenomena - being studied. This data is not science quality and - is not retained by EOSDIS once the SCIENCE_QUALITY - product is archived + The data is ready for end user consumption 1 to 3 hours after data acquisition. This data is not fully processed and is not retained by EOSDIS once the SCIENCE_QUALITY product is archived. + + + + + The data is ready for end user consumption 3 to 24 hours after data acquisition. This data is not fully processed and is not retained by EOSDIS once the SCIENCE_QUALITY product is archived. + + + + + The data is ready for end user consumption 1 to 4 days after data acquisition. This data is not fully processed and is not retained by EOSDIS once the SCIENCE_QUALITY product is archived. - Any EOS and non-EOS data and data - products, that are not SCIENCE_QUALITY and do not - fall under NEAR_REAL_TIME holdings. + The data is ready for end user consumption between the EXPEDITED and SCIENCE_QUALITY acquisition timeframes. This data is not fully processed and is not retained by EOSDIS once the SCIENCE_QUALITY product is archived. 
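The revised CollectionDataType latency categories introduced in this hunk can be summarized programmatically; the following sketch is illustrative only (the dict and helper names are hypothetical and are not part of the ECHO-10 schema or pyQuARC), with the latency windows taken from the documentation text above:

```python
# Latency windows (from data acquisition to end-user availability) for the
# revised CollectionDataType enumeration, per the schema documentation.
COLLECTION_DATA_TYPE_LATENCY = {
    "NEAR_REAL_TIME": "1 to 3 hours after data acquisition",
    "LOW_LATENCY": "3 to 24 hours after data acquisition",
    "EXPEDITED": "1 to 4 days after data acquisition",
    "SCIENCE_QUALITY": "fully processed, usually 2 to 3 weeks after data acquisition",
    "OTHER": "between EXPEDITED and SCIENCE_QUALITY",
}

def is_valid_collection_data_type(value):
    """Return True if value is one of the allowed enumeration values."""
    return value in COLLECTION_DATA_TYPE_LATENCY
```

Note that the old enumeration allowed only SCIENCE_QUALITY, NEAR_REAL_TIME, and OTHER; LOW_LATENCY and EXPEDITED are new in this change.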
@@ -544,6 +555,66 @@ xmlns:xs="http://www.w3.org/2001/XMLSchema"> + + + + + + + + + Defines the possible values for the Amazon Web Service US Regions + where the data product resides. + + + + + + + + + + + + Defines the possible values for the Amazon Web Service US S3 bucket + and/or object prefix names. + + + + + + + + + + + Defines the URL where the credentials are stored. + + + + + + + + + + + Defines the URL where the credential documentation are stored. + + + + + + + + + + + + + + @@ -559,8 +630,13 @@ xmlns:xs="http://www.w3.org/2001/XMLSchema"> + + + This sub-element if true, describes to end users and machines that this collection's data is free of charge and open for any use the user sees fit. + + - + This element holds the actual license text. If this element is used the LicenseUrl element cannot be used. @@ -2188,7 +2264,7 @@ xmlns:xs="http://www.w3.org/2001/XMLSchema"> data etc. - + This attribute denotes whether the locality/coverage requires horizontal, vertical, or both From 46fed0a32c8a4b4ac3fdc16b96be0d65dc05c713 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Fri, 5 May 2023 09:31:04 -0500 Subject: [PATCH 26/46] updated umm-c schema --- pyQuARC/schemas/umm-c-json-schema.json | 159 +++++++++++++++++++++-- pyQuARC/schemas/umm-cmn-json-schema.json | 10 +- 2 files changed, 154 insertions(+), 15 deletions(-) diff --git a/pyQuARC/schemas/umm-c-json-schema.json b/pyQuARC/schemas/umm-c-json-schema.json index 1dcb9c2f..21cd5252 100644 --- a/pyQuARC/schemas/umm-c-json-schema.json +++ b/pyQuARC/schemas/umm-c-json-schema.json @@ -83,9 +83,13 @@ } }, "CollectionDataType": { - "description": "Identifies the collection as a Science Quality collection or a non-science-quality collection such as a Near-Real-Time collection.", + "description": "This element is used to identify the collection's ready for end user consumption latency from when the data was acquired by an instrument. 
NEAR_REAL_TIME is defined to be ready for end user consumption 1 to 3 hours after data acquisition. LOW_LATENCY is defined to be ready for consumption 3 to 24 hours after data acquisition. EXPEDITED is defined to be 1 to 4 days after data acquisition. SCIENCE_QUALITY is defined to mean that a collection has been fully and completely processed which usually takes between 2 to 3 weeks after data acquisition. OTHER is defined for collection where the latency is between EXPEDITED and SCIENCE_QUALITY.", "$ref": "#/definitions/CollectionDataTypeEnum" }, + "StandardProduct": { + "description": "This element is reserved for NASA records only. A Standard Product is a product that has been vetted to ensure that they are complete, consistent, maintain integrity, and satisfies the goals of the Earth Observing System mission. The NASA product owners have also committed to archiving and maintaining the data products. More information can be found here: https://earthdata.nasa.gov/eosdis/science-system-description/eosdis-standard-products.", + "type": "boolean" + }, "ProcessingLevel": { "description": "The identifier for the processing level of the collection (e.g., Level0, Level1A).", "$ref": "#/definitions/ProcessingLevelType" @@ -261,9 +265,13 @@ "VersionDescription": { "description": "The Version Description of the collection.", "$ref": "umm-cmn-json-schema.json#/definitions/VersionDescriptionType" + }, + "MetadataSpecification": { + "description": "Requires the client, or user, to add in schema information into every collection record. It includes the schema's name, version, and URL location. 
The information is controlled through enumerations at the end of this schema.", + "$ref": "#/definitions/MetadataSpecificationType" } }, - "required": ["ShortName", "Version", "EntryTitle", "Abstract", "DOI", "DataCenters", "ProcessingLevel", "ScienceKeywords", "TemporalExtents", "SpatialExtent", "Platforms", "CollectionProgress"], + "required": ["ShortName", "Version", "EntryTitle", "Abstract", "DOI", "DataCenters", "ProcessingLevel", "ScienceKeywords", "TemporalExtents", "SpatialExtent", "Platforms", "CollectionProgress", "MetadataSpecification"], @@ -287,6 +295,16 @@ "minLength": 1, "maxLength": 4000 }, + "FreeAndOpenDataType":{ + "description": "This sub-element if true, describes to end users and machines that this collection's data is free of charge and open for any use the user sees fit.", + "type": "boolean" + }, + "EULAIdentifierType": { + "description": "Allows an End User License Agreement to be associated with a collection. This allows a service to check if an end user has accepted a EULA for this collection (data set) before it will allow data to be downloaded or used.", + "type": "string", + "minLength": 1, + "maxLength": 40 + }, "UseConstraintsType": { "description": "This element defines how the data may or may not be used after access is granted to assure the protection of privacy or intellectual property. This includes license text, license URL, or any special restrictions, legal prerequisites, terms and conditions, and/or limitations on using the data set. Data providers may request acknowledgement of the data from users and claim no responsibility for quality and completeness of data.", "oneOf": [{ @@ -296,6 +314,17 @@ "properties": { "Description": { "$ref": "#/definitions/UseConstraintsDescType" + }, + "FreeAndOpenData": { + "$ref": "#/definitions/FreeAndOpenDataType" + }, + "EULAIdentifiers": { + "description": "A list of End User license Agreement identifiers that are associated to a collection. 
These identifiers can be found in the Earthdata Login application where End User License Agreements are stored. These identifiers allow services to check if an end user has accepted a license agreement before allowing data to be downloaded.", + "type": "array", + "items": { + "$ref": "#/definitions/EULAIdentifierType" + }, + "minItems": 1 } }, "required": ["Description"] @@ -310,6 +339,17 @@ "LicenseURL": { "description": "This element holds the URL and associated information to access the License on the web. If this element is used the LicenseText element cannot be used.", "$ref": "umm-cmn-json-schema.json#/definitions/OnlineResourceType" + }, + "FreeAndOpenData": { + "$ref": "#/definitions/FreeAndOpenDataType" + }, + "EULAIdentifiers": { + "description": "A list of End User license Agreement identifiers that are associated to a collection. These identifiers can be found in the Earthdata Login application where End User License Agreements are stored. These identifiers allow services to check if an end user has accepted a license agreement before allowing data to be downloaded.", + "type": "array", + "items": { + "$ref": "#/definitions/EULAIdentifierType" + }, + "minItems": 1 } }, "required": ["LicenseURL"] @@ -326,6 +366,17 @@ "type": "string", "minLength": 1, "maxLength": 20000 + }, + "FreeAndOpenData": { + "$ref": "#/definitions/FreeAndOpenDataType" + }, + "EULAIdentifiers": { + "description": "A list of End User license Agreement identifiers that are associated to a collection. These identifiers can be found in the Earthdata Login application where End User License Agreements are stored. 
These identifiers allow services to check if an end user has accepted a license agreement before allowing data to be downloaded.", + "type": "array", + "items": { + "$ref": "#/definitions/EULAIdentifierType" + }, + "minItems": 1 } }, "required": ["LicenseText"] @@ -643,33 +694,90 @@ "type": "string", "enum": ["Atmosphere Layer", "Maximum Altitude", "Maximum Depth", "Minimum Altitude", "Minimum Depth"] }, + "FootprintType": { + "type": "object", + "additionalProperties": false, + "description": "The largest width of an instrument's footprint as measured on the Earths surface. The largest Footprint takes the place of SwathWidth in the Orbit Backtrack Algorithm if SwathWidth does not exist. The optional description element allows the user of the record to be able to distinguish between the different footprints of an instrument if it has more than 1.", + "properties": { + "Footprint": { + "description": "The largest width of an instrument's footprint as measured on the Earths surface. The largest Footprint takes the place of SwathWidth in the Orbit Backtrack Algorithm if SwathWidth does not exist.", + "type": "number" + }, + "FootprintUnit": { + "description": "The Footprint value's unit.", + "type": "string", + "enum": ["Kilometer", "Meter"] + }, + "Description": { + "description": "The description element allows the user of the record to be able to distinguish between the different footprints of an instrument if it has more than 1.", + "type": "string" + } + }, + "required": ["Footprint", "FootprintUnit"] + }, "OrbitParametersType": { "type": "object", "additionalProperties": false, "description": "Orbit parameters for the collection used by the Orbital Backtrack Algorithm.", "properties": { "SwathWidth": { - "description": "Width of the swath at the equator in Kilometers.", + "description": "Total observable width of the satellite sensor nominally measured at the equator.", "type": "number" }, - "Period": { - "description": "Orbital period in decimal minutes.", + 
"SwathWidthUnit": { + "description": "The SwathWidth value's unit.", + "type": "string", + "enum": ["Kilometer", "Meter"] + }, + "Footprints" : { + "description": "A list of instrument footprints or field of views. A footprint holds the largest width of the described footprint as measured on the earths surface along with the width's unit. An optional description element exists to be able to distinguish between the footprints, if that is desired. This element is optional. If this element is used at least 1 footprint must exist in the list.", + "type": "array", + "items": { + "$ref": "#/definitions/FootprintType" + }, + "minItems": 1 + }, + "OrbitPeriod": { + "description": "The time in decimal minutes the satellite takes to make one full orbit.", "type": "number" }, + "OrbitPeriodUnit": { + "description": "The Orbit Period value's unit.", + "type": "string", + "enum": ["Decimal Minute"] + }, "InclinationAngle": { - "description": "Inclination of the orbit. This is the same as (180-declination) and also the same as the highest latitude achieved by the satellite. Data Unit: Degree.", + "description": "The heading of the satellite as it crosses the equator on the ascending pass. This is the same as (180-declination) and also the same as the highest latitude achieved by the satellite.", "type": "number" }, + "InclinationAngleUnit": { + "description": "The InclinationAngle value's unit.", + "type": "string", + "enum": ["Degree"] + }, "NumberOfOrbits": { - "description": "Indicates the number of orbits.", + "description": "The number of full orbits composing each granule. This may be a fraction of an orbit.", "type": "number" }, "StartCircularLatitude": { "description": "The latitude start of the orbit relative to the equator. This is used by the backtrack search algorithm to treat the orbit as if it starts from the specified latitude. 
This is optional and will default to 0 if not specified.", "type": "number" + }, + "StartCircularLatitudeUnit": { + "description": "The StartCircularLatitude value's unit.", + "type": "string", + "enum": ["Degree"] } }, - "required": ["SwathWidth", "Period", "InclinationAngle", "NumberOfOrbits"] + "anyOf": [{ + "required": ["SwathWidth", "SwathWidthUnit"] + }, { + "required": ["Footprints"] + }], + "required": ["OrbitPeriod", "OrbitPeriodUnit", "InclinationAngle", "InclinationAngleUnit", "NumberOfOrbits"], + "dependencies": { + "StartCircularLatitude": ["StartCircularLatitudeUnit"] + } }, "GranuleSpatialRepresentationEnum": { "type": "string", @@ -698,10 +806,14 @@ "description": "Defines the minimum and maximum value for one dimension of a two dimensional coordinate system.", "properties": { "MinimumValue": { - "type": "number" + "type": "string", + "minLength": 1, + "maxLength": 80 }, "MaximumValue": { - "type": "number" + "type": "string", + "minLength": 1, + "maxLength": 80 } } }, @@ -1138,9 +1250,9 @@ } }, "CollectionDataTypeEnum": { - "description": "This element is used to identify the collection as a Science Quality Collection or as a non-science-quality collection such as a Near Real Time collection. If a collection does not contain this field, it will be assumed to be of science-quality.", + "description": "This element is used to identify the collection's ready for end user consumption latency from when the data was acquired by an instrument. NEAR_REAL_TIME is defined to be ready for end user consumption 1 to 3 hours after data acquisition. LOW_LATENCY is defined to be ready for consumption 3 to 24 hours after data acquisition. EXPEDITED is defined to be 1 to 4 days after data acquisition. SCIENCE_QUALITY is defined to mean that a collection has been fully and completely processed which usually takes between 2 to 3 weeks after data acquisition. 
OTHER is defined for collection where the latency is between EXPEDITED and SCIENCE_QUALITY.", "type": "string", - "enum": ["SCIENCE_QUALITY", "NEAR_REAL_TIME", "OTHER"] + "enum": ["NEAR_REAL_TIME", "LOW_LATENCY", "EXPEDITED", "SCIENCE_QUALITY", "OTHER"] }, "CollectionProgressEnum": { "description": "This element describes the production status of the data set. There are five choices for Data Providers: PLANNED refers to data sets to be collected in the future and are thus unavailable at the present time. For Example: The Hydro spacecraft has not been launched, but information on planned data sets may be available. ACTIVE refers to data sets currently in production or data that is continuously being collected or updated. For Example: data from the AIRS instrument on Aqua is being collected continuously. COMPLETE refers to data sets in which no updates or further data collection will be made. For Example: Nimbus-7 SMMR data collection has been completed. DEPRECATED refers to data sets that have been retired, but still can be retrieved. Usually newer products exist that replace the retired data set. NOT APPLICABLE refers to data sets in which a collection progress is not applicable such as a calibration collection. There is a sixth value of NOT PROVIDED that should not be used by a data provider. It is currently being used as a value when a correct translation cannot be done with the current valid values, or when the value is not provided by the data provider.", @@ -1506,6 +1618,29 @@ } }, "required": ["DOI"] + }, + "MetadataSpecificationType": { + "type": "object", + "additionalProperties": false, + "description": "This object requires any metadata record that is validated by this schema to provide information about the schema.", + "properties": { + "URL": { + "description": "This element represents the URL where the schema lives. 
The schema can be downloaded.", + "type": "string", + "enum": ["https://cdn.earthdata.nasa.gov/umm/collection/v1.17.2"] + }, + "Name": { + "description": "This element represents the name of the schema.", + "type": "string", + "enum": ["UMM-C"] + }, + "Version": { + "description": "This element represents the version of the schema.", + "type": "string", + "enum": ["1.17.2"] + } + }, + "required": ["URL", "Name", "Version"] } } } diff --git a/pyQuARC/schemas/umm-cmn-json-schema.json b/pyQuARC/schemas/umm-cmn-json-schema.json index 766caa32..5bac9af6 100644 --- a/pyQuARC/schemas/umm-cmn-json-schema.json +++ b/pyQuARC/schemas/umm-cmn-json-schema.json @@ -316,7 +316,9 @@ }, "MimeType": { "description": "The mime type of the service.", - "$ref": "#/definitions/URLMimeTypeEnum" + "type": "string", + "minLength": 1, + "maxLength": 80 }, "Size": { "description": "The size of the data.", @@ -434,7 +436,9 @@ }, "MimeType": { "description": "The mime type of the online resource.", - "$ref": "#/definitions/URLMimeTypeEnum" + "type": "string", + "minLength": 1, + "maxLength": 80 } }, "required": ["Linkage"] @@ -1234,7 +1238,7 @@ "application/vnd.google-earth.kml+xml", "image/gif", "image/tiff", "image/bmp", "text/csv", "text/xml", "application/pdf", "application/x-hdf", "application/xhdf5", "application/octet-stream", "application/vnd.google-earth.kmz", "image/jpeg", "image/png", - "image/vnd.collada+xml", "text/html", "text/plain", "Not provided"] + "image/vnd.collada+xml", "application/x-vnd.iso.19139-2+xml", "text/html", "text/plain", "Not provided"] }, "GetServiceTypeFormatEnum": { "type": "string", From ac774a5515023e44eed9602485aac308dc35ea13 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Fri, 5 May 2023 10:05:38 -0500 Subject: [PATCH 27/46] updated dif10 schema --- .../{UmmCommon_1.3.xsd => UmmCommon_1.2.xsd} | 111 +----- pyQuARC/schemas/catalog.xml | 2 +- pyQuARC/schemas/dif10_schema.xsd | 315 ++++++++++-------- pyQuARC/schemas/ruleset.json | 4 +- 4 files changed, 
194 insertions(+), 238 deletions(-) rename pyQuARC/schemas/{UmmCommon_1.3.xsd => UmmCommon_1.2.xsd} (84%) diff --git a/pyQuARC/schemas/UmmCommon_1.3.xsd b/pyQuARC/schemas/UmmCommon_1.2.xsd similarity index 84% rename from pyQuARC/schemas/UmmCommon_1.3.xsd rename to pyQuARC/schemas/UmmCommon_1.2.xsd index 8f4df33c..d192a422 100644 --- a/pyQuARC/schemas/UmmCommon_1.3.xsd +++ b/pyQuARC/schemas/UmmCommon_1.2.xsd @@ -61,45 +61,6 @@ - - - - - - - - Example Usage: - * /DIF/Distribution/Average_Granule_Size_Unit - - - - - - - - - - - - - - - - - - - Example Usage: - * /DIF/Distribution/Distribution_Format_Type - - - - - - - - - - @@ -120,7 +81,6 @@ - - +
@@ -540,46 +501,12 @@ technical contact Much feedback was given on the wisdom of this syntax, based on that feedback the DOI fields in DIF have been changed to support a type allowing for any value.
- changed - CMRSCI-2704,ECSE-226 - 10.3 - Added support for Not Applicable and Missing Reason the Persistent Identifier - - * = New DIF Field - x = DIF field renamed - - | DIF 10.2 | DIF-10.3 | UMM | Notes | - | -------------------------------------| ----------------------------------------------|---------------------------------------| - | Persistent_Identifier (0..1) | Persistent_Identifier (0..1) | DOI (1) | | - | Persistent_Identifier/Type (0..1) | Persistent_Identifier/Type (1) | N/A | | - | Persistent_Identifier/Identifier (1) | Persistent_Identifier/Identifier (1) | DOI/Identifier (1) | | - | N/A | * Persistent_Identifier/Authority (0..1) | DOI/Authority (0..1) | | - | | OR | | | - | N/A | * Persistent_Identifier/Missing_Reason (1) |DOI/MissingReason (1) | | - | N/A | * Persistent_Identifier/Explanation (0..1) |DOI/MissingReason (0..1) | | - -
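The DOI documentation repeated throughout these schema changes states that DOI values should start with the directory indicator, which in ESDIS' case is "10.", and that DOIs registered through ESDIS begin "10.5067"; the DOI URL itself is not stored in these elements. A minimal, hypothetical helper illustrating that convention (not part of pyQuARC or the schemas):

```python
def doi_prefix_check(doi):
    # Illustrative helper based on the schema documentation: DOI values
    # start with the directory indicator "10.", ESDIS-registered DOIs
    # begin "10.5067", and DOI URLs do not belong in the DOI element.
    if not doi.startswith("10."):
        return "not a DOI value (missing directory indicator)"
    if doi.startswith("10.5067"):
        return "ESDIS-registered DOI"
    return "DOI from another registrant"
```

For example, a full `https://doi.org/...` URL would fail this check, matching the schemas' note that the DOI URL should be stored as a RelatedURL/OnlineResource instead.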
- - - - - - - - - - - - - - - - - - - - + + + + + @@ -594,24 +521,8 @@ technical contact - - - - - - - Usr to denote that a DOI is not applicable to this collection. - - - - - - - - - diff --git a/pyQuARC/schemas/catalog.xml b/pyQuARC/schemas/catalog.xml index d14f726d..a4ad7ed2 100644 --- a/pyQuARC/schemas/catalog.xml +++ b/pyQuARC/schemas/catalog.xml @@ -3,5 +3,5 @@ - + diff --git a/pyQuARC/schemas/dif10_schema.xsd b/pyQuARC/schemas/dif10_schema.xsd index 8037134d..9ffb184a 100644 --- a/pyQuARC/schemas/dif10_schema.xsd +++ b/pyQuARC/schemas/dif10_schema.xsd @@ -27,7 +27,7 @@ History: 2015-10-26 : 10.2 - Milestone. 2015-01-08 : 10.2 a - Patch, fix /DIF/Additional_Attributes/ParameterRangeBegin 2016-01-27 : 10.2 b - patch, changed date time types on three fields - 2018-03-30 : 10.3 - Update, multiple changes/additions for UMM-C Compliance. See CMRSCI-2627 + --> - + @@ -60,6 +60,7 @@ History: + @@ -69,11 +70,11 @@ History: - + - + @@ -81,10 +82,11 @@ History: + - + @@ -99,9 +101,9 @@ History: - + - +
@@ -253,6 +255,81 @@ History: + + + + + +
/DIF/Associated_DOIs
+ changedUMM16.1 + addedUMM1.16.1 + added Associated DOIs + + This element stores DOIs that are associated with the collection + such as from campaigns and other related sources. Note: The values + should start with the directory indicator which in ESDIS' case is 10. + If the DOI was registered through ESDIS, the beginning of the string + should be 10.5067. The DOI URL is not stored here; it should be + stored as a RelatedURL. The DOI organization that is responsible + for creating the DOI is described in the Authority element. For + ESDIS records the value of https://doi.org/ should be used. + + | ECHO 10 |UMM | DIF 10 | + | -------------- | ------------------ | ---------------- | + | AssociatedDOIs | DOI/AssociatedDOIs | Associated_DOIs | + | - | - | - | + + +
+ + + + This element stores the DOI (Digital Object Identifier) that + identifies an associated collection. Note: The values should + start with the directory indicator which in ESDIS' case is 10. + If the DOI was registered through ESDIS, the beginning of the + string should be 10.5067. The DOI URL is not stored here; it + should be stored as an OnlineResource. + + + + + + + + + + + + The title of the DOI landing page. The title describes the + DOI object to a user, so they don't have to look it up themselves to + understand the association. + + + + + + + + + + + + The DOI organization that is responsible for creating the + associated DOI is described in the Authority element. + For ESDIS records the value of https://doi.org/ should be used. + + + + + + + + + + +
+ @@ -842,11 +919,7 @@ History: renamed GCMD (DIF) - - - Made required - CMRSCI-2698 - Dataset_Progress made required in 10.3 + Describes the production status of the data set regarding its completeness. @@ -1385,16 +1458,8 @@ History: UMM (ECHO) Campaign was incorrectly made required in version 10.0.0a. - - - - - changed - ECSE-264,CMRSCI-2699 - 10.3 - Change cardinality of DIF 10 project/campaign from 0..1 to 0..* + - This entity contains attributes describing the scientific endeavor(s) to which the collection is associated. Scientific @@ -1430,7 +1495,7 @@ History: - + @@ -1465,7 +1530,7 @@ History: - +
none @@ -1476,85 +1541,38 @@ History: | Access_Constraints | - | AccessConstraints | Access_Constraints | No change |
- changed - CMRSCI-2700 - 10.3 - Added fields to Support Access Control - - * = New DIF Field - x = DIF field renamed - - | DIF 10.2 | DIF-10.3 | UMM | Notes | - | ----------------------------- | -------------------------------------------------------|--------------------------------------------------| - | Access_Constraints (0..1) | Access_Constraints (0..1) | AccessConstraints (0..1) | | - | Access_Constraints/Text Block | * Access_Constraints/Description (1) | AccessConstraints/Description (1) | | - | N/A | * Access_Constraints/Access_Control (0..1) | AccessConstraints/Value (0..1) | | - | N/A | * Access_Constraints/Access_Control_Description (0..1) | N/A | | - - - - - - - This sub-element is a free-text description that details access constraints of this collection. - - - - - - - - - - - - - - - - - - - - This sub-element is a free-text description that details the Access Control. - - - - - - - - - -
+ + + - +
changedUMM (DIF) - + | ECHO 10 | UMM | DIF 10 | Notes | | ------- | ------------------| ------------------ | ------------- | | - | UseConstraints | Use_Constraints | No change | - + changed - CMRSCI-2700, ECSE-171 - 10.3 + ECSE-942 + 10.2 updated Added fields to Support licensing elements in UMM Models - * = New DIF Field + * = New DIF Field x = DIF field renamed - - | DIF 10.2 | DIF-10.3 | UMM | Notes | + + | DIF 10.2 | DIF-10.2 updated | UMM | Notes | | ---------------------------| -------------------------------------------------------|------------------------------------------------------| ------------| | Use_Constraints (0..1) | Use_Constraints (0..1) | UseConstraints (0..1) | | | Use_Constraints/Text Block | * Use_Constraints/Description (0..1) | UseConstraints/Description (0..1) | | + | N/A | * Use_Constraints/Free_And_Open_Data (0..1) | UseConstraints/FreeAndOpenData (0..1) | | | N/A | * Use_Constraints/License_Text (0..1) | UseConstraints/LicenseText (0..1) | | | N/A | * Use_Constraints/License_URL (0..1) | UseConstraints/LicenseURL (0..1) | | | N/A | * Use_Constraints/License_URL/URL (1..*) | UseConstraints/LicenseURL/Linkage (1) | | @@ -1579,6 +1597,11 @@ History:
+ + + This sub-element if true, describes to end users and machines that this collection's data is free of charge and open for any use the user sees fit. + + @@ -1714,42 +1737,84 @@ History: | Fees | Price | - | Fees | Not in UMM | - changed - ECSE-333, CMRSCI-2702, ARC-CMR All Day Meeting 2017-10-26 - 10.3 - Added fields to Support Average_Granule_Size, Average_Granule_Size_Unit, Total_Collection_Size, Total_Collection_Size_Unit - - * = New DIF Field - x = DIF field renamed - - | DIF 10.2 | DIF-10.3 | UMM | Notes | - | ----------------------------- | -------------------------------------------|----------------------------------------------------------------|----------| - | Distribution (0..1) | Distribution (0..1) | FileDistributionInformation (0..1) | | - | Distribution_Media (0..1) | Distribution_Media (0..1) | FileDistributionInformation/Media (0..1) | | - | X Distribution_Size (0..1) | * Average_Granule_Size (0..1) | FileDistributionInformation/AverageFileSize (0..1) | | - | X Distribution_Size (0..1) | * Average_Granule_Size_Unit (0..1) | FileDistributionInformation/AverageFileSizeUnit (0..1) | | - | X Distribution_Size (0..1) | * Total_Collection_Size (0..1) | FileDistributionInformation/TotalCollectionFileSize (0..1) | | - | N/A | * Distribution_Collection_Size_Unit (0..1) | FileDistributionInformation/TotalCollectionFileSizeUnit (0..1) | | - | Distribution_Format (0..1) | Distribution_Format (0..1) | FileDistributionInformation/Format (1) | | - | N/A | * Distribution_Format_Type (0..1) | FileDistributionInformation/FormatType (0..1) | | - | Distribution/Fees (0..1) | Distribution/Fees (0..1) | FileDistributionInformation/Fees (0..1) | | - - - - - - - - + - + + + + + +
+ none
+ 
+ This field describes the region, the S3 bucket and/or object prefix names, the S3 Credential API Endpoint,
+ and the S3 Credential API Documentation URL involved in distributing the data set through S3.
+ 
+ | ECHO 10 | UMM | DIF 10 | Notes |
+ | ---------------------------- | ---------------------------- | ---------------------------- | ---------- |
+ | DirectDistributionInformation| DirectDistributionInformation| DirectDistributionInformation| |
+ 
+ 
+ 
+ 
+ 
+ Defines the possible values for the Amazon Web Service US Regions
+ where the data product resides.
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ Defines the possible values for the Amazon Web Service US S3 bucket
+ and/or object prefix names.
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ Defines the URL where the credentials are stored.
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ Defines the URL where the credential documentation is stored.
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ @@ -1865,7 +1930,6 @@ History: changedUMM (ECHO)b renameUMM (DIF)clocal consistence changedDIF (CMRSCI-432)10.1 - changedUMM Version 1.410.3changed to optional from required * Require at least one, the datacenter (organization) url should be used if no other url can be found @@ -1880,31 +1944,14 @@ History: | Description | OnlineAccessURL/URLDescription, OnlineResource/Description | Description | Description | ? | Can now be formatted | | - | OnlineResource/MimeType, OnlineAccessURL/Mimetype | MimeType | Mime_Type | ? | No change | - - changed - UMM - 10.3 - Added fields for UMM Compliance - - * = New DIF Field - x = DIF field renamed - - | DIF 9 | DIF 10 |UMM | Notes | - | ---------------- | ---------------------------- |-----------------------------------------------------|-----------| - | N/A | * Application_Profile (0..1) | UseConstraints/LicenseURL/ApplicationProfile (0..1) | | - | N/A | * Function (0..1) | UseConstraints/Function (0..1) | | - - - + - - - + + - @@ -2058,7 +2105,6 @@ History: addedUMM changedUMM10.1 changedCMR10.2
changed to ProcessingLevelIdType
- changedCMRSCI-269810.3
changed to required from optional
Field renamed after a conversation on: @@ -2171,7 +2217,6 @@ History: renamedUMM (DIF)crenamed LastUpdate to Last_Revision_Time addedUMM (ECHO)cadded Future_Review_Time renamedUMM (ECHO)crenamed DeleteTime to Delete_Time - optionalCMRSCI-270110.3Data Dates - changed from required to optional A union of the DIF Metadata event date fields with the three ECHO event time fields. @@ -2198,8 +2243,8 @@ History: - - + + diff --git a/pyQuARC/schemas/ruleset.json b/pyQuARC/schemas/ruleset.json index cb1622ba..ce3bbeca 100644 --- a/pyQuARC/schemas/ruleset.json +++ b/pyQuARC/schemas/ruleset.json @@ -149,7 +149,7 @@ "message-fail": "Error: {DOI/MissingReason} is not a valid value.", "remediation": "The Missing Reason should read \"Not Applicable\"", "help_url": "https://wiki.earthdata.nasa.gov/display/CMR/DOI", - "specification": "ECHO10:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/echo-schemas/browse/schemas/10.0/MetadataCommon.xsd#680\n\n\nDIF10.3 (this enumeration is not in 10.2):\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/dif-schemas/browse/10.x/UmmCommon_1.3.xsd#600\n\n\nUMM-C/Common:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/collection/v1.15/umm-cmn-json-schema.json#532", + "specification": "ECHO10:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/echo-schemas/browse/schemas/10.0/MetadataCommon.xsd#680\n\n\nDIF10.3 (this enumeration is not in 10.2):\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/dif-schemas/browse/10.x/UmmCommon_1.2.xsd#600\n\n\nUMM-C/Common:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/collection/v1.15/umm-cmn-json-schema.json#532", "spec_version": null, "id": "doi_missing_reason_enumeration_check" }, @@ -163,7 +163,7 @@ "message-fail": "Error: {CollectionDataType} is not a valid value.", "remediation": "ECHO10:\nRecommend the Collection Data Type match one of the following values: [SCIENCE_QUALITY, NEAR_REAL_TIME, OTHER]\n\n\nDIF10.2 & 
DIF10.3:\nRecommend the Collection Data Type match one of the following values: [SCIENCE_QUALITY, NEAR_REAL_TIME, ON_DEMAND, OTHER]\n\n\n\nUMM-C/Common:", "help_url": null, - "specification": "ECHO10:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/echo-schemas/browse/schemas/10.0/Collection.xsd#240\n\n\nDIF10.2:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/dif-schemas/browse/10.x/UmmCommon_1.2.xsd#211\n\nDIF10.3:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/dif-schemas/browse/10.x/UmmCommon_1.3.xsd#251\n\n\nUMM-C/Common:", + "specification": "ECHO10:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/echo-schemas/browse/schemas/10.0/Collection.xsd#240\n\n\nDIF10.2:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/dif-schemas/browse/10.x/UmmCommon_1.2.xsd#211\n\nDIF10.3:\nhttps://git.earthdata.nasa.gov/projects/EMFD/repos/dif-schemas/browse/10.x/UmmCommon_1.2.xsd#251\n\n\nUMM-C/Common:", "spec_version": null, "id": "collectiondatatype_enumeration_check" }, From 3abb0b3d16119ec2e9b82a14de31b9aa7071c13e Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Fri, 5 May 2023 13:26:01 -0500 Subject: [PATCH 28/46] updated the license description check --- pyQuARC/schemas/rule_mapping.json | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 5132d526..b19bdfd9 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -3721,24 +3721,27 @@ "echo-c": [ { "fields": [ + "Collection/UseConstraints/LicenseURL/Description", "Collection/UseConstraints/LicenseURL/URL", - "Collection/UseConstraints/LicenseURL/Description" + "Collection/UseConstraints/LicenseText" ] } ], "dif10": [ { "fields": [ + "DIF/Use_Constraints/License_URL/Description", "DIF/Use_Constraints/License_URL/URL", - "DIF/Use_Constraints/License_URL/Description" + "DIF/Use_Constraints/License_URL/License_Text" ] } ], "umm-c": [ { "fields": [ + 
"UseConstraints/LicenseURL/Description", "UseConstraints/LicenseURL/Linkage", - "UseConstraints/LicenseURL/Description" + "UseConstraints/LicenseText" ] } ] From 8c991e2249a1664e62d97ac9253037e01ce2d98c Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Fri, 5 May 2023 14:03:35 -0500 Subject: [PATCH 29/46] fixed field path --- pyQuARC/schemas/rule_mapping.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index a8cc309e..6a5ba852 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -3732,7 +3732,7 @@ "fields": [ "DIF/Use_Constraints/License_URL/Description", "DIF/Use_Constraints/License_URL/URL", - "DIF/Use_Constraints/License_URL/License_Text" + "DIF/Use_Constraints/License_Text" ] } ], From a6fb9d9349ba8d21ebac82103e2238bfc92922a4 Mon Sep 17 00:00:00 2001 From: Shelby Bagwell Date: Fri, 5 May 2023 14:06:33 -0500 Subject: [PATCH 30/46] updated check function --- pyQuARC/code/custom_validator.py | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/pyQuARC/code/custom_validator.py b/pyQuARC/code/custom_validator.py index fe17b1b0..697ff831 100644 --- a/pyQuARC/code/custom_validator.py +++ b/pyQuARC/code/custom_validator.py @@ -96,7 +96,7 @@ def one_item_presence_check(*field_values): return {"valid": validity, "value": value} @staticmethod - def license_url_description_check(url_field, description_field): + def license_url_description_check(description_field, url_field, license_text): """ Determines if a description has been provided for the License URL if a License URL has been provided in the metadata. 
@@ -108,7 +108,10 @@ def license_url_description_check(url_field, description_field): validity = True value = description_field - if not url_field: + if not license_text and not url_field: + validity = False + return {"valid": validity, "value": value} + elif license_text and not url_field: return {"valid": validity, "value": value} else: if not description_field: From 5d6e42c3bd4755a4b9ec75ddd24e232f5ddb78e9 Mon Sep 17 00:00:00 2001 From: Ashish Acharya Date: Tue, 13 Jun 2023 16:38:40 -0500 Subject: [PATCH 31/46] Create dependabot.yml --- .github/dependabot.yml | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 .github/dependabot.yml diff --git a/.github/dependabot.yml b/.github/dependabot.yml new file mode 100644 index 00000000..bfae19f2 --- /dev/null +++ b/.github/dependabot.yml @@ -0,0 +1,7 @@ +version: 2 +updates: + # Enable version updates for pip + - package-ecosystem: "pip" # See documentation for possible values + directory: "/" # Location of package manifests + schedule: + interval: "weekly" From 3c3239ca6ee3185168ca62aff7cf24295525e277 Mon Sep 17 00:00:00 2001 From: Essence Raphael Date: Mon, 24 Jul 2023 09:49:38 -0500 Subject: [PATCH 32/46] Update instrument long name presence check --- pyQuARC/code/gcmd_validator.py | 7 +++++ pyQuARC/code/string_validator.py | 16 ++++++++++ pyQuARC/schemas/check_messages.json | 2 +- pyQuARC/schemas/checks.json | 5 ++++ pyQuARC/schemas/rule_mapping.json | 46 +++++++++++++++++++++++++++-- 5 files changed, 73 insertions(+), 3 deletions(-) diff --git a/pyQuARC/code/gcmd_validator.py b/pyQuARC/code/gcmd_validator.py index ae2048ac..48c48e6f 100644 --- a/pyQuARC/code/gcmd_validator.py +++ b/pyQuARC/code/gcmd_validator.py @@ -288,6 +288,13 @@ def validate_instrument_long_name(self, input_keyword): Validates GCMD instrument long name """ return input_keyword in self.keywords["instrument_long_name"] + + def validate_instrument_long_name_presence(self, input_keyword): + """ + Validates whether a given instrument 
short name has a corresponding long name + """ + long_name_field = self.keywords["instrument"].get(input_keyword) + return "N/A" in long_name_field def validate_platform_short_name(self, input_keyword): """ diff --git a/pyQuARC/code/string_validator.py b/pyQuARC/code/string_validator.py index e2993c15..9988f9c2 100644 --- a/pyQuARC/code/string_validator.py +++ b/pyQuARC/code/string_validator.py @@ -159,6 +159,22 @@ def instrument_long_name_gcmd_check(value): "value": value, } + @staticmethod + @if_arg + def instrument_long_name_presence_check(*args): + if not args[1]: + return { + "valid": StringValidator.gcmdValidator.validate_instrument_long_name_presence( + args[0].upper() + ), + "value": args[0], + } + else: + return { + "valid": True, + "value": args[0], + } + @staticmethod @if_arg def platform_short_name_gcmd_check(value): diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index bc0b9435..3cf44b6c 100644 --- a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -305,7 +305,7 @@ "remediation": "Select a valid long name, or submit a request to support@earthdata.nasa.gov to have this instrument added to the GCMD Instruments keyword list." 
}, "instrument_long_name_presence_check": { - "failure": "The instrument/sensor long name is missing.", + "failure": "The provided instrument/sensor short name `{}` is missing the corresponding instrument/sensor long name.", "help": { "message": "", "url": "https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/instruments/?format=csv&page_num=1&page_size=2000" diff --git a/pyQuARC/schemas/checks.json b/pyQuARC/schemas/checks.json index 39306b53..7e4b1149 100644 --- a/pyQuARC/schemas/checks.json +++ b/pyQuARC/schemas/checks.json @@ -99,6 +99,11 @@ "check_function": "instrument_long_name_gcmd_check", "available": true }, + "instrument_long_name_presence_check": { + "data_type": "string", + "check_function": "instrument_long_name_presence_check", + "available": true + }, "validate_granule_instrument_against_collection": { "data_type": "string", "check_function": "validate_granule_instrument_against_collection", diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index d04c752f..68551f7d 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -2536,47 +2536,89 @@ "check_id": "instrument_long_name_gcmd_check" }, "instrument_long_name_presence_check": { - "rule_name": "Instrument Longname Presence Check", + "rule_name": "Instrument Longname Requirement Check", "fields_to_apply": { "echo-c": [ { "fields": [ + "Collection/Platforms/Platform/Instruments/Instrument/ShortName", "Collection/Platforms/Platform/Instruments/Instrument/LongName" + ], + "dependencies": [ + [ + "instrument_short_name_gcmd_check", + "Collection/Platforms/Platform/Instruments/Instrument/ShortName" + ] ] }, { "fields": [ + "Collection/Platforms/Platform/Instruments/Instrument/Sensors/Sensor/ShortName", "Collection/Platforms/Platform/Instruments/Instrument/Sensors/Sensor/LongName" + ], + "dependencies": [ + [ + "instrument_short_name_gcmd_check", + "Collection/Platforms/Platform/Instruments/Instrument/Sensors/Sensor/ShortName" + ] ] 
} ], "dif10": [ { "fields": [ + "DIF/Platform/Instrument/Short_Name", "DIF/Platform/Instrument/Long_Name" + ], + "dependencies": [ + [ + "instrument_short_name_gcmd_check", + "DIF/Platform/Instrument/Short_Name" + ] ] }, { "fields": [ + "DIF/Platform/Instrument/Sensor/Short_Name", "DIF/Platform/Instrument/Sensor/Long_Name" + ], + "dependencies": [ + [ + "instrument_short_name_gcmd_check", + "DIF/Platform/Instrument/Sensor/Short_Name" + ] ] } ], "umm-c": [ { "fields": [ + "Platforms/Instruments/ShortName", "Platforms/Instruments/LongName" + ], + "dependencies": [ + [ + "instrument_short_name_gcmd_check", + "Platforms/Instruments/ShortName" + ] ] }, { "fields": [ + "Platforms/Instruments/ComposedOf/ShortName", "Platforms/Instruments/ComposedOf/LongName" + ], + "dependencies": [ + [ + "instrument_short_name_gcmd_check", + "Platforms/Instruments/ComposedOf/ShortName" + ] ] } ] }, "severity": "warning", - "check_id": "one_item_presence_check" + "check_id": "instrument_long_name_presence_check" }, "granule_instrument_presence_check": { "rule_name": "Granule Instrument Presence Check", From 2593fd8348fc55cbdef576bcde91092eb9fb2f74 Mon Sep 17 00:00:00 2001 From: Essence Raphael Date: Mon, 24 Jul 2023 10:02:06 -0500 Subject: [PATCH 33/46] Update platform long name presence check --- pyQuARC/code/gcmd_validator.py | 7 +++++++ pyQuARC/code/string_validator.py | 16 ++++++++++++++++ pyQuARC/schemas/check_messages.json | 2 +- pyQuARC/schemas/checks.json | 5 +++++ pyQuARC/schemas/rule_mapping.json | 27 ++++++++++++++++++++++++--- 5 files changed, 53 insertions(+), 4 deletions(-) diff --git a/pyQuARC/code/gcmd_validator.py b/pyQuARC/code/gcmd_validator.py index 48c48e6f..348ba5bd 100644 --- a/pyQuARC/code/gcmd_validator.py +++ b/pyQuARC/code/gcmd_validator.py @@ -314,6 +314,13 @@ def validate_platform_type(self, input_keyword): """ return input_keyword in self.keywords["platform_type"] + def validate_platform_long_name_presence(self, input_keyword): + """ + Validates whether a 
given platform short name has a corresponding long name + """ + long_name_field = self.keywords["platform"].get(input_keyword) + return "N/A" in long_name_field + def validate_platform_short_long_name_consistency(self, input_keyword): """ Validates GCMD platform short name and long name consistency diff --git a/pyQuARC/code/string_validator.py b/pyQuARC/code/string_validator.py index 9988f9c2..cc68bab7 100644 --- a/pyQuARC/code/string_validator.py +++ b/pyQuARC/code/string_validator.py @@ -205,6 +205,22 @@ def platform_type_gcmd_check(value): "value": value, } + @staticmethod + @if_arg + def platform_long_name_presence_check(*args): + if not args[1]: + return { + "valid": StringValidator.gcmdValidator.validate_platform_long_name_presence( + args[0].upper() + ), + "value": args[0], + } + else: + return { + "valid": True, + "value": args[0], + } + @staticmethod @if_arg def platform_short_long_name_consistency_check(*args): diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index 3cf44b6c..46317023 100644 --- a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -929,7 +929,7 @@ "remediation": "Recommend adding a platform type for the corresponding platform." 
}, "platform_long_name_presence_check": { - "failure": "A platform long name is not provided.", + "failure": "The provided platform short name `{}` is missing the corresponding platform long name.", "help": { "message": "", "url": "https://wiki.earthdata.nasa.gov/display/CMR/Platform" diff --git a/pyQuARC/schemas/checks.json b/pyQuARC/schemas/checks.json index 7e4b1149..67ac748d 100644 --- a/pyQuARC/schemas/checks.json +++ b/pyQuARC/schemas/checks.json @@ -124,6 +124,11 @@ "check_function": "platform_type_gcmd_check", "available": true }, + "platform_long_name_presence_check": { + "data_type": "string", + "check_function": "platform_long_name_presence_check", + "available": true + }, "platform_short_long_name_consistency_check": { "data_type": "string", "check_function": "platform_short_long_name_consistency_check", diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index 68551f7d..e56808b3 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -5083,32 +5083,53 @@ "check_id": "one_item_presence_check" }, "platform_long_name_presence_check": { - "rule_name": "Platform Longname Presence Check", + "rule_name": "Platform Longname Requirement Check", "fields_to_apply": { "echo-c": [ { "fields": [ - "Collection/Platforms/Platforms/LongName" + "Collection/Platforms/Platform/ShortName", + "Collection/Platforms/Platform/LongName" + ], + "dependencies": [ + [ + "platform_short_name_gcmd_check", + "Collection/Platforms/Platform/ShortName" + ] ] } ], "dif10": [ { "fields": [ + "DIF/Platform/Short_Name", "DIF/Platform/Long_Name" + ], + "dependencies": [ + [ + "platform_short_name_gcmd_check", + "DIF/Platform/Short_Name" + ] ] } ], "umm-c": [ { "fields": [ + "Platforms/ShortName", "Platforms/LongName" + ], + "dependencies": [ + [ + "platform_short_name_gcmd_check", + "Platforms/ShortName" + ] ] } ] }, "severity": "warning", - "check_id": "one_item_presence_check" + "check_id": "platform_long_name_presence_check" 
}, "granule_platform_presence_check": { "rule_name": "Granule Platform Presence Check", From 55dba37c74bea5f401b22795b864e17df65b8557 Mon Sep 17 00:00:00 2001 From: Essence Raphael Date: Mon, 24 Jul 2023 10:10:48 -0500 Subject: [PATCH 34/46] Update campaign long name presence check --- pyQuARC/code/gcmd_validator.py | 7 ++++++ pyQuARC/code/string_validator.py | 16 ++++++++++++ pyQuARC/schemas/check_messages.json | 2 +- pyQuARC/schemas/checks.json | 5 ++++ pyQuARC/schemas/rule_mapping.json | 39 +++++++++++++++++++++++++++-- 5 files changed, 66 insertions(+), 3 deletions(-) diff --git a/pyQuARC/code/gcmd_validator.py b/pyQuARC/code/gcmd_validator.py index 348ba5bd..254bf42c 100644 --- a/pyQuARC/code/gcmd_validator.py +++ b/pyQuARC/code/gcmd_validator.py @@ -383,6 +383,13 @@ def validate_campaign_long_name(self, input_keyword): """ return input_keyword in self.keywords["campaign_long_name"] + def validate_campaign_long_name_presence(self, input_keyword): + """ + Validates whether a given campaign short name has a corresponding long name + """ + long_name_field = self.keywords["campaign"].get(input_keyword) + return "N/A" in long_name_field + def validate_data_format(self, input_keyword): """ Validates GCMD Granule Data Format diff --git a/pyQuARC/code/string_validator.py b/pyQuARC/code/string_validator.py index cc68bab7..d2a4ac6c 100644 --- a/pyQuARC/code/string_validator.py +++ b/pyQuARC/code/string_validator.py @@ -273,6 +273,22 @@ def campaign_long_name_gcmd_check(value): "value": value, } + @staticmethod + @if_arg + def campaign_long_name_presence_check(*args): + if not args[1]: + return { + "valid": StringValidator.gcmdValidator.validate_campaign_long_name_presence( + args[0].upper() + ), + "value": args[0], + } + else: + return { + "valid": True, + "value": args[0], + } + @staticmethod @if_arg def data_format_gcmd_check(value): diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index 46317023..4b2928f1 100644 --- 
a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -425,7 +425,7 @@ "remediation": "Select a valid long name, or submit a request to support@earthdata.nasa.gov to have this project/campaign name added to the GCMD Projects keyword list." }, "campaign_long_name_presence_check": { - "failure": "The project/campaign long name is missing.", + "failure": "The provided project/campaign short name `{}` is missing the corresponding project/campaign long name.", "help": { "message": "", "url": "" diff --git a/pyQuARC/schemas/checks.json b/pyQuARC/schemas/checks.json index 67ac748d..778f4da3 100644 --- a/pyQuARC/schemas/checks.json +++ b/pyQuARC/schemas/checks.json @@ -164,6 +164,11 @@ "check_function": "campaign_long_name_gcmd_check", "available": true }, + "campaign_long_name_presence_check": { + "data_type": "string", + "check_function": "campaign_long_name_presence_check", + "available": true + }, "data_format_gcmd_check": { "data_type": "string", "check_function": "data_format_gcmd_check", diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index e56808b3..cf6cc9ab 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -3143,18 +3143,53 @@ "check_id": "campaign_long_name_gcmd_check" }, "campaign_long_name_presence_check": { - "rule_name": "Campaign Long Name Presence Check", + "rule_name": "Campaign Longname Requirement Check", "fields_to_apply": { + "echo-c": [ + { + "fields": [ + "Collection/Campaigns/Campaign/ShortName", + "Collection/Campaigns/Campaign/LongName" + ], + "dependencies": [ + [ + "campaign_short_name_gcmd_check", + "Collection/Campaigns/Campaign/ShortName" + ] + ] + } + ], "dif10": [ { "fields": [ + "DIF/Project/Short_Name", "DIF/Project/Long_Name" + ], + "dependencies": [ + [ + "campaign_short_name_gcmd_check", + "DIF/Project/Short_Name" + ] + ] + } + ], + "umm-c": [ + { + "fields": [ + "Projects/ShortName", + "Projects/LongName" + ], + "dependencies": [ + 
[ + "campaign_short_name_gcmd_check", + "Projects/ShortName" + ] ] } ] }, "severity": "warning", - "check_id": "one_item_presence_check" + "check_id": "campaign_long_name_presence_check" }, "version_description_not_provided": { "rule_name": "Version Description Not Provided", From 0bb815bbe97700ad72561391fa9f06856efa0c45 Mon Sep 17 00:00:00 2001 From: Jenny Wood Date: Tue, 25 Jul 2023 09:40:35 -0500 Subject: [PATCH 35/46] Added Granule Campaign Name Presence Check --- pyQuARC/schemas/check_messages.json | 11 ++++++++-- pyQuARC/schemas/rule_mapping.json | 31 ++++++++++++++++------------- 2 files changed, 26 insertions(+), 16 deletions(-) diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json index 4b2928f1..ea2d23c5 100644 --- a/pyQuARC/schemas/check_messages.json +++ b/pyQuARC/schemas/check_messages.json @@ -1,5 +1,5 @@ { - "free_and_open_data_presence_check":{ + "free_and_open_data_presence_check": { "failure": "No FreeAndOpenData value was given.", "help": { "message": "", @@ -7,7 +7,6 @@ }, "remediation": "Recommend providing a FreeAndOpenData value of 'true'." }, - "datetime_format_check": { "failure": "`{}` does not adhere to the ISO 1601 standard.", "help": { @@ -608,6 +607,14 @@ }, "remediation": "Please add a GCMD compliant campaign/project name if applicable to the dataset." }, + "granule_campaign_name_presence_check": { + "failure": "The campaign/project name is missing.", + "help": { + "message": "", + "url": "https://wiki.earthdata.nasa.gov/display/CMR/Project" + }, + "remediation": "Please add a GCMD compliant campaign/project name if applicable to the dataset." 
+ }, "spatial_coverage_type_presence_check": { "failure": "The Spatial Coverage Type is missing.", "help": { diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json index cf6cc9ab..2970ca04 100644 --- a/pyQuARC/schemas/rule_mapping.json +++ b/pyQuARC/schemas/rule_mapping.json @@ -1,9 +1,7 @@ { - "free_and_open_data_presence_check": { "rule_name": "Free and Open Data Presence Check", "fields_to_apply": { - "echo-c": [ { "fields": [ @@ -25,12 +23,10 @@ ] } ] - }, "severity": "warning", "check_id": "one_item_presence_check" }, - "data_update_time_logic_check": { "rule_name": "Data Update Time Logic Check", "fields_to_apply": { @@ -2639,7 +2635,7 @@ ] }, "severity": "info", - "check_id": "one_item_presence_check" + "check_id": "one_item_presence_check" }, "standard_product_check": { "rule_name": "Standard Product Check", @@ -2650,7 +2646,7 @@ "Collection/StandardProduct" ] } - ], + ], "umm-c": [ { "fields": [ @@ -3539,13 +3535,6 @@ ] } ], - "echo-g": [ - { - "fields": [ - "Granule/Campaigns/Campaign/ShortName" - ] - } - ], "dif10": [ { "fields": [ @@ -3559,6 +3548,20 @@ "Projects/ShortName" ] } + ] + }, + "severity": "warning", + "check_id": "one_item_presence_check" + }, + "granule_campaign_name_presence_check": { + "rule_name": "Campaign Name Presence Check", + "fields_to_apply": { + "echo-g": [ + { + "fields": [ + "Granule/Campaigns/Campaign/ShortName" + ] + } ], "umm-g": [ { @@ -3568,7 +3571,7 @@ } ] }, - "severity": "warning", + "severity": "info", "check_id": "one_item_presence_check" }, "spatial_coverage_type_presence_check": { From 178d4c56d58f92511c0ac83304e57063ed6c115a Mon Sep 17 00:00:00 2001 From: Jenny Wood Date: Tue, 25 Jul 2023 09:48:43 -0500 Subject: [PATCH 36/46] Removed online_access_url_description_check since the url_desc_presence_check achieves the same result --- pyQuARC/schemas/check_messages.json | 8 -------- pyQuARC/schemas/rule_mapping.json | 21 --------------------- 2 files changed, 29 deletions(-) diff --git 
a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json
index ea2d23c5..159969df 100644
--- a/pyQuARC/schemas/check_messages.json
+++ b/pyQuARC/schemas/check_messages.json
@@ -647,14 +647,6 @@
       },
       "remediation": "Recommend providing at least one Online Resource URL."
     },
-    "online_access_url_description_check": {
-      "failure": "There does not appear to be a description for the Online Access URL(s).",
-      "help": {
-        "message": "",
-        "url": "https://wiki.earthdata.nasa.gov/display/CMR/Related+URLs"
-      },
-      "remediation": "Recommend providing a description for each URL that is provided."
-    },
     "online_resource_url_description_check": {
       "failure": "There does not appear to be a description for the Online Resource URL(s).",
       "help": {
diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json
index 2970ca04..6c881de5 100644
--- a/pyQuARC/schemas/rule_mapping.json
+++ b/pyQuARC/schemas/rule_mapping.json
@@ -3672,27 +3672,6 @@
         "severity": "error",
         "check_id": "one_item_presence_check"
     },
-    "online_access_url_description_check": {
-        "rule_name": "Online Access URL Description Check",
-        "fields_to_apply": {
-            "echo-c": [
-                {
-                    "fields": [
-                        "Collection/OnlineAccessURLs/OnlineAccessURL/URLDescription"
-                    ]
-                }
-            ],
-            "echo-g": [
-                {
-                    "fields": [
-                        "Granule/OnlineAccessURLs/OnlineAccessURL/URLDescription"
-                    ]
-                }
-            ]
-        },
-        "severity": "warning",
-        "check_id": "one_item_presence_check"
-    },
     "online_resource_url_description_check": {
         "rule_name": "Online Resource URL Description Check",
         "fields_to_apply": {

From de0c5dd6f7b08d05368fb551faa139d59d7a93ee Mon Sep 17 00:00:00 2001
From: Jenny Wood
Date: Tue, 25 Jul 2023 10:02:57 -0500
Subject: [PATCH 37/46] Removed online_resource_url_description_check since
 the url_desc_presence_check achieves the same result

---
 pyQuARC/schemas/check_messages.json |  8 -------
 pyQuARC/schemas/rule_mapping.json   | 36 +++++++----------------
 2 files changed, 8 insertions(+), 36 deletions(-)

diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json
index 159969df..ff5fb666 100644
--- a/pyQuARC/schemas/check_messages.json
+++ b/pyQuARC/schemas/check_messages.json
@@ -647,14 +647,6 @@
       },
       "remediation": "Recommend providing at least one Online Resource URL."
     },
-    "online_resource_url_description_check": {
-      "failure": "There does not appear to be a description for the Online Resource URL(s).",
-      "help": {
-        "message": "",
-        "url": "https://wiki.earthdata.nasa.gov/display/CMR/Related+URLs"
-      },
-      "remediation": "Recommend providing a description for each URL that is provided."
-    },
     "opendap_url_location_check": {
       "failure": "The provided Online Access URL `{}` is an OPeNDAP URL.",
       "help": {
diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json
index 6c881de5..9335ce7d 100644
--- a/pyQuARC/schemas/rule_mapping.json
+++ b/pyQuARC/schemas/rule_mapping.json
@@ -3672,34 +3672,6 @@
         "severity": "error",
         "check_id": "one_item_presence_check"
     },
-    "online_resource_url_description_check": {
-        "rule_name": "Online Resource URL Description Check",
-        "fields_to_apply": {
-            "echo-c": [
-                {
-                    "fields": [
-                        "Collection/OnlineResources/OnlineResource/Description"
-                    ]
-                }
-            ],
-            "echo-g": [
-                {
-                    "fields": [
-                        "Granule/OnlineResources/OnlineResource/Description"
-                    ]
-                }
-            ],
-            "umm-g": [
-                {
-                    "fields": [
-                        "RelatedUrls/Description"
-                    ]
-                }
-            ]
-        },
-        "severity": "warning",
-        "check_id": "one_item_presence_check"
-    },
     "opendap_url_location_check": {
         "rule_name": "ECHO10 OPeNDAP URL Location Check",
         "fields_to_apply": {
@@ -4759,6 +4731,14 @@
                     "RelatedUrls/URL"
                 ]
             }
+        ],
+        "umm-g": [
+            {
+                "fields": [
+                    "RelatedUrls/Description",
+                    "RelatedUrls/URL"
+                ]
+            }
         ]
     },
     "severity": "warning",

From 10b8195d9fb55ab7af7bc738b7b08ef6112f6be8 Mon Sep 17 00:00:00 2001
From: Jenny Wood
Date: Tue, 25 Jul 2023 13:48:13 -0500
Subject: [PATCH 38/46] Removed data_center_short_name_gcmd_check since the
 organization_short_name_gcmd_check achieves the same result

---
 pyQuARC/schemas/check_messages.json | 12 ++---------
 pyQuARC/schemas/rule_mapping.json   | 33 -----------------------------
 2 files changed, 2 insertions(+), 43 deletions(-)

diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json
index ff5fb666..e20c3cd9 100644
--- a/pyQuARC/schemas/check_messages.json
+++ b/pyQuARC/schemas/check_messages.json
@@ -136,12 +136,12 @@
       "remediation": "Consider updating the delete date to a future date or remove it from the metadata if it is no longer valid."
     },
     "organization_short_name_gcmd_check": {
-      "failure": "The provided data center short name `{}` does not comply with GCMD. ",
+      "failure": "The provided short name `{}` does not comply with GCMD. ",
       "help": {
         "message": "",
         "url": "https://wiki.earthdata.nasa.gov/display/CMR/Data+Center"
       },
-      "remediation": "Provide a valid short name name from the GCMD Providers keyword list or submit a request to support@earthdata.nasa.gov to have this keyword added to the GCMD KMS."
+      "remediation": "Provide a valid short name from the GCMD Providers keyword list or submit a request to support@earthdata.nasa.gov to have this keyword added to the GCMD KMS."
     },
     "organization_long_name_gcmd_check": {
       "failure": "The provided data center long name `{}` does not comply with the GCMD. ",
@@ -895,14 +895,6 @@
       },
       "remediation": "Recommend providing a Directory Name/IDN Node from the following list: https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/idnnode?format=csv."
     },
-    "data_center_short_name_gcmd_check": {
-      "failure": "The provided short name `{}` does not comply with GCMD. ",
-      "help": {
-        "message": "",
-        "url": "https://wiki.earthdata.nasa.gov/display/CMR/Data+Center"
-      },
-      "remediation": "Recommend providing a short name from the following list: https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/providers?format=csv."
-    },
     "characteristic_data_type": {
       "failure": "The provided characteristic is missing a data type. ",
       "help": {
diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json
index 9335ce7d..876c2baa 100644
--- a/pyQuARC/schemas/rule_mapping.json
+++ b/pyQuARC/schemas/rule_mapping.json
@@ -4951,39 +4951,6 @@
         "severity": "error",
         "check_id": "count_check"
     },
-    "data_center_short_name_gcmd_check": {
-        "rule_name": "Data Center Shortname GCMD Check",
-        "fields_to_apply": {
-            "echo-c": [
-                {
-                    "fields": [
-                        "Collection/ProcessingCenter"
-                    ]
-                },
-                {
-                    "fields": [
-                        "Collection/ArchiveCenter"
-                    ]
-                }
-            ],
-            "dif10": [
-                {
-                    "fields": [
-                        "DIF/Organization/Organization_Name/Short_Name"
-                    ]
-                }
-            ],
-            "umm-c": [
-                {
-                    "fields": [
-                        "DataCenters/ShortName"
-                    ]
-                }
-            ]
-        },
-        "severity": "error",
-        "check_id": "organization_short_name_gcmd_check"
-    },
     "characteristic_data_type": {
         "rule_name": "Characteristics Data Type Presence Check",
         "fields_to_apply": {

From 8ee99469230b91bb221aacbd3e3c88491f397f56 Mon Sep 17 00:00:00 2001
From: Jenny Wood
Date: Tue, 25 Jul 2023 13:52:54 -0500
Subject: [PATCH 39/46] Modified rule mapping for
 collection_citation_presence_check

---
 pyQuARC/schemas/rule_mapping.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json
index 876c2baa..17a44a3a 100644
--- a/pyQuARC/schemas/rule_mapping.json
+++ b/pyQuARC/schemas/rule_mapping.json
@@ -3853,7 +3853,7 @@
         "echo-c": [
             {
                 "fields": [
-                    "Collection/CitationforExternalPublication"
+                    "Collection/CitationForExternalPublication"
                 ]
             }
         ],

From a2f6112d46bbbf6fbc3d4ce3dbedb3839f492473 Mon Sep 17 00:00:00 2001
From: Jenny Wood
Date: Tue, 25 Jul 2023 14:19:09 -0500
Subject: [PATCH 40/46] Resolving issues with the
 validate_beginning_datetime_against_granules check

---
 pyQuARC/schemas/rule_mapping.json | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json
index 17a44a3a..9c873045 100644
--- a/pyQuARC/schemas/rule_mapping.json
+++ b/pyQuARC/schemas/rule_mapping.json
@@ -4232,7 +4232,8 @@
             ],
             "dependencies": [
                 [
-                    "date_or_datetime_format_check"
+                    "date_or_datetime_format_check",
+                    "DIF/Temporal_Coverage/Range_DateTime/Beginning_Date_Time"
                 ]
             ]
         }
@@ -4246,7 +4247,7 @@
             ],
             "dependencies": [
                 [
-                    "datetime_format_check",
+                    "date_or_datetime_format_check",
                     "TemporalExtents/RangeDateTimes/BeginningDateTime"
                 ]
             ]

From 589a6add9d3519f37cc8fdfda2b53dbb2885ed1a Mon Sep 17 00:00:00 2001
From: Jenny Wood
Date: Thu, 27 Jul 2023 15:42:41 -0500
Subject: [PATCH 41/46] Added data_format_presence_check for collections

---
 pyQuARC/schemas/check_messages.json |  8 ++++++++
 pyQuARC/schemas/rule_mapping.json   | 28 ++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/pyQuARC/schemas/check_messages.json b/pyQuARC/schemas/check_messages.json
index e20c3cd9..b072b553 100644
--- a/pyQuARC/schemas/check_messages.json
+++ b/pyQuARC/schemas/check_messages.json
@@ -1039,6 +1039,14 @@
       },
       "remediation": "Recommend updating the Number Of Instruments/Sensors value to match the numerical amount of instruments/sensors provided."
     },
+    "data_format_presence_check": {
+      "failure": "The data format is missing.",
+      "help": {
+        "message": "",
+        "url": "https://wiki.earthdata.nasa.gov/display/CMR/Direct+Distribution+Information"
+      },
+      "remediation": "Recommend providing a valid data format from the Granule Data Format GCMD keyword list."
+    },
     "horizontal_resolution_presence_check": {
       "failure": "No horizontal resolution information is provided.",
       "help": {
diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json
index 9c873045..26e0fe08 100644
--- a/pyQuARC/schemas/rule_mapping.json
+++ b/pyQuARC/schemas/rule_mapping.json
@@ -5425,6 +5425,34 @@
         "severity": "warning",
         "check_id": "granule_data_format_presence_check"
     },
+    "data_format_presence_check": {
+        "rule_name": "Data Format Presence Check",
+        "fields_to_apply": {
+            "echo-c": [
+                {
+                    "fields": [
+                        "Collection/DataFormat"
+                    ]
+                }
+            ],
+            "dif10": [
+                {
+                    "fields": [
+                        "DIF/Distribution/Distribution_Format"
+                    ]
+                }
+            ],
+            "umm-c": [
+                {
+                    "fields": [
+                        "ArchiveAndDistributionInformation/FileDistributionInformation/Format"
+                    ]
+                }
+            ]
+        },
+        "severity": "error",
+        "check_id": "one_item_presence_check"
+    },
     "horizontal_resolution_presence_check": {
         "rule_name": "Horizontal Resolution Presence Check",
         "fields_to_apply": {

From 5db98b840912f93a12af410bf3e9f2ed8217307d Mon Sep 17 00:00:00 2001
From: Jenny Wood
Date: Fri, 28 Jul 2023 14:06:26 -0500
Subject: [PATCH 42/46] Removed echo-g rule mapping from
 online_resource_url_presence_check

---
 pyQuARC/schemas/rule_mapping.json | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/pyQuARC/schemas/rule_mapping.json b/pyQuARC/schemas/rule_mapping.json
index 26e0fe08..23367765 100644
--- a/pyQuARC/schemas/rule_mapping.json
+++ b/pyQuARC/schemas/rule_mapping.json
@@ -3660,13 +3660,6 @@
                 "Collection/OnlineResources/OnlineResource/URL"
             ]
         }
-        ],
-        "echo-g": [
-            {
-                "fields": [
-                    "Granule/OnlineResources/OnlineResource/URL"
-                ]
-            }
         ]
     },
     "severity": "error",

From 7e501665c1650a4c89f9deed1682fe20e2bf4c3c Mon Sep 17 00:00:00 2001
From: Jenny Wood <57103986+jenny-m-wood@users.noreply.github.com>
Date: Wed, 2 Aug 2023 10:59:04 -0500
Subject: [PATCH 43/46] Remove else: pass from dif_standard_product_check

---
 pyQuARC/code/custom_validator.py | 2 --
 1 file changed, 2 deletions(-)

diff --git a/pyQuARC/code/custom_validator.py b/pyQuARC/code/custom_validator.py
index c7123b79..27cbebad 100644
--- a/pyQuARC/code/custom_validator.py
+++ b/pyQuARC/code/custom_validator.py
@@ -117,8 +117,6 @@ def dif_standard_product_check(*field_values):
                 value = field_value
                 validity = True
                 break
-            else:
-                pass
         return {"valid": validity, "value": value}

     @staticmethod

From 8a3e77a844668788e008da2bec590e4e682cd30c Mon Sep 17 00:00:00 2001
From: Jenny Wood <57103986+jenny-m-wood@users.noreply.github.com>
Date: Wed, 2 Aug 2023 11:13:41 -0500
Subject: [PATCH 44/46] Update changelog

---
 CHANGELOG.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1955eb46..9eb317d5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,17 @@
 # CHANGELOG

+## v1.2.3
+- Updated schema files
+- Added Free And Open Data check
+- Added Horizontal Resolution Presence check
+- Added Data Format Presence check
+- Added Standard Product check
+- Added License URL Description check
+- Added Granule Campaign Name Presence check
+- Revised GCMD long name presence checks
+- Revised validate_beginning_datetime_against_granules check
+- Removed redundant checks
+
 ## v1.2.2

 - Bugfixes:

From cdea94e05800573e63378e006f5a67da433356b6 Mon Sep 17 00:00:00 2001
From: Jenny Wood <57103986+jenny-m-wood@users.noreply.github.com>
Date: Wed, 2 Aug 2023 11:14:10 -0500
Subject: [PATCH 45/46] Update version.txt

---
 pyQuARC/version.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pyQuARC/version.txt b/pyQuARC/version.txt
index 23aa8390..0495c4a8 100644
--- a/pyQuARC/version.txt
+++ b/pyQuARC/version.txt
@@ -1 +1 @@
-1.2.2
+1.2.3

From 8b6136cabfe59da3fdf6266911609a7a16cd38e7 Mon Sep 17 00:00:00 2001
From: Slesa Adhikari
Date: Wed, 2 Aug 2023 11:58:24 -0500
Subject: [PATCH 46/46] Add auth issue fix to changelog

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9eb317d5..6b659413 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,6 +11,7 @@
 - Revised GCMD long name presence checks
 - Revised validate_beginning_datetime_against_granules check
 - Removed redundant checks
+- Fix auth issue when downloading metadata files

 ## v1.2.2
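The `rule_mapping.json` entries added and removed across these patches all share one shape: a `rule_name`, a `fields_to_apply` map keyed by metadata standard (`echo-c`, `dif10`, `umm-c`, `umm-g`, …), a `severity`, and a `check_id`. As a rough illustration of how such a structure can be consumed, here is a minimal, hypothetical Python sketch; `RULE_MAPPING` is an inline sample mirroring the hunks above, not pyQuARC's actual schema file or loader, and `rules_for_format` is an invented helper, not part of the pyQuARC API.

```python
import json

# Inline sample in the rule_mapping.json shape shown in the hunks above.
# This stands in for reading pyQuARC's real schema file from disk.
RULE_MAPPING = json.loads("""
{
    "data_format_presence_check": {
        "rule_name": "Data Format Presence Check",
        "fields_to_apply": {
            "echo-c": [
                {"fields": ["Collection/DataFormat"]}
            ],
            "umm-c": [
                {"fields": ["ArchiveAndDistributionInformation/FileDistributionInformation/Format"]}
            ]
        },
        "severity": "error",
        "check_id": "one_item_presence_check"
    },
    "url_desc_presence_check": {
        "rule_name": "URL Description Presence Check",
        "fields_to_apply": {
            "umm-g": [
                {"fields": ["RelatedUrls/Description", "RelatedUrls/URL"]}
            ]
        },
        "severity": "warning",
        "check_id": "one_item_presence_check"
    }
}
""")


def rules_for_format(mapping, metadata_format):
    """Yield (rule_name, severity, fields) for each rule that applies
    to the given metadata standard (e.g. 'umm-g')."""
    for rule in mapping.values():
        for entry in rule.get("fields_to_apply", {}).get(metadata_format, []):
            yield rule["rule_name"], rule["severity"], entry["fields"]


if __name__ == "__main__":
    # List every rule that would run against UMM-G metadata.
    for name, severity, fields in rules_for_format(RULE_MAPPING, "umm-g"):
        print(f"{severity:7s} {name}: {fields}")
```

Selecting rules per standard like this is why patch 37 could fold `RelatedUrls/Description` into `url_desc_presence_check`'s `umm-g` entry instead of keeping a separate description check.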