diff --git a/tech/docs/technical_components/metadata_augmentation.md b/tech/docs/technical_components/metadata_augmentation.md
index fbdeb7c..83ec4b6 100644
--- a/tech/docs/technical_components/metadata_augmentation.md
+++ b/tech/docs/technical_components/metadata_augmentation.md
@@ -19,49 +19,32 @@ In this component scripting / NLP / LLM are used on a metadata record to augment
 For the first SoilWise prototype, the functionality of the Metadata Augmentation component comprises:
 
 - [Automatic metadata generation](#automatic-metadata-generation)
-- [Spatial scope analyser](#spatial-scope-analyser)
+- [Translation module](#translation-module)
+
 
 ### Automatic metadata generation
 
 To generate metadata (data set and service metadata), activate the corresponding button(s) when setting up the theme for the transformation process. The steps are described [here](https://main.soilwise-documentation.pages.dev/technical_components/metadata_validation/#setting-up-a-transformation-process-in-haleconnect)
 
-### Spatial scope analyser
 
-A script that analyses the spatial scope of a resource
+### Translation module
 
-The bounding box is matched to country bounding boxes
+Many records arrive in a local language, SWR translates the main properties for the record: title and abstract into English, to offer a single language user experience. The translations are used in filtering and display of records.
 
-To understand if the dataset has a global, continental, national or regional scope
+The translation module builds on the EU translation service (API documentation at ). Translations are stored in a database for reuse by the SWR.
+The EU translation returns asynchronous responses to translation requests, this means that translations may not yet be available after initial load of new data. A callback operation populates the database, from that moment a translation is available to SWR. The translation service uses 2-letter language codes, it means a translation from a 3-letter iso code (as used in for example iso19139:2007) to 2-letter code is required. The EU translation service has a limited set of translations from a certain to alternative language available, else returns an error.
 
-- Retrieves all datasets (as iso19139 xml) from database (records table joined with augmentations) which:
-    - have a bounding box
-    - no spatial scope
-    - in iso19139 format
-- For each record it compares the boundingbox to country bounding boxes:
-    - if bigger then continents > global
-    - If matches a continent > continental
-    - if matches a country > national
-    - if smaller > regional
-- result is written to as an augmentation in a dedicated table
+Initial translation is triggered by a running harvester. The translations will then be available once the record is ingested to the triplestore and catalogue database in a followup step of the harvester.
 
 ## Foreseen functionality
 
 In the next iterations, Metadata augmentation component is foreseen to include the following additional functions:
 
-- [Translation module](#translation-module)
 - [Keyword matcher](#keyword-matcher)
 - [Spatial Locator](#spatial-locator)
+- [Spatial scope analyser](#spatial-scope-analyser)
 - [EUSO-high-value dataset tagging](#euso-high-value-dataset-tagging)
 
-### Translation module
-
-Many records arrive in a local language, SWR translates the main properties for the record: title and abstract into English, to offer a single language user experience. The translations are used in filtering and display of records.
-
-The translation module builds on the EU translation service (API documentation at ). Translations are stored in a database for reuse by the SWR.
-The EU translation returns asynchronous responses to translation requests, this means that translations may not yet be available after initial load of new data. A callback operation populates the database, from that moment a translation is available to SWR. The translation service uses 2-letter language codes, it means a translation from a 3-letter iso code (as used in for example iso19139:2007) to 2-letter code is required. The EU translation service has a limited set of translations from a certain to alternative language available, else returns an error.
-
-Initial translation is triggered by a running harvester. The translations will then be available once the record is ingested to the triplestore and catalogue database in a followup step of the harvester.
-
 ### Keyword matcher
 
 
@@ -81,6 +64,25 @@ For metadata records which have not been analysed yet (in that iteration), the m
 
 Analyses existing keywords to find a relevant geography for the record, it then uses the [GeoNames](https://www.geonames.org/about.html){target=_blank} API to find spatial coordinates for the geography, which are inserted into the metadata record.
 
+### Spatial scope analyser
+
+A script that analyses the spatial scope of a resource
+
+The bounding box is matched to country bounding boxes
+
+To understand if the dataset has a global, continental, national or regional scope
+
+- Retrieves all datasets (as iso19139 xml) from database (records table joined with augmentations) which:
+    - have a bounding box
+    - no spatial scope
+    - in iso19139 format
+- For each record it compares the boundingbox to country bounding boxes:
+    - if bigger then continents > global
+    - If matches a continent > continental
+    - if matches a country > national
+    - if smaller > regional
+- result is written to as an augmentation in a dedicated table
+
 ### EUSO-high-value dataset tagging
 
 The EUSO high-value datasets are those with substantial potential to assess soil health status, as detailed on the [EUSO dashboard](https://esdac.jrc.ec.europa.eu/esdacviewer/euso-dashboard/){target=_blank}. This framework includes the concept of [soil degradation indicator](https://esdac.jrc.ec.europa.eu/content/soil-degradation-indicators-eu){target=_blank} metadata-based identification and tagging. Each dataset (possibly only those with the supra-national spatial scope - under discussion) will be annotated with a potential soil degradation indicator for which it might be utilised. Users can then filter these datasets according to their specific needs.
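
The comparison steps listed under the Spatial scope analyser heading in this change could be sketched roughly as follows. This is an illustration only, not the SoilWise script: the `BBox` type, the reference boxes, the `overlap_ratio` helper and the 0.8 threshold are all assumptions, since the documentation does not define what "matches" means. Here "matches" is read as an intersection-over-union of at least the threshold. Boxes larger than any country but not matching a continent are not covered by the documented rules, so the sketch returns `None` for them.

```python
from typing import NamedTuple, Optional


class BBox(NamedTuple):
    """Bounding box in degrees: west, south, east, north."""
    west: float
    south: float
    east: float
    north: float

    def area(self) -> float:
        # Degenerate or inverted boxes count as zero area.
        return max(0.0, self.east - self.west) * max(0.0, self.north - self.south)


# Hypothetical, very rough reference boxes; real values would come from a gazetteer.
CONTINENTS = {"Europe": BBox(-25.0, 34.0, 45.0, 72.0)}
COUNTRIES = {
    "NL": BBox(3.3, 50.7, 7.3, 53.6),
    "FR": BBox(-5.2, 41.3, 9.6, 51.1),
}


def overlap_ratio(a: BBox, b: BBox) -> float:
    """Intersection-over-union of two boxes (1.0 means identical boxes)."""
    inter = BBox(max(a.west, b.west), max(a.south, b.south),
                 min(a.east, b.east), min(a.north, b.north)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def classify_scope(bbox: BBox, threshold: float = 0.8) -> Optional[str]:
    """Assumed reading of the documented rules; the threshold is illustrative."""
    if bbox.area() > max(c.area() for c in CONTINENTS.values()):
        return "global"        # bigger than any continent
    if any(overlap_ratio(bbox, c) >= threshold for c in CONTINENTS.values()):
        return "continental"   # matches a continent
    if any(overlap_ratio(bbox, c) >= threshold for c in COUNTRIES.values()):
        return "national"      # matches a country
    if bbox.area() <= max(c.area() for c in COUNTRIES.values()):
        return "regional"      # smaller than a country
    return None                # case not covered by the documented rules
```

For example, `classify_scope(BBox(5.0, 52.0, 5.5, 52.3))` falls through the continent and country checks and ends up in the "smaller" branch. A production version would likely compare real geometries and not just degree-based rectangles.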
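
The Translation module text in this change notes that 3-letter language codes from the metadata must be converted to the 2-letter codes the translation service expects. A minimal sketch of that conversion is shown here, assuming a small hand-written lookup table. `ISO3_TO_ISO2` and `to_two_letter` are hypothetical names, the table is deliberately incomplete, and the actual module may rely on a complete code list instead. Some languages have two 3-letter variants (for example `ger` and `deu` for German), so both are mapped.

```python
from typing import Optional

# Hypothetical, incomplete lookup table from 3-letter to 2-letter language codes.
ISO3_TO_ISO2 = {
    "eng": "en",
    "ger": "de", "deu": "de",
    "fre": "fr", "fra": "fr",
    "dut": "nl", "nld": "nl",
    "spa": "es",
    "ita": "it",
}


def to_two_letter(code: str) -> Optional[str]:
    """Return the 2-letter code for a language code, or None if it is unknown."""
    normalised = code.strip().lower()
    if len(normalised) == 2:
        # Assumed to be a 2-letter code already; it is passed through unvalidated.
        return normalised
    return ISO3_TO_ISO2.get(normalised)
```

Returning `None` for an unknown code lets the caller skip the translation request, which seems preferable to sending a code the service would reject.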