Commit

structure review
DajanaSnopkova committed Sep 9, 2024
1 parent 9dbf0d5 commit fbe0b09
Showing 4 changed files with 39 additions and 26 deletions.
3 changes: 1 addition & 2 deletions tech/docs/technical_components/.pages
@@ -11,5 +11,4 @@ nav:
- metadata_augmentation.md
- knowledge_graph.md
- natural_language_querying.md
- user_management.md
- monitoring.md
- user_management.md
55 changes: 33 additions & 22 deletions tech/docs/technical_components/metadata_augmentation.md
@@ -7,18 +7,23 @@
**Project:** [Metadata augmentation](https://github.com/soilwise-he/metadata-augmentation)


## Translation module
## Functionality

### Functionality
This component applies scripting, NLP and LLM techniques to a metadata record to augment the metadata statements about the resource. Augmentations are stored in a dedicated augmentation table, which records the process that produced each of them.

Many records arrive in a local language. SWR translates the main properties of the record (title and abstract) into English to offer users a single-language experience. The translations are used for filtering and displaying records.
| metadata-uri | metadata-element | source | value | process | date |
| --- | --- | --- | --- | --- | --- |
| <https://geo.fi/data/ee44-aa22-33> | spatial-scope | 16.7,62.2,18,81.5 | <https://inspire.ec.europa.eu/metadata-codelist/SpatialScope/national> | spatial-scope-analyser | 2024-07-04 |
| <https://geo.fi/data/abc1-ba27-67> | soil-thread | This dataset is used to evaluate Soil Compaction in Nuohous Sundström | <http://aims.fao.org/aos/agrovoc/c_7163> | keyword-analyser | 2024-06-28 |
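
One row of the augmentation table could be modelled as shown below. This is only a minimal sketch: the class name `Augmentation` and its field names are assumptions derived from the column headers of the table, not the component's actual schema.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class Augmentation:
    """One augmentation statement about a metadata record (hypothetical model)."""

    metadata_uri: str       # record the statement is about
    metadata_element: str   # augmented element, e.g. "spatial-scope"
    source: str             # input the value was derived from
    value: str              # derived value, typically a codelist or thesaurus URI
    process: str            # process that produced the augmentation
    created: datetime.date  # date on which the augmentation was produced


# First example row of the table, expressed with the hypothetical model.
row = Augmentation(
    metadata_uri="https://geo.fi/data/ee44-aa22-33",
    metadata_element="spatial-scope",
    source="16.7,62.2,18,81.5",
    value="https://inspire.ec.europa.eu/metadata-codelist/SpatialScope/national",
    process="spatial-scope-analyser",
    created=datetime.date(2024, 7, 4),
)
```

Keeping the producing process on every row makes it possible to re-run or roll back a single analyser without touching the augmentations of the others.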

### Technical
For the first SoilWise prototype, the functionality of the Metadata Augmentation component comprises:

The translation module builds on the EU translation service (API documentation at <https://language-tools.ec.europa.eu/>). Translations are stored in a database for reuse by the SWR.
The EU translation service responds to translation requests asynchronously, so translations may not yet be available after the initial load of new data. A callback operation populates the database; from that moment the translation is available to the SWR. The translation service uses 2-letter language codes, so a 3-letter ISO code (as used in, for example, iso19139:2007) must first be converted to a 2-letter code. The EU translation service supports only a limited set of source-to-target language pairs and returns an error for any other pair.
- [Automatic metadata generation](#automatic-metadata-generation)
- [Spatial scope analyser](#spatial-scope-analyser)

The initial translation is triggered by a running harvester. The translations become available once the record is ingested into the triplestore and catalogue database in a follow-up step of the harvester.
### Automatic metadata generation

@WE to add content

### Spatial scope analyser

@@ -45,35 +50,43 @@ To understand if the dataset has a global, continental, national or regional scope
| <https://geo.fi/data/abc1-ba27-67> | spatial-scope | 17.4,68.2,17.6,71.2 | <https://inspire.ec.europa.eu/metadata-codelist/SpatialScope/regional> | spatial-scope-analyser | 2024-07-04 |


## Foreseen functionality

## Keyword matcher
In the next iterations, the Metadata Augmentation component is expected to include the following additional functions:

Keywords are an important mechanism for filtering and clustering records, but similar keywords can only be matched if they are identical. This module evaluates the keywords of existing records and harmonises them when they are highly similar.
- [Translation module](#translation-module)
- [Keyword matcher](#keyword-matcher)
- [Spatial Locator](#spatial-locator)
- [EUSO-high-value dataset tagging](#euso-high-value-dataset-tagging)

### Translation module

Many records arrive in a local language. SWR translates the main properties of the record (title and abstract) into English to offer users a single-language experience. The translations are used for filtering and displaying records.

### Functionality
The translation module builds on the EU translation service (API documentation at <https://language-tools.ec.europa.eu/>). Translations are stored in a database for reuse by the SWR.
The EU translation service responds to translation requests asynchronously, so translations may not yet be available after the initial load of new data. A callback operation populates the database; from that moment the translation is available to the SWR. The translation service uses 2-letter language codes, so a 3-letter ISO code (as used in, for example, iso19139:2007) must first be converted to a 2-letter code. The EU translation service supports only a limited set of source-to-target language pairs and returns an error for any other pair.

The initial translation is triggered by a running harvester. The translations become available once the record is ingested into the triplestore and catalogue database in a follow-up step of the harvester.
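
The conversion from 3-letter to 2-letter language codes could look roughly like the sketch below. The mapping is a deliberately partial excerpt of ISO 639-2 to ISO 639-1, and the names `ISO3_TO_ISO2` and `to_two_letter` are hypothetical; they are not taken from the translation module itself.

```python
from typing import Optional

# Partial ISO 639-2 -> ISO 639-1 mapping (illustrative excerpt, extend as needed).
# Both bibliographic and terminologic 3-letter variants are listed where they differ.
ISO3_TO_ISO2 = {
    "eng": "en",
    "fin": "fi",
    "swe": "sv",
    "pol": "pl",
    "ita": "it",
    "spa": "es",
    "fra": "fr", "fre": "fr",
    "deu": "de", "ger": "de",
    "nld": "nl", "dut": "nl",
    "ces": "cs", "cze": "cs",
}


def to_two_letter(code: str) -> Optional[str]:
    """Return the 2-letter code for a language code, or None if it is unknown."""
    normalised = code.strip().lower()
    if len(normalised) == 2:
        # Already a 2-letter code; pass it through unchanged.
        return normalised
    return ISO3_TO_ISO2.get(normalised)
```

Returning `None` for an unknown code lets the caller skip the translation request instead of sending a request that the service would reject.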


### Keyword matcher

Keywords are an important mechanism for filtering and clustering records, but similar keywords can only be matched if they are identical. This module evaluates the keywords of existing records and harmonises them when they are highly similar.

The module analyses the existing keywords on a metadata record. Two cases can be identified:
- If a keyword with a SKOS identifier has a closeMatch or sameAs relation to a preferred keyword, the preferred keyword is used.
- If an existing keyword without a SKOS identifier matches a preferred keyword by (translated) string or synonym, the matched keyword (including its SKOS identifier) is appended. The risk of false positives needs to be considered.

### Technical

To facilitate this use case, the SWR contains a knowledge graph of preferred keywords in the soil domain with relations to alternative keywords from sources such as AGROVOC, GEMET, DBpedia and ISO. This knowledge graph is maintained at <https://github.com/soilwise-he/soil-health-knowledge-graph>. AGROVOC is multilingual, which facilitates the translation case.

For metadata records which have not yet been analysed (in that iteration), the module extracts the records and checks for each keyword whether it matches any of the preferred keywords. If so, the preferred keyword is added to the record.
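
The two matching cases can be sketched as follows. This is an illustration under simplifying assumptions, not the module's implementation: the lookup tables `close_matches` and `labels` stand in for queries against the knowledge graph, all `example.org` URIs are made up, and only exact (case-insensitive) string matching is shown, not similarity or translation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Keyword:
    label: str
    uri: Optional[str] = None  # SKOS concept identifier, if the keyword has one


def match_keywords(
    keywords: Iterable[Keyword],
    close_matches: Dict[str, Keyword],  # alternative concept URI -> preferred keyword
    labels: Dict[str, Keyword],         # lower-cased label or synonym -> preferred keyword
) -> List[Keyword]:
    result: List[Keyword] = []
    for keyword in keywords:
        if keyword.uri is not None:
            # Case 1: a keyword with an identifier is replaced by its preferred keyword.
            result.append(close_matches.get(keyword.uri, keyword))
        else:
            # Case 2: a plain-text keyword is kept and the preferred keyword is appended.
            result.append(keyword)
            preferred = labels.get(keyword.label.strip().lower())
            if preferred is not None:
                result.append(preferred)
    # Remove duplicates while preserving the original order.
    return list(dict.fromkeys(result))


preferred = Keyword("soil compaction", "https://example.org/preferred/soil-compaction")
matched = match_keywords(
    [
        Keyword("Compaction", "https://example.org/other/compaction"),
        Keyword("Soil Compaction "),
        Keyword("forestry"),
    ],
    close_matches={"https://example.org/other/compaction": preferred},
    labels={"soil compaction": preferred},
)
```

In case 2 the original keyword is kept next to the preferred one, so a false positive can still be spotted and removed later.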

## Spatial Locator

### Functionality
### Spatial Locator

This module analyses existing keywords to find a relevant geography for the record. It then uses the [GeoNames](https://www.geonames.org/about.html){target=_blank} API to find spatial coordinates for that geography, which are inserted into the metadata record.
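
A lookup of this kind could be sketched as below, assuming the GeoNames `searchJSON` web service, which returns a `geonames` list whose entries carry `lat` and `lng` as strings. The helper names are hypothetical, the account name is a placeholder, and the HTTP request itself is left out so that the sketch stays independent of any network access.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

# Assumed endpoint of the GeoNames search web service.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_search_url(place: str, username: str) -> str:
    """Build a search request URL that asks for the single best match."""
    query = urlencode({"q": place, "maxRows": 1, "username": username})
    return GEONAMES_SEARCH + "?" + query


def first_coordinates(response_text: str) -> Optional[Tuple[float, float]]:
    """Extract (latitude, longitude) of the first hit, or None if there is no hit."""
    hits = json.loads(response_text).get("geonames", [])
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lng"])
```

Asking for one row only keeps the response small, at the cost of trusting the service's own ranking of ambiguous place names.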

### Technical


## EUSO-high-value dataset tagging

### functionality
### EUSO-high-value dataset tagging

The EUSO high-value datasets are those with substantial potential to assess soil health status, as detailed on the [EUSO dashboard](https://esdac.jrc.ec.europa.eu/esdacviewer/euso-dashboard/){target=_blank}. This framework includes metadata-based identification and tagging built on the concept of the [soil degradation indicator](https://esdac.jrc.ec.europa.eu/content/soil-degradation-indicators-eu){target=_blank}. Each dataset (possibly only those with a supra-national spatial scope) will be annotated with a potential soil degradation indicator for which it might be used. Users can then filter these datasets according to their specific needs.

@@ -172,8 +185,6 @@ The EUSO soil degradation indicators employ specific [methodologies and threshol
</tr>
</table>

### Technical

Technically, we foresee the metadata tagging process as illustrated below. First, the metadata record's title, abstract and keywords will be checked for the occurrence of specific **values from the Soil Indicator and Soil Degradation Codelists**, such as `Water erosion` or `Soil erosion` (see the table above). If a value is found, the `Soil Degradation Indicator Tag` (the corresponding value from the Soil Degradation Codelist) will be displayed to indicate that the dataset is suitable for soil-indicator-related analyses. Additionally, a search for the corresponding **methodology** will be conducted to see whether the dataset is compliant with the EUSO Soil Health indicators presented in the [EUSO Dashboard](https://esdac.jrc.ec.europa.eu/esdacviewer/euso-dashboard/){target=_blank}. If it is, the tag `EUSO High-value dataset` will be added. In a later phase, we expect to search for references to scientific methodology papers in the metadata record's links. The possibility of a more complex search using soil thesauri will also be explored.
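
The first step of the tagging process (checking title, abstract and keywords for codelist values) could be sketched like this. The `CODELIST` excerpt and the term-to-tag mapping in it are assumptions made for illustration, and the function name is hypothetical; the real codelists are those referenced in the table above.

```python
from typing import Dict, Iterable, List

# Hypothetical excerpt: lower-cased search term -> Soil Degradation Codelist value.
CODELIST: Dict[str, str] = {
    "water erosion": "Soil erosion",
    "soil erosion": "Soil erosion",
}


def soil_degradation_tags(
    title: str,
    abstract: str,
    keywords: Iterable[str],
    codelist: Dict[str, str] = CODELIST,
) -> List[str]:
    """Return the codelist values whose search terms occur in the record's text."""
    text = " ".join([title, abstract, *keywords]).lower()
    tags: List[str] = []
    for term, tag in codelist.items():
        # Plain case-insensitive substring match; each tag is reported only once.
        if term in text and tag not in tags:
            tags.append(tag)
    return tags


tags = soil_degradation_tags("Water erosion risk map", "Modelled soil loss.", ["RUSLE"])
```

A substring match is the simplest possible check and will produce false positives and misses; the thesaurus-based search mentioned above would be the natural refinement.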


5 changes: 4 additions & 1 deletion tech/docs/technical_components/metadata_authoring.md
@@ -1,14 +1,17 @@
# Metadata Authoring

!!! component-header "Info"
**Current version:** 0.1

**Project:** https://github.com/soilwise-he/soilinfohub

**Access point:** https://github.com/soilwise-he/soilinfohub

## Functionality

**No implementation is yet an integrated part of the SWR delivery.**

## Foreseen functionality

Users can create and maintain metadata records within the SWR when these records cannot be imported from a remote source. Note that importing records from a remote source is the preferred approach from the SWR point of view, because the remote platform takes care of the ownership and persistence of the record.

- Users log in to the system and can upload a metadata record.
2 changes: 1 addition & 1 deletion tech/docs/technical_components/transformation.md
@@ -28,7 +28,7 @@ The specific requirements these components have to fulfil are:
- It should be possible to share transformation processes.
- Transformation processes should be fully documented or self-documented.

## Implementation Technologies
## Technology & Integration

We plan to deploy the needed capabilities to the SWR using two technologies:
