Skip to content

Latest commit

 

History

History
177 lines (99 loc) · 13.4 KB

instance_conversion_rules.adoc

File metadata and controls

177 lines (99 loc) · 13.4 KB

Instance conversion rules

From input to output data structure

An object from the input structure is converted to an RDF resource. Object properties are represented by RDF resource properties. A dataset is represented by an RDF graph.

Resource identifiers

The identifier of an RDF resource that represents a spatial object should be constructed from its 'inspireId'. However, the identifier should not contain the 'versionId' from the 'inspireId'.

Versioning is a known open issue (see the Generic Conceptual Model, section 9.7.2), and therefore not addressed by this specification.

If an RDF resource has a unique identifier, the identifier should be a globally unique persistent HTTP URI.

This recommendation follows Best Practice #7 in the Spatial Data on the Web Best Practices, but also takes into account additional considerations on spatial objects.

When designing an HTTP URI, take into account the recommendations from Annex H of the INSPIRE Generic Conceptual Model as well as the ISA studies on persistent URIs and RDF & PIDs.

When representing a structured value - also called data type - as an RDF resource, a data provider may assign a unique identifier for the resource.

A structured value / data type does not have identity (see the UML Profile defined in clause 9.6.3 of the INSPIRE Generic Conceptual Model), and therefore has only local scope.

In some concrete RDF syntaxes (for example RDF XML, Turtle, and N-Triples), such a structured value can be encoded as an RDF blank node, i.e. an RDF resource without identifier. As explained in the RDF 1.1 Concepts and Abstract Syntax, "The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization."

When representing a structured value as an RDF resource, a data provider therefore has a choice between creating an identifier for the structured value, and representing it as a blank node.

Considerations on spatial objects

According to the definition of the [inspire_directive], a spatial object is an abstract representation of a real-world phenomenon related to a specific location or geographical area.

Most attributes and roles of a spatial object represent properties of the real-world phenomenon. It is common practice to model both properties that describe the real-world phenomenon and properties that describe the spatial object document, i.e. which are spatial object metadata, as spatial object properties.

In INSPIRE, most properties are properties that describe the real-world phenomenon. However, there are exceptions:

  • Properties that represent life-cycle information (in particular, the beginLifespanVersion and endLifespanVersion attributes) are marked with the stereotype <<lifeCycleInfo>>.

  • Properties that have a value type from ISO 19115 are often feature metadata. However, this is not always the case, in particular for CI types. An example is ProtectedSite.legalFoundationDocument with value type CI_Citation.

  • There are also some properties that require a closer review to identify them as metadata. Examples are CadastralZoning.estimatedAccuracy with value type Length or CadastralZoning.originalMapScaleDenominator with value type Integer. These properties are not properties of the real-world phenomenon, but of the spatial object.

It is important in linked data and the semantic web to be clear about the subject of a property. It might therefore be tempting to use two subjects for the properties of a spatial object: one subject for all properties providing information about the real-world phenomenon and one subject for all properties that are about the spatial object itself (like life-cycle information) or that provide metadata. Such a separation of spatial object data is NOT recommended, because there is no actual benefit in doing so.

From the perspective of RDF vocabularies there is no distinction between the two types of properties. In RDF, a property is a relationship between a subject resource and an object resource. "Anything can be a resource, including physical things, documents, abstract concepts, […​]" [w3c_rdf11_concepts] The semantic of a spatial object property has to provide information whether the property represents metadata or actual data that describes the real-world phenomenon. This condition must be fulfilled, otherwise the model would be ambiguous. If the condition is fulfilled, then assigning spatial object properties to two different subjects, just to identify data about the real-world phenomenon and metadata, is not necessary.

Another argument that has been made for separating data about the real-world phenomenon from metadata is that such a separation would facilitate gaining additional knowledge, i.e. finding more information, about the same real-world phenomenon. This would be achieved through the ability to assert equality between two resources which, in this case, represent the same real-world phenomenon.

Before we discuss the topic of equality in more detail, we need to consider that multiple communities can define RDF/OWL classes to abstract the same real-world phenomenon, but in their particular "worlds" (their "universe of discourse"). Typically, these abstractions differ. The sets of properties used to describe the phenomenon can be different. If the same properties are used, then their values can still be different. For example, the geometry of a building can be represented as a solid, a polygon, or a point. The height of the building can be given in different formats ("10m", "100ft", "100", "small", "Code_T"). As we can see, the information provided for a real-world phenomenon depends on how a community abstracts this phenomenon.

Coming back to the topic of equality: With pure RDF semantics, properties with the same subject belong to the same resource. If two communities used the same identifier for their abstractions of real-world phenomena, then these abstractions would be the same resource. Consequently, the information captured by both communities about the phenomenon would be available for that resource.

On first glance, this sounds useful. However, this approach can cause problems for users that want to gain new knowledge about a resource. Whenever two representations of the same resource share the same properties (e.g. "geometry" or "creationDate"), but with different values (e.g. a polygon vs a point, "1954-04-28" vs "1954-05-02"), then the user must decide what to do with this information. Should information be ignored (e.g. use the polygon instead of the point), or does it need to be processed to derive new information (e.g. computing the mean of two creation dates)? This problem will get bigger if more resources are "equal".

This problem is a general one. It can occur whenever two resources are equal. Equality can also be asserted, usually using the property owl:sameAs. Communities often do not use shared identifiers for their resources. In these situations, owl:sameAs predicates can be used to assert equality between two resources. That different identifiers are used for resources that represent the same real-world phenomenon is common, and is independent of a separation between resources with properties that only provide information about the real-world phenomenon and resources with properties that provide metadata.

As we can see, gaining new knowledge from linked data is a hard problem. It is not solved by simply establishing equality (by using the same resource identifier, or by using owl:sameAs). In fact, abstractions of real-world phenomena created by different communities can rarely considered to be equal / the same. The abstractions will very likely be different, to a greater or lesser extent.

Note
Best practice No 1 of the The Spatial Data on the Web Best Practices also recommends caution when using owl:sameAs: "Care must always be taken when using owl:sameAs to determine that [the] two URIs actually refer to the same resource, rather than two resources that are similar. Warning: don’t say it if you’re not sure it’s true!"

How can INSPIRE RDF data support referencing the real-world phenomena that are abstracted, without requiring resource equality? The link between an INSPIRE resource and one or more real-world phenomena can be established with a property that does not imply resource equality. Unfortunately, the semantic web standards do not define a property with these semantics.

We can conclude that:

  • A separation of an INSPIRE spatial object into two resources, one with the properties that exclusively describe the real-world phenomenon that is being abstracted by the spatial object and one with the other properties, creates no actual benefit.

  • Full equality between two resources often does not exist. Care should be taken when assigning equality.

Encoding geometry

An instance of an ISO 19107 geometry shall be serialized as a GeoSPARQL geometry as follows:

  • If the geometry is compliant to a Simple Feature geometry, it shall be serialized using the WKT Serialization defined by GeoSPARQL.

  • Otherwise, it shall be serialized using the GML Serialization defined by GeoSPARQL.

Properties from INSPIRE application schemas that have a value type from ISO 19107 are aligned with GeoSPARQL and the ISA Programme Location Core Vocabulary (LOCN) - see the property alignment requirements in the schema conversion rules chapter. Both specifications (GeoSPARQL and LOCN) support WKT and GML for serializing a geometry.

The requirement establishes WKT and GML as the main formats for encoding geometry information in INSPIRE RDF data. At the same time, however, the requirement does not preclude addition of predicates from GeoSPARQL (gsp:hasGeometry) and LOCN (locn:geometry) with other geometry serializations. In summary, the requirement establishes a suitable level of interoperability while offering enough flexibility to encode geometry in other formats.

Encoding metadata

According to the GeoDCAT-AP specification, "GeoDCAT-AP provides an RDF syntax binding for the union of metadata elements of the core profile of ISO 19115:2003 and those defined in the framework of the INSPIRE Directive". It is therefore well-suited to implement MD_Metadata in the context of INSPIRE.

An instance of MD_Metadata, implemented as dcat:Dataset, shall be compliant to GeoDCAT-AP.

An instance of CI_Date shall be represented by one of the DCMI Metadata Terms properties created, modified, and issued:

  • dct:created - if the date type of the CI_Date is 'creation'

  • dct:issued - if the date type of the CI_Date is 'publication'

  • dct:modified - if the date type of the CI_Date is 'revision'

Caution

Relevant issue(s):

Value collections

Multiple values for a single property shall be encoded as multiple property assertions, rather than using an RDF collection (rdf:List) or RDF containers (rdf:Bag, rdf:Seq, rdf:Alt).

These RDF constructs do not add particular value, since the INSPIRE application schemas do not make use of ordering and uniqueness indicators for properties with multiplicity greater than one.

Extensions

A publisher may have more information about a spatial object than what is covered by the INSPIRE schemas. When encoding the spatial object as an RDF resource, the information that is covered by the INSPIRE schemas should be encoded as defined by this document. The additional information can be added to that resource in any way the publisher likes.

This extensibility is one of the benefits of RDF: a resource can be of any type, also multiple types - with according information expressed via RDF statements where the resource is the subject.