How to model dct:temporal for continuously evolving Datasets? #201
Comments
First of all, the As to modelling "last 10 days", you could also update the metadata continuously and change startDate and endDate each time. Otherwise, I think there is currently no solution for this in DCAT(-AP). |
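The idea of rewriting startDate and endDate on every update could look roughly like this. It is a minimal sketch, assuming a window that always ends at the time of the update; the function names and the example dataset IRI are hypothetical, and only dct:temporal, dct:PeriodOfTime, dcat:startDate and dcat:endDate come from DCAT itself.

```python
from datetime import datetime, timedelta, timezone


def rolling_temporal(now: datetime, window: timedelta) -> tuple[str, str]:
    """Return (startDate, endDate) as xsd:dateTime strings for a window ending at `now`."""
    start = now - window
    return start.isoformat(), now.isoformat()


def as_turtle(dataset_iri: str, start: str, end: str) -> str:
    """Render the interval as a dct:temporal statement (Turtle, prefixes omitted)."""
    return (
        f"<{dataset_iri}> dct:temporal [\n"
        "    a dct:PeriodOfTime ;\n"
        f'    dcat:startDate "{start}"^^xsd:dateTime ;\n'
        f'    dcat:endDate "{end}"^^xsd:dateTime\n'
        "] ."
    )


if __name__ == "__main__":
    # Hypothetical dataset IRI; the window is the "last 10 days" from the question.
    start, end = rolling_temporal(datetime.now(timezone.utc), timedelta(days=10))
    print(as_turtle("https://example.org/dataset/sensor-1", start, end))
```

The statement has to be regenerated on every update, so the published dates are only correct at the moment they are written.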
Hm, updating the metadata every hour just because time has moved on by an hour is not the solution we want... Does nobody else see this use case? Maybe we need a standard solution? |
Continuously updating metadata will not work. Even if I update the values in my local open data portal, the national portal takes a (daily) snapshot, and soon afterwards the temporal information is incorrect. The error is even bigger by the time the European data portal has taken over the data. Therefore, we need a way to specify these continuously changing datasets. Here is a real-world example: a weather observation from the Deutscher Wetterdienst always covers the last 24 hours: https://opendata.dwd.de/weather/weather_reports/poi/10015-BEOB.csv |
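To put a number on the snapshot problem: the sketch below assumes a window that always ends "now" (24 hours in the DWD example) and computes how much of the actual coverage is still described by the fixed dates a harvester copied earlier. The function name and the delays used are illustrative assumptions, not measurements of any portal.

```python
from datetime import datetime, timedelta, timezone


def coverage_overlap(snapshot_time: datetime, read_time: datetime, window: timedelta) -> float:
    """Fraction of the actual window (ending at read_time) that the interval
    recorded at snapshot_time still covers. 1.0 = fully correct, 0.0 = no overlap."""
    recorded_start, recorded_end = snapshot_time - window, snapshot_time
    actual_start, actual_end = read_time - window, read_time
    overlap = min(recorded_end, actual_end) - max(recorded_start, actual_start)
    # A negative overlap means the two intervals no longer intersect at all.
    return max(overlap, timedelta(0)) / window


if __name__ == "__main__":
    harvested = datetime(2020, 5, 11, 0, 0, tzinfo=timezone.utc)
    day = timedelta(hours=24)
    for delay_hours in (0, 6, 24, 48):
        read = harvested + timedelta(hours=delay_hours)
        print(delay_hours, coverage_overlap(harvested, read, day))
```

With a 24-hour window, a copy that is one day old describes none of the data that is actually available.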
This is indeed not possible to express as such in DCAT(-AP). And as @jze explains, there is no guarantee that the metadata you find at the harvesting data portal is the most accurate. The two issues are connected but distinct. If you couple them as @jze does, a loosely coupled, distributed, cross-organisational system cannot work: in such a distribution scheme, temporal delays and information skew are part of the game.
However, one can compensate for this with proper PURI handling: a visitor of the EDP might find the German weather reports dataset and consider using it. In that case the visitor has to go to the source of the metadata to find all the information natively. There the most recent information can be found, e.g. that this dataset is now obsolete and has been replaced by a JSON REST API.
This example illustrates that we should keep the objectives of the Open Data Portals clear. If you want machines to connect to your endpoints through your catalogue, very precise and up-to-date metadata is required. However, that is not the objective of most Open Data Portals. They are a human-browsable interface to (governmental) data.
One can also look at this topic from the human consumer's perspective: knowing that the data is continuously updated is probably a criterion I am going to use when looking for appropriate data sources. But knowing that I only get a window of 10 days is probably less important at first; I would consider that a technical implementation restriction. From that perspective it is less problematic that this information is not available in machine-processable form but only described in textual notes.
I have encountered data streams with windows of 1 day, 1 hour and 10 years. Independent of my intended usage, the question remains how to express this window. It could be expressed via temporal coverage (https://www.w3.org/TR/vocab-dcat-2/#Property:dataset_temporal), but the window expression is hard to construct.
I have no direct answer for that. We could probably define the notion of a coverage window based on https://www.w3.org/TR/owl-time/#link-interval-meets
But I have not yet found a notion of NOW. |
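One way to picture such a coverage window is a duration that is only resolved against the current time when the metadata is read, so the catalogue never stores fixed dates. The sketch below is purely illustrative: neither DCAT(-AP) nor OWL-Time is claimed to define this construct, the `FloatingWindow` class is hypothetical, and the parser handles only the day/hour/minute/second part of the xsd:duration lexical form (years and months are left out because their length depends on the calendar).

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Subset of the xsd:duration lexical form: PnDTnHnMnS (no years, months or fractions).
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?=\d)(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_duration(text: str) -> timedelta:
    """Parse e.g. 'P10D', 'PT1H' or 'P1DT12H' into a timedelta."""
    match = _DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


@dataclass(frozen=True)
class FloatingWindow:
    """Hypothetical 'coverage window': a duration whose end is always NOW."""

    duration: str  # e.g. "P10D" for "the last 10 days"

    def resolve(self, now: datetime) -> tuple[datetime, datetime]:
        """Turn the floating window into a concrete (start, end) interval."""
        return now - parse_duration(self.duration), now


if __name__ == "__main__":
    start, end = FloatingWindow("P10D").resolve(datetime.now(timezone.utc))
    print(start.isoformat(), end.isoformat())
```

Because the interval is computed by the reader, a harvested copy of the metadata stays correct no matter how old it is; the open question from the thread is which vocabulary term would carry the "ends at NOW" semantics.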
It is a pity that this issue was marked as won't fix. In practice it is very relevant. As it stands, there is no DCAT-AP-compliant way to express these records. Especially when forwarding to other portals, it is important not to have to specify fixed times. Without "floating" time data, we will often have incorrect temporal metadata. |
@jze, I tagged it as won't fix because there will be no resolution in the near future in DCAT-AP. If you believe this should be future work, I will tag it as that. |
On your sentence:
I think you want to say that "I have no formal way to express that only the data of the last days is available". Note that a landing page in which you explain this situation to the potential reuser is always possible.
As mentioned in my previous answer, you could explore temporal expressions, e.g. built from OWL-Time, but there is no guarantee that they can express this. To a certain extent, your use case is similar to legal information: before ODRL existed, there was no way to express legal information other than in a document. Your use case needs a formal language (suggestion: OWL-Time) that can express the situation; once that exists, it is easy to adopt in DCAT-AP. |
In GovDataOfficial/DCAT-AP.de#17 we are discussing a real use case for which I am surprised to find no obvious answer. Maybe I am missing something.
There is a Dataset which is updated constantly (dcterms:accrualPeriodicity) with a resolution of one hour (dcat:temporalResolution). But you can only get the data of the last 10 days. (Something that's probably pretty common for sensor data.)
How would you model this? Neither xsd:date nor dcterms:PeriodOfTime allows this. We would need an xsd:duration:
But that would not be allowed. (And it would only be implicit that you get the last 10 days.)