I noticed that in some of the published datasets, single instances of lsqv:RemoteExec have multiple values for properties like lsqv:hostHash and lsqv:uri, which (conceptually) should be functional.
After analysing the data and then the source code, I found the cause: if the timestamp is available (which I guess is most of the time), it is used (alongside the service id) to build the IRI for the remote execution, so executions that share a timestamp end up with the same IRI.
The problem is exacerbated in the dbpedia.3.5.1 log because, for some reason, the timestamps are truncated to the hour, so blocks of several executions are merged together.
But it easily happens in other cases too (for sure in the bioportal log), because multiple query executions may be logged in the same second.
My suggestion is to either always use the sequential id (easiest hack, I guess) or add a mechanism to differentiate the IRIs when the timestamp is the same.
I guess the better solution is the middle ground: prefer the timestamp, but if there are clashes, start using sequential ids within that timestamp. The idea is that even when processing only a subset of a log, one would get the same RDF data for the remote executions, because a request can usually be globally identified by its destination URL and timestamp. With a plain sequence ID this information would be completely lost.
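To make the middle-ground idea concrete, here is a minimal sketch in Python. It is not LSQ's actual code: the class name `RemoteExecIriMinter`, the `base` prefix, the IRI layout, and the sample timestamps are all made up for illustration. It keeps the plain timestamp-based IRI for the first execution seen for a (service id, timestamp) pair and appends a per-timestamp counter only on clashes.

```python
from collections import defaultdict


class RemoteExecIriMinter:
    """Hypothetical helper: timestamp-based IRIs with a per-timestamp counter on clashes."""

    def __init__(self, base="http://example.org/re-"):
        # `base` is an assumed placeholder prefix, not the prefix LSQ uses.
        self.base = base
        # Number of executions already seen for each (service_id, timestamp) pair.
        self._seen = defaultdict(int)

    def mint(self, service_id, timestamp):
        key = (service_id, timestamp)
        n = self._seen[key]
        self._seen[key] += 1
        # First occurrence keeps the plain IRI; later ones get _1, _2, ...
        suffix = "" if n == 0 else f"_{n}"
        return f"{self.base}{service_id}_{timestamp}{suffix}"


minter = RemoteExecIriMinter()
first = minter.mint("svc", "2020-01-01T12:00:00Z")
second = minter.mint("svc", "2020-01-01T12:00:00Z")  # same second: clash
third = minter.mint("svc", "2020-01-01T12:00:01Z")
print(first, second, third, sep="\n")
```

One caveat with this sketch: the suffix depends on the order in which clashing entries are seen, so a subset run only reproduces the same IRIs if it does not start in the middle of a block of entries sharing a timestamp.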
I proposed a simple solution. Obviously other solutions may be found, but I think this is an important bug and should be addressed. The advantage of this solution is that it does not add any bottleneck.
If the problem is preserving the ID when processing a subset of a log, this could be addressed separately if LSQ is aware of the line of the log on which it starts (could be a parameter), by starting seqId from an appropriate initial value.
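As a rough sketch of that idea (again in Python, not LSQ's code; the function name `mint_sequential_iris`, the `start_line` parameter, and the `base` prefix are hypothetical), the sequential id can simply be the line number in the full log, with the starting line passed in when only a subset is processed:

```python
def mint_sequential_iris(log_lines, service_id, start_line=1,
                         base="http://example.org/re-"):
    """Yield (iri, line) pairs, numbering entries from `start_line`.

    `start_line` is the assumed 1-based position of the first given line
    in the complete log, so a subset run can reuse the full-log ids.
    """
    for offset, line in enumerate(log_lines):
        seq_id = start_line + offset
        yield f"{base}{service_id}_{seq_id}", line


full_log = ["q1", "q2", "q3", "q4"]
full = list(mint_sequential_iris(full_log, "svc"))
# Processing only lines 3 and 4, but telling the minter where they start.
subset = list(mint_sequential_iris(full_log[2:], "svc", start_line=3))
print(subset == full[2:])
```

This only preserves the ids if the subset is a contiguous slice of the same log file and the caller knows its starting line.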