Multiple remote query executions merged together due to timestamp clash #40

miguel76 · 2022-09-09T20:54:37Z

I noticed that in some of the published datasets there are issues with single instances of lsqv:RemoteExec that have multiple values for properties like lsqv:hostHash and lsqv:uri, which (conceptually) should be functional.
After further analysing the data and then the source code, I found the cause: when the timestamp is available (which I guess is most of the time), it is used, together with the service id, to build the IRI for the remote execution.
The problem is exacerbated in the dbpedia.3.5.1 log because, for some reason, the timestamps are truncated to the hour, so blocks of several executions are merged together.
But it also happens easily in other cases (certainly in the bioportal log), because multiple query executions may be logged in the same second.
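
A minimal sketch of the clash described above, in Python rather than LSQ's own code. The IRI pattern, the `remote_exec_iri` helper, and the sample entries are all made up for illustration; only the idea (IRI derived from service id plus timestamp) comes from this report.

```python
from collections import defaultdict
from datetime import datetime


def remote_exec_iri(service_id: str, timestamp: datetime) -> str:
    # Hypothetical IRI scheme: service id plus a second-resolution timestamp.
    return f"urn:example:remoteExec/{service_id}/{timestamp:%Y-%m-%dT%H:%M:%S}"


# Two distinct log entries recorded in the same second (invented values).
entries = [
    {"service": "bioportal", "ts": datetime(2022, 9, 9, 20, 54, 37), "host_hash": "aaa"},
    {"service": "bioportal", "ts": datetime(2022, 9, 9, 20, 54, 37), "host_hash": "bbb"},
]

# Collect the host hashes attached to each IRI, as an RDF merge would.
host_hashes_by_iri = defaultdict(set)
for entry in entries:
    iri = remote_exec_iri(entry["service"], entry["ts"])
    host_hashes_by_iri[iri].add(entry["host_hash"])

for iri, hashes in host_hashes_by_iri.items():
    print(iri, sorted(hashes))
```

Both entries map to the same IRI, so the single resource ends up with two host hashes, which is the symptom seen in the published datasets.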

My suggestion is to either always use the sequential id (easiest hack, I guess) or add a mechanism to differentiate the IRIs when the timestamp is the same.

Aklakan · 2022-09-23T12:21:40Z

I guess the better solution is the middle ground: prefer the timestamp, but if there are clashes, start using sequential ids within that timestamp. The idea is that even when processing only a subset of a log one would get the same RDF data for the remote executions, because a request can usually be globally identified by its destination URL and timestamp. With a sequence ID alone, this information would be completely lost.
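
One way this middle ground could look, sketched in Python under assumptions: the `assign_ids` function, the `_`-separated id format, and the choice to leave the first entry of each timestamp unsuffixed are all hypothetical and not taken from the LSQ code base.

```python
from collections import Counter


def assign_ids(entries):
    """Return one id per (service_id, timestamp) pair, in log order.

    The first entry seen for a given service and timestamp keeps the plain
    timestamp-based id; later entries with the same pair get a counter suffix.
    """
    seen = Counter()
    ids = []
    for service_id, timestamp in entries:
        key = (service_id, timestamp)
        clash_index = seen[key]  # 0 for the first occurrence
        seen[key] += 1
        base = f"{service_id}_{timestamp}"
        ids.append(base if clash_index == 0 else f"{base}_{clash_index}")
    return ids


log = [
    ("bioportal", "2022-09-09T20:54:37"),
    ("bioportal", "2022-09-09T20:54:37"),
    ("bioportal", "2022-09-09T20:54:38"),
]
print(assign_ids(log))
```

Entries without a clash keep an id that depends only on service and timestamp. The suffix of a clashing entry still depends on how many earlier entries with the same timestamp were processed, so a subset that starts in the middle of such a block would number them differently.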

miguel76 · 2022-10-10T13:06:38Z

I proposed a simple solution. Obviously other solutions may be found, but I think this is an important bug and should be addressed. The advantage of this solution is that it does not add any bottleneck.
If the problem is preserving the ID when processing a subset of a log, this could be addressed separately if LSQ is aware of the log line on which it starts (this could be a parameter), by starting seqId from an appropriate initial value.
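
A rough Python sketch of the offset idea, assuming a hypothetical `first_line_number` parameter and id format; neither is an existing LSQ option.

```python
def log_entry_ids(lines, service_id, first_line_number=1):
    # The sequential id is the line's position in the full log, so a run
    # over a subset must be told at which line of the full log it starts.
    return [
        f"{service_id}_{line_number}"
        for line_number, _ in enumerate(lines, start=first_line_number)
    ]


full_log = ["query a", "query b", "query c", "query d"]

ids_full = log_entry_ids(full_log, "svc")
# Process only the last two lines, which start at line 3 of the full log.
ids_subset = log_entry_ids(full_log[2:], "svc", first_line_number=3)

print(ids_full)
print(ids_subset)
```

With the right starting value, the subset run produces the same ids for its lines as the full run does.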

miguel76 mentioned this issue Oct 10, 2022

Always use seqId in logEntryId #46

Open
