-
Notifications
You must be signed in to change notification settings - Fork 14
/
interactive-v2.tex
407 lines (348 loc) · 23.6 KB
/
interactive-v2.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
\chapter{Interactive v2 Workload}
\label{sec:interactive-v2}
\begin{quote}
\textit{This chapter is based on the TPCTC~2023 paper ``The LDBC Social Network Benchmark Interactive Workload v2: A Transactional Graph Query Benchmark with Deep Delete Operations''~\cite{DBLP:journals/corr/abs-2307-04820}, co-authored by several members of the SNB task force.}
\end{quote}
\subsection*{Work-in-Progress}
The \interactivevtwo workload is currently work-in-progress.
As of January 2024, commissioning audits for this workload is not yet possible.
\subsection*{Related Software Components}
\begin{itemize}
\item Datagen (Spark-based): \url{https://github.com/ldbc/ldbc_snb_datagen_spark}
\item Driver: \url{https://github.com/ldbc/ldbc_snb_interactive_v2_driver}
\item Reference implementations: \url{https://github.com/ldbc/ldbc_snb_interactive_v2_impls}
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Overview}
\begin{figure}[htb]
\centering
\includegraphics[scale=\yedscale]{figures/interactive-v2-components}
\caption{
Components and workflow of the Interactive~v2 workload.
The corresponding sections are shown in green circles \Circled{\textsf{\scriptsize \S}}.
Legend:
\fcolorbox{mydarkyellow}{mylightyellow}{{\scriptsize \textit{\textsf{Software component}}}}
\fcolorbox{mydarkblue}{mylightblue}{{\scriptsize \textsf{Data artifact\vphantom{p}}}}
}
\label{fig:interactive-components}
\end{figure}
\section{Operations}
The LDBC \snbinteractivevtwo workload uses four types of operations.
There are 14~complex and 7~short read queries.
Update operations include
8~inserts and, newly introduced in the \interactivevtwo workload, 8~deletes.
The workload mix consists of approximately
8\% complex read,
72\% short read,
20\% insert, and
0.2\% delete operations.
The complex reads and the short reads are identical to the ones in \interactivevone, except for query 14, which was replaced to cover the \emph{Cheapest path-finding} choke point.%
\footnote{
The term \emph{shortest paths} refers to the problem of finding \emph{unweighted shortest paths}, which can be computed with BFS.
The term \emph{cheapest paths} refers to the \emph{weighted shortest paths} problem, which can be solved using \eg Dijkstra's algorithm.
}
\paragraph{Cheapest path-finding}
While we strived to keep the changes to the queries minimal, we replaced Q14 due to two reasons.
First, we found the original query in \interactivevone to be ill-suited to the workload as it required the enumeration of \emph{all shortest paths} between two \Persons, which can be prohibitively expensive on large scale factors.
% It is in fact intractable according to its theoretical complexity.
%
% A well-constructed adversarial example can result in an exponential number of paths for the number of edges, I think:
%
% --o-- --o-- --o-- --o-- --o--
% / \ / \ / \ / \ / \
% o---o---o---o---o---o---o---o---o---o---o
% \ / \ / \ / \ / \ /
% --o-- --o-- --o-- --o-- --o--
%
% This graph has 3^5 paths for just 21 nodes and 30 edges.
%
% Note that while this pattern can occur in the LDBC social graph, its length is usually 4 (especially with the new parameter curation where the path is *guaranteed* to be 4).
% This which means it only produces a quadratic (~d^2) number of paths, which is manageable.
%
% Still, "all shortest paths" is a strange kernel that IMHO isn't very useful and we were right to replace it.
Second, we introduced a new choke point,
\textsf{CP-7.6}
\emph{Cheapest path-finding,}
a key computational kernel and a language opportunity for GQL~\cite{DBLP:conf/sigmod/DeutschFGHLLLMM22}.
Therefore, we changed Q14 to use \emph{cheapest paths} instead of \emph{all shortest paths}.
\subsection{Complex Reads}
\label{sec:interactive-v2-complex-reads}
\iftoggle{StandaloneWorkloadSpecification}{
\input{interactive-v2-complex-reads}
}{
\input{query-cards/interactive-complex-read-14-v2}
}
\subsection{Short Reads}
\label{sec:interactive-v2-short-reads}
\iftoggle{StandaloneWorkloadSpecification}{
\input{interactive-short-reads}
}{
The short reads operations are identical to the ones in \interactivevone, see \autoref{sec:interactive-v1-short-reads}.
}
\subsection{Insert Operations}
\label{sec:interactive-v2-insert-operations}
\iftoggle{StandaloneWorkloadSpecification}{
\input{inserts}
}{
See \autoref{sec:insert-operations}.
}
\subsection{Delete Operations}
\label{sec:interactive-v2-delete-operations}
\iftoggle{StandaloneWorkloadSpecification}{
\input{deletes}
}{
See \autoref{sec:delete-operations}.
}
\section{Parameter Curation}
\label{sec:parameter-curation}
% \subsection{The need for parameter curation}
% \label{sec:parameter-curation-motivation}
To prevent caching query results, the \snbinteractivevtwo driver instantiates the parameterized complex read (\CR) query templates with different \emph{substitution parameters} (\aka parameter bindings).
% Selecting parameters that lead to stable query runtimes is non-trivial.
% The key reason for this is that the LDBC social network graph exhibits a realistic highly skewed distribution~\cite{DBLP:conf/tpctc/PhamBE12} where some \Person nodes have a large number of outgoing edges while others only have a few connections.
% This has a significant impact on runtimes: if query parameters are selected using uniform random sampling, query runtimes will be unstable, often leading to multimodal distributions which spread across many orders of magnitude~\cite[Figure 4(e)]{DBLP:journals/pvldb/SzarnyasWSSBWZB22}.
However, the na\"ive approach (using a uniform random sampling of parameters and ignoring updates)
leads to unstable runtimes,
which compromise both the benchmark's understandability and reproducibility.
To ensure stable runtimes, LDBC invented \emph{parameter curation} techniques, which select parameters that produce query runtimes with a unimodal (preferably Gaussian) distribution~\cite{DBLP:conf/tpctc/GubichevB14,DBLP:journals/pvldb/SzarnyasWSSBWZB22}.
\subsection{Building Blocks for Parameter Curation}
\paragraph{Temporal bucketing}
\label{sec:temporal-bucketing}
%
To ensure that operations are always executable, \ie they avoid targeting nodes that are yet to be inserted or ones that are already deleted, the parameter curation process in \interactivevtwo employs \emph{temporal bucketing}.
Namely, we create a parameter bucket for \emph{each day in the simulation time of the update streams},
\ie each day in the simulation time has its own distinct set of parameters.
This is a novel feature in \interactivevtwo{} -- previous SNB benchmarks lacked this feature and only selected parameters from the \emph{initial snapshot}.
\paragraph{Factor tables}
As shown in \autoref{fig:interactive-components}, the parameter generation is a two-step process.
The \emph{factor generator} produces \emph{factor tables}, which contain data cube-like summary statistics~\cite{DBLP:journals/datamine/GrayCBLRVPP97} of the temporal graph such as the number of \Messages for friends.
The factor generator is executed in a distributed setup using Spark as this computation includes expensive joins over large tables,
\eg $\knows(\snbperson, \snbfriend) \bowtie \hasCreator(\snbperson, \snbcomment)$.
% this is really a factor as part of the personNumFriendOfFriendComments table
% + temporal information is also required
% \todo{I'm not 100\% sure where this comes from.}
\subsection{Parameter Curation for Relational Queries}
For relational queries (without path-finding), we based our parameter generation on two techniques.
\paragraph{(1)~Selecting windows}
%
To select the parameters that are expected to yield similar runtimes, we look for windows with the smallest variance for a given value using SQL window functions.
The parameters are first sorted and grouped together based on their difference in frequency.
Groups that are smaller than a given minimum threshold are discarded to select a group of
parameters large enough to generate a sufficient amount of parameters.
From the latter, we select
the group with the smallest standard deviation.
\paragraph{(2)~Selecting distributions}
%
For queries where we want to select parameters that are correlated or anti-correlated, we use factor tables encoding possible combinations (\eg \texttt{countryPairsNumFriends} for \CR[3]):
we select values near a high percentile for the correlated and a low percentile for the anti-correlated case.
\paragraph{Generating the parameters}
%
The parameter candidates discovered by the previous approaches are stored in temporary tables.
The parameter generation step uses these tables to select parameters for each day in the update stream.
% The parameter curation first searches for parts in the factor tables with the smallest variance across for each
% column, called \emph{windows}.
% For each column, the windows are then merged with windows also having the smallest variance.
% This is repeated until there are no windows left or the
% last column is reached.
% The output of the parameter generation is a file for each query containing the substitution parameters.
% This approach significantly increases the complexity of parameter curation as parameter sets
% makes parameter generation more expensive as the parameter generator needs to generate 33~parameter sets (Nov 29 to Dec 31, 2012).
% This makes the parameter generators performance improvements even more important.
\subsection{Parameter Curation for Path-Finding Queries}
\label{sec:path-curation}
\paragraph{The effect of deletes}
%
A key distinguishing feature of graph data management systems is their first-class support for path queries~\cite{DBLP:journals/csur/AnglesABHRV17}.
We demonstrate why ensuring stable query runtimes for path queries is particularly challenging through the example of \autoref{fig:paths}, where we query for the (unweighted) shortest path between \emph{Ada} and \emph{Bob} over a dynamic graph.
Initially, at $t = 1$, the length of the shortest path is 4~hops.
Then, the edge between \emph{Carl} and \emph{Dan} is deleted, making \emph{Ada} and \emph{Bob} unreachable from each other at $t = 2$.
Finally, a new edge is inserted between \emph{Carl} and \emph{Bob}, yielding a shortest path of length 3 at $t = 3$.
This illustrates how a given input parameter (a pair of \Persons) can oscillate between being reachable and being in disjoint connected components over a short period.
To ensure stable query runtimes for path queries in the presence of inserts and deletes, \interactivevtwo introduces a novel \emph{path curation} algorithm, which produces pairs of \Person nodes whose shortest path length from each other is guaranteed to be exactly $k$ hops at any point during a given day.
\paragraph{Graph construction}
%
The parameter curation algorithm builds two variants of the \Person--\knows--\Person subgraph for each day based on the \emph{temporal graph}:
graph $G_1$ has the inserts applied until the beginning of the day and the deletes applied until the end of the day,
while $G_2$ has the deletes applied until the beginning of the day and the inserts applied until the end of the day.
For a given pair of \Person nodes, their shortest path length in $G_1$ is an upper bound $k_\mathrm{upper}$ on their shortest path length at any point in the day -- when the inserts during the day are gradually applied, the shortest path length can only become shorter.
Conversely, $G_2$ gives a lower bound $k_\mathrm{lower}$ for the shortest path -- the deletes can only make the shortest path length become longer.
\paragraph{Parameter selection}
The bounds provided by $G_1$ and $G_2$ guarantee for the shortest path length $k$ that $k_\mathrm{lower} \leq k \leq k_\mathrm{upper}$ will hold at any point during the day.
We can ensure that $k$ will stay constant during the day by selecting \Person pairs where $k_\mathrm{lower} = k_\mathrm{upper}$ holds.
To this end, we select pairs who are exactly 4~hops apart in both $G_1$ and $G_2$, hence they will be always~4 hops apart during the given day.
Unreachable pairs of nodes can be generated by calculating the connected components of $G_2$ and selecting nodes from disjoint components.
The path curation for both the reachable and the unreachable cases is implemented using the NetworKit graph algorithm library~\cite{lit:networkit}.
\newsavebox{\largestimage}
\begin{figure}[htb]
\savebox{\largestimage}{\includegraphics[width=.61\textwidth]{figures/paramgen/paramgen-frequency-disc-countries}}%
\begin{subfigure}{0.39\textwidth}
\centering
\raisebox{\dimexpr.5\ht\largestimage-.5\height}{%
\includegraphics[width=\textwidth]{figures/paths-small}
}
\caption{
Shortest path (denoted with thick lines) between \emph{Ada} and \emph{Bob} in the presence of updates.
}
\label{fig:paths}
\end{subfigure}
\hspace{2mm}
\begin{subfigure}{0.58\textwidth}
\centering
\includegraphics[width=\linewidth]{figures/paramgen/paramgen-frequency-disc-countries}
\vspace{-3ex}
\caption{
Pairs of \Countries in the \texttt{countryPairsNum\-Friends} factor table representing the number of friendships between both \Countries.
%{percentile\_disc} function. Correlated \Countries (red) are selected using percentile $1.0$ and anti-correlated \Countries (green) are selected using percentile $0.01$.
}
\label{fig:paramgen-frequency-disc-countries}
\end{subfigure}
\caption{Example graph and distribution for path curation.}
\label{y}
\vspace{-4.5ex}
\end{figure}
\subsection{Query Variants}
\label{sec:interactive-v2-query-variants}
The new workload introduces variants for three queries:
\queryRefCard{interactive-complex-read-03}{IC}{3},
\queryRefCard{interactive-complex-read-13}{IC}{13},
\queryRefCard{interactive-complex-read-14-v2}{IC}{14v2}.
\paragraph{Complex read 3: Correlated \vs anti-correlated Countries}
\label{sec:cr3-variants}
%
For \queryRefCard{interactive-complex-read-03}{IC}{3},
variant \CR[3(a)] starts from \Countries that have a high correlation in the friendship network,
while
variant \CR[3(b)] starts from \Countries that have a low correlation of friendships between.
To generate these inputs, we use the \texttt{countryPairsNumFriends} factor table visualized in \autoref{fig:paramgen-frequency-disc-countries} and select values at percentile~1.00 for variant~\texttt{(a)} and percentile~0.01 for variant~\texttt{(b)}.
\paragraph{Complex reads 13 and 14: Reachable \vs unreachable Persons}
%
Path queries are expected to have different runtimes if there is a path \vs when there is no path.
While the performance characteristics vary highly between systems, in principle, the ``no path'' case should be simpler in the SNB graph, where one of the nodes is always in a small connected component.
To distinguish between these cases, we have two variants for the two path queries \queryRefCard{interactive-complex-read-13}{IC}{13} and \queryRefCard{interactive-complex-read-14-v2}{IC}{14v2}.
For variants~\snbOperation{(a)} we select \Person pairs which \emph{do not have a path},
and for variants~\snbOperation{(b)} we select pairs which \emph{have} a path of length~4.
% (guaranteed by path curation, see \autoref{sec:path-curation}).
\subsection{Parameter Generator Implementation}
\label{sec:paramgen-implementation}
% \paragraph{DuckDB SQL queries}
%
The parameter generator is implemented in Python using NetworKit~\cite{lit:networkit} and SQL queries executed by DuckDB~\cite{DBLP:conf/sigmod/RaasveldtM19}.
% We opted to use DuckDB for multiple reasons:
% it is an embeddable RDBMS with zero setup costs,
% it is optimized for join- and aggregation-heavy analytical queries,
% and it supports the aggregate functions%
% \footnote{\eg \texttt{percentile\_disc()}, see \url{https://duckdb.org/docs/sql/aggregates.html}}
% and window functions%
% \footnote{\eg \texttt{lag()}, see \url{https://duckdb.org/docs/sql/window_functions.html}}
% required by our queries.
%
%\paragraph{Scalability}
%
Based on our experiments in~\cite[Figure~4.3]{david-puroja-msc}, the new parameter generator is scalable.
Even with the significant extra work performed for temporal bucketing,
it outperforms the old parameter generator by more than $100\times$ on SF\numprint{1000},
and finishes in less than 1.5~hours on SF\numprint{10000}.
\section{Workload Scheduling and Benchmark Driver}
\label{sec:workload-and-driver}
In this section, we explain how operations are scheduled in the SNB Interactive workload, how the driver operates, and how the final \emph{throughput} metric is determined.
In all cases, we assume that the system-under-test has been populated with the \emph{initial snapshot} using a \emph{bulk loader} before the driver runs the operations.
\subsection{Scheduling Operations}
\label{sec:scheduling}
\paragraph{TCR (total compression ratio)}
%
The scheduling follows the \emph{simulation time} of the temporal social network graph.
The user-provided \emph{total compression ratio} (TCR) value controls the speed at which the simulation is replayed.
For example, a TCR value of $0.02$ means that the simulation is replayed $50\times$ faster, \ie for every 20~milliseconds in wall clock time, 1~second passes in the simulation time.
%Conversely, a TCR value of $3$ means that for every 3~seconds wall clock time, one second passes in the simulation.
%\footnote{The TCR has a minimum value of 0.001 to prevent the system from running out of updates during the 2.5-hour period of the benchmark.}
\paragraph{Update operations}
%
The driver replays the update operations starting from the cutoff date, Nov 29, 2012.
The operations are scheduled according to the distance of their start time from this date, adjusted by the TCR.
%\footnote{For example, if the TCR value is $0.02$, one hour of wall-clock time will result in replaying 50 hours of simulation time, reaching Dec 1, 2012 02:00.}
They are then used to set the cadence of the schedule for the complex reads and, in turn, the short read queries, as we will explain momentarily.
%The \emph{update interleave} value denotes the average (simulation) time that passes between update operations and is used to schedule read queries.
%$\frac{T_\textit{end} - T_\textit{start}}{\textrm{\#updates}}$.
\paragraph{Complex read queries}
%
The \emph{complex read queries} differ significantly in their expected runtimes as they touch on different amounts of data.
As each query instance contributes equally to the output metric,%
\footnote{Unlike in \tpcH~\cite{tpch} and \snbbi~\cite{DBLP:journals/pvldb/SzarnyasWSSBWZB22}, which use \emph{geometric mean} in their metrics.}
we balance them such that each query type is expected to take the same amount of time to execute.
For example, \CR[14 (new)] is expected to be more difficult than \CR[13], therefore it is scheduled less frequently.
Frequencies vary based on the SF as the relative difficulties of queries change with the data size
(\eg three-hop neighbourhood queries grow faster on larger SFs than one-hop ones).
\paragraph{Short read queries}
%
Short read queries are triggered by complex read queries and other short read queries, and use their output as their input.
For example, both \CR[3] and \CR[14] trigger \SR[2], which also triggers itself.
This mimics the real-life scenario of a user retrieving more information about \Person profiles based on the result of the earlier queries.
% The amount of short reads executed after a complex read query depends on the result size of that pre read operation.
% IIRC short reads follow some sort of exponential backoff.
To see which short read queries are potentially triggered after given short read and complex read queries, see \autoref{tab:short-read-queries-triggered}.
\subsection{Driver}
\label{sec:driver}
\begin{figure}[htb]
\centering
\begin{subfigure}{\linewidth}
\centering
\includegraphics[scale=\yedscale]{figures/interactive-v2-validation-workflow}
\caption{Validation workflow running on a single thread.}
\label{fig:interactive-validation-workflow}
\end{subfigure}
\begin{subfigure}{\linewidth}
\centering
\includegraphics[scale=\yedscale]{figures/interactive-v2-benchmark-workflow}
\caption{Benchmark workflow using multiple threads.}
\label{fig:interactive-benchmark-workflow}
\end{subfigure}
\caption{
Workflow of driver modes in SNB Interactive v2.
}
\label{fig:interactive-workflows}
\vspace{-2.5ex}
\end{figure}
\paragraph{Driver modes}
%
The SNB driver has two key modes of operation.
In \emph{cross-validation mode} (\autoref{fig:interactive-validation-workflow})m
the driver tests an implementation against the output of another implementation.
To ensure deterministic results, operations in this mode are executed sequentially with no overlap between queries and updates.
In \emph{benchmark mode} (\autoref{fig:interactive-benchmark-workflow}),
the driver performs a benchmark run where queries and updates are issued concurrently from multiple threads.
The run starts with a 30-minute warm-up period, followed by a 2-hour \emph{measurement window}.
This mode does not perform validation as query results may differ (slightly) due to concurrent updates.
\paragraph{Dependency tracking}
%
To ensure that updates are executable, concurrent threads must be synchronized so that an operation is only executed when its dependencies exist in the network (\eg two \Persons can only become friends if both of them already exist).
This is achieved via maintaining a global clock in the driver and performing \emph{dependency tracking} for the updates~\cite{DBLP:conf/sigmod/ErlingALCGPPB15}: each update operation has a timestamp denoting the creation time of the last operation it depends on.
The data generator calculates these timestamp during generation and ensures that there is a minimum time separation,
$T_\textit{safe}$,
between dependent entities to reduce synchronization overhead in the driver when executing operations.
The driver then only needs to check every $T_\textit{safe}$ time whether a given update operation can be executed.
By default, $T_\textit{safe}$ is set to 10 seconds in the simulation time.
\paragraph{Latency requirements}
%
The workload simulates a highly transactional scenario where operations are subject to (soft) latency requirements.
To incorporate this in the workload, it prescribes the \emph{95\% on-time requirement}:
for a benchmark run to be successful, 95\% of the operations must start \emph{on-time}, \ie within 1~second of their scheduled start time.
Benchmark runs where the system-under-test falls behind too much from the schedule are considered invalid.
\paragraph{Throughput}
%
The throughput of a run is the total number of operations
(\CR, \SR, \INS, \DEL)
executed per second.
A lower TCR value implies a higher throughput.
\paragraph{Individual execution times}
%
To facilitate deeper analyis, the benchmark driver also collects all individual query execution times.
Based on these, the benchmark reports must include statics for each operation type (min, max, mean, $P_{50}$, $P_{90}$, $P_{95}$, and $P_{99}$ of the execution times).
\paragraph{Driver implementation in v2}
The \interactivevtwo is implemented in Java~17.
It consists of \numprint{26500} lines of code for the core project and an additional \numprint{18000} lines of test code.
The new version contains several patches including bug fixes, usability improvements, and performance optimizations.
%Among other changes, the driver now uses DuckDB~\cite{DBLP:conf/sigmod/RaasveldtM19} for loading and the query parameters and update streams from Parquet format
%~\cite{DBLP:reference/bdt/Floratou19}
%for faster deserialization.