cost-models.tex

\chapter{Cost Models}

When analysing the performance of algorithms, we do so not by
implementing them on some concrete machine and measuring their
performance, but rather by mathematically quantifying how many
``steps'' they need to finish. In your previous studies, you may have
used an intuitive notion of what a ``step'' is, such as counting
program statements or arithmetic operations. When analysing parallel
algorithms, we need more precision, since we are not just interested
in the total amount of work, but also the nature of the work---i.e.,
whether it is parallel.

\begin{definition}[Cost model]
  A mathematical model used for algorithmic analysis that defines the
  cost of various operations in some specified language. May contain
  one or more \emph{cost measures}.
\end{definition}

\begin{definition}[Cost measure]
  A (usually) discrete measure for the cost of a computation along
  some axis, such as sequential work, total work, or space usage.
\end{definition}

\begin{definition}[Work]
  A cost measure denoting the total amount of steps in an execution.
\end{definition}

\begin{definition}[Span/depth]
  A cost measure denoting the length of the longest sequential
  dependency chain in an execution.
\end{definition}

On an infinitely parallel computer, the span of a program is the
number of sequential steps needed to execute the program. Since
infinitely parallel computers are somewhat unlikely, we must wonder
whether the span makes sense on our tragically finite computer. The
answer is \emph{yes}: we can simulate an infinitely parallel computer
on a finitely parallel computer with overhead proportional to the
amount of ``missing'' parallelism. The theoretical justification for
this is \emph{Brent's Theorem}~\cite{10.1145/321812.321815}.

  \begin{theorem}[Brent's Theorem]
    Writing $T_{i}$ for the time taken to execute a program on $i$
    processors,
  \[
    \frac{T_{1}}{p} \leq T_{p} \leq T_{\infty} + \frac{T_{1}}{p}
  \]
  where $p$ is the number of available processors.

  \begin{proof}
    Suppose that the computation DAG has $n$ levels, and at level $j$
    of the computation DAG there are $M_{j}$ independent operations.
    Then the total work of our program, and the time taken on a single
    processor, is
  \[
    T_{1} = \sum_{i=1}^{n}M_{i}.
  \]
  In the case where we have an infinite number of processors, each of
  the $M_{j}$ independent operations can be executed in a single step, so we also have
  \[
    T_{\infty} = n
  \]
  because each level can be executed in one step. For the case where we
  have $p$ processors, we denote the time taken to execute level $i$
  of the DAG by $T_{p}^{i}$, where
  \[
    T_{i}^{p} = \Bigl\lceil{\frac{M_{i}}{p}}\Bigr\rceil \leq
    \frac{M_{i}}{p}+1.
  \]
  The intuition behind the inequality is that if we need to execute
  $k$ independent nodes and we have $p$ processors, then we can
  perform $\lceil \frac{k}{n} \rceil$ sequential iterations, with each
  chunk executing (up to) $p$ nodes in parallel. Finally,
  \[
    T_{p} = \sum_{i=1}^{n}T_{i}^{p} \leq \sum_{i=1}^{n}\Bigl(\frac{M_{i}}{p}+1\Bigr) = \frac{T_{1}}{p} + T_{\infty}
  \]
  \end{proof}

  \end{theorem}


%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "dpp-notes"
%%% End: