primate’s namesake (and some of the original code¹) was inspired by the (excellent) imate package, prompting questions about their differences. In general, primate was developed with slightly different goals in mind than imate, most of which have to do with integrability, extensibility, and the choice of FFI / build system.
-
Notable differences between the two packages include:
One motivation for developing primate was to modularize and streamline access to Lanczos-based methods, which is achieved through function templates, type erasure, and header-only definitions. These modifications not only simplify access from user (i.e. dependent) packages, but also enable native support for arbitrary classes adhering to the LinearOperator concept. For more details on this, see the integration guides.
-
-
-
-
-
-
-
Footnotes
-
-
-
Before v0.2, much of primate’s code was essentially ported and refactored from imate. The code for v0.2+ has been re-written using the Eigen C++ template library.↩︎
-
primate does not provide native GPU-implemented linear operators. However, there is nothing preventing one from using e.g. CUDA- or ROCm-based tensor libraries to accelerate matrix-vector products. Indeed, primate was designed to work with essentially any operator matching the interface.↩︎
primate contains a variety of algorithms for estimating quantities from matrices and matrix functions, with a focus on common quantities of interest, such as the trace or the diagonal. This page briefly outlines the options primate offers for approximating these quantities, and how to configure these approximations per use case.
-
-
Implicit matrix representations
-
In all of the examples below, by the term matrix we mean generically anything that comes with an associated v \mapsto Av capability via an overloaded __matvec__ or __matmul__ magic method. While this includes dense matrices, such as those used by e.g. NumPy and PyTorch, it also includes sparse matrices and LinearOperator’s. Because the latter option need not store any part of A explicitly, such operators are referred to as implicit matrix representations.
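As a concrete illustration (not taken from primate itself), here is a minimal implicit operator built with SciPy’s LinearOperator: the rank-one update A = I + u u^T is never formed as a dense matrix, yet still supports v \mapsto Av.

import numpy as np
from scipy.sparse.linalg import LinearOperator

# Implicit representation of A = I + u u^T: only u is stored,
# but matrix-vector products are still available in O(n) time.
n = 150
u = np.random.default_rng(0).normal(size=n)
A_op = LinearOperator((n, n), matvec=lambda x: x + u * (u @ x), dtype=np.float64)

v = np.ones(n)
print((A_op @ v)[:3])  # same result as (np.eye(n) + np.outer(u, u)) @ v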
-
-
-
Trace estimation
-
The implicit matrix trace estimation problem is to estimate the trace using only v \mapsto Av applications: \mathrm{tr}(A) \triangleq \sum\limits_{i=1}^n A_{ii} = \sum\limits_{i=1}^n e_i^T A e_i
These implicit trace estimators are available in the primate.trace module, which includes the Girard-Hutchinson estimator (hutch), the improved Hutch++ (hutchpp), and XTrace (xtrace):
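A minimal sketch of calling the three estimators on a random symmetric test matrix (the bare single-argument calls below are assumptions; each estimator accepts additional configuration arguments):

from primate.random import symmetric
from primate.trace import hutch, hutchpp, xtrace

A = symmetric(150)   # random symmetric test matrix
print(f"Girard-Hutchinson: {hutch(A)}")
print(f"Hutch++:           {hutchpp(A)}")
print(f"XTrace:            {xtrace(A)}")
print(f"Exact:             {A.trace()}")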
You may use the matrix_function API to construct a LinearOperator that approximates matrix function applications v \mapsto f(A)v. To do this, simply call matrix_function(A, fun=...) where fun is either:
-
-
a string representing the name of a built-in matrix function, or
-
the corresponding spectral function as a Callable
-
-
For example, one might compute the log-determinant of a positive definite matrix as follows:
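A rough sketch of this computation is shown below; it pairs matrix_function with hutch, since logdet(A) = tr(log(A)). The import location of matrix_function is an assumption here, and passing fun="log" directly to hutch (as in the log-determinant example later in these docs) is an equivalent shortcut.

import numpy as np
from primate.trace import hutch
from primate.operators import matrix_function  # assumed import location

# Random positive-definite matrix with a known spectrum for comparison
rng = np.random.default_rng(0)
ew = rng.uniform(0.1, 1.0, size=150)              # eigenvalues bounded away from zero
U, _ = np.linalg.qr(rng.normal(size=(150, 150)))
A = U @ np.diag(ew) @ U.T

LogA = matrix_function(A, fun="log")              # LinearOperator approximating v -> log(A) v
print(f"logdet (exact):  {np.sum(np.log(ew)):.4f}")
print(f"logdet (approx): {hutch(LogA):.4f}")      # tr(log(A)) = logdet(A)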
The diagonals of matrices and matrix functions (implicitly or explicitly represented) can also be estimated via a nearly identical API to the one used for the trace.
In primate, the matrix function f(A) is not constructed explicitly; instead, the action v \mapsto f(A)v is approximated with a fixed-degree Krylov expansion. This can be useful when, for example, the matrix A itself is so large that the corresponding (typically dense) matrix function f(A) \in \mathbb{R}^{n \times n} is simply too large to be explicitly represented. If you just want to approximate the action of a matrix function for a single vector v \in \mathbb{R}^n, simply supply the vector and the matrix alongside the matrix_function call:
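A minimal sketch of the single-vector usage; the name and position of the vector argument (and the import location) are assumptions, so consult the matrix_function reference for the exact signature:

import numpy as np
from primate.random import symmetric
from primate.operators import matrix_function  # assumed import location

A = symmetric(150)
v = np.random.default_rng(0).normal(size=A.shape[1])

# Supplying both the matrix and a vector returns the product f(A) v directly
y = matrix_function(A, fun=np.exp, v=v)
print(y[:3])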
Alternatively, if you prefer an object-oriented approach (or you plan on doing multiple matvecs), you can construct a MatrixFunction instance and use it like any other LinearOperator:
-
-
from primate.operators import MatrixFunction
ExpA = MatrixFunction(A, fun=np.exp)
y = ExpA @ v
print(f"exp(A)v = {arr_summary(y)}")
-
-
exp(A)v = [0.52,0.90,...,0.24]
-
-
-
If you don’t supply a vector v to the matrix_function call, a MatrixFunction instance is constructed (using whatever additional arguments are passed in) and returned. Note that some function specializations are inherently more difficult to approximate, and the difficulty can depend on the smoothness of f and the conditioning of the corresponding operator f(A); in general, a MatrixFunction instance with degree k approximates the action v \mapsto f(A)v about as well as the operator p(A), where p is a degree 2k-1 polynomial interpolant of f.
As you can see, for smoother matrix functions (like \mathrm{exp}(A)), even a low-degree Krylov expansion can be more than sufficient for many applications—all without any re-orthogonalization! See the matrix function guide for more background on this.
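To make the degree/accuracy trade-off concrete, the sketch below compares a few Krylov degrees against a dense reference computed with SciPy. The deg keyword for the Krylov degree is an assumption about the MatrixFunction constructor.

import numpy as np
from scipy.linalg import expm
from primate.random import symmetric
from primate.operators import MatrixFunction

A = symmetric(150)
v = np.random.default_rng(0).normal(size=150)
exact = expm(A) @ v                      # dense reference for exp(A) v

for k in [5, 10, 20]:                    # "deg" (Krylov degree) is an assumed keyword
    approx = MatrixFunction(A, fun=np.exp, deg=k) @ v
    rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"deg={k}: relative error = {rel_err:.2e}")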
-
-
-
Configuring the output
-
-
Passing full=True returns additional information about the computation in the form of an EstimatorResult (along with the estimate itself), which contains information about the execution, the convergence of the estimator, and other status messages.
-
For example, with the default converge="confidence" criterion, the margin of error of a default-constructed confidence interval is returned:
-
-
est, info = hutch(A, converge="confidence", full=True)
print(info.message)
-
-
Est: 74.741 +/- 1.206 (95% CI, #S:64)
-
-
-
A more visual way of viewing the sample values and the corresponding estimate as a function of the sample size is to plot the sequence with the figure_sequence function (note this requires saving the samples with record=True):
-
-
from primate.plotting import figure_sequence

est, info = hutch(A, full=True, record=True)
p = figure_sequence(info.estimator.values)
show(p)
-
-
-
-
-
-
-
-
-
You can also pass a callback function, which receives as its only argument an EstimatorResult instance. This can be useful for quickly monitoring convergence status, saving intermediate information, etc.
-
-
from primate.estimators import EstimatorResult

def hutch_callback(result: EstimatorResult):
    if (result.nit % 10) == 0:
        print(result.criterion.message(result.estimator))

est, info = hutch(A, converge="confidence", callback=hutch_callback)
primate contains a variety of methods for estimating quantities from matrix functions. One popular such quantity is the trace of f(A), for some generic f: [a,b] \to \mathbb{R} defined on the spectrum of A: \mathrm{tr}(f(A)) \triangleq \mathrm{tr}(U f(\Lambda) U^{\intercal}), \quad \quad f : [a,b] \to \mathbb{R}
-
Many quantities of common interest can be expressed as traces of matrix functions with the right choice of f; a few such function specializations include the log-determinant (f = \log) and the numerical rank (f a step function), among others.
This introduction covers the basics of randomized trace estimation, with a focus on how to use matrix-free algorithms to estimate traces of matrix functions.
-
-
-
-
-
-
-
-
Basics: Trace computation
-
-
Suppose we wanted to compute the trace of some matrix A. By the very definition of the trace, we have a variety of means to do it: \mathrm{tr}(A) \triangleq \sum\limits_{i=1}^n A_{ii} = \sum\limits_{i=1}^n \lambda_i = \sum\limits_{i=1}^n e_i^T A e_i
-
where, in the right-most sum, we’re using e_i to represent the ith identity vector: A_{ii} = e_i^T A e_i, \quad e_i = [0, \dots, 0, \underbrace{1}_{i}, 0, \dots, 0]
-
Let’s see some code. First, we start with a simple (random) positive-definite matrix A \in \mathbb{R}^{n \times n}.
-
-
from primate.random import symmetric
A = symmetric(150)
-
-
By default, symmetric normalizes A such that the eigenvalues are uniformly distributed in the interval [-1, 1]. For reference, the spectrum of A looks as follows:
-
-
-
-
-
-
-
-
-
-
Now, using numpy, we can verify all of these trace definitions are indeed equivalent:
-
-
-
import numpy as np
eye = lambda i: np.ravel(np.eye(1, 150, k=i))
print(f"Direct: {np.sum(A.diagonal()):.8f}")
print(f"Eigen: {np.sum(np.linalg.eigvalsh(A)):.8f}")
print(f"Matvec: {sum([eye(i) @ A @ eye(i) for i in range(150)]):.8f}")
While (1) trivially yields \mathrm{tr}(A) in (optimal) O(n) time, there exist situations where the diagonal entries of A are not available—particularly in the large-scale regime. In contrast, though both (2) and (3) are less efficient, they are nonetheless viable alternatives to obtain \mathrm{tr}(A).
-
-
-
Implicit trace estimation
-
The implicit trace estimation problem can be stated as follows:
-
-
Given access to a square matrix A \in \mathbb{R}^{n \times n} via its matrix-vector product operator x \mapsto Ax, estimate its trace \mathrm{tr}(A) = \sum\limits_{i=1}^n A_{ii}
-
-
Note that we no longer assume we have direct access to the diagonal here, leaving us with the Eigen or Matvec methods. The Eigen method is expensive, and though the Matvec method solves the problem exactly, it’s pretty inefficient from a computational standpoint—most of the entries of e_i are zero, so most inner product computations do not contribute to the trace estimate at all.
-
One idea, attributed to A. Girard, is to replace e_i with random zero-mean vectors, e.g. random sign vectors v \in \{-1, +1\}^{n} or random normal vectors v \sim \mathcal{N}(\mu=0, \sigma=1):
-
\mathtt{tr}(A) \approx \frac{1}{n_v}\sum\limits_{i=1}^{n_v} v_i^\top A v_i
-
It was shown more formally later by M.F. Hutchinson that if these random vectors v \in \mathbb{R}^n satisfy \mathbb{E}[vv^T] = I, then this approach indeed forms an unbiased estimator of \mathrm{tr}(A). To see this, it’s sufficient to combine the linearity of expectation, the cyclic property of the trace, and the aforementioned isotropy condition: \mathtt{tr}(A) = \mathtt{tr}(A I) = \mathtt{tr}(A \mathbb{E}[v v^T]) = \mathbb{E}[\mathtt{tr}(Avv^T)] = \mathbb{E}[\mathtt{tr}(v^T A v)] = \mathbb{E}[v^T A v]
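A quick empirical check of the isotropy condition \mathbb{E}[vv^T] = I for Rademacher vectors, using only NumPy:

import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 20_000
V = rng.choice([-1.0, 1.0], size=(m, n))        # m Rademacher sample vectors
emp = (V.T @ V) / m                             # empirical estimate of E[v v^T]
print(np.allclose(emp, np.eye(n), atol=0.05))   # True: close to the identity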
-
Naturally, we expect the approximation to gradually improve as more vectors are sampled, converging as n_v \to \infty. Let’s see if we can check this: we start by collecting a few sample quadratic forms:
-
-
from primate.random import isotropic
estimates = np.array([v @ A @ v for v in isotropic(A.shape[1])])
-
-
Plotting both the estimator formed by averaging the first k samples along with its absolute error, we get the following figures for k = 50 and k = 500, respectively:
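The running estimate and its absolute error used in these figures can be computed directly from the samples gathered above:

# Running average of the first k quadratic forms and its absolute error
running_est = np.cumsum(estimates) / np.arange(1, len(estimates) + 1)
abs_error = np.abs(running_est - A.trace())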
-
-
-
-
-
-
-
-
-
-
Sure enough, the estimator quickly gets within about 1-5% of the actual trace, generally getting closer as more samples are collected. On the other hand, improvement is not guaranteed, and pushing the error much below 1% becomes increasingly difficult due to the randomness and variance of the sample estimates.
-
-
-
Maximizing configurability
-
Rather than manually averaging samples, trace estimates can be obtained via the hutch function:
-
-
from primate.trace import hutch
print(f"Trace estimate: {hutch(A)}")
-
-
Trace estimate: 4.517993874467151
-
-
-
Though its estimate is identical to averaging quadratic forms v^T A v of isotropic vectors as above, hutch comes with a few extra features to simplify its usage.
-
For example, to stop the estimator once it reaches an absolute tolerance, supply a value to atol:
-
-
## Stops the estimator if margin of error <= 0.15
est = hutch(A, converge="confidence", atol=0.15, rtol=0.0)
print(f"Trace est: {est}")
-
-
Trace est: 3.7060433891831823
-
-
-
The default early-stopping rule uses the CLT to bound the error of the estimator. A simpler early-stopping method can also be used, which is similar to the quadrature method:
-
-
## Stops the estimator when successive pairs of iterates change by <= 0.05
hutch(A, converge='tolerance', atol=0.05, rtol=0.0)
-
-
1.9887531339820812
-
-
-
You can adjust the distribution used to sample random vectors via the pdf parameter (one of “rademacher”, “normal”, or “sphere”). You can also fix the seed for deterministic behavior, or supply your own random number generator:
-
-
## Samples from the normal distribution N(0, 1)
hutch(A, pdf="normal", seed=1234)

## Samples from the unit sphere, using an existing generator
rng = np.random.default_rng(1234)
hutch(A, pdf="sphere", seed=rng)
In many settings, the trace of a matrix is not a very informative quantity, but the trace of a matrix function f(A) may very well be an important quantity to estimate. It is natural to consider extending the Hutchinson estimator above with something like: \mathrm{tr}(f(A)) \approx \frac{1}{n_v}\sum\limits_{i=1}^{n_v} v_i^\top f(A) v_i
Of course, the remaining difficulty lies in computing the quadratic forms v \mapsto v^T f(A) v efficiently. The approach taken by Ubaru, Saad, and Seghouane (2017) is to notice that these quantities can be transformed into a Riemann-Stieltjes integral:
-
v^T f(A) v = v^T U f(\Lambda) U^T v = \sum\limits_{i=1}^n f(\lambda_i) \mu_i^2 = \int\limits_{a}^b f(t) \, d\mu(t)
-
where scalars \mu_i \in \mathbb{R} constitute a cumulative, piecewise constant function \mu : \mathbb{R} \to \mathbb{R}_+:
-
\mu_i = (U^T v)_i, \quad \mu(t) = \begin{cases}
0, & \text{if } t < a = \lambda_1 \\
\sum_{j=1}^{i-1} \mu_j^2, & \text{if } \lambda_{i-1} \leq t < \lambda_i, \; i = 2, \dots, n \\
\sum_{j=1}^n \mu_j^2, & \text{if } b = \lambda_n \leq t
\end{cases}
-
-
As with any definite integral, one would ideally like to approximate its value with a quadrature rule. One such approximation is the m-point Gaussian quadrature rule: \int\limits_{a}^b f(t) \, d\mu(t) \approx \sum\limits_{k=1}^m \omega_k f(\eta_k)
with weights \{\omega_k\} and nodes \{\eta_k\}. On one hand, Gaussian Quadrature (GQ) is just one of many quadrature rules that could be used to approximate v^T f(A) v. On the other hand, GQ is often preferred over other rules due to its exactness on polynomials up to degree 2m - 1.
-
Of course, both the weights and nodes of the GQ rule are unknown for the integral above. This contrasts with the typical quadrature setting, wherein the weight function is assumed uniform or is otherwise known ahead of time.
-
A surprising and powerful fact is that both the weights \{\omega_k\} and nodes \{\eta_k\} of the m-point GQ rule above are directly computable from the degree-m tridiagonal matrix T_m produced by the Lanczos method: \eta_k = \theta_k, \quad \omega_k = \tau_k^2, \quad \text{where } T_m y_k = \theta_k y_k, \;\; \tau_k = e_1^\top y_k
In other words, the Ritz values \{\theta_i\} of T_m are exactly the nodes of the GQ rule applied to \mu(t), while the (squared) first components of the Ritz vectors \tau_i are exactly the weights. For more information about this non-trivial fact, see Golub and Meurant (2009).
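The sketch below (plain NumPy, not primate’s internal implementation) verifies this correspondence: a basic Lanczos recurrence builds T_m, and the Ritz values/weights of T_m yield a quadrature estimate of v^T f(A) v.

import numpy as np

def lanczos(A, v, m):
    # Basic m-step Lanczos recurrence (no re-orthogonalization); returns the m x m tridiagonal T_m
    n = v.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = w @ Q[:, j]
        w -= alpha[j] * Q[:, j]
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

rng = np.random.default_rng(1)
n, m = 100, 20
ew = rng.uniform(0.1, 1.0, size=n)                   # positive spectrum so log is defined
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
A = U @ np.diag(ew) @ U.T
v = rng.choice([-1.0, 1.0], size=n)                  # Rademacher sample vector

T = lanczos(A, v, m)
theta, Y = np.linalg.eigh(T)                         # Ritz values = quadrature nodes
weights = Y[0, :] ** 2                               # squared first components = quadrature weights
quad = (v @ v) * np.sum(weights * np.log(theta))     # m-point GQ estimate of v^T log(A) v
exact = v @ (U @ np.diag(np.log(ew)) @ U.T) @ v
print(f"quadrature: {quad:.6f}, exact: {exact:.6f}")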
-
-
-
Stochastic Lanczos Quadrature
-
By using Lanczos quadrature estimates, the theory above extends the Hutchinson estimator to traces of matrix functions: each quadratic form v_i^\top f(A) v_i is replaced by its Lanczos quadrature estimate, yielding (e.g. for Rademacher vectors, where \lVert v_i \rVert^2 = n) \mathrm{tr}(f(A)) \approx \frac{n}{n_v} \sum\limits_{i=1}^{n_v} e_1^\top f(T_m^{(i)}) \, e_1
Under relatively mild assumptions, if the function of interest f: [a,b] \to \mathbb{R} is analytic on [\lambda_{\text{min}}, \lambda_{\text{max}}], then for constants \epsilon, \eta \in (0, 1) the output \Gamma of the Hutchinson estimator satisfies: \mathrm{Pr}\big[\, \lvert \mathrm{tr}(f(A)) - \Gamma \rvert \leq \epsilon \, \lvert \mathrm{tr}(f(A)) \rvert \,\big] \geq 1 - \eta
…if the number of sample vectors n_v satisfies n_v \geq (24/\epsilon^2) \log(2/\eta) and the degree of the Lanczos expansion is sufficiently large. In other words, we can achieve an arbitrary (1 \pm \epsilon)-approximation of \mathrm{tr}(f(A)) with success probability 1 - \eta using on the order of \sim O(\epsilon^{-2} \log(\eta^{-1})) evaluations of e_1^T f(T_m) e_1. This probabilistic guarantee is most useful when \epsilon is not too small, i.e. when only a relatively coarse approximation of \mathrm{tr}(f(A)) is needed.
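For intuition about the scale of n_v implied by this bound, a quick calculation:

import numpy as np

# n_v >= (24 / eps^2) * log(2 / eta): number of sample vectors for a
# (1 ± eps)-relative approximation with failure probability eta
eps, eta = 0.1, 0.05
n_v = int(np.ceil((24 / eps**2) * np.log(2 / eta)))
print(n_v)   # ~8854 vectors for 10% relative error, 95% success probability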
-
-
-
Example: Logdet computation
-
Back to the computation: let’s see how the above theory translates to code with primate.
-
To parameterize a Girard-Hutchinson estimator for use with matrix functions, it suffices to set the fun argument in hutch to either the name of a built-in function or an arbitrary Callable.
-
For example, to approximate the log-determinant of A:
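The code cell producing the output below isn’t shown in the extracted text; a plausible reconstruction (the exact arguments are assumptions, apart from fun="log" and maxiter=25, which match the surrounding text and warning) is:

# "Exact" logdet via the eigenvalues; nan here, since A has negative eigenvalues
logdet_true = np.sum(np.log(np.linalg.eigvalsh(A)))
logdet_est = hutch(A, fun="log", maxiter=25, seed=5)
print(f"Logdet (exact): {logdet_true}")
print(f"Logdet (approx): {logdet_est}")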
Logdet (exact): nan
Logdet (approx): 2.0236536930434426
-
-
-
/var/folders/0l/b3dbb2_d2bb4y3wbbfk0wt_80000gn/T/ipykernel_21376/1227822633.py:3: RuntimeWarning: invalid value encountered in log
  logdet_true = np.sum(np.log(np.linalg.eigvalsh(A)))
-
-
-
Here we see that using only n / 6 evaluations of e_1^T f(T_m) e_1, we get a decent approximation of logdet(A). To get a better idea of the quality of the estimator, set verbose=True:
-
-
est = hutch(A, fun="log", maxiter=25, seed=5, verbose=True)
-
-
The first line of the statement contains fixed information about the estimator, including the type of estimator (Girard-Hutchinson), the function applied to the spectrum (log), the degree of the Lanczos quadrature approximation (20), and the quadrature method used (golub_welsch).
-
The second line prints runtime information about the samples, such as the final trace estimate, its margin of error and coefficient of variation (a.k.a. relative std. deviation), the number of sample vectors n_v, and the isotropic distribution of choice (‘R’ for rademacher).
-
Another way to summarize the convergence behavior of the estimator is to pass the plot=True option. Here, we again fix the seed for reproducibility.
-
-
est = hutch(A, fun="numrank", maxiter=150, seed=8, plot=True)
-
-
Similar information as reported by the verbose=True flag is shown, cumulatively applied to every sample. The left plot shows the sample point estimates alongside yellow bands denoting the margin of error associated with the 95% confidence interval (CI) at every sample index. The right plot shows convergence information: the running coefficient of variation is shown on the top-right, and an upper bound on the absolute error derived from the CI is shown on the bottom-right.
-
While the margin of error is typically a valid, conservative estimate of the estimator’s error, in some situations these bounds may not be valid, as they are based solely on the CLT’s implications for normally distributed random variables. In particular, there is an inherent amount of error associated with the Lanczos-based quadrature approximations that is not taken into account in the error bound, which may invalidate the corresponding CIs.
-
-
-
-
-
-
-
References
-
Golub, Gene H., and Gérard Meurant. 2009. Matrices, Moments and Quadrature with Applications. Vol. 30. Princeton University Press.
-
-
Ubaru, Shashanka, Yousef Saad, and Abd-Krim Seghouane. 2017. “Fast Estimation of Approximate Matrix Ranks Using Spectral Densities.” Neural Computation 29 (5): 1317–51.
-
-
-
-
-
-
-
-
-
-
-
-
It’s up to you to ensure shape() yields the correct size; primate will supply input vectors of size .shape().second (number of columns) and guarantees the pointer to the output will hold at least shape().first (number of rows) entries, no more.
-
To read more about how these semantics extend to the C++ side via C++20 concepts, see the C++ integration guide. If you’re using pybind11 and you want to extend primate’s Python API to work natively with linear operators implemented in C++, see the pybind11 integration guide.