Mission : Impossible is a concise, header-only C++17 implementation of automatic differentiation based on operator overloading. For repeated calculations it is very easy to define a self-destructing local tape, hence the nod to the “Mission: Impossible” series.
Illustrates local tape usage:
#include "MissionImpossible/MissionImpossible.hpp"
#include <array>
#include <iostream>
using namespace MissionImpossible;
// A C++ function
template <typename T>
T
Rosenbrock(const std::array<T, 2>& X)
{
return (1 - X[0]) * (1 - X[0]) + 10 * (X[1] - X[0] * X[0]) * (X[1] - X[0] * X[0]);
}
// A C++ function that adds gradient computation
template <typename T>
T
Rosenbrock(const std::array<T, 2>& X, std::array<T, 2>& grad)
{
MissionImpossible_Tape<T> local_tape; // a local thread_local tape
std::array<AD<T>, 2> ad_X;
ad_X[0] = X[0];
ad_X[1] = X[1];
AD<T> ad_f = Rosenbrock(ad_X);
Tape_Vector<T> ad_grad_f = gradient(local_tape, ad_f); // use the local tape for ∇f
grad[0] = ad_grad_f[ad_X[0]];
grad[1] = ad_grad_f[ad_X[1]];
return ad_f.value();
// here the local tape is destroyed (in fact re-winded to avoid
// useless new/delete)
}
int
main()
{
std::array<float, 2> X{3., 4.};
std::array<float, 2> grad;
float f = Rosenbrock(X, grad);
std::cout << "f = " << f << std::endl;
std::cout << "∇f = [ " << grad[0] << ", " << grad[1] << " ]" << std::endl;
}
prints
f = 254
∇f = [ 604, -100 ]
This library has been optimized for unstructured (i.e. not vectorized) first-order derivatives. Its speed should be comparable to Adept, as it relies on the same kind of approach [1].
[1] Srajer, Filip, Zuzana Kukelova, and Andrew Fitzgibbon. “A benchmark of selected algorithmic differentiation tools on some problems in computer vision and machine learning.” Optimization Methods and Software 33.4-6 (2018): 889-906.
The library supports derivatives of arbitrary order but has not been optimized for that. For example, for second-order derivatives the symmetry ∂ᵢⱼ = ∂ⱼᵢ is not taken into account. This leads to a fast-growing number of redundant computations as the derivation order increases (except for functions of one variable). For higher-order derivatives, dedicated approaches like [2,3] are more effective.
[2] Wang, Mu, Assefaw Gebremedhin, and Alex Pothen. “Capitalizing on live variables: new algorithms for efficient Hessian computation via automatic differentiation.” Mathematical Programming Computation 8.4 (2016): 393-433.
[3] Gower, Robert Mansel, and Artur L. Gower. “Higher-order reverse automatic differentiation with emphasis on the third-order.” Mathematical Programming 155.1-2 (2016): 81-103.
- [2022-03-16 Wed 08:32] min/max and complex comparison bug fixes, thanks to Léopold Delahaye.
- [2020-02-09 Sun 19:44] Adding a CMake build solution + CI + code cleaning. v0.1.1.
- [2020-02-09 Sun 11:56] After an effort to provide some documentation and a sqrt() bug fix (thanks to Fabrice Gaudier), I think it’s time to release the v0.1.0 version!
- [2020-01-07 Tue 12:27] This is a pre-release. Some developments remain to be done (implementing special functions, adding examples and benchmarks) but the design and API are not expected to change a lot.
The library's main build tool is the meson build system.
If you are not familiar with meson, the compilation procedure is as follows:
git clone https://github.com/vincent-picaud/MissionImpossible.git
cd MissionImpossible/
meson build
cd build
ninja test
To get an optimized version, use:
meson --buildtype=release -Db_ndebug=true build-release
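Then, assuming the usual meson/ninja workflow, the release configuration can be compiled and tested from that directory:
cd build-release
ninja test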
For your convenience, I also provide a CMake solution.
Note: the CMake solution does not run tests.
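If you prefer CMake, a typical out-of-source build would look like the following sketch (assuming the repository's top-level CMakeLists.txt; adapt as needed):
cmake -S . -B build-cmake
cmake --build build-cmake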
These examples can be found in the build/examples/ directory.
Illustrates forward-mode and reverse-mode support. Forward mode is convenient for computing the Jacobian column by column; reverse mode is efficient for computing gradients (or, equivalently, the Jacobian row by row).
#include "MissionImpossible/MissionImpossible.hpp"
#include <iostream>
using namespace MissionImpossible;
int
main()
{
AD<double> r = 2, theta = 0.1;
AD<double> y1 = r * cos(theta);
AD<double> y2 = r * sin(theta);
//////////////////////////////////
// Computes Jacobian row by row //
//////////////////////////////////
//
// -> AKA reverse-mode
//
std::cout << "Jacobian row by row" << std::endl;
auto Jacobian_row_y1 = Jacobian_row(y1); // ∇y1 (or equivalently gradient(y1))
// computes ∂ᵣy¹, ∂ₒy¹
auto Jacobian_row_y2 = Jacobian_row(y2); // ∇y2 (or equivalently gradient(y2))
// computes ∂ᵣy², ∂ₒy²
std::cout << "∇y1(r,θ) = [ ∂ᵣy¹, ∂ₒy¹ ] = " << std::setw(20) << Jacobian_row_y1[r] << ", ";
std::cout << std::setw(20) << Jacobian_row_y1[theta] << std::endl;
std::cout << "∇y2(r,θ) = [ ∂ᵣy², ∂ₒy² ] = " << std::setw(20) << Jacobian_row_y2[r] << ", ";
std::cout << std::setw(20) << Jacobian_row_y2[theta] << std::endl;
////////////////////////////////////////
// Computes Jacobian column by column //
////////////////////////////////////////
//
// -> AKA forward-mode
//
std::cout << std::endl << "Jacobian column by column" << std::endl;
auto Jacobian_column_r = Jacobian_column(r); // r column computes ∂ᵣy¹, ∂ᵣy²
auto Jacobian_column_theta = Jacobian_column(theta); // θ column compules ∂ₒy¹, ∂ₒy²
std::cout << "∂ᵣy¹ = " << std::setw(20) << Jacobian_column_r[y1] << "\t"
<< "∂ᵣy² = " << std::setw(20) << Jacobian_column_theta[y1] << std::endl;
std::cout << "∂ₒy¹ = " << std::setw(20) << Jacobian_column_r[y2] << "\t"
<< "∂ₒy² = " << std::setw(20) << Jacobian_column_theta[y2] << std::endl;
}
prints
Jacobian row by row
∇y1(r,θ) = [ ∂ᵣy¹, ∂ₒy¹ ] = 0.995004, -0.199667
∇y2(r,θ) = [ ∂ᵣy², ∂ₒy² ] = 0.0998334, 1.99001

Jacobian column by column
∂ᵣy¹ = 0.995004     ∂ᵣy² = 0.0998334
∂ₒy¹ = -0.199667    ∂ₒy² = 1.99001
Illustrates complex number support:
#include "MissionImpossible/MissionImpossible.hpp"
#include <complex>
#include <iostream>
using namespace MissionImpossible;
void
most_efficient()
{
using T = std::complex<double>;
AD<T> z0 = T(1, 2), Z;
Z = 4 * exp(2 * z0 * z0);
auto dZ = gradient(Z);
std::cout << " f = " << Z << std::endl;
std::cout << "df = " << dZ[z0] << std::endl;
}
template <typename F>
void
more_versatile(F f)
{
AD<double> x(1), y(2);
std::complex<AD<double>> z0(x, y), Z;
Z = f(z0);
AD<double> u = Z.real(), v = Z.imag();
const auto grad_u = gradient(u);
// assumes that Z is holomorph
//
std::cout << " f = " << Z << std::endl;
std::cout << "df = " << grad_u[x] << ", ";
std::cout << -grad_u[y] << std::endl;
// Cauchy-Riemann
//
const auto grad_v = gradient(v);
std::cout << "--> Cauchy-Riemann check:" << std::endl;
std::cout << grad_u[x] << " ?= " << grad_v[y] << std::endl;
std::cout << grad_u[y] << " ?= " << -grad_v[x] << std::endl;
}
int
main()
{
std::cout << " f1: " << std::endl;
most_efficient();
//================
auto f_holomorph = [](const auto& z) { return 4 * exp(2 * z * z); };
auto f_not_holomorph = [](const auto& z) { return sqrt(z * conj(z)); };
std::cout << std::endl << "Holomorph f1: " << std::endl;
more_versatile(f_holomorph);
std::cout << std::endl << "Not holomorph f2: " << std::endl;
more_versatile(f_not_holomorph);
}
prints:
 f1: 
 f = (-0.00144263,+0.0098095)
df = (-0.0842465,+0.0276969)

Holomorph f1: 
 f = (-0.00144263,+0.0098095)
df = -0.0842465, +0.0276969
--> Cauchy-Riemann check:
-0.0842465 ?= -0.0842465
-0.0276969 ?= -0.0276969

Not holomorph f2: 
 f = (+2.23607,+0)
df = +0.447214, -0.894427
--> Cauchy-Riemann check:
+0.447214 ?= +0
+0.894427 ?= -0
Illustrates computation of the Hessian action Hv = ∇<∇f,v>:
#include "MissionImpossible/MissionImpossible.hpp"
using namespace MissionImpossible;
int
main()
{
AD<AD<double>> x0(3), x1(4), y;
y = (1 - x0) * (1 - x0) + 10 * (x1 - x0 * x0) * (x1 - x0 * x0);
std::cout << "f = " << y << std::endl;
auto y_gradient = gradient(y); // Computes ∇f
std::cout << "∇f= " << y_gradient[x0] << ", ";
std::cout << y_gradient[x1] << std::endl;
AD<double> z;
double v0(5), v1(6);
z = v0 * y_gradient[x0] + v1 * y_gradient[x1]; // Computes z=<∇f,v>
auto z_gradient = gradient(z); // Computes Hv = ∇z = ∇ <∇f,v>
std::cout << "Hv= " << z_gradient[x0] << ", ";
std::cout << z_gradient[x1] << std::endl;
}
prints
f = +254
∇f= +604, -100
Hv= +3890, -480
Illustrates support for nested computations:
#include "MissionImpossible/MissionImpossible.hpp"
#include <iostream>
using namespace MissionImpossible;
template <typename T>
auto
Rosenbrock(const T& x0, const T& x1)
{
return (1 - x0) * (1 - x0) + 10 * (x1 - x0 * x0) * (x1 - x0 * x0);
}
// Third order demo
int
main()
{
AD<AD<AD<double>>> x0(3), x1(4), y;
y = Rosenbrock(x0, x1);
auto grad = gradient(y);
auto Hessian_x0_row = gradient(grad[x0]);
auto Hessian_x1_row = gradient(grad[x1]);
auto third_order_x0_x0_row = gradient(Hessian_x0_row[x0]);
auto third_order_x0_x1_row = gradient(Hessian_x0_row[x1]);
auto third_order_x1_x0_row = gradient(Hessian_x1_row[x0]);
auto third_order_x1_x1_row = gradient(Hessian_x1_row[x1]);
std::cout << "f = " << y << std::endl;
std::cout << std::endl;
std::cout << "∂₀f = " << grad[x0] << std::endl;
std::cout << "∂₁f = " << grad[x1] << std::endl;
std::cout << std::endl;
std::cout << "∂²₀₀f = " << Hessian_x0_row[x0] << std::endl;
std::cout << "∂²₀₁f = " << Hessian_x0_row[x1] << std::endl;
std::cout << "∂²₁₀f = " << Hessian_x1_row[x0] << std::endl;
std::cout << "∂²₁₁f = " << Hessian_x1_row[x1] << std::endl;
std::cout << std::endl;
std::cout << "∂³₀₀₀f = " << third_order_x0_x0_row[x0] << std::endl;
std::cout << "∂³₀₀₁f = " << third_order_x0_x0_row[x1] << std::endl;
std::cout << "∂³₀₁₀f = " << third_order_x0_x1_row[x0] << std::endl;
std::cout << "∂³₀₁₁f = " << third_order_x0_x1_row[x1] << std::endl;
std::cout << "∂³₁₀₀f = " << third_order_x1_x0_row[x0] << std::endl;
std::cout << "∂³₁₀₁f = " << third_order_x1_x0_row[x1] << std::endl;
std::cout << "∂³₁₁₀f = " << third_order_x1_x1_row[x0] << std::endl;
std::cout << "∂³₁₁₁f = " << third_order_x1_x1_row[x1] << std::endl;
}
which prints
f = +254

∂₀f = +604
∂₁f = -100

∂²₀₀f = +922
∂²₀₁f = -120
∂²₁₀f = -120
∂²₁₁f = +20

∂³₀₀₀f = +720
∂³₀₀₁f = -40
∂³₀₁₀f = -40
∂³₀₁₁f = +0
∂³₁₀₀f = -40
∂³₁₀₁f = +0
∂³₁₁₀f = +0
∂³₁₁₁f = +0
This part focuses on what you need to know to use this library properly.
To compute derivatives you must use AD<T> types in place of the usual T types (where T represents a real type like float or double):
- AD<T> for first order derivatives
- AD<AD<T>> for second order derivatives
- AD<AD<AD<T>>> for third order derivatives
- …

Note: you must always initialize AD<T> variables before using them (in order to register them in the tape).
Example:
#include "MissionImpossible/MissionImpossible.hpp"
using namespace MissionImpossible;
int
main()
{
// GOOD
//================
AD<double> x1, y1;
x1 = 1; // initializes x1
y1 = 2 * x1; // before usage
auto grad1 = gradient(y1); // OK
// BAD
//================
AD<double> x2, y2;
y2 = 2 * x2; // use of x2 without initialization
// triggers an assert(0) in DEBUG mode
auto grad2 = gradient(y2); // undefined behavior
}
The underlying problem (conflicting template type deduction when mixing plain scalar constants with generic types) is not specific to this library. For example:
#include <vector>

// BAD
template <typename T>
void
scale_v1(const T scalar, std::vector<T>& v)
{
  // version 1
}

// GOOD
template <typename T>
void
scale_v2(const typename std::vector<T>::value_type scalar, std::vector<T>& v)
{
  // version 2
}

int
main()
{
  std::vector<double> v(10);

  scale_v1(2, v);  // <- does not compile
  // "...deduced conflicting types for parameter ‘T’ (‘int’ and ‘double’)..."

  scale_v2(2, v);  // <- OK
}
The use of typename std::vector<T>::value_type avoids the type conflict: the scalar argument no longer participates in template deduction, so only one expression (here std::vector<T>) is used to deduce T (for further details see cppreference: type_identity).
Back to this “Mission : Impossible” library: if you want to define a function that takes a scalar constant 10 and computes 10*x*x, you must use AD_Underlying_Type_t (playing the same role as typename std::vector<T>::value_type in the previous example):
#include "MissionImpossible/MissionImpossible.hpp"
using namespace MissionImpossible;
template <typename T>
T
my_function(const AD_Underlying_Type_t<T> scalar_constant, const T x)
{
return scalar_constant * x * x;
}
int
main()
{
AD<AD<double>> x = 2, y;
y = my_function(10, x);
auto dy = Jacobian_row(y); // auto = Tape_Vector<AD<double>>
auto d2y = Jacobian_row(dy[x]); // auto = Tape_Vector<double>
std::cout << "y = " << y << std::endl;
std::cout << "dy = " << dy[x] << " dx" << std::endl;
std::cout << "d2y = " << d2y[x] << " dx⊗dx" << std::endl;
}
which prints:
y = +40
dy = +40 dx
d2y = +20 dx⊗dx
The last function to know, to be used with care (as it bypasses tape recording), is underlying_value(). This function returns the underlying stored value. For example:
#include "MissionImpossible/MissionImpossible.hpp"
using namespace MissionImpossible;
int
main()
{
AD<AD<double>> x = 2, y;
y = 10 * x * x;
auto dy = Jacobian_row(y); // auto = Tape_Vector<AD<double>>
auto d2y = Jacobian_row(dy[x]); // auto = Tape_Vector<double>
double value_y = underlying_value(y);
double value_dy = underlying_value(dy[x]);
double value_d2y = underlying_value(d2y[x]);
std::cout << "y = " << value_y << std::endl;
std::cout << "dy = " << value_dy << " dx" << std::endl;
std::cout << "d2y = " << value_d2y << " dx⊗dx" << std::endl;
}
prints
y = 40
dy = 40 dx
d2y = 20 dx⊗dx
The differential of f evaluated at a point X, dfₓ, is a linear map that can be represented (given bases) by a matrix (also known as the Jacobian) with components ∂ⱼfⁱ, where i denotes rows and j columns. You can compute the Jacobian:
- row by row, using the Jacobian_row() function (fix i and compute ∂ⱼfⁱ for all j)
- column by column, using the Jacobian_column() function (fix j and compute ∂ⱼfⁱ for all i)
Note: in applications we often encounter real-valued functions. In that case there is only one row and the (total) differential is simply
dfₓ = Σᵢ ∂ᵢf dxⁱ
It is clearly better to compute df row by row, as there is only one row. We get a “row vector” that can be used to represent the gradient of f (a “column vector”, denoted by ∇fₓ):
dfₓ·h = <∇fₓ, h>
That is why gradient() is provided as an alias of the Jacobian_row() function.
Example (this simply reproduces the Jacobian example already given above):
#include "MissionImpossible/MissionImpossible.hpp"
#include <iostream>
using namespace MissionImpossible;
int
main()
{
AD<double> r = 2, theta = 0.1;
AD<double> y1 = r * cos(theta);
AD<double> y2 = r * sin(theta);
//////////////////////////////////
// Computes Jacobian row by row //
//////////////////////////////////
//
// -> AKA reverse-mode
//
std::cout << "Jacobian row by row" << std::endl;
auto Jacobian_row_y1 = Jacobian_row(y1); // ∇y1 (or equivalently gradient(y1))
// computes ∂ᵣy¹, ∂ₒy¹
auto Jacobian_row_y2 = Jacobian_row(y2); // ∇y2 (or equivalently gradient(y2))
// computes ∂ᵣy², ∂ₒy²
std::cout << "∇y1(r,θ) = [ ∂ᵣy¹, ∂ₒy¹ ] = " << std::setw(20) << Jacobian_row_y1[r] << ", ";
std::cout << std::setw(20) << Jacobian_row_y1[theta] << std::endl;
std::cout << "∇y2(r,θ) = [ ∂ᵣy², ∂ₒy² ] = " << std::setw(20) << Jacobian_row_y2[r] << ", ";
std::cout << std::setw(20) << Jacobian_row_y2[theta] << std::endl;
////////////////////////////////////////
// Computes Jacobian column by column //
////////////////////////////////////////
//
// -> AKA forward-mode
//
std::cout << std::endl << "Jacobian column by column" << std::endl;
auto Jacobian_column_r = Jacobian_column(r); // r column computes ∂ᵣy¹, ∂ᵣy²
auto Jacobian_column_theta = Jacobian_column(theta); // θ column compules ∂ₒy¹, ∂ₒy²
std::cout << "∂ᵣy¹ = " << std::setw(20) << Jacobian_column_r[y1] << "\t"
<< "∂ᵣy² = " << std::setw(20) << Jacobian_column_theta[y1] << std::endl;
std::cout << "∂ₒy¹ = " << std::setw(20) << Jacobian_column_r[y2] << "\t"
<< "∂ₒy² = " << std::setw(20) << Jacobian_column_theta[y2] << std::endl;
}
Jacobian row by row
∇y1(r,θ) = [ ∂ᵣy¹, ∂ₒy¹ ] = 0.995004, -0.199667
∇y2(r,θ) = [ ∂ᵣy², ∂ₒy² ] = 0.0998334, 1.99001

Jacobian column by column
∂ᵣy¹ = 0.995004     ∂ᵣy² = 0.0998334
∂ₒy¹ = -0.199667    ∂ₒy² = 1.99001
Note: there are also variants of Jacobian_row() and Jacobian_column() that use a local tape. In that case, have a look at the “local tape” section and use:
MissionImpossible_Tape<double> local_tape;
// ... computations ...
auto row_by_row = Jacobian_row(local_tape, y1); // or equivalently gradient(local_tape,y1)
auto column_by_column = Jacobian_column(local_tape, r);
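For instance, here is a minimal, self-contained sketch reusing the polar-coordinates example above; it only relies on the local-tape overloads just shown and the indexing syntax used throughout this document:
#include "MissionImpossible/MissionImpossible.hpp"

#include <iostream>

using namespace MissionImpossible;

int
main()
{
  MissionImpossible_Tape<double> local_tape;

  AD<double> r = 2, theta = 0.1;
  AD<double> y1 = r * cos(theta);
  AD<double> y2 = r * sin(theta);

  auto row_y1   = Jacobian_row(local_tape, y1);     // reverse-mode: ∂ᵣy¹, ∂ₒy¹
  auto column_r = Jacobian_column(local_tape, r);   // forward-mode: ∂ᵣy¹, ∂ᵣy²

  std::cout << "∂ᵣy¹ = " << row_y1[r] << ", ∂ₒy¹ = " << row_y1[theta] << std::endl;
  std::cout << "∂ᵣy¹ = " << column_r[y1] << ", ∂ᵣy² = " << column_r[y2] << std::endl;
}  // the local tape is rewound here; the global tape is left untouched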
A thread_local tape is globally stored. You can access it with:
tape<T>(); // returns a reference Tape<T>& to the tape associated to AD<T>
tape<AD<T>>(); // returns a reference Tape<T>& to the tape associated to AD<AD<T>>
tape<AD<AD<T>>>(); // returns a reference Tape<T>& to the tape associated to AD<AD<AD<T>>>
// etc...
From the library user's perspective, you can use these methods:
- statement_size(): returns the number of statements (= number of expressions + number of declared AD<T> variables)
- memory_size(): memory used to store all the statements
- allocated_memory_size(): preallocated memory
- reset(): rewinds the tape to its beginning; does not release the currently allocated tape memory. Attention: this invalidates all previously declared AD<T> variables.
- clear(): rewinds the tape to its beginning and releases the allocated memory. Attention: this invalidates all previously declared AD<T> variables.
#include "MissionImpossible/MissionImpossible.hpp"
using namespace MissionImpossible;
int
main()
{
auto print_tape_size = [](auto msg) {
std::cout << std::endl << ">>> " << msg << std::endl;
std::cout << "statements : " << tape<double>().statement_size() << std::endl;
std::cout << "memory (kBytes) : " << tape<double>().memory_size()/1024 << std::endl;
std::cout << "allocated memory (kBytes) : " << tape<double>().allocated_memory_size()/1024 << std::endl;
};
print_tape_size("Initial tape state (contains a small amount of preallocated memory)");
for (size_t i = 1; i < 1000; ++i)
{
AD<double> x0 = 2, x1 = 3, y;
y = 4 * x0 + 2 * x1;
}
print_tape_size("Final tape state (tape has allocated some fresh memory)");
tape<double>().reset();
print_tape_size("after tape.reset() (the extra allocated memory is not released)");
tape<double>().clear();
print_tape_size("after tape.clear() (releases extra memory and starts with a new tape)");
}
prints

>>> Initial tape state (contains a small amount of preallocated memory)
statements : 0
memory (kBytes) : 0
allocated memory (kBytes) : 24

>>> Final tape state (tape has allocated some fresh memory)
statements : 2997
memory (kBytes) : 54
allocated memory (kBytes) : 64

>>> after tape.reset() (the extra allocated memory is not released)
statements : 0
memory (kBytes) : 0
allocated memory (kBytes) : 64

>>> after tape.clear() (releases extra memory and starts with a new tape)
statements : 0
memory (kBytes) : 0
allocated memory (kBytes) : 24
If you want to do local computations and rewind the tape afterward, you can use a local tape.
#include "MissionImpossible/MissionImpossible.hpp"
using namespace MissionImpossible;
int
main()
{
auto print_tape_size = [](auto msg) {
std::cout << std::endl << ">>> " << msg << std::endl;
std::cout << "statements : " << tape<double>().statement_size() << std::endl;
std::cout << "memory : " << tape<double>().memory_size() << std::endl;
};
print_tape_size("Initial tape state");
AD<double> x0 = 2, x1 = 3, y;
y = 4 * x0 + 2 * x1;
auto grad = gradient(y);
std::cout << std::endl
<< "f: " << y << ", grad: [ " << grad[x0] << ", " << grad[x1] << " ]" << std::endl;
print_tape_size("Final tape state");
std::cout << std::endl << "[[ Same computation but using a local tape ]]" << std::endl;
print_tape_size("Initial tape state");
{
MissionImpossible_Tape<double> local_tape;
AD<double> x0 = 2, x1 = 3, y;
y = 4 * x0 + 2 * x1;
auto grad = gradient(local_tape, y); // <- here gradient use the local_tape
std::cout << std::endl
<< "f: " << y << ", grad: [ " << grad[x0] << ", " << grad[x1] << " ]" << std::endl;
}
print_tape_size("Final tape state (global tape state has not changed)");
}
prints

>>> Initial tape state
statements : 0
memory : 8

f: +14, grad: [ +4, +2 ]

>>> Final tape state
statements : 3
memory : 64

[[ Same computation but using a local tape ]]

>>> Initial tape state
statements : 3
memory : 64

f: +14, grad: [ +4, +2 ]

>>> Final tape state (global tape state has not changed)
statements : 3
memory : 64
If you use a local tape, you must take care to only use AD<T> variables declared within the scope of this local tape. For example:
#include "MissionImpossible/MissionImpossible.hpp"
using namespace MissionImpossible;
int
main()
{
// GOOD
//================
{
MissionImpossible_Tape<double> local_tape;
AD<double> x0 = 2, x1 = 3, y;
y = 4 * x0 + 2 * x1;
auto grad = gradient(local_tape, y);
}
// GOOD
//================
AD<double> a = 1; // Ok, as "a" is not used in local_tape scope
{
MissionImpossible_Tape<double> local_tape;
AD<double> x0 = 2, x1 = 3, y;
y = 4 * x0 + 2 * x1;
auto grad = gradient(local_tape, y);
}
// BAD
//================
{
MissionImpossible_Tape<double> local_tape;
AD<double> x0 = 2, x1 = 3, y;
y = 4 * x0 + 2 * x1 + a; // BAD: "a" was not declared in the tape scope
auto grad = gradient(local_tape, y); // Undefined behavior. In
// DEBUG mode triggers an
// assert(0)
}
}
Note: local tapes can be nested too (but you still have to respect variable scopes!).
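As an illustration, here is a minimal sketch of what nesting might look like; it only uses the MissionImpossible_Tape<T> and gradient(tape, y) API shown above, and each variable is declared inside the scope of the tape it belongs to (the expected derivative values in the comments are just simple hand-computed checks):
#include "MissionImpossible/MissionImpossible.hpp"

#include <iostream>

using namespace MissionImpossible;

int
main()
{
  MissionImpossible_Tape<double> outer_tape;

  AD<double> x = 2, y;
  y = 3 * x * x;

  {
    // Nested local tape: its variables live in this inner scope only
    MissionImpossible_Tape<double> inner_tape;

    AD<double> u = 5, v;
    v = u * u;

    auto grad_inner = gradient(inner_tape, v);
    std::cout << "dv/du = " << grad_inner[u] << std::endl;  // expected: 2u = 10
  }  // the inner tape is rewound here

  // The outer local tape is still usable for variables declared in its scope
  auto grad_outer = gradient(outer_tape, y);
  std::cout << "dy/dx = " << grad_outer[x] << std::endl;  // expected: 6x = 12
}  // the outer tape is rewound here; the global tape is left untouched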