KL divergence estimators

The KL-divergence is normally defined between two probability distributions. In the case where only samples drawn from the distributions are available, the KL-divergence can be estimated in a number of ways.

Here I test a few implementations of a KL-divergence estimator based on k-Nearest-Neighbours probability density estimation.

The estimator is that of

Qing Wang, Sanjeev R. Kulkarni, and Sergio Verdú. "Divergence estimation for multidimensional densities via k-nearest-neighbor distances." Information Theory, IEEE Transactions on 55.5 (2009): 2392-2405.
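
In the notation of that paper, with n and m the sizes of the samples drawn from P and Q, d their dimension, ρ_k(i) the distance from the i-th P-sample to its k-th nearest neighbour among the other P-samples, and ν_k(i) the distance from the same point to its k-th nearest neighbour among the Q-samples, the estimator takes (up to the regularity conditions discussed in the paper) the form

$$
\hat{D}_{n,m}(P \,\|\, Q) = \frac{d}{n} \sum_{i=1}^{n} \log \frac{\nu_k(i)}{\rho_k(i)} + \log \frac{m}{n-1}.
$$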

Samples are drawn from various test distributions, and the estimated KL-divergence between them is computed. Uncertainties are assessed by re-sampling the distributions and re-computing the divergence estimates 100 times. Uncertainty bands are then given as the interval containing the 68% of re-sampled estimates closest to the median. Timings, where provided, are the time taken to compute all 100 re-samples at a sample size of N=1000 with k=5.
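
As a rough illustration of this procedure (a sketch only, not the repository's code; the function divergence_band and its arguments are hypothetical), the re-sampling and band construction might look like the following, where sample_p and sample_q are callables returning a fresh sample array on each call and estimator is one of the implementations listed below:

```python
import numpy as np

def divergence_band(sample_p, sample_q, estimator, n_resamples=100, k=5, fraction=0.68):
    """Re-draw fresh samples n_resamples times, re-compute the divergence,
    and report the median plus the interval containing the given fraction
    of estimates closest to the median."""
    estimates = np.array([estimator(sample_p(), sample_q(), k=k)
                          for _ in range(n_resamples)])
    median = np.median(estimates)
    # Keep the fraction of estimates closest to the median; the band is their range
    order = np.argsort(np.abs(estimates - median))
    kept = estimates[order[:int(np.ceil(fraction * len(estimates)))]]
    return median, (kept.min(), kept.max())
```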

This study is far from exhaustive, and timings are sensitive to implementation details. Please take the results with a pinch of salt.

Estimator implementations

  • naive_estimator

    KL-divergence estimator using brute-force (numpy) k-NN (see the sketch below this list)

  • scipy_estimator

    KL-divergence estimator using scipy's KDTree

  • skl_estimator

    KL-divergence estimator using scikit-learn's NearestNeighbors

  • skl_efficient

    An efficient version of the scikit-learn estimator by @LoryPack

These estimators have been benchmarked against slaypni/universal-divergence.
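
As a rough sketch of the brute-force approach taken by naive_estimator (not the repository's exact implementation; the function name and argument conventions are illustrative), the k-NN distances and the estimator formula above can be written directly in numpy:

```python
import numpy as np

def knn_divergence(s1, s2, k=5):
    """Illustrative brute-force k-NN estimate of D(P||Q) from samples
    s1 ~ P and s2 ~ Q, each an array of shape (n_samples, dimension)."""
    s1 = np.asarray(s1, dtype=float).reshape(len(s1), -1)
    s2 = np.asarray(s2, dtype=float).reshape(len(s2), -1)
    n, d = s1.shape
    m, _ = s2.shape

    # Brute-force pairwise Euclidean distance matrices
    d_pp = np.linalg.norm(s1[:, None, :] - s1[None, :, :], axis=-1)  # (n, n)
    d_pq = np.linalg.norm(s1[:, None, :] - s2[None, :, :], axis=-1)  # (n, m)

    # rho: k-th nearest-neighbour distance within s1 (self-distance excluded)
    # nu:  k-th nearest-neighbour distance from each s1 point into s2
    rho = np.sort(d_pp, axis=1)[:, k]       # column 0 is the zero self-distance
    nu = np.sort(d_pq, axis=1)[:, k - 1]

    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1.0))
```

The tree-based implementations (scipy's KDTree, scikit-learn's NearestNeighbors) compute the same neighbour distances without building the full distance matrices, which is largely what drives the timing differences in the tables below.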

Tests

Self-divergence of samples from a 1-dimensional Gaussian

Estimate the divergence between two samples of size N and dimension 1, both drawn from the same N(0,1) probability distribution. Since the samples come from identical distributions, the expected value for the divergence in this test is D=0.

Comparison of estimator implementations

| Estimator       | D(P\|Q)   | Time (s) |
|-----------------|-----------|----------|
| naive_estimator | 1.595e-03 | 7.998    |
| scipy_estimator | 1.595e-03 | 0.111    |
| skl_estimator   | 1.595e-03 | 18.427   |
| skl_efficient   | 1.595e-03 | 0.147    |

Convergence of estimator with N

Convergence Plot

Self-divergence of samples from a 2-dimensional Gaussian

Estimate the divergence between two samples of size N drawn from the same 2D distribution with mean=[0,0] and covariance=[[1, 0.1], [0.1, 1]]. The expected value for the divergence in this test is D=0.

Comparison of estimator implementations

| Estimator       | D(P\|Q)    | Time (s) |
|-----------------|------------|----------|
| naive_estimator | -6.811e-04 | 9.931    |
| scipy_estimator | -6.811e-04 | 0.182    |
| skl_estimator   | -6.811e-04 | 18.362   |
| skl_efficient   | -6.811e-04 | 0.222    |

Convergence of estimator with N

Convergence Plot

Divergence of two 1-dimensional Gaussians

Estimate the divergence between two samples of size N and dimension 1, the first drawn from N(0,1) and the second from N(2,1). The expected value for the divergence in this test is D=2.0.
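
The expected value follows from the closed-form KL-divergence between two Gaussians of equal variance:

$$
D\bigl(\mathcal{N}(\mu_1,\sigma^2)\,\|\,\mathcal{N}(\mu_2,\sigma^2)\bigr) = \frac{(\mu_1-\mu_2)^2}{2\sigma^2} = \frac{(0-2)^2}{2 \cdot 1} = 2.
$$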

Comparison of estimator implementations

| Estimator       | D(P\|Q)   | Time (s) |
|-----------------|-----------|----------|
| naive_estimator | 1.790e+00 | 7.014    |
| scipy_estimator | 1.790e+00 | 0.105    |
| skl_estimator   | 1.790e+00 | 18.215   |
| skl_efficient   | 1.790e+00 | 0.139    |

Convergence of estimator with N

Convergence Plot

Generating this document

Start in a clean Python 3.10 environment and run the following:

 # Setup dependencies
 pip install -r requirements.txt
 # Run the tests and generate the figures
 ./src/run_tests.py
 # Add the header and footer to the report
 cat templates/header.md report.md templates/footer.md > README.md

This will likely take some time to complete.

Important settings

The number of re-samples used to estimate the uncertainties is set by n_resamples in tests.py. The total running time of the tests is naturally very sensitive to this value.
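
For reference, the setting is presumably just a single assignment in tests.py (only the variable name and the default of 100 re-samples come from the text above; everything else here is assumed):

```python
# In tests.py (surrounding code assumed): number of re-samples per test.
# 100 was used for the results above; lowering it shortens the run at the
# cost of noisier uncertainty bands.
n_resamples = 100
```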
