From 394fa37a88ff2f6c0287ace9138f7d5f12d4ea4f Mon Sep 17 00:00:00 2001
From: Jonas Moss
Date: Thu, 11 Jul 2019 21:51:37 +0200
Subject: [PATCH] :books: Boundary bias.

---
 paper/paper.html | 2 +-
 paper/paper.md   | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/paper/paper.html b/paper/paper.html
index 12e80af..df435ec 100644
--- a/paper/paper.html
+++ b/paper/paper.html
@@ -385,7 +385,7 @@

Summary

Kernel density estimation (Silverman 2018) is a popular method for non-parametric density estimation based on placing kernels on each data point. Hjort and Glad (1995) extended kernel density estimation with parametric starts. The parametric start is a parametric density that is multiplied with the kernel estimate. When the data-generating density is reasonably close to the parametric start density, kernel density estimation with that parametric start will outperform ordinary kernel density estimation.

Asymmetric kernels are useful for estimating densities on the half-open interval \(\left[0,\infty\right)\) and bounded intervals such as \(\left[0, 1\right]\). On such intervals symmetric kernels are prone to serious boundary bias that should be corrected (Marron and Ruppert 1994). Asymmetric kernels are designed to avoid boundary bias.

kdensity is an R package (R Core Team 2019) to calculate and display kernel density estimates using parametric starts and potentially asymmetric kernels. In addition to the classical symmetric kernels, kdensity supports the following asymmetric kernels: for the unit interval, the Gaussian copula kernel of Jones and Henderson (2007) and the beta kernels of Chen (1999); for the half-open interval, the gamma kernel of Chen (2000). The supported parametric starts include the normal, Laplace, Gumbel, exponential, gamma, log-normal, inverse Gaussian, Weibull, beta, and Kumaraswamy densities. The parameters of all parametric starts are estimated using maximum likelihood. The implemented bandwidth selectors are the classical bandwidth selectors from stats, unbiased cross-validation, the Hermite polynomial method from Hjort and Glad (1995), and the tailored bandwidth selector for the Gaussian copula method of Jones and Henderson (2007). User-defined parametric starts, kernels, and bandwidth selectors are also supported.

-

The following example uses the airquality data set from the built-in R package datasets. Since the data is positive we use Chen’s gamma kernel. As the data is likely to be better approximated by a gamma distribution than a uniform distribution, we use the gamma parametric start. The plotted density is in figure 1, where the gamma distribution with parameters estimated by maximum likelihood is in red and the ordinary kernel density estimate in blue.

+

The following example uses the airquality data set from the built-in R package datasets. Since the data is positive we use Chen’s gamma kernel. As the data is likely to be better approximated by a gamma distribution than a uniform distribution, we use the gamma parametric start. The plotted density is in figure 1, where the gamma distribution with parameters estimated by maximum likelihood is in red and the ordinary kernel density estimate in blue. Notice the boundary bias of the ordinary kernel density estimator.

# install.packages("kdensity")
 library("kdensity")
 kde = kdensity(airquality$Wind, start = "gamma", kernel = "gamma")
diff --git a/paper/paper.md b/paper/paper.md
index f71e1b8..b00987f 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -13,7 +13,7 @@ authors:
     orcid: 0000-0002-6876-6964
     affiliation: 1
   - name: Martin Tveten
-    orcid: 0000-0000-0000-0000
+    orcid: 0000-0002-4236-633X
     affiliation: 1
 affiliations:
  - name: University of Oslo
@@ -56,7 +56,8 @@ R package `datasets`. Since the data is positive we use Chen's gamma kernel.
 As the data is likely to be better approximated by a gamma distribution than a 
 uniform distribution, we use the gamma parametric start. The plotted density is
 in figure 1, where the gamma distribution with parameters estimated by maximum 
-likelihood is in red and the ordinary kernel density estimate in blue.
+likelihood is in red and the ordinary kernel density estimate in blue. 
+Notice the boundary bias of the ordinary kernel density estimator. 
 
 ```r
 # install.packages("kdensity")