Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discussion: use roxygen + use new bandwidth estimation (fixes erors caused by lack of variation) + interprete density in device coordinates + deprecate "knn" #23

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

jan-glx
Copy link
Contributor

@jan-glx jan-glx commented Dec 1, 2023

I have made a few changes in my fork that cover my needs and would be happy to prepare issues / PRs for the parts you think are a good idea/solution.

  • housekeeping
  • slightly breaking: fix edgecases by using new bandwidth estimation based on the expanded scale limits instead of the data range/standard deviation (3d75d67, a7bc1fb)
  • slightly breaking: define density as density in device coordinates (i.e. in the plot you see, however it might be scaled) (b18b99f, 9b5e4c7)
  • deprecate the original "knn" based density estimation since it has fundamental issues with non-continous data and is anyway slower (50c2e20 (this actually just changes the default))

@@ -13,27 +13,67 @@ count_neighbors_r <- function(x, y, r2, xy) {
})
}


#' Wraps the user supplied Geom (typically GeomPoint) to add class "check_aspect_grob" and information about the aspect ratio assumed under which the densities were calculated to the grobs it draws
#' The check is injected by providing an S3 instance to the makeContext generic that is called by grid for each grob at render time (where the actual plot aspect ratio is finally known)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, does this mean the density gets re-calculated each time the viewer pane in RStudio is resized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. If you did not use either +theme(aspect.ratio=``) or +coord_fixed(), it will just emit a warning every time you resize the pane (unless you adjust it perfectly such that the actual panel aspect ratio is 1)

@LKremer
Copy link
Owner

LKremer commented May 16, 2024

Hi @jan-glx ,
thanks a lot for this massive PR! I finally found some time to prepare the release of a new CRAN version of this package, and now I'm deciding which new features to add in the next release.

The housekeeping stuff makes sense of course, but I have some questions regarding your other commits. It's been a few years since I wrote the original code so I forgot some details that would help me understand what you did 😄 So let me ask some questions to clarify and refresh my memory.

Regarding 3d75d67, what's the between using diff(scales$x$get_limits()) and scale_views$x.range to get the data range in practice? I was under the impression that scales$x$get_limits() does not return the data range, but rather the range of the plot limits. So shouldn't it be roughly equivalent to your solution, or did you find some case where this makes a difference?

Regarding b18b99f and 9b5e4c7, I do believe the density should be calculated in device coordinates. So the rationale behind your changes makes perfect sense to me, and honestly I thought that's what my implementation was already doing. So I guess you again found some scenario where this is not the case, or my original implementation was buggy. Which one is it?
I installed your fork and played around with it and works nicely, although the Warning: Actual plot aspect ratio does not match aspect.ratio of StatPointdensity might be a little annoying in day-to-day use. So do you think there is a reasonable default way to handle this? I haven't checked your code in great detail but would it be possible to automatically use the actual_aspect_ratio for density calculation?

Regarding 50c2e20, @slowkow benchmarked both methods a while ago (https://twitter.com/slowkow/status/1169047484716466176?t=bOaQ_Def4u2qz_9zwystiw&s=19) and found that the original method is faster (for small-ish data sets at least). So I guess it has some advantages. In any case, I want to overhaul the methods parameter in the long run but haven't made up my mind on the details yet, so I guess for now it can stay the way it currently is.

@jan-glx
Copy link
Contributor Author

jan-glx commented May 16, 2024

Hi @LKremer great to see you finding some time for this nice package, unfortunately I don't have any spare time at the moment and hardly remember how I reached this PR. Let me give a few unverified short comments that might help making use of this work:

regarding 3d75d67 : get_limits does not include expansion or the limits set at coordinate level; often the difference is not huge sometimes it is, most prominently when the data range has width 0, see links in 3d75d67c0eccc3bf2d452d8c86540049c22e358a#commitcomment-142102424)

Regarding b18b99f and 9b5e4c7, your dx/dy implementation does achieve that to some extent but only precisely if the aspect ratio is 1 (any maybe also only if there are no transformations and equal expansions).

There is no way to fix this completely automatically as the ggplot does not know on what device it will be be plotted at that time. If you resize the plot in Rstudio the aspect ratio changes (and thus the correct densities will be different) but no ggplot function is called during the resize. The easy way to fix this is to always use + theme(aspect.ratio=1) when using +geom_pointdensity, one workaround for the warning might be an option that a user can set to ignore if the aspect ratio is off for less than xx%

Regarding 50c2e20 : yes knn is O(n log(n)) MASS:kde is O(n m^2) where m is the number of grid points for each axis, one could limit evaluation to the n points of interest for small n to speed it up for smaller datasets, but in the end only the case with many data-points is the one that matters, the one where one actually needs to wait. So while MASS:kde might not always be optimal it is always good.

#' `adjust = 0.5` means use half of the default. Other arguments may be
#' aesthetics, used to set an aesthetic to a fixed value, like `shape =
#' 17` or `size = 3`. They may also be parameters to the paired geom/stat.
#' @param na.rm If `FALSE`, the default, missing values are removed with a
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing @param for method and aspect.ratio arguments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants