factorlock


Installation

The current version is not yet on CRAN, but you can install it from GitHub using the {remotes} package:

# install.packages("remotes")
remotes::install_github("jhelvy/factorlock")

Load the library with:

library(factorlock)

Usage

This package does one thing: it provides the factorlock::lock_factors() function, which makes it easier to set the factor ordering in a data frame to match the row order.

By default, bar charts made with {ggplot2} follow alphabetical ordering:

library(ggplot2)
library(dplyr)

mpg |>
  count(manufacturer) |>
  ggplot() +
  geom_col(aes(x = n, y = manufacturer))

To sort the bars based on n, many people make the mistake of sorting the data frame and assuming the reordered rows will pass through to the bars, like this:

mpg |>
  count(manufacturer) |>
  arrange(n) |>
  ggplot() +
  geom_col(aes(x = n, y = manufacturer))

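But this produces the same chart! 🤦 {ggplot2} orders a discrete axis by factor levels (alphabetical for character columns), not by the order of the rows.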
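To see why sorting the rows has no effect, here is a minimal base R sketch. It uses a small hand-typed data frame (the `counts` object and its values are only for illustration) and shows that the levels derived from a character column stay alphabetical after the rows are sorted:

```r
# Small illustrative data frame (hand-typed, not computed from mpg)
counts <- data.frame(
  manufacturer = c("dodge", "toyota", "audi"),
  n = c(37, 34, 18),
  stringsAsFactors = FALSE
)

# Sort the rows by n, which is what arrange(n) does
sorted <- counts[order(counts$n), ]

# The row order follows n ...
row_order <- sorted$manufacturer

# ... but the levels derived from a character column are alphabetical
level_order <- levels(factor(sorted$manufacturer))

print(row_order)
print(level_order)
```

The two printed vectors differ: the rows run from the smallest to the largest n, while the levels are still in alphabetical order.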

Instead, to sort the bars based on n you have to reorder the factor levels:

mpg |>
  count(manufacturer) |>
  ggplot() +
  geom_col(aes(x = n, y = reorder(manufacturer, n)))

I find this rather unintuitive and difficult to remember. It is also confusing because the ordering of the rows in the data frame won’t match the factor ordering (which is rather opaque to the user).
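The mismatch between row order and factor order can be seen in a small base R sketch. The `counts` data frame is hand-typed for illustration; `reorder()` is the same function used in the plot above:

```r
# Small illustrative data frame (hand-typed, not computed from mpg)
counts <- data.frame(
  manufacturer = c("dodge", "toyota", "audi"),
  n = c(37, 34, 18),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by n
counts$manufacturer <- reorder(counts$manufacturer, counts$n)

# The rows have not moved ...
row_order <- as.character(counts$manufacturer)

# ... but the levels now follow n, which you only see if you ask for them
level_order <- levels(counts$manufacturer)

print(row_order)
print(level_order)
```

Printing the data frame would still show the original row order, so nothing in the visible data hints at the order the bars will take.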

{factorlock} provides an alternative approach by allowing you to “lock” the factor ordering to that of the row ordering in the data frame:

mpg |>
  count(manufacturer) |>
  arrange(n) |>
  factorlock::lock_factors() |>
  ggplot() +
  geom_col(aes(x = n, y = manufacturer))

Notice that you also get better axis label names for free here since the y variable is still mapped to just manufacturer instead of reorder(manufacturer, n).
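For intuition, "locking" can be pictured as turning each column into a factor whose levels are its unique values in order of appearance. The sketch below is only an assumption about the general idea, written in base R; `lock_to_row_order()` is a hypothetical helper and not the package's actual implementation:

```r
# Hypothetical helper: a sketch of the idea, not factorlock's source code
lock_to_row_order <- function(df, cols = NULL) {
  # With no columns given, pick every character or factor column
  if (is.null(cols)) {
    is_cat <- vapply(
      df,
      function(x) is.character(x) || is.factor(x),
      logical(1)
    )
    cols <- names(df)[is_cat]
  }
  for (col in cols) {
    values <- as.character(df[[col]])
    # Levels are the unique values in the order the rows present them
    df[[col]] <- factor(values, levels = unique(values))
  }
  df
}

# Small illustrative data frame (hand-typed, not computed from mpg)
counts <- data.frame(
  manufacturer = c("dodge", "toyota", "audi"),
  n = c(37, 34, 18),
  stringsAsFactors = FALSE
)

# Sort the rows first, then lock the levels to that order
locked <- lock_to_row_order(counts[order(counts$n), ])
print(levels(locked$manufacturer))
```

Here the levels come out in the same order as the sorted rows, and the numeric column n is left untouched.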

By default all character or factor type variables are “locked”, but you can also specify which variables you want to “lock” while leaving the others alone:

mpg |>
  count(manufacturer) |>
  arrange(n) |>
  factorlock::lock_factors("manufacturer") |>
  ggplot() +
  geom_col(aes(x = n, y = manufacturer))

You can also get a reverse factor ordering with rev = TRUE:

mpg |>
  count(manufacturer) |>
  arrange(n) |>
  factorlock::lock_factors(rev = TRUE) |>
  ggplot() +
  geom_col(aes(x = n, y = manufacturer))
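As a rough picture of what a reversed ordering means, the base R sketch below builds the levels from the row order and then flips them. This is an assumption about the effect of rev = TRUE, not a description of the package's internals, and the `manufacturer` vector is hand-typed for illustration:

```r
# Values as they would appear after sorting the rows by ascending n
manufacturer <- c("audi", "toyota", "dodge")

# Levels that follow the row order
forward <- factor(manufacturer, levels = unique(manufacturer))

# The same values with the level order flipped
reversed <- factor(manufacturer, levels = rev(unique(manufacturer)))

print(levels(forward))
print(levels(reversed))
```

The values in the column are identical in both cases; only the level order, and therefore the order of the bars, changes.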

Author, Version, and License Information

Citation Information

If you use this package in a publication, I would greatly appreciate it if you cited it. You can get the citation by typing citation("factorlock") into R:

citation("factorlock")
#> 
#> To cite factorlock in publications use:
#> 
#>   John Paul Helveston (2022). factorlock: Set Factors Levels Based on
#>   Row Order.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {factorlock: Set Factors Levels Based on Row Order},
#>     author = {John Paul Helveston},
#>     year = {2022},
#>     note = {R package},
#>     url = {https://jhelvy.github.io/factorlock/},
#>   }