Skip to content

Commit

Permalink
New nrow_subset_linter (#2298)
Browse files Browse the repository at this point in the history
* New nrow_subset_linter

* examples

* remove commented code

* documentation feedback

* remove 'readability' tag

* more tests
  • Loading branch information
MichaelChirico authored Nov 19, 2023
1 parent c96d024 commit ff1dc21
Show file tree
Hide file tree
Showing 11 changed files with 118 additions and 3 deletions.
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -139,6 +139,7 @@ Collate:
'namespace_linter.R'
'nested_ifelse_linter.R'
'nonportable_path_linter.R'
'nrow_subset_linter.R'
'numeric_leading_zero_linter.R'
'nzchar_linter.R'
'object_length_linter.R'
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ export(namespace_linter)
export(nested_ifelse_linter)
export(no_tab_linter)
export(nonportable_path_linter)
export(nrow_subset_linter)
export(numeric_leading_zero_linter)
export(nzchar_linter)
export(object_length_linter)
Expand Down
1 change: 1 addition & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@
* `which_grepl_linter()` for discouraging `which(grepl(ptn, x))` in favor of directly using `grep(ptn, x)` (part of #884, @MichaelChirico).
* `list_comparison_linter()` for discouraging comparisons on the output of `lapply()`, e.g. `lapply(x, sum) > 10` (part of #884, @MichaelChirico).
* `print_linter()` for discouraging usage of `print()` on string literals like `print("Reached here")` or `print(paste("Found", nrow(DF), "rows."))` (#1894, @MichaelChirico).
* `nrow_subset_linter()` for discouraging usage like `nrow(subset(x, conditions))` in favor of something like `with(x, sum(conditions))` which doesn't require a full subset of `x` (part of #884, @MichaelChirico).
* `pipe_return_linter()` for discouraging usage of `return()` inside a {magrittr} pipeline (part of #884, @MichaelChirico).

### Lint accuracy fixes: removing false positives
Expand Down
40 changes: 40 additions & 0 deletions R/nrow_subset_linter.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
#' Block usage of `nrow(subset(x, .))`
#'
#' Using `nrow(subset(x, condition))` to count the instances where `condition`
#' applies inefficiently requires doing a full subset of `x` just to
#' count the number of rows in the resulting subset.
#' There are a number of equivalent expressions that don't require the full
#' subset, e.g. `with(x, sum(condition))` (or, more generically,
#' `with(x, sum(condition, na.rm = TRUE))`).
#'
#' @examples
#' # will produce lints
#' lint(
#' text = "nrow(subset(x, is_treatment))",
#' linters = nrow_subset_linter()
#' )
#'
#' # okay
#' lint(
#' text = "with(x, sum(is_treatment, na.rm = TRUE))",
#' linters = nrow_subset_linter()
#' )
#'
#' @evalRd rd_tags("nrow_subset_linter")
#' @seealso [linters] for a complete list of linters available in lintr.
#' @export
nrow_subset_linter <- make_linter_from_xpath(
xpath = "
//SYMBOL_FUNCTION_CALL[text() = 'subset']
/parent::expr
/parent::expr
/parent::expr[expr/SYMBOL_FUNCTION_CALL[text() = 'nrow']]
",
lint_message = paste(
"Use arithmetic to count the number of rows satisfying a condition,",
"rather than fully subsetting the data.frame and counting the resulting rows.",
"For example, replace nrow(subset(x, is_treatment))",
"with sum(x$is_treatment). NB: use na.rm = TRUE if `is_treatment` has",
"missing values."
)
)
1 change: 1 addition & 0 deletions inst/lintr/linters.csv
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ namespace_linter,correctness robustness configurable executing
nested_ifelse_linter,efficiency readability
no_tab_linter,style consistency deprecated
nonportable_path_linter,robustness best_practices configurable
nrow_subset_linter,efficiency consistency best_practices
numeric_leading_zero_linter,style consistency readability
nzchar_linter,efficiency best_practices consistency
object_length_linter,style readability default configurable executing
Expand Down
1 change: 1 addition & 0 deletions man/best_practices_linters.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/consistency_linters.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/efficiency_linters.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 4 additions & 3 deletions man/linters.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

36 changes: 36 additions & 0 deletions man/nrow_subset_linter.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

31 changes: 31 additions & 0 deletions tests/testthat/test-nrow_subset_linter.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
test_that("nrow_subset_linter skips allowed usage", {
linter <- nrow_subset_linter()

expect_lint("nrow(foo(subset(x, y == z)))", NULL, linter)
expect_lint("with(x, sum(y == z))", NULL, linter)
})

test_that("nrow_subset_linter blocks subset() cases", {
expect_lint(
"nrow(subset(x, y == z))",
rex::rex("Use arithmetic to count the number of rows satisfying a condition"),
nrow_subset_linter()
)
})

test_that("lints vectorize", {
lint_msg <- rex::rex("Use arithmetic to count the number of rows satisfying a condition")

expect_lint(
trim_some("{
nrow(subset(x, y == z))
subset(x) %>% transform(m = 2)
nrow(subset(a, b == c))
}"),
list(
list(lint_msg, line_number = 2L),
list(lint_msg, line_number = 4L)
),
nrow_subset_linter()
)
})

0 comments on commit ff1dc21

Please sign in to comment.