Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assess the feasibility of CodeDepends for all the static code analysis #41

Closed
wlandau-lilly opened this issue May 31, 2017 · 9 comments
Closed

Comments

@wlandau-lilly
Copy link
Collaborator

Same as richfitz/remake#172.

@wlandau-lilly
Copy link
Collaborator Author

Apparently, CodeDepends has been around since 2009. Maybe it is only now being revived.

@wlandau-lilly wlandau-lilly changed the title Consider the CodeDepends package. Consider the CodeDepends package as a replacement for codetools. Jun 1, 2017
@wlandau-lilly
Copy link
Collaborator Author

wlandau-lilly commented Jun 9, 2017

f <- function(){
  b = get("x", envir = globalenv())
  digest::digest(1234)
}
codetools::findGlobals(f) # Incorrectly ignores both x and digest().
CodeDepends::getInputs(body(f)) # Still ignores x but does find digest().

@wlandau-lilly
Copy link
Collaborator Author

wlandau-lilly commented Jun 9, 2017

Could solve #9 and #25.

@wlandau-lilly
Copy link
Collaborator Author

wlandau-lilly commented Jun 14, 2017

After more thought, I have decided against replacing codetools with CodeDepends in the preexisting internals of drake. Reasons:

  1. Existing projects powered by drake may need to rerun from scratch.
  2. The behavior of drake will not actually change much. Functions referenced with scoping rules :: and ::: would be detected, but after digging through CodeDepends and its documentation, I see no other potential improvement to Drake overlooks dependencies in some edge cases. #11.
  3. CodeDepends builds on top of codetools rather than replacing its entire existence.
  4. It would be a huge headache to overhaul the internals of drake.

However, I am very eager to use CodeDepends to solve #9 and #25.

@wlandau
Copy link
Member

wlandau commented Aug 18, 2018

I am considering reviving this issue after a recent conversation with @gmbecker at R/Pharma 2018.

drake now walks the AST manually because it has special code analysis needs, but it also calls codetools to find the globals. So we're walking the AST twice, which is slow. I think CodeDepends can handle all the code analysis in one pass.

But before I reopen this, I would strongly prefer to have

  1. Easier compatibility between CodeDepends and the default install.packages(). drake's long list of dependencies already turns off many users, and I would rather not add to the problem. Ref: The graph package moved from suggests to imports. duncantl/CodeDepends#30, Installation issue due to graph package not on CRAN duncantl/CodeDepends#34.
  2. A robust getInputs(). getInputs() may already be robust enough, but there were several issues, and I have not checked all the variants of CodeDepends::getInputs() fails very strangely if filter() is part of the expression #268 recently.

After I can confirm all the above is resolved, a couple initial steps on my end will be to

  1. Express the internal code_dependencies() function in terms of CodeDepends.
  2. See what kind of code analysis performance gains we accrue.

@gmbecker
Copy link

gmbecker commented Aug 20, 2018

@wlandau The new way of installing bioconductor packages is via the BiocManager package which itself is on CRAN. Do you feel this is easier enough than the old way to make number 1 a less stringent requirement?

As for 2, I'm extremely confident that CodeDepends should be able to cover all the specialized needs drake has, it is just a question of how many custom handles drake will need to use vs where can improve the default handlers generally in ways that also meet drake's needs. That will be an ongoing conversation/collaboration between you and me, I think.

The specific issue you linked to, though, is covered, I think. The behavior now should be that if dplyr or tidyverse was loaded anywhere in the script that getInputs() knows about, the dplyr filter() will be assumed (ie NSE), otherwise it won't. If you want drake to err the other way and just always assume filter() means dplyr::filter() that is easily doable via a custom handler.

@wlandau
Copy link
Member

wlandau commented Aug 22, 2018

  1. As you mentioned in Installation issue due to graph package not on CRAN duncantl/CodeDepends#34 (comment), BiocManager does smooth the installation process. In fact, since drake needs CodeDepends in Suggests: for code_to_plan(), I have added a BiocManager::install() call call in .travis.yml. But given drake's enormous list of dependencies, which unavoidably includes igraph and its own set of installation difficulties, I am just hyper vigilant about any slight increase to the friction of the installation experience. I am willing to reevaluate this position after seeing benchmarks of CodeDepends-driven code analysis.

To try to chip away at drake's dependency problem, I have identified several packages that are currently only necessary for specific minor features: for example, CodeDepends, visNetwork, networkD3, visNetwork, and webshot. These packages are now in Suggests:, and drake calls assert_pkg() whenever a minor feature requires one.

drake/R/utils.R

Lines 1 to 22 in 82ed556

assert_pkg <- function(pkg, version = NULL, install = "install.packages") {
if (!requireNamespace(pkg, quietly = TRUE)){
stop(
"package ", pkg, " not installed. ",
"Please install it with ", install, "(\"", pkg, "\").",
call. = FALSE
)
}
if (is.null(version)){
return()
}
installed_version <- as.character(utils::packageVersion(pkg))
is_too_old <- utils::compareVersion(installed_version, version) < 0
if (is_too_old){
stop(
"package ", pkg, " must be version ", version, " or greater.",
"Found version ", version, " installed.",
"Please update it with ", install, "(\"", pkg, "\").",
call. = FALSE
)
}
}

Is this an approach you would be willing to consider for graph?

  1. It sounds like there has been enough progress to begin a proof of concept implementation for drake. The real deciding factor will be performance benchmarks, and I think we can get to the benchmarks without all the handlers necessarily in place from the beginning. Reopening.

@wlandau wlandau reopened this Aug 22, 2018
@wlandau wlandau changed the title Consider the CodeDepends package as a replacement for codetools. Consider CodeDepends for all the static code analysis Aug 22, 2018
@wlandau
Copy link
Member

wlandau commented Aug 27, 2018

Hmm... I just saw that BiocManager requires R >= 3.5.0, so it may be a while before drake can adopt things that depend in it.

@wlandau wlandau changed the title Consider CodeDepends for all the static code analysis Assess the feasibility of CodeDepends for all the static code analysis Oct 27, 2018
@wlandau
Copy link
Member

wlandau commented Oct 28, 2018

A major priority and direction for development right now is to reduce the number of package dependencies, especially for components as central and sensitive as code analysis. (related: #562, #563). I may revisit CodeDepends for this purpose at some point, but I do not forsee it becoming a backend for drake's core dependency detection functionality any time soon. To achieve the possible speed gains mentioned above, I would actually prefer to find a base R alternative to codetools::findGlobals().

On the other hand, I think people really appreciate functionality that converts drake plans to and from scripts and notebooks (ref: #505, #193 (comment)). CodeDepends plays a central role there, so it will remain in "Suggests:" for the forseeable future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants