Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in exclude() 'from must be a finite number' for seq() usage #576

Closed
MichaelChirico opened this issue Sep 6, 2024 · 1 comment · Fixed by #588
Closed

Error in exclude() 'from must be a finite number' for seq() usage #576

MichaelChirico opened this issue Sep 6, 2024 · 1 comment · Fixed by #588

Comments

@MichaelChirico
Copy link
Contributor

Same error as old #322, but not related; this is coming from a different stack trace entirely:

9: stop("'from' must be a finite number")
8: seq.default(df[i, "first_line"], df[i, "last_line"])
7: seq(df[i, "first_line"], df[i, "last_line"])
6: seq(df[i, "first_line"], df[i, "last_line"]) %in% excl[[file]]
5: FUN(X[[i]], ...)
4: vapply(X, FUN, ..., FUN.VALUE = logical(1))
3: vlapply(seq_len(NROW(df)), function(i) {
       file <- df[i, "full_name"]
       which_exclusion <- match(file, names(excl))
       !is.na(which_exclusion) && (identical(excl[[which_exclusion]], 
           Inf) || all(seq(df[i, "first_line"], df[i, "last_line"]) %in% 
           excl[[file]]))
   })
2: exclude(coverage, line_exclusions = line_exclusions, function_exclusions = function_exclusions, 
       path = root)
1: covr::package_coverage()

Most likely this is related to #567. I am trying to reproduce the CI error locally, to no avail.

I will note that package_coverage() felt interminable -- the execution took at least 40 minutes. There was no segfault, though it did not succeed.

I'll try and debug the error but given the run times it's very hard to reproduce; filing first.

@MichaelChirico
Copy link
Contributor Author

MichaelChirico commented Sep 6, 2024

OK, there's a missing row in df

subset(df, is.na(first_line))
#           filename    functions first_line first_byte last_line last_byte first_column last_column first_parsed
# 135 R/data.table.R [.data.table         NA         NA        NA        NA           NA          NA           NA
#     last_parsed value                                      full_name
# 135          NA     1 /home/michael/github/data.table/R/data.table.R

That's correctly copied over from the coverage object:

coverage[[135L]]
# $value
# [1] 1

# $srcref
# <srcref: file "/home/michael/github/data.table/R/data.table.R" chars NA:NA to NA:NA>

# $functions
# [1] "[.data.table"

I see the same in the trace_files and that's about as far into the weeds as I'll get for now.

kyleam added a commit to kyleam/covr that referenced this issue Nov 15, 2024
utils::getParseData has a longstanding bug: for an installed package,
parse data is available only for the last file [1].  To work around
that, the get_tokens helper first calls getParseData and then falls
back to custom logic that extracts the concatenated source lines,
splits them on #line directives, and calls getParseData on each file's
lines.

The getParseData bug was fixed in R 4.4.0 (r84538).  Unfortunately
that change causes at least two issues (for some subset of packages):
a substantial performance regression [2] and an error when applying
exclusions [3].

Under R 4.4, getParseData always returns non-NULL as a result of that
change when calculating package coverage (in other words, the
get_parse_data fallback is _not_ triggered).  The slowdown is
partially due to the parse data no longer being cached across
get_tokens calls.  Another relevant aspect, for both the slowdown and
the error applying exclusions, is likely that the new getParseData
returns data for the entire package rather than the per-file parse
data the downstream covr code expects.

One solution would be to adapt covr's caching and handling of the
getParseData when running under R 4.4.0 or later.  Instead go with a
simpler and more minimal fix.  Reorder the calls so that the
get_parse_data call, which we know has been the primary code path for
package coverage before R 4.4.0, is the first call tried.  Leave
getParseData as the fallback to handle the non-package coverage cases.

[1] r-lib#154
    https://bugs.r-project.org/show_bug.cgi?id=16756

[2] As an extreme case, calling package_coverage on R.utils goes from
    under 15 minutes to over 6 hours.

[3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of
    packages that hit into this error.

Closes r-lib#576
Closes r-lib#579
Re: r-lib#567
kyleam added a commit to kyleam/covr that referenced this issue Nov 15, 2024
utils::getParseData has a longstanding bug: for an installed package,
parse data is available only for the last file [1].  To work around
that, the get_tokens helper first calls getParseData and then falls
back to custom logic that extracts the concatenated source lines,
splits them on #line directives, and calls getParseData on each file's
lines.

The getParseData bug was fixed in R 4.4.0 (r84538).  Unfortunately
that change causes at least two issues (for some subset of packages):
a substantial performance regression [2] and an error when applying
exclusions [3].

Under R 4.4, getParseData always returns non-NULL as a result of that
change when calculating package coverage (in other words, the
get_parse_data fallback is _not_ triggered).  The slowdown is
partially due to the parse data no longer being cached across
get_tokens calls.  Another relevant aspect, for both the slowdown and
the error applying exclusions, is likely that the new getParseData
returns data for the entire package rather than the per-file parse
data the downstream covr code expects.

One solution would be to adapt covr's caching and handling of the
getParseData when running under R 4.4.0 or later.  Instead go with a
simpler and more minimal fix.  Reorder the calls so that the
get_parse_data call, which we know has been the primary code path for
package coverage before R 4.4.0, is the first call tried.  Leave
getParseData as the fallback to handle the non-package coverage cases.

[1] r-lib#154
    https://bugs.r-project.org/show_bug.cgi?id=16756

[2] As an extreme case, calling package_coverage on R.utils goes from
    under 15 minutes to over 6 hours.

[3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of
    packages that hit into this error.

Closes r-lib#576
Closes r-lib#579
Re: r-lib#567
jimhester pushed a commit that referenced this issue Nov 19, 2024
* split_on_line_directives: guard against input without a directive

get_parse_data extracts lines from the input srcfile object and feeds
them to split_on_line_directives, which expects the lines to be a
concatenation of all the package R files, separated by #line
directives.

With how get_parse_data is currently called, that expectation is met.
get_parse_data is called only if utils::getParseData returns NULL, and
getParseData doesn't return NULL for any of the cases where the input
does _not_ have line directives (i.e. entry points other than
package_coverage).

An upcoming commit is going to move the get_parse_data call in front
of the getParseData call, so update split_on_line_directives to detect
the "no directives" case.

Without this guard, the mapply call in split_on_line_directives would
error under an R version before 4.2; with R 4.2 or later,
split_on_line_directives returns empty.

* split_on_line_directives: fix handling of single-file package case

split_on_line_directives breaks the input at #line directives and
returns a named list of lines for each file.

For a package with a single file under R/, there is one directive.
The bounds calculation is still correct for that case.  However, the
return value is incorrectly a matrix rather than a list because the
mapply call simplifies the result.

At this point, this bug is mostly [*] unexposed because this code path
is only triggered if utils::getParseData returns NULL, and it should
always return a non-NULL result for the single-file package case.  The
next commit will reorder things, exposing the bug.

Tell mapply to not simplify the result.

[*] The simplification to a matrix could also happen for multi-file
    packages in the unlikely event that all files have the same number
    of lines.

* parse_data: promote custom parse logic for R 4.4 compatibility

utils::getParseData has a longstanding bug: for an installed package,
parse data is available only for the last file [1].  To work around
that, the get_tokens helper first calls getParseData and then falls
back to custom logic that extracts the concatenated source lines,
splits them on #line directives, and calls getParseData on each file's
lines.

The getParseData bug was fixed in R 4.4.0 (r84538).  Unfortunately
that change causes at least two issues (for some subset of packages):
a substantial performance regression [2] and an error when applying
exclusions [3].

Under R 4.4, getParseData always returns non-NULL as a result of that
change when calculating package coverage (in other words, the
get_parse_data fallback is _not_ triggered).  The slowdown is
partially due to the parse data no longer being cached across
get_tokens calls.  Another relevant aspect, for both the slowdown and
the error applying exclusions, is likely that the new getParseData
returns data for the entire package rather than the per-file parse
data the downstream covr code expects.

One solution would be to adapt covr's caching and handling of the
getParseData when running under R 4.4.0 or later.  Instead go with a
simpler and more minimal fix.  Reorder the calls so that the
get_parse_data call, which we know has been the primary code path for
package coverage before R 4.4.0, is the first call tried.  Leave
getParseData as the fallback to handle the non-package coverage cases.

[1] #154
    https://bugs.r-project.org/show_bug.cgi?id=16756

[2] As an extreme case, calling package_coverage on R.utils goes from
    under 15 minutes to over 6 hours.

[3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of
    packages that hit into this error.

Closes #576
Closes #579
Re: #567
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant