Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get environment variables from R process for tasks #4606

Merged
merged 4 commits into from
Sep 11, 2024
Merged

Conversation

juliasilge
Copy link
Contributor

@juliasilge juliasilge commented Sep 6, 2024

Addresses #2723

Together with posit-dev/ark#507, this sets up infrastructure so we can propagate known sets of environment variables to our tasks that use ShellExecution (or ProcessExecution). I added the TESTTHAT environment variables for devtools::test() only, for now. We can add other sets of environment variables as they come up.

TODO:

  • Bump ark

QA Notes

To QA this, you'll need an R package with some failing tests. Here's one!

  • Run testthat::set_max_fails(1) in the console
  • Run the command "R: Test R Package"
  • Notice that although this package has 4 failing tests, only ones from the first file are run. You'll see output like this:

Maximum number of failures exceeded; quitting at end of file.
ℹ Increase this number with (e.g.) testthat::set_max_fails(Inf)

Copy link
Collaborator

@jmcphers jmcphers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to do "startsWith" but rather does a regex based match. e.g. if you ask for POS you'll get vars with POS anywhere in the var name.

> .ps.rpc.get_env_vars('POS')
$HOMEBREW_REPOSITORY
[1] "/opt/homebrew"

$POSITRON
[1] "1"

$POSITRON_VERSION
[1] "2024.09.0"

I don't feel great about the regex based approach; it means that we need to worry about magic characters and it's also easy to suck in environment values that are unintended -- which can contain sensitive values, change behavior in unwanted ways, etc.

I think it'd be better for this RPC to take an array of the exact variable names that are sought, and to just return the values for each of those variables. I realize that's a little more laborious, and would also require us to update this if e.g. more TESTTHAT variables are added in the future, but probably worth it to be more defensive/explicit. What do you think?

@juliasilge
Copy link
Contributor Author

@DavisVaughan thought the same here about the regex approach, so I'll change both PRs to be more explicit/conservative.

@juliasilge
Copy link
Contributor Author

For a more explicit and conservative approach, looks like we can just stick with only supporting TESTTHAT_MAX_FAILS for now:

https://github.com/search?q=repo%3Ar-lib%2Ftestthat%20Sys.setenv&type=code

It's the only one that folks have a way to set on their own.

},
{
task: 'r.task.packageTest',
message: vscode.l10n.t('{taskName}', { taskName: 'Test R package' }),
rcode: 'devtools::test()',
package: 'devtools'
package: 'devtools',
envVars: await getEnvVars(['TESTTHAT_MAX_FAILS'])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had more than one, we would list them out like this:

envVars: await getEnvVars(['TESTTHAT_MAX_FAILS', 'TESTTHAT_PKG'])

@juliasilge juliasilge requested a review from jmcphers September 9, 2024 22:52
jmcphers
jmcphers previously approved these changes Sep 9, 2024
Copy link
Collaborator

@jmcphers jmcphers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems pretty straightforward. Do we need to worry about the timing of the call to getRPackageTasks()? (i.e. is it always going to be run when the R session is warm? is it OK to cache the env vars as part of the task defs or do we need to worry about them changing between the task defs and the actual execution of the package test task?)

},
{
task: 'r.task.packageTest',
message: vscode.l10n.t('{taskName}', { taskName: 'Test R package' }),
rcode: 'devtools::test()',
package: 'devtools'
package: 'devtools',
envVars: await getEnvVars(['TESTTHAT_MAX_FAILS'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs some exception handling; if we fail to get the env vars for some reason, the result will be that no package tasks will be defined all. We either need to try/catch here or have getEnvVars avoid throwing in all circumstances (errors just logged and empty results returned safely)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is OK in this situation, because of how getRPackageTasks() is called when the command is executed, right? I think I would rather surface weird errors than swallow them, like for example this one I got in the midst of a weird debugging state:

Screenshot 2024-09-09 at 7 56 01 PM

To be clear, I'm not worried about this error in particular but would rather such things show up.

@jmcphers I know you approved the PR but I'll wait for one more response from you to make sure I am not misunderstanding the situation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that looks good!

@juliasilge
Copy link
Contributor Author

Do we need to worry about the timing of the call to getRPackageTasks()?

This gets called when you execute the command, so I believe it is fine:

const tasks = await getRPackageTasks();
const task = tasks.filter(task => task.definition.task === 'r.task.packageTest')[0];

To see that this responds to the _current_value of the env vars empirically, use this example R package to do:

  • testthat::set_max_fails(1)
  • See [ FAIL 2 | WARN 0 | SKIP 0 | PASS 0 ]
  • `testthat::set_max_fails(3)
  • See [ FAIL 3 | WARN 0 | SKIP 0 | PASS 0 ]

@lionel-
Copy link
Contributor

lionel- commented Sep 10, 2024

Some thoughts:

Timing

Right now the callMethod() call will only resolve when the kernel becomes idle. A better way of getting these envvars would be to push them from ark to positron after each top-level command. Positron would save the current values of these variables and use them as needed when creating tasks. This way, tasks can be run at any time rather than just when R is idle. This is not necessarily a blocker for this PR since quite often R is idle during package development, but this is still a regression for the busy case. I also wonder about the user experience as the action will not have immediate feedback in some cases, which would be confusing. So it's probably worth it to implement the push approach now.

I see two ways to go about it:

  1. Push from the kernel to positron by sending a backend-to-frontend event.

  2. Push from the kernel to the LSP. Then add an LSP request that the frontend can use without having to wait.

(1) seems the hardest (surprisingly to me, and @jmcphers might see an easier approach than I can). First it depends on who should process these events. Should it be a language-agnostic part of positron or only positron-r? If the former, it's a matter of adding the new event type on the UI comm following existing cases. If the latter, I think the UI comm is not the best place for this and we might want to create a new positron-r-specific comm that would get these events, which I think has not been done before. The advantage of such a comm is that it could be developed and experimented with "in extension" without changing core parts of positron. Once done, it should be easy to create events targetting only the positron-r extension.

In both cases, the kernel will need to to compute the current value of the envvars and send them as some kind of event, and the place to do it is https://github.com/posit-dev/ark/blob/d1f72cb03be0a2c3add85c59c147067275659eb7/crates/ark/src/interface.rs#L659.

(2) is the easiest, it requires adding the envvars as part of the ConsoleInput that we already send to the LSP after each top-level command in refresh_lsp. Then we need to implement a new LSP request to get these values. You can follow what we did for statement range for instance, starting from statement_range. I'd be happy to help with this if needed, we could pair on this for instance.

Envvars set

FWIW to me matching the envvars with a regex is not that problematic because we're only reproducing what's happening anyway when launching subprocesses from R (propagation of envvars). Here the process parent is positron, and we're pretending it's R/Ark. I agree it feels a bit icky, but it's probably fine in practice.

In the longer term, a better way would be for packages to declare what environment variables should be reexported to subprocesses via a top-level declare() annotation in their package. The process would be as follows:

  • The LSP learns about what envvars to reexport by analysing top-level declarations in package code.
  • After a top-level command, the kernel queries the LSP what envvars are reexported
  • It peeks the current values and sends these to the frontend (or to the LSP depending on our choice above).

With this setup, neither positron nor ark would have to manage a list of environment variable names or matching patterns.

@jmcphers
Copy link
Collaborator

The ability to see what environment variables are currently set in the kernel is a very useful tool in a lot of contexts, so I think it is worth considering some language-agnostic supporting infrastructure around pushing/synchronizing the set to the front end as you describe.

However, I don't think that we should hold up this PR to set up that infrastructure because it's considerably more ambitious and we really need to think through the goals/approach. If there's concern around the UX for this PR specifically, it'd be more expedient to address it by adding a progress toast to the R package task collector if the RPC takes longer than 1s or so. @lionel-, what do you think?

@lionel-
Copy link
Contributor

lionel- commented Sep 11, 2024

That would be a nice improvement as it would give user feedback. But still it might not be clear to users why their tasks are not making progress when they have a Shiny application running in the console.

Adding a language agnostic event for envvars would be easier than an extension-specific one, so that's good news that it would be the right thing to do.

@juliasilge
Copy link
Contributor Author

Today @lionel- and I chatted in a little more detail on this. As of today, we already use call_method() in a way that only works when the kernel is idle throughout our package development tasks (to get the locale, to check if a package is installed, etc); adding another use here to get env vars isn't a big change in the behavior. I'm going to open a followup issue for us to move to a different approach for all of these so we can track when/how this is a problem for users in practice.

@juliasilge
Copy link
Contributor Author

Followup issue is here: #4641

@juliasilge juliasilge merged commit b3ac5df into main Sep 11, 2024
2 checks passed
@juliasilge juliasilge deleted the env-vars-to-task branch September 11, 2024 20:42
@github-actions github-actions bot locked and limited conversation to collaborators Sep 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants