-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add function to modify path-like environment variable in a context #4681
base: 5.0.x
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The env parameter on it's own can be cumbersome to use as it requires redefining the entire environment
oh darn, i thought it only replaced the specified environment variables, but indeed it's not the case (and i guess all our docs do indicate that). Makes this pretty much useless for EB apart from maybe a few extremely specific cases.
The env parameter would not work for other python function calls (eg testing a module that independently calls commands)
though they really shouldn't do that
I think we absolutely need this
easybuild/tools/environment.py
Outdated
|
||
|
||
@contextlib.contextmanager | ||
def wrap_path_env(prepend=None, append=None, sep=os.pathsep, strict=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to use this also (perhaps mostly) for non-path environment variables, so just replace them
A few real cases that has just come up in the past days have been the need to set "BUILD_VERSION" and "HOME" (which of course should be isolated).
Would be worth covering the case of
with env.wrap_path_env(override={'BUILD_VERSION': self.version}):
...
or do we prefer to
with env.wrap_path_env():
setenv(...)
though, in either case, maybe one ought to downplay the _path
part?
Also, while rare, if we ever need to, say, set TCLLIBPATH (space separator) and a a custom libpath as the same time, we are in trouble. So, sep
as s separate argument worries me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking about this some more. I guess one can always nest in the rare case that one wants to set the separator.
with wrap_env(prepend=xxx), wrap_env(prepend=yyy, sep=' ')
though lines get long, the situation is extremely rare.
I would like to see something like this used for EB when it installs extensions/components in bundles (pseudo-code):
for extension in extensions:
with wrap_env(): # prevent extensions from leaking into global env
process(extension)
and they could, in turn of course just do this themselves when they need further isolation in different steps.
But this also got me thinking how this would possible interact with parallel installation of extensions (not sure if that is using multiprocessing or async or so? Haven't looked at that code yet, so i don't know how/if it would impact the environment)
Sorry for the unstructured brain dump, weekend is calling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the function is set up right now it is possible to pass a dict
to sep with a separator for every variable.
It will default to os.sep
if strict = False
or raise if there is no separator found for a variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this also got me thinking how this would possible interact with parallel installation of extensions (not sure if that is using multiprocessing or async or so? Haven't looked at that code yet, so i don't know how/if it would impact the environment)
Concerning parallel extension installation, it seems that ThreadPoolExecutor is being used to spawn threads
thread_pool = ThreadPoolExecutor(max_workers=self.cfg['parallel']) |
subprocess.Popen
through run_shell_cmd
https://github.com/easybuilders/easybuild-easyblocks/blob/b9b242a854332a7f0fbde7c8ec112b8f17638567/easybuild/easyblocks/generic/rpackage.py#L322
While the environment of each subprocess should be isolated, the one of threads is not, so unless the code is modified to make sure each run_shell_cmd
is invoked only after the previous one in the queue has started it might be difficult to avoid race conditions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the override argument, beside the threading discussion, I guess one design choice that I am unsure of is whether we should reset only variables that are being explicitly managed by the context or the entire environment,
eg using with wrap_env(override={'TEST1': '1'}):
if TEST2
is modified inside the context manager, it will not be reset after the context exits (atleast how the function is setup right now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, the issues i've hit recently has been about wrapping the entire environment and restoring, since so many easyblocks, hooks, etc. might be touching all kinds of environment variables which indirectly impacts other code.
I don't know of any legitimate cases where we want code to leak into os.environ for all the remaining runtime of eb
. Such thing would seem extremely fragile hack to me, but I maybe there are cases where it's truly needed?
If an easyblock needs some variable to be set for config + build + install step, then they can/should set again each time.
If one goes with just wrapping everything, i think it makes sense to just keep it dead simple, and just move the append/prepend helpers as separate functions.
import easybuild.tools.environment as env
def xxx(...):
with wrap_env():
env.setvar('BUILD_VERSION', '1.2.3')
env.append_path('PYTHONPATH', './lib/python')
env.prepend_path('TCLLIBPATH, './lib', sep=' ')
run_cmd(...)
but this design would allow for separating the context from the modifications:
def install_extensions(self, ...):
for extension in self.extension_list:
with wrap_env():
extension.install() # Might be setting environment variables?
However, the parallel extensions leaves me scratching my chin again here, maybe env
isn't so bad?
Parallel extensions
Now this throws a whole wrench into thing. Any python code that runs during these steps must not ever touch any global state; whether it is os.chdir
or os.environ
, and context can't help here.
The only possibility here is to allow the use of run_shell_cmd's env
parameter, despite it being cumbersome.
Maybe that is just best solved with some added convenience methods instead?
def copy_env(override, prepend, append, sep):
env_copy = os.environ,copy()
for key, value in override.items()
env_copy[key] = val
...
return env_copy
or someting like a
class CustomEnv:
def __init__(self):
self.environ = os.environ.copy()
def set(self, key, val):
self.environ[key] = val
return self
def append(self, key, val, sep=':'):
self.environ[key] = ...
return self
def __iter__(self):
return iter(self.environ)
or maybe an extra override_env
for run_shell_cmd directly.
run_shell_cmd('install_torchvision', override_env={'BUILD_VERSION', '1.2.3'})
run_shell_cmd('install_torchvision', override_env=append_path('PYTHONPATH', './lib/python')) # using a helper function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class approach could(should?) still be a context manager.
-
We could either have something like
class CustomEnv: ... env = CustomEnv() env.append_path(...) env.prepend_path(...) with env.apply(): ...
-
or something like
class CustomEnv: ... with CustomEnv() as env: env.append_path(...) env.prepend_path(...) ...
Regarding run_shell_cmd
probably adding an override_env
argument might be the easiest solution.
I was thinking if there might be some way to intercept all function calls inside a context and modify them, but that might be very hackish and likely fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could(should?) still be a context manager.
Maybe, the issue is what anything that might run in parallel threads or async must never modify os.environ
, their only option is to pass the env
to subprocess
(so my idea was that it would be passed to run_shell_cmd(..., env=CustomEnv(xxxx))
using some custom dict-like class just to possible make things more convenient for the typical usecases.
Currently not a lot of parallel code in EB, but if we do any major rework here might be good to design in a way that avoids those problematic globals, just like with cwd
which faces the same problem (and easyblocks should be using run_shell_cmd(.., workdir=relative_path)
instead).
28df3a4
to
a0497db
Compare
The proposed function
wrap_path_env
would have the utility to allow running commands or functions with modified values forPATH
/LD_LIBRARY_PATH
/...The function allows to both append and prepend to a path-like environment variables, using by default the default system separator.
A custom separator can be specified for each variable by passing a dictionary of strings with the same keys as
prepend and append.
If a key is not present in the sep dictionary, os.pathsep will be used unless strict is True, then an error
will be raised.
Comparison to
env=
parameter inrun_shell_cmd
for 5.0.x:env
parameter on it's own can be cumbersome to use as it requires redefining the entire environment, which would entail copying the environment and modifying the environment variables to have the same effect as the proposed function.env
parameter would not work for other python function calls (eg testing a module that independently calls commands)Comparison with currently available
framework.environment
functions:reset_changes
to reset all variables modified bysetvar
but it is possible that the user has already set some variables and only want to temporarily modify only some of them or different ones, without resetting the originally set variables.Possible enhancements/discussions:
set
/override
parameter to override the value instead of just appending/prependingset
/override
paramenter) i am not sure if this should be implemented in a separate function.