You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I want to reach a consensus with other project owners regarding best security practices for importing contributed recipes
User Goal
So that I know what security guardrails to observe while to developing new features on Pangeo Forge Cloud
Acceptance Criteria
An internal document and/or mutual understanding regarding best practices for importing nominally "untrusted" recipe modules. More details regarding motivating cases in Linked Issues section below.
Linked Issues
By way of background, there are two currently two places in the Registrar where we automatically create recipe runs in response to a push event:
In the second case, we can assume some Pangeo Forge maintainer (either a project owner or the maintainer of a feedstock) has looked at the code already. There may be risks here due to inattentiveness, etc. but we can leave those for another day.
What I'd like to discuss here is first case, wherein the submitted code is truly untrusted in the sense that literally anyone in the whole world can make a PR to /staged-recipes, and if it has a properly formatted and complete meta.yaml, then recipe runs will be created for all recipes listed in the meta.yaml. For this reason, I've assumed thus far that we should never actually import the recipe module when automatically creating recipe runs, and that is how the Registrar currently operates.
Certain open User Stories challenge this model, however. Namely:
In both of these cases, without importing the recipe module, we don't have enough information to create recipe runs. Specifically, as #3 is currently conceived, to determine whether or not to re-run a given recipe we would need to call self.sha256() on each of the recipes, in order to compare the resulting hashes to those of the prior run (if any) for the recipe. If the hashes match, we wouldn't create recipe runs at all. And for #10, we wouldn't know the names of the individual recipes within a dict_object without importing the recipe module and introspecting the specified dictionary.
Both of these User Stories have real, already-existing contributors that would like to use them, and from a design perspective would be big improvements to the platform. They would also be specifically useful for the low trust case of creating recipe runs for PRs, so simply saying "we don't support these features on PRs" seems far from ideal.
A few further questions/possibilities to kick off discussion:
Is there some importlib equivalent to yaml.safe_load which might be useful in this case?
One obvious option is to require a maintainer's approval to create recipe runs (rather than generating them automatically), but this feels (1) very un-ergonomic and tedious; (2) actually not that safe, because maintainers juggling lots of other tasks could potentially be fooled with phishing-style slight typos on import paths or the like.
The text was updated successfully, but these errors were encountered:
Thanks @jbusecke for the helpful suggestion of starting by enabling #10 just for members of the pangeo-forge GitHub org. This seems like a good way to get some use out of these features and see how they perform, without viewing the issue discussed here to be a fundamental blocker.
User Profile
As a project owner
User Action
I want to reach a consensus with other project owners regarding best security practices for importing contributed recipes
User Goal
So that I know what security guardrails to observe while to developing new features on Pangeo Forge Cloud
Acceptance Criteria
An internal document and/or mutual understanding regarding best practices for importing nominally "untrusted" recipe modules. More details regarding motivating cases in Linked Issues section below.
Linked Issues
By way of background, there are two currently two places in the Registrar where we automatically create recipe runs in response to a push event:
In the second case, we can assume some Pangeo Forge maintainer (either a project owner or the maintainer of a feedstock) has looked at the code already. There may be risks here due to inattentiveness, etc. but we can leave those for another day.
What I'd like to discuss here is first case, wherein the submitted code is truly untrusted in the sense that literally anyone in the whole world can make a PR to
/staged-recipes
, and if it has a properly formatted and completemeta.yaml
, then recipe runs will be created for all recipes listed in themeta.yaml
. For this reason, I've assumed thus far that we should never actually import the recipe module when automatically creating recipe runs, and that is how the Registrar currently operates.Certain open User Stories challenge this model, however. Namely:
meta.yaml
#10In both of these cases, without importing the recipe module, we don't have enough information to create recipe runs. Specifically, as #3 is currently conceived, to determine whether or not to re-run a given recipe we would need to call
self.sha256()
on each of the recipes, in order to compare the resulting hashes to those of the prior run (if any) for the recipe. If the hashes match, we wouldn't create recipe runs at all. And for #10, we wouldn't know the names of the individual recipes within adict_object
without importing the recipe module and introspecting the specified dictionary.Both of these User Stories have real, already-existing contributors that would like to use them, and from a design perspective would be big improvements to the platform. They would also be specifically useful for the low trust case of creating recipe runs for PRs, so simply saying "we don't support these features on PRs" seems far from ideal.
A few further questions/possibilities to kick off discussion:
importlib
equivalent toyaml.safe_load
which might be useful in this case?The text was updated successfully, but these errors were encountered: