Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly decide on availability of $secrets argument #979

Open
matthias-pichler opened this issue Aug 19, 2024 · 16 comments
Open

Properly decide on availability of $secrets argument #979

matthias-pichler opened this issue Aug 19, 2024 · 16 comments
Assignees
Labels
area: spec Changes in the Specification change: breaking A breaking change that will impact in a major version change. change: documentation Improvements or additions to documentation. It won't impact a version change.
Milestone

Comments

@matthias-pichler
Copy link
Collaborator

What would you like to be added:

Screenshot 2024-08-19 at 20 35 56

currently $secrets is documented to be available everywhere, but I know @cdavernas mentioned it should only be available in input.from to avoid leaks.

Why is this needed:

@JBBianchi
Copy link
Member

JBBianchi commented Aug 20, 2024

I understand the reasoning but I'm concerned it might have unintended consequences. Keeping $secrets accessible everywhere makes it clear to the author that they're dealing with sensitive information, ensuring caution in its use. However, if $secrets is only available in input.from, there's a risk that as the data moves through the pipeline, the secret could be aliased, potentially obscuring its sensitive nature.

do:
- someUnsafeCall:
  call: http
  with:
    method: get
    endpoint: ${ "https://server.com/api/unsafe-auth/\($secrets.token)" # <-- the use of sensitive data is clear

vs

do:
- someUnsafeCall:
  input:
    from:
      token: $secrets.token
  call: http
  with:
    method: get
    endpoint: https://server.com/api/unsafe-auth/{token} # <-- the use of sensitive data is not obvious

@cdavernas
Copy link
Member

cdavernas commented Aug 20, 2024

@JBBianchi I agree with you, but on the other hand leaving them accessible everywhere is like telling the authors it's ok to use them everywhere, like when calling a (potentially custom) function.

Plus, if the user explicitly decided to flow it down, the responsibility is IMHO his.

@matthias-pichler
Copy link
Collaborator Author

matthias-pichler commented Aug 20, 2024

@JBBianchi good point. It's much harder to accidentally leak the $secrets arg than the input

@cdavernas
Copy link
Member

@JBBianchi good point. It's much harder to accidentally leak the $secrets arg than the input

I disagree. I don't see why it would be harder. For most users, $secrets is just gonna be yet another runtime expression variable.

Annyways, whatever you guys decide is fine by me. I just think we should be extra careful with those, as it's our responsibility to "educate" authors on the proper approach to adopt when using this feature.

@cdavernas
Copy link
Member

cdavernas commented Aug 20, 2024

To quote ChatGPT's opinion on it:

In a Serverless Workflow, the availability of the $secrets argument should be carefully considered to balance security and flexibility.

1. Security Considerations

  • Limiting Scope: Making $secrets available only at the workflow input (or in controlled, specific places) reduces the risk of accidental exposure or misuse of sensitive data. The broader the availability of secrets, the higher the chances of them being logged, passed to unintended functions, or exposed in runtime errors.
  • Principle of Least Privilege: This principle suggests that secrets should only be accessible where absolutely necessary. By restricting $secrets to workflow input, you adhere to this principle, ensuring that secrets are only available to components that truly need them.

2. Flexibility Considerations

  • Complex Workflows: In some workflows, you might need to access secrets at different stages, not just at the beginning. Allowing $secrets to be available throughout the workflow can enhance flexibility and make the workflow easier to design and maintain.
  • Modular and Reusable Components: If different parts of the workflow are designed to be modular and reusable, they might require access to secrets independently. This scenario would argue for broader access to $secrets.

Best Practice Recommendation

Given the above considerations, the most secure approach is to limit the availability of $secrets to the workflow input and specific, well-defined points within the workflow where access is necessary. This could include:

  • The start of the workflow to initialize required parameters.
  • Specific states or functions that need secret access.
  • Ensuring secrets are not accidentally passed down through the workflow unnecessarily.

Implementation Strategy

  • Default Restriction: Make $secrets available by default only at the workflow input.
  • Explicit Access: Allow developers to explicitly declare if and where they need access to $secrets within the workflow. This could be achieved via configuration or annotations, requiring an extra step to expose secrets, thus making users consciously decide when and where secrets are needed.

This approach balances security with the necessary flexibility, ensuring that secrets are protected while still allowing workflows to function as needed.

@matthias-pichler
Copy link
Collaborator Author

I think it's easier to leak when the secret has to be put in the input

imagine I want to send some object (e.g. pet) to an api key authenticated API, first I write:

do:
  - apiCall:
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        body: ${ . }

... then I realise I forgot auth so I add it

do:
  - apiCall:
      input:
        from:
          key: ${ $secrets.apiKey }
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        headers:
          X-API-Key: ${ .key }
        body: ${ . }

and I already made the subtle mistake of no ONLY sending my api key, because input transformations have so much impact an the whole task. It would have been much harder to make such a mistake when using $secrets

@JBBianchi
Copy link
Member

Maybe there is a mitigation to be found with implicitly "sharing" a new $secrets variable ?

e.g.:

do:
  - apiCall:
      input:
        $secrets:
          key: ${ $secrets.apiKey }
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        headers:
          X-API-Key: ${ $secrets.key }
        body: ${ . }

And, for instance

do:
  - apiCall:
      input:
        from:
          key: ${ $secrets.apiKey }
#...

Would fail because we do not inject $secrets when processing input.from but only input.$secrets ?

Just throwing ideas here.

@matthias-pichler
Copy link
Collaborator Author

Would fail because we do not inject $secrets when processing input.from but only input.$secrets ?

This feels like it achieves the same just with extra steps because now I have to use ${ .$secrets.foo } (extra .) instead of ${ $secrets.foo } inside the task which seems really confusing. Plus I see this to just being used as

do:
  - apiCall:
      input:
        $secrets: ${ $secrets } # also looks terrible doesn't it 😅 
      call: http
      with:
        ...

@cdavernas
Copy link
Member

@matthias-pichler very good sample, which perfectly demonstrates your rightful concerns. You could however leak them in a similar fashion using the runtime expression argument.

Anyways, I won't endlessly ramble on the subject: I think you make a strong and valid point.

@JBBianchi even though I love both the idea and the (awesome) will to find a consensual solution, I feel this would complexify things for authors and might even be confusing.


To conclude, you guys convinced me, and have my vote to definitely close that issue!

@JBBianchi
Copy link
Member

I don't think we should dismiss this topic, the concern is real. Imagine a distributed runtime where each activity processor is a FaaS for instance.
Is it really a good idea to always send the all $secrets over the wire ?
Shouldn't we find a way to explicitly send the $secrets required by the activity ?
In that case, shouldn't we enforce a variable name that clearly underlines the sensitive nature of the data to the author ?

@cdavernas cdavernas added change: documentation Improvements or additions to documentation. It won't impact a version change. area: spec Changes in the Specification change: breaking A breaking change that will impact in a major version change. labels Aug 21, 2024
@cdavernas cdavernas added this to the v1.0.0 milestone Aug 21, 2024
@cdavernas
Copy link
Member

Imagine a distributed runtime where each activity processor is a FaaS for instance.

Very good point, which was also, if I remember properly, part of the motivation of the initial restriction.

@matthias-pichler Any idea/suggestion?

@matthias-pichler
Copy link
Collaborator Author

Imagine a distributed runtime where each activity processor is a FaaS for instance.

Our runtime is currently implemented such that every task is run in a Lambda Function.

I understand that it's not ideal/least privilege that every Lambda Function needs access to the secrets, however given the tradeoffs I would still prefer having $secrets everywhere. Furthermore currently we have input transformation and task execution in two separate functions but due to performance considerations I am currently merging input/output transforms and task executions into one Lambda Function. So given these circumstances (which I assume are fairly common) it wouldn't change anything and every Task Lambda Function would still need access to secrets for the sake of input transformations

@JBBianchi
Copy link
Member

Imagine a distributed runtime where each activity processor is a FaaS for instance.

Our runtime is currently implemented such that every task is run in a Lambda Function.

I understand that it's not ideal/least privilege that every Lambda Function needs access to the secrets, however given the tradeoffs I would still prefer having $secrets everywhere. Furthermore currently we have input transformation and task execution in two separate functions but due to performance considerations I am currently merging input/output transforms and task executions into one Lambda Function. So given these circumstances (which I assume are fairly common) it wouldn't change anything and every Task Lambda Function would still need access to secrets for the sake of input transformations

Well, I'm glad that my hypothetical use-case is an actual one. But I don't think it's OK to share secrets with "everyone", that's quite against their purpose.

What are the arguments against "explicitly select" the secrets provided to a task ?

The following is just a conceptual example, the how can be adjusted:

document:
  #...
use:
  secrets:
  - GitHubToken
  - XToken
do:
  - release:
      secrets:
        - GitHubToken # explicitly share the `GitHubToken` secret
      call: http
      with:
        method: GET
        endpoint:
          uril: https://api.github.com/my-repo/release
          token: ${ $secrets.GitHubToken }
- announceOnX:
      secrets:
        - XToken # explicitly share the `XToken` secret
      call: http
      with:
        method: POST
        endpoint:
          uril: https://api.github.com/my-repo/release
          token: ${ $secrets.XToken }
        body: ${ "Checkout our new release at \(.artifact.url)" }
- updateInternalRegistry:
      call: http
      with:
        method: POST
        endpoint:
          uril: https://server.local
          token: ${ $secrets.GitHubToken } # <--- error, $secrets is empty

It does add an extra step but it doesn't seem to be a huge overhead.

@matthias-pichler
Copy link
Collaborator Author

matthias-pichler commented Aug 21, 2024

Well, I'm glad that my hypothetical use-case is an actual one. But I don't think it's OK to share secrets with "everyone", that's quite against their purpose.

I don't quite agree with "share secrets with "everyone"" ... all the Lambda Functions are managed and implemented by us and it would not be any different if it were run in a container. If a custom function (which we don't control) is invoked the input and any expressions in it are evaluated on our side and then the untrusted code is executed with the parsed input... I don't really see who'd we be sharing the secrets with?

What are the arguments against "explicitly select" the secrets provided to a task ?

I think the only one I have is that it's just more work for authors and it seems super repetitive, e.g. when building a workflow that calls the same API 4 times having to specify it 5 times in total (including at the workflow level).
It's also so much more overhead around secrets that I have not seen from any other workflow engine. E.g. Pipedream, GitHub, GitLab, etc you define secrets and then they are available as environment variables. And I don't really see any reason why we should make it much more complicated

@JBBianchi
Copy link
Member

I don't quite agree with "share secrets with "everyone"" ... all the Lambda Functions are managed and implemented by us and it would not be any different if it were run in a container. If a custom function (which we don't control) is invoked the input and any expressions in it are evaluated on our side and then the untrusted code is executed with the parsed input... I don't really see who'd we be sharing the secrets with?

I must admit I dramatized the situation (although I wouldn't say relying on a 3rd party infrastructure is the same as running a container on your own isolated host). My concern doesn't 100% apply on the situation you describe. The problem would rather be when calling a 3rd party activity processor, or if the network has been compromised (e.g. subject to MitM attacks).

I think the only one I have is that it's just more work for authors and it seems super repetitive, e.g. when building a workflow that calls the same API 4 times having to specify it 5 times in total (including at the workflow level).
It's also so much more overhead around secrets that I have not seen from any other workflow engine. E.g. Pipedream, GitHub, GitLab, etc you define secrets and then they are available as environment variables. And I don't really see any reason why we should make it much more complicated

I agree, it's not very pretty and quite cumbersome.

@cdavernas cdavernas changed the title Properly decide on availability of $secrets argument Properly decide on availability of $secrets argument Sep 6, 2024
@cdavernas cdavernas modified the milestones: v1.0.0-alpha3, v1.0.0 Oct 7, 2024
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: spec Changes in the Specification change: breaking A breaking change that will impact in a major version change. change: documentation Improvements or additions to documentation. It won't impact a version change.
Projects
Status: Backlog
Development

No branches or pull requests

3 participants