[WIP] Ephemeral Values prototype #35077

Closed
wants to merge 17 commits into from


Contributor

@apparentlymart apparentlymart commented Apr 24, 2024

This is another attempt at introducing to Terraform the idea of objects and values being "ephemeral", which means something like "lives only for the duration of one Terraform phase".

Terraform already has at least two concepts that meet this definition, despite us not previously naming it:

  • Provider configurations (provider blocks): Terraform re-evaluates the arguments in a provider block separately during the plan and apply phases, and doesn't mind if the configuration is different between the two as long as the apply-time configuration allows performing the actions that were proposed during the plan phase.
  • Provisioners (provisioner and connection blocks): Terraform fully evaluates these only during the apply phase, so they aren't really considered during the plan phase at all, aside from basic static validation.

However, because the idea of "ephemeral" is not available in the rest of the language, it's tough to actually benefit from this ephemerality. This prototype aims to introduce "ephemeral" as a cross-cutting concern supported broadly across the language.

Ephemeral Values

The most fundamental idea is that values used in expressions can either be ephemeral or non-ephemeral. This is an idea similar to "sensitive" in that Terraform will perform dynamic analysis such that any value derived from an ephemeral value is itself ephemeral. Ephemeral values can then be used only in parts of the language which would not require persisting the value either between the plan phase and the apply phase, or from one plan/apply round to the next.
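
As a rough illustration of that propagation rule (not code from this prototype), the following Go sketch uses a hypothetical `Value` type with a boolean flag. The names `Value`, `Concat`, and `CheckPersistable` are assumptions made up for this example; Terraform's real value representation is more involved.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for a Terraform expression value.
// It exists only to illustrate the propagation rule described above.
type Value struct {
	Str       string
	Ephemeral bool
}

// Concat derives a new value from its operands. The result is ephemeral if
// any operand is ephemeral, similar to how "sensitive" propagates.
func Concat(vals ...Value) Value {
	var out Value
	for _, v := range vals {
		out.Str += v.Str
		out.Ephemeral = out.Ephemeral || v.Ephemeral
	}
	return out
}

// CheckPersistable rejects ephemeral values in a context that would need to
// persist them, such as an argument saved in a plan or state snapshot.
func CheckPersistable(v Value) error {
	if v.Ephemeral {
		return fmt.Errorf("ephemeral value is not allowed here")
	}
	return nil
}

func main() {
	token := Value{Str: "abc123", Ephemeral: true}
	header := Concat(Value{Str: "Bearer "}, token)
	fmt.Println(header.Ephemeral)                // true
	fmt.Println(CheckPersistable(header) != nil) // true
}
```

The point of the sketch is only that ephemerality is decided dynamically from the inputs of each operation, so a persisting context can reject a derived value without knowing where it came from.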

Considering only pre-existing language features, ephemeral values can be freely used in provider blocks, provisioner blocks, connection blocks, and in local values. The following sections describe some new additions that either accept or produce ephemeral values.

resource blocks (aside from special nested parts like the aforementioned provisioner blocks) do not accept ephemeral values, because preserving resource configuration unchanged between the plan and apply phases is a fundamental part of how Terraform works to keep its promise of either doing what the plan described or returning an error explaining why that's not possible.

Because ephemeral values are not expected to persist from plan to apply or between plan/apply rounds, there is no need to save them in saved plan files or state snapshots, thus finally giving a plausible answer for what to do about #516, which has been on my mind since long before I worked at HashiCorp.

Ephemeral Input Variables

An ephemeral input variable is, in the most general terms, just an input variable that is declared as accepting ephemeral values. A non-ephemeral input variable cannot accept ephemeral values, while an ephemeral input variable will accept both ephemeral and non-ephemeral values but the value will always be treated as ephemeral when used inside the declaring module.

The main interesting case is when a root module declares an ephemeral input variable. In that case, Terraform will no longer remember the value for the variable provided during planning and will instead expect any ephemeral variable set during the plan step to be provided again -- possibly with a different value -- during the apply step.

The primary goal of this is to be able to use input variables to set arguments in ephemeral contexts. For example, an input variable that's both ephemeral and sensitive could provide a JSON Web Token to be used when configuring a specific provider, and then automation around Terraform could provide separate JSON Web Tokens across the plan and apply phases so that the apply phase isn't subject to the expiration time for the plan-time JWT, and so that the plan-time JWT doesn't get persisted to disk as part of a saved plan.

Ephemeral Output Values

An ephemeral output value is essentially the opposite of an ephemeral input variable, allowing a module to expose an ephemeral value to its caller. As with input variables, a non-ephemeral output value will reject having an ephemeral value assigned to it. An ephemeral output value can have both ephemeral and non-ephemeral values assigned to it, but the calling module will always see it as ephemeral.
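
The acceptance rule shared by input variables and output values can be sketched in Go as follows. This is a minimal illustration under assumed names (`Value`, `Assign`, `ErrEphemeralNotAllowed` are all hypothetical), not the prototype's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a hypothetical stand-in for a Terraform value.
type Value struct {
	Str       string
	Ephemeral bool
}

// ErrEphemeralNotAllowed is a hypothetical error for this sketch.
var ErrEphemeralNotAllowed = errors.New("ephemeral value assigned to a non-ephemeral declaration")

// Assign applies the rule for a variable or output value declaration:
// a non-ephemeral declaration rejects ephemeral values, while an ephemeral
// declaration accepts anything and always yields an ephemeral result.
func Assign(declEphemeral bool, v Value) (Value, error) {
	if !declEphemeral {
		if v.Ephemeral {
			return Value{}, ErrEphemeralNotAllowed
		}
		return v, nil
	}
	v.Ephemeral = true
	return v, nil
}

func main() {
	got, err := Assign(true, Value{Str: "plain"})
	fmt.Println(got.Ephemeral, err) // true <nil>

	_, err = Assign(false, Value{Str: "jwt", Ephemeral: true})
	fmt.Println(err != nil) // true
}
```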

To start, the utility of this is limited just to echoing back values derived from ephemeral input variables, since nothing else I've described so far actually produces ephemeral values. However, allowing this is important to ensure that ephemeral values are supported symmetrically and will cooperate well with all other language features.

Ephemeral Resources

The final idea in this prototype -- one which this prototype probably won't explore fully just yet, and will introduce only just enough of to validate that it fits in well with everything else -- is a new resource mode for representing remote objects that are ephemeral themselves.

Terraform currently has two "resource modes": managed resources (resource blocks) describe objects that Terraform is directly managing, while data resources (data blocks) describe objects that are managed elsewhere that the current configuration depends on. But in both cases the assumption is that those objects persist in some sense from plan to apply and from one plan/apply round to the next, and that Terraform is supposed to detect and react to any changes to those objects and therefore needs to persist information about them itself.

Ephemeral resources (ephemeral blocks), on the other hand, represent objects that -- at least, as far as Terraform is concerned -- exist only briefly during a single Terraform phase, and then get cleaned up once the phase is complete. This idea is an evolution of some much earlier design work I did before I even worked at HashiCorp 😀 in relation to #8367, which was about establishing temporary SSH tunnels, and the HashiCorp Vault provider I wrote in #9158 (which evolved into today's official hashicorp/vault).

The general idea of ephemeral resources, then, is that their lifecycle includes three events:

  • OpenEphemeral: Prepares the object for use. For some kinds of objects this would represent a "create" action, but for others it might just open a temporary session to something that already exists, such as in the SSH tunnel use-case.

    This operation is the one that establishes the result attributes that can be accessed from other parts of the module where the resource is declared. All of these results would be ephemeral values, so that they can vary from plan to apply. For example, opening an SSH tunnel is likely to cause a different local TCP port number to be allocated each time, and so consistency between plan and apply phases is not expected.

  • RenewEphemeral: Some ephemeral remote objects need to be periodically refreshed in order to stay "live", such as leases for Vault secrets.

    This optional operation is therefore opted into by the provider's OpenEphemeral response, by providing a private set of data that should be sent back to the provider's RenewEphemeral implementation and a deadline before which Terraform must renew it. The provider can then do whatever is needed to keep the object from expiring, and optionally return another renew request with a new deadline in order to repeat this renewal process.

  • CloseEphemeral: Once Terraform has completed work for all objects that refer to the ephemeral resource, this operation gives the provider an explicit signal that the object is no longer required so that it can be promptly destroyed or invalidated.

    This detail is particularly helpful for the Vault provider and fixes a limitation I ran into immediately back in 2016: a dynamic secret fetched using a data block can never have its lease explicitly terminated, because data resources were intended only to read information about an object someone else is managing, not to directly manage an object (a Vault lease).
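
The three lifecycle events above could be sketched in Go roughly as follows. Since changing the real provider protocol is out of scope for the prototype, everything here is an assumption for illustration: the `EphemeralProvider` interface, the `Renewal` struct, the `fakeLease` provider, and the `UseEphemeral` driver are all hypothetical. To keep the example deterministic, the driver chains renewals immediately rather than waiting for each deadline as a real implementation presumably would.

```go
package main

import (
	"fmt"
	"time"
)

// Renewal is a hypothetical renew request: private data to send back to the
// provider, plus the deadline before which renewal must happen.
type Renewal struct {
	Private  []byte
	Deadline time.Time
}

// EphemeralProvider is a hypothetical interface for the three events.
type EphemeralProvider interface {
	OpenEphemeral() (result map[string]string, renew *Renewal, err error)
	RenewEphemeral(private []byte) (*Renewal, error)
	CloseEphemeral() error
}

// fakeLease imitates a secret lease that asks to be renewed twice.
type fakeLease struct {
	renewals int
	closed   bool
}

func (l *fakeLease) OpenEphemeral() (map[string]string, *Renewal, error) {
	renew := &Renewal{Private: []byte("lease-1"), Deadline: time.Now().Add(time.Minute)}
	return map[string]string{"secret": "s3cr3t"}, renew, nil
}

func (l *fakeLease) RenewEphemeral(private []byte) (*Renewal, error) {
	l.renewals++
	if l.renewals >= 2 {
		return nil, nil // no further renewal requested
	}
	return &Renewal{Private: private, Deadline: time.Now().Add(time.Minute)}, nil
}

func (l *fakeLease) CloseEphemeral() error {
	l.closed = true
	return nil
}

// UseEphemeral opens the object, follows the chain of renew requests, runs
// the dependent work, and always closes the object afterwards. The close
// error is ignored here for brevity.
func UseEphemeral(p EphemeralProvider, work func(result map[string]string) error) error {
	result, renew, err := p.OpenEphemeral()
	if err != nil {
		return err
	}
	defer p.CloseEphemeral()
	for renew != nil {
		renew, err = p.RenewEphemeral(renew.Private)
		if err != nil {
			return err
		}
	}
	return work(result)
}

func main() {
	lease := &fakeLease{}
	err := UseEphemeral(lease, func(result map[string]string) error {
		fmt.Println(len(result["secret"])) // 6
		return nil
	})
	fmt.Println(err, lease.renewals, lease.closed) // <nil> 2 true
}
```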

Because the results from ephemeral resources are ephemeral values, they're primarily useful in configuration for other ephemeral objects: provider blocks, provisioner/connection blocks, and of course other ephemeral blocks.

Actually changing the provider protocol and implementing real providers is not in scope for my initial prototyping work here, and so I intend to prototype this in a more limited way that just emulates how this mechanism might behave, so we can see how well it interacts with the rest of the language and the other ephemeral values discussed here.

I've also been considering a mechanism to allow managed resource types to declare individual arguments as being "write-only", such as for an RDS database password that only needs to be provided during creation and should not be provided again unless the operator actually intends to reset it. I don't intend to prototype that in here, but I intend to lay the foundations for it by having a convention that ephemeral input values and write-only arguments both treat null as meaning "don't set or change" and non-null as "set or change", thereby creating a small imperative-shaped niche in the otherwise-declarative Terraform Language to allow for using Terraform to manage objects that have write-only (typically, sensitive) arguments without needing to persist them in plan and state.
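
A minimal Go sketch of that null convention, with a nil pointer standing in for null. The function name `ApplyWriteOnly` and the password example are hypothetical, chosen only to illustrate "null means don't set or change".

```go
package main

import "fmt"

// ApplyWriteOnly models the convention for a write-only argument:
// a nil argument means "don't set or change" and a non-nil argument means
// "set or change". It returns the value the remote object should end up
// with and whether a write was requested.
func ApplyWriteOnly(current string, arg *string) (next string, write bool) {
	if arg == nil {
		return current, false
	}
	return *arg, true
}

func main() {
	next, write := ApplyWriteOnly("old-password", nil)
	fmt.Println(next, write) // old-password false

	reset := "new-password"
	next, write = ApplyWriteOnly("old-password", &reset)
	fmt.Println(next, write) // new-password true
}
```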


I'm still working on this, so not everything described above is in here yet, but the foundations for ephemeral values themselves are already in. I've opened this draft largely just because I need to put this work down for a while for a team offsite and don't want to lose the context.

For any request that can occur during the planning phase there is a chance
that either a resource configuration or its associated provider
configuration will contain unknown values that are placeholders for
results of operations that haven't yet completed.

Ideally a provider would be able to just do its best to predict the
outcome in spite of the partial information, but in practice that isn't
always possible. In those more complex situations it's better to let the
provider explicitly decline to complete the operation and have Terraform
Core defer it for a future run when there's hopefully more information
available due to having applied other changes upstream.
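
As an illustration only, a provider declining to plan and Terraform Core turning that into an error (the temporary behavior this commit message goes on to describe) might be sketched in Go like this. The types `Deferred` and `PlanResponse` and the functions `PlanInstance` and `HandlePlan` are hypothetical names for this example, unknown values are modeled as nil pointers, and the fake provider defers on any unknown argument rather than attempting a prediction.

```go
package main

import "fmt"

// Deferred is a hypothetical marker a provider returns when it declines to
// plan because its inputs are not sufficiently known yet.
type Deferred struct {
	Reason string
}

// PlanResponse is a hypothetical, simplified provider response.
type PlanResponse struct {
	Planned  map[string]string
	Deferred *Deferred
}

// PlanInstance stands in for a provider's planning logic. A nil pointer in
// config represents an unknown value.
func PlanInstance(config map[string]*string) PlanResponse {
	planned := make(map[string]string, len(config))
	for name, val := range config {
		if val == nil {
			reason := "argument " + name + " is not known yet"
			return PlanResponse{Deferred: &Deferred{Reason: reason}}
		}
		planned[name] = *val
	}
	return PlanResponse{Planned: planned}
}

// HandlePlan surfaces any deferral as an error, as a placeholder for real
// support for deferred changes.
func HandlePlan(resp PlanResponse) (map[string]string, error) {
	if resp.Deferred != nil {
		return nil, fmt.Errorf("provider deferred the change: %s", resp.Deferred.Reason)
	}
	return resp.Planned, nil
}

func main() {
	region := "us-east-1"
	_, err := HandlePlan(PlanInstance(map[string]*string{"region": &region}))
	fmt.Println(err) // <nil>

	_, err = HandlePlan(PlanInstance(map[string]*string{"region": nil}))
	fmt.Println(err) // provider deferred the change: argument region is not known yet
}
```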

This commit does not yet introduce the idea of "deferred changes" into
Terraform Core, so as a temporary step Terraform Core will just return
an error if a provider tries to defer anything. In future commits we'll
teach Terraform Core how to handle this more gracefully by saving partial
results into the plan as "deferred changes" and then continuing on to
downstream resources to try to gather as much information as possible to
help the user understand the likely effects of those deferred actions.

This represents the two address types that could potentially have deferred
actions associated with them during a Terraform plan operation, because
deferring can happen either before or after instance expansion.

Previously we just immediately bailed out with an error if either count
or for_each were not sufficiently known to determine their full set of
instance keys.

The Expander abstraction can now talk about module calls and resources
having unknown expansion, so Terraform Core should tolerate that situation
and just let the expander know that the expansion is unknown, and then
we'll deal with that situation downstream.
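
To make the idea of "unknown expansion" concrete, here is a small Go sketch that is not taken from the Expander itself. The `Expansion` type and the `ExpandCount` and `Describe` functions are hypothetical, and an unknown count is modeled as a nil pointer. The key property is that "known to have zero instances" and "instance keys not yet known" are distinct states.

```go
package main

import "fmt"

// Expansion is a hypothetical record of how a module call or resource
// expands. Known is false when count/for_each could not be resolved yet.
type Expansion struct {
	Known bool
	Keys  []int
}

// ExpandCount builds an Expansion from a count argument. A nil count
// represents an unknown value; a known count is assumed to be non-negative.
func ExpandCount(count *int) Expansion {
	if count == nil {
		return Expansion{Known: false}
	}
	keys := make([]int, *count)
	for i := range keys {
		keys[i] = i
	}
	return Expansion{Known: true, Keys: keys}
}

// Describe distinguishes unknown expansion from a known-empty expansion.
func Describe(e Expansion) string {
	switch {
	case !e.Known:
		return "unknown expansion"
	case len(e.Keys) == 0:
		return "no instances"
	default:
		return fmt.Sprintf("%d instance(s)", len(e.Keys))
	}
}

func main() {
	zero, three := 0, 3
	fmt.Println(Describe(ExpandCount(nil)))    // unknown expansion
	fmt.Println(Describe(ExpandCount(&zero)))  // no instances
	fmt.Println(Describe(ExpandCount(&three))) // 3 instance(s)
}
```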

For now "downstream" actually means directly after these functions return,
because the rest of Terraform Core isn't yet ready to deal with objects
that don't know their full expansions. We'll just return errors similar
to (but slightly lower quality than) the ones we used to return during
evaluation, as a temporary placeholder to keep things working until we
get downstream more ready to deal with this.

While working on this I also noticed that we were redundantly
re-evaluating the for_each expressions for each resource instance just to
prepare the repetition data, which is unnecessary because the Expander
abstraction already keeps track of that to ensure that all of the graph
nodes have a consistent view of the expansions. We'll now just ask the
expander directly what our RepetitionData should be, since that's part
of the expander's responsibility.

Traversing upward from a PartialExpandedModule is trickier than traversing
down because we need to deal with what happens if the traversal crosses
over the boundary from partial-expanded into fully-expanded.

To deal with that we end up having two different methods to handle the
two situations, and a third method to indicate which one to call.
Thankfully the need to ask for the parent of a partial-expanded module is
relatively rare -- mainly just for input variables whose definitions need
to eval in the parent module's scope -- so this awkward API shouldn't be
needed in too many places.

This is mainly just a proof-of-concept of what it might look like to
generate graph nodes representing placeholders for objects in
not-fully-expanded modules. These new codepaths are not really accessible
yet because it's still invalid to have a module whose expansion is
unknown; we'll continue down this path further in later commits once
there's actually somewhere to save the partially-evaluated placeholder
values.

Our evaluation strategy for module-namespaced objects unfortunately
depends quite strongly on having the right EvalContext in scope for each
graph node, referring to the appropriate namespace in which to evaluate
expressions.

Although I was pretty reluctant to integrate the idea of partial-expanded
module paths at quite this low a level, it does seem like the most
pragmatic answer since it works with rather than against the existing
evaluation strategies.

As of this commit this isn't really doing anything because it isn't
possible to reach any graph node that has a partial-expanded path and the
EvalContext itself doesn't actually properly support evaluation in a
partial-expanded path anyway; we'll fix up the rest of this in later
commits before making these codepaths reachable.

This replaces the direct manipulation of a map shared between three
different components, encapsulating that manipulation now inside a single
wrapping API that itself ensures safe concurrent access.
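
The general pattern of wrapping a shared map behind an API that enforces safe concurrent access could look like the following Go sketch. This is not the actual namedvals.State code: the `State` type here, its fields, and the method names are assumptions for illustration, and values are plain strings rather than real Terraform values.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a hypothetical, simplified store for named values. Callers never
// touch the map directly, so all locking lives in one place.
type State struct {
	mu        sync.Mutex
	inputVars map[string]string
}

// NewState returns an empty State ready for concurrent use.
func NewState() *State {
	return &State{inputVars: make(map[string]string)}
}

// SetInputVariableValue records the value for the given variable address.
func (s *State) SetInputVariableValue(addr, val string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inputVars[addr] = val
}

// GetInputVariableValue returns the recorded value and whether it was set.
func (s *State) GetInputVariableValue(addr string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.inputVars[addr]
	return v, ok
}

func main() {
	st := NewState()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			st.SetInputVariableValue(fmt.Sprintf("var.v%d", i), "set")
		}(i)
	}
	wg.Wait()
	v, ok := st.GetInputVariableValue("var.v2")
	fmt.Println(v, ok) // set true
}
```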

In future commits we'll do the same for local values and output values,
but for now those parts of namedvals.State remain unused.

Now that we have the separate namedvals.State type to encapsulate all of
the named-value tracking we can simplify the EvalContext API to just
return that object directly.

This removes the slightly odd evolved API for setting and retrieving
input variable values, instead now just calling directly into the relevant
namedvals.State methods. It also slightly simplifies some of our test
code because there's no longer any need to mock accesses to what is just
a temporary in-memory data store anyway.

Finally, this now gives nodePartialExpandedModuleVariable somewhere to
save its placeholder values, though there's not yet anything to read them.

This is a new mode for the evaluator where instead of returning
information about exact objects it'll return placeholder values that
represent potentially many different hypothetical objects all declared
from the same static configuration object, in situations where we don't
yet have enough information to expand all of the modules and their
contents.

So far only the GetInputVariable function actually knows how to deal with
this, so this is far from sufficient but is a reasonable starting point
just to establish that it's possible to get Terraform into this evaluation
mode when working with graph nodes that represent such placeholder objects.

Back when we added local values (a long time ago now!) we put their
results in state mainly just because it was the only suitable shared data
structure to keep them in. They are a bit idiosyncratic there because we
intentionally discard them when serializing state to a snapshot, and
that's just fine because they never need to be retained between runs
anyway.

We now have namedvals.State for all of our named value result storage
needs, so we can remove the local-value-related fields of states.Module
and use the relevant map inside the local value state instead.

For any local value declared beneath a module call whose expansion isn't
known yet, we'll calculate a single value to serve as a placeholder for
all possible valid instances of that local value, using unknown values
in any situation where a value might differ between instances.

For a very long time we've had an annoying discrepancy between the
in-memory state model and our state snapshot format where the in-memory
format stores output values for all modules whereas the snapshot format
only tracks the root module output values because those are all we
actually need to preserve between runs.

That design wart was a result of us using the state both as an internal
and an external artifact, due to having nowhere else to store the
transient values of non-root module output values while Terraform Core
does its work.

We now have namedvals.State to internally track all of the throwaway
results from named values that don't need to persist between runs, so now
we'll use that for our internal work instead and reserve the states.State
model only for the data that we will preserve between runs in state
snapshots.

The namedvals internal model isn't really designed to support enumerating
all of the output values for a particular module call, but our expression
evaluator currently depends on being able to do that and so we have a
temporary inefficient implementation of that which just scans the entire
table of values as a stopgap just to avoid this commit growing even larger
than it already is. In a future commit we'll rework the evaluator to
support the PartialEval mode and at the same time move the responsibility
for enumerating all of the output values into the evaluator itself, since
it should be able to determine what it's expecting by analyzing the
configuration rather than just by trusting that earlier evaluation has
completed correctly.

Because our legacy state string serialization previously included output
values for all modules, some of our context tests were accidentally
depending on the implementation detail of how those got stored internally.
Those tests are updated here to test only the data that is a real part
of Terraform Core's result, by ensuring that the relevant data appears
somewhere either in a root output value or in a resource attribute.

As of this commit, what remains of the states.State model can now be
entirely serialized by the state snapshot format, with no more situations
where we just silently drop some data that Terraform Core uses as an
implementation detail.

This will allow it to determine which instances _should_ be present rather
than just trusting which instances _are_ present, which will make it
harder to accidentally hide graph ordering bugs behind fallback behavior
and, more importantly, will allow the evaluator to recognize the
difference between there being no instances at all or the instance keys
not yet being known.

This is a totally different approach to GetModule which uses the
configuration and previously-registered expansion to determine what ought
to exist in our named values state, rather than treating the values in
the named values state as the source of truth.

As a result we get an overall simpler implementation which is able to
panic if other components aren't behaving correctly, and we can return
placeholder results in partial evaluation mode, at least as long as we're
working with a single-instance module.

There are some further opportunities for simplification and improving the
detail of the unknown results if we make broader changes in future, but
for the moment this is just enough to mimic the previous behavior using
a new strategy.

This doesn't actually do anything useful yet, but at least stubs out how
evaluation for these might work in later commits.

@apparentlymart
Contributor Author

Ugh whoops I selected the deferred actions branch instead of the ephemeral values branch 🤦‍♂️

I'll figure out how to fix that later, but for now the real branch is f-ephemeral-values.

@apparentlymart
Contributor Author

apparentlymart commented Apr 24, 2024

It seems like it isn't possible to change the source branch of a PR once it's created, so I'm going to close this one and open a fresh one with the same text but the correct branch. 😖

Edit: The new PR is #35078

Contributor

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 25, 2024