Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/resource: consistent applicators #491

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

sttts
Copy link
Contributor

@sttts sttts commented Jul 31, 2023

Description of your changes

This PR does two things:

  1. the update applicator is deprecated. The override of the resource version is removed, i.e. the update applicator will correctly conflict with any other actor writing to the object when the base version of the reconcile (from the informer cache) is outdated. With that the updates follow the optimistic concurrency protocol of the API: get, modify object, update, retry on conflict.
  2. the patching applicator is changed to compute patches between the base version of the reconcile (from the informer cache) and the object to be applied. Because the applicator has no access to the former, it does another call against the cached client to get the current version. If that matches the resource version of the object to be applied, we are fine and can apply the computed patch to the apiserver. If the resource version differs, this is equivalent to a conflict error because the world has moved on during reconcile. That's returned as a conflict error to the reconcile loop.

Fixes #483:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

@sttts sttts requested review from a team as code owners July 31, 2023 11:19
@sttts sttts requested review from turkenh and jbw976 July 31, 2023 11:19
// An PartialApplicator applies changes to an object. The passed object can be
// partial, preserving parts of the object that are managed by other actors.
type PartialApplicator interface {
ApplyPartialObject(context.Context, client.Object, ...ApplyOption) error
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from discussion with @pedjak: an interface like that won't work for SSA as client.Object cannot be partial, maybe with exception of an *unstructured.Unstructured. Sending full objects to SSA won't end well.

@sttts sttts force-pushed the sttts-consistent-apply branch 2 times, most recently from d3a9ba0 to 520a765 Compare August 1, 2023 07:44
@sttts sttts changed the title WIP: pkg/resource: consistent applicators pkg/resource: consistent applicators Aug 1, 2023
@sttts sttts force-pushed the sttts-consistent-apply branch from 520a765 to 97f3757 Compare August 1, 2023 07:48
@negz
Copy link
Member

negz commented Aug 8, 2023

@sttts Could you update the PR description with a little more detail, please. Between your initial issue description and my lengthy comment we touch on quite a few things, and it's not obvious to me what parts of #483 this is intending to solve and how.

@sttts sttts force-pushed the sttts-consistent-apply branch 2 times, most recently from 764512b to 72874ef Compare August 29, 2023 15:52
pkg/resource/api.go Outdated Show resolved Hide resolved
@sttts
Copy link
Contributor Author

sttts commented Aug 30, 2023

This needs a proof PR against Crossplane. I have the suspicion that it breaks the composite reconciler as it relies on the inconsistent, additive behaviour.

@pedjak
Copy link
Contributor

pedjak commented Aug 30, 2023

Perhaps we can add a few e2e tests to validate that? (in crossplane PR ofcourse)

@negz
Copy link
Member

negz commented Aug 31, 2023

This needs a proof PR against Crossplane

Agreed. I haven't had time to review closely. I like the direction but suspect it could be breaking in a few places.

Comment on lines 51 to 56
func (a *APIPatchingApplicator) WithLogger(l logging.Logger) *APIPatchingApplicator {
a.optionalLog = l
return a
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is introducing a second way to pass options (builder style) in addition to the variadic ApplyOption style supported by the Apply method.

Given that:

  • As you mention LogDiff is expensive, so we want it to be done optionally.
  • LogDiff is called right after where we loop over the variadic option functions.
  • LogDiff has a very similar signature to the variadic option functions.

Could LogDiff itself be an option - e.g. p.Apply(ctx, o, WithLogDiff(l logging.Logger))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that in the beginning and forgot why I reverted it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main reason is that I couldn't find a case where I want per-call log settings.

pkg/resource/api.go Outdated Show resolved Hide resolved
Comment on lines 74 to 79
// Note: this check would ideally not be necessary if the Apply signature
// had a current object that we could us for the diff. But we have no
// current and for consistency of the patch it matters that the object we
// get above is the one that was originally used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we ab(use) the apply options to optionally provide the current object? e.g. p.Apply(ctx, desired, WithCurrent(observed)).

Presumably even if there was we'd still need to fall back to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly. Note though that our clients are cached. So it doesn't make a big difference.

@negz
Copy link
Member

negz commented Aug 31, 2023

This PR does two things

@sttts +1 for deprecating the APIUpdatingApplicator. That makes sense to me.

With regard to the APIPatchingApplicator changes, my understanding (as written up in #483 (comment)) is that this applicator does not have an optimistic concurrency issue at the moment. Is that understanding wrong?

Is the fact that the APIPatchingApplicator now computes patches between the base version of the resource and the object to be applied addressing an optimistic concurrency issue? Or is it instead intended to allow us to support deleting object fields, since it now creates a proper merge patch?

@sttts
Copy link
Contributor Author

sttts commented Aug 31, 2023

applicator does not have an optimistic concurrency issue at the moment. Is that understanding wrong?

It has a consistency issue though :) It does not apply what you tell it to apply. It additively merges your changes into the version on the server.

Or is it instead intended to allow us to support deleting object fields, since it now creates a proper merge patch?

Yes, it is.

But this of course means that it wipes everything not in the passed object. E.g. if you override the spec in the composition controller code to be equal to that in the Composition, it will wipe everything that a provider adds to the spec. To work around that we have to switch the compisition controller to server side apply. But that means that we will get rid of the patching applicator in that controller and talk directly via SSA to the server.

return errors.Wrap(a.client.Create(ctx, o), "cannot create object")
func (a *APIPatchingApplicator) Apply(ctx context.Context, obj client.Object, ao ...ApplyOption) error { //nolint:gocyclo // the logic here is crucial and deserves to stay in one method
if obj.GetName() == "" && obj.GetGenerateName() != "" {
return errors.Wrap(a.client.Create(ctx, obj), "cannot create object")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case of error, would be nice to give some details about obj that could not be created?

return errors.Wrapf(a.client.Create(ctx, obj), "cannot create object %s of %s", obj.GetGenerateName(), obj.GetObjectKind().GroupVersionKind())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had added that context and removed it again because those error messages already contain the GR and name.

@@ -262,6 +273,30 @@ func UpdateFn(fn func(current, desired runtime.Object)) ApplyOption {
}
}

// LogDiff logs the diff between current and desired object.
func LogDiff(log logging.Logger, current, desired runtime.Object) error {
Copy link
Contributor

@pedjak pedjak Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since log can be nil, perhaps we could rename method toMaybeLogDiff to improve readability? Also, does it need to be public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -92,3 +93,20 @@ func (l logrLogger) Debug(msg string, keysAndValues ...any) {
func (l logrLogger) WithValues(keysAndValues ...any) Logger {
return logrLogger{log: l.log.WithValues(keysAndValues...)} //nolint:logrlint // False positive - logrlint thinks there's an odd number of args.
}

// ForResource returns logging values for a resource.
func ForResource(object client.Object) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that this is not used anywhere in the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but should be, see you comment above about the error context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

return err
}
}

if err := LogDiff(a.optionalLog, current, obj); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that diff compute is expensive, could we avoid doing it twice, in case when optionalLog is set? It is computed by client.MergeFromWithOptions(current, client.MergeFromWithOptimisticLock{})) anyway, so perhaps we can create a patch wrapper around it to cache the computed value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. done

@sttts sttts force-pushed the sttts-consistent-apply branch from f6b7ab7 to f9356f8 Compare September 1, 2023 14:27
@sttts
Copy link
Contributor Author

sttts commented Sep 1, 2023

Addressed comments.

Also extended tests.

func (a *APIUpdatingApplicator) Apply(ctx context.Context, obj client.Object, ao ...ApplyOption) error {
if obj.GetName() == "" && obj.GetGenerateName() != "" {
log := a.log.WithValues(logging.ForResource(obj))
log.Info("creating object")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps debug level?

Copy link
Contributor Author

@sttts sttts Sep 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is intentional. This is progress output, not debug.

Note: there is no output if nothing happens, i.e. in the steady state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, would this mean we would have one log line per created object during their creation? Which I believe would still be too verbose for default logging and not inline with the existing logging behavior in XP.

Also, there is another client.Create line below, why don't we have a similar log line below? Are they different cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there is another client.Create line below, why don't we have a similar log line below? Are they different cases?

fixed.

So, would this mean we would have one log line per created object during their creation?

yes

Which I believe would still be too verbose for default logging and not inline with the existing logging behavior in XP

It's intentionally not inline with the existing logging behaviour. The current behaviour is not useful.

Counter question: why is it ok to log the same in the apiserver, but not where it matters for understanding what a component does?

How many creates or updates do we expect in a reasonably big XP installation on average? My claim: the system is in steady state 99% of the time without any update to the apiserver.

Copy link
Member

@turkenh turkenh Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intentionally not inline with the existing logging behaviour. The current behaviour is not useful.

I believe this deserves some upfront discussion and alignment as it contradicts the current observability guide, for example:

Almost nothing is worth logging at info level. When deciding which logging level to use, consider a production deployment of Crossplane reconciling tens or hundreds of managed resources. If in doubt, pick debug. You can easily increase the log level later if it proves warranted.

And I personally would agree with the current approach we're following as I believe events and conditions are better fit for emitting "per instance" information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have moved the logging parts into #532. Please continue discussion there. And yes, we need a bigger discussion around what to log.

I believe events and conditions are better fit for emitting "per instance" information.

I disagree. Neither events nor conditions are for CRUD or reconcile logging. Logs are lower level, component level, not API level.

pkg/resource/api.go Outdated Show resolved Hide resolved
log := a.log.WithValues(logging.ForResource(obj))
log.WithValues("diff", string(patchBytes)).Info("updating object")

return a.client.Update(ctx, obj)
Copy link
Contributor

@pedjak pedjak Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wonder in case of error, would be able to see something in logs and correlate it with the previous log line? Also, do we really want to emit log line on INFO?

Should we log here in case of an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the clients produce errors with context. No need to add another one.

Error processing and error logging I leave to the caller. controller-runtime will also log if the error is passed up. If the error is expected, the caller will do the thing and not log. We don't know here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, I would like to see this in action in a proof PR in Crossplane.

// This only works with a desired object of type *unstructured.Unstructured.
//
// Deprecated: replace with Server Side Apply.
func AdditiveMergePatchApplyOption(ctx context.Context, current, desired runtime.Object) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pedjak @negz this should get us over the line in Crossplane until we have SSA implemented.

@sttts sttts force-pushed the sttts-consistent-apply branch 2 times, most recently from 7301d90 to bf9437f Compare September 2, 2023 20:09
@turkenh
Copy link
Member

turkenh commented Sep 4, 2023

Or is it instead intended to allow us to support deleting object fields, since it now creates a proper merge patch?

Yes, it is.

But this of course means that it wipes everything not in the passed object. E.g. if you override the spec in the composition controller code to be equal to that in the Composition, it will wipe everything that a provider adds to the spec. To work around that we have to switch the compisition controller to server side apply. But that means that we will get rid of the patching applicator in that controller and talk directly via SSA to the server.

The logic in the composite controller (also in the claim controller) heavily relies on additive behavior currently. Hence, it wouldn't be possible to switch to the patching applicator introduced in this PR without a good amount of refactoring/testing, which could be a similar amount of effort as switching to Server Side Apply. IIUC, this will work quite similar to the extract flow mentioned here.
So, I am wondering if it is worth the effort to switch to this or if should we simply invest in Server Side Apply instead 🤔

}
}

// NOTE(hasheddan): we must set the resource version of the desired object
// to that of the current or the update will always fail.
m.SetResourceVersion(current.(metav1.Object).GetResourceVersion())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now update call fails without a resource version:

metadata.resourceVersion: Invalid value: 0x0: must be specified for an update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the object passed to the applicator here is based on a Get, i.e. has a resource version. Any other use of the applicator is invalid.

Copy link
Member

@turkenh turkenh Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We pass the current (which is a deepcopy of obj) to the Get. So, the object passed to Update does not have a resource version.

Copy link
Member

@turkenh turkenh Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, you mean "should be based on a Get" ? Confused because, currently this is not the case for most of the usages of Apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that AdditiveMergePatchApplyOption additively merges. This means that the RV from current is copied into desired. Hence, we always have a RV if Get has been successful. And if it hasn't, we create object. So we should be fine.

//
// Deprecated: replace with Server Side Apply.
func AdditiveMergePatchApplyOption(_ context.Context, current, desired runtime.Object) error {
u, ok := desired.(*unstructured.Unstructured)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

desired object is usually our wrapper types, e.g. composed.Unstructured or composite.Unstructured, so would fail here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrTooManyWrappers. Fixing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Dr. Stefan Schimanski <[email protected]>
@sttts
Copy link
Contributor Author

sttts commented Sep 4, 2023

The logic in the composite controller (also in the claim controller) heavily relies on additive behavior currently. Hence, it wouldn't be possible to switch to the patching applicator introduced in this PR without a good amount of refactoring/testing, which could be a similar amount of effort as switching to Server Side Apply. IIUC, this will work quite similar to the extract flow mentioned here.
So, I am wondering if it is worth the effort to switch to this or if should we simply invest in Server Side Apply instead 🤔

Hence, AdditiveMergePatchApplyOption to make this old behaviour explicit. We can fix the compositor with SSA in a follow-up.

@sttts sttts force-pushed the sttts-consistent-apply branch from 1b63ad4 to 79f1338 Compare September 4, 2023 08:34
@sttts sttts force-pushed the sttts-consistent-apply branch from 79f1338 to 6747cf0 Compare September 4, 2023 08:41
type Applicator interface {
// Apply updates the given object to exactly the given state. It conflicts
// if the resource version stored on the apiserver does not match anymore
// the one in the given object.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turkenh this comment and the one above are crucial. Any other behaviour must go into an ApplyOption.

@pedjak
Copy link
Contributor

pedjak commented Sep 4, 2023

The logic in the composite controller (also in the claim controller) heavily relies on additive behavior currently. Hence, it wouldn't be possible to switch to the patching applicator introduced in this PR without a good amount of refactoring/testing, which could be a similar amount of effort as switching to Server Side Apply.

Independently from the strategy, we need to create crossplane e2e tests, covering at least major usecases advocated to users - only then we can switch with confidence to an improved applicator/server-side apply.

@sttts sttts force-pushed the sttts-consistent-apply branch 5 times, most recently from 0aa37e8 to 2bf1830 Compare September 5, 2023 09:33
@sttts sttts force-pushed the sttts-consistent-apply branch from 2bf1830 to 9a0fe90 Compare September 5, 2023 10:06
@lsviben
Copy link
Contributor

lsviben commented Sep 3, 2024

Hey @sttts , is this PR still relevant?

@sttts
Copy link
Contributor Author

sttts commented Sep 3, 2024

The applicators are semantically as broken as before. So yes, it's relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pkg/resource: applicator breaks optimistic concurrency and overwrites values
5 participants