Hi all,
This is my first message here, and I have a question regarding Principle 1, following a discussion in our company that, to keep it short, revolves around "What is GitOps?"
For the waaay longer version (and pardon my grammar, I am not a native English speaker), I'll start by giving a bit of context.
We are a rather big company (~20k people, half of them IT engineers) transitioning to GitOps. As you can imagine from this description, we have some strong processes to comply with, a few central empowered teams, CI/CD pipelines, and fallback capabilities not triggered by CI/CD. The reason for the last one, before you ask, is that a CI/CD pipeline can take several hours, and in case of a critical incident we must resolve it very quickly - to give an idea, for some of our services 15 minutes is already too long.
When you mix that with GitOps, a few things immediately come to mind regarding Principle 1 (for Principles 2, 3, and 4, let's say we have them covered thanks to Git + ArgoCD).
The first realization is that when you change the software source code, you never express the new desired state of the system yourself. The CI/CD pipeline bakes the new software version and expresses the new desired state on your behalf.
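To make that concrete, here is a minimal sketch of such a pipeline step: after baking the image, it rewrites the tag in the configuration repository and pushes the commit, so the desired state ends up declared in Git without a human ever writing it. The repository layout, file names, and tag convention are purely illustrative assumptions, not our actual setup.

```python
import re
import subprocess
from pathlib import Path

def update_desired_state(config_repo: Path, service: str, new_tag: str) -> None:
    """Rewrite the image tag for `service` in the config repo and push the commit."""
    manifest = config_repo / "apps" / service / "deployment.yaml"  # assumed layout
    text = manifest.read_text()
    # Replace e.g. "image: registry.example.com/orders:1.4.2" with the new tag.
    text = re.sub(
        rf"(image:\s*\S+/{re.escape(service)}):\S+",
        rf"\g<1>:{new_tag}",
        text,
    )
    manifest.write_text(text)

    # Commit and push: this is the moment the new desired state gets expressed in Git.
    subprocess.run(["git", "-C", str(config_repo), "add", str(manifest)], check=True)
    subprocess.run(
        ["git", "-C", str(config_repo), "commit", "-m", f"{service}: deploy {new_tag}"],
        check=True,
    )
    subprocess.run(["git", "-C", str(config_repo), "push"], check=True)
```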
As I said earlier, we have strong processes in place, such as the infamous change management, so for the rest of the discussion I would like to distinguish two parts of the CD process, to be sure we all have the same understanding:
- The deployment: alter one given environment (to simplify, either test, staging, or prod) and ensure the change on that environment is approved and communicated. It is also the thing that you fall back on, as part of the standard procedure to resolve an incident.
- The release process: bake the new version and promote it up to prod. Release obviously uses Deploy (several times), and if the release process can also auto-approve the change on each environment because it is a standard one, that is even better (and necessary to achieve CD); a rough sketch of how the two fit together follows this list.
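Here is how I picture these two parts fitting together; the function names, the environment sequence, and the approval hook are assumptions for illustration only.

```python
ENVIRONMENTS = ["test", "staging", "prod"]  # assumed promotion order

def deploy(service: str, version: str, env: str, approved: bool) -> None:
    """Alter one given environment, provided the change there was approved."""
    if not approved:
        raise PermissionError(f"change of {service} to {version} in {env} is not approved")
    # Here the desired state of that one environment would be updated in Git,
    # and the reconciler would take it from there.
    print(f"desired state updated: {service}={version} in {env}")

def is_standard_change(service: str, version: str, env: str) -> bool:
    """Placeholder: in reality this would query the change-management process."""
    return True

def release(service: str, version: str) -> None:
    """Bake the new version once, then promote it through every environment."""
    for env in ENVIRONMENTS:
        # Auto-approving standard changes per environment is what makes true CD possible.
        deploy(service, version, env, approved=is_standard_change(service, version, env))
```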
So far so good, I hope; I think we are still within a rather common pattern.
And that is the starting point of our discussion: should we care (from a GitOps Principle 1 perspective) which part of the automation, release or deploy, computes the new desired state? It is all automated anyway.
As long as things stay fully automated and always go through the CI/CD pipeline, I agree: we do not care.
But realistically, "shit happens", and sometimes it is simply that "this one is too risky". So you have to do a manual deployment once in a while. Now, knowing that when you request such a deployment your sole gatekeeping is a human review and approval - so the remaining automation will only accelerate any mistake you make, not prevent it - should it matter that you, the human, do not declaratively express the new desired state of the system, as long as the tooling behind the deployment does it for you?
To give a more concrete example of what this looks like: instead of the usual pull request on the repo containing the system definition, you only ask a tool to deploy the new version of a given service (which is only a subpart of the system) to a given environment (whatever it is), and the tool figures out which change to make in the Git repo and performs it directly. Then ArgoCD reconciles the system.
A bit like what Flipt presented at the last GitOpsCon Europe, except that in my case the manual review and approval are done before saving the "cup edit", and saving directly git-pushes - no PR.
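In case it helps, here is a sketch of the flow we are debating, under the same illustrative assumptions as the pipeline sketch above: the human only fills a deployment request and approves it inside the tool; the tool derives the matching change to the system definition and pushes it straight to the branch ArgoCD watches, with no pull request.

```python
from dataclasses import dataclass

@dataclass
class DeployRequest:
    service: str      # only a subpart of the whole system
    version: str
    environment: str  # test, staging, or prod

def manual_deploy(request: DeployRequest, approved_by: str) -> None:
    """The requester never writes the desired state; the tool derives and pushes it."""
    if not approved_by:
        # The human review & approval is the sole gatekeeping in this flow.
        raise PermissionError("deployment request was not approved")
    # The tool, not the requester, figures out which change to make in the Git repo
    # and pushes it directly (no PR), with the same mechanics as the earlier sketch.
    push_desired_state_change(request)
    # From here on, ArgoCD reconciles the system against the new desired state.

def push_desired_state_change(request: DeployRequest) -> None:
    """Placeholder for the direct git commit + push of the derived state change."""
    print(f"git push: {request.service}={request.version} in {request.environment}")
```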
Some people in my company think it does not matter and that it is still GitOps. Worth mentioning: those people are not "noobs"; I consider they have a solid DevOps and GitOps culture (probably more than me) and they regularly attend major conferences.
On my side, I initially thought that not having the new desired state expressed before the deployment request was definitely not GitOps! But based on the searching I have done over the past couple of weeks, I could not find anything that would exclude this practice from being GitOps. To be fully precise, I also did not find anything mentioning this practice. But just because no one seems to have thought about something yet does not mean it is wrong, right? :)
The closest things I found more or less related to this topic are:
- Something that would go against this model: the Space-Age GitOps presentation given during ArgoCon Europe 2024. But this proposal goes even further, positioning Git as THE interface for GitOps.
(I searched beyond the CNCF and OpenGitOps resources but, yeah, so far I truly did not find anything else that could help...)
I hope my message will trigger some discussion here too, so that we can enrich either the definition or, if this practice is valid, the GitOps patterns: #177
And I'll start by sharing my personal view and why I instinctively considered the suggested model as "not GitOps", before reading the current OpenGitOps definition.
IMHO, GitOps is primarily done for humans. Machines love imperative, humans prefer declarative. What would be the point of declaratively expressing the desired state of the system if, when you need it the most (manual, risky deployment), the future desired state of the system is not yet declaratively expressed? You do not do declarative for the machine :)
You could answer that at least you can see the desired state computed by the deployment tool, but I do not see the point. I consider the tool handling your deployment a black box: it has some characteristics, but it is a black box nevertheless; what is inside does not exist. Input, output. That is it. For example, 99% of the people using ArgoCD know its features and characteristics, a few know how to operate it, but maybe 0.001% know how it works internally: black box. It takes a declarative state as input; the output is that the system is continuously reconciled against this desired state. How does it do it? As a user, that is none of my business.
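To illustrate the black-box view, the input/output contract I care about as a user boils down to a loop like the one below; this is a caricature of a reconciler, not how ArgoCD is actually implemented.

```python
import time

def reconcile_forever(read_desired_state, read_actual_state, apply_changes, interval=30):
    """Continuously drive the system toward whatever desired state Git currently declares."""
    while True:
        desired = read_desired_state()   # input: the declaratively expressed desired state
        actual = read_actual_state()
        if desired != actual:
            apply_changes(desired)       # output: the system converges on the desired state
        time.sleep(interval)             # how it does this internally is none of my business
```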
Of course, during a critical incident you would still be able to open the box, access the declaratively expressed desired state generated by the deployment tool, check what is wrong with it, and even fix it directly. But that is life during an incident here: you follow the procedure until the procedure no longer suffices, and then you start trying all those exotic things you are not supposed to do, because they are the last remaining option.
So, for me, if you only access this desired state when an incident occurs, I think this principle fails to contribute to the initial goal of GitOps: simplifying our lives and avoiding incidents. But maybe I missed some other important benefits this principle brings? Or gave it too much importance? (Having this desired state during an incident is already a benefit, after all.)
OK, that was a long first message: you know the context, you - I hope - get the question, and you have my 2 cents (biased, obviously, otherwise I would not ask for other opinions to balance mine :) ). Up to you!