Hello. If you are reading this, you are likely interested in automating CloudBees CI. Please follow up with your CloudBees contacts (CSM, Account Exec, PS, etc.), as there are newer, better, and most importantly, supported ways of achieving the goals that this solution set out to solve nearly two years ago.
Many of us in the Services organization at CloudBees have worked closely with customers to provide guidance on designing, implementing, and operating their installation of CloudBees CI (CBCI, formerly known as CloudBees Core). While no two customers are alike, there are some approaches that seem to work better than others. This Opinionated Approach is based on several goals and guiding principles. If those are acceptable, this should help automate the operation.
This is part specification, and part documentation (and justification) of a reference implementation. If the goal is "everything as code" as the title claims, we had better provide some code!
The main vision is to showcase what's possible. Getting to "everything as code" here does mean there are some hard stops. Not everything in Jenkins or CloudBees CI is ready to be declared somewhere in a manifest file in git. Many parts of Jenkins still want their source of truth to be a filesystem. However, with the recent GA release of the CloudBees CasC for Masters feature, we now have a supported approach for configuring masters. Most of the components for this vision are now available. This is an attempt to put them together.
And fill some gaps.
First thing to note: this is for "CloudBees CI Modern", the Kubernetes-based variant, and not "CloudBees CI Traditional". Kubernetes makes much of this possible, as it helps abstract many of the infrastructure concerns (though, they are concerns, indeed). Most of these opinions and approaches could be reworked into a Traditional install — but providing a reference implementation for it is less attractive, as the diversity in the underlying infrastructure, configuration management and other disparate tooling, and operation of it all, is much, much greater. This is part of the appeal of Kubernetes.
CJOC, OC = Operations Center
MM = Managed Master
TM = Team Master
JCasC = OSS Jenkins configuration-as-code plugin/solution
CasC = CloudBees CasC for Masters (CloudBees-specific solution built on top of JCasC)
- A single file to declare an entire installation.
- Updating this single file is the mechanism to affect the state of the cluster.
- A recognition that CloudBees CI is never the only piece of software installed into a single cluster.
- No restarts of CJOC or a master to ensure an install is "complete". Only exception here is when Jenkins just naturally requires a restart (e.g. after a plugin installation).
- No manual interventions during install/upgrade.
- No deriving new docker images.
- No forks of an upstream helm chart.
- Uses only Jenkins/CloudBees CI/Kubernetes/helm primitives and templating via Go Templates in helm and helmfile. We augment and fill gaps where we have to (pre-pulling extra plugins on CJOC, JCasC on CJOC, Groovy scripts where JCasC doesn't work, etc.)
- Favor self-configuration. As an example, we prefer to mount a groovy script into a master so that it boots up and configures RBAC itself, versus booting the master and subsequently calling its REST API.
- No admin access to OC or Masters for anyone outside the team that operates this installation. All RBAC is programmatically configured and all access is assigned from the level of a folder inside a master.
- No one should ever need to go into the CJOC UI (and possibly never a Master's UI).
- Self-servicing a master should be simple and secure.
- Make decisions that maintain the supportability of the product, while enabling functionality that might not be present out of the box.
- Scaling, and testing CloudBees CI at scale, is a first-class concern.
- Begin treating masters like cattle.
What are the gaps we're filling? How difficult would it be to reproduce these unsupported automations manually if support required it?
Note that many of these are easy to reproduce individually. Taken together, though, it could take an engineer a good amount of time to effectively rebuild an environment manually. Perhaps that illustrates the benefits.
Currently you cannot install CloudBees CI without human interaction. There are several components that require custom automation:
First things first, we skip the installation wizard with a CLI flag (`-Djenkins.install.runSetupWizard=false`), as it is a manual interface.
We apply a license with a groovy init script. There are other approaches to this, but in this case the groovy script is super simple. While groovy is not supported, this relatively innocuous script is probably of little harm.
We use a special flag (`-Dcb.IMProp.fsProfiles`) and mount a specific file to tell the Operations Center to install a certain set of plugins from the envelope when booted.
The OC only allows you to install tier 3 plugins from its Update Center after boot, so having these installed programmatically (again, without a restart) requires custom automation. We achieve this by detecting the version of Operations Center ahead of time (via the helm chart app version) and, using that version, downloading the plugin directly from that UC and placing it on a PV that will be mounted into the container. This process is contained in the `oc-extra-plugins` chart.
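Conceptually, the chart's job boils down to something like the following sketch (the update center URL layout, the plugin, the version, and the target path are assumptions for illustration, not the chart's actual code):

```bash
#!/usr/bin/env bash
# Sketch of the oc-extra-plugins idea: resolve a plugin version for the current
# Operations Center version, download the .hpi, and drop it on a PV that the
# CJOC container later mounts. All names and the URL layout are illustrative.
set -euo pipefail

PLUGIN="prometheus"
VERSION="2.0.8"                                                       # resolved from the UC payload
UC_DOWNLOAD="https://jenkins-updates.cloudbees.com/download/plugins"  # assumed layout

wget -q "${UC_DOWNLOAD}/${PLUGIN}/${VERSION}/${PLUGIN}.hpi" \
  -O "/extra-plugins/${PLUGIN}.hpi"   # placeholder path on the pre-provisioned PV
```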
If you want to automate the setup of various System Configuration of the underlying Jenkins (and its plugins) for the Operations Center, you have two options: 1) Groovy scripts, or 2) JCasC. Both are unsupported.
The general direction of the product is moving toward JCasC support in the OC. While it's not yet supported, it is intended to be. Therefore my decision here was to use JCasC where it works now (e.g. Auth Realm setup), and Groovy scripts where it doesn't (e.g. enabling RBAC and its initial configuration). Hopefully this parlays into JCasC support when it lands, with fewer changes needed.
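For example, a minimal JCasC snippet for an LDAP auth realm on the OC might look like this (the host, DNs, and the secret variable are placeholders for whatever the openldap chart provides, not values from this repo):

```yaml
# jenkins.yaml (JCasC) -- illustrative auth realm configuration only
jenkins:
  securityRealm:
    ldap:
      configurations:
        - server: "ldap://openldap.svc.cluster.local:389"   # placeholder host
          rootDN: "dc=example,dc=org"
          userSearchBase: "ou=people"
          userSearch: "uid={0}"
          groupSearchBase: "ou=groups"
          managerDN: "cn=admin,dc=example,dc=org"
          managerPasswordSecret: "${LDAP_ADMIN_PASSWORD}"   # injected, not hard-coded
```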
Typically we provision each individual MM via the Operations Center Managed Master provisioning UI. This requires a human to click buttons and type the desired configuration of each master into the specific inputs, then boot it.

We solve this by implementing a "shim" (a groovy script) that interfaces with the underlying Operations Center master provisioning plugin. It attempts to reproduce the same method calls the UI makes to create+start or update+restart a MM.
The goal is to define a master as code, then define ALL masters as code, then use that definition of state to allow some other process to ensure that state is reconciled in the runtime. If this sounds familiar to you, you might understand the Operator/Controller pattern in Kubernetes. This is precisely the direction taken here. In the interests of saving time and effort as a PoC, I have used a very simple operator framework to trigger the shim script when the configuration of the masters changes:
https://github.com/flant/shell-operator
https://github.com/kyounger/shell-operator-derivatives
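A shell-operator hook is just an executable: called with `--config` it declares what to watch, otherwise it reacts to the event. A minimal sketch of that idea (the ConfigMap name and the trigger command are illustrative, not this repo's actual hook):

```bash
#!/usr/bin/env bash
# shell-operator hook: watch the master-definitions ConfigMap and re-run the
# provisioning shim when it changes. Names and paths are illustrative.
if [[ "${1:-}" == "--config" ]]; then
  cat <<EOF
configVersion: v1
kubernetes:
- apiVersion: v1
  kind: ConfigMap
  executeHookOnEvent: ["Added", "Modified"]
  nameSelector:
    matchNames: ["master-definitions"]
EOF
else
  # React to the change, e.g. call the groovy shim against the OC.
  ./trigger-master-provisioner-shim.sh
fi
```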
The new CloudBees CasC for Masters feature is what ultimately enables this approach. The CasC bundle needs to be defined on the OC before the MM is instantiated. Additionally, for Tier 3 plugins, you are required to manage which versions from the UpdateCenter are going to be installed, as well as any transitive dependencies for that plugin and their versions.
Handling these setups by hand proves the need for this automation; it's easily fat-fingered.
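For orientation, a CasC bundle is a small set of yaml files roughly along these lines (plugin choices and versions here are examples only, not recommendations):

```yaml
# bundle.yaml -- points at the other files in the bundle
apiVersion: "1"
version: "1"
id: "base-master"
description: "Example bundle for a managed master"
jcasc:
  - "jenkins.yaml"
plugins:
  - "plugins.yaml"
catalog:
  - "plugin-catalog.yaml"
---
# plugins.yaml -- which plugins to install on the master
plugins:
  - id: "prometheus"
---
# plugin-catalog.yaml -- tier 3 plugins pinned to versions taken from the UC
type: "plugin-catalog"
version: "1"
name: "base-master-catalog"
displayName: "Base master catalog"
configurations:
  - description: "tier 3 plugins"
    includePlugins:
      prometheus: {version: "2.0.8"}   # example version
```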
TODO: elaborate more about how the templates work.
Tier 3 plugins require a different approach. The entire MM Update Center payload for the current version is downloaded. This allows us to use that payload as an environment values datasource in helmfile, which lets us determine the version to specify in `plugin-catalog.yaml` programmatically. This is literally the same thing Jenkins does when you click through the UI to install a plugin. Sadly, solving the transitive-dependencies problem is not yet part of this solution.
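As a sketch of that mechanism (the file names and the lookup path are assumptions, not the repo's actual templates), the downloaded payload can be read in a helmfile values template and each plugin's version looked up:

```yaml
# values/plugin-catalog.yaml.gotmpl -- illustrative sketch only.
# Assumes the MM update center payload was fetched and converted to yaml
# (e.g. wget + yq) into generated/update-center.yaml before helmfile runs.
{{- $uc := readFile "generated/update-center.yaml" | fromYaml }}
pluginCatalog:
  includePlugins:
    prometheus:
      version: {{ (index $uc.plugins "prometheus").version | quote }}
```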
Any item in a MM will need to be defined as code. Currently we use job-dsl to create folder structure and GH org folders.
Another item to note: an "AppID" equates to a specific application identifier. This seems to be common in large, well-controlled organizations. We define it as a first-class citizen of the solution. If you define an AppID to reside on a master, then it will get a folder created, associated RBAC applied, and a GitHub Org folder inserted into it, linked back to the expected location for that AppID's repos.
Currently we use a groovy script to apply RBAC to folders that are expected to be there.
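A job-dsl sketch of what gets generated per AppID (the AppID, GitHub org, and credentials id below are hypothetical):

```groovy
// Illustrative job-dsl: one folder per AppID with a GitHub org folder inside.
// 'app-1234', 'example-org', and 'github-app-creds' are hypothetical names.
folder('app-1234') {
    displayName('app-1234')
    description('All CI for AppID app-1234; RBAC is applied to this folder separately.')
}

organizationFolder('app-1234/repos') {
    description('Scans the GitHub org for repositories belonging to app-1234')
    organizations {
        github {
            repoOwner('example-org')
            apiUri('https://api.github.com')
            credentialsId('github-app-creds')
        }
    }
}
```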
The intention is that everything is as code. There should be no need to manually configure anything via the UI of CJOC, nor a master.
Ultimately, the flow of creating everything will be:
git clone ...
vim cloudbees-ci.yaml #where you specify cluster-specific details. vim not required ;)
helmfile sync
And once installed, repeated use of:
vim cloudbees-ci.yaml
helmfile apply
to adjust cluster state. This last process is also very easily wrapped into a CI/CD pipeline.
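For example, a minimal declarative pipeline wrapping that flow might look like this (the credential id and branch name are placeholders, not part of this repo):

```groovy
// Illustrative Jenkinsfile wrapping the helmfile flow; ids/branches are placeholders.
pipeline {
    agent any
    environment {
        // hypothetical credential id holding the abstracted password
        ENV_ABSTRACTED_PW = credentials('env-abstracted-pw')
    }
    stages {
        stage('Diff') {
            steps {
                sh 'helmfile diff'
            }
        }
        stage('Apply') {
            when { branch 'main' }
            steps {
                sh 'helmfile apply'
            }
        }
    }
}
```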
- Infrastructure
  - The GCP terraform module for CloudBees CI provides an example of how to provision a Kubernetes cluster. It creates all the necessary GCP resources, the GKE cluster, and its NodePools.
  - GCP/GKE are not required. Any Kubernetes cluster 1.15 or later should [theoretically] work with this approach. Currently only tested in GCP/GKE.
  - TODO! This component is not published yet, but can be shared if needed.
- Applications
  - All applications (and supplemental resources) are packaged as helm charts. `helmfile` declaratively manages all the helm chart installations.
  - Helm Charts
    - CloudBees CI (`cloudbees-core`) helm chart
      - CJOC configuration and plugins
      - All components of CJOC are declaratively managed as code.
      - Masters are defined as code.
      - Masters can be templated.
      - Item definitions
    - cert-manager
      - Provides TLS certificates via Let's Encrypt, or any other ACME-compliant Certificate Authority
    - nginx-ingress
      - The only supported ingress controller. Works well, good support and documentation.
    - prometheus
    - grafana
    - openldap
      - Probably the most common authentication realm used with CloudBees CI. Also open source and easily configured. A good pick for a reference implementation.
    - helper charts via the incubator/raw chart — these will be explained in detail later.
      - oc-extra-plugins
      - cm-cluster-issuer
      - master-definitions
Helm has become the de facto standard for packaging applications deployed into Kubernetes. Under the hood, a helm chart is basically a collection of yaml templates, and the helm (v3) client is basically a client-side yaml templating engine coupled with a deployment mechanism. Yaml templating is important because it allows reuse of top-level values as a sort of "public API" for the chart. The chart can then be reused, the underlying implementation refactored, or backward/forward compatible changes introduced.
This is pretty fantastic until you start needing to:
- Define all aspects of the helm upgrade as code (including chart version, repo locations, namespaces, etc.).
- Operate the helm installation of more than a few charts across multiple environments (i.e. hello makefiles and shell script loops).
- Run these upgrades through a CI/CD pipeline.
- Include certain charts (a tool-specific UI) in one environment (test) and not in another (prod).
- Coordinate value changes across multiple charts. In particular, if there are calculations needed from one value in one chart to produce another value in another chart.
- Template the values of a helm release.
- Define a dependency graph for your charts.
- Augment a chart with custom resources.
There are a few tools/approaches out there that can help solve some of these issues. Let's look at some of the choices here to understand why I picked helmfile.
No, thanks.
This works fairly well, especially for very simple needs — the CBCI helm chart even aggregates two other charts (nginx-ingress & cloudbees-sidecar-injector) — and provides some of the needed functionality for coordination across multiple charts.
Downsides are that `helm list` only shows one release, and that release can only go into a single namespace. Each time you do a deployment, you update all the charts. Each time you do a rollback, you have to roll back every chart. You are usually still forced into using some kind of external tool for CI/CD pipelines to manage the `helm upgrade` flags as code.
For me, part of the attractiveness of helmfile was that it is basically a "superset" of helm: all the same yaml templating works as you'd expect. Helmsman does not have templating and uses TOML as its file format; since we are already adding another tool, I would prefer not to add another data format as well. To be fair, I didn't use it, so it might be great!
I felt these were a bit too heavy for what we're trying to accomplish. Additionally, the Helm Operator has to be installed... with helm. This sort of defeats the purpose here, in my mind. However, we wholeheartedly recommend a GitOps approach.
As I mentioned above, helmfile can be thought of as a "superset" of helm. It's largely an additional Go Templating engine layer that produces a list of helm releases + values files via that templating engine. If you're comfortable with developing helm charts, helmfile has a very short learning curve.
My experience with this tool is that it seems to check all the boxes. My only complaint is that it is definitely "in active development". I wouldn't recommend upgrading it willy-nilly, and definitely click the "Watch" button on GitHub to be sure you are keeping abreast of changes.
Helmfile is a good approach to operating helm installs via the `apply` command in a CD pipeline.
What about the helm or helmfile providers that are available for terraform (or using any other IaC tool)?
That is a reasonable approach. The perspective here is that the applications are managed separately from the infrastructure. But sometimes that overlaps. Nothing wrong with having terraform call helmfile if that works for you. There is also a terraform-helmfile-provider that might be worth looking into if you have interest in really hooking these tools together.
A single command of `helmfile apply` (possibly with the added `-e` flag, to specify an environment) is the goal. This enables effective long-term operation of the cluster and forces this to be the only imperative interface. The terraform module outputs a yaml file that can be fed as an external environment values file to helmfile. However, we don't expect much of the state of the cluster to change to the degree that hooking these tools together would provide long-term continuous benefit.
Helmfile was a natural fit and works well for this use case. These other tools are great and have their place, and might do a better job given a different set of requirements. If you like them, use them! This is an opinionated approach, so sticking with that theme: helmfile it is.
Helmfile also allows us to fill some gaps without forking the `cloudbees-core` helm chart (or others). E.g. defining masters and being able to install resources specifically for them, adding the extra plugins to the OC, etc.
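To make the shape of that concrete, here is an illustrative helmfile excerpt (the repository URLs are the public chart repos; the release names, namespaces, and file paths are placeholders, not this repo's actual layout):

```yaml
# helmfile.yaml -- illustrative shape only; names, namespaces, and paths are examples.
repositories:
  - name: cloudbees
    url: https://charts.cloudbees.com/public/cloudbees
  - name: incubator
    url: https://charts.helm.sh/incubator

releases:
  # "gap filler" deployed via the raw chart: extra OC plugins staged onto a PV
  - name: oc-extra-plugins
    namespace: cbci
    chart: incubator/raw
    values:
      - values/oc-extra-plugins.yaml.gotmpl

  - name: cloudbees-ci
    namespace: cbci
    chart: cloudbees/cloudbees-core
    values:
      - values/cloudbees-ci.yaml.gotmpl
    needs:
      - cbci/oc-extra-plugins   # ensure the plugin PV exists before CJOC starts
```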
Not all of this is supported by CloudBees. Part of the intention behind this effort, again, is to showcase what is possible, not provide a 100%-supported solution based on 100%-supported components. That is not currently possible. Examples of what is not supported would be any of the Tier 3 plugins (e.g. `job-dsl`, `prometheus`), any of the groovy scripts, putting the `configuration-as-code` plugin on CJOC, etc. So, user beware.
If you do have a bug that you think is attributable to the underlying product, and not a filled gap, then the best way to get support on it is to replicate the bug in a system that doesn't use the gap fillers. This means installing with no groovy auto-configuration, no JCasC on OC, provisioning masters manually, creating items manually, configuring RBAC manually, etc.
You can decide if the benefits of using this fully-automated system outweigh the costs of potential out-of-support pitfalls.
- Ensure you have a GKE cluster created and set as your current context.
- Currently the installer expects a static ip for the nginx-ingress-controller LoadBalancer. This needs to be specified in the initial values.
- DNS host entry that points to the static IP.
- These CLI tools are used and need to be installed: `helm` (v3), `helmfile`, `jq`, `yq`, and `wget` (used for fetching tier3 plugins).
- The `helm-diff` plugin must be installed.
- Clone this repo.
- `cp cloudbees-ci-template.yaml cloudbees-ci.yaml`
- Edit any values you need to, inserting license, defining master templates, masters, etc.
- Run `helmfile template` to see if all is configured correctly. Not explicitly necessary, but a nice check before running the next step.
- Ensure an environment variable named `ENV_ABSTRACTED_PW` is set to some random secure value and is exported.
For demonstration purposes this is how we set all passwords in a reasonably secure manner. On the TODO list is implementing external credentials, which is clearly needed for any sort of proper implementation.
- Run `helmfile sync` (or `ENV_ABSTRACTED_PW=somerandomvalue helmfile sync` if you don't have/want the env var exported in your shell). We have to use `sync` instead of `apply` on the first run in case there are any CRDs that need to be installed into the cluster — these will fail on an `apply`/`diff`, because the generated resources can't be validated against their required definitions.
- Wait for the install to finish and log in.
- Make an edit to your `cloudbees-ci.yaml` file.
- Run `helmfile apply` to see the changes apply.

`helmfile apply` is used after the initial install, unless you are adding CRDs AND those CRDs are used by a chart in the same operation. Our first install is exactly this, because of the cert-manager CRDs that need to exist.

- `cloudbees-ci.yaml` is ignored in git.
- The ENV_ABSTRACTED_PW value is what defines all passwords in ldap and a few other places. Once an external credential provider is implemented, this will go away.
- Inheritance works, but there are only three layers:
  - a `default` masterTemplate that all other templates inherit from,
  - a masterTemplate that is defined that can specify deltas on top of the default, and
  - specific master definitions that can specify their own deltas.
- Each master must specify a masterTemplate, even if it is the default.
- The `manyMasters` section allows for massive scaling of masters. Not entirely sure if this is effective for production use, but the intention of including this here is for testing scaling.
- You only need to specify plugins in the list. Each is specified as the key in a map with a value `{version: auto}`. This is required currently, and cannot be changed.
- PluginCatalogs are derived based on the CloudBees UpdateCenter for ManagedMasters (envelope-core-mm) for the current version of the chart/app. If a plugin is not specified in the envelope, it is added to that master's PluginCatalog as the version specified in the UC.
- Transitive dependencies are STILL not calculated. (You try walking that tree in go templates!)
These are properties you can set on the provisioning section of a master template/definition. You have to be careful with some of these.
allowExternalAgents: false, //boolean
clusterEndpointId: "default", //String
cpus: 1.0, //Double
disk: //Integer
envVars //String
domain: "readYaml-custom-domain-1", //String
fsGroup: "1000", //String
image: "custom-image-name", //String -- set this up in Operations Center Docker Image configuration
javaOptions: "${KubernetesMasterProvisioning.JAVA_OPTIONS} -Dadditional.option", //String
jenkinsOptions:"", //String
kubernetesInternalDomain: "cluster.local", //String
livenessInitialDelaySeconds: 300, //Integer
livenessPeriodSeconds: 10, //Integer
livenessTimeoutSeconds: 10, //Integer
memory: //Integer
namespace: null, //String
nodeSelectors: null, //String
ratio: 0.7, //Double
storageClassName: null, //String
systemProperties:"", //String
terminationGracePeriodSeconds: 1200, //Integer
(Technically there is a yaml definition, but that is not accessible since we use it. Merging that might work?)
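To make the shape of a template/definition concrete, an excerpt of `cloudbees-ci.yaml` might look roughly like this (the key names here are approximations of the idea, not the authoritative schema; see the template file in the repo):

```yaml
# Illustrative excerpt of cloudbees-ci.yaml -- key names approximate the idea,
# not the exact schema shipped in cloudbees-ci-template.yaml.
masterTemplates:
  default:
    provisioning:
      cpus: 1.0
      memory: 3072
      disk: 50
    plugins:
      configuration-as-code: {version: auto}
  large:
    provisioning:
      cpus: 4.0      # delta on top of the default template
      memory: 8192

masters:
  - name: app-1234            # hypothetical AppID-aligned master
    masterTemplate: large
    plugins:
      prometheus: {version: auto}
```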
- The master provisioning script is run by a kubernetes job after CJOC starts. There is still a need to determine how best to run this; implementing a simple controller that reacts to changes in the master-definitions configmap would be appropriate. Long-term, we hope that masters are defined by a CRD or another helm chart.
- CasC seems to require a definition of a plugin-catalog in the bundle. You can't leave the `plugin-catalog.yaml` file empty, though. The workaround for now is to make sure you are specifying a plugin catalog for all CascBundleTemplates, even if you aren't adding that plugin in the `plugins.yaml` file.
- Inheritance works for `provisioning`, `plugins`, and any nodes within `jcasc` that are NOT lists. The merging function used considers a list the leaf of the merge tree and will replace it entirely.
TODO: need to work through how to explain each part of the solution
- Evolutionary approach to how I got this to where it is.
- Start small and show how to build it up.
- Maybe point out specific components that are "hard" as-code and explain that gap and the approach to fill it.
- How to use various commands as tooling to help things (hf diff, hf --debug to show values files)
Some fleshed out sections that need to be inserted above where it makes sense:
A common paradigm used by many organizations is to control applications/components that can be delivered into production by requiring a process to request an ID from a governance system. This ID controls the entire lifecycle of the application, as well as authorization and authentication mechanisms, typically via an Identity Provider. The idea is that a new developer or team lead can be onboarded to an application very simply, and the tools (e.g. CloudBees CI, etc.) only accept valid IDs to be included in the CI/CD pipelines. No application can reach production unless it goes through this governance process.
Depending on your level of trust and threat modeling, you may be comfortable running fewer masters. Part of the consideration here is that CloudBees CI Modern runs on Kubernetes and securing workloads in Kubernetes requires us to implement certain design patterns. For example, we put CJOC, each master, and each master's agents in their own respective Namespaces and give them their own Service Accounts (note, this is still TODO). This might seem like overkill to some (and for some it might be), but to ensure masters (and their agents) are isolated from each other, this is required. Additionally, a master's agents typically have a lower trust threshold and therefore are run in their own namespace with their own Service Account.
Masters can be resource constrained. It's not common as long as we're following best practices with regard to pipeline development (i.e. zero or few plugins that affect the pipeline, declarative syntax, etc.), but the potential is there for a team to build a pipeline, and run enough of them concurrently, that the resources consumed by the agents could create a noisy-neighbor situation. Putting each master's agents into their own namespace, and resource constraining that namespace can be useful. Since we're automating all of this, it's also pretty trivial to add in as part of the design.
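A sketch of what that constraint can look like for one master's agent namespace (the namespace name and limits are illustrative):

```yaml
# Illustrative quota for a single master's agent namespace; names/limits are examples.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-quota
  namespace: app-1234-agents
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "30"
```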
Masters are a single point of failure.
If you've run a large Jenkins instance with a lot of disparate usage, you have experienced this: one team needs to upgrade a plugin, or affect some system config, but doing so will break another team's usage of that master. If a single team does need to use very specific plugins (we advise against this, but maybe the fight just isn't worth it 😉), then splitting them off into their own master can be a way to shield other teams from this. Additionally, it is not uncommon for a team to see a plugin installed and just start using it.
- Security
- Stability
- Gitops / Cloud-native mentality
- DR/BR concerns
- Controls / Audit
- Automate everything, so none of this is in people's heads
- Replicate an environment easily to a lower environment for testing.
- Dev/Test/**/Prod environment parity is achievable and provable.
-
overriding cpus seems to break
-
combine master-definitions and master-provisioner charts
-
make master-provisioning-shim output more obvious when something changes?
- Currently running via shell-operator, which has its own issues with logging. Not a priority at the moment.
-
Decide on how to run master-provisioner-shim.groovy. This needs to be triggered on an install (or restart of CJOC?), but shouldn't be solely tied to a CJOC restart. Until piecemeal updates work, run this manually; otherwise every master is restarted and re-bundled when CJOC is restarted 😱.
-
Run as init groovy script on cjoc -- this breaks so many things.
-
Run initially as a k8s job to demonstrate initial install, and capability to run this script from in the cluster, but outside CJOC
-
Approaches:
  - Deploy a simple controller to check when the `master-definition` config map changes and just ping OC with the shim
  - Ops-mm with something that can detect when the configmap changes? On a cron schedule (ugh)?
  - CRAZY: demonstrate how to use a set of CRDs (`MasterTemplate`s, `MasterDefinition`s, etc.)
-
-
Continue to flesh out documentation
- document each gap and how we're filling it
- document each unsupported approach?
- document how to "upgrade master images"
-
AppID fully implemented
- Creating the users/groups in ldap based on appIds
- Switch to using job-dsl folder creation entirely and only have the groovy script apply RBAC groups?
- Update Github tf module
  - add repo `Jenkinsfile`s
- Add job-dsl-script "append"
- Automating Shared library setup
- global shared library
- AppID specific library
-
Multi-namespace
- namespace for just OC
- namespace per master
- namespace per master's agents
- Ensure NetworkPolicy is adjusted here
-
Detect k8s flavor and adjust cloudbees-core install to use that Platform
-
Create a container with all the required tooling.
- Example of how to use a k8s Job to do the deployment?
-
Prometheus + Grafana
- Add helm charts and config
- Monitor each master
- Prometheus plugin
- on CJOC
    - 🔴 BLOCKED. This BREAKS because the `prometheus` plugin is not in the OC UC!
- on masters
- automatic configuration
- on CJOC
- push `https.enabled` down into prometheus/grafana charts
- Authenticate the endpoint
- jcasc configuration for this
- automatically add service account token and store in proper place for prometheus to use
- Explore queries
- Fix ingress issues
- Get grafana configured properly
-
Vault / external credentials
-
TF module for all this
- refactor and publish
-
Local Registry (nexus/artifactory would be more universal, but GCR is right there...) and enforce image pulls from it
-
Networkpolicy
-
Podsecuritypolicy
-
enforce PSP: https://www.terraform.io/docs/providers/google/r/container_cluster.html#pod_security_policy_config
-
no root, no privileged
-
Kaniko + gvisor nodes
- allows for root user access, if required
- need some kind of convention to guarantee that we get a gvisor node when requested
- create a gvisor nodepool with tf in gke
-
-
GKE cluster in a VPC
-
Item creation
- job-dsl for now
- Current plan is a single github-branch-source org created per AppID
- Groovy is possible, too, since we're already creating the AppID folders/rbac on the masters
-
Do we recommend approaches to templating pipelines?
- My recommended approach for templating: https://www.jenkins.io/blog/2017/10/02/pipeline-templates-with-shared-libraries/
-
External agents (e.g. standard windows agents) for a master, how to let a team define this?
-
Consider backup/restore of PVs
-
velero
- with underlying SC snapshotting
- in-process backups
- restore process for this entire system based on a backed-up state
-
CB B/R plugin
-
-
Pipelinepolicy
-
Autoscaling
- MM hibernation
-
Set up CI/CD for this
- PR -> some existing pipeline -> git checkout of repo with change -> run tf apply to create a test env -> hf apply -> manual validation of env
- Process0 problem
- Replicating the entire cluster into another environment (i.e. for testing purposes, DR)
-
Create some "weird" bundles
- A LOT of plugins. See if you can get 300-500 plugins installed this way
-
build logs -- need to ship these off-master
- github reporting plugin
-
scaling section in docs
-
Find a way to walk the dependency tree to not have to specify all transitive dependencies of tier3 plugins
-
idea: create process that defines a "default" master's /core-casc-export/ data and quickly diffs and shows the details only relevant to a master that looks slightly different
-
figure out other metrics/monitoring aside from prometheus?