Cloud-Native-And-DevOps

@@@@🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔@@@@

Kubernetes features (1.16)

What and why of orchestration

There are many computing orchestrators
They make decisions about when and where to "do work"
We've done this since the dawn of computing:Mainframe schedulers, Puppet, Terraform, AWS, Mesos, Hadoop, etc.
Since 2014 we've had resurgence of new orchestration projects because:
1. Popularity of distributed computing
2. Docker containers as a app package and isolated runtime
We needed "many servers to act like one, and run many containers"
An the Container Orchestrator was born
Many open source projects have been created in the last 5 years to:
- Schedule running of containers on servers
- Dispatch them across many nodes
- Monitor and react to container and server health
- Provide storage, networking, proxy, security, and logging features
- Do all this in a declarative way, rather than imperative
- Provide API's to allow extensibility and management

Major container orchestration projects

Kubernetes, aka K8s
Docker Swarm(and Swarm classic)
Apache Mesos/Marathon
Cloud Foundry
Amazon ECS(not OSS, AWS-only)
HashiCorp Nomad
Many of these tools run on top of Docker Engine
Kubernetes is the one orchestator with many distributions

Kubernetes distributions

Kubernetes "vanilla upstream"(not a distribution)
Cloud-Managed distros: AKS, GKE, EKS,DOK...
Self-Managed distros: RedHat OpenShift, Docker Enterprise, Rancher, Canonical Charmed, openSUSE Kubic...
Vanilla installers: Kubeadm, kops, kubicorn...
Local dev/test: Docker Desktop, minikube, microK8s
CI testing:kind
Special builds: Rancher K3s
And Many, many more..(86 as of June 2019)

Kubernetes concepts

Kubernetes is a container management system
It runs and manages containerized applications on a cluster(one or more servers)
Often this is simply called "container orchestration"
Sometimes shortened to Kube or K8s("Kay-eights" or "Kates")

Basic things we can ask Kubernetes to do

Start 5 containers using image atseashop/api:v1.3
Place an internal load balancer in front of these containers
Start 10 containers using image atseashop/webfront:v1.3
Place a public load balancer in front of these containers
It's Black Friday(or Christmas), traffic spikes, grow our cluster and add containers
New release!Replace my containers with the new image atseashop/webfront:v1.4
Keep processing requests during the upgrade;update my containers one at a time

Other things that Kubernetes can do for us

Basic autoscaling
Blue/green deployment, canary deployment
Long running services, but also batch(one-off) and CRON-like jobs
Overcommit our cluster and evict low-priority jobs
Run services with stateful data(database etc.)
Fine-grained access control defining what can be done whom on which resources
Integrating third party services(service catalog)
Automating complex tasks(operators)

Kubernetes architecture

Ha ha ha ha
OK, I was trying to scare you, it's much simpler than that ❤️

Kubernetes architecture: the nodes

The nodes executing our containers run a collection of services:
- a container Engine(typically Docker)
- kubelet(the "node agent")
- kube-proxy(a necessary but not sufficient network component)
Nodes were formerly called "minions"
- (You might see that word in older articles or documentation)

Kubernetes architecture: the control plane

The Kubernetes logic(its "brains") is a collection of services:
- the API server(our point of entry to everything!)
- core services like the scheduler and controller manager
- etcd (a highly available key/value store; the "database" of Kubernetes)
Together, these services form the control plane of our cluster
The control plane is also called the "master"

Running the control plane on special nodes

It is common to reserve a dedicated node for the control plane
- (Except for single-node development clusters, like when using minikube)
This node is then called a "master"
- (Yes,this is ambiguous:is the "master" a node, or the whole control plane?)
Normal applications are restricted from running on this node
- (By using a mechanism called "taints")
When high availability is required, each service of the control plane must be resilient
The control plane is then replicated on multiple nodes
- (This is sometimes called a "multi-master" setup)

Running the control plane outside containers

The services of the control plane can run in or out of containers
For instance: since etcd is a critical service, some people deploy it directly on a dedicated cluster(without containers)
- (This is illustrated on the first "super complicated" schema)
In some hosted Kubernetes offerings(e.g. AKS, GKE, EKS), the control plane is invisible
- (We only "see" a Kubernetes API endpoint)
In that case, there is no "master node"

For this reason, it is more accurate to say "control plane" ranther than "master"

Do we need to run Docker at all?

No!

By default, Kubernetes uses the Docker Engine to run containers
Or leverage other pluggable runtimes through the Container Runtime Interface
We could also use rkt("Rocket") from CoreOS(deprecated)
containerd: maintained by Docker, IBM, and community
Used by Docker Engine, microK8s, K3s, GKE, and standalone; has ctr CLI
CRI-O: maintained by Red Hat, SUSE, and community; based on containerd
Used by OpenShift and Kubic, version matched to Kubernetes

Yes!

In this course, we'll run our apps on a single node first
We may need to build images and ship them around
We can dot these things without Docker
- (and get diagnosed with NIH syndrome)
Docker is still the most stable container engine today
- (but other options are maturing very quickly)

NIH--> Not Invented Here

On your development environments, CI pipelines...:
- Yes, almost certainly
On our production servers:
- Yes(today)
- Probally not(in the future)

More information about CRI on the Kubernetes blog

Interacting with Kubernetes

We will interact with our Kubernetes cluster through the Kubernetes API
The Kubernetes API is (mostly) RESTful
It allows us to create, read, update, delete resources
A few common resource types are:
- node (a machine == physical or virtual -- in our cluster)
- pod (group of containers running together on a node)
- service (stable network endpoint to connect to one or multiple containers)

Pods

Pods are a new abstraction!
A pod can have multiple containers working together
(But you usually only have on container per pod)
Pod is our smallest deployable unit; Kubernetes can't mange containers directly
IP address are associated with pods, not with individual containers
Containers in a pod share loacalhost, and can share volumes
Multiple containers in a pod are deployed together
In reality, Docker doesn't know a pod, only containers/namespaces/volumes

Getting a Kubernetes cluster for learning

Best: Get a environment locally
- Docker Desktop(Win/MacOS), minikube(Win Home), or microk8s(Linux)
- Small setup effort;free;flexible environments
- Requires 2GB+ of memory
Good: Setup a cloud Linux host to run microk8s
- Great if you don't have the local resources to run Kubernetes
- Small setup effort; only free for a while
- My $50 DigitalOcean coupon lets you run Kubernetes free for a month
Last choice: Use a browser-based solution
- Low setup effort; but host is short-lived and has limited resources
- Not all hands-on examples will work in the browser sandbox

Docker Desktop(Windows 10/macOS)

Docker Desktop(DD) is great for a local dev/test setup
Requires modern macOS or Windows 10 Pro/Ent/Edu(no Home)
Requires Hyper-V, and disables VirtualBox

Exercise

Download Windows or macOS versions and install
For Windows, ensure you pick "Linux Containers" mode
Once running, enabled Kubernetes in Settings/Perferences

Check your connection in a terminal

 kubectl get nodes

minikube(Windows 10 Home)

A good local install option if you can't run Docker Desktop
Inspired by Docker Toolbox
Will create a local VM and configure latest Kubernetes
Has lots of other features with it minikube CLI
But, requires spearate install of VirtualBox and kubectl
May not work with older Windows version(YMMV)

Exercise

Download and install VirtualBox
Download kubectl, and add to $PATH
Download and install minikube
Run minikube start to create and run a Kubernetes VM
Run minikube stop when you're done

microk8s(linux)

Easy install and management of local Kubernetes
Made by Canonical(Ubuntu).Installs using snap.Works nearly everywhere
Has lots of other features with its microk8s CLI
But, requires you install snap if not on Ubuntu
Runs on containerd rather than Docker, no biggie
Needs alias setup for microk8s.kubectl

Exercise

Install microk8s, change group permissions, then set alias in bashrc

sudo apt install snapd
sudo snap install microk8s --classic
microk8s.kubectl
sudo usermod -a -G microk8s <username>
echo "alias kubectl='microk8s.kubectl'" >> ~/.bashrc

Web-based options

Last choice: Use a browser-based solution

Low setup effort; but host is short-lived and has limited resources
Services are not always working right, and may not be up to date
Not all hands-on examples will work in the browser sandbox

Exercise

Use a prebuilt Kubernetes server at Katacoda
Or setup a Kubernetes node at play-with-k8s.com
Maybe try the latest OpenShift at learn.openshift.com
See if instruqt works for a Kubernetes playground

shpod: For a consistent Kubernetes experience...

You can use shpod for examples
shpod provides a shell running in a pod on the cluster
It comes with many tools pre-installed(helm, stern, curl, jq...)
These tools are used in many exercises in these slides
shpod also gives you shell completion and a fancy prompt
Create it with kubectl apply -f https://k8smastery.com/shpod.yaml
Attach to shell with kubectl attach --namespace=shpod -ti shpod
After finishing course kubectl delete -f https://k8smastery.com/shpod.yaml

First contact with `kubectl`

kubectl is(almost) the only tool we'll need to talk to Kubernetes
It is a rich CLI tool around the Kubernetes API
- (Everything you can do with kubectl, you can do directly with the API)
On our machines, there is a ~/.kube/config file with:
- the Kubernetes API address
- the path to our TLS certificates used to authenticate
You can also use the --kubeconfig flag to pass a config file
Or directly --server, --user, etc.
kubectl can be pronounced "Cube C T L", "Cube cuttle", "Cube cuddle"...
I'll be using the official name "Cube Control"

kubectl get

Let's look at our Node resources with kubectl get!

Exercise

Look at the composition of our cluster:

kubectl get node

These commands are equivalent:

kubectl get no
kubectl get node
kubectl get nodes

Obtaining machine-readable output

kubectl get con output JSON, YAML, or be directly formatted

Exercise

Give us more info about the nodes:

kubectl get nodes -o wide

Let's have some YAML:

kubectl get no -o yaml

See that kind: List at the end? It's the type of our result!

(Ab)using `kubectl` and `jq`

It's super easy to build custom reports

Exercise

Show the capacity of all our nodes as a stream of JSON objects:

kubectl get nodes -o json | jq ".items[] | {name:.metadata.name} + .status.capacity"

Kubectl describe

kubectl describe needs a resource type and (optionally) a resource name
It is possible to provide a resource name prefix(all matching objects will be displayed)
kubectl describe will retrieve some extra information about the resource

Exercise

Look at the information available for your node name with one of the following:

kubectl describe node/<node>
kubectl describe node <node>

kubectl describe node/docker-desktop

(We should notice a bunch of control plane pods.)

Exploring types and definitions

We can list all available resource types by running kubectl api-resources
- (in Kubernetes 1.10 and prior, this command used to be kubectl get)
We can view the definition for a resource type with:

kubectl explain type

We can view the definition of a field in a resource, for instance:

kubectl explain node.spec

Or get the list of all fields and sub-fields:

kubectl explain node --recursive

e.g.

kubectl explain node
kubectl explain node.spec

Introspection vs. documentation

We can access the same information by reading the API documentation
The API documentation is usually easier to read, but:
- it won't show custom types(like Custom Resource Definitions)
- We need to make sure that we look at the correct version
kubectl api-resources and kubectl explain perform introspection
- (they communicate with the API server and obtain the exact type definitions)

Type names

The most common resource names have three forms:
- singular(e.g. node, service, deployment)
- plural(e.g. nodes, services, deployments)
- short(e.g. no, svc, deploy)
Some resources do not have a short name
Endpoints only have a plural form
- (because even a single Endpoints resource is actually a list of endpoints)

More `get` commands: Services

A service is a stable endpoint to connect to "something"
- (in the initial proposal, they were called "portals")

Exercise

List the services on our cluster with one of these commands:

kubectl get services
kubectl get svc

There is already one service on our cluster: the Kubernetes API itself.

More `get` commands: Listing running containers

Containers are manipulated through pods
A pod is a group of containers:
- running together (on the same node)
- sharing resources(RAM, CPU; but also network, volumes)

Exercise

List pods on our cluster:

kubectl get pods

Namespaces

Namespaces allow us to segregate resources

Exercise

List the namespaces on our cluster with one of these commands:

kubectl get namespaces
kubectl get namespace
kubectl get ns

You know what...This kube-system thing looks suspicious

In fact, I'm pretty sure it showed up earlier, when we did:

kubectl describe node <node-name>

Accessing namespaces

By default, kubectl uses the default namespace
We can see resources in all namespaces with --all-namespaces

Exercise

List the pods in all namespaces:

kubectl get pods --all-namespaces

Since Kubernetes 1.14, we can also use -A as a shorter version:

kubectl get pods -A

Here are our system pods!

What are all these control plane pods?

etcd is our etcd server
kube-apiserver is the API server
kube-controller-manager and kube-scheduler are other control plane components
coredns provides DNS-based service discovery(replacing kube-dns as of 1.11)
kube-proxy is the(per-node) component managing port mappings and such
<net name> is the optional(per-node) component managing the network overlay
the READY column indicates the number of conatainers in each pod
Note: this only shows containers, you won't see host svcs(e.g. microk8s)
Also Note: you may see different namespaces depending on setup

Scoping another namespace

We can also look at a different namespace(other than default)

Exercise

List only the pods in the kube-system namespace:

kubectl get pods --namespace=kube-system
kubectl get pods -n kube-system

Namespaces and other `kubectl` commands

We can use -n/--namespace with almost every kubectl command
Example:
- kubectl create --namespace=X to create something in namespace X
We can use -A/--all-namespaces with most commands that manipulate multiple objects
Examples:
- kubectl delete can delete resources across multiple namespaces
- kubectl label can add/remove/update labels across multiple namespaces

What about `kube-public`?

Exercise

List the pods in the kube-public namespace:

kubectl -n kube-public get pods

Nothing!

kube-public is created by our installer & used for security bootstrapping.

Exploring `kube-public`

The only interesting object in kube-public is a ConfigMap named cluster-info

Exercise

List ConfigMap objects:

kubectl -n kube-public get configmaps

inspect cluster-info:

kubectl -n kube-public get configmap cluster-info -o yaml

Note the selfLink URI:/api/v1/namespaces/kube-public/configmaps/cluster-info

We can use that (later in kubectl context lectures)!

kubectl -n kube-public get configmaps
kubectl -n kube-public get configmap cluster-info -o yaml

What about `kube-node-lease`?

Starting with kubernetes 1.14, there is kube-node-lease namespace
- (or in Kubernetes 1.13 if the NodeLease feature gate is enabled)
That namespace contains one Lease object per node
Node leases are a new way to implement node heartbeats
- (i.e. node regularly pinging the control plane to say "I'm alive!")
For more details, see KEP-0009 or the node controller documentation

Running our first containers on Kubernetes

First things first: we cannot run a container
We are going to run a pod, and in that pod there will be a single container
In that container in the pod, we are going to run a simple ping command
Then we are going to start additional copies of the pod

Starting a simple pod with `kubectl run`

We need to spcify at least a name and the image we want to use

Exercise

Let's ping 1.1.1.1, Cloudflare's public DNS resolver:

kubectl run pingpong --image alpine ping 1.1.1.1

(Starting with Kubernetes 1.12, we get a message telling us that kubectl run is deprecated. Let's ignore it for now.)

Behind the scenes of `kubectl run`

Let's look at the resources that were created by kubectl run

Exercise

List most resource types:

kubectl get all

We should see the following things:

deployment.apps/pingpong (the deployment that we just created)
replicaset.apps/pingpong-xxxxxxxxx (a replica set created by the deployment)
pod/pingpong-xxxxxxxx-yyyyyyy (a pod created by the replica set)

Note: as of 1.10.1, resource types are displayed in more detail.

What are these different things?

A deployment is a high-level construct
- allows scaling, rolling updates, rollbacks
- multiple deployments can be used together to implement a canary deployment
- delegates pods management to replica sets
A replica set is a low-level construct
- makes sure that a given number of identical pods are running
- allows scaling
- rarely used directly
Note: A replication controller is the (deprecated) predecessor of a replica set

Our `pingpong` deployment

kubectl run created a deployment, deployment.apps/pingpong

NAME                       DESIRED     CURRENT      UP-TO-DATE     AVAILABLE  AGE
deployment.apps/pingpong   1           1            1              1          10m

That deployment created a replica set,replicaset.apps/pingpong-xxxxxxxxxxxxx

NAME                                  DESIRED     CURRENT      READY      AGE
replicaset.apps/pingpong-7c8bbcd9bc   1           1            1          10m

That replica set created a pod, pod/pingpong-xxxxxxxxxxxxx-yyyy

NAME                            READY    STATUS      RESTARTS    AGE
pod/pingpong-7c8bbcd9bc-6c9qz   1/1      Running     0           10m

We'll see later how these folks play together for:
- scaling, high availability, rolling updates

Viewing container output

Let's use the kubectl logs command
We will pass either a pod name, or a type/name
- (E.g. if we specify a deployment or replica set, it will get the first pod in it)
Unless specified otherwise, it will only show logs of the first container in the pod
- (Good thing there's only one in ours!)

Exercise

View the result of our ping command:

kubectl logs deploy/pingpong

Scaling our application

We can create additional copies of our container(I mean, our pod) with kubectl scale

Exercise

Scale our pingpong deployment:

kubectl scale deploy/pingpong --replicas 3

Note that this command does exactly the same thing:

kubectl scale deployment pingpong --replicas 3

Note: what if we tried to scale replicaset.apps/pingpong-xxxxxxx?

We could! But he deployment would notice it right away, and scale back to the ...

kubectl scale deploy/pingpong --replicas 3

Streaming logs in real time

Just like docker logs, kuberctl logs supports conveninent options:
- -f/--follow to stream logs in real time(a la tail -f)
- --tail to indicate how many lines you want to see(from the end)
- --since to get logs only after a given timestramp

Exercise

View the latest logs of our ping command:

kubectl logs deploy/pingpong --tail 1 --follow

Leave that command running, so that we can keep an eye on these logs

Streaming logs of multiple pods

What happens if we restart kubctl logs?

Exercise

interrupt kubectl logs (with Ctrl-C)
Restart it:

kubectl logs deploy/pingpong --tail 1 --follow

kubectl logs will warn us that multiple pods were found, and that it's showing us only one of them.

Let's leave kubectl logs running while we keep exploring.

Resilience

The deployment pingpong watches its replica set
The replica set ensures that the right number of pods are running
What happens if pods disappear?

Exercise

In a separate window, watch the list of pods:

watch kubectl get pods

Destroy the pod currently shown by kubectl logs:

kubectl delete pod pingpong-xxxxxxxxxxxxx-yyyyyy

e.g.

watch kubectl get pods
kubectl delete pod pingpong-xxxxxxxxxxxxx-yyyyyy

What happened?

kubectl delete pod terminates the pod gracefully
- (sending it the TERM signal and waiting for it to shutdown)
As soon as the pod is in "Terminating" state, the Replica Set replaces it
But we can still see the output of the "Terminating" pod in kubectl logs
Until 30 seconds later, when the grace period expires
The pod is then killed, and kubectl logs exits

What if we wanted something different?

What if we wanted to start a "one-shot" container that doesn't get restarted?
We could use kubectl run --restart=OnFailure or kubectl run --restart=Never
These commands would create jobs or pods instead of deployments
Under the hood, kubectl run invokes "generators" to create resource descriptions
We could also write these resource descriptions ourselves(typically in YAML), and create them on the cluster with kubectl apply -f(discussed later)
With kubectl run --schedule=..., we can also create cronjobs.

Scheduling periodic background work

A Cron Job is a job that will be executed at specific intervals
- (the name comes from the traditional cronjobs executed by the UNIX crond)
It requires a schedule,represented as five space-separated fields:
- minute[0,59]
- hour[0,23]
- day of the month[1,31]
- month of the year[1,12]
- day of the week([0,6] with 0=Sunday)
* means "all vaild values";/N means "every N"
Example: */3 * * * * means "every three minutes"

Creating a Cron Job

Let's create a simple job to be executed every three minutes
Cron Jobs need to terminate, otherwise they'd run forever

Exercise

Create the Cron Job:

kubectl run --schedule="*/3 * * * *" --restart=OnFailure --image=alpine sleep 10

Check the resource that was created:

kubectl get cronjobs

Cron Jobs in action

At the specified schedule, the Cron Job will create a Job
The Job will create a Pod
The Job will make sure that the Pod completes
- (re-creating another on if it fails, for instance if its node fails)

Exercise

Check the Jobs that are created:

kubectl get jobs

(it will take a few minutes before the first job is scheduled.)

What about that deprecation warning?

As we can see from the previous slide, kubectl run can do many things
The exact type of resource created is not obvious
To make things more explicit, it is better to use kubectl create:
- kubectl create deployment to create a deployment
- kubectl create job to create a job
- kubectl create cronjob to run a job periodically
  - (since Kubernetes 1.14)
Eventually, kubectl run will be used only to start one-shot pods
- (see kubernetes/kubernetes#68132)

Various ways of creating resources

kubectl run
- easy way to get started
- versatile
kubectl create <resource>
- explicit, but lacks some features
- can't create a CronJob before Kubernetes 1.14
- can't pass command-lin arguments to deployments
kubectl create -f foo.yaml or kubectl apply -f foo.yaml
- all features are available
- requires writing YAML

Streaming logs of multiple pods

Can we stream the logs of all our pingpong pods?

Exercise

Combine -l and -f flags:

kubectl logs -l run=pingpong --tail 1 -f

Note: combining -l and -f is only possible since Kubernetes 1.14!

Let's try to understand why...

Streaming logs of many pods

Let's see what happens if we try to stream the logs for more than 5 pods

Exercise

Scale up our deployment:

kubectl scale deployment pingpong --replicas=8

Stream the logs

kubectl logs -l run=pingpong --tail 1 -f

We see a message like the following one:

error: your are attempting to follow 8 log streams,
but maximum allowed concurency is 5,
use --max-log-requests to increase the limit

e.g.

kubectl scale deployment pingpong --replicas=8
kubectl get pods
kubectl logs -l run=pingpong --tail 1 -f
# error: you are attempting to follow 8 log streams, but maximum allowed concurency is 5, use --max-log-requests to increase the limit

Why can't we stream the logs of many pods?

kubectl opens one connection to the API server per pod
For each pod, the API server opens one extra connection to the corresponding kubelet
If there are 1000 pods in our deployments, that's 1000 inbound + 1000 outbound connections on the API server
This could easily put a lot of stress on the API server
Prior Kubernetes 1.14, it was decided to not allow multiple connections
From Kubernetes 1.14, it is allowed, but limited to 5 connetcions
- (this can be changed with --max-log-requests)
For more details about the rationale, see PR #67573

Shortcomings of `kubectl logs`

We don't see which pod sent which log line
If pods are restarted/replaced, the log stream stops
If new pods are added, we don't see their logs
To stream the logs of multiple pods, we need to write a selector
There are external tools to address these shortcomings
- (e.g.: Stern)

Accessing logs from the CLI

The kubectl logs commands has limitaions:
- it cannot stream logs from multiple pods at a time
- when showing logs from multiple pods, it mixes them all together
We are going to see how to do it better

Doing it manually

We could(if we were so inclined) write a program or script that would:
- take a selector as an argument
- enumerate all pods matching that selector (with kubectl get -l...)
- for one kubectl logs --follow ... command per container
- annotate the logs (the output of each kubectl logs .... process) with their origin
- preserve ordering by using kubectl logs --timestramps ... and merge the output
We could do it, but thankfully, other did it for us already!

Stern

Stern is an open source project by Wercker

From the README:

Stern allows you to tail multiple pods on Kubernetes and multiple containers within the pod. Each result is color coded for quicker debugging.

The query is a regular expression so the pod name can easily be filtered and you don't need to specify the exact id (for instance omitting the deployment id). If a pod is deleted it gets removed from tail and if a new pod is added it automatically gets tailed.

Exactly what we need!

Installing Stern

Run stern(without arguments) to check if it's installed

$ stern
Tail multiple pods and containers from Kubernetes

Usage:
stern pod-query [flags]

If it is not installed, the easiest method is to download a binary release
The following commands will install Stern on a Linux Intel 64 bit machine:

sudo curl -L -o /usr/local/bin/stern \
  https://github.com/wercker/stern/releases/download/1.11.0/stern_linux_amd64
sudo chmod +x /usr/local/bin/stern

On OS X, just brew install stern

Using Stern

There are two ways to specify the pods whose logs we want to see:
- -l followed by a selector expression(like with many kubectl commands)
- with a "pod query," i.e. a regex used to match pod names
These two ways can be combined if necessary

Exercise

View the logs for all the pingpong containers:

stern pingpong

Stern conveninent options

The --tail N flag shows the last N lines for each container
- (instead of showing the logs since the creation of the container)
The -t/--timestamps flag shows timestamps
The --all-namespaces flag is self-explanatory

Exercise

View what's up with the pingpong system containers:

stern --tail 1 --timestamps pingpong

Cleanup

Let's cleanup before we start the next lecture!

Exercise

remove our deployment and cronjob:

kubectl delete deployment/pingpong cronjob/sleep

19,000 words

They say, "a picture is worth one thousand words."
The following 19 slides show what really happens when we run:

kubectl run web --image=nginx --replicas=3

Exposing containers

kubectl expose creates a service for existing pods
A service is a stable address for a pod (or a bunch of pods)
If we want to connect to our pod(s), we need to create a service
Once a service is created, CoreDNS will allow us to resolve it by name
- (i.e. after creating service hello, the name hello will resolve to something)
There are different types of services, detailed on the following slides:
- ClusterIP, NodePort, LoadBalancer, ExternalName

Basic service types

ClusterIP(default type)
- a virtual IP address is allocated for the service(in an internal, private range)
- this IP address is reachable only from within the cluster(nodes and pods)
- our code can connect to the service using the original port number
NodePort
- a port is allocated for the service(by default, in the 30000-32768 range)
- that port is made available on all our nodes and anybody can connect to it
- our code must be changed to connect to that new port number

These service types are always available.

Under the hood:kube-proxy is using a userland proxy and a bunch of iptables rules.

LoadBalancer
- an external load balancer is allocated for the service
- the load balancer is configured accordingly
  - (e.g.: a NodePort service is created, and the load balancer sends traffic to that port)
- available only when the underlying infrastructure provides some "load balancer as as service"
  - (e.g. AWS, Azure, GCE, OpenStack...)
ExternalName
- the DNS entry managed by CoreDNS will just be a CNAME to a provided record
- no port, no IP address, no nothing else is allocated

Running containers with open ports

Since ping doesn't have anything to connect to, we'll have to run something else
We could use the nginx offical image, but ...
- ...we wouldn't be able to tell the backends from each other!
We are going to use bretfisher/httpenv, a tiny HTTP server written in Go
bretfisher/httpenv listens on port 8888
it serves its environment variables in JSON format
The environment variables will include HOSTNAME, which will be the pod name
- (and therefore, will be different on each backend)

Creating a deployment for our HTTP server

We could do kubectl run httpenv --image=bretfisher/httpenv ...
But since kubectl run is changing, let's see how to see kubectl create instead

Exercise

In another window, watch the pods(to see when they are created)

kubectl get pods -w

Create a deployment for this very lightweight HTTP server:

kubectl create deployment httpenv --image=bretfisher/httpenv

Scale it to 10 replicas:

kubectl scale deployment httpenv --replicas=10

Exercise

# t1
kubectl get pods -w
# t2
kubectl create deployment httpenv --image=bretfisher/httpenv
# t1
ctrl+c
kubectl get pods
kubectl get all
clear
kubectl get pods -w
# t2
kubectl scale deployment httpenv --replicas=10
kubectl expose deployment httpenv --port 8888
kubectl get service

Services are layer 4 constructs

You can assign IP addresses to services, but they are stll layer 4
- (i.e. a service is not an IP address; it's an IP address + protocol + port)
This is caused by the current implementation of kube-proxy
- (it relies on mechanisms that don't support layer 3)
As a result: you have to indicate the port number for your service
Running services with arbitrary port (or port ranges) requires hacks
- (e.g. host networking mode)

Testing our service

We will now send a few HTTP requests to our pods

Exercise

Run shpod if not on Linux host so we can access internal ClusterIP

kubectl apply -f https://bret.run/shpod.yml
kubectl attach --namespace=shpod -it shpod

Let's obtain the IP address that was allocated for our service, programmatically:

IP=$(kubectl get svc httpenv -o go-template --template '{{ .spec.clusterIP }}')
echo $IP

Send a few requests:

curl http://$IP:8888/

Too much output? Filter it with jq:

curl -s http://$IP:8888/ | jq .HOSTNAME
exit
kubectl delete -f https://bret.run/shpod.yml
kubectl delete deployment/httpenv

If we don't need a load balancer

Sometimes, we want to access our scaled services directly:
- if we want to save a tiny little bit of latency(typically less than 1ms)
- if we need to connect over arbitrary ports(instead of a few fixed ones)
- if we need to communicate over another protocol than UDP or TCP
- if we want to decide how to balance the requests client-side
- ...
In that case, we can use a headless service

Headless services

A headless service is obtained by setting the clusterIP field to None
- (Either with --cluster-ip=None, or by providing a custom YAML)
As a result, the service doesn't have a virtual IP address
Since there is no virtual IP address, there is no load balancer either
CoreDNS will return the pods' IP addresses as multiple A records
This gives us an easy way to discover all the replicas for a deployment

Services and endpoints

A service has a number of "endpoints"
Each endpoint is a host + port where the service is available
The endpoints are maintained and updated automatically by Kubernetes

Exercise

Check the endpoints that Kubernetes has associated with our httpenv service:

kubectl describe service httpenv

In the output, there will be a line starting with Endpoints:.
That line will list a bunch of addresses in host:port format.

Viewing endpoint details

When we have many endpoints, our deploy commands truncate the list

kubectl get endpoints

If we want to see the full list, we can use a different output:

kubectl get endpoints httpenv -o yaml

These IP addresses should match the addresses of the corresponding pods:

kubectl get pods -l app=httpenv -o wide

`endpints` not `endpoint`

endpoints is the only resource that cannot be singular

kubectl get endpoint
# error: the server doesn't have a resource type "endpoint"

This is because the type itself is plural(unlike every other resource)
There is no endpoint object: type Endpoints struct
The type doesn't represent a single endpoint, but a list of endpoints

Exposing services to the outside world

The default type(ClusterIP) only works for internal traffic
If we want to accept external traffic, we can use one of these:
- NodePort (expose a service on a TCP port between 30000-32768)
- LoadBalancer (provision a cloud load balancer for our service)
- ExternalIP (use one node's external IP address)
- Ingress(a special mechanism for HTTP services)

We'll see NodePorts and Ingresses more in detail later.

Cleanup

Let's cleanup before we start the next lecture!

Exercise

remove our httpenv resources:

kubectl delete deployment/httpenv service/httpenv

Kubernetes network model

TL,DR:
- Our cluster(nodes and pods) is one big flat IP network.
In detail:
- all nodes must be able to reach each other, without NAT
- all pods must be able to reach each other, without NAT
- pods and nodes must be able to reach each other, without NAT
- each pod is aware of its IP address(no NAT)
- pod IP addresses are assigned by the network implementation
Kubernetes doesn't mandate any particular implementaion

Kubernetes network model:the good

Everything can reach everything
No address translation
No port translation
No new protocol
The network implementation can decide how to allocate addresses
IP addresses don't have to be "portable" from a node to another
- (For example, We can use a subnet per node and use a simple routed topology)
The specification is simple enough to allow many various implementaions

Kubernetes network model:the less good

Everything can reach everything
- if you want security, you need to add network policies
- the network implementation you use needs to support them
There are literally dozens of implementations out there
- (15 are listed in the Kubernetes documentaion)
Pods have level 3(IP) connectivity, but services are level 4(TCP or UDP)
- (Services map to a single UDP or TCP port; no port ranges or arbitrary IP packets)
kube-proxy is on the data path when connecting to a pod or container,and its not particularly fast(relies on userland proxying or iptables)

Kubernetes network model:in practice

The nodes we are using have been set up to use kubenet, Calico, or something else
Don't worry about the warning about kube-proxy performance
Unless you:
- routinely saturate 10G network interfaces
- count packet rates in millions per second
- run high-traffic VOIP or gaming platforms
- do weird things that involve millions of simultaneous connections
  - (in which case you're already familiar with kernel tuning)
if necessary, there are alternatives to kube-proxy; e.g. kube-router

The Container Network Interface(CNI)

Most Kubernetes cluster use CNI "plugins" to implement networking
When a pod is created, Kubernetes delegates the network setup to these plugins
- (it can be a single plugin, or a combination of plugins, each doing one task)
Typically, CNI plugins will:
- allocate an IP address(by calling an IPAM plugin)
- add a network interface into the pod's network namespace
- configure the interface as well as required routes, etc.

Multiple moving parts

The "pod-to-pod network" or "pod network"
- provides communication between pods and nodes
- is generally implemented with CNI plugins
The "pod-to-service network"
- provides internal communication and load balancing
- is generally implemented with kube-prox(or maybe kube-router)
Network policies:
- provide firewalling and isolation
- can be bundled with the "pod network" or provided by another component

Even more moving parts

Inbound traffic can be handled by multiple components:
- something like kube-proxy or kube-router(for NodePort services)
- load balancers(ideally, connected to the pod network)
It is possible to use multiple pod networks in parallel
- (with "meta-plugins" like CNI-Genie or Multus)
Some solutions can fill multiple roles
- (e.g. kube-router can be set up to provide the pod network and/or network polcies and/or replace kube-proxy)

What's this application?

It is a DockerCoin miner!
No,you can't buy coffee with DockerCoins
How DockerCoins works:
- generate a few random bytes
- hash these bytes
- increment a counter(to keep track of speed)
- repeat forever!
DockerCoins is not a cryptocurrency
- (the only common points are "randomness", "hashing" and "coins" in the name)
DockerCoins is made of 5 services
- rng = web service generating random bytes
- hasher = web service computing hash of POSTed data
- worker = background process calling rng and hasher
- webui = web interface to watch progress
- redis = data store(holds a counter updated by worker)
These 5 services are visible in the application's Compose file, dockercoins-compose.yml

How DockerCoins works

worker invokes web service rng to generate random bytes
worker invokes web service hasher to hash these bytes
worker does this in an infinite loop
Every sceond, worker updates redis to indicate how many loops were done
webui queries redis, and computes and exposes "hashing speed" in our browser (See diagram on next slide!)

Service discovery in container-land

How does each service find out the address of the other ones?

We do not hard-code IP addresses in the code
We do not hard-code FQDNs in the code, either
We just connect to a service name, and container-magic does the rest
- (And by container-magic, we mean "a crafty, dynamic, embedded DNS server")

Example in `worker/worker.py`

redis = Redis("redis")

def get_random_bytes():
    r = requests.get("http://rng/32")
    return r.content

def hash_bytes(data):
    r = requets.post("http://hasher/",
                    data=data,
                    headers={"Content-Type":"application/octet-stream"})

(Full source code available here)

DockerCoins at work

worker will log HTTP requests to rng and hasher
rng and hasher will log incoming HTTP requests
webui will give us a graph on coins mined per second

Check out the app in Docker Compose

Compose is(still) great for local development
You can test this app if you have Docker and Compose installed
If not, remember play-with-docker.com

Exercise

Download the compose file somewhere and run it

curl -o docker-compose.yml https://k8smastery.com/dockercoins-compose.yml
docker-compose up

View the webui on localhost:8000 or click the 8080 link in PWD

The reason why this graph is not awesome

The worker doesn't update the counter after every loop, but up to once per second
The speed is computed by the browser, checking the counter about once per second
Between two consecutive updates, the counter will increase either by 4, or by 0
The perceived speed will therefore be 4 - 4 - 4 - 0 - 4 - 4 - 0 etc.
What can we conclude from this?
"I'm clearly incapple of writing good frontend code!"😁

Stopping the application

If we interrupt Compose(with ^C),it will politely ask the Docker Engine to stop the app
The Docker Engine will send a TERM signal to the containers
If the containers do not exit in a timely manner, the Engine sends a KILL signal

Exercise

Stop the application by hitting ^C

Shipping images with a registry

For development using Docker, it has build, ship, and run features
Now that we want to run on a cluster, things are different
Kubernetes doesn't have a build feature built-in
The way to ship(pull) images to Kubernetes is to use a registry

How Docker registries work(a reminder)

What happens when we execute docker run alpine?
If the Engine needs to pull the alpine image, it expands it into library/alpine
library/alpine is expanded into index.docker.io/library/alpine
The Engine communicates with index.docker.io to retrieve library/alpine:latest
To use something else than index.docker.io, we specify it in the image name
Examples:

docker pull gcr.io/google-containers/alpine-with-bash:1.0
docker build -t registry.mycompany.io:5000/myimage:awesome .
docker push registry.mycompany.io:5000/myimage:awesome

Building and shipping images

There are many options!
Manually:
- build locally(with docker build or otherwise)
- push to the registry
Automatically:
- build and test locally
- when ready, commit and push a code repository
- the code repository notifies an automated build system
- that system gets the code, builds it, pushes the image to the registry

Which registry do we want to use?

There are SAAS products like Docker Hub, Quay, GitLab...
Each major cloud provider has an option as well
- (ACR on Azure, ECR on AWS, GCR on Google Cloud...)
There are also commercial products to run our own registry
- (Docker Enterprise DTR, Quay, GitLab, JFrog Artifacotry...)
And open source options, too!
- (Quay, Portus, OpenShift OCR, Gitlab, Harbor, Kraken...)
- (I don't mention Docker Distribution here because it's too basic)
When picking a registry, pay attention to:
- Its build system
- Multi-user auth and mgmt(RBAC)
- Storage feature(replication, cahing, garbage collection)

Running DockerCoins on Kubernetes

Create one deployment for each component
- (hasher, redis, rng, webui, worker)
Expose deployments that need to accept connections
- (hasher, redis, rng, webui)
For redis, we can use the official redis image
For the 4 others, we need to build images and push them to some registry

�Using images from the Docker Hub

For everyone's convenience, we took care of building DockerCoins images
We pushed these images to the DockerHub, under the dockercoins user
These images are tagged with a version number, v0.1
The full image names are therefore:
- dockercoins/hasher:v0.1
- dockercoins/rng:v0.1
- dockercoins/webui:v0.1
- dockercoins/worker:v0.1

Running DockerCoins on Kubernetes

We can now deploy our code(as well as a redis instance)

Exercise

Deploy redis:

kubectl create deployment redis --image=redis

Deploy everything else:

kubectl create deployment hasher --image=dockercoins/hasher:v0.1
kubectl create deployment rng --image=dockercoins/rng:v0.1
kubectl create deployment worker --image=dockercoins/worker:v0.1
kubectl create deployment webui --image=dockercoins/webui:v0.1

Is this working?

After waiting for the deployment to complete, let's look at the logs!
- (Hint: use kubectl get deploy -w to watch deployment events)

Exercise

Look at some logs:

kubectl logs deploy/rng
kubectl logs deploy/worker
kubectl logs deploy/worker --follow

🤔rng is fine... But not worker.

Oh right! We forgot to expose

Connecting containers together

Three deployments need to be reachable by others:hasher, redis, rng
worker doesn't need to be exposed
webui will be dealt with later

Exercise

Expose each deployment, specifying the right port:

kubectl expose deployment redis --port 6379
kubectl expose deployment rng --port 80
kubectl expose deployment hasher --port 80

$ minikube start --vm-driver=hyperkit --registry-mirror=https://registry.docker-cn.com --image-mirror-country=cn --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers

$ minikube start --image-mirror-country=cn --iso-url=https://kubernetes.oss-cn-hangzhou.aliyuncs.com/minikube/iso/minikube-v1.6.0.iso --registry-mirror=https://dockerhub.azk8s.cn  --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --vm-driver="hyperv" --hyperv-virtual-switch="minikube v switch"  --memory=4096

Exposing services for external access

Now we would like to access the Web UI
We will expose it with a NodePort
- (just like we did for the registry)

Exercise

Create a NodePort service for the Web UI:

kubectl expose deploy/webui --type=NodePort --port=80

Check the port that was allocated:

kubectl get svc

webui NodePort 10.96.39.74 80:31900/TCP 9s

e.g. ->> http://localhost:31900

Scaling our demo app

Our ultimate goal is to get more DockerCoins
- (i.e. increase the number of loops per second shown on the web UI)
Let's look at the architecture again:
We're at 4 hashes a second.Let's ramp this up!
The loop is done in the worker; perhaps we could try adding more workers?

Adding another worker

All we have to do is scale the worker Deployment

Exercise

Open two new terminals to check what's going on with pods and deployments:

kubectl get pods -w
kubectl get deployments -w

Now, create more worker replicas:

kubectl scale deployment worker --replicas=2

Adding more workers

If 2 workers give us 2x speed,what about 3 workers?

Exercise

Scale the worker Deployment further:

kubectl scale deployment worker --replicas=3

The graph in the web UI should go up again.
- (This is looking great! We're gonna be RICH!)

Adding even more workers

Let's see if 10 workers give us 10x speed!

Exercise

Scale the worker Deployment to a bigger number:

kubectl scale deployment worker --replicas=10
# kubectl scale deployment rng --replicas=2
# kubectl scale deployment hasher --replicas=2

The graph will peak at 10 hashes/second. (We can add as many workers as we want: we will never go past 10 hashes/second.)

Why are we stuck at 10 hashes per second?

If this was high-quality, production code, we would have instrumentation
- (Datadog, Honeycomb, New Relic, statsd, Sumologic,...)
It's not
Perhaps we could benchmark our web services?
- (with tools like ab, or even simpler, httping)

Benchmarking our web services

We want to check hasher and rng
We are going to use httping
It's just like ping, but using HTTP GET requests
- (it measures how long it takes to perform one GET request)
It's used like this:

httping [-c count] http://host:port/path

Or even simpler:

httping ip.ad.dr.ess

We will use httping on the ClusterIP addresses of our services

Obtaining ClusterIP addresses

We can simply check the output of kubectl get services
Or do it programmatically, as in the example below

Exercise

Retrieve the IP addresses:

HASHER=$(kubectl get svc hasher -o go-template={{.spec.clusterIP}})
RNG=$(kubectl get svc rng -o go-template={{.spec.clusterIP}})

Now we can access the IP addresses of our services through $HASHER and $RNG.

Checking `hasher` and `rng` response times

Exercise

Remember to use shpod on macOS and Windows:

kubectl apply -f https://bret.run/shpod.yml
kubectl attach --namespace=shpod -ti shpod

Check the response times for both services:

httping -c 3 $HASHER
httping -c 3 $RNG

Let's draw hasty conclusions

The bottleneck seems to be rng
What if we don't have enough entropy and can't generate enough random numbers?
We need to scale out the rng service on multiple machines!

Note:this is a fiction!We have enough entropy.But we need a pretext to scale out.

Oops we only have one node for learning.🤔
Let's pretend and I'll explain along the way.

(in fact, the code of rng uses /dev/urandom. which never runs out of entropy......

...and is just as good as /dev/random)

Deploying with YAML

So far, we created resources with the following commands:
- kubectl run
- kubectl create deployment
- kubectl expose
We can also create resources directly with YAML manifests

Kubectl apply vs create

kubectl create -f whatever.yaml
- creates resources if they don't exist
- if resources already exist, don't alter them(and display error message)
kubectl apply -f whatever.yaml
- creates resources if they don't exist
- if resources already exist,update them(to match the definition provided by the YAML file)
- stores the manifest as an annotation in the resource

Creating multiple resources

The manifest can contain multiple resources separated by ---

kind: ...
apiVersion: ...
metadata:
  name: ...
  ...
spec:
  ...
---
kind: ...
apiVersion: ...
metadata:
  name: ...
  ...
spec:
  ...

The manifest can also contain a list of resources

apiVersion: v1
kind: List
items: 
- kind: ...
  apiVersion: ...
  ...
- kind: ...
  apiVersion: ...
  ...

Deploying DockerCoins with YAML

We provide a YAML manifest with all the resources for DockerCoins (Deployments and Services)
We can use it if we need to deploy or redeploy DockerCoins

Exercise

Deploy or redeploy DockerCoins:

kubectl apply -f https://k8smastery.com/dockercoins.yaml

Note the warinings if you already had the resources created
This is because we didn't use apply before
This is OK for us learning, so ignore the warnings
Generally in production you want to stick with one method or the other

The Kubernetes Dashboard

Kubernetes resources can also be viewed with an official web UI
That dashboard is usually exposed over HTTPS (this requires obtaining a proper TLS certificate)
Dashboard users need to authenticate
We are going to take a dangerous shortcut

The insecure method

We could(and should) use Let's Encrypt...
...but we don't want to deal with TLS certificates
We could(and should) learn how authentication and authorization work...
...but we will use a guest account with admin access instead

Yes, this will open our cluster to all kinds of shenanigans.Don't do this at home.

Running a very insecure dashboard

We are going to deploy that dashboard with one single command
This command will create all the necessary resources (the dashboard itself, the HTTP wrpper, the admin/guest account)
All these resources are defined in a YAML file
All we have to do is load that YAML file with kubctl apply -f

Exercise

Create all the dashboard resources, with the following command:

kubectl apply -f https://k8smastery.com/insecure-dashboard.yaml
kubectl get svc dashboard

Dashboard authentication

We have three authentication options at this point:
- token (associated with a role that has appropriate permissions)
- kubeconfig (e.g. using the ~/.kube/config file)
- "skip" (use the dashboard "service account")
Let's use "skip": we're logged in!

By the way, we just added a backdoor to our Kubernetes cluster!

Running the Kubernetes Dashboard securely

The steps that we just showed you are for educational purpose only!
If you do that on your production cluster, people can and will abuse it
For an in-depth discussion about securing the dashboard
- check this execllent post on Heptio's blog
Minikube/microK8s can be enabled with easy commands
- minikube dashboard and microk8s.enable dashboard

Other dashboards

Kube Web View
- read-only dashboard
- optimized for "troubleshooting and incident response"
- see vision and goals for details
Kube Ops View
- "provides a common operational picture for multiple Kubernetes clusters"
Your Kubernetes distro comes with one!
Cloud-provided control-planes often don't come with one.

Security implications of `kubectl apply`

When we do kubectl apply -f <URL>.we create arbitrary resources
Resources can be evil;imagine a deployment that...
- starts bitcoin miners on the whole cluster
- hides in a non-default namespace
- bing-mounts our nodes' filesystem
- inserts SSH keys in the root account(on the node)
- encrypts our data and ransoms it

`kubectl apply` is the new `curl | sh`

curl | sh is convenient
It's safe if you use HTTPS URLs from trusted sources
kubectl apply -f is convenient
It's safe if you use HTTPS URLs from trusted sources
Example:the official setup instructions for most pod networks

Daemon sets

We want to scale rng in a way that is different from how we scaled worker
We want one(and exactly one) instance of rng per node
We do not want two instances of rng on the same node
We will do that with a daemon set

Why not a deployment?

Can't we just do kubectl scale deployment rng --replicas=...?
Nothing guarantees that the rng containers will be distributed evenly
If we add nodes later,they will not automatically run a copy of rng
If we remove(or reboot) a node, one rng container will restart elsewhere (add we will end up with two instances rng on the same node)

Daemon sets in practice

Daemon sets are great for cluster-wide, per-node processes:
- kube-proxy
- CNI network plugins
- monitoring agents
- hardware management tools(e.g. SCSI/FC HBA agents)
- etc.
They can also be restricted to run only on some nodes

Creating a daemon set

Unfortunately, as of Kubernetes 1.15, the CLI cannot create daemon sets
More precisely: it doesn't have a subcommand to create a daemon set
But any kind of resource can always be created by providing a YAML description:

kubectl apply -f foo.yaml

How do we create the YAML file for our daemon set?
- option 1: read the docs
- option 2: vi our way out of it

Creating the YAML file for our daemon set

Let's start with the YAML file for the current rng resource Exercise
Dump the rng resource in YAML:

kubectl get deploy/rng -o yaml > rng.yml

Edit rng.yml

"Casting" a resource to another

What if we just changed the kind field?(It can't be that easy,right?)

Exercise

Change kind: Deployment to kind: DaemonSet
Save, quit
Try to create our new resource:

kubectl apply -f rng.yml

Understanding the problem

The core of the error is:

error validating data:
[ValidationError(DaemonSet.spec):
unknown field "replicas" in io.k8s.api.extensions.v1beta1.DaemonSetSpec,
...

Obviously, it doesn't make sense to specify a number of replicas for a daemon set
Workaround:fix the YAML
- remove the replicas field
- remove the strategy field(which defines the rollout mechanism for a deployment)
- remove the progressDeadlineSeconds field(also used by the rollout mechani)
- remove the status: {} line at the end

Use the `--force`, Luke

We could also tell Kubernetes to ignore these errors and try anyway
The --force flag's actual name is --validate=false

Exercise

Try to load our YAML file and ignore errors:

kubectl apply -f rng.yml --validate=false

Wait...Now, can it be that easy?

Checking what we've done

Did we transform our deployment into a daemonset? Exercise
Look at the resources that we have now:

kubectl get all

`deploy/rng` and `ds/rng`

You can have different resource types with the same name (i.e. a deployment and a daemon set both named rng)
We still have the old rng deployment

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rng         1/1     1            1           20h

But how we have the new rng daemon set as well

NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/rng   1         1         1       1            1           <none>          7s

Too many pods

If we check with kubectl get pods, we see:
- one pod for the deployment(named rng-xxxxxxxxxxx-yyyyy)
- one pod per node for the daemon set(named rng-zzzzz)

NAME                             READY   STATUS    RESTARTS   AGE
pod/rng-58596fdbd-lqlsp          1/1     Running   0          20h
pod/rng-qnq4n                    1/1     Running   0          7s

The daemon set created one pod per node.

In a multi-node setup, master usually have taints preventing pods from running there.

(To schedule a pod on this node anyway,the pod will require appropriate tolerations)

Is this working?

Look at the web UI
The graph should now go above 10 hashes per second!
It looks like the newly created pods are serving traffic correctly
How and why did this happen? (We didn't do anything special to add them to the rng service load balancer!)

Labels and selectors

The rng service is load balancing requests to a set of pods
That set of pods is defined by the selector of the rng service

Exercise

Check the selector in the rng service definition:

kubectl describe service rng

The selector is app=rng
It means "all the pods having the label app=rng" (They can have additional labels as well, that's OK!)

Selector evaluation

We can use selectors with many kubectl commands
For instance, with kubectl get, kubectl logs, kubectl delete... and more

Exercise

Get the list of pods matching selector app=rng:

kubectl get pods -l app=rng
kubectl get pods --selector app=rng

But...why do these pods(in particular, the new ones)have this app=rng label?

Where do labels come from?

When we create a deployment with kubectl create deployment rng, this deployment gets the label app=rng
The replica sets created by this deployment also get the label app=rng
The pods created by these replica sets also get the label app=rng
When we created the daemon set from the deployment, we re-used the same spec
Therefore, the pods created by the daemon set get the same labels
When we use kubectl run stuff, the label is run=stuff instead

�Updating load balancer configuration

We would like to remove a pod from the load balancer
What would happen if we removed that pod, with kubectl delete pod...?
- It would be re-created immediately(by the replica set ro the daemon set)
What would happen if we removed the app=rng label from that pod?
- It would also be re-created immediately
- Why?!?

Seletors for replia sets and daemon sets

The "mission" of a replica set is:
- "Make sure that there is the right number of pods matching this spec!"
The "mission" of a daemon set is:
- "Make sure that there is a pod matching this spec on each node!"
In fact, replica sets adn daemon sets do not check pod specifications
They merely have a selector, and they look for pods matching that selector
Yes, we can fool them by manually creating pods with the "right" labels
Bottom line:if we remove our app=rng label... *...The pod "disappears" for its parent, which re-creates another pod to replace it
Since both the rng daemon set and the rng replica set use app=rng...
- ...Why don't they "find" each other's pods?
Replica sets have a more specific selector, visible with kubectl describe
- (It looks like app=rng, pod-template-hash=abcd1234)
Daemon sets also have a more specific selector, but it's invisible
- (It looks like app=rng, controller-revision-hash=abcd1234)
As a result,each controller only "sees" the pods it manages

Removing a pod from the load balancer

Currently, the rng service is defined by the app=rng selector
The only way to remove a pod is to remove or change the app label
...But that will cause another pod to be created instead!
What's the solution?
We need to change the selector of the rng service!
Let's add another label to that selector(e.g. enabled=yes)

Complex selectors

If a selector specifies multiple labels, they are understood as a logical AND (In other words: the pods must match all the labels)
Kubernetes has support for advanced, set-based selectors (But these cannot be used with services, at least not yet!)

The plan

Add the label enabled=yes to all our rng pods
Update the selector for the rng service to also include enabled=yes
Toggle traffic to a pod by manually adding/removing the enabled label
Profit!

Note: if we swap steps 1 and 2, it will cause a short
 service disruption, becasuse there will be
a period of time during which the service selector
 won't match any pod. During that time,
requests to the service will time out.By doing
 things in the order above, we guarantee that
there won't be any interruption.

Adding labels to pods

We want to add the label enabled=yes to all pods that have app=rng
We could edit each pod one by one with kubectl edit...
... Or we could use kubectl label to label them all
kubectl label can use selectors itself

Exercise

Add enabled=yes to all pods that have app=rng

kubectl label pods -l app=rng enabled=yes

Updating the service selector

We need to edit the service specification
Reminder: in the service deinition, we will see app: rng in two places
- the label of the service itself(we don't need to touch that one)
- the selector of the service(that's the one we want to change)

Exercise

Update the service to add enabled: yes to its selector:

kubectl edit service rng

When the YAML parser is being too smart

YAML parsers try to help us:
- xyz is the string "xyz"
- 42 is the integer 42
- yes is the boolean value true
If we want the string "42" or the string "yes", we have to quote them
So we have to use enabled: "yes"

For a good laugh:if we had used "ja", "oui", "si"... as the value,it would have world.

Updating the service selector, take 2

Exercise

Update the service to add enabled: "yes" to its selector:

kubectl edit service rng

This time is should work!
If we did everything correctly, the web UI shouldn't show any change.

Updating labels

We want to disable the pod that was created by the deployment
All we have to do, is remove the enabled label from that pod
To identify that pod, we can use its name
... Or rely on the fact that it's the only one with a pod-template-hash label
Good to know:
- kubctl label ... foo= doesn't remove a label(it sets it to an empty string)
- to remove label foo, use kubectl label ... foo-
- to change an existing label, we would need to add --overwrite

Removing a pod from the load balancer

Exercise

In one window, check the logs of that pod:

POD=$(kubectl get pod -l app=rng,pod-template-hash -o name)
kubectl logs --tail 1 --follow $POD

(We should see a steady stream of HTTP logs)

In another window, remove the label from the pod:

kubectl label pod -l app=rng,pod-template-hash enabled-

(The stream of HTTP logs should stop immediately)

There might be a slight change in the web UI(since we removed a bit of capacity from rng service).if we remove more pods, the effect should be more visible.

Updating the daemon set

If we scale up our cluster by adding new nodes, the daemon set will create more pods
These pods won't have the enable=yes label
If we want these pods to have that label, we need to edit the daemon set spec
We can do that with e.g.kubectl edit daemonset rng

Labels and debugging

When a pod is misbehaving, we can delete it:another one will be recreated
But we can also change its labels
It will be removed from the load balancer(it won't receive traffic anymore)
Another pod will be recreated immediately
But the problematic pod is still here, and we can inspect and debug it
We can even re-add it to the rotation if necessary
- (Very useful to troubleshoot intermittent and elusive bugs)

Labels and advanced rollout control

Conversely, we can add pods matching a service's selector
These pods will then receive requests and serve traffic
Examples:
- one-shot pod with all debug flags enabled, to collect logs
- pods created automatically, but added to rotation in a second step (by setting their label accordingly)
This gives us building blocks for canary and blue/green deployments

Cleanup

Let's cleanup before we start the next lecture!

Exercise

remove our DockerCoin resources(for now):

kubectl delete -f https://k8smastery.com/dockercoins.yaml
kubectl delete daemonset/rng

Authoring YAML

To use Kubernetes is to "live in YAML"!
It's more important to learn the foundations then to memorize all YAML keys(hundreds+)
There are various ways to generate YAML with Kubernetes, e.g.:
- kubectl run
- kubectl create deployment(and a few other kubectl create variants)
- kubectl expose
These commands use "generators" because the API only accepts YAML(actually JSON)
Pro: They are easy to use
Con: They have limits
When and why do we need to write our own YAML?
How do we write YAML from scratch?
And maybe,what is YAML?

YAML Basics(just in case you need a refresher)

It's technically a superset of JSON, designed for humans
JSON was good for machines, but not for humans
Spaces set the structure. One space off and game over
Remember spaces not tabs,Ever!
Two spaces is standard, but four spaces works too
You don't have to learn all YAML features, but key concepts you need:
- Key/Value Pairs
- Array/Lists
- Dictionary/Maps
Good online tutorials exist here, here, here, and YouTube here

Basic parts of any Kubernetes resource manifest

Can be in YAML or JSON, but YAML is 100%.
Each file contains one or more manifests
Each manifest describes an API object(deployment, service, etc.)
Each manifest needs four parts(root key:values in the file)

apiVersion:
kind:
metadata:
spec:

A simple Pod in YAML

This is a single manifest that creates one Pod

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.17.3

Deployment and Service manifests in one YAML file

apiVersion: v1
kind: Service
metadata:
  name: mynginx
spec:
  type: NodePort
  ports:
  - port: 80
  selector:
    app: mynginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mynginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mynginx
  template:
    metadata:
      labels:
        app: mynginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.17.3

The limits of generate YAML

Advanced(and even not-so-advanced) features require us to write YAML:
- pods with multiple containers
- resource limits
- healthchecks
- many other resource options
Other resource types don't have their own commands!
- DaemonSets
- StatefulSets
- and more!
How do we access these features?

We don't have to start from scratch

Output YAML from existing resources
- Create a resource(e.g. Deployment)
- Dump its YAML with kubectl get -o yaml...
- Edit the YAML
- Use kubectl apply -f ... with the YAML file to:
- update the resource(if it's the same kind)
- create a new resource(if it's a different kind)
Or...we have the docs, with good starter YAML
- StatefulSet, DaemonSet, ConfigMap, and a ton more on Github
Or...we can use -o yaml --dry-run

Generating YAML without creating resources

We can use the -o yaml --dry-run option combo withe run and create

Exercise

Generate the YAML for a Deployment without creating it:

kubectl create deployment web --image nginx -o yaml --dry-run

Generate the YAML for a Namespace without creating it:

kubectl create namespace awesome-app -o yaml --dry-run

We can clean up the YAML even more if we want
- (for instance, we can remove the creationTimestamp and empty dicts)

Try `-o yaml --dry-run` with other create commands

clusterrole         # Create a ClusterRole.
clusterrolebinding  # Create a ClusterRoleBinding for a particular ClusterRole.
configmap           # Create a configmap from a local file, directory or literal
cronjob             # Create a cronjob with the specified name
deployment          # Create a deployment with the specified name
job                 # Create a job with the specified name
namespace           # Create a namespace with the specified name
poddisruptionbudget # Create a pod disruption budget with the specified name
priorityclass       # Create a priorityclass with the specified name
quota               # Create a quota with the specified name
role                # Create a role with single rule
rolebinding         # Create a RoleBinding for a particular Role or ClusterRole
secret              # Create a secret using specified subcommand
service             # Create a service using specified subcommand
serviceaccount      # Create a service account with the specified name

Ensure you use valid create commands with required options for each

Writing YAML from scratch, "YAML The Hard Way"

Paying homage to Kelsey Hightower's "Kubernetes The Hard Way"
A reminder about manifest:
- Each file contains one or more manifests
- Each manifest describes an API object(deployment, service, etc.)
- Each manifest needs four parts(root key:values in the file)

apiVersion: # find with "kubectl api-versions"
kind:       # find with "kubectl api-resources"
metadata:
spec:       # find with "kubectl describe pod"

Those three kubectl commands, plus the API docs, is all we'll need

General workflow of YAML from scrach

Find the resource kind you want to create(api-resources)
Find the latest apiVersion your cluster supports for kind(api-versions)
Give it a name in metadata(minimum)
Dive into the spec of that kind
- kubectl explain <kind>.spec
- kubectl explain <kind> --recursive
Browse the docs API Reference for your cluster version to supplement
Use --dry-run and --server-dry-run for testing
kubectl create and delete until you get it right

kubectl api-resources
kubectl api-versions
kubectl explain pod
kubectl explain pod.spec
kubectl explain pod.spec.volumes
kubectl explain pod.spec --recursive

Advantage of YAML

Using YAML(instead of kubectl run/create/etc. )allows to be declarative
The YAML describes the desired state of our cluster and applications
YAML can be stored. versioned, archived(e.g. in git repositories)
To change resource, change the YAML files(instead of using kubectl edit/scale/label/etc.)
Changes can be reviewed before being applied (with code reviews, pull requests...)
This workflow is sometimes called "GitOps" (there are tools like Weave Flux or GitKube to facilitate it)

YAML in practice

Get started with kubectl run/create/expose/etc.
Dump the YAML with kubectl get -o yaml
Tweak that YAML and kubectl apply it back
Store that YAML for reference(for further deployments)
Feel free to clean up the YAML
- remove fields you don't know
- check that it still works!
That YAML will be useful later when using e.g. Kustomize or Helm

YAML linting and validation

Use generic linters to check proper YAML formatting
- yamllint.com
- codebeautiful.org/yaml-validator
For humans without kubectl, use a web Kubernetes YAML validator: kubeyaml.com
In CI, you might use CLI tools
- YAML linter: pip install yamllint github.com/adrienverge/yamllint
- Kubernetes validator: kubeval github.com/instrumenta/kubeval
We'll learn about Kubernetes cluster-specific validation with kubectl later

Using server-dry-run and diff

We already talked about using --dry-run for building YAML
Let's talk more about options for testing YAML
Including testing against the live cluster API!

Using `--dry-run` with `kubectl apply`

The --dry-run option can also be used with kubectl apply
However, it can be mileading(it doesn't do a "real" dry run)
Let's see what happens in the following scenario:
- generate the YAML for a Deployment
- tweak the YAML to transform it into a DaemonSet
- apply that YAML to see what would actually be created

The limits of `kubectl apply --dry-run`

Exercise

Generate the YAML for a deployment:

kubectl create deployment web --image=nginx -o yaml > web.yaml

Change the kind in the YAML to make it a DaemonSet
Ask kubectl what would be applied:

kubectl apply -f web.yaml --dry-run --validate=false -o yaml

The resulting YAML doesn't represent a valid DaemonSet.

Server-side dry run

Since Kubernetes 1.13, we can use server-side dry run and diffs
Server-side dry run will do all the work, but not persist to etcd
- (all validation and mutation hooks will be executed)

Exercise

Try the same YAML files as earlier,with server-side dry run:

kubectl apply -f web.yaml --server-dry-run --validate=false -o yaml

The resulting YAML doesn't have the replicas field anymore.
Instead, it has the fields expected in a DaemonSet

Advantages of server-side dry run

The YAML is verified much more extensively
The only step that is skipped is "write to etcd"
YAML that passes server-side dry run should apply successfully (unless the cluster state changes by the time the YAML is actually applied)
Validating or mutating hooks that have side effects can also be an issue

`kubectl diff`

Kubernetes 1.13 also introduced kubectl diff
kubectl diff does a server-side dry run, and shows differences

Exercise

Try kubectl diff on a simple Pod YAML:

curl -O https://k8smastery.com/just-a-pod.yaml
kubectl apply -f just-a-pod.yaml
# edit the image ta to :1.17
kubectl diff -f just-a-pod.yaml

Note: we don't need to specify --validate=false here.

Cleanup

Let's cleanup before we start the next lecture!

Exercise

remove our "hello" pod:

kubectl delete -f just-a-pod.yaml

Rolling updates

By default(without rolling updates), when a scaled resource is updated:
- new pods are created
- old pods are terminated
- ...all at the same time
- if something goes wrong,
With rolling updates, when a Deployment is updated, it happens progressively
The Deployment controls multiple ReplicaSets
Each ReplicaSet is a group of identical Pods (with the same image, arguments, parameters)
During the rolling update, we have at least two ReplicaSets:
- the "new" set(corresponding to the "target" version)
- at least one "old" set
We can have multiple "old" sets (if we start another update before the first one is done)

Update strategy

Two parameters determine the pace of the rollout: maxUnavailable and maxSurge
They can be specified in absolute number of pods, or percentage of the replicas count
At any given time...
- there will always be at least replicas-maxUnavailable pods available
- there will never be more than replicas + maxSurge pods in total
- there will therefore be up to maxUnavailable + maxSurge pods being updated
We have the possibility of rolling back to the previous version (if the update fails or is unsatisfactory in any way)

Checking current rollout parameters

Recall how we build custom reports with kubectl and jq:

Exercise

Show the rollout plan for our deployments:

kubectl apply -f https://k8smastery.com/dockercoins.yaml

kubectl get deploy -o json | jq ".items[] | {name:.metadata.name} + .spec.strategy.rollingUpdate"

Rolling updates in practice

As of Kubernetes 1.8, we can do rolling updates with: deployments, daemonsets, statefulsets
Editing one of these resources will automatically result in a rolling update
Rolling updates can be monitored with the kubectl rollout subcommand

Rolling out the new `worker` service

Exercise

Let's monitor what's going on by opening a few terminals, and run :

kubectl get pods -w
kubectl get replicasets -w
kubectl get deployments -w

Update worker either with kubectl edit, or by running:

kubectl set image deploy worker worker=dockercoins/worker:v0.2

That rollout should be pretty quick. What shows in the web UI?

Give it some time

At first, it looks like nothing is happening(the graph remains at the same level)
According to kubectl get deploy -w, the deployment was updated really quickly
But kubectl get pods -w tells a different story
The old pods are still here, and they stay in Terminating state for a while
Eventually, they are terminated; and then the graph decreases significantly
This delay is due to the fact that our worker doesn't handle signals
Kubernetes sends a "polite" shutdown request to the worker,which ignores it
After a grace period, Kubernetes gets impatient and kills the container (The grace period is 30 seconds, but can be changed if needed)

Rolling out something invalid

What happens if we make a mistake?

Exercise

Update worker by specifiying a non-existent image:

kubectl set image deploy worker worker=dockercoins/worker:v0.3

Check what's going on:

kubectl rollout status deploy worker

Our rollout is stuck. However, the app is not dead. (After a minute, it will stabilize to be 20-25% slower.)

kubectl get pods
# ErrImagePull

What's going on with our rollout?

Why is our app a bit slower?
Because MaxUnavailable=25%
- ...So the rollout terminated 2 replicas out of 10 available
Okay, but why do we see 5 new replicas being rolled out?
Because MaxSurge=25% ...So in addition to replacing 2 replicas, the rollout is also starting 3 more
It rounded down the number of MaxUnavailable pods conservatively. but the total number of pods being rolled out is allowed to be 25+25=50%

The nitty-gritty details

We start with 10 pods running for the worker deployment
Current settings: MaxUnavailable=25% and MaxSurge=25%
When we start the rollout:
- two replicas are taken down(as per MaxUnavailable=25%)
- two others are created(with the new version) to replace them
- three others are created(with the new version) per MaxSurge=25%)
Now we have 8 replicas up and running, and 5 being depoyed
Our rollout is stuck at this point!

Checking the dashboard during the bad rollout

If you didn't deploy the Kubernetes dashboard earlier, just kip this slide.

Exercise

Connect to the dashboard that we deployed earlier
Check that we have failures in Deployments, Pods, and Replica Sets
Can we see the reason for the failure?

kubectl describe deploy worker

Recovering from a bad rollout

We could push some v0.3 image (the pod retry logic will eventually catch it and the rollout will proceed)
Or we could invoke a manual rollback

Exercise

Cancel the deployment and wait for the dust to settle:

kubectl rollout undo deploy worker
# deployment.extensions/worker rolled back
kubectl rollout status deploy worker
# deployment "worker" successfully rolled out

Rolling back to an older version

We reverted to v0.2
But this version still has a performance problem
How can we get back to the previous version?

Multiple "undos"

What happens if we try kubectl rollout undo again?

Exercise

Try it:

kubectl rollout undo deployment worker

Check the web UI, the list of pods....

🤔That didn't work.

Multiple "undos" don't work

If we see successive versions as a stack:
- kubectl rollout undo doesn't "pop" the last element from the stack
- it copies the N-1th element to the top
Multiple "undos" just swap back and forth between the last two versions!

Exercise

Go back to v0.2 again:

kubectl rollout undo deployment worker

In this specific scenario

Our version numbers are easy to guess
What if we had used git hashes?
What if we had changed other parameters in the Pod spec?

Listing versions

We can list successive versions of a Deployment with kubectl rollout history

Exercise

Look at our successive versions:

kubectl rollout history deployment worker

We don't see all revisions.

We might see something like 1, 4, 5.

(Depending on how many "undos" we did before.)

Explaining deployment revisions

These revisions correspond to our ReplicaSets
This information is stored in the ReplicaSet annotations

Exercise

Check the annotations for our replica sets:

kubectl describe replicasets -l app=worker | grep -A3 Annotations

What about the missing revisions?

The missing revisions are stored in another annotation:

deployment.kubernetes.io/revision-history

These are not shown in kubectl rollout history
We could easily reconstruct the full list with a script (if we wanted to!)

Rolling back to an older version

kubectl rollout undo can work with a revision number

Exercise

Roll back to the "known good" deployment version:

kubectl rollout undo deployment worker --to-revision=1

Check the web UI or the list of pods

Changing rollout parameters

What if we wanted to, all at once:
- change image to v0.1
- be conservative on availability(always have desired number of available workers)
- go slow on rollout speed(update only one pod at a time)
- give some time to our workers to "warm up" before starting more
The corresponding changes can be expressed in the following YAML snippet:

spec:
  template:
    spec:
      containers:
      - name: worker
        image: dockercoins/worker:v0.1
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  minReadySeconds: 10

Applying changes through a YAML patch

We could use kubectl edit deployment worker
But we could also use kubectl patch with the exact YAML shown before
Apply our changes and wait for them to take effect:

kubectl patch deployment worker -p "
spec:
  template:
    spec:
      containers:
      - name: worker
        image: dockercoins/worker:v0.1
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  minReadySeconds: 10
"
kubectl rollout status deployment worker
kubectl get deploy -o json worker | jq "{name: .metadata.name} + .spec.strategy.rollingUpdate"

Healthchecks

Healthchecks are key to providing built-in lifecycle automation
Healthchecks are probes that apply to containers(not to pods)
Kubernetes will take action on containers that fail healthchecks
Each container can have three(optional) probes:
- liveness = is this container dead or alive?(most important probe)
- readiness = is this container ready to serve traffic?(only needed if a service)
- startup = is this container still starting up?(alpha in 1.16)
Different probe handlers are available(HTTP, TCP, program execution)
They don't replace a full monitoring solution
Let's see the difference and how to use them!

Liveness probe

Indicates if the container is dead or alive
A dead container cannot come back to life
If the liveness probe fails, the container is killed (to make really sure that it's really dead; no zombies or undeads!)
What happens next depends on the pod's restartPolicy:
- Never: the container is not restarted
- OnFailure or Always: the container is restarted

When to use a liveness probe

To indicate failures that can't be recovered
- deadlocks(causing all requests to time out)
- internal corruption(causing all requests to error)
Anything where our incident response would be "just restart/reboot it"
Do not use liveness probes for problems that can't be fixed by a restart
Otherwise we just restart our pods for no reason, creating useless load

Readiness probe

Indicates if the container is ready to serve traffic
If a container becomes "unready" it might be ready again soon
If the readiness probe fails:
- the container is not killed
- if the pod is memeber of a service, it is temporarily removed
- it is re-added as soon as the readiness probe passes again

When to use a readiness probe

To indicate failure due to an external cause
database is down or unreachable
mandatory auth or other backend service unavailable
To indicate temporary failure or unavailability
- application can only service N parallel connections
- runtime is busy doing garbage collection or ini

Startup probe

Kubernetes 1.16 introduces a third tye of probe: startupProbe (it is in alpha in Kubernetes 1.16)
It can be used to indicate "container not ready yet"
- process is still starting
- loading external data, priming caches
Before Kubernetes 1.16, we had to use the initialDelaySeconds parameter (available for both liveness and readiness probes)
initialDelaySeconds is a rigid delay(always wait X before running probes)
startupProbe works better when a container start time can vary a lot

Benefits of using probes

Rolling updates proceed when containers are actually ready (as opposed to merely started)
Containers in a broken state get killed and restarted (instead of serving errors or timeouts)
Unavailable backends get removed from load balancer rotation (thus improving response times across the board)
If a probe is not defined, it's as if there was an "always successful" probe

Different types of probe handlers

HTTP request
- specify URL of the request(and optional headers)
- any status code between 200 and 399 indicates success
TCP connection
- the probe succeeds if the TCP port is open
arbitrary exec
- a command is executed in the container
- exit status of zero indicates success

Timing and thresholds

Probes are executed at intervals of periodSeconds(default: 10)
The timeout for a probe is set with timeoutSeconds(default: 1)

If a probe takes longer than that,it is considered as a FAIL

A probe is considered successful after successThreshold successes(default: 1)
A probe is considered failing after failureThreshold failures (default: 3)
A probe can have an initialDelaySeconds parameter(default: 0)
Kubernetes will wait that amount of time before running the probe for the first time (this is important to avoid killing services that take a long time to start)

Example:HTTP probe

Here is a pod template for the rng web service of the DockerCoins app:

apiVersion: v1
kind: Pod
metadata:
  name: rng-with-liveness
spec:
  containers:
  - name: rng
    image: dockercoins/rng:v0.1
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 1

If the backend serves an error, or takes longer than 1s, 3 times in a row,it gets killed.

Example: exec probe

Here is a pod template for a Redis server:

apiVersion: v1
kind: Pod
metadata:
  name: redis-with-liveness
spec:
  containers:
  - name: redis
    image: redis
    livenessProbe:
      exec:
        command: ["redis-cli", "ping"]

If the Redis process becomes unresponsive, it will be killed.

Should probes check container Dependencies?

A HTTP/TCP probe can't check an external dependency
But a HTTP URL could kick off code to validate a remote dependency
If a web server depends on a database to function, and the database is down:
- the web server's liveness probe should succeed
- the web server's readiness probe should fail
Same thing for any hard dependency(without which the container can't work)

Do not fail liveness probes for problems that are external to the container

Healthchecks for workers

(In that context, worker = process that doesn't accept connections)

Readiness isn't useful(because workers aren't backends for a service)
Liveness may help us restart a broken worker, but how can we check it?
Embedding an HTTP server is a(potentially expensive) option
Using a "lease" file can be relatively easy:
- touch a file during each iteration of the main loop
- check the timestamp of that file from an exec probe
Writing logs(and checking them from the probe) also works

Questions to ask before adding healthchecks

Do we want liveness, readiness, both? (sometimes, we can use the same check, but with different failture thresholds)
Do we have existing HTTP endpoints that we can use?
Do we need to add new endpoints, or perhaps use something else?
Are our healthchecks likely to use resources and/or slow down the app?
Do they depend on additional services? (this can be particularly tricky)

Adding healthchecks to an app

Let's add healthchecks to DockerCoins!
We will examine the questions of the previous slide
Then we will review each component individually to add healthchecks

Liveness, readiness, or both?

To answer that question, we need to see the app run for a while
Do we get temporary, recoverable glitches?
- -> then sue liveness
Or do we get hard lock-ups requiring a restart?
- -> then use liveness
In the case of DockerCoins, we don't know yet!
Let's pick liveness

Do we have HTTP endpoints that we can use？

Each of the 3 web services(hasher, rng, webui) has a trivial route on /
These routes:
- don't seem to perform anything complex or expensive
- don't seem to call other services
Perfect! (See next slides for individual details)

hasher.rb

get '/' do
  "HASHER running on #{Socket.gethostname}\n"
end

rng.py

@app.route("/")
def index():
  return "RNG running on {}\n".format(hostname)

webui.js

app.get('/', function (req, res) {
  res.redirect('/index.html')
})

Retrieving DockerCoins manifests

I've split up the previous dockercoins.yaml into one-resource-per-file
This works with the apply command, and is easier for humans to manage
Clone them locally so we can add healthchecks and re-apply

Exercise

Clone that repository:

git clone https://github.com/bretfisher/kubercoins

Change directory to the repository:

cd kubercoins

A simple HTTP liveness probe

This is what our liveness probe should look like:

containers:
- name: ...
image: ...
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 30
  periodSeconds: 5

This will give 30 seconds to the service to start.(Way more than necessary!)

It will run the probe every 5 seconds.

It will use the default timeout(1 second).

It will use the default failure threshold(3 failed attempts = dead).

It will use the default success threshold(1 successful attempt = alive).

Adding the liveness probe

Let's add the liveness probe, then deploy DockerCoins
Remember if you don't have DockerCoins running, this will create
If you already have DockerCoins running, this will update rng

Exercise

Edit rng-deployment.yaml and add the liveness probe

vim rng-deployment.yaml

Load the YAML for all the resources of DockerCoins

kubectl apply -f .

Testing the liveness probe

The rng service needs 100ms to process a request (because it is single-threaded and sleeps 0.1s in each request)
The probe timeout is set to 1 second
If we send more than 10 requests per second per backend,it will break
Let's generate traffic and see what happens!

Exercise

Get the ClusterIP address of the rng service:

kubectl get svc rng

Monitoring the rng service

Each command below will show us what's happening on a different level

Exercise

In one window, monitor cluster events:

kubectl get events -w

In another window, monitor pods status:

kubectl get pods -w

Generating traffic

Let's use ab (Apache Bench) to send concurrent requests to rng

Exercise

In yet another window, generate traffic using shpod:

kubectl attach --namespace=shpod -ti shpod
ab -c 10 -n 1000 http://<ClusterIP>/1

Experiment with higher values of -c and see what happens
The -c parameter indicates the number of concurrent requests
The final /1 is important to generate actual traffic (otherwise we would use the ping endpoint, which doesn't sleep 0.1s per request)

e.g.

# t1
kubectl get svc rng
# kubectl apply -f https://k8smastery.com/shpod.yaml
kubectl attach --namespace=shpod -ti shpod
ab -c 10 -n 1000 http://10.101.25.252/1

# t2
kubectl get events -w
# t3
kubectl get pods -w

Discussion

Above a given threshold, the liveness probe starts failing
- (about 10 concurrent requests per backend should be plenty enough)
When the liveness probe fails 3 times in a row, the container is restarted
During the restart, there is less capacity available
...Meaning that the other backends are likely to timeout as well
...Eventually causing all backends to be restarted
...And each fresh backend gets restarted, too
This goes on until load goes down, or we add capacity

This wouldn't be a good healthcheck in a real application!

Better healthchecks

We need to make sure that the healthcheck doesn't trip when performance degrades due to external pressure
Using a readiness check would have fewer effects
- (but it would still be an imperfect solution)
A possible combination:
- readiness check with a short timeout / low failure threshold
- liveness check with a longer timeout / higher failure threshold

Healthchecks for redis

A liveness probe is enough (it's not useful to remove a backend from rotation when it's the only one)
We could use an exec probe running redis-cli ping

Exec probes and zombies

When using exec probes, we should make sure that we have a zombie reaper 🤔🤔Wait,what?
When a process terminates, its parent must call wait()/waitpid() (this is how the parent process retrieves the child's exit status)
In the meantime, the process is in zombie state (the process state will show as z in ps, top...)
When a process is killed, its children are orphaned and attached to PID 1
PID 1 has the responsibility of reaping these processes when they terminate
OK, but how does that affect us?

PID 1 in containers

On ordinary systems, PID 1(/sbin/init) has logic to reap porcesses
In containers, PID 1 is typically our application process (e.g. Apache, the JVM, NGINX, Redix...)
These do not take care of reaping orphans
If we use exec probes, we need to add a process reaper
We can add tini to our images
Or share the PID namespace between containers of a pod
- (and have gcr.io/pause take care of the reaping)
Discussion of this in Video - 10 Ways to Shoot Yourself in the Foot with Kubernetes, Will Surprise You

Tini and redis ping in a liveness probe

Add tini to your own custom redis image
Change the kubercoins YAML to use your own image
Create a liveness probe in kuberoins YAML
Use exec handeler and run tini -s -- redis-cli ping
Example repo here: github.com/BretFisher/redis-tini

containers:
- name: redis
  image: custom-redis-image
  livenessProbe:
    exec:
      command:
      - /tini
      - -s
      - --
      - redis-cli
      - ping
    initialDelaySeconds: 30
    periodSeconds: 5

Cleanup

Let's cleanup before we start the next lecture!

Exercise

remove our DockerCoin resources(for now):

kubectl delete -f https://k8smastery.com/dockercoins.yaml

Managing configuration

Some application need to be configured(obviously!)
There are many ways for our code to pick up configuration:
- command-line arguments
- environment variables
- configuration files
- configuration servers(getting configuration from a database, an API...)
- ... and more(because programmers can be very creative!)
How can we do these things with containers and Kubernetes?

Passing configuration to containers

There are many ways to pass configuration to code running in a container:
- baking it into a custom image
- command-line arguments
- environment variables
- injecting configuration files
- exposing it over the Kubernetes API
- configuration servers
Let's review these different strategies!

Baking custom images

Put the configuration in the image (it can be in a configuration file, but also ENV or CMD actions)
It's easy! It's simple!
Unfortunately, it also has downsides:
- multiplication of images
- different images for dev, staging, prod ...
- minor reconfigurations require a whole build/push/pull cycle
Avoid doing it unless you don't have the time to figure out other options

Command-line arguments

Pass options to args array in the container specification
Example(source):

args:
  - "--data-dir=/var/lib/etcd"
  - "--advertise-client-urls=http://127.0.0.1:2379"
  - "--listen-client-urls=http://127.0.0.1:2379"
  - "--listen-peer-urls=http://127.0.0.1:2380"
  - "--name=etcd"

The options can be passed directly to the program that we run... ...or to a wrapper script that will use them to e.g. generate a config file

Command-line arguments, pros の cons

Works great when options are passed directly to the running program (otherwise, a wrapper script can work around the issue)
Works great when there aren't too many parameters (to avoid a 20-lines args array)
Requires documentation and/or understanding of the underlying program ("which parameters and flags do I need, again?")
Well-suited for mandatory parameters(without default values)
Not ideal when we need to pass a real configuration file anyway

Environment variables

Pass options through the env map in the container specification
Example:

env:
- name: ADMIN_PORT
  value: "8080"
- name: ADMIN_AUTH
  value: Basic
- name: ADMIN_CRED
  value: "admin:0pensesame!"

value must be a string! Make sure that numbers and fancy strings are quoted

🤔Why this weired {name: xxx, value: yyy} scheme?It will be revealed soon!

The Downward API

In the previous example, environment variables have fixed values
We can also use a mechanism called the Downward API
The Downward API allows exposing pod or container information
- either through special files(we won't show that for now)
- or through environment variables
The value of these environment variables is computed when the container is started
Remember: environment variables won't(can't) change after container start
Let's see a few concrete examples!

Exposing the pod's namespace

- name: MY_POD_NAMESPACE
  valueFrom:
    fileRef:
      fieldPath: metadata.namespace

Useful to generate FQDN of services (in some contexts, a short name is not enough)
For instance, the two commands should be equivalent:

curl api-backend
curl api-backend.$MY_POD_NAMESPACE.svc.cluster.local

Exposing the pod's IP address

- name: MY_POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP

Useful if we need to know our IP address (we could also read it from eth0, but this is more solid)

Exposing the container's resource limits

- name: MY_MEM_LIMIT
  valueFrom:
    resouceFieldRef:
      containerName: test-container
      resource: limits.memory

Useful for runtimes where memory is garbage collected
Example: the JVM (the memory available to the JVM should be set with the -Xmx flag)
Best practice:set a memory limit, and pass it to the runtime
Note: recent versions of the JVM can do this automatically (see JDK-8146115) and this blog post for detailed examples

More about the Downward API

This documentation page tells more about these environment variables
And this one explains the other way to use the Downward API (through files that get created in the container filesystem)

Environment variables, pros and cons

Works great when the running program expects these variables
Works great for optional parameters with reasonable defaults (since the container image can provide these defaults)
Sort of auto-documented (we can see which environment variables are defined in the image, and their values)
Can be (ab)used with longer values...
...You can put an entire Tomcat configuration file in an environment...
...But should you? (Do it if you really need to, we're not judging! Bu we'll see better ways.)

Injecting configuration files with ConfigMaps

Sometimes, there is no way around it: we need to inject a full config file
Kubernetes provides a mechanism for that purpose: ConfigMaps
A ConfigMap is a Kubernetes resource that exits in a namespace
Conceptually, it's a key/value map (values are arbitrary strings)
We can think about them in(at least) two different ways:
- as holding entire configuration file(s)
- as holding individual configuration parameters

Note: to hold sensitive information, we can use "Secrets", which are another typ of.. resource behaving very much like ConfigMaps. We'll cover them just after!

ConfigMaps storing entire files

In this case, each key/value pair corresponds to a configuration file
Key = name of the file
Value = content of the file
There can be one key/value pair, or as many as necessary (for complex apps with multiple configuration files)
Examples:

# Create a ConfigMap with a single key, "app.conf"
kubectl create configmap my-app-config --from-file=app.conf
# Create a ConfigMap with a single key, "app.conf" but another file
kubectl create configmap my-app-config --from-file=app.conf=app-prod.conf
# Create a ConfigMap with multiple keys (one per fiel in the config.d directory)
kubectl create configmap my-app-config --from-file=config.d/

ConfigMaps storing individual parameters

In this case, each key/value pair corresponds to a parameter
Key = name of the parameter
Value = value of the parameter
Examples:

# Create a ConfigMap with two keys
kubectl create cm my-app-config \
  --from-literal=foreground=red \
  --from-literal=foreground=blue

# Create a ConfigMap from a file containing key=val pairs
kubectl create cm my-app-config \
  --from-env-file=app.conf

Exposing ConfigMaps to containers

ConfigMaps can be exposed as plain files in the filesystem of a container
- this is achieved by declaring a volume and mounting it in the container
- this is particularly effective for ConfigMaps containing whole files
ConfigMaps can be exposed as environment variables in the container
- this is achieved with the Downward API
- this is particularly effective for ConfigMaps containing individual parameters
Let's see how to do both!

Passing a configuration file with a ConfigMap

We will start a load balancer powered by HAProxy
We will use the official haproxy image
It expects to find its configuration in /usr/local/etc/haproxy/haproxy.cfg
We will provide a simple HAProxy configuration
It listens on port 80, and load balances connections between IBM and Google

Creating the ConfigMap

Exercise

Download our simple HAProxy config:

curl -O https://k8smastery.com/haproxy.cfg

Create a ConfigMap named haproxy and holding the configuration file:

kubectl create configmap haproxy --from-file=haproxy.cfg

Check what our ConfigMap looks like:

kubectl get configmap haproxy -o yaml

Using the ConfigMap

We are going to use the following pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: haproxy
spec:
  volumes:
  - name: config
    configMap:
      name: haproxy
  containers:
  - name: haproxy
    image: haproxy
    volumeMounts:
    - name: config
      mountPath: /usr/local/etc/haproxy/

Using the ConfigMap

Apply the resource definition from the previous slide

Exercise

Create the HAProxy pod:

kubectl apply -f https://k8smastery.com/haproxy.yaml

Check the IP address allocated to the pod, inside shpod:

kubectl attach --namespace=shpod -ti shpod
kubectl get pod haproxy -o wide
IP=$(kubectl get pod haproxy -o json | jq -r .status.podIP)

Testing our load balancer

The load balancer will send:
- half of the connections to Google
- the other half to IBM

Exercise

Access the load balancer a few times:

curl $IP
curl $IP
curl $IP

We should see connections served by Google, and others served by IBM. (Each server sends us a redirect page. Look at the URL that they send us to!)

Exposing ConfigMaps with the Downward API

We are going to run a Docker registry on a custom port
By default, the registry listens on port 5000
This can be changed by setting environment variable REGISTRY_HTTP_ADDR
We are going to store the port number in a ConfigMap
Then we will expose that ConfigMap as a container environment variable

Creating the ConfigMap

Exercise

Our ConfigMap will have a single key,http.addr:

kubectl create configmap registry --from-literal=http.addr=0.0.0.0:80

Check our ConfigMap:

kubectl get configmap registry -o yaml

Using the ConfigMap

We are going to use the following pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: registry
spec:
  containers:
  - name: registry
    image: registry
    env:
    - name: REGISTRY_HTTP_ADDR
      valueFrom:
        configMapKeyRef:
          name: registry
          key: http.addr

The resource definition from the previous slide:

Exercise

Create the registry pod:

kubectl apply -f https://k8smastery.com/registry.yaml

Check the IP address allocated to the pod

kubectl attach --namespace=shpod -ti shpod
kubectl get pod registry -o wide
IP=$(kubectl get pod registry -o json | jq -r .status.podIP)

Confirm that the registry is available on port 80:

curl $IP/v2/_catalog

Passwords, tokens, sensitive information

For sensitive information, there is another special resource: Secrets
Secrets and Configmaps work almost the same way (we'll expose the differences on the next slide)
The intent is different, though:

"You should use secrets for things which are actually
 secret like API keys, credentials, etc.and use config map
 for not-secret configuration data."
"In the future there will likely be some differentiators
 for secrets like rotation or support for backing the
 secret API w/HSMs, etc."

(Source: the author of both features)

Differences between ConfigMaps and Secrets

Secrets are base64-encoded when shown with kubectl get secrets -o yaml
- keep in mind that this is just encoding, not encryption
- it is very easy to automatically extract and decode secrets
Secrets can be encrypted at rest
With RBAC, we can authorize a user to access ConfigMaps, but not Secrets (since they are two different kinds of resources)

Cleanup

Let's cleanup before we start the next lecture!

Exercise

remove our pods:

kubectl delete pod/haproxy pod/registry

Exposing HTTP services with Ingress resources

Services give us a way to access a pod or a set of pods
Services can be exposed to the outside world:
- with type NodePort(on a port > 30000)
- with type LoadBalancer(allocating an external load balancer)
What about HTTP services?
- how can we expose webui, rng, hasher?
- the Kubernetes dashboard?
- all on the same IP and port?

Exposing HTTP services

If we use NodePort services, clients have to specify port numbers (i.e. http://xxxxx:31234 instead of just http://xxxxx)
LoadBalancer services are nice, but:
- they are not available in all environments
- they often carry an additional cost(e.g they provision an ELB)
- they often work at OSI Layer 4(IP+Port) and not Layer 7(HTTP/S)
- they require one extra step for DNS integration (waiting for the LoadBalancer to be provisioned;then adding it to DNS)
We could build our own reverse proxy

Building a custom reverse proxy

There are many options available: Apache, HAProxy, Envoy Proxy, Gloo, NGINX, Traefik,...
Most of these options require us to update/edit configuration files after each change
Some of them can pick up virtual hosts and backends from a configuration store
Wouldn't it be nice if this configuration could be managed with Kubernetes API?
Enter Ingress resources!

ingress vs. Ingress

ingress

ingress definition: Going in, entering. The opposite of egress(leaving)
In networking terms, ingress refers to handling incomming connections
Could imply incoming to firewall, network, or in this case, a server cluter

Ingress

Ingress(capital 1)in these slides means the Kubernetes Ingress resource
Specific to HTTP/S

Ingress resources

Kubernetes API resource(kubectl get ingress/ingresses/ing)
Designed to expose HTTP services
Basic features:
- load balancing
- SSL termination
- name-based virtual hosting
Can also route to different services depending on:
- URI path(e.g /api -> api-service, static->assets-service)
- Client headers, including cookies(for A/B testing, canary deployment...)
- and more

Principle of operation

Step 1:deploy an Ingress controller
- Ingress controller = load balancing proxy + control loop
- the control loop watches over Ingress resources, and configures the LB accordingly
- these might be two separate processes(NGINX server + NGINX Ingress controller)
- or a single app that knows how to speak to Kubernetes API(Traefik)
Step 2: set up DNS(usually)
- associate external DNS entries with the load balancer or host address
Step 3: create Ingress resources for our Services resources
- these resources contain rules for handling HTTP/S connections
- the Ingress controller picks up these resources and configures the LB
- connections to the Ingress LB will be processed by the rules

Ingress in action: NGINX

We will deploy the NGINX Ingress controller first
- this is a popular, yet arbitrary choice, the docs list over a dozen options
For DNS, we will use nip.io
- *.127.0.0.1.nip.io resolves to 127.0.0.1
- we do this so we can use various FQDN's without editing our hosts file
We will create Ingress resources for various HTTP-based Services

Deploying pods listening on port 80

We want our Ingress load balancer to be available on port 80
We could do that with a LoadBalancer service
- ...but it requires support from the underlying infrastructure
- minikube and MicroK8s don't work with it
- ...but Docker Desktop supports it for localhost!
We could use pods specifying hostPort: 80
- ...but with most CNI plugins, this doesn't work or requires additional setup
We could use a NodePort service
- ...but that requires changing the --service-node-port-range flag in the API service
Last resort: the hostNetwork mode

Without `hostNetwork`

Normally, each pod gets its own network namespace (sometimes called sandbox or network sandbox)
An IP address is assigned to the pod
This IP address is routed/connected to the cluster network
All containers of that pod are sharing that network namespace (and therefore using the same IP address)

With `hostNetwork: true`

No network namespace gets created
The pod is using the network namespace of the host
It "sees" (and can use) the interfaces(and IP addresses) of the host(VM on macOS/Win)
The pod can receive outside traffic directly, on any port
Downside: with most network plugins, network polices won't work for that pod
- most network policies work at the IP address level
- filtering that pod = filtering traffic from the node

What you will use now

Docker Desktop
- no built-in ingress installer, we'll provide you YAML
- Ignores hostNetwork, but Service type: LoadBalancer works with localhost!
minikube
- has a built-in NGINX installer minikube addons enable ingress
- But, let's use YAML we provide for learning purposes
- hostNetwork: true enabled on pod works for minikube IP
MicroK8s:
- has a built-in NGINX installer microk8s.enable ingress
- let's use YAML we provide anyway for learning purposes
- hostNetwork: true enabled on pod works for MicroK8s host IP

Running NGINX on our cluster

Now let's deploy the NGINX controller. Pick your distro:

Exercise

Apply the YAML

# for Docker Desktop, create Service with LoadBalancer
kubectl apply -f https://k8smastery.com/ic-nginx-lb.yaml
# for minikube/MicroK8s, create Service with hostNetwork
kubectl apply -f https:/k8smastery.com/ic-nginx-hn.yaml

Check the pod Status

kubectl describe -n ingress-nginx deploy/nginx-ingress-controller
kubectl get all -n ingress-nginx

Deploying the NGINX Ingress controller

We need the YAML templates from the kubernetes/ingress-nginx project

The two main sections in the YAML are:

NGINX Deployment(or DaemonSet) and all its required resources
- Namespace
- ConfigMaps(storing NGINX configs)
- ServicesAccount(authenticate to Kubernetes API)
- Role/ClusterRole/RoleBindings(authorization to API parts)
- LimitRange(limit cpu/memory of NGINX)
Service to expose NGINX on 80/443
- different for each Kubernetes distribution

kubectl apply -f https://k8smastery.com/ic-nginx-lb.yaml
kubectl describe -n ingress-nginx deploy/nginx-ingress-controller
kubectl get all -n ingress-nginx

Checking that NGINX

If NGINX started correctly, we now have a web server listening on each node

Exercise

Direct your browser to your Kubernetes IP on port 80

We should get a 404 page not found error.

This is normal: we haven't provided any Ingress rule yet.

Setting up DNS

To make our lives easier, we will use nip.io
Check out http://cheddar.A.B.C.D.nip.io (replacing A.B.C.D with the IP address of your Kubernetes IP)
We should get the same 404 page not found error (meaning that our DNS is "set up properly" so to speak!)

cheddar.127.0.0.1.nip.io

Setting up host-based routing ingress rules

We are going to use errm/cheese images (there are 3 tags available: wensleydale, cheddar, stilton)
These images contain a simple static HTTP server sending a picture of cheese
We will run 3 deployments(one for each cheese)
We will create 3 services(one for each deployment)
Then we will create 3 ingress rules(one for each service)
We will route <name-of-cheese>.A.B.C.D.nip.io to the corresponding deployment

Running cheesy web servers

Exercise

Run all three deployments:

kubectl create deployment cheddar --image=errm/cheese:cheddar
kubectl create deployment stilton --image=errm/cheese:stilton
kubectl create deployment wensleydale --image=errm/cheese:wensleydale

Create a service for each of them:

kubectl expose deployment cheddar --port=80
kubectl expose deployment stilton --port=80
kubectl expose deployment wensleydale --port=80

What does an ingress resource look like?

Here is a minimal host-based ingress resource:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: cheddar
spec:
  rules:
  - host: cheddar.A.B.C.D.nip.io
    http:
      paths:
      - path: /
        backend:
          serviceName: cheddar
          servicePort: 80

(In 1.14 ingress moved from the extensions API to networking)

Creating our first ingress resources

Exercise

Download our YAML curl -O https://k8smastery.com/ingress.yaml
Edit the file ingress.yaml which has three Ingress resources
Replace the A.B.C.D with your Kubernetes IP (127.0.0.1 for localhost)
Apply the file kubectl apply -f ingress.yaml
Open http://cheddar.A.B.C.D.nip.io (An Image of a piece of cheese should show up.)

Bring up the other Ingress resources

Exercise

Open http://stilton.A.B.C.D.nip.io
Open http://wensleydale.A.B.C.D.nip.io
Different cheeses should show up for each URL

http://cheddar.127.0.0.1.nip.io
http://stilton.127.0.0.1.nip.io
http://wensleydale.127.0.0.1.nip.io

Adding features to a Ingress resource

Reverse proxies have lots of features
Let's add a 301 redirect to a new Ingress resource using annotations
It will apply when any other path is used in URL that we didn't already add

Exercise

Create a redirect

kubectl apply -f https://k8smastery.com/redirect.yaml

Open http://.A.B.C.D.nip.io or localhost or A.B.C.D
It should immediately redirect to google.com

http://somethingelse.127.0.0.1.nip.io

View Ingress resources

Let's inspect some ingress resources

Exercise

List all Ingress resources in the default namespace

kubectl get ingress

Get the details on the cheddar Ingress resource

kubectl describe ingress cheddar

Get the details on the my-google Ingress resource

kubectl describe ingress my-google

Output stilton in YAML

kubectl get ingress/stilton -o yaml

Swapping NGINX for Traefik

Traefik is a proxy with built-in Kubernetes Ingress support
It has a web dashboard, built-in Let's Encrypt, full TCP support, and more
Most importantly: Traefik releases are named after cheeses
The Traefik documentaion tells us to pick between Deployment and DaemonSet
We are going to use a DaemonSet so that each node can accept connections
We provide a YAML file which is essentially the sum of:
- Traefik's DaemonSet resources(patched with hostNetwork and tolerations)
- Traefik's RBAC rules allowing it to watch necessary API objects
We will make a minor change to the YAML provided by Traefik to enable hostNetwork for MicroK8s/minikube
For Docker Desktop we'll add a type: LoadBalancer to the Service

Removing NGINX from our cluster

Before starting Traefik, let's remove the NGINX controller
This won't remove Services or Ingress resources
But it will make them unavailable from outside the cluster

Exercise

Delete our NGINX controller and related resources:

# for Docker Desktop, create Service with LoadBalancer
kubectl delete -f https://k8smastery.com/ic-nginx-lb.yaml

# for minikube/MicroK8s, create Service with hostNetwork
kubectl delete -f https:/k8smastery.com/ic-nginx-hn.yaml

Also remove the redirect Ingress resource. It only worked in NGINX

kubectl delete -f https:/k8smastery.com/redirect.yaml

Running Traefik on our cluster

Now let's deploy the Traefik Ingress controller

Exercise

Apply the YAML

# for Docker Desktop with LoadBalancer
kubectl apply -f https://k8smastery.com/ic-traefik-lb.yaml
# kubectl delete -f https://k8smastery.com/ic-traefik-lb.yaml
# for minikube/MicroK8s hostNetwork
kubectl apply -f https:/k8smastery.com/ic-traefik-hn.yaml

Check the pod Status

kubectl describe -n kube-system ds/traefik-ingress-controller
kubectl get all -n kube-system

Checking that Traefik runs correctly

If Traefik started correctly, we can refresh a cheese and it still works

Exercise

Refresh http://cheddar.A.B.C.D.nip.io

http://cheddar.127.0.0.1.nip.io
http://stilton.127.0.0.1.nip.io
http://wensleydale.127.0.0.1.nip.io

Traefik web UI

Traefik provides a web dashboard on container port 8080
For those using the LoadBalancer method(Docker Desktop), it's enabled

Exercise

if using Docker Desktop, go to http://localhost:8080
For those use hostNetwork, this could be a problem
The container won't start if anything is listening on <host IP>:8080
On MicroK8s, Kubernetes API runs on 8080 😂
For those using minikube, you can un-comment the YAML and re-apply
You could also edit the resource(s) and manually add the details, e.g.
- kubectl edit -n kube-system ds/traefik-ingress-controller

What about Traefik 2.x IngressRoute resources?

We've been using Traefik 2.x as the Ingress controller
Traefik released 2.0 in late 2019
Their documentation talks about IngressRoute resource
But IngressRoute is not a build-in resource of Kubernetes
Traefik 2.x now supports a custom CRD(Custom Resource Definition)
We'll explore why in a bit

Using multiple ingress controllers

You can have multiple ingress controllers active simultaneously (e.g. Traefik, Gloo, and NGINX)
You can even have multiple instances of the same controller (e.g. one for internal, another for external traffic)
The kubernetes.io/ingress.class annotation can be used to tell which one to use
It's OK if multiple ingress controllers configure the same resource (it just means that the service will be accessible through multiple paths)

Ingress resources: the good

The traffic flows directly from the ingress load balancer to the backends
- it doesn't need to go through the ClusterIP
- in fact, we don't even need a ClusterIP(we can use a headless service)
The load balancer can be outside of Kubernetes (as long as it has access to the cluster subnet)
This allows the use of external(hardware, physical machines...) load balancers
Annotations can encode special features (rate-limiting, A/B testing, session stickiness, etc.)

Ingress resources: the bad(cough Annotations cough)

Aforementioned "special features" are not standardized yet
Some controller will support them; some won't
Even relatively common features(stripping a path prefix) can differ:
- traefik.ingress.kubernetes.io/rule-type: PathPrefixStrip
- Ingress.kubernetes.io/rewrite-target: /
This should eventually stabilize (remember that ingresses are currently apiVersion: networking.k8s.io/v1beta1)
Annotations are not validated in CLI
Some proxies provide a CRD(Custom Resource Definition) option

When not to use built-in Ingress resources

You need features beyond Ingress including:
- TCP support, traffic splitting, mTLS, egress, service mesh
- response transformation, routing to 2+ services
You have external load balancers(like AWS ELBs) which route to NodePorts
You don't need externally available HTTP services on the default ports

Using CRD's as alternatives to Ingress resources

Due to the limits of the built-in ingress, many projects are moving to CRD's
For example, Traefik 2.x has a IngressRoute CRD option
Ambassador, a controller for Envoy proxy, uses a Mapping CRD
These CRD proxy options do ingress plus more (sometimes called API Gateways):
- TCP Support(anything beyond HTTP/HTTPS)
- Traffic splitting, rate limiting, circuit breaking, etc
- Complex traffic routing, request and response transformation
Once we consider CRD's, many more proxy options are available:
- Envoy Proxy based (Gloo, Ambassador, Contour)
- Other �Proxies(Tyk, Traefik, Kong, KrakenD)
Eventually, some more advanced features might be added to "Ingress Resources"
We'll cover more after we learn about CRD's and Operators

Cleanup

Let's cleanup before we start the next lecture!

Exercise

remove our ingress controller:

# for Docker Desktop with LoadBalancer
kubectl delete -f https://k8smastery.com/ic-traefik-lb.yaml
# for minikube/MicroK8s with hostNetwork
kubectl delete -f https:/k8smastery.com/ic-traefik-hn.yaml

remove our ingress resources:

kubectl delete -f ingress.yaml
kubectl delete -f https:/k8smastery.com/redirect.yaml

remove our cheeses

kubectl delete svc/cheddar svc/stilton svc/wensleydale
kubectl delete deploy/cheddar deploy/stilton deploy/wensleydale

Image vs. Container

An Image is the application we want to run
A Container is an instance of that image running as a process
You can have many containers running off the same image
In this lecture our image will be the Nginx web server
Docker's default image "registry" is called Docker Hub
- (hub.docker.com)

docker container run --publish 80:80 nginx
# ctrl + c
# docker container run --publish 8080:80 nginx
# docker container run --publish 8080:80 --detach nginx
# docker container ls
# docker container stop 690
# docker container ls -a

docker container run --publish 80:80 nginx

Downloaded image 'nginx' from Docker Hub
Started a new container from that image
Opened port 80 on the host IP
Routes that traffic to the container IP, port 80

Note you'll get a "bind" error if the left number [host port]
is being used by anthing else, even another container.
You can use any port you want on the left, like 8080:80
or 8888:80,then use localhost:8888 when testing

docker container run --publish 80:80 --detach --name webhost nginx
docker container ls -a
docker container logs webhost
docker container top webhost
docker container --help

What happens in 'docker container run'

Looks for that image locally in image cache, doesn't find anything
Then looks in remote image repository(defaults to Docker Hub)
Downloads the latest version(nginx:latest by default)
Creates new container based on that image and prepares to start
Give it a virtual IP on a private network inside docker engine
Opens up port 80 on host and forwards to port 80 in container
Starts container by using the CMD in the image Dockerfile

Example Of Changing The Default

docker container run --publish 8080:80 --name webhost -d nginx:1.11 nginx -T
# 8080 -> change host listening port
# 1.11 -> change version of image
# nginx -T -> change CMD run on start

Containers aren't Mini-VM's

They are just processes
Limited to what resources they can access
Exit when process stops

docker run --name mongo -d mongo
docker ps
docker top mongo
ps aux
docker stop mongo
ps aux | grep mongo
docker start mongo
docker ps
docker top mongo

Assignment: Manage Multiple Containers

docs.docker.com and --help are your friend
Run a nginx, a mysql, and a httpd(apache) server
Run all of the --detach(or -d), name them with --name
nginx should listen on 80:80, httpd on 8080:80, mysql on 3306:3306
When running mysql, use the --env option(or -e) to pass in MYSQL_RANDOM_ROOT_PASSWORD=yes
Use docker container logs on mysql to find the random password it created on startup
Clean it all up with docker container stop and docker container rm (both can accept multiple names or ID's)
Use docker container ls to ensure everything is correct before and after cleanup

Assignment Reminder

This lecture is ANSWERS to previous assignment
Try the previous lecture (Assignment details) first on your own before watching this video
Forcing yourself to figure this assignment out by sudying commands, docs.docker.com, etc. will help this stuff stick in your brain better then just watching me do it!

docker container run -d -p 3306:3306 --name db -e MYSQL_RANDOM_ROOT_PASSWORD=yes mysql
docker container logs db
# find & copy-> GENERATED ROOT PASSWORD: Aekia1fiighaengeethe0eihai2ailij

docker container run -d --name webserver -p 8080:80 httpd
docker ps

docker container run -d --name proxy -p 80:80 nginx
docker container ls

curl localhost
curl localhost:8080
 
docker container stop nginx httpd mysql
docker container stop # tap completion

docker container ls -a
docker container rm # tap completion again

docker container rm 4f444c35ebf7 8f9ccb6b0553 0827cf019e82 bb7958b14a40

docker ps -a
docker image ls

What's Going On In Containers

docker container top - process list in one container
docker container inspect - details of one container config
docker container stats - performance stats for all containers

docker container run -d --name nginx nginx
docker container run -d --name mysql -e MYSQL_RANDOM_ROOT_PASSWORD=true mysql
docker container ls
docker container top mysql
docker container top nginx
docker container inspect mysql

docker container stats --help
docker container stats

Getting a Shell Inside Containers

docker container run -it - start new container interactively
docker container exec -it - run additional command in existing container
Different Linux distros in containers

docker container run -it --name proxy nginx bash
exit
docker container ls -a

docker container run -it --name ubuntu ubuntu
apt-get update
apt install -y curl
curl baidu.com
exit

docker container ls -a

docker container start --help
docker container start -ai ubuntu
curl google.com
exit

docker container exec --help
docker container exec -it mysql bash
ps aux
exit
docker container ls

docker pull alpine
docker container run -it alpine sh
apk

Docker Networks: Concepts

Review of docker container run -p
For local dev/testing, networks usually "just work"
Quick port check with docker container port <container>
Learn concepts of Docker Networking
Understand how network packets move around Docker

Docker Networks Defaults

Each container connected to a private virtual network "bridge"
Each virtual network routes through NAT firewall on host IP
All containers on a virtual network can talk to each other without -p
Best practice is to create a new virtual network for each app:
- network "my_web_app" for mysql and php/apache containers
- network "my_api" for mongo and nodejs containers

Docker Networks Cont.

"Batteries Included, But Removable"
- Defaults work well in many cases, but easy to swap out parts to customize it
Make new virtual networks
Attach containers to more then one virual network(or none)
Skip virual networks and use host IP(--net=host)
Use different Docker network drivers to gain new abilities
and much more...

docker container run -p 80:80 --name webhost -d nginx
docker container port webhost
docker container inspect --format '{{ .NetworkSettings.IPAddress }}' webhost
ifconfig en0

Docker Networks: CLI Management

Show networks docker network ls
Inspect a network docker network inspect
Create a network docker network create --driver
Attach a network to container docker network connect
Detach a network from container docker network disconnect

docker network ls
docker network inspect bridge

docker network create my_app_net
docker network ls

docker network create --help
docker container run -d --name new_nginx --network my_app_net nginx
docker network inspect my_app_net
docker network connect 21c7d8642536 b3f503669e05
docker container inspect b3f503669e05

docker network disconnect 21c7d8642536 b3f503669e05
docker container inspect b3f503669e05

Docker Networks: Default Security

Create your apps so frontend/backend sit on same Docker network
Their inter-communication never leaves host
All externally exposed ports closed by default
You must manually expose via -p, which is better default security!
This gets even better later with Swarm and Overlay networks

Docker Networks: DNS

Understand how DNS is the key to easy inter-container comms
See how it works by default with custom networks
Learn how to use --link to enable DNS on default bridge network

docker container ls
docker network inspect my_app_net

docker container run -d --name my_nginx --network my_app_net nginx
docker network inspect my_app_net

docker container exec -it my_nginx ping new_nginx

Docker Networks: DNS

Containers shouldn't rely on IP's for inter-communication
DNS for friendly names is built-in if you use custom networks
You're using custom networks right?
This gets way easier with Docker Compose in future Section

Assignment Requirements: CLI App Testing

Know how to use -it to get shell in container
Understand basics of what a Linux distributions is like Ubuntu and CentOS
Know how to run a container🐻

Assignment: CLI App Testing

Use different Linux distro containers to check curl cli tool version
Use two different terminal windows to start bash in both centos:7 and ubuntu:14.04, using -it
Learn the docker container --rm option so you can save cleanup
Ensoure curl is installed and on latest version for that distro
- ubuntu: apt-get update && apt-get install curl
- centos: yum update curl
Check curl --version

#t1
docker container run --rm -it centos:7 bash
yum update curl -y 

#t2
docker ps -a
docker container run --rm -it ubuntu:14.04 bash
apt-get update && apt-get install -y curl
curl --version

Assignment Requirements: DNS RR Test

Know how to use -it go get shell in container
Understand basics of what a Linux distribution is like Ubuntu and CentOS
Know how to run a container
Understand basics of DNS records

Assignment: DNS Round Robin Test

Ever since Docker Engine 1.11, we can have multiple containers on a created network respond to the same DNS address
Create a new virtual network(default bridge driver)
Create two containers from elasticsearch:2 image
Research and use -network-alias search when creating them to give them an additional DNS name to respond to
Run alpine nslookup search with --net to see the two containers list for the same DNS name
Run centos curl -s search:9200 with --net multiple times until you see both "name" fields show

docker network create dude
docker container run -d --net dude --net-alias search elasticsearch:2
# --net-alias and --network-alias both work
docker container run --rm --net dude alpine nslookup search
docker container run --rm --net dude centos curl -s search:9200
docker container run --rm --net dude centos curl -s search:9200
docker container run --rm --net dude centos curl -s search:9200
docker container run --rm --net dude centos curl -s search:9200
### Test DNS Round Robin
docker container ls
docker container rm -f kind_darwin fervent_jackson

This Section:

All about images, the building blocks of containers
What's in an image (and what isn't)
Using Docker Hub registry
Managing our local image cache
Building our own images

What's In An Image (And What Isn't)

App binaries and dependencies
Metadata about the image data and how to run the image
Official definition: "An Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime"
Not a complete OS. No kernel, kernel modules(e.g. drivers)
Small as one file(your app binary) like a golang static binary
Big as a Ubuntu distro with apt, and Apache, PHP, and more installed

This Lecture:

Basics of Docker Hub(hub.docker.com)
Find Official and other good public images
Download images and basics of image tags

This Lecture: Review

Docker Hub, "the apt package system for Containers"
Official images and how to use them
How to discern "good" public images
Using different base images like Debian or Alpine

This Lecture:

Image layers
union file system
history and inspect commands
copy on write

docker image ls
docker history nginx:latest
# Oops that's old command format
docker history mysql

docker image inspect
# docker inspect (old way)
# returns JSON metadata about the image
docker image inspect nginx

Image and Their Layers: Review

Images are made up of file system changes and metadata
Each layer is uniquely identified and only stored once on a host
This saves storage space on host and transfer time on push/pull
A container is just a single read/write layer on top of image
docker image history and inspect commands can teach us

This Lecture: Requirements

Know what container and images are
Understand image layer basics
Understand Docker Hub basics

This Lecture:

All about image tags
How to upload to Docker Hub
Image ID vs. Tag

docker image tag --help
docker image ls

docker pull mysql/mysql-server
docker pull nginx:mainline
docker image tag nginx lotteryjs/nginx
docker image tag --help
docker image ls
docker login
cat ~/.docker/config.json
docker image push lotteryjs/nginx
docker image tag lotteryjs/nginx lotteryjs/nginx:testing
docker image ls
docker image push lotteryjs/nginx:testing

This Lecture: Review

Properly tagging images
Tagging images for upload to Docker Hub
How tagging is related to image ID
The Latest Tag
Logging into Docker Hub from docker cli
How to create private Docker Hub images

e.g. docker build -f some-dockerfile

cd dockerfile-sample-1
docker image build -t customnginx .

cd dockerfile-sample-2
ll
vim Dockerfile
docker container run -p 80:80 --rm nginx
docker image build -t nginx-with-html .
docker container run -p 80:80 --rm nginx-with-html
docker image ls
docker image tag --help
docker image tag nginx-with-html:latest lotteryjs/nginx-with-html:latest
docker image ls

Assignment: Build Your Own Image

Dockerfiles are part process workflow and part art
Take existing Node.js app and Dockerize it
Make Dockerfile. Build it. Test it. Push it. (rm it). Run it.
Expect this to be iterative. Rarely do I get it right the first time.
Details in dockerfile-assignment-1/Dockerfile
Use the Alpine version of the official 'node' 6.x image
Expected result is web site at http://localhost
Tag and push to your Docker Hub account(free)
Remove your image from local cache, run again from Hub

cd dockerfile-assignment-1
ll
vim Dockerfile

# Instructions from the app developer
# - you should use the 'node' official image, with the alpine 6.x branch
FROM node:6-alpine
# - this app listens on port 3000, but the container should launch on port 80
  #  so it will respond to http://localhost:80 on your computer
EXPOSE 3000
# - then it should use alpine package manager to install tini: 'apk add --update tini'
RUN apk add --update tini
# - then it should create directory /usr/src/app for app files with 'mkdir -p /usr/src/app'
RUN mkdir -p /usr/src/app
# - Node uses a "package manager", so it needs to copy in package.json file
WORKDIR /usr/src/app
COPY package.json package.json
# - then it needs to run 'npm install' to install dependencies from that file
RUN npm install && npm cache clean
# - to keep it clean and small, run 'npm cache clean --force' after above
# - then it needs to copy in all files from current directory
COPY . .
# - then it needs to start container with command '/sbin/tini -- node ./bin/www'
CMD [ "tini", "--", "node", "./bin/www" ]
# - in the end you should be using FROM, RUN, WORKDIR, COPY, EXPOSE, and CMD commands

docker build -t testnode .
docker container run  --rm -p 80:3000 testnode
docker images
docker tag --help
docker tag testnode lotteryjs/testing-node
docker push --help
docker push lotteryjs/testing-node
docker image rm lotteryjs/testing-node
docker container run --rm -p 80:3000 lotteryjs/testing-node

Section Overview

Defining the problem of persistent data
Key concepts with containers: immutable, ephemeral
Learning and using Data Volumes
Learning and using Bind Mounts
Assignments

Container Lifetime & Persistent Data

Containers are usually immutable and ephemeral
"immutable infrastructure":only re-deploy containers, never change
This is the ideal scenario, but what about databases, or unique data?
Docker gives us features to ensure these "separation fo concerns"
This is known as "persistent data"
Two ways: Volumes and Bind Mounts
Volumes: make special location outside of container UFS
Bind Mounts: link container path to host path

Persistent Data: Volumes

VOLUME command in Dockerfile

docker pull mysql
# Note you might want to do a `docker volume prune` to
# cleanup unused volumes and make it easier to see what
# you're doing here
docker image inspect mysql
docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True mysql
docker container inspect mysql
docker volume inspect bd2dd9f5ecabc7d9bd6779436b8fdbee1426613601e24ef7d94ef77b497b7c1c
docker container run -d --name mysql2 -e MYSQL_ALLOW_EMPTY_PASSWORD=True mysql
docker volume ls
docker container stop mysql
docker container stop mysql2
docker container ls
docker container ls -a
docker volume ls
docker container rm mysql mysql2
docker volume ls
docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True -v mysql-db:/var/lib/mysql  mysql
docker volume ls
docker volume inspect mysql-db
docker container rm -f mysql
docker container run -d --name mysql3 -e MYSQL_ALLOW_EMPTY_PASSWORD=True -v mysql-db:/var/lib/mysql  mysql
docker volume ls
docker container inspect mysql3

docker volume create

required to do this before "docker run" to use custom drivers and labels

docker volume create --help

Persistent Data: Bind Mounting

Maps a host file or directory to a container file or directory
Basically just two locations pointing to the same file(s)
Again, skips UFS, and host files overwrite any in container
Can't use in Dockerfile, must be at container run
... run -v /Users/bret/stuff:/path/container (mac/linux)
... run -v //c/Users/bret/stuff:/path/container (windows)

cat dockerfile-sample-2
ll
pcat Dockerfile
docker container run -d --name nginx -p 80:80 -v $(pwd):/usr/share/nginx/html nginx
docker container run -d --name nginx2 -p 8080:80 nginx
ll
# t2
docker container exec -it nginx bash
cd /usr/share/nginx/html
ls -al
# t1
touch testme.txt
# t2
ls -al
# t1
echo "is it me you're looking for" > testme.txt

Assignment: Named Volumes

Database upgrade with containers
Create a postgres container with named volume psql-data using version 9.6.1
Use Docker Hub to learn VOLUME path and versions needed to run it
Check logs, stop container
Create a new postgre container with same named volume using 9.6.2
Check logs to validate
(this only works with patch versions, most SQL DB's require manual commands to upgrade DB's to major/minor versions,i.e. it's DB limitation not a container one)

docker container run -d --name psql -v psql:/var/lib/postgresql/data postgres:9.6.1
docker container logs -f psql
docker container stop psql
docker container run -d --name psql2 -v psql:/var/lib/postgresql/data postgres:9.6.2
docker container ps -a
docker volume ls
docker container logs psql2

Assignment: Bind Mounts

Use a Jekyll "Static Site Generator" to start a local web server
Don't have to be web developer: this is example of bridging the gap between local file access and apps running in containers
source code is in the course repo under bindmount-sample-1
We edit files with editor on our host using native tools
Container detects changes with host files and updates web server
start container with docker run -p 80:4000 -v $(pwd):/site bretfisher/jekyll-serve
Refresh our browser to see changes
Change the file in _posts\ and refresh browser to see changes

cd bindmount-sample-1
ls -la
docker run -p 80:4000 -v $(pwd):/site bretfisher/jekyll-serve

Docker Compose

Why: configure relationships between containers
Why: save our docker container run settings in easy-to-read file
Why: create one-liner developer environment startups
Comprised of 2 separate but related things
1. YAML-formatted file that describes our solution options for:
- containers
- networks
- volumes
1. A CLI tool docker-compose used for local dev/test automation with those YAML files

docker-compose.yml

Compose YAML format has it's own versions: 1, 2, 2.1, 3, 3.1
YAML file can be used with docker-compose command for local docker automation or...
With docker directly in production with Swarm(as of v1.13)
docker-compose --help
docker-compose.yml is default filename, but any can be used with docker-compose -f

cd compose-sample-2

docker-compose CLI

CLI tools comes with Docker for Windows/Mac, but separate download for Linux
Not a production-grade tool but ideal for local development and test
Two most common commands are
- docker-compose up # setup volumes/networks and start all containers
- docker-compose down # stop all containers and remove con/vol/net
If all your projects had a Dockerfile and docker-compose.yml then "new developer onboarding" would be:
- git clone github.com/some/software
- docker-compose up

cd compose-sample-2
ls -la
cat docker-compose.yml
docker-compose up
# browser -> http://localhost
# ctrl + c
docker-compose up -d
docker-compose logs
docker-compose --help
docker-compose ps
docker-compose top
docker-compose down

Assignment: Writing A Compose File

Build a basic compose file for a Drupal content management system website. Docker Hub is your friend
Use the drupal image along with the postgres image
Use ports to expose Drupal on 8080 so you can localhost:8080
Be sure to set POSTGRES_PASSWORD for postgres
Walk though Drupal setup via browser
Tip: Drupal assumes DB is localhost, but it's service name
Extra Credit: Use volumes to store Drupal unique data

docker pull drupal
docker image inspect drupal

version: '2'

services:
  drupal:
    image: drupal:8.2
    ports:
      - "8080:80"
    volumes:
      - drupal-modules:/var/www/html/modules
      - drupal-profiles:/var/www/html/profiles       
      - drupal-sites:/var/www/html/sites      
      - drupal-themes:/var/www/html/themes
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_PASSWORD=mypasswd

volumes:
  drupal-modules:
  drupal-profiles:
  drupal-sites:
  drupal-themes:

docker-compose down -v
# Remove named volumes declared in the `volumes`
# section of the Compose file and anonymous volumes
# attached to containers.

Using Compose to Build

Compose can also build your custom images
Will build them with docker-compose up if not found in cache
Also rebuild with docker-compose build
Great for complex builds that have lots of vars or build args

cd compose-sample-3/
docker-compose up
# localhost
docker-compose down
docker image ls
docker-compose down --help
docker image rm nginx-custom
docker image ls
# remove -> image: nginx-custom
docker-compose up -d
docker image ls
docker-compose down --rmi local

Assignment: Build and Run Compose

"Building custom drupal image for local testing"
Compose isn't just for developers. Testing apps si easy/fun!
Maybe your learning Drupal admin, or are a software tester
Start with Compose file from previous assignment
Make your Dockerfile and docker-compose.yml in dir compose-assignment-2
Use the drupal image along with the postgres image as before
Use README.md in that dir for details

cd compose-assignment-2

Dockerfile

FROM drupal:8.6


RUN apt-get update && apt-get install -y git \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /var/www/html/themes

RUN git clone --branch 8.x-3.x --single-branch --depth 1 https://git.drupal.org/project/bootstrap.git \
    && chown -R www-data:www-data bootstrap

WORKDIR /var/www/html

docker-compose.yml

version: '2'
# NOTE: move this answer file up a directory so it'll work

services:

  drupal:
    container_name: drupal
    image: custom-drupal
    ports:
      - "8080:80"
    volumes:
      - drupal-modules:/var/www/html/modules
      - drupal-profiles:/var/www/html/profiles       
      - drupal-sites:/var/www/html/sites      
      - drupal-themes:/var/www/html/themes
 
  postgres:
    container_name: postgres
    image: postgres:9.6
    environment:
      - POSTGRES_PASSWORD=mypasswd
    volumes:
      - drupal-data:/var/lib/postgresql/data

volumes:
  drupal-data:
  drupal-modules:
  drupal-profiles:
  drupal-sites:
  drupal-themes:

Your First Swarm Service

Docker Swarm Docs

Play with Docker

Templates
- 3 Managers and 2 Workers

docker service create --name hello --replicas 3 --detach=false --publish 8080:80 nginx

Containers Everywhere = New Problems

How do we automate container lifecycle?
How can we easily scale out/in/up/down?
How can we ensure our containers are re-created if they fail?
How can we replace containers are-created if they fail?
How can we control/track where containers get started?
How can we create cross-node virtual networks?
How can we ensure only trusted servers run our containers?
How can we store secrets, keys, passwords and get them to the right container(and only that container)?

Swarm Mode:Built-In Orchestration

Swarm Mode is a clustering solution built inside Docker
Not related to Swarm "classic" for pre-1.12 versions
Added in 1.12(Summer 2016)via SwarmKit toolkit
Enhanced in 1.13(January 2017) via Stacks and Secrets
Not enabled by default, new commands once enabled
- docker swarm
- docker node
- docker service
- docker stack
- docker secret

docker swarm init: What Just Happened?

Lots of PKI and security automation
- Root Signing Certificate created for our Swarm
- Certificate is issued for first Manager node
- Join tokens are created
Raft database created to store root CA, configs and secrets
- Encrypted by default on disk(1.13+)
- No need for another key/value system to hold orchestration/secrets
- Replicates logs amongst Managers via mutual TLS in "control plane"

Create Your First Service and Scale It Locally

docker swarm init
docker node ls
# ...MANAGER STATUS
# ...Leader
docker node help
docker swarm help
docker service help

docker service create alpine ping 8.8.8.8
docker service ls
docker service ps <NAME(service)>
docker container ls
docker service update <ID(service)> --replicas 3
docker service ls
docker service ps <NAME(service)>

docker update --help
docker service update --help

docker container ls
docker container rm -f <name>.1.<ID>
docker service ls

docker service ps <NAME(service)>

docker service rm <NAME(service)>
docker service ls
docker container ls

Creating 3-Node Swarm: Host Options

A.play-with-docker.com
- Only needs a browser, but resets after 4 hours
B.docker-machine + VirtualBox
- Free and runs locally,but requires a machine with 8GB memory
C.Digital Ocean + Docker install
- Most like a production setup, but cost $5-10/node/month while learning
- Use my referral code in section resources to get $10 free
D.Roll your own
- docker-machine can provision machines for Amazon, Azure, DO, Google, etc.
- Install docker anywhere with get.docker.com

play-with-docker.com

# node1
# create 3 new instances
docker info
ping node2

docker-machine + VirtualBox

docker-machine create node1
docker-machine ssh node1
exit
docker-machine env node1
eval $(docker-machine env node1)
docker info # node1

docker-machine env --unset
eval $(docker-machine env --unset)
docker info

DO

# curl -fsSL https://get.docker.com -o get-docker.sh
# sh get-docker.sh

# root@node1
docker swarm init
docker swarm init --advertise-addr <IP address>
# docker swarm init --advertise-addr=192.168.99.100

#I'm going to copy the swarm join command and go over to node2 and add it in.
# root@node2
docker swarm join --token SWMTKN-1-1bn2hsyhjwqn4wztjqd7moftumffk4vwd2e88azwj7u7bg0q6m-c91v8sc6vpsyxw4zsqmrmg4ji 192.168.99.100:2377
# This node joined a swarm as a worker.
docker node ls
#Error response from daemon: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager.

# go back to node1
# docker node ls
docker node update --role manager node2

# for node3, let's add it as a manager by default.
# We need to go back to our original command of docker swarm, and then we need to get the join-token.
# go back to node1
docker swarm join-token manager
# docker swarm join --token SWMTKN-1-1bn2hsyhjwqn4wztjqd7moftumffk4vwd2e88azwj7u7bg0q6m-dzz4pcg9inzkt9wnft4hskaxn 192.168.99.100:2377
# I'm going to copy this, paste it into node3

# node3
docker swarm join --token SWMTKN-1-1bn2hsyhjwqn4wztjqd7moftumffk4vwd2e88azwj7u7bg0q6m-dzz4pcg9inzkt9wnft4hskaxn 192.168.99.100:2377
# This node joined a swarm as a manager.

# Back on node1
docker node ls
docker service create --replicas 3 alpine ping 8.8.8.8
docker node ps
docker service ps <service name>

Overlay Multi-Host Networking

Just choose --driver overlay when creating network
For container-to-container traffic inside a single Swarm
Optional IPSec(AES) encryption on network creation
Each service can be connected to multiple networks
- (e.g. front-end, back-end)

# root@node1
docker network create --driver overlay mydrupal
docker network ls

# https://github.com/bitnami/bitnami-docker-postgresql
docker service create --name psql --network mydrupal -e POSTGRESQL_POSTGRES_PASSWORD=mypass -e POSTGRESQL_DATABASE=progres postgres
docker service ls
docker service ps psql
docker logs psql.1.terabjvf7wkt5j769t04tld02

docker service create --name drupal --network mydrupal -p 80:80 drupal
docker service ls
docker service ps drupal #drupal is actrually running on Node2.

docker service inspect drupal #VIP

Routing Mesh

Use swarm mode routing mesh

Routes ingress(incoming) packets for a Service to proper Task
Spans All nodes in Swarm
Uses IPVS from Linux Kernel
Load balances Swarm Services across their Tasks
Two ways this works:
Container-to-container in a Overlay network(uses VIP)
External traffic incoming to published ports(all nodes listen)

docker service create --name search --replicas 3 -p 9200:9200 elasticsearch:2
docker service ps search
curl localhost:9200
curl localhost:9200
curl localhost:9200

Routing Mesh Cont.

This is stateless load balancing
This LB is at OSI Layer 3(TCP), not Layer 4(DNS)
Both limitation can be overcome with:
Niginx or HAProxy LB proxy, or:
Docker Enterprise Edition, which comes with built-in L4 web proxy

Assignment: Create Multi-Service App

Using Docker's Distributed Voting App
use swarm-app-1 directory in our course repo for requirements
1 volume, 2 networks, and 5 services needed
Create the commands needed, spin up services, and test app
Everything is using Docker Hub images, so no data needed on Swarm
Like many computer things, this is 1/2 art form and 1/2 science

docker node ls
docker ps -a
docker service ls
docker network create -d overlay backend
docker network create -d overlay frontend

docker service create --name vote -p 80:80 --network frontend --replicas 2 <image>

docker service create --name redis --network frontend --replicas 1 redis:3.2

docker service create --name worker --network frontend --network backend <image>

docker service create --name db --network backend --mount type=volume,source=db-data,target=/var/lib/postgresql/data postgres:9.4

docker service create --name result --network backend -p 5001:80

docker service ls
docker service ps result
docker service ps redis
docker service ps db
docker service ps vote
docker service ps worker
cat /etc/docker/
docker service logs worker

Stacks: Production Grade Compose

In 1.13 Docker adds a new layer of abstraction to Swarm called Stacks
Stacks accept Compose files as their declarative definition for services, networks, and volumes
We use docker stack deploy ranther then docker service create
Stacks managers all those objects for us, including overlay network per stack. Adds stack name to start of their name
New deploy: key in Compose file. Can't do build:
Compose now ignores deploy:, Swarm ignores build:
docker-compose cli not needed on Swarm server

docker stack deploy -c example-voting-app-stack.yml voteapp

docker stack --help

docker stack ls

docker stack services voteapp

docker stack ps voteapp

docker network ls
# overlay

docker stack deploy -c example-voting-app-stack.yml voteapp #update

Secrets Storage

Easiest "secure" solution for storing secrets in Swarm
What is a Secret?
- Usernames and passwords
- TLS certificates and keys
- SSH keys
- Any data you would prefer not be "on front page of news"
Supports generic strings or binary content up to 500Kb in size
Doesn't require apps to be rewritten

Secrets Storage Cont.

As of Docker 1.13.0 Swarm Raft DB is encrypted on disk
Only stored on disk on Manager nodes
Default is Managers and Workers "control plane" is TLS + Mutual Auth
Secrets are first stored in Swarm, then assigned to a Service(s)
Only containers in assigned Services(s) can see them
They look lik files in container but are actually in-memory fs
/run/secrets/<secret_name> or /run/secrets/<secret_alias>
Local docker-compose can use file-based secrets, but not secure

docker secret create psql_user psql_user.txt
echo "myDBpassWORD" | docker secret create psql_pass -

docker secret ls
docker secret inspect psql_user

docker service create --name psql --secret psql_user --secret psql_pass -e POSTGRES_PASSWORD_FILE=/run/secrets/psql_pass -e POSTGRES_USER_FILE=/run/secrets/psql_user postgres

docker service ps psql
docker exec -it <container name> bash
cat /run/secrets/psql_user
# mysqluser -> it worked

docker service ps psql
docker service update --secret-rm

docker stack deploy -c docker-compose.yml mydb
docker secret ls

docker stack rm mydb

Using Secrets With Local Docker Compose

ll
#docker-compose.yml
#psql_password.txt
#psql_user.txt
docker node ls
docker-compose up -d
docker-compose exec psql cat /run/secrets/psql_user
#dbuser

Full App Lifecycle With Compose

Live The Dream！
Single set of Compose files for:
Local docker-compose up development environment
Remote docker-compose up CI environment
Remote docker stack deploy production environment

Check Our Tools

Docker Desktop preferred(Win/Mac)
Docker Toolbox(Win 7/8/10 Home)
Linux: Install via Docker Docs
- docs.docker.com

Getting Docker Compose

CLI: docker-compose
- Included in Docker Desktop & Toolbox
- Linux: pip install docker-compose

docker version
# xx
docker-compose version
# xx

Why Compose?

2 parts: CLI and YAML files
Designed around developer workflows
docker-compose CLI a substitute for docker CLI
Use CLI by default locally

OOPS!~I meant to say "I reccomend you use Docker Compose locally"

Compose File Format

Docker standard (not yet industry std)
Defines multiple containers, networks, volumes, etc.
Can layer sets of YAML files, use templates, variables, and more
docker-compose.yml default

YAML

YAML: "YAML Ain't Markup Language"

Common configuration file format
Used by Docker, Kubernetes, Amazon, and others
: used for key/value pairs
Only spaces, no tabs
- used for lists

Compose YAML v2 vs V3

Myth busting: v3 does not replace v2
v2 focus: single-node dev/test
v3 focus: muti-node orchestration
If not using Swarm/Kubernetes, stick to v2

docker-compose CLI

many docker commands == docker-compose
IDE's now support docker-compose
"batteries included, but swappable"
CLI and YAML versions differ

docker-compose up

"one stop shop"
build/pull image(s) if missing
create volume/network/container(s)
starts containers(s) in foregound(-d to detach)
--build to always build

docker-compose down

"one stop shop"
stop and delete network/container(s)
use -v to delete volumes

docker-compose...

Many commands take "service" option
build just build/rebuild image(s)
stop just stop containers don't delete
ps list "services"
push images to registry
logs same as docker CLI
exec same as docker CLI

Compose CLI Basics-1

Run through simple compose commands

cd sample-02
docker-compose up
ctrl-c (same as docker-compose stop)
docker-compose down
docker-compose up -d
docker-compose ps
docker-compose logs

While app is running detached...

docker-compose exec web sh
curl localhost
exit

edit Dockerfile, add curl with apk

RUN apk add --update curl
docker-compose up -d

Notice it didn't build so force it

docker-compose up -d --build

Now try curl again

docker-compose exec web sh
curl localhost
exit

Cleanup

Inside sample-02 directory

docker-compose down
docker-compose down --helps

Compose CLI Basics-2

build no chace

docker-compose build --no-cache

docker-compose ps

sample-02_web_1   docker-entrypoint.sh node  ...   Up      0.0.0.0:3000->3000/tcp

sample-02 -> name of the project
web -> name of the service
1 -> numerical number of the replica

open localhost:3000

docker-machine ls
docker-compose logs

docker-compose logs web

docker-compose exec

docker-compose exec web sh
curl localhost

add RUN apk add --update curl to Dockerfile

docker-compose up -d --build
docker-compose exec web sh
curl localhost
curl localhost:3000
docker-compose down

Dockerfile Node Basics

Best practices for writing Dockerfiles

COPY, not ADD
npm/yarn install during build
CMD node, not npm
- requires another application to run
- not as literal in Dockerfiles
- doesn't work well as an init or PID 1 process
WORKDIR not RUN mkdir
- Unless you need chown

FROM Base Image Guidelines

Alpine Linux| Node.js Release Schedule| GitHub: ONBUILD deprecation| Node Official Image on Docker Hub

Stick to even numbered major releases
Don't use :latest tag
Start with Debian if migrating
Move to Alpline later
Don't use :slim
Don't use :onbuild

When to use Alpine Images

Alpine is "small" and "sec focused"
But Debian/Ubuntu are smaller now too
~100MB space savings isn't significant
Alpine has its own issues
Alpine CVE scanning fails
Enterprises may require CentOS or Ubuntu/Debian

Making a CentOS Node Image

Install Node in the official CentOS
Copy Dockerfile lines frome node:10
Use ENV to specify node version
This will take a few tries
Useful for knowing how to make your own node, but only if you have to

Assignment Answers:Making a CentOS Node Image

NOTE: You must add 'USER node' before CMD in Dockerfile to enable non-root user

# Dockfile
RUN groupadd --gid 1000 node \
  && useradd --uid 1000 --gid node --shell /bin/bash --create-home node

build

docker build -t centos-node .
docker run centos-node node --version

Least Privilege: Using node User

Official node images have a node user
But it's not used by default
Do this after apt/apk and npm i -g
Do this before npm i
May cause permissions issues with write access
May require chown node:node
Change user from root to node

USER node

Set permissions on app dir

RUN mkdir app && chown -R node:node .

Run a command as root in container

docker-compose exec -u root

Making Images Efficiently

Pick proper FROM
Line order matters
COPY twice: package.json* then . .
1. copy only the package and lock files
2. run npm install
3. copy everything else
One apt-get per Dockerfile
- apt-get update cache prob

Node Process Management In Containers

No need for nodemon, forever, or pm2 on server
- We'll use nodemon in dev for file watch later
Docker manages app start, stop, restart, healthcheck
Node multi-thread: Docker manages multiple "replicas"
One npm/node problem: They don't listen for proper shutdown signal by default

The Truth About The PID 1 Problem

PID 1 (Process Identifier) is the first process in a system (or container) (AKA init)
Init process in a container has two jobs:
- reap zombie processes
- pass signals to sub-processes
Zombie not a big Node issue
Focus on proper Node shutdown

Proper CMD for Healthy Shutdown

Docker uses Linux signals to stop app (SIGINT/SIGTERM/SIGKILL)
SIGINT/SIGTERM allow graceful stop
npm doesn't respond to SIGINT/SIGTERM
node doesn't respond by default, but can with code
Docker provides a init PID 1 replacement option

Proper Node Shutdown Options

Temp: Use --init to fix ctrl-c for now
Workaround: add tini to your image
Production: your app captures SIGINT for proper exit
Run any node app with --init to handle signals(temp solution)

docker run --init -d nodeapp

Add tini to your Dockerfile, then use it in CMD (permanent workaround)

RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "./bin/www"]

Use JS snippet to properly capture signals(production solution)

./sample-graceful-shutdown/sample.js

Assignment: Node Dockerfiles

Make a Dockerfile for existing Node app
use ./assignment-dockerfile/Dockerfile
Start with node 10.15 on alpine
Install tini, start node with tini
Copy package/lock files first, then npm, then copy

Tini

FROM
EXPOSE
RUN
WORKDIR
COPY
RUN
COPY
ENTRYPOINT
CMD

docker build -t assignment1 .
docker run -p 3001:3000 assignment1
# docker run -d -p 3001:3000 assignment1

localhost:3001

Testing Graceful shutdowns

Use ./assignment-dockerfile/
Run with tini built in, try to ctrl-c
Run with tini built in, try to stop
Remove ENTRYPOINT, rebuild
Add --init to run command,ctrl-c/stop
Bonus: add signal watch code

docker build -t assignment1:notini .
docker run -d -p 3001:3000 assignment1:notini
docker top 48a6
docker stop 48a6
# docker run --init -d -p 3001:3000 assignment1:notini
# docker top

docker run -d -p 3002:3000 assignment1
docker top 6ec2
docker stop 6ec2

Multi-stage Builds

New feature in 17.06 (mid-2017)
Build multiple images from on file
Those images can FROM each other
COPY files between them
Space + Security benefits
Great for "artifact only"
Great for dev + test + prod
1. FROM node as prod
2. ENV NODE_ENV=production
3. COPY package*.json ./
4. RUN npm install && npm cache clear --force
5. COPY . .
6. CMD ["node", "./bin/wwww"]
7. FROM prod as dev
8. ENV NODE_ENV = development
9. CMD ["nodemon", "./bin/www", "--inspect=0.0.0.0:9229"]
To build dev image from dev stage

docker build -t myapp .

To build prod image from prod stage

docker build -t myapp:prod --target prod .

More Multi-stage

Add a test stage that runs npm test
Have CI build --target test stage before building prod
Add npm install --only=development to dev stage
Don't COPY code into dev stage

Building A 3-Stage Dockerfile

Create a Dockerfile from ./sample-multi-stage
Create three stages for prod, dev, and test
Prod has no devDependencies and runs node
Dev includes devDep, runs nodemon
Test has devDep, runs npm test
Build all three stages with unique tags
Goal: don't repeat lines

Assignment Answers

# prod
docker build -t multistage --target prod . && docker run --init -p 3000:3000 multistage

# dev
docker build -t multistage:dev --target dev . && docker run --init -p 3000:3000 multistage:dev

# test
docker build -t multistage:test --target test . && docker run --init multistage:test

Cloud Native App Guidelines

Follow 12factor.net principles, especially
- Use Environment Variables for config
- Log to stdout/stderr
- Pin all versions, even npm
- Graceful exit SIGTERM/INIT
Create a .dockerignore (like .gitignore)
Heroku wrote a highly respected guide to creating distributed apps
Twelve factors to consider when developing or designing distributed apps
Containers are almost always distributed apps
Good news: You get many of these by using Docker
Lets focus on a few for Node.js

12 Factor:Config

12factor.net/config
Store environment config in Environment Variables (env vars)
Docker & Compose are great at this with multiple options
Old apps: Use CMD or ENTRYPOINT script with envsubst to pass env vars into conf files

12 Factor:Logs

12factor.net/logs
Apps shouldn't route or transport logs to anything but stdout/stderr
console.log() works
Winston/Bunyan/Morgan: Use levels to control verbosity
Winston transport: "Console"

.dockerignore

Prevent bloat and unneeded files
- .git/
- node_modules/
- npm-debug
- docker-compose*.yml
Not needed but useful in image
- Dockerfile
- README.md

Migrating Traditional Apps

"Traditional App" = Pre-Docker App
Take a typical Node app and "migrate"
./assignment-mta
add .dockerignore
Create Dockerfile
Change Winston transport to Console

MTA Requirements

See README.md for app details
Image shouldn't include in, out, node_modules or logs directories
Change Winston to Console winston.transports.Console
bind-mount in and out dirs
Set CHARCOAL_FACTOR to 0.1

MTA Outcomes

Running container with ./in and ./out bind-mounts results in new chalk images in ./out on host
Changing --env CHARCOAL_FACTOR changes look of image (test with 10)
No .gif files in image
docker logs shows Winston output

Assignment

docker build -t mta .
docker run mta
docker run -it mta bash

docker run -v $(pwd)/in:/app/in -v $(pwd)/out:/app/out mta
docker run -v $(pwd)/in:/app/in -v $(pwd)/out:/app/out --env CHARCOAL_FACTOR=10 mta
docker ps -l
docker logs dbda736f5c09

docker run -v $(pwd)/logs:/app/logs -v $(pwd)/in:/app/in -v $(pwd)/out:/app/out mta
docker run -v $(pwd)/logs:/app/logs -v $(pwd)/in:/app/in -v $(pwd)/out:/app/out --env CHARCOAL_FACTOR=10 mta

Compose Project Tips: Do's

cd ./compose-tips
Do use docker-compose for local dev
Do use v2 format for local dev
- v2 only: depends_on, hardware specific
Do study compose file and CLI features

Compose Project Tips: Don'ts

Unnecessary: "alias" & "container_name"
Legacy: "expose" & "links"
No need to set defaults

Bind-Mounting Code

Don't use host file paths
Don't bing-mount databases
For local dev only？don't copy in code
DDforWin needs drive perms
Perms: Linux != Windows

Bind-Mounting: Performance

node_modules in Images

Problem: we shouldn't build images with node_modules from host
- Example: node-gyp
Solution: add node_modules to .dockerignore

Let's do this to ./sample-sails

cp .gitignore .dockerignore
docker build -t sailsbret .

node_modules in Bind-Mounts

Problem: we can't bind-mount node_modules from host on macOS/Windows (different arch)
To Potential Solutions:
- Never use npm i on host, run npm i in compose
- Move modules in image, hide modules from host

# sample-express
docker-compose up # can't find module ...

docker-compose run express npm install
docker-compose up

Solution 1, simple but less flexible;
- You can't docker-compose up until you've used docker-compose run
- node_modules on host is now only usable from container

Solution 2, more setup but flexible:

Move node_modules up a directory in Dockerfile
Use empty volume to hide node_modules on bind-mount
node_modules on host doesn't conflict

# .dockerignore--->node_modules
ls
npm install
docker-compose build # rebuiding
docker-compose up -d
docker-compose ps
docker-compose exec express bash
ls node_modules/

NPM, Yarn, and Other Tools in Compose

Two ways to run various tools inside the container:
docker-compose run: start a new container and run command/shell
docker-compose exec: run additional command/shell in currently running container

# sample-strapi, .dockerignore -> node_modules
# Also remember to postinstall for strapi:
docker-compose run api npm i
docker-compose up

# other iterm
docker-compose exec api strapi --help
docker-compose exec api bash

File Monitoring and Node Auto Restarts

Use nodemon for compose file monitoring
webpack-dev-server, etc. work the same
Override Dockerfile via compose command
If Windows, enable polling
Create a nodemon.yml for advanced workflows(bower, webpack, parcel)

docker-compose run express npm install nodemon --save-dev

docker-compose build
docker-compose up

Startup Order and Dependencies

Problem: Multi-service apps start out of order, node might exit or cycle
Multi-container apps need:
- Dependency awareness
- Name resolution (DNS)
- Connection failure handling

Dependency Awareness

depends_on: when "up X", start Y first
Fixes name resolution issues with "can't resolve <service_name>"
Only for compose, not Orch
compose YAML v2: works with healthchecks like a "wait for script"

Connection Failure Handling

restart: on-failure
- Good: helps slow db startup and Node.js failing. Better: depends_on
- Bad: could spike CPU with restart cycling
Solution: build connection timeout, buffer, and retries in your apps

Healthchecks for depends_on

depends_on: is only dependency control by default
Add v2 healthchecks for true "wait_for"
Let's see some examples
- Mongo
- Postgres/MySQL
- Web

Making Microservices Easier

Problem: many HTTP endpoints, many ports
Solution: Nginx/HAProxy/Traefik for host header routing + wildcard localhost domain
Problem: CORS failures in dev
Solution: Proxy with * header
Problem: HTTPS locally
Solution: Create local proxy certs

Local DNS For Many Endpoints

Problem: Multiple endpoints and need unique DNS for each
- Use x.localhost, y.localhost in Chrome
- Use wildcard domains like *.vcap.me or xip.io
- Use dnsmasq on macOS/Linux
- Manually edit hosts file

VS Code, Debugging, and TypeScript

VS Code and other editors have some Docker and Compose features built-in
Debugging works when we enable in nodemon and remote via TCP
TypeScript compile and other pre-processors go in nodemon.json

# typescript
docker-compose run ts npm i
docker-compose up

Build A Sweet Compose File

./assignment-sweet-compose
Take all the learning from this section and apply it to a single compose file!
Uses Docke's example voting app (Dog vs. Cat)
Step-by-step in README.md

Avoiding devDependencies in Prod

Multi-stage can solve this
prod stages: npm i --only=production
Dev stage: npm i --only=development
Use npm ci to speed up builds
Ensure NODE_ENV is set
Sample ./multi-stage-deps/

Dockerfile Documentation

Document every line that isn't obvious
FROM stage, document why it's needed
COPY = don't document
RUN = maybe document
Add LABELS
RUN npm config list

Example Dockerfile Labels

LABEL has OCI standards now
- LABEL org.opencontainers.image.<key>
Use ARG to add info to labels like build date or git commit
Docker Hub has built-in envvars for use with ARGs
Sample ./dockerfile-labels/

Compose File Documentation

YAML (unlike JSON) supports comments!
Document objects that aren't obvious
- Why a volume is needed
- Why custom CMD is needed
Template blocks at top
Override objects and files

Run Tests During Image Build

Run npm test in a specific build-stage
- Also good for linting commands
Only run unit tests in build
Test stage not default
Locally, run docker-compose, run node npm test
Sample ./multi-stage-test/

docker build -t testnode --target=test .
docker build -t testnode --target=test --no-cache .

Security Scanning and Audit

Use test stage in multi-stage, or new
Or run it once image is built with CI
Only report at first, no failing (most images have at least one CVE vuln)
Consider RUN npm audit
./multi-stage-scanning/

docker build -t auditnode --target=audit --build-arg MICROSCANNER_TOKEN=$MICROSCANNER_TOKEN .

CI/CD Automated Builds

Have CI build images on (some) branches
Push to registry once build/tests pass
Lint Dockerfile and Compose/Stack files
Use docker-compose run or --exit-code-from for proper exit codes
Docker Hub can do this

Image Tagging

:latest is only a convention
Use latest for local easy access to current release
Maybe do this per major branch too for convenience
Don't repeat tags on CI or servers

Dockerfile Healthchecks

Always include HEALTHCHECK
Docker run and docker-compose: info noly
Docker Swarm: Key for uptime and rolling updates
Kubernetes: Not used, but helps in others making readiness/liveness probes
Sample ./healthchecks/

# option 1
HEALTHCHECK --interval=5m --timeout=3s \
  CMD curl -f http://localhost/ || exit 1

# option 2
HEALTHCHECK CMD curl -f http://localhost/healthz || exit 1

# option 3
HEALTHCHECK --interval=30s CMD node hc.js

Ultimate Node.js Dockerfile

./ultimate-node-dockerfile/
Use an existing Node.js sample app
Make a production grade Dockerfile
Development friendly, testing stage, security, non-root user, labels, minimal prod size
Requirements in README.md

# security
docker build --build-arg=MICROSCANNER_TOKEN=$MICROSCANNER -t ultimatenode:test --target test .
# buildkit
DOCKER_BUILDKIT=1 docker build --build-arg=MICROSCANNER_TOKEN=$MICROSCANNER -t ultimatenode:test --target test .

DOCKER_BUILDKIT=1 docker build --build-arg=MICROSCANNER_TOKEN=$MICROSCANNER -t ultimatenode:prod --target prod .

Multi-Threaded Concerns

Node is usually single threaded
Use multiple replicas, not PM2/forever
Start with 1-2 replicas per CPU
Unit testing = single replica.
Integration testing = multiple replicas

Why Not Compose In Production?

Only understands a single server (engine)
Doesn't understand uptime or headlthchecks
Swarm is easy and solves most use cases
Single server? Use Swarm
Kubernetes not ideal for 1-5 servers. Try cloud hosted

Node.js With Proxies

Common: many HTTP containers need to listen on 80/443
Nginx and HAProxy have lots of options
Traefik is the new kid, full of cool features
Think early how your Node apps will communicate on a single server or cluster

Connections During Container Replacement

Add SIGTERM Code to all Node.js apps
- ./sample-graceful-shutdown
Prevents killing app, but not graceful connection migration
Check godaddy/terminus for easier hc + shutdown

Container Replacement Process

Shutdown wait defaults: Docker/Swarm: 10s, Kubernetes: 30s
Kubernetes/Swarm use healthchecks differently for ingress LB
Give shutdown waits longer than HTTP long polling
HTTP: Use stoppable to track open connections

Node.js With Orchestration

Multi-container, single image
Startup "ready" state: healthchecks
Multi-container client state sharing(don't use in-memory state)
Shutdown cleanup: reconnect clients, close DB, fail readiness (K8s)

Voting App, Cluster-Ready

./sample-result-orchestration
Kubernetes and Swarm-ready version
Healthcheck/Readiness wait for DB
Readiness re-checks DB connection
socket.io uses redis
Stoppable for cleanup

Node.js With Docker Swarm

./sample-swarm/
Example of Node.js app stack
Has cluster features under "deploy"
replicas, update_config
stop_grace_period

State of ARM + Docker for Node

ARM processors are used everywhere
But it's hard to develop on ARM
April 2019: docker + ARM partnership
Docker Desktop runs ARM now!
Node is great on ARM
Docker is the easiest way to dev for ARM

Run Node ARM Containers for Dev

Easy button: Change the FROM image to arm64v8/node:<tag>
This forces macOS/Win to run ARM
Uses QEMU "proc emulator"
Build/run like normal
Mix with x86 in compose

docker image inspect arm64v8/node:10-alpine | grep Arch
# "Architectrue":"arm64",
docker image inspect node:10-alpine | grep Arch
# "Architectrue":"amd64",

docker build -t arm64node .
docker run -p 8080:3000 -d arm64node
docker ps

Run Node ARM Containers for Prod

AWS A1 Instances (Graviton Processors)
Test your IOT/Embedded code
Docker Hub doesn't build arm64 images
Or does it?(QEMU hack)
Build your own CI with QEMU
Swarm just works!

The Futrue: Making ARM Easier

ARM + Docker partnership will make this easier
Build multi-arch in one command
Store multi-arch images in single repo
Easier to know which arch you're running locally

Name		Name	Last commit message	Last commit date
Latest commit History 1,158 Commits
COURSE		COURSE
DOCKER-STACK		DOCKER-STACK
HISTORY		HISTORY
K8S-STACK		K8S-STACK
portus/compose		portus/compose
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
k8s-arch.png		k8s-arch.png

Kirk-Wang/Cloud-Native

Folders and files

Latest commit

History

Repository files navigation

Cloud-Native-And-DevOps

@@@@🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔@@@@

Kubernetes features (1.16)

What and why of orchestration

Major container orchestration projects

Kubernetes distributions

Kubernetes concepts

Basic things we can ask Kubernetes to do

Other things that Kubernetes can do for us

Kubernetes architecture

Kubernetes architecture: the nodes

Kubernetes architecture: the control plane

Running the control plane on special nodes

Running the control plane outside containers

Do we need to run Docker at all?

Interacting with Kubernetes

Pods

Getting a Kubernetes cluster for learning

Docker Desktop(Windows 10/macOS)

Check your connection in a terminal

minikube(Windows 10 Home)

microk8s(linux)

Web-based options

shpod: For a consistent Kubernetes experience...

First contact with kubectl

kubectl get

Obtaining machine-readable output

(Ab)using kubectl and jq

Kubectl describe

Exploring types and definitions

Introspection vs. documentation

Type names

More get commands: Services

More get commands: Listing running containers

Namespaces

Accessing namespaces

What are all these control plane pods?

Scoping another namespace

Namespaces and other kubectl commands

What about kube-public?

Exploring kube-public

What about kube-node-lease?

Running our first containers on Kubernetes

Starting a simple pod with kubectl run

Behind the scenes of kubectl run

What are these different things?

Our pingpong deployment

Viewing container output

Scaling our application

Streaming logs of multiple pods

Resilience

What happened?

What if we wanted something different?

Scheduling periodic background work

Creating a Cron Job

Cron Jobs in action

What about that deprecation warning?

Various ways of creating resources

Streaming logs of multiple pods

Streaming logs of many pods

Why can't we stream the logs of many pods?

Shortcomings of kubectl logs

Accessing logs from the CLI

Doing it manually

Stern

Installing Stern

Using Stern

Stern conveninent options

Cleanup

19,000 words

Exposing containers

Basic service types

Running containers with open ports

Creating a deployment for our HTTP server

Services are layer 4 constructs

First contact with `kubectl`

(Ab)using `kubectl` and `jq`

More `get` commands: Services

More `get` commands: Listing running containers

Namespaces and other `kubectl` commands

What about `kube-public`?

Exploring `kube-public`

What about `kube-node-lease`?

Starting a simple pod with `kubectl run`

Behind the scenes of `kubectl run`

Our `pingpong` deployment

Shortcomings of `kubectl logs`

`endpints` not `endpoint`

Example in `worker/worker.py`

Checking `hasher` and `rng` response times

Security implications of `kubectl apply`

`kubectl apply` is the new `curl | sh`

Use the `--force`, Luke

`deploy/rng` and `ds/rng`