- Prerequisites for Epiphany engine
- Epiphany cluster
- Monitoring
- Kubernetes
- How to do Kubernetes RBAC
- How to run an example app
- How to set resource requests and limits for Containers
- How to run CronJobs
- How to test the monitoring features
- How to run chaos on Epiphany Kubernetes cluster and monitor it with Grafana
- How to tunnel Kubernetes dashboard from remote kubectl to your PC
- How to setup Azure VM as docker machine for development
- How to upgrade Kubernetes cluster
- How to authenticate to Azure AD app
- How to expose service through HA Proxy load balancer
- Security
- Data and log retention
- Databases
To be able to run the Epiphany engine from your local OS you have to install:
- Bash 4.4+
- Should be natively installed on Linux distributions.
- MacOS version of bash most likely needs upgrading.
- For Windows 10 you can install Ubuntu subsystem.
- For Windows 7 see the docker image options below.
- Ansible 2.6+
- Hashicorp Terraform 0.11.8+
- jq (JSON Query tool: https://stedolan.github.io/jq/download)
- Python 2.7
- jinja2 2.10+
- jmespath 0.9.3+
- Git
- Azure CLI 2.0+
- SSH
These tools can be used both for deploying/managing clusters and for development.
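A quick sanity check (a sketch only - exact commands and output formats differ between OS versions and package managers) to confirm the tools are installed and on your PATH before running the engine:
bash --version | head -n 1
ansible --version | head -n 1
terraform version
python --version              # should report 2.7.x
pip list 2>/dev/null | grep -Ei 'jinja2|jmespath'
jq --version
git --version
az --version | head -n 1
ssh -V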
To facilitate an easier path for developers to contribute to Epiphany we have a development Docker image based on Alpine. This image makes it easier to set up a development environment or to develop on systems which do not support Bash, like Windows 7.
The following prerequisites are needed when working with the development image:
- Docker https://www.docker.com
- For Windows 7 check here
- Git https://git-scm.com
There are two ways to get the image: build it locally yourself or pull it from the Epiphany Docker registry.
- Run the following to build the image locally:
docker build -t epiphany-dev -f core/src/docker/dev/Dockerfile .
- To run the locally built image in a container use:
docker run -it -v LOCAL_DEV_DIR:/epiphany --rm epiphany-dev
Where LOCAL_DEV_DIR should be replaced with the local path to your core and data repositories. This will then be mapped to /epiphany inside the container. If everything is ok you will be presented with a Bash prompt from which one can run the Epiphany engine. Note that when filling in your data YAMLs one needs to specify the paths from the container's point of view.
- Pull down the image from the registry:
docker pull epiphanyplatform/epiphany-dev
- To run the pulled image in a container use:
docker run -it -v LOCAL_DEV_DIR:/epiphany --rm epiphanyplatform/epiphany-dev
Where LOCAL_DEV_DIR should be replaced with the local path to your Epiphany repository. This will then be mapped to /epiphany inside the container. If everything is ok you will be presented with a Bash prompt from which one can run the Epiphany engine while editing the core and data sources on the local OS. Note that when filling in your data YAMLs one needs to specify the paths from the container's point of view.
For people who are only using the Epiphany engine to deploy and maintain clusters there is a Dockerfile for the image with the engine already embedded.
To build the image and run it:
- Build a dev image as described here.
- Run the following command to build the deployment image locally:
docker build -t epiphany-deploy -f core/core/src/docker/deploy/Dockerfile .
- To run the built image in a container use:
docker run -it -v LOCAL_DATA_DIR:/epiphany/core/data \
  -v LOCAL_BUILD_DIR:/epiphany/core/build \
  -v LOCAL_SSH_DIR:/epiphany/core/ssh \
  --rm epiphany-deploy
Where LOCAL_DATA_DIR should be the host input directory for your data YAMLs and certificates, LOCAL_BUILD_DIR should be the host directory where you want the Epiphany engine to write its build output, and LOCAL_SSH_DIR should be the host directory where the SSH keys are stored. If everything is ok you will be presented with a Bash prompt from which one can run the Epiphany engine. Note that when filling in your data YAMLs one needs to specify the paths from the container's point of view.
[Azure specific] Ensure that you already have enough resources/quotas available in your region/subscription on Azure before you run Epiphany - depending on your configuration it can create a large number of resources.
- Watch out for line endings conversion. By default Git for Windows sets core.autocrlf=true. Mounting such files with Docker results in ^M end-of-line characters in the config files. Use "Checkout as-is, commit Unix-style" (core.autocrlf=input) or "Checkout as-is, commit as-is" (core.autocrlf=false). Be sure to use a text editor that can work with Unix line endings (e.g. Notepad++).
- Remember to allow Docker Desktop to mount drives in Settings -> Shared Drives
- Escape your paths properly:
  - PowerShell example:
docker run -it -v C:\Users\USERNAME\git\epiphany:/epiphany --rm epiphany-dev
  - Git Bash example:
winpty docker run -it -v C:\\Users\\USERNAME\\git\\epiphany:/epiphany --rm epiphany-dev
- Mounting NTFS disk folders in a Linux based image causes permission issues with SSH keys. When running either the development or deploy image:
  - Copy the certs onto the image:
mkdir -p ~/.ssh/epiphany-operations/
cp /epiphany/core/ssh/id_rsa* ~/.ssh/epiphany-operations/
  - Set the proper permissions on the certs:
chmod 400 ~/.ssh/epiphany-operations/id_rsa*
Epiphany uses Grafana for monitoring data visualization. The Epiphany installation creates a Prometheus datasource in Grafana, so the only additional step you have to do is to create your dashboard.
You can create your own dashboards; the Grafana getting started page will help you with it. Knowledge of Prometheus will be really helpful when creating diagrams since they use PromQL to fetch data (a sample query is shown after the import steps below).
There are also many ready-made Grafana dashboards created by the community - remember to check the license before importing any of them. To import an existing dashboard:
- If you have found a dashboard that suits your needs you can import it directly into Grafana from the Dashboards/Manage menu item in your Grafana web page.
- Click the +Import button.
- Enter the dashboard id or load a json file with the dashboard definition.
- Select the datasource for the dashboard - you should select Prometheus.
- Click Import.
When dashboard creation or import succeeds you will see it on your dashboard list.
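If you want to try out a PromQL expression before putting it into a panel, you can also query the Prometheus HTTP API directly. The sketch below assumes Prometheus listens on port 9090 on the monitoring VM and that Node Exporter metrics are present (metric names can differ between exporter versions):
# percentage of available memory per node - an example query only
curl -G 'http://NODE_IP:9090/api/v1/query' \
  --data-urlencode 'query=100 * node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes'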
To configure PostgreSQL, log in to the server using ssh and switch to the postgres user with:
sudo -u postgres -i
Then configure the database server using psql according to your needs and the PostgreSQL documentation, which you can find at https://www.postgresql.org/docs/
In order to configure PostgreSQL replication, add a block similar to the one below to the core section of your data.yaml:
postgresql:
replication:
enable: yes
user: your-postgresql-replication-user
password: your-postgresql-replication-password
max_wal_senders: 10 # (optional) - default value 5
wal_keep_segments: 34 # (optional) - default value 32
If enable is set to yes under replication, Epiphany will automatically create a cluster of master and slave servers with a replication user whose name and password are specified in data.yaml.
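To verify after deployment that replication is actually streaming, you can (as a quick check, assuming the setup above) query the master from the postgres account; pg_stat_replication is a standard PostgreSQL view:
sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
A row in state streaming for each slave indicates that replication is working.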
There are many monitoring components deployed with Epiphany that you can visualize data from. Knowing which components are used is important when you look for an appropriate dashboard on the Grafana website or create your own Prometheus query.
List of monitoring components - so-called exporters:
- cAdvisor
- HAProxy Exporter
- JMX Exporter
- Kafka Exporter
- Node Exporter
- Zookeeper Exporter
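To confirm that an exporter is up and exposing data for Prometheus to scrape, you can curl its metrics endpoint directly; for example Node Exporter listens on port 9100 by default (the ports of the other exporters depend on the Epiphany configuration):
curl -s http://NODE_IP:9100/metrics | head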
In order to start viewing and analyzing logs with Kibana, you first need to add an index pattern for Filebeat according to the following steps:
- Go to the Management tab.
- Select Index Patterns.
- In the first step define filebeat-* as the index pattern and click Next step.
- Configure the time filter field if desired by selecting @timestamp. This field represents the time that events occurred or were processed. You can choose not to have a time field, but you will not be able to narrow down your data by a time range.
This index pattern can now be used to query the Elasticsearch indices.
By default Kibana adjusts the UTC time in @timestamp to the browser's local timezone. This can be changed in Management > Advanced Settings > Timezone for date formatting.
In order to send alert messages from Prometheus, add a monitoring block similar to the one below to your data.yaml:
monitoring:
alerts:
enable: true
handlers:
mail:
smtp_from: '[email protected]'
smtp_host: 'somesmtp.example.com:587'
smtp_auth_username: 'someusername'
smtp_auth_password: 'somepassword'
smtp_require_tls: true
recipients: ['[email protected]', '[email protected]']
rules:
- name: "disk"
expression: ((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes) < 99
duration: 1m #1s, 1m, 1h, 1d, 1w, ...
severity: critical
message: "Disk space Exceeded"
- name: "updown"
expression: up == 0
duration: 1m #1s, 1m, 1h, 1d, 1w, ...
severity: critical
message: "Instance down"
monitoring: - covers the whole monitoring section and is needed to define alerts
alerts: - covers the whole alerts section and is needed to define alerts
enable: true - global switch to turn alerts on/off. Set to true to enable alerts.
handlers: - this section covers alert handlers; right now only email is supported
mail: - global configuration for smtp and email
smtp_from: '[email protected]' - name of the email sender
smtp_host: 'somesmtp.example.com:port' - address of your smtp server with port
smtp_auth_username: 'someusername' - name of your smtp server username
smtp_auth_password: 'somepassword' - password for your smtp server user
smtp_require_tls: true - enables/disables TLS. Set to true to enable TLS support.
recipients: ['[email protected]', '[email protected]'] - list of recipients in the form ['[email protected]', '[email protected]']. At least one recipient has to be declared.
rules: - this section covers rules for Prometheus alerting. Each rule has to follow the pattern defined below.
- name: "disk" - name of the file in which Prometheus will store the rule. Only alphanumerical characters are permitted.
expression: ((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes) < 99 - rule in the format of a Prometheus query
duration: 1m #1s, 1m, 1h, 1d, 1w, ... - duration of the event after which a notification will be sent, following the Prometheus convention
severity: critical - severity label that will be shown in the email sent to users
message: "Disk space Exceeded" - email subject that will be shown in the email sent to users
More information about Prometheus queries can be found under the links below:
https://prometheus.io/docs/prometheus/latest/querying/basics/
https://prometheus.io/docs/prometheus/latest/querying/examples/
Right now only email messages are supported, but we are actively working on introducing integrations with Slack and PagerDuty.
If you want to create scalable Prometheus setup you can use federation. Federation lets you scrape metrics from different Prometheus instances on one Prometheus instance.
In order to create a Prometheus federation, add the following to the scrape_configs section in the configuration (for example the prometheus.yaml file) of the previously created Prometheus instance on which you want to aggregate data from the other Prometheus instances:
scrape_configs:
- job_name: federate
metrics_path: /federate
params:
'match[]':
- '{job=~".+"}'
honor_labels: true
static_configs:
- targets:
- your-prometheus-endpoint1:9090
- your-prometheus-endpoint2:9090
- your-prometheus-endpoint3:9090
...
- your-prometheus-endpointn:9090
To check if the Prometheus instance from which you want to scrape data is accessible, you can use a command like the one below (run it on the Prometheus instance where you want to aggregate the data):
curl -G --data-urlencode 'match[]={job=~".+"}' your-prometheus-endpoint:9090/federate
If everything is configured properly and the Prometheus instance from which you want to gather data is up and running, this should return the metrics from that instance.
Setting up additional monitoring on Azure for redundancy is good practice and might catch issues the Epiphany monitoring could miss, like:
- Azure issues and resource downtime
- Issues with the VM which runs the Epiphany monitoring and Alerting (Prometheus)
More information about Azure monitoring and alerting can be found under the links below:
https://docs.microsoft.com/en-us/azure/azure-monitor/overview
https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-overview-alerts
The Kubernetes cluster that comes with Epiphany has an admin account created; you should consider creating more roles and accounts - especially when you have many deployments running in different namespaces.
To learn more about RBAC in Kubernetes use this link.
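As a minimal sketch (the role, binding, namespace and user names below are examples only, not something Epiphany creates for you), the following grants a user read-only access to pods in the default namespace:
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: some-developer   # example user name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF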
- Pull the core repository and, if needed, the data repository (contains data.yaml files that can be used as examples or a base for creating your own data.yaml).
- Prepare your VM/Metal servers:
  - Install one of the supported OS: RedHat 7.4+, Ubuntu 16.04+
  - Create a user account with sudo privileges and nopasswd that will use an rsa key for login.
  - Assign static IP addresses to each of the machines - those addresses should not change after cluster creation.
  - Assign hostnames to the machines.
  - Ensure the machines have internet access - it will be needed during Epiphany execution.
  - The machines will heavily use communication between each other, so ensure this communication does not go through a proxy.
  - Note down the IP addresses and hostnames of your machines.
- If needed you can create a new directory in repository_path/data/your_platform/ or you can use an existing profile from the data repository, where your_platform can be vmware, vbox or metal.
- Create or modify data.yaml.
- Fill in data.yaml with hostname information (nodes[*]/hosts/name).
- Fill in data.yaml with IP information (nodes[*]/hosts/ips/public).
- You can adjust roles for each machine according to your needs (nodes[*]/ansible_roles).
- Run bash epiphany -a -b -i -p your_platform -f your_profile in the main epiphany directory. Do not use a trailing slash after the profile name or as a prefix to infrastructure.
- Store the artifacts in the /build directory securely. Keep those files in order to upgrade/scale your cluster.
- Pull the core repository and, if needed, the data repository (contains data.yaml files that can be used as examples or a base for creating your own data.yaml).
- If needed you can create a new directory in repository_path/data/azure/infrastructure/ or you can use an existing profile from the data repository.
- Fill/modify the content of the data.yaml file in repository_path/data/azure/infrastructure/your_profile according to your needs. Please make sure you have enough free public IPs/cores assigned to your subscription.
  - Data.yaml files can be very verbose and at the beginning you may find it difficult to modify them, especially when defining large clusters with many virtual machines. Instead of defining a huge data.yaml file you can use a template.
  - Look at the data repository; there is a template for Azure environments in repository_path/data/azure/infrastructure/epiphany-template
  - Create a folder with a basic-data.yaml file in it (like /infrastructure/epiphany-rhel-playground/basic-data.yaml). This file contains basic data for the new cluster like subscription, number of VMs, or keys location.
  - Execute the Epiphany engine with the following command when using a template file:
bash epiphany -a -b -i -f infrastructure/your_profile -t /infrastructure/epiphany-template
- If you executed point 2.4 - skip the next step and go to 5.
- Run bash epiphany -a -b -i -f infrastructure/your_profile in the main epiphany directory. Do not use a trailing slash after the profile name or as a prefix to infrastructure.
- The first time you run the above command it will prompt you to log in to Microsoft and show you something like the following:
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code DBD7WRF3H to authenticate.
Follow the instructions: a token will be generated in your home directory, a Service Principal will be created, and you will not be prompted for this again.
- Store the artifacts in the /build directory securely. Keep those files in order to upgrade/scale your cluster.
- Go to the section Azure post deployment manual steps that may be applicable for your deployment.
Keep in mind that Epiphany will create public IPs for each of the machines; you can remove them, but running Epiphany again on the same cluster will recreate the public IPs.
There are no manual steps required when you have finished with How to create an Epiphany cluster on Azure until you decide to move to a production environment, where the cluster's virtual machines must not be exposed to the internet (except the load balancer - HAProxy role).
A production environment in the cloud should be composed of two elements:
- Demilitarized (DMZ) group that contains only the load balancer (HAProxy role)
- Applications (APPS) group that contains all other roles
Both elements are deployed independently (for now), which is why some manual steps, described in this chapter, are required.
The DMZ group should contain the HAProxy role that is used for load balancing and TLS termination. The VM that hosts HAProxy should be the only one accessible from the internet. You can see a DMZ implementation with VPN for the Epiphany build cluster in repository_path/data/azure/infrastructure/epiphany-bld-dmz.
The APPS group contains all features/roles required by your installation - this group should also contain (you can enable or disable it) a VPN connection so you can access dashboards and logs. You can see an APPS group implementation with VPN for the Epiphany build cluster in repository_path/data/azure/infrastructure/epiphany-bld-apps; there is nothing special about this configuration - a normal Epiphany data.yaml with VPN enabled (just don't forget to specify your VPN client certificate).
Once you have executed the two deployments you should have two resource groups (dmz, apps) with two different VNETs and VPNs. Now for the manual steps:
- Peer your VNETs. Go to the VNET settings blade and add peering to the other VNET - you have to do it twice, both ways.
- Add monitoring endpoints for Prometheus. The load balancer (HAProxy) is a separate deployment (for now), but we still have to monitor it and take logs from it. That is why we have to add scrape configs for Prometheus (monitoring):
  - SSH into the monitoring machine and add two files in the folder /etc/prometheus/file_sd/:
# OS Monitoring - haproxy-vm-node
- targets: ['HAPROXY_MACHINE_PRIVATE_IP:9100']
  labels:
    "job": "node"
# HAProxy monitoring - haproxy-exporter
- targets: ['HAPROXY_MACHINE_PRIVATE_IP:9101']
  labels:
    "job": "haproxy"
- ... and configure the address for Elasticsearch (logging):
  - SSH into the Load Balancer (HAProxy) machine and edit the file /etc/filebeat/filebeat.yml.
  - Find the ### KIBANA ### section and add the private IP address of the Logging VM (Kibana) as the host value.
  - Find the ### OUTPUTS ### section and add the private IP address of the Logging VM (Elasticsearch) as the host value.
- For security reasons you should also disassociate public IPs from your APPS virtual machines.
- Ensure you have defined firewall settings for the public VM (load balancer): How to enable/disable network traffic - firewall
Here we will get a simple app to run using Docker through Kubernetes. We assume you are using Windows 10, have an Epiphany cluster on Azure ready and have an Azure Container Registry ready (it might not be created in early-version Epiphany clusters - if you don't have one you can skip to point 11 and test the cluster using some public app from the original Docker Registry). Steps marked with an asterisk can be skipped.
- Install Chocolatey
- Use Chocolatey to install:
  - Docker-for-windows (choco install docker-for-windows, requires Hyper-V)
  - Azure-cli (choco install azure-cli)
- Make sure Docker for Windows is running (run as admin, might require a restart).
- Run docker build -t sample-app:v1 . in examples/dotnet/epiphany-web-app.
- *For test purposes, run your image locally with docker run -d -p 8080:80 --name myapp sample-app:v1 and head to localhost:8080 to check if it's working.
- *Stop your local docker container with docker stop myapp and run docker rm myapp to delete the container.
- *Now that you have a working docker image we can proceed to the deployment of the app on the Epiphany Kubernetes cluster.
- Run docker login myregistry.azurecr.io -u myUsername -p myPassword to log in to your Azure Container Registry. Credentials are in the Access keys tab in your registry.
- Tag your image with: docker tag sample-app:v1 myregistry.azurecr.io/samples/sample-app:v1
- Push your image to the repo: docker push myregistry.azurecr.io/samples/sample-app:v1
- SSH into your Epiphany cluster's master node.
- *Run kubectl cluster-info and kubectl config view to check if everything is okay.
- Run kubectl create secret docker-registry myregistry --docker-server myregistry.azurecr.io --docker-username myusername --docker-password mypassword to create a k8s secret with your registry data.
- Create a sample-app.yaml file with contents:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  selector:
    matchLabels:
      app: sample-app
  replicas: 2
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: sample-app
        image: myregistry.azurecr.io/samples/sample-app:v1
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            memory: 128Mi
      imagePullSecrets:
      - name: myregistry
- Run kubectl apply -f sample-app.yaml, and after a minute run kubectl get pods to see if it works.
- Run kubectl expose deployment sample-app --type=NodePort --name=sample-app-nodeport, then run kubectl get svc sample-app-nodeport and note the second port.
- Run kubectl get pods -o wide and check on which node the app is running.
- Access the app through [AZURE_NODE_VM_IP]:[PORT] from the two previous points - firewall changes might be needed.
When Kubernetes schedules a Pod, it’s important that the Containers have enough resources to actually run. If you schedule a large application on a node with limited resources, it is possible for the node to run out of memory or CPU resources and for things to stop working! It’s also possible for applications to take up more resources than they should.
When you specify a Pod, it is strongly recommended to specify how much CPU and memory (RAM) each Container needs. Requests are what the Container is guaranteed to get. If a Container requests a resource, Kubernetes will only schedule it on a node that can give it that resource. Limits make sure a Container never goes above a certain value. For more details about the difference between requests and limits, see Resource QoS.
For more information, see the Kubernetes documentation on managing compute resources for containers and Resource QoS.
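A minimal sketch of how requests and limits are declared on a single container (the pod name, image and values below are examples only), together with a way to see how they count against a node's allocatable resources:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo          # example name
spec:
  containers:
  - name: demo
    image: nginx:alpine
    resources:
      requests:
        cpu: 100m              # guaranteed share of one CPU core
        memory: 64Mi           # guaranteed memory
      limits:
        cpu: 250m              # hard ceiling - the container is throttled above this
        memory: 128Mi          # exceeding this gets the container OOM-killed
EOF
kubectl describe node NODE_NAME   # the "Allocated resources" section shows the totals per node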
- Follow the previous point using examples/dotnet/Epiaphany.SampleApps/Epiphany.SampleApps.CronApp
- Create a cronjob.yaml file with contents:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: sample-cron-job
spec:
  schedule: "*/1 * * * *" # Run once a minute
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sample-cron-job
            image: myregistry.azurecr.io/samples/sample-cron-app:v1
          restartPolicy: OnFailure
          imagePullSecrets:
          - name: myregistrysecret
- Run kubectl apply -f cronjob.yaml, and after a minute run kubectl get pods to see if it works.
- Run kubectl get cronjob sample-cron-job to get the status of our cron job.
- Run kubectl get jobs --watch to see the jobs scheduled by the "sample-cron-job" cron job (see the log-inspection sketch after this list).
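To inspect what a given run actually printed, you can fetch the logs of one of the pods created by the jobs; names are generated by Kubernetes, so the pod name below is only an example:
# list the pods spawned by the cron job's jobs
kubectl get pods | grep sample-cron-job
# print the logs of one of the listed pods (replace the name with one from the listing)
kubectl logs sample-cron-job-1551234500-abcde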
Prerequisites: Epiphany cluster on Azure with at least a single VM with the prometheus and grafana roles enabled.
- Copy the ansible inventory from build/epiphany/*/inventory/ to examples/monitoring/
- Run ansible-playbook -i NAME_OF_THE_INVENTORY_FILE grafana.yml in examples/monitoring
- In the inventory file find the IP address of the node of the machine that has Grafana installed and head over to https://NODE_IP:3000 - you might have to head over to the Azure Portal and allow traffic to that port in the firewall; also ignore the possible certificate error in your browser.
- Head to Dashboards/Manage on the side panel and select Kubernetes Deployment metrics - here you can see a sample Kubernetes monitoring dashboard.
- Head to http://NODE_IP:9090 to see the Prometheus UI - there in the dropdown you have all of the metrics you can monitor with Prometheus/Grafana.
- SSH into the Kubernetes master.
- Copy over the chaos-sample.yaml file from the example folder and run it with kubectl apply -f chaos-sample.yaml - it takes code from github.com/linki/chaoskube so normal security concerns apply.
- Run kubectl create clusterrolebinding chaos --clusterrole=cluster-admin --user=system:serviceaccount:default:default to start the chaos - random pods will be terminated with a 5s frequency, configurable inside the yaml file.
- Head over to Grafana at https://NODE_IP:3000, open a new dashboard, add a panel, set Prometheus as the data source and put kubelet_running_pod_count in the query field - now you can see how Kubernetes is replacing killed pods and balancing them between the nodes.
- Run kubectl get svc nginx-service and note the second port. You can access the nginx page via [ANY_CLUSTER_VM_IP]:[PORT] - it is accessible even though the random pods carrying it are constantly killed, unless you have more VMs in your cluster than deployed nginx instances and choose the IP of one not carrying it.
Prerequisites: Epiphany cluster on Azure with at least a single VM with the elasticsearch, kibana and filebeat roles enabled.
- Connect to kubectl using kubectl proxy or directly from the Kubernetes master server.
- Apply pod-counter.yaml from the Epiphany repository extras/kubernetes/pod-counter with the command:
kubectl apply -f yourpath_to_pod_counter/pod-counter.yaml
Paths are system dependent so please be aware of applying the correct separator for your operating system.
- In the inventory file find the IP address of the node of the machine that has Kibana installed and head over to http://NODE_IP:5601 - you might have to head over to the Azure Portal and allow traffic to that port in the firewall.
- You can now search for data from logs in the Discover section in Kibana after creating a filebeat-* index pattern. To create the index pattern click Discover, then in Step 1: Define index pattern enter filebeat-*. Then click Next step. In Step 2: Configure settings click Create index pattern. Now you can go to the Discover section and look at the output from your logs.
- You can verify that the CounterPod is sending messages correctly and that filebeat is gathering them correctly by querying for CounterPod in the search field in the Discover section.
- For more information refer to the documentation: https://www.elastic.co/guide/en/kibana/current/index.html
- SSH into the server and forward port 8001 to your machine:
ssh -i epi_keys/id_rsa [email protected] -L 8001:localhost:8001
NOTE: substitute the IP with your cluster master's IP.
- On the remote host, get the admin token bearer:
kubectl describe secret $(kubectl get secrets --namespace=kube-system | grep admin-user | awk '{print $1}') --namespace=kube-system | grep -E '^token' | awk '{print $2}' | head -1
NOTE: save this token for the next points.
- On the remote host, open a proxy to the dashboard:
kubectl proxy
- Now on your local machine navigate to:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/overview?namespace=default
- When prompted to put in credentials, use the admin token from the previous point.
- Make sure you have docker-machine installed (choco install docker-machine)
- Run the following:
docker-machine create --driver azure --azure-subscription-id <visual-studio-subscription-id> --azure-resource-group <resource-group> --azure-vnet <vnet> --azure-subnet default --azure-location westeurope <name-of-the-vm>
- When the creation succeeds, go ahead and connect to your docker-machine using docker-machine env <name-of-the-vm> and then invoke commands as instructed by docker-machine.
- Check if everything is working with docker run hello-world.
Now your docker containers are running on a separate system without you having to worry about overhead.
Source: https://docs.docker.com/machine/drivers/azure/#options
Prerequisites: Epiphany Kubernetes cluster
- SSH into the Kubernetes master.
- Run echo -n 'admin' > ./username.txt, echo -n 'VeryStrongPassword!!1' > ./password.txt and kubectl create secret generic mysecret --from-file=./username.txt --from-file=./password.txt
- Copy over the secrets-sample.yaml file from the example folder and run it with kubectl apply -f secrets-sample.yaml
- Run kubectl get pods, copy the name of one of the ubuntu pods and run kubectl exec -it POD_NAME -- /bin/bash with it.
- In the pod's bash run printenv | grep SECRET - the Kubernetes secret created in step 2 was attached to the pods during creation (take a look at secrets-sample.yaml) and is available inside them as environment variables (a sketch of the mapping follows this list).
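For reference, this is roughly how a secret key ends up as an environment variable in a pod spec - a sketch only, with an example pod name; secrets-sample.yaml in the examples folder is the authoritative version:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: secret-env-demo        # example name
spec:
  containers:
  - name: demo
    image: ubuntu
    command: ["sleep", "3600"]
    env:
    - name: SECRET_USERNAME
      valueFrom:
        secretKeyRef:
          name: mysecret       # the secret created in step 2
          key: username.txt    # key inside the secret (the file name used with --from-file)
EOF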
- Register your application. Go to the Azure portal to the Azure Active Directory => App registrations tab.
- Click the New application registration button, fill in the data and confirm.
- Deploy the app from examples/dotnet/Epiphany.SampleApps/Epiphany.SampleApps.AuthService. This is a test service for verifying Azure AD authentication of the registered app. (How to deploy an app)
- Create a secret key for your app: settings => keys. Remember to copy the value of the key after creation.
- Try to authenticate (e.g. using Postman) calling the service api <service-url>/api/auth/ with the following Body application/json type parameters:
{ "TenantId": "<tenant-id>", "ClientId": "<client-id>", "Resource": "https://graph.windows.net/", "ClientSecret": "<client-secret>" }
  - TenantId - Directory ID, which you can find in the Azure Active Directory => Properties tab.
  - ClientId - Application ID, which you can find in the details of the previously registered app: Azure Active Directory => App registrations => your app
  - Resource - https://graph.windows.net is the service root of the Azure AD Graph API. The Azure Active Directory (AD) Graph API provides programmatic access to Azure AD through OData REST API endpoints. You can construct your own Graph API URL. (How to construct a Graph API URL)
  - ClientSecret - the secret key created in point 4.
- The service should return an Access Token.
- Add the haproxy role to your data.yaml
- Create a folder repository_path/core/src/ansible/roles/haproxy/vars/
- Create a file repository_path/core/src/ansible/roles/haproxy/vars/main.yml
- Add the following content to repository_path/core/src/ansible/roles/haproxy/vars/main.yml:
---
service_port: your_service_port
Where your_service_port is the port on which your service is exposed using NodePort (see the NodePort service sketch below).
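For context, your_service_port has to match the nodePort of the Kubernetes service you expose. A minimal NodePort service sketch (the service name, selector and ports are examples only):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: sample-app-nodeport    # example name
spec:
  type: NodePort
  selector:
    app: sample-app
  ports:
  - port: 80                   # service port inside the cluster
    targetPort: 80             # container port
    nodePort: 30001            # this is the value to use as service_port
EOF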
- Add the haproxy_tls_termination role to your data.yaml
- If you want to minimize the risk of Slowloris-like attacks, add to the haproxy section of your data.yaml:
haproxy:
  http_request_timeout: 5s
Where http_request_timeout is the number of seconds (with an s suffix) after which the connection to HAProxy will be terminated by HAProxy. This parameter is optional; if it is not present, no timeout http-request will be set in the global section of the HAProxy configuration.
If you want to use HAProxy with a TLS/SSL certificate follow the instructions below.
- Add the haproxy_tls_termination role to your data.yaml
- If you want to use your own certificates, you can add the following to the core section of your data.yaml:
haproxy:
  haproxy_certs_dir: your_path_to_certificates
Your certificates will be copied and applied automatically to the HAProxy configuration.
Please be aware that your_path_to_certificates cannot contain variables ($HOME) or a tilde (~) as this will make the deployment of Epiphany fail. Additionally, if you need more than one DNS name for your frontend you need to provide certificates on your own, as there is only one self-signed certificate generated by this role, with CN localhost. For multiple backends you also need to provide the mapping as described later in this document.
- If you don't want to apply your own certificates and want to use the automatically generated one instead, then just don't put any certificate in your_path_to_certificates or don't put the haproxy: haproxy_certs_dir section in your data.yaml.
- Below you can find an example of the configuration:
haproxy:
  haproxy_certs_dir: /home/epiphany/certs/
  frontend:
    - name: https_front
      port: 443
      https: yes
      backend:
        - http_back1
        - http_back2
      domain_backend_mapping:
        - domain: backend1.domain.com
          backend: http_back1
        - domain: backend2.domain.com
          backend: http_back2
    - name: http_front1
      port: 80
      https: no
      backend:
        - http_back2
    - name: http_front2
      port: 8080
      https: no
      backend:
        - http_back1
        - http_back2
      domain_backend_mapping:
        - domain: http-backend1.domain.com
          backend: http_back1
        - domain: http-backend2.domain.com
          backend: http_back2
  backend:
    - name: http_back1
      server_groups:
        - worker
      port: 30001
    - name: http_back2
      server_groups:
        - worker
        - kibana
      port: 30002
- Parameters description:
  - haproxy_certs_dir - (Optional) Path on the machine from which you run the Epiphany installer where the certificates generated by you are stored. If not provided, one certificate with CN localhost will be generated; this works only with one frontend definition, in other cases HAProxy won't be able to redirect you to the correct backend.
  - frontend - (Mandatory) At least one frontend configuration must exist. If more than one domain must be supported then the domain_backend_mapping section is mandatory; without it the configuration will fail. This is a list of frontends; each position has to start with -.
    - name - (Mandatory) Name of each frontend configuration.
    - port - (Mandatory) Port to which the frontend should bind. Must be unique across all frontends, otherwise HAProxy will fail.
    - https - (Mandatory) Information whether https will be used - options yes/no. If no, only the http part of the configuration for the frontend will be generated.
    - backend - (Mandatory) At least one backend configuration must exist. If domain_backend_mapping exists, this must match the backend configuration in the domain_backend_mapping section. It always has to match a name from the backend section. This is a list of backends; each position has to start with -. This parameter defines to which backend configuration traffic from the frontend is forwarded.
    - domain_backend_mapping - (Optional) If this exists, at least one domain-to-backend mapping must exist. Must be provided if more than one domain has to be supported.
      - domain - (Mandatory if domain_backend_mapping is used, for each mapping) Domain that matches the SSL certificate CN for the https configuration and domain name. For http, the domain that will be mapped using the http header.
      - backend - (Mandatory if domain_backend_mapping is used, for each mapping) Must match a name from the backend section.
  - backend - (Mandatory) This is a list of backends; each position has to start with -. At least one backend used by a frontend must exist. If there is no match for each frontend configuration HAProxy will fail to start.
    - name - (Mandatory) Name of each backend configuration; must match the frontend's backend configuration and the domain_backend_mapping backend part in the frontend section.
    - server_groups - (Mandatory) This is a list of server groups; each position has to start with -. At least one server_group used by the backend must exist. It must match an Epiphany role, e.g. kibana, worker etc.
    - port - (Mandatory) Port on which the backend service is exposed.
- The upgrade procedure might be different for each Kubernetes version. An upgrade shall only be done from one minor version to the next minor version. For example, an upgrade from 1.9 to 1.11 looks like this:
1.9.x -> 1.9.y
1.9.y -> 1.10
1.10 -> 1.11
Each version can be upgraded in a slightly different way; to find information on how to upgrade your version of Kubernetes please use this guide.
Epiphany uses kubeadm to bootstrap a cluster and the same tool shall be used to upgrade it.
Upgrading a Kubernetes cluster with running applications shall be done step by step. To prevent application downtime you should use at least two Kubernetes worker nodes and at least two instances of each of your services.
Start the cluster upgrade by upgrading the master node. Detailed instructions on how to upgrade each node, including the master, are described in the guide linked above. When the Kubernetes master is down it does not affect running applications; at this time only the control plane is not operating. Your services will keep running but will not be recreated nor scaled while the control plane is down.
Once the master upgrade has finished successfully, you shall start upgrading the worker nodes - one by one. The Kubernetes master will notice when a worker node is down and will instantiate services on the remaining operating nodes, which is why it is essential to have more than one worker node in the cluster to minimize application downtime.
No-downtime upgrades are achievable when upgrading Kafka, but before you start thinking about upgrading you have to think about your topic configuration. Kafka topics are distributed across partitions with replication. The default value for replication is 3, which means each partition will be replicated to 3 brokers. You should remember to enable redundancy and keep at least two replicas at all times; this is important when upgrading a Kafka cluster. When one of your Kafka nodes is down during the upgrade, ZooKeeper will direct your producers and consumers to working instances - having replicated partitions on working nodes will ensure no downtime and no data loss.
Upgrading Kafka can be different for every Kafka release, so please refer to the Apache Kafka documentation. An important rule to remember during a Kafka upgrade: only one broker at a time - to prevent downtime you should upgrade your Kafka brokers one by one.
ZooKeeper redundancy is also recommended, since a service restart is required during the upgrade - it can cause ZooKeeper unavailability. Having at least two ZooKeeper services in the ZooKeeper ensemble, you can upgrade one and then continue with the rest one by one.
More detailed information about ZooKeeper can be found in the ZooKeeper documentation.
Epiphany 1.0 supports firewalld on host machines (RedHat only). You can enable the firewall by setting .../security/firewall/enable to true in data.yaml. Remember to allow port 22 in the ports_open (.../security/firewall/ports_open) dictionary so that the configuration can do its job.
Security for internet-facing infrastructure is extremely important - remember to configure Network Security Group rules to allow network traffic only on the required ports and directions. You can do it in the Azure-specific data.yaml in the section .../network_security_group/rules. Remember to allow port 22 (you can/should remove this rule after deployment) so that the configuration can do its job.
Epiphany will create a point-to-site configuration (if you enable VPN in .../security/vpn/enable and specify the public key of your certificate, in base64 format, in the public_cert_data field). For production environments you have to use a root certificate from a trusted provider.
For development purposes you can use a self-signed certificate which can be generated using PowerShell: https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-certificates-point-to-site
When you have the root certificate you should generate child certificate(s) that will be distributed to the team members who should have VPN access to the clusters.
Configuration of the client config in data.yaml (.../security/vpn/client_configuration/root_certificate) looks like the following:
...
root_certificate:
# name is the name of the cert that was created for you by a trusted party OR a name you give a self-signed cert
name: NAME-OF-YOUR-CERTIFICATE
revoked_certificate:
name: NAME-OF-REVOKED-CERTIFICATE
thumbprint: THUMBPRINT-OF-REVOKED-CERTIFICATE
# public_cert_data is the actual base64 public key from your cert. Put it in 'as is'. The '|' tells yaml to use 'as is'.
public_cert_data: |
YOUR-BASE64-CLIENT-AUTH-PUBLIC-KEY
...
The configuration requires the revoked certificate section to be filled in (for now).
The Epiphany engine produces build artifacts during each deployment. Those artifacts contain:
- Generated terraform files.
- Generated terraform state files.
- Generated cluster manifest file.
- Generated ansible files.
- Azure login credentials for the service principal if deploying to Azure.
Artifacts contain sensitive data so it is important to keep them in a safe place like a private Git repository or storage with limited access. The generated build is also important in case of scaling or updating the cluster - you will need the build folder in order to modify your cluster.
Epiphany creates (or uses an existing one, if you did not let it create one) a service principal account which can manage all resources in your subscription, so please store the build artifacts securely.
For Azure, the deployment configuration for a Kubernetes node looks like this:
vms:
- name: vm-k8s-node
size: Standard_DS1_v2
os_type: linux
count: 1
bastian_host: false
# roles are how you define a grouping of nodes. These values will be used to create an inventory of your cluster
# Must be a member of the 'role' in core
roles:
- linux
- worker
- node_exporter
- filebeat
- reboot
There is 1 worker role defined - it means only one Kubernetes node virtual machine will be created and configured to join the Kubernetes cluster. If the Epiphany deployment was created with one Kubernetes node and you then decide to have more nodes, you can simply change count: 1 to count: 2 and wait for the new node to be added. It is important to have your build folder from the initial deployment, so the state will be automatically refreshed with no downtime. For more information about the build folder go to the Build artifacts section.
For all other deployments (Metal, VMWare, VirtualBox, etc.) you just have to add another machine definition with the worker role.
Scaling Kafka looks exactly the same as scaling Kubernetes. Once you change the count: property from 1 to n and execute Epiphany, you will have n Kafka machines.
Adding a new Kafka broker to a non-Azure deployment looks the same as adding a new Kubernetes node.
When planning a Kafka installation you have to think about the number of partitions and replicas since it is strongly related to Kafka's throughput and reliability. By default Kafka's replicas number is set to 1 - you should change it in core/src/ansible/roles/kafka/defaults in order to have partitions replicated across many virtual machines.
...
replicas: 1 # Default to at least 1 (1 broker)
partitions: 8 # 100 x brokers x replicas for reasonable size cluster. Small clusters can be less
...
You can read more here about planning the number of partitions.
To install RabbitMQ in single mode just add the rabbitmq role to your data.yaml for your server and in the general roles section. All configuration of RabbitMQ - e.g. creating a user other than guest - should be performed manually.
An Epiphany cluster has a number of components which log, collect and retain data. To make sure that these do not exceed the usable storage of the machines they are running on, the following configurations are available.
For managing the data storage that Elasticsearch consumes we use Elasticsearch Curator. To use it one needs to make sure the elasticsearch-curator role is enabled. This role will install and configure Elasticsearch Curator to run in a cronjob to clean up indices which are older than a certain threshold.
In the default configuration /core/src/ansible/roles/elasticsearch-curator/defaults/main.yml
the following values can be tweaked regarding storage:
# Retention time of Elasticsearch indices in days.
indices_retention_days: 30
The size of the storage consumed by Elasticsearch is dependent on the cluster size and how much logging the deployed application will generate.
In the default configuration /core/src/ansible/roles/grafana/defaults/main.yml
the following values can be tweaked to control the amount of storage used by Grafana:
# The path where Grafana stores its logs
grafana_logs_dir: "/var/log/grafana"
# The path where Grafana stores its data (dashboards DB (SQLite), sessions, etc.)
grafana_data_dir: "/var/lib/grafana"
grafana_logging:
# Enable or disable log rotation
log_rotate: true
# Enable or disable daily log rotation
daily_rotate: true
# Number of days to retain the logs
max_days: 7
While logs can be rotated and have a retention time, the amount of storage used by Grafana is dependent on user usage and dashboard count and cannot be directly controlled.
In the default configuration /core/src/ansible/roles/kafka/defaults/main.yml
the following values can be tweaked regarding storage:
# The path where kafka stores its data
data_dir: /var/lib/kafka
# The path where kafka stores its logs
log_dir: /var/log/kafka
# The minimum age of a log file to be eligible for deletion due to age
log_retention_hours: 168
# Offsets older than this retention period will be discarded
offset_retention_minutes: 10080
The amount of storage Kafka consumes is dependent on the application running on Epiphany, how many messages producers create and how fast the consumers can consume them. It's up to the application developer to configure log_retention_hours and offset_retention_minutes to suit the application's needs.
Since Kafka does not have a mechanism for log rotation we use logrotate for this. The template for logrotate can be found here:
/core/src/ansible/roles/kafka/templates/logrotate.conf.j2
On the system the configuration can be found here:
/etc/logrotate.d/kafka
In the default configuration /core/src/ansible/roles/kibana/defaults/main.yml
the following values can be tweaked regarding storage:
# The path where Kibana stores its logs
kibana_log_dir: /var/log/kibana
Since Kibana does not have a mechanism for log rotation we use logrotate for this. The template for logrotate can be found here:
/core/src/ansible/roles/kibana/templates/logrotate.conf.j2
On the system the configuration can be found here:
/etc/logrotate.d/kibana
Besides logs, any other data is dependent on user usage (dashboards, queries etc.). Kibana stores that kind of data in Elasticsearch under the .kibana index.
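To see how much space the .kibana index (and, for comparison, the filebeat indices) actually takes up, you can query Elasticsearch's _cat API on the logging VM; the host and port below are the Elasticsearch defaults and may differ in your setup:
curl 'http://localhost:9200/_cat/indices/.kibana?v'
curl 'http://localhost:9200/_cat/indices/filebeat-*?v&h=index,store.size'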
The kubelet and container runtime (Docker) do not run in containers. On machines with systemd they write to journald.
Everything a containerized application writes to stdout and stderr is redirected to the Docker logging driver (json-file), which is configured to rotate logs automatically.
In the default configuration /core/src/ansible/roles/docker/defaults/main.yml
the following values can be tweaked regarding storage:
docker_logging:
log_opts:
# The maximum size of the log before it is rolled. A positive integer plus a modifier representing the unit of measure (k, m, or g).
max_file_size: "10m"
# The maximum number of log files that can be present. If rolling the logs creates excess files, the oldest file is removed.
max_files: 2
On the system the configuration can be found here:
/etc/docker/daemon.json
In the default configuration /core/src/ansible/roles/prometheus/defaults/main.yml
the following values can be tweaked to control the amount of storage used by Prometheus:
# The path where Prometheus stores its data
prometheus_db_dir: /var/lib/prometheus
# The time it will retain the data before it gets deleted
prometheus_storage_retention: "30d"
prometheus_global:
# The interval it will use to scrape the data from the sources
scrape_interval: 15s
The size of the data which Prometheus will scrape and retain is dependent on the cluster size (Kafka/Kubernetes nodes) and the scrape interval. The Prometheus storage documentation will help you determine how much data might be generated with a certain scrape interval and cluster size. This can then be used to determine a storage retention time in days. Note that one should not plan to use the entire disk space for data retention since it might also be used by other components like Grafana which might be deployed on the same system.
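A rough sizing calculation based on the formula from the Prometheus storage documentation, written as a shell sketch; the ingestion rate and bytes-per-sample values below are assumptions and your real ingestion rate has to be measured on your own cluster:
# needed_disk_space ~= retention_seconds * ingested_samples_per_second * bytes_per_sample
retention_seconds=$((30 * 24 * 3600))    # 30d retention, matching prometheus_storage_retention
samples_per_second=10000                 # example ingestion rate - measure yours
bytes_per_sample=2                       # conservative approximation from the Prometheus docs
echo "$(( retention_seconds * samples_per_second * bytes_per_sample / 1024 / 1024 / 1024 )) GiB"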
In the default configuration core/src/ansible/roles/zookeeper/defaults/main.yml
the following values can be tweaked regarding storage:
# The path where Zookeeper stores its logs
zookeeper_log_dir: /var/log/zookeeper
# The max size a logfile can have
zookeeper_rolling_log_file_max_size: 10MB
# How many logfiles can be retained before rolling over
zookeeper_max_rolling_log_file_count: 10