Deployment testing
Full testing of the Scale system is a somewhat arduous process. There are multiple sub-components that Scale depends on for state persistence and log capture. At its core, Scale consists of 5 main components:
- Scheduler
- Silo API
- Scale API
- UI Frontend
- Message Handlers
These components are logically separated into 3 GitHub repositories:
- Scale Scheduler, API and Message Handlers (github.com/ngageoint/scale)
- Silo API (github.com/ngageoint/seed-silo)
- Scale UI (github.com/ngageoint/scale-ui)
The reason for this split is primarily to allow individual teams to iterate on their respective projects independently, without undue interdependence. Ultimately, for the purposes of simplified Scale deployments and testing, these individual projects are all deployed into DCOS via the container that contains the Scale scheduler.
Scale has hard dependencies on a number of additional services, which a default deployment installs automatically:
- PostgreSQL
- RabbitMQ
- Elasticsearch
- Fluentd
HashiCorp Vault is an additional dependency if your system will leverage secrets; it must be tested before any release.
This is a high-level view of the system. Throughout the following walk-through, I'll try to call out the places you can substitute other choices in specific cases.
The following steps are for performing a rolling update to an existing Scale deployment that only replaces the UI container. This container has two responsibilities: serving the compiled UI assets and providing a single routing entry point (via Nginx) that eliminates any CORS hurdles for the UI. The settings that govern the routing behavior can be found here: https://github.com/ngageoint/scale-ui/tree/master/docker
Assumptions:
- DCOS 1.11+ cluster w/Admin login
- DCOS package Marathon LB installed
- DCOS Public Agents configured to support dynamic subdomains by having a wildcard DNS entry
- A previous run-through of the Full Scale Deploy section below
- Experience deploying, scaling and removing services in DCOS
Checklist:
- Create a new build of the Scale UI Docker image (a CLI sketch follows this list).
- Edit the configuration of the `scale-ui` service within DCOS. A basic change to the Docker image JSON key value will trigger a re-deploy.
- Once the `scale-ui` service has gone healthy, you can open the address specified in the `HAPROXY_0_VHOST` label in your browser.
- If the service does not go healthy, check the service `stdout` and `stderr` logs for any errors.
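As a concrete sketch of the checklist above, assuming the image is published to the `geointdev` organization, the service ID is `/scale-ui`, and the DCOS CLI is installed and authenticated (the tag below is illustrative):
```bash
# Build and push a new UI image
docker build -t geointdev/scale-ui:test-rc1 .
docker push geointdev/scale-ui:test-rc1

# Pull the current service definition, point container.docker.image at the new
# tag (stripping any read-only fields Marathon rejects), and submit the update;
# Marathon re-deploys on the configuration change
dcos marathon app show /scale-ui > scale-ui.json
dcos marathon app update /scale-ui < scale-ui.json

# Watch the rolling deployment until it completes and the service goes healthy
dcos marathon deployment list /scale-ui
```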
The following steps validate a build prior to a release:
- DCOS 1.11+ cluster w/Admin login
- DCOS package Marathon LB installed
- DCOS Public Agents configured to support dynamic subdomains by having a wildcard DNS entry
- Experience deploying, scaling and removing services in DCOS
- Installation of Postman or Newman for running the testing collection.
- A full set of images (`geointdev/scale`, `geointdev/scale-ui`, `geointdev/scale-fluentd`, and optionally `geointdev/scale-vault`) with matched tags.
- Deploy speeds can be vastly improved by using an in-cluster Docker Hub mirror proxy. Read to the bottom of the README for how you need to adjust your Docker image references to accommodate: https://github.com/gisjedi/docker-registry-mirror
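As a hedged illustration (the address form here is hypothetical; the linked README documents the real one), the adjustment amounts to prefixing each image reference with the mirror's host and port, so `geointdev/scale` would become something like `<mirror-host>:<mirror-port>/geointdev/scale` in `marathon.json` and the service configurations.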
- Ensure DCOS is free of any `scale-*` services; delete them all. This will eliminate any legacy data for testing and make your deploys progress much faster.
- Deploy Vault: https://github.com/ngageoint/scale/tree/master/dockerfiles/vault. You may need to clear out the `vault` key in Zookeeper if it has been previously initialized. This can be done using the Exhibitor APIs:
```bash
curl -k 'https://omega.aisohio.net/exhibitor/exhibitor/v1/explorer/znode/vault' -X DELETE \
     -H 'netflix-ticket-number: 1' -H 'netflix-reason: redeploy vault' \
     -H 'netflix-user-name: meyerjd' --compressed
```
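Before moving on, you can sanity-check that Vault is up and unsealed. A minimal sketch, using the same address later supplied as `SECRETS_URL`:
```bash
# Vault's sys/health endpoint returns HTTP 200 when initialized, unsealed, and
# active, and 503 when sealed
curl -k https://scale-vault.marathon.l4lb.thisdcos.directory:8200/v1/sys/health
```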
- Deploy a minimal `marathon.json` into DCOS via Services (https://your-dcos/services) or the DCOS CLI:
```json
{
  "healthChecks": [
    {
      "gracePeriodSeconds": 300,
      "intervalSeconds": 30,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 3,
      "protocol": "COMMAND",
      "command": {
        "value": "ps -ef | grep 'manage.py scale_scheduler' | grep -v grep > /dev/null"
      }
    }
  ],
  "env": {
    "SCALE_VHOST": "scale.omega.aisohio.net",
    "SECRETS_TOKEN": "ROOT_TOKEN_FROM_VAULT",
    "SECRETS_URL": "https://scale-vault.marathon.l4lb.thisdcos.directory:8200",
    "DCOS_PACKAGE_FRAMEWORK_NAME": "scale",
    "ENABLE_BOOTSTRAP": "true",
    "ADMIN_PASSWORD": "admin"
  },
  "gpus": 0,
  "disk": 0,
  "mem": 1024,
  "cpus": 1,
  "args": ["scale_scheduler"],
  "container": {
    "volumes": [],
    "docker": {
      "image": "geointdev/scale",
      "forcePullImage": true,
      "privileged": false
    },
    "type": "DOCKER"
  },
  "instances": 1,
  "id": "/scale"
}
```
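If you saved the JSON above as `marathon.json`, a sketch of the CLI route (assuming the DCOS CLI is installed and authenticated against the cluster):
```bash
# Submit the app definition to Marathon
dcos marathon app add marathon.json

# Once bootstrap launches the dependent services, watch their health counts
dcos marathon app list | grep scale
```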
- Wait for all services to be healthy. Some may cycle a few times due to the timing of the launch of dependent services (e.g., fluentd depends on elasticsearch). You should see `scale`, `scale-db`, `scale-elasticsearch`, `scale-fluentd`, `scale-rabbitmq`, `scale-ui`, `scale-webserver`.
- Once all services are healthy in DCOS, you can browse to the location you specified in the `SCALE_VHOST` environment variable. In Omega, that is: https://scale.omega.aisohio.net/
- The Scale UI should appear and prompt you for a login. Use the default superuser username (`admin`) and password (`admin`), as configured via the `ADMIN_PASSWORD` environment variable.
- The next step is to ensure that authentication worked and identifies you properly. This can be done by clicking the avatar in the top-right of the UI; it should list you as `Admin User`.
- Verify that all nodes have gone through the cleanup phase and are healthy: https://scale.omega.aisohio.net/system/nodes?active=true&ready=true&paused=true&deprecated=true&offline=true&degraded=true&initial_cleanup=true&image_pull=true&scheduler_stopped=true&collapsed=true
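If you prefer a scriptable check, the same node status should also be retrievable from the REST API. A sketch assuming the web server exposes the v6 API under `/api` at the vhost (the path prefix is an assumption; adjust to your deployment and authenticate if your configuration requires it):
```bash
# Assumed path prefix; lists node state so cleanup completion can be confirmed
curl -k https://scale.omega.aisohio.net/api/v6/nodes/
```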
- Verify that the message handlers are up and error-free: https://omega.aisohio.net/#/services/detail/%2Fscale/tasks?q=is%3Aactive+message (check logs link)
- Drop test data into the location for scanning. `/nas/DCOS/omega/testing/input/` should be used, and test data can be found at `/nas/Data/happy-couples`.
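A sketch of staging the data, assuming shell access to a host with the NAS mounted at the paths above:
```bash
# Copy the sample data into the directory the scan will watch
cp -r /nas/Data/happy-couples/* /nas/DCOS/omega/testing/input/
```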
- With an initial system configured, you can leverage our host workspace type Postman (Newman) collection and environment to test an end-to-end processing pipeline:
```bash
wget https://raw.githubusercontent.com/ngageoint/scale/master/tests/postman/environment.json \
     https://raw.githubusercontent.com/ngageoint/scale/master/tests/postman/full-host-test.json
newman run -k -e environment.json full-host-test.json
```
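Newman exits with a non-zero code when any assertion in the collection fails, so the same invocation can gate an automated release pipeline.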