Construct a docker-compose.yml file
In this last step we combine all the parts configured in the previous steps into a Docker Compose file. Before starting, make sure you know the basics of the YAML format and Docker Compose. Then use a copy of the docker-compose.yml available in the BDE Pipeline repository as a starting point for your pilot case's YAML file.
First, add all the services you need for your pilot case under 'services', based on the image list created in step 1. Don't remove the services that are already configured in the YAML file. Each image's README contains a Docker Compose snippet which you can copy-paste into your docker-compose.yml, as illustrated below. If a README lacks such a snippet, contact the person responsible for that component.
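For example, the README of the bde2020/spark-master image typically provides a snippet along these lines, which you can paste under 'services' (the image tag and ports here mirror the full example at the bottom of this page; check the component's README for the current values):

    services:
      spark-master:
        image: bde2020/spark-master:2.2.0-hadoop2.7
        ports:
          - "8080:8080"
          - "7077:7077"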
Next, configure an INIT_DAEMON_STEP environment variable for each service that needs to communicate with the init daemon service. The value of the variable must be the code of the corresponding step as configured in your flow in step 4. A service can have only one INIT_DAEMON_STEP configured. This can be done in your Docker Compose file as follows:
    services:
      demo:
        image: bde2020/demo-spark-sensor-data:2.0.0
        environment:
          INIT_DAEMON_STEP: compute_aggregations
Some steps in your flow may not match any service in the docker-compose.yml; these steps need to be finished manually by the pipeline executor when running the pipeline. Conversely, some services in the docker-compose.yml may not have an INIT_DAEMON_STEP configured; these services can start immediately, without depending on another service or action.
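The triplestore database from the example further down is such a service: it has no INIT_DAEMON_STEP and simply starts together with the rest of the stack:

    services:
      database:
        image: tenforce/virtuoso:1.3.0-virtuoso7.2.2
        environment:
          SPARQL_UPDATE: "true"
          DEFAULT_GRAPH: "http://mu.semte.ch/application"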
Finally, add the desired features to the environment, such as logging and CPU stats. The docker-compose.yml below shows what such a setup could look like:
    # Triplestore database that acts as the single source of truth.
    database:
      image: tenforce/virtuoso:1.3.0-virtuoso7.2.2
      environment:
        SPARQL_UPDATE: "true"
        DEFAULT_GRAPH: "http://mu.semte.ch/application"
      volumes:
        - ./data/db:/data
        - ./config/toLoad:/data/toLoad

    # Logs CPU stats and docker container events.
    swarm-logger:
      image: bde2020/mu-swarm-logger-service:latest
      links:
        - database:database
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
    # event-query, docker-watcher, har-transformation and elasticsearch
    # are all necessary for HTTP logging.
    event-query:
      image: bde2020/mu-event-query-service
      links:
        - database:database
      volumes:
        - ./containers:/usr/src/app/containers/

    docker-watcher:
      image: bde2020/mu-docker-watcher-service
      volumes:
        - ./config/supervisord/supervisord.conf:/etc/supervisord.conf
        - ./containers:/app/containers
        - ./pcap:/app/pcap/
      network_mode: host
      environment:
        PCAP_READ_DIR: '/pcap'

    har-transformation:
      image: bde2020/mu-har-transformation-service
      volumes:
        - ./pcap:/app/pcap
        - ./har:/app/har
        - ./containers:/app/containers
        - ./backups:/app/backups
      links:
        - elasticsearch:elasticsearch
      environment:
        BU_DIR: "/app/backups"

    elasticsearch:
      image: elasticsearch:2.4.6
      command: elasticsearch -Des.network.host=0.0.0.0
    spark-master:
      image: bde2020/spark-master:2.2.0-hadoop2.7
      container_name: spark-master
      ports:
        - "8080:8080"
        - "7077:7077"
      environment:
        VIRTUAL_HOST: "spark-master.big-data-europe.aksw.org"
        VIRTUAL_PORT: "8080"
        INIT_DAEMON_STEP: "setup_spark"
        constraint: "node==<yourmasternode>"
        LOG: "true" # Log container's docker events into the database.
        logging: "true" # Log container's HTTP traffic.

    # (... Spark Worker 1, Spark Worker 2, etc. ...)
That's it. You now have a Docker Compose pipeline that can run on the BDE platform!
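If you want to check the file before deploying it, docker-compose config validates and prints the resolved configuration, and docker-compose up -d starts the stack locally (assuming Docker and Docker Compose are installed on your machine).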