Database Backend
Scale captures and tracks a considerable volume of job and system metadata as a part of processing. This data ultimately resides in PostgreSQL and is updated via the Message Handler and Scheduler components. Using PostgreSQL along with PostGIS allows us to support spatial data storage and complex JSON blob querying within our relational data store.
Our deployment includes a default Postgres instance, run from a community Docker image, for demonstration purposes. This instance has no storage persistence or failover support, making it unsuitable for any long-term use. For production, we recommend a managed Postgres offering such as Amazon RDS or Azure Database. Configuring a Postgres database can be a very involved process, with considerable room for optimization as your database grows.
This guide focuses on configuring Scale to connect to various backends. The Scale deployment uses a single setting to configure its system components: the environment variable DATABASE_URL. Its value follows the syntax outlined by the dj-database-url project. In short, it takes the form:
postgis://user:password@host[:port]/name
Note: The user must have full access to the scale and silo schemas within the database given by name, where all respective tables will be created. The PostGIS extension is also required in that database.
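To make the URL form above concrete, here is a minimal sketch that splits such a value into its parts using only the Python standard library. It is not how Scale or dj-database-url parse the setting internally. The function name `parse_database_url`, the host `db.example.com`, and the fallback to port 5432 when none is given are assumptions made for illustration.

```python
from urllib.parse import urlsplit, unquote


def parse_database_url(url):
    """Split a postgis:// URL into its connection components."""
    parts = urlsplit(url)
    if parts.scheme != "postgis":
        raise ValueError(f"expected a postgis:// URL, got {parts.scheme!r}")
    return {
        # Percent-encoded characters in the credentials are decoded here.
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        # Assumption: fall back to the standard PostgreSQL port when omitted.
        "port": parts.port or 5432,
        "name": parts.path.lstrip("/"),
    }


# Hypothetical host, used only to show the shape of the result.
settings = parse_database_url(
    "postgis://scale-user:scale-pass@db.example.com:5432/scale"
)
print(settings["host"], settings["port"], settings["name"])
```

A check like this can catch a malformed value (wrong scheme, missing database name) before it is handed to the deployment.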
While Scale will deploy a local Postgres cluster automatically during launch if DATABASE_URL is unset, this should never be relied on for anything beyond demonstration purposes. The default Postgres deployment has no mounted persistent storage, so all Scale configuration and data will be lost if the container restarts.
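As a guard against accidentally launching with the demonstration database, a deployment script could refuse to continue when DATABASE_URL is missing. The following is a minimal sketch of such a preflight check, not part of Scale itself. The function name `require_database_url` is hypothetical, and treating an empty or whitespace-only value as unset is an assumption.

```python
import os


def require_database_url(environ=None):
    """Return DATABASE_URL, or raise if the demo database would be used."""
    if environ is None:
        environ = os.environ
    # Assumption: an empty or whitespace-only value counts as unset.
    url = environ.get("DATABASE_URL", "").strip()
    if not url:
        raise RuntimeError(
            "DATABASE_URL is unset: Scale would fall back to the "
            "non-persistent demonstration Postgres deployment"
        )
    return url
```

Passing a plain dictionary instead of the real environment makes the check easy to exercise in isolation.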
The following sample marathon.json is a reasonable starting point:
{
  "id": "/scale-persistent-db",
  "instances": 1,
  "mem": 512,
  "gpus": 0,
  "cpus": 0.5,
  "disk": 0,
  "container": {
    "portMappings": [
      {
        "containerPort": 5432,
        "labels": {
          "VIP_0": "//scale-persistent-db:5432"
        },
        "protocol": "tcp"
      }
    ],
    "type": "DOCKER",
    "volumes": [
      {
        "persistent": {
          "size": 10240
        },
        "mode": "RW",
        "containerPath": "/var/lib/postgresql/data"
      }
    ],
    "docker": {
      "image": "mdillon/postgis:9.5-alpine",
      "forcePullImage": true
    }
  },
  "networks": [
    {
      "mode": "container/bridge"
    }
  ],
  "env": {
    "POSTGRES_DB": "scale",
    "POSTGRES_PASSWORD": "scale-pass",
    "POSTGRES_USER": "scale-user"
  },
  "healthChecks": [
    {
      "gracePeriodSeconds": 300,
      "intervalSeconds": 30,
      "maxConsecutiveFailures": 3,
      "portIndex": 0,
      "timeoutSeconds": 15,
      "delaySeconds": 15,
      "protocol": "MESOS_TCP",
      "ipProtocol": "IPv4"
    }
  ],
  "residency": {
    "relaunchEscalationTimeoutSeconds": 10,
    "taskLostBehavior": "WAIT_FOREVER"
  }
}
The above configuration creates a 10 GiB persistent storage volume and pins Postgres to the node that holds it. This protects you from data loss as long as that node remains in your cluster. Setting up a truly fault-tolerant local Postgres cluster is outside the scope of this guide.
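Before submitting an app definition like the one above, it can be useful to confirm that it really declares a persistent volume and how large it is. The sketch below is one way to do that with the standard library. The helper name `persistent_volumes` is hypothetical, and the embedded JSON is a trimmed excerpt of the sample rather than a complete definition. Marathon expresses persistent volume sizes in MiB, so 10240 corresponds to 10 GiB.

```python
import json


def persistent_volumes(app):
    """Return (containerPath, size in MiB) for each persistent volume."""
    found = []
    for volume in app.get("container", {}).get("volumes", []):
        persistent = volume.get("persistent")
        if persistent is not None:
            found.append((volume["containerPath"], persistent["size"]))
    return found


# Trimmed excerpt of the sample definition, kept only for illustration.
app = json.loads("""
{
  "container": {
    "volumes": [
      {
        "persistent": {"size": 10240},
        "mode": "RW",
        "containerPath": "/var/lib/postgresql/data"
      }
    ]
  }
}
""")

for path, size_mib in persistent_volumes(app):
    print(f"{path}: {size_mib / 1024:g} GiB")
```

An empty result would mean the definition has no persistent storage and would lose its data on restart, just like the default demonstration deployment.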
To configure Scale to use the Postgres instance deployed above, set the following environment variable:
- DATABASE_URL: postgis://scale-user:[email protected]:5432/scale
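The value can also be assembled from the same credentials that the sample marathon.json passes to the Postgres container, which avoids the two drifting apart. This is a minimal sketch under stated assumptions: `build_database_url` is a hypothetical helper, `db.example.internal` is a placeholder host rather than the address of the deployed service, and percent-encoding the credentials follows the usual URL convention for characters such as `@` or `/`.

```python
from urllib.parse import quote


def build_database_url(user, password, host, name, port=5432):
    """Assemble a postgis:// URL, percent-encoding the credentials."""
    user_q = quote(user, safe="")
    password_q = quote(password, safe="")
    return f"postgis://{user_q}:{password_q}@{host}:{port}/{name}"


# Values taken from the "env" block of the sample marathon.json.
env = {
    "POSTGRES_DB": "scale",
    "POSTGRES_PASSWORD": "scale-pass",
    "POSTGRES_USER": "scale-user",
}

# Placeholder host: substitute the address your cluster assigns to Postgres.
url = build_database_url(
    env["POSTGRES_USER"],
    env["POSTGRES_PASSWORD"],
    "db.example.internal",
    env["POSTGRES_DB"],
)
print(url)
```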