Macrostrat's database backup service enables on-demand and periodic backups of PostgreSQL databases to local directories and remote S3 buckets. The service is designed to be run in a standalone Docker container, and is typically configured with environment variables.
It is based on pg_dump and Rclone and was initially created as a built-in backup service for the Sparrow data system.
Local backup to a directory and/or backup to a remote S3 bucket are supported, depending on which environment variables are set.
By default, the image runs the backup-service
command for periodic backups. A
backup-db
command that runs a one-off backup is also provided;
this can be run using
docker run pg-backup-service backup-db
with the appropriate environment variables.
Backups are named using an optional prefix, the database name, a 10-character file hash, and a timestamp, as such:
$DB_BACKUP_PREFIX/$dbname-5e753082e5-2021-11-01T20:51:55.pg-dump
This project is useful for low-intensity applications, but for larger-scale systems, using an incremental/streaming backup system like Barman or PGBackRest is recommended.
The application is configured with environment variables, allowing easy integration into Docker-centric workflows.
The name of the database to back up to (DB_NAME
takes precedence)
With DB_NAME
, a comma-separated string can be provided to back up multiple databases.
Other common PostgreSQL connection variables are also supported, such as:
PGHOST
(default:localhost
)PGPORT
(default:5432
)PGUSER
(default:postgres
)PGPASSWORD
(no default)
This service is primarily designed to support backup to an S3-compatible storage bucket. S3 buckets are provided
by most storage providers including many university IT systems. Macrostrat's backup services are typically used with
s3.drive.wisc.edu
.
S3_ENDPOINT
: the S3 endpoint (Required for cloud backup)S3_ACCESS_KEY
: the S3 access key (Required for cloud backup)S3_SECRET_KEY
: the S3 secret key (Required for cloud backup)S3_BACKUP_BUCKET
: the S3 bucket (Required for cloud backup)
DB_BACKUP_DIR
: the directory to back up to (Required for local backup).
In order to back up the database outside of the docker container, you will need to mount this directory on the host machine:
docker run \
--env DB_BACKUP_DIR=/db-backups \
--volume /local-backups:/db-backups \
pg-backup-service
Backups are scheduled using the go-cron library; the schedule
is set by the SCHEDULE
environment variable. Schedules can be
set to a crontab-style format, or more appealingly, a simple format: @daily
, @hourly
, @weekly
, @every 100s
etc.
See the go-cron
documentation for more information.
By default, no schedule is applied, and a single backup is performed on startup.
DB_BACKUP_PREFIX
: A prefix for database backups. If provided, all backups will be put within a specific namespace or folder. This is useful for sharing a bucket between many different database backup jobs.DB_BACKUP_MAX_N
: the maximum number of backups to keep (default:10
)PGDUMP_OPTIONS
: additional options to pass topg_dump
for all backup jobs.
If more customization of the backup process beyond $PGDUMP_OPTIONS
is desired, any dump-$dbname
or dump-database
commands added to the container's PATH
will override the normal pg_dump -Fc
command.
These have a signature dump-database <dbname> <out-dir>
and must output only the filename of the dump file created.
See bin/defs.bash
for more details.
The backup service creates custom-format PostgreSQL dump files. These can be restored with a command like
pg_restore -d $DB_NAME "$backup_file_name"
A built-in restore command may be provided in a future version of this image.
This container can be easily included in docker compose
container stacks.
services:
db_server:
image: postgis:13-3.1
...
db_backup:
image: ghcr.io/macrostrat/pg-backup-service:main
environment:
# Back up every weekend
- SCHEDULE=@weekly
- PGHOST=db_server
- PGPASSWORD=<your-password>
- DB_BACKUP_PREFIX=strata-v1
# Can set multiple databases for backup!!
- DB_NAME=strata-main,strata-dev
# S3 configuration parameters
- S3_ENDPOINT=s3.drive.wisc.edu
- S3_ACCESS_KEY
- S3_SECRET_KEY
- S3_BACKUP_BUCKET=database-backups
Basic backup functionality is fully tested. Tests can be run locally using make test
.
All pull requests and commits to the main
branch
are automatically tested, and updates to the Docker image are automatically pushed to the Github container registry.
Any contributions should add appropriate tests and documentation.
- A built-in command to restore from a backup (possibly with an interactive prompt).
- Possibly shift to Python from shell scripts.