Merge pull request #16 from jameshy/refresh
2019 update
jameshy authored Dec 27, 2019
2 parents 39d31c6 + 8c1437e commit 9828dd7
Showing 27 changed files with 4,708 additions and 266 deletions.
2 changes: 1 addition & 1 deletion .eslintrc
@@ -5,7 +5,7 @@
"indent": ["error", 4],
"semi": ["error", "never"],
"brace-style": ["error", "stroustrup"],
'no-restricted-syntax': [
"no-restricted-syntax": [
"error",
"ForInStatement",
"LabeledStatement",
4 changes: 1 addition & 3 deletions .travis.yml
@@ -1,8 +1,6 @@
language: node_js
node_js:
- '6.10'
addons:
postgresql: '9.4'
- '12.14.0'
after_success: npm run coverage
before_deploy:
- npm run deploy
106 changes: 62 additions & 44 deletions README.md
@@ -3,85 +3,103 @@
[![Build Status](https://travis-ci.org/jameshy/pgdump-aws-lambda.svg?branch=master)](https://travis-ci.org/jameshy/pgdump-aws-lambda)
[![Coverage Status](https://coveralls.io/repos/github/jameshy/pgdump-aws-lambda/badge.svg?branch=master)](https://coveralls.io/github/jameshy/pgdump-aws-lambda?branch=master)

# Overview

An AWS Lambda function that runs pg_dump and streams the output to S3.

It can be configured to run periodically using CloudWatch events.

## Quick start

1. Create an AWS lambda function:
- Runtime: Node.js 6.10
- Code entry type: Upload a .ZIP file
([pgdump-aws-lambda.zip](https://github.com/jameshy/pgdump-aws-lambda/releases/download/v1.1.5/pgdump-aws-lambda.zip))
- Configuration -> Advanced Settings
- Timeout = 5 minutes
- Select a VPC and security group (must be suitable for connecting to the target database server)
2. Create a CloudWatch rule:
- Event Source: Fixed rate of 1 hour
- Targets: Lambda Function (the one created in step #1)
- Configure input -> Constant (JSON text) and paste your config, e.g.:
- Author from scratch
- Runtime: Node.js 12.x
2. Configuration -> Function code:
- Code Entry Type: Upload a .zip file
- Upload ([pgdump-aws-lambda.zip](https://github.com/jameshy/pgdump-aws-lambda/releases/latest))
- Basic Settings -> Timeout: 15 minutes
- Save
3. Configuration -> Execution role
- Edit the role and attach the policy "AmazonS3FullAccess"
4. Test
- Create new test event, e.g.:
```json
{
"PGDATABASE": "oxandcart",
"PGUSER": "staging",
"PGPASSWORD": "uBXKFecSKu7hyNu4",
"PGHOST": "database.com",
"S3_BUCKET" : "my-db-backups",
"PGDATABASE": "dbname",
"PGUSER": "postgres",
"PGPASSWORD": "password",
"PGHOST": "host",
"S3_BUCKET" : "db-backups",
"ROOT": "hourly-backups"
}
```
- *Test* and check the output

Note: you can test the lambda function using the "Test" button and providing config like above.
5. Create a CloudWatch rule:
- Event Source: Schedule -> Fixed rate of 1 hour
- Targets: Lambda Function (the one created in step #1)
- Configure input -> Constant (JSON text) and paste your config (as per step #3)

**AWS Lambda has a 5 minute maximum execution time for lambda functions, so your backup must take less time than that.**

## File Naming
#### File Naming

This function will store your backup with the following S3 key:

s3://${S3_BUCKET}${ROOT}/YYYY-MM-DD/[email protected]
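
For reference, here is a minimal sketch of how such a key could be assembled. The actual `utils.generateBackupPath` used by `lib/handler.js` is not part of this diff, so the function below is illustrative only:

```js
const path = require('path')

// illustrative only: build a key like hourly-backups/2019-12-27/[email protected]
function generateBackupPath(databaseName, rootPath) {
    const timestamp = new Date().toISOString()
    const day = timestamp.slice(0, 10)                        // YYYY-MM-DD
    const time = timestamp.slice(11, 19).replace(/:/g, '-')   // HH-MM-SS
    return path.join(rootPath || '', day, `${databaseName}@${time}.backup`)
}
```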

## PostgreSQL version compatibility
#### AWS Firewall

This script uses the pg_dump utility from PostgreSQL 9.6.2.
- If you run the Lambda function outside a VPC, you must enable public access to your database instance, because a non-VPC Lambda function executes on the public internet.
- If you run the Lambda function inside a VPC (not tested), you must allow access from the Lambda security group to your database instance. You must also add a NAT gateway to your VPC so the Lambda function can connect to S3.

It should be able to dump older versions of PostgreSQL. I will try to keep the included binaries in sync with the latest from postgresql.org, but open a PR or message me if a newer PostgreSQL binary is available.
#### Encryption

## Encryption
You can add an encryption key to your event, e.g.

You can pass the config option 'ENCRYPTION_PASSWORD' and the backup will be encrypted using aes-256-ctr algorithm.

Example config:
```json
{
"PGDATABASE": "dbname",
"PGUSER": "postgres",
"PGPASSWORD": "password",
"PGHOST": "localhost",
"S3_BUCKET" : "my-db-backups",
"ENCRYPTION_PASSWORD": "my-secret-password"
"PGHOST": "host",
"S3_BUCKET" : "db-backups",
"ROOT": "hourly-backups",
"ENCRYPT_KEY": "c0d71d7ae094bdde1ef60db8503079ce615e71644133dc22e9686dc7216de8d0"
}
```

To decrypt these dumps, use the command:
`openssl aes-256-ctr -d -in ./encrypted-db.backup -nosalt -out unencrypted.backup`
The key should be exactly 64 hex characters (32 bytes).

When this key is present, the function streams the encrypted dump directly from pg_dump to S3.

It uses the aes-256-cbc encryption algorithm with a random IV for each backup file.
The IV is stored alongside the backup in a separate file with the .iv extension.

You can decrypt such a backup with the following bash command:

```bash
openssl enc -aes-256-cbc -d \
-in [email protected] \
-out [email protected] \
-K c0d71d7ae094bdde1ef60db8503079ce615e71644133dc22e9686dc7216de8d0 \
-iv $(< [email protected])
```
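
The same decryption can be done with the `lib/encryption.js` module from this repository; the key and file names below are placeholders:

```js
const fs = require('fs')
const encryption = require('./lib/encryption')

const key = 'c0d71d7ae094bdde1ef60db8503079ce615e71644133dc22e9686dc7216de8d0'
// the IV is stored hex-encoded in the companion .iv file
const iv = Buffer.from(fs.readFileSync('[email protected]', 'utf8').trim(), 'hex')

const decrypted = encryption.decrypt(fs.createReadStream('[email protected]'), key, iv)
decrypted.pipe(fs.createWriteStream('[email protected]'))
```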


## Developer

## Loading your own `pg_dump` binary
1. Spin up an EC2 instance from an Amazon Linux AMI (the Lambda function runs on an Amazon Linux image, based on CentOS, so this gives the best chance of binary compatibility)
2. Install PostgreSQL using yum. You can install the latest version from the [official repository](https://yum.postgresql.org/repopackages.php#pg96).
3. Add a new directory for your pg_dump binaries: `mkdir bin/postgres-9.5.2`
#### Bundling a new `pg_dump` binary
1. Launch an EC2 instance with the Amazon Linux 2 AMI
2. Connect via SSH and [install PostgreSQL using yum](https://stackoverflow.com/questions/55798856/deploy-postgres11-to-elastic-beanstalk-requires-etc-redhat-release).
3. Locally, create a new directory for your pg_dump binaries: `mkdir bin/postgres-11.6`
4. Copy the binaries
- `scp -i YOUR-ID.pem ec2-user@AWS_IP:/usr/bin/pg_dump ./bin/postgres-9.5.2/pg_dump`
- `scp -i YOUR-ID.pem ec2-user@AWS_IP:/usr/lib64/libpq.so.5.8 ./bin/postgres-9.5.2/libpq.so.5`
4. When calling the handler, pass the env variable PGDUMP_PATH=postgres-9.5.2 to use the binaries in the bin/postgres-9.5.2 directory.
- `scp -i <aws PEM> ec2-user@<EC2 Instance IP>:/usr/bin/pg_dump ./bin/postgres-11.6/pg_dump`
- `scp -i <aws PEM> ec2-user@<EC2 Instance IP>:/usr/lib64/{libcrypt.so.1,libnss3.so,libsmime3.so,libssl3.so,libsasl2.so.3,liblber-2.4.so.2,libldap_r-2.4.so.2} ./bin/postgres-11.6/`
- `scp -i <aws PEM> ec2-user@<EC2 Instance IP>:/usr/pgsql-11/lib/libpq.so.5 ./bin/postgres-11.6/libpq.so.5`
5. When calling the handler, pass the environment variable `PGDUMP_PATH=postgres-11.6` to use the binaries in the bin/postgres-11.6 directory (a sketch of how this could be consumed follows below).
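
A minimal sketch of how that environment variable could be turned into a binary path; `lib/pgdump.js` is not shown in this diff, so the helper name and lookup below are purely illustrative:

```js
const path = require('path')

// illustrative only: PGDUMP_PATH names a directory under bin/,
// e.g. PGDUMP_PATH=postgres-11.6 -> bin/postgres-11.6/pg_dump
function resolvePgDumpBinary() {
    const defaultDir = require('./lib/config').PGDUMP_PATH
    const dir = process.env.PGDUMP_PATH
        ? path.join(__dirname, 'bin', process.env.PGDUMP_PATH)
        : defaultDir
    return path.join(dir, 'pg_dump')
}
```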

#### Creating a new function zip

NOTE: `libpq.so.5.8` is found by running `ll /usr/lib64/libpq.so.5` and checking where the symlink points.
`npm run deploy`

## Contributing
#### Contributing

Please submit issues and PRs.
50 changes: 21 additions & 29 deletions bin/makezip.sh
@@ -1,9 +1,6 @@
#!/bin/bash
set -e

SCRIPT=`readlink -f $0`
SCRIPTPATH=`dirname $SCRIPT`
PROJECTROOT=`readlink -f $SCRIPTPATH/..`
FILENAME="pgdump-aws-lambda.zip"

command_exists () {
@@ -12,43 +9,38 @@ command_exists () {

if ! command_exists zip ; then
echo "zip command not found, try: sudo apt-get install zip"
exit 0
exit 1
fi
if [ ! -f ./package.json ]; then
echo "command must be run from the project root directory"
exit 1
fi


cd $PROJECTROOT

echo "creating bundle.."
# create a temp directory for our bundle
BUNDLE_DIR=$(mktemp -d)
# copy entire app into BUNDLE_DIR
cp -r * $BUNDLE_DIR/

# prune things from BUNDLE_DIR
echo "running npm prune.."
cd $BUNDLE_DIR
# prune dev-dependencies from node_modules
npm prune --production >> /dev/null

# copy entire project into BUNDLE_DIR
cp -R * $BUNDLE_DIR/

# remove unnecessary things
pushd $BUNDLE_DIR > /dev/null
echo "cleaning.."
rm -rf node_modules/*
npm install --production --no-progress > /dev/null
rm -rf dist coverage test


# create and empty the dist directory
if [ ! -d $PROJECTROOT/dist ]; then
mkdir $PROJECTROOT/dist
fi
rm -rf $PROJECTROOT/dist/*

# create zip of bundle/
echo "creating zip.."
echo "zipping.."
zip -q -r $FILENAME *
echo "zip -q -r $FILENAME *"
mv $FILENAME $PROJECTROOT/dist/$FILENAME

# return to project dir
popd > /dev/null

# copy the zip
mkdir -p ./dist
cp $BUNDLE_DIR/$FILENAME ./dist/$FILENAME

echo "successfully created dist/$FILENAME"

# remove bundle/
rm -rf $BUNDLE_DIR


cd $PROJECTROOT
Binary file added bin/postgres-11.6/libcrypt.so.1
Binary file added bin/postgres-11.6/liblber-2.4.so.2
Binary file added bin/postgres-11.6/libldap_r-2.4.so.2
Binary file added bin/postgres-11.6/libnss3.so
Binary file added bin/postgres-11.6/libpq.so.5
Binary file added bin/postgres-11.6/libsasl2.so.3
Binary file added bin/postgres-11.6/libsmime3.so
Binary file added bin/postgres-11.6/libssl3.so
Binary file added bin/postgres-11.6/pg_dump
Binary file removed bin/postgres-9.6.2/libpq.so.5
Binary file removed bin/postgres-9.6.2/pg_dump
4 changes: 3 additions & 1 deletion lib/config.js
@@ -2,5 +2,7 @@ const path = require('path')

module.exports = {
S3_REGION: 'eu-west-1',
PGDUMP_PATH: path.join(__dirname, '../bin/postgres-9.6.2')
PGDUMP_PATH: path.join(__dirname, '../bin/postgres-11.6'),
// maximum time allowed to connect to postgres before a timeout occurs
PGCONNECT_TIMEOUT: 15
}
32 changes: 25 additions & 7 deletions lib/encryption.js
@@ -1,14 +1,32 @@
const crypto = require('crypto')

const algorithm = 'aes-256-ctr'

const ALGORITHM = 'aes-256-cbc'

module.exports = {
encrypt(readableStream, password) {
const cipher = crypto.createCipher(algorithm, password)
return readableStream.pipe(cipher)
encrypt(readableStream, key, iv) {
this.validateKey(key)
if (iv.length !== 16) {
throw new Error(`encrypt iv must be exactly 16 bytes, but received ${iv.length}`)
}
const cipher = crypto.createCipheriv(ALGORITHM, Buffer.from(key, 'hex'), iv)
readableStream.pipe(cipher)
return cipher
},
decrypt(readableStream, key, iv) {
this.validateKey(key)
const decipher = crypto.createDecipheriv(ALGORITHM, Buffer.from(key, 'hex'), iv)
readableStream.pipe(decipher)
return decipher
},
validateKey(key) {
const bytes = Buffer.from(key, 'hex')
if (bytes.length !== 32) {
throw new Error('encrypt key must be a 32 byte hex string')
}
return true
},
decrypt(readableStream, password) {
const decipher = crypto.createDecipher(algorithm, password)
return readableStream.pipe(decipher)
generateIv() {
return crypto.randomBytes(16)
}
}
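
A short round-trip example for the new IV-based interface (key and file names are placeholders):

```js
const fs = require('fs')
const crypto = require('crypto')
const encryption = require('./lib/encryption')

// a 32-byte key, hex-encoded (64 hex characters), and a fresh 16-byte IV
const key = crypto.randomBytes(32).toString('hex')
const iv = encryption.generateIv()

encryption.encrypt(fs.createReadStream('plain.backup'), key, iv)
    .pipe(fs.createWriteStream('encrypted.backup'))
    .on('finish', () => {
        // decrypt with the same key and IV
        encryption.decrypt(fs.createReadStream('encrypted.backup'), key, iv)
            .pipe(fs.createWriteStream('restored.backup'))
    })
```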
63 changes: 29 additions & 34 deletions lib/handler.js
@@ -1,54 +1,49 @@
const utils = require('./utils')
const uploadS3 = require('./upload-s3')
const pgdump = require('./pgdump')
const encryption = require('./encryption')
const Promise = require('bluebird')
// todo: make these const, (mockSpawn doesn't allow this, so remove mockSpawn)
var uploadS3 = require('./upload-s3')
var pgdump = require('./pgdump')

const DEFAULT_CONFIG = require('./config')

function handler(event, context) {
const config = Object.assign({}, DEFAULT_CONFIG, event)

async function backup(config) {
if (!config.PGDATABASE) {
throw new Error('PGDATABASE not provided in the event data')
}
if (!config.S3_BUCKET) {
throw new Error('S3_BUCKET not provided in the event data')
}

// determine the path for the database dump
const key = utils.generateBackupPath(
config.PGDATABASE,
config.ROOT
)

const pgdumpProcess = pgdump(config)
return pgdumpProcess
.then(readableStream => {
if (config.ENCRYPTION_PASSWORD) {
console.log('encrypting dump')
readableStream = encryption.encrypt(
readableStream,
config.ENCRYPTION_PASSWORD
)
}
// stream to s3 uploader
return uploadS3(readableStream, config, key)
})
.catch(e => {
throw e
})
// spawn the pg_dump process
let stream = await pgdump(config)
if (config.ENCRYPT_KEY && encryption.validateKey(config.ENCRYPT_KEY)) {
// if encryption is enabled, we generate an IV and store it in a separate file
const iv = encryption.generateIv()
const ivKey = key + '.iv'

await uploadS3(iv.toString('hex'), config, ivKey)
stream = encryption.encrypt(stream, config.ENCRYPT_KEY, iv)
}
// stream the backup to S3
return uploadS3(stream, config, key)
}

module.exports = function (event, context, cb) {
return Promise.try(() => handler(event, context))
.then(result => {
cb(null)
return result
})
.catch(err => {
cb(err)
throw err
})
async function handler(event) {
const config = { ...DEFAULT_CONFIG, ...event }
try {
return await backup(config)
}
catch (error) {
// log the error and rethrow for Lambda
if (process.env.NODE_ENV !== "test") {
console.error(error)
}
throw error
}
}

module.exports = handler
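
Because the handler is now a plain async function, it can also be exercised locally with a test event; the values below are placeholders:

```js
const handler = require('./lib/handler')

handler({
    PGDATABASE: 'dbname',
    PGUSER: 'postgres',
    PGPASSWORD: 'password',
    PGHOST: 'host',
    S3_BUCKET: 'db-backups',
    ROOT: 'hourly-backups'
})
    .then(() => console.log('backup uploaded'))
    .catch(err => console.error('backup failed', err))
```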
