go install github.com/processout/pg-upgrade-data-check@latest
or use one of the binaries available in https://github.com/processout/pg-upgrade-data-check/releases
pg-upgrade-data-check
will compare Postgres table contents of two databases (source and target) to ensure there are no differences
in data.
It will detect discrepancies like;
- Row missing in the target database table
- Extra row in the target database table
- Difference in any column value for the same id
Note: It's designed to work purely on tables content (it will not check indexes, etc). Note 2: It will only work for tables that have numeric id that is incremented for each row.
All configurations related to tables are contained in the examples/config.yaml
file. You must specify the table that needs to be checked in .tables
and then configure a function for collect
and compare
:
collect
: is executed in first and second stage in order to identify range of id's to compare, usually the max of the current table.compare
: should return two values:id
andhash
;hash
is used to compare rows of the sameid
between two databases.
A full example would be:
tables:
users:
collect: select max(id) from %s limit 1;
compare: select id, md5(%s::TEXT) as hash from %s where id>=%d and id<=%d order by id asc;
Each command will require the database DSN, which might be indicated by --source-url
, --target-url
or both.
As pg-upgrade-data-check
has been designed primarily for verifying effects of cloning & replication, it operates in three stages: before
, after
and compare
. Each of these are commands in the cli itself, e.g: pg-upgrade-data-check before --source-url postgres://postgres@localhost:6001/postgres
pg-upgrade-data-check before --source-url postgres://postgres@localhost:6001/postgres --config config.yaml
For each configured table find the starting id. This query is executed on the source database, and output is
saved to data/before_ids.json
file. For safety reasons it is not possible to execute this step if the destination
file already exist. Delete that file first if you don't care about previous result.
When using pg-upgrade-data-check
during Postgres upgrade this stage should be executed before you create a replication
slot on the source database.
pg-upgrade-data-check after --target-url postgres://postgres@localhost:6002/postgres --config config.yaml
For each configured table find the ending id. Again, this is executed on the source database, with output written
this time to data/after_ids.json
file. It has the same safety mechanism as previous stage.
When using pg-upgrade-data-check
during Postgres upgrade this stage should be executed after target database has been restored
from snapshot and logical replication caught up to source.
pg-upgrade-data-check compare --source-url postgres://postgres@localhost:6001/postgres --target-url postgres://postgres@localhost:6002/postgres --config config.yaml
Load the contents of after_ids.json
(generated by the previous stages) and for all tables from configuration compare
rows between startId
and stopId
, detecting additionally any missing or extra rows on the target db.