We have a lot of databases we'd like to keep backed up, but many of them don't change often. It would be nice if there were a deterministic, or at least reasonably accurate, way to determine whether a database has changed since its last backup. This would keep us from filling up our backup space with near-identical database dumps.
Right now, we compute md5 hashes of final database backups. However, pg_dump -Fc ("custom format" backups) includes information that is not the same between dumps, so the hashes aren't stable even for backups generated seconds apart. Text dumps (without -Fc) can have stable hashes, but you lose significant capabilities in restore (for instance, the ability to restore only some tables, change table ownership, or manipulate permissions). I'd like to keep all backups as custom format, if possible, to simplify restoration.
One approach would be to generate a text dump locally from each custom-format dump created, for the purposes of hashing. However, this would effectively dump each database twice, which would be insane for large databases.
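For reference, here is a rough sketch of what that approach could look like. It assumes pg_restore is on the PATH and that the dump path (a placeholder here) points at a custom-format archive. Run without a target database, pg_restore converts the archive into a SQL script, so the conversion reads the local dump file rather than querying the server again, though it still costs a full pass over the file. The helper names are made up for illustration, SHA-256 is swapped in for md5, and whether the script output is byte-stable between runs depends on the PostgreSQL version, so that would need checking before relying on it.

```python
import hashlib
import subprocess


def hash_command_output(cmd, chunk_size=1 << 20):
    """Run cmd and return the SHA-256 hex digest of its stdout, read in chunks."""
    digest = hashlib.sha256()
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        # Stream the output so large dumps never have to fit in memory.
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            digest.update(chunk)
    # Leaving the "with" block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return digest.hexdigest()


def hash_custom_dump_as_text(dump_path):
    """Hash the SQL script that pg_restore generates from a custom-format dump.

    Assumption: "-f -" writes the script to stdout (PostgreSQL 12 and later);
    older releases write to stdout when -f is omitted.
    """
    return hash_command_output(["pg_restore", "-f", "-", dump_path])
```

Something like hash_custom_dump_as_text("mydb.dump") would then give a hash to compare against earlier backups, with "mydb.dump" standing in for a real path.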
Another option would be to use the database system's internal functions to find the last transaction ID, commit time, or something similar, and generate a hash from that. I think there are solutions for this (especially in recent PostgreSQL versions), but they may not be available in default configurations. The big advantage of figuring something like this out is that such a pre-backup check would be extremely quick and would save us a lot of extra work. It would have to avoid false negatives, though: it must never report a database as unchanged when it has in fact changed.
Tracking commit timestamps may be the way to go here (see the 3rd answer in this Stack Overflow post), but it requires the track_commit_timestamp setting to be enabled at the database cluster level, and it is off by default.
Getting the total size of all tables before dumping could be a good approach, although changes that leave table sizes unchanged (for example, in-place updates) would go undetected:
-- format('%I.%I', ...) quotes identifiers so mixed-case or unusual names resolve
select sum(pg_table_size(format('%I.%I', schemaname, tablename)::regclass))
from pg_tables;
One option here is to allow a user-specified query per database that checks whether anything important has changed. The result set could be used to generate a hash, and the backup would be skipped if that hash matched one from an already-existing backup.
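A minimal sketch of that idea, assuming a Python DB-API cursor (for PostgreSQL this would come from a driver such as psycopg, which is an assumption here and not something settled in this thread). The function names and the idea of keeping a set of known fingerprints are hypothetical. The query should return rows in a deterministic order (for example with ORDER BY), otherwise the hash can differ even when the data has not changed.

```python
import hashlib


def result_fingerprint(cursor, query):
    """Run query on a DB-API cursor and return a SHA-256 hex digest of its rows."""
    cursor.execute(query)
    digest = hashlib.sha256()
    for row in cursor.fetchall():
        # repr() keeps column boundaries and value types distinct;
        # the newline separates one row from the next.
        digest.update(repr(tuple(row)).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def should_skip_backup(fingerprint, known_fingerprints):
    """Return True if an existing backup already carries this fingerprint."""
    return fingerprint in known_fingerprints
```

The table-size query above could serve as a default fingerprint query, with the same caveat that it misses changes that leave sizes unchanged. The fingerprint would need to be stored alongside each backup (for instance in its filename or a sidecar file) so later runs can compare against it.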