-
Notifications
You must be signed in to change notification settings - Fork 24
Data Behaviors
Replicator supports a number of behaviors that modify how mutations are applied to the target database. These behaviors can be enabled for both cluster-to-cluster and logical-replication uses of Replicator.
The preferred way of configuring these behaviors is via the
user script.
Each mode described below is set via api.configureTable()
. The active apply configuration is available
from the /_/diag
endpoint.
The behaviors described below can be combined within a single table. For instance, the CAS and Extras behavior can be combined for conditional updates, where unmapped properties are stored as JSON in a designated column.
A compare-and-set (CAS) mode allows Replicator to discard some updates, based on version- or time-like fields.
The use of CAS mode intentionally discards mutations and may cause data inconsistencies.
Consider a table with a version INT
column. If Replicator receives a message to update some row in
this table, the update will only be applied if there is no pre-existing row in the destination
table, or if the update's version
value is strictly greater than the existing row in the
destination table. That is, updates are applied only if they would increase the version
column or
if they would insert a new row.
api.configureTable("my_db.public.some_table", {
cas: [ "version" ]
});
When multiple CAS columns are present, they are compared as a tuple. That is, the second column is only compared if the first column value is equal between the source and destination clusters. In this multi-column case, the following pseudo-sql clause is applied to each incoming update:
api.configureTable("my_db.public.some_table", {
cas: [ "version_major", "version_minor", "version_patch" ]
});
Will translate into approximately:
WHERE existing IS NULL OR (
(incoming.version_major, incoming.version_minor, incoming.version_patch) >
(existing.version_major, existing.version_minor, existing.version_patch)
)
Deletes from the source cluster are always applied.
A deadline mode allows Replicator to discard some updates, by comparing a time-like field to the destination cluster's time. This is useful when a feed's data is advisory only or changefeed downtime is expected.
The use of deadlines intentionally discards mutations and may cause data inconsistencies.
To opt into deadline mode for a table, set the deadlines
property for one
or more columns in a table:
api.configureTable("my_db.public.some_table", {
"deadlines": {
"updated_at": "1m",
}
});
Given the above configuration, each incoming update would behave as though it were filtered with a
clause such as WHERE incoming.updated_at > now() - '1m'::INTERVAL
.
Deletes from the source cluster are always applied.
By default, Replicator will reject any incoming data that cannot be mapped onto a column in the target table. Users may specify a textual or JSONB column in a target table to receive otherwise-unmapped properties. This is useful in logical-replication scenarios where the source data has a variable schema (e.g. migrations from document stores). Values in a JSONB column can be extracted in subsequent schema-change operations using computed columns.
To enable extras mode for a table, set the extras
property to the name of a column. The name of
the extras column should be chosen to avoid any conflicts with properties in incoming mutations.
api.configureTable("my_db.public.some_table", {
extras: "overflow_column"
});
By default, Replicator will reject incoming mutations that have columns which do not map to a column in the destination table.
The use of ignore mode intentionally discards columns and may cause data inconsistencies.
api.configureTable("my_db.public.some_table", {
ignore: {
"dont_care": true,
}
});
Ignore mode is mostly useful when using logical replication modes and performing schema changes.
See Merge.
Renaming of columns should be accomplished either via the schema-wide dispatch()
function or the
per-destination-table map()
function.
Renaming columns is mostly useful when using logical replication modes and performing schema changes.
Substitute expressions allow incoming column data to be transformed with arbitrary SQL expressions
when being applied to the target table. This can also be accomplished via the dispatch()
or
map()
if the value makes sense to compute within the userscript.
The use of substitute expressions may cause data inconsistencies.
Consider the following, contrived, case of multiplying a value by two before applying it:
api.configureTable("my_db.public.some_table", {
exprs: {
"some_column": "2 * $0"
}
});
The substitution string $0
will expand at runtime to the value of the incoming data. Expressions
may repeat the $0
expression. For example $0 || $0
would concatenate a value with itself. It is
also permissable to use a constant value that does not depend on an input property:
api.configureTable("my_db.public.some_table", {
exprs: {
"did_migrate": "true"
}
});
In cases where a column exists only in the target database, using a DEFAULT
for the column is
preferable to using a substitute expression.