-
Notifications
You must be signed in to change notification settings - Fork 14
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
a514b4a
commit 7750a56
Showing
2 changed files
with
128 additions
and
8 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,33 +1,153 @@ | ||
--- | ||
title: Cross-region disaster recovery in Aiven for PostgreSQL® | ||
sidebar_label: CRDR overview | ||
limited: true | ||
keywords: [recovery, primary, outage, failure, failover] | ||
--- | ||
|
||
import ConsoleLabel from "@site/src/components/ConsoleIcons" | ||
import ConsoleLabel from "@site/src/components/ConsoleIcons"; | ||
import readyForCrdr from "@site/static/images/content/figma/ready-for-crdr.png"; | ||
|
||
## Purpose | ||
The cross-region disaster recovery (CRDR) feature ensures your business continuity by | ||
automatically recovering your workloads to a remote region in the event of a region-wide | ||
failure. | ||
|
||
## Benefits | ||
## CRDR overview | ||
|
||
The CRDR setup is a pair of integrated multi-node services sharing credentials and a | ||
DNS address but located in different regions. CRDR peer services may be hosted on 1-3 nodes. | ||
|
||
- Primary region service (**PRS**) is the original service you use unless there's a region | ||
outage, in which case it hands over to the RRS. As soon as the region is up again and | ||
the PRS is ready, the PRS takes back control from the RRS. | ||
- Recovery region service (**RRS**) is the service you create for disaster recovery purposes. | ||
This service takes over from the PRS when a region is down and hands over to the PRS when | ||
the region and the PRS are up and running again. | ||
|
||
The CRDR cycle is a sequence of actions involving CRDR peer services aimed at enabling and | ||
executing CRDR as well as resuming the original service operation. | ||
|
||
Throughout the CRDR cycle, CRDR peer services or service nodes go into the following states: | ||
|
||
- Active | ||
|
||
A CRDR peer service is **Active** when it runs on a node that is replicating data to | ||
CRDR standby nodes. | ||
- PRS is active during normal operations, when a region is up and running. | ||
- RRS is active after taking over from PRS in the event of a region outage. | ||
|
||
- Passive | ||
|
||
A CRDR peer service is **Passive** when it runs on CRDR standby nodes only. Either CRDR | ||
peer service can be passive depending on a phase of the CRDR cycle. | ||
|
||
- Failed | ||
|
||
A CRDR peer service is **Failed** when it's defunct or unreachable after failing over | ||
automatically or manually in the event of a region outage. Only a PRS can be failed. | ||
|
||
- Standby | ||
|
||
A CRDR service node is **Standby** when it is replicating data from the CRDR service | ||
node that is running the active service. | ||
|
||
## Limitations | ||
|
||
- RRS needs to use the same service plan and cloud provider as the PRS. | ||
|
||
## How it works | ||
|
||
## Failover to the recovery region | ||
The CRDR feature is eligible for all startup, business, and premium service plans. | ||
|
||
<img src={readyForCrdr} className="centered" alt="Ready for CRDR" width="100%" /> | ||
|
||
### CRDR setup | ||
|
||
You [enable CRDR by creating an RRS](/docs/products/postgresql/crdr/enable-crdr). As a | ||
result, both PRS and RRS are up and running: | ||
|
||
- PRS as an **Active** service, where the traffic goes | ||
- RRS as a **Passive** service in sync with the PRS | ||
|
||
### Automatic failover | ||
### Failover to the recovery region | ||
|
||
### Manual failover | ||
[Failover to the RRS](/docs/products/postgresql/crdr/crdr-failover-to-recovery) is | ||
performed either | ||
[automatically](/docs/products/postgresql/crdr/crdr-failover-to-recovery) or | ||
[manually](/docs/products/postgresql/crdr/crdr-failover-to-recovery). When completed, the | ||
PRS is **Failed** and the RRS is up and running as an **Active** service. | ||
|
||
#### Automatic switchover | ||
|
||
In the even of a region-wide failure, a CRDR process is triggered automatically. The RRS | ||
takes over from the PRS so that your workloads remain available at all times during | ||
the region outage. | ||
|
||
#### Manual switchover | ||
|
||
You trigger a failover to the RRS yourself for testing purposes: to simulate a | ||
disaster scenario and verify the disaster resilience of your infrastructure. | ||
|
||
### Revert to the primary region | ||
|
||
The purpose of a revert operation is shifting your workload back to the original region | ||
and restoring the CRDR setup to its original configuration. | ||
|
||
You [initiate a revert](/docs/products/postgresql/crdr/crdr-revert-to-primary) to the PRS | ||
manually as soon as the PRS is ready to take back control. | ||
|
||
A revert process consists of two steps you initiate at your convenience: | ||
|
||
1. **Restore the primary service** by recreating its nodes from the local backups as well as | ||
synchronize (replicate) the most recent data from the active service (RRS). | ||
When completed, the PRS is restored and in near real-time sync with the RRS. | ||
1. **Switch the direction of the replication** to effectively route the traffic back to the | ||
primary region. When completed, both the PRS and the RRS are up and running again: the | ||
PRS as an active service, and the RRS as a passive service. | ||
|
||
## DNS address and service URI | ||
|
||
### Active service DNS address | ||
|
||
CRDR allows you to access your active service always using the same **Service URI**, | ||
which doesn't change in the event of a failover to the recovery region. | ||
|
||
:::note | ||
**Service URI** is a locator that is shared between the PRS and the RRS. It always points | ||
to the replicating node of the active service. This node is the only read-write node | ||
in both CRDR regions. | ||
::: | ||
|
||
The **Service URI** of an active service can remain unchanged in the event of a region outage | ||
because the DNS record of this **Service URI** is updated to point to the active service. | ||
This allows your applications to work uninterrupted and adapt to the change automatically | ||
without updating its code or data. | ||
|
||
### Standby nodes DNS addresses | ||
|
||
## Revert to the primary region | ||
Regardless of the CRDR cycle phase, you can always connect and access separately | ||
each standby node in the CRDR peer services. This can help you compensate for potential | ||
network delays by using the service geographically closer to your applications. | ||
|
||
Standby nodes in the CRDR service pair can have two different URIs, depending on the CRDR | ||
service (region) they belong to: | ||
|
||
## Backups | ||
- For the **primary service standby URI**, the DNS record always points to the standby nodes | ||
in the primary region. | ||
- For the **recovery service standby URI**, the DNS record always points to the standby nodes | ||
in the recovery region. | ||
|
||
Both the PRS standby URI and the RRS standby URI are dedicated, not shared, and read-only. | ||
|
||
## Backups in the recovery region | ||
|
||
After a failover to the recovery region in the event of a primary region outage, service | ||
backups start to be taken in the recovery region. You can use this backup history for | ||
operations and data resiliency purposes. | ||
|
||
## Related pages | ||
|
||
- [Aiven for PostgreSQL high availability](/docs/products/postgresql/concepts/high-availability) | ||
- [Aiven for PostgreSQL backups](/docs/products/postgresql/concepts/pg-backups) | ||
- [Aiven for PostgreSQL read-only replica](/docs/products/postgresql/howto/create-read-replica) | ||
- [Backup to another region](/docs/platform/concepts/backup-to-another-region) |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.