Incremental backup #22

GuChad369 · 2024-08-21T14:22:12Z

I'm starting to research and attempt to implement incremental backup. I plan to achieve this feature through three PRs:

Find a way to send the manifest files to the extension.
Compare the files with the server's PostgreSQL instance.
Handle the result.

GuChad369 · 2024-08-21T14:23:15Z

@Jubilee101 If I have any questions during the process, I will post them here along with any discussions.

Jubilee101 · 2024-08-22T08:00:52Z

Sounds good! Just let us know if you have ideas or problems.

GuChad369 · 2024-08-26T11:47:19Z

I have researched pgbackrest, which also supports incremental backups. The general idea behind its implementation is similar, using manifests to achieve this functionality.

In pgbackrest, there are two main files that handle communication between the client and server: src/protocol/client.c and src/protocol/server.c. This C code implements a client-server communication protocol.

The implementation follows strict state management and error-handling practices, ensuring reliable communication between the client and server. It also supports asynchronous communication, allowing for complex multi-session interactions.

Additionally, there are two packed interfaces, IoRead and IoWrite, used for reading from and writing to the communication channel.

In summary, pgbackrest implements its communication protocol between the client and server, enabling them to send and receive protocol messages effectively.

I will try to become familiar with our project's communication methods and explore using a protocol directly to send the manifest file, avoiding the need to create a table on the server side.

Jubilee101 · 2024-08-27T07:12:09Z

In summary, pgbackrest implements its communication protocol between the client and server, enabling them to send and receive protocol messages effectively.

Hmm, are they using this protocol to send manifest and receive backups? This is a bit weird since I don't recall having to install pgbackrest on both sides.

I don't think it's appropriate to setup connections on the extension side, it seems like a weird hacking of postgres extensions. Some other options would be A. we send manifest entry one query at a time, B. the extension sends manifest as a query response, one row per entry, and we do comparison on main.

There is a chance that we don't need manifest to know what changed, I'm looking at how postgres does it, and it seems that they use WAL purely. The idea is to get a WAL summary from last backup's start LSN to current backup's start LSN, files mentioned in WAL in the range are naturally subject to incremental backup. And here's a nice thing, we are going to have our WAL support within this GSoC. And we stream WAL segments to main all the time. So the whole "comparison"(it's not exactly a comparison now) could happen entirely on main side.

Either way, we need to know what the start LSN (and end LSN as well, but one thing at a time) is for current backup. And it is not easy, since I think postgres gets it by inducing a checkpoint, which I'm not sure if we have the authority to do. I think you can put a pin on sending manifest while Jesper and I think about the best way to address this, and look into getting the start LSN on the extension side. I think what pgbackrest does is to wait until postgres does its next regular checkpoint. But it seems that it also has an option to start immediately(--start-fast). I'm interested to know what they are doing underneath.

We are still in exploring phase of a hard problem so I think it is OK that things are slow -- postgres itself waits until v17 🤷‍♂️ . We can also get support for v17 going at the same time. Anyway, let us if you have better ideas or problems, as always.

jesperpedersen · 2024-08-27T11:39:12Z

I think we should think about

struct file_info
{
   char[] filename;
   char[] checksum;
   time_t timestamp;
}

in a deque (memory cached). That can be refreshed, and in most cases timestamp wouldn't have changed.

We should keep the last backup information cached such that is it easier to make a "diff"...

However, like @Jubilee101 pointed out, LSN is the important factor, so that is the driver. The above is an optimization.

I like B.

GuChad369 · 2024-08-27T14:47:46Z

For B, we could create a function similar to pgmoneta_ext_get_manifest within our extension. This function would return the latest manifest. We would then call this function on the main server to retrieve the result and proceed with the comparison. Do you think this approach sounds good to you?

The concern is that if the result is too large, we might encounter issues with this approach. It's uncertain whether it would still work effectively in such cases.

jesperpedersen · 2024-08-27T15:12:38Z

We should start with the file_info deque, and then prepare a JSON format similar to backup_manifest.

This is a very big task, so we need to break it down - while we consider very large database clusters

jesperpedersen · 2024-08-27T15:16:58Z

Talk to @Jubilee101 to port the core data structures over such that building the JSON document is easy

GuChad369 · 2024-08-27T15:17:31Z

So, each time we perform a backup, we can read the backup.manifest file and store the information in a file_info deque.

jesperpedersen · 2024-08-27T15:18:21Z

Work on getting a GUC in place such that checksums are calculated based on the setting

jesperpedersen · 2024-08-27T15:21:28Z

We can have "latest backup", and current checksums... we really need to know what has changed since the last backup

jesperpedersen · 2024-08-27T15:23:28Z

But, core/ does the backup, so you need a way to receive the "official" manifest and keep it

jesperpedersen · 2024-08-27T16:34:01Z

The core idea is that we can't afford to wait to calculate checksums on a PB-scale database cluster, so we need to keep the checksum current in memory.

Most of the files won't change, so that is "guarded" by the file timestamp check, so each iteration should be "minimal"

Jubilee101 · 2024-08-28T07:22:53Z

WAL summary sounds promising but it needs some more time. So for now maybe focus on porting the data structures(should be easy copy & paste), and getting the start/end LSN. As for the checksum cache, it's an optimization which is nice to have. So look at it if you have time, but it doesn't sound like top priority to me.

GuChad369 · 2024-08-29T18:53:12Z

Since the school year has started, I might not always be able to respond promptly. However, I'll do my best to stay on top of the project, and I'll do my best to complete it.

I'm a little confused right now because I don't have a rough outline of the entire process. Could you clarify where we should define the data structures and in which file?

I did some research on retrieving the start/end LSN and found an internal PostgreSQL function, GetXLogWriteRecPtr(), which retrieves the current write pointer of the WAL (Write-Ahead Log). This function returns the LSN as an XLogRecPtr type, which is a data structure used in PostgreSQL to represent an LSN. Essentially, it's a 64-bit integer.

I think this might be the LSN we need if we use it before the backup. If we define a custom function, pgmoneta_ext_get_lsn(), on our extension side, it could potentially serve this purpose.

We can execute the logic on the main side:

Get the LSN before the backup:
```
pgmoneta_ext_get_lsn();
```
Perform our backup process.
Get the LSN after the backup:
```
pgmoneta_ext_get_lsn();
```

Is this logic and my understanding correct?

Jubilee101 · 2024-08-30T01:05:25Z

Since the school year has started, I might not always be able to respond promptly. However, I'll do my best to stay on top of the project, and I'll do my best to complete it.

That is quite ok, just keep us posted every week and let us know if you get distracted by other business.

I'm a little confused right now because I don't have a rough outline of the entire process. Could you clarify where we should define the data structures and in which file?

I'm thinking in main it should be an option in backup, something like pgmoneta-cli backup primary [full | incremental], where it defaults to full if the option is not offered or there are no backups in the archive dir. And we need another set of workflow nodes for the incremental, which means changing wf_backup(), we may need to create some slightly different post backup workflow nodes for the incremental backup as well.

As for the major backup process, it should still happen in wf_backup.c, maybe a new set of incrementalbackup_setup/execute/teardown.

You don't have to worry about them now. Just have a rough picture in mind and focus on getting the APIs we are going to need in. We'll get to the assemble in the end.

I did some research on retrieving the start/end LSN and found an internal PostgreSQL function, GetXLogWriteRecPtr(), which retrieves the current write pointer of the WAL (Write-Ahead Log).

Great! Have you checked internally how it works? According to https://www.interdb.jp/pg/pgsql10/01.html#1011-pg_backup_start-ver14-or-earlier-pg_start_backup, it needs to do a checkpoint first. That's what we worry about, since doing a checkpoint requires a pretty hight level of privilege. I guess that's why pgbackrest by defaults waits until a regular checkpoint to happen naturally. But they do seem to have some ways to start immediately.

GuChad369 · 2024-08-30T19:40:49Z

Thank you for your explanation; it's much clearer now, and I have a rough outline.

If there are any specific APIs you need, just create an issue, write down the requirements, and I'll do my best to implement them.

As for GetXLogWriteRecPtr(), I believe it doesn't require a checkpoint first. Based on the source code, the GetXLogWriteRecPtr function does not trigger a checkpoint; it simply reads and returns the latest WAL write position.

I believe the structure in PostgreSQL, XLogCtlData, is always maintained in shared memory, which tracks various aspects of WAL processing.

If we define our own function in the extension and use it, without restricting the privilege, the repl role could also use it to get the LSN.

GuChad369 · 2024-09-02T11:09:38Z

Should we create the function pgmoneta_ext_get_lsn() in our extension using GetXLogWriteRecPtr() and test it to see if it works? Do you think it's worth trying?

jesperpedersen · 2024-09-02T12:01:55Z

Yes, we can try and see if it works. We can always remove as part of a cleanup before the release

GuChad369 added the feature New feature label Aug 21, 2024

GuChad369 added a commit to GuChad369/pgmoneta_ext that referenced this issue Sep 3, 2024

[pgmoneta#22] Get LSN

78b372e

GuChad369 added a commit to GuChad369/pgmoneta_ext that referenced this issue Sep 4, 2024

[pgmoneta#22] Get LSN

90ae42e

GuChad369 added a commit to GuChad369/pgmoneta_ext that referenced this issue Sep 14, 2024

[pgmoneta#22] Initial pgmoneta_ext_full_backup()

47a4eef

GuChad369 added a commit to GuChad369/pgmoneta_ext that referenced this issue Sep 28, 2024

[pgmoneta#22] Receive file from client

e9cfeb4

GuChad369 added a commit to GuChad369/pgmoneta_ext that referenced this issue Sep 30, 2024

[pgmoneta#22] Receive file from client

ff4a19d

GuChad369 added a commit to GuChad369/pgmoneta_ext that referenced this issue Oct 23, 2024

[pgmoneta#22] Receive file from client

4c4553c

GuChad369 added a commit to GuChad369/pgmoneta_ext that referenced this issue Oct 24, 2024

[pgmoneta#22] Receive file from client

3ac0c55

GuChad369 added a commit to GuChad369/pgmoneta_ext that referenced this issue Oct 24, 2024

[pgmoneta#22] Receive file from client

565ff96

GuChad369 added a commit to GuChad369/pgmoneta_ext that referenced this issue Oct 25, 2024

[pgmoneta#22] Receive file from client

d732aea

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental backup #22

Incremental backup #22

GuChad369 commented Aug 21, 2024

GuChad369 commented Aug 21, 2024

Jubilee101 commented Aug 22, 2024

GuChad369 commented Aug 26, 2024

Jubilee101 commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

GuChad369 commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

GuChad369 commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

Jubilee101 commented Aug 28, 2024

GuChad369 commented Aug 29, 2024

Jubilee101 commented Aug 30, 2024 •

edited

Loading

GuChad369 commented Aug 30, 2024

GuChad369 commented Sep 2, 2024

jesperpedersen commented Sep 2, 2024

Incremental backup #22

Incremental backup #22

Comments

GuChad369 commented Aug 21, 2024

GuChad369 commented Aug 21, 2024

Jubilee101 commented Aug 22, 2024

GuChad369 commented Aug 26, 2024

Jubilee101 commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

GuChad369 commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

GuChad369 commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

jesperpedersen commented Aug 27, 2024

Jubilee101 commented Aug 28, 2024

GuChad369 commented Aug 29, 2024

Jubilee101 commented Aug 30, 2024 • edited Loading

GuChad369 commented Aug 30, 2024

GuChad369 commented Sep 2, 2024

jesperpedersen commented Sep 2, 2024

Jubilee101 commented Aug 30, 2024 •

edited

Loading