Do reflink copy with rsync ("inplace") #876
base: master
Conversation
@@ -129,6 +129,8 @@ ifneq ($(strip $(LDLIBS)),)
 CLIBS+=-cclib '$(LDLIBS)'
 endif
 
+CAMLCFLAGS+=-ccopt -D_FILE_OFFSET_BITS=64
Why is that define ok in general, vs on some subset of systems? That feels like a non-POSIX thing that has to be guarded, but I'm unsure.
It will do no harm on systems that don't support this. The define is required to request that off_t be a 64-bit type. The new function calls added with this PR use off_t only on platforms where it is a 64-bit type. It will do nothing bad on 32-bit platforms that don't support it (but you won't get large file support either).
The OCaml runtime itself (and by proxy unison) has relied on this since 2002, on all platforms, unconditionally.
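For illustration, a minimal check of what the define changes on a glibc/ILP32 system (this snippet is not part of the PR, and the effect on other libcs may differ):

```c
/* With _FILE_OFFSET_BITS=64 defined before any system header, glibc
 * makes off_t a 64-bit type and maps open/lseek/stat/etc. to their
 * 64-bit variants; without it, off_t stays 32-bit on ILP32. */
#define _FILE_OFFSET_BITS 64
#include <sys/types.h>
#include <stdio.h>

int main(void)
{
    printf("sizeof(off_t) = %zu\n", sizeof(off_t)); /* prints 8 */
    return 0;
}
```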
I don't really follow "safe". Is the syscall that writes the new bits atomic, and only called once, so we are guaranteed that even with a crash the user will get the new bits or the old bits? If that is the case, it would be good to make that argument in comments. If not, I don't understand.
This new code does not change anything in terms of Unison safety. Yes, in case of a crash or the source changing during sync, the user will get only the new bits or only the old bits, nothing mixed, nothing half-baked. But this is how Unison always works; it's not because of this new code. You could say that this continues working despite this new code.

That's why "inplace" is in quotes: it's actually not changing the target file directly like rsync does. You could view this new code as basically augmenting read(2) and write(2); all other code (Unison's copy protocol, the rsync algorithm) is untouched and remains exactly the same.

If you meant comments in the code, then I don't think any are needed, because none of this code is related to the safety of how Unison works and the comments would be out of place. All these "safety" and "inplace" claims are only included in this PR due to the discussions that took place in #65.
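For readers outside the thread: the "old bits or new bits" guarantee comes from Unison writing the complete new contents to a temporary file and committing it afterwards, which this PR leaves untouched. A rough sketch of that pattern, assuming the usual write-to-temp-then-rename(2) approach (the function name is hypothetical, not Unison's actual code):

```c
#include <stdio.h>

/* Write the full new contents to a temporary file first, then
 * rename(2) it over the destination.  rename is atomic on POSIX
 * filesystems, so after a crash the destination holds either the
 * complete old file or the complete new file, never a mix. */
static int replace_atomically(const char *tmp_path, const char *dst_path)
{
    /* ...tmp_path has been fully written and flushed at this point... */
    return rename(tmp_path, dst_path);
}
```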
(force-pushed from 2e3b0be to 7d7d30e)
This is still a draft, but having thought about it, I am not comfortable with defining LARGEFILE macros on systems that are not known to use them. While I see the argument that defining random macros on other systems shouldn't change behavior, it's unclean and feels unsafe going forward. In general, we need a way to feature-test and adapt, as autoconf would do. That has lots of issues, mainly Windows, but we could have a config.h with #ifdef OS-that-uses-LFS and then #define, and include that widely, to centralize and have a commented home for such logic, if it isn't easy to do in the makefile. But we already do stuff like that.
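Something along those lines, as a hedged sketch (the OS guard list here is illustrative only, not a vetted set):

```c
/* Hypothetical config.h: define the LFS transition macros only on
 * systems known to use them, instead of unconditionally.  NetBSD and
 * similar systems have a 64-bit off_t natively and need nothing. */
#ifndef UNISON_CONFIG_H
#define UNISON_CONFIG_H

#if defined(__linux__) || defined(__sun)
#  ifndef _FILE_OFFSET_BITS
#    define _FILE_OFFSET_BITS 64
#  endif
#endif

#endif /* UNISON_CONFIG_H */
```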
Is your worry about defining it globally, or about defining it unconditionally? These specifications have been in place for over 20 years (https://unix.org/version2/whatsnew/lfs.html and https://unix.org/version2/whatsnew/lfs20mar.html). Are there many systems that have not implemented them, or that are known to be broken?

Unison has already depended on this specific macro for over 20 years, because OCaml and its Unix library define it unconditionally and Unison explicitly uses functions that are enabled for large files and require the macro to be defined. Without this, Unison couldn't handle files larger than 2 GB on 32-bit systems. The macro definition hasn't been needed explicitly in the build scripts so far only because there has been no C code directly depending on it; now, with this PR, there is. But indirectly it has always been there, and it is required to support >2 GB files on 32-bit systems.

Of course, since the new C code requiring this macro is highly platform-specific, it may be possible to rewrite it to not require the macro in the first place. And then there's also the possibility that we don't support 32-bit systems for the functionality in this PR. That's not a bad option either, I think. I did just learn that the portable way is to use
Where I'm coming from is that the entire notion of "LFS" is only applicable to some systems. It just isn't true that all "32-bit" systems cannot handle large files without it. Some, e.g. NetBSD, have chosen instead to define off_t etc. to be types that are big enough, rather than using int, and did so a really long time ago, just as time_t became int64_t. So on NetBSD there is no LFS option; there are just native syscalls that work. (There are also old binary codepoints for ancient binary compat, but that's not a unison issue.) The name _LARGEFILE_SOURCE shows up only in a comment under our /usr/include.

It's perfectly ok to define this on systems where that is what you are supposed to do. But it perpetuates the confusion that this is the sole path to being able to use large files. I will have a deeper look, given that this is a standardized confusion-generating macro.
Ok, I see. I thought the notion of "LFS" is in SUS, but that does not mean that every SUS-compatible system needs to provide a switch; "LFS" may very well be its natural and only state. In that case
Hehe, "LFS" is from '96 and was included in SUS v2 ('97). Of course it's sad that it would still be opt-in on some systems (and I don't even know if that's the case). Anyway, I think avenues worth exploring are using
My top-level point is that a system having a 32-bit CPU and there being an issue with large files that needs to be mitigated are not the same thing. This is the fundamental confusion that bothers me. The "large file issue" exists because off_t is "int" on 32-bit CPUs (or rather, on ILP32 systems); it is not intrinsic to 32-bit CPUs. So switching on CPU type is wrong. Indeed, on PDP-11 Unix one could use 2GB files in theory, on a machine where int was 16 bits. (Of course, nobody had a disk that big.)

I will try to understand this better -- and I certainly have no problem defining this when appropriate. I just don't yet understand when that is and isn't. It may be that what POSIX says is that if you define that, you are guaranteed that off_t is at least 64 bits, and if you don't, you are only guaranteed that it is some integral type. In which case, it's ok to define it. (One could build a system on an x86_64 that uses 32 bits for off_t without that define and be conforming, I think.)
After some misunderstandings, I think we're converging now. It's not a hardware question, so it's not related to 32-bit CPUs (I used the word "system" in a very lax way). It's a libc/kernel ABI question primarily (and possibly an API question secondarily, if it comes to that). Even though I basically meant ILP32 when I wrote "32-bit system", it's not directly related to ILP32 either: an ILP32 system can have a working ABI, and a non-ILP32 system can have a broken ABI. The expectation is that the ABI supports large files by default. If that's not the case then it may be possible to coerce it into a supporting state -- that's what this macro is for.

The original LFS rationale includes a statement to this effect, and these instructions for enabling a "large" off_t (remember, it may be the default anyway): https://unix.org/version2/whatsnew/lfs20mar.html#A.3.2.4
I don't know for sure but would expect it to be true, so it's better to be prepared (and, as stated, this has been the case with OCaml and its Unix library).
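One way to "be prepared" regardless of which route a platform takes to a 64-bit off_t is a compile-time check; a minimal sketch (not part of the PR):

```c
#include <sys/types.h>

/* Fails the build unless off_t is at least 64 bits, whether the
 * platform provides that natively (NetBSD) or only after defining
 * _FILE_OFFSET_BITS=64 (glibc on ILP32). */
_Static_assert(sizeof(off_t) >= 8, "64-bit off_t required");
```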
This patch makes use of various platform- and filesystem-specific syscalls to accelerate local file copies. The following syscalls are tried:

* clonefile(2) on macOS (version >= 10.12)
* ioctl FICLONE on Linux (kernel >= 4.5) (also BTRFS_IOC_CLONE since Linux 2.6.29(?) should work)
* copy_file_range(2) on Linux (kernel >= 4.5) and FreeBSD >= 13
* sendfile(2) on Linux (kernel >= 2.2) and sendfile(3ext) on Solaris

Fallback to a read-write loop is used if none of the above are available or supported by the filesystem.
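As a rough illustration of that fallback chain, Linux only (a sketch under assumptions, not the PR's actual code; the function name copy_fd and the error handling are hypothetical):

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FICLONE */

/* Copy len bytes from src to dst, preferring the cheapest mechanism:
 * reflink, then in-kernel copy, then a plain read/write loop. */
static int copy_fd(int src, int dst, off_t len)
{
#ifdef FICLONE
    /* 1. Reflink: shares blocks on CoW filesystems, copies nothing. */
    if (ioctl(dst, FICLONE, src) == 0)
        return 0;
#endif
    /* 2. In-kernel copy (Linux >= 4.5, FreeBSD >= 13). */
    off_t done = 0;
    while (done < len) {
        ssize_t n = copy_file_range(src, NULL, dst, NULL,
                                    (size_t)(len - done), 0);
        if (n <= 0)
            break;      /* unsupported or cross-device: fall through */
        done += n;
    }
    if (done == len)
        return 0;
    if (done > 0)
        return -1;      /* partial in-kernel copy; keep the sketch simple */
    /* 3. Last resort: userspace read/write loop (sendfile omitted). */
    char buf[65536];
    ssize_t r;
    while ((r = read(src, buf, sizeof buf)) > 0)
        if (write(dst, buf, (size_t)r) != r)
            return -1;
    return (int)r;
}
```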
Using power-of-2 block sizes enables (or facilitates) filesystem block-level links (so-called reflinks) when receiving the rsync delta stream.
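The alignment is why power-of-2 block sizes matter: range cloning only succeeds when offsets and lengths are aligned to the filesystem block size. A hedged sketch of per-range cloning on Linux (the wrapper name reflink_range is hypothetical):

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FICLONERANGE, struct file_clone_range */

/* Clone len bytes from src at src_off into dst at dst_off without
 * copying data.  Fails (e.g. EINVAL) if the range is not aligned to
 * the filesystem block size or reflinks are unsupported. */
static int reflink_range(int src, off_t src_off,
                         int dst, off_t dst_off, size_t len)
{
    struct file_clone_range r = {
        .src_fd      = src,
        .src_offset  = (unsigned long long)src_off,
        .src_length  = (unsigned long long)len,
        .dest_offset = (unsigned long long)dst_off,
    };
    return ioctl(dst, FICLONERANGE, &r);
}
```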
On filesystems that support block-level copy-on-write links (so-called reflinks), the accelerated copy functions can be used by the rsync algorithm to copy the unchanged data from the original file without actually copying it. If reflinks are not supported then an in-kernel copy is attempted instead. Worst case, if that is not supported either, data is copied as usual. For large files with only a few changes, this means the following (when filesystem support exists):

- syncing will be much faster; copying possibly gigabytes of data around can become almost free;
- storage is only required for changed blocks; filesystem snapshots will record only changed blocks, as expected.

This is like rsync's --inplace but completely safe.
(force-pushed from 7d7d30e to a60ed4e)
Based on #577 (the first commit is from that PR).
With #577, it was very simple to extend this support to the rsync delta copy. This is like rsync's --inplace but it's actually safe to use. This would solve #65 in some configurations (see the caveat below).

The caveat is that it only works on platforms that export an API for fs block-level links (so-called reflinks) and on a filesystem that supports reflinks. Other platforms/filesystems continue using the old code.
(Note on implementation: the reflink mode currently works the same way as the plain old copy mode, by overlapping writes for data from the original file and for data from the network. This works well enough and makes the fallback to old mode extremely simple. An alternative implementation could be to reflink clone the entire file and then overwrite the changed blocks. This is not implemented because it would make the code more complicated and it's hard (at the moment) to see any general performance benefits.)
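To make the overlapping-writes idea concrete, here is a hedged sketch of a receiver step (all names are hypothetical and this is not the PR's code; reflink_range is the range-cloning wrapper sketched earlier): unchanged blocks referenced by the delta stream are cloned from the original file, while literal data from the network is written normally.

```c
#include <sys/types.h>
#include <unistd.h>

/* From the earlier FICLONERANGE sketch. */
int reflink_range(int src, off_t src_off, int dst, off_t dst_off, size_t len);

enum tok_kind { TOK_COPY, TOK_LITERAL };

/* One decoded instruction from the rsync delta stream. */
struct token {
    enum tok_kind kind;
    off_t src_off;      /* TOK_COPY: offset of the block in the original */
    size_t len;
    const char *data;   /* TOK_LITERAL: bytes received from the network */
};

/* Apply one token to the temporary target file at dst_off.  The target
 * is still a fresh temp file, so Unison's usual atomic-commit step is
 * unchanged -- hence "inplace" only in quotes. */
static int apply_token(int src, int dst, off_t dst_off, const struct token *t)
{
    if (t->kind == TOK_COPY)     /* unchanged block: clone, don't copy */
        return reflink_range(src, t->src_off, dst, dst_off, t->len);
    if (pwrite(dst, t->data, t->len, dst_off) != (ssize_t)t->len)
        return -1;               /* changed block: literal write */
    return 0;
}
```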