Do reflink copy with rsync ("inplace") #876
base: master
Conversation
@@ -129,6 +129,8 @@ ifneq ($(strip $(LDLIBS)),)
 CLIBS+=-cclib '$(LDLIBS)'
 endif
 
+CAMLCFLAGS+=-ccopt -D_FILE_OFFSET_BITS=64
Why is that define ok in general, vs on some subset of systems? That feels like a non-POSIX thing that has to be guarded, but I'm unsure.
It will do no harm on systems that don't support this. The define is required to request that off_t be a 64-bit type. The new function calls added with this PR use off_t only on platforms where it is a 64-bit type. It will do nothing bad on 32-bit platforms that don't support it (but you won't get large file support either).
The OCaml runtime itself (and by proxy unison) has relied on this since 2002, on all platforms, unconditionally.
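For illustration, a minimal check of what the define changes on a glibc/ILP32 system (this snippet is not part of the PR, and the effect on other libcs may differ):

```c
/* With _FILE_OFFSET_BITS=64 defined before any system header, glibc
 * makes off_t a 64-bit type and maps open/lseek/stat/etc. to their
 * 64-bit variants; without it, off_t stays 32-bit on ILP32. */
#define _FILE_OFFSET_BITS 64
#include <sys/types.h>
#include <stdio.h>

int main(void)
{
    printf("sizeof(off_t) = %zu\n", sizeof(off_t)); /* prints 8 */
    return 0;
}
```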
I don't really follow "safe". Is the syscall that writes the new bits atomic, and only called once, so we are guaranteed that even with a crash the user will get the new bits or the old bits? If that is the case, it would be good to make that argument in comments. If not, I don't understand.
This new code does not change anything in terms of Unison safety. Yes, in case of a crash or the source changing during sync, the user will get only the new bits or only the old bits, nothing mixed, nothing half-baked. But this is how Unison always works; it's not because of this new code. You could say that this continues working despite this new code.

That's why "inplace" is in quotes: it's actually not changing the target file directly like rsync does. You could view this new code as basically augmenting read(2) and write(2); all other code (Unison's copy protocol, the rsync algorithm) is untouched and remains exactly the same.

If you meant comments in the code, then I don't think any are needed, because none of this code is related to the safety of how Unison works and the comments would be out of place. All these "safety" and "inplace" claims are only included in this PR due to the discussions that took place in #65.
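For readers outside the thread: the "old bits or new bits" guarantee comes from Unison writing the complete new contents to a temporary file and committing it afterwards, which this PR leaves untouched. A rough sketch of that pattern, assuming the usual write-to-temp-then-rename(2) approach (the function name is hypothetical, not Unison's actual code):

```c
#include <stdio.h>

/* Write the full new contents to a temporary file first, then
 * rename(2) it over the destination.  rename is atomic on POSIX
 * filesystems, so after a crash the destination holds either the
 * complete old file or the complete new file, never a mix. */
static int replace_atomically(const char *tmp_path, const char *dst_path)
{
    /* ...tmp_path has been fully written and flushed at this point... */
    return rename(tmp_path, dst_path);
}
```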
(force-pushed from 2e3b0be to 7d7d30e)
This is still a draft, but having thought about it, I am not comfortable with defining LARGEFILE macros on systems that are not known to use them. While I see the argument that defining random macros on other systems shouldn't change behavior, it's unclean and feels unsafe going forward. In general, we need a way to feature-test and adapt, as autoconf would do. That has lots of issues, mainly Windows, but we could have a config.h with #ifdef OS-that-uses-LFS and then #define, and include that widely, to centralize and have a commented home for such logic, if it isn't easy to do in the makefile. But we already do stuff like that.
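Something along those lines, as a hedged sketch (the OS guard list here is illustrative only, not a vetted set):

```c
/* Hypothetical config.h: define the LFS transition macros only on
 * systems known to use them, instead of unconditionally.  NetBSD and
 * similar systems have a 64-bit off_t natively and need nothing. */
#ifndef UNISON_CONFIG_H
#define UNISON_CONFIG_H

#if defined(__linux__) || defined(__sun)
#  ifndef _FILE_OFFSET_BITS
#    define _FILE_OFFSET_BITS 64
#  endif
#endif

#endif /* UNISON_CONFIG_H */
```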
Is your worry about defining it globally, or about defining it unconditionally? These specifications have been in place for over 20 years (https://unix.org/version2/whatsnew/lfs.html and https://unix.org/version2/whatsnew/lfs20mar.html). Are there many systems that have not implemented them, or that are known to be broken?

Unison has already depended on this specific macro for over 20 years, because OCaml and its Unix library define it unconditionally and Unison explicitly uses functions that are enabled for large files and require the macro to be defined. Without this, Unison couldn't handle files larger than 2 GB on 32-bit systems. The macro definition hasn't been needed explicitly in the build scripts so far only because there has been no C code directly depending on it; now, with this PR, there is. But indirectly it has always been there, and it is required to support >2 GB files on 32-bit systems.

Of course, since the new C code requiring this macro is highly platform-specific, it may be possible to rewrite it to not require the macro in the first place. And then there's also the possibility that we don't support 32-bit systems for the functionality in this PR. That's not a bad option either, I think. I did just learn that the portable way is to use
Where I'm coming from is that the entire notion of "LFS" is only applicable to some systems. It just isn't true that all "32-bit" systems cannot handle large files without it. Some, e.g. NetBSD, have chosen instead to define off_t etc. to be types that are big enough, rather than using int, and did so a really long time ago, just as time_t became int64_t. So on NetBSD there is no LFS option; there are just native syscalls that work. (There are also old binary codepoints for ancient binary compat, but that's not a unison issue.) The name _LARGEFILE_SOURCE shows up only in a comment under our /usr/include.

It's perfectly ok to define this on systems where that is what you are supposed to do. But it perpetuates the confusion that this is the sole path to being able to use large files. I will have a deeper look, given that this is a standardized confusion-generating macro.
Ok, I see. I thought the notion of "LFS" is in SUS, but that does not mean that every SUS-compatible system needs to provide a switch; "LFS" may very well be its natural and only state. In that case
Hehe, "LFS" is from '96 and was included in SUS v2 ('97). Of course it's sad that it would still be opt-in on some systems (and I don't even know if that's the case). Anyway, I think avenues worth exploring are using
My top-level point is that a system having a 32-bit CPU and there being an issue with large files that needs to be mitigated are not the same thing. This is the fundamental confusion that bothers me. The "large file issue" exists because off_t is "int" on 32-bit CPUs (or rather, on ILP32 systems); it is not intrinsic to 32-bit CPUs. So switching on CPU type is wrong. Indeed, on PDP-11 Unix one could use 2GB files in theory, on a machine where int was 16 bits. (Of course, nobody had a disk that big.)

I will try to understand this better -- and I certainly have no problem defining this when appropriate. I just don't yet understand when that is and isn't. It may be that what POSIX says is that if you define that, you are guaranteed that off_t is at least 64 bits, and if you don't, you are only guaranteed that it is some integral type. In which case, it's ok to define it. (One could build a system on an x86_64 that uses 32 bits for off_t without that define and be conforming, I think.)
After some misunderstandings, I think we're converging now. It's not a hardware question, so it's not related to 32-bit CPUs (I used the word "system" in a very lax way). It's a libc/kernel ABI question primarily (and possibly an API question secondarily, if it comes to that). Even though I basically meant ILP32 when I wrote "32-bit system", it's not directly related to ILP32 either: an ILP32 system can have a working ABI, and a non-ILP32 system can have a broken ABI. The expectation is that the ABI supports large files by default. If that's not the case then it may be possible to coerce it into a supporting state -- that's what this macro is for.

The original LFS rationale includes a statement to this effect, and these instructions for enabling a "large" off_t (remember, it may be the default anyway): https://unix.org/version2/whatsnew/lfs20mar.html#A.3.2.4
I don't know for sure but would expect it to be true, so it's better to be prepared (and, as stated, this has been the case with OCaml and its Unix library).
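One way to "be prepared" regardless of which route a platform takes to a 64-bit off_t is a compile-time check; a minimal sketch (not part of the PR):

```c
#include <sys/types.h>

/* Fails the build unless off_t is at least 64 bits, whether the
 * platform provides that natively (NetBSD) or only after defining
 * _FILE_OFFSET_BITS=64 (glibc on ILP32). */
_Static_assert(sizeof(off_t) >= 8, "64-bit off_t required");
```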
This patch makes use of various platform- and filesystem-specific syscalls to accelerate local file copies. The following syscalls are tried:

* clonefile(2) on macOS (version >= 10.12)
* ioctl FICLONE on Linux (kernel >= 4.5) (also BTRFS_IOC_CLONE since Linux 2.6.29(?) should work)
* copy_file_range(2) on Linux (kernel >= 4.5) and FreeBSD >= 13
* sendfile(2) on Linux (kernel >= 2.2) and sendfile(3ext) on Solaris

Fallback to a read-write loop is used if none of the above are available or supported by the filesystem.
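As a rough illustration of that fallback chain, Linux only (a sketch under assumptions, not the PR's actual code; the function name copy_fd and the error handling are hypothetical):

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FICLONE */

/* Copy len bytes from src to dst, preferring the cheapest mechanism:
 * reflink, then in-kernel copy, then a plain read/write loop. */
static int copy_fd(int src, int dst, off_t len)
{
#ifdef FICLONE
    /* 1. Reflink: shares blocks on CoW filesystems, copies nothing. */
    if (ioctl(dst, FICLONE, src) == 0)
        return 0;
#endif
    /* 2. In-kernel copy (Linux >= 4.5, FreeBSD >= 13). */
    off_t done = 0;
    while (done < len) {
        ssize_t n = copy_file_range(src, NULL, dst, NULL,
                                    (size_t)(len - done), 0);
        if (n <= 0)
            break;      /* unsupported or cross-device: fall through */
        done += n;
    }
    if (done == len)
        return 0;
    if (done > 0)
        return -1;      /* partial in-kernel copy; keep the sketch simple */
    /* 3. Last resort: userspace read/write loop (sendfile omitted). */
    char buf[65536];
    ssize_t r;
    while ((r = read(src, buf, sizeof buf)) > 0)
        if (write(dst, buf, (size_t)r) != r)
            return -1;
    return (int)r;
}
```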
Using power-of-2 block sizes enables (or facilitates) filesystem block-level links (so-called reflinks) when receiving the rsync delta stream.
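The alignment is why power-of-2 block sizes matter: range cloning only succeeds when offsets and lengths are aligned to the filesystem block size. A hedged sketch of per-range cloning on Linux (the wrapper name reflink_range is hypothetical):

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FICLONERANGE, struct file_clone_range */

/* Clone len bytes from src at src_off into dst at dst_off without
 * copying data.  Fails (e.g. EINVAL) if the range is not aligned to
 * the filesystem block size or reflinks are unsupported. */
static int reflink_range(int src, off_t src_off,
                         int dst, off_t dst_off, size_t len)
{
    struct file_clone_range r = {
        .src_fd      = src,
        .src_offset  = (unsigned long long)src_off,
        .src_length  = (unsigned long long)len,
        .dest_offset = (unsigned long long)dst_off,
    };
    return ioctl(dst, FICLONERANGE, &r);
}
```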
On filesystems that support block-level copy-on-write links (so-called reflinks), the accelerated copy functions can be used by the rsync algorithm to copy the unchanged data from the original file without actually copying it. If reflinks are not supported then an in-kernel copy is attempted instead. Worst case, if that is not supported either, data is copied as usual. For large files with only a few changes, this means the following (when filesystem support exists):

- syncing will be much faster; copying possibly gigabytes of data around can become almost free;
- storage is only required for changed blocks; filesystem snapshots will record only changed blocks, as expected.

This is like rsync's --inplace but completely safe.
(force-pushed from 7d7d30e to a60ed4e)
Based on #577 (the first commit is from that PR).
With #577, it was very simple to extend this support to the rsync delta copy. This is like rsync's --inplace but it's actually safe to use. This would solve #65 in some configurations (see the caveat below).

The caveat is that it only works on platforms that export an API for fs block-level links (so-called reflinks) and on a filesystem that supports reflinks. Other platforms/filesystems continue using the old code.
(Note on implementation: the reflink mode currently works the same way as the plain old copy mode, by overlapping writes for data from the original file and for data from the network. This works well enough and makes the fallback to old mode extremely simple. An alternative implementation could be to reflink clone the entire file and then overwrite the changed blocks. This is not implemented because it would make the code more complicated and it's hard (at the moment) to see any general performance benefits.)
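To make the overlapping-writes idea concrete, here is a hedged sketch of a receiver step (all names are hypothetical and this is not the PR's code; reflink_range is the range-cloning wrapper sketched earlier): unchanged blocks referenced by the delta stream are cloned from the original file, while literal data from the network is written normally.

```c
#include <sys/types.h>
#include <unistd.h>

/* From the earlier FICLONERANGE sketch. */
int reflink_range(int src, off_t src_off, int dst, off_t dst_off, size_t len);

enum tok_kind { TOK_COPY, TOK_LITERAL };

/* One decoded instruction from the rsync delta stream. */
struct token {
    enum tok_kind kind;
    off_t src_off;      /* TOK_COPY: offset of the block in the original */
    size_t len;
    const char *data;   /* TOK_LITERAL: bytes received from the network */
};

/* Apply one token to the temporary target file at dst_off.  The target
 * is still a fresh temp file, so Unison's usual atomic-commit step is
 * unchanged -- hence "inplace" only in quotes. */
static int apply_token(int src, int dst, off_t dst_off, const struct token *t)
{
    if (t->kind == TOK_COPY)     /* unchanged block: clone, don't copy */
        return reflink_range(src, t->src_off, dst, dst_off, t->len);
    if (pwrite(dst, t->data, t->len, dst_off) != (ssize_t)t->len)
        return -1;               /* changed block: literal write */
    return 0;
}
```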