
Sync output to disk before removing source #151

Open
wants to merge 2 commits into master
Conversation

sebastianas
Contributor

Synchronize the created output to disk before removing the original input. This lowers the risk of losing both the source and the destination if a crash happens shortly afterwards.

Add a function which returns the directory part of the filename. This is
either the name up to the last delimiter such as '/', or simply '.' if the
file is in the current directory.
The function reuses the inner parts of has_dir_sep() for the
delimiter. The inner parts are moved to get_dir_sep() so that it returns
the pointer while has_dir_sep() still returns the bool.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
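A minimal sketch of the helper this commit describes, assuming POSIX-style paths. The function name `dir_part` and the buffer-based interface are illustrative only; the actual patch reworks the existing has_dir_sep()/get_dir_sep() internals and may differ (e.g. it likely also handles platform-specific separators, which are omitted here):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative sketch: copy the directory part of `name` into `buf`.
 * Everything up to the last '/' is the directory part; if there is no
 * separator, the file is in the current directory and "." is returned.
 * A file directly under the root keeps "/" as its directory part. */
static const char *
dir_part(const char *name, char *buf, size_t bufsize)
{
	const char *sep = strrchr(name, '/');

	if (sep == NULL) {
		/* No separator: current directory. */
		snprintf(buf, bufsize, ".");
	} else if (sep == name) {
		/* File directly under the root directory. */
		snprintf(buf, bufsize, "/");
	} else {
		/* Copy the name up to (not including) the last '/'. */
		snprintf(buf, bufsize, "%.*s", (int)(sep - name), name);
	}

	return buf;
}
```

For example, `dir_part("dir/sub/file.xz", ...)` yields `"dir/sub"`, while `dir_part("file.xz", ...)` yields `"."`.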
If the file is compressed (or decompressed) then the original file is
deleted by default after the written file is closed. All this is fine.

If the machine crashes, it is possible that the newly written file
has not yet hit the disk while the delete operation already has.
In that case neither the original file nor the written file is
available. To avoid this scenario, sync the file, followed by the
directory (to ensure that the file is part of the directory).

This may cause the `xz' command to take a bit longer because it now
additionally waits until the data has been written to disk.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
@Larhzu
Member

Larhzu commented Nov 25, 2024

The Debian bug 814089 has an example of why this can matter. Sebastian and I have discussed this both recently and in 2016. There are three related xz-devel mailing list messages from 2016 as well.

I tested fsync at some point before XZ Utils 5.0.0, likely in 2009. Back then it was on rotating media. When compressing a large number of tiny files, the performance was horrible. Obviously no one cares about performance if it leads to data loss. However, some use cases are on files which can be easily re-created, and then the lack of fsync doesn't matter but performance does.

gzip 1.7 (2016) added the --synchronous option, which enables fsync usage. This way users can get the safer behavior when they need it. However, users might not learn about the option until an accident has already happened.

Adding gzip's --synchronous would be a conservative change in the sense that it wouldn't make current use cases slower. Opinions are welcome. Once we know what to do, we can focus on making the code do that. The current PR is fine for testing the performance impact on GNU/Linux.
