Releases: pkolaczk/fclones
Usability improvements
Since release 0.11, the `-R` option has been removed and all directories are scanned recursively by default.
This means most of the commands can be shorter. E.g. searching for duplicates in the current folder is now just `fclones .`
You can force a non-recursive scan by adding `--depth 0`.
Additionally, `fclones` now validates input paths before running. If any of the input files is not readable, `fclones` will terminate with an error.
Minor directory tree scanning performance improvement on HDD
It turns out that sorting entries by inode number slightly improves performance on HDDs, and there is a cheap way of getting the inode number for each entry.
This optimization is very local for now: sorting spans only the contents of a single directory. We can do better in the future by keeping unprocessed directories in a priority queue, but that would require a lot more changes to the code.
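The per-directory sort can be sketched roughly as follows. This is a simplified illustration using only the Rust standard library, not fclones' actual code; the "cheap way" here is `DirEntryExt::ino()`, which reads the inode number that `readdir` already returns, so no extra `stat` call is needed:

```rust
use std::fs;
use std::os::unix::fs::DirEntryExt; // Unix-only: exposes the inode number

fn main() -> std::io::Result<()> {
    // Collect the entries of a single directory together with their inode
    // numbers. DirEntryExt::ino() is cheap: readdir provides d_ino for free.
    let mut entries: Vec<(u64, std::path::PathBuf)> = fs::read_dir(".")?
        .filter_map(|e| e.ok())
        .map(|e| (e.ino(), e.path()))
        .collect();

    // Sorting by inode tends to match the physical layout of the inode table
    // on an HDD, so visiting files in this order reduces seeking.
    entries.sort_by_key(|e| e.0);

    for (ino, path) in &entries {
        println!("{:>12} {}", ino, path.display());
    }
    Ok(())
}
```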
Make fclones usable as library
This is mostly a code reorganization release. Stuff got moved around.
- The dependency on system PCRE libs was dropped
- `fclones` can now also be used as a library
- Many internal APIs are hidden now
- More code has been documented
- Minor look & feel enhancements in logging
- Finally I published it on crates.io!
Always access an HDD from a single thread
This release changes the default settings for parallelism.
For HDDs, both the random and the sequential thread pool are now sized to 1.
This turned out to improve performance even further, now that access ordering has been introduced in 0.9.0.
Huge performance improvements on spinning drives
This release contains big performance improvements:
- File hashing is performed in the order of the physical location of the data on disk. This minimizes disk seek latency and hugely improves performance on HDDs. On file systems that don't support the `ioctl` `FIEMAP` feature for getting the physical location of file data, accesses are ordered by file identifiers (e.g. inode numbers on Unix), which also seems to improve performance as long as file data are not heavily fragmented.
- Switched from `HashMap` to `BTreeMap` for file grouping. This reduces memory usage (and also improves memory access locality, but this probably isn't something you'll notice in cold-cache runs).
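The grouping switch can be illustrated with a toy sketch. This uses only the standard library; the key and value types here are illustrative assumptions, not fclones' real ones:

```rust
use std::collections::BTreeMap;

fn main() {
    // Group file paths by size. A BTreeMap packs keys into B-tree nodes,
    // which is denser than HashMap's table and keeps neighbouring keys
    // close together in memory.
    let files = [("a.txt", 100u64), ("b.txt", 200), ("c.txt", 100)];
    let mut groups: BTreeMap<u64, Vec<&str>> = BTreeMap::new();
    for (path, size) in files {
        groups.entry(size).or_default().push(path);
    }
    // Iteration comes out ordered by key, a side benefit of the BTreeMap.
    for (size, paths) in &groups {
        println!("{} bytes: {:?}", size, paths);
    }
}
```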
Stay tuned for updated benchmarks...
New output format: fdupes
Now you can format the output like `fdupes` does, with `-f fdupes`.
Minor logging / output improvements
- Fixed the message printed by `fclones -V` and `fclones -h`
- Logs a "Started" message before scanning directories, which makes it easier to measure how long the scanning phase took
Improved performance on spinning drives
This release significantly improves file scanning performance on spinning drives (HDDs).
Now `fclones` maintains a separate pool of threads per physical device. It detects the drive type and adapts its I/O access patterns appropriately.
Parallel access is performed only for small, random I/O requests. Sequential full-contents hashing is performed by a single thread per HDD. Multiple devices are supported, so on a computer equipped with an HDD and an SSD, `fclones` will use a different strategy for each, and both devices will be used in parallel.
For more information, see Tuning in the README.md.
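The per-device split can be sketched roughly like this, using only the standard library. This is a simplified illustration: the real drive-type detection in fclones is more involved, and the thread-pool wiring is omitted here:

```rust
use std::collections::BTreeMap;
use std::fs;
use std::os::unix::fs::MetadataExt; // Unix-only: exposes the device id

fn main() -> std::io::Result<()> {
    // Bucket files by the device they live on; each bucket would then be
    // handed to its own thread pool (1 thread for an HDD, many for an SSD).
    let mut by_device: BTreeMap<u64, Vec<std::path::PathBuf>> = BTreeMap::new();
    for entry in fs::read_dir(".")?.filter_map(|e| e.ok()) {
        if let Ok(meta) = entry.metadata() {
            by_device.entry(meta.dev()).or_default().push(entry.path());
        }
    }
    for (dev, files) in &by_device {
        println!("device {}: {} files", dev, files.len());
    }
    Ok(())
}
```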
Minor performance and progress reporting improvements
- Grouping by suffixes is performed only for files that are large enough. The probability of files differing only in their suffix is small, so we should do it only when it can potentially save reading a huge amount of data
- The progress bar can be switched off with the new `--quiet` flag
- Improved accuracy of the progress notification
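The size gate for suffix grouping boils down to a simple predicate. In this sketch the threshold value is an arbitrary assumption for illustration, not the one fclones actually uses:

```rust
// Hypothetical threshold: only files at least this large get the extra
// suffix read, because that's where skipping a full-contents hash pays off.
const SUFFIX_CHECK_MIN_SIZE: u64 = 4 * 1024 * 1024; // 4 MiB, an assumption

/// Decide whether grouping by a short suffix read is worth doing.
fn should_group_by_suffix(file_size: u64) -> bool {
    file_size >= SUFFIX_CHECK_MIN_SIZE
}

fn main() {
    // Small file: cheaper to just hash it fully.
    assert!(!should_group_by_suffix(64 * 1024));
    // Big file: a suffix mismatch saves reading ~100 MiB.
    assert!(should_group_by_suffix(100 * 1024 * 1024));
    println!("ok");
}
```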
Enhanced file transformation
- Transforming files is safer now; when using `$IN`, the transform program is executed on a copy of the file
- Added the `--in-place` option to allow commands like `exiv2` that modify files in place (they don't write new files)
- Added the `--no-copy` option to allow the old behaviour of working directly on the originals (it is faster, but please be cautious!)