zfs-2.3.0-rc4 patchset #16760 #16794
base: zfs-2.3-release
Commits on Nov 15, 2024
JSON: fix user properties output for zpool list
This commit fixes JSON output for zpool list when user properties are requested with the -o flag. This case needs to be handled specifically because zpool_prop_to_name does not return the property name for user properties; instead, the name is stored in pl->pl_user_prop. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes openzfs#16734
Commit 7e3af46
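The key-selection logic described above can be illustrated with a small sketch. The enum, struct and function names below are hypothetical, simplified stand-ins for the libzfs types the commit mentions (the real ones differ); the point is only that a user property must take its JSON key from the stored user-property name, because the native name lookup has nothing to return for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; real libzfs types and values differ. */
typedef enum {
	SK_PROP_NAME,
	SK_PROP_GUID,
	SK_PROP_USER	/* sentinel: "this entry is a user property" */
} sk_prop_t;

typedef struct {
	sk_prop_t	pl_prop;
	const char	*pl_user_prop;	/* set only for user properties */
} sk_proplist_t;

/* Like zpool_prop_to_name(): knows native properties only. */
static const char *
sk_prop_to_name(sk_prop_t prop)
{
	switch (prop) {
	case SK_PROP_NAME:
		return ("name");
	case SK_PROP_GUID:
		return ("guid");
	default:
		return (NULL);	/* no name available for user properties */
	}
}

/* Pick the JSON key: user properties carry their own name. */
static const char *
sk_json_key(const sk_proplist_t *pl)
{
	if (pl->pl_prop == SK_PROP_USER)
		return (pl->pl_user_prop);
	return (sk_prop_to_name(pl->pl_prop));
}
```

Without the SK_PROP_USER branch, a user property would get a NULL key, which is the kind of broken JSON output the commit fixes.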
Fix user properties output for zpool list
In zpool_get_user_prop, when called from zpool_expand_proplist and collect_pool, the zpool_props field of zpool_handle_t is often NULL. This mostly happens when only one user property is requested using zpool list -o <user_property>. Checking for this case and correctly initializing the zpool_props field in zpool_handle_t fixes this issue. Interestingly, the issue does not occur if any other property, like name or guid, is queried along with a user property with the -o flag, because accessing properties like guid calls zpool_prop_get_int, which checks for this case specifically and calls zpool_get_all_props. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes openzfs#16734
Commit 1c6b030
Fix a potential page leak in mappedread_sf()
mappedread_sf() may allocate pages; if it fails to populate a page and can't free it, it needs to ensure that the page is placed into a page queue, otherwise it can't be reclaimed until the vnode is destroyed. I think this is quite unlikely to happen in practice; it was noticed by code inspection. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#16643
Commit 7313c6e
Grab the rangelock unconditionally in zfs_getpages()
As a deadlock avoidance measure, zfs_getpages() would only try to acquire a rangelock, falling back to a single-page read if this was not possible. However, this is incompatible with direct I/O. Instead, release the busy lock before trying to acquire the rangelock in blocking mode. This means that it's possible for the page to be replaced, so we have to re-lookup. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#16643
Commit 37e8f3a
L2ARC: Move different stats updates earlier
Move the various stats updates earlier, before we make the header or the log block visible to others. This should fix an assertion on allocated space going negative if the header is freed once the lock is dropped while the write is still in progress. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16040 Closes openzfs#16743
Commit 025f8b2
dsl_dataset: put IO-inducing frees on the pool deadlist
dsl_free() calls zio_free() to free the block. For most blocks, this simply calls metaslab_free() without doing any IO or putting anything on the IO pipeline. Some blocks, however, require additional IO to free. This includes at least gang, dedup and cloned blocks. For those, zio_free() will issue a ZIO_TYPE_FREE IO and return. If a huge number of blocks are being freed all at once, it's possible for dsl_dataset_block_kill() to be called millions of times in a single transaction (e.g. a 2T object of 128K blocks is 16M blocks). If those are all IO-inducing frees, that becomes 16M FREE IOs placed on the pipeline. At the time of writing, a zio_t is 1280 bytes, so just one 2T object requires a 20G allocation of resident memory from the zio_cache. If that can't be satisfied by the kernel, an out-of-memory condition is raised. This would be better handled by improving the cases that the dmu_tx_assign() throttle will handle, by reducing the overheads required by the IO pipeline, or with a better central facility for freeing blocks. For now, we simply check for the cases that would cause zio_free() to create a FREE IO, and instead put the block on the pool's freelist. This is the same place that blocks from destroyed datasets go, and the async destroy machinery will automatically see them and trickle them out as normal. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#6783 Closes openzfs#16708 Closes openzfs#16722 Closes openzfs#16697
Commit 0274a9a
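The memory figure in the commit message follows from simple arithmetic. This sketch reproduces it using only the numbers the commit gives (a 2T object, 128K blocks, and a zio_t of 1280 bytes "at time of writing"), reading the T/K/M/G suffixes as binary units; the macro and function names are illustrative, not ZFS code.

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the commit message, read as binary units. */
#define SK_OBJECT_SIZE	(UINT64_C(2) << 40)	/* 2 TiB object */
#define SK_BLOCK_SIZE	(UINT64_C(128) << 10)	/* 128 KiB blocks */
#define SK_ZIO_T_SIZE	UINT64_C(1280)		/* bytes per zio_t */

/* One FREE IO per block if every block needs an IO-inducing free. */
static uint64_t
sk_free_ios_needed(uint64_t object_size, uint64_t block_size)
{
	return (object_size / block_size);
}

/* Resident zio_cache memory if all those IOs are queued at once. */
static uint64_t
sk_zio_cache_bytes(uint64_t nios)
{
	return (nios * SK_ZIO_T_SIZE);
}
```

2^41 / 2^17 gives 2^24 (16M) FREE IOs, and 2^24 × 1280 bytes is exactly 20 GiB, matching the commit's estimate.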
Fix some nits in zfs_getpages()
- If we don't want dmu_read_pages() to perform extra readahead/behind, pass a pointer to 0 instead of a null pointer, as dmu_read_pages() expects rahead and rbehind to be non-null.
- Avoid unneeded iterations in a loop.

Sponsored-by: Klara, Inc. Reported-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#16758
Commit ee3677d
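The first fix above is the general "pass a pointer to zero, not NULL" pattern for in/out parameters. A minimal sketch, assuming a callee that dereferences both pointers unconditionally; read_pages_sketch() and read_exact() are hypothetical names, not the real dmu_read_pages() signature.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a callee like dmu_read_pages(): it reads
 * *rahead and *rbehind unconditionally, so both must be non-NULL.
 */
static int
read_pages_sketch(int count, int *rahead, int *rbehind)
{
	assert(rahead != NULL && rbehind != NULL);
	/* Pages touched: requested pages plus readahead and readbehind. */
	return (count + *rahead + *rbehind);
}

/* A caller that wants no extra pages passes pointers to 0, not NULL. */
static int
read_exact(int count)
{
	int rahead = 0, rbehind = 0;

	return (read_pages_sketch(count, &rahead, &rbehind));
}
```

Passing NULL here would trip the assertion (or, without it, dereference a null pointer), while pointers to 0 express "no extra pages" safely.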
zvol_os.c: Increase optimal IO size
Since zvol read and write can process up to (DMU_MAX_ACCESS / 2) bytes in a single operation, the current optimal I/O size is too low. SCST directly reports this value as the optimal transfer length for the target SCSI device. Increasing it from the previous volblocksize results in performance improvement for large block parallel I/O workloads. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes openzfs#16750
Commit 4c9f2ce
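To put the change in scale, here is the arithmetic under two stated assumptions that are not in the commit message: DMU_MAX_ACCESS is 64 MiB (as defined in the OpenZFS headers at the time), and the zvol uses the 16 KiB default volblocksize. Check both against your tree; the SK_ names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions, not taken from the commit: verify against your tree. */
#define SK_DMU_MAX_ACCESS	(UINT64_C(64) << 20)	/* 64 MiB */
#define SK_VOLBLOCKSIZE		(UINT64_C(16) << 10)	/* 16 KiB default */

/* Previous optimal I/O size: the zvol's volblocksize. */
static uint64_t
sk_old_io_opt(uint64_t volblocksize)
{
	return (volblocksize);
}

/* New optimal I/O size: what one zvol read/write can process. */
static uint64_t
sk_new_io_opt(void)
{
	return (SK_DMU_MAX_ACCESS / 2);
}
```

Under these assumptions the reported optimal transfer length grows from 16 KiB to 32 MiB, a factor of 2048.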
tests: fix uClibc for getversion.c
This patch fixes compilation with uClibc by applying the same fallback as commit e12d761 to the `getversion.c` file, which was previously overlooked. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: José Luis Salvador Rufo <[email protected]> Closes openzfs#16735 Closes openzfs#16741
Commit 2600650
AUTHORS: refresh with recent new contributors
Welcome to the party 🎉 Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16762
Commit 18474ef
Commits on Nov 21, 2024
zed: prevent automatic replacement of offline vdevs
When an OFFLINE device is physically removed, a spare is automatically activated. However, this behavior differs on FreeBSD, where we do not transition from the OFFLINE state to REMOVED. Our support team has encountered cases where customers experienced unexpected behavior during drive replacements, with multiple spares activating for the same VDEV due to a single disk replacement. This patch ensures that a drive in the OFFLINE state remains in that state, preventing it from transitioning to REMOVED and being automatically replaced by a spare. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes openzfs#16751
Commit 23f063d
Fix few __VA_ARGS typos in assertions
It should be __VA_ARGS__, not __VA_ARGS. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16780
Commit 8023d9d
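For reference, this is how a variadic macro forwards its extra arguments in standard C. The macro and helper names are hypothetical; the comment describes what the typo would do. Because a macro body is only checked when the macro is expanded, a typo like this may go unnoticed in a macro that some builds never expand (debug-only assertions are a plausible case).

```c
#include <assert.h>
#include <stdio.h>

/* Correct: the variadic arguments are forwarded with __VA_ARGS__. */
#define SKETCH_FMT(buf, len, fmt, ...) \
	snprintf((buf), (len), (fmt), __VA_ARGS__)

/*
 * With the typo, i.e. "(fmt), __VA_ARGS)", the body would refer to an
 * ordinary identifier named __VA_ARGS instead of the macro's extra
 * arguments, and the error would only surface where the macro is
 * actually expanded.
 */

static int
sketch_format(char *buf, size_t len, const char *s, int n)
{
	return (SKETCH_FMT(buf, len, "%s-%d", s, n));
}
```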
Expand zpool-remove.8 manpage with example results
Also fix comment cross-referencing to zpool.8. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Steve Mokris <[email protected]> Closes openzfs#16777
Commit 9a4b2f0
Move "no name changes" from compression to checksum table
Compression names actually aren't used in dedup table names, but checksum names are. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16776
Commit 9206039
Remove hash_elements_max accounting from DBUF and ARC
Those values require global atomics to get the current hash_elements values in a few of the hottest code paths, while in all these years I have never cared about them. If somebody wants them, they should be easy to get by periodic sampling, since neither the ARC header count nor the DBUF count changes so fast that it would be difficult to catch. For now I've left the hash_elements_max kstat for ARC, since it is used/reported by arc_summary and removing it would break older versions, but it now just reports the current value. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16759
Commit f7675ae
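The "periodic sampling" alternative suggested above can be sketched as follows. This is a hypothetical illustration, not ZFS code: the hot path only adjusts a relaxed atomic counter and never reads the result back to maintain a maximum, while a cold-path sampler (for example a timer or a stats read) folds the current value into a running maximum.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hot-path state: only the current count is maintained. */
static _Atomic uint64_t sk_hash_elements;

/* Cold-path state: the maximum seen by the sampler so far. */
static uint64_t sk_sampled_max;

static void
sk_element_added(void)
{
	atomic_fetch_add_explicit(&sk_hash_elements, 1, memory_order_relaxed);
}

static void
sk_element_removed(void)
{
	atomic_fetch_sub_explicit(&sk_hash_elements, 1, memory_order_relaxed);
}

/* Called periodically, off the hot path. */
static void
sk_sample_max(void)
{
	uint64_t cur = atomic_load_explicit(&sk_hash_elements,
	    memory_order_relaxed);

	if (cur > sk_sampled_max)
		sk_sampled_max = cur;
}
```

The sampled maximum can miss short peaks between samples, which the commit argues is acceptable because these counts do not change that fast.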
L2ARC: Stop rebuild before setting spa_final_txg
Without doing that, there is a race window on export where a history log write by a completed rebuild dirties a transaction beyond the final one, triggering an assertion. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16714 Closes openzfs#16782
Commit 3f9af02
ZTS: Fix zpool_status_008_pos false positive
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold to 750ms to avoid false positives due to unrelated slow IOs which may occur in the CI environment. Additionally, clear the fault injection as soon as it is no longer required for the test case. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#16769
Commit 7fb7eb9
zio: Avoid sleeping in the I/O path
zio_delay_interrupt(), apparently used for fault injection, is executed in the I/O pipeline. It can cause the calling thread to go to sleep, which is not allowed on FreeBSD. This happens only for small delays, though, and there's no apparent reason to avoid deferring to a taskqueue in that case, as it already does otherwise. Simply defer to the taskqueue unconditionally. This fixes an occasional panic I see when running the ZTS on FreeBSD. Also remove an unhelpful comment referencing the non-existent timeout_generic(). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#16785
Commit d7abeef
fix: block incompatible kernel from being installed
The current "Requires" lines only ensure that the old kernel is available on the system; they do not prevent Fedora from updating to an incompatible kernel and breaking the user's system. Set Conflicts to block incompatible kernels from being installed. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: tleydxdy <[email protected]> Closes openzfs#16139
Commit 3c0b8da
ZTS: Avoid embedded blocks in bclone/bclone_prop_sync
If we write less than 113 bytes with compression enabled, we get an embedded block, which then fails the check for the number of cloned blocks in bclone_test. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16740
Commit 9753fea
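The 113-byte boundary above corresponds to a 112-byte maximum payload for an embedded block pointer. A minimal sketch of the check a test needs to reason about, assuming compression is enabled and the embedded_data feature is active; SK_EMBEDDED_MAX and would_be_embedded() are illustrative names, not ZFS identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Largest payload that fits in an embedded block pointer (assumed). */
#define SK_EMBEDDED_MAX	112

/*
 * Hypothetical helper: data whose compressed size fits in the block
 * pointer is stored embedded, so it is not tracked as a cloned block
 * and would not count toward a test's expected clone total.
 */
static bool
would_be_embedded(size_t compressed_size)
{
	return (compressed_size <= SK_EMBEDDED_MAX);
}
```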
BRT: Don't call brt_pending_remove() on holes/embedded
We are already doing exactly the same checks around all brt_pending_add() calls. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16740
Commit 2b64d41
ZAP: Add by_dnode variants to lookup/prefetch_uint64
Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16740
Commit 1917c26
BRT: Rework structures and locks to be per-vdev
While the block cloning operation was made per-vdev from the beginning, before this change most of its data were protected by two pool-wide locks. This created a lot of lock contention in many workloads. This change makes most of the block cloning data structures per-vdev, which allows locking them separately. The only pool-wide lock now is spa_brt_lock, protecting the array of per-vdev pointers and in most cases taken as reader. This also splits the per-vdev locks into three different ones: bv_pending_lock protects the AVL tree of pending operations in open context, bv_mos_entries_lock protects the BRT ZAP object while it is being prefetched, and bv_lock protects the rest of the per-vdev context during the TXG commit process. There should be no functional difference aside from some optimizations. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16740
Commit 409aad3
BRT: More optimizations after per-vdev splitting
- With both the pending and current AVL trees being per-vdev and having effectively identical comparison functions (the pending tree also compared birth time, but I don't believe it is possible for them to differ for the same offset within one transaction group), it makes no sense to move entries from one to another. Instead, inline a dramatically simplified brt_entry_addref() into brt_pending_apply(). It no longer requires bv_lock, since nothing runs concurrently with it at that time. And it does not need to search the tree for previous entries, since it is the same tree, we already have the entry, and we know it is unique.
- Put brt_vdev_lookup() and brt_vdev_addref() into different tree traversals to avoid false positives in the former caused by the latter's entcount modifications. This saves a dramatic amount of time when a file is cloned for the first time, by not looking for non-existent ZAP entries.
- Remove the avl_is_empty(bv_tree) check from brt_maybe_exists(). I don't think it is needed, since by that time all added entries are already accounted for in bv_entcount. The extra check must be producing too many false positives for no reason. Also, we don't need bv_lock there, since the bv_entcount pointer must be stable at this point, and we don't care about false-positive races here, while false negatives should be impossible, since all brt_vdev_addref() calls have already completed by this point. This dramatically reduces lock contention on massive deletes of cloned blocks. The only remaining contention is between multiple parallel free threads calling brt_entry_decref().
- Do not update the ZAP if the net change for a block over the TXG was 0. In combination with the above, this makes moving a file between datasets as cheap an operation as originally intended, if it fits into one TXG.
- Do not allocate vdevs on pool creation or import if the pool did not have active block cloning. This saves a bit in a few cases.
- While here, add proper error handling in brt_load() on pool import instead of assertions.
Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16773
Commit 1a5414b
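The "skip the ZAP update when the net change over the TXG is 0" item can be modeled with a per-entry delta. This is a hypothetical sketch, not the BRT code: references added and removed during one TXG are netted out, and the persisted value is written only when the net change is non-zero. Moving a file between datasets within one TXG (clone, then free the source) nets to zero and touches nothing on disk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-block accounting within one TXG. */
typedef struct sketch_entry {
	int64_t		txg_delta;	/* net refcount change in this TXG */
	uint64_t	zap_refcnt;	/* stand-in for the persisted value */
	unsigned	zap_updates;	/* how many times we "wrote" the ZAP */
} sketch_entry_t;

static void
sketch_addref(sketch_entry_t *e)
{
	e->txg_delta++;
}

static void
sketch_decref(sketch_entry_t *e)
{
	e->txg_delta--;
}

/* At TXG sync: write only if the net change is non-zero. */
static void
sketch_sync(sketch_entry_t *e)
{
	if (e->txg_delta != 0) {
		e->zap_refcnt = (uint64_t)((int64_t)e->zap_refcnt +
		    e->txg_delta);
		e->zap_updates++;
	}
	e->txg_delta = 0;
}
```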
BRT: Clear bv_entcount_dirty on destroy
This fixes an assertion in brt_sync_table() on debug builds when the last cloned block on the vdev is freed and bv_meta_dirty is cleared while bv_entcount_dirty is not. This should not matter in production. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16791
Commit c165daa
Update the META file to reflect compatibility with the 6.12 kernel. Reviewed-by: Umer Saleem <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#16793
Commit 3ed1d60
Commits on Nov 23, 2024
ZAP: Reduce leaf array and free chunks fragmentation
The previous implementation of zap_leaf_array_free() put chunks on the free list in reverse order. Also, zap_leaf_transfer_entry() and zap_entry_remove() were freeing the name and value arrays in reverse order. Together this created a mess in the free list, making subsequent allocations much more fragmented than necessary. This patch re-implements zap_leaf_array_free() to keep the existing chunk order, and implements a non-destructive zap_leaf_array_copy() to be used in zap_leaf_transfer_entry(), allowing the name and value arrays to be freed in proper order there and in zap_entry_remove(). With this change, a test of some writes and deletes shows the percentage of non-contiguous chunks in DDT dropping from 61% and 47% to 0% and 17% for arrays and frees respectively. Some explicit sorting could surely do even better, especially for ZAPs with variable-size arrays, but it would also cost much more, while this should be very cheap. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16766
Commit 434ecad
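The ordering problem described above can be shown on a toy free list. In this hypothetical model (not the ZAP leaf layout), chunk i links to next[i]. Pushing each chunk of a chain onto the free-list head one at a time reverses the chain; splicing the whole chain onto the head keeps its order, so later allocations get chunks back in sequence.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NCHUNKS	8
#define SK_CHAIN_END	UINT16_MAX

/* Hypothetical leaf: chunk i links to next[i]; free_head starts the list. */
typedef struct sketch_leaf {
	uint16_t	next[SK_NCHUNKS];
	uint16_t	free_head;
} sketch_leaf_t;

/* Old behaviour: push chunks one at a time, reversing their order. */
static void
free_chain_reversed(sketch_leaf_t *l, uint16_t chunk)
{
	while (chunk != SK_CHAIN_END) {
		uint16_t nxt = l->next[chunk];
		l->next[chunk] = l->free_head;
		l->free_head = chunk;
		chunk = nxt;
	}
}

/* New behaviour: splice the whole chain onto the head, keeping order. */
static void
free_chain_ordered(sketch_leaf_t *l, uint16_t chunk)
{
	if (chunk == SK_CHAIN_END)
		return;
	uint16_t last = chunk;
	while (l->next[last] != SK_CHAIN_END)
		last = l->next[last];
	l->next[last] = l->free_head;
	l->free_head = chunk;
}
```

Freeing the chain 0, 1, 2 with the first function leaves the free list as 2, 1, 0; the second leaves it as 0, 1, 2.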
This protects against sb->s_shrink eviction on umount. With newer kernels, deactivate_locked_super() calls shrinker_free() and only then the sops->kill_sb callback, resulting in a use-after-free on umount when zpl_prune_sb() tries to reach the shrinker functions of a dataset that is being unmounted. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam Moss <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Closes openzfs#16770
Commit 6fb74ca
FreeBSD: Lock vnode in zfs_ioctl()
Previously the vnode was not locked there, unlike on Linux. This required locking it in vn_flush_cached_data(), which recursed on the lock when called from zfs_clone_range() with the vnode already locked. Reviewed-by: Alan Somers <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16789 Closes openzfs#16796
Commit eda3968
FreeBSD: Reduce copy_file_range() source lock to shared
Linux locks the copy_file_range() source as shared. FreeBSD did so too, but was then changed to exclusive, partially because the KPI of that time did so, and partially, it seems, out of caution. Considering that zfs_clone_range() uses range locks on both the source and destination, neither should require exclusive vnode locks. But one step at a time: just sync it with Linux for now. Reviewed-by: Alan Somers <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16789 Closes openzfs#16797
Commit 2db5bbe
Commit f30e11f