Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI tests for combined #16722 and #16040 #16744

Closed
wants to merge 2 commits into from

Conversation

behlendorf
Copy link
Contributor

@behlendorf behlendorf commented Nov 12, 2024

Motivation and Context

Verify using the CI that #16743 does resolve the accounting ASSERT
race which was uncovered by the change in #16722.

After the CI verifies we're no long able to hit the ASSERT with both
changes applied we can merge the component PRs and close this one.

Description

See #16722 and #16743

How Has This Been Tested?

Will be testing by the CI.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

amotin and others added 2 commits November 11, 2024 16:00
..., before we make the header or the log block visible to others.
It should fix assertion on allocated space going negative if the
header is freed once the lock is dropped, while the write is still
going.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Fixes openzfs#16040
dsl_free() calls zio_free() to free the block. For most blocks, this
simply calls metaslab_free() without doing any IO or putting anything on
the IO pipeline.

Some blocks however require additional IO to free. This at least
includes gang, dedup and cloned blocks. For those, zio_free() will issue
a ZIO_TYPE_FREE IO and return.

If a huge number of blocks are being freed all at once, it's possible
for dsl_dataset_block_kill() to be called millions of time on a single
transaction (eg a 2T object of 128K blocks is 16M blocks). If those are
all IO-inducing frees, that then becomes 16M FREE IOs placed on the
pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T
object that requires a 20G allocation of resident memory from the
zio_cache. If that can't be satisfied by the kernel, an out-of-memory
condition is raised.

This would be better handled by improving the cases that the
dmu_tx_assign() throttle will handle, or by reducing the overheads
required by the IO pipeline, or with a better central facility for
freeing blocks.

For now, we simply check for the cases that would cause zio_free() to
create a FREE IO, and instead put the block on the pool's freelist. This
is the same place that blocks from destroyed datasets go, and the async
destroy machinery will automatically see them and trickle them out as
normal.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Nov 12, 2024
@amotin
Copy link
Member

amotin commented Nov 12, 2024

None of the tests failed on the L2ARC issue. Restarted the failed ones for another try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Work in Progress Not yet ready for general review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants