Reduce and handle EAGAIN errors on AIO label reads #16551

Merged
behlendorf merged 1 commit into openzfs:master from amotin:aio_max on Sep 21, 2024

Conversation

@amotin (Member) commented Sep 19, 2024

At least FreeBSD has a limit of 256 simultaneous AIO requests per process; attempting to issue more results in EAGAIN errors. Since we issue 4 requests per disk/partition from up to 2×(number of CPUs) threads, it is quite easy to exceed that limit on large systems (on a 64-CPU machine, 128 threads × 4 requests puts up to 512 requests in flight), which results in random pool import failures. It annoyed me for quite a while on a system with 64 CPUs and 70+ partitioned disks.

This patch, on one hand, limits the number of threads to stay under that limit, and on the other falls back softly to synchronous reads when the error occurs anyway. It takes _SC_AIO_MAX into account as the system-wide AIO limit and _SC_AIO_LISTIO_MAX as the closest available value to a per-process limit. The latter is not exactly right, but it is the best I found.
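
For concreteness, here is a minimal C sketch of the two mechanisms described above. It is not the actual libzutil change: the function names, the `LABELS_PER_DEV` constant, and the overall structure are illustrative assumptions; only the POSIX calls (`sysconf()`, `aio_read()`, `aio_suspend()`, `aio_return()`, `pread()`) are taken as given.

```c
/*
 * Hedged sketch only -- not the actual patch.  It shows:
 *  (1) deriving a thread cap from the sysconf() AIO limits, and
 *  (2) falling back to pread() when aio_read() returns EAGAIN.
 */
#include <sys/types.h>
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define	LABELS_PER_DEV	4	/* assumed: 4 label reads per device */

/* (1) Cap the number of reader threads by the per-process AIO limit. */
static long
label_reader_threads(long ncpus)
{
	long lio_max = sysconf(_SC_AIO_LISTIO_MAX);	/* ~per-process */
	long aio_max = sysconf(_SC_AIO_MAX);		/* system-wide */
	long limit = (lio_max > 0) ? lio_max : 256;	/* FreeBSD default */

	if (aio_max > 0 && aio_max < limit)
		limit = aio_max;
	/* Each thread keeps LABELS_PER_DEV requests in flight. */
	long threads = limit / LABELS_PER_DEV;
	if (threads > 2 * ncpus)
		threads = 2 * ncpus;
	return (threads > 0 ? threads : 1);
}

/* (2) Issue the label reads, degrading to sync I/O when AIO refuses. */
static int
read_labels(int fd, void *bufs[LABELS_PER_DEV],
    const off_t offs[LABELS_PER_DEV], size_t size)
{
	struct aiocb cbs[LABELS_PER_DEV];
	const struct aiocb *one;

	memset(cbs, 0, sizeof (cbs));
	for (int l = 0; l < LABELS_PER_DEV; l++) {
		cbs[l].aio_fildes = fd;
		cbs[l].aio_buf = bufs[l];
		cbs[l].aio_nbytes = size;
		cbs[l].aio_offset = offs[l];
		if (aio_read(&cbs[l]) != 0) {
			/* Queue full (EAGAIN) or no AIO: read it now. */
			if (pread(fd, bufs[l], size, offs[l]) !=
			    (ssize_t)size)
				return (-1);
			cbs[l].aio_fildes = -1;	/* nothing to reap */
		}
	}
	for (int l = 0; l < LABELS_PER_DEV; l++) {
		if (cbs[l].aio_fildes == -1)
			continue;
		one = &cbs[l];
		while (aio_suspend(&one, 1, NULL) != 0 && errno == EINTR)
			;
		if (aio_return(&cbs[l]) != (ssize_t)size)
			return (-1);
	}
	return (0);
}
```

The key property is that hitting the AIO queue limit never fails the label read outright; it only costs the latency of a synchronous `pread()` for that label.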

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
@behlendorf (Contributor) left a comment


Good find.

@behlendorf added the Status: Code Review Needed (Ready for review and testing) and Status: Accepted (Ready to integrate: reviewed, tested) labels, then removed Status: Code Review Needed, on Sep 19, 2024
@behlendorf behlendorf merged commit 3014dcb into openzfs:master Sep 21, 2024
23 of 31 checks passed
ptr1337 pushed a commit to CachyOS/zfs that referenced this pull request Nov 14, 2024
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#16551
@amotin amotin deleted the aio_max branch November 19, 2024 00:57