-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix I/O replay timing #1540
Open
horshack-dpreview
wants to merge
1
commit into
axboe:master
Choose a base branch
from
horshack-dpreview:Fix_Iolog_Replay_Timing
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Fix I/O replay timing #1540
horshack-dpreview
wants to merge
1
commit into
axboe:master
from
horshack-dpreview:Fix_Iolog_Replay_Timing
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
fio supports replaying I/Os from a previous invocation, captured either via blktrace or `write_iolog`. That support includes simulating the timing between I/Os by using the running timestamps from the capture log, recreating that timing by inserting delays in the replay submission path if necessary. The calculation for how much delay to insert is based on how long it takes to complete the most recent replayed I/O, as compared to the how long that same I/O took in the original workload recorded in the log. The issue is in iolog_delay(). It calculates the replay I/O execution time starting from just after issuing I/O to the kernel in td_io_queue() until the start of iolog_delay(). This excludes the time it actually took for the kernel to process the submission, which depending on the I/O engine can be significant. Because this time was under-measured, iolog_delay()'s added more delay than needed, since it thought the replay was executing too fast vs the logged I/O. This issue causes the total replay to take 2x or longer than the original workload due to the progressively-larger drift from these excessive delays. The fix is to change iolog_delay() to use the full execution path time for an I/O, including the kernel submission time. This is accomplished by removing the existing points where the replay I/O clock is started (td_io_queue and the async engines) and moving that into iolog_delay(). This means we're no longer technically matching the precise stage in the I/O path for replay vs the original workload, but this doesn't really matter since that's only an initial time skew reflected in the very first I/O operation. After that we only need to match the inter-I/O pacing. This change includes fixes to additional issues found in iolog_delay(): 1) The logic that measures how long usec_sleep() takes relative to how much sleep was requested was comparing against the original log delay value instead of the value adjusted by the last replay execution I/O, causing an under-adjustment when usec_sleep() took longer than requested, creating more total delay than necessary. 2) The logic wasn't accumulating the usec_sleep() adjustment from the above step across more than one replayed I/O. It would reset the adjustment to zero on the next I/O. I guess the logic was assuming the adjustment would never be larger than a single log delay interval. It also wasn't accumulating the replay I/O execution time across more than one interval, so for example if a replay I/O took 3x longer than the log I/O it would only adjust that down to a zero delay for one replay I/O, causing the 2x difference to never be accounted for and thus add to the total running replay time. Link: axboe#1537 (comment) Signed-off-by: Adam Horshack ([email protected])
horshack-dpreview
force-pushed
the
Fix_Iolog_Replay_Timing
branch
from
March 10, 2023 01:29
9dd03fe
to
6d608dd
Compare
Update commit message to give a more thorough explanation of the issue and fix. |
@vincentkfu, This PR has been languishing for a while. Is there anything preventing it from being merged? |
Please fix the build failures. I am on vacation now but this is on my list of things to look at after I return to work next week. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
fio supports replaying I/Os from a previous invocation, captured either via blktrace or
write_iolog
. That support includes simulating the timing between I/Os by using the running timestamps from the capture log, recreating that timing by inserting delays in the replay submission path if necessary. The calculation for how much delay to insert is based on how long it takes to complete the most recent replayed I/O, as compared to the how long that same I/O took in the original workload recorded in the log.The issue is in iolog_delay(). It calculates the replay I/O execution time starting from just after issuing I/O to the kernel in td_io_queue() until the start of iolog_delay(). This excludes the time it actually took for the kernel to process the submission, which depending on the I/O engine can be significant. Because this time was under-measured, iolog_delay()'s added more delay than needed, since it thought the replay was executing too fast vs the logged I/O.
This issue causes the total replay to take 2x or longer than the original workload due to the progressively-larger drift from these excessive delays.
The fix is to change iolog_delay() to use the full execution path time for an I/O, including the kernel submission time. This is accomplished by removing the existing points where the replay I/O clock is started (td_io_queue and the async engines) and moving that into iolog_delay(). This means we're no longer technically matching the precise stage in the I/O path for replay vs the original workload, but this doesn't really matter since that's only an initial time skew reflected in the very first I/O operation. After that we only need to match the inter-I/O pacing.
This change includes fixes to additional issues found in iolog_delay():
The logic that measures how long usec_sleep() takes relative to how much sleep was requested was comparing against the original log delay value instead of the value adjusted by the last replay execution I/O, causing an under-adjustment when usec_sleep() took longer than requested, creating more total delay than necessary.
The logic wasn't accumulating the usec_sleep() adjustment from the above step across more than one replayed I/O. It would reset the adjustment to zero on the next I/O. I guess the logic was assuming the adjustment would never be larger than a single log delay interval. It also wasn't accumulating the replay I/O execution time across more than one interval, so for example if a replay I/O took 3x longer than the log I/O it would only adjust that down to a zero delay for one replay I/O, causing the 2x difference to never be accounted for and thus add to the total running replay time.
Link: #1537 (comment)
Signed-off-by: Adam Horshack ([email protected])