Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix I/O replay timing #1540

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Commits on Mar 10, 2023

  1. Fix I/O replay timing

    fio supports replaying I/Os from a previous invocation, captured either
    via blktrace or `write_iolog`. That support includes simulating the
    timing between I/Os by using the running timestamps from the capture
    log, recreating that timing by inserting delays in the replay submission
    path if necessary.  The calculation for how much delay to insert is
    based on how long it takes to complete the most recent replayed I/O, as
    compared to the how long that same I/O took in the original workload
    recorded in the log.
    
    The issue is in iolog_delay(). It calculates the replay I/O execution
    time starting from just after issuing I/O to the kernel in td_io_queue()
    until the start of iolog_delay(). This excludes the time it actually
    took for the kernel to process the submission, which depending on the
    I/O engine can be significant. Because this time was under-measured,
    iolog_delay()'s added more delay than needed, since it thought the
    replay was executing too fast vs the logged I/O.
    
    This issue causes the total replay to take 2x or longer than the
    original workload due to the progressively-larger drift from these
    excessive delays.
    
    The fix is to change iolog_delay() to use the full execution path time
    for an I/O, including the kernel submission time. This is accomplished
    by removing the existing points where the replay I/O clock is started
    (td_io_queue and the async engines) and moving that into iolog_delay().
    This means we're no longer technically matching the precise stage in the
    I/O path for replay vs the original workload, but this doesn't really
    matter since that's only an initial time skew reflected in the very
    first I/O operation. After that we only need to match the inter-I/O
    pacing.
    
    This change includes fixes to additional issues found in iolog_delay():
    
    1) The logic that measures how long usec_sleep() takes relative to how
    much sleep was requested was comparing against the original log delay
    value instead of the value adjusted by the last replay execution I/O,
    causing an under-adjustment when usec_sleep() took longer than
    requested, creating more total delay than necessary.
    
    2) The logic wasn't accumulating the usec_sleep() adjustment from the
    above step across more than one replayed I/O. It would reset the
    adjustment to zero on the next I/O. I guess the logic was assuming the
    adjustment would never be larger than a single log delay interval. It
    also wasn't accumulating the replay I/O execution time across more than
    one interval, so for example if a replay I/O took 3x longer than the log
    I/O it would only adjust that down to a zero delay for one replay I/O,
    causing the 2x difference to never be accounted for and thus add to the
    total running replay time.
    
    Link: axboe#1537 (comment)
    
    Signed-off-by: Adam Horshack ([email protected])
    horshack-dpreview committed Mar 10, 2023
    Configuration menu
    Copy the full SHA
    6d608dd View commit details
    Browse the repository at this point in the history