Speedup Ninja backend with many extract_objects or targets #13879

Status: Open (10 commits, base: master)

Commits on Nov 20, 2024

  1. compilers: cache the results of is_source()

    is_source() is called almost 900000 times in a QEMU setup.  Together with
    the previously added caching, this basically removes _determine_ext_objs()
    from the profile when building QEMU.
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    f1fbed6
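The caching idea in this commit can be illustrated with `functools.lru_cache`. This is a minimal sketch, not the code from the commit: the suffix set `SOURCE_SUFFIXES` is hypothetical, and the assumption that the cache is keyed on the file name is mine.

```python
import os
from functools import lru_cache

# Hypothetical suffix table, for illustration only.
SOURCE_SUFFIXES = frozenset({'.c', '.cc', '.cpp', '.m', '.s', '.rs'})


@lru_cache(maxsize=None)
def is_source(fname: str) -> bool:
    """Return True if fname looks like a source file.

    Repeated calls with the same name are answered from the cache,
    which matters when the function is called hundreds of thousands
    of times with a small set of distinct arguments.
    """
    suffix = os.path.splitext(fname)[1].lower()
    return suffix in SOURCE_SUFFIXES
```

`is_source.cache_info()` reports hits and misses, which is a quick way to confirm that the cache is actually being used for a given workload.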
  2. utils: cache build directory files

    get_target_generated_sources often calls File.from_built_relative on
    the same file, if it is used by many sources.  This is a somewhat
    expensive call both CPU- and memory-wise, so cache the creation
    of build-directory files as well.
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    c1f4f6b
  3. ninjabackend: use File.from_built_relative()

    Do not reinvent it in NinjaBackend.determine_ext_objs(), so as to use
    the recently added caching of the results of File.from_built_relative().
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    589510f
  4. ninjabackend: prefer "in" to regex search

    Regexes can be surprisingly slow.  This small change brings
    ninja_quote() from 12 to 3 seconds when building QEMU.
    Before:
    
       ncalls  tottime  percall  cumtime  percall
      3734443    4.872    0.000   11.944    0.000
    
    After:
    
       ncalls  tottime  percall  cumtime  percall
      3595590    3.193    0.000    3.196    0.000
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    f3dfac4
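The "in" versus regex trade-off can be sketched as follows. The pattern and function names here are hypothetical and only approximate Ninja's `$` escaping of `$`, space and `:`; this is not the body of `ninja_quote()`.

```python
import re

# Hypothetical pattern: characters that get a '$' prefix in a build line.
QUOTE_PAT = re.compile(r'[$ :]')


def quote_with_regex(text: str) -> str:
    # Every call pays for a regex search, even when nothing needs quoting.
    if not QUOTE_PAT.search(text):
        return text
    return QUOTE_PAT.sub(r'$\g<0>', text)


def quote_with_in(text: str) -> str:
    # Plain substring tests handle the common "nothing to quote" case;
    # the regex is only used when a special character is present.
    if '$' not in text and ' ' not in text and ':' not in text:
        return text
    return QUOTE_PAT.sub(r'$\g<0>', text)
```

Both functions return the same strings; only the cost of the common no-op case differs. The actual gain depends on the input mix, so it is worth profiling on a real project, as the commit message does.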
  5. arglist: optimize __init__()

    "Inline" CompilerArgs.__iter__() into CompilerArgs.__init__(), so that
    list(Iterable) is replaced by the much faster list(List).
    
    Before:
    
       ncalls  tottime  cumtime
        19268    0.163    3.586 arglist.py:97(__init__)
    
    After:
    
       ncalls  tottime  cumtime
        18674    0.211    3.442 arglist.py:97(__init__)
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    281b960
  6. arglist: optimize flush_pre_post(), and __iadd__() with it

    Unless an argument is marked as Dedup.OVERRIDDEN, pre_flush_set and
    post_flush_set will always be empty and the loops in flush_pre_post()
    will not be doing anything interesting:
    
            for a in self.pre:
                dedup = self._can_dedup(a)
                if a not in pre_flush_set:
                    # This just makes "new" a copy of self.pre
                    new.append(a)
                    if dedup is Dedup.OVERRIDDEN:
                        # this never happens
                        pre_flush_set.add(a)
    
            for a in reversed(self.post):
                dedup = self._can_dedup(a)
                if a not in post_flush_set:
                    # Here self.post is reversed twice
                    post_flush.appendleft(a)
                    if dedup is Dedup.OVERRIDDEN:
                        # this never happens
                        post_flush_set.add(a)
            new.extend(post_flush)
    
    In this case it's possible to avoid expensive calls and loops, instead
    relying as much on Python builtins as possible.  Track whether any options
    have that flag and if not just concatenate pre, _container and post.
    
    Before:
    
       ncalls  tottime  cumtime
        45127    0.251    4.530 arglist.py:142(__iter__)
        81866    3.623    5.013 arglist.py:108(flush_pre_post)
        76618    3.793    5.338 arglist.py:273(__iadd__)
    
    After:
    
       ncalls  tottime  cumtime
        35647    0.156    0.627 arglist.py:160(__iter__)
        78998    2.627    3.603 arglist.py:116(flush_pre_post)
        73774    3.605    5.049 arglist.py:292(__iadd__)
    
    The time in __iadd__ is reduced because it calls __iter__, which flushes
    pre and post.
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    d054f7f
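The fast path described in this commit can be sketched with a toy class. `ArgBuffer` and its `overridden` set are hypothetical stand-ins for `CompilerArgs` and `Dedup.OVERRIDDEN`; the slow path copies the shape of the loops quoted above and, as a simplifying assumption, leaves the existing container untouched.

```python
from collections import deque
from typing import Deque, List, Set


class ArgBuffer:
    """Toy stand-in for CompilerArgs; illustration only."""

    def __init__(self, overridden: Set[str]) -> None:
        self.container: List[str] = []
        self.pre: Deque[str] = deque()
        self.post: List[str] = []
        # Hypothetical: arguments that behave like Dedup.OVERRIDDEN.
        self.overridden = overridden
        self.needs_override_check = False

    def prepend(self, arg: str) -> None:
        self.pre.appendleft(arg)
        self.needs_override_check |= arg in self.overridden

    def append(self, arg: str) -> None:
        self.post.append(arg)
        self.needs_override_check |= arg in self.overridden

    def flush_pre_post(self) -> None:
        if not self.needs_override_check:
            # Fast path: nothing can be dropped, so concatenate with builtins.
            self.container = [*self.pre, *self.container, *self.post]
        else:
            # Slow path: same shape as the loops in the commit message.
            new: List[str] = []
            pre_seen: Set[str] = set()
            for a in self.pre:
                if a not in pre_seen:
                    new.append(a)
                    if a in self.overridden:
                        pre_seen.add(a)
            post_flush: Deque[str] = deque()
            post_seen: Set[str] = set()
            for a in reversed(self.post):
                if a not in post_seen:
                    post_flush.appendleft(a)
                    if a in self.overridden:
                        post_seen.add(a)
            self.container = new + self.container + list(post_flush)
        self.pre.clear()
        self.post.clear()
        self.needs_override_check = False
```

The flag is updated when arguments are queued, so the decision at flush time is a single boolean test rather than a per-argument lookup.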
  7. arglist: post is only appended to, make it a list

    self.post is only ever appended to on the right.  However, it is
    then reversed twice in flush_pre_post(), by using "for a in
    reversed(self.post)" and appendleft() within the loop.  It would be
    tempting to use appendleft() in __iadd__ to avoid the call to
    reversed(), but that is not a good idea because the loop of
    flush_pre_post() is part of a slow path.  It is more important to use
    a fast extend-with-list-argument in the fast path where
    needs_override_check is False.
    
    For clarity, and to remove the temptation, make "post" a list instead
    of a deque.
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    db0cadc
  8. backends: avoid extend() in _flatten_object_list

    Accumulate into lists that are passed by the caller, thus avoiding
    allocations and calls to extend() on recursive extract_objects().
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    2130bbb
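The accumulator pattern named in this commit looks roughly like the following. Both functions are hypothetical illustrations operating on nested lists of strings, not the actual `_flatten_object_list()`.

```python
from typing import Any, List


def flatten_with_extend(objs: List[Any]) -> List[str]:
    # Each recursive call allocates a new list, which the caller
    # then copies into its own list with extend().
    result: List[str] = []
    for o in objs:
        if isinstance(o, list):
            result.extend(flatten_with_extend(o))
        else:
            result.append(o)
    return result


def flatten_into(objs: List[Any], out: List[str]) -> None:
    # The caller owns the output list; every level appends to it
    # directly, so there are no temporary lists and no extend() calls.
    for o in objs:
        if isinstance(o, list):
            flatten_into(o, out)
        else:
            out.append(o)
```

A caller creates the list once (`out = []`), passes it down, and reads the result from `out` afterwards.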
  9. backends: remove unused argument

    The proj_dir_to_build_root argument of determine_ext_objs() is always
    empty; remove it.
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    3593e90
  10. backends: avoid os.path.join in common case of _flatten_object_list

    proj_dir_to_build_root is empty by default; in fact it is always empty
    except in some cases of the VS2010 backend.
    
    Add it after the fact in flatten_object_list(), which reduces the
    number of os.path.join() calls.
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    bonzini committed Nov 20, 2024
    58a072d
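Applying the prefix after flattening, and only when it is non-empty, might look like this. The helper name `add_prefix` is hypothetical; only `proj_dir_to_build_root` comes from the commit message.

```python
import os
from typing import List


def add_prefix(paths: List[str], proj_dir_to_build_root: str = '') -> List[str]:
    # Common case: the prefix is empty, so no os.path.join() call is made
    # and the input list is returned unchanged.
    if not proj_dir_to_build_root:
        return paths
    # Rare case: join once per path, after the list has been flattened.
    return [os.path.join(proj_dir_to_build_root, p) for p in paths]
```

In the common case this reduces the work to a single truthiness test, instead of one join per object file at every level of recursion.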