Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #36850 - Add indexes to speed up applicability calculation when there are lots of module streams installed #10773

Conversation

jeremylenz
Copy link
Member

What are the changes introduced in this pull request?

Add indexes to the katello_installed_packages table. This should hopefully speed up applicability calculations when there are a lot of module streams installed.

Considerations taken when implementing this change?

This is really tricky to test when you don't have a lot of hosts. The best testing steps I found are those @hao-yu gave us, I'll adapt them below

What are the testing steps for this pull request?

Steps to Reproduce:

  1. Prepare a RHEL 8 client and register it to the Satellite.
  2. Enable the appstream repository.
  3. Enable and install as many module streams as possible.
  4. After that force upload the package profile to the Satellite to trigger the "Katello::Applicability::Hosts::BulkGenerate" task.

Now, in Rails console, do this both before and after running the migration:

cf = Host.find(XXXX).content_facet
bound_repos = cf.bound_repositories.collect {|repo| repo.library_instance_id.nil? ? repo.id : repo.library_instance_id}
helper = Katello::Applicability::ApplicableContentHelper.new(cf, ::Katello::Rpm, bound_repos)
helper.fetch_enabled_module_stream_ids.size
Benchmark.measure {p helper.locked_modular_installed_packages(helper.fetch_enabled_module_stream_ids).size}

Do the benchmark 5-6 times in a row and take the average.

For my dev setup with a single rhel8 host - Typical benchmarks after the migration:

#<Benchmark::Tms:0x0000562b392fe9b8
 @cstime=0.0,
 @cutime=0.0,
 @label="",
 @real=0.2544070029980503,
 @stime=0.0011980000000000046,
 @total=0.18494600000000003,
 @utime=0.18374800000000002>

Before the migration, the @real was closer to .50 or .41. So not a huge improvement at this scale, but still measurable.

  applicability calculation when there are
  lots of module streams installed
@jeremylenz
Copy link
Member Author

[test katello]

@lfu
Copy link
Member

lfu commented Nov 8, 2023

Before the migration, @real was between 0.50 ~ 0.65.
After migration, it was 0.33 ~ 0.48. The improvement is noticable.

Also checked the DB index usage.

    katello=#   SELECT * FROM pg_stat_user_indexes WHERE relname = 'katello_installed_packages' ORDER BY idx_scan;
     relid | indexrelid | schemaname |          relname           |                    indexrelname                    | idx_scan | idx_tup_read | idx_tup_fetch
    -------+------------+------------+----------------------------+----------------------------------------------------+----------+--------------+---------------
     22221 |     595787 | public     | katello_installed_packages | index_katello_installed_packages_on_nvra           |        0 |            0 |             0
     22221 |     595788 | public     | katello_installed_packages | index_katello_installed_packages_on_nvra_and_epoch |        0 |            0 |             0
     22221 |      22238 | public     | katello_installed_packages | index_katello_installed_packages_on_name_and_nvra  |     9453 |        23575 |           808
     22221 |      22228 | public     | katello_installed_packages | katello_installed_packages_pkey                    |    12799 |       223203 |        223203
     22221 |      23407 | public     | katello_installed_packages | index_katello_installed_packages_on_nvrea          |   208034 |       203624 |        198901
    (5 rows)

After console commands.

    katello=#   SELECT * FROM pg_stat_user_indexes WHERE relname = 'katello_installed_packages' ORDER BY idx_scan;
     relid | indexrelid | schemaname |          relname           |                    indexrelname                    | idx_scan | idx_tup_read | idx_tup_fetch 
    -------+------------+------------+----------------------------+----------------------------------------------------+----------+--------------+---------------
     22221 |     595788 | public     | katello_installed_packages | index_katello_installed_packages_on_nvra_and_epoch |        0 |            0 |             0
     22221 |      22238 | public     | katello_installed_packages | index_katello_installed_packages_on_name_and_nvra  |     9460 |        23585 |           818
     22221 |      22228 | public     | katello_installed_packages | katello_installed_packages_pkey                    |    12807 |       226138 |        226138
     22221 |     595787 | public     | katello_installed_packages | index_katello_installed_packages_on_nvra           |   158186 |         9548 |          9548
     22221 |      23407 | public     | katello_installed_packages | index_katello_installed_packages_on_nvrea          |   211280 |       206870 |        202147
    (5 rows)


@jeremylenz
Copy link
Member Author

[test katello]

@jeremylenz
Copy link
Member Author

🍏

Copy link
Member

@lfu lfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index does improve the performance 😄

@jeremylenz jeremylenz merged commit d2baeb2 into Katello:master Nov 9, 2023
6 checks passed
@hao-yu
Copy link
Contributor

hao-yu commented Nov 15, 2023

@jeremylenz Can I propose this change? The reason is we have already given the solution to some customers by creating the indexes manually. As the created index name are not the same, duplicate indexes will be created. Thus, I am proposing to drop the old indexes and recreate them as below.

class CreateIndexesForNvraAndEpoch < ActiveRecord::Migration[6.1]
  def up
    if index_exists?(:katello_installed_packages, :nvra)
      remove_index :katello_installed_packages, :nvra
    end

    if index_exists?(:katello_installed_packages, [:nvra, :epoch])
      remove_index :katello_installed_packages, [:nvra, :epoch]
    end

    add_index :katello_installed_packages, :nvra
    add_index :katello_installed_packages, [:nvra, :epoch]
  end

  def down
    remove_index :katello_installed_packages, :nvra
    remove_index :katello_installed_packages, [:nvra, :epoch]
  end
end

@jeremylenz
Copy link
Member Author

jeremylenz commented Nov 15, 2023

@jeremylenz Can I propose this change? The reason is we have already given the solution to some customers by creating the indexes manually. As the created index name are not the same, duplicate indexes will be created. Thus, I am proposing to drop the old indexes and recreate them as below.

Hey @hao-yu

Since this already-merged PR will benefit all users and not just the specific users you've created the indexes for manually, I would suggest giving this code to those specific users.

(edit: also I don't really see any harm in having an unused index, either :)

@hao-yu
Copy link
Contributor

hao-yu commented Nov 15, 2023

@jeremylenz we are not able to tell which users have applied the indexes manually. It could be many of them. My proposed change doesn't affect or cause bad impact to all other users. Can you please re-consider? I am happy to create a new PR with the amendment.

@jeremylenz
Copy link
Member Author

My proposed change doesn't affect or cause bad impact to all other users.

Sure, but neither does keeping things the way they are? Even in this PR review above some of the indexes weren't used; I don't think it will hurt anything.

And in principle I don't think it makes sense to make a patch in Katello that only fixes code that users have changed manually.

Please feel free to raise another PR if you like, and we can bring more people in to the discussion. (I'm not the only Katello decision-maker :) But that's the way I feel for now.

@hao-yu
Copy link
Contributor

hao-yu commented Nov 15, 2023

Sure, but neither does keeping things the way they are? Even in this PR review above some of the indexes weren't used; I don't think it will hurt anything.

One of the index created in this patch was using a lot based on the review above.

I am not exactly know how duplicate indexes will affect Postgres. If it will just ignore the duplicate then it is probably fine to leave them there and only some space are wasted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants