Fixes #36850 - Add indexes to speed up applicability calculation when there are lots of module streams installed #10773

jeremylenz · 2023-10-20T15:56:43Z

What are the changes introduced in this pull request?

Add indexes to the katello_installed_packages table. This should hopefully speed up applicability calculations when there are a lot of module streams installed.

Considerations taken when implementing this change?

This is really tricky to test when you don't have a lot of hosts. The best testing steps I found are those @hao-yu gave us, I'll adapt them below

What are the testing steps for this pull request?

Steps to Reproduce:

Prepare a RHEL 8 client and register it to the Satellite.
Enable the appstream repository.
Enable and install as many module streams as possible.
After that force upload the package profile to the Satellite to trigger the "Katello::Applicability::Hosts::BulkGenerate" task.

Now, in Rails console, do this both before and after running the migration:

cf = Host.find(XXXX).content_facet
bound_repos = cf.bound_repositories.collect {|repo| repo.library_instance_id.nil? ? repo.id : repo.library_instance_id}
helper = Katello::Applicability::ApplicableContentHelper.new(cf, ::Katello::Rpm, bound_repos)
helper.fetch_enabled_module_stream_ids.size
Benchmark.measure {p helper.locked_modular_installed_packages(helper.fetch_enabled_module_stream_ids).size}

Do the benchmark 5-6 times in a row and take the average.

For my dev setup with a single rhel8 host - Typical benchmarks after the migration:

#<Benchmark::Tms:0x0000562b392fe9b8
 @cstime=0.0,
 @cutime=0.0,
 @label="",
 @real=0.2544070029980503,
 @stime=0.0011980000000000046,
 @total=0.18494600000000003,
 @utime=0.18374800000000002>

Before the migration, the @real was closer to .50 or .41. So not a huge improvement at this scale, but still measurable.

applicability calculation when there are lots of module streams installed

jeremylenz · 2023-10-24T19:48:10Z

[test katello]

lfu · 2023-11-08T20:28:54Z

Before the migration, @real was between 0.50 ~ 0.65.
After migration, it was 0.33 ~ 0.48. The improvement is noticable.

Also checked the DB index usage.

    katello=#   SELECT * FROM pg_stat_user_indexes WHERE relname = 'katello_installed_packages' ORDER BY idx_scan;
     relid | indexrelid | schemaname |          relname           |                    indexrelname                    | idx_scan | idx_tup_read | idx_tup_fetch
    -------+------------+------------+----------------------------+----------------------------------------------------+----------+--------------+---------------
     22221 |     595787 | public     | katello_installed_packages | index_katello_installed_packages_on_nvra           |        0 |            0 |             0
     22221 |     595788 | public     | katello_installed_packages | index_katello_installed_packages_on_nvra_and_epoch |        0 |            0 |             0
     22221 |      22238 | public     | katello_installed_packages | index_katello_installed_packages_on_name_and_nvra  |     9453 |        23575 |           808
     22221 |      22228 | public     | katello_installed_packages | katello_installed_packages_pkey                    |    12799 |       223203 |        223203
     22221 |      23407 | public     | katello_installed_packages | index_katello_installed_packages_on_nvrea          |   208034 |       203624 |        198901
    (5 rows)

After console commands.

    katello=#   SELECT * FROM pg_stat_user_indexes WHERE relname = 'katello_installed_packages' ORDER BY idx_scan;
     relid | indexrelid | schemaname |          relname           |                    indexrelname                    | idx_scan | idx_tup_read | idx_tup_fetch 
    -------+------------+------------+----------------------------+----------------------------------------------------+----------+--------------+---------------
     22221 |     595788 | public     | katello_installed_packages | index_katello_installed_packages_on_nvra_and_epoch |        0 |            0 |             0
     22221 |      22238 | public     | katello_installed_packages | index_katello_installed_packages_on_name_and_nvra  |     9460 |        23585 |           818
     22221 |      22228 | public     | katello_installed_packages | katello_installed_packages_pkey                    |    12807 |       226138 |        226138
     22221 |     595787 | public     | katello_installed_packages | index_katello_installed_packages_on_nvra           |   158186 |         9548 |          9548
     22221 |      23407 | public     | katello_installed_packages | index_katello_installed_packages_on_nvrea          |   211280 |       206870 |        202147
    (5 rows)

jeremylenz · 2023-11-08T20:34:02Z

[test katello]

jeremylenz · 2023-11-09T14:42:20Z

🍏

lfu

Index does improve the performance 😄

hao-yu · 2023-11-15T01:54:17Z

@jeremylenz Can I propose this change? The reason is we have already given the solution to some customers by creating the indexes manually. As the created index name are not the same, duplicate indexes will be created. Thus, I am proposing to drop the old indexes and recreate them as below.

class CreateIndexesForNvraAndEpoch < ActiveRecord::Migration[6.1]
  def up
    if index_exists?(:katello_installed_packages, :nvra)
      remove_index :katello_installed_packages, :nvra
    end

    if index_exists?(:katello_installed_packages, [:nvra, :epoch])
      remove_index :katello_installed_packages, [:nvra, :epoch]
    end

    add_index :katello_installed_packages, :nvra
    add_index :katello_installed_packages, [:nvra, :epoch]
  end

  def down
    remove_index :katello_installed_packages, :nvra
    remove_index :katello_installed_packages, [:nvra, :epoch]
  end
end

jeremylenz · 2023-11-15T15:03:33Z

@jeremylenz Can I propose this change? The reason is we have already given the solution to some customers by creating the indexes manually. As the created index name are not the same, duplicate indexes will be created. Thus, I am proposing to drop the old indexes and recreate them as below.

Hey @hao-yu

Since this already-merged PR will benefit all users and not just the specific users you've created the indexes for manually, I would suggest giving this code to those specific users.

(edit: also I don't really see any harm in having an unused index, either :)

hao-yu · 2023-11-15T17:09:57Z

@jeremylenz we are not able to tell which users have applied the indexes manually. It could be many of them. My proposed change doesn't affect or cause bad impact to all other users. Can you please re-consider? I am happy to create a new PR with the amendment.

jeremylenz · 2023-11-15T17:21:59Z

My proposed change doesn't affect or cause bad impact to all other users.

Sure, but neither does keeping things the way they are? Even in this PR review above some of the indexes weren't used; I don't think it will hurt anything.

And in principle I don't think it makes sense to make a patch in Katello that only fixes code that users have changed manually.

Please feel free to raise another PR if you like, and we can bring more people in to the discussion. (I'm not the only Katello decision-maker :) But that's the way I feel for now.

hao-yu · 2023-11-15T17:55:04Z

Sure, but neither does keeping things the way they are? Even in this PR review above some of the indexes weren't used; I don't think it will hurt anything.

One of the index created in this patch was using a lot based on the review above.

I am not exactly know how duplicate indexes will affect Postgres. If it will just ignore the duplicate then it is probably fine to leave them there and only some space are wasted.

Fixes #36850 - Add indexes to speed up

b805d0f

applicability calculation when there are lots of module streams installed

pr-processor bot added the Not yet reviewed label Oct 20, 2023

lfu approved these changes Nov 9, 2023

View reviewed changes

pr-processor bot removed the Not yet reviewed label Nov 9, 2023

jeremylenz merged commit d2baeb2 into Katello:master Nov 9, 2023
6 checks passed

hao-yu mentioned this pull request Nov 17, 2023

Fixes #36850 - Avoid duplicate database indexes #10805

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes #36850 - Add indexes to speed up applicability calculation when there are lots of module streams installed #10773

Fixes #36850 - Add indexes to speed up applicability calculation when there are lots of module streams installed #10773

jeremylenz commented Oct 20, 2023

jeremylenz commented Oct 24, 2023

lfu commented Nov 8, 2023

jeremylenz commented Nov 8, 2023

jeremylenz commented Nov 9, 2023

lfu left a comment

hao-yu commented Nov 15, 2023

jeremylenz commented Nov 15, 2023 •

edited

Loading

hao-yu commented Nov 15, 2023

jeremylenz commented Nov 15, 2023

hao-yu commented Nov 15, 2023 •

edited

Loading

Fixes #36850 - Add indexes to speed up applicability calculation when there are lots of module streams installed #10773

Fixes #36850 - Add indexes to speed up applicability calculation when there are lots of module streams installed #10773

Conversation

jeremylenz commented Oct 20, 2023

What are the changes introduced in this pull request?

Considerations taken when implementing this change?

What are the testing steps for this pull request?

jeremylenz commented Oct 24, 2023

lfu commented Nov 8, 2023

jeremylenz commented Nov 8, 2023

jeremylenz commented Nov 9, 2023

lfu left a comment

Choose a reason for hiding this comment

hao-yu commented Nov 15, 2023

jeremylenz commented Nov 15, 2023 • edited Loading

hao-yu commented Nov 15, 2023

jeremylenz commented Nov 15, 2023

hao-yu commented Nov 15, 2023 • edited Loading

jeremylenz commented Nov 15, 2023 •

edited

Loading

hao-yu commented Nov 15, 2023 •

edited

Loading