-
-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add side_by_side mode to update_materialized_view #387
base: main
Are you sure you want to change the base?
Conversation
This adds a `side_by_side` kwarg to the `update_materialized_view` method, which builds the new view alongside the old one and then atomically swaps them to reduce downtime at the cost of increasing disk usage. It is plumbed through to migrations as a hash value for the `materialized` kwarg of `update_view`.
it's somewhat maddening that this repository doesn't have a working |
We also implemented something similar in a project using Scenic, but we had some issues with the indexes of the view you are replacing: you should drop them in the current view before adding the swap one, otherwise when you rename the view you'll have conflicts in the naming. |
Ah, good point. I could probably modify this to rename all the indices, too… |
Actually, this won't cause conflicts because the indexes on the new view get created after the swap, and the drop at the end of the swap removes the conflicting indexes. That being said, building the indexes after the drop means that the indexes are built while holding the access exclusive lock, which probably is not a good idea. |
@calebhearth @derekprior Gentle ping on this, do you need anything else for this PR to be merged? This would really help some Mastodon materialized view changes :) |
Any changes you'd like me to make to this? |
Just adding another nudge here |
Anything you'd like me to do to help move this forward? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are some issues that we're likely to encounter with this in the wild that require some more thought.
I also would like to encapsulate the functionality a bit differently but I don't have and easily explainable idea at the moment. Best bet may be to tackle the outstanding questions to get to a mergeable idea and then we can refactor after that.
lib/scenic/adapters/postgres.rb
Outdated
@@ -155,17 +155,36 @@ def create_materialized_view(name, sql_definition, no_data: false) | |||
# @param no_data [Boolean] Default: false. Set to true to create | |||
# materialized view without running the associated query. You will need | |||
# to perform a non-concurrent refresh to populate with data. | |||
# @param side_by_side [Boolean] Default: false. Set to true to create the | |||
# new version under a different name and atomically swap them, limiting | |||
# downtime at the cost of doubling disk usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be more precise than downtime
. We limit the amount of time the view is inaccessible, perhaps?
lib/scenic/adapters/postgres.rb
Outdated
def update_materialized_view( | ||
name, sql_definition, no_data: false, side_by_side: false | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know hound is not going to like this due to line length, but I don't care....
def update_materialized_view( | |
name, sql_definition, no_data: false, side_by_side: false | |
) | |
def update_materialized_view(name, sql_definition, no_data: false, side_by_side: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd need to also add some comments to make rubocop be quiet. I don't mind doing so (and actually set Metrics/LineLength
to 120 on all my projects anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess rubocop isn't automatically run (and the config file is remarkably old; at least <1.7 and broken with current rubocop versions) so I won't worry about it
lib/scenic/adapters/postgres.rb
Outdated
# | ||
# @raise [MaterializedViewsNotSupportedError] if the version of Postgres | ||
# in use does not support materialized views. | ||
# | ||
# @return [void] | ||
def update_materialized_view(name, sql_definition, no_data: false) | ||
def update_materialized_view( | ||
name, sql_definition, no_data: false, side_by_side: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure side_by_side
is the name I'd opt for here but I'm lacking a better suggetion at the moment other than background
which I don't love either.
In your experience, is side_by_side
an established name for this process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "online" might be a more established name but I was trying to think of something that was very specific about what it did
) do | ||
create_materialized_view(new_name, sql_definition, no_data: no_data) | ||
end | ||
rename_materialized_view(name, old_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to drop
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did it this way out of the assumption that we might commit after the rename (but before the drop) and pause to give other threads a chance to finish using that view before we drop it, to reduce the lock impact. I didn't actually build that, of course.
lib/scenic/adapters/postgres.rb
Outdated
create_materialized_view(name, sql_definition, no_data: no_data) | ||
if side_by_side | ||
session_id = Time.now.to_i | ||
new_name = "#{name}_new_#{session_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postgres identifiers are limited to 63 characters. We're removing 15 characters from that allotment already. This will likely be problematic for some view names. What does Rails do when, for example, a generated index name would be too long? We may need to access that functionality (or copy it) here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC they use a truncated hash of the generated components.
The name only matters for debugging, so I'd be fine with doing that. Something like
new_name = "#{name}_new_#{session_id}"
if new_name.size > 63
# session_id is 10c, _new_ is 5c; leave lots of space free
new_name = "#{Digest::SHA256.hexdigest(name)[...40]}_new_#{session_id}"
end
def on_side_by_side(name, new_table_name, temporary_id) | ||
indexes = Indexes.new(connection: connection).on(name) | ||
indexes.each_with_index do |index, i| | ||
old_name = "predrop_index_#{temporary_id}_#{i}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it cuts 24 characters off the alloted 63 for postgres identifiers (see earlier comment). We may need to think more about how to lessen the chance this is an issue for folks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one doesn't actually have the index name in it (it just looks like predrop_index_1705535993_1
). I'm pretty sure I did it that way because I hit the index size limit. :-D
@@ -271,9 +304,19 @@ def refresh_dependencies_for(name, concurrently: false) | |||
name, | |||
self, | |||
connection, | |||
concurrently: concurrently, | |||
concurrently: concurrently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/TrailingCommaInArguments: Put a comma after the last parameter of a multiline method call.
if side_by_side | ||
session_id = Time.now.to_i | ||
new_name = generate_name name, "new_#{session_id}" | ||
drop_name = generate_name name, "drop_#{session_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lint/UselessAssignment: Useless assignment to variable - drop_name.
# | ||
# @raise [MaterializedViewsNotSupportedError] if the version of Postgres | ||
# in use does not support materialized views. | ||
# | ||
# @return [void] | ||
def update_materialized_view(name, sql_definition, no_data: false) | ||
def update_materialized_view(name, sql_definition, no_data: false, side_by_side: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [93/80]
@@ -155,17 +157,34 @@ def create_materialized_view(name, sql_definition, no_data: false) | |||
# @param no_data [Boolean] Default: false. Set to true to create | |||
# materialized view without running the associated query. You will need | |||
# to perform a non-concurrent refresh to populate with data. | |||
# @param side_by_side [Boolean] Default: false. Set to true to create the | |||
# new version under a different name and atomically swap them, limiting | |||
# the time that a view is inaccessible at the cost of doubling disk usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics/LineLength: Line is too long. [81/80]
#{sql_definition.rstrip.chomp(';')} | ||
#{'WITH NO DATA' if no_data}; | ||
#{sql_definition.rstrip.chomp(";")} | ||
#{"WITH NO DATA" if no_data}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/StringLiteralsInInterpolation: Prefer single-quoted strings inside interpolations.
@@ -137,8 +139,8 @@ def create_materialized_view(name, sql_definition, no_data: false) | |||
|
|||
execute <<-SQL | |||
CREATE MATERIALIZED VIEW #{quote_table_name(name)} AS | |||
#{sql_definition.rstrip.chomp(';')} | |||
#{'WITH NO DATA' if no_data}; | |||
#{sql_definition.rstrip.chomp(";")} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/StringLiteralsInInterpolation: Prefer single-quoted strings inside interpolations.
if rails_version == "master" | ||
rails_constraint = { github: "rails/rails" } | ||
rails_constraint = if rails_version == "master" | ||
{github: "rails/rails"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout/IndentationWidth: Use 2 (not -17) spaces for indentation.
Layout/SpaceInsideHashLiteralBraces: Space inside { missing.
Layout/SpaceInsideHashLiteralBraces: Space inside } missing.
Rebasing this after #397 will likely get rid of hound noise in this PR. |
This adds a
side_by_side
kwarg to theupdate_materialized_view
method, which builds the new view alongside the old one and then atomically swaps them to reduce downtime at the cost of increasing disk usage. It is plumbed through to migrations as a hash value for thematerialized
kwarg ofupdate_view
.Fixes #386.