Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add v2 matviews to nightly refresh #2667

Merged
merged 15 commits into from
Jun 30, 2021
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion app/models/reporting_pipeline/matview.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ class Matview < ActiveRecord::Base
def self.refresh
ActiveRecord::Base.transaction do
ActiveRecord::Base.connection.execute("SET LOCAL TIME ZONE '#{Period::REPORTING_TIME_ZONE}'")
Scenic.database.refresh_materialized_view(table_name, concurrently: false, cascade: true)
Scenic.database.refresh_materialized_view(table_name, concurrently: false, cascade: false)
end
end
end
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
module ReportingPipeline
class PatientBloodPressuresPerMonth < Matview
self.table_name = "reporting_patient_blood_pressures_per_month"
belongs_to :patient
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid adding this model until we actually use it in code? I was even trying to avoid adding the Visits model, but we wanted to unit test that finely. This model's unit tests are already covered in the States model's unit tests.

Copy link
Contributor Author

@rsanheim rsanheim Jun 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was doing that initially, but then I discovered the cascade issue in testing (see https://github.com/simpledotorg/simple-server/pull/2667/files#r661684079) and using this all locally...and I didn't want to add a manual refresh call for just one of our views.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also just call scenic's refresh with a given table name without creating a model. I was doing this before I saw the cascade option.

66 changes: 38 additions & 28 deletions app/services/refresh_materialized_views.rb
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,11 @@ def benchmark_and_statsd(operation)
end

def call
benchmark_and_statsd("all") do
refresh
benchmark_and_statsd("all_v1") do
refresh_v1
end
benchmark_and_statsd("all_v2") do
refresh_v2
end
end

Expand All @@ -39,37 +42,44 @@ def self.tz

delegate :tz, :set_last_updated_at, to: self

def refresh
# LatestBloodPressuresPerPatientPerMonth should be refreshed before
# LatestBloodPressuresPerPatientPerQuarter and LatestBloodPressuresPerPatient
V1_MATVIEWS = %w[
LatestBloodPressuresPerPatientPerMonth
LatestBloodPressuresPerPatient
LatestBloodPressuresPerPatientPerQuarter
BloodPressuresPerFacilityPerDay
PatientRegistrationsPerDayPerFacility
MaterializedPatientSummary
].freeze
V2_MATVIEWS = %w[
ReportingPipeline::PatientBloodPressuresPerMonth
ReportingPipeline::PatientVisitsPerMonth
ReportingPipeline::PatientStatesPerMonth
].freeze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rsanheim We would only need:

ReportingPipeline::PatientStatesPerMonth.refresh

The cascade: true in the scenic refresh function takes care of refreshing the dependent views.

Copy link
Contributor Author

@rsanheim rsanheim Jun 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cascade doesn't work, and it relies some of its own SQL parsing and regexp that I wouldn't really trust even if it did work for our case 😄 . I'll make sure to remove all the cascade: true options for the new matviews.

I think its fine to know what our dependency chain is and declare it explicitly in the refresh object...its a pretty key part of these matviews and we shouldn't be refreshing them in isolation for most cases anyways. New tests that rely on the matviews can call refresh_v2 for their setup, the same way we had a repeated refresh call for the old ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh well. Thanks for catching that btw! 😅


# LatestBloodPressuresPerPatientPerMonth should be refreshed before
# LatestBloodPressuresPerPatientPerQuarter and LatestBloodPressuresPerPatient
def refresh_v1
ActiveRecord::Base.transaction do
ActiveRecord::Base.connection.execute("SET LOCAL TIME ZONE '#{tz}'")

benchmark_and_statsd("LatestBloodPressuresPerPatientPerMonth") do
LatestBloodPressuresPerPatientPerMonth.refresh
end

benchmark_and_statsd("LatestBloodPressuresPerPatient") do
LatestBloodPressuresPerPatient.refresh
end

benchmark_and_statsd("LatestBloodPressuresPerPatientPerQuarter") do
LatestBloodPressuresPerPatientPerQuarter.refresh
end

benchmark_and_statsd("BloodPressuresPerFacilityPerDay") do
BloodPressuresPerFacilityPerDay.refresh
end

benchmark_and_statsd("PatientRegistrationsPerDayPerFacility") do
PatientRegistrationsPerDayPerFacility.refresh
V1_MATVIEWS.each do |name|
benchmark_and_statsd(name) do
klass = name.constantize
klass.refresh
end
end
set_last_updated_at
end
end

benchmark_and_statsd("MaterializedPatientSummary") do
MaterializedPatientSummary.refresh
def refresh_v2
ActiveRecord::Base.transaction do
ActiveRecord::Base.connection.execute("SET LOCAL TIME ZONE '#{tz}'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReportingPipeline::Matview.refresh already takes care of the transaction and the timezone setting. We can just wrap with metrics here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outer transaction is to rollback if any part of this fails I think, which is why we are doing it w/ the other views as well.

The tz piece is also just for safety and consistency w/ the other refresh...the more I think about this the more I think we should just move all the refresh logic here and not have it in the model objects at all. It doesn't make sense to refresh a matview in isolation, as they are all pretty tightly coupled w/ each other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think it's better to express the dependency tree closer to the model where there's most context about it. Each view can express its own dependencies, that way the refresh logic is in one place. When we change it for tests, we change it for the job as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, you can't really see the dependency tree unless you look at the SQL itself. Abstractions on abstractions here. 😁

I actually think it's better to express the dependency tree closer to the model where there's most context about it. Each view can express its own dependencies, that way the refresh logic is in one place.

I can't think of a way to do that that would be more clear than a method that calls the refresh methods in the order we need, the dependency ordering is pretty clear there. Suggestions welcome though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just meant having the same list in the patient-state model itself. I'll raise a tiny PR for this perhaps.

V2_MATVIEWS.each do |name|
benchmark_and_statsd(name) do
klass = name.constantize
klass.refresh
end
end

set_last_updated_at
end
end
end
40 changes: 26 additions & 14 deletions spec/models/reporting_pipeline/patient_states_per_month_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,17 @@
it { should belong_to(:patient) }
end

around do |example|
# We need to enforce a known time for this test, otherwise we will have intermittent failures. For example,
# if we use live system time, many of these specs will fail after 18:30 UTC (ie 14:30 ET) when on the last day of a month,
# because that falls into the next day in IST (our reporting time zone). So to prevent confusing failures for
# developers or CI during North American afternoons, we freeze to a time that will be the end of the month for
# UTC, ET, and IST. Timezones! 🤯
Timecop.freeze("June 30 2021 5:30 UTC") do # June 30th 23:00 IST time
example.run
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is in using relative times. This was always the case, perhaps the reporting tables make the issue more apparent.

My preference would be to use the absolute times for the test as in june_2021 that we setup in the helper, as opposed to relative times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to use the absolute times for the test as in june_2021 that we setup in the helper, as opposed to relative times.

Yeah, I definitely prefer absolute times as well. I think the point I was trying to make is that even with the june_2021 helper, we need to establish an absolute time via freeze, or things will fail like this intermittently.

This spec does use june_2021 helper btw, but that doesn't help when time is a frustrating continually advancing thing. 😃

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, I'm not sure I follow. What are the timestamps in concern that are advancing? Is it the time at which refresh runs that's the issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, I'm not sure I follow. What are the timestamps in concern that are advancing? Is it the time at which refresh runs that's the issue?

The timestamps that are advancing is system time, i.e. Time.current ...without the Timecop.freeze, we would have intermittent failures in this spec depending on the date and time the spec is being run.


context "indicators" do
describe "htn_care_state" do
it "marks a dead patient dead" do
Expand All @@ -19,7 +30,7 @@
patient_registered_13m_ago = Timecop.freeze(13.months.ago) { create(:patient) }
Timecop.freeze(13.months.ago) { create(:blood_pressure, patient: patient_registered_13m_ago) }

described_class.refresh
RefreshMaterializedViews.new.refresh_v2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this isn't too much slower in specs than just refreshing the described class.

with_reporting_time_zones do
expect(described_class
.where(htn_care_state: "lost_to_follow_up", month_date: Date.current.beginning_of_month)
Expand All @@ -34,7 +45,7 @@
patient_registered_12m_ago = Timecop.freeze(12.months.ago) { create(:patient) }
patient_registered_11m_ago = Timecop.freeze(11.months.ago) { create(:patient) }

described_class.refresh
RefreshMaterializedViews.new.refresh_v2
with_reporting_time_zones do
expect(described_class
.where(htn_care_state: "lost_to_follow_up", month_date: Date.current.beginning_of_month)
Expand All @@ -56,7 +67,7 @@
patient_with_recent_bp = Timecop.freeze(13.months.ago) { create(:patient) }
Timecop.freeze(11.months.ago) { create(:blood_pressure, patient: patient_with_recent_bp) }

described_class.refresh
RefreshMaterializedViews.new.refresh_v2
with_reporting_time_zones do
expect(described_class
.where(htn_care_state: "lost_to_follow_up", month_date: Date.current.beginning_of_month)
Expand All @@ -75,7 +86,7 @@
create(:blood_pressure, patient: under_care_patient, recorded_at: june_2021[:under_12_months_ago])
create(:blood_pressure, patient: ltfu_patient, recorded_at: june_2021[:over_12_months_ago])

described_class.refresh
RefreshMaterializedViews.new.refresh_v2

with_reporting_time_zones do
expect(described_class
Expand All @@ -100,7 +111,8 @@
create(:blood_pressure, patient: under_care_patient, recorded_at: june_2021[:end_of_month] - 1.minute)
create(:blood_pressure, patient: ltfu_patient, recorded_at: june_2021[:end_of_month] + 1.minute)

described_class.refresh
RefreshMaterializedViews.new.refresh_v2

with_reporting_time_zones do
expect(described_class
.where(htn_care_state: "lost_to_follow_up", month_date: june_2021[:beginning_of_month])
Expand All @@ -121,7 +133,7 @@
under_care_patient = create(:patient, recorded_at: june_2021[:under_12_months_ago])
ltfu_patient = create(:patient, recorded_at: june_2021[:over_12_months_ago])

described_class.refresh
RefreshMaterializedViews.new.refresh_v2
with_reporting_time_zones do
expect(described_class
.where(htn_care_state: "lost_to_follow_up", month_date: june_2021[:beginning_of_month])
Expand All @@ -147,7 +159,7 @@
patient_2 = create(:patient, recorded_at: june_2021[:long_ago])
create(:encounter, patient: patient_2, encountered_on: june_2021[:under_3_months_ago])
patient_3 = create(:patient, recorded_at: june_2021[:long_ago])
described_class.refresh
RefreshMaterializedViews.new.refresh_v2

with_reporting_time_zones do
expect(described_class.where(htn_treatment_outcome_in_last_3_months: "missed_visit", month_date: june_2021[:now]).pluck(:patient_id))
Expand Down Expand Up @@ -180,7 +192,7 @@
facility: patient_with_no_bp.registration_facility,
patient: patient_with_no_bp,
user: patient_with_no_bp.registration_user)
described_class.refresh
RefreshMaterializedViews.new.refresh_v2

with_reporting_time_zones do
expect(described_class.where(htn_treatment_outcome_in_last_3_months: "visited_no_bp", month_date: june_2021[:now]).pluck(:patient_id))
Expand All @@ -200,7 +212,7 @@
patient_bp_over_3_months = create(:patient, recorded_at: june_2021[:long_ago])
create(:blood_pressure, :with_encounter, patient: patient_bp_over_3_months, recorded_at: june_2021[:over_3_months_ago])

described_class.refresh
RefreshMaterializedViews.new.refresh_v2

with_reporting_time_zones do
expect(described_class.where(htn_treatment_outcome_in_last_3_months: "controlled", month_date: june_2021[:now]).pluck(:patient_id))
Expand All @@ -220,7 +232,7 @@
patient_3 = create(:patient, recorded_at: june_2021[:now])
patient_4 = create(:patient, recorded_at: june_2021[:over_3_months_ago])

described_class.refresh
RefreshMaterializedViews.new.refresh_v2
with_reporting_time_zones do
expect(described_class.find_by(patient_id: patient_1.id, month_string: june_2021[:month_string]).months_since_registration).to eq 11
expect(described_class.find_by(patient_id: patient_2.id, month_string: june_2021[:month_string]).months_since_registration).to eq 12
Expand All @@ -243,7 +255,7 @@

patient = create(:patient, registration_facility: registration_facility, assigned_facility: assigned_facility)

described_class.refresh
RefreshMaterializedViews.new.refresh_v2

with_reporting_time_zones do
patient_state = described_class.find_by(patient_id: patient.id, month_string: june_2021[:month_string])
Expand Down Expand Up @@ -275,7 +287,7 @@

patient = create(:patient, registration_facility: registration_facility, assigned_facility: assigned_facility)

described_class.refresh
RefreshMaterializedViews.new.refresh_v2

with_reporting_time_zones do
patient_state = described_class.find_by(patient_id: patient.id, month_string: june_2021[:month_string])
Expand Down Expand Up @@ -306,7 +318,7 @@

patient_no_bp = create(:patient, recorded_at: june_2021[:long_ago])

described_class.refresh
RefreshMaterializedViews.new.refresh_v2

with_reporting_time_zones do
controlled_state = described_class.find_by(patient_id: patient_controlled.id, month_string: june_2021[:month_string])
Expand Down Expand Up @@ -348,7 +360,7 @@ def patient_states(patient, from: nil, to: nil)
create(:prescription_drug, patient: patient, device_created_at: eight_months_ago)
create(:blood_pressure, :with_encounter, patient: patient, recorded_at: five_months_ago, systolic: 140, diastolic: 90)

described_class.refresh
RefreshMaterializedViews.new.refresh_v2

with_reporting_time_zones do
expect(patient_states(patient).pluck(:months_since_registration)).to eq((0..24).to_a)
Expand Down
14 changes: 13 additions & 1 deletion spec/services/refresh_materialized_views_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -24,10 +24,10 @@
time = Time.current
# Just adding enough data to smoke test this; we test these views
# more thoroughly via various reporting specs
create_list(:blood_pressure, 2)

expect {
Timecop.freeze(time) do
create_list(:blood_pressure, 2)
RefreshMaterializedViews.call
end
}.to change { LatestBloodPressuresPerPatientPerMonth.count }.from(0).to(2)
Expand All @@ -37,4 +37,16 @@
.and change { PatientRegistrationsPerDayPerFacility.count }.from(0).to(2)
.and change { RefreshMaterializedViews.last_updated_at }.from(nil).to(time)
end

it "updates v2 matviews" do
time = Time.current
expect {
Timecop.freeze(time) do
create_list(:blood_pressure, 2)
RefreshMaterializedViews.call
end
}.to change { ReportingPipeline::PatientBloodPressuresPerMonth.count }.from(0).to(2)
.and change { ReportingPipeline::PatientStatesPerMonth.count }.from(0).to(2)
.and change { ReportingPipeline::PatientVisitsPerMonth.count }.from(0).to(2)
end
end