-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add v2 matviews to nightly refresh #2667
Conversation
This pull request has been linked to Clubhouse Story #3892: Wire up controlled & uncontrolled reports from pipeline via a toggle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metrics/measurement looks good. Just added some notes about refreshing the new matviews.
ReportingPipeline::PatientBloodPressuresPerMonth | ||
ReportingPipeline::PatientStatesPerMonth | ||
ReportingPipeline::PatientVisitsPerMonth | ||
].freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rsanheim We would only need:
ReportingPipeline::PatientStatesPerMonth.refresh
The cascade: true
in the scenic refresh function takes care of refreshing the dependent views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cascade doesn't work, and it relies some of its own SQL parsing and regexp that I wouldn't really trust even if it did work for our case 😄 . I'll make sure to remove all the cascade: true
options for the new matviews.
I think its fine to know what our dependency chain is and declare it explicitly in the refresh object...its a pretty key part of these matviews and we shouldn't be refreshing them in isolation for most cases anyways. New tests that rely on the matviews can call refresh_v2
for their setup, the same way we had a repeated refresh
call for the old ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh well. Thanks for catching that btw! 😅
self.table_name = "reporting_patient_blood_pressures_per_month" | ||
belongs_to :patient | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid adding this model until we actually use it in code? I was even trying to avoid adding the Visits
model, but we wanted to unit test that finely. This model's unit tests are already covered in the States
model's unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was doing that initially, but then I discovered the cascade issue in testing (see https://github.com/simpledotorg/simple-server/pull/2667/files#r661684079) and using this all locally...and I didn't want to add a manual refresh call for just one of our views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just call scenic's refresh with a given table name without creating a model. I was doing this before I saw the cascade option.
MaterializedPatientSummary.refresh | ||
def refresh_v2 | ||
ActiveRecord::Base.transaction do | ||
ActiveRecord::Base.connection.execute("SET LOCAL TIME ZONE '#{tz}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReportingPipeline::Matview.refresh
already takes care of the transaction and the timezone setting. We can just wrap with metrics here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outer transaction is to rollback if any part of this fails I think, which is why we are doing it w/ the other views as well.
The tz
piece is also just for safety and consistency w/ the other refresh...the more I think about this the more I think we should just move all the refresh logic here and not have it in the model objects at all. It doesn't make sense to refresh a matview in isolation, as they are all pretty tightly coupled w/ each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think it's better to express the dependency tree closer to the model where there's most context about it. Each view can express its own dependencies, that way the refresh logic is in one place. When we change it for tests, we change it for the job as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, you can't really see the dependency tree unless you look at the SQL itself. Abstractions on abstractions here. 😁
I actually think it's better to express the dependency tree closer to the model where there's most context about it. Each view can express its own dependencies, that way the refresh logic is in one place.
I can't think of a way to do that that would be more clear than a method that calls the refresh
methods in the order we need, the dependency ordering is pretty clear there. Suggestions welcome though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just meant having the same list in the patient-state model itself. I'll raise a tiny PR for this perhaps.
Responded to all feedback - we need to handle the cascade ourselves
…ple-server into add-v2-matviews
@@ -19,7 +19,7 @@ | |||
patient_registered_13m_ago = Timecop.freeze(13.months.ago) { create(:patient) } | |||
Timecop.freeze(13.months.ago) { create(:blood_pressure, patient: patient_registered_13m_ago) } | |||
|
|||
described_class.refresh | |||
RefreshMaterializedViews.new.refresh_v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this isn't too much slower in specs than just refreshing the described class.
Timecop.freeze("June 30 2021 5:30 UTC") do # June 30th 23:00 IST time | ||
example.run | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is in using relative times. This was always the case, perhaps the reporting tables make the issue more apparent.
My preference would be to use the absolute times for the test as in june_2021
that we setup in the helper, as opposed to relative times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be to use the absolute times for the test as in
june_2021
that we setup in the helper, as opposed to relative times.
Yeah, I definitely prefer absolute times as well. I think the point I was trying to make is that even with the june_2021
helper, we need to establish an absolute time via freeze
, or things will fail like this intermittently.
This spec does use june_2021
helper btw, but that doesn't help when time is a frustrating continually advancing thing. 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, I'm not sure I follow. What are the timestamps in concern that are advancing? Is it the time at which refresh runs that's the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, I'm not sure I follow. What are the timestamps in concern that are advancing? Is it the time at which refresh runs that's the issue?
The timestamps that are advancing is system time, i.e. Time.current
...without the Timecop.freeze
, we would have intermittent failures in this spec depending on the date and time the spec is being run.
Story card: ch3892
In my dev environment, with a large dataset, the new mat views take 20 minutes to refresh...so its pretty long for dev, but not unexpected given this data set size: