Add v2 matviews to nightly refresh #2667

Merged
merged 15 commits into from
Jun 30, 2021


Contributor

@rsanheim rsanheim commented Jun 29, 2021

Story card: ch3892

In my dev environment, with a large dataset, the new matviews take a bit over 20 minutes to refresh (about 23 minutes in the log below), so it's pretty long for dev, but not unexpected given this data set size:

}
[2021-06-29T22:11:45.408-05:00] DEBUG:    (308.9ms)  COMMIT
[2021-06-29T22:11:45.409-05:00] INFO: refresh_matviews.all_v2 (1388043.2ms)
{

[1] pry(main)> Patient.count
=> 516,441
[2] pry(main)> BloodPressure.count
=> 1,867,157


@rsanheim rsanheim requested a review from a team June 29, 2021 22:49
@rsanheim rsanheim had a problem deploying to simple-review-pr-2667 June 29, 2021 23:59 Failure
Contributor

@ssrihari ssrihari left a comment

The metrics/measurement looks good. Just added some notes about refreshing the new matviews.

ReportingPipeline::PatientBloodPressuresPerMonth
ReportingPipeline::PatientStatesPerMonth
ReportingPipeline::PatientVisitsPerMonth
].freeze
Contributor

@rsanheim We would only need:

ReportingPipeline::PatientStatesPerMonth.refresh

The cascade: true in the scenic refresh function takes care of refreshing the dependent views.

Contributor Author

@rsanheim rsanheim Jun 30, 2021

Cascade doesn't work, and it relies on some of its own SQL parsing and regexp that I wouldn't really trust even if it did work for our case 😄. I'll make sure to remove all the cascade: true options for the new matviews.

I think it's fine to know what our dependency chain is and declare it explicitly in the refresh object...it's a pretty key part of these matviews and we shouldn't be refreshing them in isolation for most cases anyway. New tests that rely on the matviews can call refresh_v2 for their setup, the same way we had a repeated refresh call for the old ones.
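
To illustrate the idea of declaring the chain explicitly in the refresh object, here is a minimal sketch. `FakeMatview` and `OrderedRefresher` are hypothetical names, the fake only records calls instead of touching the database, and the order shown is an assumption for illustration rather than the confirmed dependency chain.

```ruby
# Hypothetical stand-in for a matview model. A real model would ask Scenic
# to issue REFRESH MATERIALIZED VIEW; this one only records that it ran.
class FakeMatview
  attr_reader :name

  def initialize(name, log)
    @name = name
    @log = log
  end

  def refresh
    @log << name
  end
end

# Refreshes views strictly in the order given, so views that others
# depend on must be listed first.
class OrderedRefresher
  def initialize(views)
    @views = views.freeze
  end

  def refresh_all
    @views.each(&:refresh)
  end
end

log = []
# Assumed order: the two per-month inputs first, then the states view.
views = %w[
  patient_blood_pressures_per_month
  patient_visits_per_month
  patient_states_per_month
].map { |name| FakeMatview.new(name, log) }

OrderedRefresher.new(views).refresh_all
```

The ordering lives in one visible list, which is the trade-off being argued for here: no SQL parsing, but the list has to be kept in sync with the view definitions by hand.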

Contributor

Oh well. Thanks for catching that btw! 😅

self.table_name = "reporting_patient_blood_pressures_per_month"
belongs_to :patient
end
end
Contributor

Can we avoid adding this model until we actually use it in code? I was even trying to avoid adding the Visits model, but we wanted to unit test that finely. This model's unit tests are already covered in the States model's unit tests.

Contributor Author

@rsanheim rsanheim Jun 30, 2021

I was doing that initially, but then I discovered the cascade issue in testing (see https://github.com/simpledotorg/simple-server/pull/2667/files#r661684079) and using this all locally...and I didn't want to add a manual refresh call for just one of our views.

Contributor

We could also just call scenic's refresh with a given table name without creating a model. I was doing this before I saw the cascade option.
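
A rough sketch of refreshing by table name without a model follows. In an app this would go through `Scenic.database.refresh_materialized_view`; here the adapter is injected so the example runs without a database. `refresh_by_name` and `RecordingAdapter` are hypothetical names used only for illustration.

```ruby
# Refresh a matview by table name through an adapter object, with no
# model class involved. `adapter` is expected to respond to
# refresh_materialized_view the way Scenic.database does.
def refresh_by_name(adapter, view_name)
  adapter.refresh_materialized_view(view_name, concurrently: false, cascade: false)
end

# Hypothetical recording adapter, standing in for Scenic.database.
class RecordingAdapter
  attr_reader :calls

  def initialize
    @calls = []
  end

  def refresh_materialized_view(name, concurrently: false, cascade: false)
    @calls << [name, concurrently, cascade]
  end
end

adapter = RecordingAdapter.new
refresh_by_name(adapter, "reporting_patient_blood_pressures_per_month")
```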

MaterializedPatientSummary.refresh
def refresh_v2
ActiveRecord::Base.transaction do
ActiveRecord::Base.connection.execute("SET LOCAL TIME ZONE '#{tz}'")
Contributor

ReportingPipeline::Matview.refresh already takes care of the transaction and the timezone setting. We can just wrap with metrics here.
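
For reference, wrapping a refresh with timing could look roughly like the sketch below. It uses Ruby's stdlib `Benchmark.realtime`; the `with_timing` helper and the array used as a log sink are hypothetical and are not the project's actual metrics code.

```ruby
require "benchmark"

# Runs the block, appends a "name (123.4ms)" entry to the sink, and
# returns whatever the block returned.
def with_timing(name, sink)
  result = nil
  seconds = Benchmark.realtime { result = yield }
  sink << format("%s (%.1fms)", name, seconds * 1000)
  result
end

sink = []
value = with_timing("refresh_matviews.all_v2", sink) { :refreshed }
```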

Contributor Author

The outer transaction is there to roll back if any part of this fails, I think, which is why we are doing it w/ the other views as well.

The tz piece is also just for safety and consistency w/ the other refresh...the more I think about this the more I think we should just move all the refresh logic here and not have it in the model objects at all. It doesn't make sense to refresh a matview in isolation, as they are all pretty tightly coupled w/ each other.
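
The outer-transaction argument can be sketched like this. `FakeConnection` is a hypothetical stand-in for `ActiveRecord::Base.connection`, the view names are placeholders, and "Asia/Kolkata" is an assumed value for `tz`. In PostgreSQL, `SET LOCAL` only lasts until the end of the current transaction, so it has to be issued inside the same transaction as the refreshes.

```ruby
# Hypothetical connection that records statements instead of running them.
class FakeConnection
  attr_reader :statements

  def initialize(fail_on: nil)
    @statements = []
    @fail_on = fail_on
  end

  def transaction
    @statements << "BEGIN"
    yield
    @statements << "COMMIT"
  rescue StandardError
    @statements << "ROLLBACK"
    raise
  end

  def execute(sql)
    raise "refresh failed" if @fail_on && sql.include?(@fail_on)
    @statements << sql
  end
end

# One transaction around every refresh: a failure in any view rolls back
# the whole batch, and the time zone setting applies to all of them.
# `tz` is assumed to come from trusted configuration, not user input.
def refresh_all_in_transaction(connection, view_names, tz:)
  connection.transaction do
    connection.execute("SET LOCAL TIME ZONE '#{tz}'")
    view_names.each do |name|
      connection.execute("REFRESH MATERIALIZED VIEW #{name}")
    end
  end
end

ok = FakeConnection.new
refresh_all_in_transaction(ok, %w[view_a view_b], tz: "Asia/Kolkata")

bad = FakeConnection.new(fail_on: "view_b")
begin
  refresh_all_in_transaction(bad, %w[view_a view_b], tz: "Asia/Kolkata")
rescue RuntimeError
  # Expected: the second refresh fails and the batch is rolled back.
end
```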

Contributor

I actually think it's better to express the dependency tree closer to the model where there's most context about it. Each view can express its own dependencies, that way the refresh logic is in one place. When we change it for tests, we change it for the job as well.

Contributor Author

I mean, you can't really see the dependency tree unless you look at the SQL itself. Abstractions on abstractions here. 😁

> I actually think it's better to express the dependency tree closer to the model where there's most context about it. Each view can express its own dependencies, that way the refresh logic is in one place.

I can't think of a way to do that which would be clearer than a method that calls the refresh methods in the order we need; the dependency ordering is pretty clear there. Suggestions welcome though.

Contributor

I just meant having the same list in the patient-state model itself. I'll raise a tiny PR for this perhaps.
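
One possible shape for views that express their own dependencies is sketched below using Ruby's stdlib `TSort`. `ViewGraph` is a hypothetical class, and the dependency hash is an assumption for illustration, not the confirmed chain.

```ruby
require "tsort"

# Derives a refresh order from per-view dependency declarations.
class ViewGraph
  include TSort

  # dependencies: { view_name => [names of views it reads from] }
  def initialize(dependencies)
    @dependencies = dependencies
  end

  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @dependencies.fetch(node, []).each(&block)
  end

  # Dependencies come before the views that read from them.
  # Raises TSort::Cyclic if the declarations contain a cycle.
  def refresh_order
    tsort
  end
end

# Assumed dependencies, for illustration only.
dependencies = {
  "patient_states_per_month" => %w[
    patient_blood_pressures_per_month
    patient_visits_per_month
  ],
  "patient_blood_pressures_per_month" => [],
  "patient_visits_per_month" => []
}

order = ViewGraph.new(dependencies).refresh_order
```

Compared with a hand-ordered list, this keeps each declaration next to its view at the cost of one more layer of indirection.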

@rsanheim rsanheim temporarily deployed to simple-review-pr-2667 June 30, 2021 17:57 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2667 June 30, 2021 18:53 Inactive
@rsanheim rsanheim dismissed ssrihari’s stale review June 30, 2021 18:57

Responded to all feedback - we need to handle the cascade ourselves

@rsanheim rsanheim temporarily deployed to simple-review-pr-2667 June 30, 2021 19:03 Inactive
@@ -19,7 +19,7 @@
patient_registered_13m_ago = Timecop.freeze(13.months.ago) { create(:patient) }
Timecop.freeze(13.months.ago) { create(:blood_pressure, patient: patient_registered_13m_ago) }

- described_class.refresh
+ RefreshMaterializedViews.new.refresh_v2
Contributor

Hopefully this isn't too much slower in specs than just refreshing the described class.

@rsanheim rsanheim temporarily deployed to simple-review-pr-2667 June 30, 2021 20:23 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2667 June 30, 2021 20:23 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2667 June 30, 2021 20:24 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2667 June 30, 2021 20:35 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2667 June 30, 2021 20:55 Inactive
@rsanheim rsanheim merged commit 04ece50 into master Jun 30, 2021
@rsanheim rsanheim deleted the add-v2-matviews branch June 30, 2021 21:00
Timecop.freeze("June 30 2021 5:30 UTC") do # 11:00 IST on June 30th
example.run
end
end
Contributor

The issue here is in using relative times. This was always the case; perhaps the reporting tables just make the issue more apparent.

My preference would be to use the absolute times for the test as in june_2021 that we set up in the helper, as opposed to relative times.

Contributor Author

> My preference would be to use the absolute times for the test as in june_2021 that we set up in the helper, as opposed to relative times.

Yeah, I definitely prefer absolute times as well. I think the point I was trying to make is that even with the june_2021 helper, we need to establish an absolute time via freeze, or things will fail like this intermittently.

This spec does use june_2021 helper btw, but that doesn't help when time is a frustrating continually advancing thing. 😃

Contributor

Hrm, I'm not sure I follow. Which timestamps are the ones that are advancing? Is it the time at which refresh runs that's the issue?

Contributor Author

> Hrm, I'm not sure I follow. Which timestamps are the ones that are advancing? Is it the time at which refresh runs that's the issue?

The timestamp that is advancing is system time, i.e. Time.current...without the Timecop.freeze, we would have intermittent failures in this spec depending on the date and time the spec is being run.
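
A small sketch of why absolute test data alone is not enough: assuming the views derive their range of months from the current time (an assumption, the SQL is not shown in this thread), the same June 2021 data yields a different number of monthly rows depending on the clock. `months_through` is a hypothetical helper written for this example.

```ruby
require "date"

# First-of-month dates from `first_month` up to and including the month
# that `today` falls in.
def months_through(first_month, today)
  months = []
  cursor = Date.new(first_month.year, first_month.month, 1)
  last = Date.new(today.year, today.month, 1)
  while cursor <= last
    months << cursor
    cursor >>= 1 # advance one calendar month
  end
  months
end

june_2021 = Date.new(2021, 6, 1)

# Same absolute data, two different "current" dates.
rows_on_june_30 = months_through(june_2021, Date.new(2021, 6, 30)).length
rows_on_july_1 = months_through(june_2021, Date.new(2021, 7, 1)).length
```

With the clock frozen, `today` is fixed and the row count stays the same on every run; without it, an expectation written in June starts failing in July.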

3 participants