-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improve timing of checks depending on changes since last check #163
base: main
Are you sure you want to change the base?
Conversation
715a695
to
739ef30
Compare
32ac865
to
87d2c52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what was the use of the temporary table, was it to make sure that it's threadsafe?
Other than a small nit, this looks good to me ;)
Also, didn't investigate why, but it seems you've got a failing test. |
# Conflicts: # CHANGELOG.md
776e189
to
463f229
Compare
# Conflicts: # CHANGELOG.md
# Conflicts: # CHANGELOG.md
# Conflicts: # CHANGELOG.md # udata_hydra/analysis/resource.py # udata_hydra/crawl/check_resources.py
# Conflicts: # CHANGELOG.md
# Conflicts: # CHANGELOG.md
# Conflicts: # CHANGELOG.md
# Conflicts: # CHANGELOG.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR! I like the logic!
I think we may want to test this PR in our dev env and monitor the next_check_at
values? Since it's quite difficult to test locally actually
Maybe we want to add some tests!
udata_hydra/crawl/select_batch.py
Outdated
AND catalog.last_check = checks.id | ||
AND checks.created_at <= $1 | ||
AND checks.next_check_at >= $1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually nice that this logic is kept simple with the next_check_at strategy! 👏
Thanks for the review. I wanted to confirm the logic first, now currently fixing/adding tests - in the same PR. |
5b1053d
to
50c7931
Compare
50c7931
to
d58d4af
Compare
7062507
to
2fd106a
Compare
2fd106a
to
9e0f626
Compare
Closes datagouv/data.gouv.fr#1312.
Add
CHECK_DELAYS
config var: list of delays (̶p̶o̶s̶t̶g̶r̶e̶s̶q̶l̶ ̶i̶n̶t̶e̶r̶v̶a̶l̶ ̶s̶y̶n̶t̶a̶x̶)̶ between two checks, if the resource has not been modifiedAdd a migration to add a
next_check_at
datetime column to thechecks
tableAdd logic to calculate
next_check_at
datetime when processing the check, depending on the date f the previous check and if it has changed since then, in a dedicatedcalculate_next_check_date
methodModify logic for
select_batch
in order to select resources that have a related last check that have an expirednext check
dateRefactor
store_last_modified_date
method to also store the next check date, and rename it toupdate_check_with_modification_and_next_dates
Rename
process_check_data
topreprocess_check_data
for more clarity[OUTDATED] First iteration using a SQL query
̶-̶ ̶R̶e̶m̶o̶v̶e̶s̶ ̶S̶Q̶L̶ ̶q̶u̶e̶r̶y̶ ̶w̶r̶a̶p̶p̶e̶r̶ ̶
̶s̶e̶l̶e̶c̶t̶_̶r̶o̶w̶s̶_̶b̶a̶s̶e̶d̶_̶o̶n̶_̶q̶u̶e̶r̶y̶
̶ ̶w̶h̶i̶c̶h̶ ̶w̶a̶s̶ ̶u̶s̶e̶d̶ ̶t̶o̶ ̶c̶r̶e̶a̶t̶e̶ ̶a̶ ̶t̶e̶m̶p̶o̶r̶a̶r̶y̶ ̶t̶a̶b̶l̶e̶ ̶a̶n̶d̶ ̶t̶e̶m̶p̶o̶r̶a̶r̶i̶l̶y̶ ̶m̶a̶r̶k̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶b̶e̶i̶n̶g̶ ̶c̶r̶a̶w̶l̶e̶d̶ ̶a̶s̶ ̶"̶c̶r̶a̶w̶l̶i̶n̶g̶"̶.̶̶S̶i̶n̶c̶e̶ ̶w̶e̶ ̶u̶p̶d̶a̶t̶e̶ ̶t̶h̶e̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶ ̶s̶t̶a̶t̶u̶s̶ ̶i̶n̶ ̶t̶h̶e̶ ̶c̶h̶e̶c̶k̶ ̶c̶o̶d̶e̶ ̶e̶l̶s̶e̶w̶h̶e̶r̶e̶,̶ ̶i̶t̶ ̶s̶h̶o̶u̶l̶d̶ ̶n̶o̶t̶ ̶b̶e̶ ̶n̶e̶c̶e̶s̶s̶a̶r̶y̶ ̶a̶n̶y̶m̶o̶r̶e̶ ̶t̶o̶ ̶h̶a̶v̶e̶ ̶t̶h̶i̶s̶ ̶t̶e̶m̶p̶o̶r̶a̶r̶y̶ ̶t̶a̶b̶l̶e̶.̶
̶̶̶O̶n̶ ̶r̶e̶v̶i̶e̶w̶,̶ ̶p̶l̶e̶a̶s̶e̶ ̶d̶o̶u̶b̶l̶e̶-̶c̶h̶e̶c̶k̶ ̶t̶h̶e̶ ̶v̶a̶l̶i̶d̶i̶t̶y̶ ̶o̶f̶ ̶t̶h̶i̶s̶ ̶r̶e̶m̶o̶v̶a̶l̶̶̶
̶-̶ ̶A̶d̶d̶ ̶
̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶_̶D̶E̶F̶A̶U̶L̶T̶
̶ ̶c̶o̶n̶f̶i̶g̶ ̶v̶a̶r̶:̶ ̶t̶i̶m̶e̶ ̶a̶f̶t̶e̶r̶ ̶w̶h̶i̶c̶h̶ ̶a̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶ ̶m̶u̶s̶t̶ ̶b̶e̶ ̶c̶h̶e̶c̶k̶ ̶i̶f̶ ̶w̶e̶ ̶d̶o̶n̶'̶t̶ ̶k̶n̶o̶w̶ ̶a̶b̶o̶u̶t̶ ̶i̶t̶s̶ ̶l̶a̶s̶t̶ ̶m̶o̶d̶i̶f̶i̶c̶a̶t̶i̶o̶n̶ ̶d̶a̶t̶e̶ ̶(̶r̶e̶p̶l̶a̶c̶e̶s̶ ̶t̶h̶e̶ ̶p̶r̶e̶v̶i̶o̶u̶s̶ ̶̶S̶I̶N̶C̶E̶
̶ ̶c̶o̶n̶f̶i̶g̶ ̶v̶a̶r̶)̶̶-̶ ̶U̶p̶d̶a̶t̶e̶ ̶t̶h̶e̶ ̶
̶s̶e̶l̶e̶c̶t̶_̶b̶a̶t̶c̶h̶_̶r̶e̶s̶o̶u̶r̶c̶e̶s̶_̶t̶o̶_̶c̶h̶e̶c̶k̶
̶ ̶m̶e̶t̶h̶o̶d̶ ̶w̶i̶t̶h̶ ̶t̶h̶e̶ ̶f̶o̶l̶l̶o̶w̶i̶n̶g̶ ̶l̶o̶g̶i̶c̶:̶̶ ̶ ̶ ̶ ̶1̶.̶ ̶F̶i̶r̶s̶t̶ ̶a̶d̶d̶s̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶i̶t̶h̶ ̶p̶r̶i̶o̶r̶i̶t̶y̶=̶T̶r̶u̶e̶ ̶t̶o̶ ̶t̶h̶e̶ ̶b̶a̶t̶c̶h̶ ̶(̶_̶t̶h̶i̶s̶ ̶d̶o̶e̶s̶n̶'̶t̶ ̶c̶h̶a̶n̶g̶e̶_̶)̶
̶ ̶ ̶ ̶ ̶2̶.̶ ̶T̶h̶e̶n̶ ̶a̶d̶d̶s̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶i̶t̶h̶o̶u̶t̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶(̶=̶w̶i̶t̶h̶ ̶n̶o̶ ̶c̶h̶e̶c̶k̶,̶ ̶n̶e̶v̶e̶r̶ ̶h̶a̶v̶e̶ ̶b̶e̶e̶n̶ ̶c̶h̶e̶c̶k̶e̶d̶)̶ ̶t̶o̶ ̶t̶h̶e̶ ̶b̶a̶t̶c̶h̶ ̶(̶_̶t̶h̶i̶s̶ ̶d̶o̶e̶s̶n̶'̶t̶ ̶c̶h̶a̶n̶g̶e̶_̶)̶
̶ ̶ ̶ ̶ ̶3̶.̶ ̶T̶h̶e̶n̶,̶ ̶i̶f̶ ̶t̶h̶e̶ ̶t̶o̶t̶a̶l̶ ̶n̶u̶m̶b̶e̶r̶ ̶o̶f̶ ̶s̶e̶l̶e̶c̶t̶e̶d̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶i̶s̶ ̶s̶t̶i̶l̶l̶ ̶l̶e̶s̶s̶ ̶t̶h̶a̶n̶ ̶t̶h̶e̶ ̶b̶a̶t̶c̶h̶ ̶s̶i̶z̶e̶:̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶3̶.̶1̶.̶ ̶a̶d̶d̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶h̶i̶c̶h̶ ̶d̶o̶n̶'̶t̶ ̶h̶a̶v̶e̶ ̶a̶t̶ ̶l̶e̶a̶s̶t̶ ̶t̶w̶o̶ ̶c̶h̶e̶c̶k̶s̶ ̶y̶e̶t̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶i̶s̶ ̶o̶l̶d̶e̶r̶ ̶t̶h̶a̶n̶ ̶
̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶_̶D̶E̶F̶A̶U̶L̶T̶
̶̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶3̶.̶2̶.̶ ̶a̶d̶d̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶h̶i̶c̶h̶ ̶a̶t̶ ̶l̶e̶a̶s̶t̶ ̶o̶n̶e̶ ̶t̶h̶e̶ ̶t̶h̶e̶i̶r̶ ̶t̶w̶o̶ ̶m̶o̶s̶t̶ ̶r̶e̶c̶e̶n̶t̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶s̶ ̶h̶a̶s̶ ̶a̶n̶ ̶u̶n̶k̶n̶o̶w̶n̶ ̶m̶o̶d̶i̶f̶i̶e̶d̶ ̶d̶a̶t̶e̶ ̶(̶n̶o̶ ̶
̶d̶e̶t̶e̶c̶t̶e̶d̶_̶l̶a̶s̶t̶_̶m̶o̶d̶i̶f̶i̶e̶d̶_̶a̶t̶
̶,̶ ̶s̶o̶ ̶w̶e̶ ̶c̶a̶n̶n̶o̶t̶ ̶c̶o̶m̶p̶a̶r̶e̶ ̶w̶i̶t̶h̶ ̶a̶n̶o̶t̶h̶e̶r̶ ̶c̶h̶e̶c̶k̶)̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶i̶s̶ ̶o̶l̶d̶e̶r̶ ̶t̶h̶a̶n̶ ̶̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶_̶D̶E̶F̶A̶U̶L̶T̶
̶̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶3̶.̶3̶.̶ ̶a̶d̶d̶ ̶r̶e̶s̶s̶o̶u̶r̶c̶e̶s̶ ̶t̶o̶ ̶w̶h̶i̶c̶h̶ ̶t̶h̶e̶i̶r̶ ̶t̶w̶o̶ ̶m̶o̶s̶t̶ ̶r̶e̶c̶e̶n̶t̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶s̶ ̶h̶a̶v̶e̶ ̶c̶h̶a̶n̶g̶e̶d̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶i̶s̶ ̶o̶l̶d̶e̶r̶ ̶t̶h̶a̶n̶ ̶
̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶S̶[̶0̶]̶
̶ ̶(̶t̶h̶i̶s̶ ̶i̶s̶ ̶t̶o̶ ̶r̶e̶-̶c̶h̶e̶c̶k̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶t̶h̶a̶t̶ ̶w̶e̶ ̶k̶n̶o̶w̶ ̶i̶t̶ ̶h̶a̶s̶ ̶c̶h̶a̶n̶g̶e̶d̶)̶̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶3̶.̶4̶.̶ ̶a̶d̶d̶ ̶r̶e̶s̶s̶o̶u̶r̶c̶e̶s̶ ̶t̶o̶ ̶w̶h̶i̶c̶h̶ ̶t̶h̶e̶i̶r̶ ̶l̶a̶s̶t̶ ̶t̶w̶o̶ ̶c̶h̶e̶c̶k̶s̶ ̶h̶a̶v̶e̶ ̶n̶o̶t̶ ̶c̶h̶a̶n̶g̶e̶d̶,̶ ̶t̶h̶e̶ ̶t̶w̶o̶ ̶c̶h̶e̶c̶k̶s̶ ̶h̶a̶v̶e̶ ̶d̶o̶n̶e̶ ̶b̶e̶t̶w̶e̶e̶n̶ ̶t̶w̶o̶ ̶d̶e̶l̶a̶y̶s̶ ̶i̶n̶ ̶
̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶S̶
̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶i̶s̶ ̶o̶l̶d̶e̶r̶ ̶t̶h̶a̶n̶ ̶t̶h̶e̶ ̶s̶a̶m̶e̶ ̶d̶e̶l̶a̶y̶ ̶i̶n̶ ̶̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶S̶
̶ ̶(̶t̶h̶i̶s̶ ̶i̶s̶ ̶i̶n̶ ̶o̶r̶d̶e̶r̶ ̶t̶o̶ ̶a̶v̶o̶i̶d̶ ̶c̶h̶e̶c̶k̶i̶n̶g̶ ̶t̶o̶o̶ ̶o̶f̶t̶e̶n̶ ̶t̶h̶e̶ ̶s̶a̶m̶e̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶ ̶w̶h̶i̶c̶h̶ ̶d̶o̶e̶s̶n̶'̶t̶ ̶c̶h̶a̶n̶g̶e̶)̶ ̶̶
̶-̶ ̶A̶d̶a̶p̶t̶ ̶t̶h̶e̶ ̶w̶a̶y̶ ̶w̶e̶ ̶c̶o̶u̶n̶t̶ ̶
̶p̶e̶n̶d̶i̶n̶g̶_̶c̶h̶e̶c̶k̶s̶
̶ ̶a̶n̶d̶ ̶̶f̶r̶e̶s̶h̶_̶c̶h̶e̶c̶k̶s̶
̶ ̶i̶n̶ ̶t̶h̶e̶ ̶̶/̶a̶p̶i̶/̶s̶t̶a̶t̶u̶s̶/̶c̶r̶a̶w̶l̶e̶r̶
̶ ̶e̶n̶d̶p̶o̶i̶n̶t̶ ̶r̶e̶s̶p̶o̶n̶s̶e̶:̶̶ ̶ ̶ ̶ ̶-̶ ̶
̶p̶e̶n̶d̶i̶n̶g̶_̶c̶h̶e̶c̶k̶s̶
̶ ̶a̶r̶e̶ ̶t̶h̶e̶ ̶n̶u̶m̶b̶e̶r̶ ̶o̶f̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶i̶t̶h̶ ̶n̶o̶ ̶c̶h̶e̶c̶k̶,̶ ̶p̶l̶u̶s̶ ̶t̶h̶e̶ ̶n̶u̶m̶b̶e̶r̶ ̶w̶i̶t̶h̶ ̶o̶u̶t̶d̶a̶t̶e̶d̶ ̶c̶h̶e̶c̶k̶s̶ ̶(̶o̶u̶t̶d̶a̶t̶e̶d̶ ̶c̶h̶e̶c̶k̶s̶ ̶a̶r̶e̶ ̶n̶o̶w̶ ̶c̶o̶u̶n̶t̶e̶d̶ ̶d̶i̶f̶f̶e̶r̶e̶n̶t̶l̶y̶,̶ ̶d̶e̶p̶e̶n̶d̶i̶n̶g̶ ̶o̶n̶ ̶t̶h̶e̶ ̶d̶e̶l̶a̶y̶s̶)̶̶ ̶ ̶ ̶ ̶ ̶-̶ ̶
̶f̶r̶e̶s̶h̶_̶c̶h̶e̶c̶k̶s̶
̶ ̶a̶r̶e̶ ̶t̶h̶e̶ ̶n̶u̶m̶b̶e̶r̶ ̶o̶f̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶i̶t̶h̶ ̶a̶ ̶c̶h̶e̶c̶k̶,̶ ̶m̶i̶n̶u̶s̶ ̶t̶h̶e̶ ̶n̶u̶m̶b̶e̶r̶ ̶w̶i̶t̶h̶ ̶o̶u̶t̶d̶a̶t̶e̶d̶ ̶c̶h̶e̶c̶k̶s̶ ̶(̶o̶u̶t̶d̶a̶t̶e̶d̶ ̶c̶h̶e̶c̶k̶s̶ ̶a̶r̶e̶ ̶n̶o̶w̶ ̶c̶o̶u̶n̶t̶e̶d̶ ̶d̶i̶f̶f̶e̶r̶e̶n̶t̶l̶y̶,̶ ̶d̶e̶p̶e̶n̶d̶i̶n̶g̶ ̶o̶n̶ ̶t̶h̶e̶ ̶d̶e̶l̶a̶y̶s̶)̶