Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve timing of checks depending on changes since last check #163

Open
wants to merge 85 commits into
base: main
Choose a base branch
from

Conversation

bolinocroustibat
Copy link
Contributor

@bolinocroustibat bolinocroustibat commented Sep 10, 2024

Closes datagouv/data.gouv.fr#1312.

  • Add CHECK_DELAYS config var: list of delays (̶p̶o̶s̶t̶g̶r̶e̶s̶q̶l̶ ̶i̶n̶t̶e̶r̶v̶a̶l̶ ̶s̶y̶n̶t̶a̶x̶)̶ between two checks, if the resource has not been modified

  • Add a migration to add a next_check_at datetime column to the checks table

  • Add logic to calculate next_check_at datetime when processing the check, depending on the date f the previous check and if it has changed since then, in a dedicated calculate_next_check_datemethod

  • Modify logic for select_batch in order to select resources that have a related last check that have an expired next check date

  • Refactor store_last_modified_date method to also store the next check date, and rename it to update_check_with_modification_and_next_dates

  • Rename process_check_data to preprocess_check_data for more clarity

[OUTDATED] First iteration using a SQL query

̶-̶ ̶R̶e̶m̶o̶v̶e̶s̶ ̶S̶Q̶L̶ ̶q̶u̶e̶r̶y̶ ̶w̶r̶a̶p̶p̶e̶r̶ ̶̶s̶e̶l̶e̶c̶t̶_̶r̶o̶w̶s̶_̶b̶a̶s̶e̶d̶_̶o̶n̶_̶q̶u̶e̶r̶y̶̶ ̶w̶h̶i̶c̶h̶ ̶w̶a̶s̶ ̶u̶s̶e̶d̶ ̶t̶o̶ ̶c̶r̶e̶a̶t̶e̶ ̶a̶ ̶t̶e̶m̶p̶o̶r̶a̶r̶y̶ ̶t̶a̶b̶l̶e̶ ̶a̶n̶d̶ ̶t̶e̶m̶p̶o̶r̶a̶r̶i̶l̶y̶ ̶m̶a̶r̶k̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶b̶e̶i̶n̶g̶ ̶c̶r̶a̶w̶l̶e̶d̶ ̶a̶s̶ ̶"̶c̶r̶a̶w̶l̶i̶n̶g̶"̶.̶
̶S̶i̶n̶c̶e̶ ̶w̶e̶ ̶u̶p̶d̶a̶t̶e̶ ̶t̶h̶e̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶ ̶s̶t̶a̶t̶u̶s̶ ̶i̶n̶ ̶t̶h̶e̶ ̶c̶h̶e̶c̶k̶ ̶c̶o̶d̶e̶ ̶e̶l̶s̶e̶w̶h̶e̶r̶e̶,̶ ̶i̶t̶ ̶s̶h̶o̶u̶l̶d̶ ̶n̶o̶t̶ ̶b̶e̶ ̶n̶e̶c̶e̶s̶s̶a̶r̶y̶ ̶a̶n̶y̶m̶o̶r̶e̶ ̶t̶o̶ ̶h̶a̶v̶e̶ ̶t̶h̶i̶s̶ ̶t̶e̶m̶p̶o̶r̶a̶r̶y̶ ̶t̶a̶b̶l̶e̶.̶
̶̶̶O̶n̶ ̶r̶e̶v̶i̶e̶w̶,̶ ̶p̶l̶e̶a̶s̶e̶ ̶d̶o̶u̶b̶l̶e̶-̶c̶h̶e̶c̶k̶ ̶t̶h̶e̶ ̶v̶a̶l̶i̶d̶i̶t̶y̶ ̶o̶f̶ ̶t̶h̶i̶s̶ ̶r̶e̶m̶o̶v̶a̶l̶̶̶

̶-̶ ̶A̶d̶d̶ ̶̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶_̶D̶E̶F̶A̶U̶L̶T̶̶ ̶c̶o̶n̶f̶i̶g̶ ̶v̶a̶r̶:̶ ̶t̶i̶m̶e̶ ̶a̶f̶t̶e̶r̶ ̶w̶h̶i̶c̶h̶ ̶a̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶ ̶m̶u̶s̶t̶ ̶b̶e̶ ̶c̶h̶e̶c̶k̶ ̶i̶f̶ ̶w̶e̶ ̶d̶o̶n̶'̶t̶ ̶k̶n̶o̶w̶ ̶a̶b̶o̶u̶t̶ ̶i̶t̶s̶ ̶l̶a̶s̶t̶ ̶m̶o̶d̶i̶f̶i̶c̶a̶t̶i̶o̶n̶ ̶d̶a̶t̶e̶ ̶(̶r̶e̶p̶l̶a̶c̶e̶s̶ ̶t̶h̶e̶ ̶p̶r̶e̶v̶i̶o̶u̶s̶ ̶̶S̶I̶N̶C̶E̶̶ ̶c̶o̶n̶f̶i̶g̶ ̶v̶a̶r̶)̶

̶-̶ ̶U̶p̶d̶a̶t̶e̶ ̶t̶h̶e̶ ̶̶s̶e̶l̶e̶c̶t̶_̶b̶a̶t̶c̶h̶_̶r̶e̶s̶o̶u̶r̶c̶e̶s̶_̶t̶o̶_̶c̶h̶e̶c̶k̶̶ ̶m̶e̶t̶h̶o̶d̶ ̶w̶i̶t̶h̶ ̶t̶h̶e̶ ̶f̶o̶l̶l̶o̶w̶i̶n̶g̶ ̶l̶o̶g̶i̶c̶:̶
̶ ̶ ̶ ̶ ̶1̶.̶ ̶F̶i̶r̶s̶t̶ ̶a̶d̶d̶s̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶i̶t̶h̶ ̶p̶r̶i̶o̶r̶i̶t̶y̶=̶T̶r̶u̶e̶ ̶t̶o̶ ̶t̶h̶e̶ ̶b̶a̶t̶c̶h̶ ̶(̶_̶t̶h̶i̶s̶ ̶d̶o̶e̶s̶n̶'̶t̶ ̶c̶h̶a̶n̶g̶e̶_̶)̶
̶ ̶ ̶ ̶ ̶2̶.̶ ̶T̶h̶e̶n̶ ̶a̶d̶d̶s̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶i̶t̶h̶o̶u̶t̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶(̶=̶w̶i̶t̶h̶ ̶n̶o̶ ̶c̶h̶e̶c̶k̶,̶ ̶n̶e̶v̶e̶r̶ ̶h̶a̶v̶e̶ ̶b̶e̶e̶n̶ ̶c̶h̶e̶c̶k̶e̶d̶)̶ ̶t̶o̶ ̶t̶h̶e̶ ̶b̶a̶t̶c̶h̶ ̶(̶_̶t̶h̶i̶s̶ ̶d̶o̶e̶s̶n̶'̶t̶ ̶c̶h̶a̶n̶g̶e̶_̶)̶
̶ ̶ ̶ ̶ ̶3̶.̶ ̶T̶h̶e̶n̶,̶ ̶i̶f̶ ̶t̶h̶e̶ ̶t̶o̶t̶a̶l̶ ̶n̶u̶m̶b̶e̶r̶ ̶o̶f̶ ̶s̶e̶l̶e̶c̶t̶e̶d̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶i̶s̶ ̶s̶t̶i̶l̶l̶ ̶l̶e̶s̶s̶ ̶t̶h̶a̶n̶ ̶t̶h̶e̶ ̶b̶a̶t̶c̶h̶ ̶s̶i̶z̶e̶:̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶3̶.̶1̶.̶ ̶a̶d̶d̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶h̶i̶c̶h̶ ̶d̶o̶n̶'̶t̶ ̶h̶a̶v̶e̶ ̶a̶t̶ ̶l̶e̶a̶s̶t̶ ̶t̶w̶o̶ ̶c̶h̶e̶c̶k̶s̶ ̶y̶e̶t̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶i̶s̶ ̶o̶l̶d̶e̶r̶ ̶t̶h̶a̶n̶ ̶̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶_̶D̶E̶F̶A̶U̶L̶T̶̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶3̶.̶2̶.̶ ̶a̶d̶d̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶h̶i̶c̶h̶ ̶a̶t̶ ̶l̶e̶a̶s̶t̶ ̶o̶n̶e̶ ̶t̶h̶e̶ ̶t̶h̶e̶i̶r̶ ̶t̶w̶o̶ ̶m̶o̶s̶t̶ ̶r̶e̶c̶e̶n̶t̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶s̶ ̶h̶a̶s̶ ̶a̶n̶ ̶u̶n̶k̶n̶o̶w̶n̶ ̶m̶o̶d̶i̶f̶i̶e̶d̶ ̶d̶a̶t̶e̶ ̶(̶n̶o̶ ̶̶d̶e̶t̶e̶c̶t̶e̶d̶_̶l̶a̶s̶t̶_̶m̶o̶d̶i̶f̶i̶e̶d̶_̶a̶t̶̶,̶ ̶s̶o̶ ̶w̶e̶ ̶c̶a̶n̶n̶o̶t̶ ̶c̶o̶m̶p̶a̶r̶e̶ ̶w̶i̶t̶h̶ ̶a̶n̶o̶t̶h̶e̶r̶ ̶c̶h̶e̶c̶k̶)̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶i̶s̶ ̶o̶l̶d̶e̶r̶ ̶t̶h̶a̶n̶ ̶̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶_̶D̶E̶F̶A̶U̶L̶T̶̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶3̶.̶3̶.̶ ̶a̶d̶d̶ ̶r̶e̶s̶s̶o̶u̶r̶c̶e̶s̶ ̶t̶o̶ ̶w̶h̶i̶c̶h̶ ̶t̶h̶e̶i̶r̶ ̶t̶w̶o̶ ̶m̶o̶s̶t̶ ̶r̶e̶c̶e̶n̶t̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶s̶ ̶h̶a̶v̶e̶ ̶c̶h̶a̶n̶g̶e̶d̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶i̶s̶ ̶o̶l̶d̶e̶r̶ ̶t̶h̶a̶n̶ ̶̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶S̶[̶0̶]̶̶ ̶(̶t̶h̶i̶s̶ ̶i̶s̶ ̶t̶o̶ ̶r̶e̶-̶c̶h̶e̶c̶k̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶t̶h̶a̶t̶ ̶w̶e̶ ̶k̶n̶o̶w̶ ̶i̶t̶ ̶h̶a̶s̶ ̶c̶h̶a̶n̶g̶e̶d̶)̶
̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶ ̶3̶.̶4̶.̶ ̶a̶d̶d̶ ̶r̶e̶s̶s̶o̶u̶r̶c̶e̶s̶ ̶t̶o̶ ̶w̶h̶i̶c̶h̶ ̶t̶h̶e̶i̶r̶ ̶l̶a̶s̶t̶ ̶t̶w̶o̶ ̶c̶h̶e̶c̶k̶s̶ ̶h̶a̶v̶e̶ ̶n̶o̶t̶ ̶c̶h̶a̶n̶g̶e̶d̶,̶ ̶t̶h̶e̶ ̶t̶w̶o̶ ̶c̶h̶e̶c̶k̶s̶ ̶h̶a̶v̶e̶ ̶d̶o̶n̶e̶ ̶b̶e̶t̶w̶e̶e̶n̶ ̶t̶w̶o̶ ̶d̶e̶l̶a̶y̶s̶ ̶i̶n̶ ̶̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶S̶̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶ ̶l̶a̶s̶t̶ ̶c̶h̶e̶c̶k̶ ̶i̶s̶ ̶o̶l̶d̶e̶r̶ ̶t̶h̶a̶n̶ ̶t̶h̶e̶ ̶s̶a̶m̶e̶ ̶d̶e̶l̶a̶y̶ ̶i̶n̶ ̶̶C̶H̶E̶C̶K̶_̶D̶E̶L̶A̶Y̶S̶̶ ̶(̶t̶h̶i̶s̶ ̶i̶s̶ ̶i̶n̶ ̶o̶r̶d̶e̶r̶ ̶t̶o̶ ̶a̶v̶o̶i̶d̶ ̶c̶h̶e̶c̶k̶i̶n̶g̶ ̶t̶o̶o̶ ̶o̶f̶t̶e̶n̶ ̶t̶h̶e̶ ̶s̶a̶m̶e̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶ ̶w̶h̶i̶c̶h̶ ̶d̶o̶e̶s̶n̶'̶t̶ ̶c̶h̶a̶n̶g̶e̶)̶ ̶
̶
̶-̶ ̶A̶d̶a̶p̶t̶ ̶t̶h̶e̶ ̶w̶a̶y̶ ̶w̶e̶ ̶c̶o̶u̶n̶t̶ ̶̶p̶e̶n̶d̶i̶n̶g̶_̶c̶h̶e̶c̶k̶s̶̶ ̶a̶n̶d̶ ̶̶f̶r̶e̶s̶h̶_̶c̶h̶e̶c̶k̶s̶̶ ̶i̶n̶ ̶t̶h̶e̶ ̶̶/̶a̶p̶i̶/̶s̶t̶a̶t̶u̶s̶/̶c̶r̶a̶w̶l̶e̶r̶̶ ̶e̶n̶d̶p̶o̶i̶n̶t̶ ̶r̶e̶s̶p̶o̶n̶s̶e̶:̶
̶ ̶ ̶ ̶ ̶-̶ ̶̶p̶e̶n̶d̶i̶n̶g̶_̶c̶h̶e̶c̶k̶s̶̶ ̶a̶r̶e̶ ̶t̶h̶e̶ ̶n̶u̶m̶b̶e̶r̶ ̶o̶f̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶i̶t̶h̶ ̶n̶o̶ ̶c̶h̶e̶c̶k̶,̶ ̶p̶l̶u̶s̶ ̶t̶h̶e̶ ̶n̶u̶m̶b̶e̶r̶ ̶w̶i̶t̶h̶ ̶o̶u̶t̶d̶a̶t̶e̶d̶ ̶c̶h̶e̶c̶k̶s̶ ̶(̶o̶u̶t̶d̶a̶t̶e̶d̶ ̶c̶h̶e̶c̶k̶s̶ ̶a̶r̶e̶ ̶n̶o̶w̶ ̶c̶o̶u̶n̶t̶e̶d̶ ̶d̶i̶f̶f̶e̶r̶e̶n̶t̶l̶y̶,̶ ̶d̶e̶p̶e̶n̶d̶i̶n̶g̶ ̶o̶n̶ ̶t̶h̶e̶ ̶d̶e̶l̶a̶y̶s̶)̶
̶ ̶ ̶ ̶ ̶ ̶-̶ ̶̶f̶r̶e̶s̶h̶_̶c̶h̶e̶c̶k̶s̶̶ ̶a̶r̶e̶ ̶t̶h̶e̶ ̶n̶u̶m̶b̶e̶r̶ ̶o̶f̶ ̶r̶e̶s̶o̶u̶r̶c̶e̶s̶ ̶w̶i̶t̶h̶ ̶a̶ ̶c̶h̶e̶c̶k̶,̶ ̶m̶i̶n̶u̶s̶ ̶t̶h̶e̶ ̶n̶u̶m̶b̶e̶r̶ ̶w̶i̶t̶h̶ ̶o̶u̶t̶d̶a̶t̶e̶d̶ ̶c̶h̶e̶c̶k̶s̶ ̶(̶o̶u̶t̶d̶a̶t̶e̶d̶ ̶c̶h̶e̶c̶k̶s̶ ̶a̶r̶e̶ ̶n̶o̶w̶ ̶c̶o̶u̶n̶t̶e̶d̶ ̶d̶i̶f̶f̶e̶r̶e̶n̶t̶l̶y̶,̶ ̶d̶e̶p̶e̶n̶d̶i̶n̶g̶ ̶o̶n̶ ̶t̶h̶e̶ ̶d̶e̶l̶a̶y̶s̶)̶

@bolinocroustibat bolinocroustibat changed the title docs: update changelog improve unavalaible resources management Sep 10, 2024
@bolinocroustibat bolinocroustibat self-assigned this Sep 10, 2024
@bolinocroustibat bolinocroustibat changed the title improve unavalaible resources management Improve unavalaible resources management Sep 11, 2024
@bolinocroustibat bolinocroustibat marked this pull request as ready for review September 12, 2024 12:10
@bolinocroustibat bolinocroustibat changed the title Improve unavalaible resources management Improve management of unavalaible resources with variable delays between two checks Sep 12, 2024
Copy link
Contributor

@magopian magopian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what was the use of the temporary table, was it to make sure that it's threadsafe?

Other than a small nit, this looks good to me ;)

udata_hydra/config_default.toml Outdated Show resolved Hide resolved
@magopian
Copy link
Contributor

Also, didn't investigate why, but it seems you've got a failing test.

@bolinocroustibat bolinocroustibat changed the title Improve management of unavalaible resources with variable delays between two checks Improve timing of checks depending on resources last modification date Sep 18, 2024
@bolinocroustibat bolinocroustibat changed the title feat: improve timing of checks depending on resources changes feat: improve timing of checks depending on resource/check changes Nov 6, 2024
@bolinocroustibat bolinocroustibat changed the title feat: improve timing of checks depending on resource/check changes feat: improve timing of checks depending on changes since last check Nov 6, 2024
Copy link
Contributor

@maudetes maudetes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR! I like the logic!

I think we may want to test this PR in our dev env and monitor the next_check_at values? Since it's quite difficult to test locally actually

Maybe we want to add some tests!

udata_hydra/analysis/resource.py Show resolved Hide resolved
AND catalog.last_check = checks.id
AND checks.created_at <= $1
AND checks.next_check_at >= $1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually nice that this logic is kept simple with the next_check_at strategy! 👏

@bolinocroustibat
Copy link
Contributor Author

Thank you for this PR! I like the logic!

I think we may want to test this PR in our dev env and monitor the next_check_at values? Since it's quite difficult to test locally actually

Maybe we want to add some tests!

Thanks for the review. I wanted to confirm the logic first, now currently fixing/adding tests - in the same PR.

@bolinocroustibat bolinocroustibat force-pushed the unavailable-resources-management branch 2 times, most recently from 5b1053d to 50c7931 Compare November 22, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[hydra] Improve management of unavalaible resources with variable delays between two checks
3 participants