Critical Tasks & Crons should log + alert when interrupted #10206
Labels
Affects: Operations
Affects: Server
Lead: @jimchamp
Needs: Response
Needs: Staff / Internal
Priority: 2
Type: Epic
Type: Feature Request
Proposal
Now
Later
web logs
cover archival
solr updater
Know that the cron / dump / service started (or was triggered)
Be alerted when a cron / dump fails or succeeds
Historical view of how often failures occur (stats.inc?)
Look back at how often restarts were attempted / failed
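As a rough illustration of the requirements above, here is a minimal sketch of a wrapper that records when a task starts, records and alerts on success or failure, and keeps a history that can be summarized later. Everything in it is an assumption made for illustration: `run_tracked`, `summarize`, `interrupted_runs`, the JSON-lines history file, and the `alert` callback are hypothetical names, not existing Open Library tooling. A run that logged "started" but never logged an outcome is treated as interrupted.

```python
import json
import subprocess
import time
from collections import Counter
from pathlib import Path


def run_tracked(name, cmd, history_path, alert=print):
    """Run cmd, append start/outcome events to a JSON-lines file, and alert.

    `alert` is a hypothetical hook; in practice it might post to chat or email.
    """
    history = Path(history_path)

    def record(event, **extra):
        entry = {"task": name, "event": event, "time": time.time(), **extra}
        with history.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    # Requirement: know that the task started (or was triggered).
    record("started")
    try:
        returncode = subprocess.run(cmd).returncode
    except OSError as e:  # e.g. the command does not exist
        record("failed", error=str(e))
        alert(f"[{name}] failed to start: {e}")
        return False

    # Requirement: be alerted on both failure and success.
    if returncode == 0:
        record("succeeded")
        alert(f"[{name}] succeeded")
        return True
    record("failed", returncode=returncode)
    alert(f"[{name}] failed with exit code {returncode}")
    return False


def summarize(history_path):
    """Requirement: historical view. Count events per (task, event) pair."""
    counts = Counter()
    with open(history_path) as f:
        for line in f:
            entry = json.loads(line)
            counts[(entry["task"], entry["event"])] += 1
    return counts


def interrupted_runs(counts, name):
    """Runs that started but never recorded an outcome (killed, host reboot, ...)."""
    finished = counts[(name, "succeeded")] + counts[(name, "failed")]
    return counts[(name, "started")] - finished
```

A cron entry could then call `run_tracked("solr-restarter", [...], "/some/history.jsonl")` instead of the raw command (the task name and path here are placeholders). Writing the "started" event before launching the command is what makes interruptions detectable afterwards.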
Justification
Problem: What problem does this proposal address & for what audience(s)?
Today, when crons or other critical tasks fail, we often learn about it from patrons rather than from our own workflows.
Impact: What's the predicted impact, how do we measure, & what does success look like?
Some of our biggest sources of value (bots that clean things up for us, such as the solr restarter or the bot that fixes redirects) can go weeks without running if they break.
Breakdown
Requirements Checklist