O+M 2024-06-24 #4799
Comments
New Relic shows that catalog-web has performed better since #4708. For example:
Transaction time dropped from ~1100 ms to ~400 ms.
Apdex score: [chart]
Web request timeout percentage dropped from ~1% to ~0.1%.
Postgres tracking_summary SELECT query throughput: [chart]
The tracking_summary SELECT query dropped from first to sixth in the most-time-consuming query ranking.
Set the following harvest source to manual schedule until source url is fixed.
Since June 5, 2024, our harvest agent has been blocked by the Institute of Museum and Library Services' web server, so the harvest source /harvest/imls-json cannot be harvested.
Reached out to the contact addresses I have for IMLS.
Since 2024-06-25 03:50 EDT, Googlebot has been sending nonsense traffic to catalog, doubling the total requests catalog receives and doubling catalog-web CPU usage. If this trend continues, we might have to block certain traffic based on the request pattern. Details are in the Slack discussion.
Reduced prod catalog-web instances from 5 to 3, and memory from 850M to 800M. The following two PRs save us 2050M of memory.
Changed harvest sources back to their original schedules for those that were paused due to repeated ParentNotHarvestedException errors.
As part of day-to-day operation of Data.gov, there are many Operation and Maintenance (O&M) responsibilities. Instead of having the entire team watching notifications and risking some notifications slipping through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role which rotates each sprint. This is not meant to be a 24/7 responsibility, only East Coast business hours. If you are unavailable, please note when you will be unavailable in Slack and ask for someone to take on the role for that time.
Check the O&M Rotation Schedule for future planning.
Acceptance criteria
You are responsible for all O&M responsibilities this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.
Daily Checklist
Weekly Checklist
Monthly Checklist
Ad-hoc Checklist
Reference