Improve logging and handling of transient 500 errors #1563
Labels: bug, devops, logs and monitoring, mysql
From time to time, API server processes can get into a strange state where their db connections seem to be closed or otherwise invalid, and this causes 500 errors. Thankfully, this can be remedied by restarting the API server jobs, but it would be better if we could handle it automatically.
We cannot (yet!) explain what gets us into this condition, but we can identify it through log entries emitted at `delphi-epidata/src/server/_common.py`, line 166 (commit `17ce59d`) ... Those log entries include a long stack trace that boils down to:

`AttributeError: 'NoneType' object has no attribute 'cursor'`

We should do a couple of additional things where we currently log that:
- Log the state of the db connection object (`g.db` here), with fields for its `.closed` and `.invalidated` attributes. The `.info` and `.connection` attributes might prove useful too.
- If it turns out that the db connections in these cases are indeed closed (and perhaps even if not), we can check for a closed connection in the `_get_db()` convenience method and then re-open it.

Also, almost all of the instances where `DatabaseException` is raised are in small `try` blocks around `.run_query()` calls. We should just have `run_query` raise the exception itself (here) and remove the `try` blocks in those instances.
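The `run_query` change could look roughly like this: the conversion to `DatabaseException` happens once, inside `run_query`, and callers simply let it propagate. `DatabaseException`, the `execute` callable and this `run_query` signature are hypothetical stand-ins; the server's real function takes different arguments. Catching every `Exception` is a simplification for the sketch; a real version would more likely catch only the database driver's error classes.

```python
from typing import Any, Callable, List, Optional


class DatabaseException(Exception):
    """Stand-in for the server's DatabaseException (hypothetical definition)."""


# Hypothetical signature: `execute` runs SQL and returns rows; the real
# run_query() in the server takes different arguments.
Executor = Callable[[str, dict], List[Any]]


def run_query(execute: Executor, query: str, params: Optional[dict] = None) -> List[Any]:
    """Run a query and convert any failure into a DatabaseException.

    Raising here, in one place, lets callers drop their own small
    try/except blocks around run_query().
    """
    try:
        return execute(query, params or {})
    except DatabaseException:
        raise  # already the right type; don't wrap it twice
    except Exception as e:
        # Sketch only: real code would probably catch the driver's error
        # classes rather than every Exception.
        raise DatabaseException(f"query failed: {e}") from e


def handler(execute: Executor) -> List[Any]:
    # Callers no longer wrap run_query in try/except; any DatabaseException
    # simply propagates to whatever error handling sits above them.
    return run_query(execute, "SELECT 1")
```

Chaining with `raise ... from e` keeps the original stack trace (for example the `AttributeError` quoted in this issue) attached as `__cause__`, so the logs lose nothing when the per-call `try` blocks are removed.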