GESIS BinderHub server was accumulating Running pods that were more than 1 day old #2686
OVH is seeing this, too. I suspect it's a recent update to jupyterhub/zero-to-jupyterhub that's causing something to get missed. Two categories of problem to track down:
Thanks for the information about OVH.
I've looked through some logs, and OVH definitely has quite a few orphaned pods. So I think a change in kubespawner is making it possible to leave orphaned pods, likely by failing to clean up after a failed start (hard to say precisely, because OVH has no log retention, so we can only look back into the very recent past).

OVH is also showing occasional reflector failure events, which may well be related, because deleting a pod that is not in the reflector will skip the deletion. Unfortunately, JupyterHub doesn't give Spawners a hook for finding orphaned resources.

Here's a notebook to collect, view, and (optionally) clean up orphaned pods on a cluster.
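The gist of such a cleanup is simple: list the singleuser pods, keep the ones whose start time is older than some cutoff, and optionally delete them. Here is a minimal sketch of that logic using the `kubernetes` Python client. The namespace, label selector, and one-day cutoff are assumptions for illustration, not the actual notebook's settings.

```python
# Hedged sketch: find (and optionally delete) singleuser pods older than a
# cutoff. Namespace/label selector are assumed values; adjust per deployment.
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=1)


def is_stale(start_time, now=None, max_age=MAX_AGE):
    """Return True if a pod started more than max_age ago (or never started)."""
    if start_time is None:  # pod never reached Running; treat as stale
        return True
    now = now or datetime.now(timezone.utc)
    return now - start_time > max_age


def find_stale_pods(namespace="binder", selector="component=singleuser-server"):
    # Requires the `kubernetes` package and cluster credentials.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector=selector)
    return [p for p in pods.items if is_stale(p.status.start_time)]


if __name__ == "__main__":
    for pod in find_stale_pods():
        print(pod.metadata.name, pod.status.start_time)
        # Uncomment to actually clean up:
        # from kubernetes import client
        # client.CoreV1Api().delete_namespaced_pod(
        #     pod.metadata.name, pod.metadata.namespace)
```

Keeping the age check as a pure function makes it easy to test without a cluster; the API calls are isolated in `find_stale_pods`.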
Around 2023-06-21 17:15 CEST, we launched a cron job to work around this problem.
Further investigation is needed to find the root cause.
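For reference, a recurring cleanup like the one described above could be run in-cluster as a Kubernetes CronJob. This is a hypothetical sketch, not the actual GESIS job: the schedule, service account, image, and arguments are all assumed placeholders, and the service account would need RBAC permission to list and delete pods.

```yaml
# Hypothetical CronJob sketch; image and serviceAccountName are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cull-stale-user-pods
spec:
  schedule: "0 * * * *"        # hourly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-culler   # assumed; needs list/delete on pods
          restartPolicy: OnFailure
          containers:
          - name: culler
            image: example.org/pod-culler:latest   # assumed cleanup-script image
            args: ["--max-age=1d", "--namespace=binder"]
```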