Bug description
Our Zero-to-JupyterHub 2.0.0 instance running on GKE (1.24.5-gke.600) has failed to start an instance with this error message:
Multi-Attach error for volume "pvc-27f9f2c0-f18d-449f-a9c3-b0d909bd0ea0"
Volume is already exclusively attached to one node and can't be attached to another
The JupyterHub admin interface does not show the instance as running, but looking in GKE, the pod is actually still running days after the incident. So the pod was leaked. (We actually have 4 cases of this.)
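For reference, this is roughly how the leaked pods can be spotted with the kubernetes Python client: list the singleuser pods in the hub namespace together with the PVCs they mount, and compare against what the Hub thinks is running. A minimal sketch, assuming the default z2jh labels and our "jh" namespace:

# Sketch: list singleuser pods with their node, phase and mounted PVCs,
# to spot pods that still hold a volume but are unknown to the Hub.
# Assumes the default z2jh label "component=singleuser-server" and namespace "jh".
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()
pods = v1.list_namespaced_pod("jh", label_selector="component=singleuser-server")
for pod in pods.items:
    claims = [
        v.persistent_volume_claim.claim_name
        for v in (pod.spec.volumes or [])
        if v.persistent_volume_claim
    ]
    print(pod.metadata.name, pod.spec.node_name, pod.status.phase, claims)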
In the JupyterHub logs I see the following for this user:
[I 2022-12-12 09:15:45.005 JupyterHub roles:238] Adding role user for User: XXXXXX@mail.tau.ac.il
[I 2022-12-12 09:15:45.021 JupyterHub base:810] User logged in: XXXXXX@mail.tau.ac.il
[I 2022-12-12 09:15:45.192 JupyterHub log:186] 302 GET /hub/ -> /hub/spawn (XXXXXX@mail.tau.ac.il@10.128.0.69) 13.12ms
[I 2022-12-12 09:15:45.367 JupyterHub provider:651] Creating oauth client jupyterhub-user-XXXXXX%40mail.tau.ac.il
[I 2022-12-12 09:15:45.387 JupyterHub spawner:2509] Attempting to create pvc claim-XXXXXX-40mail-2etau-2eac-2eil, with timeout 3
[I 2022-12-12 09:15:45.389 JupyterHub log:186] 302 GET /hub/spawn -> /hub/spawn-pending/XXXXXX@mail.tau.ac.il (XXXXXX@mail.tau.ac.il@10.128.0.69) 38.38ms
[I 2022-12-12 09:15:45.423 JupyterHub spawner:2469] Attempting to create pod jupyter-XXXXXX-40mail-2etau-2eac-2eil, with timeout 3
[I 2022-12-12 09:15:45.550 JupyterHub pages:394] XXXXXX@mail.tau.ac.il is pending spawn
[I 2022-12-12 09:15:45.552 JupyterHub log:186] 200 GET /hub/spawn-pending/XXXXXX@mail.tau.ac.il (XXXXXX@mail.tau.ac.il@10.128.0.69) 4.76ms
[I 2022-12-12 09:16:24.943 JupyterHub base:963] User XXXXXX@mail.tau.ac.il took 39.588 seconds to start
[I 2022-12-12 09:16:24.943 JupyterHub proxy:333] Adding user XXXXXX@mail.tau.ac.il to proxy /user/XXXXXX@mail.tau.ac.il/ => http://10.108.1.31:8888
[I 2022-12-12 09:16:24.947 JupyterHub users:749] Server XXXXXX@mail.tau.ac.il is ready
[I 2022-12-12 09:16:24.948 JupyterHub log:186] 200 GET /hub/api/users/XXXXXX@mail.tau.ac.il/server/progress (XXXXXX@mail.tau.ac.il@10.128.0.69) 38939.51ms
[I 2022-12-12 09:16:25.114 JupyterHub log:186] 302 GET /hub/spawn-pending/XXXXXX@mail.tau.ac.il -> /user/XXXXXX@mail.tau.ac.il/ (XXXXXX@mail.tau.ac.il@10.128.0.69) 4.88ms
[I 221212 10:16:40 __init__:236] Culling server XXXXXX@mail.tau.ac.il (inactive for 01:00:14)
[I 2022-12-12 10:16:40.036 JupyterHub proxy:359] Removing user XXXXXX@mail.tau.ac.il from proxy (/user/XXXXXX@mail.tau.ac.il/)
[I 2022-12-12 10:16:40.039 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 10:16:43.062 JupyterHub base:1170] User XXXXXX@mail.tau.ac.il server took 3.025 seconds to stop
[I 2022-12-12 10:16:43.063 JupyterHub log:186] 204 DELETE /hub/api/users/XXXXXX%40mail.tau.ac.il/server ([email protected]) 3029.02ms
[I 2022-12-12 15:28:20.375 JupyterHub log:186] 302 GET /user/XXXXXX@mail.tau.ac.il/lab -> /hub/user/XXXXXX@mail.tau.ac.il/lab (@10.128.0.69) 1.01ms
[W 2022-12-12 15:28:20.593 JupyterHub log:186] 424 GET /hub/user/XXXXXX@mail.tau.ac.il/lab (XXXXXX@mail.tau.ac.il@10.128.0.69) 14.60ms
[I 2022-12-12 15:28:23.322 JupyterHub provider:651] Creating oauth client jupyterhub-user-XXXXXX%40mail.tau.ac.il
[I 2022-12-12 15:28:23.346 JupyterHub spawner:2509] Attempting to create pvc claim-XXXXXX-40mail-2etau-2eac-2eil, with timeout 3
[I 2022-12-12 15:28:23.349 JupyterHub log:186] 302 GET /hub/spawn/XXXXXX@mail.tau.ac.il?next=%2Fhub%2Fuser%2FXXXXXX%40mail.tau.ac.il%2Flab -> /hub/spawn-pending/XXXXXX@mail.tau.ac.il?next=%2Fhub%2Fuser%2FXXXXXX%40mail.tau.ac.il%2Flab (XXXXXX@mail.tau.ac.il@10.128.0.69) 46.38ms
[I 2022-12-12 15:28:23.363 JupyterHub spawner:2525] PVC claim-XXXXXX-40mail-2etau-2eac-2eil already exists, so did not create new pvc.
[I 2022-12-12 15:28:23.370 JupyterHub spawner:2469] Attempting to create pod jupyter-XXXXXX-40mail-2etau-2eac-2eil, with timeout 3
[I 2022-12-12 15:28:23.553 JupyterHub pages:394] XXXXXX@mail.tau.ac.il is pending spawn
[I 2022-12-12 15:28:23.556 JupyterHub log:186] 200 GET /hub/spawn-pending/XXXXXX@mail.tau.ac.il?next=%2Fhub%2Fuser%2FXXXXXX%40mail.tau.ac.il%2Flab (XXXXXX@mail.tau.ac.il@10.128.0.69) 9.76ms
[I 2022-12-12 15:31:36.825 JupyterHub log:186] 302 GET /hub/ -> /user/XXXXXX@mail.tau.ac.il/ (XXXXXX@mail.tau.ac.il@10.128.0.69) 8.11ms
[I 2022-12-12 15:31:39.064 JupyterHub log:186] 302 GET /user/XXXXXX@mail.tau.ac.il/files/Go%20to%20LynxKite.html -> /hub/user/XXXXXX@mail.tau.ac.il/files/Go%20to%20LynxKite.html (@10.128.0.69) 1.08ms
[I 2022-12-12 15:31:39.279 JupyterHub log:186] 303 GET /hub/user/XXXXXX@mail.tau.ac.il/files/Go%20to%20LynxKite.html (XXXXXX@mail.tau.ac.il@10.128.0.69) 4.71ms
[I 2022-12-12 15:31:39.502 JupyterHub pages:394] XXXXXX@mail.tau.ac.il is pending spawn
[I 2022-12-12 15:31:39.505 JupyterHub log:186] 200 GET /hub/spawn-pending/XXXXXX@mail.tau.ac.il?next=%2Fhub%2Fuser%2FXXXXXX%40mail.tau.ac.il%2Ffiles%2FGo%2520to%2520LynxKite.html (XXXXXX@mail.tau.ac.il@10.128.0.69) 7.97ms
[W 2022-12-12 15:33:23.346 JupyterHub user:824] XXXXXX@mail.tau.ac.il's server failed to start in 300 seconds, giving up.
[I 2022-12-12 15:33:23.347 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:33:26.518 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:33:29.865 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:33:33.638 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:33:37.538 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:33:41.187 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[E 2022-12-12 15:33:41.229 JupyterHub gen:630] Exception in Future <Task finished name='Task-6089322' coro=<KubeSpawner._start() done, defined at /usr/local/lib/python3.9/site-packages/kubespawner/spawner.py:2621> exception=TimeoutError('pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil did not start in 300 seconds!')> after timeout
asyncio.exceptions.TimeoutError: pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil did not start in 300 seconds!
[I 2022-12-12 15:33:44.551 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:33:44.568 JupyterHub pages:394] XXXXXX@mail.tau.ac.il is pending stop
[I 2022-12-12 15:33:44.570 JupyterHub log:186] 200 GET /hub/spawn-pending/XXXXXX@mail.tau.ac.il?next=%2Fhub%2Fuser%2FXXXXXX%40mail.tau.ac.il%2Flab (XXXXXX@mail.tau.ac.il@10.128.0.69) 11.14ms
[E 2022-12-12 15:34:50.986 JupyterHub app:2461] Failed to poll spawner for XXXXXX@mail.tau.ac.il, assuming the spawner is not running.
[W 2022-12-12 15:34:50.987 JupyterHub app:2512] XXXXXX@mail.tau.ac.il appears to have stopped while the Hub was down
[W 2022-12-12 15:41:49.964 JupyterHub user:881] XXXXXX@mail.tau.ac.il's server never showed up at http://hub-65fd484dc-cbx7j:60937/user/XXXXXX@mail.tau.ac.il/ after 300 seconds. Giving up.
[W 2022-12-12 15:41:49.965 JupyterHub spawner:2258] Pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil url changed! hub-65fd484dc-cbx7j:60937 -> 10.108.20.3:8888
[I 2022-12-12 15:41:49.974 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:41:53.017 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:41:56.367 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:42:00.034 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
[I 2022-12-12 15:42:03.956 JupyterHub spawner:2780] Deleting pod jh/jupyter-XXXXXX-40mail-2etau-2eac-2eil
(I've replaced the user name with XXXXXX. Sorry for the long block.)
The user's instance first started normally at 9:15. It was shut down for inactivity at 10:16. At 15:28 the user came back and JupyterHub tried to start the instance again. Startup failed, and JupyterHub tried to delete the pod at 15:33. It kept trying for about 10 minutes, but halfway through it decided that the server "appears to have stopped while the Hub was down".
The Hub was down? Yes, it looks like it went down at 15:33:
[C 2022-12-12 15:33:43.937 JupyterHub base:1037] Aborting due to 5 consecutive spawn failures
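If I understand it right, this is JupyterHub's consecutive-failure safeguard (the Spawner.consecutive_failure_limit setting, exposed by the Helm chart, I believe, as hub.consecutiveFailureLimit; it is evidently 5 here): after 5 failed spawns in a row the Hub exits on purpose so that Kubernetes restarts the hub pod. A sketch of how it could be tuned from hub.extraConfig, in case that matters here; the trait name is my assumption from the docs:

# Sketch (goes into hub.extraConfig / jupyterhub_config.py): raise the limit,
# or (I believe) set it to 0 to disable the deliberate Hub restart on repeated spawn failures.
c.Spawner.consecutive_failure_limit = 10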
My guess is that the restart glitched the cleanup process and the instance was left running. The next time the user tried to start their server, they got the "Volume is already exclusively attached" error because the volume was still attached to the leaked instance.
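For the affected users, the workaround seems to be simply deleting the leaked pod by hand so the disk gets detached and the next spawn can attach it again. Roughly, with the pod name taken from the log above:

# Sketch: delete the leaked singleuser pod so its persistent disk is released.
from kubernetes import client, config

config.load_kube_config()
client.CoreV1Api().delete_namespaced_pod(
    name="jupyter-XXXXXX-40mail-2etau-2eac-2eil", namespace="jh"
)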
Expected behaviour
The startup failure may be my fault, or maybe GKE just had a slow day. But JupyterHub shouldn't leave pods running.
How to reproduce
Seems very difficult. I have saved the JupyterHub logs from the incident. They include some personal email addresses, but I can share them privately.
Your personal set up
It's a normal Zero-to-JupyterHub on GKE.
Thanks for Zero-to-JupyterHub! It's the best!
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋