Datalab won't connect to VM instance after long time waiting for it to be reachable at port 8081 #2124

miguel2488 · 2019-03-15T12:35:54Z

Hi,

I've been working on this for days and have read a lot on Google about this issue, but I couldn't find anything that helped me solve it.

I created a Datalab instance via the gcloud shell like this:

datalab create --image-name c2-deeplearning-tf-1-13-cu100-20190227 --disk-size-gb 100 --machine-type n1-standard-8 my-instance --network-name my-net-01 --zone europe-west1-b

It all works fine: I'm asked to create a passphrase, the RSA keys are propagated, and then I get this message of death:

Waiting for Datalab to be reachable at http://localhost:8081/

I can SSH to the vm instance using the button to the right, or using gcloud compute ssh instance. No problems with that.

Running the datalab connect command with --ssh-log-level=debug, I get thousands of messages like this one:

It walks through all the ports trying to connect to port 8081 but never succeeds, so finally, after a long wait, I get this message:

connection closed
attempting to reconnect

and the whole process starts again from the beginning.

This is a screenshot of my firewall rules:

I think everything is OK here. What am I missing? Where is the problem? Can someone help, please? I've been stuck here for over a week now; any help will be much appreciated.

Thank you very much in advance.


antellgc · 2019-04-10T22:37:50Z

Having the same problems here. @miguel2488 have you had any luck with a fix?

miguel2488 · 2019-04-11T07:07:13Z

Nope, nothing new here. I wasn't able to fix it, since I don't have a clue where the problem is coming from. Instead of using Datalab, I resigned myself to running Jupyter notebooks on the machine. I'm totally blind with this and, from what I've seen so far, no one seems to care about this thread. I wish you better luck.

hacktuarial · 2019-06-05T21:44:05Z

I had the same problem, and observed that the container running jupyter on the VM took ~5 minutes to start up. My workaround was:

1. datalab create ... --ssh-log-level=debug
2. wait for the Connection refused messages to begin
3. CTRL+C to kill it
4. gcloud compute ssh ..., then run docker ps every 1-2 minutes until the logger and datalab containers appear
5. datalab connect ...

Then I was able to use datalab in the normal way.
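
The polling step in this workaround could be scripted roughly as follows. This is only a sketch: wait_for_container is a hypothetical helper, not part of the datalab or gcloud tooling, and it assumes the container really is named "datalab" as described above. It is meant to be run on the VM (after gcloud compute ssh ...), and docker may need a sudo prefix depending on the user.

```shell
# Sketch: poll `docker ps` until a container with the given name is running.
# Assumption: the Datalab container shows up as "datalab" in the NAMES column.
# wait_for_container is a hypothetical helper, not a datalab/gcloud command.

wait_for_container() {
  local name="$1"             # container name to wait for, e.g. "datalab"
  local interval="${2:-60}"   # seconds to sleep between checks
  local max_tries="${3:-30}"  # give up after this many checks
  local i=1

  while [ "$i" -le "$max_tries" ]; do
    # --format '{{.Names}}' prints one running container name per line;
    # grep -x only matches a whole line, so "datalab" will not match "datalab-x".
    if docker ps --format '{{.Names}}' | grep -qx -- "$name"; then
      echo "container '$name' is up after $i check(s)"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done

  echo "container '$name' did not appear after $max_tries checks" >&2
  return 1
}
```

For example, wait_for_container datalab 90 20 checks every 90 seconds for up to 30 minutes; once it returns successfully, datalab connect ... can be retried from the local machine.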

MchlUh · 2020-02-05T18:06:40Z

Hello hacktuarial,
I have the same issue and tried your solution.
The datalab container never appears for me.
Did you simply run gcloud compute ssh ... (name of instance)?
Thanks for your help !

hacktuarial · 2020-02-05T18:17:53Z

Yes, that's what I ran. Can you post a sample of your ssh logs? It sounds like the problem may be with the datalab create command.

MchlUh · 2020-02-05T18:29:17Z

I had been using a datalab connect ... command until now, and have now tried again with datalab create ....
It works exactly as you said: the logger and datalab containers appeared!

It may have something to do with the way I created my instance in the first place. At the time, I used:
datalab beta create-gpu datalab-instance-name

Anyway, I am now able to use Datalab !
Thanks :)

MchlUh · 2020-02-05T22:25:34Z

It seems that when creating an instance with a GPU, the same problem appears, but this solution does not apply.
I created the instance an hour ago, and docker ps still shows only the logger container, with no datalab container.

chanyou0311 · 2020-02-11T08:12:11Z

I have a similar problem to @MichaelTheBrute.

I tried to launch an instance of Datalab with the command below.

$ datalab beta create-gpu --machine-type n1-standard-4 --zone us-west1-b --accelerator-type nvidia-tesla-k80 --accelerator-count 1 datalab-instance
By accepting below, you will download and install the
following third-party software onto your managed GCE instances:
    NVidia GPU Driver: NVIDIA-Linux-x86_64-390.46
Do you accept (y/N)?: y
Creating the disk datalab-instance-pd
Creating the instance datalab-instance

Due to GPU Driver installation, please note that Datalab GPU instances take significantly longer to startup compared to non-GPU instances.
Created [https://www.googleapis.com/compute/beta/projects/xxxxxxxx/zones/us-west1-b/instances/datalab-instance].
Connecting to datalab-instance.
This will create an SSH tunnel and may prompt you to create an rsa key pair. To manage these keys, see https://cloud.google.com/compute/docs/instances/adding-removing-ssh-keys
Waiting for Datalab to be reachable at http://localhost:8081/

However, there was no response after more than 30 minutes.
I had seen information saying that startup takes about 15 minutes, but this still seemed too long.

I made an ssh connection to the instance and started investigating.
As discussed before, I also ran the docker ps command.

datalab@datalab-instance ~ $ sudo docker ps -a
CONTAINER ID        IMAGE                                         COMMAND                  CREATED             STATUS              PORTS               NAMES
4994361cf048        gcr.io/google-containers/fluentd-gcp:2.0.17   "/bin/sh -c '/run.sh…"   19 minutes ago      Up 19 minutes       80/tcp              logger

The datalab container was not running.
However, when I ran the same command a few minutes later, I saw the gcr.io/cloud-datalab/datalab-gpu:latest image, but only that one time.
(I forgot to take notes.)
Since then, I have never seen the container again.

Since the CPU instance worked correctly, I thought the cause might be that the GPU was not set up correctly.
The GPU setup seems to be done in the startup script, so I checked that the script finished successfully.

datalab@datalab-instance ~ $ systemctl status google-startup-scripts.service
● google-startup-scripts.service - Google Compute Engine Startup Scripts
   Loaded: loaded (/usr/lib/systemd/system/google-startup-scripts.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2020-02-11 07:27:30 UTC; 34min ago
 Main PID: 421 (code=exited, status=0/SUCCESS)
      CPU: 881ms

I checked the log with the journalctl command, and the startup script seemed to have finished successfully.

In the process, I noticed that wait-for-startup-script.service had not finished properly.

datalab@datalab-instance ~ $ systemctl --failed
  UNIT                            LOAD   ACTIVE SUB    DESCRIPTION
● wait-for-startup-script.service loaded failed failed Wait for the startup script to setup required directories

datalab@datalab-instance ~ $ sudo journalctl -u wait-for-startup-script.service
-- Logs begin at Tue 2020-02-11 06:59:19 UTC, end at Tue 2020-02-11 08:05:27 UTC. --
Feb 11 06:59:34 datalab-instance systemd[1]: Starting Wait for the startup script to setup required directories...
Feb 11 06:59:34 datalab-instance docker-credential-gcr[768]: ERROR: Unable to save docker config: mkdir /root/.docker: read-only file system
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Control process exited, code=exited status=1
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Failed with result 'exit-code'.
Feb 11 06:59:34 datalab-instance systemd[1]: Failed to start Wait for the startup script to setup required directories.
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Consumed 82ms CPU time
Feb 11 06:59:34 datalab-instance systemd[1]: Starting Wait for the startup script to setup required directories...
Feb 11 06:59:34 datalab-instance docker-credential-gcr[792]: ERROR: Unable to save docker config: mkdir /root/.docker: read-only file system
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Control process exited, code=exited status=1
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Failed with result 'exit-code'.
Feb 11 06:59:34 datalab-instance systemd[1]: Failed to start Wait for the startup script to setup required directories.
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Consumed 94ms CPU time

The log confirms that an error occurred in docker-credential-gcr.
I don't understand what this means for the startup script, but I hope it helps.
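
The two manual steps above (listing failed units, then reading the journal for each one) can be combined in a small script. This is a minimal sketch, not something taken from the Datalab tooling: failed_units and show_failed_logs are hypothetical helper names, and journalctl may need sudo, as it did in the session above.

```shell
# Sketch: show the recent journal for every failed systemd unit.
# failed_units and show_failed_logs are hypothetical helpers for illustration.

failed_units() {
  # --plain and --no-legend drop the bullet markers, header and footer, so
  # each line starts with the unit name. The awk guard also copes with
  # output that still carries a leading bullet.
  systemctl --failed --plain --no-legend |
    awk '{ if ($1 == "●") print $2; else print $1 }'
}

show_failed_logs() {
  local lines="${1:-20}"   # number of journal lines to show per unit
  local unit

  for unit in $(failed_units); do
    echo "=== $unit ==="
    # -u selects the unit, -n limits the output to the last N lines,
    # --no-pager prints directly instead of opening an interactive pager.
    journalctl -u "$unit" -n "$lines" --no-pager
  done
}
```

Running show_failed_logs 50 on the VM prints the last 50 journal lines for each failed unit, which on the instance above would include wait-for-startup-script.service.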

I will continue to investigate.

chanyou0311 · 2020-02-11T08:17:06Z

This may be related to pull request #2147.
