Agent fails to install SSH server when running in venv/Conda #3

norrishd · 2021-07-12T05:41:36Z

I've followed fairly straightforward steps to install a ClearML agent and connect to it using clearml-session, but get the following output:

Installing SSH Server on ip-172-31-4-42 [172.31.4.42]
Unable to load host key "/home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_rsa_key.pub": invalid format
Unable to load host key: /home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_rsa_key.pub
Unable to load host key "/home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ecdsa_key.pub": invalid format
Unable to load host key: /home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ecdsa_key.pub
Unable to load host key: /home/ubuntu/.clearml/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ed25519_key.pub
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring

On the client side I then get:

Password: Error: incorrect password
Please enter password manually:

Any suggestions? Would the recommendation be to install and run the ClearML agent as root and/or with the system Python?

Steps to reproduce

On the agent:

# System: Ubuntu Focal 20.04, AMD64
# Install Miniconda, then
conda create -n clearml python=3.8
# Activate the env so that pip installs the agent into it
conda activate clearml
pip install clearml-agent
clearml-agent init
# Copy/paste credentials obtained from ClearML server
clearml-agent daemon --queue default --foreground

Then on the client:

clearml-session --public_ip true

# {
#     "base_task_id": null,
#     "git_credentials": false,
#     "jupyter_lab": true,
#     "password": "<long random-looking password>",
#     "public_ip": true,
#     "queue": "default",
#     "vscode_server": true
# }


jkhenning · 2021-07-16T08:55:11Z

Hi @norrishd,

Thanks for the details - I'll try to reproduce and update as soon as possible!

bmartinn · 2021-07-19T22:27:58Z

Hi @norrishd
When running inside a venv/Conda environment, the agent has no permission to install the SSH server.
I'm not sure how we could support that without root access.
If an SSH daemon is already installed, though, the agent should be able to spin up a second copy of it.
wdyt?
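For context, a second copy of an already-installed OpenSSH daemon can in principle run without root if it gets its own config file, host key and port. The sketch below illustrates only that idea, under stated assumptions: the directory, port 10022 and the helper names `write_sshd_config` / `start_second_sshd` are hypothetical and are not what clearml-session actually does. Running as a non-root process, such a daemon can generally only accept logins for the account it runs as.

```shell
#!/usr/bin/env bash
# Sketch only: a second, unprivileged sshd next to the system one.
# Directory, port and function names are hypothetical.

# Write a minimal sshd_config that keeps all state under $1 and listens on $2.
write_sshd_config() {
  local dir="$1" port="$2"
  mkdir -p "$dir"
  cat > "$dir/sshd_config" <<EOF
Port $port
HostKey $dir/ssh_host_ed25519_key
PidFile $dir/sshd.pid
AuthorizedKeysFile $dir/authorized_keys
EOF
}

# Generate a private host key if missing, validate the config, then run sshd
# in the foreground (-D) with logs on stderr (-e).
start_second_sshd() {
  local dir="$1"
  [ -f "$dir/ssh_host_ed25519_key" ] ||
    ssh-keygen -q -t ed25519 -N '' -f "$dir/ssh_host_ed25519_key" || return 1
  # sshd must be started via its absolute path; -t only checks config and keys.
  /usr/sbin/sshd -t -f "$dir/sshd_config" || return 1
  exec /usr/sbin/sshd -D -e -f "$dir/sshd_config"
}

# Usage (not executed here):
#   write_sshd_config "$HOME/.demo_sshd" 10022
#   start_second_sshd "$HOME/.demo_sshd"
```

Note that `HostKey` points at the private key file; the `.pub` file next to it is only the public half.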

norrishd · 2021-07-20T12:34:15Z

Thanks for the explanation, @bmartinn! So do you mean that the current version can already spin up a second SSH daemon (assuming one is installed)? If so that's very cool and would be fine (I must just be doing something wrong)

I tend to use venvs for everything just to avoid ever messing with the system Python. But I guess the use case for clearml-agent is that it's intended to run on servers (or in containers?) that are reserved for that purpose, so the recommendation is to use the system Python and install necessary packages there?

Would you also recommend running it with sudo, or does it not need that level of privileges?

bmartinn · 2021-07-21T00:07:38Z

> If so that's very cool and would be fine (I must just be doing something wrong)

Yes, at least in theory. (If this doesn't work even though /usr/sbin/sshd is preinstalled, let me know what your setup is; we might be missing something.)
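A quick way to check the precondition mentioned here, i.e. whether an SSH daemon is already preinstalled on the agent machine, might look like this. It assumes the Ubuntu/Debian default path /usr/sbin/sshd and the `openssh-server` package; `check_sshd` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report whether an executable sshd exists at the given path
# (defaults to /usr/sbin/sshd, the usual Ubuntu location).
check_sshd() {
  local sshd_path="${1:-/usr/sbin/sshd}"
  if [ -x "$sshd_path" ]; then
    echo "found"
  else
    echo "missing"
  fi
}

if [ "$(check_sshd)" = "missing" ]; then
  # Installing a system package needs root, which a venv/Conda agent usually lacks.
  echo "sshd not found; install it once as root: sudo apt-get install -y openssh-server" >&2
fi
```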

> ... But I guess the use case for clearml-agent is that it's intended to run on servers

The agent itself can be installed in a venv (even though it might be easier to install it system-wide).
The issue is the process the agent spins up. When the agent gets a job (a Task), it can either create a new temporary venv for the Task, install everything the Task needs there, spin up the process and leave; or it can spin up a container for the Task and repeat the same process (venv creation) inside the container.
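A heavily simplified illustration of the "temporary venv per Task" flow, assuming a plain `python3` with the standard `venv` module. `run_task_in_temp_venv` is a hypothetical helper, not clearml-agent's actual code; the real agent does considerably more.

```shell
#!/usr/bin/env bash
# Create a temporary venv, install the Task's packages, run it, and clean up.
run_task_in_temp_venv() {
  local script="$1" requirements="${2:-}"
  local venv_dir status
  venv_dir="$(mktemp -d)"

  if [ -n "$requirements" ]; then
    # Full venv with pip so the Task's requirements can be installed into it.
    python3 -m venv "$venv_dir" &&
      "$venv_dir/bin/pip" install -q -r "$requirements"
  else
    # Nothing to install, so skip bootstrapping pip.
    python3 -m venv --without-pip "$venv_dir"
  fi
  status=$?

  if [ "$status" -eq 0 ]; then
    # Run the Task with the venv's interpreter, isolated from the system Python.
    "$venv_dir/bin/python" "$script"
    status=$?
  fi

  # Discard the temporary environment once the Task has finished.
  rm -rf "$venv_dir"
  return "$status"
}
```

Usage would be something like `run_task_in_temp_venv train.py requirements.txt` (hypothetical file names). Everything lives under a user-owned temp directory, which is why this part works without root while installing a system SSH server does not.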
When the agent is used to spin up clearml-session, the usual setup is that the agent runs in docker mode (i.e. with the --docker flag), so it spins up all jobs inside a container, including the clearml-session remote interactive session.
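For reference, starting the agent in docker mode with the same queue as in the reproduction steps might look like the following. It assumes Docker is installed and the agent's user is allowed to run containers (for example, by being in the `docker` group); no image is passed, so the agent's configured default is assumed.

```shell
#!/usr/bin/env bash
# Run the agent in docker mode so each Task, including a clearml-session,
# is executed inside its own container.
agent_cmd=(clearml-agent daemon --queue default --docker --foreground)

if command -v clearml-agent >/dev/null 2>&1; then
  "${agent_cmd[@]}"
else
  # e.g. the agent was installed into a venv/Conda env that is not active here
  echo "clearml-agent not on PATH; activate its environment, then run: ${agent_cmd[*]}" >&2
fi
```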
Make sense?

norrishd · 2021-07-21T00:51:20Z

Yep makes total sense, thanks 😁

norrishd changed the title from "Agent fails to install SSH server when install in venv/Conda" to "Agent fails to install SSH server when running in venv/Conda" on Jul 12, 2021
