Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parsing of SSH config file for verdi computer configure ssh #4100

Open
ConradJohnston opened this issue May 21, 2020 · 10 comments
Open
Assignees

Comments

@ConradJohnston
Copy link
Contributor

Related in spirit to issue #3311, and to the "UI Improvements" project.

Currently, when using verdi computer configure ssh to configure an SSH connection, the SSH config file, if present, is parsed by Paramiko and verdi offers some suggestions for populating the configuration options.

A typical stanza as currently supported might look like this:

Host best-hpc.phy.uni.ch
    User Conrad
    IdentityFile ~/.ssh/best-hpc.key

One limitation of the current implementation is that one must specify the fully qualified hostname as the Host and anything specified using the Hostname keyword is only used to substitute %h in SSH proxy commands, as I understand it. This is frustrating, as one typically uses the SSH Config to do something like:

Host BestHPC
    Hostname best-hpc.phy.uni.ch
    User Conrad
    IdentityFile ~/.ssh/best-hpc.key

where Host is used to define convenience aliases.

If this more canonical approach was supported it would reduce frustration, but also allow semi-convenient work arounds for locked down supercomputers (see #3929) like:

Host BestHPCwith2FA
    Hostname localhost
    User Conrad
    Port 22000

where one opens an SSH connection separately, supplies some 2FA, and redirects this connection to some local port, say 22000, and then points AiiDA to this localhost port, using a nice alias.

@chrisjsewell
Copy link
Member

Sounds reasonable I think. On a semi-related note, I was recently messing around with a bit recently with automating this configuration when building VMs in https://github.com/chrisjsewell/ansible-role-aiida-computer

@giovannipizzi
Copy link
Member

Indeed, I also had the same experience as you had @ConradJohnston
Any suggestion on how to allow the parser to do the right job?
Should we check the hostname? (Doesn't seem enough in the last example) - and if we do it, what should we do if there are multiple matches?

Or should we ask the user to pick from one of the "valid" entries, or maybe just ask to provide a new name, then print out the raw content of the .ssh/config file, and if they are happy, continue with those defaults? (Note that the Host could even just be Best* so we should also deal with these wildcard cases)

Or any other suggestions?

@ltalirz
Copy link
Member

ltalirz commented Jan 14, 2021

Personally, I've always found the fact that AiiDA tries to parse the .ssh/config file more confusing than helpful - as a user, it was never quite clear to me which information it takes from there and whether the config still has some effect after the computer was configured (and I just lost more than an hour because I did not know/remember that you cannot specify %h/%p in verdi computer ssh config - although it is documented).

Since the documentation anyhow starts by setting up your ~/.ssh/config, what speaks against storing in the AiiDA computer only the Host to use and let paramiko or some other SSH library take care of parsing the config?

Major benefits I would see:

  • separation of concerns: let ssh libraries deal with ssh configuration; relieves us from having to deal with parsing stuff ourselves and keeping up (e.g. I'd now prefer to use the more user friendly ProxyJump directive, but AiiDA does not support it)
  • if ssh <host> works, then AiiDA will be able to connect as well => only one thing to get working, easier to test/inspect/change

Potential downsides

  • less control over defaults

P.S. After looking a bit into it, I realized that while paramiko provides parsers for the SSH config, it does not seem to help you with e.g. automatically parsing the information for the proxy server as well, i.e. it looks to me like some legwork would remain on the AiiDA side when using paramiko (?).

However, there are tools like fabric (built on top of paramiko) that do just that.

Here's an excerpt from my ssh config:

Host ela
    User tal
    HostName ela.cscs.ch
    IdentityFile ~/.ssh/cscs
Host daint
    User tal
    Hostname daint.cscs.ch
    IdentityFile ~/.ssh/cscs
    ForwardAgent yes
    Port 22
    ProxyJump ela
    ForwardX11 yes
    ForwardX11Trusted yes

And here's what I can do after pip install --upgrade fabric paramiko (note: I needed paramiko 2.7.2 for this to work):

ipython
In [1]: from fabric import Connection

In [2]: c = Connection('ela')

In [3]: c.run('hostname')
ela5.cscs.ch
Out[3]: <Result cmd='hostname' exited=0>

In [4]: c = Connection('daint')

In [5]: c.run('hostname')
daint102
Out[5]: <Result cmd='hostname' exited=0>

In [6]: c.ssh_config
Out[6]:
{'user': 'tal',
 'hostname': 'daint.cscs.ch',
 'identityfile': ['/Users/leopold/.ssh/cscs'],
 'forwardagent': 'yes',
 'port': '22',
 'proxyjump': 'ela',
 'forwardx11': 'yes',
 'forwardx11trusted': 'yes',
 'sendenv': 'LANG LC_*'}

@giovannipizzi
Copy link
Member

giovannipizzi commented Jan 14, 2021

Just one comment to clarify some points where I think there might be some misunderstanding:

For these reasons I believe we cannot yet "trust" automatic parsing of the ssh_config.
If we get to a point where we do trust it, then I would still vote to keep this automatic parsing and copy the settings as we do now in the DB (AuthInfo table), so we know which settings we are using for each computer.
Example case: you might want to connect as two different users to the same computer.

One important thing to remember:
the Host line in the ssh_config can also contain wildcards (e.g. Host daint*), so one has to be careful with parsing (and in particular the string to search for might not be the actual hostname, but just an alias - that is also the problem discussed in this issue) - I tested and fabric seems to be doing the right thing. I don't know if it's worth adding yet one more dependency (to fabric), possibly the basic parsing works also from paramiko? Even if I think there is some fabric-specific code, the config is loaded in Config().base_ssh_config._config.

In any case, again - if we have a way to correctly parse it, maybe we should just implement it rather than dropping copying the information in the AiiDA DB.

Maybe the simplest is, in verdi computer configure ssh (when run interactively), to search for the hostname, show all the settings that would be used, and ask the user if those are ok or if he/she wants to provide another host.
In your case, the default wouldn't show the correct results, so you would type daint, this will now show that it correctly parsed the various keys (forwardagent, user, hostname, ...), you would confirm, and then it would just continue as now (using those as the defaults that you can still change interactively.
Would this work?

@ltalirz
Copy link
Member

ltalirz commented Jan 18, 2021

Example case: you might want to connect as two different users to the same computer.

I haven't come across this use case before, have you?
Of course, this only applies to the case where you also want the same prepend text for both users; as soon as there are differences one anyhow needs to set up a separate computer.

My point is just that by storing all this information (which is not considered part of the provenance) in the database, we are forcing ourselves to re-implement all this logic that is already present in the ssh program itself; while resulting in a less feature-complete and less robust interface for the user.
For example, I just came across another confusing behavior, where the gotocomputer command does not make use of the Proxycommand set for the computer: it uses only the configured hostname and private key (i.e. to make it work one then has to modify the ~/.ssh/config). This is really non-obvious.

If one could simply tell AiiDA the ssh alias to connect to and let the ssh config do the rest [1], it seems to me that we would make life easier both for users and developers.

[1] Either by using ssh directly like in gotocomputer or by getting the paramiko connection via some dedicated library like fabric that does the parsing for us.

@giovannipizzi
Copy link
Member

Assuming we can use all features in the SSH config file (not obvious, probably needs fabric, which means adapting most of the transport) we could go for a transition phase where the first question is "do you want to use your user SSH config" and if yes you just provide a name (note, one downside: you might connect to a different machine than the one declared in the hostname of the computer, in this way).

Or, even better (if as I suspect this requires fabric): one should implement a ssh-fabric plugin and use this approach there. No problem in hosting it directly on aiida-core, I think, and if we realise it's much more robust we can suggest it as a default.
Feel free to give a shot to it if you want to test if the idea would work!

@giovannipizzi
Copy link
Member

I also mention here the option to look into AsyncSSH as mentioned by @dev-zero in this comment.
If one wants to look into this, I suggest implementing a new asyncssh transport plugin first.

@mbercx
Copy link
Member

mbercx commented Apr 17, 2021

In general, I fully support @ltalirz's suggestion to move towards simply storing the Host in the database, and having ssh libraries deal with the configuration. Would make life a lot easier for both beginning users and ourselves. This would indeed involve some substantial changes, and it may be better to write all of this down in AEP.

To deal with the problem raised in this issue, I was playing around with adapting the transports.plugins.ssh.parse_sshconfig() method:

def parse_sshconfig(hostname):
    """
    Parse the ``.ssh/config`` file in the home directory and return each configuration that has the given hostname.

    :param hostname: the hostname for which we want the configuration.
    :return: dict of configurations mapping each matching ``Host`` with the specified ``hostname`` to its configuration.
    """
    import paramiko
    config = paramiko.SSHConfig()
    try:
        with open(os.path.expanduser('~/.ssh/config'), encoding='utf8') as fhandle:
            config.parse(fhandle)
    except IOError:
        # No file found, so empty configuration
        pass

    matching_hosts = {}

    for host in config.get_hostnames():
        host_config = config.lookup(host)
        if host_config['hostname'] == hostname:
            matching_hosts[host] = host_config

    return matching_hosts

In an effort to deal with the possibility of multiple Hosts with the same HostName, I look up all Hosts with the corresponding HostName and return a dictionary instead of a single configuration. This will still require further changes, though. The _get_{}_suggestion_string()methods will probably need to be adapted to return this dict instead of a str, but I'm not sure where sn the transports.cli module we would add the possibility for the user to select which Host to use. The verdi computer show command also uses the transport_option_default() method, so we'll have to be careful not to mess up this command as well.

This problem would be avoided if we could set up the parsing of the .ssh/config based on the Host instead of the HostName. I think paramiko deals with wildcards as well, at least for this example config:

Host *
  ForwardX11 yes
Host daint
  HostName daint.cscs.ch
  User mbercx
  IdentityFile ~/.ssh/id_rsa-daint
  ProxyCommand ssh -W %h:%p -i ~/.ssh/id_rsa-daint [email protected]
In [1]: import os, paramiko

In [2]: config = paramiko.SSHConfig()
   ...: with open(os.path.expanduser('~/.ssh/config'), encoding='utf8') as fhandle:
   ...:     config.parse(fhandle)
   ...: 

In [3]: config.lookup('daint')
Out[3]: 
{'forwardx11': 'yes',
 'hostname': 'daint.cscs.ch',
 'user': 'mbercx',
 'identityfile': ['/home/mbercx/.ssh/id_rsa-daint'],
 'proxycommand': 'ssh -W daint.cscs.ch:22 -i /home/mbercx/.ssh/id_rsa-daint [email protected]'}

Or is there some other issue I'm missing?

To deal with this issue in a minimal way, perhaps we can add an option to verdi computer configure ssh to load the ssh config from the .ssh/config file based on a specified Host. Then the user could just do e.g.:

verdi computer configure ssh --load-from-system-host daint

However, I still think that @ltalirz's suggestion is the best solution long term, if we can hash out @giovannipizzi's concerns. I.e. instead of using a hostname during computer setup, we just set a host, and if the transport is ssh and this host is present in the .ssh/config file, the user doesn't need to do the configure step. If it isn't present, either the command fails or the user is warned that the Host is missing from the .ssh/config file.

@dev-zero
Copy link
Contributor

ProxyJump support is here: #4951

@ConradJohnston
Copy link
Contributor Author

ProxyJump support is here: #4951

Heroic stuff, Tiziano! 🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants