verdi run consumes CPU in case of connection error #4827

Open
dev-zero opened this issue Mar 19, 2021 · 1 comment
@dev-zero
Contributor

Describe the bug

I was running a workchain with verdi run and got disconnected from SSH (more precisely, I disconnected my smartcard, which made the private key unavailable to paramiko since there is no SSH master connection).
As expected, after some retries I got some error messages and the transport tasks were paused. At that point, verdi run started to fully consume up to 2 cores (with one daemon worker), which is less than ideal on a mobile device.

Steps to reproduce

Steps to reproduce the behavior:

  1. Run a workchain
  2. Disconnect network
  3. Wait

For completeness, the errors are:

...
    self._jobs_cache = await self._get_jobs_from_scheduler()
  File "/home/tiziano/work/aiida/aiida_core/aiida/engine/processes/calcjobs/manager.py", line 98, in _get_jobs_from_scheduler
    transport = await request
  File "/usr/lib/python3.9/asyncio/futures.py", line 284, in __await__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
    future.result()
  File "/usr/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/home/tiziano/work/aiida/aiida_core/aiida/engine/transports.py", line 89, in do_open
    transport.open()
  File "/home/tiziano/work/aiida/aiida_core/aiida/transports/plugins/ssh.py", line 438, in open
    self._client.connect(self._machine, **connection_arguments)
  File "/home/tiziano/.local/share/virtualenvs/aiida/lib/python3.9/site-packages/paramiko/client.py", line 435, in connect
    self._auth(
  File "/home/tiziano/.local/share/virtualenvs/aiida/lib/python3.9/site-packages/paramiko/client.py", line 764, in _auth
    raise saved_exception
  File "/home/tiziano/.local/share/virtualenvs/aiida/lib/python3.9/site-packages/paramiko/client.py", line 734, in _auth
    key = self._key_from_filepath(
  File "/home/tiziano/.local/share/virtualenvs/aiida/lib/python3.9/site-packages/paramiko/client.py", line 586, in _key_from_filepath
    key = klass.from_private_key_file(key_path, password)
  File "/home/tiziano/.local/share/virtualenvs/aiida/lib/python3.9/site-packages/paramiko/pkey.py", line 235, in from_private_key_file
    key = cls(filename=filename, password=password)
  File "/home/tiziano/.local/share/virtualenvs/aiida/lib/python3.9/site-packages/paramiko/ed25519key.py", line 63, in __init__
    signing_key = self._parse_signing_key_data(data, password)
  File "/home/tiziano/.local/share/virtualenvs/aiida/lib/python3.9/site-packages/paramiko/ed25519key.py", line 96, in _parse_signing_key_data
    raise PasswordRequiredException(
paramiko.ssh_exception.PasswordRequiredException: Private key file is encrypted
03/19/2021 03:14:55 PM <171713> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [WARNING] maximum attempts 5 of calling do_update, exceeded
03/19/2021 03:14:55 PM <171713> aiida.engine.processes.calcjobs.tasks: [WARNING] updating CalcJob<470> failed

[Screenshot: 20210319_15h53m35s_grim]

... playing the paused process did not resolve the issue.

Expected behavior

Wait quietly and patiently, without using much CPU, for the network to become available again ;-)
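
To make the expected behavior concrete, here is a minimal sketch of a retry loop that sleeps between attempts and so leaves the CPU idle while the network is down. It is not aiida-core's implementation: `retry_with_backoff`, its parameters, and the default intervals are all hypothetical names chosen for illustration, and the set of exceptions to retry on is an assumption.

```python
import asyncio


async def retry_with_backoff(
    operation,
    *,
    initial_interval=1.0,
    max_interval=60.0,
    max_attempts=5,
    retry_on=(OSError,),
    sleep=asyncio.sleep,
):
    """Await ``operation()`` until it succeeds, sleeping between failures.

    Hypothetical helper for illustration only, not an aiida-core API.
    ``operation`` is a zero-argument coroutine function, e.g. one that
    opens a transport. ``sleep`` is injectable so the helper can be tested
    without real delays.
    """
    interval = initial_interval
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the caller decide (e.g. pause the process).
                raise
            # Awaiting the sleep hands control back to the event loop, so
            # nothing spins while waiting for the network to come back.
            await sleep(interval)
            # Double the wait each time, up to a fixed ceiling.
            interval = min(interval * 2, max_interval)
```

A caller would wrap the failing step, for example `await retry_with_backoff(open_transport, retry_on=(OSError, SomeSSHError))`, where `open_transport` and `SomeSSHError` are placeholders. The point of the sketch is only that every failed attempt is followed by an awaited delay rather than an immediate retry.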

Your environment

  • Operating system [e.g. Linux]: Arch Linux
  • Python version [e.g. 3.7.1]: 3.9.2
  • aiida-core version [e.g. 1.2.1]: 1.6.0
  • PostgreSQL 13.2
  • RabbitMQ 3.8.14

possibly related to #4801

@dev-zero
Contributor Author

The hanging seems to be caused by the proxy_command: after applying #4951 and switching to proxy_jump, the CPU-spinning part of this issue was gone.
