Hi,

From time to time I see that my Docker tcmu-runner container is exiting and restarting. After that, I get lots of
Sep 25 12:39:52 ceph-osd02 kernel: Unable to locate Target Portal Group on iqn.2019-03.com.redhat.iscsi-gw:ceph-igw
Sep 25 12:39:52 ceph-osd02 kernel: iSCSI Login negotiation failed.
Sep 25 12:39:52 ceph-osd02 kernel: Unable to locate Target Portal Group on iqn.2019-03.com.redhat.iscsi-gw:ceph-igw
Sep 25 12:39:52 ceph-osd02 kernel: iSCSI Login negotiation failed.
errors on the failed tcmu-runner node. So the tcmu-runner container has restarted and is running, but clients are not able to log in via iSCSI. The other 3 tcmu-runner nodes are running as expected. On the failed node I'm not able to restart rbd-target-api because it seems to be stuck; the only thing I can do is reboot the whole node.
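This is roughly how I inspect the stuck node when it happens (the configfs path below is taken from the IQN and TPG shown in the logs; the container name is just whatever the deployment uses):

# is the tcmu-runner container actually back up?
docker ps --filter name=tcmu-runner

# LIO target portal group state for the gateway target (1 = enabled, 0 = disabled)
cat /sys/kernel/config/target/iscsi/iqn.2019-03.com.redhat.iscsi-gw:ceph-igw/tpgt_2/enable

# gateway view from the ceph-iscsi CLI and the stuck API service
gwcli ls
systemctl status rbd-target-api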
I checked all the logs and the problem seems to be triggered by a slow OSD (a deep scrub was running at the moment the trouble began):
tcmu-runner logs from the first slow IO on osd.39 until the reboot:
2022-09-25 03:46:36.558 83 [CRIT] create_file_output:485: log file path now is '/var/log/tcmu-runner/tcmu-runner.log'
2022-09-25 12:39:40.136 83 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 12:39:40.137 83 [ERROR] __tcmu_notify_conn_lost:214 rbd/ovirt.lun1: Handler connection lost (lock state 5)
2022-09-25 12:39:48.872 83 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 12:39:48.883 83 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 12:39:48.884 83 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 12:39:48.885 83 [INFO] tgt_port_grp_recovery_work_fn:246: Disabled iscsi/iqn.2019-03.com.redhat.iscsi-gw:ceph-igw/tpgt_2.
2022-09-25 12:40:04.916 83 [INFO] tcmu_rbd_close:1199 rbd/ovirt.lun1: appended blocklist entry: {192.168.90.32:0/3857887088}
2022-09-25 12:40:05.048 83 [INFO] tcmu_rbd_rm_stale_entries_from_blacklist:341 rbd/ovirt.lun1: removing blocklist entry: {192.168.90.32:0/3857887088}
2022-09-25 12:40:05.049 83 [INFO] tgt_port_grp_recovery_work_fn:272: Enabled iscsi/iqn.2019-03.com.redhat.iscsi-gw:ceph-igw/tpgt_2.
2022-09-25 12:40:06.892 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 5
2022-09-25 12:40:06.893 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 1
2022-09-25 12:40:06.911 83 [INFO] alua_implicit_transition:592 rbd/ovirt.lun1: Starting write lock acquisition operation.
2022-09-25 12:40:06.912 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:06.912 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:06.913 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:06.913 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:06.913 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:06.923 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:06.941 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.107 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.107 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.108 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.108 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.108 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.109 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.518 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.518 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.518 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.519 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.519 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.519 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.519 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.519 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.519 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 12:40:07.519 83 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 13:56:26.504 83 [CRIT] main:1425: Exiting...
2022-09-25 13:56:26.612 83 [INFO] tcmu_rbd_close:1199 rbd/ovirt.lun3: appended blocklist entry: {192.168.90.32:0/3847743111}
2022-09-25 13:56:26.635 83 [INFO] tcmu_rbd_close:1199 rbd/ovirt.lun2: appended blocklist entry: {192.168.90.32:0/1152438333}
2022-09-25 13:59:45.551 82 [CRIT] main:1302: Starting...
2022-09-25 13:59:45.593 82 [INFO] load_our_module:576: Inserted module 'target_core_user'
2022-09-25 13:59:45.594 82 [INFO] tcmur_register_handler:92: Handler fbo is registered
2022-09-25 13:59:45.596 82 [INFO] tcmur_register_handler:92: Handler zbc is registered
2022-09-25 13:59:45.616 82 [INFO] tcmur_register_handler:92: Handler glfs is registered
2022-09-25 13:59:45.616 82 [INFO] tcmur_register_handler:92: Handler qcow is registered
2022-09-25 13:59:45.676 82 [INFO] tcmur_register_handler:92: Handler rbd is registered
2022-09-25 14:00:53.917 82 [INFO] alua_implicit_transition:592 rbd/ovirt.lun1: Starting write lock acquisition operation.
2022-09-25 14:00:54.643 82 [INFO] tcmu_rbd_lock:948 rbd/ovirt.lun1: Acquired exclusive lock.
2022-09-25 14:00:54.643 82 [INFO] tcmu_acquire_dev_lock:489 rbd/ovirt.lun1: Write lock acquisition successful
2022-09-25 14:00:57.362 82 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 5
2022-09-25 14:00:57.362 82 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 1
2022-09-25 14:00:57.362 82 [INFO] alua_implicit_transition:592 rbd/ovirt.lun1: Starting write lock acquisition operation.
2022-09-25 14:00:57.362 82 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 14:00:57.362 82 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 4
2022-09-25 14:00:57.367 82 [INFO] tcmu_rbd_close:1199 rbd/ovirt.lun1: appended blocklist entry: {192.168.90.32:0/3722889083}
2022-09-25 14:00:57.402 82 [INFO] tcmu_rbd_rm_stale_entries_from_blacklist:341 rbd/ovirt.lun1: removing blocklist entry: {192.168.90.32:0/3722889083}
2022-09-25 14:00:58.716 82 [INFO] tcmu_acquire_dev_lock:482 rbd/ovirt.lun1: Lock acquisition unsuccessful
2022-09-25 14:00:58.719 82 [INFO] alua_implicit_transition:592 rbd/ovirt.lun1: Starting write lock acquisition operation.
2022-09-25 14:00:59.737 82 [INFO] tcmu_rbd_lock:948 rbd/ovirt.lun1: Acquired exclusive lock.
2022-09-25 14:00:59.737 82 [INFO] tcmu_acquire_dev_lock:489 rbd/ovirt.lun1: Write lock acquisition successful
2022-09-25 14:01:00.784 82 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 5
2022-09-25 14:01:00.784 82 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 1
2022-09-25 14:01:00.784 82 [WARN] tcmu_notify_lock_lost:271 rbd/ovirt.lun1: Async lock drop. Old state 1
2022-09-25 14:01:00.790 82 [INFO] alua_implicit_transition:592 rbd/ovirt.lun1: Starting write lock acquisition operation.
2022-09-25 14:01:00.795 82 [INFO] tcmu_rbd_close:1199 rbd/ovirt.lun1: appended blocklist entry: {192.168.90.32:0/128875194}
2022-09-25 14:01:00.843 82 [INFO] tcmu_rbd_rm_stale_entries_from_blacklist:341 rbd/ovirt.lun1: removing blocklist entry: {192.168.90.32:0/128875194}
2022-09-25 14:01:02.844 82 [INFO] tcmu_rbd_lock:948 rbd/ovirt.lun1: Acquired exclusive lock.
2022-09-25 14:01:02.844 82 [INFO] tcmu_acquire_dev_lock:489 rbd/ovirt.lun1: Write lock acquisition successful
2022-09-25 14:03:01.333 82 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 14:03:01.333 82 [ERROR] __tcmu_notify_conn_lost:214 rbd/ovirt.lun1: Handler connection lost (lock state 5)
2022-09-25 14:03:02.687 82 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 14:03:05.365 82 [INFO] tgt_port_grp_recovery_work_fn:246: Disabled iscsi/iqn.2019-03.com.redhat.iscsi-gw:ceph-igw/tpgt_2.
2022-09-25 14:03:05.387 82 [INFO] tcmu_rbd_close:1199 rbd/ovirt.lun1: appended blocklist entry: {192.168.90.32:0/543220842}
2022-09-25 14:03:05.448 82 [INFO] tcmu_rbd_rm_stale_entries_from_blacklist:341 rbd/ovirt.lun1: removing blocklist entry: {192.168.90.32:0/543220842}
2022-09-25 14:03:05.449 82 [INFO] tgt_port_grp_recovery_work_fn:272: Enabled iscsi/iqn.2019-03.com.redhat.iscsi-gw:ceph-igw/tpgt_2.
2022-09-25 14:04:56.278 82 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 14:04:56.278 82 [ERROR] __tcmu_notify_conn_lost:214 rbd/ovirt.lun1: Handler connection lost (lock state 5)
2022-09-25 14:05:07.307 82 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 14:05:11.951 82 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 14:05:18.861 82 [ERROR] tcmu_rbd_handle_timedout_cmd:1226 rbd/ovirt.lun1: Timing out cmd.
2022-09-25 14:05:18.862 82 [INFO] tgt_port_grp_recovery_work_fn:246: Disabled iscsi/iqn.2019-03.com.redhat.iscsi-gw:ceph-igw/tpgt_2.
2022-09-25 14:05:19.046 82 [INFO] tgt_port_grp_recovery_work_fn:272: Enabled iscsi/iqn.2019-03.com.redhat.iscsi-gw:ceph-igw/tpgt_2.
To me it looks like tcmu-runner is not able to send IO requests to osd.39 and then fails. After that, rbd-target-api is stuck and iSCSI clients try to log in but fail. Is there anything I can do to prevent or recover from such situations?
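(For checking whether osd.39 really is the bottleneck I use something like the following; nothing special, just the standard Ceph status and admin-socket commands:)

# cluster-wide warnings about slow / blocked requests
ceph health detail

# per-OSD latency, to spot the slow one
ceph osd perf

# on the node hosting osd.39: in-flight and recent ops
ceph daemon osd.39 dump_ops_in_flight
ceph daemon osd.39 dump_historic_ops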
It is not only that I have to reboot the exporter node; on the oVirt / client side I also see really slow IO and VMs are pausing.
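One thing I am considering, based on the upstream iSCSI gateway documentation (not verified on my cluster yet), is lowering the OSD down-detection timers so that a hung OSD is noticed before the initiators time out, roughly:

[osd]
osd heartbeat grace = 20
osd heartbeat interval = 5
osd client watch timeout = 15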
I found two other issues which seem to point in the same direction (maybe a different root cause, but in both cases tcmu-runner fails and rbd-target-api is stuck):
ceph/ceph-iscsi-cli#102
#169
Versions:
OSD Config:
Target-CLI:
Thanks