kernel Panic #54

Open
johnsonyjose opened this issue Sep 19, 2016 · 0 comments

Hi All,

I have two machines, each with a standard NIC (no RDMA hardware). One machine acts as the NVMe host and the other as the NVMe target. The target is backed by the null block device (NULL_BLOCK_DEVICE) provided by Linux. The NVMe discovery and connect commands work fine, and data transfer over the Soft-RoCE interface works.

When I run read I/O with fio, the NVMe host tries to reconnect to the target and then a kernel panic occurs. The stack trace points to rdma_disconnect().

Below are the kernel log and the stack trace from the time of the panic.
Sep 16 16:40:44 john kernel: [ 4660.937003] nvme nvme0: rdma_resolve_addr wait failed (-104).
Sep 16 16:40:53 john kernel: [ 4669.289136] rxe: set rxe0 active
Sep 16 16:40:53 john kernel: [ 4669.289138] rxe: added rxe0 to eno1
Sep 16 16:40:53 john kernel: [ 4669.291500] interface en01 not found
Sep 16 16:41:03 john kernel: [ 4679.172136] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.0.154:1023
Sep 16 16:41:05 john kernel: [ 4681.896008] nvme nvme0: creating 4 I/O queues.
Sep 16 16:41:05 john kernel: [ 4681.928447] nvme nvme0: new ctrl: NQN "testsubsystem", addr 192.168.0.154:1023
Sep 16 16:48:32 john kernel: [ 5128.118832] blk_update_request: I/O error, dev nvme0n1, sector 664872
Sep 16 16:48:32 john kernel: [ 5128.125658] blk_update_request: I/O error, dev nvme0n1, sector 1614312
Sep 16 16:48:32 john kernel: [ 5128.132569] blk_update_request: I/O error, dev nvme0n1, sector 1309672
Sep 16 16:48:32 john kernel: [ 5128.139307] blk_update_request: I/O error, dev nvme0n1, sector 1240976
Sep 16 16:48:32 john kernel: [ 5128.146346] blk_update_request: I/O error, dev nvme0n1, sector 2037616
Sep 16 16:48:32 john kernel: [ 5128.154293] blk_update_request: I/O error, dev nvme0n1, sector 450352
Sep 16 16:48:32 john kernel: [ 5128.162782] blk_update_request: I/O error, dev nvme0n1, sector 1719776
Sep 16 16:48:32 john kernel: [ 5128.170989] blk_update_request: I/O error, dev nvme0n1, sector 441656
Sep 16 16:48:32 john kernel: [ 5128.178936] blk_update_request: I/O error, dev nvme0n1, sector 668736
Sep 16 16:48:32 john kernel: [ 5128.187821] blk_update_request: I/O error, dev nvme0n1, sector 1249384
Sep 16 16:48:32 john kernel: [ 5128.195526] nvme nvme0: reconnecting in 10 seconds
Sep 16 16:48:53 john kernel: [ 5149.206030] nvme nvme0: failed nvme_keep_alive_end_io error=16391
Sep 16 16:49:42 john kernel: [ 5198.356270] nvme nvme0: Connect command failed, error wo/DNR bit: 7
Sep 16 16:49:42 john kernel: [ 5198.362922] nvme nvme0: Failed reconnect attempt, requeueing...
Sep 16 16:49:53 john kernel: [ 5209.619737] nvme nvme0: rdma_resolve_addr wait failed (-110).
Sep 16 16:49:53 john kernel: [ 5209.620031] nvme nvme0: Failed reconnect attempt, requeueing...
[ 5219.859419] general protection fault: 0000 [#1] SMP
[ 5219.864479] Modules linked in: rdma_ucm ib_uverbs nvme_rdma(OE) rdma_cm iw_cm ib_cm configfs nvme_fabrics(OE) nvme_core(OE) rdma_rxe ip6_udp_tunnel udp_tunnel ib_core binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hda_core kvm snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq gpio_ich joydev snd_seq_device input_leds snd_timer snd irqbypass mei_me serio_raw mei soundcore lpc_ich mac_hid parport_pc ppdev lp parport autofs4 i915 hid_microsoft hid_generic i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e psmouse usbhid ptp hid drm pps_core pata_acpi fjes video
[ 5219.931164] CPU: 3 PID: 4130 Comm: kworker/3:0 Tainted: G OE 4.8.0-rc1+ #1
[ 5219.939458] Hardware name: /DH55TC, BIOS TCIBX10H.86A.0037.2010.0614.1712 06/14/2010
[ 5219.949302] Workqueue: nvme_rdma_wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[ 5219.956929] task: ffff8d0d2b8b4240 task.stack: ffff8d0d87ab8000
[ 5219.963223] RIP: 0010:[] [] rdma_disconnect+0x2e/0x90 [rdma_cm]
[ 5219.972958] RSP: 0018:ffff8d0d87abbdb0 EFLAGS: 00010206
[ 5219.978541] RAX: 6e5f656572745f88 RBX: ffff8d0d34914400 RCX: 0000000000000001
[ 5219.986052] RDX: ffff8d0d34917800 RSI: ffff8d0d35cd8580 RDI: ffff8d0d2b399a00
[ 5219.993504] RBP: ffff8d0d87abbdb8 R08: ffff8d0da34d8c40 R09: 0000000000000002
[ 5220.001116] R10: 0000000000000000 R11: 0000000000003000 R12: ffff8d0d915e9930
[ 5220.008680] R13: ffffe58dffac2600 R14: 00000000000000c0 R15: ffff8d0d915e9930
[ 5220.016211] FS: 0000000000000000(0000) GS:ffff8d0da34c0000(0000) knlGS:0000000000000000
[ 5220.024747] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5220.030719] CR2: 0000556ef4ce1db8 CR3: 00000000afe06000 CR4: 00000000000006e0
[ 5220.038181] Stack:
[ 5220.040308] ffff8d0d914b2400 ffff8d0d87abbdd0 ffffffffc061184e ffff8d0d915e9800
[ 5220.048170] ffff8d0d87abbdf8 ffffffffc0611aef ffff8d0d909b2480 ffff8d0da34d8c40
[ 5220.056159] ffffe58dffac2600 ffff8d0d87abbe38 ffffffff8909eac2 0000000000000000
[ 5220.064048] Call Trace:
[ 5220.066635] [] nvme_rdma_stop_and_free_queue+0x1e/0x40 [nvme_rdma]
[ 5220.074886] [] nvme_rdma_reconnect_ctrl_work+0x7f/0x1d0 [nvme_rdma]
[ 5220.083235] [] process_one_work+0x162/0x4b0
[ 5220.089394] [] worker_thread+0x4b/0x4f0
[ 5220.095199] [] ? process_one_work+0x4b0/0x4b0
[ 5220.101693] [] ? process_one_work+0x4b0/0x4b0
[ 5220.108080] [] kthread+0xf8/0x110
[ 5220.113441] [] ret_from_fork+0x1f/0x40
[ 5220.119170] [] ? kthread_worker_fn+0x1a0/0x1a0
[ 5220.125594] Code: 66 90 55 48 89 e5 53 48 89 fb 48 8b bf 00 03 00 00 48 85 ff 74 65 0f b6 83 b8 01 00 00 48 8b 13 48 c1 e0 04 48 03 82 f8 00 00 00 <8b> 50 08 f6 c2 04 75 14 83 e2 08 b8 ea ff ff ff 74 07 31 f6 e8
[ 5220.146752] RIP [] rdma_disconnect+0x2e/0x90 [rdma_cm]
[ 5220.153918] RSP
[ 5220.168895] ---[ end trace 4e3fbc3ad0b11617 ]---
[ 5220.168899] Kernel panic - not syncing: Fatal exception
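
For anyone reading the log above, here is a small Python sketch that decodes the numeric values in it. It is only a reading aid under stated assumptions, not a diagnosis: the errno names are the Linux values, the `NVME_SC_DNR` flag (0x4000) and the meaning of status 0x7 are taken from my reading of `include/linux/nvme.h`, and the helper names (`decode_errno`, `decode_nvme_status`, `is_canonical_x86_64`, `register_as_ascii`) are made up for this example.

```python
# Linux errno values that appear in the log (hardcoded so the sketch
# does not depend on the platform it runs on).
LINUX_ERRNO = {104: "ECONNRESET", 110: "ETIMEDOUT"}

# Assumption: values as defined in include/linux/nvme.h.
NVME_SC_DNR = 0x4000  # "Do Not Retry" flag in the status word
NVME_STATUS_NAMES = {0x7: "Command Abort Requested"}


def decode_errno(value):
    """Map a (possibly negative) kernel return code to its errno name."""
    return LINUX_ERRNO.get(abs(value), "unknown")


def decode_nvme_status(status):
    """Split an NVMe status word into (code, dnr_set, name)."""
    dnr_set = bool(status & NVME_SC_DNR)
    code = status & ~NVME_SC_DNR
    return code, dnr_set, NVME_STATUS_NAMES.get(code, "unknown")


def is_canonical_x86_64(addr):
    """True if bits 47..63 are all equal (48-bit virtual addressing)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def register_as_ascii(hex_value):
    """Show a 64-bit register value as it would sit in memory (little-endian)."""
    raw = bytes.fromhex(hex_value)[::-1]
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    print(decode_errno(-104))           # rdma_resolve_addr wait failed (-104)
    print(decode_errno(-110))           # rdma_resolve_addr wait failed (-110)
    print(decode_nvme_status(16391))    # nvme_keep_alive_end_io error=16391
    rax = "6e5f656572745f88"            # RAX at the time of the fault
    print(is_canonical_x86_64(int(rax, 16)), register_as_ascii(rax))
```

With these assumptions, 16391 is 0x4007, which is status 7 with the DNR bit set and matches the later "error wo/DNR bit: 7" message. RAX is not a canonical address and its bytes read as text ("._tree_n"), which might mean rdma_disconnect() followed a pointer from memory that had already been freed and reused, but I have not confirmed that.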

Regards
John
