* nvme/rdma initiator stuck on reboot
From: Steve Wise @ 2016-08-16 19:40 UTC

Hey Sagi,

Here is another issue I'm seeing doing reboot testing. The test does this:

1) connect 10 ram devices over iw_cxgb4
2) reboot the target node
3) the initiator goes into recovery/reconnect mode
4) reboot the initiator at this point.

The initiator gets stuck doing this continually and the system never reboots:

[ 596.411842] nvme nvme1: Failed reconnect attempt, requeueing...
[ 596.907865] nvme nvme9: rdma_resolve_addr wait failed (-104).
[ 596.914461] nvme nvme9: Failed reconnect attempt, requeueing...
[ 597.939935] nvme nvme10: rdma_resolve_addr wait failed (-104).
[ 597.946625] nvme nvme10: Failed reconnect attempt, requeueing...
[ 598.963995] nvme nvme2: rdma_resolve_addr wait failed (-110).
[ 598.971968] nvme nvme2: Failed reconnect attempt, requeueing...
[ 602.036135] nvme nvme3: rdma_resolve_addr wait failed (-104).
[ 602.043797] nvme nvme3: Failed reconnect attempt, requeueing...
[ 603.060171] nvme nvme4: rdma_resolve_addr wait failed (-104).
[ 603.068153] nvme nvme4: Failed reconnect attempt, requeueing...
[ 604.084223] nvme nvme5: rdma_resolve_addr wait failed (-104).
[ 604.092191] nvme nvme5: Failed reconnect attempt, requeueing...
[ 605.108294] nvme nvme6: rdma_resolve_addr wait failed (-104).
[ 605.116251] nvme nvme6: Failed reconnect attempt, requeueing...

Debugging now...

Steve.
* nvme/rdma initiator stuck on reboot
From: Sagi Grimberg @ 2016-08-17 10:23 UTC

> Hey Sagi,
>
> Here is another issue I'm seeing doing reboot testing. The test does this:
>
> 1) connect 10 ram devices over iw_cxgb4
> 2) reboot the target node
> 3) the initiator goes into recovery/reconnect mode
> 4) reboot the initiator at this point.
>
> The initiator gets stuck doing this continually and the system never reboots:
>
> [ 596.411842] nvme nvme1: Failed reconnect attempt, requeueing...
> [ 596.907865] nvme nvme9: rdma_resolve_addr wait failed (-104).
> [ 596.914461] nvme nvme9: Failed reconnect attempt, requeueing...
> [ 597.939935] nvme nvme10: rdma_resolve_addr wait failed (-104).
> [ 597.946625] nvme nvme10: Failed reconnect attempt, requeueing...
> [ 598.963995] nvme nvme2: rdma_resolve_addr wait failed (-110).
> [ 598.971968] nvme nvme2: Failed reconnect attempt, requeueing...
> [ 602.036135] nvme nvme3: rdma_resolve_addr wait failed (-104).
> [ 602.043797] nvme nvme3: Failed reconnect attempt, requeueing...
> [ 603.060171] nvme nvme4: rdma_resolve_addr wait failed (-104).
> [ 603.068153] nvme nvme4: Failed reconnect attempt, requeueing...
> [ 604.084223] nvme nvme5: rdma_resolve_addr wait failed (-104).
> [ 604.092191] nvme nvme5: Failed reconnect attempt, requeueing...
> [ 605.108294] nvme nvme6: rdma_resolve_addr wait failed (-104).
> [ 605.116251] nvme nvme6: Failed reconnect attempt, requeueing...
>
> Debugging now...

Hmm...

Does this reproduce also when you simply delete all the
controllers (via sysfs)?

Do you see the hung task watchdog? Can you share the
thread states? (echo t > /proc/sysrq-trigger)
* nvme/rdma initiator stuck on reboot
From: Steve Wise @ 2016-08-17 14:33 UTC

> > Hey Sagi,
> >
> > Here is another issue I'm seeing doing reboot testing. The test does this:
> >
> > 1) connect 10 ram devices over iw_cxgb4
> > 2) reboot the target node
> > 3) the initiator goes into recovery/reconnect mode
> > 4) reboot the initiator at this point.
> >
> > The initiator gets stuck doing this continually and the system never reboots:
> >
> > [ 596.411842] nvme nvme1: Failed reconnect attempt, requeueing...
> > [ 596.907865] nvme nvme9: rdma_resolve_addr wait failed (-104).
> > [ 596.914461] nvme nvme9: Failed reconnect attempt, requeueing...
> > [ 597.939935] nvme nvme10: rdma_resolve_addr wait failed (-104).
> > [ 597.946625] nvme nvme10: Failed reconnect attempt, requeueing...
> > [ 598.963995] nvme nvme2: rdma_resolve_addr wait failed (-110).
> > [ 598.971968] nvme nvme2: Failed reconnect attempt, requeueing...
> > [ 602.036135] nvme nvme3: rdma_resolve_addr wait failed (-104).
> > [ 602.043797] nvme nvme3: Failed reconnect attempt, requeueing...
> > [ 603.060171] nvme nvme4: rdma_resolve_addr wait failed (-104).
> > [ 603.068153] nvme nvme4: Failed reconnect attempt, requeueing...
> > [ 604.084223] nvme nvme5: rdma_resolve_addr wait failed (-104).
> > [ 604.092191] nvme nvme5: Failed reconnect attempt, requeueing...
> > [ 605.108294] nvme nvme6: rdma_resolve_addr wait failed (-104).
> > [ 605.116251] nvme nvme6: Failed reconnect attempt, requeueing...
> >
> > Debugging now...
>
> Hmm...
>
> Does this reproduce also when you simply delete all the
> controllers (via sysfs)?
>

Yes.

> Do you see the hung task watchdog? can you share the
> threads state?
(echo t > /proc/sysrq-trigger) The deleting thread is stuck: Aug 17 07:21:32 stevo1 kernel: [ 2213.279798] INFO: task bash:4066 blocked for more than 120 seconds. Aug 17 07:21:32 stevo1 kernel: [ 2213.279802] Tainted: G E 4.8.0-rc2-nvmf-4.8-rc-rebased-rc2+ #15 Aug 17 07:21:32 stevo1 kernel: [ 2213.279803] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 17 07:21:32 stevo1 kernel: [ 2213.279811] bash D ffff881034547ac8 0 4066 4056 0x10000080 Aug 17 07:21:32 stevo1 kernel: [ 2213.279819] ffff881034547ac8 ffff881033f04140 ffff88102bc80000 ffff881032b66250 Aug 17 07:21:32 stevo1 kernel: [ 2213.279824] ffff880fefe64230 ffff88103e88aa50 ffff881034544008 ffff881034547c98 Aug 17 07:21:32 stevo1 kernel: [ 2213.279828] ffff881034547c08 ffff88102bc80000 7fffffffffffffff ffff881034547b18 Aug 17 07:21:32 stevo1 kernel: [ 2213.279829] Call Trace: Aug 17 07:21:32 stevo1 kernel: [ 2213.279842] [<ffffffff816ddde0>] schedule+0x40/0xb0 Aug 17 07:21:32 stevo1 kernel: [ 2213.279850] [<ffffffff810b2de4>] ? try_to_wake_up+0x54/0x240 Aug 17 07:21:32 stevo1 kernel: [ 2213.279856] [<ffffffff816e0a8d>] schedule_timeout+0x2ad/0x410 Aug 17 07:21:32 stevo1 kernel: [ 2213.279861] [<ffffffff810b3025>] ? wake_up_process+0x15/0x20 Aug 17 07:21:32 stevo1 kernel: [ 2213.279868] [<ffffffff8109d940>] ? wake_up_worker+0x30/0x40 Aug 17 07:21:32 stevo1 kernel: [ 2213.279873] [<ffffffff8109fdbc>] ? insert_work+0x6c/0xb0 Aug 17 07:21:32 stevo1 kernel: [ 2213.279878] [<ffffffff816deefe>] wait_for_completion+0xde/0x110 Aug 17 07:21:32 stevo1 kernel: [ 2213.279882] [<ffffffff810b2fd0>] ? try_to_wake_up+0x240/0x240 Aug 17 07:21:32 stevo1 kernel: [ 2213.279887] [<ffffffff810a0e0b>] flush_work+0x2b/0x40 Aug 17 07:21:32 stevo1 kernel: [ 2213.279892] [<ffffffff8109dcc0>] ? worker_detach_from_pool+0xa0/0xa0 Aug 17 07:21:32 stevo1 kernel: [ 2213.279902] [<ffffffffa064e46a>] ? 
__nvme_rdma_del_ctrl+0x4a/0x60 [nvme_rdma] Aug 17 07:21:32 stevo1 kernel: [ 2213.279908] [<ffffffffa064e4b8>] nvme_rdma_del_ctrl+0x38/0x50 [nvme_rdma] Aug 17 07:21:32 stevo1 kernel: [ 2213.279918] [<ffffffffa02703d4>] nvme_sysfs_delete+0x34/0x50 [nvme_core] Aug 17 07:21:32 stevo1 kernel: [ 2213.279923] [<ffffffff814884a0>] dev_attr_store+0x20/0x30 Aug 17 07:21:32 stevo1 kernel: [ 2213.279931] [<ffffffff812ae29e>] sysfs_kf_write+0x3e/0x40 Aug 17 07:21:32 stevo1 kernel: [ 2213.279935] [<ffffffff812add13>] kernfs_fop_write+0x113/0x1b0 Aug 17 07:21:32 stevo1 kernel: [ 2213.279942] [<ffffffff8115a588>] ? trace_buffer_unlock_commit_regs+0x68/0x80 Aug 17 07:21:32 stevo1 kernel: [ 2213.279949] [<ffffffff8122cfe4>] __vfs_write+0x34/0x120 Aug 17 07:21:32 stevo1 kernel: [ 2213.279957] [<ffffffff81003725>] ? trace_event_raw_event_sys_enter+0xb5/0x130 Aug 17 07:21:32 stevo1 kernel: [ 2213.279963] [<ffffffff810d1d06>] ? percpu_down_read+0x16/0x60 Aug 17 07:21:32 stevo1 kernel: [ 2213.279967] [<ffffffff8122d191>] vfs_write+0xc1/0x130 Aug 17 07:21:32 stevo1 kernel: [ 2213.279974] [<ffffffff81249633>] ? __fdget+0x13/0x20 Aug 17 07:21:32 stevo1 kernel: [ 2213.279978] [<ffffffff8122d306>] SyS_write+0x56/0xc0 Aug 17 07:21:32 stevo1 kernel: [ 2213.279983] [<ffffffff81003e7d>] do_syscall_64+0x7d/0x230 Aug 17 07:21:32 stevo1 kernel: [ 2213.279991] [<ffffffff8106f057>] ? 
do_page_fault+0x37/0x90 Aug 17 07:21:32 stevo1 kernel: [ 2213.279997] [<ffffffff816e1921>] entry_SYSCALL64_slow_path+0x25/0x25 Nvme_rdma_wq is doing nothing: Aug 17 07:20:08 stevo1 kernel: [ 2130.461614] nvme_rdma_wq S ffff880fedd7bd58 0 4607 2 0x10000080 Aug 17 07:20:08 stevo1 kernel: [ 2130.469572] ffff880fedd7bd58 ffff8810381a15c0 ffff88102bdb5700 ffff881032b66250 Aug 17 07:20:08 stevo1 kernel: [ 2130.477972] ffffffff81038959 ffff880fedd7bd48 ffff880fedd78008 ffff8810206dde00 Aug 17 07:20:08 stevo1 kernel: [ 2130.486357] ffff88102bdb5700 ffffffff810a1730 ffff8810381abed8 ffff880fedd7bda8 Aug 17 07:20:08 stevo1 kernel: [ 2130.494755] Call Trace: Aug 17 07:20:08 stevo1 kernel: [ 2130.498088] [<ffffffff81038959>] ? sched_clock+0x9/0x10 Aug 17 07:20:08 stevo1 kernel: [ 2130.504287] [<ffffffff810a1730>] ? process_one_work+0x4d0/0x4d0 Aug 17 07:20:08 stevo1 kernel: [ 2130.511168] [<ffffffff816ddde0>] schedule+0x40/0xb0 Aug 17 07:20:08 stevo1 kernel: [ 2130.517002] [<ffffffff810b1064>] ? set_user_nice+0x114/0x1b0 Aug 17 07:20:08 stevo1 kernel: [ 2130.523623] [<ffffffff810a1a22>] rescuer_thread+0x2f2/0x380 Aug 17 07:20:08 stevo1 kernel: [ 2130.530146] [<ffffffff816dd920>] ? __schedule+0x1f0/0x5b0 Aug 17 07:20:08 stevo1 kernel: [ 2130.536501] [<ffffffff810cb8a6>] ? __wake_up_common+0x56/0x90 Aug 17 07:20:08 stevo1 kernel: [ 2130.543173] [<ffffffff810a1730>] ? process_one_work+0x4d0/0x4d0 Aug 17 07:20:08 stevo1 kernel: [ 2130.550013] [<ffffffff816ddde0>] ? schedule+0x40/0xb0 Aug 17 07:20:08 stevo1 kernel: [ 2130.555971] [<ffffffff810a1730>] ? process_one_work+0x4d0/0x4d0 Aug 17 07:20:08 stevo1 kernel: [ 2130.562800] [<ffffffff810a6bbc>] kthread+0xcc/0xf0 Aug 17 07:20:08 stevo1 kernel: [ 2130.568485] [<ffffffff810b157e>] ? schedule_tail+0x1e/0xc0 Aug 17 07:20:08 stevo1 kernel: [ 2130.574871] [<ffffffff816e1a7f>] ret_from_fork+0x1f/0x40 Aug 17 07:20:08 stevo1 kernel: [ 2130.581073] [<ffffffff810a6af0>] ? 
kthread_freezable_should_stop+0x70/0x70 Here is another thread in the delete controller sysfs path: Aug 17 07:20:05 stevo1 kernel: [ 2127.184581] bash D ffff881034547ac8 0 4066 4056 0x10000080 Aug 17 07:20:05 stevo1 kernel: [ 2127.192445] ffff881034547ac8 ffff881033f04140 ffff88102bc80000 ffff881032b66250 Aug 17 07:20:05 stevo1 kernel: [ 2127.200730] ffff880fefe64230 ffff88103e88aa50 ffff881034544008 ffff881034547c98 Aug 17 07:20:05 stevo1 kernel: [ 2127.209022] ffff881034547c08 ffff88102bc80000 7fffffffffffffff ffff881034547b18 Aug 17 07:20:05 stevo1 kernel: [ 2127.217278] Call Trace: Aug 17 07:20:05 stevo1 kernel: [ 2127.220523] [<ffffffff816ddde0>] schedule+0x40/0xb0 Aug 17 07:20:05 stevo1 kernel: [ 2127.226244] [<ffffffff810b2de4>] ? try_to_wake_up+0x54/0x240 Aug 17 07:20:05 stevo1 kernel: [ 2127.232753] [<ffffffff816e0a8d>] schedule_timeout+0x2ad/0x410 Aug 17 07:20:05 stevo1 kernel: [ 2127.239353] [<ffffffff810b3025>] ? wake_up_process+0x15/0x20 Aug 17 07:20:05 stevo1 kernel: [ 2127.245871] [<ffffffff8109d940>] ? wake_up_worker+0x30/0x40 Aug 17 07:20:05 stevo1 kernel: [ 2127.252285] [<ffffffff8109fdbc>] ? insert_work+0x6c/0xb0 Aug 17 07:20:05 stevo1 kernel: [ 2127.258432] [<ffffffff816deefe>] wait_for_completion+0xde/0x110 Aug 17 07:20:05 stevo1 kernel: [ 2127.263351] nvme nvme4: rdma_resolve_addr wait failed (-110). Aug 17 07:20:05 stevo1 kernel: [ 2127.263355] nvme nvme4: Failed reconnect attempt, requeueing... Aug 17 07:20:05 stevo1 kernel: [ 2127.278353] [<ffffffff810b2fd0>] ? try_to_wake_up+0x240/0x240 Aug 17 07:20:05 stevo1 kernel: [ 2127.284912] [<ffffffff810a0e0b>] flush_work+0x2b/0x40 Aug 17 07:20:05 stevo1 kernel: [ 2127.290771] [<ffffffff8109dcc0>] ? worker_detach_from_pool+0xa0/0xa0 Aug 17 07:20:05 stevo1 kernel: [ 2127.297970] [<ffffffffa064e46a>] ? 
__nvme_rdma_del_ctrl+0x4a/0x60 [nvme_rdma] Aug 17 07:20:05 stevo1 kernel: [ 2127.305945] [<ffffffffa064e4b8>] nvme_rdma_del_ctrl+0x38/0x50 [nvme_rdma] Aug 17 07:20:05 stevo1 kernel: [ 2127.313577] [<ffffffffa02703d4>] nvme_sysfs_delete+0x34/0x50 [nvme_core] Aug 17 07:20:05 stevo1 kernel: [ 2127.321111] [<ffffffff814884a0>] dev_attr_store+0x20/0x30 Aug 17 07:20:05 stevo1 kernel: [ 2127.327343] [<ffffffff812ae29e>] sysfs_kf_write+0x3e/0x40 Aug 17 07:20:05 stevo1 kernel: [ 2127.333551] [<ffffffff812add13>] kernfs_fop_write+0x113/0x1b0 Aug 17 07:20:05 stevo1 kernel: [ 2127.340127] [<ffffffff8115a588>] ? trace_buffer_unlock_commit_regs+0x68/0x80 Aug 17 07:20:05 stevo1 kernel: [ 2127.348017] [<ffffffff8122cfe4>] __vfs_write+0x34/0x120 Aug 17 07:20:05 stevo1 kernel: [ 2127.354083] [<ffffffff81003725>] ? trace_event_raw_event_sys_enter+0xb5/0x130 Aug 17 07:20:05 stevo1 kernel: [ 2127.362060] [<ffffffff810d1d06>] ? percpu_down_read+0x16/0x60 Aug 17 07:20:05 stevo1 kernel: [ 2127.368667] [<ffffffff8122d191>] vfs_write+0xc1/0x130 Aug 17 07:20:05 stevo1 kernel: [ 2127.374559] [<ffffffff81249633>] ? __fdget+0x13/0x20 Aug 17 07:20:05 stevo1 kernel: [ 2127.380367] [<ffffffff8122d306>] SyS_write+0x56/0xc0 Aug 17 07:20:05 stevo1 kernel: [ 2127.386201] [<ffffffff81003e7d>] do_syscall_64+0x7d/0x230 Aug 17 07:20:05 stevo1 kernel: [ 2127.392485] [<ffffffff8106f057>] ? 
do_page_fault+0x37/0x90 Aug 17 07:20:05 stevo1 kernel: [ 2127.398866] [<ffffffff816e1921>] entry_SYSCALL64_slow_path+0x25/0x25 Here is the reconnect workq thread stuck trying to connect: Aug 17 07:19:45 stevo1 kernel: [ 2107.446780] Workqueue: nvme_rdma_wq nvme_rdma_reconnect_ctrl_work [nvme_rdma] Aug 17 07:19:45 stevo1 kernel: [ 2107.454764] ffff881033a57aa8 ffff88103815c140 ffff8810371dab80 ffff881032b66250 Aug 17 07:19:45 stevo1 kernel: [ 2107.463091] 00000001001b93e9 0000000000000001 ffff881033a54008 ffff88103ee500c0 Aug 17 07:19:45 stevo1 kernel: [ 2107.471431] 00000001001b93e9 0000000000000000 ffff88103ee500c0 ffff881033a57af8 Aug 17 07:19:45 stevo1 kernel: [ 2107.479805] Call Trace: Aug 17 07:19:45 stevo1 kernel: [ 2107.483076] [<ffffffff816ddde0>] schedule+0x40/0xb0 Aug 17 07:19:45 stevo1 kernel: [ 2107.488892] [<ffffffff810f4a80>] ? internal_add_timer+0x70/0x80 Aug 17 07:19:45 stevo1 kernel: [ 2107.495765] [<ffffffff816e09fb>] schedule_timeout+0x21b/0x410 Aug 17 07:19:45 stevo1 kernel: [ 2107.502464] [<ffffffff810f4c10>] ? del_timer+0x70/0x70 Aug 17 07:19:45 stevo1 kernel: [ 2107.508547] [<ffffffffa031da9f>] ? rdma_resolve_ip+0x1cf/0x1e0 [ib_core] Aug 17 07:19:45 stevo1 kernel: [ 2107.516183] [<ffffffffa0736fe0>] ? cma_work_handler+0xa0/0xa0 [rdma_cm] Aug 17 07:19:45 stevo1 kernel: [ 2107.523721] [<ffffffff816deb6f>] wait_for_completion_interruptible_timeout+0xef/0x130 Aug 17 07:19:45 stevo1 kernel: [ 2107.532472] [<ffffffff810b2fd0>] ? try_to_wake_up+0x240/0x240 Aug 17 07:19:45 stevo1 kernel: [ 2107.539152] [<ffffffffa064f5c2>] nvme_rdma_init_queue+0xe2/0x180 [nvme_rdma] Aug 17 07:19:45 stevo1 kernel: [ 2107.547132] [<ffffffffa064f9bc>] nvme_rdma_reconnect_ctrl_work+0x9c/0x1f0 [nvme_rdma] Aug 17 07:19:45 stevo1 kernel: [ 2107.555908] [<ffffffff810a13e3>] process_one_work+0x183/0x4d0 Aug 17 07:19:45 stevo1 kernel: [ 2107.562592] [<ffffffff816dd920>] ? __schedule+0x1f0/0x5b0 Aug 17 07:19:45 stevo1 kernel: [ 2107.568938] [<ffffffff816ddde0>] ? 
schedule+0x40/0xb0 Aug 17 07:19:45 stevo1 kernel: [ 2107.574927] [<ffffffff810a207d>] worker_thread+0x16d/0x530 Aug 17 07:19:45 stevo1 kernel: [ 2107.581346] [<ffffffff810a1f10>] ? maybe_create_worker+0x120/0x120 Aug 17 07:19:45 stevo1 kernel: [ 2107.588474] [<ffffffff816dd920>] ? __schedule+0x1f0/0x5b0 Aug 17 07:19:45 stevo1 kernel: [ 2107.594813] [<ffffffff810cb8a6>] ? __wake_up_common+0x56/0x90 Aug 17 07:19:45 stevo1 kernel: [ 2107.601488] [<ffffffff810a1f10>] ? maybe_create_worker+0x120/0x120 Aug 17 07:19:45 stevo1 kernel: [ 2107.608625] [<ffffffff816ddde0>] ? schedule+0x40/0xb0 Aug 17 07:19:45 stevo1 kernel: [ 2107.614626] [<ffffffff810a1f10>] ? maybe_create_worker+0x120/0x120 Aug 17 07:19:45 stevo1 kernel: [ 2107.621745] [<ffffffff810a6bbc>] kthread+0xcc/0xf0 Aug 17 07:19:45 stevo1 kernel: [ 2107.627464] [<ffffffff810b157e>] ? schedule_tail+0x1e/0xc0 Aug 17 07:19:45 stevo1 kernel: [ 2107.633886] [<ffffffff816e1a7f>] ret_from_fork+0x1f/0x40 Aug 17 07:19:45 stevo1 kernel: [ 2107.640121] [<ffffffff810a6af0>] ? kthread_freezable_should_stop+0x70/0x70 Here is controller delete workq thread: Aug 17 07:19:42 stevo1 kernel: [ 2104.515910] Workqueue: nvme_rdma_wq nvme_rdma_del_ctrl_work [nvme_rdma] Aug 17 07:19:42 stevo1 kernel: [ 2104.523373] ffff881033f13888 ffff88103817ab80 ffff881033f04140 ffff881032b66250 Aug 17 07:19:42 stevo1 kernel: [ 2104.531744] 00000001001c1a60 0000000000000005 ffff881033f10008 ffff88103ef500c0 Aug 17 07:19:42 stevo1 kernel: [ 2104.540116] 00000001001c1a60 0000000000000000 ffff88103ef500c0 ffff881033f138d8 Aug 17 07:19:42 stevo1 kernel: [ 2104.548501] Call Trace: Aug 17 07:19:42 stevo1 kernel: [ 2104.551777] [<ffffffff816ddde0>] schedule+0x40/0xb0 Aug 17 07:19:42 stevo1 kernel: [ 2104.557593] [<ffffffff810f4a80>] ? internal_add_timer+0x70/0x80 Aug 17 07:19:42 stevo1 kernel: [ 2104.564445] [<ffffffff816e09fb>] schedule_timeout+0x21b/0x410 Aug 17 07:19:42 stevo1 kernel: [ 2104.571117] [<ffffffff810f4c10>] ? 
del_timer+0x70/0x70 Aug 17 07:19:42 stevo1 kernel: [ 2104.577191] [<ffffffff810fe67c>] ? ktime_get+0x4c/0xc0 Aug 17 07:19:42 stevo1 kernel: [ 2104.583248] [<ffffffff816dd6c4>] io_schedule_timeout+0xa4/0x110 Aug 17 07:19:42 stevo1 kernel: [ 2104.590083] [<ffffffff816df013>] wait_for_completion_io_timeout+0xe3/0x120 Aug 17 07:19:42 stevo1 kernel: [ 2104.597943] [<ffffffff810b2fd0>] ? try_to_wake_up+0x240/0x240 Aug 17 07:19:42 stevo1 kernel: [ 2104.604613] [<ffffffff81332a4d>] ? blk_execute_rq_nowait+0x7d/0x160 Aug 17 07:19:42 stevo1 kernel: [ 2104.611810] [<ffffffff81332bbc>] blk_execute_rq+0x8c/0x110 Aug 17 07:19:42 stevo1 kernel: [ 2104.618221] [<ffffffffa02715c5>] __nvme_submit_sync_cmd+0x85/0xe0 [nvme_core] Aug 17 07:19:42 stevo1 kernel: [ 2104.626293] [<ffffffffa0672526>] nvmf_reg_write32+0x66/0xb0 [nvme_fabrics] Aug 17 07:19:42 stevo1 kernel: [ 2104.634086] [<ffffffffa027095e>] nvme_shutdown_ctrl+0x4e/0xe0 [nvme_core] Aug 17 07:19:42 stevo1 kernel: [ 2104.641796] [<ffffffffa064f161>] nvme_rdma_shutdown_ctrl+0xa1/0xe0 [nvme_rdma] Aug 17 07:19:42 stevo1 kernel: [ 2104.649940] [<ffffffffa064f21c>] __nvme_rdma_remove_ctrl+0x7c/0x80 [nvme_rdma] Aug 17 07:19:42 stevo1 kernel: [ 2104.658093] [<ffffffffa064f25a>] nvme_rdma_del_ctrl_work+0x1a/0x20 [nvme_rdma] Aug 17 07:19:42 stevo1 kernel: [ 2104.666248] [<ffffffff810a13e3>] process_one_work+0x183/0x4d0 Aug 17 07:19:42 stevo1 kernel: [ 2104.672936] [<ffffffff816dd920>] ? __schedule+0x1f0/0x5b0 Aug 17 07:19:42 stevo1 kernel: [ 2104.679269] [<ffffffff8109daf2>] ? move_linked_works+0x62/0x90 Aug 17 07:19:42 stevo1 kernel: [ 2104.686032] [<ffffffff810a2204>] worker_thread+0x2f4/0x530 Aug 17 07:19:42 stevo1 kernel: [ 2104.692441] [<ffffffff810a1f10>] ? maybe_create_worker+0x120/0x120 Aug 17 07:19:42 stevo1 kernel: [ 2104.699540] [<ffffffff816dd920>] ? __schedule+0x1f0/0x5b0 Aug 17 07:19:42 stevo1 kernel: [ 2104.705867] [<ffffffff810cb8a6>] ? 
__wake_up_common+0x56/0x90
Aug 17 07:19:42 stevo1 kernel: [ 2104.712534] [<ffffffff810a1f10>] ? maybe_create_worker+0x120/0x120
Aug 17 07:19:42 stevo1 kernel: [ 2104.719257] nvme nvme4: rdma_resolve_addr wait failed (-104).
Aug 17 07:19:42 stevo1 kernel: [ 2104.719261] nvme nvme4: Failed reconnect attempt, requeueing...
Aug 17 07:19:42 stevo1 kernel: [ 2104.733085] [<ffffffff816ddde0>] ? schedule+0x40/0xb0
Aug 17 07:19:42 stevo1 kernel: [ 2104.739063] [<ffffffff810a1f10>] ? maybe_create_worker+0x120/0x120
Aug 17 07:19:42 stevo1 kernel: [ 2104.746163] [<ffffffff810a6bbc>] kthread+0xcc/0xf0
Aug 17 07:19:42 stevo1 kernel: [ 2104.751878] [<ffffffff810b157e>] ? schedule_tail+0x1e/0xc0
* nvme/rdma initiator stuck on reboot 2016-08-17 14:33 ` Steve Wise @ 2016-08-17 14:46 ` Sagi Grimberg 2016-08-17 15:13 ` Steve Wise 0 siblings, 1 reply; 15+ messages in thread From: Sagi Grimberg @ 2016-08-17 14:46 UTC (permalink / raw) > > The deleting thread is stuck: > > Aug 17 07:21:32 stevo1 kernel: [ 2213.279798] INFO: task bash:4066 blocked > for more than 120 seconds. > Aug 17 07:21:32 stevo1 kernel: [ 2213.279802] Tainted: G E > 4.8.0-rc2-nvmf-4.8-rc-rebased-rc2+ #15 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279803] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Aug 17 07:21:32 stevo1 kernel: [ 2213.279811] bash D > ffff881034547ac8 0 4066 4056 0x10000080 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279819] ffff881034547ac8 > ffff881033f04140 ffff88102bc80000 ffff881032b66250 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279824] ffff880fefe64230 > ffff88103e88aa50 ffff881034544008 ffff881034547c98 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279828] ffff881034547c08 > ffff88102bc80000 7fffffffffffffff ffff881034547b18 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279829] Call Trace: > Aug 17 07:21:32 stevo1 kernel: [ 2213.279842] [<ffffffff816ddde0>] > schedule+0x40/0xb0 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279850] [<ffffffff810b2de4>] ? > try_to_wake_up+0x54/0x240 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279856] [<ffffffff816e0a8d>] > schedule_timeout+0x2ad/0x410 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279861] [<ffffffff810b3025>] ? > wake_up_process+0x15/0x20 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279868] [<ffffffff8109d940>] ? > wake_up_worker+0x30/0x40 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279873] [<ffffffff8109fdbc>] ? > insert_work+0x6c/0xb0 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279878] [<ffffffff816deefe>] > wait_for_completion+0xde/0x110 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279882] [<ffffffff810b2fd0>] ? 
> try_to_wake_up+0x240/0x240 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279887] [<ffffffff810a0e0b>] > flush_work+0x2b/0x40 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279892] [<ffffffff8109dcc0>] ? > worker_detach_from_pool+0xa0/0xa0 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279902] [<ffffffffa064e46a>] ? > __nvme_rdma_del_ctrl+0x4a/0x60 [nvme_rdma] > Aug 17 07:21:32 stevo1 kernel: [ 2213.279908] [<ffffffffa064e4b8>] > nvme_rdma_del_ctrl+0x38/0x50 [nvme_rdma] > Aug 17 07:21:32 stevo1 kernel: [ 2213.279918] [<ffffffffa02703d4>] > nvme_sysfs_delete+0x34/0x50 [nvme_core] > Aug 17 07:21:32 stevo1 kernel: [ 2213.279923] [<ffffffff814884a0>] > dev_attr_store+0x20/0x30 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279931] [<ffffffff812ae29e>] > sysfs_kf_write+0x3e/0x40 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279935] [<ffffffff812add13>] > kernfs_fop_write+0x113/0x1b0 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279942] [<ffffffff8115a588>] ? > trace_buffer_unlock_commit_regs+0x68/0x80 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279949] [<ffffffff8122cfe4>] > __vfs_write+0x34/0x120 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279957] [<ffffffff81003725>] ? > trace_event_raw_event_sys_enter+0xb5/0x130 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279963] [<ffffffff810d1d06>] ? > percpu_down_read+0x16/0x60 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279967] [<ffffffff8122d191>] > vfs_write+0xc1/0x130 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279974] [<ffffffff81249633>] ? > __fdget+0x13/0x20 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279978] [<ffffffff8122d306>] > SyS_write+0x56/0xc0 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279983] [<ffffffff81003e7d>] > do_syscall_64+0x7d/0x230 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279991] [<ffffffff8106f057>] ? 
> do_page_fault+0x37/0x90 > Aug 17 07:21:32 stevo1 kernel: [ 2213.279997] [<ffffffff816e1921>] > entry_SYSCALL64_slow_path+0x25/0x25 > > > > Nvme_rdma_wq is doing nothing: > > Aug 17 07:20:08 stevo1 kernel: [ 2130.461614] nvme_rdma_wq S > ffff880fedd7bd58 0 4607 2 0x10000080 > Aug 17 07:20:08 stevo1 kernel: [ 2130.469572] ffff880fedd7bd58 > ffff8810381a15c0 ffff88102bdb5700 ffff881032b66250 > Aug 17 07:20:08 stevo1 kernel: [ 2130.477972] ffffffff81038959 > ffff880fedd7bd48 ffff880fedd78008 ffff8810206dde00 > Aug 17 07:20:08 stevo1 kernel: [ 2130.486357] ffff88102bdb5700 > ffffffff810a1730 ffff8810381abed8 ffff880fedd7bda8 > Aug 17 07:20:08 stevo1 kernel: [ 2130.494755] Call Trace: > Aug 17 07:20:08 stevo1 kernel: [ 2130.498088] [<ffffffff81038959>] ? > sched_clock+0x9/0x10 > Aug 17 07:20:08 stevo1 kernel: [ 2130.504287] [<ffffffff810a1730>] ? > process_one_work+0x4d0/0x4d0 > Aug 17 07:20:08 stevo1 kernel: [ 2130.511168] [<ffffffff816ddde0>] > schedule+0x40/0xb0 > Aug 17 07:20:08 stevo1 kernel: [ 2130.517002] [<ffffffff810b1064>] ? > set_user_nice+0x114/0x1b0 > Aug 17 07:20:08 stevo1 kernel: [ 2130.523623] [<ffffffff810a1a22>] > rescuer_thread+0x2f2/0x380 > Aug 17 07:20:08 stevo1 kernel: [ 2130.530146] [<ffffffff816dd920>] ? > __schedule+0x1f0/0x5b0 > Aug 17 07:20:08 stevo1 kernel: [ 2130.536501] [<ffffffff810cb8a6>] ? > __wake_up_common+0x56/0x90 > Aug 17 07:20:08 stevo1 kernel: [ 2130.543173] [<ffffffff810a1730>] ? > process_one_work+0x4d0/0x4d0 > Aug 17 07:20:08 stevo1 kernel: [ 2130.550013] [<ffffffff816ddde0>] ? > schedule+0x40/0xb0 > Aug 17 07:20:08 stevo1 kernel: [ 2130.555971] [<ffffffff810a1730>] ? > process_one_work+0x4d0/0x4d0 > Aug 17 07:20:08 stevo1 kernel: [ 2130.562800] [<ffffffff810a6bbc>] > kthread+0xcc/0xf0 > Aug 17 07:20:08 stevo1 kernel: [ 2130.568485] [<ffffffff810b157e>] ? 
> schedule_tail+0x1e/0xc0 > Aug 17 07:20:08 stevo1 kernel: [ 2130.574871] [<ffffffff816e1a7f>] > ret_from_fork+0x1f/0x40 > Aug 17 07:20:08 stevo1 kernel: [ 2130.581073] [<ffffffff810a6af0>] ? > kthread_freezable_should_stop+0x70/0x70 > > Here is another thread in the delete controller sysfs path: > > Aug 17 07:20:05 stevo1 kernel: [ 2127.184581] bash D > ffff881034547ac8 0 4066 4056 0x10000080 > Aug 17 07:20:05 stevo1 kernel: [ 2127.192445] ffff881034547ac8 > ffff881033f04140 ffff88102bc80000 ffff881032b66250 > Aug 17 07:20:05 stevo1 kernel: [ 2127.200730] ffff880fefe64230 > ffff88103e88aa50 ffff881034544008 ffff881034547c98 > Aug 17 07:20:05 stevo1 kernel: [ 2127.209022] ffff881034547c08 > ffff88102bc80000 7fffffffffffffff ffff881034547b18 > Aug 17 07:20:05 stevo1 kernel: [ 2127.217278] Call Trace: > Aug 17 07:20:05 stevo1 kernel: [ 2127.220523] [<ffffffff816ddde0>] > schedule+0x40/0xb0 > Aug 17 07:20:05 stevo1 kernel: [ 2127.226244] [<ffffffff810b2de4>] ? > try_to_wake_up+0x54/0x240 > Aug 17 07:20:05 stevo1 kernel: [ 2127.232753] [<ffffffff816e0a8d>] > schedule_timeout+0x2ad/0x410 > Aug 17 07:20:05 stevo1 kernel: [ 2127.239353] [<ffffffff810b3025>] ? > wake_up_process+0x15/0x20 > Aug 17 07:20:05 stevo1 kernel: [ 2127.245871] [<ffffffff8109d940>] ? > wake_up_worker+0x30/0x40 > Aug 17 07:20:05 stevo1 kernel: [ 2127.252285] [<ffffffff8109fdbc>] ? > insert_work+0x6c/0xb0 > Aug 17 07:20:05 stevo1 kernel: [ 2127.258432] [<ffffffff816deefe>] > wait_for_completion+0xde/0x110 > Aug 17 07:20:05 stevo1 kernel: [ 2127.263351] nvme nvme4: rdma_resolve_addr > wait failed (-110). > Aug 17 07:20:05 stevo1 kernel: [ 2127.263355] nvme nvme4: Failed reconnect > attempt, requeueing... > Aug 17 07:20:05 stevo1 kernel: [ 2127.278353] [<ffffffff810b2fd0>] ? > try_to_wake_up+0x240/0x240 > Aug 17 07:20:05 stevo1 kernel: [ 2127.284912] [<ffffffff810a0e0b>] > flush_work+0x2b/0x40 > Aug 17 07:20:05 stevo1 kernel: [ 2127.290771] [<ffffffff8109dcc0>] ? 
> worker_detach_from_pool+0xa0/0xa0 > Aug 17 07:20:05 stevo1 kernel: [ 2127.297970] [<ffffffffa064e46a>] ? > __nvme_rdma_del_ctrl+0x4a/0x60 [nvme_rdma] > Aug 17 07:20:05 stevo1 kernel: [ 2127.305945] [<ffffffffa064e4b8>] > nvme_rdma_del_ctrl+0x38/0x50 [nvme_rdma] > Aug 17 07:20:05 stevo1 kernel: [ 2127.313577] [<ffffffffa02703d4>] > nvme_sysfs_delete+0x34/0x50 [nvme_core] > Aug 17 07:20:05 stevo1 kernel: [ 2127.321111] [<ffffffff814884a0>] > dev_attr_store+0x20/0x30 > Aug 17 07:20:05 stevo1 kernel: [ 2127.327343] [<ffffffff812ae29e>] > sysfs_kf_write+0x3e/0x40 > Aug 17 07:20:05 stevo1 kernel: [ 2127.333551] [<ffffffff812add13>] > kernfs_fop_write+0x113/0x1b0 > Aug 17 07:20:05 stevo1 kernel: [ 2127.340127] [<ffffffff8115a588>] ? > trace_buffer_unlock_commit_regs+0x68/0x80 > Aug 17 07:20:05 stevo1 kernel: [ 2127.348017] [<ffffffff8122cfe4>] > __vfs_write+0x34/0x120 > Aug 17 07:20:05 stevo1 kernel: [ 2127.354083] [<ffffffff81003725>] ? > trace_event_raw_event_sys_enter+0xb5/0x130 > Aug 17 07:20:05 stevo1 kernel: [ 2127.362060] [<ffffffff810d1d06>] ? > percpu_down_read+0x16/0x60 > Aug 17 07:20:05 stevo1 kernel: [ 2127.368667] [<ffffffff8122d191>] > vfs_write+0xc1/0x130 > Aug 17 07:20:05 stevo1 kernel: [ 2127.374559] [<ffffffff81249633>] ? > __fdget+0x13/0x20 > Aug 17 07:20:05 stevo1 kernel: [ 2127.380367] [<ffffffff8122d306>] > SyS_write+0x56/0xc0 > Aug 17 07:20:05 stevo1 kernel: [ 2127.386201] [<ffffffff81003e7d>] > do_syscall_64+0x7d/0x230 > Aug 17 07:20:05 stevo1 kernel: [ 2127.392485] [<ffffffff8106f057>] ? 
> do_page_fault+0x37/0x90 > Aug 17 07:20:05 stevo1 kernel: [ 2127.398866] [<ffffffff816e1921>] > entry_SYSCALL64_slow_path+0x25/0x25 > > Here is the reconnect workq thread stuck trying to connect: > > Aug 17 07:19:45 stevo1 kernel: [ 2107.446780] Workqueue: nvme_rdma_wq > nvme_rdma_reconnect_ctrl_work [nvme_rdma] > Aug 17 07:19:45 stevo1 kernel: [ 2107.454764] ffff881033a57aa8 > ffff88103815c140 ffff8810371dab80 ffff881032b66250 > Aug 17 07:19:45 stevo1 kernel: [ 2107.463091] 00000001001b93e9 > 0000000000000001 ffff881033a54008 ffff88103ee500c0 > Aug 17 07:19:45 stevo1 kernel: [ 2107.471431] 00000001001b93e9 > 0000000000000000 ffff88103ee500c0 ffff881033a57af8 > Aug 17 07:19:45 stevo1 kernel: [ 2107.479805] Call Trace: > Aug 17 07:19:45 stevo1 kernel: [ 2107.483076] [<ffffffff816ddde0>] > schedule+0x40/0xb0 > Aug 17 07:19:45 stevo1 kernel: [ 2107.488892] [<ffffffff810f4a80>] ? > internal_add_timer+0x70/0x80 > Aug 17 07:19:45 stevo1 kernel: [ 2107.495765] [<ffffffff816e09fb>] > schedule_timeout+0x21b/0x410 > Aug 17 07:19:45 stevo1 kernel: [ 2107.502464] [<ffffffff810f4c10>] ? > del_timer+0x70/0x70 > Aug 17 07:19:45 stevo1 kernel: [ 2107.508547] [<ffffffffa031da9f>] ? > rdma_resolve_ip+0x1cf/0x1e0 [ib_core] > Aug 17 07:19:45 stevo1 kernel: [ 2107.516183] [<ffffffffa0736fe0>] ? > cma_work_handler+0xa0/0xa0 [rdma_cm] > Aug 17 07:19:45 stevo1 kernel: [ 2107.523721] [<ffffffff816deb6f>] > wait_for_completion_interruptible_timeout+0xef/0x130 > Aug 17 07:19:45 stevo1 kernel: [ 2107.532472] [<ffffffff810b2fd0>] ? > try_to_wake_up+0x240/0x240 > Aug 17 07:19:45 stevo1 kernel: [ 2107.539152] [<ffffffffa064f5c2>] > nvme_rdma_init_queue+0xe2/0x180 [nvme_rdma] > Aug 17 07:19:45 stevo1 kernel: [ 2107.547132] [<ffffffffa064f9bc>] > nvme_rdma_reconnect_ctrl_work+0x9c/0x1f0 [nvme_rdma] > Aug 17 07:19:45 stevo1 kernel: [ 2107.555908] [<ffffffff810a13e3>] > process_one_work+0x183/0x4d0 > Aug 17 07:19:45 stevo1 kernel: [ 2107.562592] [<ffffffff816dd920>] ? 
> __schedule+0x1f0/0x5b0 > Aug 17 07:19:45 stevo1 kernel: [ 2107.568938] [<ffffffff816ddde0>] ? > schedule+0x40/0xb0 > Aug 17 07:19:45 stevo1 kernel: [ 2107.574927] [<ffffffff810a207d>] > worker_thread+0x16d/0x530 > Aug 17 07:19:45 stevo1 kernel: [ 2107.581346] [<ffffffff810a1f10>] ? > maybe_create_worker+0x120/0x120 > Aug 17 07:19:45 stevo1 kernel: [ 2107.588474] [<ffffffff816dd920>] ? > __schedule+0x1f0/0x5b0 > Aug 17 07:19:45 stevo1 kernel: [ 2107.594813] [<ffffffff810cb8a6>] ? > __wake_up_common+0x56/0x90 > Aug 17 07:19:45 stevo1 kernel: [ 2107.601488] [<ffffffff810a1f10>] ? > maybe_create_worker+0x120/0x120 > Aug 17 07:19:45 stevo1 kernel: [ 2107.608625] [<ffffffff816ddde0>] ? > schedule+0x40/0xb0 > Aug 17 07:19:45 stevo1 kernel: [ 2107.614626] [<ffffffff810a1f10>] ? > maybe_create_worker+0x120/0x120 > Aug 17 07:19:45 stevo1 kernel: [ 2107.621745] [<ffffffff810a6bbc>] > kthread+0xcc/0xf0 > Aug 17 07:19:45 stevo1 kernel: [ 2107.627464] [<ffffffff810b157e>] ? > schedule_tail+0x1e/0xc0 > Aug 17 07:19:45 stevo1 kernel: [ 2107.633886] [<ffffffff816e1a7f>] > ret_from_fork+0x1f/0x40 > Aug 17 07:19:45 stevo1 kernel: [ 2107.640121] [<ffffffff810a6af0>] ? > kthread_freezable_should_stop+0x70/0x70 > > Here is controller delete workq thread: > > Aug 17 07:19:42 stevo1 kernel: [ 2104.515910] Workqueue: nvme_rdma_wq > nvme_rdma_del_ctrl_work [nvme_rdma] > Aug 17 07:19:42 stevo1 kernel: [ 2104.523373] ffff881033f13888 > ffff88103817ab80 ffff881033f04140 ffff881032b66250 > Aug 17 07:19:42 stevo1 kernel: [ 2104.531744] 00000001001c1a60 > 0000000000000005 ffff881033f10008 ffff88103ef500c0 > Aug 17 07:19:42 stevo1 kernel: [ 2104.540116] 00000001001c1a60 > 0000000000000000 ffff88103ef500c0 ffff881033f138d8 > Aug 17 07:19:42 stevo1 kernel: [ 2104.548501] Call Trace: > Aug 17 07:19:42 stevo1 kernel: [ 2104.551777] [<ffffffff816ddde0>] > schedule+0x40/0xb0 > Aug 17 07:19:42 stevo1 kernel: [ 2104.557593] [<ffffffff810f4a80>] ? 
> internal_add_timer+0x70/0x80 > Aug 17 07:19:42 stevo1 kernel: [ 2104.564445] [<ffffffff816e09fb>] > schedule_timeout+0x21b/0x410 > Aug 17 07:19:42 stevo1 kernel: [ 2104.571117] [<ffffffff810f4c10>] ? > del_timer+0x70/0x70 > Aug 17 07:19:42 stevo1 kernel: [ 2104.577191] [<ffffffff810fe67c>] ? > ktime_get+0x4c/0xc0 > Aug 17 07:19:42 stevo1 kernel: [ 2104.583248] [<ffffffff816dd6c4>] > io_schedule_timeout+0xa4/0x110 > Aug 17 07:19:42 stevo1 kernel: [ 2104.590083] [<ffffffff816df013>] > wait_for_completion_io_timeout+0xe3/0x120 > Aug 17 07:19:42 stevo1 kernel: [ 2104.597943] [<ffffffff810b2fd0>] ? > try_to_wake_up+0x240/0x240 > Aug 17 07:19:42 stevo1 kernel: [ 2104.604613] [<ffffffff81332a4d>] ? > blk_execute_rq_nowait+0x7d/0x160 > Aug 17 07:19:42 stevo1 kernel: [ 2104.611810] [<ffffffff81332bbc>] > blk_execute_rq+0x8c/0x110 > Aug 17 07:19:42 stevo1 kernel: [ 2104.618221] [<ffffffffa02715c5>] > __nvme_submit_sync_cmd+0x85/0xe0 [nvme_core] > Aug 17 07:19:42 stevo1 kernel: [ 2104.626293] [<ffffffffa0672526>] > nvmf_reg_write32+0x66/0xb0 [nvme_fabrics] > Aug 17 07:19:42 stevo1 kernel: [ 2104.634086] [<ffffffffa027095e>] > nvme_shutdown_ctrl+0x4e/0xe0 [nvme_core] > Aug 17 07:19:42 stevo1 kernel: [ 2104.641796] [<ffffffffa064f161>] > nvme_rdma_shutdown_ctrl+0xa1/0xe0 [nvme_rdma] > Aug 17 07:19:42 stevo1 kernel: [ 2104.649940] [<ffffffffa064f21c>] > __nvme_rdma_remove_ctrl+0x7c/0x80 [nvme_rdma] > Aug 17 07:19:42 stevo1 kernel: [ 2104.658093] [<ffffffffa064f25a>] > nvme_rdma_del_ctrl_work+0x1a/0x20 [nvme_rdma] > Aug 17 07:19:42 stevo1 kernel: [ 2104.666248] [<ffffffff810a13e3>] > process_one_work+0x183/0x4d0 > Aug 17 07:19:42 stevo1 kernel: [ 2104.672936] [<ffffffff816dd920>] ? > __schedule+0x1f0/0x5b0 > Aug 17 07:19:42 stevo1 kernel: [ 2104.679269] [<ffffffff8109daf2>] ? > move_linked_works+0x62/0x90 > Aug 17 07:19:42 stevo1 kernel: [ 2104.686032] [<ffffffff810a2204>] > worker_thread+0x2f4/0x530 > Aug 17 07:19:42 stevo1 kernel: [ 2104.692441] [<ffffffff810a1f10>] ? 
> maybe_create_worker+0x120/0x120 > Aug 17 07:19:42 stevo1 kernel: [ 2104.699540] [<ffffffff816dd920>] ? > __schedule+0x1f0/0x5b0 > Aug 17 07:19:42 stevo1 kernel: [ 2104.705867] [<ffffffff810cb8a6>] ? > __wake_up_common+0x56/0x90 > Aug 17 07:19:42 stevo1 kernel: [ 2104.712534] [<ffffffff810a1f10>] ? > maybe_create_worker+0x120/0x120 > Aug 17 07:19:42 stevo1 kernel: [ 2104.719257] nvme nvme4: rdma_resolve_addr > wait failed (-104). > Aug 17 07:19:42 stevo1 kernel: [ 2104.719261] nvme nvme4: Failed reconnect > attempt, requeueing... > Aug 17 07:19:42 stevo1 kernel: [ 2104.733085] [<ffffffff816ddde0>] ? > schedule+0x40/0xb0 > Aug 17 07:19:42 stevo1 kernel: [ 2104.739063] [<ffffffff810a1f10>] ? > maybe_create_worker+0x120/0x120 > Aug 17 07:19:42 stevo1 kernel: [ 2104.746163] [<ffffffff810a6bbc>] > kthread+0xcc/0xf0 > Aug 17 07:19:42 stevo1 kernel: [ 2104.751878] [<ffffffff810b157e>] ? > schedule_tail+0x1e/0xc0 > > > Can this be related due to the fact that we use a signle-threaded workqueue for delete/reset/reconnect? (delete cancel_sync the active reconnect work...) Does this untested patch help? -- diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 8d2875b4c56d..78a47c17fb01 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -1997,7 +1997,7 @@ static struct nvmf_transport_ops nvme_rdma_transport = { static int __init nvme_rdma_init_module(void) { - nvme_rdma_wq = create_workqueue("nvme_rdma_wq"); + nvme_rdma_wq = alloc_workqueue("nvme_rdma_wq", 0, 0); if (!nvme_rdma_wq) return -ENOMEM; -- ^ permalink raw reply related [flat|nested] 15+ messages in thread
* nvme/rdma initiator stuck on reboot
  2016-08-17 14:46 ` Sagi Grimberg
@ 2016-08-17 15:13 ` Steve Wise
  2016-08-18  7:01 ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-17 15:13 UTC (permalink / raw)

> Can this be related due to the fact that we use a single-threaded
> workqueue for delete/reset/reconnect? (delete cancel_sync the active
> reconnect work...)
>
> Does this untested patch help?

That seems to do it!

> --
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 8d2875b4c56d..78a47c17fb01 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1997,7 +1997,7 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
>
>  static int __init nvme_rdma_init_module(void)
>  {
> -	nvme_rdma_wq = create_workqueue("nvme_rdma_wq");
> +	nvme_rdma_wq = alloc_workqueue("nvme_rdma_wq", 0, 0);
>  	if (!nvme_rdma_wq)
>  		return -ENOMEM;
> --

^ permalink raw reply	[flat|nested] 15+ messages in thread
* nvme/rdma initiator stuck on reboot
  2016-08-17 15:13 ` Steve Wise
@ 2016-08-18  7:01 ` Sagi Grimberg
  2016-08-18 13:59 ` Steve Wise
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2016-08-18 7:01 UTC (permalink / raw)

>> Can this be related due to the fact that we use a single-threaded
>> workqueue for delete/reset/reconnect? (delete cancel_sync the active
>> reconnect work...)
>>
>> Does this untested patch help?
>
> That seems to do it!

Is this a formal tested-by?

^ permalink raw reply	[flat|nested] 15+ messages in thread
* nvme/rdma initiator stuck on reboot
  2016-08-18  7:01 ` Sagi Grimberg
@ 2016-08-18 13:59 ` Steve Wise
  2016-08-18 14:47 ` Steve Wise
  2016-08-18 15:21 ` 'Christoph Hellwig'
  0 siblings, 2 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-18 13:59 UTC (permalink / raw)

> >> Can this be related due to the fact that we use a single-threaded
> >> workqueue for delete/reset/reconnect? (delete cancel_sync the active
> >> reconnect work...)
> >>
> >> Does this untested patch help?
> >
> > That seems to do it!
>
> Is this a formal tested-by?

Sure, but let me ask a question: so the bug was that the delete-controller
worker was blocked waiting for the reconnect worker to complete, yes? And
the reconnect worker was never completing? Why is that?

Here are a few tidbits about iWARP connections: address resolution ==
neighbor discovery. So if the neighbor is unreachable, it will take a few
seconds for the OS to give up and fail the resolution. If the neigh entry
is valid and the peer becomes unreachable during connection setup, it
might take 60 seconds or so for a connect operation to give up and fail.
So this is probably slowing the reconnect thread down. But shouldn't the
reconnect thread notice that a delete is trying to happen and bail out?

^ permalink raw reply	[flat|nested] 15+ messages in thread
* nvme/rdma initiator stuck on reboot
  2016-08-18 13:59 ` Steve Wise
@ 2016-08-18 14:47 ` Steve Wise
  0 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-18 14:47 UTC (permalink / raw)

> > >> Can this be related due to the fact that we use a single-threaded
> > >> workqueue for delete/reset/reconnect? (delete cancel_sync the active
> > >> reconnect work...)
> > >>
> > >> Does this untested patch help?
> > >
> > > That seems to do it!
> >
> > Is this a formal tested-by?
>
> Sure,

While the patch worked for deleting the controllers, it still hangs if I
reboot the host after the target reboots and the host begins kato
recovery. It looks like the reconnect thread just gets stuck doing this:

[ 947.095936] nvme nvme4: Failed reconnect attempt, requeueing...
[ 947.616015] nvme nvme5: rdma_resolve_addr wait failed (-110).
[ 947.623943] nvme nvme5: Failed reconnect attempt, requeueing...
[ 948.128012] nvme nvme6: rdma_resolve_addr wait failed (-110).
[ 948.135956] nvme nvme6: Failed reconnect attempt, requeueing...
[ 948.624052] nvme nvme7: rdma_resolve_addr wait failed (-104).

I'll try and get a crash dump of this state to look at all the threads.
But I think we need the reconnect worker to give up if the controller it
is reconnecting is getting deleted or the device removed.

> but let me ask a question: So the bug was that the delete controller
> worker was blocked waiting for the reconnect worker to complete. Yes?
> And the reconnect worker was never completing? Why is that? Here are a
> few tidbits about iWARP connections: address resolution == neighbor
> discovery. So if the neighbor is unreachable, it will take a few seconds
> for the OS to give up and fail the resolution. If the neigh entry is
> valid and the peer becomes unreachable during connection setup, it might
> take 60 seconds or so for a connect operation to give up and fail. So
> this is probably slowing the reconnect thread down. But shouldn't the
> reconnect thread notice that a delete is trying to happen and bail out?
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread
* nvme/rdma initiator stuck on reboot
  2016-08-18 13:59 ` Steve Wise
  2016-08-18 14:47 ` Steve Wise
@ 2016-08-18 15:21 ` 'Christoph Hellwig'
  2016-08-18 17:59 ` Steve Wise
  2016-08-19  8:58 ` Sagi Grimberg
  1 sibling, 2 replies; 15+ messages in thread
From: 'Christoph Hellwig' @ 2016-08-18 15:21 UTC (permalink / raw)

On Thu, Aug 18, 2016@08:59:15AM -0500, Steve Wise wrote:
> Sure, but let me ask a question: So the bug was that the delete
> controller worker was blocked waiting for the reconnect worker to
> complete.

Btw, in that case the patch is not actually correct, as even a workqueue
with a higher concurrency level MAY deadlock under enough memory
pressure. We'll need separate workqueues to handle this case, I think.

> Yes? And the reconnect worker was never completing? Why is that? Here
> are a few tidbits about iWARP connections: address resolution ==
> neighbor discovery. So if the neighbor is unreachable, it will take a
> few seconds for the OS to give up and fail the resolution. If the neigh
> entry is valid and the peer becomes unreachable during connection setup,
> it might take 60 seconds or so for a connect operation to give up and
> fail. So this is probably slowing the reconnect thread down. But
> shouldn't the reconnect thread notice that a delete is trying to happen
> and bail out?

I think we should aim for a state machine that can detect this, but
we'll have to see if that will end up in synchronization overkill.

^ permalink raw reply	[flat|nested] 15+ messages in thread
* nvme/rdma initiator stuck on reboot
  2016-08-18 15:21 ` 'Christoph Hellwig'
@ 2016-08-18 17:59 ` Steve Wise
  2016-08-18 18:50 ` Steve Wise
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-18 17:59 UTC (permalink / raw)

> Btw, in that case the patch is not actually correct, as even workqueue
> with a higher concurrency level MAY deadlock under enough memory
> pressure. We'll need separate workqueues to handle this case I think.
>
> > Yes? And the reconnect worker was never completing? Why is that? Here
> > are a few tidbits about iWARP connections: address resolution ==
> > neighbor discovery. So if the neighbor is unreachable, it will take a
> > few seconds for the OS to give up and fail the resolution. If the
> > neigh entry is valid and the peer becomes unreachable during
> > connection setup, it might take 60 seconds or so for a connect
> > operation to give up and fail. So this is probably slowing the
> > reconnect thread down. But shouldn't the reconnect thread notice that
> > a delete is trying to happen and bail out?
>
> I think we should aim for a state machine that can detect this, but
> we'll have to see if that will end up in synchronization overkill.

Looking at the state machine, I don't see why the reconnect thread would
get stuck continually rescheduling once the controller was deleted.
Changing from RECONNECTING to DELETING is done by nvme_change_ctrl_state()
in __nvme_rdma_del_ctrl(), so once that happens the thread running the
reconnect logic should stop rescheduling, due to this in the failure path
of nvme_rdma_reconnect_ctrl_work():

...
requeue:
	/* Make sure we are not resetting/deleting */
	if (ctrl->ctrl.state == NVME_CTRL_RECONNECTING) {
		dev_info(ctrl->ctrl.device,
			"Failed reconnect attempt, requeueing...\n");
		queue_delayed_work(nvme_rdma_wq, &ctrl->reconnect_work,
				ctrl->reconnect_delay * HZ);
	}
...

So something isn't happening like I think it is, I guess.

Also, even with the alloc_workqueue() change, a reboot during reconnect
gets stuck. I never see the controllers getting deleted nor the unplug
event handler happening, so the reconnect thread seems to hang the
shutdown/reboot...

^ permalink raw reply	[flat|nested] 15+ messages in thread
* nvme/rdma initiator stuck on reboot
  2016-08-18 17:59 ` Steve Wise
@ 2016-08-18 18:50 ` Steve Wise
  2016-08-18 19:11 ` Steve Wise
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-18 18:50 UTC (permalink / raw)

> > Btw, in that case the patch is not actually correct, as even workqueue
> > with a higher concurrency level MAY deadlock under enough memory
> > pressure. We'll need separate workqueues to handle this case I think.
> >
> > > Yes? And the reconnect worker was never completing? Why is that?
> > > Here are a few tidbits about iWARP connections: address resolution
> > > == neighbor discovery. So if the neighbor is unreachable, it will
> > > take a few seconds for the OS to give up and fail the resolution.
> > > If the neigh entry is valid and the peer becomes unreachable during
> > > connection setup, it might take 60 seconds or so for a connect
> > > operation to give up and fail. So this is probably slowing the
> > > reconnect thread down. But shouldn't the reconnect thread notice
> > > that a delete is trying to happen and bail out?
> >
> > I think we should aim for a state machine that can detect this, but
> > we'll have to see if that will end up in synchronization overkill.
>
> Looking at the state machine I don't see why the reconnect thread would
> get stuck continually rescheduling once the controller was deleted.
> Changing from RECONNECTING to DELETING will be done by
> nvme_change_ctrl_state(). So once that happens, in
> __nvme_rdma_del_ctrl(), the thread running reconnect logic should stop
> rescheduling due to this in the failure logic of
> nvme_rdma_reconnect_ctrl_work():
>
> ...
> requeue:
>	/* Make sure we are not resetting/deleting */
>	if (ctrl->ctrl.state == NVME_CTRL_RECONNECTING) {
>		dev_info(ctrl->ctrl.device,
>			"Failed reconnect attempt, requeueing...\n");
>		queue_delayed_work(nvme_rdma_wq, &ctrl->reconnect_work,
>				ctrl->reconnect_delay * HZ);
>	}
> ...
>
> So something isn't happening like I think it is, I guess.

I see what happens. Assume the 10 controllers are reconnecting and
failing, and thus they reschedule each time. I then run a script to
delete all 10 devices sequentially, like this:

for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1; done

The first device, nvme1n1, gets a disconnect/delete command and changes
the controller state from RECONNECTING to DELETING, and then schedules
nvme_rdma_del_ctrl_work(), but that is stuck behind the 9 others
continually reconnecting, failing, and rescheduling. I'm not sure why the
delete never gets run, though? I would think if it is scheduled, then it
would get executed before the reconnect threads reschedule. Maybe we need
some round-robin mode for our workq? And because the first delete is
stuck, none of the subsequent delete commands get executed.

Note: if I run each disconnect command in the background, then they all
get cleaned up OK. Like this:

for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1 & done

^ permalink raw reply	[flat|nested] 15+ messages in thread
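The starvation described above can be sketched with a toy model (purely illustrative; the function, its name, and the FIFO/`max_active` abstraction are loose simplifications of the kernel workqueue, not its real implementation). It shows why a delete work queued behind reconnect works that each stall a worker waits far longer when at most one work runs at a time (as with `create_workqueue()`, which historically uses a max_active of 1) than with the default concurrency of `alloc_workqueue(..., 0, 0)`:

```python
def delete_start_time(n_reconnect: int, max_active: int,
                      block: float = 60.0) -> float:
    """Toy FIFO workqueue model: n_reconnect reconnect works are queued
    first, each occupying a worker for `block` simulated seconds (the
    worst-case connect-teardown stall), then one delete work is queued
    behind them.  Works that requeue themselves land behind the delete,
    so only the first pass matters.  Returns the simulated time at
    which the delete work finally starts executing."""
    # The delete sits at 0-based queue position n_reconnect; with
    # max_active concurrent workers, the work at position i starts at
    # floor(i / max_active) * block.
    return (n_reconnect // max_active) * block

# max_active = 1: the delete waits behind all 10 stalled reconnects.
print(delete_start_time(10, max_active=1))   # 600.0
# Higher concurrency: the delete starts almost immediately because the
# reconnect works run in parallel.
print(delete_start_time(10, max_active=16))  # 0.0
```

Even with higher concurrency the per-connection stall itself remains, which is consistent with the later observation in the thread that the alloc_workqueue() change alone did not fix the reboot hang.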
* nvme/rdma initiator stuck on reboot
  2016-08-18 18:50 ` Steve Wise
@ 2016-08-18 19:11 ` Steve Wise
  0 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-18 19:11 UTC (permalink / raw)

> > > Btw, in that case the patch is not actually correct, as even workqueue
> > > with a higher concurrency level MAY deadlock under enough memory
> > > pressure. We'll need separate workqueues to handle this case I think.
> > >
> > > > Yes? And the reconnect worker was never completing? Why is that?
> > > > Here are a few tidbits about iWARP connections: address resolution
> > > > == neighbor discovery. So if the neighbor is unreachable, it will
> > > > take a few seconds for the OS to give up and fail the resolution.
> > > > If the neigh entry is valid and the peer becomes unreachable during
> > > > connection setup, it might take 60 seconds or so for a connect
> > > > operation to give up and fail. So this is probably slowing the
> > > > reconnect thread down. But shouldn't the reconnect thread notice
> > > > that a delete is trying to happen and bail out?
> > >
> > > I think we should aim for a state machine that can detect this, but
> > > we'll have to see if that will end up in synchronization overkill.
> >
> > Looking at the state machine I don't see why the reconnect thread would
> > get stuck continually rescheduling once the controller was deleted.
> > Changing from RECONNECTING to DELETING will be done by
> > nvme_change_ctrl_state(). So once that happens, in
> > __nvme_rdma_del_ctrl(), the thread running reconnect logic should stop
> > rescheduling due to this in the failure logic of
> > nvme_rdma_reconnect_ctrl_work():
> >
> > ...
> > requeue:
> >	/* Make sure we are not resetting/deleting */
> >	if (ctrl->ctrl.state == NVME_CTRL_RECONNECTING) {
> >		dev_info(ctrl->ctrl.device,
> >			"Failed reconnect attempt, requeueing...\n");
> >		queue_delayed_work(nvme_rdma_wq, &ctrl->reconnect_work,
> >				ctrl->reconnect_delay * HZ);
> >	}
> > ...
> >
> > So something isn't happening like I think it is, I guess.
>
> I see what happens. Assume the 10 controllers are reconnecting and
> failing, thus they reschedule each time. I then run a script to delete
> all 10 devices sequentially. Like this:
>
> for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1; done
>
> The first device, nvme1n1 gets a disconnect/delete command and changes
> the controller state from RECONNECTING to DELETING, and then schedules
> nvme_rdma_del_ctrl_work(), but that is stuck behind the 9 others
> continually reconnecting, failing, and rescheduling. I'm not sure why
> the delete never gets run though? I would think if it is scheduled, then
> it would get executed before the reconnect threads rescheduling? Maybe
> we need some round-robin mode for our workq? And because the first
> delete is stuck, none of the subsequent delete commands get executed.
> Note: If I run each disconnect command in the background, then they all
> get cleaned up ok. Like this:
>
> for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1 & done

Experimenting more, running the 'nvme disconnect's in the background
doesn't really avoid things getting stuck. BTW: I'm running these with
the single-threaded workq, to understand the deadlock. (Well, trying to
understand...)

^ permalink raw reply	[flat|nested] 15+ messages in thread
* nvme/rdma initiator stuck on reboot
  2016-08-18 15:21 ` 'Christoph Hellwig'
  2016-08-18 17:59 ` Steve Wise
@ 2016-08-19  8:58 ` Sagi Grimberg
  2016-08-19 14:22 ` Steve Wise
  [not found] ` <008001d1fa25$0c960fb0$25c22f10$@opengridcomputing.com>
  1 sibling, 2 replies; 15+ messages in thread
From: Sagi Grimberg @ 2016-08-19 8:58 UTC (permalink / raw)

> Btw, in that case the patch is not actually correct, as even workqueue
> with a higher concurrency level MAY deadlock under enough memory
> pressure. We'll need separate workqueues to handle this case I think.

Steve, does it help if you run the delete on the system_long_wq [1]?
Note, I've seen problems with forward progress when sharing a workqueue
between teardown/reconnect sequences and the rest of the system (mostly
in srp).

>> Yes? And the reconnect worker was never completing? Why is that? Here
>> are a few tidbits about iWARP connections: address resolution ==
>> neighbor discovery. So if the neighbor is unreachable, it will take a
>> few seconds for the OS to give up and fail the resolution. If the neigh
>> entry is valid and the peer becomes unreachable during connection
>> setup, it might take 60 seconds or so for a connect operation to give
>> up and fail. So this is probably slowing the reconnect thread down. But
>> shouldn't the reconnect thread notice that a delete is trying to happen
>> and bail out?
>
> I think we should aim for a state machine that can detect this, but
> we'll have to see if that will end up in synchronization overkill.

The reconnect logic does take care of this state transition...

[1]:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8d2875b4c56d..93ea2831ff31 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1342,7 +1342,7 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 	}

 	/* Queue controller deletion */
-	queue_work(nvme_rdma_wq, &ctrl->delete_work);
+	queue_work(system_long_wq, &ctrl->delete_work);
 	flush_work(&ctrl->delete_work);
 	return ret;
 }
@@ -1681,7 +1681,7 @@ static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
 		return -EBUSY;

-	if (!queue_work(nvme_rdma_wq, &ctrl->delete_work))
+	if (!queue_work(system_long_wq, &ctrl->delete_work))
 		return -EBUSY;

 	return 0;
@@ -1763,7 +1763,7 @@ static int nvme_rdma_reset_ctrl(struct nvme_ctrl *nctrl)
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
 		return -EBUSY;

-	if (!queue_work(nvme_rdma_wq, &ctrl->reset_work))
+	if (!queue_work(system_long_wq, &ctrl->reset_work))
 		return -EBUSY;

 	flush_work(&ctrl->reset_work);
--

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* nvme/rdma initiator stuck on reboot
  2016-08-19  8:58 ` Sagi Grimberg
@ 2016-08-19 14:22 ` Steve Wise
  [not found] ` <008001d1fa25$0c960fb0$25c22f10$@opengridcomputing.com>
  1 sibling, 2 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-19 14:22 UTC (permalink / raw)

> > Btw, in that case the patch is not actually correct, as even workqueue
> > with a higher concurrency level MAY deadlock under enough memory
> > pressure. We'll need separate workqueues to handle this case I think.
>
> Steve, does it help if you run the delete on the system_long_wq [1]?
> Note, I've seen problems with forward progress when sharing a workqueue
> between teardown/reconnect sequences and the rest of the system (mostly
> in srp).

I can try this, but see my comments below. I'm not sure there is any
deadlock at this point...

> >> Yes? And the reconnect worker was never completing? Why is that? Here
> >> are a few tidbits about iWARP connections: address resolution ==
> >> neighbor discovery. So if the neighbor is unreachable, it will take a
> >> few seconds for the OS to give up and fail the resolution. If the
> >> neigh entry is valid and the peer becomes unreachable during
> >> connection setup, it might take 60 seconds or so for a connect
> >> operation to give up and fail. So this is probably slowing the
> >> reconnect thread down. But shouldn't the reconnect thread notice that
> >> a delete is trying to happen and bail out?
> >
> > I think we should aim for a state machine that can detect this, but
> > we'll have to see if that will end up in synchronization overkill.
>
> The reconnect logic does take care of this state transition...

Yes, I agree. The disconnect/delete command changes the controller state
from RECONNECTING to DELETING, and the reconnecting thread will not
reschedule itself for that controller. In further debugging (see my
subsequent emails), it appears that there really isn't a deadlock.

First, let me describe the main problem: the IWCM will block destroying a
cm_id until the driver has completed a connection setup attempt. See
IWCM_F_CONNECT_WAIT in drivers/infiniband/core/iwcm.c. Further,
iw_cxgb4's TCP engine can take up to 60 seconds to fail a TCP connection
setup if the neigh entry is valid yet the peer is unresponsive.

So what we see happening is that when kato kicks in after the target
reboots, and _before_ the neigh entry for the target is flushed due to no
connectivity, the connection setup attempts all get initiated by the
reconnect logic in the nvmf/rdma host driver. Even though the nvme/rdma
host driver times out such an attempt in ~1 sec
(NVME_RDMA_CONNECT_TIMEOUT_MS), it gets stuck for up to 60 seconds
destroying the cm_id.

So for my setup, each controller has 32 io queues. That causes the
reconnect thread to get stuck for VERY LONG periods of time. Even if the
controllers are deleted, thus changing the state of the controller to
DELETING, the thread will still get stuck for at least 60 seconds trying
to destroy its current connecting cm_id. Then you multiply that by 10
controllers in my test, and you see that the reconnect logic is taking
way too long to give up. So I think I need to see about removing the
IWCM_F_CONNECT_WAIT logic in the iwcm.

One other thing: in both nvme_rdma_device_unplug() and
nvme_rdma_del_ctrl(), the code kicks the delete_work thread to delete the
controller and then calls flush_work(). This is a possible
touch-after-free, no? The proper way, I think, would be to take a ref on
ctrl, kick the delete_work thread, call flush_work(), and then
nvme_put_ctrl(ctrl). Do you agree? While doing this debug, I wondered if
this issue was causing a delete thread to get stuck in flush_work(). I
never proved that, and I think the real issue is the IWCM_F_CONNECT_WAIT
logic.

Steve.

^ permalink raw reply	[flat|nested] 15+ messages in thread
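The figures quoted above imply a rough worst case (back-of-envelope arithmetic only, assuming every per-queue cm_id destroy hits the full 60-second stall and teardown is fully serialized on one workqueue):

```python
# Worst-case serialized teardown implied by the numbers in this thread:
# ~60 s per cm_id destroy (IWCM_F_CONNECT_WAIT plus iw_cxgb4's TCP
# connect timeout), 32 I/O-queue cm_ids per controller, 10 controllers,
# all handled by a single reconnect/delete workqueue.
stall_per_cm_id_s = 60
io_queues_per_ctrl = 32
controllers = 10

worst_case_s = stall_per_cm_id_s * io_queues_per_ctrl * controllers
print(worst_case_s)                   # 19200 (seconds)
print(round(worst_case_s / 3600, 1))  # 5.3 (hours)
```

Well past any plausible shutdown timeout, which is consistent with the initiator appearing to hang on reboot rather than eventually finishing.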
* nvme/rdma initiator stuck on reboot
       [not found] ` <008001d1fa25$0c960fb0$25c22f10$@opengridcomputing.com>
@ 2016-08-19 14:24 ` Steve Wise
  0 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-19 14:24 UTC (permalink / raw)

> One other thing: in both nvme_rdma_device_unplug() and
> nvme_rdma_del_ctrl(), the code kicks the delete_work thread to delete the
> controller and then calls flush_work(). This is a possible
> touch-after-free, no? The proper way, I think, should be to take a ref on
> ctrl, kick the delete_work thread, call flush_work(), and then
> nvme_put_ctrl(ctrl). Do you agree?

IE: do we need this:

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 9c69393..6198eaa 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1341,9 +1341,15 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 		ret = 1;
 	}

-	/* Queue controller deletion */
+	/*
+	 * Queue controller deletion. Keep a reference until all
+	 * work is flushed since delete_work will free the ctrl mem
+	 */
+	kref_get(&ctrl->ctrl.kref);
 	queue_work(nvme_rdma_wq, &ctrl->delete_work);
 	flush_work(&ctrl->delete_work);
+	nvme_put_ctrl(&ctrl->ctrl);
+
 	return ret;
 }

@@ -1690,15 +1696,22 @@ static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
 static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
 {
 	struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
-	int ret;
+	int ret = 0;
+
+	/*
+	 * Keep a reference until all work is flushed since
+	 * __nvme_rdma_del_ctrl can free the ctrl mem
+	 */
+	kref_get(&ctrl->ctrl.kref);

 	ret = __nvme_rdma_del_ctrl(ctrl);
 	if (ret)
-		return ret;
+		goto out;

 	flush_work(&ctrl->delete_work);
-
-	return 0;
+out:
+	nvme_put_ctrl(&ctrl->ctrl);
+	return ret;
 }

 static void nvme_rdma_remove_ctrl_work(struct work_struct *work)

^ permalink raw reply related	[flat|nested] 15+ messages in thread