* nvme/rdma initiator stuck on reboot
@ 2016-08-16 19:40 Steve Wise
  2016-08-17 10:23 ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-16 19:40 UTC (permalink / raw)


Hey Sagi,

Here is another issue I'm seeing doing reboot testing.  The test does this:

1) connect 10 ram devices over iw_cxgb4
2) reboot the target node
3) the initiator goes into recovery/reconnect mode
4) reboot the initiator at this point.

The initiator gets stuck doing this continually and the system never reboots:

[  596.411842] nvme nvme1: Failed reconnect attempt, requeueing...
[  596.907865] nvme nvme9: rdma_resolve_addr wait failed (-104).
[  596.914461] nvme nvme9: Failed reconnect attempt, requeueing...
[  597.939935] nvme nvme10: rdma_resolve_addr wait failed (-104).
[  597.946625] nvme nvme10: Failed reconnect attempt, requeueing...
[  598.963995] nvme nvme2: rdma_resolve_addr wait failed (-110).
[  598.971968] nvme nvme2: Failed reconnect attempt, requeueing...
[  602.036135] nvme nvme3: rdma_resolve_addr wait failed (-104).
[  602.043797] nvme nvme3: Failed reconnect attempt, requeueing...
[  603.060171] nvme nvme4: rdma_resolve_addr wait failed (-104).
[  603.068153] nvme nvme4: Failed reconnect attempt, requeueing...
[  604.084223] nvme nvme5: rdma_resolve_addr wait failed (-104).
[  604.092191] nvme nvme5: Failed reconnect attempt, requeueing...
[  605.108294] nvme nvme6: rdma_resolve_addr wait failed (-104).
[  605.116251] nvme nvme6: Failed reconnect attempt, requeueing...

Debugging now...

Steve.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-16 19:40 nvme/rdma initiator stuck on reboot Steve Wise
@ 2016-08-17 10:23 ` Sagi Grimberg
  2016-08-17 14:33   ` Steve Wise
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2016-08-17 10:23 UTC (permalink / raw)



> Hey Sagi,
>
> Here is another issue I'm seeing doing reboot testing.  The test does this:
>
> 1) connect 10 ram devices over iw_cxgb4
> 2) reboot the target node
> 3) the initiator goes into recovery/reconnect mode
> 4) reboot the initiator at this point.
>
> The initiator gets stuck doing this continually and the system never reboots:
>
> [  596.411842] nvme nvme1: Failed reconnect attempt, requeueing...
> [  596.907865] nvme nvme9: rdma_resolve_addr wait failed (-104).
> [  596.914461] nvme nvme9: Failed reconnect attempt, requeueing...
> [  597.939935] nvme nvme10: rdma_resolve_addr wait failed (-104).
> [  597.946625] nvme nvme10: Failed reconnect attempt, requeueing...
> [  598.963995] nvme nvme2: rdma_resolve_addr wait failed (-110).
> [  598.971968] nvme nvme2: Failed reconnect attempt, requeueing...
> [  602.036135] nvme nvme3: rdma_resolve_addr wait failed (-104).
> [  602.043797] nvme nvme3: Failed reconnect attempt, requeueing...
> [  603.060171] nvme nvme4: rdma_resolve_addr wait failed (-104).
> [  603.068153] nvme nvme4: Failed reconnect attempt, requeueing...
> [  604.084223] nvme nvme5: rdma_resolve_addr wait failed (-104).
> [  604.092191] nvme nvme5: Failed reconnect attempt, requeueing...
> [  605.108294] nvme nvme6: rdma_resolve_addr wait failed (-104).
> [  605.116251] nvme nvme6: Failed reconnect attempt, requeueing...
>
> Debugging now...

Hmm...

Does this reproduce also when you simply delete all the
controllers (via sysfs)?

Do you see the hung task watchdog? can you share the
threads state? (echo t > /proc/sysrq-trigger)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-17 10:23 ` Sagi Grimberg
@ 2016-08-17 14:33   ` Steve Wise
  2016-08-17 14:46     ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-17 14:33 UTC (permalink / raw)


> > Hey Sagi,
> >
> > Here is another issue I'm seeing doing reboot testing.  The test does this:
> >
> > 1) connect 10 ram devices over iw_cxgb4
> > 2) reboot the target node
> > 3) the initiator goes into recovery/reconnect mode
> > 4) reboot the initiator at this point.
> >
> > The initiator gets stuck doing this continually and the system never reboots:
> >
> > [  596.411842] nvme nvme1: Failed reconnect attempt, requeueing...
> > [  596.907865] nvme nvme9: rdma_resolve_addr wait failed (-104).
> > [  596.914461] nvme nvme9: Failed reconnect attempt, requeueing...
> > [  597.939935] nvme nvme10: rdma_resolve_addr wait failed (-104).
> > [  597.946625] nvme nvme10: Failed reconnect attempt, requeueing...
> > [  598.963995] nvme nvme2: rdma_resolve_addr wait failed (-110).
> > [  598.971968] nvme nvme2: Failed reconnect attempt, requeueing...
> > [  602.036135] nvme nvme3: rdma_resolve_addr wait failed (-104).
> > [  602.043797] nvme nvme3: Failed reconnect attempt, requeueing...
> > [  603.060171] nvme nvme4: rdma_resolve_addr wait failed (-104).
> > [  603.068153] nvme nvme4: Failed reconnect attempt, requeueing...
> > [  604.084223] nvme nvme5: rdma_resolve_addr wait failed (-104).
> > [  604.092191] nvme nvme5: Failed reconnect attempt, requeueing...
> > [  605.108294] nvme nvme6: rdma_resolve_addr wait failed (-104).
> > [  605.116251] nvme nvme6: Failed reconnect attempt, requeueing...
> >
> > Debugging now...
> 
> Hmm...
> 
> Does this reproduce also when you simply delete all the
> controllers (via sysfs)?
>

Yes.

 
> Do you see the hung task watchdog? can you share the
> threads state? (echo t > /proc/sysrq-trigger)

The deleting thread is stuck:

Aug 17 07:21:32 stevo1 kernel: [ 2213.279798] INFO: task bash:4066 blocked
for more than 120 seconds.
Aug 17 07:21:32 stevo1 kernel: [ 2213.279802]       Tainted: G            E
4.8.0-rc2-nvmf-4.8-rc-rebased-rc2+ #15
Aug 17 07:21:32 stevo1 kernel: [ 2213.279803] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 17 07:21:32 stevo1 kernel: [ 2213.279811] bash            D
ffff881034547ac8     0  4066   4056 0x10000080
Aug 17 07:21:32 stevo1 kernel: [ 2213.279819]  ffff881034547ac8
ffff881033f04140 ffff88102bc80000 ffff881032b66250
Aug 17 07:21:32 stevo1 kernel: [ 2213.279824]  ffff880fefe64230
ffff88103e88aa50 ffff881034544008 ffff881034547c98
Aug 17 07:21:32 stevo1 kernel: [ 2213.279828]  ffff881034547c08
ffff88102bc80000 7fffffffffffffff ffff881034547b18
Aug 17 07:21:32 stevo1 kernel: [ 2213.279829] Call Trace:
Aug 17 07:21:32 stevo1 kernel: [ 2213.279842]  [<ffffffff816ddde0>]
schedule+0x40/0xb0
Aug 17 07:21:32 stevo1 kernel: [ 2213.279850]  [<ffffffff810b2de4>] ?
try_to_wake_up+0x54/0x240
Aug 17 07:21:32 stevo1 kernel: [ 2213.279856]  [<ffffffff816e0a8d>]
schedule_timeout+0x2ad/0x410
Aug 17 07:21:32 stevo1 kernel: [ 2213.279861]  [<ffffffff810b3025>] ?
wake_up_process+0x15/0x20
Aug 17 07:21:32 stevo1 kernel: [ 2213.279868]  [<ffffffff8109d940>] ?
wake_up_worker+0x30/0x40
Aug 17 07:21:32 stevo1 kernel: [ 2213.279873]  [<ffffffff8109fdbc>] ?
insert_work+0x6c/0xb0
Aug 17 07:21:32 stevo1 kernel: [ 2213.279878]  [<ffffffff816deefe>]
wait_for_completion+0xde/0x110
Aug 17 07:21:32 stevo1 kernel: [ 2213.279882]  [<ffffffff810b2fd0>] ?
try_to_wake_up+0x240/0x240
Aug 17 07:21:32 stevo1 kernel: [ 2213.279887]  [<ffffffff810a0e0b>]
flush_work+0x2b/0x40
Aug 17 07:21:32 stevo1 kernel: [ 2213.279892]  [<ffffffff8109dcc0>] ?
worker_detach_from_pool+0xa0/0xa0
Aug 17 07:21:32 stevo1 kernel: [ 2213.279902]  [<ffffffffa064e46a>] ?
__nvme_rdma_del_ctrl+0x4a/0x60 [nvme_rdma]
Aug 17 07:21:32 stevo1 kernel: [ 2213.279908]  [<ffffffffa064e4b8>]
nvme_rdma_del_ctrl+0x38/0x50 [nvme_rdma]
Aug 17 07:21:32 stevo1 kernel: [ 2213.279918]  [<ffffffffa02703d4>]
nvme_sysfs_delete+0x34/0x50 [nvme_core]
Aug 17 07:21:32 stevo1 kernel: [ 2213.279923]  [<ffffffff814884a0>]
dev_attr_store+0x20/0x30
Aug 17 07:21:32 stevo1 kernel: [ 2213.279931]  [<ffffffff812ae29e>]
sysfs_kf_write+0x3e/0x40
Aug 17 07:21:32 stevo1 kernel: [ 2213.279935]  [<ffffffff812add13>]
kernfs_fop_write+0x113/0x1b0
Aug 17 07:21:32 stevo1 kernel: [ 2213.279942]  [<ffffffff8115a588>] ?
trace_buffer_unlock_commit_regs+0x68/0x80
Aug 17 07:21:32 stevo1 kernel: [ 2213.279949]  [<ffffffff8122cfe4>]
__vfs_write+0x34/0x120
Aug 17 07:21:32 stevo1 kernel: [ 2213.279957]  [<ffffffff81003725>] ?
trace_event_raw_event_sys_enter+0xb5/0x130
Aug 17 07:21:32 stevo1 kernel: [ 2213.279963]  [<ffffffff810d1d06>] ?
percpu_down_read+0x16/0x60
Aug 17 07:21:32 stevo1 kernel: [ 2213.279967]  [<ffffffff8122d191>]
vfs_write+0xc1/0x130
Aug 17 07:21:32 stevo1 kernel: [ 2213.279974]  [<ffffffff81249633>] ?
__fdget+0x13/0x20
Aug 17 07:21:32 stevo1 kernel: [ 2213.279978]  [<ffffffff8122d306>]
SyS_write+0x56/0xc0
Aug 17 07:21:32 stevo1 kernel: [ 2213.279983]  [<ffffffff81003e7d>]
do_syscall_64+0x7d/0x230
Aug 17 07:21:32 stevo1 kernel: [ 2213.279991]  [<ffffffff8106f057>] ?
do_page_fault+0x37/0x90
Aug 17 07:21:32 stevo1 kernel: [ 2213.279997]  [<ffffffff816e1921>]
entry_SYSCALL64_slow_path+0x25/0x25



Nvme_rdma_wq is doing nothing:

Aug 17 07:20:08 stevo1 kernel: [ 2130.461614] nvme_rdma_wq    S
ffff880fedd7bd58     0  4607      2 0x10000080
Aug 17 07:20:08 stevo1 kernel: [ 2130.469572]  ffff880fedd7bd58
ffff8810381a15c0 ffff88102bdb5700 ffff881032b66250
Aug 17 07:20:08 stevo1 kernel: [ 2130.477972]  ffffffff81038959
ffff880fedd7bd48 ffff880fedd78008 ffff8810206dde00
Aug 17 07:20:08 stevo1 kernel: [ 2130.486357]  ffff88102bdb5700
ffffffff810a1730 ffff8810381abed8 ffff880fedd7bda8
Aug 17 07:20:08 stevo1 kernel: [ 2130.494755] Call Trace:
Aug 17 07:20:08 stevo1 kernel: [ 2130.498088]  [<ffffffff81038959>] ?
sched_clock+0x9/0x10
Aug 17 07:20:08 stevo1 kernel: [ 2130.504287]  [<ffffffff810a1730>] ?
process_one_work+0x4d0/0x4d0
Aug 17 07:20:08 stevo1 kernel: [ 2130.511168]  [<ffffffff816ddde0>]
schedule+0x40/0xb0
Aug 17 07:20:08 stevo1 kernel: [ 2130.517002]  [<ffffffff810b1064>] ?
set_user_nice+0x114/0x1b0
Aug 17 07:20:08 stevo1 kernel: [ 2130.523623]  [<ffffffff810a1a22>]
rescuer_thread+0x2f2/0x380
Aug 17 07:20:08 stevo1 kernel: [ 2130.530146]  [<ffffffff816dd920>] ?
__schedule+0x1f0/0x5b0
Aug 17 07:20:08 stevo1 kernel: [ 2130.536501]  [<ffffffff810cb8a6>] ?
__wake_up_common+0x56/0x90
Aug 17 07:20:08 stevo1 kernel: [ 2130.543173]  [<ffffffff810a1730>] ?
process_one_work+0x4d0/0x4d0
Aug 17 07:20:08 stevo1 kernel: [ 2130.550013]  [<ffffffff816ddde0>] ?
schedule+0x40/0xb0
Aug 17 07:20:08 stevo1 kernel: [ 2130.555971]  [<ffffffff810a1730>] ?
process_one_work+0x4d0/0x4d0
Aug 17 07:20:08 stevo1 kernel: [ 2130.562800]  [<ffffffff810a6bbc>]
kthread+0xcc/0xf0
Aug 17 07:20:08 stevo1 kernel: [ 2130.568485]  [<ffffffff810b157e>] ?
schedule_tail+0x1e/0xc0
Aug 17 07:20:08 stevo1 kernel: [ 2130.574871]  [<ffffffff816e1a7f>]
ret_from_fork+0x1f/0x40
Aug 17 07:20:08 stevo1 kernel: [ 2130.581073]  [<ffffffff810a6af0>] ?
kthread_freezable_should_stop+0x70/0x70

Here is another thread in the delete controller sysfs path:

Aug 17 07:20:05 stevo1 kernel: [ 2127.184581] bash            D
ffff881034547ac8     0  4066   4056 0x10000080
Aug 17 07:20:05 stevo1 kernel: [ 2127.192445]  ffff881034547ac8
ffff881033f04140 ffff88102bc80000 ffff881032b66250
Aug 17 07:20:05 stevo1 kernel: [ 2127.200730]  ffff880fefe64230
ffff88103e88aa50 ffff881034544008 ffff881034547c98
Aug 17 07:20:05 stevo1 kernel: [ 2127.209022]  ffff881034547c08
ffff88102bc80000 7fffffffffffffff ffff881034547b18
Aug 17 07:20:05 stevo1 kernel: [ 2127.217278] Call Trace:
Aug 17 07:20:05 stevo1 kernel: [ 2127.220523]  [<ffffffff816ddde0>]
schedule+0x40/0xb0
Aug 17 07:20:05 stevo1 kernel: [ 2127.226244]  [<ffffffff810b2de4>] ?
try_to_wake_up+0x54/0x240
Aug 17 07:20:05 stevo1 kernel: [ 2127.232753]  [<ffffffff816e0a8d>]
schedule_timeout+0x2ad/0x410
Aug 17 07:20:05 stevo1 kernel: [ 2127.239353]  [<ffffffff810b3025>] ?
wake_up_process+0x15/0x20
Aug 17 07:20:05 stevo1 kernel: [ 2127.245871]  [<ffffffff8109d940>] ?
wake_up_worker+0x30/0x40
Aug 17 07:20:05 stevo1 kernel: [ 2127.252285]  [<ffffffff8109fdbc>] ?
insert_work+0x6c/0xb0
Aug 17 07:20:05 stevo1 kernel: [ 2127.258432]  [<ffffffff816deefe>]
wait_for_completion+0xde/0x110
Aug 17 07:20:05 stevo1 kernel: [ 2127.263351] nvme nvme4: rdma_resolve_addr
wait failed (-110).
Aug 17 07:20:05 stevo1 kernel: [ 2127.263355] nvme nvme4: Failed reconnect
attempt, requeueing...
Aug 17 07:20:05 stevo1 kernel: [ 2127.278353]  [<ffffffff810b2fd0>] ?
try_to_wake_up+0x240/0x240
Aug 17 07:20:05 stevo1 kernel: [ 2127.284912]  [<ffffffff810a0e0b>]
flush_work+0x2b/0x40
Aug 17 07:20:05 stevo1 kernel: [ 2127.290771]  [<ffffffff8109dcc0>] ?
worker_detach_from_pool+0xa0/0xa0
Aug 17 07:20:05 stevo1 kernel: [ 2127.297970]  [<ffffffffa064e46a>] ?
__nvme_rdma_del_ctrl+0x4a/0x60 [nvme_rdma]
Aug 17 07:20:05 stevo1 kernel: [ 2127.305945]  [<ffffffffa064e4b8>]
nvme_rdma_del_ctrl+0x38/0x50 [nvme_rdma]
Aug 17 07:20:05 stevo1 kernel: [ 2127.313577]  [<ffffffffa02703d4>]
nvme_sysfs_delete+0x34/0x50 [nvme_core]
Aug 17 07:20:05 stevo1 kernel: [ 2127.321111]  [<ffffffff814884a0>]
dev_attr_store+0x20/0x30
Aug 17 07:20:05 stevo1 kernel: [ 2127.327343]  [<ffffffff812ae29e>]
sysfs_kf_write+0x3e/0x40
Aug 17 07:20:05 stevo1 kernel: [ 2127.333551]  [<ffffffff812add13>]
kernfs_fop_write+0x113/0x1b0
Aug 17 07:20:05 stevo1 kernel: [ 2127.340127]  [<ffffffff8115a588>] ?
trace_buffer_unlock_commit_regs+0x68/0x80
Aug 17 07:20:05 stevo1 kernel: [ 2127.348017]  [<ffffffff8122cfe4>]
__vfs_write+0x34/0x120
Aug 17 07:20:05 stevo1 kernel: [ 2127.354083]  [<ffffffff81003725>] ?
trace_event_raw_event_sys_enter+0xb5/0x130
Aug 17 07:20:05 stevo1 kernel: [ 2127.362060]  [<ffffffff810d1d06>] ?
percpu_down_read+0x16/0x60
Aug 17 07:20:05 stevo1 kernel: [ 2127.368667]  [<ffffffff8122d191>]
vfs_write+0xc1/0x130
Aug 17 07:20:05 stevo1 kernel: [ 2127.374559]  [<ffffffff81249633>] ?
__fdget+0x13/0x20
Aug 17 07:20:05 stevo1 kernel: [ 2127.380367]  [<ffffffff8122d306>]
SyS_write+0x56/0xc0
Aug 17 07:20:05 stevo1 kernel: [ 2127.386201]  [<ffffffff81003e7d>]
do_syscall_64+0x7d/0x230
Aug 17 07:20:05 stevo1 kernel: [ 2127.392485]  [<ffffffff8106f057>] ?
do_page_fault+0x37/0x90
Aug 17 07:20:05 stevo1 kernel: [ 2127.398866]  [<ffffffff816e1921>]
entry_SYSCALL64_slow_path+0x25/0x25

Here is the reconnect workq thread stuck trying to connect:

Aug 17 07:19:45 stevo1 kernel: [ 2107.446780] Workqueue: nvme_rdma_wq
nvme_rdma_reconnect_ctrl_work [nvme_rdma]
Aug 17 07:19:45 stevo1 kernel: [ 2107.454764]  ffff881033a57aa8
ffff88103815c140 ffff8810371dab80 ffff881032b66250
Aug 17 07:19:45 stevo1 kernel: [ 2107.463091]  00000001001b93e9
0000000000000001 ffff881033a54008 ffff88103ee500c0
Aug 17 07:19:45 stevo1 kernel: [ 2107.471431]  00000001001b93e9
0000000000000000 ffff88103ee500c0 ffff881033a57af8
Aug 17 07:19:45 stevo1 kernel: [ 2107.479805] Call Trace:
Aug 17 07:19:45 stevo1 kernel: [ 2107.483076]  [<ffffffff816ddde0>]
schedule+0x40/0xb0
Aug 17 07:19:45 stevo1 kernel: [ 2107.488892]  [<ffffffff810f4a80>] ?
internal_add_timer+0x70/0x80
Aug 17 07:19:45 stevo1 kernel: [ 2107.495765]  [<ffffffff816e09fb>]
schedule_timeout+0x21b/0x410
Aug 17 07:19:45 stevo1 kernel: [ 2107.502464]  [<ffffffff810f4c10>] ?
del_timer+0x70/0x70
Aug 17 07:19:45 stevo1 kernel: [ 2107.508547]  [<ffffffffa031da9f>] ?
rdma_resolve_ip+0x1cf/0x1e0 [ib_core]
Aug 17 07:19:45 stevo1 kernel: [ 2107.516183]  [<ffffffffa0736fe0>] ?
cma_work_handler+0xa0/0xa0 [rdma_cm]
Aug 17 07:19:45 stevo1 kernel: [ 2107.523721]  [<ffffffff816deb6f>]
wait_for_completion_interruptible_timeout+0xef/0x130
Aug 17 07:19:45 stevo1 kernel: [ 2107.532472]  [<ffffffff810b2fd0>] ?
try_to_wake_up+0x240/0x240
Aug 17 07:19:45 stevo1 kernel: [ 2107.539152]  [<ffffffffa064f5c2>]
nvme_rdma_init_queue+0xe2/0x180 [nvme_rdma]
Aug 17 07:19:45 stevo1 kernel: [ 2107.547132]  [<ffffffffa064f9bc>]
nvme_rdma_reconnect_ctrl_work+0x9c/0x1f0 [nvme_rdma]
Aug 17 07:19:45 stevo1 kernel: [ 2107.555908]  [<ffffffff810a13e3>]
process_one_work+0x183/0x4d0
Aug 17 07:19:45 stevo1 kernel: [ 2107.562592]  [<ffffffff816dd920>] ?
__schedule+0x1f0/0x5b0
Aug 17 07:19:45 stevo1 kernel: [ 2107.568938]  [<ffffffff816ddde0>] ?
schedule+0x40/0xb0
Aug 17 07:19:45 stevo1 kernel: [ 2107.574927]  [<ffffffff810a207d>]
worker_thread+0x16d/0x530
Aug 17 07:19:45 stevo1 kernel: [ 2107.581346]  [<ffffffff810a1f10>] ?
maybe_create_worker+0x120/0x120
Aug 17 07:19:45 stevo1 kernel: [ 2107.588474]  [<ffffffff816dd920>] ?
__schedule+0x1f0/0x5b0
Aug 17 07:19:45 stevo1 kernel: [ 2107.594813]  [<ffffffff810cb8a6>] ?
__wake_up_common+0x56/0x90
Aug 17 07:19:45 stevo1 kernel: [ 2107.601488]  [<ffffffff810a1f10>] ?
maybe_create_worker+0x120/0x120
Aug 17 07:19:45 stevo1 kernel: [ 2107.608625]  [<ffffffff816ddde0>] ?
schedule+0x40/0xb0
Aug 17 07:19:45 stevo1 kernel: [ 2107.614626]  [<ffffffff810a1f10>] ?
maybe_create_worker+0x120/0x120
Aug 17 07:19:45 stevo1 kernel: [ 2107.621745]  [<ffffffff810a6bbc>]
kthread+0xcc/0xf0
Aug 17 07:19:45 stevo1 kernel: [ 2107.627464]  [<ffffffff810b157e>] ?
schedule_tail+0x1e/0xc0
Aug 17 07:19:45 stevo1 kernel: [ 2107.633886]  [<ffffffff816e1a7f>]
ret_from_fork+0x1f/0x40
Aug 17 07:19:45 stevo1 kernel: [ 2107.640121]  [<ffffffff810a6af0>] ?
kthread_freezable_should_stop+0x70/0x70

Here is the controller delete workq thread:

Aug 17 07:19:42 stevo1 kernel: [ 2104.515910] Workqueue: nvme_rdma_wq
nvme_rdma_del_ctrl_work [nvme_rdma]
Aug 17 07:19:42 stevo1 kernel: [ 2104.523373]  ffff881033f13888
ffff88103817ab80 ffff881033f04140 ffff881032b66250
Aug 17 07:19:42 stevo1 kernel: [ 2104.531744]  00000001001c1a60
0000000000000005 ffff881033f10008 ffff88103ef500c0
Aug 17 07:19:42 stevo1 kernel: [ 2104.540116]  00000001001c1a60
0000000000000000 ffff88103ef500c0 ffff881033f138d8
Aug 17 07:19:42 stevo1 kernel: [ 2104.548501] Call Trace:
Aug 17 07:19:42 stevo1 kernel: [ 2104.551777]  [<ffffffff816ddde0>]
schedule+0x40/0xb0
Aug 17 07:19:42 stevo1 kernel: [ 2104.557593]  [<ffffffff810f4a80>] ?
internal_add_timer+0x70/0x80
Aug 17 07:19:42 stevo1 kernel: [ 2104.564445]  [<ffffffff816e09fb>]
schedule_timeout+0x21b/0x410
Aug 17 07:19:42 stevo1 kernel: [ 2104.571117]  [<ffffffff810f4c10>] ?
del_timer+0x70/0x70
Aug 17 07:19:42 stevo1 kernel: [ 2104.577191]  [<ffffffff810fe67c>] ?
ktime_get+0x4c/0xc0
Aug 17 07:19:42 stevo1 kernel: [ 2104.583248]  [<ffffffff816dd6c4>]
io_schedule_timeout+0xa4/0x110
Aug 17 07:19:42 stevo1 kernel: [ 2104.590083]  [<ffffffff816df013>]
wait_for_completion_io_timeout+0xe3/0x120
Aug 17 07:19:42 stevo1 kernel: [ 2104.597943]  [<ffffffff810b2fd0>] ?
try_to_wake_up+0x240/0x240
Aug 17 07:19:42 stevo1 kernel: [ 2104.604613]  [<ffffffff81332a4d>] ?
blk_execute_rq_nowait+0x7d/0x160
Aug 17 07:19:42 stevo1 kernel: [ 2104.611810]  [<ffffffff81332bbc>]
blk_execute_rq+0x8c/0x110
Aug 17 07:19:42 stevo1 kernel: [ 2104.618221]  [<ffffffffa02715c5>]
__nvme_submit_sync_cmd+0x85/0xe0 [nvme_core]
Aug 17 07:19:42 stevo1 kernel: [ 2104.626293]  [<ffffffffa0672526>]
nvmf_reg_write32+0x66/0xb0 [nvme_fabrics]
Aug 17 07:19:42 stevo1 kernel: [ 2104.634086]  [<ffffffffa027095e>]
nvme_shutdown_ctrl+0x4e/0xe0 [nvme_core]
Aug 17 07:19:42 stevo1 kernel: [ 2104.641796]  [<ffffffffa064f161>]
nvme_rdma_shutdown_ctrl+0xa1/0xe0 [nvme_rdma]
Aug 17 07:19:42 stevo1 kernel: [ 2104.649940]  [<ffffffffa064f21c>]
__nvme_rdma_remove_ctrl+0x7c/0x80 [nvme_rdma]
Aug 17 07:19:42 stevo1 kernel: [ 2104.658093]  [<ffffffffa064f25a>]
nvme_rdma_del_ctrl_work+0x1a/0x20 [nvme_rdma]
Aug 17 07:19:42 stevo1 kernel: [ 2104.666248]  [<ffffffff810a13e3>]
process_one_work+0x183/0x4d0
Aug 17 07:19:42 stevo1 kernel: [ 2104.672936]  [<ffffffff816dd920>] ?
__schedule+0x1f0/0x5b0
Aug 17 07:19:42 stevo1 kernel: [ 2104.679269]  [<ffffffff8109daf2>] ?
move_linked_works+0x62/0x90
Aug 17 07:19:42 stevo1 kernel: [ 2104.686032]  [<ffffffff810a2204>]
worker_thread+0x2f4/0x530
Aug 17 07:19:42 stevo1 kernel: [ 2104.692441]  [<ffffffff810a1f10>] ?
maybe_create_worker+0x120/0x120
Aug 17 07:19:42 stevo1 kernel: [ 2104.699540]  [<ffffffff816dd920>] ?
__schedule+0x1f0/0x5b0
Aug 17 07:19:42 stevo1 kernel: [ 2104.705867]  [<ffffffff810cb8a6>] ?
__wake_up_common+0x56/0x90
Aug 17 07:19:42 stevo1 kernel: [ 2104.712534]  [<ffffffff810a1f10>] ?
maybe_create_worker+0x120/0x120
Aug 17 07:19:42 stevo1 kernel: [ 2104.719257] nvme nvme4: rdma_resolve_addr
wait failed (-104).
Aug 17 07:19:42 stevo1 kernel: [ 2104.719261] nvme nvme4: Failed reconnect
attempt, requeueing...
Aug 17 07:19:42 stevo1 kernel: [ 2104.733085]  [<ffffffff816ddde0>] ?
schedule+0x40/0xb0
Aug 17 07:19:42 stevo1 kernel: [ 2104.739063]  [<ffffffff810a1f10>] ?
maybe_create_worker+0x120/0x120
Aug 17 07:19:42 stevo1 kernel: [ 2104.746163]  [<ffffffff810a6bbc>]
kthread+0xcc/0xf0
Aug 17 07:19:42 stevo1 kernel: [ 2104.751878]  [<ffffffff810b157e>] ?
schedule_tail+0x1e/0xc0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-17 14:33   ` Steve Wise
@ 2016-08-17 14:46     ` Sagi Grimberg
  2016-08-17 15:13       ` Steve Wise
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2016-08-17 14:46 UTC (permalink / raw)



>
> The deleting thread is stuck:
>
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279798] INFO: task bash:4066 blocked
> for more than 120 seconds.
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279802]       Tainted: G            E
> 4.8.0-rc2-nvmf-4.8-rc-rebased-rc2+ #15
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279803] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279811] bash            D
> ffff881034547ac8     0  4066   4056 0x10000080
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279819]  ffff881034547ac8
> ffff881033f04140 ffff88102bc80000 ffff881032b66250
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279824]  ffff880fefe64230
> ffff88103e88aa50 ffff881034544008 ffff881034547c98
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279828]  ffff881034547c08
> ffff88102bc80000 7fffffffffffffff ffff881034547b18
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279829] Call Trace:
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279842]  [<ffffffff816ddde0>]
> schedule+0x40/0xb0
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279850]  [<ffffffff810b2de4>] ?
> try_to_wake_up+0x54/0x240
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279856]  [<ffffffff816e0a8d>]
> schedule_timeout+0x2ad/0x410
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279861]  [<ffffffff810b3025>] ?
> wake_up_process+0x15/0x20
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279868]  [<ffffffff8109d940>] ?
> wake_up_worker+0x30/0x40
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279873]  [<ffffffff8109fdbc>] ?
> insert_work+0x6c/0xb0
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279878]  [<ffffffff816deefe>]
> wait_for_completion+0xde/0x110
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279882]  [<ffffffff810b2fd0>] ?
> try_to_wake_up+0x240/0x240
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279887]  [<ffffffff810a0e0b>]
> flush_work+0x2b/0x40
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279892]  [<ffffffff8109dcc0>] ?
> worker_detach_from_pool+0xa0/0xa0
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279902]  [<ffffffffa064e46a>] ?
> __nvme_rdma_del_ctrl+0x4a/0x60 [nvme_rdma]
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279908]  [<ffffffffa064e4b8>]
> nvme_rdma_del_ctrl+0x38/0x50 [nvme_rdma]
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279918]  [<ffffffffa02703d4>]
> nvme_sysfs_delete+0x34/0x50 [nvme_core]
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279923]  [<ffffffff814884a0>]
> dev_attr_store+0x20/0x30
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279931]  [<ffffffff812ae29e>]
> sysfs_kf_write+0x3e/0x40
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279935]  [<ffffffff812add13>]
> kernfs_fop_write+0x113/0x1b0
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279942]  [<ffffffff8115a588>] ?
> trace_buffer_unlock_commit_regs+0x68/0x80
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279949]  [<ffffffff8122cfe4>]
> __vfs_write+0x34/0x120
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279957]  [<ffffffff81003725>] ?
> trace_event_raw_event_sys_enter+0xb5/0x130
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279963]  [<ffffffff810d1d06>] ?
> percpu_down_read+0x16/0x60
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279967]  [<ffffffff8122d191>]
> vfs_write+0xc1/0x130
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279974]  [<ffffffff81249633>] ?
> __fdget+0x13/0x20
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279978]  [<ffffffff8122d306>]
> SyS_write+0x56/0xc0
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279983]  [<ffffffff81003e7d>]
> do_syscall_64+0x7d/0x230
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279991]  [<ffffffff8106f057>] ?
> do_page_fault+0x37/0x90
> Aug 17 07:21:32 stevo1 kernel: [ 2213.279997]  [<ffffffff816e1921>]
> entry_SYSCALL64_slow_path+0x25/0x25
>
>
>
> Nvme_rdma_wq is doing nothing:
>
> Aug 17 07:20:08 stevo1 kernel: [ 2130.461614] nvme_rdma_wq    S
> ffff880fedd7bd58     0  4607      2 0x10000080
> Aug 17 07:20:08 stevo1 kernel: [ 2130.469572]  ffff880fedd7bd58
> ffff8810381a15c0 ffff88102bdb5700 ffff881032b66250
> Aug 17 07:20:08 stevo1 kernel: [ 2130.477972]  ffffffff81038959
> ffff880fedd7bd48 ffff880fedd78008 ffff8810206dde00
> Aug 17 07:20:08 stevo1 kernel: [ 2130.486357]  ffff88102bdb5700
> ffffffff810a1730 ffff8810381abed8 ffff880fedd7bda8
> Aug 17 07:20:08 stevo1 kernel: [ 2130.494755] Call Trace:
> Aug 17 07:20:08 stevo1 kernel: [ 2130.498088]  [<ffffffff81038959>] ?
> sched_clock+0x9/0x10
> Aug 17 07:20:08 stevo1 kernel: [ 2130.504287]  [<ffffffff810a1730>] ?
> process_one_work+0x4d0/0x4d0
> Aug 17 07:20:08 stevo1 kernel: [ 2130.511168]  [<ffffffff816ddde0>]
> schedule+0x40/0xb0
> Aug 17 07:20:08 stevo1 kernel: [ 2130.517002]  [<ffffffff810b1064>] ?
> set_user_nice+0x114/0x1b0
> Aug 17 07:20:08 stevo1 kernel: [ 2130.523623]  [<ffffffff810a1a22>]
> rescuer_thread+0x2f2/0x380
> Aug 17 07:20:08 stevo1 kernel: [ 2130.530146]  [<ffffffff816dd920>] ?
> __schedule+0x1f0/0x5b0
> Aug 17 07:20:08 stevo1 kernel: [ 2130.536501]  [<ffffffff810cb8a6>] ?
> __wake_up_common+0x56/0x90
> Aug 17 07:20:08 stevo1 kernel: [ 2130.543173]  [<ffffffff810a1730>] ?
> process_one_work+0x4d0/0x4d0
> Aug 17 07:20:08 stevo1 kernel: [ 2130.550013]  [<ffffffff816ddde0>] ?
> schedule+0x40/0xb0
> Aug 17 07:20:08 stevo1 kernel: [ 2130.555971]  [<ffffffff810a1730>] ?
> process_one_work+0x4d0/0x4d0
> Aug 17 07:20:08 stevo1 kernel: [ 2130.562800]  [<ffffffff810a6bbc>]
> kthread+0xcc/0xf0
> Aug 17 07:20:08 stevo1 kernel: [ 2130.568485]  [<ffffffff810b157e>] ?
> schedule_tail+0x1e/0xc0
> Aug 17 07:20:08 stevo1 kernel: [ 2130.574871]  [<ffffffff816e1a7f>]
> ret_from_fork+0x1f/0x40
> Aug 17 07:20:08 stevo1 kernel: [ 2130.581073]  [<ffffffff810a6af0>] ?
> kthread_freezable_should_stop+0x70/0x70
>
> Here is another thread in the delete controller sysfs path:
>
> Aug 17 07:20:05 stevo1 kernel: [ 2127.184581] bash            D
> ffff881034547ac8     0  4066   4056 0x10000080
> Aug 17 07:20:05 stevo1 kernel: [ 2127.192445]  ffff881034547ac8
> ffff881033f04140 ffff88102bc80000 ffff881032b66250
> Aug 17 07:20:05 stevo1 kernel: [ 2127.200730]  ffff880fefe64230
> ffff88103e88aa50 ffff881034544008 ffff881034547c98
> Aug 17 07:20:05 stevo1 kernel: [ 2127.209022]  ffff881034547c08
> ffff88102bc80000 7fffffffffffffff ffff881034547b18
> Aug 17 07:20:05 stevo1 kernel: [ 2127.217278] Call Trace:
> Aug 17 07:20:05 stevo1 kernel: [ 2127.220523]  [<ffffffff816ddde0>]
> schedule+0x40/0xb0
> Aug 17 07:20:05 stevo1 kernel: [ 2127.226244]  [<ffffffff810b2de4>] ?
> try_to_wake_up+0x54/0x240
> Aug 17 07:20:05 stevo1 kernel: [ 2127.232753]  [<ffffffff816e0a8d>]
> schedule_timeout+0x2ad/0x410
> Aug 17 07:20:05 stevo1 kernel: [ 2127.239353]  [<ffffffff810b3025>] ?
> wake_up_process+0x15/0x20
> Aug 17 07:20:05 stevo1 kernel: [ 2127.245871]  [<ffffffff8109d940>] ?
> wake_up_worker+0x30/0x40
> Aug 17 07:20:05 stevo1 kernel: [ 2127.252285]  [<ffffffff8109fdbc>] ?
> insert_work+0x6c/0xb0
> Aug 17 07:20:05 stevo1 kernel: [ 2127.258432]  [<ffffffff816deefe>]
> wait_for_completion+0xde/0x110
> Aug 17 07:20:05 stevo1 kernel: [ 2127.263351] nvme nvme4: rdma_resolve_addr
> wait failed (-110).
> Aug 17 07:20:05 stevo1 kernel: [ 2127.263355] nvme nvme4: Failed reconnect
> attempt, requeueing...
> Aug 17 07:20:05 stevo1 kernel: [ 2127.278353]  [<ffffffff810b2fd0>] ?
> try_to_wake_up+0x240/0x240
> Aug 17 07:20:05 stevo1 kernel: [ 2127.284912]  [<ffffffff810a0e0b>]
> flush_work+0x2b/0x40
> Aug 17 07:20:05 stevo1 kernel: [ 2127.290771]  [<ffffffff8109dcc0>] ?
> worker_detach_from_pool+0xa0/0xa0
> Aug 17 07:20:05 stevo1 kernel: [ 2127.297970]  [<ffffffffa064e46a>] ?
> __nvme_rdma_del_ctrl+0x4a/0x60 [nvme_rdma]
> Aug 17 07:20:05 stevo1 kernel: [ 2127.305945]  [<ffffffffa064e4b8>]
> nvme_rdma_del_ctrl+0x38/0x50 [nvme_rdma]
> Aug 17 07:20:05 stevo1 kernel: [ 2127.313577]  [<ffffffffa02703d4>]
> nvme_sysfs_delete+0x34/0x50 [nvme_core]
> Aug 17 07:20:05 stevo1 kernel: [ 2127.321111]  [<ffffffff814884a0>]
> dev_attr_store+0x20/0x30
> Aug 17 07:20:05 stevo1 kernel: [ 2127.327343]  [<ffffffff812ae29e>]
> sysfs_kf_write+0x3e/0x40
> Aug 17 07:20:05 stevo1 kernel: [ 2127.333551]  [<ffffffff812add13>]
> kernfs_fop_write+0x113/0x1b0
> Aug 17 07:20:05 stevo1 kernel: [ 2127.340127]  [<ffffffff8115a588>] ?
> trace_buffer_unlock_commit_regs+0x68/0x80
> Aug 17 07:20:05 stevo1 kernel: [ 2127.348017]  [<ffffffff8122cfe4>]
> __vfs_write+0x34/0x120
> Aug 17 07:20:05 stevo1 kernel: [ 2127.354083]  [<ffffffff81003725>] ?
> trace_event_raw_event_sys_enter+0xb5/0x130
> Aug 17 07:20:05 stevo1 kernel: [ 2127.362060]  [<ffffffff810d1d06>] ?
> percpu_down_read+0x16/0x60
> Aug 17 07:20:05 stevo1 kernel: [ 2127.368667]  [<ffffffff8122d191>]
> vfs_write+0xc1/0x130
> Aug 17 07:20:05 stevo1 kernel: [ 2127.374559]  [<ffffffff81249633>] ?
> __fdget+0x13/0x20
> Aug 17 07:20:05 stevo1 kernel: [ 2127.380367]  [<ffffffff8122d306>]
> SyS_write+0x56/0xc0
> Aug 17 07:20:05 stevo1 kernel: [ 2127.386201]  [<ffffffff81003e7d>]
> do_syscall_64+0x7d/0x230
> Aug 17 07:20:05 stevo1 kernel: [ 2127.392485]  [<ffffffff8106f057>] ?
> do_page_fault+0x37/0x90
> Aug 17 07:20:05 stevo1 kernel: [ 2127.398866]  [<ffffffff816e1921>]
> entry_SYSCALL64_slow_path+0x25/0x25
>
> Here is the reconnect workq thread stuck trying to connect:
>
> Aug 17 07:19:45 stevo1 kernel: [ 2107.446780] Workqueue: nvme_rdma_wq
> nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> Aug 17 07:19:45 stevo1 kernel: [ 2107.454764]  ffff881033a57aa8
> ffff88103815c140 ffff8810371dab80 ffff881032b66250
> Aug 17 07:19:45 stevo1 kernel: [ 2107.463091]  00000001001b93e9
> 0000000000000001 ffff881033a54008 ffff88103ee500c0
> Aug 17 07:19:45 stevo1 kernel: [ 2107.471431]  00000001001b93e9
> 0000000000000000 ffff88103ee500c0 ffff881033a57af8
> Aug 17 07:19:45 stevo1 kernel: [ 2107.479805] Call Trace:
> Aug 17 07:19:45 stevo1 kernel: [ 2107.483076]  [<ffffffff816ddde0>]
> schedule+0x40/0xb0
> Aug 17 07:19:45 stevo1 kernel: [ 2107.488892]  [<ffffffff810f4a80>] ?
> internal_add_timer+0x70/0x80
> Aug 17 07:19:45 stevo1 kernel: [ 2107.495765]  [<ffffffff816e09fb>]
> schedule_timeout+0x21b/0x410
> Aug 17 07:19:45 stevo1 kernel: [ 2107.502464]  [<ffffffff810f4c10>] ?
> del_timer+0x70/0x70
> Aug 17 07:19:45 stevo1 kernel: [ 2107.508547]  [<ffffffffa031da9f>] ?
> rdma_resolve_ip+0x1cf/0x1e0 [ib_core]
> Aug 17 07:19:45 stevo1 kernel: [ 2107.516183]  [<ffffffffa0736fe0>] ?
> cma_work_handler+0xa0/0xa0 [rdma_cm]
> Aug 17 07:19:45 stevo1 kernel: [ 2107.523721]  [<ffffffff816deb6f>]
> wait_for_completion_interruptible_timeout+0xef/0x130
> Aug 17 07:19:45 stevo1 kernel: [ 2107.532472]  [<ffffffff810b2fd0>] ?
> try_to_wake_up+0x240/0x240
> Aug 17 07:19:45 stevo1 kernel: [ 2107.539152]  [<ffffffffa064f5c2>]
> nvme_rdma_init_queue+0xe2/0x180 [nvme_rdma]
> Aug 17 07:19:45 stevo1 kernel: [ 2107.547132]  [<ffffffffa064f9bc>]
> nvme_rdma_reconnect_ctrl_work+0x9c/0x1f0 [nvme_rdma]
> Aug 17 07:19:45 stevo1 kernel: [ 2107.555908]  [<ffffffff810a13e3>]
> process_one_work+0x183/0x4d0
> Aug 17 07:19:45 stevo1 kernel: [ 2107.562592]  [<ffffffff816dd920>] ?
> __schedule+0x1f0/0x5b0
> Aug 17 07:19:45 stevo1 kernel: [ 2107.568938]  [<ffffffff816ddde0>] ?
> schedule+0x40/0xb0
> Aug 17 07:19:45 stevo1 kernel: [ 2107.574927]  [<ffffffff810a207d>]
> worker_thread+0x16d/0x530
> Aug 17 07:19:45 stevo1 kernel: [ 2107.581346]  [<ffffffff810a1f10>] ?
> maybe_create_worker+0x120/0x120
> Aug 17 07:19:45 stevo1 kernel: [ 2107.588474]  [<ffffffff816dd920>] ?
> __schedule+0x1f0/0x5b0
> Aug 17 07:19:45 stevo1 kernel: [ 2107.594813]  [<ffffffff810cb8a6>] ?
> __wake_up_common+0x56/0x90
> Aug 17 07:19:45 stevo1 kernel: [ 2107.601488]  [<ffffffff810a1f10>] ?
> maybe_create_worker+0x120/0x120
> Aug 17 07:19:45 stevo1 kernel: [ 2107.608625]  [<ffffffff816ddde0>] ?
> schedule+0x40/0xb0
> Aug 17 07:19:45 stevo1 kernel: [ 2107.614626]  [<ffffffff810a1f10>] ?
> maybe_create_worker+0x120/0x120
> Aug 17 07:19:45 stevo1 kernel: [ 2107.621745]  [<ffffffff810a6bbc>]
> kthread+0xcc/0xf0
> Aug 17 07:19:45 stevo1 kernel: [ 2107.627464]  [<ffffffff810b157e>] ?
> schedule_tail+0x1e/0xc0
> Aug 17 07:19:45 stevo1 kernel: [ 2107.633886]  [<ffffffff816e1a7f>]
> ret_from_fork+0x1f/0x40
> Aug 17 07:19:45 stevo1 kernel: [ 2107.640121]  [<ffffffff810a6af0>] ?
> kthread_freezable_should_stop+0x70/0x70
>
> Here is the controller delete workq thread:
>
> Aug 17 07:19:42 stevo1 kernel: [ 2104.515910] Workqueue: nvme_rdma_wq
> nvme_rdma_del_ctrl_work [nvme_rdma]
> Aug 17 07:19:42 stevo1 kernel: [ 2104.523373]  ffff881033f13888
> ffff88103817ab80 ffff881033f04140 ffff881032b66250
> Aug 17 07:19:42 stevo1 kernel: [ 2104.531744]  00000001001c1a60
> 0000000000000005 ffff881033f10008 ffff88103ef500c0
> Aug 17 07:19:42 stevo1 kernel: [ 2104.540116]  00000001001c1a60
> 0000000000000000 ffff88103ef500c0 ffff881033f138d8
> Aug 17 07:19:42 stevo1 kernel: [ 2104.548501] Call Trace:
> Aug 17 07:19:42 stevo1 kernel: [ 2104.551777]  [<ffffffff816ddde0>]
> schedule+0x40/0xb0
> Aug 17 07:19:42 stevo1 kernel: [ 2104.557593]  [<ffffffff810f4a80>] ?
> internal_add_timer+0x70/0x80
> Aug 17 07:19:42 stevo1 kernel: [ 2104.564445]  [<ffffffff816e09fb>]
> schedule_timeout+0x21b/0x410
> Aug 17 07:19:42 stevo1 kernel: [ 2104.571117]  [<ffffffff810f4c10>] ?
> del_timer+0x70/0x70
> Aug 17 07:19:42 stevo1 kernel: [ 2104.577191]  [<ffffffff810fe67c>] ?
> ktime_get+0x4c/0xc0
> Aug 17 07:19:42 stevo1 kernel: [ 2104.583248]  [<ffffffff816dd6c4>]
> io_schedule_timeout+0xa4/0x110
> Aug 17 07:19:42 stevo1 kernel: [ 2104.590083]  [<ffffffff816df013>]
> wait_for_completion_io_timeout+0xe3/0x120
> Aug 17 07:19:42 stevo1 kernel: [ 2104.597943]  [<ffffffff810b2fd0>] ?
> try_to_wake_up+0x240/0x240
> Aug 17 07:19:42 stevo1 kernel: [ 2104.604613]  [<ffffffff81332a4d>] ?
> blk_execute_rq_nowait+0x7d/0x160
> Aug 17 07:19:42 stevo1 kernel: [ 2104.611810]  [<ffffffff81332bbc>]
> blk_execute_rq+0x8c/0x110
> Aug 17 07:19:42 stevo1 kernel: [ 2104.618221]  [<ffffffffa02715c5>]
> __nvme_submit_sync_cmd+0x85/0xe0 [nvme_core]
> Aug 17 07:19:42 stevo1 kernel: [ 2104.626293]  [<ffffffffa0672526>]
> nvmf_reg_write32+0x66/0xb0 [nvme_fabrics]
> Aug 17 07:19:42 stevo1 kernel: [ 2104.634086]  [<ffffffffa027095e>]
> nvme_shutdown_ctrl+0x4e/0xe0 [nvme_core]
> Aug 17 07:19:42 stevo1 kernel: [ 2104.641796]  [<ffffffffa064f161>]
> nvme_rdma_shutdown_ctrl+0xa1/0xe0 [nvme_rdma]
> Aug 17 07:19:42 stevo1 kernel: [ 2104.649940]  [<ffffffffa064f21c>]
> __nvme_rdma_remove_ctrl+0x7c/0x80 [nvme_rdma]
> Aug 17 07:19:42 stevo1 kernel: [ 2104.658093]  [<ffffffffa064f25a>]
> nvme_rdma_del_ctrl_work+0x1a/0x20 [nvme_rdma]
> Aug 17 07:19:42 stevo1 kernel: [ 2104.666248]  [<ffffffff810a13e3>]
> process_one_work+0x183/0x4d0
> Aug 17 07:19:42 stevo1 kernel: [ 2104.672936]  [<ffffffff816dd920>] ?
> __schedule+0x1f0/0x5b0
> Aug 17 07:19:42 stevo1 kernel: [ 2104.679269]  [<ffffffff8109daf2>] ?
> move_linked_works+0x62/0x90
> Aug 17 07:19:42 stevo1 kernel: [ 2104.686032]  [<ffffffff810a2204>]
> worker_thread+0x2f4/0x530
> Aug 17 07:19:42 stevo1 kernel: [ 2104.692441]  [<ffffffff810a1f10>] ?
> maybe_create_worker+0x120/0x120
> Aug 17 07:19:42 stevo1 kernel: [ 2104.699540]  [<ffffffff816dd920>] ?
> __schedule+0x1f0/0x5b0
> Aug 17 07:19:42 stevo1 kernel: [ 2104.705867]  [<ffffffff810cb8a6>] ?
> __wake_up_common+0x56/0x90
> Aug 17 07:19:42 stevo1 kernel: [ 2104.712534]  [<ffffffff810a1f10>] ?
> maybe_create_worker+0x120/0x120
> Aug 17 07:19:42 stevo1 kernel: [ 2104.719257] nvme nvme4: rdma_resolve_addr
> wait failed (-104).
> Aug 17 07:19:42 stevo1 kernel: [ 2104.719261] nvme nvme4: Failed reconnect
> attempt, requeueing...
> Aug 17 07:19:42 stevo1 kernel: [ 2104.733085]  [<ffffffff816ddde0>] ?
> schedule+0x40/0xb0
> Aug 17 07:19:42 stevo1 kernel: [ 2104.739063]  [<ffffffff810a1f10>] ?
> maybe_create_worker+0x120/0x120
> Aug 17 07:19:42 stevo1 kernel: [ 2104.746163]  [<ffffffff810a6bbc>]
> kthread+0xcc/0xf0
> Aug 17 07:19:42 stevo1 kernel: [ 2104.751878]  [<ffffffff810b157e>] ?
> schedule_tail+0x1e/0xc0
>
>
>


Can this be related to the fact that we use a single-threaded
workqueue for delete/reset/reconnect? (delete cancel_sync the active
reconnect work...)

Does this untested patch help?
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8d2875b4c56d..78a47c17fb01 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1997,7 +1997,7 @@ static struct nvmf_transport_ops 
nvme_rdma_transport = {

  static int __init nvme_rdma_init_module(void)
  {
-       nvme_rdma_wq = create_workqueue("nvme_rdma_wq");
+       nvme_rdma_wq = alloc_workqueue("nvme_rdma_wq", 0, 0);
         if (!nvme_rdma_wq)
                 return -ENOMEM;
--

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-17 14:46     ` Sagi Grimberg
@ 2016-08-17 15:13       ` Steve Wise
  2016-08-18  7:01         ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-17 15:13 UTC (permalink / raw)


> 
> Can this be related to the fact that we use a single-threaded
> workqueue for delete/reset/reconnect? (delete cancel_sync the active
> reconnect work...)
> 
> Does this untested patch help?

That seems to do it!


> --
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 8d2875b4c56d..78a47c17fb01 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1997,7 +1997,7 @@ static struct nvmf_transport_ops
> nvme_rdma_transport = {
> 
>   static int __init nvme_rdma_init_module(void)
>   {
> -       nvme_rdma_wq = create_workqueue("nvme_rdma_wq");
> +       nvme_rdma_wq = alloc_workqueue("nvme_rdma_wq", 0, 0);
>          if (!nvme_rdma_wq)
>                  return -ENOMEM;
> --

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-17 15:13       ` Steve Wise
@ 2016-08-18  7:01         ` Sagi Grimberg
  2016-08-18 13:59           ` Steve Wise
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2016-08-18  7:01 UTC (permalink / raw)



>> Can this be related to the fact that we use a single-threaded
>> workqueue for delete/reset/reconnect? (delete cancel_sync the active
>> reconnect work...)
>>
>> Does this untested patch help?
>
> That seems to do it!

Is this a formal tested-by?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-18  7:01         ` Sagi Grimberg
@ 2016-08-18 13:59           ` Steve Wise
  2016-08-18 14:47             ` Steve Wise
  2016-08-18 15:21             ` 'Christoph Hellwig'
  0 siblings, 2 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-18 13:59 UTC (permalink / raw)


> 
> >> Can this be related to the fact that we use a single-threaded
> >> workqueue for delete/reset/reconnect? (delete cancel_sync the active
> >> reconnect work...)
> >>
> >> Does this untested patch help?
> >
> > That seems to do it!
> 
> Is this a formal tested-by?

Sure, but let me ask a question:  So the bug was that the delete controller
worker was blocked waiting for the reconnect worker to complete.  Yes?  And the
reconnect worker was never completing?  Why is that?  Here are a few tidbits
about iWARP connections:  address resolution == neighbor discovery.  So if the
neighbor is unreachable, it will take a few seconds for the OS to give up and
fail the resolution.  If the neigh entry is valid and the peer becomes
unreachable during connection setup, it might take 60 seconds or so for a
connect operation to give up and fail.  So this is probably slowing the
reconnect thread down.   But shouldn't the reconnect thread notice that a delete
is trying to happen and bail out?  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-18 13:59           ` Steve Wise
@ 2016-08-18 14:47             ` Steve Wise
  2016-08-18 15:21             ` 'Christoph Hellwig'
  1 sibling, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-18 14:47 UTC (permalink / raw)


> >
> > >> Can this be related to the fact that we use a single-threaded
> > >> workqueue for delete/reset/reconnect? (delete cancel_sync the active
> > >> reconnect work...)
> > >>
> > >> Does this untested patch help?
> > >
> > > That seems to do it!
> >
> > Is this a formal tested-by?
> 
> Sure, 

While the patch worked for deleting the controllers, it still hangs if I reboot
the host after the target reboots and the host begins kato recovery.  Looks like
the reconnect thread just gets stuck doing this:

[  947.095936] nvme nvme4: Failed reconnect attempt, requeueing...
[  947.616015] nvme nvme5: rdma_resolve_addr wait failed (-110).
[  947.623943] nvme nvme5: Failed reconnect attempt, requeueing...
[  948.128012] nvme nvme6: rdma_resolve_addr wait failed (-110).
[  948.135956] nvme nvme6: Failed reconnect attempt, requeueing...
[  948.624052] nvme nvme7: rdma_resolve_addr wait failed (-104).

I'll try and get a crash dump of this state to look at all the threads.  But I
think we need the reconnect worker to give up if the controller it is
reconnecting is getting deleted or the device removed.
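
Roughly, the kind of bail-out I have in mind would look something like the
below (completely untested sketch, just to illustrate the idea; the state
check is the same one the driver already does before requeueing):

--
static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
{
        struct nvme_rdma_ctrl *ctrl = container_of(to_delayed_work(work),
                        struct nvme_rdma_ctrl, reconnect_work);

        /* Give up right away if a delete or reset won the race. */
        if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING)
                return;

        /* ... existing reconnect logic unchanged ... */
}
--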

>
> but let me ask a question:  So the bug was that the delete controller
> worker was blocked waiting for the reconnect worker to complete.  Yes?  And
the
> reconnect worker was never completing?  Why is that?  Here are a few tidbits
> about iWARP connections:  address resolution == neighbor discovery.  So if the
> neighbor is unreachable, it will take a few seconds for the OS to give up and
> fail the resolution.  If the neigh entry is valid and the peer becomes
> unreachable during connection setup, it might take 60 seconds or so for a
> connect operation to give up and fail.  So this is probably slowing the
> reconnect thread down.   But shouldn't the reconnect thread notice that a
delete
> is trying to happen and bail out?
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-18 13:59           ` Steve Wise
  2016-08-18 14:47             ` Steve Wise
@ 2016-08-18 15:21             ` 'Christoph Hellwig'
  2016-08-18 17:59               ` Steve Wise
  2016-08-19  8:58               ` Sagi Grimberg
  1 sibling, 2 replies; 15+ messages in thread
From: 'Christoph Hellwig' @ 2016-08-18 15:21 UTC (permalink / raw)


On Thu, Aug 18, 2016 at 08:59:15AM -0500, Steve Wise wrote:
> Sure, but let me ask a question:  So the bug was that the delete controller
> worker was blocked waiting for the reconnect worker to complete.

Btw, in that case the patch is not actually correct, as even workqueue
with a higher concurrency level MAY deadlock under enough memory
pressure.  We'll need separate workqueues to handle this case I think.
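
Something along these lines is what I have in mind (untested sketch, queue
names made up): give the delete/reset path its own WQ_MEM_RECLAIM workqueue,
so its rescuer can always make forward progress independently of the
reconnect work:

--
static struct workqueue_struct *nvme_rdma_reconnect_wq;
static struct workqueue_struct *nvme_rdma_delete_wq;

static int __init nvme_rdma_init_module(void)
{
        nvme_rdma_reconnect_wq = alloc_workqueue("nvme_rdma_reconnect_wq",
                                                 WQ_MEM_RECLAIM, 0);
        if (!nvme_rdma_reconnect_wq)
                return -ENOMEM;

        nvme_rdma_delete_wq = alloc_workqueue("nvme_rdma_delete_wq",
                                              WQ_MEM_RECLAIM, 0);
        if (!nvme_rdma_delete_wq) {
                destroy_workqueue(nvme_rdma_reconnect_wq);
                return -ENOMEM;
        }

        /*
         * ... rest of module init; queue delete/reset work on
         * nvme_rdma_delete_wq and reconnect work on nvme_rdma_reconnect_wq.
         */
        return 0;
}
--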

> Yes?  And the
> reconnect worker was never completing?  Why is that?  Here are a few tidbits
> about iWARP connections:  address resolution == neighbor discovery.  So if the
> neighbor is unreachable, it will take a few seconds for the OS to give up and
> fail the resolution.  If the neigh entry is valid and the peer becomes
> unreachable during connection setup, it might take 60 seconds or so for a
> connect operation to give up and fail.  So this is probably slowing the
> reconnect thread down.   But shouldn't the reconnect thread notice that a delete
> is trying to happen and bail out? 

I think we should aim for a state machine that can detect this, but
we'll have to see if that will end up in synchronization overkill.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-18 15:21             ` 'Christoph Hellwig'
@ 2016-08-18 17:59               ` Steve Wise
  2016-08-18 18:50                 ` Steve Wise
  2016-08-19  8:58               ` Sagi Grimberg
  1 sibling, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-18 17:59 UTC (permalink / raw)


> Btw, in that case the patch is not actually correct, as even workqueue
> with a higher concurrency level MAY deadlock under enough memory
> pressure.  We'll need separate workqueues to handle this case I think.
> 
> > Yes?  And the
> > reconnect worker was never completing?  Why is that?  Here are a few tidbits
> > about iWARP connections:  address resolution == neighbor discovery.  So if
the
> > neighbor is unreachable, it will take a few seconds for the OS to give up
and
> > fail the resolution.  If the neigh entry is valid and the peer becomes
> > unreachable during connection setup, it might take 60 seconds or so for a
> > connect operation to give up and fail.  So this is probably slowing the
> > reconnect thread down.   But shouldn't the reconnect thread notice that a
delete
> > is trying to happen and bail out?
> 
> I think we should aim for a state machine that can detect this, but
> we'll have to see if that will end up in synchronization overkill.

Looking at the state machine, I don't see why the reconnect thread would get
stuck continually rescheduling once the controller was deleted.  The change from
RECONNECTING to DELETING is done by nvme_change_ctrl_state() in
__nvme_rdma_del_ctrl().  Once that happens, the thread running the reconnect
logic should stop rescheduling, because of this check in the failure path of
nvme_rdma_reconnect_ctrl_work():

...
requeue:
        /* Make sure we are not resetting/deleting */
        if (ctrl->ctrl.state == NVME_CTRL_RECONNECTING) {
                dev_info(ctrl->ctrl.device,
                        "Failed reconnect attempt, requeueing...\n");
                queue_delayed_work(nvme_rdma_wq, &ctrl->reconnect_work,
                                        ctrl->reconnect_delay * HZ);
        }
...

So something isn't happening like I think it is, I guess.

Also, even with the alloc_workqueue() change, a reboot during reconnect gets
stuck.  I never see the controllers getting deleted nor the unplug event handler
happening, so the reconnect thread seems to hang the shutdown/reboot...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-18 17:59               ` Steve Wise
@ 2016-08-18 18:50                 ` Steve Wise
  2016-08-18 19:11                   ` Steve Wise
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-18 18:50 UTC (permalink / raw)


> > Btw, in that case the patch is not actually correct, as even workqueue
> > with a higher concurrency level MAY deadlock under enough memory
> > pressure.  We'll need separate workqueues to handle this case I think.
> >
> > > Yes?  And the
> > > reconnect worker was never completing?  Why is that?  Here are a few
tidbits
> > > about iWARP connections:  address resolution == neighbor discovery.  So if
> the
> > > neighbor is unreachable, it will take a few seconds for the OS to give up
> and
> > > fail the resolution.  If the neigh entry is valid and the peer becomes
> > > unreachable during connection setup, it might take 60 seconds or so for a
> > > connect operation to give up and fail.  So this is probably slowing the
> > > reconnect thread down.   But shouldn't the reconnect thread notice that a
> delete
> > > is trying to happen and bail out?
> >
> > I think we should aim for a state machine that can detect this, but
> > we'll have to see if that will end up in synchronization overkill.
> 
> Looking at the state machine I don't see why the reconnect thread would get
> stuck continually rescheduling once the controller was deleted.  Changing from
> RECONNECTING to DELETING will be done by nvme_change_ctrl_state().  So once
> that
> happens, in __nvme_rdma_del_ctrl() , the thread running reconnect logic should
> stop rescheduling due to this in the failure logic of
> nvme_rdma_reconnect_ctrl_work():
> 
> ...
> requeue:
>         /* Make sure we are not resetting/deleting */
>         if (ctrl->ctrl.state == NVME_CTRL_RECONNECTING) {
>                 dev_info(ctrl->ctrl.device,
>                         "Failed reconnect attempt, requeueing...\n");
>                 queue_delayed_work(nvme_rdma_wq, &ctrl->reconnect_work,
>                                         ctrl->reconnect_delay * HZ);
>         }
> ...
> 
> So something isn't happening like I think it is, I guess.


I see what happens.  Assume the 10 controllers are reconnecting and failing,
thus they reschedule each time.  I then run a script to delete all 10 devices
sequentially.  Like this:

for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1; done

The first device, nvme1n1, gets a disconnect/delete command and changes the
controller state from RECONNECTING to DELETING, and then schedules
nvme_rdma_del_ctrl_work(), but that work is stuck behind the 9 others continually
reconnecting, failing, and rescheduling.  I'm not sure why the delete never gets
run, though?  I would think that once it is queued, it would get executed before
the rescheduled reconnect works?  Maybe we need some round-robin mode for our
workq?  And because the first delete is stuck, none of the subsequent delete
commands get executed.  Note: if I run each disconnect command in the
background, then they all get cleaned up OK.  Like this:

for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1 & done

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-18 18:50                 ` Steve Wise
@ 2016-08-18 19:11                   ` Steve Wise
  0 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-18 19:11 UTC (permalink / raw)


> > > Btw, in that case the patch is not actually correct, as even workqueue
> > > with a higher concurrency level MAY deadlock under enough memory
> > > pressure.  We'll need separate workqueues to handle this case I think.
> > >
> > > > Yes?  And the
> > > > reconnect worker was never completing?  Why is that?  Here are a few
> tidbits
> > > > about iWARP connections:  address resolution == neighbor discovery.  So
if
> > the
> > > > neighbor is unreachable, it will take a few seconds for the OS to give
up
> > and
> > > > fail the resolution.  If the neigh entry is valid and the peer becomes
> > > > unreachable during connection setup, it might take 60 seconds or so for
a
> > > > connect operation to give up and fail.  So this is probably slowing the
> > > > reconnect thread down.   But shouldn't the reconnect thread notice that
a
> > delete
> > > > is trying to happen and bail out?
> > >
> > > I think we should aim for a state machine that can detect this, but
> > > we'll have to see if that will end up in synchronization overkill.
> >
> > Looking at the state machine I don't see why the reconnect thread would get
> > stuck continually rescheduling once the controller was deleted.  Changing
from
> > RECONNECTING to DELETING will be done by nvme_change_ctrl_state().  So once
> > that
> > happens, in __nvme_rdma_del_ctrl() , the thread running reconnect logic
should
> > stop rescheduling due to this in the failure logic of
> > nvme_rdma_reconnect_ctrl_work():
> >
> > ...
> > requeue:
> >         /* Make sure we are not resetting/deleting */
> >         if (ctrl->ctrl.state == NVME_CTRL_RECONNECTING) {
> >                 dev_info(ctrl->ctrl.device,
> >                         "Failed reconnect attempt, requeueing...\n");
> >                 queue_delayed_work(nvme_rdma_wq, &ctrl->reconnect_work,
> >                                         ctrl->reconnect_delay * HZ);
> >         }
> > ...
> >
> > So something isn't happening like I think it is, I guess.
> 
> 
> I see what happens.  Assume the 10 controllers are reconnecting and failing,
> thus they reschedule each time.  I then run a script to delete all 10 devices
> sequentially.  Like this:
> 
> for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1; done
> 
> The first device, nvme1n1 gets a disconnect/delete command and changes the
> controller state from RECONNECTING to DELETING, and then schedules
> nvme_rdma_del_ctrl_work(), but that is stuck behind the 9 others continually
> reconnecting, failing, and rescheduling.  I'm not sure why the delete never
gets
> run though?  I would think if it is scheduled, then it would get executed
before
> the reconnect threads rescheduling?  Maybe we need some round-robin mode for
> our
> workq?  And because the first delete is stuck, none of the subsequent delete
> commands get executed.  Note: If I run each disconnect command in the
> background, then they all get cleaned up ok.   Like this:
> 
> for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1 & done
> 
> 

Experimenting more: running the 'nvme disconnect' commands in the background
doesn't really avoid things getting stuck.

BTW: I'm running these with the single-threaded workq, to understand the
deadlock (well, trying to understand...).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-18 15:21             ` 'Christoph Hellwig'
  2016-08-18 17:59               ` Steve Wise
@ 2016-08-19  8:58               ` Sagi Grimberg
  2016-08-19 14:22                 ` Steve Wise
       [not found]                 ` <008001d1fa25$0c960fb0$25c22f10$@opengridcomputing.com>
  1 sibling, 2 replies; 15+ messages in thread
From: Sagi Grimberg @ 2016-08-19  8:58 UTC (permalink / raw)



> Btw, in that case the patch is not actually correct, as even workqueue
> with a higher concurrency level MAY deadlock under enough memory
> pressure.  We'll need separate workqueues to handle this case I think.

Steve, does it help if you run the delete on the system_long_wq [1]?
Note, I've seen problems with forward progress when sharing
a workqueue between teardown/reconnect sequences and the rest of
the system (mostly in srp).

>> Yes?  And the
>> reconnect worker was never completing?  Why is that?  Here are a few tidbits
>> about iWARP connections:  address resolution == neighbor discovery.  So if the
>> neighbor is unreachable, it will take a few seconds for the OS to give up and
>> fail the resolution.  If the neigh entry is valid and the peer becomes
>> unreachable during connection setup, it might take 60 seconds or so for a
>> connect operation to give up and fail.  So this is probably slowing the
>> reconnect thread down.   But shouldn't the reconnect thread notice that a delete
>> is trying to happen and bail out?
>
> I think we should aim for a state machine that can detect this, but
> we'll have to see if that will end up in synchronization overkill.

The reconnect logic does take care of this state transition...

[1]:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8d2875b4c56d..93ea2831ff31 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1342,7 +1342,7 @@ static int nvme_rdma_device_unplug(struct 
nvme_rdma_queue *queue)
         }

         /* Queue controller deletion */
-       queue_work(nvme_rdma_wq, &ctrl->delete_work);
+       queue_work(system_long_wq, &ctrl->delete_work);
         flush_work(&ctrl->delete_work);
         return ret;
  }
@@ -1681,7 +1681,7 @@ static int __nvme_rdma_del_ctrl(struct 
nvme_rdma_ctrl *ctrl)
         if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
                 return -EBUSY;

-       if (!queue_work(nvme_rdma_wq, &ctrl->delete_work))
+       if (!queue_work(system_long_wq, &ctrl->delete_work))
                 return -EBUSY;

         return 0;
@@ -1763,7 +1763,7 @@ static int nvme_rdma_reset_ctrl(struct nvme_ctrl 
*nctrl)
         if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
                 return -EBUSY;

-       if (!queue_work(nvme_rdma_wq, &ctrl->reset_work))
+       if (!queue_work(system_long_wq, &ctrl->reset_work))
                 return -EBUSY;

         flush_work(&ctrl->reset_work);
--

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
  2016-08-19  8:58               ` Sagi Grimberg
@ 2016-08-19 14:22                 ` Steve Wise
       [not found]                 ` <008001d1fa25$0c960fb0$25c22f10$@opengridcomputing.com>
  1 sibling, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-19 14:22 UTC (permalink / raw)


> 
> > Btw, in that case the patch is not actually correct, as even workqueue
> > with a higher concurrency level MAY deadlock under enough memory
> > pressure.  We'll need separate workqueues to handle this case I think.
> 
> Steve, does it help if you run the delete on the system_long_wq [1]?
> Note, I've seen problems with forward progress when sharing
> a workqueue between teardown/reconnect sequences and the rest of
> the system (mostly in srp).
> 

I can try this, but see my comments below.  I'm not sure there is any deadlock
at this point.

> >> Yes?  And the
> >> reconnect worker was never completing?  Why is that?  Here are a few
tidbits
> >> about iWARP connections:  address resolution == neighbor discovery.  So if
the
> >> neighbor is unreachable, it will take a few seconds for the OS to give up
and
> >> fail the resolution.  If the neigh entry is valid and the peer becomes
> >> unreachable during connection setup, it might take 60 seconds or so for a
> >> connect operation to give up and fail.  So this is probably slowing the
> >> reconnect thread down.   But shouldn't the reconnect thread notice that a
delete
> >> is trying to happen and bail out?
> >
> > I think we should aim for a state machine that can detect this, but
> > we'll have to see if that will end up in synchronization overkill.
> 
> The reconnect logic does take care of this state transition...

Yes, I agree.  The disconnect/delete command changes the controller state from
RECONNECTING to DELETING and the reconnecting thread will not reschedule itself
for that controller.

In further debugging (see my subsequent emails), it appears that there really
isn't a deadlock.   First, let me describe the main problem: the IWCM will block
destroying a cm_id until the driver has completed a connection setup attempt.
See IWCM_F_CONNECT_WAIT in drivers/infiniband/core/iwcm.c.  Further, iw_cxgb4's
TCP engine can take up to 60 seconds to fail a TCP connection setup if the neigh
entry is valid yet the peer is unresponsive.   So what we see happening is that
when kato kicks in after the target reboots and _before_ the neigh entry for the
target is flushed due to no connectivity, the connection setup attempts all get
initiated by the reconnect logic in the nvmf/rdma host driver.  Even though the
nvme/rdma host driver times out such an attempt in ~1sec
(NVME_RDMA_CONNECT_TIMEOUT_MS), it gets stuck for up to 60 seconds destroying
the cm_id.   So for my setup, each controller has 32 io queues.  That causes the
reconnect thread to get stuck for VERY LONG periods of time.  Even if the
controllers are deleted thus changing the state of the controller to DELETING,
the thread will still get stuck for at least 60 seconds trying to destroy its
current connecting cm_id.  Then you multiply that by 10 controllers in my test
and you see that the reconnect logic is taking way too long to give up.

So I think I need to see about removing the IWCM_F_CONNECT_WAIT logic in the
iwcm.

One other thing:  in both nvme_rdma_device_unplug() and nvme_rdma_del_ctrl(),
the code kicks the delete_work thread to delete the controller and then calls
flush_work().  This is a possible touch-after-free, no?  The proper way, I
think, should be to take a ref on ctrl, kick the delete_work thread, call
flush_work(), and then nvme_put_ctrl(ctrl).  Do you agree?  While doing this
debug, I wondered if this issue was causing a delete thread to get stuck in
flush_work().  I never proved that, and I think the real issue is the
IWCM_F_CONNECT_WAIT logic.

Steve.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* nvme/rdma initiator stuck on reboot
       [not found]                 ` <008001d1fa25$0c960fb0$25c22f10$@opengridcomputing.com>
@ 2016-08-19 14:24                   ` Steve Wise
  0 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-19 14:24 UTC (permalink / raw)


> One other thing:  in both nvme_rdma_device_unplug() and
> nvme_rdma_del_ctrl(), the code kicks the delete_work thread to delete the
> controller and then calls flush_work().  This is a possible
> touch-after-free, no?  The proper way, I think, should be to take a ref on
> ctrl, kick the delete_work thread, call flush_work(), and then
> nvme_put_ctrl(ctrl).  Do you agree? 

IE: do we need this:

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 9c69393..6198eaa 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1341,9 +1341,15 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue
*queue)
                ret = 1;
        }

-       /* Queue controller deletion */
+       /*
+        * Queue controller deletion. Keep a reference until all
+        * work is flushed since delete_work will free the ctrl mem
+        */
+       kref_get(&ctrl->ctrl.kref);
        queue_work(nvme_rdma_wq, &ctrl->delete_work);
        flush_work(&ctrl->delete_work);
+       nvme_put_ctrl(&ctrl->ctrl);
+
        return ret;
 }

@@ -1690,15 +1696,22 @@ static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl
*ctrl)
 static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
 {
        struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
-       int ret;
+       int ret = 0;
+
+       /*
+        * Keep a reference until all work is flushed since
+        * __nvme_rdma_del_ctrl can free the ctrl mem
+        */
+       kref_get(&ctrl->ctrl.kref);

        ret = __nvme_rdma_del_ctrl(ctrl);
        if (ret)
-               return ret;
+               goto out;

        flush_work(&ctrl->delete_work);
-
-       return 0;
+out:
+       nvme_put_ctrl(&ctrl->ctrl);
+       return ret;
 }

 static void nvme_rdma_remove_ctrl_work(struct work_struct *work)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

End of thread.

Thread overview: 15+ messages:
2016-08-16 19:40 nvme/rdma initiator stuck on reboot Steve Wise
2016-08-17 10:23 ` Sagi Grimberg
2016-08-17 14:33   ` Steve Wise
2016-08-17 14:46     ` Sagi Grimberg
2016-08-17 15:13       ` Steve Wise
2016-08-18  7:01         ` Sagi Grimberg
2016-08-18 13:59           ` Steve Wise
2016-08-18 14:47             ` Steve Wise
2016-08-18 15:21             ` 'Christoph Hellwig'
2016-08-18 17:59               ` Steve Wise
2016-08-18 18:50                 ` Steve Wise
2016-08-18 19:11                   ` Steve Wise
2016-08-19  8:58               ` Sagi Grimberg
2016-08-19 14:22                 ` Steve Wise
     [not found]                 ` <008001d1fa25$0c960fb0$25c22f10$@opengridcomputing.com>
2016-08-19 14:24                   ` Steve Wise
