* nvmet_rdma crash - DISCONNECT event with NULL queue
@ 2016-11-01 15:57 ` Steve Wise
  0 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2016-11-01 15:57 UTC (permalink / raw)
  To: sagig, Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hey guys,

I just hit an nvmf target NULL pointer deref BUG after a few hours of keep-alive
timeout testing.  It appears that nvmet_rdma_cm_handler() was called with
cm_id->qp == NULL, so the local nvmet_rdma_queue * variable queue was left
NULL.  But then nvmet_rdma_queue_disconnect() was called with queue == NULL,
which causes the crash.

In the log, I see that the target side keep-alive fired:

[20676.867545] eth2: link up, 40Gbps, full-duplex, Tx/Rx PAUSE
[20677.079669] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[20677.079684] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!

Then all the queues are freed followed by the crash.

[20677.080066] nvmet_rdma: freeing queue 222
[20677.080074] nvmet_rdma: sending cmd response failed
[20677.080351] nvmet_rdma: freeing queue 227
[20677.080775] nvmet_rdma: freeing queue 230
[20677.081137] nvmet_rdma: freeing queue 232
[20677.081371] nvmet_rdma: freeing queue 234
[20677.081604] nvmet_rdma: freeing queue 236
[20677.081835] nvmet_rdma: freeing queue 237
[20677.082062] nvmet_rdma: freeing queue 238
[20677.082106] nvmet_rdma: freeing queue 239
[20677.082366] nvmet_rdma: freeing queue 240
[20677.082570] nvmet_rdma: freeing queue 241
[20677.082995] nvmet_rdma: freeing queue 242
[20677.083222] nvmet_rdma: freeing queue 243
[20677.083475] nvmet_rdma: freeing queue 244
[20677.083522] nvmet_rdma: freeing queue 245
[20677.083801] nvmet_rdma: freeing queue 246
[20677.084264] nvmet_rdma: freeing queue 247
[20677.084307] nvmet_rdma: freeing queue 248
[20677.084501] nvmet_rdma: freeing queue 249
[20677.084846] nvmet_rdma: freeing queue 250
[20677.085184] nvmet_rdma: freeing queue 252
[20677.085500] nvmet_rdma: freeing queue 254
[20677.085733] nvmet_rdma: freeing queue 256
[20677.085997] nvmet_rdma: freeing queue 258
[20677.086224] nvmet_rdma: freeing queue 260
[20677.086517] nvmet_rdma: freeing queue 262
[20677.086768] nvmet_rdma: freeing queue 264
[20677.087031] nvmet_rdma: freeing queue 266
[20677.087359] nvmet_rdma: freeing queue 268
[20677.087567] nvmet_rdma: freeing queue 270
[20677.087821] nvmet_rdma: freeing queue 272
[20677.088162] nvmet_rdma: freeing queue 274
[20677.088402] nvmet_rdma: freeing queue 276
[20677.090981] BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
[20677.090988] IP: [<ffffffffa084b6b4>] nvmet_rdma_queue_disconnect+0x24/0x90 [nvmet_rdma]


So maybe there is just a race in that keep-alive can free the queue, and yet a
DISCONNECTED event can still be received on the cm_id after the queue is freed?

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-01 15:57 ` Steve Wise
@ 2016-11-01 16:15   ` Sagi Grimberg
  -1 siblings, 0 replies; 40+ messages in thread
From: Sagi Grimberg @ 2016-11-01 16:15 UTC (permalink / raw)
  To: Steve Wise, Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


> Hey guys,
>
> I just hit an nvmf target NULL pointer deref BUG after a few hours of keep-alive
> timeout testing.  It appears that nvmet_rdma_cm_handler() was called with
> cm_id->qp == NULL, so the local nvmet_rdma_queue * variable queue is left as
> NULL.  But then nvmet_rdma_queue_disconnect() is called with queue == NULL which
> causes the crash.

AFAICT, the only way cm_id->qp is NULL is a scenario where we didn't even
get to allocate a queue pair (i.e. we never called rdma_create_qp). The
teardown path does not nullify cm_id->qp...

Are you sure that the flow is indeed a DISCONNECTED event?

> In the log, I see that the target side keep-alive fired:
>
> [20676.867545] eth2: link up, 40Gbps, full-duplex, Tx/Rx PAUSE
> [20677.079669] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [20677.079684] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!

Wow, two keep-alive timeouts on the same controller? That is
seriously wrong...


> [20677.088402] nvmet_rdma: freeing queue 276
> [20677.090981] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000120
> [20677.090988] IP: [<ffffffffa084b6b4>] nvmet_rdma_queue_disconnect+0x24/0x90
> [nvmet_rdma]

No stack trace?

>
>
> So maybe there is just a race in that keep-alive can free the queue and yet a
> DISCONNECTED event still received on the cm_id after the queue is freed?

rdma_destroy_id should barrier this scenario.

* RE: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-01 16:15   ` Sagi Grimberg
@ 2016-11-01 16:20       ` Steve Wise
  -1 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2016-11-01 16:20 UTC (permalink / raw)
  To: 'Sagi Grimberg', 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

> > I just hit an nvmf target NULL pointer deref BUG after a few hours of
> > keep-alive timeout testing.  It appears that nvmet_rdma_cm_handler() was
> > called with cm_id->qp == NULL, so the local nvmet_rdma_queue * variable
> > queue is left as NULL.  But then nvmet_rdma_queue_disconnect() is called
> > with queue == NULL which causes the crash.
> 
> AFAICT, the only way cm_id->qp is NULL is for a scenario we didn't even
> get to allocate a queue-pair (e.g. calling rdma_create_qp). The teardown
> paths does not nullify cm_id->qp...

rdma_destroy_qp() nulls out cm_id->qp.

> 
> Are you sure that the flow is indeed DISCONNECTED event?
>

That is the only event I would expect during this test.  From
nvmet_rdma_cm_handler():

        case RDMA_CM_EVENT_ADDR_CHANGE:
        case RDMA_CM_EVENT_DISCONNECTED:
        case RDMA_CM_EVENT_TIMEWAIT_EXIT:
                nvmet_rdma_queue_disconnect(queue);
                break;

 
> > In the log, I see that the target side keep-alive fired:
> >
> > [20676.867545] eth2: link up, 40Gbps, full-duplex, Tx/Rx PAUSE
> > [20677.079669] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> > [20677.079684] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> 
> Wow, two keep-alive timeouts on the same controller? that is
> seriously wrong...
> 
> 
> > [20677.088402] nvmet_rdma: freeing queue 276
> > [20677.090981] BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000120
> > [20677.090988] IP: [<ffffffffa084b6b4>] nvmet_rdma_queue_disconnect+0x24/0x90
> > [nvmet_rdma]
> 
> No stack trace?
> 

Sure:

[20677.090992] Oops: 0000 [#1] SMP
[20677.091030] Modules linked in: iw_cxgb4(OE) cxgb4(OE) null_blk brd nvmet_rdma
nvmet ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM
iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc
ipmi_devintf cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs
ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib
mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput
iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg ipmi_si
ipmi_msghandler nvme nvme_core lpc_ich mfd_core
[20677.091040]  mei_me mei igb dca ptp pps_core wmi ext4(E) mbcache(E) jbd2(E)
sd_mod(E) ahci(E) libahci(E) libata(E) mgag200(E) ttm(E) drm_kms_helper(E)
drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E)
i2c_core(E) [last unloaded: cxgb4]
[20677.091043] CPU: 10 PID: 24331 Comm: kworker/u64:1 Tainted: G           OE
4.8.0 #130
[20677.091044] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[20677.091049] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[20677.091050] task: ffff88102fa4ab80 task.stack: ffff880f111d8000
[20677.091053] RIP: 0010:[<ffffffffa084b6b4>]  [<ffffffffa084b6b4>]
nvmet_rdma_queue_disconnect+0x24/0x90 [nvmet_rdma]
[20677.091054] RSP: 0018:ffff880f111db968  EFLAGS: 00010296
[20677.091055] RAX: ffff88102fa4ab80 RBX: 0000000000000000 RCX: 0000000000000000
[20677.091056] RDX: 0000000000000005 RSI: 000000000000000a RDI: ffffffffa084f040
[20677.091056] RBP: ffff880f111db998 R08: ffff88102a30e358 R09: ffff880f111dba78
[20677.091058] R10: 0000000000000c2c R11: 0000000000000001 R12: 0000000000000000
[20677.091059] R13: ffff880805ca2c00 R14: ffff880f111db9e8 R15: ffff880f111dbc48
[20677.091060] FS:  0000000000000000(0000) GS:ffff88105f280000(0000)
knlGS:0000000000000000
[20677.091061] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20677.091062] CR2: 0000000000000120 CR3: 000000105a203000 CR4: 00000000000406e0
[20677.091063] Stack:
[20677.091065]  ffff881053f6c100 0000000000000046 000012ce42a8b7b3
ffff88105bdf7000
[20677.091067]  ffff880f111db9e8 0000000000000000 ffff880f111db9c8
ffffffffa084d484
[20677.091069]  ffff880f111dbb28 ffff880805ca2c00 ffff880805ca2e70
ffff880f111db9e8
[20677.091070] Call Trace:
[20677.091074]  [<ffffffffa084d484>] nvmet_rdma_cm_handler+0x154/0x1a0
[nvmet_rdma]
[20677.091078]  [<ffffffffa0737e90>] cma_iw_handler+0x110/0x1c0 [rdma_cm]
[20677.091084]  [<ffffffff8115575e>] ? ring_buffer_lock_reserve+0x7e/0xc0
[20677.091086]  [<ffffffffa07271b3>] cm_close_handler+0x93/0xc0 [iw_cm]
[20677.091089]  [<ffffffffa0729027>] process_event+0xd7/0xf0 [iw_cm]
[20677.091092]  [<ffffffffa0729170>] cm_work_handler+0x130/0x190 [iw_cm]
[20677.091097]  [<ffffffff8109d196>] ?
trace_event_raw_event_workqueue_execute_start+0x66/0xa0
[20677.091099]  [<ffffffff810a1483>] process_one_work+0x183/0x4d0
[20677.091103]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[20677.091106]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[20677.091108]  [<ffffffff810a211d>] worker_thread+0x16d/0x530
[20677.091110]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[20677.091115]  [<ffffffff810cb9b6>] ? __wake_up_common+0x56/0x90
[20677.091117]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[20677.091119]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[20677.091121]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[20677.091123]  [<ffffffff810a6c5c>] kthread+0xcc/0xf0
[20677.091126]  [<ffffffff810b1aae>] ? schedule_tail+0x1e/0xc0
[20677.091129]  [<ffffffff816e2eff>] ret_from_fork+0x1f/0x40
[20677.091131]  [<ffffffff810a6b90>] ? kthread_freezable_should_stop+0x70/0x70
[20677.091149] Code: 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d f0 4c
89 65 f8 66 66 66 66 90 48 89 fb 48 c7 c7 40 f0 84 a0 e8 1c 55 e9 e0 <48> 8b 83
20 01 00 00 4c 8d a3 20 01 00 00 49 39 c4 48 89 45 e0
[20677.091152] RIP  [<ffffffffa084b6b4>] nvmet_rdma_queue_disconnect+0x24/0x90
[nvmet_rdma]
[20677.091152]  RSP <ffff880f111db968>
[20677.091153] CR2: 0000000000000120

> >
> >
> > So maybe there is just a race in that keep-alive can free the queue and
> > yet a DISCONNECTED event still received on the cm_id after the queue is
> > freed?
> 
> rdma_destroy_id should barrier this scenario.

I'm guessing the cm_id hasn't been destroyed.  But the qp has...



* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-01 16:20       ` Steve Wise
@ 2016-11-01 16:34         ` Sagi Grimberg
  -1 siblings, 0 replies; 40+ messages in thread
From: Sagi Grimberg @ 2016-11-01 16:34 UTC (permalink / raw)
  To: Steve Wise, 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


>>> I just hit an nvmf target NULL pointer deref BUG after a few hours of
>>> keep-alive timeout testing.  It appears that nvmet_rdma_cm_handler() was
>>> called with cm_id->qp == NULL, so the local nvmet_rdma_queue * variable
>>> queue is left as NULL.  But then nvmet_rdma_queue_disconnect() is called
>>> with queue == NULL which causes the crash.
>>
>> AFAICT, the only way cm_id->qp is NULL is for a scenario we didn't even
>> get to allocate a queue-pair (e.g. calling rdma_create_qp). The teardown
>> paths does not nullify cm_id->qp...
>
> rdma_destroy_qp() nulls out cm_id->qp.

pphh, somehow managed to miss it...

So we have a case where we can call rdma_destroy_qp and
then rdma_destroy_id but still get events on the cm_id...
Not very nice...

So I think that the patch from Bart a few weeks ago was correct:

---
  drivers/nvme/target/rdma.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index d1aea17..a61e47f 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1354,9 +1354,12 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
  		break;
  	case RDMA_CM_EVENT_ADDR_CHANGE:
  	case RDMA_CM_EVENT_DISCONNECTED:
-	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
  		nvmet_rdma_queue_disconnect(queue);
  		break;
+	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
+		if (queue)
+			nvmet_rdma_queue_disconnect(queue);
+		break;
  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
  		ret = nvmet_rdma_device_removal(cm_id, queue);
  		break;
---

In case this fixes the issue (as expected) I'll queue it up
with a change log and a code comment on why we need to do
this (and include all the relevant cases around it)...

* RE: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-01 16:34         ` Sagi Grimberg
@ 2016-11-01 16:37             ` Steve Wise
  -1 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2016-11-01 16:37 UTC (permalink / raw)
  To: 'Sagi Grimberg', 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

> 
> >>> I just hit an nvmf target NULL pointer deref BUG after a few hours of
> >>> keep-alive timeout testing.  It appears that nvmet_rdma_cm_handler() was
> >>> called with cm_id->qp == NULL, so the local nvmet_rdma_queue * variable
> >>> queue is left as NULL.  But then nvmet_rdma_queue_disconnect() is called
> >>> with queue == NULL which causes the crash.
> >>
> >> AFAICT, the only way cm_id->qp is NULL is for a scenario we didn't even
> >> get to allocate a queue-pair (e.g. calling rdma_create_qp). The teardown
> >> paths does not nullify cm_id->qp...
> >
> > rdma_destroy_qp() nulls out cm_id->qp.
> 
> pphh, somehow managed to miss it...
> 
> So we have a case where we can call rdma_destroy_qp and
> then rdma_destroy_id but still get events on the cm_id...
> Not very nice...
> 
> So I think that the patch from Bart a few weeks ago was correct:
>

Not quite.  It just guards against a null queue for TIMEWAIT_EXIT, which is only
generated by the IB_CM.  
 
> ---
>   drivers/nvme/target/rdma.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index d1aea17..a61e47f 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1354,9 +1354,12 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
>   		break;
>   	case RDMA_CM_EVENT_ADDR_CHANGE:
>   	case RDMA_CM_EVENT_DISCONNECTED:
> -	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
>   		nvmet_rdma_queue_disconnect(queue);
>   		break;
> +	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
> +		if (queue)
> +			nvmet_rdma_queue_disconnect(queue);
> +		break;
>   	case RDMA_CM_EVENT_DEVICE_REMOVAL:
>   		ret = nvmet_rdma_device_removal(cm_id, queue);
>   		break;
> ---
> 
> In case this fixes the issue (as expected) I'll queue it up
> with a change log and a code comment on why we need to do
> this (and include all the relevant cases around it)...


* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-01 16:37             ` Steve Wise
@ 2016-11-01 16:44               ` Sagi Grimberg
  -1 siblings, 0 replies; 40+ messages in thread
From: Sagi Grimberg @ 2016-11-01 16:44 UTC (permalink / raw)
  To: Steve Wise, 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


>> pphh, somehow managed to miss it...
>>
>> So we have a case where we can call rdma_destroy_qp and
>> then rdma_destroy_id but still get events on the cm_id...
>> Not very nice...
>>
>> So I think that the patch from Bart a few weeks ago was correct:
>>
>
> Not quite.  It just guards against a null queue for TIMEWAIT_EXIT, which is only
> generated by the IB_CM.

Yes, this is why we need ADDR_CHANGE and DISCONNECTED too
"(and include all the relevant cases around it)"

The other events we don't get to LIVE state and we don't have
other error flows that will trigger queue teardown sequence.

--
nvmet-rdma: Fix possible NULL deref when handling rdma cm events

When we initiate the queue teardown sequence we call rdma_destroy_qp,
which clears cm_id->qp; afterwards we call rdma_destroy_id.  But we
might see an rdma_cm event in between, with a cleared cm_id->qp, so
watch out for that and silently ignore the event, because it means
the queue teardown sequence is in progress.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
---
  drivers/nvme/target/rdma.c | 8 +++++++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index b4d648536c3e..240888efd920 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1351,7 +1351,13 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
         case RDMA_CM_EVENT_ADDR_CHANGE:
         case RDMA_CM_EVENT_DISCONNECTED:
         case RDMA_CM_EVENT_TIMEWAIT_EXIT:
-               nvmet_rdma_queue_disconnect(queue);
+               /*
+                * We might end up here when we already freed the qp
+                * which means queue release sequence is in progress,
+                * so don't get in the way...
+                */
+               if (queue)
+                       nvmet_rdma_queue_disconnect(queue);
                 break;
         case RDMA_CM_EVENT_DEVICE_REMOVAL:
                 ret = nvmet_rdma_device_removal(cm_id, queue);
--
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread


* RE: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-01 16:44               ` Sagi Grimberg
@ 2016-11-01 16:49                   ` Steve Wise
  -1 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2016-11-01 16:49 UTC (permalink / raw)
  To: 'Sagi Grimberg', 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

> >> So I think that the patch from Bart a few weeks ago was correct:
> >>
> >
> > Not quite.  It just guards against a null queue for TIMEWAIT_EXIT,
> > which is only generated by the IB_CM.
> 
> Yes, this is why we need ADDR_CHANGE and DISCONNECTED too
> "(and include all the relevant cases around it)"
> 
> The other events we don't get to LIVE state and we don't have
> other error flows that will trigger queue teardown sequence.
> 
> --
> nvmet-rdma: Fix possible NULL deref when handling rdma cm events
> 
> When we initiate queue teardown sequence we call rdma_destroy_qp
> which clears cm_id->qp, afterwards we call rdma_destroy_id, but
> we might see a rdma_cm event in between with a cleared cm_id->qp
> so watch out for that and silently ignore the event because this
> means that the queue teardown sequence is in progress.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Signed-off-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> ---
>   drivers/nvme/target/rdma.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index b4d648536c3e..240888efd920 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1351,7 +1351,13 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
>          case RDMA_CM_EVENT_ADDR_CHANGE:
>          case RDMA_CM_EVENT_DISCONNECTED:
>          case RDMA_CM_EVENT_TIMEWAIT_EXIT:
> -               nvmet_rdma_queue_disconnect(queue);
> +               /*
> +                * We might end up here when we already freed the qp
> +                * which means queue release sequence is in progress,
> +                * so don't get in the way...
> +                */
> +               if (queue)
> +                       nvmet_rdma_queue_disconnect(queue);
>                  break;
>          case RDMA_CM_EVENT_DEVICE_REMOVAL:
>                  ret = nvmet_rdma_device_removal(cm_id, queue);
> --

This looks good.  But you mentioned the two rapid-fire keep-alive timeout
logs for the same controller as being seriously broken.  Perhaps that is
another problem?  Maybe keep-alives aren't getting stopped correctly or
something...

But: I'll try this patch and run for a few hours and see what happens.  I
believe that regardless of a keep-alive issue, the above patch is still
needed.

Steve.


^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-01 16:49                   ` Steve Wise
@ 2016-11-01 17:41                     ` Sagi Grimberg
  -1 siblings, 0 replies; 40+ messages in thread
From: Sagi Grimberg @ 2016-11-01 17:41 UTC (permalink / raw)
  To: Steve Wise, 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


> This looks good.  But you mentioned the 2 rapid-fire keep alive timeout logs for
> the same controller as being seriously broken. Perhaps that is another problem?
> Maybe keep alives aren't getting stopped correctly or something...
>
> But:  I'll try this patch and run for a few hours and see what happens.  I
> believe regardless of a keep alive issue, the above patch is still needed.

In your tests, can you enable dynamic debug on:
nvmet_start_keep_alive_timer
nvmet_stop_keep_alive_timer
nvmet_execute_keep_alive
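One way to do that, assuming CONFIG_DYNAMIC_DEBUG and a mounted debugfs on
the target, is through the dynamic_debug control file.  A sketch follows;
"ka_debug_cmds" is just an illustrative helper name:

```shell
# Sketch: emit one dynamic_debug command per nvmet keep-alive function.
# On a live target these lines go into the dynamic_debug control file.
ka_debug_cmds() {
        for f in nvmet_start_keep_alive_timer \
                 nvmet_stop_keep_alive_timer \
                 nvmet_execute_keep_alive; do
                printf 'func %s +p\n' "$f"
        done
}

# On the target (as root, with debugfs mounted), feed each command in:
#   ka_debug_cmds | while read -r c; do
#           echo "$c" > /sys/kernel/debug/dynamic_debug/control
#   done
ka_debug_cmds
```

Each "func <name> +p" line enables the pr_debug() call sites in that
function; writing the same line with "-p" turns them back off.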

Thanks,

^ permalink raw reply	[flat|nested] 40+ messages in thread


* RE: nvmet_rdma crash - DISCONNECT event with NULL queue
       [not found]                     ` <025201d23476$66812290$338367b0$@opengridcomputing.com>
@ 2016-11-01 19:42                         ` Steve Wise
  0 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2016-11-01 19:42 UTC (permalink / raw)
  To: 'Sagi Grimberg', 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

> > But:  I'll try this patch and run for a few hours and see what
> > happens.  I believe regardless of a keep alive issue, the above patch
> > is still needed.
> 
> In your tests, can you enable dynamic debug on:
> nvmet_start_keep_alive_timer
> nvmet_stop_keep_alive_timer
> nvmet_execute_keep_alive

Hey Sagi.  I hit another crash on the target.  This was with 4.8.0 + the
patch to skip disconnect events if the cm_id->qp is NULL. This time the
crash is in _raw_spin_lock_irqsave() called by nvmet_rdma_recv_done().
The log is too big to include everything inline, but I'm attaching the
full log as an attachment.  Looks like at around 4988.169 seconds in the
log, we see 5 controllers created, all named "controller 1"!  And 32
queues assigned to controller 1, five times!  And shortly after that we hit
the BUG.

Snippets:

[ 4988.035500] nvmet_start_keep_alive_timer: nvmet: ctrl 1 start
keep-alive timer for 15 secs
[ 4988.035507] nvmet: creating controller 1 for NQN
nqn.2014-08.org.nvmexpress:NVMf:uuid:0c8e193a-9c77-4e40-adff-07a7b8a352af.
[ 4988.040872] nvmet_start_keep_alive_timer: nvmet: ctrl 1 start
keep-alive timer for 15 secs
[ 4988.040877] nvmet: creating controller 1 for NQN
nqn.2014-08.org.nvmexpress:NVMf:uuid:0c8e193a-9c77-4e40-adff-07a7b8a352af.
[ 4988.044614] nvmet_start_keep_alive_timer: nvmet: ctrl 1 start
keep-alive timer for 15 secs
[ 4988.044619] nvmet: creating controller 1 for NQN
nqn.2014-08.org.nvmexpress:NVMf:uuid:0c8e193a-9c77-4e40-adff-07a7b8a352af.
[ 4988.054419] nvmet_start_keep_alive_timer: nvmet: ctrl 1 start
keep-alive timer for 15 secs
[ 4988.054423] nvmet: creating controller 1 for NQN
nqn.2014-08.org.nvmexpress:NVMf:uuid:0c8e193a-9c77-4e40-adff-07a7b8a352af.
[ 4988.054428] nvmet_start_keep_alive_timer: nvmet: ctrl 1 start
keep-alive timer for 15 secs
[ 4988.054433] nvmet: creating controller 1 for NQN
nqn.2014-08.org.nvmexpress:NVMf:uuid:0c8e193a-9c77-4e40-adff-07a7b8a352af.

Queues assigned to the same controller multiple times:

[ 4988.371465] nvmet: adding queue 1 to ctrl 1.
[ 4988.376144] nvmet: adding queue 2 to ctrl 1.
[ 4988.380790] nvmet: adding queue 3 to ctrl 1.
[ 4988.386110] nvmet: adding queue 1 to ctrl 1.
[ 4988.390751] nvmet: adding queue 2 to ctrl 1.
[ 4988.416056] nvmet: adding queue 4 to ctrl 1.
[ 4988.420658] nvmet: adding queue 3 to ctrl 1.
[ 4988.425257] nvmet: adding queue 1 to ctrl 1.
[ 4988.429841] nvmet: adding queue 1 to ctrl 1.
[ 4988.434402] nvmet: adding queue 5 to ctrl 1.
[ 4988.438957] nvmet: adding queue 4 to ctrl 1.
[ 4988.443497] nvmet: adding queue 2 to ctrl 1.
[ 4988.448041] nvmet: adding queue 2 to ctrl 1.

And the BUG in the middle of all this:

[ 4988.964835] BUG: unable to handle kernel
[ 4988.964877] nvmet: adding queue 32 to ctrl 1.
[ 4988.973233] NULL pointer dereference
[ 4988.977047]  at 00000000000000b8
[ 4988.978932] nvmet: adding queue 1 to ctrl 1.
[ 4988.978977] nvmet: adding queue 2 to ctrl 1.
...
[ 4988.980539] nvmet: adding queue 32 to ctrl 1.
[ 4989.051155] nvmet_start_keep_alive_timer: nvmet: ctrl 1 start
keep-alive timer for 15 secs
[ 4989.051158] nvmet: creating controller 1 for NQN
nqn.2014-08.org.nvmexpress:NVMf:uuid:0c8e193a-9c77-4e40-adff-07a7b8a352af.
[ 4989.149157] IP: [<ffffffff816e2b32>] _raw_spin_lock_irqsave+0x22/0x40
[ 4989.155978] PGD 858398067 PUD 858109067 PMD 0
[ 4989.160844] Oops: 0002 [#1] SMP
[ 4989.164359] Modules linked in: iw_cxgb4(OE) cxgb4(OE) null_blk brd
nvmet_rdma nvmet ip6table_filter ip6_tables ebtable_nat ebtables
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
nf_reject_ipv4
[ 4989.189404] nvmet: adding queue 1 to ctrl 1.
...
[ 4989.825586] nvmet: adding queue 30 to ctrl 1.
[ 4989.825643] nvmet: adding queue 31 to ctrl 1.
[ 4989.825700] nvmet: adding queue 32 to ctrl 1.
[ 4989.979826]  [<ffffffff816e2b32>] _raw_spin_lock_irqsave+0x22/0x40
[ 4989.987220] RSP: 0018:ffff880fb5e6bbe0  EFLAGS: 00010046
[ 4989.993071] RAX: 0000000000000000 RBX: 0000000000000006 RCX:
dead000000000200
[ 4990.000777] RDX: 0000000000000001 RSI: 0000000000000212 RDI:
00000000000000b8
[ 4990.008492] RBP: ffff880fb5e6bbe8 R08: ffff880fac0c0228 R09:
ffff880fb5e6bb64
[ 4990.016219] R10: 000000000000086c R11: 0000000000000000 R12:
ffff880fac0c0228
[ 4990.023936] R13: ffff880fb5935828 R14: 0000000000000297 R15:
ffff880fac0c0000
[ 4990.031627] FS:  0000000000000000(0000) GS:ffff88105f380000(0000)
knlGS:0000000000000000
[ 4990.040274] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4990.046538] CR2: 00000000000000b8 CR3: 00000008587e9000 CR4:
00000000000406e0
[ 4990.054178] Stack:
[ 4990.056665]  ffff880fb5935800 ffff880fb5e6bc38 ffffffffa0853642
ffff880fb5937cf0
[ 4990.064633]  0000000000000212 0000000000000000 ffff880fb5937c00
0000000000000000
[ 4990.072598]  0000000000010000 0000000000000000 0000000000000000
ffff880fb5e6bc88
[ 4990.080562] Call Trace:
[ 4990.083495]  [<ffffffffa0853642>] nvmet_rdma_recv_done+0x162/0x19c
[nvmet_rdma]
[ 4990.091291]  [<ffffffffa0255d48>] __ib_process_cq+0x48/0xc0 [ib_core]
[ 4990.098235]  [<ffffffffa0255f1a>] ib_cq_poll_work+0x2a/0x70 [ib_core]
[ 4990.105191]  [<ffffffff810a1483>] process_one_work+0x183/0x4d0
[ 4990.111555]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[ 4990.117566]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[ 4990.123243]  [<ffffffff810a211d>] worker_thread+0x16d/0x530
[ 4990.129367]  [<ffffffff810b4785>] ?
trace_event_raw_event_sched_switch+0xe5/0x130
[ 4990.137418]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[ 4990.143492]  [<ffffffff810cb9b6>] ? __wake_up_common+0x56/0x90
[ 4990.149916]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[ 4990.156784]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[ 4990.162535]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[ 4990.169441]  [<ffffffff810a6c5c>] kthread+0xcc/0xf0
[ 4990.174954]  [<ffffffff810b1aae>] ? schedule_tail+0x1e/0xc0
[ 4990.181173]  [<ffffffff816e2eff>] ret_from_fork+0x1f/0x40
[ 4990.187237]  [<ffffffff810a6b90>] ?
kthread_freezable_should_stop+0x70/0x70
[ 4990.194876] Code: 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 66 66 66 66
90 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 00 00
<f0> 0f b1 17 85 c0 75 06 48 89 d8 5b c9 c3 89 c6 e8 39 f9 9e ff
[ 4990.216376] RIP  [<ffffffff816e2b32>] _raw_spin_lock_irqsave+0x22/0x40
[ 4990.223711]  RSP <ffff880fb5e6bbe0>
[ 4990.227996] CR2: 00000000000000b8


Given the way the BUG and stack trace are interleaved with "adding queue"
messages, I guess this was happening on multiple cores.  (The nodes have
32 cores.)



^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
       [not found]                     ` <024e01d23476$6668b890$333a29b0$@opengridcomputing.com>
@ 2016-11-01 22:34                         ` Sagi Grimberg
  0 siblings, 0 replies; 40+ messages in thread
From: Sagi Grimberg @ 2016-11-01 22:34 UTC (permalink / raw)
  To: Steve Wise, 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


>>> But:  I'll try this patch and run for a few hours and see what happens.  I
>>> believe regardless of a keep alive issue, the above patch is still needed.
>>
>> In your tests, can you enable dynamic debug on:
>> nvmet_start_keep_alive_timer
>> nvmet_stop_keep_alive_timer
>> nvmet_execute_keep_alive
>
> Hey Sagi.  I hit another crash on the target.  This was with 4.8.0 + the patch
> to skip disconnect events if the cm_id->qp is NULL. This time the crash is in
> _raw_spin_lock_irqsave() called by nvmet_rdma_recv_done().  The log is too big
> to include everything inline, but I'm attaching the full log as an attachment.
> Looks like at around 4988.169 seconds in the log, we see 5 controllers created,
> all named "controller 1"!  And 32 queues assigned to controller 1 5 times!  And
> shortly after that we hit the BUG.

So I think you're creating multiple subsystems and provisioning each
subsystem differently, correct?  The controller IDs are unique within
a subsystem, so two different subsystems can each have ctrl id 1.
Perhaps our logging should mention the subsysnqn too?

Anyway, is there traffic going on?

The only way we can get recv_done with corrupted data is if we posted
something after the qp drain completed; can you check whether that can
happen?

Can you share your test case?

^ permalink raw reply	[flat|nested] 40+ messages in thread


* RE: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-01 22:34                         ` Sagi Grimberg
@ 2016-11-02 15:07                             ` Steve Wise
  -1 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2016-11-02 15:07 UTC (permalink / raw)
  To: 'Sagi Grimberg', 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

> > Hey Sagi.  I hit another crash on the target.  This was with 4.8.0 + the
> > patch to skip disconnect events if the cm_id->qp is NULL.  This time the
> > crash is in _raw_spin_lock_irqsave() called by nvmet_rdma_recv_done().
> > The log is too big to include everything inline, but I'm attaching the
> > full log as an attachment.  Looks like at around 4988.169 seconds in the
> > log, we see 5 controllers created, all named "controller 1"!  And 32
> > queues assigned to controller 1 5 times!  And shortly after that we hit
> > the BUG.
> 
> So I think you're creating multiple subsystems and provision each
> subsystem differently correct? the controller ids are unique within
> a subsystem so two different subsystems can have ctrl id 1. Perhaps
> our logging should mention the subsysnqn too?
> 

I'm not sure I understand the "subsystem" concept.  I never noticed before that
any target device had the same controller ID.  The target json config file is
inserted below.  There are 10 ramdisks exported over 2 ports of a cxgb4 40GE
device and 1 port of an mlx4 RoCE device.   For this test, the NVMF host
connects to all 10 targets over 1 port of the cxgb4 device.  Like this:

for i in $(seq 0 9) ; do
    nvme connect --transport=rdma --trsvcid=4420 --traddr=10.0.1.14 \
        --nqn=test-ram${i}
done

> Anyway, is there traffic going on?
>

Yes, heavy fio on all 10 attached ramdisks.
 
> The only way we can get recv_done with corrupted data is if we posted
> something after the qp drain completed, can you check if that can happen?
>

Hmm, posting after the drain would result in a synchronous error returned by
ib_post_send() for cxgb4.  There is that issue with cxgb4's drain logic in that
it really only guarantees that the CQEs are polled, not that the completion
handler was called.  I have a fix in progress for this (actually decided to
support drain like IB does with a small delta from the iWARP spec).  I'll also
try and reproduce this on mlx4 to rule out iwarp and cxgb4 anomalies.  And I can
try my new drain fix which will be posted for review soon for inclusion into
4.10.
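
The gap in those drain semantics can be sketched in a few lines of plain C
(a toy model, not the verbs API; every name here is made up): a completion
goes through three states, posted -> polled -> handled, and a drain that only
waits for the poll step can report "done" while completion handlers are still
pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model of a CQ, not the verbs API. */
struct toy_cq {
        int posted;   /* WRs posted to the QP */
        int polled;   /* CQEs taken off the CQ */
        int handled;  /* completion handlers that have actually run */
};

/* Poll one CQE but defer its handler, as deferred-work schemes do. */
static void toy_poll_one(struct toy_cq *cq)
{
        if (cq->polled < cq->posted)
                cq->polled++;
}

/* Later, the deferred handlers run for everything already polled. */
static void toy_run_handlers(struct toy_cq *cq)
{
        cq->handled = cq->polled;
}

/* Weak drain (the cxgb4 behavior described above): complete once
 * every CQE has been polled. */
static bool toy_drain_polled(const struct toy_cq *cq)
{
        return cq->polled == cq->posted;
}

/* Stronger drain: complete only once every handler has also run. */
static bool toy_drain_handled(const struct toy_cq *cq)
{
        return cq->handled == cq->posted;
}
```

The model shows why a weak drain can race with recv_done: the drain returns,
resources get torn down, and then the deferred handler runs against freed
state.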
 
> Can you share your test case?

Of course!  This is the same test that was killing the host side very quickly,
until Christoph fixed it with:

http://lists.infradead.org/pipermail/linux-nvme/2016-November/007043.html

Now it runs for ~60-90 minutes before the target dies.  

After connecting all 10 ramdisks over 1 cxgb4 port, with 32 core NVMF
host/target nodes, you run this script (note nvme0n1 is a local nvme device, so
the NVMF devices are nvme[1-10]n1):

[root@stevo1 sw]# cat /root/bug30782/fio.sh
for i in $(seq 1 200) ; do
        fio --startdelay=1-10 --ioengine=libaio --rw=randwrite \
                --name=randwrite --size=200m --direct=1 --invalidate=1 \
                --fsync_on_close=1 --group_reporting --exitall --runtime=60 \
                --time_based --filename=/dev/nvme10n1 --filename=/dev/nvme1n1 \
                --filename=/dev/nvme2n1 --filename=/dev/nvme3n1 \
                --filename=/dev/nvme4n1 --filename=/dev/nvme5n1 \
                --filename=/dev/nvme6n1 --filename=/dev/nvme7n1 \
                --filename=/dev/nvme8n1 --filename=/dev/nvme9n1 \
                --iodepth=4 --numjobs=32 --bs=2K | grep -i "aggrb\|iops"
        sleep 3
        echo "### Iteration $i Done ###"
done

And then run this script (eth2 is the port handling the NVMF traffic) to force
keep alive timeouts and reconnects:

while : ; do
    ifconfig eth2 down
    sleep $(( ($RANDOM & 0xf) + 8 ))
    ifconfig eth2 up
    sleep 30
done


Here is the target json file:

[root@stevo2 ~]# cat /etc/nvmet-10ram.json
{
  "hosts": [],
  "ports": [
    {
      "addr": {
        "adrfam": "ipv4",
        "traddr": "10.0.1.14",
        "treq": "not specified",
        "trsvcid": "4420",
        "trtype": "rdma"
      },
      "portid": 1,
      "referrals": [],
      "subsystems": [
        "test-ram9",
        "test-ram8",
        "test-ram7",
        "test-ram6",
        "test-ram5",
        "test-ram4",
        "test-ram3",
        "test-ram2",
        "test-ram1",
        "test-ram0"
      ]
    },
    {
      "addr": {
        "adrfam": "ipv4",
        "traddr": "10.0.2.14",
        "treq": "not specified",
        "trsvcid": "4420",
        "trtype": "rdma"
      },
      "portid": 2,
      "referrals": [],
      "subsystems": [
        "test-ram9",
        "test-ram8",
        "test-ram7",
        "test-ram6",
        "test-ram5",
        "test-ram4",
        "test-ram3",
        "test-ram2",
        "test-ram1",
        "test-ram0"
      ]
    },
    {
      "addr": {
        "adrfam": "ipv4",
        "traddr": "10.0.5.14",
        "treq": "not specified",
        "trsvcid": "4420",
        "trtype": "rdma"
      },
      "portid": 5,
      "referrals": [],
      "subsystems": [
        "test-ram9",
        "test-ram8",
        "test-ram7",
        "test-ram6",
        "test-ram5",
        "test-ram4",
        "test-ram3",
        "test-ram2",
        "test-ram1",
        "test-ram0"
      ]
    },
    {
      "addr": {
        "adrfam": "ipv4",
        "traddr": "10.0.7.14",
        "treq": "not specified",
        "trsvcid": "4420",
        "trtype": "rdma"
      },
      "portid": 7,
      "referrals": [],
      "subsystems": [
        "test-ram9",
        "test-ram8",
        "test-ram7",
        "test-ram6",
        "test-ram5",
        "test-ram4",
        "test-ram3",
        "test-ram2",
        "test-ram1",
        "test-ram0"
      ]
    }
  ],
  "subsystems": [
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram9"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram9"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram8"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram8"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram7"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram7"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram6"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram6"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram5"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram5"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram4"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram4"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram3"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram3"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram2"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram2"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram1"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram1"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/ram0"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "test-ram0"
    }
  ]
}

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-02 15:07                             ` Steve Wise
@ 2016-11-02 15:15                               ` 'Christoph Hellwig'
  -1 siblings, 0 replies; 40+ messages in thread
From: 'Christoph Hellwig' @ 2016-11-02 15:15 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Sagi Grimberg', 'Christoph Hellwig',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Nov 02, 2016 at 10:07:38AM -0500, Steve Wise wrote:
> I'm not sure I understand the "subsystem" concept.  I never noticed before that
> any target device had the same controller ID.  The target json config file is
> inserted below.  There are 10 ramdisks exported over 2 ports of a cxgb4 40GE
> device and 1 port of an mlx4 RoCE device.   For this test, the NVMF host
> connects to all 10 targets over 1 port of the cxgb4 device.  Like this:

Yes, you have multiple subsystems.  It's sort of the NVMe equivalent
of a target which can have multiple LUNs, aka Namespaces in NVMe.

Btw, I want to actually make the ctrlid global for the target instead of
per-subsystem to ease a few things, and debuggability is just one more
on the list.
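
The idea of a target-global cntlid can be sketched as a simple allocator
(the kernel would use an ida; this bitmap version is just an illustration,
and all names here are assumptions): every controller gets a unique ID
regardless of which subsystem it belongs to, so log lines like "ctrl 1"
become unambiguous.

```c
#include <assert.h>

/* Toy sketch of a target-global controller-ID allocator.  Bit n set
 * means cntlid n+1 is in use; cntlid 0 is reserved in NVMe. */
#define MAX_CTRL 64

static unsigned long ctrl_bitmap;

static int cntlid_alloc(void)
{
        for (int i = 0; i < MAX_CTRL; i++) {
                if (!(ctrl_bitmap & (1UL << i))) {
                        ctrl_bitmap |= 1UL << i;
                        return i + 1;
                }
        }
        return -1; /* all IDs in use */
}

static void cntlid_free(int id)
{
        ctrl_bitmap &= ~(1UL << (id - 1));
}
```

Allocation is first-free, so freed IDs get reused; a real implementation
would also need locking and a configurable upper bound.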

^ permalink raw reply	[flat|nested] 40+ messages in thread


* RE: nvmet_rdma crash - DISCONNECT event with NULL queue
       [not found]                         ` <004701d2351a$d9e4ad70$8dae0850$@opengridcomputing.com>
@ 2016-11-02 19:18                             ` Steve Wise
  0 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2016-11-02 19:18 UTC (permalink / raw)
  To: 'Sagi Grimberg', 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

> I'll also try and reproduce this on mlx4 to rule out
> iwarp and cxgb4 anomalies.

Running the same test over mlx4/roce, I hit a warning in list_debug, and then a
stuck CPU...

I see this a few times:

[  916.207157] ------------[ cut here ]------------
[  916.212455] WARNING: CPU: 1 PID: 5553 at lib/list_debug.c:33 __list_add+0xbe/0xd0
[  916.220670] list_add corruption. prev->next should be next (ffffffffa0847070), but was (null). (prev=ffff880833baaf20).
[  916.233852] Modules linked in: iw_cxgb4 cxgb4 nvmet_rdma nvmet null_blk brd
ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_dfrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM
iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc
ipmi_devintf cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverb
ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib
mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvmirqbypass uinput
iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg lpc_ich
mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler
wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E)ahci(E) libahci(E) libata(E) mgag200(E)
ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E)
syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4]
[  916.337427] CPU: 1 PID: 5553 Comm: kworker/1:15 Tainted: G            E
4.8.0+ #131
[  916.346192] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[  916.354126] Workqueue: ib_cm cm_work_handler [ib_cm]
[  916.360096]  0000000000000000 ffff880817483968 ffffffff8135a817
ffffffff8137813e
[  916.368594]  ffff8808174839c8 ffff8808174839c8 0000000000000000
ffff8808174839b8
[  916.377112]  ffffffff81086dad 000000f002080020 0000002134f11400
ffff880834f11470
[  916.385642] Call Trace:
[  916.389181]  [<ffffffff8135a817>] dump_stack+0x67/0x90
[  916.395430]  [<ffffffff8137813e>] ? __list_add+0xbe/0xd0
[  916.401863]  [<ffffffff81086dad>] __warn+0xfd/0x120
[  916.407862]  [<ffffffff81086e89>] warn_slowpath_fmt+0x49/0x50
[  916.414741]  [<ffffffff8137813e>] __list_add+0xbe/0xd0
[  916.421034]  [<ffffffff816e0be6>] ? mutex_lock+0x16/0x40
[  916.427522]  [<ffffffffa0844d40>] nvmet_rdma_queue_connect+0x110/0x1a0
[nvmet_rdma]
[  916.436374]  [<ffffffffa0845430>] nvmet_rdma_cm_handler+0x100/0x1b0
[nvmet_rdma]
[  916.444998]  [<ffffffffa072e1d0>] cma_req_handler+0x200/0x300 [rdma_cm]
[  916.452847]  [<ffffffffa06f3937>] cm_process_work+0x27/0x100 [ib_cm]
[  916.460452]  [<ffffffffa06f61ea>] cm_req_handler+0x35a/0x540 [ib_cm]
[  916.468070]  [<ffffffffa06f641b>] cm_work_handler+0x4b/0xd0 [ib_cm]
[  916.475614]  [<ffffffff810a1483>] process_one_work+0x183/0x4d0
[  916.482751]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[  916.489539]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[  916.495985]  [<ffffffff810a211d>] worker_thread+0x16d/0x530
[  916.502892]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[  916.509730]  [<ffffffff810cb9b6>] ? __wake_up_common+0x56/0x90
[  916.516926]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[  916.524568]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[  916.531084]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[  916.538758]  [<ffffffff810a6c5c>] kthread+0xcc/0xf0
[  916.545053]  [<ffffffff810b1aae>] ? schedule_tail+0x1e/0xc0
[  916.552082]  [<ffffffff816e2eff>] ret_from_fork+0x1f/0x40
[  916.558935]  [<ffffffff810a6b90>] ? kthread_freezable_should_stop+0x70/0x70
[  916.567430] ---[ end trace a294c05aa08938f6 ]---
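
For reference, the warning above comes from the sanity check __list_add()
performs in lib/list_debug.c; roughly (this is a simplified toy version,
not the kernel's actual code), before linking a new node it verifies that
the neighbours still point at each other:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_node {
        struct list_node *next, *prev;
};

/* Toy version of the __list_add() debug check: prev->next must still
 * be next, and next->prev must still be prev.  If another CPU has
 * concurrently unlinked or freed a neighbour (e.g. a queue being torn
 * down while a new connect request walks the same list without the
 * same lock held), the check fails with exactly the "prev->next should
 * be next ... but was (null)" message seen in the log. */
static bool list_add_valid(const struct list_node *prev,
                           const struct list_node *next)
{
        return prev->next == next && next->prev == prev;
}
```

That points at nvmet_rdma_queue_connect() adding to a list while a
concurrent teardown path modifies it under a different (or no) lock.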

...

And then a cpu gets stuck:

[  988.672768] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:12:5549]
[  988.681814] Modules linked in: iw_cxgb4 cxgb4 nvmet_rdma nvmet null_blk brd
ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_dfrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM
iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc
ipmi_devintf cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverb
ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib
mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvmirqbypass uinput
iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg lpc_ich
mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler
wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E)ahci(E) libahci(E) libata(E) mgag200(E)
ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E)
syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4]
[  988.786988] CPU: 1 PID: 5549 Comm: kworker/1:12 Tainted: G        W   EL
4.8.0+ #131
[  988.796023] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[  988.804188] Workqueue: events nvmet_keep_alive_timer [nvmet]
[  988.811068] task: ffff880819328000 task.stack: ffff880819324000
[  988.818195] RIP: 0010:[<ffffffffa084361c>]  [<ffffffffa084361c>] nvmet_rdma_delete_ctrl+0x3c/0xb0 [nvmet_rdma]
[  988.829434] RSP: 0018:ffff880819327c58  EFLAGS: 00000287
[  988.835946] RAX: ffff880834f11b20 RBX: ffff880834f11b20 RCX: 0000000000000000
[  988.844285] RDX: 0000000000000001 RSI: ffff88085fa58ae0 RDI: ffffffffa0847040
[  988.852626] RBP: ffff880819327c88 R08: ffff88085fa58ae0 R09: ffff880819327918
[  988.860968] R10: 0000000000000920 R11: 0000000000000001 R12: ffff880834f11a00
[  988.869310] R13: ffff88081a6a4800 R14: 0000000000000000 R15: ffff88085fa5d505
[  988.877655] FS:  0000000000000000(0000) GS:ffff88085fa40000(0000)
knlGS:0000000000000000
[  988.886955] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  988.893906] CR2: 00007f28fcc6e74b CR3: 0000000001c06000 CR4: 00000000000406e0
[  988.902246] Stack:
[  988.905457]  ffff880817fc6720 0000000000000002 000000000000000f
ffff88081a6a4800
[  988.914142]  ffff88085fa58ac0 ffff88085fa5d500 ffff880819327ca8
ffffffffa0830237
[  988.922825]  ffff88085fa58ac0 ffff8808584ce900 ffff880819327d88
ffffffff810a1483
[  988.931507] Call Trace:
[  988.935152]  [<ffffffffa0830237>] nvmet_keep_alive_timer+0x37/0x40 [nvmet]
[  988.943232]  [<ffffffff810a1483>] process_one_work+0x183/0x4d0
[  988.950273]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[  988.956963]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[  988.963299]  [<ffffffff8102eb34>] ? __switch_to+0x1e4/0x790
[  988.970070]  [<ffffffff810a211d>] worker_thread+0x16d/0x530
[  988.976848]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[  988.983541]  [<ffffffff810cb9b6>] ? __wake_up_common+0x56/0x90
[  988.990578]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[  988.998055]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[  989.004394]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[  989.011861]  [<ffffffff810a6c5c>] kthread+0xcc/0xf0
[  989.017944]  [<ffffffff810b1aae>] ? schedule_tail+0x1e/0xc0
[  989.024728]  [<ffffffff816e2eff>] ret_from_fork+0x1f/0x40
[  989.031325]  [<ffffffff810a6b90>] ? kthread_freezable_should_stop+0x70/0x70
[  989.039488] Code: 90 49 89 fd 48 c7 c7 40 70 84 a0 e8 cf d5 e9 e0 48 8b 05 68
3a 00 00 48 3d 70 70 84 a0 4c 8d a0 e0 fe ff ff 48 89 c3 75 1c eb 55 <49> 8b 84
24 20 01 00 00 48 3d 70 70 84 a0 4c 8d a0 e0 fe ff ff




^ permalink raw reply	[flat|nested] 40+ messages in thread

* nvmet_rdma crash - DISCONNECT event with NULL queue
@ 2016-11-02 19:18                             ` Steve Wise
  0 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2016-11-02 19:18 UTC (permalink / raw)


> I'll also try and reproduce this on mlx4 to rule out
> iwarp and cxgb4 anomolies.

Running the same test over mlx4/roce, I hit a warning in list_debug, and then a
stuck CPU...

I see this a few times:

[  916.207157] ------------[ cut here ]------------
[  916.212455] WARNING: CPU: 1 PID: 5553 at lib/list_debug.c:33
__list_add+0xbe/0xd0
[  916.220670] list_add corruption. prev->next should be next
(ffffffffa0847070), but was           (null). (prev=ffff880833baaf20).
[  916.233852] Modules linked in: iw_cxgb4 cxgb4 nvmet_rdma nvmet null_blk brd
ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_dfrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM
iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc
ipmi_devintf cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverb
ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib
mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvmirqbypass uinput
iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg lpc_ich
mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler
wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E)ahci(E) libahci(E) libata(E) mgag200(E)
ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E)
syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4]
[  916.337427] CPU: 1 PID: 5553 Comm: kworker/1:15 Tainted: G            E
4.8.0+ #131
[  916.346192] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[  916.354126] Workqueue: ib_cm cm_work_handler [ib_cm]
[  916.360096]  0000000000000000 ffff880817483968 ffffffff8135a817
ffffffff8137813e
[  916.368594]  ffff8808174839c8 ffff8808174839c8 0000000000000000
ffff8808174839b8
[  916.377112]  ffffffff81086dad 000000f002080020 0000002134f11400
ffff880834f11470
[  916.385642] Call Trace:
[  916.389181]  [<ffffffff8135a817>] dump_stack+0x67/0x90
[  916.395430]  [<ffffffff8137813e>] ? __list_add+0xbe/0xd0
[  916.401863]  [<ffffffff81086dad>] __warn+0xfd/0x120
[  916.407862]  [<ffffffff81086e89>] warn_slowpath_fmt+0x49/0x50
[  916.414741]  [<ffffffff8137813e>] __list_add+0xbe/0xd0
[  916.421034]  [<ffffffff816e0be6>] ? mutex_lock+0x16/0x40
[  916.427522]  [<ffffffffa0844d40>] nvmet_rdma_queue_connect+0x110/0x1a0
[nvmet_rdma]
[  916.436374]  [<ffffffffa0845430>] nvmet_rdma_cm_handler+0x100/0x1b0
[nvmet_rdma]
[  916.444998]  [<ffffffffa072e1d0>] cma_req_handler+0x200/0x300 [rdma_cm]
[  916.452847]  [<ffffffffa06f3937>] cm_process_work+0x27/0x100 [ib_cm]
[  916.460452]  [<ffffffffa06f61ea>] cm_req_handler+0x35a/0x540 [ib_cm]
[  916.468070]  [<ffffffffa06f641b>] cm_work_handler+0x4b/0xd0 [ib_cm]
[  916.475614]  [<ffffffff810a1483>] process_one_work+0x183/0x4d0
[  916.482751]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[  916.489539]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[  916.495985]  [<ffffffff810a211d>] worker_thread+0x16d/0x530
[  916.502892]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[  916.509730]  [<ffffffff810cb9b6>] ? __wake_up_common+0x56/0x90
[  916.516926]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[  916.524568]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[  916.531084]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[  916.538758]  [<ffffffff810a6c5c>] kthread+0xcc/0xf0
[  916.545053]  [<ffffffff810b1aae>] ? schedule_tail+0x1e/0xc0
[  916.552082]  [<ffffffff816e2eff>] ret_from_fork+0x1f/0x40
[  916.558935]  [<ffffffff810a6b90>] ? kthread_freezable_should_stop+0x70/0x70
[  916.567430] ---[ end trace a294c05aa08938f6 ]---

...

And then a cpu gets stuck:

[  988.672768] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[kworker/1:12:5549]
[  988.681814] Modules linked in: iw_cxgb4 cxgb4 nvmet_rdma nvmet null_blk brd
ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM
iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc
ipmi_devintf cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs
ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib
mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput
iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg lpc_ich
mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler
wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) libata(E) mgag200(E)
ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E)
syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4]
[  988.786988] CPU: 1 PID: 5549 Comm: kworker/1:12 Tainted: G        W   EL
4.8.0+ #131
[  988.796023] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[  988.804188] Workqueue: events nvmet_keep_alive_timer [nvmet]
[  988.811068] task: ffff880819328000 task.stack: ffff880819324000
[  988.818195] RIP: 0010:[<ffffffffa084361c>]  [<ffffffffa084361c>]
nvmet_rdma_delete_ctrl+0x3c/0xb0 [nvmet_rdma]
[  988.829434] RSP: 0018:ffff880819327c58  EFLAGS: 00000287
[  988.835946] RAX: ffff880834f11b20 RBX: ffff880834f11b20 RCX: 0000000000000000
[  988.844285] RDX: 0000000000000001 RSI: ffff88085fa58ae0 RDI: ffffffffa0847040
[  988.852626] RBP: ffff880819327c88 R08: ffff88085fa58ae0 R09: ffff880819327918
[  988.860968] R10: 0000000000000920 R11: 0000000000000001 R12: ffff880834f11a00
[  988.869310] R13: ffff88081a6a4800 R14: 0000000000000000 R15: ffff88085fa5d505
[  988.877655] FS:  0000000000000000(0000) GS:ffff88085fa40000(0000)
knlGS:0000000000000000
[  988.886955] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  988.893906] CR2: 00007f28fcc6e74b CR3: 0000000001c06000 CR4: 00000000000406e0
[  988.902246] Stack:
[  988.905457]  ffff880817fc6720 0000000000000002 000000000000000f
ffff88081a6a4800
[  988.914142]  ffff88085fa58ac0 ffff88085fa5d500 ffff880819327ca8
ffffffffa0830237
[  988.922825]  ffff88085fa58ac0 ffff8808584ce900 ffff880819327d88
ffffffff810a1483
[  988.931507] Call Trace:
[  988.935152]  [<ffffffffa0830237>] nvmet_keep_alive_timer+0x37/0x40 [nvmet]
[  988.943232]  [<ffffffff810a1483>] process_one_work+0x183/0x4d0
[  988.950273]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[  988.956963]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[  988.963299]  [<ffffffff8102eb34>] ? __switch_to+0x1e4/0x790
[  988.970070]  [<ffffffff810a211d>] worker_thread+0x16d/0x530
[  988.976848]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[  988.983541]  [<ffffffff810cb9b6>] ? __wake_up_common+0x56/0x90
[  988.990578]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[  988.998055]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[  989.004394]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[  989.011861]  [<ffffffff810a6c5c>] kthread+0xcc/0xf0
[  989.017944]  [<ffffffff810b1aae>] ? schedule_tail+0x1e/0xc0
[  989.024728]  [<ffffffff816e2eff>] ret_from_fork+0x1f/0x40
[  989.031325]  [<ffffffff810a6b90>] ? kthread_freezable_should_stop+0x70/0x70
[  989.039488] Code: 90 49 89 fd 48 c7 c7 40 70 84 a0 e8 cf d5 e9 e0 48 8b 05 68
3a 00 00 48 3d 70 70 84 a0 4c 8d a0 e0 fe ff ff 48 89 c3 75 1c eb 55 <49> 8b 84
24 20 01 00 00 48 3d 70 70 84 a0 4c 8d a0 e0 fe ff ff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-02 15:15                               ` 'Christoph Hellwig'
@ 2016-11-06  7:35                                   ` Sagi Grimberg
  -1 siblings, 0 replies; 40+ messages in thread
From: Sagi Grimberg @ 2016-11-06  7:35 UTC (permalink / raw)
  To: 'Christoph Hellwig', Steve Wise
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


> Btw, I want to actually make the ctrlid global for the target instead of
> per-subsystem to ease a few things, and debuggability is just one more
> on the list.

Sounds good
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-02 19:18                             ` Steve Wise
@ 2016-11-06  8:51                               ` Sagi Grimberg
  -1 siblings, 0 replies; 40+ messages in thread
From: Sagi Grimberg @ 2016-11-06  8:51 UTC (permalink / raw)
  To: Steve Wise, 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

>> I'll also try and reproduce this on mlx4 to rule out
>> iwarp and cxgb4 anomalies.
>
> Running the same test over mlx4/roce, I hit a warning in list_debug, and then a
> stuck CPU...

Steve,

Does this patch help?
--
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index bafd9898b5f6..5946dac9564f 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -951,6 +951,7 @@ static int nvmet_rdma_create_queue_ib(struct 
nvmet_rdma_queue *queue)

  static void nvmet_rdma_destroy_queue_ib(struct nvmet_rdma_queue *queue)
  {
+       ib_drain_qp(queue->cm_id->qp);
         rdma_destroy_qp(queue->cm_id);
         ib_free_cq(queue->cq);
  }
@@ -1247,7 +1248,6 @@ static void __nvmet_rdma_queue_disconnect(struct 
nvmet_rdma_queue *queue)

         if (disconnect) {
                 rdma_disconnect(queue->cm_id);
-               ib_drain_qp(queue->cm_id->qp);
                 schedule_work(&queue->release_work);
         }
  }
--

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-06  7:35                                   ` Sagi Grimberg
@ 2016-11-07 18:29                                       ` J Freyensee
  -1 siblings, 0 replies; 40+ messages in thread
From: J Freyensee @ 2016-11-07 18:29 UTC (permalink / raw)
  To: Sagi Grimberg, 'Christoph Hellwig', Steve Wise
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Minturn, Dave B

On Sun, 2016-11-06 at 09:35 +0200, Sagi Grimberg wrote:
> > 
> > Btw, I want to actually make the ctrlid global for the target
> > instead of
> > per-subsystem to ease a few things, and debuggability is just one
> > more
> > on the list.

How will that be NVMe-over-Fabrics spec compliant?

The way I interpret the spec, ctrlid (I'm assuming you mean cntlid) is
allocated on an NVM subsystem basis.  For example, Figure 34 of the
Discovery Log Page entry and Figure 20 of the Connect Command imply
to me that CNTLID values are allocated at NVM Subsystem granularity
when I see statements such as:

(Figure 20: Connect Command Data): "...If the NVM subsystem uses the
static controller model and the value is FFFEh for the Admin Queue,
then any available controller may be returned."

This implies to me that cntlids are allocated on an NVM subsystem basis,
not an NVMe Target basis.

> 
> Sounds good
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-07 18:29                                       ` J Freyensee
@ 2016-11-07 18:41                                           ` 'Christoph Hellwig'
  -1 siblings, 0 replies; 40+ messages in thread
From: 'Christoph Hellwig' @ 2016-11-07 18:41 UTC (permalink / raw)
  To: J Freyensee
  Cc: Sagi Grimberg, 'Christoph Hellwig',
	Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Minturn, Dave B

On Mon, Nov 07, 2016 at 10:29:38AM -0800, J Freyensee wrote:
> The way I interpret the spec, ctrlid (I'm assuming you mean cntlid) is
> allocated on an NVM subsystem basis.  For example, Figure 34 of the
> Discovery Log Page entry and Figure 20 of the Connect Command imply
> to me that CNTLID values are allocated at NVM Subsystem granularity
> when I see statements such as:

It is per-subsystem.  But nothing in the spec prohibits an implementation
that has multiple subsystems from simply not allocating cntlids that
would conflict between its subsystems.

And in fact there is a TP in the working group that would require
implementations not to reuse cntlids for it to work.  We'll probably
hear more about that once it's published.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-07 18:41                                           ` 'Christoph Hellwig'
@ 2016-11-07 18:50                                               ` J Freyensee
  -1 siblings, 0 replies; 40+ messages in thread
From: J Freyensee @ 2016-11-07 18:50 UTC (permalink / raw)
  To: 'Christoph Hellwig'
  Cc: Sagi Grimberg, Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Minturn, Dave B

On Mon, 2016-11-07 at 19:41 +0100, 'Christoph Hellwig' wrote:
> On Mon, Nov 07, 2016 at 10:29:38AM -0800, J Freyensee wrote:
> > 
> > The way I interpret the spec, ctrlid (I'm assuming you mean cntlid) is
> > allocated on an NVM subsystem basis.  For example, Figure 34 of the
> > Discovery Log Page entry and Figure 20 of the Connect Command imply
> > to me that CNTLID values are allocated at NVM Subsystem granularity
> > when I see statements such as:
> 
> It is per-subsystem.  But nothing in the spec prohibits an
> implementation that has multiple subsystems from simply not allocating
> cntlids that would conflict between its subsystems.

OK, so basically the nvmet change would be to make sure unique cntlids
are used across all NVM subsystems within the NVMe Target then?


> 
> And in fact there is a TP in the working group that would require
> implementations not to reuse cntlids for it to work.  We'll probably
> hear more about that once it's published.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-07 18:50                                               ` J Freyensee
@ 2016-11-07 18:51                                                   ` 'Christoph Hellwig'
  -1 siblings, 0 replies; 40+ messages in thread
From: 'Christoph Hellwig' @ 2016-11-07 18:51 UTC (permalink / raw)
  To: J Freyensee
  Cc: 'Christoph Hellwig',
	Sagi Grimberg, Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Minturn, Dave B

On Mon, Nov 07, 2016 at 10:50:16AM -0800, J Freyensee wrote:
> OK, so basically the nvmet change would be to make sure unique cntlids
> are used across all NVM subsystems within the NVMe Target then?

Yes.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: nvmet_rdma crash - DISCONNECT event with NULL queue
  2016-11-06  8:51                               ` Sagi Grimberg
@ 2016-11-08 20:45                                   ` Steve Wise
  -1 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2016-11-08 20:45 UTC (permalink / raw)
  To: 'Sagi Grimberg', 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

> > Running the same test over mlx4/roce, I hit a warning in list_debug, and
> > then a stuck CPU...
> 
> Steve,
> 
> Does this patch help?
> --

Hey Sagi,

With this patch applied, I haven't seen any more target side crashes.  I say
this with a caveat, though, that I'm chasing a few other host side issues
still...

Steve.


^ permalink raw reply	[flat|nested] 40+ messages in thread


end of thread, other threads:[~2016-11-08 20:45 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-01 15:57 nvmet_rdma crash - DISCONNECT event with NULL queue Steve Wise
2016-11-01 15:57 ` Steve Wise
2016-11-01 16:15 ` Sagi Grimberg
2016-11-01 16:15   ` Sagi Grimberg
     [not found]   ` <6f42d056-284d-00fc-2b98-189f54957980-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-11-01 16:20     ` Steve Wise
2016-11-01 16:20       ` Steve Wise
2016-11-01 16:34       ` Sagi Grimberg
2016-11-01 16:34         ` Sagi Grimberg
     [not found]         ` <4cc25277-429a-4ab9-470c-b3af1428ce93-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-11-01 16:37           ` Steve Wise
2016-11-01 16:37             ` Steve Wise
2016-11-01 16:44             ` Sagi Grimberg
2016-11-01 16:44               ` Sagi Grimberg
     [not found]               ` <dbe5f18d-7928-f065-920f-753b30fb99a2-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-11-01 16:49                 ` Steve Wise
2016-11-01 16:49                   ` Steve Wise
2016-11-01 17:41                   ` Sagi Grimberg
2016-11-01 17:41                     ` Sagi Grimberg
     [not found]                     ` <025201d23476$66812290$338367b0$@opengridcomputing.com>
2016-11-01 19:42                       ` Steve Wise
2016-11-01 19:42                         ` Steve Wise
     [not found]                     ` <024e01d23476$6668b890$333a29b0$@opengridcomputing.com>
2016-11-01 22:34                       ` Sagi Grimberg
2016-11-01 22:34                         ` Sagi Grimberg
     [not found]                         ` <3512b8bb-4d29-b90a-49e1-ebf1085c47d7-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-11-02 15:07                           ` Steve Wise
2016-11-02 15:07                             ` Steve Wise
2016-11-02 15:15                             ` 'Christoph Hellwig'
2016-11-02 15:15                               ` 'Christoph Hellwig'
     [not found]                               ` <20161102151540.GB14825-jcswGhMUV9g@public.gmane.org>
2016-11-06  7:35                                 ` Sagi Grimberg
2016-11-06  7:35                                   ` Sagi Grimberg
     [not found]                                   ` <bd82c206-668a-0794-2d51-3c48058350d3-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-11-07 18:29                                     ` J Freyensee
2016-11-07 18:29                                       ` J Freyensee
     [not found]                                       ` <1478543378.3350.17.camel-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-11-07 18:41                                         ` 'Christoph Hellwig'
2016-11-07 18:41                                           ` 'Christoph Hellwig'
     [not found]                                           ` <20161107184126.GA4400-jcswGhMUV9g@public.gmane.org>
2016-11-07 18:50                                             ` J Freyensee
2016-11-07 18:50                                               ` J Freyensee
     [not found]                                               ` <1478544616.3350.29.camel-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-11-07 18:51                                                 ` 'Christoph Hellwig'
2016-11-07 18:51                                                   ` 'Christoph Hellwig'
     [not found]                         ` <004701d2351a$d9e4ad70$8dae0850$@opengridcomputing.com>
2016-11-02 19:18                           ` Steve Wise
2016-11-02 19:18                             ` Steve Wise
2016-11-06  8:51                             ` Sagi Grimberg
2016-11-06  8:51                               ` Sagi Grimberg
     [not found]                               ` <b499abb0-afc9-2985-c4c4-3ceba4ca6f33-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-11-08 20:45                                 ` Steve Wise
2016-11-08 20:45                                   ` Steve Wise
