* nvme-fabrics: crash at nvme connect-all
@ 2016-06-09  9:18 Marta Rybczynska
  2016-06-09  9:29 ` Sagi Grimberg
  2016-06-09 13:24 ` Christoph Hellwig
  0 siblings, 2 replies; 37+ messages in thread
From: Marta Rybczynska @ 2016-06-09  9:18 UTC (permalink / raw)


Hello,
I'm testing the nvme-fabrics patchset and I get a kernel stall or errors when running
nvme connect-all. Below are the commands and the kernel log I get when it outputs
errors. I'm going to debug it further today.

The commands I run:

./nvme discover -t rdma -a 10.0.0.3
Discovery Log Number of Records 1, Generation counter 1
=====Discovery Log Entry 0======
trtype:  ipv4
adrfam:  rdma
nqntype: 2
treq:    0
portid:  2
trsvcid: 4420
subnqn:  testnqn
traddr:  10.0.0.3
rdma_prtype: 0
rdma_qptype: 0
rdma_cms:    0
rdma_pkey: 0x0000

./nvme connect -t rdma -n testnqn -a 10.0.0.3
Failed to write to /dev/nvme-fabrics: Connection reset by peer

./nvme connect-all -t rdma  -a 10.0.0.3
<here the kernel crashes>

In the kernel log I have:
[  591.484708] nvmet_rdma: enabling port 2 (10.0.0.3:4420)
[  656.778004] nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:a2e92078-7f9f-4b19-bb4f-4250599bdb14.
[  656.778255] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.0.3:4420
[  656.778573] nvmet_rdma: freeing queue 0
[  703.195100] nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:a2e92078-7f9f-4b19-bb4f-4250599bdb14.
[  703.195339] nvme nvme1: creating 8 I/O queues.
[  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
[  703.239498] failed to init MR pool ret= -12
[  703.239541] nvmet_rdma: failed to create_qp ret= -12
[  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
[  703.243569] nvme nvme1: Connect rejected, no private data.
[  703.243615] nvme nvme1: rdma_resolve_addr wait failed (-104).
[  703.243625] nvme nvme1: failed to initialize i/o queue: -104
[  703.243739] nvmet_rdma: freeing queue 6
[  703.243824] nvmet_rdma: freeing queue 5
[  703.243931] nvmet_rdma: freeing queue 4
[  703.244014] nvmet_rdma: freeing queue 3
[  703.244148] nvmet_rdma: freeing queue 2
[  703.244247] nvmet_rdma: freeing queue 1
[  703.244310] nvmet_rdma: freeing queue 0
[  708.201593] nvme h-6-\xffffff88\xffffffff\xffffffffp-6-\xffffff88\xffffffff\xffffffffx-6-\xffffff88\xffffffff\xffffffffH-6-\xffffff88\xffffffff\xffffffffP-6-\xffffff88\xffffffff\xffffffffX-6-\xffffff88\xffffffff\xffffffff`-6-\xffffff88\xffffffff\xffffffff8-6-\xffffff88\xffffffff\xffffffff\xfffffff8,6-\xffffff88\xffffffff\xffffffff\xffffff88-6-\xffffff88\xffffffff\xffffffff\xffffff90-6-\xffffff88\xffffffff\xffffffff\xffffff98-6-\xffffff88\xffffffff\xffffffff\xffffffa0-6-\xffffff88\xffffffff\xffffffff\xffffffa8-6-\xffffff88\xffffffff\xffffffff\xffffffb0-6-\xffffff88\xffffffff\xffffffff\xffffffb8-6-\xffffff88\xffffffff\xffffffff\xffffffc0-6-\xffffff88\xffffffff\xffffffff\xffffffc8-6-\xffffff88\xffffffff\xffffffff\xffffffd0-6-\xffffff88\xffffffff\xffffffff\xffffffd8-6-\xffffff88\xffffffff\xffffffff\xffffffe0-6-\xffffff88\xffffffff\xffffffff\xffffffe8-6-\xffffff88\xffffffff\xffffffff\xfffffff0-6-\xffffff88\xffffffff\xffffffff\xfffffff8-6-\xffffff88\xffffffff\xffffffff: keep-alive failed
[  795.061742] ------------[ cut here ]------------
[  795.061756] WARNING: CPU: 0 PID: 3920 at include/linux/kref.h:46 nvmf_dev_write+0x89d/0x95c [nvme_fabrics]
[  795.061759] Modules linked in: nvmet_rdma nvme_rdma nvme_fabrics nvmet cts rpcsec_gss_krb5 nfsv4 dns_resolver nfsv3 nfs fscache ocrdma edac_core x86_pkg_temp_thermal intel_powerclamp iw_cxgb4 rpcrdma coretemp ib_isert iscsi_target_mod kvm_intel ib_iser libiscsi kvm scsi_transport_iscsi irqbypass ib_srpt crct10dif_pclmul crc32_pclmul snd_hda_codec_hdmi target_core_mod snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec ib_srp crc32c_intel ghash_clmulni_intel aesni_intel lrw snd_hda_core gf128mul scsi_transport_srp glue_helper ib_ipoib snd_hwdep snd_seq snd_seq_device snd_pcm rdma_ucm ablk_helper ib_ucm cxgb4 ib_uverbs snd_timer cryptd nfsd snd ib_umad dm_mirror rdma_cm be2net ib_cm nuvoton_cir rc_core iTCO_wdt soundcore dm_region_hash iTCO_vendor_support iw_cm mxm_wmi mei_me auth_rpcgss i2c_i801 serio_raw
[  795.061817]  lpc_ich mfd_core wmi mei dm_log ib_core dm_mod nfs_acl lockd grace shpchp sunrpc uinput ext4 jbd2 mbcache sd_mod radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm tg3 drm ptp ahci libahci pps_core mpt3sas libata firewire_ohci firewire_core nvme crc_itu_t raid_class nvme_core scsi_transport_sas i2c_dev i2c_core
[  795.061851] CPU: 0 PID: 3920 Comm: nvme Not tainted 4.7.0-rc2+ #1
[  795.061854] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X79 Extreme11, BIOS P3.30 02/14/2014
[  795.061856]  0000000000000286 00000000f124b0d3 ffff88040d68bd18 ffffffff8133b92f
[  795.061861]  0000000000000000 0000000000000000 ffff88040d68bd58 ffffffff810828f1
[  795.061865]  0000002e8134b2ac 0000000000000047 0000000000000000 ffff88040b7c5240
[  795.061869] Call Trace:
[  795.061877]  [<ffffffff8133b92f>] dump_stack+0x63/0x84
[  795.061882]  [<ffffffff810828f1>] __warn+0xd1/0xf0
[  795.061885]  [<ffffffff81082a2d>] warn_slowpath_null+0x1d/0x20
[  795.061890]  [<ffffffffa072f18d>] nvmf_dev_write+0x89d/0x95c [nvme_fabrics]
[  795.061896]  [<ffffffff812101d7>] __vfs_write+0x37/0x140
[  795.061901]  [<ffffffff8122fbd3>] ? __fd_install+0x33/0xe0
[  795.061904]  [<ffffffff81210ee2>] vfs_write+0xb2/0x1b0
[  795.061908]  [<ffffffff81212335>] SyS_write+0x55/0xc0
[  795.061913]  [<ffffffff81003b12>] do_syscall_64+0x62/0x110
[  795.061919]  [<ffffffff816aefa1>] entry_SYSCALL64_slow_path+0x25/0x25
[  795.061923] ---[ end trace 0147b15a80ad801a ]---
[  795.062175] cma acquire res 0
[  795.062411] cma acquire res 0
[  795.064339] nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:a2e92078-7f9f-4b19-bb4f-4250599bdb14.
[  795.064520] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.0.3:4420
[  840.409731] INFO: task kworker/7:1:232 blocked for more than 120 seconds.
[  840.409800]       Tainted: G        W       4.7.0-rc2+ #1
[  840.409848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  840.409915] kworker/7:1     D ffff880423c8fb88     0   232      2 0x00000000
[  840.409930] Workqueue: nvme_rdma_wq nvme_rdma_reset_ctrl_work [nvme_rdma]
[  840.409933]  ffff880423c8fb88 ffff880423c8fbd0 ffff880423ca0000 00000001000711cc
[  840.409937]  ffff880423c90000 7fffffffffffffff ffff880423c8fce0 ffff880423ca0000
[  840.409941]  ffff880423ca0000 ffff880423c8fba0 ffffffff816ab1c5 ffff880423c8fce8
[  840.409945] Call Trace:
[  840.409954]  [<ffffffff816ab1c5>] schedule+0x35/0x80
[  840.409959]  [<ffffffff816ae171>] schedule_timeout+0x231/0x2d0
[  840.409964]  [<ffffffff816abcc1>] wait_for_completion+0xf1/0x130
[  840.409969]  [<ffffffff810ad2d0>] ? wake_up_q+0x80/0x80
[  840.409975]  [<ffffffff8109abe0>] flush_work+0x110/0x190
[  840.409978]  [<ffffffff81098cd0>] ? destroy_worker+0x90/0x90
[  840.409983]  [<ffffffff8109c821>] __cancel_work_timer+0xa1/0x1c0
[  840.409989]  [<ffffffff810b9f75>] ? put_prev_entity+0x35/0x700
[  840.409993]  [<ffffffff8109c973>] cancel_delayed_work_sync+0x13/0x20
[  840.410000]  [<ffffffffa002a50f>] nvme_stop_keep_alive+0x1f/0x30 [nvme_core]
[  840.410005]  [<ffffffffa07c0be0>] nvme_rdma_shutdown_ctrl+0x20/0xe0 [nvme_rdma]
[  840.410010]  [<ffffffffa07c11ee>] nvme_rdma_reset_ctrl_work+0x1e/0x120 [nvme_rdma]
[  840.410014]  [<ffffffff8109b842>] process_one_work+0x152/0x400
[  840.410018]  [<ffffffff8109c27c>] worker_thread+0x26c/0x4b0
[  840.410022]  [<ffffffff8109c010>] ? rescuer_thread+0x380/0x380
[  840.410027]  [<ffffffff810a1c68>] kthread+0xd8/0xf0
[  840.410032]  [<ffffffff816af0ff>] ret_from_fork+0x1f/0x40
[  840.410037]  [<ffffffff810a1b90>] ? kthread_park+0x60/0x60
[  840.410041] INFO: task kworker/7:2:301 blocked for more than 120 seconds.

Regards,

-- 

Marta Rybczynska 

Phone : +33 6 71 09 68 03 
mrybczyn at kalray.eu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09  9:18 nvme-fabrics: crash at nvme connect-all Marta Rybczynska
@ 2016-06-09  9:29 ` Sagi Grimberg
  2016-06-09 10:07   ` Marta Rybczynska
  2016-06-09 13:25   ` Christoph Hellwig
  2016-06-09 13:24 ` Christoph Hellwig
  1 sibling, 2 replies; 37+ messages in thread
From: Sagi Grimberg @ 2016-06-09  9:29 UTC (permalink / raw)



> Hello,
> I'm testing the nvme-fabrics patchset and I get a kernel stall or errors when running
> nvme connect-all. Below you have the commands and kernel log I get when it outputs
> errors. I'm going to debug it further today.
>
> The commands I run:
>
> ./nvme discover -t rdma -a 10.0.0.3
> Discovery Log Number of Records 1, Generation counter 1
> =====Discovery Log Entry 0======
> trtype:  ipv4
> adrfam:  rdma
> nqntype: 2
> treq:    0
> portid:  2
> trsvcid: 4420
> subnqn:  testnqn
> traddr:  10.0.0.3
> rdma_prtype: 0
> rdma_qptype: 0
> rdma_cms:    0
> rdma_pkey: 0x0000
>
> ./nvme connect -t rdma -n testnqn -a 10.0.0.3
> Failed to write to /dev/nvme-fabrics: Connection reset by peer
>
> ./nvme connect-all -t rdma  -a 10.0.0.3
> <here the kernel crashes>

Hi Marta,

I got the same bug report; it looks like we might
be facing a double-free condition.

Does this patch help?
-- 
commit fd36b6ef3d0881b1bccc1eac8737baaf8c863a21
Author: Sagi Grimberg <sagi at grimberg.me>
Date:   Thu Jun 9 12:17:09 2016 +0300

     fabrics: Don't directly free opts->host

     It might be the default host, so we need to call
     nvmet_put_host (which is safe against NULL lucky for
     us).

     Reported-by: Alexander Nezhinsky <alexander.nezhinsky at excelero.com>
     Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index ee4b7f137ad5..cd7eb03c4ff7 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -806,7 +806,7 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
  out_unlock:
         mutex_unlock(&nvmf_transports_mutex);
  out_free_opts:
-       kfree(opts->host);
+       nvmf_host_put(opts->host);
         kfree(opts);
         return ERR_PTR(ret);
  }
-- 

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09  9:29 ` Sagi Grimberg
@ 2016-06-09 10:07   ` Marta Rybczynska
  2016-06-09 11:09     ` Sagi Grimberg
  2016-06-09 13:25   ` Christoph Hellwig
  1 sibling, 1 reply; 37+ messages in thread
From: Marta Rybczynska @ 2016-06-09 10:07 UTC (permalink / raw)


----- On Jun 9, 2016, at 11:29, Sagi Grimberg sagi at lightbits.io wrote:
> 
> Hi Marta,
> 
> I got the same bug report, it looks like we're might
> be facing a double-free condition.
> 
> Does this patch help?
> --
> commit fd36b6ef3d0881b1bccc1eac8737baaf8c863a21


Hello Sagi,
Unfortunately it still crashes (this time already at the first connect).
I'm investigating this.

Regards,

-- 

Marta Rybczynska 

Phone : +33 6 71 09 68 03 
mrybczyn at kalray.eu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 10:07   ` Marta Rybczynska
@ 2016-06-09 11:09     ` Sagi Grimberg
  2016-06-09 12:12       ` Marta Rybczynska
  0 siblings, 1 reply; 37+ messages in thread
From: Sagi Grimberg @ 2016-06-09 11:09 UTC (permalink / raw)



> Hello Sagi,
> Unfortunately it crashes still (this time at the first connect already).
> I'm investigating this.

OK, looking further into your bug report, it's actually different from
the one I got. I do think that this patch makes sense regardless.

From your log I can only guess that you are using an iWARP device,
because I see that nvmet tries to init an MR pool and seems to
run out of memory...

Which device are you using? Are you running on a low-memory machine?
Perhaps the rdma rw code needs to check the max_mr capability?

Also, are you running the host and target on a single server? It might
be possible that this requires more resources than the host (or device)
has.
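
For illustration, a rough sketch of the kind of capability check meant
here - untested, with the helper name made up and the attribute field
assumed from struct ib_device_attr:

/*
 * Hypothetical sketch only, not a tested patch: before allocating a
 * per-queue MR pool, compare the number of MRs we are about to request
 * against what the device advertises, so the failure is reported
 * clearly instead of surfacing as a bare -ENOMEM from the provider.
 */
static int rdma_rw_check_mr_capacity(struct ib_device *dev, int nr_mrs)
{
        if (nr_mrs > dev->attrs.max_mr) {
                pr_err("device supports %d MRs, %d requested\n",
                       dev->attrs.max_mr, nr_mrs);
                return -ENOMEM;
        }
        return 0;
}

Note this only checks the device's advertised maximum and does not
account for MRs already allocated elsewhere.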

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 11:09     ` Sagi Grimberg
@ 2016-06-09 12:12       ` Marta Rybczynska
  2016-06-09 12:30         ` Sagi Grimberg
  0 siblings, 1 reply; 37+ messages in thread
From: Marta Rybczynska @ 2016-06-09 12:12 UTC (permalink / raw)



----- On Jun 9, 2016, at 13:09, Sagi Grimberg sagi at lightbits.io wrote:

>> Hello Sagi,
>> Unfortunately it crashes still (this time at the first connect already).
>> I'm investigating this.
> 
> OK, looking further into your bug report its actually different from
> the one I got. I do think that this patch makes sense regardless.
> 
> From your log I can only guess that you are using iWARP devices?
> because I see that nvmet tries to init an MR pool and seem to
> run out of memory...
> 
> Which device are you using? Are you running on a low memory machine?
> Perhaps the rdma rw code needs to check max_mr capability?
> 

It's a rather big machine with 8 cores/16GB of memory; I've never had memory
limitation problems on it. The card is a Chelsio T5. One of the
goals of this configuration is to check whether it works with this card.

> Also, are you running the host and target on a single server? it might
> be possible that it requires more resources than the host (or device)
> has?

Yes, that might be an important point.


-- 

Marta Rybczynska 

Phone : +33 6 71 09 68 03 
mrybczyn at kalray.eu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 12:12       ` Marta Rybczynska
@ 2016-06-09 12:30         ` Sagi Grimberg
  2016-06-09 13:27           ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Sagi Grimberg @ 2016-06-09 12:30 UTC (permalink / raw)



>> Which device are you using? Are you running on a low memory machine?
>> Perhaps the rdma rw code needs to check max_mr capability?
>>
>
> It's a rather big machine with 8 cores/16GB of memory, never had memory
> limitation problems on this one. The card is a Chelsio T5. One of the
> targets of this configuration is to check if it works with this card.

So it must come from the Chelsio device then...

Can you provide the max_mr output of the device? (You can use
ibv_devinfo -v from libibverbs/ibverbs-utils.)

What happens if you use smaller queues, say 64/32? (This can be passed as a
queue_size parameter when working directly through sysfs; nvme-cli does not
support this yet.)

Steve, did you see this before? I'm wondering if we need some sort
of logic to handle resource limitation in iWARP (a global MR pool...)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09  9:18 nvme-fabrics: crash at nvme connect-all Marta Rybczynska
  2016-06-09  9:29 ` Sagi Grimberg
@ 2016-06-09 13:24 ` Christoph Hellwig
  2016-06-09 15:37   ` Marta Rybczynska
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2016-06-09 13:24 UTC (permalink / raw)


On Thu, Jun 09, 2016 at 11:18:03AM +0200, Marta Rybczynska wrote:
> Hello,
> I'm testing the nvme-fabrics patchset and I get a kernel stall or errors when running 
> nvme connect-all. Below you have the commands and kernel log I get when it outputs
> errors. I'm going to debug it further today.
> 
> The commands I run:
> 
> ./nvme discover -t rdma -a 10.0.0.3
> Discovery Log Number of Records 1, Generation counter 1
> =====Discovery Log Entry 0======
> trtype:  ipv4
> adrfam:  rdma
> nqntype: 2
> treq:    0
> portid:  2
> trsvcid: 4420
> subnqn:  testnqn
> traddr:  10.0.0.3
> rdma_prtype: 0
> rdma_qptype: 0
> rdma_cms:    0
> rdma_pkey: 0x0000
> 
> ./nvme connect -t rdma -n testnqn -a 10.0.0.3
> Failed to write to /dev/nvme-fabrics: Connection reset by peer
> 
> ./nvme connect-all -t rdma  -a 10.0.0.3
> <here the kernel crashes>
> 
> In the kernel log I have:
> [  591.484708] nvmet_rdma: enabling port 2 (10.0.0.3:4420)
> [  656.778004] nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:a2e92078-7f9f-4b19-bb4f-4250599bdb14.
> [  656.778255] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.0.3:4420
> [  656.778573] nvmet_rdma: freeing queue 0
> [  703.195100] nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:a2e92078-7f9f-4b19-bb4f-4250599bdb14.
> [  703.195339] nvme nvme1: creating 8 I/O queues.
> [  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
> [  703.239498] failed to init MR pool ret= -12
> [  703.239541] nvmet_rdma: failed to create_qp ret= -12
> [  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).

To get things working you should try a smaller queue size.  We actually
have an option for this in the kernel, but nvme-cli doesn't expose
it yet, so feel free to hardcode it.

Of course we've still got a real bug in the error handling..
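
For reference, a minimal sketch of what such a hardcode might look like
on the host side (the queue_size field name is an assumption based on
the fabrics option-parsing code, untested):

        /*
         * Hypothetical, untested: force a small I/O queue depth after
         * the connect options have been parsed, as a stopgap until
         * nvme-cli exposes the queue_size option on the command line.
         */
        opts->queue_size = 32;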

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09  9:29 ` Sagi Grimberg
  2016-06-09 10:07   ` Marta Rybczynska
@ 2016-06-09 13:25   ` Christoph Hellwig
  1 sibling, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2016-06-09 13:25 UTC (permalink / raw)


> I got the same bug report, it looks like we're might
> be facing a double-free condition.
> 
> Does this patch help?

Ah, thanks - this looks like a good fix.  I'll add it to the queue for
the next iteration.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 12:30         ` Sagi Grimberg
@ 2016-06-09 13:27           ` Steve Wise
  2016-06-09 13:36             ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-09 13:27 UTC (permalink / raw)


> 
> >> Which device are you using? Are you running on a low memory machine?
> >> Perhaps the rdma rw code needs to check max_mr capability?
> >>
> >
> > It's a rather big machine with 8 cores/16GB of memory, never had memory
> > limitation problems on this one. The card is a Chelsio T5. One of the
> > targets of this configuration is to check if it works with this card.
> 
> So it must come from the Chelsio device then...
> 
> Can you provide the max_mr output of the device (you can use
> ibv_devinfo -v from libibverbs/ibverbs-utils).
> 
> What happens if you use smaller queues, say 64/32 (can be added as a
> queue_size parameter working directly from sysfs, nvme-cli does not
> support this yet).
> 
> Steve, did you see this before? I'm wandering if we need some sort
> of logic handling with resource limitation in iWARP (global mrs pool...)

Haven't seen this.  Does 'cat /sys/kernel/debug/iw_cxgb4/blah/stats' show anything interesting?  Where/why is it crashing? 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 13:27           ` Steve Wise
@ 2016-06-09 13:36             ` Steve Wise
  2016-06-09 13:48               ` Sagi Grimberg
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-09 13:36 UTC (permalink / raw)


> > Steve, did you see this before? I'm wandering if we need some sort
> > of logic handling with resource limitation in iWARP (global mrs pool...)
> 
> Haven't seen this.  Does 'cat /sys/kernel/debug/iw_cxgb4/blah/stats' show
> anything interesting?  Where/why is it crashing?
> 

So this is the failure:

[  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
[  703.239498] failed to init MR pool ret= -12
[  703.239541] nvmet_rdma: failed to create_qp ret= -12
[  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).

Not sure why it would fail.  I would think my setup would be allocating more
given I have 16 cores on the host and target.  The debugfs "stats" file I
mentioned above should show us something if we're running out of adapter
resources for MR or PBL records.  

Can you please turn on c4iw_debug and send me the debug output?  echo 1 >
/sys/module/iw_cxgb4/parameters/c4iw_debug.

Thanks,

Steve.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 13:36             ` Steve Wise
@ 2016-06-09 13:48               ` Sagi Grimberg
  2016-06-09 14:09                 ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Sagi Grimberg @ 2016-06-09 13:48 UTC (permalink / raw)



>>> Steve, did you see this before? I'm wandering if we need some sort
>>> of logic handling with resource limitation in iWARP (global mrs pool...)
>>
>> Haven't seen this.  Does 'cat /sys/kernel/debug/iw_cxgb4/blah/stats' show
>> anything interesting?  Where/why is it crashing?
>>
>
> So this is the failure:
>
> [  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
> [  703.239498] failed to init MR pool ret= -12
> [  703.239541] nvmet_rdma: failed to create_qp ret= -12
> [  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed
> (-12).
>
> Not sure why it would fail.  I would think my setup would be allocating more
> given I have 16 cores on the host and target.  The debugfs "stats" file I
> mentioned above should show us something if we're running out of adapter
> resources for MR or PBL records.

Note that Marta ran both the host and the target on the same machine.
So, 8 (cores) x 128 (queue entries) x 2 (host and target) gives 2048
MRs...

What is the T5 limitation?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 13:48               ` Sagi Grimberg
@ 2016-06-09 14:09                 ` Steve Wise
  2016-06-09 14:22                   ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-09 14:09 UTC (permalink / raw)


> 
> >>> Steve, did you see this before? I'm wandering if we need some sort
> >>> of logic handling with resource limitation in iWARP (global mrs pool...)
> >>
> >> Haven't seen this.  Does 'cat /sys/kernel/debug/iw_cxgb4/blah/stats' show
> >> anything interesting?  Where/why is it crashing?
> >>
> >
> > So this is the failure:
> >
> > [  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
> > [  703.239498] failed to init MR pool ret= -12
> > [  703.239541] nvmet_rdma: failed to create_qp ret= -12
> > [  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue
> failed
> > (-12).
> >
> > Not sure why it would fail.  I would think my setup would be allocating more
> > given I have 16 cores on the host and target.  The debugfs "stats" file I
> > mentioned above should show us something if we're running out of adapter
> > resources for MR or PBL records.
> 
> Note that Marta ran both the host and the target on the same machine.
> So, 8 (cores) x 128 (queue entries) x 2 (host and target) gives 2048
> MRs...
> 
> What is the T5 limitation?

It varies based on a config file that gets loaded when cxgb4 loads.  Note the
error has nothing to do with the low fastreg sg depth limit of T5.  If we were
hitting that then we would be seeing EINVAL and not ENOMEM.  Looking at
c4iw_alloc_mr(), the ENOMEM paths are either failures from kzalloc() or
dma_alloc_coherent(), or failures to allocate adapter resources for MR and PBL
records.  Each MR takes a 32B record in adapter mem, and the PBL takes whatever
based on the max sg depth (roughly sg_depth * 8 + some rounding up).  The
debugfs "stats" file will show us what is being exhausted and how much adapter
mem is available for these resources.

Also, the amount of available adapter mem depends on the type of T5 adapter.
The T5 adapter info should be in the dmesg log when cxgb4 is loaded.

Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 14:09                 ` Steve Wise
@ 2016-06-09 14:22                   ` Steve Wise
  2016-06-09 14:29                     ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-09 14:22 UTC (permalink / raw)


> >
> > >>> Steve, did you see this before? I'm wandering if we need some sort
> > >>> of logic handling with resource limitation in iWARP (global mrs pool...)
> > >>
> > >> Haven't seen this.  Does 'cat /sys/kernel/debug/iw_cxgb4/blah/stats' show
> > >> anything interesting?  Where/why is it crashing?
> > >>
> > >
> > > So this is the failure:
> > >
> > > [  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
> > > [  703.239498] failed to init MR pool ret= -12
> > > [  703.239541] nvmet_rdma: failed to create_qp ret= -12
> > > [  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue
> > failed
> > > (-12).
> > >
> > > Not sure why it would fail.  I would think my setup would be allocating
more
> > > given I have 16 cores on the host and target.  The debugfs "stats" file I
> > > mentioned above should show us something if we're running out of adapter
> > > resources for MR or PBL records.
> >
> > Note that Marta ran both the host and the target on the same machine.
> > So, 8 (cores) x 128 (queue entries) x 2 (host and target) gives 2048
> > MRs...
> >
> > What is the T5 limitation?
> 
> It varies based on a config file that gets loaded when cxgb4 loads.  Note the
> error has nothing to do with the low fastreg sg depth limit of T5.  If we were
> hitting that then we would be seeing EINVAL and not ENOMEM.  Looking at
> c4iw_alloc_mr(), the ENOMEM paths are either failures from kzalloc() or
> dma_alloc_coherent(), or failures to allocate adapter resources for MR and PBL
> records.  Each MR takes a 32B record in adapter mem, and the PBL takes
whatever
> based on the max sg depth (roughly sg_depth * 8 + some rounding up).  The
> debugfs "stats" file will show us what is being exhausted and how much adapter
> mem is available for these resources.
> 
> Also, the amount of available adapter mem depends on the type of T5 adapter.
> The T5 adapter info should be in the dmesg log when cxgb4 is loaded.
> 
> Steve

Here is an example of the iw_cxgb4 debugfs "stats" output.  This is for a
T580-CR with the "default" configuration, which means there is no config file
named t5-config.txt in /lib/firmware/cxgb4/.

[root at stevo1 linux-2.6]# cat /sys/kernel/debug/iw_cxgb4/0000\:82\:00.4/stats
   Object:      Total    Current        Max       Fail
     PDID:      65536          0          0          0
      QID:      24576          0          0          0
   TPTMEM:   36604800          0          0          0
   PBLMEM:   91512064          0          0          0
   RQTMEM:  128116864          0          0          0
  OCQPMEM:          0          0          0          0
  DB FULL:          0
 DB EMPTY:          0
  DB DROP:          0
 DB State: NORMAL Transitions 0 FC Interruptions 0
TCAM_FULL:          0
ACT_OFLD_CONN_FAILS:          0
PAS_OFLD_CONN_FAILS:          0
NEG_ADV_RCVD:          0
AVAILABLE IRD:     589824

Note it shows the total, currently allocated, max ever allocated, and failures
for each rdma resource, most of which are tied to HW resources.  So if we see
failures, then we know the adapter resources were exhausted.

TPTMEM is the available adapter memory for MR records.  Each record is 32B, so
a total of 1143900 MRs (TPTMEM / 32) can be created.  The PBLMEM resource is for
holding the dma addresses for all pages in an MR, so each MR uses some amount
depending on the sg depth passed in when allocating an FRMR.  So if we allocate
128-deep page lists, we should be able to allocate 89367 PBLs (PBLMEM / 8 /
128).
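
Purely as an illustration of the arithmetic above (32B of TPT per MR,
sg_depth * 8 bytes of PBL per MR), a tiny helper that runs the same
estimate against any card's "stats" numbers:

static unsigned long max_frmrs(unsigned long tptmem, unsigned long pblmem,
                               unsigned long sg_depth)
{
        /* Whichever adapter resource runs out first sets the limit. */
        unsigned long by_tpt = tptmem / 32;             /* TPT records */
        unsigned long by_pbl = pblmem / (sg_depth * 8); /* page lists  */

        return by_tpt < by_pbl ? by_tpt : by_pbl;
}

/* e.g. max_frmrs(36604800, 91512064, 128) == 89367 for the card above */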

Seems like we shouldn't be exhausting the adapter resources with 2048 MRs... 

Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 14:22                   ` Steve Wise
@ 2016-06-09 14:29                     ` Steve Wise
  2016-06-09 15:04                       ` Marta Rybczynska
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-09 14:29 UTC (permalink / raw)


> > >
> > > >>> Steve, did you see this before? I'm wandering if we need some sort
> > > >>> of logic handling with resource limitation in iWARP (global mrs
pool...)
> > > >>
> > > >> Haven't seen this.  Does 'cat /sys/kernel/debug/iw_cxgb4/blah/stats'
show
> > > >> anything interesting?  Where/why is it crashing?
> > > >>
> > > >
> > > > So this is the failure:
> > > >
> > > > [  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
> > > > [  703.239498] failed to init MR pool ret= -12
> > > > [  703.239541] nvmet_rdma: failed to create_qp ret= -12
> > > > [  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue
> > > failed
> > > > (-12).
> > > >
> > > > Not sure why it would fail.  I would think my setup would be allocating
> more
> > > > given I have 16 cores on the host and target.  The debugfs "stats" file
I
> > > > mentioned above should show us something if we're running out of adapter
> > > > resources for MR or PBL records.
> > >
> > > Note that Marta ran both the host and the target on the same machine.
> > > So, 8 (cores) x 128 (queue entries) x 2 (host and target) gives 2048
> > > MRs...
> > >
> > > What is the T5 limitation?
> >
> > It varies based on a config file that gets loaded when cxgb4 loads.  Note
the
> > error has nothing to do with the low fastreg sg depth limit of T5.  If we
were
> > hitting that then we would be seeing EINVAL and not ENOMEM.  Looking at
> > c4iw_alloc_mr(), the ENOMEM paths are either failures from kzalloc() or
> > dma_alloc_coherent(), or failures to allocate adapter resources for MR and
PBL
> > records.  Each MR takes a 32B record in adapter mem, and the PBL takes
> whatever
> > based on the max sg depth (roughly sg_depth * 8 + some rounding up).  The
> > debugfs "stats" file will show us what is being exhausted and how much
adapter
> > mem is available for these resources.
> >
> > Also, the amount of available adapter mem depends on the type of T5 adapter.
> > The T5 adapter info should be in the dmesg log when cxgb4 is loaded.
> >
> > Steve
> 
> Here is an example of the iw_cxgb4 debugfs "stats" output.  This is for a
> T580-CR with the "default" configuration, which means there is no config file
> named t5-config.txt in /lib/firmware/cxgb4/.
> 
> [root at stevo1 linux-2.6]# cat /sys/kernel/debug/iw_cxgb4/0000\:82\:00.4/stats
>    Object:      Total    Current        Max       Fail
>      PDID:      65536          0          0          0
>       QID:      24576          0          0          0
>    TPTMEM:   36604800          0          0          0
>    PBLMEM:   91512064          0          0          0
>    RQTMEM:  128116864          0          0          0
>   OCQPMEM:          0          0          0          0
>   DB FULL:          0
>  DB EMPTY:          0
>   DB DROP:          0
>  DB State: NORMAL Transitions 0 FC Interruptions 0
> TCAM_FULL:          0
> ACT_OFLD_CONN_FAILS:          0
> PAS_OFLD_CONN_FAILS:          0
> NEG_ADV_RCVD:          0
> AVAILABLE IRD:     589824
> 
> Note it shows the total, currently allocated, max ever allocated, and failures
> for each rdma resource, most of which are tied to HW resources.  So if we see
> failures, then we know the adapter resources were exhausted.
> 
> TPTMEM is the available adapter memory for MR records.  Each record is 32B.
So
> a total of 1143900 MRs (TPTMEM / 32) can be created.  The PBLMEM resource is
> for
> holding the dma addresses for all pages in a MR, so each MR uses some number
> depending on the sg depth passed in when allocating a FRMR.  So if we allocate
> 128 deep page lists, we should be able to allocate 89367 PBLs (PBLMEM / 8 /
> 128).
> 
> Seems like we shouldn't be exhausting the adapter resources with 2048 MRs...
> 
> Steve

I don't see this on my 16 core/64GB memory node; I successfully did a
discover/connect-all with the target/host on the same node with 7 target devices
w/o any errors.   Note I'm using the nvmf-all.2 branch Christoph set up
yesterday.

Marta, I need to learn more about your T5 setup and the "stats" file output.
Thanks!

Steve.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 14:29                     ` Steve Wise
@ 2016-06-09 15:04                       ` Marta Rybczynska
  2016-06-09 15:40                         ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Marta Rybczynska @ 2016-06-09 15:04 UTC (permalink / raw)




----- On Jun 9, 2016, at 16:29, Steve Wise swise at opengridcomputing.com wrote:

>> > >
>> > > >>> Steve, did you see this before? I'm wandering if we need some sort
>> > > >>> of logic handling with resource limitation in iWARP (global mrs
> pool...)
>> > > >>
>> > > >> Haven't seen this.  Does 'cat /sys/kernel/debug/iw_cxgb4/blah/stats'
> show
>> > > >> anything interesting?  Where/why is it crashing?
>> > > >>
>> > > >
>> > > > So this is the failure:
>> > > >
>> > > > [  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
>> > > > [  703.239498] failed to init MR pool ret= -12
>> > > > [  703.239541] nvmet_rdma: failed to create_qp ret= -12
>> > > > [  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue
>> > > failed
>> > > > (-12).
>> > > >
>> > > > Not sure why it would fail.  I would think my setup would be allocating
>> more
>> > > > given I have 16 cores on the host and target.  The debugfs "stats" file
> I
>> > > > mentioned above should show us something if we're running out of adapter
>> > > > resources for MR or PBL records.
>> > >
>> > > Note that Marta ran both the host and the target on the same machine.
>> > > So, 8 (cores) x 128 (queue entries) x 2 (host and target) gives 2048
>> > > MRs...
>> > >
>> > > What is the T5 limitation?
>> >
>> > It varies based on a config file that gets loaded when cxgb4 loads.  Note
> the
>> > error has nothing to do with the low fastreg sg depth limit of T5.  If we
> were
>> > hitting that then we would be seeing EINVAL and not ENOMEM.  Looking at
>> > c4iw_alloc_mr(), the ENOMEM paths are either failures from kzalloc() or
>> > dma_alloc_coherent(), or failures to allocate adapter resources for MR and
> PBL
>> > records.  Each MR takes a 32B record in adapter mem, and the PBL takes
>> whatever
>> > based on the max sg depth (roughly sg_depth * 8 + some rounding up).  The
>> > debugfs "stats" file will show us what is being exhausted and how much
> adapter
>> > mem is available for these resources.
>> >
>> > Also, the amount of available adapter mem depends on the type of T5 adapter.
>> > The T5 adapter info should be in the dmesg log when cxgb4 is loaded.
>> >
>> > Steve
>> 
>> Here is an example of the iw_cxgb4 debugfs "stats" output.  This is for a
>> T580-CR with the "default" configuration, which means there is no config file
>> named t5-config.txt in /lib/firmware/cxgb4/.
>> 
>> [root at stevo1 linux-2.6]# cat /sys/kernel/debug/iw_cxgb4/0000\:82\:00.4/stats
>>    Object:      Total    Current        Max       Fail
>>      PDID:      65536          0          0          0
>>       QID:      24576          0          0          0
>>    TPTMEM:   36604800          0          0          0
>>    PBLMEM:   91512064          0          0          0
>>    RQTMEM:  128116864          0          0          0
>>   OCQPMEM:          0          0          0          0
>>   DB FULL:          0
>>  DB EMPTY:          0
>>   DB DROP:          0
>>  DB State: NORMAL Transitions 0 FC Interruptions 0
>> TCAM_FULL:          0
>> ACT_OFLD_CONN_FAILS:          0
>> PAS_OFLD_CONN_FAILS:          0
>> NEG_ADV_RCVD:          0
>> AVAILABLE IRD:     589824
>> 
>> Note it shows the total, currently allocated, max ever allocated, and failures
>> for each rdma resource, most of which are tied to HW resources.  So if we see
>> failures, then we know the adapter resources were exhausted.
>> 
>> TPTMEM is the available adapter memory for MR records.  Each record is 32B.
> So
>> a total of 1143900 MRs (TPTMEM / 32) can be created.  The PBLMEM resource is
>> for
>> holding the dma addresses for all pages in a MR, so each MR uses some number
>> depending on the sg depth passed in when allocating a FRMR.  So if we allocate
>> 128 deep page lists, we should be able to allocate 89367 PBLs (PBLMEM / 8 /
>> 128).
>> 
>> Seems like we shouldn't be exhausting the adapter resources with 2048 MRs...
>> 
>> Steve
> 
> I don't see this on my 16 core/64GB memory note, I successfully did a
> discover/connect-all with the target/host on the same node with 7 target devices
> w/o any errors.   Note I'm using the nvmf-all.2 branch Christoph setup up
> yesterday.
> 
> Marta, I need to learn more about your T5 setup and the "stats" file output.
> Thanks!
> 
> Steve.

Steve, it seems to me that there's PBLMEM exhaustion because my card has fewer
resources than yours (224 MRs if I repeat your calculations):
# cat /sys/kernel/debug/iw_cxgb4/0000\:09\:00.4/stats
   Object:      Total    Current        Max       Fail
     PDID:      65536          1          2          0
      QID:       1024          0          0          0
   TPTMEM:      91136          0          0          0
   PBLMEM:     227840          0          0          0
   RQTMEM:     318976          0          0          0
  OCQPMEM:          0          0          0          0
  DB FULL:          0
 DB EMPTY:          0
  DB DROP:          0
 DB State: NORMAL Transitions 0 FC Interruptions 0
TCAM_FULL:          0
ACT_OFLD_CONN_FAILS:          0
PAS_OFLD_CONN_FAILS:          0
NEG_ADV_RCVD:          0
AVAILABLE IRD:       1024

For a more exact reference, it's:
[   18.651764] cxgb4 0000:09:00.4 eth1: eth1: Chelsio T580-LP-SO (0000:09:00.4) 40GBASE-R QSFP
[   18.651979] cxgb4 0000:09:00.4 eth2: eth2: Chelsio T580-LP-SO (0000:09:00.4) 40GBASE-R QSFP
[   18.652025] cxgb4 0000:09:00.4: Chelsio T580-LP-SO rev 0

No config file in the firmware directory.

-- 

Marta Rybczynska 

Phone : +33 6 71 09 68 03 
mrybczyn at kalray.eu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 13:24 ` Christoph Hellwig
@ 2016-06-09 15:37   ` Marta Rybczynska
  2016-06-09 20:25     ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Marta Rybczynska @ 2016-06-09 15:37 UTC (permalink / raw)



----- On Jun 9, 2016, at 15:24, Christoph Hellwig hch at infradead.org wrote:

> On Thu, Jun 09, 2016 at 11:18:03AM +0200, Marta Rybczynska wrote:
>> Hello,
>> I'm testing the nvme-fabrics patchset and I get a kernel stall or errors when
>> running
>> nvme connect-all. Below you have the commands and kernel log I get when it
>> outputs
>> errors. I'm going to debug it further today.
>> 
>> The commands I run:
>> 
>> ./nvme discover -t rdma -a 10.0.0.3
>> Discovery Log Number of Records 1, Generation counter 1
>> =====Discovery Log Entry 0======
>> trtype:  ipv4
>> adrfam:  rdma
>> nqntype: 2
>> treq:    0
>> portid:  2
>> trsvcid: 4420
>> subnqn:  testnqn
>> traddr:  10.0.0.3
>> rdma_prtype: 0
>> rdma_qptype: 0
>> rdma_cms:    0
>> rdma_pkey: 0x0000
>> 
>> ./nvme connect -t rdma -n testnqn -a 10.0.0.3
>> Failed to write to /dev/nvme-fabrics: Connection reset by peer
>> 
>> ./nvme connect-all -t rdma  -a 10.0.0.3
>> <here the kernel crashes>
>> 
>> In the kernel log I have:
>> [  591.484708] nvmet_rdma: enabling port 2 (10.0.0.3:4420)
>> [  656.778004] nvmet: creating controller 1 for NQN
>> nqn.2014-08.org.nvmexpress:NVMf:uuid:a2e92078-7f9f-4b19-bb4f-4250599bdb14.
>> [  656.778255] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery",
>> addr 10.0.0.3:4420
>> [  656.778573] nvmet_rdma: freeing queue 0
>> [  703.195100] nvmet: creating controller 1 for NQN
>> nqn.2014-08.org.nvmexpress:NVMf:uuid:a2e92078-7f9f-4b19-bb4f-4250599bdb14.
>> [  703.195339] nvme nvme1: creating 8 I/O queues.
>> [  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
>> [  703.239498] failed to init MR pool ret= -12
>> [  703.239541] nvmet_rdma: failed to create_qp ret= -12
>> [  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed
>> (-12).
> 
> To get things working you should try a smaller queue size.  We actually
> have an option for this in the kernel, but nvme-cli doesn't expose
> it yet, so feel free to hardcode it.
> 
> Of course we've still got a real bug in the error handling..

I've set
+       queue->recv_queue_size = 32; //le16_to_cpu(req->hsqsize);
+       queue->send_queue_size = 32; //le16_to_cpu(req->hrqsize);
And it doesn't crash anymore. I get errors without crashes if I try to
connect again (which seems correct to me).
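
A possibly less intrusive variant of the same workaround (untested,
using the same fields as the snippet above) would be to clamp the
advertised sizes instead of ignoring them:

+       /* clamp at 32 but still honour smaller requested values */
+       queue->recv_queue_size = min_t(u16, le16_to_cpu(req->hsqsize), 32);
+       queue->send_queue_size = min_t(u16, le16_to_cpu(req->hrqsize), 32);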

-- 

Marta Rybczynska 

Phone : +33 6 71 09 68 03 
mrybczyn at kalray.eu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 15:04                       ` Marta Rybczynska
@ 2016-06-09 15:40                         ` Steve Wise
  2016-06-09 15:48                           ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-09 15:40 UTC (permalink / raw)


> > I don't see this on my 16 core/64GB memory note, I successfully did a
> > discover/connect-all with the target/host on the same node with 7 target devices
> > w/o any errors.   Note I'm using the nvmf-all.2 branch Christoph setup up
> > yesterday.
> >
> > Marta, I need to learn more about your T5 setup and the "stats" file output.
> > Thanks!
> >
> > Steve.
> 
> Steve, It seems to me that there's a PBLMEM exhaustion because my card has less
> resources than yours (224 MRs if I repeat your calculations):
> # cat /sys/kernel/debug/iw_cxgb4/0000\:09\:00.4/stats
>    Object:      Total    Current        Max       Fail
>      PDID:      65536          1          2          0
>       QID:       1024          0          0          0
>    TPTMEM:      91136          0          0          0
>    PBLMEM:     227840          0          0          0
>    RQTMEM:     318976          0          0          0
>   OCQPMEM:          0          0          0          0
>   DB FULL:          0
>  DB EMPTY:          0
>   DB DROP:          0
>  DB State: NORMAL Transitions 0 FC Interruptions 0
> TCAM_FULL:          0
> ACT_OFLD_CONN_FAILS:          0
> PAS_OFLD_CONN_FAILS:          0
> NEG_ADV_RCVD:          0
> AVAILABLE IRD:       1024
> 
> Fore the more exact reference, it's:
> [   18.651764] cxgb4 0000:09:00.4 eth1: eth1: Chelsio T580-LP-SO (0000:09:00.4)
> 40GBASE-R QSFP
> [   18.651979] cxgb4 0000:09:00.4 eth2: eth2: Chelsio T580-LP-SO (0000:09:00.4)
> 40GBASE-R QSFP
> [   18.652025] cxgb4 0000:09:00.4: Chelsio T580-LP-SO rev 0
> 
> No config file in the firmware directory.
> 


Thanks Marta.  That card has less memory than the T580-CR.  I'm checking with Chelsio on the details.  The "-SO" might mean a mem-free card.   

Also, can you email me the output of 'cat /sys/kernel/debug/cxgb4/blah/meminfo'?

So to make it work given the adapter resources, you need to make the queues shallower and have fewer of them.  If I can get you a config file that increases the available rdma memory, I'll send it to you.  But perhaps this card is just a low/no-memory card tailored more for NIC-only use vs RDMA. (I'll confirm this soon.)

Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 15:40                         ` Steve Wise
@ 2016-06-09 15:48                           ` Steve Wise
  2016-06-10  9:03                             ` Marta Rybczynska
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-09 15:48 UTC (permalink / raw)



> 
> 
> Thanks Marta.  That card has less memory than the T580-CR.  I'm checking with
> Chelsio on the details.  The "-SO" might mean a mem-free card.
> 
> Also, can you email me the output of 'cat
/sys/kernel/debug/cxgb4/blah/meminfo'?
> 
> So to make it work given the adapter resources, you need to make the queues
> shallower and have less of them.  If I can get you a config file that
increases the
> available rdma memory, I'll send it to you.  But perhaps this card is just a
low/no
> memory card more tailored for NIC only vs RDMA. (I'll confirm this soon).

Yes, the -SO is a mem-free card, so very few rdma resources are available.
You need a non "-SO" card for more rdma resources.

Steve.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 15:37   ` Marta Rybczynska
@ 2016-06-09 20:25     ` Steve Wise
  2016-06-09 20:35       ` Ming Lin
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-09 20:25 UTC (permalink / raw)


> >
> > To get things working you should try a smaller queue size.  We actually
> > have an option for this in the kernel, but nvme-cli doesn't expose
> > it yet, so feel free to hardcode it.
> >
> > Of course we've still got a real bug in the error handling..
> 
> I've set
> +       queue->recv_queue_size = 32; //le16_to_cpu(req->hsqsize);
> +       queue->send_queue_size = 32; //le16_to_cpu(req->hrqsize);
> And it doesn't crash anymore. I get errors without crashes if I try to
> connect again (what seems correct to me).

I can force a crash with this patch:

diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 55d0651..bbc1422 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -619,6 +619,10 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
        u32 stag = 0;
        int ret = 0;
        int length = roundup(max_num_sg * sizeof(u64), 32);
+       static int foo;
+
+       if (foo++ > 200)
+               return ERR_PTR(-ENOMEM);

        php = to_c4iw_pd(pd);
        rhp = php->rhp;


Crash:

rdma_rw_init_mrs: failed to allocated 128 MRs
failed to init MR pool ret= -12
nvmet_rdma: failed to create_qp ret= -12
nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
nvme nvme1: Connect rejected, no private data.
nvme nvme1: rdma_resolve_addr wait failed (-104).
nvme nvme1: failed to initialize i/o queue: -104
nvmet_rdma: freeing queue 17
general protection fault: 0000 [#1] SMP
Modules linked in: nvme_rdma nvme_fabrics iw_cxgb4(E) rdma_ucm cxgb4 nvmet_rdma rdma_cm iw_cm nvmet null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en ib_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm irqbypass uinput mlx4_ib ib_core ipv6 iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr mlx4_core dm_mod sg i2c_i801 lpc_ich mfd_core nvme nvme_core acpi_cpufreq ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: iw_cxgb4]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E   4.7.0-rc2-nvme-fabrics+rxe+ #71
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
task: ffff88107844c2c0 ti: ffff881078450000 task.ti: ffff881078450000
RIP: 0010:[<ffffffff810d04c3>]  [<ffffffff810d04c3>] get_next_timer_interrupt+0x183/0x210
RSP: 0018:ffff88107f243e68  EFLAGS: 00010002
RAX: 00000000fffe39b8 RBX: 0000000000000001 RCX: 00000000fffe39b8
RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000039 RDI: 0000000000000036
RBP: ffff88107f243eb8 R08: ffff88107f24f488 R09: 0000000000fffe36
R10: ffff88107f243e70 R11: ffff88107f243e88 R12: 0000002a89f289c0
R13: 00000000fffe35d0 R14: ffff88107f24ec40 R15: 0000000000000040
FS:  0000000000000000(0000) GS:ffff88107f240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 000000103af92000 CR4: 00000000000406e0
Stack:
 ffff88107f24f488 ffff88107f24f688 ffff88107f24f888 ffff88107f24fa88
 ffff88107ec39698 ffff88107f250180 00000000fffe35d0 ffff88107f24c700
 0000002a89f30293 0000002a89f289c0 ffff88107f243f38 ffffffff810e2ac4
Call Trace:
 <IRQ>
 [<ffffffff810e2ac4>] tick_nohz_stop_sched_tick+0x1b4/0x2c0
 [<ffffffff810986a5>] ? sched_clock_cpu+0xc5/0xd0
 [<ffffffff810e2c73>] __tick_nohz_idle_enter+0xa3/0x140
 [<ffffffff810e2d38>] tick_nohz_irq_exit+0x28/0x40
 [<ffffffff8106c0a5>] irq_exit+0x95/0xb0
 [<ffffffff81642c76>] smp_apic_timer_interrupt+0x46/0x60
 [<ffffffff8164134f>] apic_timer_interrupt+0x7f/0x90
 <EOI>
 [<ffffffff810a7d2a>] ? cpu_idle_loop+0xda/0x250
 [<ffffffff810a7e13>] ? cpu_idle_loop+0x1c3/0x250
 [<ffffffff810a7ec1>] cpu_startup_entry+0x21/0x30
 [<ffffffff81044ce8>] start_secondary+0x78/0x80
Code: 89 45 b0 48 89 45 c0 49 8d 86 48 0e 00 00 48 89 45 c8 44 89 cf 83 e7 3f 89 fe 48 63 c6 49 8b 14 c0 48 85 d2 75 05 eb 27 48 89 c1 <f6> 42 2a 10 48 89 c8 75 10 48 8b 42 10 bb 01 00 00 00 48 39 c8
RIP  [<ffffffff810d04c3>] get_next_timer_interrupt+0x183/0x210
 RSP <ffff88107f243e68>

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 20:25     ` Steve Wise
@ 2016-06-09 20:35       ` Ming Lin
  2016-06-09 21:06         ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Ming Lin @ 2016-06-09 20:35 UTC (permalink / raw)


On Thu, Jun 9, 2016 at 1:25 PM, Steve Wise <swise@opengridcomputing.com> wrote:
>> >
>> > To get things working you should try a smaller queue size.  We actually
>> > have an option for this in the kernel, but nvme-cli doesn't expose
>> > it yet, so feel free to hardcode it.
>> >
>> > Of course we've still got a real bug in the error handling..
>>
>> I've set
>> +       queue->recv_queue_size = 32; //le16_to_cpu(req->hsqsize);
>> +       queue->send_queue_size = 32; //le16_to_cpu(req->hrqsize);
>> And it doesn't crash anymore. I get errors without crashes if I try to
>> connect again (what seems correct to me).
>
> I can force a crash with this patch:
>
> diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
> index 55d0651..bbc1422 100644
> --- a/drivers/infiniband/hw/cxgb4/mem.c
> +++ b/drivers/infiniband/hw/cxgb4/mem.c
> @@ -619,6 +619,10 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
>         u32 stag = 0;
>         int ret = 0;
>         int length = roundup(max_num_sg * sizeof(u64), 32);
> +       static int foo;
> +
> +       if (foo++ > 200)
> +               return ERR_PTR(-ENOMEM);
>
>         php = to_c4iw_pd(pd);
>         rhp = php->rhp;
>
>
> Crash:
>
> rdma_rw_init_mrs: failed to allocated 128 MRs
> failed to init MR pool ret= -12
> nvmet_rdma: failed to create_qp ret= -12
> nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
> nvme nvme1: Connect rejected, no private data.
> nvme nvme1: rdma_resolve_addr wait failed (-104).
> nvme nvme1: failed to initialize i/o queue: -104
> nvmet_rdma: freeing queue 17
> general protection fault: 0000 [#1] SMP

> RIP: 0010:[<ffffffff810d04c3>]  [<ffffffff810d04c3>] get_next_timer_interrupt+0x183/0x210
> RSP: 0018:ffff88107f243e68  EFLAGS: 00010002
> RAX: 00000000fffe39b8 RBX: 0000000000000001 RCX: 00000000fffe39b8
> RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000039 RDI: 0000000000000036
> RBP: ffff88107f243eb8 R08: ffff88107f24f488 R09: 0000000000fffe36
> R10: ffff88107f243e70 R11: ffff88107f243e88 R12: 0000002a89f289c0
> R13: 00000000fffe35d0 R14: ffff88107f24ec40 R15: 0000000000000040
> FS:  0000000000000000(0000) GS:ffff88107f240000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 000000103af92000 CR4: 00000000000406e0
> Stack:
>  ffff88107f24f488 ffff88107f24f688 ffff88107f24f888 ffff88107f24fa88
>  ffff88107ec39698 ffff88107f250180 00000000fffe35d0 ffff88107f24c700
>  0000002a89f30293 0000002a89f289c0 ffff88107f243f38 ffffffff810e2ac4
> Call Trace:
>  <IRQ>
>  [<ffffffff810e2ac4>] tick_nohz_stop_sched_tick+0x1b4/0x2c0
>  [<ffffffff810986a5>] ? sched_clock_cpu+0xc5/0xd0
>  [<ffffffff810e2c73>] __tick_nohz_idle_enter+0xa3/0x140
>  [<ffffffff810e2d38>] tick_nohz_irq_exit+0x28/0x40
>  [<ffffffff8106c0a5>] irq_exit+0x95/0xb0
>  [<ffffffff81642c76>] smp_apic_timer_interrupt+0x46/0x60
>  [<ffffffff8164134f>] apic_timer_interrupt+0x7f/0x90
>  <EOI>
>  [<ffffffff810a7d2a>] ? cpu_idle_loop+0xda/0x250
>  [<ffffffff810a7e13>] ? cpu_idle_loop+0x1c3/0x250
>  [<ffffffff810a7ec1>] cpu_startup_entry+0x21/0x30
>  [<ffffffff81044ce8>] start_secondary+0x78/0x80

The stack looks weird - nothing nvme-code related.
I guess it is a random crash.

Could you do it again and see whether you get a different call stack?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 20:35       ` Ming Lin
@ 2016-06-09 21:06         ` Steve Wise
  2016-06-09 22:26           ` Ming Lin
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-09 21:06 UTC (permalink / raw)


> >
> > I can force a crash with this patch:
> >
> > diff --git a/drivers/infiniband/hw/cxgb4/mem.c
> b/drivers/infiniband/hw/cxgb4/mem.c
> > index 55d0651..bbc1422 100644
> > --- a/drivers/infiniband/hw/cxgb4/mem.c
> > +++ b/drivers/infiniband/hw/cxgb4/mem.c
> > @@ -619,6 +619,10 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
> >         u32 stag = 0;
> >         int ret = 0;
> >         int length = roundup(max_num_sg * sizeof(u64), 32);
> > +       static int foo;
> > +
> > +       if (foo++ > 200)
> > +               return ERR_PTR(-ENOMEM);
> >
> >         php = to_c4iw_pd(pd);
> >         rhp = php->rhp;
> >
> >
> > Crash:
> >
> > rdma_rw_init_mrs: failed to allocated 128 MRs
> > failed to init MR pool ret= -12
> > nvmet_rdma: failed to create_qp ret= -12
> > nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
> > nvme nvme1: Connect rejected, no private data.
> > nvme nvme1: rdma_resolve_addr wait failed (-104).
> > nvme nvme1: failed to initialize i/o queue: -104
> > nvmet_rdma: freeing queue 17
> > general protection fault: 0000 [#1] SMP
> 
> > RIP: 0010:[<ffffffff810d04c3>]  [<ffffffff810d04c3>]
> get_next_timer_interrupt+0x183/0x210
> > RSP: 0018:ffff88107f243e68  EFLAGS: 00010002
> > RAX: 00000000fffe39b8 RBX: 0000000000000001 RCX: 00000000fffe39b8
> > RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000039 RDI: 0000000000000036
> > RBP: ffff88107f243eb8 R08: ffff88107f24f488 R09: 0000000000fffe36
> > R10: ffff88107f243e70 R11: ffff88107f243e88 R12: 0000002a89f289c0
> > R13: 00000000fffe35d0 R14: ffff88107f24ec40 R15: 0000000000000040
> > FS:  0000000000000000(0000) GS:ffff88107f240000(0000)
> knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: ffffffffff600400 CR3: 000000103af92000 CR4: 00000000000406e0
> > Stack:
> >  ffff88107f24f488 ffff88107f24f688 ffff88107f24f888 ffff88107f24fa88
> >  ffff88107ec39698 ffff88107f250180 00000000fffe35d0 ffff88107f24c700
> >  0000002a89f30293 0000002a89f289c0 ffff88107f243f38 ffffffff810e2ac4
> > Call Trace:
> >  <IRQ>
> >  [<ffffffff810e2ac4>] tick_nohz_stop_sched_tick+0x1b4/0x2c0
> >  [<ffffffff810986a5>] ? sched_clock_cpu+0xc5/0xd0
> >  [<ffffffff810e2c73>] __tick_nohz_idle_enter+0xa3/0x140
> >  [<ffffffff810e2d38>] tick_nohz_irq_exit+0x28/0x40
> >  [<ffffffff8106c0a5>] irq_exit+0x95/0xb0
> >  [<ffffffff81642c76>] smp_apic_timer_interrupt+0x46/0x60
> >  [<ffffffff8164134f>] apic_timer_interrupt+0x7f/0x90
> >  <EOI>
> >  [<ffffffff810a7d2a>] ? cpu_idle_loop+0xda/0x250
> >  [<ffffffff810a7e13>] ? cpu_idle_loop+0x1c3/0x250
> >  [<ffffffff810a7ec1>] cpu_startup_entry+0x21/0x30
> >  [<ffffffff81044ce8>] start_secondary+0x78/0x80
> 
> The stack looks weird. Nothing nvme code related.
> I guess it is a random crash.
> 
> Could you do it again and will you see a different call stack?

Yes, I get the same crash after reproducing it twice.  At least the RIP is exactly the same:

get_next_timer_interrupt+0x183/0x210

The rest of the stack looked a little different but still had tick_nohz stuff in it.

Does this look correct ("freeing queue 17" twice)?

nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
nvmet_rdma: freeing queue 17
nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
nvme nvme1: creating 16 I/O queues.
rdma_rw_init_mrs: failed to allocated 128 MRs
failed to init MR pool ret= -12
nvmet_rdma: failed to create_qp ret= -12
nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
nvme nvme1: Connect rejected, no private data.
nvme nvme1: rdma_resolve_addr wait failed (-104).
nvme nvme1: failed to initialize i/o queue: -104
nvmet_rdma: freeing queue 17
general protection fault: 0000 [#1] SMP

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 21:06         ` Steve Wise
@ 2016-06-09 22:26           ` Ming Lin
  2016-06-09 22:40             ` Steve Wise
       [not found]             ` <055801d1c29f$e164c000$a42e4000$@opengridcomputing.com>
  0 siblings, 2 replies; 37+ messages in thread
From: Ming Lin @ 2016-06-09 22:26 UTC (permalink / raw)


On Thu, Jun 9, 2016 at 2:06 PM, Steve Wise <swise@opengridcomputing.com> wrote:

> Yes, I get the same crash after reproducing it twice.  At least the RIP is exactly the same:
>
> get_next_timer_interrupt+0x183/0x210
>
> The rest of the stack looked a little different but still had tick_nohz stuff in it.
>
> Does this look correct ("freeing queue 17" twice)?
>
> nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
> nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
> nvmet_rdma: freeing queue 17
> nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
> nvme nvme1: creating 16 I/O queues.
> rdma_rw_init_mrs: failed to allocated 128 MRs
> failed to init MR pool ret= -12
> nvmet_rdma: failed to create_qp ret= -12
> nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
> nvme nvme1: Connect rejected, no private data.
> nvme nvme1: rdma_resolve_addr wait failed (-104).
> nvme nvme1: failed to initialize i/o queue: -104
> nvmet_rdma: freeing queue 17
> general protection fault: 0000 [#1] SMP

I'll get a Chelsio card to try.

What are the steps to reproduce?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 22:26           ` Ming Lin
@ 2016-06-09 22:40             ` Steve Wise
       [not found]             ` <055801d1c29f$e164c000$a42e4000$@opengridcomputing.com>
  1 sibling, 0 replies; 37+ messages in thread
From: Steve Wise @ 2016-06-09 22:40 UTC (permalink / raw)




> -----Original Message-----
> From: Ming Lin [mailto:mlin at kernel.org]
> Sent: Thursday, June 9, 2016 5:26 PM
> To: Steve Wise <swise at opengridcomputing.com>
> Cc: keith busch <keith.busch at intel.com>; ming l <ming.l at ssi.samsung.com>;
> Sagi Grimberg <sagi at grimberg.me>; Marta Rybczynska
> <mrybczyn at kalray.eu>; Jens Axboe <axboe at fb.com>; linux-
> nvme at lists.infradead.org; Christoph Hellwig <hch at infradead.org>; james p
> freyensee <james.p.freyensee at intel.com>; armenx baloyan
> <armenx.baloyan at intel.com>
> Subject: Re: nvme-fabrics: crash at nvme connect-all
> 
> On Thu, Jun 9, 2016 at 2:06 PM, Steve Wise
> <swise@opengridcomputing.com> wrote:
> 
> > Yes, I get the same crash after reproducing it twice.  At least the RIP is
> exactly the same:
> >
> > get_next_timer_interrupt+0x183/0x210
> >
> > The rest of the stack looked a little different but still had tick_nohz stuff in
> it.
> >
> > Does this look correct ("freeing queue 17" twice)?
> >
> > nvmet: creating controller 1 for NQN nqn.2014-
> 08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
> > nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery",
> addr 10.0.1.14:4420
> > nvmet_rdma: freeing queue 17
> > nvmet: creating controller 1 for NQN nqn.2014-
> 08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
> > nvme nvme1: creating 16 I/O queues.
> > rdma_rw_init_mrs: failed to allocated 128 MRs
> > failed to init MR pool ret= -12
> > nvmet_rdma: failed to create_qp ret= -12
> > nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
> > nvme nvme1: Connect rejected, no private data.
> > nvme nvme1: rdma_resolve_addr wait failed (-104).
> > nvme nvme1: failed to initialize i/o queue: -104
> > nvmet_rdma: freeing queue 17
> > general protection fault: 0000 [#1] SMP
> 
> I'll get a Chelsio card to try.
> 
> What's the step to reproduce?

Add the hack into iw_cxgb4 to force alloc_mr failures after 200 allocations (or whatever value you need to make it happen).  Then on the same machine, export a target device, load nvme-rdma and discover/connect to that target device with nvme.  It will crash.
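
For reference, here is a hedged sketch (not the actual hack quoted earlier in the thread) of how the same fault injection could be made tunable at module load time instead of hard-coding 200.  The parameter and helper names are invented for illustration:

---
/*
 * Debug-only fault injection: fail c4iw_alloc_mr() with -ENOMEM once more
 * than fail_alloc_mr_after allocations have been attempted.  Parameter and
 * helper names are illustrative, not from the real driver.
 */
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/atomic.h>
#include <linux/err.h>

static int fail_alloc_mr_after = 200;
module_param(fail_alloc_mr_after, int, 0644);
MODULE_PARM_DESC(fail_alloc_mr_after,
		 "Fail c4iw_alloc_mr() with -ENOMEM after this many calls (debug only)");

static atomic_t alloc_mr_calls = ATOMIC_INIT(0);

/* Call at the top of c4iw_alloc_mr(), before any real allocation work. */
static inline bool c4iw_should_fail_alloc_mr(void)
{
	return atomic_inc_return(&alloc_mr_calls) > fail_alloc_mr_after;
}
---

c4iw_alloc_mr() would then start with
"if (c4iw_should_fail_alloc_mr()) return ERR_PTR(-ENOMEM);", which keeps the
injection point identical to the quoted diff while letting the threshold be
changed via /sys/module/iw_cxgb4/parameters/ without a rebuild.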

Unfortunately, with the 4.7-rc2 base I'm using, I get no vmcore dump.  I'm not sure why...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-09 15:48                           ` Steve Wise
@ 2016-06-10  9:03                             ` Marta Rybczynska
  2016-06-10 13:40                               ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Marta Rybczynska @ 2016-06-10  9:03 UTC (permalink / raw)


----- On June 9, 2016, at 17:48, Steve Wise swise at opengridcomputing.com wrote:

>> 
>> 
>> Thanks Marta.  That card has less memory than the T580-CR.  I'm checking with
>> Chelsio on the details.  The "-SO" might mean a mem-free card.
>> 
>> Also, can you email me the output of 'cat
> /sys/kernel/debug/cxgb4/blah/meminfo'?

Here it is:
# cat /sys/kernel/debug/cxgb4/0000\:09\:00.4/meminfo 
EDC0:           0x0-0x2fffff [3.00 MiB]
EDC1:           0x300000-0x5fffff [3.00 MiB]

RQUDP region:   0xffffffff-0xfffffffe [0 B]
DBQ contexts:   0x27fa80-0x28ba7f [48.0 KiB]
IMSG contexts:  0x28ba80-0x297a7f [48.0 KiB]
FLM cache:      0x297a80-0x2d9a7f [264 KiB]
ULPTX state:    0x2d9a80-0x2d9e3f [960 B]
ULPRX state:    0x2d9e40-0x2d9e7f [64.0 B]
Timers:         0x2d9e80-0x2e7e7f [56.0 KiB]
TCBs:           0x2e7e80-0x327fff [256 KiB]
Tx payload:     0x328000-0x3e7fff [768 KiB]
Rx payload:     0x3e8000-0x507fff [1.13 MiB]
Pstructs:       0x508000-0x50afff [12.0 KiB]
Rx FL:          0x50b000-0x50b0bf [192 B]
Tx FL:          0x50b0c0-0x50b13f [128 B]
Pstruct FL:     0x50b140-0x50b33f [512 B]
TDDP region:    0x50b340-0x537b3f [178 KiB]
iSCSI region:   0x537b40-0x557b3f [128 KiB]
TPT region:     0x557b40-0x56df3f [89.0 KiB]
STAG region:    0x557b40-0x56df3f [89.0 KiB]
TXPBL region:   0x56df40-0x5a593f [223 KiB]
PBL region:     0x56df40-0x5a593f [223 KiB]
RQ region:      0x5a5940-0x5f373f [312 KiB]

uP RAM:         0x0-0xffffffff [4.00 GiB]
uP Extmem2:     0x0-0xffffffff [4.00 GiB]

72 Rx pages of size 16KiB for 1 channels
48 Tx pages of size 16KiB for 2 channels
192 p-structs

Port 0 using 2 pages out of 432 allocated
Port 1 using 2 pages out of 432 allocated
Port 2 using 2 pages out of 432 allocated
Port 3 using 2 pages out of 432 allocated
Loopback 0 using 0 pages out of 144 allocated
Loopback 1 using 0 pages out of 144 allocated
Loopback 2 using 0 pages out of 144 allocated
Loopback 3 using 0 pages out of 144 allocated


Marta

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10  9:03                             ` Marta Rybczynska
@ 2016-06-10 13:40                               ` Steve Wise
  2016-06-10 13:42                                 ` Marta Rybczynska
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-10 13:40 UTC (permalink / raw)


> >> Thanks Marta.  That card has less memory than the T580-CR.  I'm checking with
> >> Chelsio on the details.  The "-SO" might mean a mem-free card.
> >>
> >> Also, can you email me the output of 'cat
> > /sys/kernel/debug/cxgb4/blah/meminfo'?
> 
> Here it is:
> # cat /sys/kernel/debug/cxgb4/0000\:09\:00.4/meminfo
> EDC0:           0x0-0x2fffff [3.00 MiB]
> EDC1:           0x300000-0x5fffff [3.00 MiB]


The "-SO" cards have no external off-chip memory, as seen above.  If it did have off-chip memory there would be a "MC0" and possibly "MC1" lines above with a large amount of memory.    Like this (T580-CR):

MC0:            0x600000-0x405fffff [1.00 GiB]
MC1:            0x40600000-0x805fffff [1.00 GiB]

So your -SO cards will only handle a very small amount of RDMA load.


Steve.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 13:40                               ` Steve Wise
@ 2016-06-10 13:42                                 ` Marta Rybczynska
  2016-06-10 13:49                                   ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Marta Rybczynska @ 2016-06-10 13:42 UTC (permalink / raw)




----- On June 10, 2016, at 15:40, Steve Wise swise at opengridcomputing.com wrote:

>> >> Thanks Marta.  That card has less memory than the T580-CR.  I'm checking with
>> >> Chelsio on the details.  The "-SO" might mean a mem-free card.
>> >>
>> >> Also, can you email me the output of 'cat
>> > /sys/kernel/debug/cxgb4/blah/meminfo'?
>> 
>> Here it is:
>> # cat /sys/kernel/debug/cxgb4/0000\:09\:00.4/meminfo
>> EDC0:           0x0-0x2fffff [3.00 MiB]
>> EDC1:           0x300000-0x5fffff [3.00 MiB]
> 
> 
> The "-SO" cards have no external off-chip memory, as seen above.  If it did have
> off-chip memory there would be a "MC0" and possibly "MC1" lines above with a
> large amount of memory.    Like this (T580-CR):
> 
> MC0:            0x600000-0x405fffff [1.00 GiB]
> MC1:            0x40600000-0x805fffff [1.00 GiB]
> 
> So your -SO cards will only handle a very small amount of RDMA load.
> 

Thanks Steve for looking into this. It's a good test case then :)

Marta

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 13:42                                 ` Marta Rybczynska
@ 2016-06-10 13:49                                   ` Steve Wise
  0 siblings, 0 replies; 37+ messages in thread
From: Steve Wise @ 2016-06-10 13:49 UTC (permalink / raw)


> >> Here it is:
> >> # cat /sys/kernel/debug/cxgb4/0000\:09\:00.4/meminfo
> >> EDC0:           0x0-0x2fffff [3.00 MiB]
> >> EDC1:           0x300000-0x5fffff [3.00 MiB]
> >
> >
> > The "-SO" cards have no external off-chip memory, as seen above.  If it did have
> > off-chip memory there would be a "MC0" and possibly "MC1" lines above with a
> > large amount of memory.    Like this (T580-CR):
> >
> > MC0:            0x600000-0x405fffff [1.00 GiB]
> > MC1:            0x40600000-0x805fffff [1.00 GiB]
> >
> > So your -SO cards will only handle a very small amount of RDMA load.
> >
> 
> Thanks Steve for looking up into this. It's a good testcase then :)
> 
> Marta

Indeed!

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
       [not found]             ` <055801d1c29f$e164c000$a42e4000$@opengridcomputing.com>
@ 2016-06-10 15:11               ` Steve Wise
  2016-06-10 16:22                 ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-10 15:11 UTC (permalink / raw)


> > What's the step to reproduce?
> 
> Add the hack into iw_cxgb4 to force alloc_mr failures after 200 allocations
> (or whatever value you need to make it happen).  Then on the same machine,
> export a target device, load nvme-rdma and discover/connect to that target
> device with nvme.  It will crash.
> 
> Unfortunately, with the 4.7-rc2 base I'm using, I get no vmcore dump.  I'm
> not sure why...
> 

Previously I was using Doug's rdma rxe branch + sagi's rxe fixes + rebased on nvmf-all.2.  To simplify, I have now gone to just straight nvmf-all.2.  Also, I separated the host and target to different nodes and reproduced the problem.  It's the host side that is crashing.  Same GPF with RIP:

RIP: 0010:[<ffffffff810d04c3>]  [<ffffffff810d04c3>] get_next_timer_interrupt+0x183/0x210

Steve.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 15:11               ` Steve Wise
@ 2016-06-10 16:22                 ` Steve Wise
  2016-06-10 18:43                   ` Ming Lin
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-10 16:22 UTC (permalink / raw)


> > Add the hack into iw_cxgb4 to force alloc_mr failures after 200 allocations
> > (or whatever value you need to make it happen).  Then on the same machine,
> > export a target device, load nvme-rdma and discover/connect to that target
> > device with nvme.  It will crash.
> >
> > Unfortunately, with the 4.7-rc2 base I'm using, I get no vmcore dump.  I'm
> > not sure why...
> >
> 
> Previously I was using Doug's rdma rxe branch + sagi's rxe fixes + rebased on nvmf-
> all.2.   To simplify, I have now gone to just straight nvmf-all.2.  Also, I separated the
> host and target to different nodes and reproduced the problem.  It's the host side
> that is crashing.  Same GPF with RIP:
> 
> RIP: 0010:[<ffffffff810d04c3>]  [<ffffffff810d04c3>]
> get_next_timer_interrupt+0x183/0x210
> 
> Steve.

I enabled lots of kernel memory debugging and now hit this.  Perhaps a clue?  Freeing an active timer list widget?

nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
nvme nvme1: creating 16 I/O queues.
nvme nvme1: Connect rejected, no private data.
nvme nvme1: rdma_resolve_addr wait failed (-104).
nvme nvme1: failed to initialize i/o queue: -104
------------[ cut here ]------------
WARNING: CPU: 1 PID: 10440 at lib/debugobjects.c:263 debug_print_object+0x8e/0xb0
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
Modules linked in: nvme_rdma nvme_fabrics rdma_ucm rdma_cm iw_cm configfs iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en ib_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support pcspkr mlx4_ib ib_core ipv6 mlx4_core dm_mod sg lpc_ich mfd_core i2c_i801 nvme nvme_core igb dca ptp pps_core acpi_cpufreq ext4(E) mbcache(E) jbd2(E) sd_mod(E) nouveau(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) mxm_wmi(E) video(E) ahci(E) libahci(E) wmi(E) [last unloaded: cxgb4]
CPU: 1 PID: 10440 Comm: nvme Tainted: G            E   4.7.0-rc2-nvmf-all.2+ #42
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
 0000000000000000 ffff881027a13a18 ffffffff812f032d ffffffff8130e65e
 ffff881027a13a78 ffff881027a13a78 0000000000000000 ffff881027a13a68
 ffffffff8106694d 0000031800000001 000001072aad7ce8 dead000000000200
Call Trace:
 [<ffffffff812f032d>] dump_stack+0x51/0x74
 [<ffffffff8130e65e>] ? debug_print_object+0x8e/0xb0
 [<ffffffff8106694d>] __warn+0xfd/0x120
 [<ffffffff81066a29>] warn_slowpath_fmt+0x49/0x50
 [<ffffffff81182d72>] ? kfree_const+0x22/0x30
 [<ffffffff8130e65e>] debug_print_object+0x8e/0xb0
 [<ffffffff81080850>] ? __queue_work+0x520/0x520
 [<ffffffff8130ecbe>] __debug_check_no_obj_freed+0x1ee/0x270
 [<ffffffff8130ed57>] debug_check_no_obj_freed+0x17/0x20
 [<ffffffff811c3aac>] kfree+0x9c/0x120
 [<ffffffff81182d72>] ? kfree_const+0x22/0x30
 [<ffffffff812f2f3c>] ? kobject_cleanup+0x9c/0x1b0
 [<ffffffffa04cc696>] nvme_rdma_free_ctrl+0xa6/0xc0 [nvme_rdma]
 [<ffffffffa06fcc36>] nvme_free_ctrl+0x46/0x60 [nvme_core]
 [<ffffffffa06feb2b>] nvme_put_ctrl+0x1b/0x20 [nvme_core]
 [<ffffffffa04cf1a2>] nvme_rdma_create_ctrl+0x412/0x4f0 [nvme_rdma]
 [<ffffffffa04c5d02>] nvmf_create_ctrl+0x182/0x210 [nvme_fabrics]
 [<ffffffffa04c5e3c>] nvmf_dev_write+0xac/0x110 [nvme_fabrics]
 [<ffffffff811d9c24>] __vfs_write+0x34/0x120
 [<ffffffff81002515>] ? trace_event_raw_event_sys_enter+0xb5/0x130
 [<ffffffff811d9dc9>] vfs_write+0xb9/0x130
 [<ffffffff811f9592>] ? __fdget_pos+0x12/0x50
 [<ffffffff811da9b9>] SyS_write+0x59/0xc0
 [<ffffffff81002d6d>] do_syscall_64+0x6d/0x160
 [<ffffffff81642e7c>] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace 7f80ebccfc6bd15d ]---

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 16:22                 ` Steve Wise
@ 2016-06-10 18:43                   ` Ming Lin
  2016-06-10 19:17                     ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Ming Lin @ 2016-06-10 18:43 UTC (permalink / raw)


On Fri, Jun 10, 2016 at 9:22 AM, Steve Wise <swise@opengridcomputing.com> wrote:

>
> I enabled lots of kernel memory debugging and now hit this.  Perhaps a clue?  Freeing an active timer list widget?
>
> nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
> nvme nvme1: creating 16 I/O queues.
> nvme nvme1: Connect rejected, no private data.
> nvme nvme1: rdma_resolve_addr wait failed (-104).
> nvme nvme1: failed to initialize i/o queue: -104
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 10440 at lib/debugobjects.c:263 debug_print_object+0x8e/0xb0
> ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
> Modules linked in: nvme_rdma nvme_fabrics rdma_ucm rdma_cm iw_cm configfs iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en ib_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support pcspkr mlx4_ib ib_core ipv6 mlx4_core dm_mod sg lpc_ich mfd_core i2c_i801 nvme nvme_core igb dca ptp pps_core acpi_cpufreq ext4(E) mbcache(E) jbd2(E) sd_mod(E) nouveau(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) mxm_wmi(E) video(E) ahci(E) libahci(E) wmi(E) [last unloaded: cxgb4]
> CPU: 1 PID: 10440 Comm: nvme Tainted: G            E   4.7.0-rc2-nvmf-all.2+ #42
> Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
>  0000000000000000 ffff881027a13a18 ffffffff812f032d ffffffff8130e65e
>  ffff881027a13a78 ffff881027a13a78 0000000000000000 ffff881027a13a68
>  ffffffff8106694d 0000031800000001 000001072aad7ce8 dead000000000200
> Call Trace:
>  [<ffffffff812f032d>] dump_stack+0x51/0x74
>  [<ffffffff8130e65e>] ? debug_print_object+0x8e/0xb0
>  [<ffffffff8106694d>] __warn+0xfd/0x120
>  [<ffffffff81066a29>] warn_slowpath_fmt+0x49/0x50
>  [<ffffffff81182d72>] ? kfree_const+0x22/0x30
>  [<ffffffff8130e65e>] debug_print_object+0x8e/0xb0
>  [<ffffffff81080850>] ? __queue_work+0x520/0x520
>  [<ffffffff8130ecbe>] __debug_check_no_obj_freed+0x1ee/0x270
>  [<ffffffff8130ed57>] debug_check_no_obj_freed+0x17/0x20
>  [<ffffffff811c3aac>] kfree+0x9c/0x120
>  [<ffffffff81182d72>] ? kfree_const+0x22/0x30
>  [<ffffffff812f2f3c>] ? kobject_cleanup+0x9c/0x1b0
>  [<ffffffffa04cc696>] nvme_rdma_free_ctrl+0xa6/0xc0 [nvme_rdma]
>  [<ffffffffa06fcc36>] nvme_free_ctrl+0x46/0x60 [nvme_core]
>  [<ffffffffa06feb2b>] nvme_put_ctrl+0x1b/0x20 [nvme_core]
>  [<ffffffffa04cf1a2>] nvme_rdma_create_ctrl+0x412/0x4f0 [nvme_rdma]
>  [<ffffffffa04c5d02>] nvmf_create_ctrl+0x182/0x210 [nvme_fabrics]
>  [<ffffffffa04c5e3c>] nvmf_dev_write+0xac/0x110 [nvme_fabrics]
>  [<ffffffff811d9c24>] __vfs_write+0x34/0x120
>  [<ffffffff81002515>] ? trace_event_raw_event_sys_enter+0xb5/0x130
>  [<ffffffff811d9dc9>] vfs_write+0xb9/0x130
>  [<ffffffff811f9592>] ? __fdget_pos+0x12/0x50
>  [<ffffffff811da9b9>] SyS_write+0x59/0xc0
>  [<ffffffff81002d6d>] do_syscall_64+0x6d/0x160
>  [<ffffffff81642e7c>] entry_SYSCALL64_slow_path+0x25/0x25
> ---[ end trace 7f80ebccfc6bd15d ]---

I can reproduce this, and the patch below fixes it:
[PATCH] nvme-rdma: correctly stop keep alive on error path
http://lists.infradead.org/pipermail/linux-nvme/2016-June/004931.html
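
To illustrate the bug class debugobjects is flagging above, here is a generic, hedged sketch (not the nvme-rdma code itself): a structure containing a still-armed delayed_work gets kfree()d, so the fix pattern is to cancel the work before the final free.

---
/*
 * Generic illustration of the "ODEBUG: free active ... timer_list" class of
 * bug, not the actual nvme-rdma code: the keep-alive delayed_work must be
 * cancelled before the structure embedding it is freed.
 */
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_ctrl {
	struct delayed_work ka_work;	/* periodic, self-rearming keep-alive */
};

static void demo_ctrl_free(struct demo_ctrl *ctrl)
{
	/* Without this, kfree() frees memory that still contains an armed
	 * timer_list, which is exactly what debugobjects reports above. */
	cancel_delayed_work_sync(&ctrl->ka_work);
	kfree(ctrl);
}
---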

Could you also give it a try and see if it helps for the crash you saw?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 18:43                   ` Ming Lin
@ 2016-06-10 19:17                     ` Steve Wise
  2016-06-10 20:00                       ` Ming Lin
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-10 19:17 UTC (permalink / raw)


> I can reproduce this and below patch fixed it.
> [PATCH] nvme-rdma: correctly stop keep alive on error path
> http://lists.infradead.org/pipermail/linux-nvme/2016-June/004931.html
> 
> Could you also give it a try and see if it helps for the crash you saw?


I applied your patch and it does avoid the crash.  So the connect to the target
device via cxgb4, which I set up to fail in ib_alloc_mr(), correctly fails w/o
crashing.  After this connect failure, I tried to connect to the same target
device via another rdma path (mlx4 instead of the cxgb4 path that was set up to
fail) and got a different failure.  Not sure if this is a regression from your
fix or just another error-path problem:

BUG: unable to handle kernel paging request at ffff881027d00e00
IP: [<ffffffffa04c5a49>] nvmf_parse_options+0x369/0x4a0 [nvme_fabrics]
PGD 2237067 PUD 10782d5067 PMD 1078196067 PTE 8000001027d00060
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nvme_rdma nvme_fabrics rdma_ucm rdma_cm iw_cm configfs
iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM
iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc
cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3
cxgb3 mdio ib_qib rdmavt mlx4_en ib_mthca dm_mirror dm_region_hash dm_log
vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt
iTCO_vendor_support pcspkr mlx4_ib ib_core ipv6 mlx4_core dm_mod sg lpc_ich
mfd_core i2c_i801 nvme nvme_core igb dca ptp pps_core acpi_cpufreq ext4(E)
mbcache(E) jbd2(E) sd_mod(E) nouveau(E) ttm(E) drm_kms_helper(E) drm(E)
fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E)
i2c_core(E) mxm_wmi(E) video(E) ahci(E) libahci(E) wmi(E) [last unloaded: cxgb4]
CPU: 15 PID: 10527 Comm: nvme Tainted: G            E   4.7.0-rc2-nvmf-all.2+
#42
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
task: ffff881016754380 ti: ffff880fe95b0000 task.ti: ffff880fe95b0000
RIP: 0010:[<ffffffffa04c5a49>]  [<ffffffffa04c5a49>]
nvmf_parse_options+0x369/0x4a0 [nvme_fabrics]
RSP: 0018:ffff880fe95b3ca8  EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88102854a380 RCX: 0000000000000000
RDX: ffff881027d00e00 RSI: ffffffffa04c6549 RDI: ffff880fe95b3ce8
RBP: ffff880fe95b3d28 R08: 000000000000003d R09: ffff8810272c7de0
R10: 0000000000000000 R11: 0000000000000010 R12: ffff880fe95b3ce8
R13: 0000000000000000 R14: ffff88102b1d6b80 R15: ffff880fe95b3cf4
FS:  00007f0264446700(0000) GS:ffff8810775c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff881027d00e00 CR3: 0000000fe95b8000 CR4: 00000000000406e0
Stack:
 00000000024080c0 ffff88102b1d6bae ffff88102b1d6bb6 ffff88102b1d6bba
 0000000000000040 0000000000000050 0000000000000001 0000000000000000
 0000000000000000 0000000800000246 ffff881076c13f00 ffff88102b1d6b40
Call Trace:
 [<ffffffffa04c5bc6>] nvmf_create_ctrl+0x46/0x210 [nvme_fabrics]
 [<ffffffffa04c5e3c>] nvmf_dev_write+0xac/0x110 [nvme_fabrics]
 [<ffffffff811d9c24>] __vfs_write+0x34/0x120
 [<ffffffff81002515>] ? trace_event_raw_event_sys_enter+0xb5/0x130
 [<ffffffff811d9dc9>] vfs_write+0xb9/0x130
 [<ffffffff811f9592>] ? __fdget_pos+0x12/0x50
 [<ffffffff811da9b9>] SyS_write+0x59/0xc0
 [<ffffffff81002d6d>] do_syscall_64+0x6d/0x160
 [<ffffffff81642e7c>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 87 39 01 00 00 48 63 f6 48 89 73 28 e9 26 fd ff ff 45 31 ed 48 83 7b 48 00
0f 85 99 fd ff ff 48 8b 15 fc 15 00 00 b8 01 00 00 00 <f0> 0f c1 02 83 c0 01 83
f8 01 7e 1e 48 8b 05 e4 15 00 00 45 31
RIP  [<ffffffffa04c5a49>] nvmf_parse_options+0x369/0x4a0 [nvme_fabrics]
 RSP <ffff880fe95b3ca8>
CR2: ffff881027d00e00
---[ end trace 16c6dd71ae6f4532 ]---

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 19:17                     ` Steve Wise
@ 2016-06-10 20:00                       ` Ming Lin
  2016-06-10 20:15                         ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Ming Lin @ 2016-06-10 20:00 UTC (permalink / raw)


On Fri, Jun 10, 2016 at 12:17 PM, Steve Wise
<swise@opengridcomputing.com> wrote:
>> I can reproduce this and below patch fixed it.
>> [PATCH] nvme-rdma: correctly stop keep alive on error path
>> http://lists.infradead.org/pipermail/linux-nvme/2016-June/004931.html
>>
>> Could you also give it a try and see if it helps for the crash you saw?
>
>
> I applied your patch and it does avoid the crash.  So the connect to the target
> device via cxgb4 that I setup to fail in ib_alloc_mr(), correctly fails w/o
> crashing.   After this connect failure, I tried to connect the same target
> device but via another rdma path (mlx4 instead of cxgb4 which was setup to fail)
> and got a different failure.  Not sure if this is a regression from your fix or
> just another error path problem:
>
> BUG: unable to handle kernel paging request at ffff881027d00e00
> IP: [<ffffffffa04c5a49>] nvmf_parse_options+0x369/0x4a0 [nvme_fabrics]

Could you find out which line of code this is?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 20:00                       ` Ming Lin
@ 2016-06-10 20:15                         ` Steve Wise
  2016-06-10 20:18                           ` Ming Lin
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-10 20:15 UTC (permalink / raw)


> > I applied your patch and it does avoid the crash.  So the connect to the target
> > device via cxgb4 that I setup to fail in ib_alloc_mr(), correctly fails w/o
> > crashing.   After this connect failure, I tried to connect the same target
> > device but via another rdma path (mlx4 instead of cxgb4 which was setup to fail)
> > and got a different failure.  Not sure if this is a regression from your fix or
> > just another error path problem:
> >
> > BUG: unable to handle kernel paging request at ffff881027d00e00
> > IP: [<ffffffffa04c5a49>] nvmf_parse_options+0x369/0x4a0 [nvme_fabrics]
> 
> Could you find out which line of code this is?

From objdump -S -l nvme-fabrics.ko, nvmf_parse_options starts at 0x6e0:

---
00000000000006e0 <nvmf_parse_options>:
nvmf_parse_options():
/usr/local/src/linux-2.6/drivers/nvme/host/fabrics.c:515
        { NVMF_OPT_ERR,                 NULL                    }
};

static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
                const char *buf)
{
     6e0:       55                      push   %rbp
----

So 0x6e0 + 0x369 = 0xa49, which is in an inline atomic_add_return(), I think:

---
atomic_add_return():
/usr/local/src/linux-2.6/./arch/x86/include/asm/atomic.h:156
 *
 * Atomically adds @i to @v and returns @i + @v
 */
static __always_inline int atomic_add_return(int i, atomic_t *v)
{
        return i + xadd(&v->counter, i);
     a3d:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # a44 <nvmf_parse_options+0x364>
     a44:       b8 01 00 00 00          mov    $0x1,%eax
     a49:       f0 0f c1 02             lock xadd %eax,(%rdx)
     a4d:       83 c0 01                add    $0x1,%eax
kref_get():
/usr/local/src/linux-2.6/include/linux/kref.h:46
{
        /* If refcount was 0 before incrementing then we have a race
         * condition when this kref is freeing by some other thread right now.
         * In this case one should use kref_get_unless_zero()
         */
        WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
     a50:       83 f8 01                cmp    $0x1,%eax
     a53:       7e 1e                   jle    a73 <nvmf_parse_options+0x393>
nvmf_parse_options():
/usr/local/src/linux-2.6/drivers/nvme/host/fabrics.c:689
---

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 20:15                         ` Steve Wise
@ 2016-06-10 20:18                           ` Ming Lin
  2016-06-10 21:14                             ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Ming Lin @ 2016-06-10 20:18 UTC (permalink / raw)


On Fri, Jun 10, 2016 at 1:15 PM, Steve Wise <swise@opengridcomputing.com> wrote:
>> > I applied your patch and it does avoid the crash.  So the connect to the target
>> > device via cxgb4 that I setup to fail in ib_alloc_mr(), correctly fails w/o
>> > crashing.   After this connect failure, I tried to connect the same target
>> > device but via another rdma path (mlx4 instead of cxgb4 which was setup to fail)
>> > and got a different failure.  Not sure if this is a regression from your fix or
>> > just another error path problem:
>> >
>> > BUG: unable to handle kernel paging request at ffff881027d00e00
>> > IP: [<ffffffffa04c5a49>] nvmf_parse_options+0x369/0x4a0 [nvme_fabrics]
>>
>> Could you find out which line of code this is?
>
> From objdump -S -l nvme-fabrics.ok, nvmf_parse_options starts at 6e0:
>
> ---
> 00000000000006e0 <nvmf_parse_options>:
> nvmf_parse_options():
> /usr/local/src/linux-2.6/drivers/nvme/host/fabrics.c:515
>         { NVMF_OPT_ERR,                 NULL                    }
> };
>
> static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
>                 const char *buf)
> {
>      6e0:       55                      push   %rbp
> ----
>
> So 0x6e0+0x369 = 0xa49 which is in an inline atomic_add_return(), I think:
>
> ---
> atomic_add_return():
> /usr/local/src/linux-2.6/./arch/x86/include/asm/atomic.h:156
>  *
>  * Atomically adds @i to @v and returns @i + @v
>  */
> static __always_inline int atomic_add_return(int i, atomic_t *v)
> {
>         return i + xadd(&v->counter, i);
>      a3d:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # a44 <nvmf_parse_options+0x364>
>      a44:       b8 01 00 00 00          mov    $0x1,%eax
>      a49:       f0 0f c1 02             lock xadd %eax,(%rdx)
>      a4d:       83 c0 01                add    $0x1,%eax
> kref_get():
> /usr/local/src/linux-2.6/include/linux/kref.h:46
> {
>         /* If refcount was 0 before incrementing then we have a race
>          * condition when this kref is freeing by some other thread right now.
>          * In this case one should use kref_get_unless_zero()
>          */
>         WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>      a50:       83 f8 01                cmp    $0x1,%eax
>      a53:       7e 1e                   jle    a73 <nvmf_parse_options+0x393>
> nvmf_parse_options():
> /usr/local/src/linux-2.6/drivers/nvme/host/fabrics.c:689
> ---

Does Sagi's patch help?

Author: Sagi Grimberg <sagi at grimberg.me>
Date:   Thu Jun 9 13:20:09 2016 -0700

    fabrics: Don't directly free opts->host

    It might be the default host, so we need to call
    nvmet_put_host (which is safe against NULL lucky for
    us).

    Reported-by: Alexander Nezhinsky <alexander.nezhinsky at excelero.com>
    Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 225a732..b86b637 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -805,7 +805,7 @@ nvmf_create_ctrl(struct device *dev, const char
*buf, size_t count)
 out_unlock:
        mutex_unlock(&nvmf_transports_mutex);
 out_free_opts:
-       kfree(opts->host);
+       nvmf_host_put(opts->host);
        kfree(opts);
        return ERR_PTR(ret);
 }
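
The reason kfree() is wrong here, sketched in hedged form: opts->host may point at a shared, kref-counted default host, so the error path has to drop a reference rather than free the object outright.  Only nvmf_host_put() is named by the patch itself; the struct layout and release-callback name below are illustrative guesses, not the actual fabrics.c code.

---
/*
 * Hedged sketch of the refcounting the fix relies on; not the real
 * fabrics.c code.  nvmf_host_put() is the helper named by the patch, the
 * struct layout and release callback are illustrative.
 */
#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>

struct nvmf_host {
	struct kref	ref;
	char		nqn[256];
};

static void nvmf_host_release(struct kref *ref)
{
	struct nvmf_host *host = container_of(ref, struct nvmf_host, ref);

	kfree(host);
}

static void nvmf_host_put(struct nvmf_host *host)
{
	if (host)
		kref_put(&host->ref, nvmf_host_release);
}
---

With a bare kfree(), the shared default host is destroyed while other users
(including the next nvmf_parse_options() call) still hold a pointer to it,
which matches the kref_get() fault Steve decoded from the objdump output above.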

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 20:18                           ` Ming Lin
@ 2016-06-10 21:14                             ` Steve Wise
  2016-06-10 21:20                               ` Ming Lin
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Wise @ 2016-06-10 21:14 UTC (permalink / raw)


> 
> Does Sagi's patch help?
> 
> Author: Sagi Grimberg <sagi at grimberg.me>
> Date:   Thu Jun 9 13:20:09 2016 -0700
> 
>     fabrics: Don't directly free opts->host
> 
>     It might be the default host, so we need to call
>     nvmet_put_host (which is safe against NULL lucky for
>     us).
> 
>     Reported-by: Alexander Nezhinsky <alexander.nezhinsky at excelero.com>
>     Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> 

Yes!  Failures recover and subsequent connects work! 

Thanks for the help Ming!  Sorry I'm finding already fixed problems. :(  

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 21:14                             ` Steve Wise
@ 2016-06-10 21:20                               ` Ming Lin
  2016-06-10 21:25                                 ` Steve Wise
  0 siblings, 1 reply; 37+ messages in thread
From: Ming Lin @ 2016-06-10 21:20 UTC (permalink / raw)


On Fri, Jun 10, 2016 at 2:14 PM, Steve Wise <swise@opengridcomputing.com> wrote:
>>
>> Does Sagi's patch help?
>>
>> Author: Sagi Grimberg <sagi at grimberg.me>
>> Date:   Thu Jun 9 13:20:09 2016 -0700
>>
>>     fabrics: Don't directly free opts->host
>>
>>     It might be the default host, so we need to call
>>     nvmet_put_host (which is safe against NULL lucky for
>>     us).
>>
>>     Reported-by: Alexander Nezhinsky <alexander.nezhinsky at excelero.com>
>>     Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>>
>
> Yes!  Failures recover and subsequent connects work!
>
> Thanks for the help Ming!  Sorry I'm finding already fixed problems. :(

No problem :-)
Can I have your Tested-by tag?

Christoph,

Could you apply these 2 patches?

fabrics: Don't directly free opts->host
nvme-rdma: correctly stop keep alive on error path

^ permalink raw reply	[flat|nested] 37+ messages in thread

* nvme-fabrics: crash at nvme connect-all
  2016-06-10 21:20                               ` Ming Lin
@ 2016-06-10 21:25                                 ` Steve Wise
  0 siblings, 0 replies; 37+ messages in thread
From: Steve Wise @ 2016-06-10 21:25 UTC (permalink / raw)




> -----Original Message-----
> From: Ming Lin [mailto:mlin at kernel.org]
> Sent: Friday, June 10, 2016 4:21 PM
> To: Steve Wise
> Cc: Jens Axboe; ming l; Sagi Grimberg; Marta Rybczynska; linux-
> nvme at lists.infradead.org; Christoph Hellwig; james p freyensee; keith busch;
> armenx baloyan
> Subject: Re: nvme-fabrics: crash at nvme connect-all
> 
> On Fri, Jun 10, 2016 at 2:14 PM, Steve Wise <swise at opengridcomputing.com>
> wrote:
> >>
> >> Does Sagi's patch help?
> >>
> >> Author: Sagi Grimberg <sagi at grimberg.me>
> >> Date:   Thu Jun 9 13:20:09 2016 -0700
> >>
> >>     fabrics: Don't directly free opts->host
> >>
> >>     It might be the default host, so we need to call
> >>     nvmet_put_host (which is safe against NULL lucky for
> >>     us).
> >>
> >>     Reported-by: Alexander Nezhinsky <alexander.nezhinsky at excelero.com>
> >>     Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> >>
> >
> > Yes!  Failures recover and subsequent connects work!
> >
> > Thanks for the help Ming!  Sorry I'm finding already fixed problems. :(
> 
> No problem :-)
> Can I have your Test-by tag?
> 

I can't reply directly to Sagi's patch because apparently it wasn't posted to linux-rdma, and it must have been posted to linux-nvme before I joined.  But for both patches:

Tested-by: Steve Wise <swise at opengridcomputing.com>


> Christoph,
> 
> Could you apply these 2 patches?
> 
> fabrics: Don't directly free opts->host
> nvme-rdma: correctly stop keep alive on error path

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2016-06-10 21:25 UTC | newest]

Thread overview: 37+ messages
2016-06-09  9:18 nvme-fabrics: crash at nvme connect-all Marta Rybczynska
2016-06-09  9:29 ` Sagi Grimberg
2016-06-09 10:07   ` Marta Rybczynska
2016-06-09 11:09     ` Sagi Grimberg
2016-06-09 12:12       ` Marta Rybczynska
2016-06-09 12:30         ` Sagi Grimberg
2016-06-09 13:27           ` Steve Wise
2016-06-09 13:36             ` Steve Wise
2016-06-09 13:48               ` Sagi Grimberg
2016-06-09 14:09                 ` Steve Wise
2016-06-09 14:22                   ` Steve Wise
2016-06-09 14:29                     ` Steve Wise
2016-06-09 15:04                       ` Marta Rybczynska
2016-06-09 15:40                         ` Steve Wise
2016-06-09 15:48                           ` Steve Wise
2016-06-10  9:03                             ` Marta Rybczynska
2016-06-10 13:40                               ` Steve Wise
2016-06-10 13:42                                 ` Marta Rybczynska
2016-06-10 13:49                                   ` Steve Wise
2016-06-09 13:25   ` Christoph Hellwig
2016-06-09 13:24 ` Christoph Hellwig
2016-06-09 15:37   ` Marta Rybczynska
2016-06-09 20:25     ` Steve Wise
2016-06-09 20:35       ` Ming Lin
2016-06-09 21:06         ` Steve Wise
2016-06-09 22:26           ` Ming Lin
2016-06-09 22:40             ` Steve Wise
     [not found]             ` <055801d1c29f$e164c000$a42e4000$@opengridcomputing.com>
2016-06-10 15:11               ` Steve Wise
2016-06-10 16:22                 ` Steve Wise
2016-06-10 18:43                   ` Ming Lin
2016-06-10 19:17                     ` Steve Wise
2016-06-10 20:00                       ` Ming Lin
2016-06-10 20:15                         ` Steve Wise
2016-06-10 20:18                           ` Ming Lin
2016-06-10 21:14                             ` Steve Wise
2016-06-10 21:20                               ` Ming Lin
2016-06-10 21:25                                 ` Steve Wise
