From mboxrd@z Thu Jan  1 00:00:00 1970
From: swise@opengridcomputing.com (Steve Wise)
Date: Thu, 9 Jun 2016 08:36:49 -0500
Subject: nvme-fabrics: crash at nvme connect-all
In-Reply-To: <004901d1c252$b5978d10$20c6a730$@opengridcomputing.com>
References: <53708289.31891804.1465463883806.JavaMail.zimbra@kalray.eu>
 <575936F0.9000600@lightbits.io>
 <574056153.32082017.1465466832847.JavaMail.zimbra@kalray.eu>
 <57594E81.9060302@lightbits.io>
 <1218382158.32228335.1465474321289.JavaMail.zimbra@kalray.eu>
 <5759614D.5080703@lightbits.io>
 <004901d1c252$b5978d10$20c6a730$@opengridcomputing.com>
Message-ID: <005701d1c253$f9590550$ec0b0ff0$@opengridcomputing.com>

> > Steve, did you see this before? I'm wondering if we need some sort
> > of logic handling resource limitations in iWARP (global MR pool...)
>
> I haven't seen this. Does 'cat /sys/kernel/debug/iw_cxgb4/blah/stats' show
> anything interesting? Where/why is it crashing?
>

So this is the failure:

[  703.239462] rdma_rw_init_mrs: failed to allocated 128 MRs
[  703.239498] failed to init MR pool ret= -12
[  703.239541] nvmet_rdma: failed to create_qp ret= -12
[  703.239582] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).

I'm not sure why it would fail. I would expect my setup to allocate at least
as many MRs, given that I have 16 cores on both the host and the target. The
debugfs "stats" file I mentioned above should show us something if we're
running out of adapter resources for MR or PBL records.

Can you please turn on c4iw_debug and send me the debug output?

echo 1 > /sys/module/iw_cxgb4/parameters/c4iw_debug

Thanks,

Steve.
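[Editor's note: a back-of-the-envelope sketch of the scale of the MR demand
behind the -12 (-ENOMEM) above. The 128-MR pool per queue and the 16 cores
come from the email; the one-I/O-queue-per-core-plus-admin-queue layout is an
assumption for illustration, not something the thread confirms.]

```python
# Rough MR demand for one connecting host, under assumed queue layout.
MRS_PER_QUEUE = 128   # from the log: "failed to allocated 128 MRs"
CORES = 16            # from the email: 16 cores on host and target
ENOMEM = 12           # the -12 in the log lines is -ENOMEM

# Assumption: one I/O queue per core, plus one admin queue.
queues = CORES + 1
total_mrs = queues * MRS_PER_QUEUE

print(f"queues={queues}, total MRs needed={total_mrs}")
# -> queues=17, total MRs needed=2176
```

If the adapter's global MR or PBL pool is smaller than a few thousand entries,
a single connect-all could plausibly exhaust it, which is what the debugfs
"stats" output should confirm or rule out.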