* Re: [SPDK] SPDK errors
@ 2017-08-29 19:00 Luse, Paul E
  0 siblings, 0 replies; 22+ messages in thread
From: Luse, Paul E @ 2017-08-29 19:00 UTC (permalink / raw)
  To: spdk


OK, that’s good info, I think, for those who may have suggestions or other questions.  I do have some HW here that I could try to get set up to help, but it would be more of a learning activity for me than expedited help for you ☺ If someone doesn’t get you going by EOD Thu, I’ll see if I can spend this weekend getting my setup ready to repro…

Thx
Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, August 29, 2017 10:50 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Thanks for the quick response.

I have successfully run perf against the local NVMe device on the target using the PCIe traddr.
Also, the same tests ran fine with my last rolling Tumbleweed distro, although I have now pulled the latest SPDK/DPDK from the gits.
The current tests are on the latest regular Leap with an updated kernel (Linux 4.12.8-1.g4d7933a-default).

Planning to check with latest Fedora on target.

-Ganesh

On Tue, Aug 29, 2017 at 10:26 AM, Luse, Paul E <paul.e.luse(a)intel.com> wrote:
Hi Ganesh,

I’m totally not one of the NVMe-oF experts, but I’m sure someone will chime in soon.  In the meantime it might be interesting to run the same perf tests locally against the same NVMe device(s), just to make sure all of that works w/o issue.  If it doesn’t, that’s a simpler problem to solve at least ☺
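For reference, a local run against one of the same drives would look something like this (a sketch only; the PCIe traddr is taken from the lspci output later in this thread and should be adjusted to your system):

sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:PCIe traddr:0000:04:00.0' -c 0x2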

-Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, August 29, 2017 10:15 AM
To: SPDK(a)lists.01.org
Subject: [SPDK] SPDK errors

Folks,
My name is Ganesh, and I am working on NVMe-oF performance metrics using SPDK (and the kernel).
I would appreciate your expert insights.

I am observing errors most of the time when the QD on perf is increased to >=64, and sometimes even for <=16.
The errors are not consistent.

Attached are some details.

Please let me know if you have any additional questions.

Thanks.
-Ganesh

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk



* Re: [SPDK] SPDK errors
@ 2017-11-05 19:08 Luse, Paul E
  0 siblings, 0 replies; 22+ messages in thread
From: Luse, Paul E @ 2017-11-05 19:08 UTC (permalink / raw)
  To: spdk


Not me, I’ve been out for 2 days, and then M-W we have the SPDK developer meetup in Chandler, so things will be a little quiet for the first half of the week at least…

Thx
Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Friday, November 3, 2017 3:13 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Thanks in advance.
Did anyone have a chance to look?

I can also try a conf file that someone is using on a working setup...
one that has 2 subsystems with unique NQNs on the target, listening on IPs (say, 1.1.1.80 and 1.1.2.80) on separate subnets,
with 2 initiators (say, 1.1.1.81 and 1.1.2.81) connected and perf used (at the same time) to run metrics from the initiators to the target.


On Wed, Nov 1, 2017 at 4:57 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com> wrote:
Some typo; it should read:
1.1.1.80, 1.1.2.80 - Target
1.1.1.81 - Ini 1
1.1.2.83 - Ini 2

On Wed, Nov 1, 2017 at 4:00 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com> wrote:
>What did you do to get past your previous issues?
Same CX-3 and HW.
Just latest SW/Distro - CentOS, latest (as of my install) kernel-ml, latest SPDK/DPDK.

Have got some CX-5's now. Not yet installed.
Will try to get some CX-4's.

Thanks.
Ganesh


On Wed, Nov 1, 2017 at 3:52 PM, Luse, Paul E <paul.e.luse(a)intel.com> wrote:
Hi Ganesh,

That’s progress at least.  What did you do to get past your previous issues?  FYI I finally broke down and bought some CX-4’s; I just got them today, actually, so I will work on getting them up and running at the end of next week.

In the meantime, I’m sure John or someone else can assist with the questions below.

Thx
Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Wednesday, November 1, 2017 3:10 PM

To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Paul and John,
Back to some perf work.
As always, thanks for your time and assistance in advance.

Update:
Able to run perf (ini to target) consistently with updated CentOS, kernel-ml, and the latest SPDK/DPDK.
Same HW: CX-3 and PMC NVMe PCI card.
Able to get 1M IOPS (4KB) on generic HW.
[adminuser(a)dell730-80 ~]$ uname -r
4.13.9-1.el7.elrepo.x86_64
[adminuser(a)dell730-80 ~]$ lsb_release -d
Description:    CentOS Linux release 7.4.1708 (Core)

Issue being faced now:
Can run perf from one initiator at a time.
Unable to run perf from multiple initiators to the target at the same time, although I am able to discover and connect.
perf just "hangs" on the initiator side; I don't see any messages anywhere.
Maybe I am missing some config that needs to be done.

Added info below. Let me know any more info you need.

---
Initiators are connected to target NIC ports on separate subnets.
The target has a dual-port CX-3 NIC with IP addresses on separate subnets.
1.1.1.80, 1.2.2.80 - Target
1.1.1.81 - Ini 1
1.2.2.83 - Ini 2

-- target
[adminuser(a)dell730-80 ~]$ sudo ~/gits/spdk/app/nvmf_tgt/nvmf_tgt -c ~/gits/spdk/etc/spdk/nvmf.conf
Starting DPDK 17.08.0 initialization...
[ DPDK EAL parameters: nvmf -c 0x5555 --file-prefix=spdk_pid3534 ]
EAL: Detected 48 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
Total cores available: 8
Occupied cpu socket mask is 0x1
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 2 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 4 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 6 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 8 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 10 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 12 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 14 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on socket 0
copy_engine_ioat.c: 306:copy_engine_ioat_init: *NOTICE*: Ioat Copy Engine Offload Enabled
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 11f8:f117 spdk_nvme
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 11f8:f117 spdk_nvme
nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem nqn.2014-08.org.nvmexpress.discovery on lcore 2 on socket 0
nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem nqn.2016-06.io.spdk:cnode1 on lcore 2 on socket 0
rdma.c:1145:spdk_nvmf_rdma_create: *NOTICE*: *** RDMA Transport Init ***
rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening on 1.1.1.80 port 4420 ***
conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block device Nvme0n1 to subsystem nqn.2016-06.io.spdk:cnode1
nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem nqn.2016-06.io.spdk:cnode2 on lcore 4 on socket 0
rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening on 1.1.2.80 port 4420 ***
conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block device Nvme1n1 to subsystem nqn.2016-06.io.spdk:cnode2
nvmf_tgt.c: 255:spdk_nvmf_startup: *NOTICE*: Acceptor running on core 2 on socket 0

--- ini
Can see both NQNs on discovery.
Can connect to both NQNs.
[adminuser(a)dell730-81 ~]$ sudo nvme discover -t rdma -a 1.1.1.80 -s 4420

Discovery Log Number of Records 2, Generation counter 5
=====Discovery Log Entry 0======
trtype:  rdma
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified
portid:  0
trsvcid: 4420
subnqn:  nqn.2016-06.io.spdk:cnode1
traddr:  1.1.1.80
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms:    rdma-cm
rdma_pkey: 0x0000
=====Discovery Log Entry 1======
trtype:  rdma
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified
portid:  1
trsvcid: 4420
subnqn:  nqn.2016-06.io.spdk:cnode2
traddr:  1.1.2.80
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms:    rdma-cm
rdma_pkey: 0x0000

sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 1.1.1.80 -w 1.1.1.81
sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode2" -a 1.1.2.80 -w 1.1.2.83
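As a quick sanity check after connecting, the subsystems should also show up on each initiator via the kernel tooling; the SPDK serial numbers (SPDK00000000000001/2 from the conf snippet below) should appear in the SN column:

sudo nvme list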

[adminuser(a)dell730-81 ~]$ sudo ~/gits/spdk/examples/nvme/perf/perf -q 1 -s 4096 -w read -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
Starting DPDK 17.08.0 initialization...
[ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3383 ]
EAL: Detected 48 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 1.1.1.80:4420: nqn.2016-06.io.spdk:cnode1
Attached to NVMe over Fabrics controller at 1.1.1.80:4420: nqn.2016-06.io.spdk:cnode1
Associating SPDK bdev Controller (SPDK00000000000001  ) with lcore 1
Initialization complete. Launching workers.
Starting thread on core 1

[adminuser(a)dell730-83 ~]$ sudo ~/gits/spdk/spdk/examples/nvme/perf/perf -q 1 -s 4096 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.2.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode2' -c 0x2
Starting DPDK 17.08.0 initialization...
[ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3633 ]
EAL: Detected 48 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 1.1.2.80:4420: nqn.2016-06.io.spdk:cnode2
Attached to NVMe over Fabrics controller at 1.1.2.80:4420: nqn.2016-06.io.spdk:cnode2
Associating SPDK bdev Controller (SPDK00000000000002  ) with lcore 1
Initialization complete. Launching workers.
Starting thread on core 1

--- snippet from my nvmf.conf on target
  TransportId "trtype:PCIe traddr:0000:04:00.0" Nvme0
  TransportId "trtype:PCIe traddr:0000:06:00.0" Nvme1

# Namespaces backed by physical NVMe devices
[Subsystem1]
  NQN nqn.2016-06.io.spdk:cnode1
  Core 2
  Listen RDMA 1.1.1.80:4420
  AllowAnyHost Yes
  SN SPDK00000000000001
  Namespace Nvme0n1 1

# Namespaces backed by physical NVMe devices
[Subsystem2]
  NQN nqn.2016-06.io.spdk:cnode2
  Core 4
  Listen RDMA 1.1.2.80:4420
  AllowAnyHost Yes
  SN SPDK00000000000002
  Namespace Nvme1n1 1
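For context, those TransportId lines and the [Subsystem] blocks normally sit alongside the global settings mentioned earlier in this thread; a rough sketch of how the whole file might be laid out, with the section placement being my assumption and the values taken from this thread:

[Global]
  ReactorMask 0x5555

[Nvmf]
  AcceptorCore 2

[Nvme]
  TransportId "trtype:PCIe traddr:0000:04:00.0" Nvme0
  TransportId "trtype:PCIe traddr:0000:06:00.0" Nvme1

# ... followed by the [Subsystem1]/[Subsystem2] blocks shown above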

On Tue, Sep 12, 2017 at 4:00 PM, Luse, Paul E <paul.e.luse(a)intel.com> wrote:
Yeah, thanks John!  FYI Ganesh, I didn’t forget about you; I’m at SNIA SDC this week.  Cool to see SPDK mentioned in lots of talks, including a nice shout-out from Sage (Ceph) on SPDK in Bluestore…

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, September 12, 2017 4:29 AM

To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

John,
Thanks for looking into this.
Yes, this is similar.
One of the significant differences is the ConnectX-4 NIC; my setup uses a ConnectX-3 NIC.

I don't have access to my lab as it is being physically moved. Hope to get access this or next week.


On Tue, Sep 12, 2017 at 5:36 AM, Kariuki, John K <john.k.kariuki(a)intel.com> wrote:
Ganesh
I am trying to reproduce the issue in my environment without any luck. My environment has 2 CPUs with 22 cores/socket, Mellanox ConnectX-4 Lx NICs, Ubuntu 17.04 with kernel 4.10, and OFED 4.1. I have tried several different QDs above 16 (all the way to 256) and perf ran successfully. Here are the workload parameters I am using. Is this similar to what you’re doing to reproduce the issue?

./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 128 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 256 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
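The same sweep can be scripted; a minimal sketch, assuming perf is run from its build directory and the transport ID matches your target:

for qd in 32 64 128 256; do
  sudo ./perf -q $qd -s 512 -w randread -t 30 \
    -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
done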



From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, September 05, 2017 4:52 AM

To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Thanks.
Traveling. My lab machines are shut down as the lab is being moved this week.
When I land, I will try to reach out.
On Tue, Sep 5, 2017 at 3:40 AM Luse, Paul E <paul.e.luse(a)intel.com> wrote:
Hi Ganesh,

So I have my hardware installed, the Chelsio cards below, and I believe I’ve got the drivers built and installed correctly. ethtool is showing Link Detected on both ends, but I’m not having much luck testing the connection in any other way, and it’s not connecting w/SPDK. I’m far from an expert in this area; if you have some time and want to see if you can help me get this working, we can use this setup to troubleshoot yours.  Let me know… I’m probably done for today though.

Thx
Paul

Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Friday, September 1, 2017 11:59 AM

To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Update:
Just in case...
Tried with CentOS 7 (Release: 7.3.1611), with both the built-in latest kernel (3.10.0-514.26.2.el7.x86_64)
and the updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.

Similar behavior. ;(

Maybe, next is to try with MOFED drivers.

On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com> wrote:
Thanks!

Update:
Tried using IB mode instead of Eth on the ConnectX-3 NIC.
No change: similar behavior and errors observed.



On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com> wrote:
Well those are good steps… hopefully someone else will jump in as well. I will see if I can get my HW set up to repro over the long weekend and let ya know how it goes…

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Wednesday, August 30, 2017 12:44 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Thanks.

I did change a few cables, and even targets, assuming that this statistically rules out insertion-force issues.
Could not do IB instead of Eth, as my adapter does not support IB.

Also, the error messages are not consistent. That was a snapshot of one of the runs.
There are also the older ConnectX-3 adapters (latest FW flashed) with newer kernels and the latest SPDK/DPDK.

Seen the following:
>nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
>bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but should be 65536

On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net> wrote:

Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> Folks,
> My name is Ganesh, and I am working on NVEMoF performance metrics using SPDK (and kernel).
> I would appreciate your expert insights.
>
> I am observing errors when QD on perf is increased above >=64 most of the
> times. Sometimes, even for <=16
> Errors are not consistent.
>
> Attached are some details.
>
> Please let me know if have any additional questions.
>
> Thanks.
> -Ganesh
>
> SPDK errors 1.txt
>
>
> Setup details:
> -- Some info on setup
> Same HW/SW on target and initiator.
>
> adminuser(a)dell730-80:~> hostnamectl
>    Static hostname: dell730-80
>          Icon name: computer-server
>            Chassis: server
>         Machine ID: b5abb0fe67afd04c59521c40599b3115
>            Boot ID: f825aa6338194338a6f80125caa836c7
>   Operating System: openSUSE Leap 42.3
>        CPE OS Name: cpe:/o:opensuse:leap:42.3
>             Kernel: Linux 4.12.8-1.g4d7933a-default
>       Architecture: x86-64
>
> adminuser(a)dell730-80:~> lscpu | grep -i socket
> Core(s) per socket:    12
> Socket(s):             2
>
> 2MB and/or 1GB huge pages set,
>
> Latest spdk/dpdk from respective GIT,
>
> compiled with RDMA flag,
>
> nvmf.conf file: (have played around with the values)
> reactor mask 0x5555
> AcceptorCore 2
> 1 - 3 Subsystems on cores 4,8,10
>
> adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c gits/spdk/etc/spdk/nvmf.conf -p 6
>
> PCI, NVME cards (16GB)
> adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
>
> Network cards: (latest associated FW from vendor)
> adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> --- initiator cmd line
> sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>
> --errors on stdout on target
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll: *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12): transport retry counter exceeded
>
> --- errros seen on client
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter exceeded
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

Actually, these might be HW errors, because retries are supposed to be engaged only on
packet loss/corruption. It might be bad or not fully inserted cables.

Vlad
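If cabling or link quality is the suspect, it may be worth checking the link state and error counters on both ends before and after a failing run; a sketch only, with the interface name being a placeholder for your own:

ibstat                                        # HCA port state and rate
ethtool -S <eth-interface> | grep -iE 'err|drop|retrans'
ibv_devinfo -v | grep -i state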

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk





* Re: [SPDK] SPDK errors
@ 2017-11-03 22:12 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-11-03 22:12 UTC (permalink / raw)
  To: spdk


Thanks in advance.
Did anyone have a chance to look?

I can also try a conf file that someone is using on a working setup...
one that has 2 subsystems with unique NQNs on the target, listening on IPs
(say, 1.1.1.80 and 1.1.2.80) on separate subnets,
with 2 initiators (say, 1.1.1.81 and 1.1.2.81) connected and perf used (at the
same time) to run metrics from the initiators to the target.



* Re: [SPDK] SPDK errors
@ 2017-11-01 23:57 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-11-01 23:57 UTC (permalink / raw)
  To: spdk


Some typo; it should read:
1.1.1.80, 1.1.2.80 - Target
1.1.1.81 - Ini 1
1.1.2.83 - Ini 2

On Wed, Nov 1, 2017 at 4:00 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com>
wrote:

> >What did you do to get past your previous issues?
> Same CX-3 and HW.
> Just latest SW/Distro - CentOS, latest (as of my install) kernel-ml,
> latest SPDK/DPDK.
>
> Have got some CX-5's now. Not yet installed.
> Will try to get some CX-4's.
>
> Thanks.
> Ganesh
>
>
> On Wed, Nov 1, 2017 at 3:52 PM, Luse, Paul E <paul.e.luse(a)intel.com>
> wrote:
>
>> Hi Ganesh,
>>
>>
>>
>> That’s progress at least.  What did you do to get past your previous
>> issues?  FYI I finally broke down and bought some CX-4’s, just got them
>> today actually so will work on trying to get them up and running end of
>> next week.
>>
>>
>>
>> In the meantime, I’m sure John or someone else can assist with the
>> questions below.
>>
>>
>>
>> Thx
>>
>> Paul
>>
>>
>>
>> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
>> Ganesh
>> *Sent:* Wednesday, November 1, 2017 3:10 PM
>>
>> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Subject:* Re: [SPDK] SPDK errors
>>
>>
>>
>> Paul and John,
>>
>> Back to some perf work.
>>
>> As always, thanks for your time and assistance in advance.
>>
>>
>>
>> Update:
>>
>> Able to run perf (ini to target) consistently with updated CentOS,
>> kernel-ml and latest SPDK/DPDK.
>>
>> Same HW: CX-3 and PMC NVME, PCI card.
>>
>> Able to get 1M IOP's (4KB) on generic HW.
>>
>> [adminuser(a)dell730-80 ~]$ uname -r
>>
>> 4.13.9-1.el7.elrepo.x86_64
>>
>> [adminuser(a)dell730-80 ~]$ lsb_release -d
>>
>> Description:    CentOS Linux release 7.4.1708 (Core)
>>
>>
>>
>> *Issue being faced now:*
>>
>> Can run perf at one initiator at a time.
>>
>> Unable to run perf from multiple initiators to target at the same time.
>> Able to discover and connect.
>>
>> Just "hangs" on initiator side when running perf. Don't seem to see any
>> messages anywhere.
>>
>> Maybe, I am missing some config to be done.
>>
>>
>>
>> Added info below. Let me know any more info you need.
>>
>>
>>
>> ---
>>
>> Initiators connected to target NIC ports on separate subnet
>>
>> Target has a dual port CX-3 NIC with IP addresses on separate subnet.
>>
>> 1.1.1.80, 1.2.2.80 - Target
>>
>> 1.1.1.81 - Ini 1
>>
>> 1.2.2.83 - Ini 2
>>
>>
>>
>> -- target
>>
>> [adminuser(a)dell730-80 ~]$ sudo ~/gits/spdk/app/nvmf_tgt/nvmf_tgt -c
>> ~/gits/spdk/etc/spdk/nvmf.conf
>>
>> Starting DPDK 17.08.0 initialization...
>>
>> [ DPDK EAL parameters: nvmf -c 0x5555 --file-prefix=spdk_pid3534 ]
>>
>> EAL: Detected 48 lcore(s)
>>
>> EAL: No free hugepages reported in hugepages-1048576kB
>>
>> EAL: Probing VFIO support...
>>
>> Total cores available: 8
>>
>> Occupied cpu socket mask is 0x1
>>
>> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 2 on
>> socket 0
>>
>> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 4 on
>> socket 0
>>
>> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 6 on
>> socket 0
>>
>> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 8 on
>> socket 0
>>
>> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 10 on
>> socket 0
>>
>> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 12 on
>> socket 0
>>
>> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 14 on
>> socket 0
>>
>> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on
>> socket 0
>>
>> copy_engine_ioat.c: 306:copy_engine_ioat_init: *NOTICE*: Ioat Copy Engine
>> Offload Enabled
>>
>> EAL: PCI device 0000:04:00.0 on NUMA socket 0
>>
>> EAL:   probe driver: 11f8:f117 spdk_nvme
>>
>> EAL: PCI device 0000:06:00.0 on NUMA socket 0
>>
>> EAL:   probe driver: 11f8:f117 spdk_nvme
>>
>> nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem
>> nqn.2014-08.org.nvmexpress.discovery on lcore 2 on socket 0
>>
>> nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem
>> nqn.2016-06.io.spdk:cnode1 on lcore 2 on socket 0
>>
>> rdma.c:1145:spdk_nvmf_rdma_create: *NOTICE*: *** RDMA Transport Init ***
>>
>> rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening
>> on 1.1.1.80 port 4420 ***
>>
>> conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block
>> device Nvme0n1 to subsystem nqn.2016-06.io.spdk:cnode1
>>
>> nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem
>> nqn.2016-06.io.spdk:cnode2 on lcore 4 on socket 0
>>
>> rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening
>> on 1.1.2.80 port 4420 ***
>>
>> conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block
>> device Nvme1n1 to subsystem nqn.2016-06.io.spdk:cnode2
>>
>> nvmf_tgt.c: 255:spdk_nvmf_startup: *NOTICE*: Acceptor running on core 2
>> on socket 0
>>
>>
>>
>> --- ini
>>
>> Can see both NQN's on discovery
>>
>> Can connect to both NQN's
>>
>> [adminuser(a)dell730-81 ~]$ sudo nvme discover -t rdma -a 1.1.1.80 -s 4420
>>
>>
>>
>> Discovery Log Number of Records 2, Generation counter 5
>>
>> =====Discovery Log Entry 0======
>>
>> trtype:  rdma
>>
>> adrfam:  ipv4
>>
>> subtype: nvme subsystem
>>
>> treq:    not specified
>>
>> portid:  0
>>
>> trsvcid: 4420
>>
>> subnqn:  nqn.2016-06.io.spdk:cnode1
>>
>> traddr:  1.1.1.80
>>
>> rdma_prtype: not specified
>>
>> rdma_qptype: connected
>>
>> rdma_cms:    rdma-cm
>>
>> rdma_pkey: 0x0000
>>
>> =====Discovery Log Entry 1======
>>
>> trtype:  rdma
>>
>> adrfam:  ipv4
>>
>> subtype: nvme subsystem
>>
>> treq:    not specified
>>
>> portid:  1
>>
>> trsvcid: 4420
>>
>> subnqn:  nqn.2016-06.io.spdk:cnode2
>>
>> traddr:  1.1.2.80
>>
>> rdma_prtype: not specified
>>
>> rdma_qptype: connected
>>
>> rdma_cms:    rdma-cm
>>
>> rdma_pkey: 0x0000
>>
>>
>>
>> sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 1.1.1.80 -w
>> 1.1.1.81
>>
>> sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode2" -a 1.1.2.80 -w
>> 1.1.2.83
>>
>>
>>
>> [adminuser(a)dell730-81 ~]$ sudo ~/gits/spdk/examples/nvme/perf/perf -q 1
>> -s 4096 -w read -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80
>> trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>>
>> Starting DPDK 17.08.0 initialization...
>>
>> [ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3383 ]
>>
>> EAL: Detected 48 lcore(s)
>>
>> EAL: No free hugepages reported in hugepages-1048576kB
>>
>> Initializing NVMe Controllers
>>
>> Attaching to NVMe over Fabrics controller at 1.1.1.80:4420:
>> nqn.2016-06.io.spdk:cnode1
>>
>> Attached to NVMe over Fabrics controller at 1.1.1.80:4420:
>> nqn.2016-06.io.spdk:cnode1
>>
>> Associating SPDK bdev Controller (SPDK00000000000001  ) with lcore 1
>>
>> Initialization complete. Launching workers.
>>
>> Starting thread on core 1
>>
>>
>>
>> [adminuser(a)dell730-83 ~]$ sudo ~/gits/spdk/spdk/examples/nvme/perf/perf
>> -q 1 -s 4096 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.2.80
>> trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode2' -c 0x2
>>
>> Starting DPDK 17.08.0 initialization...
>>
>> [ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3633 ]
>>
>> EAL: Detected 48 lcore(s)
>>
>> EAL: No free hugepages reported in hugepages-1048576kB
>>
>> Initializing NVMe Controllers
>>
>> Attaching to NVMe over Fabrics controller at 1.1.2.80:4420:
>> nqn.2016-06.io.spdk:cnode2
>>
>> Attached to NVMe over Fabrics controller at 1.1.2.80:4420:
>> nqn.2016-06.io.spdk:cnode2
>>
>> Associating SPDK bdev Controller (SPDK00000000000002  ) with lcore 1
>>
>> Initialization complete. Launching workers.
>>
>> Starting thread on core 1
>>
>>
>>
>> --- snippet from my nvmf.conf on target
>>
>>   TransportId "trtype:PCIe traddr:0000:04:00.0" Nvme0
>>
>>   TransportId "trtype:PCIe traddr:0000:06:00.0" Nvme1
>>
>>
>>
>> # Namespaces backed by physical NVMe devices
>>
>> [Subsystem1]
>>
>>   NQN nqn.2016-06.io.spdk:cnode1
>>
>>   Core 2
>>
>>   Listen RDMA 1.1.1.80:4420
>>
>>   AllowAnyHost Yes
>>
>>   SN SPDK00000000000001
>>
>>   Namespace Nvme0n1 1
>>
>>
>>
>> # Namespaces backed by physical NVMe devices
>>
>> [Subsystem2]
>>
>>   NQN nqn.2016-06.io.spdk:cnode2
>>
>>   Core 4
>>
>>   Listen RDMA 1.1.2.80:4420
>>
>>   AllowAnyHost Yes
>>
>>   SN SPDK00000000000002
>>
>>   Namespace Nvme1n1 1
>>
>>
>>
>> On Tue, Sep 12, 2017 at 4:00 PM, Luse, Paul E <paul.e.luse(a)intel.com>
>> wrote:
>>
>> Yeah, thanks John!  FYI Ganesh I didn’t forget about you I’m at SNIA SDC
>> this week.  Cool to see SPDK mentioned in lots of talks including a nice
>> shout out from Sage (Ceph) on SPDK in Bluestore…
>>
>>
>>
>> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
>> Ganesh
>> *Sent:* Tuesday, September 12, 2017 4:29 AM
>>
>>
>> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Subject:* Re: [SPDK] SPDK errors
>>
>>
>>
>> John,
>>
>> Thanks for looking into this.
>>
>> Yes, this is similar.
>>
>> One of the significant difference is ConnectX-4 NIC. My setup uses
>> ConnectX-3 NIC.
>>
>>
>>
>> I don't have access to my lab as it is being physically moved. Hope to
>> get access this or next week.
>>
>>
>>
>>
>>
>> On Tue, Sep 12, 2017 at 5:36 AM, Kariuki, John K <
>> john.k.kariuki(a)intel.com> wrote:
>>
>> Ganesh
>>
>> I am trying to reproduce the issue in my environment without any luck. My
>> environment has 2 CPUs with 22 cores/socket, Mellanox ConnectX-4 Lx NICs,
>> Ubuntu 17.04 with kernel 4.10 and OFED 4.1. I have tried several different
>> QD above 16 (all the way to 256) and perf ran successfully. Here is the
>> workload parameters I am using. Is this similar to what you’re doing to
>> reproduce the issue?
>>
>>
>>
>> ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
>> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>>
>>
>>
>> ./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
>> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>>
>>
>>
>> ./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
>> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>>
>>
>>
>> ./perf -q 128 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
>> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>>
>>
>>
>> ./perf -q 256 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
>> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>>
>>
>>
>>
>>
>>
>>
>> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
>> Ganesh
>> *Sent:* Tuesday, September 05, 2017 4:52 AM
>>
>>
>> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Subject:* Re: [SPDK] SPDK errors
>>
>>
>>
>> Thanks.
>> Traveling. My lab machines are shutdown as lab is being moved this week.
>> When I land, will try to reach out.
>>
>> On Tue, Sep 5, 2017 at 3:40 AM Luse, Paul E <paul.e.luse(a)intel.com>
>> wrote:
>>
>> Hi Ganesh,
>>
>>
>>
>> So I have my hardware installed , the Chelsio cards below, and I believe
>> I’ve got the drivers built and installed correctly and ethtool is showing
>> Link Detected on both ends but I’m not having much luck testing the
>> connection in any other way and its not connecting w/SPDK. I’m far from an
>> expexrt in this area, if you have some time and want to see if you can help
>> me get this working we can use this setup to troubleshoot yours.  Let me
>> know… I’m probably done for today though
>>
>>
>>
>> Thx
>>
>> Paul
>>
>>
>>
>> Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller
>>
>>
>>
>> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
>> Ganesh
>> *Sent:* Friday, September 1, 2017 11:59 AM
>>
>>
>> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Subject:* Re: [SPDK] SPDK errors
>>
>>
>>
>> Update:
>>
>> Just in case...
>>
>> Tried with CentOS 7 (Release: 7.3.1611) with both built in latest kernel
>> (3.10.0-514.26.2.el7.x86_64),
>>
>> and updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.
>>
>>
>>
>> Similar behavior. ;(
>>
>>
>>
>> Maybe, next is to try with MOFED drivers.
>>
>>
>>
>> On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com>
>> wrote:
>>
>> Thanks!
>>
>>
>>
>> Update:
>>
>> Tried using IB mode instead of Eth on the Connectx-3 NIC.
>>
>> No change in behavior. Similar behavior and errors observed.
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com>
>> wrote:
>>
>> Well those are good steps… hopefully someone else will jump in as well. I
>> will see if I can get my HW setup to repro over the long weekend and let ya
>> know how it goes…
>>
>>
>>
>> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
>> Ganesh
>> *Sent:* Wednesday, August 30, 2017 12:44 PM
>> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Subject:* Re: [SPDK] SPDK errors
>>
>>
>>
>> Thanks.
>>
>>
>>
>> I did change a few cables, and even targets. Assuming that this took care
>> of insertion force statistically.
>>
>> Could not do IB instead of Eth as my adapter does not support IB.
>>
>>
>>
>> Also, the error messages are not consistent. That was a snapshot of one
>> of the runs.
>>
>> Then, there are also the older ConnectX-3 adapters (latest FW flashed) with
>> newer kernels and the latest SPDK/DPDK.
>>
>>
>>
>> Seen the following:
>>
>> >nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does
>> not map to outstanding cmd
>>
>> >bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>>
>> >bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but
>> should be 65536
>>
>>
>>
>> On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net>
>> wrote:
>>
>>
>> Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
>> > Folks,
>> > My name is Ganesh, and I am working on NVMe-oF performance metrics using
>> SPDK (and kernel).
>> > I would appreciate your expert insights.
>> >
>> > I am observing errors when QD on perf is increased above >=64 most of
>> the
>> > times. Sometimes, even for <=16
>> > Errors are not consistent.
>> >
>> > Attached are some details.
>> >
>> > Please let me know if you have any additional questions.
>> >
>> > Thanks.
>> > -Ganesh
>> >
>>
>> > SPDK errors 1.txt
>> >
>> >
>> > Setup details:
>> > -- Some info on setup
>> > Same HW/SW on target and initiator.
>> >
>> > adminuser(a)dell730-80:~> hostnamectl
>> >    Static hostname: dell730-80
>> >          Icon name: computer-server
>> >            Chassis: server
>> >         Machine ID: b5abb0fe67afd04c59521c40599b3115
>> >            Boot ID: f825aa6338194338a6f80125caa836c7
>> >   Operating System: openSUSE Leap 42.3
>> >        CPE OS Name: cpe:/o:opensuse:leap:42.3
>> >             Kernel: Linux 4.12.8-1.g4d7933a-default
>> >       Architecture: x86-64
>> >
>> > adminuser(a)dell730-80:~> lscpu | grep -i socket
>> > Core(s) per socket:    12
>> > Socket(s):             2
>> >
>> > 2MB and/or 1GB huge pages set,
>> >
>> > Latest spdk/dpdk from respective GIT,
>> >
>> > compiled with RDMA flag,
>> >
>> > nvmf.conf file: (have played around with the values)
>> > reactor mask 0x5555
>> > AcceptorCore 2
>> > 1 - 3 Subsystems on cores 4,8,10
>> >
>> > adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c
>> gits/spdk/etc/spdk/nvmf.conf -p 6
>> >
>> > PCI, NVME cards (16GB)
>> > adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
>> > 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117
>> (rev 06)
>> > 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117
>> (rev 06)
>> > 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117
>> (rev 06)
>> >
>> > Network cards: (latest associated FW from vendor)
>> > adminuser(a)dell730-80:~> sudo lspci | grep -i connect
>> > 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family
>> [ConnectX-3 Pro]
>> >
>> > --- initiator cmd line
>> > sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
>> traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>> >
>> > --errors on stdout on target
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll:
>> *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12):
>> transport retry counter exceeded
>> >
>> > --- errors seen on client
>> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ
>> error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry
>> counter exceeded
>> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ
>> error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request
>> Flushed Error
>> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ
>> error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed
>> Error
>> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ
>> error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed
>> Error
>>
>> These might actually be HW errors, because retries are supposed to be
>> engaged only on
>> packet loss/corruption. It might be bad or not fully seated cables.
>>
>> Vlad
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
>>
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
>>
>>
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
>>
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
>>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 41774 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-11-01 23:00 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-11-01 23:00 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 20124 bytes --]

>What did you do to get past your previous issues?
Same CX-3 and HW.
Just the latest SW/distro: CentOS, the latest (as of my install) kernel-ml, and the
latest SPDK/DPDK.

Have got some CX-5's now. Not yet installed.
Will try to get some CX-4's.

Thanks.
Ganesh


On Wed, Nov 1, 2017 at 3:52 PM, Luse, Paul E <paul.e.luse(a)intel.com> wrote:

> Hi Ganesh,
>
>
>
> That’s progress at least.  What did you do to get past your previous
> issues?  FYI I finally broke down and bought some CX-4’s; I just got them
> today actually, so I will work on trying to get them up and running by the
> end of next week.
>
>
>
> In the meantime, I’m sure John or someone else can assist with the
> questions below.
>
>
>
> Thx
>
> Paul
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Wednesday, November 1, 2017 3:10 PM
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Paul and John,
>
> Back to some perf work.
>
> As always, thanks for your time and assistance in advance.
>
>
>
> Update:
>
> Able to run perf (ini to target) consistently with updated CentOS,
> kernel-ml and latest SPDK/DPDK.
>
> Same HW: CX-3 and PMC NVME, PCI card.
>
> Able to get 1M IOPS (4KB) on generic HW.
>
> [adminuser(a)dell730-80 ~]$ uname -r
>
> 4.13.9-1.el7.elrepo.x86_64
>
> [adminuser(a)dell730-80 ~]$ lsb_release -d
>
> Description:    CentOS Linux release 7.4.1708 (Core)
>
>
>
> *Issue being faced now:*
>
> Can run perf at one initiator at a time.
>
> Unable to run perf from multiple initiators to target at the same time.
> Able to discover and connect.
>
> Just "hangs" on initiator side when running perf. Don't seem to see any
> messages anywhere.
>
> Maybe, I am missing some config to be done.
>
>
>
> Added info below. Let me know any more info you need.
>
>
>
> ---
>
> Initiators connected to target NIC ports on separate subnets
>
> Target has a dual-port CX-3 NIC with IP addresses on separate subnets.
>
> 1.1.1.80, 1.2.2.80 - Target
>
> 1.1.1.81 - Ini 1
>
> 1.2.2.83 - Ini 2
>
>
>
> -- target
>
> [adminuser(a)dell730-80 ~]$ sudo ~/gits/spdk/app/nvmf_tgt/nvmf_tgt -c
> ~/gits/spdk/etc/spdk/nvmf.conf
>
> Starting DPDK 17.08.0 initialization...
>
> [ DPDK EAL parameters: nvmf -c 0x5555 --file-prefix=spdk_pid3534 ]
>
> EAL: Detected 48 lcore(s)
>
> EAL: No free hugepages reported in hugepages-1048576kB
>
> EAL: Probing VFIO support...
>
> Total cores available: 8
>
> Occupied cpu socket mask is 0x1
>
> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 2 on
> socket 0
>
> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 4 on
> socket 0
>
> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 6 on
> socket 0
>
> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 8 on
> socket 0
>
> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 10 on
> socket 0
>
> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 12 on
> socket 0
>
> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 14 on
> socket 0
>
> reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on
> socket 0
>
> copy_engine_ioat.c: 306:copy_engine_ioat_init: *NOTICE*: Ioat Copy Engine
> Offload Enabled
>
> EAL: PCI device 0000:04:00.0 on NUMA socket 0
>
> EAL:   probe driver: 11f8:f117 spdk_nvme
>
> EAL: PCI device 0000:06:00.0 on NUMA socket 0
>
> EAL:   probe driver: 11f8:f117 spdk_nvme
>
> nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem
> nqn.2014-08.org.nvmexpress.discovery on lcore 2 on socket 0
>
> nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem
> nqn.2016-06.io.spdk:cnode1 on lcore 2 on socket 0
>
> rdma.c:1145:spdk_nvmf_rdma_create: *NOTICE*: *** RDMA Transport Init ***
>
> rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening on
> 1.1.1.80 port 4420 ***
>
> conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block
> device Nvme0n1 to subsystem nqn.2016-06.io.spdk:cnode1
>
> nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem
> nqn.2016-06.io.spdk:cnode2 on lcore 4 on socket 0
>
> rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening on
> 1.1.2.80 port 4420 ***
>
> conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block
> device Nvme1n1 to subsystem nqn.2016-06.io.spdk:cnode2
>
> nvmf_tgt.c: 255:spdk_nvmf_startup: *NOTICE*: Acceptor running on core 2 on
> socket 0
>
>
>
> --- ini
>
> Can see both NQN's on discovery
>
> Can connect to both NQN's
>
> [adminuser(a)dell730-81 ~]$ sudo nvme discover -t rdma -a 1.1.1.80 -s 4420
>
>
>
> Discovery Log Number of Records 2, Generation counter 5
>
> =====Discovery Log Entry 0======
>
> trtype:  rdma
>
> adrfam:  ipv4
>
> subtype: nvme subsystem
>
> treq:    not specified
>
> portid:  0
>
> trsvcid: 4420
>
> subnqn:  nqn.2016-06.io.spdk:cnode1
>
> traddr:  1.1.1.80
>
> rdma_prtype: not specified
>
> rdma_qptype: connected
>
> rdma_cms:    rdma-cm
>
> rdma_pkey: 0x0000
>
> =====Discovery Log Entry 1======
>
> trtype:  rdma
>
> adrfam:  ipv4
>
> subtype: nvme subsystem
>
> treq:    not specified
>
> portid:  1
>
> trsvcid: 4420
>
> subnqn:  nqn.2016-06.io.spdk:cnode2
>
> traddr:  1.1.2.80
>
> rdma_prtype: not specified
>
> rdma_qptype: connected
>
> rdma_cms:    rdma-cm
>
> rdma_pkey: 0x0000
>
>
>
> sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 1.1.1.80 -w
> 1.1.1.81
>
> sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode2" -a 1.1.2.80 -w
> 1.1.2.83
>
>
>
> [adminuser(a)dell730-81 ~]$ sudo ~/gits/spdk/examples/nvme/perf/perf -q 1
> -s 4096 -w read -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80
> trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>
> Starting DPDK 17.08.0 initialization...
>
> [ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3383 ]
>
> EAL: Detected 48 lcore(s)
>
> EAL: No free hugepages reported in hugepages-1048576kB
>
> Initializing NVMe Controllers
>
> Attaching to NVMe over Fabrics controller at 1.1.1.80:4420:
> nqn.2016-06.io.spdk:cnode1
>
> Attached to NVMe over Fabrics controller at 1.1.1.80:4420:
> nqn.2016-06.io.spdk:cnode1
>
> Associating SPDK bdev Controller (SPDK00000000000001  ) with lcore 1
>
> Initialization complete. Launching workers.
>
> Starting thread on core 1
>
>
>
> [adminuser(a)dell730-83 ~]$ sudo ~/gits/spdk/spdk/examples/nvme/perf/perf
> -q 1 -s 4096 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.2.80
> trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode2' -c 0x2
>
> Starting DPDK 17.08.0 initialization...
>
> [ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3633 ]
>
> EAL: Detected 48 lcore(s)
>
> EAL: No free hugepages reported in hugepages-1048576kB
>
> Initializing NVMe Controllers
>
> Attaching to NVMe over Fabrics controller at 1.1.2.80:4420:
> nqn.2016-06.io.spdk:cnode2
>
> Attached to NVMe over Fabrics controller at 1.1.2.80:4420:
> nqn.2016-06.io.spdk:cnode2
>
> Associating SPDK bdev Controller (SPDK00000000000002  ) with lcore 1
>
> Initialization complete. Launching workers.
>
> Starting thread on core 1
>
>
>
> --- snippet from my nvmf.conf on target
>
>   TransportId "trtype:PCIe traddr:0000:04:00.0" Nvme0
>
>   TransportId "trtype:PCIe traddr:0000:06:00.0" Nvme1
>
>
>
> # Namespaces backed by physical NVMe devices
>
> [Subsystem1]
>
>   NQN nqn.2016-06.io.spdk:cnode1
>
>   Core 2
>
>   Listen RDMA 1.1.1.80:4420
>
>   AllowAnyHost Yes
>
>   SN SPDK00000000000001
>
>   Namespace Nvme0n1 1
>
>
>
> # Namespaces backed by physical NVMe devices
>
> [Subsystem2]
>
>   NQN nqn.2016-06.io.spdk:cnode2
>
>   Core 4
>
>   Listen RDMA 1.1.2.80:4420
>
>   AllowAnyHost Yes
>
>   SN SPDK00000000000002
>
>   Namespace Nvme1n1 1
>
>
>
> On Tue, Sep 12, 2017 at 4:00 PM, Luse, Paul E <paul.e.luse(a)intel.com>
> wrote:
>
> Yeah, thanks John!  FYI Ganesh I didn’t forget about you I’m at SNIA SDC
> this week.  Cool to see SPDK mentioned in lots of talks including a nice
> shout out from Sage (Ceph) on SPDK in Bluestore…
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Tuesday, September 12, 2017 4:29 AM
>
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> John,
>
> Thanks for looking into this.
>
> Yes, this is similar.
>
> One of the significant differences is the ConnectX-4 NIC. My setup uses a
> ConnectX-3 NIC.
>
>
>
> I don't have access to my lab as it is being physically moved. Hope to get
> access this or next week.
>
>
>
>
>
> On Tue, Sep 12, 2017 at 5:36 AM, Kariuki, John K <john.k.kariuki(a)intel.com>
> wrote:
>
> Ganesh
>
> I am trying to reproduce the issue in my environment without any luck. My
> environment has 2 CPUs with 22 cores/socket, Mellanox ConnectX-4 Lx NICs,
> Ubuntu 17.04 with kernel 4.10 and OFED 4.1. I have tried several different
> QD above 16 (all the way to 256) and perf ran successfully. Here is the
> workload parameters I am using. Is this similar to what you’re doing to
> reproduce the issue?
>
>
>
> ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 128 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 256 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
>
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Tuesday, September 05, 2017 4:52 AM
>
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Thanks.
> Traveling. My lab machines are shutdown as lab is being moved this week.
> When I land, will try to reach out.
>
> On Tue, Sep 5, 2017 at 3:40 AM Luse, Paul E <paul.e.luse(a)intel.com> wrote:
>
> Hi Ganesh,
>
>
>
> So I have my hardware installed, the Chelsio cards below, and I believe
> I’ve got the drivers built and installed correctly and ethtool is showing
> Link Detected on both ends but I’m not having much luck testing the
> connection in any other way and it’s not connecting w/SPDK. I’m far from an
> expert in this area, so if you have some time and want to see if you can help
> me get this working we can use this setup to troubleshoot yours.  Let me
> know… I’m probably done for today though
>
>
>
> Thx
>
> Paul
>
>
>
> Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Friday, September 1, 2017 11:59 AM
>
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Update:
>
> Just in case...
>
> Tried with CentOS 7 (Release: 7.3.1611) with both built in latest kernel
> (3.10.0-514.26.2.el7.x86_64),
>
> and updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.
>
>
>
> Similar behavior. ;(
>
>
>
> Maybe, next is to try with MOFED drivers.
>
>
>
> On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com>
> wrote:
>
> Thanks!
>
>
>
> Update:
>
> Tried using IB mode instead of Eth on the Connectx-3 NIC.
>
> No change in behavior. Similar behavior and errors observed.
>
>
>
>
>
>
>
> On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com>
> wrote:
>
> Well those are good steps… hopefully someone else will jump in as well. I
> will see if I can get my HW setup to repro over the long weekend and let ya
> know how it goes…
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Wednesday, August 30, 2017 12:44 PM
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Thanks.
>
>
>
> I did change a few cables, and even targets. Assuming that this took care
> of insertion force statistically.
>
> Could not do IB instead of Eth as my adapter does not support IB.
>
>
>
> Also, the error messages are not consistent. That was a snapshot of one of
> the runs.
>
> Then, there are also the older ConnectX-3 adapters (latest FW flashed) with
> newer kernels and the latest SPDK/DPDK.
>
>
>
> Seen the following:
>
> >nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does
> not map to outstanding cmd
>
> >bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>
> >bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but
> should be 65536
>
>
>
> On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net>
> wrote:
>
>
> Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> > Folks,
> > My name is Ganesh, and I am working on NVMe-oF performance metrics using
> SPDK (and kernel).
> > I would appreciate your expert insights.
> >
> > I am observing errors when QD on perf is increased above >=64 most of the
> > times. Sometimes, even for <=16
> > Errors are not consistent.
> >
> > Attached are some details.
> >
> > Please let me know if you have any additional questions.
> >
> > Thanks.
> > -Ganesh
> >
>
> > SPDK errors 1.txt
> >
> >
> > Setup details:
> > -- Some info on setup
> > Same HW/SW on target and initiator.
> >
> > adminuser(a)dell730-80:~> hostnamectl
> >    Static hostname: dell730-80
> >          Icon name: computer-server
> >            Chassis: server
> >         Machine ID: b5abb0fe67afd04c59521c40599b3115
> >            Boot ID: f825aa6338194338a6f80125caa836c7
> >   Operating System: openSUSE Leap 42.3
> >        CPE OS Name: cpe:/o:opensuse:leap:42.3
> >             Kernel: Linux 4.12.8-1.g4d7933a-default
> >       Architecture: x86-64
> >
> > adminuser(a)dell730-80:~> lscpu | grep -i socket
> > Core(s) per socket:    12
> > Socket(s):             2
> >
> > 2MB and/or 1GB huge pages set,
> >
> > Latest spdk/dpdk from respective GIT,
> >
> > compiled with RDMA flag,
> >
> > nvmf.conf file: (have played around with the values)
> > reactor mask 0x5555
> > AcceptorCore 2
> > 1 - 3 Subsystems on cores 4,8,10
> >
> > adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c
> gits/spdk/etc/spdk/nvmf.conf -p 6
> >
> > PCI, NVME cards (16GB)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> > 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> >
> > Network cards: (latest associated FW from vendor)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> > 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family
> [ConnectX-3 Pro]
> >
> > --- initiator cmd line
> > sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
> >
> > --errors on stdout on target
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll:
> *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12):
> transport retry counter exceeded
> >
> > --- errors seen on client
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter
> exceeded
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed
> Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error
>
> These might actually be HW errors, because retries are supposed to be engaged
> only on
> packet loss/corruption. It might be bad or not fully seated cables.
>
> Vlad
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 41256 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-11-01 22:52 Luse, Paul E
  0 siblings, 0 replies; 22+ messages in thread
From: Luse, Paul E @ 2017-11-01 22:52 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 18530 bytes --]

Hi Ganesh,

That’s progress at least.  What did you do to get past your previous issues?  FYI I finally broke down and bought some CX-4’s; I just got them today actually, so I will work on trying to get them up and running by the end of next week.

In the meantime, I’m sure John or someone else can assist with the questions below.

Thx
Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Wednesday, November 1, 2017 3:10 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Paul and John,
Back to some perf work.
As always, thanks for your time and assistance in advance.

Update:
Able to run perf (ini to target) consistently with updated CentOS, kernel-ml and latest SPDK/DPDK.
Same HW: CX-3 and PMC NVME, PCI card.
Able to get 1M IOPS (4KB) on generic HW.
[adminuser(a)dell730-80 ~]$ uname -r
4.13.9-1.el7.elrepo.x86_64
[adminuser(a)dell730-80 ~]$ lsb_release -d
Description:    CentOS Linux release 7.4.1708 (Core)

Issue being faced now:
Can run perf at one initiator at a time.
Unable to run perf from multiple initiators to target at the same time. Able to discover and connect.
Just "hangs" on initiator side when running perf. Don't seem to see any messages anywhere.
Maybe, I am missing some config to be done.

Added info below. Let me know any more info you need.

---
Initiators connected to target NIC ports on separate subnets
Target has a dual-port CX-3 NIC with IP addresses on separate subnets.
1.1.1.80, 1.2.2.80 - Target
1.1.1.81 - Ini 1
1.2.2.83 - Ini 2

-- target
[adminuser(a)dell730-80 ~]$ sudo ~/gits/spdk/app/nvmf_tgt/nvmf_tgt -c ~/gits/spdk/etc/spdk/nvmf.conf
Starting DPDK 17.08.0 initialization...
[ DPDK EAL parameters: nvmf -c 0x5555 --file-prefix=spdk_pid3534 ]
EAL: Detected 48 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
Total cores available: 8
Occupied cpu socket mask is 0x1
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 2 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 4 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 6 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 8 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 10 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 12 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 14 on socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on socket 0
copy_engine_ioat.c: 306:copy_engine_ioat_init: *NOTICE*: Ioat Copy Engine Offload Enabled
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 11f8:f117 spdk_nvme
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 11f8:f117 spdk_nvme
nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem nqn.2014-08.org.nvmexpress.discovery on lcore 2 on socket 0
nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem nqn.2016-06.io.spdk:cnode1 on lcore 2 on socket 0
rdma.c:1145:spdk_nvmf_rdma_create: *NOTICE*: *** RDMA Transport Init ***
rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening on 1.1.1.80 port 4420 ***
conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block device Nvme0n1 to subsystem nqn.2016-06.io.spdk:cnode1
nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem nqn.2016-06.io.spdk:cnode2 on lcore 4 on socket 0
rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening on 1.1.2.80 port 4420 ***
conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block device Nvme1n1 to subsystem nqn.2016-06.io.spdk:cnode2
nvmf_tgt.c: 255:spdk_nvmf_startup: *NOTICE*: Acceptor running on core 2 on socket 0

--- ini
Can see both NQN's on discovery
Can connect to both NQN's
[adminuser(a)dell730-81 ~]$ sudo nvme discover -t rdma -a 1.1.1.80 -s 4420

Discovery Log Number of Records 2, Generation counter 5
=====Discovery Log Entry 0======
trtype:  rdma
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified
portid:  0
trsvcid: 4420
subnqn:  nqn.2016-06.io.spdk:cnode1
traddr:  1.1.1.80
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms:    rdma-cm
rdma_pkey: 0x0000
=====Discovery Log Entry 1======
trtype:  rdma
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified
portid:  1
trsvcid: 4420
subnqn:  nqn.2016-06.io.spdk:cnode2
traddr:  1.1.2.80
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms:    rdma-cm
rdma_pkey: 0x0000

sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 1.1.1.80 -w 1.1.1.81
sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode2" -a 1.1.2.80 -w 1.1.2.83

[adminuser(a)dell730-81 ~]$ sudo ~/gits/spdk/examples/nvme/perf/perf -q 1 -s 4096 -w read -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
Starting DPDK 17.08.0 initialization...
[ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3383 ]
EAL: Detected 48 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 1.1.1.80:4420<http://1.1.1.80:4420>: nqn.2016-06.io.spdk:cnode1
Attached to NVMe over Fabrics controller at 1.1.1.80:4420<http://1.1.1.80:4420>: nqn.2016-06.io.spdk:cnode1
Associating SPDK bdev Controller (SPDK00000000000001  ) with lcore 1
Initialization complete. Launching workers.
Starting thread on core 1

[adminuser(a)dell730-83 ~]$ sudo ~/gits/spdk/spdk/examples/nvme/perf/perf -q 1 -s 4096 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.2.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode2' -c 0x2
Starting DPDK 17.08.0 initialization...
[ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3633 ]
EAL: Detected 48 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 1.1.2.80:4420<http://1.1.2.80:4420>: nqn.2016-06.io.spdk:cnode2
Attached to NVMe over Fabrics controller at 1.1.2.80:4420<http://1.1.2.80:4420>: nqn.2016-06.io.spdk:cnode2
Associating SPDK bdev Controller (SPDK00000000000002  ) with lcore 1
Initialization complete. Launching workers.
Starting thread on core 1

--- snippet from my nvmf.conf on target
  TransportId "trtype:PCIe traddr:0000:04:00.0" Nvme0
  TransportId "trtype:PCIe traddr:0000:06:00.0" Nvme1

# Namespaces backed by physical NVMe devices
[Subsystem1]
  NQN nqn.2016-06.io.spdk:cnode1
  Core 2
  Listen RDMA 1.1.1.80:4420<http://1.1.1.80:4420>
  AllowAnyHost Yes
  SN SPDK00000000000001
  Namespace Nvme0n1 1

# Namespaces backed by physical NVMe devices
[Subsystem2]
  NQN nqn.2016-06.io.spdk:cnode2
  Core 4
  Listen RDMA 1.1.2.80:4420<http://1.1.2.80:4420>
  AllowAnyHost Yes
  SN SPDK00000000000002
  Namespace Nvme1n1 1

On Tue, Sep 12, 2017 at 4:00 PM, Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse(a)intel.com>> wrote:
Yeah, thanks John!  FYI Ganesh I didn’t forget about you I’m at SNIA SDC this week.  Cool to see SPDK mentioned in lots of talks including a nice shout out from Sage (Ceph) on SPDK in Bluestore…

From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, September 12, 2017 4:29 AM

To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

John,
Thanks for looking into this.
Yes, this is similar.
One of the significant differences is the ConnectX-4 NIC. My setup uses a ConnectX-3 NIC.

I don't have access to my lab as it is being physically moved. Hope to get access this or next week.


On Tue, Sep 12, 2017 at 5:36 AM, Kariuki, John K <john.k.kariuki(a)intel.com<mailto:john.k.kariuki(a)intel.com>> wrote:
Ganesh
I am trying to reproduce the issue in my environment without any luck. My environment has 2 CPUs with 22 cores/socket, Mellanox ConnectX-4 Lx NICs, Ubuntu 17.04 with kernel 4.10 and OFED 4.1. I have tried several different QD above 16 (all the way to 256) and perf ran successfully. Here is the workload parameters I am using. Is this similar to what you’re doing to reproduce the issue?

./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 128 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 256 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'



From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, September 05, 2017 4:52 AM

To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

Thanks.
Traveling. My lab machines are shutdown as lab is being moved this week.
When I land, will try to reach out.
On Tue, Sep 5, 2017 at 3:40 AM Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse(a)intel.com>> wrote:
Hi Ganesh,

So I have my hardware installed, the Chelsio cards below, and I believe I’ve got the drivers built and installed correctly; ethtool is showing Link Detected on both ends, but I’m not having much luck testing the connection in any other way and it’s not connecting w/SPDK. I’m far from an expert in this area, so if you have some time and want to see if you can help me get this working, we can use this setup to troubleshoot yours.  Let me know… I’m probably done for today though

Thx
Paul

Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller

From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Friday, September 1, 2017 11:59 AM

To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

Update:
Just in case...
Tried with CentOS 7 (Release: 7.3.1611) with both built in latest kernel (3.10.0-514.26.2.el7.x86_64),
and updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.

Similar behavior. ;(

Maybe, next is to try with MOFED drivers.

On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com<mailto:svg.eid(a)gmail.com>> wrote:
Thanks!

Update:
Tried using IB mode instead of Eth on the Connectx-3 NIC.
No change in behavior. Similar behavior and errors observed.



On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse(a)intel.com>> wrote:
Well those are good steps… hopefully someone else will jump in as well. I will see if I can get my HW setup to repro over the long weekend and let ya know how it goes…

From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Wednesday, August 30, 2017 12:44 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

Thanks.

I did change a few cables, and even targets. Assuming that this took care of insertion force statistically.
Could not do IB instead of Eth as my adapter does not support IB.

Also, the error messages are not consistent. That was a snapshot of one of the runs.
> Then, there are also the older ConnectX-3 adapters (latest FW flashed) with newer kernels and the latest SPDK/DPDK.

Seen the following:
>nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
>bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but should be 65536

On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net<mailto:vst(a)vlnb.net>> wrote:

Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> Folks,
> My name is Ganesh, and I am working on NVMe-oF performance metrics using SPDK (and kernel).
> I would appreciate your expert insights.
>
> I am observing errors when QD on perf is increased above >=64 most of the
> times. Sometimes, even for <=16
> Errors are not consistent.
>
> Attached are some details.
>
> Please let me know if you have any additional questions.
>
> Thanks.
> -Ganesh
>
> SPDK errors 1.txt
>
>
> Setup details:
> -- Some info on setup
> Same HW/SW on target and initiator.
>
> adminuser(a)dell730-80:~> hostnamectl
>    Static hostname: dell730-80
>          Icon name: computer-server
>            Chassis: server
>         Machine ID: b5abb0fe67afd04c59521c40599b3115
>            Boot ID: f825aa6338194338a6f80125caa836c7
>   Operating System: openSUSE Leap 42.3
>        CPE OS Name: cpe:/o:opensuse:leap:42.3
>             Kernel: Linux 4.12.8-1.g4d7933a-default
>       Architecture: x86-64
>
> adminuser(a)dell730-80:~> lscpu | grep -i socket
> Core(s) per socket:    12
> Socket(s):             2
>
> 2MB and/or 1GB huge pages set,
>
> Latest spdk/dpdk from respective GIT,
>
> compiled with RDMA flag,
>
> nvmf.conf file: (have played around with the values)
> reactor mask 0x5555
> AcceptorCore 2
> 1 - 3 Subsystems on cores 4,8,10
>
> adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c gits/spdk/etc/spdk/nvmf.conf -p 6
>
> PCI, NVME cards (16GB)
> adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
>
> Network cards: (latest associated FW from vendor)
> adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> --- initiator cmd line
> sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>
> --errors on stdout on target
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll: *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12): transport retry counter exceeded
>
> --- errors seen on client
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter exceeded
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

These might actually be HW errors, because retries are supposed to be engaged only on
packet loss/corruption. It might be bad or not fully seated cables.

Vlad

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 47115 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-11-01 22:10 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-11-01 22:10 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 18203 bytes --]

Paul and John,
Back to some perf work.
As always, thanks for your time and assistance in advance.

Update:
Able to run perf (ini to target) consistently with updated CentOS,
kernel-ml and latest SPDK/DPDK.
Same HW: CX-3 and PMC NVME, PCI card.
Able to get 1M IOPS (4KB) on generic HW.
[adminuser(a)dell730-80 ~]$ uname -r
4.13.9-1.el7.elrepo.x86_64
[adminuser(a)dell730-80 ~]$ lsb_release -d
Description: CentOS Linux release 7.4.1708 (Core)

*Issue being faced now:*
Can run perf at one initiator at a time.
Unable to run perf from multiple initiators to target at the same time.
Able to discover and connect.
Just "hangs" on initiator side when running perf. Don't seem to see any
messages anywhere.
Maybe, I am missing some config to be done.

Added info below. Let me know any more info you need.

---
Initiators connected to target NIC ports on separate subnets
Target has a dual-port CX-3 NIC with IP addresses on separate subnets.
1.1.1.80, 1.2.2.80 - Target
1.1.1.81 - Ini 1
1.2.2.83 - Ini 2
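
One sanity check worth doing before running perf from both initiators at once — a sketch only, assuming rping from rdma-core/librdmacm-utils is installed on all three hosts; note that the conf and connect commands below use 1.1.2.80/1.1.2.83 for the second subnet, so substitute whichever addressing is actually configured:

# On the target, answer on the first subnet, then on the second (10 pings each):
rping -s -a 1.1.1.80 -v -C 10
rping -s -a 1.1.2.80 -v -C 10
# From Ini 1 (1.1.1.81):
rping -c -a 1.1.1.80 -v -C 10
# From Ini 2 (1.1.2.83):
rping -c -a 1.1.2.80 -v -C 10
# If either direction stalls or errors here, the hang is likely below SPDK.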

-- target
[adminuser(a)dell730-80 ~]$ sudo ~/gits/spdk/app/nvmf_tgt/nvmf_tgt -c
~/gits/spdk/etc/spdk/nvmf.conf
Starting DPDK 17.08.0 initialization...
[ DPDK EAL parameters: nvmf -c 0x5555 --file-prefix=spdk_pid3534 ]
EAL: Detected 48 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
Total cores available: 8
Occupied cpu socket mask is 0x1
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 2 on
socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 4 on
socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 6 on
socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 8 on
socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 10 on
socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 12 on
socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 14 on
socket 0
reactor.c: 364:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on
socket 0
copy_engine_ioat.c: 306:copy_engine_ioat_init: *NOTICE*: Ioat Copy Engine
Offload Enabled
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 11f8:f117 spdk_nvme
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 11f8:f117 spdk_nvme
nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem
nqn.2014-08.org.nvmexpress.discovery on lcore 2 on socket 0
nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem
nqn.2016-06.io.spdk:cnode1 on lcore 2 on socket 0
rdma.c:1145:spdk_nvmf_rdma_create: *NOTICE*: *** RDMA Transport Init ***
rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening on
1.1.1.80 port 4420 ***
conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block device
Nvme0n1 to subsystem nqn.2016-06.io.spdk:cnode1
nvmf_tgt.c: 178:nvmf_tgt_create_subsystem: *NOTICE*: allocated subsystem
nqn.2016-06.io.spdk:cnode2 on lcore 4 on socket 0
rdma.c:1352:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMf Target Listening on
1.1.2.80 port 4420 ***
conf.c: 500:spdk_nvmf_construct_subsystem: *NOTICE*: Attaching block device
Nvme1n1 to subsystem nqn.2016-06.io.spdk:cnode2
nvmf_tgt.c: 255:spdk_nvmf_startup: *NOTICE*: Acceptor running on core 2 on
socket 0

--- ini
Can see both NQN's on discovery
Can connect to both NQN's
[adminuser(a)dell730-81 ~]$ sudo nvme discover -t rdma -a 1.1.1.80 -s 4420

Discovery Log Number of Records 2, Generation counter 5
=====Discovery Log Entry 0======
trtype:  rdma
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified
portid:  0
trsvcid: 4420
subnqn:  nqn.2016-06.io.spdk:cnode1
traddr:  1.1.1.80
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms:    rdma-cm
rdma_pkey: 0x0000
=====Discovery Log Entry 1======
trtype:  rdma
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified
portid:  1
trsvcid: 4420
subnqn:  nqn.2016-06.io.spdk:cnode2
traddr:  1.1.2.80
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms:    rdma-cm
rdma_pkey: 0x0000

sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 1.1.1.80 -w
1.1.1.81
sudo nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode2" -a 1.1.2.80 -w
1.1.2.83
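
If the kernel connections above are just for verification, it may be worth disconnecting them before launching the SPDK perf runs, so the kernel initiator and the userspace initiator are not both holding queue pairs to the same subsystems — only a guess at a contributing factor, not a confirmed one; a minimal sketch:

# List the kernel NVMe-oF sessions created by the connect commands above
sudo nvme list
# Tear them down by NQN before starting SPDK perf
sudo nvme disconnect -n nqn.2016-06.io.spdk:cnode1
sudo nvme disconnect -n nqn.2016-06.io.spdk:cnode2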

[adminuser(a)dell730-81 ~]$ sudo ~/gits/spdk/examples/nvme/perf/perf -q 1 -s
4096 -w read -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420
subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
Starting DPDK 17.08.0 initialization...
[ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3383 ]
EAL: Detected 48 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 1.1.1.80:4420:
nqn.2016-06.io.spdk:cnode1
Attached to NVMe over Fabrics controller at 1.1.1.80:4420:
nqn.2016-06.io.spdk:cnode1
Associating SPDK bdev Controller (SPDK00000000000001  ) with lcore 1
Initialization complete. Launching workers.
Starting thread on core 1

[adminuser(a)dell730-83 ~]$ sudo ~/gits/spdk/spdk/examples/nvme/perf/perf -q
1 -s 4096 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.2.80
trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode2' -c 0x2
Starting DPDK 17.08.0 initialization...
[ DPDK EAL parameters: perf -c 0x2 --no-pci --file-prefix=spdk_pid3633 ]
EAL: Detected 48 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 1.1.2.80:4420:
nqn.2016-06.io.spdk:cnode2
Attached to NVMe over Fabrics controller at 1.1.2.80:4420:
nqn.2016-06.io.spdk:cnode2
Associating SPDK bdev Controller (SPDK00000000000002  ) with lcore 1
Initialization complete. Launching workers.
Starting thread on core 1

--- snippet from my nvmf.conf on target
  TransportId "trtype:PCIe traddr:0000:04:00.0" Nvme0
  TransportId "trtype:PCIe traddr:0000:06:00.0" Nvme1

# Namespaces backed by physical NVMe devices
[Subsystem1]
  NQN nqn.2016-06.io.spdk:cnode1
  Core 2
  Listen RDMA 1.1.1.80:4420
  AllowAnyHost Yes
  SN SPDK00000000000001
  Namespace Nvme0n1 1

# Namespaces backed by physical NVMe devices
[Subsystem2]
  NQN nqn.2016-06.io.spdk:cnode2
  Core 4
  Listen RDMA 1.1.2.80:4420
  AllowAnyHost Yes
  SN SPDK00000000000002
  Namespace Nvme1n1 1
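
To make the simultaneous run reproducible, and to capture whatever each side prints when it hangs, something like the sketch below could be driven from a third box; it assumes passwordless sudo on both initiators, the binary paths shown above, and an arbitrary example queue depth of 32:

# Kick off both perf instances at (roughly) the same time and keep the logs
ssh adminuser@dell730-81 "sudo ~/gits/spdk/examples/nvme/perf/perf -q 32 -s 4096 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2" > ini1.log 2>&1 &
ssh adminuser@dell730-83 "sudo ~/gits/spdk/spdk/examples/nvme/perf/perf -q 32 -s 4096 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.2.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode2' -c 0x2" > ini2.log 2>&1 &
wait
# While they run, watch the target console/journal for the rdma.c or nvme_pcie.c errors reported earlier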

On Tue, Sep 12, 2017 at 4:00 PM, Luse, Paul E <paul.e.luse(a)intel.com> wrote:

> Yeah, thanks John!  FYI Ganesh I didn’t forget about you I’m at SNIA SDC
> this week.  Cool to see SPDK mentioned in lots of talks including a nice
> shout out from Sage (Ceph) on SPDK in Bluestore…
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Tuesday, September 12, 2017 4:29 AM
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> John,
>
> Thanks for looking into this.
>
> Yes, this is similar.
>
> One of the significant differences is the ConnectX-4 NIC. My setup uses a
> ConnectX-3 NIC.
>
>
>
> I don't have access to my lab as it is being physically moved. Hope to get
> access this or next week.
>
>
>
>
>
> On Tue, Sep 12, 2017 at 5:36 AM, Kariuki, John K <john.k.kariuki(a)intel.com>
> wrote:
>
> Ganesh
>
> I am trying to reproduce the issue in my environment without any luck. My
> environment has 2 CPUs with 22 cores/socket, Mellanox ConnectX-4 Lx NICs,
> Ubuntu 17.04 with kernel 4.10 and OFED 4.1. I have tried several different
> QD above 16 (all the way to 256) and perf ran successfully. Here is the
> workload parameters I am using. Is this similar to what you’re doing to
> reproduce the issue?
>
>
>
> ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 128 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 256 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
>
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Tuesday, September 05, 2017 4:52 AM
>
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Thanks.
> Traveling. My lab machines are shutdown as lab is being moved this week.
> When I land, will try to reach out.
>
> On Tue, Sep 5, 2017 at 3:40 AM Luse, Paul E <paul.e.luse(a)intel.com> wrote:
>
> Hi Ganesh,
>
>
>
> So I have my hardware installed, the Chelsio cards below, and I believe
> I’ve got the drivers built and installed correctly and ethtool is showing
> Link Detected on both ends but I’m not having much luck testing the
> connection in any other way and it’s not connecting w/SPDK. I’m far from an
> expert in this area, so if you have some time and want to see if you can help
> me get this working we can use this setup to troubleshoot yours.  Let me
> know… I’m probably done for today though
>
>
>
> Thx
>
> Paul
>
>
>
> Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Friday, September 1, 2017 11:59 AM
>
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Update:
>
> Just in case...
>
> Tried with CentOS 7 (Release: 7.3.1611) with both built in latest kernel
> (3.10.0-514.26.2.el7.x86_64),
>
> and updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.
>
>
>
> Similar behavior. ;(
>
>
>
> Maybe, next is to try with MOFED drivers.
>
>
>
> On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com>
> wrote:
>
> Thanks!
>
>
>
> Update:
>
> Tried using IB mode instead of Eth on the Connectx-3 NIC.
>
> No change in behavior. Similar behavior and errors observed.
>
>
>
>
>
>
>
> On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com>
> wrote:
>
> Well those are good steps… hopefully someone else will jump in as well. I
> will see if I can get my HW setup to repro over the long weekend and let ya
> know how it goes…
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Wednesday, August 30, 2017 12:44 PM
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Thanks.
>
>
>
> I did change a few cables, and even targets; I'm assuming that this
> statistically rules out cable-seating issues.
>
> Could not do IB instead of Eth as my adapter does not support IB.
>
>
>
> Also, the error messages are not consistent. That was a snapshot of one of
> the runs.
>
> Then there are also the older ConnectX-3 adapters (latest FW flashed) with
> newer kernels and latest SPDK/DPDK.
>
>
>
> Seen the following:
>
> >nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does
> not map to outstanding cmd
>
> >bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>
> >bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but
> should be 65536
>
>
>
> On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net>
> wrote:
>
>
> Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> > Folks,
> > My name is Ganesh, and I am working on NVMeoF performance metrics using
> SPDK (and kernel).
> > I would appreciate your expert insights.
> >
> > I am observing errors when the QD on perf is increased to >=64 most of the
> > time; sometimes even for <=16.
> > Errors are not consistent.
> >
> > Attached are some details.
> >
> > Please let me know if you have any additional questions.
> >
> > Thanks.
> > -Ganesh
> >
>
> > SPDK errors 1.txt
> >
> >
> > Setup details:
> > -- Some info on setup
> > Same HW/SW on target and initiator.
> >
> > adminuser(a)dell730-80:~> hostnamectl
> >    Static hostname: dell730-80
> >          Icon name: computer-server
> >            Chassis: server
> >         Machine ID: b5abb0fe67afd04c59521c40599b3115
> >            Boot ID: f825aa6338194338a6f80125caa836c7
> >   Operating System: openSUSE Leap 42.3
> >        CPE OS Name: cpe:/o:opensuse:leap:42.3
> >             Kernel: Linux 4.12.8-1.g4d7933a-default
> >       Architecture: x86-64
> >
> > adminuser(a)dell730-80:~> lscpu | grep -i socket
> > Core(s) per socket:    12
> > Socket(s):             2
> >
> > 2MB and/or 1GB huge pages set,
> >
> > Latest spdk/dpdk from respective GIT,
> >
> > compiled with RDMA flag,
> >
> > nvmf.conf file: (have played around with the values)
> > reactor mask 0x5555
> > AcceptorCore 2
> > 1 - 3 Subsystems on cores 4,8,10
> >
> > adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c
> gits/spdk/etc/spdk/nvmf.conf -p 6
> >
> > PCI, NVME cards (16GB)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> > 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> >
> > Network cards: (latest associated FW from vendor)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> > 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family
> [ConnectX-3 Pro]
> >
> > --- initiator cmd line
> > sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
> >
> > --errors on stdout on target
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll:
> *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12):
> transport retry counter exceeded
> >
> > --- errors seen on client
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter
> exceeded
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed
> Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error
>
> Actually, these might be HW errors, because retries are supposed to be
> engaged only on packet loss/corruption. It might be bad or not fully
> inserted cables.
>
> Vlad
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 31568 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-09-12 23:00 Luse, Paul E
  0 siblings, 0 replies; 22+ messages in thread
From: Luse, Paul E @ 2017-09-12 23:00 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 11127 bytes --]

Yeah, thanks John!  FYI Ganesh, I didn’t forget about you; I’m at SNIA SDC this week.  Cool to see SPDK mentioned in lots of talks, including a nice shout-out from Sage (Ceph) on SPDK in Bluestore…

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, September 12, 2017 4:29 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

John,
Thanks for looking into this.
Yes, this is similar.
One of the significant differences is the ConnectX-4 NIC; my setup uses a ConnectX-3 NIC.

I don't have access to my lab as it is being physically moved. Hope to get access this or next week.


On Tue, Sep 12, 2017 at 5:36 AM, Kariuki, John K <john.k.kariuki(a)intel.com<mailto:john.k.kariuki(a)intel.com>> wrote:
Ganesh
I am trying to reproduce the issue in my environment without any luck. My environment has 2 CPUs with 22 cores/socket, Mellanox ConnectX-4 Lx NICs, Ubuntu 17.04 with kernel 4.10 and OFED 4.1. I have tried several different QD above 16 (all the way to 256) and perf ran successfully. Here are the workload parameters I am using. Is this similar to what you’re doing to reproduce the issue?

./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 128 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 256 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'



From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, September 05, 2017 4:52 AM

To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

Thanks.
Traveling. My lab machines are shut down as the lab is being moved this week.
When I land, will try to reach out.
On Tue, Sep 5, 2017 at 3:40 AM Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse(a)intel.com>> wrote:
Hi Ganesh,

So I have my hardware installed, the Chelsio cards below, and I believe I’ve got the drivers built and installed correctly. ethtool is showing Link Detected on both ends, but I’m not having much luck testing the connection in any other way, and it’s not connecting w/SPDK. I’m far from an expert in this area; if you have some time and want to see if you can help me get this working, we can use this setup to troubleshoot yours.  Let me know… I’m probably done for today though

Thx
Paul

Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller

From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Friday, September 1, 2017 11:59 AM

To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

Update:
Just in case...
Tried with CentOS 7 (Release: 7.3.1611) with both built in latest kernel (3.10.0-514.26.2.el7.x86_64),
and updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.

Similar behavior. ;(

Maybe, next is to try with MOFED drivers.

On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com<mailto:svg.eid(a)gmail.com>> wrote:
Thanks!

Update:
Tried using IB mode instead of Eth on the Connectx-3 NIC.
No change in behavior. Similar behavior and errors observed.



On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse(a)intel.com>> wrote:
Well those are good steps… hopefully someone else will jump in as well. I will see if I can get my HW setup to repro over the long weekend and let ya know how it goes…

From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Wednesday, August 30, 2017 12:44 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

Thanks.

I did change a few cables, and even targets; I'm assuming that this statistically rules out cable-seating issues.
Could not do IB instead of Eth as my adapter does not support IB.

Also, the error messages are not consistent. That was a snapshot of one of the runs.
Then there are also the older ConnectX-3 adapters (latest FW flashed) with newer kernels and latest SPDK/DPDK.

Seen the following:
>nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
>bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but should be 65536

On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net<mailto:vst(a)vlnb.net>> wrote:

Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> Folks,
> My name is Ganesh, and I am working on NVMeoF performance metrics using SPDK (and kernel).
> I would appreciate your expert insights.
>
> I am observing errors when the QD on perf is increased to >=64 most of the
> time; sometimes even for <=16.
> Errors are not consistent.
>
> Attached are some details.
>
> Please let me know if you have any additional questions.
>
> Thanks.
> -Ganesh
>
> SPDK errors 1.txt
>
>
> Setup details:
> -- Some info on setup
> Same HW/SW on target and initiator.
>
> adminuser(a)dell730-80:~> hostnamectl
>    Static hostname: dell730-80
>          Icon name: computer-server
>            Chassis: server
>         Machine ID: b5abb0fe67afd04c59521c40599b3115
>            Boot ID: f825aa6338194338a6f80125caa836c7
>   Operating System: openSUSE Leap 42.3
>        CPE OS Name: cpe:/o:opensuse:leap:42.3
>             Kernel: Linux 4.12.8-1.g4d7933a-default
>       Architecture: x86-64
>
> adminuser(a)dell730-80:~> lscpu | grep -i socket
> Core(s) per socket:    12
> Socket(s):             2
>
> 2MB and/or 1GB huge pages set,
>
> Latest spdk/dpdk from respective GIT,
>
> compiled with RDMA flag,
>
> nvmf.conf file: (have played around with the values)
> reactor mask 0x5555
> AcceptorCore 2
> 1 - 3 Subsystems on cores 4,8,10
>
> adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c gits/spdk/etc/spdk/nvmf.conf -p 6
>
> PCI, NVME cards (16GB)
> adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
>
> Network cards: (latest associated FW from vendor)
> adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> --- initiator cmd line
> sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>
> --errors on stdout on target
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll: *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12): transport retry counter exceeded
>
> --- errors seen on client
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter exceeded
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

Actually, these might be HW errors, because retries are supposed to be engaged only on
packet loss/corruption. It might be bad or not fully inserted cables.

Vlad

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 28833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-09-12 11:28 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-09-12 11:28 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 11061 bytes --]

John,
Thanks for looking into this.
Yes, this is similar.
One of the significant differences is the ConnectX-4 NIC; my setup uses a
ConnectX-3 NIC.

I don't have access to my lab as it is being physically moved. Hope to get
access this or next week.


On Tue, Sep 12, 2017 at 5:36 AM, Kariuki, John K <john.k.kariuki(a)intel.com>
wrote:

> Ganesh
>
> I am trying to reproduce the issue in my environment without any luck. My
> environment has 2 CPUs with 22 cores/socket, Mellanox ConnectX-4 Lx NICs,
> Ubuntu 17.04 with kernel 4.10 and OFED 4.1. I have tried several different
> QD above 16 (all the way to 256) and perf ran successfully. Here are the
> workload parameters I am using. Is this similar to what you’re doing to
> reproduce the issue?
>
>
>
> ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 128 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
> ./perf -q 256 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
>
>
>
>
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Tuesday, September 05, 2017 4:52 AM
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Thanks.
> Traveling. My lab machines are shut down as the lab is being moved this week.
> When I land, will try to reach out.
>
> On Tue, Sep 5, 2017 at 3:40 AM Luse, Paul E <paul.e.luse(a)intel.com> wrote:
>
> Hi Ganesh,
>
>
>
> So I have my hardware installed, the Chelsio cards below, and I believe
> I’ve got the drivers built and installed correctly. ethtool is showing
> Link Detected on both ends, but I’m not having much luck testing the
> connection in any other way, and it’s not connecting w/SPDK. I’m far from
> an expert in this area; if you have some time and want to see if you can
> help me get this working, we can use this setup to troubleshoot yours.
> Let me know… I’m probably done for today though
>
>
>
> Thx
>
> Paul
>
>
>
> Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Friday, September 1, 2017 11:59 AM
>
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Update:
>
> Just in case...
>
> Tried with CentOS 7 (Release: 7.3.1611) with both built in latest kernel
> (3.10.0-514.26.2.el7.x86_64),
>
> and updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.
>
>
>
> Similar behavior. ;(
>
>
>
> Maybe, next is to try with MOFED drivers.
>
>
>
> On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com>
> wrote:
>
> Thanks!
>
>
>
> Update:
>
> Tried using IB mode instead of Eth on the Connectx-3 NIC.
>
> No change in behavior. Similar behavior and errors observed.
>
>
>
>
>
>
>
> On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com>
> wrote:
>
> Well those are good steps… hopefully someone else will jump in as well. I
> will see if I can get my HW setup to repro over the long weekend and let ya
> know how it goes…
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Wednesday, August 30, 2017 12:44 PM
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Thanks.
>
>
>
> I did change a few cables, and even targets; I'm assuming that this
> statistically rules out cable-seating issues.
>
> Could not do IB instead of Eth as my adapter does not support IB.
>
>
>
> Also, the error messages are not consistent. That was a snapshot of one of
> the runs.
>
> Then there are also the older ConnectX-3 adapters (latest FW flashed) with
> newer kernels and latest SPDK/DPDK.
>
>
>
> Seen the following:
>
> >nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does
> not map to outstanding cmd
>
> >bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>
> >bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but
> should be 65536
>
>
>
> On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net>
> wrote:
>
>
> Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> > Folks,
> > My name is Ganesh, and I am working on NVMeoF performance metrics using
> SPDK (and kernel).
> > I would appreciate your expert insights.
> >
> > I am observing errors when the QD on perf is increased to >=64 most of the
> > time; sometimes even for <=16.
> > Errors are not consistent.
> >
> > Attached are some details.
> >
> > Please let me know if you have any additional questions.
> >
> > Thanks.
> > -Ganesh
> >
>
> > SPDK errors 1.txt
> >
> >
> > Setup details:
> > -- Some info on setup
> > Same HW/SW on target and initiator.
> >
> > adminuser(a)dell730-80:~> hostnamectl
> >    Static hostname: dell730-80
> >          Icon name: computer-server
> >            Chassis: server
> >         Machine ID: b5abb0fe67afd04c59521c40599b3115
> >            Boot ID: f825aa6338194338a6f80125caa836c7
> >   Operating System: openSUSE Leap 42.3
> >        CPE OS Name: cpe:/o:opensuse:leap:42.3
> >             Kernel: Linux 4.12.8-1.g4d7933a-default
> >       Architecture: x86-64
> >
> > adminuser(a)dell730-80:~> lscpu | grep -i socket
> > Core(s) per socket:    12
> > Socket(s):             2
> >
> > 2MB and/or 1GB huge pages set,
> >
> > Latest spdk/dpdk from respective GIT,
> >
> > compiled with RDMA flag,
> >
> > nvmf.conf file: (have played around with the values)
> > reactor mask 0x5555
> > AcceptorCore 2
> > 1 - 3 Subsystems on cores 4,8,10
> >
> > adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c
> gits/spdk/etc/spdk/nvmf.conf -p 6
> >
> > PCI, NVME cards (16GB)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> > 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> >
> > Network cards: (latest associated FW from vendor)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> > 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family
> [ConnectX-3 Pro]
> >
> > --- initiator cmd line
> > sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
> >
> > --errors on stdout on target
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll:
> *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12):
> transport retry counter exceeded
> >
> > --- errors seen on client
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter
> exceeded
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed
> Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error
>
> Actually, these might be HW errors, because retries are supposed to be
> engaged only on packet loss/corruption. It might be bad or not fully
> inserted cables.
>
> Vlad
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 21035 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-09-12  0:06 Kariuki, John K
  0 siblings, 0 replies; 22+ messages in thread
From: Kariuki, John K @ 2017-09-12  0:06 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 10093 bytes --]

Ganesh
I am trying to reproduce the issue in my environment without any luck. My environment has 2 CPUs with 22 cores/socket, Mellanox ConnectX-4 Lx NICs, Ubuntu 17.04 with kernel 4.10 and OFED 4.1. I have tried several different QD above 16 (all the way to 256) and perf ran successfully. Here are the workload parameters I am using. Is this similar to what you’re doing to reproduce the issue?

./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 64 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 128 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

./perf -q 256 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'
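
For what it’s worth, the sweep can also be scripted; this is only a sketch that reuses the exact transport string and parameters from the commands above and keeps one log per queue depth, so any error output can be matched to a specific QD:

# sweep the queue depths used above, one 30-second randread run each
for qd in 32 64 128 256; do
  ./perf -q "$qd" -s 512 -w randread -t 30 \
    -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.51.2 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' \
    | tee "perf_qd_${qd}.log"    # per-QD log for later comparison
done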



From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, September 05, 2017 4:52 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Thanks.
Traveling. My lab machines are shut down as the lab is being moved this week.
When I land, will try to reach out.
On Tue, Sep 5, 2017 at 3:40 AM Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse(a)intel.com>> wrote:
Hi Ganesh,

So I have my hardware installed, the Chelsio cards below, and I believe I’ve got the drivers built and installed correctly. ethtool is showing Link Detected on both ends, but I’m not having much luck testing the connection in any other way, and it’s not connecting w/SPDK. I’m far from an expert in this area; if you have some time and want to see if you can help me get this working, we can use this setup to troubleshoot yours.  Let me know… I’m probably done for today though

Thx
Paul

Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller

From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Friday, September 1, 2017 11:59 AM

To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

Update:
Just in case...
Tried with CentOS 7 (Release: 7.3.1611) with both built in latest kernel (3.10.0-514.26.2.el7.x86_64),
and updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.

Similar behavior. ;(

Maybe, next is to try with MOFED drivers.

On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com<mailto:svg.eid(a)gmail.com>> wrote:
Thanks!

Update:
Tried using IB mode instead of Eth on the Connectx-3 NIC.
No change in behavior. Similar behavior and errors observed.



On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse(a)intel.com>> wrote:
Well those are good steps… hopefully someone else will jump in as well. I will see if I can get my HW setup to repro over the long weekend and let ya know how it goes…

From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Wednesday, August 30, 2017 12:44 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

Thanks.

I did change a few cables, and even targets; I'm assuming that this statistically rules out cable-seating issues.
Could not do IB instead of Eth as my adapter does not support IB.

Also, the error messages are not consistent. That was a snapshot of one of the runs.
Then there are also the older ConnectX-3 adapters (latest FW flashed) with newer kernels and latest SPDK/DPDK.

Seen the following:
>nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
>bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but should be 65536

On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net<mailto:vst(a)vlnb.net>> wrote:

Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> Folks,
> My name is Ganesh, and I am working on NVMeoF performance metrics using SPDK (and kernel).
> I would appreciate your expert insights.
>
> I am observing errors when the QD on perf is increased to >=64 most of the
> time; sometimes even for <=16.
> Errors are not consistent.
>
> Attached are some details.
>
> Please let me know if you have any additional questions.
>
> Thanks.
> -Ganesh
>
> SPDK errors 1.txt
>
>
> Setup details:
> -- Some info on setup
> Same HW/SW on target and initiator.
>
> adminuser(a)dell730-80:~> hostnamectl
>    Static hostname: dell730-80
>          Icon name: computer-server
>            Chassis: server
>         Machine ID: b5abb0fe67afd04c59521c40599b3115
>            Boot ID: f825aa6338194338a6f80125caa836c7
>   Operating System: openSUSE Leap 42.3
>        CPE OS Name: cpe:/o:opensuse:leap:42.3
>             Kernel: Linux 4.12.8-1.g4d7933a-default
>       Architecture: x86-64
>
> adminuser(a)dell730-80:~> lscpu | grep -i socket
> Core(s) per socket:    12
> Socket(s):             2
>
> 2MB and/or 1GB huge pages set,
>
> Latest spdk/dpdk from respective GIT,
>
> compiled with RDMA flag,
>
> nvmf.conf file: (have played around with the values)
> reactor mask 0x5555
> AcceptorCore 2
> 1 - 3 Subsystems on cores 4,8,10
>
> adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c gits/spdk/etc/spdk/nvmf.conf -p 6
>
> PCI, NVME cards (16GB)
> adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
>
> Network cards: (latest associated FW from vendor)
> adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> --- initiator cmd line
> sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>
> --errors on stdout on target
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll: *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12): transport retry counter exceeded
>
> --- errors seen on client
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter exceeded
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

Actually, these might be HW errors, because retries are supposed to be engaged only on
packet loss/corruption. It might be bad or not fully inserted cables.

Vlad

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 24855 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-09-05 11:52 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-09-05 11:52 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 9122 bytes --]

Thanks.
Traveling. My lab machines are shut down as the lab is being moved this week.
When I land, will try to reach out.
On Tue, Sep 5, 2017 at 3:40 AM Luse, Paul E <paul.e.luse(a)intel.com> wrote:

> Hi Ganesh,
>
>
>
> So I have my hardware installed, the Chelsio cards below, and I believe
> I’ve got the drivers built and installed correctly. ethtool is showing
> Link Detected on both ends, but I’m not having much luck testing the
> connection in any other way, and it’s not connecting w/SPDK. I’m far from
> an expert in this area; if you have some time and want to see if you can
> help me get this working, we can use this setup to troubleshoot yours.
> Let me know… I’m probably done for today though
>
>
>
> Thx
>
> Paul
>
>
>
> Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Friday, September 1, 2017 11:59 AM
>
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Update:
>
> Just in case...
>
> Tried with CentOS 7 (Release: 7.3.1611) with both built in latest kernel
> (3.10.0-514.26.2.el7.x86_64),
>
> and updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.
>
>
>
> Similar behavior. ;(
>
>
>
> Maybe, next is to try with MOFED drivers.
>
>
>
> On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com>
> wrote:
>
> Thanks!
>
>
>
> Update:
>
> Tried using IB mode instead of Eth on the Connectx-3 NIC.
>
> No change in behavior. Similar behavior and errors observed.
>
>
>
>
>
>
>
> On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com>
> wrote:
>
> Well those are good steps… hopefully someone else will jump in as well. I
> will see if I can get my HW setup to repro over the long weekend and let ya
> know how it goes…
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Wednesday, August 30, 2017 12:44 PM
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Thanks.
>
>
>
> I did change a few cables, and even targets; I'm assuming that this
> statistically rules out cable-seating issues.
>
> Could not do IB instead of Eth as my adapter does not support IB.
>
>
>
> Also, the error messages are not consistent. That was a snapshot of one of
> the runs.
>
> Then there are also the older ConnectX-3 adapters (latest FW flashed) with
> newer kernels and latest SPDK/DPDK.
>
>
>
> Seen the following:
>
> >nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does
> not map to outstanding cmd
>
> >bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>
> >bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but
> should be 65536
>
>
>
> On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net>
> wrote:
>
>
> Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> > Folks,
> > My name is Ganesh, and I am working on NVMeoF performance metrics using
> SPDK (and kernel).
> > I would appreciate your expert insights.
> >
> > I am observing errors when the QD on perf is increased to >=64 most of the
> > time; sometimes even for <=16.
> > Errors are not consistent.
> >
> > Attached are some details.
> >
> > Please let me know if you have any additional questions.
> >
> > Thanks.
> > -Ganesh
> >
>
> > SPDK errors 1.txt
> >
> >
> > Setup details:
> > -- Some info on setup
> > Same HW/SW on target and initiator.
> >
> > adminuser(a)dell730-80:~> hostnamectl
> >    Static hostname: dell730-80
> >          Icon name: computer-server
> >            Chassis: server
> >         Machine ID: b5abb0fe67afd04c59521c40599b3115
> >            Boot ID: f825aa6338194338a6f80125caa836c7
> >   Operating System: openSUSE Leap 42.3
> >        CPE OS Name: cpe:/o:opensuse:leap:42.3
> >             Kernel: Linux 4.12.8-1.g4d7933a-default
> >       Architecture: x86-64
> >
> > adminuser(a)dell730-80:~> lscpu | grep -i socket
> > Core(s) per socket:    12
> > Socket(s):             2
> >
> > 2MB and/or 1GB huge pages set,
> >
> > Latest spdk/dpdk from respective GIT,
> >
> > compiled with RDMA flag,
> >
> > nvmf.conf file: (have played around with the values)
> > reactor mask 0x5555
> > AcceptorCore 2
> > 1 - 3 Subsystems on cores 4,8,10
> >
> > adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c
> gits/spdk/etc/spdk/nvmf.conf -p 6
> >
> > PCI, NVME cards (16GB)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> > 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> >
> > Network cards: (latest associated FW from vendor)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> > 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family
> [ConnectX-3 Pro]
> >
> > --- initiator cmd line
> > sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
> >
> > --errors on stdout on target
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
> map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
> map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
> map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
> map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
> map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]:
> bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll:
> *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12):
> transport retry counter exceeded
> >
> > --- errors seen on client
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter
> exceeded
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed
> Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error
>
> Actually, these might be HW errors, because retries are supposed to be
> engaged only on packet loss/corruption. It might be bad or not fully
> inserted cables.
>
> Vlad
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 15407 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-09-04 19:40 Luse, Paul E
  0 siblings, 0 replies; 22+ messages in thread
From: Luse, Paul E @ 2017-09-04 19:40 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 8319 bytes --]

Hi Ganesh,

So I have my hardware installed, the Chelsio cards below, and I believe I’ve got the drivers built and installed correctly. ethtool is showing Link Detected on both ends, but I’m not having much luck testing the connection in any other way, and it’s not connecting w/SPDK. I’m far from an expert in this area; if you have some time and want to see if you can help me get this working, we can use this setup to troubleshoot yours.  Let me know… I’m probably done for today though

Thx
Paul

Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller
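
One way to sanity-check the raw RDMA path outside of SPDK, assuming the stock rdma-core/librdmacm utilities are installed on both nodes (rping goes through rdma_cm, so it should cover iWARP as well as RoCE/IB), is a quick rping between the two boxes; just a sketch:

# on one node (acting as the rping server)
rping -s -a 0.0.0.0 -v
# on the other node, pointing at the server's RDMA-capable IP
rping -c -a <server_ip> -v -C 10    # -C limits the number of pings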

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Friday, September 1, 2017 11:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Update:
Just in case...
Tried with CentOS 7 (Release: 7.3.1611) with both built in latest kernel (3.10.0-514.26.2.el7.x86_64),
and updated latest stable (4.12.10-1.el7.elrepo.x86_64) installed.

Similar behavior. ;(

Maybe, next is to try with MOFED drivers.

On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com<mailto:svg.eid(a)gmail.com>> wrote:
Thanks!

Update:
Tried using IB mode instead of Eth on the Connectx-3 NIC.
No change in behavior. Similar behavior and errors observed.



On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse(a)intel.com>> wrote:
Well those are good steps… hopefully someone else will jump in as well. I will see if I can get my HW setup to repro over the long weekend and let ya know how it goes…

From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Santhebachalli Ganesh
Sent: Wednesday, August 30, 2017 12:44 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] SPDK errors

Thanks.

I did change a few cables, and even targets; I'm assuming that this statistically rules out cable-seating issues.
Could not do IB instead of Eth as my adapter does not support IB.

Also, the error messages are not consistent. That was a snapshot of one of the runs.
Then there are also the older ConnectX-3 adapters (latest FW flashed) with newer kernels and latest SPDK/DPDK.

Seen the following:
>nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
>bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but should be 65536

On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net<mailto:vst(a)vlnb.net>> wrote:

Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> Folks,
> My name is Ganesh, and I am working on NVMeoF performance metrics using SPDK (and kernel).
> I would appreciate your expert insights.
>
> I am observing errors when the QD on perf is increased to >=64 most of the
> time; sometimes even for <=16.
> Errors are not consistent.
>
> Attached are some details.
>
> Please let me know if you have any additional questions.
>
> Thanks.
> -Ganesh
>
> SPDK errors 1.txt
>
>
> Setup details:
> -- Some info on setup
> Same HW/SW on target and initiator.
>
> adminuser(a)dell730-80:~> hostnamectl
>    Static hostname: dell730-80
>          Icon name: computer-server
>            Chassis: server
>         Machine ID: b5abb0fe67afd04c59521c40599b3115
>            Boot ID: f825aa6338194338a6f80125caa836c7
>   Operating System: openSUSE Leap 42.3
>        CPE OS Name: cpe:/o:opensuse:leap:42.3
>             Kernel: Linux 4.12.8-1.g4d7933a-default
>       Architecture: x86-64
>
> adminuser(a)dell730-80:~> lscpu | grep -i socket
> Core(s) per socket:    12
> Socket(s):             2
>
> 2MB and/or 1GB huge pages set,
>
> Latest spdk/dpdk from respective GIT,
>
> compiled with RDMA flag,
>
> nvmf.conf file: (have played around with the values)
> reactor mask 0x5555
> AcceptorCore 2
> 1 - 3 Subsystems on cores 4,8,10
>
> adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c gits/spdk/etc/spdk/nvmf.conf -p 6
>
> PCI, NVME cards (16GB)
> adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
>
> Network cards: (latest associated FW from vendor)
> adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> --- initiator cmd line
> sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>
> --errors on stdout on target
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll: *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12): transport retry counter exceeded
>
> --- errors seen on client
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter exceeded
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

Actually, these might be HW errors, because retries are supposed to be engaged only on
packet loss/corruption. It might be bad or not fully inserted cables.

Vlad

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk



[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 17458 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-09-01 18:59 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-09-01 18:59 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 7978 bytes --]

Update:
Just in case...
Tried with CentOS 7 (Release 7.3.1611), with both the built-in kernel
(3.10.0-514.26.2.el7.x86_64) and the updated latest stable kernel
(4.12.10-1.el7.elrepo.x86_64) installed.

Similar behavior. ;(

Maybe the next step is to try the MOFED drivers.
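
Before switching, it may be worth recording what the inbox stack reports, so the
MOFED run can be compared against it. A rough sketch; the interface name ens2 and
RDMA device name mlx4_0 are placeholders for whatever ibv_devices / ip link show
on this host:

  # driver and firmware as seen by the inbox Ethernet driver (assumed port name)
  ethtool -i ens2
  # firmware version as seen by the verbs stack (assumed RDMA device name)
  ibv_devinfo -d mlx4_0 | grep fw_ver
  # mlx4_core module version; this is what changes after a MOFED install
  modinfo mlx4_core | grep -i version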

On Thu, Aug 31, 2017 at 3:49 PM, Santhebachalli Ganesh <svg.eid(a)gmail.com>
wrote:

> Thanks!
>
> Update:
> Tried using IB mode instead of Eth on the Connectx-3 NIC.
> No change in behavior. Similar behavior and errors observed.
>
>
>
> On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com>
> wrote:
>
>> Well those are good steps… hopefully someone else will jump in as well. I
>> will see if I can get my HW setup to repro over the long weekend and let ya
>> know how it goes…
>>
>>
>>
>> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
>> Ganesh
>> *Sent:* Wednesday, August 30, 2017 12:44 PM
>> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Subject:* Re: [SPDK] SPDK errors
>>
>>
>>
>> Thanks.
>>
>>
>>
>> I did change a few cables, and even targets. Assuming that this took care
>> of insertion force statistically.
>>
>> Could not do IB instead of Eth as my adapter does not support IB.
>>
>>
>>
>> Also, the error messages are not consistent. That was a snapshot of one
>> of the runs.
>>
>> Then, there is also the older Connectx3 adapters (latest FW flashed) with
>> newer kernels and latest SPDK/DPDK.
>>
>>
>>
>> Seen the following:
>>
>> >nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does
>> not map to outstanding cmd
>>
>> >bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>>
>> >bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but
>> should be 65536
>>
>>
>>
>> On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net>
>> wrote:
>>
>>
>> Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
>> > Folks,
>> > My name is Ganesh, and I am working on NVMe-oF performance metrics using
>> SPDK (and kernel).
>> > I would appreciate your expert insights.
>> >
>> > I am observing errors when QD on perf is increased above >=64 most of
>> the
>> > times. Sometimes, even for <=16
>> > Errors are not consistent.
>> >
>> > Attached are some details.
>> >
>> > Please let me know if you have any additional questions.
>> >
>> > Thanks.
>> > -Ganesh
>> >
>>
>> > SPDK errors 1.txt
>> >
>> >
>> > Setup details:
>> > -- Some info on setup
>> > Same HW/SW on target and initiator.
>> >
>> > adminuser(a)dell730-80:~> hostnamectl
>> >    Static hostname: dell730-80
>> >          Icon name: computer-server
>> >            Chassis: server
>> >         Machine ID: b5abb0fe67afd04c59521c40599b3115
>> >            Boot ID: f825aa6338194338a6f80125caa836c7
>> >   Operating System: openSUSE Leap 42.3
>> >        CPE OS Name: cpe:/o:opensuse:leap:42.3
>> >             Kernel: Linux 4.12.8-1.g4d7933a-default
>> >       Architecture: x86-64
>> >
>> > adminuser(a)dell730-80:~> lscpu | grep -i socket
>> > Core(s) per socket:    12
>> > Socket(s):             2
>> >
>> > 2MB and/or 1GB huge pages set,
>> >
>> > Latest spdk/dpdk from respective GIT,
>> >
>> > compiled with RDMA flag,
>> >
>> > nvmf.conf file: (have played around with the values)
>> > reactor mask 0x5555
>> > AcceptorCore 2
>> > 1 - 3 Subsystems on cores 4,8,10
>> >
>> > adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c
>> gits/spdk/etc/spdk/nvmf.conf -p 6
>> >
>> > PCI, NVME cards (16GB)
>> > adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
>> > 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117
>> (rev 06)
>> > 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117
>> (rev 06)
>> > 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117
>> (rev 06)
>> >
>> > Network cards: (latest associated FW from vendor)
>> > adminuser(a)dell730-80:~> sudo lspci | grep -i connect
>> > 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family
>> [ConnectX-3 Pro]
>> >
>> > --- initiator cmd line
>> > sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
>> traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>> >
>> > --errors on stdout on target
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
>> *ERROR*: cpl does not map to outstanding cmd
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
>> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
>> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
>> *ERROR*: readv failed: rc = -12
>> > Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll:
>> *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12):
>> transport retry counter exceeded
>> >
>> > --- errors seen on client
>> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ
>> error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry
>> counter exceeded
>> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ
>> error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request
>> Flushed Error
>> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ
>> error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed
>> Error
>> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ
>> error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed
>> Error
>>
>> Actually, these might be HW errors, because retries are supposed to be engaged
>> only on packet loss/corruption. It might be bad or not fully inserted cables.
>>
>> Vlad
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
>>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 11309 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-08-31 22:49 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-08-31 22:49 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 7394 bytes --]

Thanks!

Update:
Tried using IB mode instead of Eth on the ConnectX-3 NIC.
No change: similar behavior and errors observed.
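
For reference, which link layer each port is actually running can be confirmed
with ibv_devinfo; the device name mlx4_0 is a placeholder, ibv_devices lists the
real one:

  ibv_devinfo -d mlx4_0 | grep -E 'port:|state|link_layer'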



On Thu, Aug 31, 2017 at 3:36 PM, Luse, Paul E <paul.e.luse(a)intel.com> wrote:

> Well those are good steps… hopefully someone else will jump in as well. I
> will see if I can get my HW setup to repro over the long weekend and let ya
> know how it goes…
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Wednesday, August 30, 2017 12:44 PM
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Thanks.
>
>
>
> I did change a few cables, and even targets. Assuming that this took care
> of insertion force statistically.
>
> Could not do IB instead of Eth as my adapter does not support IB.
>
>
>
> Also, the error messages are not consistent. That was a snapshot of one of
> the runs.
>
> Then, there is also the older Connectx3 adapters (latest FW flashed) with
> newer kernels and latest SPDK/DPDK.
>
>
>
> Seen the following:
>
> >nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does
> not map to outstanding cmd
>
> >bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>
> >bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but
> should be 65536
>
>
>
> On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net>
> wrote:
>
>
> Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> > Folks,
> > My name is Ganesh, and I am working on NVMe-oF performance metrics using
> SPDK (and kernel).
> > I would appreciate your expert insights.
> >
> > I am observing errors when QD on perf is increased above >=64 most of the
> > times. Sometimes, even for <=16
> > Errors are not consistent.
> >
> > Attached are some details.
> >
> > Please let me know if you have any additional questions.
> >
> > Thanks.
> > -Ganesh
> >
>
> > SPDK errors 1.txt
> >
> >
> > Setup details:
> > -- Some info on setup
> > Same HW/SW on target and initiator.
> >
> > adminuser(a)dell730-80:~> hostnamectl
> >    Static hostname: dell730-80
> >          Icon name: computer-server
> >            Chassis: server
> >         Machine ID: b5abb0fe67afd04c59521c40599b3115
> >            Boot ID: f825aa6338194338a6f80125caa836c7
> >   Operating System: openSUSE Leap 42.3
> >        CPE OS Name: cpe:/o:opensuse:leap:42.3
> >             Kernel: Linux 4.12.8-1.g4d7933a-default
> >       Architecture: x86-64
> >
> > adminuser(a)dell730-80:~> lscpu | grep -i socket
> > Core(s) per socket:    12
> > Socket(s):             2
> >
> > 2MB and/or 1GB huge pages set,
> >
> > Latest spdk/dpdk from respective GIT,
> >
> > compiled with RDMA flag,
> >
> > nvmf.conf file: (have played around with the values)
> > reactor mask 0x5555
> > AcceptorCore 2
> > 1 - 3 Subsystems on cores 4,8,10
> >
> > adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c
> gits/spdk/etc/spdk/nvmf.conf -p 6
> >
> > PCI, NVME cards (16GB)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> > 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> >
> > Network cards: (latest associated FW from vendor)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> > 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family
> [ConnectX-3 Pro]
> >
> > --- initiator cmd line
> > sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
> >
> > --errors on stdout on target
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll:
> *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12):
> transport retry counter exceeded
> >
> > --- errors seen on client
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter
> exceeded
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed
> Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error
>
> Actually, these might be HW errors, because retries are supposed to be engaged
> only on packet loss/corruption. It might be bad or not fully inserted cables.
>
> Vlad
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 10448 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-08-31 22:36 Luse, Paul E
  0 siblings, 0 replies; 22+ messages in thread
From: Luse, Paul E @ 2017-08-31 22:36 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6629 bytes --]

Well those are good steps… hopefully someone else will jump in as well. I will see if I can get my HW setup to repro over the long weekend and let ya know how it goes…

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Wednesday, August 30, 2017 12:44 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK errors

Thanks.

I did change a few cables, and even targets, which should statistically rule out poorly seated cables.
Could not do IB instead of Eth as my adapter does not support IB.

Also, the error messages are not consistent. That was a snapshot of one of the runs.
Then there are also the older ConnectX-3 adapters (latest FW flashed) paired with newer kernels and the latest SPDK/DPDK.

Seen the following:
>nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
>bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but should be 65536

On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net<mailto:vst(a)vlnb.net>> wrote:

Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> Folks,
> My name is Ganesh, and I am working on NVMe-oF performance metrics using SPDK (and kernel).
> I would appreciate your expert insights.
>
> I am observing errors when QD on perf is increased above >=64 most of the
> times. Sometimes, even for <=16
> Errors are not consistent.
>
> Attached are some details.
>
> Please let me know if you have any additional questions.
>
> Thanks.
> -Ganesh
>
> SPDK errors 1.txt
>
>
> Setup details:
> -- Some info on setup
> Same HW/SW on target and initiator.
>
> adminuser(a)dell730-80:~> hostnamectl
>    Static hostname: dell730-80
>          Icon name: computer-server
>            Chassis: server
>         Machine ID: b5abb0fe67afd04c59521c40599b3115
>            Boot ID: f825aa6338194338a6f80125caa836c7
>   Operating System: openSUSE Leap 42.3
>        CPE OS Name: cpe:/o:opensuse:leap:42.3
>             Kernel: Linux 4.12.8-1.g4d7933a-default
>       Architecture: x86-64
>
> adminuser(a)dell730-80:~> lscpu | grep -i socket
> Core(s) per socket:    12
> Socket(s):             2
>
> 2MB and/or 1GB huge pages set,
>
> Latest spdk/dpdk from respective GIT,
>
> compiled with RDMA flag,
>
> nvmf.conf file: (have played around with the values)
> reactor mask 0x5555
> AcceptorCore 2
> 1 - 3 Subsystems on cores 4,8,10
>
> adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c gits/spdk/etc/spdk/nvmf.conf -p 6
>
> PCI, NVME cards (16GB)
> adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
>
> Network cards: (latest associated FW from vendor)
> adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> --- initiator cmd line
> sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>
> --errors on stdout on target
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll: *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12): transport retry counter exceeded
>
> --- errors seen on client
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter exceeded
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

Actually, these might be HW errors, because retries are supposed to be engaged only on
packet loss/corruption. It might be bad or not fully inserted cables.

Vlad

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 11061 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-08-30 19:43 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-08-30 19:43 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6486 bytes --]

Thanks.

I did change a few cables, and even targets, which should statistically rule
out poorly seated cables.
Could not do IB instead of Eth as my adapter does not support IB.

Also, the error messages are not consistent. That was a snapshot of one of
the runs.
Then there are also the older ConnectX-3 adapters (latest FW flashed) paired
with newer kernels and the latest SPDK/DPDK.

Seen the following:
>nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does
not map to outstanding cmd
>bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
>bdev.c: 511:spdk_bdev_finish: *ERROR*: bdev IO pool count is 65533 but
should be 65536
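
One way to rule the cable/HW theory in or out is to compare the port error
counters on both ends before and after a failing run. A rough sketch, assuming
an interface name of ens2 and an RDMA device name of mlx4_0 (counter names and
availability vary by driver and kernel):

  # Ethernet-level errors/drops/pauses on the ConnectX-3 port
  ethtool -S ens2 | grep -iE 'err|drop|discard|pause'
  # port state and link layer as seen by the verbs stack
  ibv_devinfo -d mlx4_0 | grep -E 'state|link_layer'
  # per-port RDMA counters, where the driver exposes them in sysfs
  grep . /sys/class/infiniband/mlx4_0/ports/1/counters/* 2>/dev/null

If those stay clean across a failing run, the retry-exceeded errors look less
like a cabling issue.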

On Tue, Aug 29, 2017 at 7:33 PM, Vladislav Bolkhovitin <vst(a)vlnb.net> wrote:

>
> Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> > Folks,
> > My name is Ganesh, and I am working on NVMe-oF performance metrics using
> SPDK (and kernel).
> > I would appreciate your expert insights.
> >
> > I am observing errors when QD on perf is increased above >=64 most of the
> > times. Sometimes, even for <=16
> > Errors are not consistent.
> >
> > Attached are some details.
> >
> > Please let me know if you have any additional questions.
> >
> > Thanks.
> > -Ganesh
> >
> > SPDK errors 1.txt
> >
> >
> > Setup details:
> > -- Some info on setup
> > Same HW/SW on target and initiator.
> >
> > adminuser(a)dell730-80:~> hostnamectl
> >    Static hostname: dell730-80
> >          Icon name: computer-server
> >            Chassis: server
> >         Machine ID: b5abb0fe67afd04c59521c40599b3115
> >            Boot ID: f825aa6338194338a6f80125caa836c7
> >   Operating System: openSUSE Leap 42.3
> >        CPE OS Name: cpe:/o:opensuse:leap:42.3
> >             Kernel: Linux 4.12.8-1.g4d7933a-default
> >       Architecture: x86-64
> >
> > adminuser(a)dell730-80:~> lscpu | grep -i socket
> > Core(s) per socket:    12
> > Socket(s):             2
> >
> > 2MB and/or 1GB huge pages set,
> >
> > Latest spdk/dpdk from respective GIT,
> >
> > compiled with RDMA flag,
> >
> > nvmf.conf file: (have played around with the values)
> > reactor mask 0x5555
> > AcceptorCore 2
> > 1 - 3 Subsystems on cores 4,8,10
> >
> > adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c
> gits/spdk/etc/spdk/nvmf.conf -p 6
> >
> > PCI, NVME cards (16GB)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> > 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> > 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev
> 06)
> >
> > Network cards: (latest associated FW from vendor)
> > adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> > 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family
> [ConnectX-3 Pro]
> >
> > --- initiator cmd line
> > sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
> traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
> >
> > --errors on stdout on target
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions:
> *ERROR*: cpl does not map to outstanding cmd
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
> 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
> cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd:
> *ERROR*: readv failed: rc = -12
> > Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll:
> *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12):
> transport retry counter exceeded
> >
> > --- errors seen on client
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter
> exceeded
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed
> Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> > nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error
> on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error
>
> Actually, these might be HW errors, because retries are supposed to be engaged
> only on packet loss/corruption. It might be bad or not fully inserted cables.
>
> Vlad
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 7601 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-08-30  2:33 Vladislav Bolkhovitin
  0 siblings, 0 replies; 22+ messages in thread
From: Vladislav Bolkhovitin @ 2017-08-30  2:33 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 5309 bytes --]


Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> Folks,
> My name is Ganesh, and I am working on NVMe-oF performance metrics using SPDK (and kernel).
> I would appreciate your expert insights.
> 
> I am observing errors when QD on perf is increased above >=64 most of the
> times. Sometimes, even for <=16
> Errors are not consistent.
> 
> Attached are some details.
> 
> Please let me know if you have any additional questions.
> 
> Thanks.
> -Ganesh
> 
> SPDK errors 1.txt
> 
> 
> Setup details:
> -- Some info on setup
> Same HW/SW on target and initiator.
> 
> adminuser(a)dell730-80:~> hostnamectl
>    Static hostname: dell730-80
>          Icon name: computer-server
>            Chassis: server
>         Machine ID: b5abb0fe67afd04c59521c40599b3115
>            Boot ID: f825aa6338194338a6f80125caa836c7
>   Operating System: openSUSE Leap 42.3
>        CPE OS Name: cpe:/o:opensuse:leap:42.3
>             Kernel: Linux 4.12.8-1.g4d7933a-default
>       Architecture: x86-64
> 
> adminuser(a)dell730-80:~> lscpu | grep -i socket
> Core(s) per socket:    12
> Socket(s):             2
> 
> 2MB and/or 1GB huge pages set,
> 
> Latest spdk/dpdk from respective GIT,
> 
> compiled with RDMA flag,
> 
> nvmf.conf file: (have played around with the values)
> reactor mask 0x5555
> AcceptorCore 2
> 1 - 3 Subsystems on cores 4,8,10
> 
> adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c gits/spdk/etc/spdk/nvmf.conf -p 6
> 
> PCI, NVME cards (16GB)
> adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 
> Network cards: (latest associated FW from vendor)
> adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
> 
> --- initiator cmd line
> sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
> 
> --errors on stdout on target
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll: *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12): transport retry counter exceeded
> 
> --- errors seen on client
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter exceeded
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

Actually, these might be HW errors, because retries are supposed to be engaged only on
packet loss/corruption. It might be bad or not fully inserted cables.

Vlad


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-08-29 22:43 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-08-29 22:43 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3501 bytes --]

Thanks for the response.

Update:
Changed target to Fedora 26. Similar behavior.

Observation: once the errors start at a particular queue depth (-q 64) while
using an increasing (1, 8, 16, 32, ...) queue depth pattern, going back to the
last working depth (32, 16, 8, ...) also fails with similar errors.
A QD of 1 is the only one left working successfully.
Possible memory leak?
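
For what it is worth, the "readv failed: rc = -12" lines look like -ENOMEM
(request/buffer exhaustion) rather than a media or link fault, which would fit
some resource not being returned between runs. A rough sketch of the sweep used
to reproduce; the loop values are an assumption, the perf line is the same one
as in the original report:

  # assumed QD sweep: ramp up past the failing depth, then back down
  for qd in 1 8 16 32 64 32 16 8 1; do
      sudo ./perf -q $qd -s 512 -w randread -t 30 \
          -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
  done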


[adminuser(a)dell730-81 ~]$ hostnamectl
   Static hostname: dell730-81
         Icon name: computer-server
           Chassis: server
        Machine ID: 09ffb779a28f495c95db0b0eb0d72466
           Boot ID: 1707e2dec1184630948aa4d5503108ac
  Operating System: Fedora 26 (Twenty Six)
       CPE OS Name: cpe:/o:fedoraproject:fedora:26
            Kernel: Linux 4.12.8-300.fc26.x86_64
      Architecture: x86-64


On Tue, Aug 29, 2017 at 12:00 PM, Luse, Paul E <paul.e.luse(a)intel.com>
wrote:

> OK, that’s good info I think for those who may have suggestions or other
> questions.  I do have some HW here that I can maybe take the opportunity to
> try and get setup and help but it would be more of a learning activity for
> me than expedited help for you J If someone doesn’t get you going by EOD
> Thu I’ll see if I can work this weekend on getting my stuff setup to repro…
>
>
>
> Thx
> Paul
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Tuesday, August 29, 2017 10:50 AM
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* Re: [SPDK] SPDK errors
>
>
>
> Thanks for the quick response.
>
>
>
> I have run perf using the PCIe, traddr against the local NVME device on
> target successfully.
>
> Also, same tests were going fine with my last rolling Tumbleweed distro.
> Although, now I have pulled the latest version of SPDK/DPDK from the gits.
>
> The current tests are with regular, latest Leap with updated (Linux
> 4.12.8-1.g4d7933a-default) kernel.
>
>
>
> Planning to check with latest Fedora on target.
>
>
>
> -Ganesh
>
>
>
> On Tue, Aug 29, 2017 at 10:26 AM, Luse, Paul E <paul.e.luse(a)intel.com>
> wrote:
>
> Hi Ganesh,
>
>
>
> I’m totally not one of the NVMEoF experts but I’m sure someone will chime
> in soon.  In the meantime it might be interesting to run the same perf
> tests locally against the same NVMe device(s) just to make sure all that
> works w/o issue.  If it doesn’t that’s a simpler problem to solve at least
> J
>
>
>
> -Paul
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Tuesday, August 29, 2017 10:15 AM
> *To:* SPDK(a)lists.01.org
> *Subject:* [SPDK] SPDK errors
>
>
>
> Folks,
>
> My name is Ganesh, and I am working on NVMe-oF performance metrics using
> SPDK (and kernel).
>
> I would appreciate your expert insights.
>
>
>
> I am observing errors when QD on perf is increased above >=64 most of the
> times. Sometimes, even for <=16
>
> Errors are not consistent.
>
>
>
> Attached are some details.
>
>
>
> Please let me know if you have any additional questions.
>
>
>
> Thanks.
>
> -Ganesh
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 8440 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-08-29 17:50 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-08-29 17:50 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1708 bytes --]

Thanks for the quick response.

I have run perf successfully using the PCIe traddr against the local NVMe
device on the target.
Also, the same tests were going fine with my previous rolling Tumbleweed
distro, although I have since pulled the latest SPDK/DPDK from the gits.
The current tests are with the regular, latest Leap and an updated kernel
(Linux 4.12.8-1.g4d7933a-default).

Planning to check with latest Fedora on target.

-Ganesh

On Tue, Aug 29, 2017 at 10:26 AM, Luse, Paul E <paul.e.luse(a)intel.com>
wrote:

> Hi Ganesh,
>
>
>
> I’m totally not one of the NVMEoF experts but I’m sure someone will chime
> in soon.  In the meantime it might be interesting to run the same perf
> tests locally against the same NVMe device(s) just to make sure all that
> works w/o issue.  If it doesn’t that’s a simpler problem to solve at least
> J
>
>
>
> -Paul
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Santhebachalli
> Ganesh
> *Sent:* Tuesday, August 29, 2017 10:15 AM
> *To:* SPDK(a)lists.01.org
> *Subject:* [SPDK] SPDK errors
>
>
>
> Folks,
>
> My name is Ganesh, and I am working on NVMe-oF performance metrics using
> SPDK (and kernel).
>
> I would appreciate your expert insights.
>
>
>
> I am observing errors when QD on perf is increased above >=64 most of the
> times. Sometimes, even for <=16
>
> Errors are not consistent.
>
>
>
> Attached are some details.
>
>
>
> Please let me know if you have any additional questions.
>
>
>
> Thanks.
>
> -Ganesh
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 4412 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [SPDK] SPDK errors
@ 2017-08-29 17:26 Luse, Paul E
  0 siblings, 0 replies; 22+ messages in thread
From: Luse, Paul E @ 2017-08-29 17:26 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 897 bytes --]

Hi Ganesh,

I’m totally not one of the NVMEoF experts but I’m sure someone will chime in soon.  In the meantime it might be interesting to run the same perf tests locally against the same NVMe device(s) just to make sure all that works w/o issue.  If it doesn’t that’s a simpler problem to solve at least ☺

-Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Santhebachalli Ganesh
Sent: Tuesday, August 29, 2017 10:15 AM
To: SPDK(a)lists.01.org
Subject: [SPDK] SPDK errors

Folks,
My name is Ganesh, and I am working on NVMe-oF performance metrics using SPDK (and kernel).
I would appreciate your expert insights.

I am observing errors most of the time when the QD on perf is increased to >= 64, and sometimes even for <= 16.
The errors are not consistent.

Attached are some details.

Please let me know if you have any additional questions.

Thanks.
-Ganesh

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 4662 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [SPDK] SPDK errors
@ 2017-08-29 17:14 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-08-29 17:14 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 387 bytes --]

Folks,
My name is Ganesh, and I am working on NVMe-oF performance metrics using
SPDK (and kernel).
I would appreciate your expert insights.

I am observing errors most of the time when the QD on perf is increased to
>= 64, and sometimes even for <= 16.
The errors are not consistent.

Attached are some details.

Please let me know if you have any additional questions.

Thanks.
-Ganesh

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 581 bytes --]

[-- Attachment #3: SPDKerrors1.txt --]
[-- Type: text/plain, Size: 4471 bytes --]

Setup details:
-- Some info on setup
Same HW/SW on target and initiator.

adminuser@dell730-80:~> hostnamectl
   Static hostname: dell730-80
         Icon name: computer-server
           Chassis: server
        Machine ID: b5abb0fe67afd04c59521c40599b3115
           Boot ID: f825aa6338194338a6f80125caa836c7
  Operating System: openSUSE Leap 42.3
       CPE OS Name: cpe:/o:opensuse:leap:42.3
            Kernel: Linux 4.12.8-1.g4d7933a-default
      Architecture: x86-64

adminuser@dell730-80:~> lscpu | grep -i socket
Core(s) per socket:    12
Socket(s):             2

2MB and/or 1GB huge pages set,

Latest spdk/dpdk from respective GIT,

compiled with RDMA flag,

nvmf.conf file: (have played around with the values)
reactor mask 0x5555
AcceptorCore 2
1 - 3 Subsystems on cores 4,8,10
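
Roughly, the above corresponds to a config along these lines. This is only a
sketch from memory of that era's example file, so check
gits/spdk/etc/spdk/nvmf.conf.in for the authoritative section and key names;
the NQN, listen address and PCI address are taken from the perf command and
lspci output in this report:

  [Global]
    ReactorMask 0x5555

  [Nvmf]
    AcceptorCore 2

  [Subsystem1]
    NQN nqn.2016-06.io.spdk:cnode1
    Core 4
    Mode Direct
    Listen RDMA 1.1.1.80:4420
    NVMe 0000:04:00.0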

adminuser@dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c gits/spdk/etc/spdk/nvmf.conf -p 6

PCI, NVME cards (16GB)
adminuser@dell730-80:~> sudo lspci | grep -i pmc
04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)

Network cards: (latest associated FW from vendor)
adminuser@dell730-80:~> sudo lspci | grep -i connect
05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

--- initiator cmd line
sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2

--errors on stdout on target
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll: *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12): transport retry counter exceeded

--- errors seen on client
nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter exceeded
nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed Error
nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

-

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [SPDK] SPDK errors
@ 2017-08-25  1:47 Santhebachalli Ganesh
  0 siblings, 0 replies; 22+ messages in thread
From: Santhebachalli Ganesh @ 2017-08-25  1:47 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4774 bytes --]

PS: Setup info is provided further below.

Getting errors most of the time when the QD on perf is increased to >= 64,
and sometimes even for <= 16.
The errors are not consistent. A sample is provided here.

Hope someone can provide some insights.

Thanks,
Ganesh

--- initiator cmd line
sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4
traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2

--errors on stdout on target
Aug 24 17:14:09 dell730-80 nvmf[38006]:
nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]:
nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201
cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]:
nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198
cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]:
nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]:
nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not
map to outstanding cmd
Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c:
284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222
cdw0:0 sqhd:0094 p:0 m:0 dnr:0
Aug 24 17:14:09 dell730-80 nvmf[38006]:
bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:09 dell730-80 nvmf[38006]:
bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:09 dell730-80 nvmf[38006]:
bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:09 dell730-80 nvmf[38006]:
bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:09 dell730-80 nvmf[38006]:
bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll:
*ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12):
transport retry counter exceeded

--- errors seen on client
nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on
Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter
exceeded
nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on
Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed
Error
nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on
Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on
Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

--- Some info on setup
Same HW/SW on target and initiator.

adminuser(a)dell730-80:~> hostnamectl
   Static hostname: dell730-80
         Icon name: computer-server
           Chassis: server
        Machine ID: b5abb0fe67afd04c59521c40599b3115
           Boot ID: f825aa6338194338a6f80125caa836c7
  Operating System: openSUSE Leap 42.3
       CPE OS Name: cpe:/o:opensuse:leap:42.3
            Kernel: Linux 4.12.8-1.g4d7933a-default
      Architecture: x86-64

adminuser(a)dell730-80:~> lscpu | grep -i socket
Core(s) per socket:    12
Socket(s):             2

2MB and/or 1GB pages set,

Latest spdk/dpdk from respective GIT,

compiled with RDMA flag,

nvmf.conf file: (have played around with the values)
reactor mask 0x5555
AcceptorCore 2
1 - 3 Subsystems on cores 4,8,10

adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c
gits/spdk/etc/spdk/nvmf.conf -p 6

PCI, NVME cards (16GB)
adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)

Network cards: (latest associated FW from vendor)
adminuser(a)dell730-80:~> sudo lspci | grep -i connect
05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family
[ConnectX-3 Pro]

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 5788 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-11-05 19:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-29 19:00 [SPDK] SPDK errors Luse, Paul E
  -- strict thread matches above, loose matches on Subject: below --
2017-11-05 19:08 Luse, Paul E
2017-11-03 22:12 Santhebachalli Ganesh
2017-11-01 23:57 Santhebachalli Ganesh
2017-11-01 23:00 Santhebachalli Ganesh
2017-11-01 22:52 Luse, Paul E
2017-11-01 22:10 Santhebachalli Ganesh
2017-09-12 23:00 Luse, Paul E
2017-09-12 11:28 Santhebachalli Ganesh
2017-09-12  0:06 Kariuki, John K
2017-09-05 11:52 Santhebachalli Ganesh
2017-09-04 19:40 Luse, Paul E
2017-09-01 18:59 Santhebachalli Ganesh
2017-08-31 22:49 Santhebachalli Ganesh
2017-08-31 22:36 Luse, Paul E
2017-08-30 19:43 Santhebachalli Ganesh
2017-08-30  2:33 Vladislav Bolkhovitin
2017-08-29 22:43 Santhebachalli Ganesh
2017-08-29 17:50 Santhebachalli Ganesh
2017-08-29 17:26 Luse, Paul E
2017-08-29 17:14 Santhebachalli Ganesh
2017-08-25  1:47 Santhebachalli Ganesh
